google/gemma-4-12B-it

Published: May 23, 2026 | Task: any-to-any | License: apache-2.0 | Library: transformers | Params: 12B | Downloads: 676k | Likes: 906

Hugging Face | GitHub | Launch Blog | Documentation

License: Apache 2.0 | Authors: Google DeepMind

> [!Note]
> This model card is for the Gemma 4 12B Unified model, part of the Gemma 4 family of open models. It offers the same multimodal functionality as Gemma 4 E2B and E4B (text, audio, image, and video inputs) and brings native audio and vision understanding directly to local environments without separate encoders. Because it is encoder-free, the model has a deployment size well suited to consumer devices and streamlined local execution.

Gemma is a family of open models built by Google DeepMind. Gemma 4 models are multimodal, handling text and image input (with audio supported on E2B, E4B, and 12B) and generating text output. This release includes open-weights models in both pre-trained and instruction-tuned variants. Gemma 4 features a context window of up to 256K tokens and maintains multilingual support in over 140 languages.

Featuring both Dense and Mixture-of-Experts (MoE) architectures, Gemma 4 is well-suited for tasks like text generation, coding, and reasoning. The models are available in five distinct sizes: E2B, E4B, 12B, 26B A4B, and 31B. Their diverse sizes make them deployable in environments ranging from high-end phones to laptops and servers, democratizing access to state-of-the-art AI.

Gemma 4 introduces key capability and architectural advancements:

  • Reasoning – All models in the family are designed as highly capable reasoners, with configurable thinking modes.
  • Extended Multimodalities – Processes text and images with variable aspect ratios and resolutions (all models), as well as video and audio (featured natively on the E2B, E4B, and 12B models).
  • Diverse & Efficient Architectures – Offers Dense and Mixture-of-Experts (MoE) variants of different sizes for scalable deployment.
  • Optimized for On-Device – Smaller models are specifically designed for efficient local execution on laptops and mobile devices.
  • Increased Context Window – The small models (E2B, E4B) feature a 128K context window, while the larger models (12B, 26B A4B, 31B) support 256K.
  • Enhanced Coding & Agentic Capabilities – Achieves notable improvements in coding benchmarks alongside native function-calling support, powering highly capable autonomous agents.
  • Native System Prompt Support – Gemma 4 introduces native support for the system role, enabling more structured and controllable conversations.

Models Overview

Gemma 4 models are designed to deliver frontier-level performance at each size, targeting deployment scenarios from mobile and edge devices (E2B, E4B) to consumer GPUs and workstations (12B, 26B A4B, 31B). They are well-suited for reasoning, agentic workflows, coding, and multimodal understanding.

The models use a hybrid attention mechanism that interleaves local sliding-window attention with full global attention, and the final layer is always global. Local layers attend only to a fixed window of recent tokens, which keeps compute and memory low, while the global layers preserve the full-context awareness that complex, long-context tasks require. To further reduce memory for long contexts, global layers use unified Keys and Values and apply Proportional RoPE (p-RoPE).
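The following is a minimal plain-Python sketch of the hybrid pattern. The card does not give the ratio of local to global layers, so `local_per_global=5` is a hypothetical choice. The layer count (48) and window (1024 tokens) come from the 12B Unified column of the dense-model table below, and 256K is taken as 256 × 1024 tokens.

The sketch models only two things: which positions each layer may attend to, and how many tokens each layer's KV cache must hold. It does not model the unified Keys and Values or p-RoPE. All function names are illustrative.

```python
from typing import List

def layer_schedule(num_layers: int, local_per_global: int) -> List[str]:
    """Label each layer 'local' or 'global'.

    Hypothetical pattern: one global layer after every `local_per_global`
    local layers. The last layer is forced to 'global', as the card describes.
    """
    kinds = []
    for i in range(num_layers):
        is_global = (i + 1) % (local_per_global + 1) == 0
        kinds.append("global" if is_global else "local")
    kinds[-1] = "global"
    return kinds

def can_attend(query_pos: int, key_pos: int, kind: str, window: int) -> bool:
    """Causal mask; local layers also ignore keys `window` or more positions back."""
    if key_pos > query_pos:
        return False  # never attend to future tokens
    if kind == "local":
        return query_pos - key_pos < window
    return True  # global layers see the whole prefix

def kv_cache_tokens(kinds: List[str], context_len: int, window: int) -> int:
    """Total tokens held in the KV cache, summed over all layers."""
    return sum(min(context_len, window) if k == "local" else context_len
               for k in kinds)

# 48 layers and a 1024-token window (12B Unified); the 5:1 ratio is hypothetical.
kinds = layer_schedule(num_layers=48, local_per_global=5)
hybrid = kv_cache_tokens(kinds, context_len=256 * 1024, window=1024)
all_global = kv_cache_tokens(["global"] * 48, context_len=256 * 1024, window=1024)
print(kinds.count("global"), hybrid, all_global)  # 8 2138112 12582912
print(f"hybrid cache is {hybrid / all_global:.0%} of an all-global model")
```

Under these assumptions, the 40 local layers each cache at most 1,024 tokens. At full context, the hybrid model therefore holds about 17% of the KV entries that an all-global 48-layer model would.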

Dense Models

| Property | E2B | E4B | 12B Unified | 31B Dense |
| :---- | :---- | :---- | :---- | :---- |
| Total Parameters | 2.3B effective (5.1B with embeddings) | 4.5B effective (8B with embeddings) | 11.95B | 30.7B |
| Layers | 35 | 42 | 48 | 60 |
| Sliding Window | 512 tokens | 512 tokens | 1024 tokens | 1024 tokens |
| Context Length | 128K tokens | 128K tokens | 256K tokens | 256K tokens |
| Vocabulary Size | 262K | 262K | 262K | 262K |
| Supported Modalities | Text, Image, Audio | Text, Image, Audio | Text, Image, Audio | Text, Image |
| Vision Encoder Parameters | *~150M* | *~150M* | - | *~550M* |
| Audio Encoder Parameters | *~300M* | *~300M* | - | No Audio |

The "E" in E2B and E4B stands for "effective" parameters. The smaller models incorporate Per-Layer Embeddings (PLE) to maximize parameter efficiency in on-device deployments. Rather than adding more layers or parameters to the model, PLE gives each decoder layer its own small embedding for every token. These embedding tables are large but are only used for quick lookups, which is why the effective parameter count is much smaller than the total.
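The following toy sketch illustrates why lookup-only tables inflate the total parameter count without adding matching compute. The card does not say how a layer combines its per-layer embedding with the hidden state, so the element-wise addition in `apply_ple` is an assumption, and all names and sizes are hypothetical.

The `embedding_params_b` arithmetic only subtracts the table's "effective" figure from its "with embeddings" figure. That gap may include embedding parameters beyond PLE.

```python
import random
from typing import List

class PerLayerEmbeddings:
    """Toy PLE: one small embedding table per decoder layer, indexed by token id."""

    def __init__(self, num_layers: int, vocab_size: int, dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.tables = [
            [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]
            for _ in range(num_layers)
        ]

    def lookup(self, layer: int, token_id: int) -> List[float]:
        # Pure indexing: no matrix multiply, so cost does not grow with table size.
        return self.tables[layer][token_id]

    def num_params(self) -> int:
        return sum(len(table) * len(table[0]) for table in self.tables)

def apply_ple(hidden: List[float], ple: PerLayerEmbeddings,
              layer: int, token_id: int) -> List[float]:
    """Assumed combination rule: add this layer's token embedding to the hidden state."""
    return [h + e for h, e in zip(hidden, ple.lookup(layer, token_id))]

def embedding_params_b(total_b: float, effective_b: float) -> float:
    """Billions of parameters counted only 'with embeddings' (total minus effective)."""
    return round(total_b - effective_b, 1)

ple = PerLayerEmbeddings(num_layers=4, vocab_size=10, dim=3)
hidden = [0.0, 0.0, 0.0]
for layer in range(4):
    # The decoder block itself is omitted; only the per-layer lookup is shown.
    hidden = apply_ple(hidden, ple, layer, token_id=7)
print(ple.num_params())  # 4 layers * 10 tokens * 3 dims = 120
print(embedding_params_b(5.1, 2.3), embedding_params_b(8.0, 4.5))  # E2B, E4B: 2.8 3.5
```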

The "Unified" in Gemma 4 12B Unified refers to its encoder-free architecture. Other Gemma 4 models use dedicated encoders to process multimodal data before passing it to the LLM. Gemma 4 12B eliminates these encoders entirely, projecting raw image patches and audio waveforms directly into the LLM's embedding space through lightweight linear layers. This unified approach means all modalities flow straight into a single decoder-only transformer, reducing multimodal latency and allowing the entire model to be fine-tuned in one pass.
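The card describes raw image patches and audio waveforms being projected straight into the embedding space by lightweight linear layers. The following is a minimal sketch of that idea. The patch size, audio frame length, model width `D_MODEL`, and row-major flatten order are hypothetical, and the sketch omits anything the card does not describe, such as position information or normalization. The random weights stand in for trained ones.

```python
import random
from typing import List

Vector = List[float]

def patchify(image: List[List[float]], patch: int) -> List[Vector]:
    """Split an H x W grayscale image into non-overlapping patch x patch tiles.

    Each tile is flattened row by row; edge pixels that don't fill a tile are dropped.
    """
    h, w = len(image), len(image[0])
    return [
        [image[r + i][c + j] for i in range(patch) for j in range(patch)]
        for r in range(0, h - h % patch, patch)
        for c in range(0, w - w % patch, patch)
    ]

def frame_audio(wave: Vector, frame: int) -> List[Vector]:
    """Chop a raw waveform into consecutive, non-overlapping frames (remainder dropped)."""
    return [wave[i:i + frame] for i in range(0, len(wave) - frame + 1, frame)]

class Linear:
    """y = W x + b with small random weights, standing in for a trained projection."""

    def __init__(self, in_dim: int, out_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.w = [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(out_dim)]
        self.b = [0.0] * out_dim

    def __call__(self, x: Vector) -> Vector:
        return [sum(wi * xi for wi, xi in zip(row, x)) + bi
                for row, bi in zip(self.w, self.b)]

D_MODEL, PATCH, FRAME = 8, 2, 4  # hypothetical sizes
image_proj = Linear(PATCH * PATCH, D_MODEL, seed=1)
audio_proj = Linear(FRAME, D_MODEL, seed=2)

image = [[float(r * 6 + c) for c in range(6)] for r in range(4)]  # 4 x 6 pixels
wave = [0.1 * t for t in range(10)]                                # 10 samples

image_tokens = [image_proj(p) for p in patchify(image, PATCH)]    # 6 tokens
audio_tokens = [audio_proj(f) for f in frame_audio(wave, FRAME)]  # 2 tokens
sequence = image_tokens + audio_tokens  # text embeddings would join the same sequence
print(len(image_tokens), len(audio_tokens), len(sequence[0]))  # 6 2 8
```

No separate encoder network sits between the raw input and the decoder in this sketch. Each modality needs only one matrix multiply per patch or frame before it joins the shared token sequence.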

Mixture-of-Experts (MoE) Model

| Property | 26B A4B MoE |
| :---- | :---- |
| Total Parameters | 25.2B |
| Active Parameters | 3.8B |
| Layers | 30 |
| Sliding Window | 1024 tokens |
| Context Length | 256K tokens |
| Vocabulary Size | 262K |
| Expert Count | 8 active / 128 total and 1 shared |
| Supported Modalities | Text, Image |
| Vision Encoder Parameters | *~550M* |

The "A" in 26B A4B stands for "active parameters", as opposed to the total number of parameters the model contains. Because only about 4B parameters (3.8B) are activated per token during inference, the Mixture-of-Experts model runs much faster than its 26B total might suggest, almost as fast as a 4B-parameter model. This makes it an excellent choice for fast inference compared with the dense 31B model.
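The following minimal sketch shows why only a fraction of the parameters does work for each token. It uses one common routing scheme: keep the top-k router scores and softmax them into weights. The card does not describe Gemma 4's actual router.

The expert counts (8 routed of 128, plus 1 shared) come from the MoE table. Treating the shared expert as running in addition to the 8 routed experts is an assumption. The experts are hypothetical scalar functions so that the routing logic stays visible.

```python
import math
import random
from typing import Callable, List, Tuple

def top_k_route(logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Select the k highest-scoring experts and softmax their scores into weights."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(x: float, router_logits: List[float],
                experts: List[Callable[[float], float]],
                shared_expert: Callable[[float], float], k: int = 8) -> float:
    """The shared expert always runs; of the routed experts, only the top k are evaluated."""
    out = shared_expert(x)
    for idx, weight in top_k_route(router_logits, k):
        out += weight * experts[idx](x)
    return out

calls: List[str] = []

def make_expert(name: str, scale: float) -> Callable[[float], float]:
    def expert(x: float) -> float:
        calls.append(name)  # record which experts actually did work
        return scale * x
    return expert

experts = [make_expert(f"e{i}", 1.0 + i / 128) for i in range(128)]
shared = make_expert("shared", 1.0)
rng = random.Random(0)
logits = [rng.uniform(-1.0, 1.0) for _ in range(128)]  # stand-in router scores

y = moe_forward(2.0, logits, experts, shared, k=8)
print(len(calls))             # 9: 8 routed experts + 1 shared
print(round(3.8 / 25.2, 3))   # 0.151: active / total parameters from the table
```

The other 120 experts are never called for this token. Their weights still occupy memory, but they cost no compute, which is the gap between the 25.2B total and the 3.8B active figures.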

Benchmark Results

These models were evaluated on a large collection of datasets and metrics covering different aspects of text generation. The results in the table are for instruction-tuned models.

| | Gemma 4 31B | Gemma 4 26B A4B | Gemma 4 12B Unified | Gemma 4 E4B | Gemma 4 E2B | Gemma 3 27B (no think) |
| :---- | :---- | :---- | :---- | :---- | :---- | :---- |
| MMLU Pro | 85.2% | 82.6% | 77.2% | 69.4% | 60.0% | 67.6% |
| AIME 2026 no tools | 89.2% | 88.3% | 77.5% | 42.5% | 37.5% | 20.8% |
| LiveCodeBench v6 | 80.0% | 77.1% | 72.0% | 52.0% | 44.0% | 29.1% |
| Codeforces ELO | 2150 | 1718 | 1659 | 940 | 633 | … |
