Model by IBM (Granite), published May 1, 2026

ibm-granite/granite-switch-4.1-8b-preview

Task: text-generation · License: apache-2.0 · Library: transformers · Params: 9.6B · Downloads: 626 · Likes: 27

Granite Switch 4.1 8B Preview

Model Summary: Granite Switch 4.1 8B Preview is a modular LLM built on IBM Granite 4.1 8B with embedded adapters from the Granite Libraries collection. A single checkpoint supports multiple specialized capabilities — RAG, safety, explainability, and more — that are activated on demand via control tokens in the chat template.

For full details on model composition and adapter configuration, see [BUILD.md](BUILD.md).

  • Base Model: ibm-granite/granite-4.1-8b (8B params, 128K context)
  • Adapters: 12 adapters from granitelib-rag-r1.0, granitelib-core-r1.0, and granitelib-guardian-r1.0
  • License: Apache 2.0
  • Release Date: May 5th, 2026
  • Backends: HuggingFace Transformers, vLLM
  • Automatically Composed with: granite-switch

Granite Switch is also available in granite-switch-4.1-3b-preview and granite-switch-4.1-30b-preview.

Motivation: Traditional multi-task LLM deployments need one of two things. The first is a separate model copy per capability, which multiplies memory and compute. The second is weight merging, which permanently blends adapters and destroys task specialization. Granite Switch takes a different approach: independently trained activated LoRA adapters are embedded in a single checkpoint and selected at inference time via control tokens. KV cache normalization ensures that adapters share no internal KV cache state, because each adapter sees prior tokens only through the base model's representation. As a result, adapters can build on each other's outputs, but never through another adapter's cached activations. This lets adapters be developed independently and composed without accuracy loss. Several capabilities can therefore run from one checkpoint instead of one model copy each.
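The selection mechanism above can be pictured with a toy, single-layer sketch of activated LoRA. The low-rank update is applied only from the control token onward, so the cached keys for earlier tokens are base-model keys no matter which adapter is active. All names, shapes and the control-token id below are hypothetical. Plain Python lists stand in for tensors, and this is not Granite Switch's actual code or token vocabulary.

```python
# Toy, single-layer illustration of activated LoRA (aLoRA) key projection.
# All names, shapes, and the control-token id are hypothetical; plain lists
# stand in for tensors. This is not Granite Switch's implementation.
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def project_keys(
    hidden: List[Vector],
    w_base: Matrix,
    lora_a: Matrix,  # r x d down-projection
    lora_b: Matrix,  # d x r up-projection
    activation_pos: int,
) -> List[Vector]:
    """Return one key vector per token.

    Tokens before activation_pos get only the base projection W x, so their
    keys are identical for every adapter. From the control token onward the
    low-rank update B (A x) is added.
    """
    keys = []
    for i, x in enumerate(hidden):
        k = matvec(w_base, x)
        if i >= activation_pos:
            delta = matvec(lora_b, matvec(lora_a, x))
            k = [base + d for base, d in zip(k, delta)]
        keys.append(k)
    return keys


CONTROL_TOKEN_ID = 99  # hypothetical id of an adapter's control token
token_ids = [5, 17, 42, CONTROL_TOKEN_ID, 8]
hidden = [[float(t), 1.0] for t in token_ids]  # stand-in 2-d hidden states
w_base = [[1.0, 0.0], [0.0, 1.0]]  # identity base projection

# Two different rank-1 adapters, each given as (A, B).
adapter_1: Tuple[Matrix, Matrix] = ([[1.0, 1.0]], [[0.5], [0.0]])
adapter_2: Tuple[Matrix, Matrix] = ([[0.0, 2.0]], [[0.0], [1.0]])

pos = token_ids.index(CONTROL_TOKEN_ID)  # the adapter activates here
k1 = project_keys(hidden, w_base, *adapter_1, pos)
k2 = project_keys(hidden, w_base, *adapter_2, pos)

print(k1[:pos] == k2[:pos])  # True: prefix keys are shared across adapters
print(k1[pos:] == k2[pos:])  # False: keys after the control token differ
```

The prefix keys are identical across adapters, so in this toy a single base-model cache for the prefix could serve either adapter. Only the tokens from the control token onward carry adapter-specific state.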

Included Adapters

Granite Switch is best used with Mellea.

Core Library (ibm-granite/granitelib-core-r1.0)

Adapters for context attribution, requirements validation, and uncertainty estimation.

| Adapter | Description |
|---|---|
| **Requirement Check** | Binary yes/no evaluation of whether a response satisfies user-specified constraints (formatting, content, quality) |
| **Context Attribution** | Identifies which context sentences influenced the response; contributive attribution ranked by importance |
| **Uncertainty** | Calibrated confidence scores: an answer marked X% certain is approximately X% correct |
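You can check the Uncertainty adapter's calibration claim on your own labeled data. Bin the predictions by confidence, then compare each bin's mean confidence with its accuracy; this is the expected calibration error (ECE). The sketch below is generic. The function names, the bin count and the sample data are assumptions, not outputs of the adapter.

```python
# Generic calibration check (expected calibration error, ECE) for
# (confidence, was_correct) pairs you collect yourself. Function names,
# the bin count, and the sample data are illustrative assumptions.
from typing import List, Tuple

Pred = Tuple[float, bool]  # (confidence in [0, 1], answer was correct)


def reliability_bins(preds: List[Pred], n_bins: int = 5) -> List[Tuple[float, float, int]]:
    """Return (mean confidence, accuracy, count) for each non-empty confidence bin."""
    bins: List[List[Pred]] = [[] for _ in range(n_bins)]
    for conf, correct in preds:
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 lands in the top bin
        bins[idx].append((conf, correct))
    stats = []
    for b in bins:
        if b:
            mean_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(1 for _, ok in b if ok) / len(b)
            stats.append((mean_conf, accuracy, len(b)))
    return stats


def expected_calibration_error(preds: List[Pred], n_bins: int = 5) -> float:
    """Count-weighted mean |confidence - accuracy| over bins; 0 means perfectly calibrated."""
    total = len(preds)
    return sum(n / total * abs(conf - acc) for conf, acc, n in reliability_bins(preds, n_bins))


# Made-up examples: 90%-confident answers that are right 9 times out of 10
# are well calibrated; 90%-confident answers that are always wrong are not.
well_calibrated = [(0.9, True)] * 9 + [(0.9, False)]
overconfident = [(0.9, False)] * 10
print(round(expected_calibration_error(well_calibrated), 3))  # 0.0
print(round(expected_calibration_error(overconfident), 3))  # 0.9
```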

RAG Library (ibm-granite/granitelib-rag-r1.0)

Adapters for retrieval-augmented generation pipelines.

| Adapter | Stage | Description |
|---|---|---|
| **Query Rewrite** | Pre-retrieval | Decontextualizes multi-turn queries into standalone, retriever-friendly versions |
| **Query Clarification** | Pre-retrieval | Detects underspecified or ambiguous queries and formulates clarification requests |
| **Answerability** | Pre-generation | Determines whether a query is answerable from the available passages; prevents hallucinations |
| **Hallucination Detection** | Post-generation | Outputs hallucination risk ranges for each sentence in a response |
| **Citation Generation** | Post-generation | Generates passage-level citations for model responses |
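The Stage column implies an ordering. The sketch below wires the five RAG adapters into one pipeline as plain callables. It leaves out how each adapter is actually invoked (through Mellea or via control tokens), and every function and field name is a hypothetical placeholder. Both clarification and rewriting are listed as pre-retrieval, so running clarification after query rewriting is an assumption.

```python
# Sketch of the RAG adapter stages from the table, wired as plain callables.
# Every name here is a hypothetical placeholder; invoking the real adapters
# (via Mellea or control tokens) is out of scope.
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RagAdapters:
    query_rewrite: Callable[[List[str]], str]  # pre-retrieval
    clarify: Callable[[str], Optional[str]]  # pre-retrieval; None if the query is clear
    answerable: Callable[[str, List[str]], bool]  # pre-generation
    hallucination_risk: Callable[[str, List[str]], List[str]]  # post-generation, per sentence
    cite: Callable[[str, List[str]], List[int]]  # post-generation, passage indices


@dataclass
class RagResult:
    answer: Optional[str] = None
    clarification: Optional[str] = None
    citations: List[int] = field(default_factory=list)
    risks: List[str] = field(default_factory=list)


def run_rag(
    turns: List[str],
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
    a: RagAdapters,
) -> RagResult:
    query = a.query_rewrite(turns)  # standalone, retriever-friendly query
    question = a.clarify(query)
    if question is not None:
        return RagResult(clarification=question)  # ask the user before retrieving
    passages = retrieve(query)
    if not a.answerable(query, passages):
        return RagResult()  # decline rather than generate an unsupported answer
    answer = generate(query, passages)
    return RagResult(
        answer=answer,
        citations=a.cite(answer, passages),
        risks=a.hallucination_risk(answer, passages),
    )


# Toy stand-ins so the sketch runs end to end.
PASSAGES = ["Granite 4.1 8B has a 128K context window."]
adapters = RagAdapters(
    query_rewrite=lambda turns: turns[-1],
    clarify=lambda q: None if len(q.split()) > 2 else "Could you be more specific?",
    answerable=lambda q, ps: len(ps) > 0,
    hallucination_risk=lambda ans, ps: ["low"],
    cite=lambda ans, ps: [0],
)
retrieve = lambda q: PASSAGES if "context" in q.lower() else []
generate = lambda q, ps: ps[0]

answered = run_rag(["What is the Granite context length?"], retrieve, generate, adapters)
declined = run_rag(["What is the weather today?"], retrieve, generate, adapters)
vague = run_rag(["Context?"], retrieve, generate, adapters)
print(answered)
print(declined)
print(vague)
```

Returning early when no supporting passages exist is how a pre-generation answerability check avoids hallucinated answers. The post-generation adapters then annotate the answer rather than change it.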

Guardian Library (ibm-granite/granitelib-guardian-r1.0)

Adapters for safety, factuality, and policy compliance.

| Adapter | Description |
|---|---|
| **Guardian Core** | Detects safety risks: harm, jailbreaking, profanity, violence, sexual content, social bias, unethical behavior |
| **Factuality Detection** | Assesses factual correctness of responses against provided context sources |
| **Factuality Correction** | Corrects factual inaccuracies in long-form responses while preserving reasoning quality |
| **Policy Guardrails** | Checks compliance against user-defined policies (compliant / non-compliant / ambiguous) |
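Policy Guardrails returns one of three verdicts. The minimal sketch below turns that verdict into an application decision. It assumes you already have the adapter's verdict as a text label. The normalization rules and the action names (`deliver`, `block`, `escalate`) are hypothetical, not part of the adapter.

```python
# Sketch: mapping a three-way policy verdict to an action. Assumes the verdict
# is already available as text; label normalization and action names are
# hypothetical choices, not the adapter's documented output format.
from enum import Enum


class Verdict(Enum):
    COMPLIANT = "compliant"
    NON_COMPLIANT = "non-compliant"
    AMBIGUOUS = "ambiguous"


def parse_verdict(raw: str) -> Verdict:
    """Normalize raw text; anything unrecognized is treated as ambiguous."""
    text = raw.strip().lower().replace("_", "-").replace(" ", "-")
    for verdict in Verdict:
        if text == verdict.value:
            return verdict
    return Verdict.AMBIGUOUS


def route(raw_verdict: str) -> str:
    """Choose what the application does with a response given the verdict."""
    verdict = parse_verdict(raw_verdict)
    if verdict is Verdict.COMPLIANT:
        return "deliver"
    if verdict is Verdict.NON_COMPLIANT:
        return "block"
    return "escalate"  # ambiguous: send to human review or a stricter check


print(route("Compliant"))      # deliver
print(route("NON_COMPLIANT"))  # block
print(route("unsure"))         # escalate
```

Treating unrecognized output as ambiguous means a malformed verdict is escalated rather than silently delivered.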

---

Generation

```shell
git clone https://github.com/generative-computing/granite-switch.git
cd granite-switch

# Pick what you need:
pip install -e ".[compose]"  # Compose models with adapters
pip install -e ".[hf]"       # HuggingFace inference
pip install -e ".[vllm]"     # vLLM inference
pip install -e ".[dev]"      # Everything
```

Using with Mellea

Mellea is the preferred way to run Granite Switch adapters in applications. It provides a standard interface for adapters such as answerability checking, hallucination detection, requirement checking, and harmful-language detection. It handles constrained decoding and input/output pre-processing automatically, which improves accuracy and reliability. When a Granite Switch model runs through Mellea, each embedded adapter is exposed as a high-level API call, so you invoke it directly instead of writing raw prompts.

```shell
pip install mellea
```

Answerability check

```python
from mellea.backends.openai import OpenAIBackend
from mellea.formatters import TemplateFormatter
from mellea.stdlib.components import Document, Message
from mellea.stdlib.components.intrinsic import rag
from mellea.stdlib.context import ChatContext

SWITCH_MODEL_ID =…
```
