ibm-granite/granite-guardian-3.2-8b-factuality-detection
Granite Guardian 3.2 8B Factuality Detection
Model Summary
Granite Guardian 3.2 8B Factuality Detection is a model based on ibm-granite/granite-3.2-8b-instruct, fine-tuned to detect whether an LLM response is unfactual with respect to a provided context.
- Developers: IBM Research
- GitHub Repository: ibm-granite/granite-guardian
- Cookbook: Granite Guardian Factuality Detection Recipes
- Website: Granite Guardian Docs
- Paper: Granite Guardian FactReasoner: A Probabilistic Approach to Long-Form Factuality Assessment for Large Language Models
- Release Date: February, 2026
- License: Apache 2.0
Usage
Intended Use
Granite Guardian is useful for risk detection use cases that apply across a wide range of enterprise applications.
Granite-guardian-3.2-8b-factuality-detection takes as input a context and an original response generated by a Large Language Model (LLM), and generates a label indicating whether the response is unfactual ("Yes") or factual ("No") according to the context provided.
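The input and output described above can be sketched in plain Python. The `context` and `assistant` role names are taken from the Quickstart Example in this card; the helper names `build_messages` and `interpret_label` and the sample strings are hypothetical and serve only as illustration.

```python
def build_messages(context: str, response: str) -> list[dict]:
    """Pair the grounding context with the LLM response to be checked."""
    return [
        {"role": "context", "content": context},
        {"role": "assistant", "content": response},
    ]


def interpret_label(label: str) -> str:
    """Map the detector's "Yes"/"No" label to a readable verdict."""
    normalized = label.strip().lower()
    if normalized == "yes":
        return "unfactual"
    if normalized == "no":
        return "factual"
    return "unknown"  # anything else is treated as unparseable


# Illustrative inputs only; any context/response pair has the same shape.
messages = build_messages(
    context="The Eiffel Tower was completed in 1889 and stands in Paris.",
    response="The Eiffel Tower was completed in 1925.",
)
```

The resulting `messages` list is the structure that the Quickstart Example passes to the tokenizer's chat template.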
Risk Definitions
The model is specifically designed to detect assistant messages containing only the following risk:
- Factuality: The assistant message is factually incorrect relative to the information provided in the context. This risk arises when the response includes even a small fraction of atomic units, such as claims or facts, that are not supported by, or are directly contradicted by, some part of the context. A factually incorrect response might include information that the context does not support or directly contradicts, misstate facts, misinterpret the context, or provide erroneous details.
The detector manages both factual and unfactual cases.
This model is part of an ongoing research effort focused on post-generation mitigation and remains experimental and under active development. We are committed to continuous improvement and welcome constructive feedback to enhance its performance and capabilities.
Limitations
There is no built-in safeguard to guarantee that the detection output will always be correct. As with other generative models, safety assurance relies on offline evaluations (see [Evaluations](#evaluations)), and we expect, but cannot ensure, that the label meets safety standards. Moreover, this model is specifically optimized for factuality risk. To detect a broader range of risks, users should use the latest Granite Guardian model.
Using Granite Guardian and Factuality Detection
The Granite Guardian Cookbooks offer an excellent starting point for working with guardian models, providing a variety of examples that demonstrate how the models can be configured for different risk detection scenarios.
- Quick Start Guide provides steps to start using Granite Guardian for detecting risks in prompts (user message), responses (assistant message), RAG use cases, or agentic workflows.
- Factuality Detection Cookbook provides steps to start using Granite Guardian for detecting factuality in responses.
- Detailed Guide explores different risk dimensions in depth and shows how to assess custom risk definitions with Granite Guardian. For finer-grained control over token-level risk probabilities and thresholding, please also consult this cookbook.
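As a rough illustration of the thresholding mentioned in the last item, the sketch below turns a probability of risk into a "Yes"/"No" label. The function name `apply_threshold` and the default threshold of 0.5 are assumptions made for this example, not values taken from the cookbook; an appropriate threshold depends on the application.

```python
def apply_threshold(prob_of_risk: float, threshold: float = 0.5) -> str:
    """Return "Yes" (unfactual) when the risk probability reaches the threshold."""
    if not 0.0 <= prob_of_risk <= 1.0:
        raise ValueError("prob_of_risk must lie in [0, 1]")
    return "Yes" if prob_of_risk >= threshold else "No"


# A stricter threshold flags fewer responses as unfactual.
default_label = apply_threshold(0.73)
strict_label = apply_threshold(0.73, threshold=0.9)
```

Raising the threshold trades missed unfactual responses for fewer false alarms; lowering it does the opposite.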
Quickstart Example
The following code shows how to apply Granite Guardian 3.2 8B Factuality Detection to an assistant message.
The code checks whether the assistant message contains the factuality risk. It extracts a "Yes" (i.e., unsafe) or "No" (i.e., safe) label and a confidence level from the model's output. If the response is detected as unsafe, it uses the Factuality Detector to generate a safer version of the assistant message.
import math
import os
import re
import warnings

import torch
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams


def get_probabilities(logprobs):
    # Accumulate the probability mass assigned to the safe and risky tokens
    # at every generated position; the tiny floor avoids log(0) below.
    safe_token_prob = 1e-50
    risky_token_prob = 1e-50
    for gen_token_i in logprobs:
        for token_prob in gen_token_i.values():
            decoded_token = token_prob.decoded_token
            if decoded_token.strip().lower() == safe_token.lower():
                safe_token_prob += math.exp(token_prob.logprob)
            if decoded_token.strip().lower() == risky_token.lower():
                risky_token_prob += math.exp(token_prob.logprob)

    # Softmax over the log-masses normalizes them to [P(safe), P(risky)].
    probabilities = torch.softmax(
        torch.tensor([math.log(safe_token_prob), math.log(risky_token_prob)]), dim=0
    )
    return probabilities


def parse_output(output):
    label, prob_of_risk = None, None
    completion = next(iter(output.outputs))

    if nlogprobs > 0:
        logprobs = completion.logprobs
        if logprobs is not None:
            prob = get_probabilities(logprobs)
            # Convert here so a missing logprobs field leaves prob_of_risk as None
            # instead of failing on None.item().
            prob_of_risk = prob[1].item()

    # The label is the first word of the generated text.
    match = re.search(r"^\w+", completion.text.strip(), re.MULTILINE)
    res = match.group(0).strip() if match else ""
    if risky_token.lower() == res.lower():
        label = risky_token
    elif safe_token.lower() == res.lower():
        label = safe_token
    else:
        print("Could not parse output")
        label = "Failed"

    return label, prob_of_risk


def transform_dataset_for_chat_format(context, response):
    # Relies on a module-level `tokenizer`, which is not defined in this excerpt.
    messages = [
        {"role": "context", "content": context},
        {"role": "assistant", "content": response},
    ]
    dataset = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )
    return dataset


# Configuration
model_path_name = "ibm-granite/granite-guardian-3.2-8b-factuality-detection"
dtype = "bfloat16"
gpu_memory_utilization = 0.95
nlogprobs = 20
temperature = 0.0
max_tokens = 512
safe_token = "No"
risky_token = "Yes"

# Load models
model = LLM(
    model=model_path_name,
    tensor_parallel_size=1,
    dtype=dtype,
    gpu_memory_utilization=gpu_memory_utilization,
)
…
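The confidence computation in `get_probabilities` can be read without torch or vLLM. A softmax over `[log a, log b]` equals `[a / (a + b), b / (a + b)]`, so the function amounts to normalizing the probability mass of the "Yes" tokens against that of the "No" tokens. The sketch below is a dependency-free illustration of that reading: `risk_probability` is a hypothetical name, and its input (a list of dictionaries mapping decoded token text to log-probability) is a simplified stand-in for the vLLM logprob objects used above.

```python
import math


def risk_probability(top_logprobs_per_step, safe_token="No", risky_token="Yes",
                     floor=1e-50):
    """Normalize the mass on the risky token against the mass on the safe token."""
    safe_mass = floor   # floor mirrors the 1e-50 initial value in get_probabilities
    risky_mass = floor
    for step in top_logprobs_per_step:
        for token, logprob in step.items():
            normalized = token.strip().lower()
            if normalized == safe_token.lower():
                safe_mass += math.exp(logprob)
            if normalized == risky_token.lower():
                risky_mass += math.exp(logprob)
    # Equivalent to softmax([log(safe_mass), log(risky_mass)])[1].
    return risky_mass / (safe_mass + risky_mass)


# One generated position where "Yes" holds 80% of the mass and "No" holds 20%.
example = [{"Yes": math.log(0.8), " No": math.log(0.2), "Maybe": math.log(1e-6)}]
prob_of_risk = risk_probability(example)  # approximately 0.8
```

Tokens other than the two labels are ignored, and when neither label appears among the top log-probabilities the two floors cancel and the result is 0.5.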