What does this model signal mean?

IBM (Granite) published ibm-granite/granite-guardian-3.2-8b-factuality-detection. This model signal is evidence of what shipped on model infrastructure and how the release is positioned. High-signal details: license apache-2.0 · 28 HF downloads · Low traction specialty model from IBM. onlylabs links this event to 1 captured evidence page and 6 related model signals.

IBM (Granite) Model: ibm-granite/granite-guardian-3.2-8b-factuality-detection

Captured source


Hugging Face/huggingface.co/ibm-granite/granite-guardian-3.2-8b-factuality-detection

ibm-granite/granite-guardian-3.2-8b-factuality-detection model card


published Nov 10, 2025 · seen 5d · captured 11h · http 200 · method plain · task text-generation · license apache-2.0 · library transformers · params 8.2B · downloads 28 · likes 4

Granite Guardian 3.2 8B Factuality Detection

Model Summary

Granite Guardian 3.2 8B Factuality Detection is a model based on ibm-granite/granite-3.2-8b-instruct, fine-tuned to detect whether an LLM response is unfactual.

Developers: IBM Research
GitHub Repository: ibm-granite/granite-guardian
Cookbook: Granite Guardian Factuality Detection Recipes
Website: Granite Guardian Docs
Paper: Granite Guardian FactReasoner: A Probabilistic Approach to Long-Form Factuality Assessment for Large Language Models
Release Date: February, 2026
License: Apache 2.0

Usage

Intended Use

Granite Guardian is useful for risk detection use cases that apply across a wide range of enterprise applications.

Granite-guardian-3.2-8b-factuality-detection takes as input an original response generated by a Large Language Model (LLM) together with a context, and generates a label indicating whether the response is unfactual ("Yes") or factual ("No") according to the context provided.
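As a minimal sketch of that input and output contract, the snippet below builds the message pair and interprets the label. The role names "context" and "assistant" follow the quickstart excerpt further down; the helper names `build_messages` and `describe_label` and the example sentences are hypothetical and used only for illustration.

```python
def build_messages(context: str, response: str) -> list[dict[str, str]]:
    """Pair a context with the LLM response that should be checked against it."""
    return [
        {"role": "context", "content": context},
        {"role": "assistant", "content": response},
    ]


# Label convention stated in the model card: "Yes" = unfactual, "No" = factual.
LABEL_MEANING = {"Yes": "unfactual", "No": "factual"}


def describe_label(label: str) -> str:
    """Translate a Yes/No label; anything else is reported as unknown."""
    return LABEL_MEANING.get(label.strip().capitalize(), "unknown")


# Illustrative input only, not taken from the model card.
messages = build_messages(
    context="The Eiffel Tower is in Paris.",
    response="The Eiffel Tower is in Berlin.",
)
```

The message list is what a chat template would consume; the model itself is not called here.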

Risk Definitions

The model is specifically designed to detect assistant messages containing only the following risk:

Factuality: Assistant message is factually incorrect relative to the information provided in the context. This risk arises when the response includes a small fraction of atomic units, such as claims or facts, that are not supported by or are directly contradicted by some part of the context. A factually incorrect response might include information that is not supported by or is directly contradicted by the context; it might also misstate facts, misinterpret the context, or provide erroneous details.
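One way to read this definition is that a single unsupported or contradicted atomic unit is enough to make the whole response unfactual. The sketch below only illustrates that reading; it is not how the model works internally (the model emits one label for the whole response), and the `Support` enum and `is_unfactual` function are hypothetical names.

```python
from enum import Enum


class Support(Enum):
    """Hypothetical status of one atomic unit relative to the context."""
    SUPPORTED = "supported"
    UNSUPPORTED = "unsupported"
    CONTRADICTED = "contradicted"


def is_unfactual(atomic_units: list[Support]) -> bool:
    """Return True if any atomic unit is unsupported or contradicted."""
    return any(unit is not Support.SUPPORTED for unit in atomic_units)
```

Under this reading, a response with nine supported claims and one contradicted claim would still be flagged.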

The detector manages both factual and unfactual cases.

This model is part of an ongoing research effort focused on post-generation mitigation and remains experimental and under active development. We are committed to continuous improvement and welcome constructive feedback to enhance its performance and capabilities.

Limitations

It is important to note that there is no built-in safeguard to guarantee that the detection output will always be correct. As with other generative models, safety assurance relies on offline evaluations (see Evaluations), and we expect, but cannot ensure, that the label meets safety standards. Moreover, this model is specifically optimized for factuality risk. For comprehensive detection of a broader range of risks, users should utilize the latest Granite Guardian model.

Using Granite Guardian and Factuality Detection

Granite Guardian Cookbooks offer an excellent starting point for working with guardian models, providing a variety of examples that demonstrate how the models can be configured for different risk detection scenarios.

Quick Start Guide provides steps to start using Granite Guardian for detecting risks in prompts (user message), responses (assistant message), RAG use cases, or agentic workflows.
Factuality Detection Cookbook provides steps to start using Granite Guardian for detecting factuality in responses.
Detailed Guide explores different risk dimensions in depth and shows how to assess custom risk definitions with Granite Guardian. For finer-grained control over token-level risk probabilities and thresholding, please also consult this cookbook.
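Thresholding on a risk probability, as mentioned for the Detailed Guide, could look roughly like the sketch below. It assumes a probability of risk in [0, 1] is already available; the function name `label_from_risk` and the default threshold of 0.5 are assumptions for illustration, not values taken from the cookbooks.

```python
def label_from_risk(prob_of_risk: float, threshold: float = 0.5) -> str:
    """Map a risk probability to the "Yes"/"No" label using a cut-off.

    A lower threshold flags more responses as unfactual (fewer misses,
    more false alarms); a higher threshold does the opposite.
    """
    if not 0.0 <= prob_of_risk <= 1.0:
        raise ValueError("prob_of_risk must lie in [0, 1]")
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    return "Yes" if prob_of_risk >= threshold else "No"
```

A suitable threshold would normally be chosen on held-out data for the application at hand.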

Quickstart Example

The following code shows how to apply Granite Guardian 3.2 8B Factuality Detection to check an assistant message.

The code checks if the assistant message contains the factuality risk. It extracts a "Yes" (i.e. unsafe) or "No" (i.e. safe) label and a confidence level from the model's output. If the response is detected as unsafe, it uses the Factuality Detector to generate a safer version of the assistant message.

import warnings
import os, re
import torch
import math
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

def get_probabilities(logprobs):
    # Start from a tiny floor so math.log below never receives zero.
    safe_token_prob = 1e-50
    risky_token_prob = 1e-50
    # Sum the probability mass of the safe and risky tokens over all candidates.
    for gen_token_i in logprobs:
        for token_prob in gen_token_i.values():
            decoded_token = token_prob.decoded_token
            if decoded_token.strip().lower() == safe_token.lower():
                safe_token_prob += math.exp(token_prob.logprob)
            if decoded_token.strip().lower() == risky_token.lower():
                risky_token_prob += math.exp(token_prob.logprob)

    # Normalize the two masses into [P(safe), P(risky)].
    probabilities = torch.softmax(
        torch.tensor([math.log(safe_token_prob), math.log(risky_token_prob)]), dim=0
    )

    return probabilities

def parse_output(output):
    label, prob_of_risk = None, None

    if nlogprobs > 0:
        logprobs = next(iter(output.outputs)).logprobs
        if logprobs is not None:
            prob = get_probabilities(logprobs)
            # Convert here so the return value stays None when no logprobs exist.
            prob_of_risk = prob[1].item()

    text = next(iter(output.outputs)).text.strip()

    # The first word of the generated text carries the label.
    match = re.search(r"^\w+", text, re.MULTILINE)
    res = match.group(0).strip() if match else ""

    if risky_token.lower() == res.lower():
        label = risky_token
    elif safe_token.lower() == res.lower():
        label = safe_token
    else:
        print("Could not parse output")
        label = "Failed"

    return label, prob_of_risk

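def transform_dataset_for_chat_format(context, response):
    # The context and the response to check are passed as two chat turns.
    messages = [
        {"role": "context", "content": context},
        {"role": "assistant", "content": response},
    ]

    # Render the turns into a single prompt string with the model's chat template.
    dataset = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )

    return dataset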

# Prepare prompt

model_path_name = "ibm-granite/granite-guardian-3.2-8b-factuality-detection"

dtype = "bfloat16"
gpu_memory_utilization = 0.95
nlogprobs = 20
temperature = 0.0
max_tokens = 512
safe_token = "No"
risky_token = "Yes"

# Load models
model = LLM(
    model=model_path_name,
    tensor_parallel_size=1,
    dtype=dtype,
    gpu_memory_utilization=gpu_memory_utilization,
)…

Excerpt shown — open the source for the full document.
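The confidence and label logic in the excerpt can be tried without a GPU. The sketch below is a simplified stand-in, not the model card's code: it takes plain (token, logprob) pairs instead of vLLM logprob objects, and it replaces the softmax over two log-masses with the equivalent direct normalization. The names `risk_probability` and `parse_label` are hypothetical.

```python
import math
import re

SAFE_TOKEN = "No"
RISKY_TOKEN = "Yes"
FLOOR = 1e-50  # same tiny floor as the excerpt, avoids division by zero


def risk_probability(positions: list[list[tuple[str, float]]]) -> float:
    """Return P(risky) from top-k (token, logprob) pairs per generated position."""
    safe_mass = FLOOR
    risky_mass = FLOOR
    for candidates in positions:
        for token, logprob in candidates:
            normalized = token.strip().lower()
            if normalized == SAFE_TOKEN.lower():
                safe_mass += math.exp(logprob)
            if normalized == RISKY_TOKEN.lower():
                risky_mass += math.exp(logprob)
    # softmax([log a, log b]) equals [a / (a + b), b / (a + b)].
    return risky_mass / (safe_mass + risky_mass)


def parse_label(text: str) -> str:
    """Read the label from the first word of the generated text."""
    match = re.search(r"^\w+", text.strip(), re.MULTILINE)
    first_word = match.group(0).lower() if match else ""
    if first_word == RISKY_TOKEN.lower():
        return RISKY_TOKEN
    if first_word == SAFE_TOKEN.lower():
        return SAFE_TOKEN
    return "Failed"
```

Normalizing over only the two label tokens means the confidence ignores any probability mass the model places on other tokens.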

Notability

notability 4.0/10

Low traction specialty model from IBM