ibm-granite/granite-guardian-3.2-5b-lora-factuality-correction
Granite Guardian 3.2 5B Factuality Correction LoRA
Model Summary
Granite Guardian 3.2 5B Factuality Correction LoRA is a LoRA adapter for ibm-granite/granite-guardian-3.2-5b, designed to safely correct a Large Language Model (LLM) response when a detector such as Granite Guardian flags it as non-factual.
- Developers: IBM Research
- GitHub Repository: ibm-granite/granite-guardian
- Cookbook: Granite Guardian Factuality Correction LoRA Recipes
- Website: Granite Guardian Docs
- Paper: Granite Guardian & FactReasoner
- Release Date: December 2025
- License: Apache 2.0
Usage
Intended Use
Granite Guardian is useful for risk-detection use cases that apply across a wide range of enterprise applications.
The Granite Guardian 3.2 5B Factuality Correction LoRA (ibm-granite/granite-guardian-3.2-5b-lora-factuality-correction) takes as input an original response generated by a Large Language Model (LLM) together with a reliable context, and generates a factually viable correction of that response.
Risk Definitions
The model is specifically designed to correct assistant messages containing only the following risk:
- Factuality: The assistant message is factually incorrect relative to the information provided in the context. This risk arises when the response includes a small fraction of atomic units, such as claims or facts, that are not supported by, or are directly contradicted by, some part of the context. A factually incorrect response might include information that is not supported by or is directly contradicted by the context; it might also misstate facts, misinterpret the context, or provide erroneous details.
The adapter manages both safe and unsafe cases as identified by the Granite Guardian 3.2 5B model. If the assistant message is deemed unsafe, it will correct the response. If the assistant message is already safe, it does not return any correction, confirming that no correction was needed, and thus helping to save compute resources.
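The control flow described above can be sketched as follows. This is a minimal illustration rather than the adapter's actual interface: `correct_if_unfactual`, `detect_risk`, and `generate_correction` are hypothetical names, with the two callables standing in for the Granite Guardian detector and the correction LoRA. The toy stand-ins at the bottom exist only to make the sketch runnable.

```python
from typing import Callable, Optional


def correct_if_unfactual(
    response: str,
    context: str,
    detect_risk: Callable[[str, str], bool],
    generate_correction: Callable[[str, str], str],
) -> Optional[str]:
    """Return a corrected response, or None when no correction is needed.

    detect_risk and generate_correction are hypothetical stand-ins for the
    detector and the correction adapter; they are not real library calls.
    """
    if not detect_risk(response, context):
        # Already safe: skip the correction pass entirely, saving compute.
        return None
    return generate_correction(response, context)


# Toy stand-ins (assumptions for illustration only).
def toy_detector(response: str, context: str) -> bool:
    # Flags any response that does not appear verbatim in the context.
    return response not in context


def toy_corrector(response: str, context: str) -> str:
    # Pretends the context itself is the corrected answer.
    return context


context = "The Eiffel Tower is in Paris."
print(correct_if_unfactual("The Eiffel Tower is in Paris.", context, toy_detector, toy_corrector))
print(correct_if_unfactual("The Eiffel Tower is in Rome.", context, toy_detector, toy_corrector))
```

Returning `None` for the safe case mirrors the behavior described above, where no correction is produced when none is needed.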
This model is part of an ongoing research effort focused on post-generation mitigation and remains experimental and under active development. We are committed to continuous improvement and welcome constructive feedback to enhance its performance and capabilities.
Limitations
It is important to note that there is no built-in safeguard to guarantee that the corrected response will always be safe. As with other generative models, safety assurance relies on offline evaluations (see [Evaluations](#evaluations)), and we expect, but cannot ensure, that the corrected_response meets safety standards. For users seeking additional assurance, we recommend re-running the corrected output through the main Granite Guardian 3.3 (GG3.3) model to verify that it is indeed safe.
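The recommended re-check can be expressed as a small guard around the corrected output. The sketch below is an assumption-laden illustration: `verified_correction` and `detect_risk` are hypothetical names, and the detector here is a toy substring check rather than a real Granite Guardian call.

```python
from typing import Callable, Optional


def verified_correction(
    corrected_response: str,
    context: str,
    detect_risk: Callable[[str, str], bool],
) -> Optional[str]:
    """Return the corrected response only if the detector no longer flags it.

    detect_risk is a hypothetical stand-in for a second pass through the
    main guardian model; a None result means the caller should fall back
    (for example, withhold the answer or escalate for review).
    """
    if detect_risk(corrected_response, context):
        return None
    return corrected_response


# Toy detector (assumption): flags text that is not found in the context.
def toy_detector(response: str, context: str) -> bool:
    return response not in context


ctx = "Paris is the capital of France."
print(verified_correction("Paris is the capital of France.", ctx, toy_detector))
print(verified_correction("Lyon is the capital of France.", ctx, toy_detector))
```

What to do when the re-check still flags the output is an application decision; the sketch only makes the failure explicit instead of silently passing the text through.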
Using Granite Guardian and Factuality Correction LoRA
Granite Guardian Cookbooks offer an excellent starting point for working with guardian models, with a variety of examples showing how the models can be configured for different risk detection scenarios. Refer to the Quick Start Guide and the Detailed Guide to get familiar with Granite Guardian's scope of use.
The Granite Guardian 3.2 5B Factuality Correction LoRA Cookbooks provide the steps to apply the LoRA adapter on top of Granite Guardian for factuality-based corrections. The correction LoRA takes an input consisting of a prompt and an original response, and generates a factually viable correction. The cookbooks also include factually correct and incorrect examples.
Quickstart Example
The following code shows how to apply the Granite Guardian 3.2 5B Factuality Correction LoRA to safely correct an assistant message.
The code checks if the assistant message contains the factuality risk, using Granite Guardian 3.2 5B. It extracts a "Yes" (i.e. unsafe) or "No" (i.e. safe) label and a confidence level from the model's output. If the response is detected as unsafe, it uses the Factuality Correction LoRA adapter to generate a safer version of the assistant message.
import math
import os
import re
import warnings

import torch
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

warnings.filterwarnings("ignore")
os.environ["VLLM_LOGGING_LEVEL"] = "ERROR"

# Note: safe_token, risky_token, and nlogprobs are module-level names that are
# not defined in this excerpt; see the full source for their definitions.


def get_probabilities(logprobs):
    # Start from a tiny floor so math.log() below never receives zero.
    safe_token_prob = 1e-50
    risky_token_prob = 1e-50
    # Accumulate probability mass for the safe and risky label tokens
    # across every generated position and every candidate token.
    for gen_token_i in logprobs:
        for token_prob in gen_token_i.values():
            decoded_token = token_prob.decoded_token
            if decoded_token.strip().lower() == safe_token.lower():
                safe_token_prob += math.exp(token_prob.logprob)
            if decoded_token.strip().lower() == risky_token.lower():
                risky_token_prob += math.exp(token_prob.logprob)

    # Normalize the two masses into a [safe, risky] distribution.
    probabilities = torch.softmax(
        torch.tensor([math.log(safe_token_prob), math.log(risky_token_prob)]), dim=0
    )

    return probabilities


def parse_output(output):
    label, prob_of_risk = None, None

    if nlogprobs > 0:
        logprobs = next(iter(output.outputs)).logprobs
        if logprobs is not None:
            prob = get_probabilities(logprobs)
            prob_of_risk = prob[1]  # index 1 is the risky label

    output = next(iter(output.outputs)).text.strip()
    # The first word of the generated text carries the label.
    res = re.search(r"^\w+", output, re.MULTILINE).group(0).strip()
    confid = re.search(r' (.*?) ',…
[Excerpt shown; open the source for the full document.]
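One detail of `get_probabilities` is worth spelling out: applying softmax to the logarithms of two positive masses is the same as dividing each mass by their sum. The sketch below reproduces that step with the standard library only, so it runs without `torch` or `vllm`. The function name `two_way_softmax_of_logs` and the example masses are hypothetical, chosen purely for illustration.

```python
import math


def two_way_softmax_of_logs(safe_mass: float, risky_mass: float) -> list:
    """Softmax over [log(safe_mass), log(risky_mass)], computed stably."""
    logits = [math.log(safe_mass), math.log(risky_mass)]
    peak = max(logits)  # subtract the max before exponentiating
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical accumulated masses, including the 1e-50 floor from the quickstart.
safe_mass = 0.2 + 1e-50
risky_mass = 0.6 + 1e-50

probs = two_way_softmax_of_logs(safe_mass, risky_mass)
direct = [safe_mass / (safe_mass + risky_mass), risky_mass / (safe_mass + risky_mass)]
print(probs, direct)  # the two lists agree up to floating-point error

# If neither label token appears, both masses stay at the floor and the
# result is an even split between the two labels.
print(two_way_softmax_of_logs(1e-50, 1e-50))
```

Read this way, the returned risk probability is simply the share of label-token mass that fell on the risky label, and the `1e-50` floor only matters when a label token was never observed.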