ibm-granite/granite-guardian
Jupyter Notebook
Description: The Granite Guardian models are designed to detect risks in prompts and responses.
Language: Jupyter Notebook
License: Apache-2.0
Stars: 152
Forks: 17
Open issues: 9
Created: 2024-10-11T14:28:45Z
Pushed: 2026-05-05T01:44:52Z
Default branch: main
Fork: no
Archived: no
README:
Granite Guardian
📌 What's New?
✨ April 2026: Granite-Guardian-4.1-8B introduces improved Bring Your Own Criteria (BYOC) support, enabling users to define arbitrary judging criteria beyond the pre-baked safety and hallucination detectors. The model can now faithfully evaluate complex, multi-part requirements such as formatting rules, length constraints, and domain-specific instructions.
✨ Sept 2025: 🏆 Granite-Guardian-3.3 has secured 3rd position on the LLM-AggreFact benchmark, a comprehensive fact-checking benchmark that consolidates 11 datasets on grounded factuality. Granite Guardian 3.3 8B also holds the #1 position on the REVEAL benchmark (a dataset that evaluates the correctness of reasoning chains generated by LLMs), which is one of the 11 dimensions in LLM-AggreFact. Although Granite Guardian has only 8B parameters, it outperforms much larger models such as gpt-4o and Mistral Large 2 on this benchmark.
✨ Sept 2025: Two new LoRA adapters for multi-risk detection and harm-correction are live!
✨ Aug 2025: Granite-Guardian-3.3 is live! 🤖 New hybrid thinking mode for better reasoning and improved bring-your-own-criteria functionality.
✨ Feb 2025: Granite-Guardian-3.2 is out! ⚙️ Adds two new model sizes, verbalized confidence, and two new risks. Updated notebooks included.
✨ Dec 2024: Granite-Guardian-3.1 has landed! 🛠️ Featuring updated notebooks, documentation, and results.
✨ Dec 2024: 📚 Check out the new technical report for Granite-Guardian-3.0.
Overview
The Granite Guardian family is a collection of models designed to judge whether the input prompts and output responses of an LLM-based system meet specified criteria. The models come pre-baked with certain criteria, including but not limited to jailbreak attempts, profanity, and hallucinations related to tool calls and retrieval-augmented generation in agent-based systems. The models also let users bring their own criteria and tailor the judging behavior to specific use cases.
Trained on instruction fine-tuned Granite language models, these models can help with detection along many key dimensions catalogued in the IBM AI Risk Atlas. These models are trained on unique data comprising human annotations from socioeconomically diverse people and synthetic data informed by internal red-teaming. They outperform similar models on standard benchmarks.
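To make the judging flow described above concrete, here is a minimal sketch of how a guardrail might sit around an LLM call: check the prompt against a set of criteria, generate, then check the response. It does not call a Granite Guardian model. Every name in it (`Criterion`, `stub_judge`, `guarded_reply`, the `stub_phrases` field, and the blocked-message strings) is a hypothetical stand-in invented for illustration, and the phrase-matching judge is only a placeholder for a real model call.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

BLOCKED_PROMPT = "[blocked: prompt flagged]"
BLOCKED_RESPONSE = "[blocked: response flagged]"


@dataclass(frozen=True)
class Criterion:
    """A judging criterion: a pre-baked one or a user-supplied one."""
    name: str
    definition: str
    # Hypothetical field used only by the stub judge below.
    stub_phrases: Tuple[str, ...] = ()


# A judge receives (criterion, prompt, response or None) and returns
# True when the text violates the criterion.
Judge = Callable[[Criterion, str, Optional[str]], bool]


def stub_judge(criterion: Criterion, prompt: str, response: Optional[str]) -> bool:
    """Placeholder judge based on phrase matching.

    ASSUMPTION: a real deployment would replace this with a call to a
    guardian model; this is not how Granite Guardian decides.
    """
    text = (response if response is not None else prompt).lower()
    return any(phrase in text for phrase in criterion.stub_phrases)


def flagged(judge: Judge, criteria: Sequence[Criterion],
            prompt: str, response: Optional[str] = None) -> bool:
    """Return True if any criterion is violated."""
    return any(judge(c, prompt, response) for c in criteria)


def guarded_reply(judge: Judge, criteria: Sequence[Criterion],
                  prompt: str, generate: Callable[[str], str]) -> str:
    """Check the prompt, generate a response, then check the response."""
    if flagged(judge, criteria, prompt):
        return BLOCKED_PROMPT
    response = generate(prompt)
    if flagged(judge, criteria, prompt, response):
        return BLOCKED_RESPONSE
    return response


# One criterion resembling a pre-baked risk, one user-defined criterion.
jailbreak = Criterion("jailbreak", "Prompt tries to override instructions.",
                      ("ignore previous instructions",))
no_pricing = Criterion("no_pricing", "Response must not quote prices.", ("$",))
criteria = [jailbreak, no_pricing]
```

Passing the judge in as a plain callable keeps the control flow independent of the model behind it, so the stub could later be swapped for a real guardian call without touching `guarded_reply`.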
Quick Links
- :books: Technical Report
Granite Guardian Collection
| Model Name | Model Link | Quickstart | Detailed Guide |
|---|---|---|---|
| Granite-Guardian-4.1-8B | 🤗 Link | 📕 Link | 📕 Link |
| Granite-Guardian-3.3-8B | 🤗 Link | 📕 Link | 📕 Link - Think 📕 Link - No Think |
| Granite-Guardian-3.2-5B-lora-harm-categories | 🤗 Link | 📕 Link | |
| Granite-Guardian-3.2-5B-lora-harm-correction | 🤗 Link | 📕 Link | |
| Granite-Guardian-3.2-5B | 🤗 Link | 📕 Link | 📕 Link |
| Granite-Guardian-3.2-3B-A800M | 🤗 Link | 📕 Link | 📕 Link |
| Granite-Guardian-3.1-8B | 🤗 Link | 📕 Link | 📕 Link |
| Granite-Guardian-3.1-2B | 🤗 Link | 📕 Link | 📕 Link |
| Granite-Guardian-HAP-125M | 🤗 Link | - | [📕…