RepoIBM (Granite)IBM (Granite)published Oct 11, 2024seen 5d

ibm-granite/granite-guardian

Jupyter Notebook

Open original ↗

Captured source

source ↗
published Oct 11, 2024seen 5dcaptured 15hhttp 200method plain

ibm-granite/granite-guardian

Description: The Granite Guardian models are designed to detect risks in prompts and responses.

Language: Jupyter Notebook

License: Apache-2.0

Stars: 152

Forks: 17

Open issues: 9

Created: 2024-10-11T14:28:45Z

Pushed: 2026-05-05T01:44:52Z

Default branch: main

Fork: no

Archived: no

README:

Granite Guardian

📌 What's New?

April 2026: Granite-Guardian-4.1-8B introduces improved Bring Your Own Criteria (BYOC) support, enabling users to define arbitrary judging criteria beyond the pre-baked safety and hallucination detectors. The model can now faithfully evaluate complex, multi-part requirements such as formatting rules, length constraints, and domain-specific instructions.

Sept 2025: 🏆 Granite-Guardian-3.3 has has secured the 3rd position on the LLM‑AggreFact benchmark, a comprehensive fact‑checking benchmark that consolidates 11 datasets on grounded factuality. Granite Guardian 3.3 8B also holds the #1 position on the REVEAL benchmark (a dataset that evaluates the correctness of reasoning chains generated by LLMs) which is one of the 11 dimensions in LLM-AggreFact. Additionally, while our Granite Guardian model is only 8B in parameter size, it outperforms much larger models such as gpt-4o and Mistral Large 2 on this benchmark.

Sept 2025: Two new LoRA adapters for multi-risk detection and harm-correction are live!

Aug 2025: Granite-Guardian-3.3 is live! 🤖 New hybrid thinking mode for better reasoning and improved bring-your-own-criteria functionality.

✨ Feb 2025: Granite-Guardian-3.2 is out! ⚙️ Adds two new model sizes, verbalized confidence, and two new risks. Updated notebooks included.

✨ Dec 2024: Granite-Guardian-3.1 has landed! 🛠️ Featuring updated notebooks, documentation, and results.

✨ Dec 2024: 📚 Check out the new technical report for Granite-Guardian-3.0.

Overview

The Granite Guardian family is a collection of models designed to judge if the input prompts and the output responses of an LLM based system meet specified criteria. The models come pre-baked with certain criteria including but not limited to: jailbreak attempts, profanity, and hallucinations related to tool calls and retrieval augmented generation in agent-based systems. Additionally, the models also allow users to bring their own criteria and tailor the judging behavior to specific use-cases.

Trained on instruction fine-tuned Granite languages models, these models can help with detection along many key dimensions catalogued in the IBM AI Risk Atlas. These models are trained on unique data comprising human annotations from socioeconomically diverse people and synthetic data informed by internal red-teaming. They outperform similar models on standard benchmarks.

Quick Links

  • :books: Technical Report

Granite Guardian Collection

| Model Name | Model Link | Quickstart | Detailed Guide | |---|---|---|---| | Granite-Guardian-4.1-8B | 🤗 Link | 📕 Link | 📕 Link | | Granite-Guardian-3.3-8B | 🤗 Link | 📕 Link | 📕 Link - Think 📕 Link - No Think| | Granite-Guardian-3.2-5B-lora-harm-categories | 🤗 Link | 📕 Link | | | Granite-Guardian-3.2-5B-lora-harm-correction | 🤗 Link | 📕 Link | | | Granite-Guardian-3.2-5B | 🤗 Link | 📕 Link | 📕 Link | | Granite-Guardian-3.2-3B-A800M | 🤗 Link | 📕 Link | 📕 Link | | Granite-Guardian-3.1-8B | 🤗 Link | 📕 Link | 📕 Link | | Granite-Guardian-3.1-2B | 🤗 Link | 📕 Link | 📕 Link | | Granite-Guardian-HAP-125M | 🤗 Link | - | [📕…

Excerpt shown — open the source for the full document.

Notability

notability 6.0/10

New IBM repo with moderate stars.