Repo · IBM (Granite) · published Apr 16, 2026

ibm-granite/granite.debug-tools

Python


Description: Granite Debug Tools

Language: Python

License: Apache-2.0

Stars: 8

Forks: 2

Open issues: 1

Created: 2026-04-16T18:30:00Z

Pushed: 2026-06-10T14:27:20Z

Default branch: main

Fork: no

Archived: no

README:

Granite.Debug Tools

Granite.Debug is a suite of self-service debugging tools for Large Language Models (LLMs). The tools help identify, evaluate, and resolve issues across fine-tuning workflows, benchmark analysis, and agent-based LLM interactions.

Available Tools

Selecting the Right Tool

| If I need to... | Then I should use... |
| ---------------- | -------------------- |
| Design scaffolded tasks to diagnose which skill-level capability is missing | [STaD](./STaD/) |
| Benchmark LLM serving endpoints and local inference with an MCP-based tool | [perfbench](./perfbench/) |
| Validate model behavior across inference engines (vLLM, llama.cpp, Ollama) | [runtimes-validator](./runtimes-validator/) |

STaD - Scaffolded Task Design

[STaD](./STaD/) is a framework for generating scaffolded variations of multi-step reasoning tasks to enable systematic LLM debugging, evaluation, and training.

Use STaD when you need to design scaffolded tasks to diagnose which skill-level capability is missing in your model.
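
The README does not describe how STaD builds its variations, so the following is only an illustrative sketch of the general idea of scaffolding. It assumes a task can be written as a question plus an ordered list of intermediate steps, and that each scaffolded variant hands the model the first `k` steps as hints. The names `Task` and `scaffold_variants` are hypothetical and are not taken from the STaD code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical task schema: a question and its ordered reasoning steps."""
    question: str
    steps: tuple


def scaffold_variants(task: Task) -> list:
    """Return one prompt per scaffolding level.

    Level 0 gives no hints; level k supplies the first k steps.
    The final step is never revealed, so the model always has work left to do.
    """
    prompts = []
    for k in range(len(task.steps)):
        hints = "\n".join(
            f"Step {i + 1}: {step}" for i, step in enumerate(task.steps[:k])
        )
        if hints:
            prompts.append(f"{task.question}\nGiven:\n{hints}")
        else:
            prompts.append(task.question)
    return prompts
```

Under this assumption, comparing a model's accuracy across levels suggests which step it cannot perform unaided: if it fails at level 0 but succeeds once step 1 is supplied, the first step is the likely gap.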

perfbench - MCP server for Granite benchmarking

[perfbench](./perfbench/) is an MCP server that manages LLM benchmark runs as asynchronous subprocesses, wrapping five benchmark runners (vLLM, AIPerf, GuideLLM, llama-bench, Ollama) behind a unified tool interface.

Use perfbench when you need to benchmark LLM serving endpoints or local inference and want an agent-driven workflow via the Model Context Protocol.
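
As a rough illustration of managing benchmark runs as asynchronous subprocesses behind one interface, the sketch below uses only the Python standard library. It is not perfbench's implementation and does not use the MCP SDK. `RunManager`, `RUNNERS`, and the `"demo"` runner are hypothetical, and the command is a stand-in that prints one line rather than a real benchmark CLI.

```python
import asyncio
import sys
import uuid

# Hypothetical registry: runner name -> command line.
# A real setup would map each benchmark runner to its own CLI invocation.
RUNNERS = {
    "demo": [sys.executable, "-c", "print('benchmark done')"],
}


class RunManager:
    """Starts runs without blocking and lets callers check on them later."""

    def __init__(self):
        self._runs = {}

    async def start(self, runner: str) -> str:
        """Launch the runner as a subprocess and return a run id immediately."""
        proc = await asyncio.create_subprocess_exec(
            *RUNNERS[runner],
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
        run_id = uuid.uuid4().hex
        self._runs[run_id] = proc
        return run_id

    def status(self, run_id: str) -> str:
        """Report whether the subprocess is still running or has exited."""
        proc = self._runs[run_id]
        if proc.returncode is None:
            return "running"
        return f"exited({proc.returncode})"

    async def result(self, run_id: str) -> str:
        """Wait for the run to finish and return its captured stdout."""
        out, _ = await self._runs[run_id].communicate()
        return out.decode().strip()


async def demo():
    manager = RunManager()
    run_id = await manager.start("demo")
    output = await manager.result(run_id)
    return manager.status(run_id), output
```

Calling `asyncio.run(demo())` starts the stand-in run, waits for it, and returns its final status and output. Returning a run id from `start` rather than the result is what lets a caller, such as an agent, launch a long benchmark and poll for completion later.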

runtimes-validator

[runtimes-validator](./runtimes-validator/) is a unified validation framework for running model checks across inference engines (vLLM, llama.cpp, Ollama). It provides a CLI (runtimes-validator) to run automated validation tests against Granite models deployed on different backends, supporting both managed (framework starts/stops the engine) and external (connect to a running engine) execution modes.

Use runtimes-validator when you need to validate that a Granite model behaves correctly across different inference engines.
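
The difference between the managed and external execution modes can be sketched as a context manager. This is a minimal illustration under stated assumptions, not the runtimes-validator API: `engine_endpoint`, its parameters, and the URL are hypothetical, and the start command is whatever the caller supplies.

```python
import subprocess
import sys
from contextlib import contextmanager


@contextmanager
def engine_endpoint(mode, url, start_cmd=None):
    """Yield the URL of an inference engine to run checks against.

    external: the engine is already running; nothing is started or stopped.
    managed:  start_cmd is launched before the checks and terminated after.
    """
    if mode not in ("managed", "external"):
        raise ValueError(f"unknown mode: {mode!r}")

    if mode == "external":
        yield url
        return

    if not start_cmd:
        raise ValueError("managed mode needs a start command")

    proc = subprocess.Popen(list(start_cmd))
    try:
        yield url
    finally:
        # Always stop the engine this function started, even if a check failed.
        proc.terminate()
        proc.wait(timeout=10)
```

A validation run would wrap its checks in `with engine_endpoint(...) as url:` so the same test code works in either mode. A real managed mode would also need to wait until the engine is ready to accept requests, which this sketch omits.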

Coming Soon

Additional debugging tools are being prepared for open-source release. Stay tuned!

Contributing

We welcome contributions! If you'd like to contribute to any of the tools in this repository, please open an issue or submit a pull request.

🚧 Work in Progress

This repository is actively evolving. We are continuously adding new debugging tools, expanding coverage, and refining existing functionality based on community feedback and ongoing research. Check back regularly for updates, and feel free to open an issue or discussion if you have suggestions or requests.

Notice

IBM Public Repository Disclosure: All content in this repository including code has been provided by IBM under the associated open source software license and IBM is under no obligation to provide enhancements, updates, or support. IBM developers produced this code as an open source project (not as an IBM product), and IBM makes no assertions as to the level of quality nor security, and will not be maintaining this code going forward.

Notability

notability 2.0/10

Routine new repo with very low traction