Repo: IBM (Granite), published Feb 17, 2025

ibm-granite/granite-vision-models

Jupyter Notebook

License: Apache-2.0

Stars: 46

Forks: 10

Open issues: 5

Created: 2025-02-17T21:42:13Z

Pushed: 2026-04-29T22:17:48Z

Default branch: main

Fork: no

Archived: no

README:

📚 Granite Vision Paper | 📊 ChartNet CVPR 2026 Paper | 🤗 Hugging Face Collection | 💬 Discussions Page

Granite Vision Models

Granite Vision is a family of multimodal vision‑language models designed for enterprise‑grade document understanding, including chart and table understanding, key‑value extraction, and structured image‑to‑text generation. This repository provides documentation, examples, and pointers to available model releases and datasets.

---

🚀 Latest Release: Granite‑Vision-4.1‑4B

Granite‑Vision-4.1‑4B is a vision‑language model tailored for enterprise document data extraction, delivered as a LoRA adapter on top of Granite-4.1-3B.
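Delivering the model as a LoRA adapter means the base language model's weights stay frozen and only a small low-rank update is trained on top of them. The NumPy sketch below illustrates the standard LoRA formulation, `W x + (alpha / r) * B (A x)`, using toy dimensions. It is not taken from the Granite code; the function names, sizes, and scaling values are hypothetical. It checks that applying the adapter at inference time gives the same result as merging it into the base weight, and compares parameter counts.

```python
import numpy as np

def lora_forward(W, A, B, x, alpha, r):
    """Frozen base projection plus a scaled low-rank update (standard LoRA form)."""
    # W: (d_out, d_in) frozen base weight; A: (r, d_in) and B: (d_out, r) are trainable.
    return W @ x + (alpha / r) * (B @ (A @ x))

def merge_lora(W, A, B, alpha, r):
    """Fold the adapter into the base weight so inference needs a single matmul."""
    return W + (alpha / r) * (B @ A)

# Hypothetical toy dimensions; real model layers are far larger.
rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 8, 6, 2, 4.0
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in))
# B is random here only so the comparison is non-trivial (see note below).
B = rng.standard_normal((d_out, r))
x = rng.standard_normal(d_in)

y_adapter = lora_forward(W, A, B, x, alpha, r)
y_merged = merge_lora(W, A, B, alpha, r) @ x
print(np.allclose(y_adapter, y_merged))  # True

# Trainable adapter parameters vs. parameters in the frozen weight matrix.
print(A.size + B.size, W.size)  # 28 48
```

In actual LoRA training, `B` is normally initialized to zero so that training starts from the unchanged base model; the saving in trainable parameters grows with layer size because the adapter scales with `r * (d_in + d_out)` rather than `d_in * d_out`.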

It supports:

  • Chart extraction — Chart‑to‑CSV, Chart‑to‑Summary, Chart‑to‑Code
  • Table extraction — JSON, HTML, and OTSL
  • Semantic KVP extraction — Schema‑guided extraction across diverse document layouts
  • Image‑to‑text — Natural‑language descriptions of images
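OTSL (Optimized Table Structure Language) describes a table as a compact grid of cell tokens rather than nested HTML. As a minimal sketch, assuming the model's OTSL output uses DocTags-style tags (`<fcel>` for a cell with text, `<ecel>` for an empty cell, `<lcel>`/`<ucel>`/`<xcel>` for cells merged leftward, upward, or both, and `<nl>` to end a row), the hypothetical helpers below convert such a string into HTML with `colspan`/`rowspan`. The exact tag set Granite Vision emits is an assumption here, and `parse_otsl`/`otsl_to_html` are not part of this repository; check real model output before relying on them.

```python
import html
import re

# Assumed OTSL tag set (DocTags-style): origin cells <fcel>/<ecel>,
# merge markers <lcel> (left), <ucel> (up), <xcel> (left and up), row end <nl>.
TOKEN = re.compile(r"<(fcel|ecel|lcel|ucel|xcel|nl)>")

def parse_otsl(otsl: str) -> list[list[tuple[str, str]]]:
    """Split an OTSL string into rows of (tag, text) cells."""
    # re.split with one capture group yields: text, tag, text, tag, ...
    # Any text before the first tag (parts[0]) is ignored.
    parts = TOKEN.split(otsl)
    rows, row = [], []
    for i in range(1, len(parts), 2):
        tag, text = parts[i], parts[i + 1].strip()
        if tag == "nl":
            rows.append(row)
            row = []
        else:
            row.append((tag, text))
    if row:  # tolerate a missing final <nl>
        rows.append(row)
    return rows

def otsl_to_html(otsl: str) -> str:
    """Render the OTSL grid as an HTML table, turning merge markers into spans."""
    grid = parse_otsl(otsl)
    out = ["<table>"]
    for i, row in enumerate(grid):
        out.append("<tr>")
        for j, (tag, text) in enumerate(row):
            if tag not in ("fcel", "ecel"):
                continue  # merge markers are covered by an earlier origin cell
            # Columns to the right marked <lcel> extend this cell horizontally.
            colspan = 1
            while j + colspan < len(row) and row[j + colspan][0] == "lcel":
                colspan += 1
            # Rows below marked <ucel> in the same column extend it vertically.
            rowspan = 1
            while (i + rowspan < len(grid) and j < len(grid[i + rowspan])
                   and grid[i + rowspan][j][0] == "ucel"):
                rowspan += 1
            attrs = ""
            if colspan > 1:
                attrs += f' colspan="{colspan}"'
            if rowspan > 1:
                attrs += f' rowspan="{rowspan}"'
            out.append(f"<td{attrs}>{html.escape(text)}</td>")
        out.append("</tr>")
    out.append("</table>")
    return "".join(out)

# Hypothetical OTSL string: "Year" spans two rows, "Revenue" spans two columns.
example = ("<fcel>Year<fcel>Revenue<lcel><nl>"
           "<ucel><fcel>Q1<fcel>Q2<nl>"
           "<fcel>2024<fcel>10<fcel>12<nl>")
print(otsl_to_html(example))
```

Keeping merges as single-token markers is what makes the grid form convenient: spans can be recovered by scanning neighbors, as above, without the model having to emit balanced HTML tags.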

Granite‑Vision-4.1‑4B preserves and extends Granite Vision 4 capabilities while providing more specialized extraction workflows.

---

📊 ChartNet Dataset

ChartNet is a million‑scale multimodal dataset created to support robust chart understanding tasks: ➡️ https://huggingface.co/datasets/ibm-granite/ChartNet

It includes:

  • 1.7M synthetic charts with aligned images, code, tables, summaries, and reasoning
  • 94,643 human‑verified charts
  • 2,000 human‑verified test samples
  • 24 chart types across 6 plotting libraries

ChartNet uses a code‑guided synthesis pipeline that produces tightly aligned visual, numerical, and textual components. It was used in training Granite‑Vision-4.1‑4B.
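The point of code-guided synthesis is that every modality is derived from one source of truth, so a chart's image, data table, and summary cannot disagree. The toy sketch below is not the ChartNet pipeline; it assumes a single bar chart described by a small dict, and the names `spec` and `synthesize` are hypothetical. From that one spec it derives matplotlib plotting code (which would render the image), a CSV table, and a templated summary.

```python
import csv
import io

def synthesize(spec: dict) -> dict:
    """Derive aligned plotting code, CSV, and summary from one chart spec (toy example)."""
    labels, values = spec["labels"], spec["values"]
    # 1) Plotting code: the image would be rendered from this, so it matches the data.
    code = "\n".join([
        "import matplotlib.pyplot as plt",
        f"plt.bar({labels!r}, {values!r})",
        f"plt.title({spec['title']!r})",
        f"plt.ylabel({spec['y_label']!r})",
        "plt.savefig('chart.png')",
    ])
    # 2) Data table: serialized from the same lists the code uses.
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([spec["x_label"], spec["y_label"]])
    writer.writerows(zip(labels, values))
    # 3) Summary: computed from the data rather than written by hand.
    top = max(range(len(values)), key=values.__getitem__)
    summary = (f"{spec['title']}: {labels[top]} has the highest "
               f"{spec['y_label'].lower()} ({values[top]}).")
    return {"code": code, "csv": buf.getvalue(), "summary": summary}

# Hypothetical chart spec for illustration.
spec = {"title": "Revenue by region", "x_label": "Region", "y_label": "Revenue",
        "labels": ["EMEA", "APAC", "Americas"], "values": [12, 9, 15]}
sample = synthesize(spec)
print(sample["csv"])
print(sample["summary"])
```

Because the code, table, and text all come from `spec`, any sample can serve as supervision for Chart-to-CSV, Chart-to-Summary, or Chart-to-Code without manual reconciliation.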

---

📚 Legacy Granite Vision Models

Older Granite Vision models remain available for users who rely on earlier releases:

  • Granite Vision 4 (3B): https://huggingface.co/ibm-granite/granite-4.0-3b-vision
  • Granite Vision 3.3 (2B): https://huggingface.co/ibm-granite/granite-vision-3.3-2b
  • Granite Vision 3.1 (2B Preview): https://huggingface.co/ibm-granite/granite-vision-3.1-2b-preview
  • Granite Vision 3.3 (GGUF‑converted): https://huggingface.co/ibm-granite/granite-vision-3.3-2b-GGUF

---

License

All Granite Vision models are distributed under the [Apache 2.0](./LICENSE) license.

---

Would you like to provide feedback?

Please let us know your comments about our family of vision‑language models by visiting our Hugging Face model collection: https://huggingface.co/collections/ibm-granite/granite-vision-models-67b3bd4ff90c915ba4cd2800

Select the model repository you would like to provide feedback about, go to the Community tab, and click New discussion.

Alternatively, you can post questions or comments on our GitHub discussions page: https://github.com/orgs/ibm-granite/discussions

---

Ethical Considerations and Limitations

The use of large vision‑language models involves important risks, including bias, fairness concerns, misinformation, and challenges around autonomous decision‑making. Granite Vision models are no exception.

Although alignment processes incorporate safety considerations, the models may sometimes produce inaccurate, biased, or unsafe responses. Smaller models in particular may be more susceptible to hallucination, which remains an active area of research.

We urge the community to deploy Granite Vision models responsibly. The models are designed for document‑understanding tasks; applying them to more general vision tasks may carry higher risks of harmful or biased outputs.

To enhance safety, we recommend using Granite Vision models alongside Granite Guardian, a fine‑tuned model designed to detect and flag risks across the dimensions defined in the IBM AI Risk Atlas.

---

Contributing

Issues and pull requests are welcome. Please open a GitHub issue to report bugs or suggest enhancements.
