RepoZhipu AI (GLM)Zhipu AI (GLM)published Feb 2, 2026seen 5d

zai-org/GLM-OCR

Python

Open original ↗

Captured source

source ↗
published Feb 2, 2026seen 5dcaptured 12hhttp 200method plain

zai-org/GLM-OCR

Description: GLM-OCR: Accurate × Fast × Comprehensive

Language: Python

License: Apache-2.0

Stars: 6935

Forks: 638

Open issues: 40

Created: 2026-02-02T12:59:43Z

Pushed: 2026-04-21T08:52:11Z

Default branch: main

Fork: no

Archived: no

README:

GLM-OCR

[中文阅读](README_zh.md)

👋 Join our WeChat and Discord community

📖 Check out the GLM-OCR technical report

📍 Use GLM-OCR's API

Model Introduction

GLM-OCR is a multimodal OCR model for complex document understanding, built on the GLM-V encoder–decoder architecture. It introduces Multi-Token Prediction (MTP) loss and stable full-task reinforcement learning to improve training efficiency, recognition accuracy, and generalization. The model integrates the CogViT visual encoder pre-trained on large-scale image–text data, a lightweight cross-modal connector with efficient token downsampling, and a GLM-0.5B language decoder. Combined with a two-stage pipeline of layout analysis and parallel recognition based on PP-DocLayout-V3, GLM-OCR delivers robust and high-quality OCR performance across diverse document layouts.

Key Features

  • State-of-the-Art Performance: Achieves a score of 94.62 on OmniDocBench V1.5, ranking #1 overall, and delivers state-of-the-art results across major document understanding benchmarks, including formula recognition, table recognition, and information extraction.
  • Optimized for Real-World Scenarios: Designed and optimized for practical business use cases, maintaining robust performance on complex tables, code-heavy documents, seals, and other challenging real-world layouts.
  • Efficient Inference: With only 0.9B parameters, GLM-OCR supports deployment via vLLM, SGLang, and Ollama, significantly reducing inference latency and compute cost, making it ideal for high-concurrency services and edge deployments.
  • Easy to Use: Fully open-sourced and equipped with a comprehensive SDK and inference toolchain, offering simple installation, one-line invocation, and smooth integration into existing production pipelines.

News & Updates

  • [2026.3.12] GLM-OCR SDK now supports agent-friendly Skill mode — just pip install glmocr + set API key, ready to use via CLI or Python with no GPU or YAML config needed. See: [GLM-OCR Skill](skills/glmocr/SKILL.md)
  • [2026.3.12] GLM-OCR Technical Report is now available. See: GLM-OCR Technical Report
  • [2026.2.12] Fine-tuning tutorial based on LLaMA-Factory is now available. See: [GLM-OCR Fine-tuning Guide](examples/finetune/README.md)

Download Model

| Model | Download Links | Precision | | ------- | --------------------------------------------------------------------------------------------------------------------------- | --------- | | GLM-OCR | 🤗 Hugging Face 🤖 ModelScope | BF16 |

GLM-OCR SDK

We provide an SDK for using GLM-OCR more efficiently and conveniently.

Install SDK

Choose the lightest installation that matches your scenario:

# Cloud / MaaS + local images / PDFs (fastest install)
pip install glmocr

# Self-hosted pipeline (layout detection)
pip install "glmocr[selfhosted]"

# Flask service support
pip install "glmocr[server]"

Install from source for development:

# Install from source
git clone https://github.com/zai-org/glm-ocr.git
cd glm-ocr
uv venv --python 3.12 --seed && source .venv/bin/activate
uv pip install -e .

Model Deployment

Two ways to use GLM-OCR:

Option 1: Zhipu MaaS API (Recommended for Quick Start)

Use the hosted cloud API – no GPU needed. The cloud service runs the complete GLM-OCR pipeline internally, so the SDK simply forwards your request and returns the result.

1. Get an API key from https://open.bigmodel.cn 2. Configure config.yaml:

pipeline:
maas:
enabled: true # Enable MaaS mode
api_key: your-api-key # Required

That's it! When maas.enabled=true, the SDK acts as a thin wrapper that:

  • Forwards your documents to the Zhipu cloud API
  • Returns the results directly (Markdown + JSON layout details)
  • No local processing, no GPU required

Input note (MaaS): the upstream API accepts file as a URL or a data:;base64,... data URI. If you have raw base64 without the data: prefix, wrap it as a data URI (recommended). The SDK will auto-wrap local file paths / bytes / raw base64 into a data URI when calling MaaS.

API documentation: https://docs.bigmodel.cn/cn/guide/models/vlm/glm-ocr

Option 2: Self-host with vLLM / SGLang

Deploy the GLM-OCR model locally for full control. The SDK provides the complete pipeline: layout detection, parallel region OCR, and result formatting.

Install the self-hosted extra first:

pip install "glmocr[selfhosted]"

##### Using vLLM

Install vLLM:

docker pull vllm/vllm-openai:v0.19.0-ubuntu2404

Or using with pip:

pip install -U "vllm>=0.19.0"

Launch the service:

pip install "transformers>=5.3.0"

vllm serve zai-org/GLM-OCR --port 8080 --speculative-config '{"method": "mtp", "num_speculative_tokens": 3}' --served-model-name glm-ocr

>Note Add --max-model-len and --gpu-memory-utilization according to Your own machine to handle large image/pdf

##### Using SGLang

Install SGLang:

docker pull lmsysorg/sglang:v0.5.10

Or using with pip:

pip install "sglang>=0.5.10"

Launch the service:

SGLANG_ENABLE_SPEC_V2=1 sglang serve --model-path zai-org/GLM-OCR --port 8080 --speculative-algorithm NEXTN --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4 --served-model-name glm-ocr

>Note Add --context-len and --mem-fraction-static according to Your own machine to handle large image/pdf

Option 3: Ollama/MLX

For specialized deployment scenarios, see the detailed guides:

  • [Apple Silicon with mlx-vlm](examples/mlx-deploy/README.md) - Optimized for Apple Silicon Macs
  • [Ollama Deployment](examples/ollama-deploy/README.md) - Simple local deployment with Ollama

Option 4: SDK Server + Client (GPU-less Client)

Deploy the SDK Server on a GPU machine, then use any machine as a client — no GPU needed on the client side. The client connects via the MaaS-compatible protocol, pointing api_url at your self-hosted server.

# Client config.yaml
pipeline:
maas:
enabled: true…

Excerpt shown — open the source for the full document.

Notability

notability 8.0/10

High traction notable OCR model release