zai-org/GLM-5
Captured source
source ↗zai-org/GLM-5
Description: GLM-5: From Vibe Coding to Agentic Engineering
License: Apache-2.0
Stars: 3387
Forks: 372
Open issues: 32
Created: 2026-02-09T08:17:02Z
Pushed: 2026-05-15T05:06:07Z
Default branch: main
Fork: no
Archived: no
README:
GLM-5.1 & GLM-5
👋 Join our Wechat or Discord community.
📖 Check out the GLM-5.1 blog and GLM-5 Technical report.
📍 Use GLM-5.1 API services on Z.ai API Platform.
🔜 GLM-5.1 will be available on chat.z.ai in the coming days.
Introduction
GLM-5.1
GLM-5.1 is our next-generation flagship model for agentic engineering, with significantly stronger coding capabilities than its predecessor. It achieves state-of-the-art performance on SWE-Bench Pro and leads GLM-5 by a wide margin on NL2Repo (repo generation) and Terminal-Bench 2.0 (real-world terminal tasks).

But the most meaningful leap goes beyond first-pass performance. Previous models—including GLM-5—tend to exhaust their repertoire early: they apply familiar techniques for quick initial gains, then plateau. Giving them more time doesn't help.
GLM-5.1, by contrast, is built to stay effective on agentic tasks over much longer horizons. We've found that the model handles ambiguous problems with better judgment and stays productive over longer sessions. It breaks complex problems down, runs experiments, reads results, and identifies blockers with real precision. By revisiting its reasoning and revising its strategy through repeated iteration, GLM-5.1 sustains optimization over hundreds of rounds and thousands of tool calls. The longer it runs, the better the result.
GLM-5
We are launching GLM-5, targeting complex systems engineering and long-horizon agentic tasks. Scaling is still one of the most important ways to improve the intelligence efficiency of Artificial General Intelligence (AGI). Compared to GLM-4.5, GLM-5 scales from 355B parameters (32B active) to 744B parameters (40B active), and increases pre-training data from 23T to 28.5T tokens. GLM-5 also integrates DeepSeek Sparse Attention (DSA), largely reducing deployment cost while preserving long-context capacity.
Reinforcement learning aims to bridge the gap between competence and excellence in pre-trained models. However, deploying it at scale for LLMs is a challenge due to the RL training inefficiency. To this end, we developed slime, a novel asynchronous RL infrastructure that substantially improves training throughput and efficiency, enabling more fine-grained post-training iterations. With advances in both pre-training and post-training, GLM-5 delivers significant improvement compared to GLM-4.7 across a wide range of academic benchmarks and achieves best-in-class performance among all open-source models in the world on reasoning, coding, and agentic tasks, closing the gap with frontier models.

GLM-5 is purpose-built for complex systems engineering and long-horizon agentic tasks. On our internal evaluation suite CC-Bench-V2, GLM-5 significantly outperforms GLM-4.7 across frontend, backend, and long-horizon tasks, narrowing the gap to Claude Opus 4.5.

On Vending Bench 2, a benchmark that measures long-term operational capability, GLM-5 ranks \#1 among open-source models. Vending Bench 2 requires the model to run a simulated vending machine business over a one-year horizon; GLM-5 finishes with a final account balance of $4,432, approaching Claude Opus 4.5 and demonstrating strong long-term planning and resource management.

Download Model
| Model | Download Links | Model Size | Precision | |-------------|-------------------------------------------------------------------------------------------------------------------------------------|------------|-----------| | GLM-5.1 | 🤗 Hugging Face 🤖 ModelScope | 744B-A40B | BF16 | | GLM-5.1-FP8 | 🤗 Hugging Face 🤖 ModelScope | 744B-A40B | FP8 | | GLM-5 | 🤗 Hugging Face 🤖 ModelScope | 744B-A40B | BF16 | | GLM-5-FP8 | 🤗 Hugging Face 🤖 ModelScope | 744B-A40B | FP8 |
Serve GLM-5 Series Locally
Prepare environment
vLLM, SGLang, xLLM and Ktransformers all support local deployment of GLM-5 series model, A simple deployment guide is provided here.
+ vLLM
Using Docker as:
docker pull vllm/vllm-openai:v0.20.2-cu129 docker pull vllm/vllm-openai:v0.20.2 # For CUDA 13.0
+ SGLang
Using Docker as:
docker pull lmsysorg/sglang:v0.5.11 docker pull lmsysorg/sglang:v0.5.11-cu130 # For CUDA 13.0
Deploy
+ vLLM
vllm serve zai-org/GLM-5.1-FP8 \ --tensor-parallel-size 8 \ --gpu-memory-utilization 0.85 \ --speculative-config.method mtp \ --speculative-config.num_speculative_tokens 3 \ --tool-call-parser glm47 \ --reasoning-parser glm45 \ --enable-auto-tool-choice \ --chat-template-content-format=string \ --served-model-name glm-5.1-fp8
Check the recipes for more details. >Note: When encounter Tool Call Parse issue with MTP enabled, please turn to vllm main branch to serve GLM-5.1.
+ SGLang
sglang serve \ --model-path zai-org/GLM-5.1-FP8 \ --tp-size 8 \ --tool-call-parser glm47 \ --reasoning-parser glm45 \ --speculative-algorithm EAGLE \ --speculative-num-steps 3 \ --speculative-eagle-topk 1 \ --speculative-num-draft-tokens 4 \ --mem-fraction-static 0.85 \ --served-model-name glm-5.1-fp8 \ --port 8000 \ --host 0.0.0.0
Check the sglang cookbook for more details.
+ xLLM
Please check the deployment guide here.
+ Ktransformers
Please check the deployment guide here.
Citation
If you find GLM-5 series model useful in your research, please cite our technical report:
Excerpt shown — open the source for the full document.
Notability
notability 7.0/10Notable model release with good traction.