What does this model signal mean?

OpenBMB (MiniCPM) published openbmb/MiniCPM5-1B-SFT. This model signal is evidence of what shipped on model infrastructure and how the release is positioned. High-signal details: license apache-2.0 · 13.2K HF downloads · A 1B parameter fine-tuned language model from OpenBMB.. onlylabs links this event to 1 captured evidence page and 6 related model signals.

OpenBMB (MiniCPM) Model: openbmb/MiniCPM5-1B-SFT

Captured source

source ↗

Hugging Face/huggingface.co/openbmb/MiniCPM5-1B-SFT

openbmb/MiniCPM5-1B-SFT model card

Source ↗

published May 21, 2026seen Jun 6captured Jun 11http 200method plaintask text-generationlicense apache-2.0library transformersparams 1.1Bdownloads 13klikes 37

MiniCPM Tech Report | GitHub Repo | UltraData | MiniCPM Desk Pet | Online Demo

English | 中文

Highlights

We are releasing MiniCPM5-1B, the first model in the MiniCPM5 series. It is a dense 1B Transformer built for on-device, local deployment, and resource-constrained scenarios, reaching 1B-class open-source SOTA.

🏆 1B-class open-source SOTA: compared with strong open-source models in the same size class, MiniCPM5-1B reaches SOTA within this comparison set. Its advantage is most visible in agentic tool use, code generation, and difficult reasoning.

!MiniCPM5-1B capability comparison by domain

🧠 Hybrid Reasoning: built-in ` chat template, switch via enable_thinking`. The same checkpoint serves as both a fast assistant and a deliberate reasoner.

🛠️ Deployment / Fine-tuning Resources: the MiniCPM GitHub repo provides single-page cookbooks and Agent Skills for major inference backends and fine-tuning frameworks.

🐱 Desktop Pet: a local-LLM desktop pet driven by MiniCPM5-1B.

Model List

Use this directory to choose the model format that matches your runtime:

[MiniCPM5-1B](https://huggingface.co/openbmb/MiniCPM5-1B) · ModelScope · BF16 final release (post-trained with RL + OPD)
[MiniCPM5-1B-SFT](https://huggingface.co/openbmb/MiniCPM5-1B-SFT) · ModelScope · BF16 SFT-only checkpoint (before RL / OPD) 👈 you are here
[MiniCPM5-1B-Base](https://huggingface.co/openbmb/MiniCPM5-1B-Base) · ModelScope · BF16 base checkpoint (pre-training only)
[MiniCPM5-1B-GGUF](https://huggingface.co/openbmb/MiniCPM5-1B-GGUF) · ModelScope · GGUF for llama.cpp / Ollama / LM Studio
[MiniCPM5-1B-MLX](https://huggingface.co/openbmb/MiniCPM5-1B-MLX) · ModelScope · MLX / 4bit for Apple Silicon

Model Information

MiniCPM5-1B has the following features:

Type: Causal Language Model
Architecture: Standard LlamaForCausalLM
Number of Parameters: 1,080,632,832
Number of Non-Embedding Parameters: 679,552,512
Number of Layers: 24
Number of Attention Heads (GQA): 16 for Q and 2 for KV
Context Length: 131,072

Introduction

MiniCPM5-1B is the first checkpoint in the MiniCPM5 series. It is designed for local assistants, coding agents, tool-use workflows, and reasoning scenarios where a compact model is preferred. The model keeps a small deployment footprint while providing native long-context support and both Think / No Think chat modes through the same checkpoint.

Evaluation Results

We compare MiniCPM5-1B with strong open-source models in the same size class, including LFM2.5-1.2B-Thinking, Qwen3-0.6B/think and Qwen3.5-0.8B/think. These are capable baselines; within this comparison set, MiniCPM5-1B reaches 1B-class open-source SOTA, with its advantage most visible in tool use, code generation, and difficult reasoning. This makes it a practical choice for local coding agents, tool assistants, and reasoning assistants.

!MiniCPM-5 1B Public Leaderboard

Training Recipe

The training of MiniCPM5-1B is a full-stack practice of [UltraData Tiered Data Management](https://arxiv.org/pdf/2602.09003), covering three stages: base training, mid-training, and post-training.

During base training, the model goes through stable training and decay training to build core language capability and training stability. It then enters mid-training to further strengthen target capabilities and adapt to the target data distribution. The training corpus is released alongside the model as Ultra-FineWeb, Ultra-FineWeb-L3, and UltraData-Math.

During post-training, we proceed in three steps: SFT, RL, and OPD. We first use 200B tokens of deep-thinking SFT and 200B tokens of hybrid-thinking SFT to establish deep-thinking, hybrid-thinking, and general chat abilities; the SFT data is released as UltraData-SFT-2605. We then train specialized RL teachers for math, code, closed-book QA, writing, and related domains, and use On-Policy Distillation (OPD) to distill these teachers back into one release model.

!MiniCPM5-1B Training Recipe

What does RL + OPD bring?

RL + OPD is a key part of MiniCPM5-1B post-training. On math, code and instruction-following tasks, RL + OPD raises the average score by ↑16 points while cutting the share of responses that hit the max-tokens budget by ↓29 percentage points. The figures below show the two-stage Reasoning RL pipeline, score gains, and the drop in overlong responses.

RL combines complementary training signals for reasoning, closed-book QA, writing, instruction following, long-context understanding, and general dialogue. Reasoning RL is based on DAPO-Math-17k, follows the minimalist recipe of JustRL, and further adds a two-stage length schedule to reduce overlong responses while improving reasoning accuracy. We also use TriviaQA, NQ-Open, LongWriter-Zero-RLData, synthesized verifiable RLVR data, and pair-wise RLHF signals to improve reliability, instruction following, and user experience.

!MiniCPM5-1B RL Two-stage Pipeline

OPD builds on Thinking Machines Lab's On-Policy Distillation and incorporates implementation improvements from...

Excerpt shown — open the source for the full document.

Notability

notability 6.0/10

Moderate traction SFT model release