Repo: Meituan (LongCat), published Jan 14, 2026

meituan-longcat/LongCat-Flash-Thinking-2601

License: MIT

Stars: 254

Forks: 11

Open issues: 4

Created: 2026-01-14T18:22:54Z

Pushed: 2026-05-09T10:22:01Z

Default branch: main

Fork: no

Archived: no

README:

LongCat-Flash-Thinking-2601

Tech Report 📄

Model Introduction

We introduce an updated version of LongCat-Flash-Thinking, a powerful and efficient Large Reasoning Model (LRM) with 560 billion total parameters, built on an innovative Mixture-of-Experts (MoE) architecture. Beyond inheriting the domain-parallel training recipe from our previous version and maintaining highly competitive performance on traditional reasoning benchmarks, this update systematically strengthens agentic thinking through a carefully designed pipeline: environment scaling, then task synthesis, then reliable and efficient large-scale multi-environment reinforcement learning. To better adapt to the noise and uncertainty inherent in real-world agentic tasks, we systematically analyze multiple types and levels of environmental noise and apply curriculum training over them, enabling robust performance under imperfect conditions. As a result, LongCat-Flash-Thinking achieves top-tier benchmark performance in agentic tool use, agentic search, and tool-integrated reasoning, and generalizes substantially better to out-of-distribution real-world agentic scenarios. We further design dedicated evaluation protocols to assess robustness and generalization. In addition, we introduce Heavy Thinking Mode, which further improves the model's performance on extremely challenging tasks via intensive parallel thinking.

Key Features

🌟 Environment Scaling and Multi-Environment Reinforcement Learning

We construct a diverse set of high-quality environments that serve as a training playground for reinforcement learning, enabling the model to acquire high-level, generalizable agentic skills. Each environment contains over 60 tools organized in a dense dependency graph, providing sufficient complexity for diverse task construction and large-scale exploration. As the number of training environments increases, we observe consistent improvements on out-of-domain evaluations, indicating strengthened generalization.

  • High-Quality Task Construction.

To ensure the quality of the training task set, we explicitly control both task complexity and diversity. Each task is defined over a connected subgraph sampled from a high-quality environment, and task complexity is controlled by requiring coordinated use of as many tools as possible within the sampled subgraph. The sampling probability of previously selected tools is progressively reduced to promote task diversity. We construct corresponding databases to ensure task executability, and each task is verified to admit at least one executable solution. However, when environments contain a large number of tools, maintaining consistency across databases becomes challenging and may lead to unverifiable tasks. Specialized strategies are designed to tackle this issue.
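The sampling scheme described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the adjacency-dict representation, the frontier-expansion strategy, the `decay` factor, and the function names are all assumptions made for the example, and the dependency graph is treated as undirected for the purpose of connectivity. Database construction and executability verification are not modeled.

```python
import random


def sample_connected_subgraph(graph, size, weights, rng):
    """Grow a connected set of tools by weighted frontier expansion.

    graph:   dict mapping tool name -> list of neighbouring tool names
             (assumed undirected view of the dependency graph)
    weights: dict mapping tool name -> current sampling weight
    """
    tools = sorted(graph)
    start = rng.choices(tools, weights=[weights[t] for t in tools], k=1)[0]
    selected = {start}
    while len(selected) < size:
        # Only neighbours of already selected tools are eligible,
        # which keeps the sampled subgraph connected.
        frontier = sorted({n for t in selected for n in graph[t]} - selected)
        if not frontier:
            break  # the connected component is exhausted
        nxt = rng.choices(frontier, weights=[weights[t] for t in frontier], k=1)[0]
        selected.add(nxt)
    return selected


def sample_tasks(graph, num_tasks, size, decay=0.5, seed=0):
    """Sample tool subgraphs, down-weighting tools that were already used."""
    rng = random.Random(seed)
    weights = {tool: 1.0 for tool in graph}
    tasks = []
    for _ in range(num_tasks):
        subgraph = sample_connected_subgraph(graph, size, weights, rng)
        tasks.append(subgraph)
        for tool in subgraph:
            weights[tool] *= decay  # hypothetical decay rule promoting diversity
    return tasks, weights
```

In this sketch, a task would then be written so that it requires as many tools from the returned subgraph as possible; that step depends on the environment and is left out.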

  • Multi-Environment Reinforcement Learning.

While maintaining the efficient asynchronous training and streaming rollout features, we further extend our reinforcement learning infrastructure (DORA) to support large-scale multi-environment agentic training, as required by our environment scaling protocol. Tasks from multiple environments are jointly organized within each training batch in a balanced manner, and are allocated different rollout budgets based on both their complexity and the current training state.
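One way to picture balanced batching with per-task rollout budgets is the sketch below. It does not reflect the DORA implementation, which is not described here: the `Task` fields, the round-robin interleaving, and the budget formula (more rollouts for complex tasks and for tasks whose running pass rate is near 0.5) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from itertools import zip_longest


@dataclass
class Task:
    env: str
    task_id: str
    complexity: float  # hypothetical score in [0, 1]
    pass_rate: float   # hypothetical running success rate in [0, 1]


def build_balanced_batch(tasks_by_env, batch_size):
    """Interleave tasks round-robin so each environment is evenly represented."""
    batch = []
    for group in zip_longest(*tasks_by_env.values()):
        for task in group:
            if len(batch) >= batch_size:
                return batch
            if task is not None:  # shorter environments run out first
                batch.append(task)
    return batch


def rollout_budget(task, min_rollouts=4, max_rollouts=16):
    """Assumed rule: spend more rollouts on complex and uncertain tasks."""
    # Uncertainty is 1.0 at a pass rate of 0.5 and 0.0 at pass rates of 0 or 1.
    uncertainty = 1.0 - abs(2.0 * task.pass_rate - 1.0)
    score = 0.5 * task.complexity + 0.5 * uncertainty
    return min_rollouts + round(score * (max_rollouts - min_rollouts))
```

The pass rate stands in for "current training state" here; a real system could use any signal that tracks how much a task still contributes to learning.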

🌟 Robust Training against Noisy Environments

Since real-world agentic environments are inherently noisy and imperfect, training models only in idealized environments is insufficient and often results in limited robustness. To address this issue, we explicitly incorporate environmental imperfections into the model training process. Specifically, we systematically analyze the major sources of real-world noise in agentic scenarios and then design an automatic pipeline to inject such noise into training environments. During reinforcement learning, we adopt a curriculum strategy that progressively increases both the variety and the intensity of noise as training proceeds. Benefiting from this robust training, LongCat-Flash-Thinking develops strong resilience to environmental uncertainty and consistently achieves improved performance under imperfect conditions.
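A curriculum of this kind might look like the following sketch. The noise type names, the linear schedule, and the `max_intensity` cap are hypothetical; the README does not specify which noise sources are injected or how the schedule is shaped.

```python
import random

# Hypothetical noise categories, ordered from mildest to harshest.
NOISE_TYPES = ["tool_timeout", "malformed_output", "partial_result", "irrelevant_field"]


def noise_schedule(step, total_steps, max_intensity=0.3):
    """Return the active noise types and injection probability for a training step."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    # More noise types are unlocked as training proceeds.
    num_types = 1 + int(progress * (len(NOISE_TYPES) - 1))
    active = NOISE_TYPES[:num_types]
    # Injection probability grows linearly with progress (assumed shape).
    intensity = progress * max_intensity
    return active, intensity


def noisy_tool_call(tool, args, active, intensity, rng):
    """Call a tool, replacing its result with a simulated failure at the given rate."""
    if active and rng.random() < intensity:
        return {"ok": False, "noise": rng.choice(active)}
    return {"ok": True, "result": tool(**args)}
```

For example, `noise_schedule(0, 100)` leaves the environment clean (intensity 0.0), while `noise_schedule(100, 100)` enables all four hypothetical noise types at the maximum rate.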

🌟 Heavy Thinking Mode

To push reasoning capability beyond its current boundary, we introduce Heavy Thinking Mode. Specifically, we decompose challenging problem solving into two complementary stages, parallel thinking and summarization, thereby jointly scaling both reasoning depth and width. For reasoning width scaling, multiple trajectories are generated independently and in parallel, enabling broad exploration of reasoning paths. A reasonably high inference temperature is applied here to encourage diversity among trajectories. For reasoning depth scaling, the trajectories refined during the summarization stage can be recursively fed back into the summary model, forming an iterative reasoning loop that supports progressively deeper reasoning. An additional reinforcement learning stage is specifically tailored to train the summarization ability, further unlocking the potential of this mode.

We've launched Heavy Thinking Mode on the LongCat AI platform. Feel free to try it out: https://longcat.chat/.
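The two-stage loop can be sketched as below. This is only an illustration under stated assumptions: `think` and `summarize` are hypothetical callables standing in for model calls, the `width`, `depth`, and `temperature` defaults are arbitrary, and feeding the previous summary back alongside the original trajectories is one possible reading of the recursive step.

```python
from concurrent.futures import ThreadPoolExecutor


def heavy_thinking(problem, think, summarize, width=4, depth=2, temperature=1.0):
    """Parallel thinking followed by iterative summarization.

    think(problem, temperature, seed) -> str       (hypothetical model call)
    summarize(problem, list_of_candidates) -> str  (hypothetical model call)
    """
    # Width scaling: generate independent trajectories in parallel.
    with ThreadPoolExecutor(max_workers=width) as pool:
        trajectories = list(
            pool.map(lambda seed: think(problem, temperature, seed), range(width))
        )

    # Depth scaling: feed the refined summary back into the summarizer.
    summary = summarize(problem, trajectories)
    for _ in range(depth - 1):
        summary = summarize(problem, trajectories + [summary])
    return summary
```

With `depth=1` the function reduces to a single summarization pass over the parallel trajectories; larger values trade extra summarizer calls for further refinement.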

Evaluation Results

| Benchmark | DeepSeek-V3.2-Thinking | Kimi-K2-Thinking | Qwen3-235B-A22B-Thinking-2507 | GLM-4.7-Thinking | Claude-Opus-4.5-Thinking | Gemini-3-Pro | GPT-5.2-Thinking-xhigh | LongCat-Flash-Thinking-2601 |
|---------------|------------------------|------------------|-------------------------------|------------------|---------------------------|--------------|------------------------|------------------------------|
| Architecture | MoE | MoE | MoE | MoE | - | - | - | MoE |
| # Total Params | 671B | 1T | 235B | 355B | - | - | - | 560B |
| # Activated Params | 37B | 32B | 22B | 32B | - | - | - | 27B |
| Mathematical Reasoning w/ Tools | | | | | | | | |
| AIME-25 (Avg@16) | 93.5* | 99.1† | 92.6* | 95.3* | 100.0 | 99.8 | 100.0 | 99.6 / 100.0‡ |
| HMMT-25 (Avg@16) | 93.5* | 95.1† | 83.9* | 98.1* | 98.6 | 99.8 | 99.6 | 93.4 / 97.5‡ |

…

Notability

notability 5.0/10

New model with modest community traction.