Meituan (LongCat) analysis

Thesis

Meituan LongCat is a full-spectrum neolab executing a deliberate model–system co-design strategy across text, image, video, audio, and omni modalities, anchored on a 560B Mixture-of-Experts backbone with dynamic activation. The lab's public evidence reveals three converging bets: (1) treating evaluation infrastructure as a first-class output (at least 11 bespoke benchmarks to date) to shape the evaluation landscape it competes in; (2) vertically integrating inference systems via open-source engine releases (SGLang-FluentLLM) and selective forks of DeepSeek's compute stack; and (3) quietly pivoting toward native multimodal architectures (LongCat-Next, DiNA) and API-gated trillion-parameter models (LongCat-2.0-Preview), signaling a shift from open-weight commodity to proprietary platform. The evidence pack contains no hiring data beyond a single internship program announcement, no direct public statements from lab leadership, and no partnership or revenue disclosures. That pattern is consistent with a lab that ships artifacts prolifically but communicates sparingly outside Chinese-language channels.

Signal desks

Hiring

AGI Internship Program: Meituan announced a LongCat AI Model Internship Program to "cultivate AGI talent" W1. No specific role titles, locations, team functions, or headcount are cited in this pack. The absence of detailed job listings, engineering role descriptions, or senior research hire announcements is notable for a lab of this shipping velocity and suggests either opaque recruiting channels or reliance on academic partnerships.
Assessment: The single evidence point is too thin to map teams, hubs, or infra/data spend signals. The internship framing around "AGI talent" W1 is consistent with a lab building long-term research capacity rather than near-term product commercialization, but the evidence cannot support stronger conclusions.

Forks

DeepSeek inference stack (3 repos): LongCat forked deepseek-ai/FlashMLA E46, deepseek-ai/DeepGEMM E50, and deepseek-ai/DeepEP E51 in early February 2026. These are all compute-kernel and communication libraries for MLA attention, GEMM operations, and expert parallelism — consistent with a team optimizing inference for its own 560B MoE architecture.
FlashInfer (`flashinfer-ai/flashinfer`): Forked with custom feature/longcat_main branch including communication–computation fused kernel optimizations E52; directly referenced in the SGLang-FluentLLM release as part of the kernel stack P18.
Dao-AILab/fast-hadamard-transform: Forked E49, likely for quantization-aware inference optimizations.
Microsoft/mscclpp: Forked E47, suggesting optimization work at the GPU collective-communication layer (the layer NCCL typically occupies).
Assessment: Forks cluster tightly around inference acceleration (MLA, GEMM, expert parallelism, communication libraries) for a large MoE model. No forks point to agent frameworks, data pipelines, or eval tooling — the lab's eval work appears entirely first-party built.

Releases

LongCat-Flash-Chat (Aug 2025): 560B MoE text-generation model, MIT license. 78,653 HF downloads, 535 likes E1. The lab's flagship chat model and its most-downloaded release.
LongCat-Flash-Thinking (Sep 2025): 560B large reasoning model with dynamic computation, MIT license. 64 downloads, 149 likes E7. An iterative follow-up, LongCat-Flash-Thinking-2601, shipped Jan 2026 with 4,686 downloads, 114 likes E8.
LongCat-Flash-Omni (Oct 2025): 560B any-to-any model, MIT license. 33 downloads, 113 likes E9.
LongCat-Video (Oct 2025): 13.6B DiT-based text-to-video model, MIT license, unifies Text-to-Video, Image-to-Video, and Video-Continuation in a single model W6. 4,273 GitHub stars P12, 2,863 HF downloads, 527 likes E2 — the lab's strongest community traction by stars.
LongCat-Audio-Codec (Oct 2025): Speech tokenizer/detokenizer for speech LLMs P5. 301 GitHub stars P5, 42 HF likes E14.
LongCat-Image family (Dec 2025–Feb 2026): LongCat-Image (text-to-image, Apache-2.0, 20,338 downloads, 247 likes E3), LongCat-Image-Dev (1,616 downloads, 49 likes E12), LongCat-Image-Edit (image-to-image, 26,570 downloads, 183 likes E5), LongCat-Image-Edit-Turbo (23,114 downloads, 70 likes E11).
LongCat-Flash-Lite (Jan 2026): ~69B smaller MoE variant, 1,653 downloads, 189 likes E4.
LongCat-Flash-Thinking-ZigZag (Jan 2026): 560B reasoning variant, 19 downloads, 32 likes E17.
LongCat-HeavyMode-Summary (Jan 2026): 560B, 12 downloads, 13 likes E23.
LongCat-Flash-Prover (Mar 2026): 560B model for formal mathematics in Lean4 via agentic tool-integrated reasoning, 145 downloads, 34 likes E16 P21.
LongCat-Next (Mar 2026): Native multimodal model (~74B, any-to-any) using DiNA discrete autoregressive architecture, MIT license. 788 downloads, 179 likes E6, 438 GitHub stars P22.
LongCat-AudioDiT (Mar 2026): TTS models at 1B and 3.5B scales P24 E10 E13.
LongCat-2.0-Preview (Apr 2026, via API): Trillion-parameter model, API-only, no open-source release, no technical report, no official announcement W5.
WBench-weights (May 2026): Evaluation model weights, Apache-2.0, 9 likes E15.

Talking

LongCat-Next as "physical world AI interaction": Public framing positions LongCat-Next as a native multimodal model for "AI that can perceive and interact with the real world" W2 W4, emphasizing the DiNA architecture that eliminates barriers between modalities.
WBench as world model evaluation standard: The lab frames WBench as "the first systematic multi-round benchmark for interactive video world models" — a "CT scanner" for diagnosing consistency in multi-step interactive simulation W3.
LongCat-Video as long-video framework: External analysis positions LongCat-Video as the only open-source framework treating long video as a first-class citizen via native Video-Continuation pretraining W6.
Quiet trillion-parameter launch: LongCat-2.0-Preview was rolled out silently on the API platform with no blog post, no technical report, and no open-source links — a departure from all prior LongCat releases W5. This shift to API-only distribution for the largest model is a notable strategic signal.
The pack cites no lab leadership blog posts, interviews, policy statements, or community engagement beyond the WeChat/Discord/X badges on every repo.

Shipping

LongCat's shipping cadence from August 2025 to May 2026 is among the most aggressive of any neolab tracked. The lab shipped at least 17 distinct model releases across five modalities (text, image, video, audio, omni), plus an audio codec, a formal prover model, and a native multimodal architecture. Release tempo accelerated in Jan–Mar 2026 with 8 model drops in 10 weeks, then shifted to infrastructure and benchmark releases (SGLang-FluentLLM P18, WBench E20, LARYBench E25, General365 E25) through May. The most significant strategic shift is the LongCat-2.0-Preview launch: trillion-parameter scale, API-only distribution, and zero open-source artifacts, breaking with the permissively licensed (MIT or Apache-2.0) GitHub+HuggingFace pattern of every prior release W5. This bifurcation suggests LongCat is separating a commoditized open-weight portfolio (Flash family) from a proprietary frontier tier (2.0 line) with gated access.

Research themes

1. Benchmark-as-capability-signal: LongCat has created at least 11 bespoke benchmarks, namely Meeseeks (instruction following with self-correction loops) P2, VitaBench (LLM agent tool use; ICLR 2026) P7, R-HORIZON (long-horizon reasoning breadth/depth; ICLR 2026) P8, UNO-Bench (omni-modal compositional law) P9, AMO-Bench (high school math competition) P11, SOP-Maze (business SOP execution) P6, PIEbench (multilingual regional QA) P15, General365 (K-12 general reasoning) P25, LARYBench (vision-to-action alignment) P26, MineExplorer (MLLM agent open-world exploration) P1, and WBench (interactive video world model evaluation) E20 W3. This volume suggests the lab is investing in shaping evaluation standards as a moat and recruiting signal.

2. Long-horizon and multi-step reasoning: R-HORIZON P8 and MineExplorer P1 both probe whether models degrade over extended trajectories with hidden prerequisites. The finding is consistent across both: "strong models handle many single-hop tasks but degrade sharply when hidden prerequisites must be coordinated over longer trajectories" P1; "even the most advanced LRMs suffer significant performance degradation when facing interdependent problems that span long reasoning horizons" P8.

3. Native multimodal architectures: LongCat-Next introduces the DiNA (Discrete Native Autoregressive) architecture that unifies vision and speech in a single discrete token space W2 P22. The LongCat-Next-inference repo details a three-stage pipeline (input encoding → core LLM inference → multimodal output generation) with unified hidden state interface and decoupled generation P23.

4. Model–system co-design for MoE inference: SGLang-FluentLLM is an open-sourced inference engine built on SGLang with speculative decoding (Eagle/MTP/PLD), CUDA graph fusion for target+verify+draft, layer-wise KVCache transfer with overlap scheduling, and Decode Radix Tree Cache P18. Kernel-level work includes FlashMLA FP8 KVCache optimizations, DeepGEMM SwapAB, and communication-computation fused kernels in FlashInfer P18.

5. Formal mathematics and agentic reasoning: LongCat-Flash-Prover targets "Native Formal Reasoning in Lean4 for Mathematics Formalization and Proving through agentic tool-integrated reasoning" P21. DPT-Agent (ACL 2025) applies dual process theory to human-AI collaboration frameworks P13.

6. Video generation with temporal coherence: LongCat-Video is built on DiT architecture at 13.6B parameters, with native Video-Continuation as a pretraining task rather than inference-time stitching W6 P12. The repo also includes LongCat-Video-Avatar 1.5 for avatar generation P12.
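
Theme 4 names speculative decoding with a combined target+verify+draft step. As a rough illustration of the general draft-and-verify idea only, the sketch below uses greedy verification over toy integer "models". It is not SGLang-FluentLLM's implementation: the function names (`speculative_step`, `generate`, `greedy_decode`) and the toy `target`/`draft` functions are hypothetical, and a real engine would score all draft positions in one batched forward pass rather than in a Python loop.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy next-token function


def speculative_step(target: NextToken, draft: NextToken,
                     context: List[Token], k: int) -> List[Token]:
    """Propose k draft tokens, keep the prefix the target agrees with,
    then append one token chosen by the target itself."""
    # 1. Draft phase: the cheap model proposes k tokens autoregressively.
    proposed: List[Token] = []
    for _ in range(k):
        proposed.append(draft(context + proposed))

    # 2. Verify phase: compare each proposal with the target's own choice.
    accepted: List[Token] = []
    for tok in proposed:
        expected = target(context + accepted)
        if expected != tok:
            # First mismatch: keep the target's token and stop this step.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # 3. Every draft token was accepted: the target adds one bonus token.
    accepted.append(target(context + accepted))
    return accepted


def generate(target: NextToken, draft: NextToken,
             prompt: List[Token], max_new: int, k: int = 4) -> List[Token]:
    """Run speculative steps until max_new tokens have been produced."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        out.extend(speculative_step(target, draft, out, k))
    return out[:len(prompt) + max_new]


def greedy_decode(target: NextToken, prompt: List[Token],
                  max_new: int) -> List[Token]:
    """Baseline: plain one-token-at-a-time greedy decoding."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


# Toy stand-ins (hypothetical): the target counts upward modulo 10,
# and the draft agrees except right after a 4, where it guesses wrong.
def target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def draft(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10
```

Under greedy verification the output matches plain greedy decoding of the target for any draft model and any `k`; the draft only changes how many target tokens are confirmed per step, which is where the latency benefit comes from.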

Hiring & scaling

The evidence for hiring is thin: only one reference to a LongCat AI Model Internship Program framed as a way to "cultivate AGI talent" W1. No job descriptions, engineering role titles, team structures, location hubs, or compensation data appear in this pack. The lab's shipping output implies substantial engineering and research headcount across text, vision, audio, systems, and evaluation teams, but the evidence cannot substantiate scale, org chart, or geographic distribution. The internship program signal W1 paired with the lab's heavy academic benchmarking output (multiple ICLR 2026 acceptances P7 P8, ACL 2025 P13) suggests recruitment through academic pipelines and benchmark leadership rather than public job postings.

Category implications

Infrastructure: The SGLang-FluentLLM release P18, the DeepSeek fork cluster E46 E50 E51, and the customized FlashInfer fork E52 indicate LongCat is building a vertically integrated inference stack for large MoE models. Implications: (a) reduced dependency on commercial inference providers; (b) custom kernel work (FlashMLA FP8, DeepGEMM SwapAB) suggests bespoke hardware utilization strategies that may not generalize to other model architectures; (c) the use of Dynamo for KVCache-aware scheduling P18 implies production-scale serving infrastructure beyond research demos.

Product: The LongCat-2.0-Preview API-only launch W5 signals a commercial product strategy distinct from the open-weight Flash line. The trillion-parameter scale and gated API access imply the lab is targeting enterprise or platform customers willing to pay for frontier capability, while using the open-source Flash family for developer ecosystem building and talent recruitment.

Research: The benchmark portfolio reveals a research organization that evaluates systematically before building. VitaBench is explicitly positioned as a "definitive benchmark for tool-use performance" and is cited by Qwen and ByteDance Seed teams P7. AMO-Bench tracks token efficiency alongside accuracy P11. General365 uses a held-out test set to detect contamination P25. These design choices imply an organization that treats evaluation as a strategic capability rather than an afterthought.

Hiring: Thin evidence precludes strong conclusions, but the internship program W1 and academic benchmark leadership (ICLR 2026 acceptances) suggest a university-talent pipeline strategy. The absence of public senior engineering roles may indicate reliance on internal Meituan engineering talent or quiet direct recruiting.

GTM: The bifurcation between open-weight models released under MIT or Apache-2.0 licenses (Flash-Chat, Flash-Thinking, LongCat-Video, LongCat-Image, LongCat-AudioDiT, LongCat-Next) and the API-gated LongCat-2.0-Preview W5 suggests a hybrid GTM: open-source for ecosystem adoption and talent signaling, proprietary API for revenue. No pricing, customer, or partnership data is cited.
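
The Research item above mentions two evaluation design choices: tracking token efficiency alongside accuracy (AMO-Bench) and using a held-out test set to detect contamination (General365). A minimal sketch of how such metrics could be computed follows. The `Result` type, the function names, and the exact metric definitions are assumptions made for illustration and are not taken from either benchmark's code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Result:
    """One graded model answer (hypothetical record layout)."""
    correct: bool
    output_tokens: int


def accuracy(results: List[Result]) -> float:
    """Fraction of answers graded correct."""
    return sum(r.correct for r in results) / len(results)


def tokens_per_correct(results: List[Result]) -> float:
    """Total output tokens spent per correct answer (lower is better)."""
    n_correct = sum(r.correct for r in results)
    if n_correct == 0:
        return float("inf")
    return sum(r.output_tokens for r in results) / n_correct


def contamination_gap(public: List[Result], held_out: List[Result]) -> float:
    """Public-split accuracy minus held-out accuracy.

    A large positive gap is one warning sign that the public split
    may have leaked into training data.
    """
    return accuracy(public) - accuracy(held_out)


# Made-up example data, for illustration only.
public_split = [Result(True, 100), Result(True, 300),
                Result(False, 200), Result(True, 200)]
held_out_split = [Result(True, 150), Result(False, 400),
                  Result(False, 250), Result(True, 200)]
```

On the made-up data, `accuracy(public_split)` is 0.75 against 0.5 on the held-out split, so `contamination_gap` returns 0.25. Two models with equal accuracy can still differ sharply on `tokens_per_correct`, which is the reason to report both.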

Traction highlights

LongCat-Video: 4,273 GitHub stars, 669 forks P12 — the highest community traction of any LongCat repo by a wide margin.
LongCat-Flash-Chat: 1,339 GitHub stars P4, 78,653 HF downloads, 535 HF likes E1 — the highest download count in the portfolio.
LongCat-Image: 695 stars P14; Image-Edit variant reached 26,570 HF downloads E5; Image-Edit-Turbo reached 23,114 downloads E11.
LongCat-AudioDiT: 522 stars P24; combined 2,864 downloads across 1B and 3.5B variants E10 E13.
LongCat-Flash-Omni: 492 stars P10.
LongCat-Next: 438 stars P22, 179 HF likes E6 within ~2.5 months of release.
LongCat-Audio-Codec: 301 stars P5.
LongCat-Flash-Thinking: 285 stars P3.
LongCat-Flash-Thinking-2601: 254 stars P16.
WBench: 157 stars on launch E20; external coverage positions it as the first systematic multi-round interactive world model benchmark W3.
LARYBench: 150 stars P26.
VitaBench: 145 stars P7; cited by Qwen3.5 and ByteDance Seed2.0 P7; accepted at ICLR 2026.
AMO-Bench: 128 stars P11.
SGLang-FluentLLM: 83 stars P18.
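
The "wide margin" claim for LongCat-Video can be checked directly from the star counts listed above. The short script below copies those counts into a dictionary and computes the lead over the second-ranked repo and LongCat-Video's share of all listed stars. The helper names are hypothetical, and the totals cover only the repos named in this section.

```python
from typing import Dict, List, Tuple

# GitHub star counts copied from the traction list in this section.
STARS: Dict[str, int] = {
    "LongCat-Video": 4273,
    "LongCat-Flash-Chat": 1339,
    "LongCat-Image": 695,
    "LongCat-AudioDiT": 522,
    "LongCat-Flash-Omni": 492,
    "LongCat-Next": 438,
    "LongCat-Audio-Codec": 301,
    "LongCat-Flash-Thinking": 285,
    "LongCat-Flash-Thinking-2601": 254,
    "WBench": 157,
    "LARYBench": 150,
    "VitaBench": 145,
    "AMO-Bench": 128,
    "SGLang-FluentLLM": 83,
}


def rank_by_stars(stars: Dict[str, int]) -> List[Tuple[str, int]]:
    """Repos sorted from most to fewest stars."""
    return sorted(stars.items(), key=lambda kv: kv[1], reverse=True)


def lead_ratio(stars: Dict[str, int]) -> float:
    """Stars of the top repo divided by stars of the runner-up."""
    ranked = rank_by_stars(stars)
    return ranked[0][1] / ranked[1][1]


def share_of_total(stars: Dict[str, int], name: str) -> float:
    """One repo's fraction of all stars in the table."""
    return stars[name] / sum(stars.values())
```

With these figures LongCat-Video has roughly 3.2 times the stars of LongCat-Flash-Chat and a little under half of the 9,262 stars across the listed repos.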