What does this model signal mean?

Arcee AI published arcee-ai/Trinity-Large-Thinking. This model signal records what shipped on model infrastructure and how the release is positioned. High-signal details: license other · 8.5K HF downloads · large reasoning language model by Arcee AI. onlylabs links this event to 1 captured evidence page and 6 related model signals.

Arcee AI Model: arcee-ai/Trinity-Large-Thinking

Captured source


Hugging Face: huggingface.co/arcee-ai/Trinity-Large-Thinking

arcee-ai/Trinity-Large-Thinking model card


published Apr 1, 2026 · seen Jun 6 · captured Jun 11 · http 200 · method plain · task text-generation · license other · library transformers · params 399B · downloads 8.5k · likes 185

Trinity-Large-Thinking

Introduction

Trinity-Large-Thinking is a reasoning-optimized variant of Arcee AI's Trinity-Large family — a 398B-parameter sparse Mixture-of-Experts (MoE) model with approximately 13B active parameters per token. Built on Trinity-Large-Base and post-trained with extended chain-of-thought reasoning and agentic RL, Trinity-Large-Thinking delivers state-of-the-art performance on agentic benchmarks while maintaining strong general capabilities.

Trinity-Large-Thinking generates explicit reasoning traces wrapped in ... blocks before producing its final response. This thinking process is critical to the model's performance — thinking tokens must be kept in context for multi-turn conversations and agentic loops to function correctly.
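When the model runs without a server-side reasoning parser (for example, plain Transformers generation), the trace and the final answer arrive in one string. The sketch below separates them. It assumes the trace is delimited by `<think>` and `</think>`, the convention associated with the `deepseek_r1` reasoning parser the card recommends for vLLM; the tag names in this card were lost in extraction, and `split_reasoning` is a hypothetical helper, not part of any Arcee or vLLM API.

```python
# Hypothetical helper: split raw model output into (reasoning, answer).
# ASSUMPTION: reasoning is wrapped in <think>...</think>; adjust these
# constants if the model's chat template uses different markers.
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) extracted from one generated string."""
    end = text.find(THINK_CLOSE)
    if end == -1:
        start = text.find(THINK_OPEN)
        if start == -1:
            return "", text.strip()  # no reasoning markers at all
        # Opened but never closed: generation stopped mid-reasoning.
        return text[start + len(THINK_OPEN):].strip(), ""
    reasoning = text[:end]
    start = reasoning.find(THINK_OPEN)
    if start != -1:
        # Some templates put the opening tag in the prompt, so it may be absent.
        reasoning = reasoning[start + len(THINK_OPEN):]
    answer = text[end + len(THINK_CLOSE):]
    return reasoning.strip(), answer.strip()


raw = "<think>The user asked for 2 + 2. That is 4.</think>\n\nThe answer is 4."
reasoning, answer = split_reasoning(raw)
print(reasoning)  # The user asked for 2 + 2. That is 4.
print(answer)     # The answer is 4.
```

Handling a missing opening tag matters because chat templates for thinking models often pre-fill it in the prompt, so the generated text begins directly with the reasoning.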

Try it at chat.arcee.ai

More details on the training of Trinity Large are available in the technical report.

Key Highlights

- Agentic-first design: Purpose-built for tool calling, multi-step planning, and agent workflows
- State-of-the-art agentic performance: 94.7% on τ²-Bench (Telecom), 91.9% on PinchBench, 98.2% on LiveCodeBench
- Native reasoning traces: Extended chain-of-thought via ... blocks
- Compatible with major agent frameworks: Works out of the box with OpenClaw and Hermes Agent
- Ready to use on [OpenRouter](https://openrouter.ai/): No setup required; full reasoning and tool-calling support via API

Model Variants

The Trinity Large family consists of four checkpoints:

- Trinity-Large-Thinking (this release): Reasoning-optimized, agentic post-training with extended chain-of-thought
- [Trinity-Large-Preview](https://huggingface.co/arcee-ai/Trinity-Large-Preview): Lightly post-trained, chat-ready instruct model (no `reasoning_content`)
- [Trinity-Large-TrueBase](https://huggingface.co/arcee-ai/Trinity-Large-TrueBase): 10T-token pre-anneal pretraining checkpoint
- [Trinity-Large-Base](https://huggingface.co/arcee-ai/Trinity-Large-Base): Full 17T-token pretrained foundation model with mid-training anneals

Architecture

Trinity-Large-Thinking shares the same sparse MoE architecture as Trinity-Large-Preview.

| Hyperparameter | Value |
|:---|:---:|
| Total parameters | ~398B |
| Active parameters per token | ~13B |
| Experts | 256 (1 shared) |
| Active experts | 4 |
| Routing strategy | 4-of-256 (1.56% sparsity) |
| Dense layers | 6 |
| Pretraining context length | 8,192 |
| Context length after extension | 512k |
| Architecture | Sparse MoE (AfmoeForCausalLM) |
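The routing row can be made concrete with a toy sketch of top-k expert selection. It assumes a standard top-k router: each token scores all 256 routed experts, only the 4 best run, and a shared expert always contributes. The gating function (here a softmax over the selected logits) and whether the shared expert is counted inside the 256 are assumptions; the table does not specify either. The arithmetic also reproduces the table's figures: 4 / 256 = 1.5625% of routed experts per token, and roughly 13B of 398B parameters (about 3.3%) active per token.

```python
import math
import random
from typing import Callable

NUM_EXPERTS = 256  # routed experts, per the architecture table
TOP_K = 4          # routed experts active per token


def route(logits: list[float], k: int = TOP_K) -> list[tuple[int, float]]:
    """Return (expert_index, gate_weight) for the k highest-scoring experts.
    ASSUMPTION: weights are a softmax over the selected logits; the card
    does not state Trinity-Large's actual gating function."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(
    x: float,
    experts: list[Callable[[float], float]],
    shared: Callable[[float], float],
    logits: list[float],
    k: int = TOP_K,
) -> float:
    """Toy scalar MoE layer: the shared expert always runs, the k routed
    experts are mixed by their gate weights, and all others are skipped."""
    out = shared(x)
    for i, w in route(logits, k):
        out += w * experts[i](x)
    return out


random.seed(0)
token_logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
chosen = route(token_logits)
print("routed experts for this token:", [i for i, _ in chosen])
print(f"routed fraction: {TOP_K / NUM_EXPERTS:.4%}")  # 1.5625%
print(f"active/total parameters: {13 / 398:.1%}")      # 3.3%
```

Skipping the 252 unselected experts is what keeps per-token compute close to that of a ~13B dense model, even though all ~398B parameters must still be held in memory.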

Benchmarks


| Benchmark | Trinity-Large-Thinking | Opus-4.6 | GLM-5 | MiniMax-M2.7 | Kimi-K2.5 |
|---|---:|---:|---:|---:|---:|
| IFBench | 52.3 | 53.1 | 72.3 | 75.7 | 70.2 |
| GPQA-Diamond | 76.3 | 89.2 | 81.6 | 86.2 | 86.9 |
| Tau2-Airline | 88.0 | 82.0 | 80.5 | 80.0 | 80.0 |
| Tau2-Telecom | 94.7 | 92.1 | 98.2 | 84.8 | 95.9 |
| PinchBench | 91.9 | 93.3 | 86.4 | 89.8 | 84.8 |
| AIME25 | 96.3 | 99.8 | 93.3 | 80.0 | 96.3 |
| BFCLv4 | 70.1 | 77.0 | 70.8 | 70.6 | 68.3 |
| MMLU-Pro | 83.4 | 89.1 | 85.8 | 80.8 | 87.1 |
| SWE-bench Verified* | 63.2 | 75.6 | 72.8 | 75.4 | 70.8 |

*SWE-bench Verified: all models evaluated with mini-swe-agent-v2.

Thinking-in-Context: Important Usage Note

Trinity-Large-Thinking produces reasoning traces inside ... blocks before generating its final response.

This means:

1. Multi-turn conversations: When building chat applications, include the full assistant response (thinking + answer) in the conversation history for subsequent turns.
2. Agentic loops: When using Trinity-Large-Thinking as the backbone of an agent (OpenClaw, Hermes Agent, or custom), ensure your tool-calling loop preserves reasoning in the message history between steps.
3. Context window management: The 512k extended context window accommodates long reasoning chains across many agentic steps. If you must truncate history, prefer removing older turns entirely rather than stripping thinking tokens from recent turns.
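The truncation advice in point 3 can be sketched as a helper that drops the oldest complete turns until the history fits a token budget, without ever editing the messages it keeps, so reasoning stays attached to recent turns. The message shape (`role`, `content`, optional `reasoning_content`), the rule that a turn starts at each `user` message, and the `rough_tokens` estimator are illustrative assumptions; a real deployment would count tokens with the model's own tokenizer.

```python
from typing import Callable

Message = dict  # {"role": ..., "content": ..., optional "reasoning_content": ...}


def rough_tokens(msg: Message) -> int:
    # Hypothetical estimator (~4 characters per token); use the real
    # tokenizer in practice. Reasoning counts toward the budget too.
    text = (msg.get("content") or "") + (msg.get("reasoning_content") or "")
    return max(1, len(text) // 4)


def split_turns(history: list[Message]) -> tuple[list[Message], list[list[Message]]]:
    """Separate leading system messages, then group the rest into turns
    that each start at a user message."""
    prefix: list[Message] = []
    turns: list[list[Message]] = []
    for msg in history:
        if msg["role"] == "system" and not turns:
            prefix.append(msg)
        elif msg["role"] == "user" or not turns:
            turns.append([msg])
        else:
            turns[-1].append(msg)  # assistant and tool messages stay with their turn
    return prefix, turns


def truncate_history(
    history: list[Message],
    budget: int,
    count_tokens: Callable[[Message], int] = rough_tokens,
) -> list[Message]:
    """Drop whole turns, oldest first, until the history fits `budget`.
    Kept messages are returned unchanged, reasoning included. The most
    recent turn is always kept, even if it alone exceeds the budget."""
    prefix, turns = split_turns(history)

    def total(ts: list[list[Message]]) -> int:
        return sum(map(count_tokens, prefix)) + sum(count_tokens(m) for t in ts for m in t)

    while len(turns) > 1 and total(turns) > budget:
        turns.pop(0)  # remove the oldest turn entirely
    return prefix + [m for t in turns for m in t]


history = [
    {"role": "system", "content": "You are a travel agent."},
    {"role": "user", "content": "Find flights."},
    {"role": "assistant", "content": "Here are three.", "reasoning_content": "x" * 400},
    {"role": "user", "content": "Cheapest?"},
    {"role": "assistant", "content": "The $210 one.", "reasoning_content": "y" * 40},
]
trimmed = truncate_history(history, budget=50)
```

Here the first turn, with its long trace, is dropped as a unit, while the last turn keeps its `reasoning_content` untouched.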

How thinking works

The model reasons internally before producing its response. When served via vLLM, the reasoning is separated into a dedicated field in the API response:

```jsonc
// API response structure
{
  "message": {
    "role": "assistant",
    "reasoning_content": "The user wants flight information. I need to determine the date for next Tuesday, search for flights SFO → JFK, and filter by price ..."
    …
  }
}
```

… tags during tokenization, maintaining the model's chain-of-thought across turns.

**What happens if reasoning is omitted entirely?** The model can lose prior chain-of-thought context. On simple tasks this may work fine, but on complex multi-step agentic tasks, the model can produce malformed tool calls (e.g., tool call XML appearing inside the reasoning field instead of as structured `tool_calls`). For best results, always preserve `reasoning_content` and use `""` instead of `null` for content on tool-call turns.
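The advice above can be captured in a small normalization step applied to each assistant message before it is appended to the history, plus a helper for one tool-calling step. This is a sketch under stated assumptions: responses follow the OpenAI-style chat-completions `message` shape served by vLLM, `to_history_message` and `append_tool_step` are hypothetical names, `search_flights` is a made-up tool, and reading a `reasoning` key as a fallback reflects the `reasoning` vs `reasoning_content` pitfall the card mentions rather than any documented client behavior.

```python
def to_history_message(message: dict) -> dict:
    """Convert an assistant message from a chat-completions response into
    the form to send back on the next request."""
    # ASSUMPTION: some clients expose the trace as `reasoning` instead of
    # `reasoning_content`; accept either and send back `reasoning_content`.
    reasoning = message.get("reasoning_content")
    if reasoning is None:
        reasoning = message.get("reasoning")
    # Use "" rather than None/null for content (matters on tool-call turns).
    out = {"role": "assistant", "content": message.get("content") or ""}
    if reasoning:
        out["reasoning_content"] = reasoning
    if message.get("tool_calls"):
        out["tool_calls"] = message["tool_calls"]
    return out


def append_tool_step(history: list[dict], message: dict, results: dict[str, str]) -> None:
    """Append the assistant turn (reasoning intact), then one `tool` message
    per call. `results` maps tool_call id -> tool output (hypothetical shape)."""
    history.append(to_history_message(message))
    for call in message.get("tool_calls") or []:
        history.append(
            {"role": "tool", "tool_call_id": call["id"], "content": results[call["id"]]}
        )


# Example response message; `search_flights` is a hypothetical tool.
response_message = {
    "role": "assistant",
    "content": None,
    "reasoning_content": "Need to resolve next Tuesday's date, then search SFO to JFK.",
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {"name": "search_flights", "arguments": '{"from": "SFO", "to": "JFK"}'},
        }
    ],
}
history: list[dict] = [{"role": "user", "content": "Find me a flight SFO to JFK next Tuesday."}]
append_tool_step(history, response_message, {"call_1": "[]"})
```

After the step, the history holds the user message, the assistant turn with its reasoning and `content: ""`, and the tool result, ready for the next request.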

For implementation details, pitfalls (`reasoning` vs `reasoning_content`), and Python/TypeScript examples, see [Reasoning Traces](https://docs.arcee.ai/capabilities/reasoning-traces).

## Training Configuration

### Pretraining

- Training tokens: 17 trillion
- Data partner: [Datology](https://www.datologyai.com/)

### Post-training

- Instruction tuning and agentic RL with extended chain-of-thought
- Trained on tool-calling trajectories, multi-step agent tasks, and reasoning chains

### Infrastructure

- Hardware: 2,048 NVIDIA B300 GPUs
- Parallelism: HSDP + Expert Parallelism
- Compute partner: [Prime Intellect](https://www.primeintellect.ai/)

## Usage

### Running our model

- [vLLM](#vllm) (recommended for agentic deployments)
- [Transformers](#transformers)
- [API](#api)

### vLLM

Supported in vLLM 0.11.1+. For agentic use with both reasoning and tool calling:

```bash
vllm serve arcee-ai/Trinity-Large-Thinking \
  --dtype bfloat16 \
  --reasoning-parser deepseek_r1 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder
```

**Recommended inference settings**: `temperature=0.45–0.6`, `top_p=0.95`, `top_k=50`

This configuration:
- `--reasoning-parser deepseek_r1` — Parses `...` reasoning blocks and exposes them via the `reasoning_content` field in the API response
- `--tool-call-parser qwen3_coder` — Parses structured...
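As a rough sketch of applying the recommended inference settings against a server started with the `vllm serve` command above, the helper below builds a chat-completions request body and posts it with the standard library. Assumptions: the server listens on vLLM's default `http://localhost:8000`, the served model name equals the repository id, and `top_k` is accepted as an extra top-level field by vLLM's OpenAI-compatible endpoint (it is not part of the OpenAI schema, so other hosted APIs may ignore or reject it). `build_request` and `send` are illustrative names.

```python
import json
import urllib.request

MODEL = "arcee-ai/Trinity-Large-Thinking"
# ASSUMPTION: default address of a local `vllm serve` instance.
URL = "http://localhost:8000/v1/chat/completions"


def build_request(messages: list[dict], temperature: float = 0.5) -> dict:
    """Chat-completions body using the card's recommended sampling settings."""
    if not 0.45 <= temperature <= 0.6:
        raise ValueError("recommended temperature range is 0.45-0.6")
    return {
        "model": MODEL,
        "messages": messages,
        "temperature": temperature,
        "top_p": 0.95,
        "top_k": 50,  # vLLM extension, not an OpenAI field
    }


def send(body: dict, url: str = URL) -> dict:
    """POST the body and return the first choice's message dict."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]
```

With a server running, `send(build_request([{"role": "user", "content": "Plan a 3-step trip to JFK."}]))` should return a message whose `reasoning_content` and `content` fields can be read separately.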


Notability

notability 7.0/10

13.5k downloads: moderate traction for a model release