AI Models April 5, 2026 5 min read

Arcee AI's Trinity-Large-Thinking Is Now the Strongest Open Reasoning Model Built Outside China

Arcee AI released a 399B sparse MoE reasoning model under Apache 2.0 that ranks second only to Claude Opus 4.6 on PinchBench while costing 96% less. It's the most capable fully-open US-built model ever shipped.


Arcee AI released Trinity-Large-Thinking on April 1, 2026 — a 399-billion-parameter sparse Mixture-of-Experts reasoning model under the Apache 2.0 license. That combination of scale, open weights, and a permissive license is effectively unprecedented outside Chinese frontier labs.

The numbers are hard to ignore. Trinity-Large-Thinking ranks second on PinchBench, behind only Claude Opus 4.6. It costs $0.90 per million output tokens — roughly 96% cheaper than Opus 4.6 — and supports a 262,000-token input context with up to 80,000 output tokens. Despite the enormous parameter count, only about 13 billion parameters activate per token, so inference remains feasible on current hardware.
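A quick back-of-the-envelope check ties those figures together. The Opus 4.6 price below is inferred from the "96% cheaper" claim rather than taken from a price sheet, so treat it as an estimate:

```python
# Sanity-check the pricing and sparsity figures quoted above.
trinity_price = 0.90   # $ per million output tokens (stated)
savings = 0.96         # "roughly 96% cheaper" (stated)

# Implied Opus 4.6 price, derived from the savings claim (an assumption).
implied_opus_price = trinity_price / (1 - savings)
print(f"Implied Opus 4.6 price: ${implied_opus_price:.2f}/M output tokens")

# Sparse MoE activation: ~13B of 399B parameters fire per token.
active, total = 13e9, 399e9
print(f"Active fraction per token: {active / total:.1%}")
```

The implied frontier price lands around $22.50 per million output tokens, and the roughly 3% activation fraction is why a 399B model can serve tokens at a cost closer to a mid-size dense model.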

What “thinking” means here

The model uses an extended chain-of-thought mechanism baked into its training, not bolted on as a prompt wrapper. It’s designed specifically for long-horizon agentic work: multi-turn tool calls, repository-level coding, and complex reasoning loops that break under shorter context windows. Most open models optimized for coding or reasoning cap out at single-turn tasks. Trinity-Large-Thinking is built to run a full agent workflow from start to finish without losing coherence.
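The workload described above, a model repeatedly choosing tools and folding results back into its context until the task is done, can be sketched generically. Everything here is a stand-in: `call_model` and the tool registry are hypothetical placeholders for illustration, not Arcee's API.

```python
# Generic multi-turn agent loop of the kind long-horizon reasoning
# models target. `call_model` is a hard-coded stand-in for an LLM call.
def call_model(messages):
    # A real client would hit an inference API; here we script one
    # tool call followed by a final answer, purely for illustration.
    if len(messages) == 1:
        return {"tool": "add", "args": {"a": 2, "b": 3}}
    return {"answer": f"The result is {messages[-1]['content']}"}

TOOLS = {"add": lambda a, b: a + b}  # hypothetical tool registry

def run_agent(task, max_turns=8):
    messages = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        reply = call_model(messages)
        if "answer" in reply:
            return reply["answer"]
        # Execute the requested tool and append the result to context.
        result = TOOLS[reply["tool"]](**reply["args"])
        messages.append({"role": "tool", "content": str(result)})
    raise RuntimeError("agent did not converge")

print(run_agent("What is 2 + 3?"))  # → The result is 5
```

The point of a long context window is that `messages` can grow across dozens of tool turns, with full repository files or logs in scope, without the model losing the thread.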

Arcee trained it in a single 33-day run on 2,048 NVIDIA B300 Blackwell GPUs. The compute cost was approximately $20 million — nearly half the company’s total funding to date. That’s a concentrated bet for a startup competing against labs with billions in capital.
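The per-GPU economics implied by those figures are easy to work out from the stated numbers alone:

```python
# Training-run economics implied by the stated figures:
# 2,048 B300 GPUs for 33 days at ~$20M total compute cost.
gpus, days, total_cost = 2048, 33, 20_000_000

gpu_days = gpus * days                      # 67,584 GPU-days
cost_per_gpu_day = total_cost / gpu_days
cost_per_gpu_hour = cost_per_gpu_day / 24

print(f"{gpu_days:,} GPU-days")
print(f"~${cost_per_gpu_day:.0f}/GPU-day (~${cost_per_gpu_hour:.2f}/GPU-hour)")
```

That works out to roughly $296 per GPU-day, or about $12 per GPU-hour, a plausible rate for current-generation accelerators and a useful yardstick for anyone budgeting a comparable run.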

Why the Apache license matters

Most powerful open-weight models ship under custom licenses that restrict commercial use, fine-tuning, or redistribution. Apache 2.0 has none of those caveats. Enterprises can deploy Trinity-Large-Thinking in production, modify it, and use it in commercial products without negotiating terms. That freedom is exactly what Llama 2’s restrictive custom license lacked, and what made Mistral’s Apache-licensed models immediately popular.

The weights are available on Hugging Face. Arcee also runs the model through its own API at the $0.90 per million output token rate.

The competitive context

The “strongest open model outside China” qualifier matters because DeepSeek R2 and Qwen models have quietly led open-weight benchmarks for months. Trinity-Large-Thinking is Arcee’s argument that US-built open models can close that gap. Whether the lead holds as Chinese labs keep shipping new models is an open question — but it’s a credible stake in the ground.

For teams currently paying Anthropic or OpenAI for reasoning tasks, Trinity-Large-Thinking is worth a serious evaluation. The benchmark gap to frontier proprietary models is narrow. The cost gap is not.

ai open-source llm reasoning arcee