Devstral 2: Open-Weights Coding Model for Cline
Mistral AI's Devstral 2 hits 72.2% on SWE-bench Verified and integrates directly into Cline. Free during launch, then 7x cheaper than Claude Sonnet on real-world coding tasks.
TL;DR
- Mistral AI released Devstral 2, a 123B open-weights model hitting 72.2% on SWE-bench Verified
- Matches closed-model performance on agentic coding tasks while being 7x cheaper than Claude Sonnet
- Free during launch period; integrates directly into Cline with one-click setup
What Dropped
Mistral AI released Devstral 2 this week: a 123B open-weights model purpose-built for agentic coding workflows. It comes in two sizes, Devstral 2 (123B) and Devstral Small 2 (24B), both with 256K-token context windows and permissive open-source licenses. Both models are free during the launch period, then $0.40/$2.00 per million tokens (input/output) for Devstral 2 and $0.10/$0.30 for the smaller variant.
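At those rates, per-task cost is easy to estimate. A minimal sketch using the pricing above; the token counts in the example are hypothetical, not benchmark figures:

```python
# Launch-period pricing from the announcement (USD per 1M tokens).
RATES = {
    "devstral-2": {"input": 0.40, "output": 2.00},
    "devstral-small-2": {"input": 0.10, "output": 0.30},
}

def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one agentic task for a given model."""
    r = RATES[model]
    return (input_tokens * r["input"] + output_tokens * r["output"]) / 1_000_000

# A hypothetical long agentic session: 2M tokens read, 200K generated.
print(f"${task_cost('devstral-2', 2_000_000, 200_000):.2f}")        # → $1.20
print(f"${task_cost('devstral-small-2', 2_000_000, 200_000):.2f}")  # → $0.26
```

Input tokens dominate in agentic workflows (the model re-reads files on every step), which is why the low input rate matters more than the output rate for long sessions.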
The Dev Angle
Devstral 2 scores 72.2% on SWE-bench Verified, near parity with Claude Sonnet 4.5, while being up to 7x more cost-efficient on real-world tasks. The 256K context window enables architecture-level reasoning across large codebases: the model can track state across multiple files, understand framework dependencies, and retry with corrections when something fails. In human evaluations it posts a 42.8% win rate in head-to-head comparisons against DeepSeek V3.2.
Tool-calling success rates match the best closed models, which translates to fewer failed attempts and cleaner execution in multi-step workflows. Devstral Small 2 (24B) scores 68.0% on SWE-bench Verified and runs on consumer hardware including GeForce RTX, making it viable for local deployment or resource-constrained environments.
Setup in Cline is trivial: open Settings, select Cline as your API provider, and pick `mistralai/devstral-2512:free` from the model dropdown. Done. The model handles multi-file edits and systematic refactoring particularly well.
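If you'd rather script against the model than use the dropdown, the same model id slots into an OpenAI-compatible chat-completions request. A minimal sketch; the base URL is a placeholder assumption, not a documented endpoint, and only the request body is built here:

```python
import json

# Hypothetical OpenAI-compatible endpoint -- substitute your provider's
# actual base URL and supply an API key via the usual Authorization header.
BASE_URL = "https://api.example.com/v1/chat/completions"

# Request body targeting the launch-period model id from Cline's dropdown.
payload = {
    "model": "mistralai/devstral-2512:free",
    "messages": [
        {"role": "system", "content": "You are a coding agent."},
        {"role": "user", "content": "Refactor utils.py to remove duplication."},
    ],
    "temperature": 0.2,
}

body = json.dumps(payload)
print(body[:50])
```

Any client that speaks the OpenAI chat-completions format should work unchanged; only the model string and endpoint differ.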
Should You Care?
If you're running Cline and want a powerful open-source alternative to closed models, Devstral 2 is worth testing immediately—it's free right now. The cost advantage ($0.40/$2.00 vs. Claude's higher rates) compounds fast on long-running agentic tasks.
If you need local inference or fine-tuning for proprietary codebases, the open-weights approach means you can run completely private infrastructure. If you're already happy with your current model and cost isn't a concern, there's no urgent reason to switch.
The real opportunity: compact models like the 24B Devstral Small 2 that approach larger competitors' performance change the deployment equation. You can now run serious agentic coding on accessible hardware instead of massive cloud clusters. For teams building internal AI tooling, this opens up self-hosted options that weren't viable six months ago.
Source: Cline