Devstral Small 2 Local Setup Guide 2026: Run Mistral Coding Agent on Your Laptop

Devstral Small 2 is a 24B-parameter coding model from Mistral AI that scores 68% on SWE-bench Verified and runs on a single 24GB GPU or a Mac M-series with 32GB unified memory — making it the first cloud-grade coding agent most developers can realistically self-host. This guide covers three setup paths: Ollama for beginners, vLLM for production teams, and llama.cpp for CPU-only or low-VRAM machines.

What Is Devstral Small 2?

Devstral Small 2 is Mistral AI’s open-weight coding specialist, released December 10, 2025 under the Apache 2.0 license. With 24 billion parameters and a 256K-token context window, it achieves 68.0% on SWE-bench Verified — a real-world benchmark measuring a model’s ability to resolve open GitHub issues autonomously. That puts it on par with models up to five times its parameter count, including closed-source proprietary systems.

Because it ships under Apache 2.0, you can run it locally with no API fees, no data leaving your machine, and no usage restrictions — even in commercial projects. The model is fine-tuned specifically on agentic coding workflows: reading multi-file codebases, writing patches, running tool calls, and self-correcting from test failures.

Devstral Small 2 outperforms Qwen 3 Coder Flash (30B) despite being a smaller model, and its larger sibling Devstral 2 (123B) hits 72.2%, compared to Claude Sonnet 4.5’s 77.2% — at up to 7x lower cost per coding task. For teams or individuals who need a capable coding agent without cloud dependency, Devstral Small 2 is the most practical choice available today. ...
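Once the model is running under any of the three paths, the simplest way to talk to it programmatically is Ollama's local HTTP API (`POST /api/chat` on port 11434). The sketch below builds a chat request payload for it; the model tag `devstral-small-2` is an assumption — check `ollama list` for the exact tag your install uses.

```python
import json

def build_chat_request(prompt: str, model: str = "devstral-small-2") -> str:
    """Build the JSON body for Ollama's /api/chat endpoint.

    NOTE: the model tag "devstral-small-2" is an assumed name for
    illustration; substitute whatever `ollama list` reports locally.
    """
    payload = {
        "model": model,
        "messages": [
            {"role": "user", "content": prompt},
        ],
        # stream=False returns one complete JSON response instead of chunks
        "stream": False,
    }
    return json.dumps(payload)

# You would POST this body to http://localhost:11434/api/chat,
# e.g. with urllib.request or the `requests` library.
body = build_chat_request("Write a Python function that reverses a string.")
```

Ollama also exposes an OpenAI-compatible endpoint at `/v1/chat/completions`, so existing OpenAI-client code can usually be pointed at the local server with only a base-URL change.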

April 30, 2026 · 14 min · baeseokjae