Cursor Composer 2 is Anysphere’s first in-house frontier AI model, released March 19, 2026, built specifically for autonomous project-scale coding inside Cursor IDE. Priced at $0.50/M input tokens — 86% cheaper than its predecessor — it outperforms Claude Opus 4.6 on Terminal-Bench 2.0 while being the only frontier coding model that runs exclusively inside an IDE rather than as an external API.

What Is Cursor Composer 2?

Cursor Composer 2 is the first proprietary AI model built by Anysphere (Cursor’s parent company), released March 19, 2026, marking a fundamental shift from being a model-agnostic IDE to owning the full AI stack. Unlike general-purpose models accessed via API, Composer 2 was trained end-to-end for autonomous coding workflows inside Cursor — with native understanding of file trees, shell sessions, browser control, and multi-step diffs. The model ships with a 200K token context window, a Mixture-of-Experts (MoE) architecture for fast inference, and a novel compaction-in-the-loop reinforcement learning technique that reduces context memory errors by 50%. This is Cursor’s third Composer generation in just five months — v1 launched October 2025, v1.5 in February 2026, v2 in March 2026 — signaling an aggressive model development timeline rarely seen outside OpenAI or Anthropic. The practical result: Composer 2 handles workflows that require hundreds of sequential actions without losing thread, applying real file diffs rather than just suggesting code snippets.

Two availability tiers exist: Standard ($0.50/M input, $2.50/M output) and Fast ($1.50/M input, $7.50/M output, default for most plans). Both include cache-read discounts — $0.20/M (Standard) and $0.35/M (Fast).

Benchmark Deep Dive: Composer 2 vs Claude Opus 4.6 vs GPT-5.4

Cursor Composer 2 scores 61.7 on Terminal-Bench 2.0, 73.7 on SWE-bench Multilingual, and 61.3 on CursorBench — the three benchmarks Anysphere uses to measure agentic coding performance across real-world tasks. Terminal-Bench 2.0 simulates autonomous shell workflows (debugging, build pipelines, deployment scripts), where Composer 2 beats Claude Opus 4.6’s score of 58.0 by a meaningful margin. SWE-bench Multilingual tests multi-language bug fixing across Python, TypeScript, Java, and Go — Composer 2’s 73.7 represents a 12% jump over Composer 1.5’s 65.9. CursorBench, Anysphere’s proprietary benchmark measuring IDE-native task completion, shows the largest gain: from 44.2 in v1.5 to 61.3 in v2, a 38.7% improvement reflecting specific training on Cursor workflows. GPT-5.4 still leads Terminal-Bench 2.0 at 75.1, and Claude Opus 4.5 leads SWE-bench Verified at 80.9% — but Composer 2 is approximately 5x cheaper on input tokens than GPT-5.4.

ModelTerminal-Bench 2.0SWE-benchCursorBenchInput Price
GPT-5.475.1~$2.50/M
Cursor Composer 261.773.761.3$0.50/M
Claude Opus 4.658.0$15/M
Composer 1.547.965.944.2$3.50/M

The critical insight: Composer 2 is specifically optimized for IDE-integrated tasks — tasks that involve chained file reads, terminal execution, and browser checks — rather than isolated coding puzzles. This means raw SWE-bench or HumanEval scores underrepresent its real-world advantage for developers who work inside Cursor all day.

Architecture & Training: MoE + Compaction-in-Loop RL

Cursor Composer 2 uses a Mixture-of-Experts (MoE) architecture built on continued pretraining of Kimi K2.5 — Moonshot AI’s open-source base model — with Cursor’s own pretraining pass on high-quality code corpora, followed by reinforcement learning tuned for long-horizon agentic tasks. This marks the first time Anysphere did pretraining before RL, departing from the standard approach of fine-tuning an existing general model. The MoE design activates only a subset of parameters per input token, keeping inference latency low even as the overall model capacity stays large. The most technically significant training innovation is compaction-in-the-loop RL: the model is trained to self-summarize its working context from 5,000+ tokens down to approximately 1,000 tokens, then evaluated on whether it can continue a long-horizon task correctly after that compression. This self-summarization training reduced context compaction errors by 50% compared to Composer 1.5, enabling reliable execution of workflows spanning hundreds of individual actions — opening files, running tests, reading error output, editing, committing — without losing the thread of what was originally requested.

What Is Compaction-in-Loop RL?

Compaction-in-loop reinforcement learning is a training technique where the model learns both to summarize its own context and to continue tasks correctly after that summarization — rewarded when downstream task completion remains accurate despite compression. Standard RL for coding trains a model to produce correct code given a fixed context; compaction-in-loop RL trains the model to manage its own memory. In Composer 2, this means the model compresses 5,000+ tokens of chat history and tool outputs to ~1,000 tokens mid-task, without requiring the user to start a new session or losing awareness of earlier decisions. The result is a 50% reduction in compaction errors versus Composer 1.5 and reliable multi-hour autonomous coding sessions.

Why MoE Matters for IDE Performance

Mixture-of-Experts architecture matters for IDE use because latency is the primary usability metric when an AI model is making sequential tool calls — reading files, running shell commands, editing code — rather than generating a single long response. MoE activates only the expert layers relevant to the current input, reducing compute per forward pass. For Composer 2, this means Fast variant responses arrive quickly enough to maintain flow state during agentic loops, even when the model is orchestrating multi-file refactors.

Pricing Breakdown: Standard vs Fast Variants & Plan Tiers

Cursor Composer 2 pricing is 86% cheaper than Composer 1.5 across both variants, making frontier-level agentic coding economically viable for individual developers for the first time. The Standard variant costs $0.50/M input tokens and $2.50/M output tokens, with cache-read pricing at $0.20/M. The Fast variant — the default for most plan types — costs $1.50/M input and $7.50/M output, with cache reads at $0.35/M. Composer 1.5 was priced at $3.50/M input and $17.50/M output, making the per-task cost reduction dramatic for agentic workflows that consume large input token volumes. Cursor’s plan structure determines how Composer 2 is accessed: Hobby (free, limited), Pro ($20/month), Pro+ ($60/month), Teams ($40/user/month), and Enterprise (custom). On paid plans, Composer usage is drawn from a dedicated token pool, and Auto mode is unlimited. For developers running long agentic loops — autonomous PR generation, multi-file refactors — the Standard variant at $0.50/M makes the cost of a typical 50K-token task roughly $0.025, a fraction of what comparable GPT-5.4 usage would cost.

PlanMonthly PriceComposer 2 Access
HobbyFreeLimited pool
Pro$20/moGenerous dedicated pool
Pro+$60/moLarger pool, priority Fast
Teams$40/user/moTeam pool, admin controls
EnterpriseCustomUnlimited + SAML/SCIM

Composer 2 vs Composer 1.5: What Changed

Composer 2 delivers a 38.7% improvement on CursorBench (44.2 → 61.3), a 28.8% gain on Terminal-Bench 2.0 (47.9 → 61.7), and an 11.8% jump on SWE-bench Multilingual (65.9 → 73.7) over Composer 1.5 — while simultaneously cutting prices by 86%. The capability gains come from three architectural changes: full pretraining before RL (v1.5 only did RL), improved compaction-in-loop self-summarization, and a larger high-quality code corpus during pretraining. The pricing drop reflects the cost efficiency of MoE inference at scale: as Anysphere grew to 1M+ daily active users, per-token inference costs fell, enabling them to pass savings to users while launching at competitive API pricing. For developers who used Composer 1.5 regularly, v2 handles longer autonomous tasks without going off-track, makes fewer redundant tool calls, and produces diffs that require less correction — the CursorBench score improvement directly reflects reduced need for human re-prompting during a task.

Self-Summarization Improvements

Composer 1.5 introduced self-summarization but lacked compaction-in-the-loop RL training. In practice, this meant v1.5 would compress context but sometimes lose track of constraints set earlier in a session — leading to regressions in long tasks. Composer 2’s compaction-in-loop RL explicitly trains against this failure mode, rewarding the model when it maintains correct task completion post-compression. The 50% error reduction in compaction shows up in practice as fewer mid-task “forgot what you asked” moments during complex multi-file refactors.

Competitor Comparison: Composer 2 vs Claude Code vs Copilot vs OpenAI Codex

Cursor Composer 2 occupies a unique position in the 2026 AI coding landscape: it’s the only frontier-level coding model embedded exclusively inside an IDE rather than accessible as an external API. Claude Code (Anthropic’s CLI agent) leads developer satisfaction at 46% in DataCamp’s 2026 survey versus Cursor’s 19%, but operates via terminal rather than through a visual editor. GitHub Copilot Enterprise integrates into VS Code and JetBrains but relies on third-party models (GPT-5.4, Claude, Gemini) rather than a proprietary coding model. OpenAI Codex Cloud Agent runs autonomously on GitHub repos but lacks the real-time IDE interaction layer that Composer 2 offers. The practical choice depends on your workflow: Composer 2 wins for developers who live in Cursor and want seamless multi-file agentic editing; Claude Code wins for terminal-centric developers who prefer fine-grained control over agentic steps; Copilot Enterprise wins for teams already standardized on VS Code with strict procurement requirements.

ToolModelIDEAPI AccessAgenticInput Price
Cursor Composer 2ProprietaryCursor onlyNoYes$0.50/M
Claude CodeClaude Opus 4.6Terminal/VS CodeYesYes$15/M
GitHub CopilotGPT-5.4 / ClaudeVS Code / JBYesLimitedSubscription
OpenAI Codex CloudGPT-5.4GitHub webYesYes~$2.50/M

The Kimi K2.5 Question: Transparency and Model Provenance

Cursor Composer 2 was built on Kimi K2.5 — Moonshot AI’s open-source base model — as approximately 25% of its computational foundation, with Cursor’s own continued pretraining and reinforcement learning layered on top. The controversy began when a developer discovered the underlying model ID in API response headers shortly after Composer 2’s launch, before Anysphere had disclosed the base model publicly. Cursor co-founder Aman Sanger acknowledged the omission, calling the non-disclosure “a miss.” Lee Robinson (VP of Developer Education) clarified that roughly 25% of Composer 2’s computational foundation derives from the Kimi K2.5 architecture, with the remaining 75% representing Cursor’s own pretraining corpus and RL training. For developers evaluating enterprise use, this matters primarily for supply-chain trust and compliance contexts — Kimi K2.5 is MIT-licensed open-source, meaning Cursor’s use is fully legal, but the lack of proactive disclosure raised questions about how AI companies communicate model provenance as proprietary models increasingly build on open-source foundations. The technical reality is that the combination of Cursor’s pretraining pass and compaction-in-loop RL produces a model that performs meaningfully differently from Kimi K2.5 alone — but transparency norms are still evolving.

Long-Horizon Agentic Coding: Self-Summarization and Multi-Step Workflows

Long-horizon agentic coding refers to AI-driven workflows that span hundreds of sequential tool calls — reading files, editing code, running tests, checking output, iterating — without human re-prompting at each step. Cursor Composer 2 is specifically designed for this use case, with compaction-in-the-loop RL enabling multi-hour autonomous sessions. In a typical long-horizon task, Composer 2 might: read the project structure, identify the affected files for a feature request, write and apply diffs, run the test suite, read failure output, patch the failing tests, commit with a descriptive message, and open a draft PR — all in one session. The 200K context window provides the raw capacity; compaction-in-loop RL provides the memory management to use that capacity without degrading mid-task. Cursor’s Automations system (launched March 5, 2026) extends this further with trigger-based agents that fire on events like incoming PRs or failing CI tests, running Composer 2 autonomously without a human initiating the session. The self-driving codebase research mode (announced February 5, 2026) adds multi-agent coordination for large-scale architectural analysis.

Setting Up Long-Horizon Agentic Sessions

To run a long-horizon agentic task with Composer 2, open Composer (Cmd+I / Ctrl+I), ensure you’re not in Auto mode if you want Composer 2 specifically, and set the scope to include your full project directory. For tasks expected to exceed 50K context tokens, the Standard variant is more cost-effective; for latency-sensitive tasks where each tool call response time matters, use Fast. Cursor will automatically apply compaction when context approaches limits — you’ll see a brief “Summarizing context…” indicator before the session continues.

Cursor Plan Guide: Which Plan Is Right for You?

Cursor’s plan structure determines how much Composer 2 access you get and at what cost. The Hobby plan (free) provides limited Composer 2 usage per month — enough to evaluate the model but not for production workflows. Pro ($20/month) includes a generous dedicated Composer pool, making it the right choice for individual developers who use Cursor as their primary IDE. Pro+ ($60/month) increases the pool size and prioritizes Fast variant responses — worth it for developers who run multiple agentic loops per day. Teams ($40/user/month) adds admin controls, shared billing, and team usage dashboards. Enterprise (custom pricing) unlocks unlimited usage, SAML/SCIM integration, audit logs, and dedicated support — required for companies in regulated industries or with strict data residency requirements.

For most professional developers, Pro at $20/month is the right entry point. The token pool is large enough for typical daily usage — several agentic sessions of moderate complexity — and Auto mode on Pro uses Composer 2 within the plan allocation before falling back to other models.

Getting Started with Cursor Composer 2

Getting started with Composer 2 requires a Cursor IDE installation (download from cursor.com) and any paid plan. After installing and logging in, open Composer via Cmd+I (Mac) or Ctrl+I (Windows/Linux). By default, the Auto mode selects the best model for each task — to use Composer 2 explicitly, click the model selector in the Composer panel and choose “Composer 2” or “Composer 2 Fast.” For agentic tasks, enable “Agent mode” in the Composer settings — this allows Composer 2 to make file edits, run shell commands, and browse the web without requiring approval for each action. For most workflows, set the project root as the context scope so Composer 2 can navigate the full file tree. Start with well-defined tasks (“add pagination to the posts endpoint, write tests, update the API docs”) before moving to open-ended agentic sessions.

Known Limitations and Honest Assessment

Cursor Composer 2 has real limitations that matter for honest evaluation. First, it is exclusively available inside Cursor IDE — there is no external API, meaning you cannot call Composer 2 from your own tooling, CI pipelines, or other editors. Second, GPT-5.4 still leads Terminal-Bench 2.0 at 75.1 versus Composer 2’s 61.7 — for tasks that require the highest possible single-pass accuracy on complex shell workflows, GPT-5.4 remains ahead. Third, Claude Opus 4.5 leads SWE-bench Verified at 80.9%, making it more accurate for multi-language bug-fix tasks when accessed via Claude Code. Fourth, the Kimi K2.5 provenance controversy has not fully resolved trust concerns for some enterprise procurement teams. Fifth, the 45% security vulnerability rate observed in AI-generated code industry-wide applies to Composer 2 output as well — autonomous agentic sessions require human review before merging to production branches.

Security, Enterprise, and Compliance

Cursor is SOC 2 Type II certified and supports privacy mode, making it enterprise-ready for teams that handle sensitive codebases. In privacy mode, no code, prompts, or model outputs are stored, logged, or used for model training — a hard requirement for healthcare, finance, and legal engineering teams. Enterprise plans add SAML-based single sign-on, SCIM provisioning for automated user lifecycle management, audit logs for all AI interactions, and custom data residency options. For teams in regulated industries, these controls make Cursor Enterprise the appropriate deployment path rather than stacking individual Pro subscriptions. Cursor’s compliance posture is comparable to GitHub Copilot Enterprise — both SOC 2 Type II certified, both with audit logging — but Composer 2’s proprietary model means Cursor controls where model inference occurs (currently US-based datacenters), whereas Copilot routes through OpenAI or Anthropic infrastructure depending on the selected model. Enterprise pricing is custom; most teams in regulated sectors negotiate on data residency and BAA terms.

The Vibe Coding Debate: Power vs Responsibility

Vibe coding — using AI to generate large amounts of code based on natural language descriptions without deep review — becomes more powerful and more risky with Composer 2’s agentic capabilities. Industry data shows 45% of AI-generated code samples contain security vulnerabilities, with Java faring worst at 72% security failure rate. Composer 2’s ability to autonomously open PRs, apply multi-file diffs, and run shell commands amplifies this risk: a poorly scoped agentic session can produce a large code change with embedded vulnerabilities that pass basic tests but fail security review. The responsible workflow: run Composer 2 agentic sessions in isolated branches, require security-focused code review for all AI-generated PRs, and never give Composer 2 access to production credentials or deployment pipelines. Cursor’s privacy mode and audit logs are helpful for enterprise teams that need to track what the model touched.

Future Outlook: Where Composer Goes Next

Cursor’s Glass alpha — a self-driving codebase research mode that runs multi-agent loops for large-scale architectural analysis — signals the next direction for Composer development. The Automations system (trigger-based agents firing on PR events or failing tests) is in early rollout and likely to expand. MoE architecture improvements and continued pretraining will produce a Composer 3 within the same five-month cadence, given the v1→v1.5→v2 timeline. The deeper strategic question is whether Anysphere can close the gap with GPT-5.4 on Terminal-Bench while maintaining the $0.50/M pricing that makes Composer 2 compelling — closing a 13-point benchmark gap while improving cost efficiency would make Composer 3 the dominant frontier coding model by end of 2026. The multi-agent coordination capabilities already present in Glass and Automations suggest Anysphere’s priority is agentic task complexity rather than raw single-pass accuracy.

FAQ

Below are the five questions developers most commonly ask when evaluating Cursor Composer 2. Each answer is self-contained and written to be useful without reading the full article above. The key takeaways: Composer 2 is priced at $0.50/M input, is 86% cheaper than v1.5, beats Claude Opus 4.6 on Terminal-Bench 2.0, and is exclusively available inside Cursor IDE with no external API. It was released March 19, 2026, by Anysphere — the company behind Cursor — valued at $29.3 billion at time of launch. It is built on Kimi K2.5 as a base with Cursor’s own continued pretraining and compaction-in-loop reinforcement learning layered on top. For developers choosing between Composer 2 and alternatives like Claude Code or GitHub Copilot, the decision comes down to whether you work primarily inside Cursor versus a terminal or VS Code environment. Cursor has 1M+ daily active users and 50,000+ enterprise customers including Stripe, Figma, Salesforce, and NVIDIA.

What is Cursor Composer 2?

Cursor Composer 2 is Anysphere’s first proprietary AI model, released March 19, 2026. It is a frontier coding model built exclusively for the Cursor IDE, trained for autonomous project-scale coding with a 200K context window and MoE architecture.

How much does Cursor Composer 2 cost?

Composer 2 Standard costs $0.50/M input tokens and $2.50/M output tokens. The Fast variant costs $1.50/M input and $7.50/M output. Cache reads are $0.20/M (Standard) and $0.35/M (Fast). This is 86% cheaper than Composer 1.5.

How does Cursor Composer 2 compare to Claude Code?

Composer 2 beats Claude Opus 4.6 on Terminal-Bench 2.0 (61.7 vs 58.0) but Claude Code leads developer satisfaction surveys at 46% vs Cursor’s 19%. Claude Code works in the terminal and via VS Code extension; Composer 2 is exclusive to Cursor IDE. Claude Code offers direct API access; Composer 2 does not.

What is compaction-in-loop reinforcement learning?

Compaction-in-loop RL is a training technique where the model learns to compress its own context mid-task and continue correctly after that compression. Composer 2 compresses 5,000+ tokens to ~1,000 mid-session, reducing compaction errors by 50% and enabling multi-hour autonomous coding sessions.

Is Cursor Composer 2 built on Kimi K2.5?

Yes. Composer 2 is built on Kimi K2.5 (Moonshot AI’s open-source model) as approximately 25% of its computational foundation, with Cursor’s own continued pretraining and reinforcement learning forming the remaining 75%. Cursor did not disclose this proactively at launch, acknowledging the omission after a developer discovered it in API headers.