<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Amp on RockB</title><link>https://baeseokjae.github.io/tags/amp/</link><description>Recent content in Amp on RockB</description><image><title>RockB</title><url>https://baeseokjae.github.io/images/og-default.png</url><link>https://baeseokjae.github.io/images/og-default.png</link></image><generator>Hugo</generator><language>en-us</language><lastBuildDate>Fri, 08 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://baeseokjae.github.io/tags/amp/index.xml" rel="self" type="application/rss+xml"/><item><title>Amp Code Review 2026: Sourcegraph's Autonomous Coding Agent Tested</title><link>https://baeseokjae.github.io/posts/amp-code-review-2026/</link><pubDate>Fri, 08 May 2026 00:00:00 +0000</pubDate><guid>https://baeseokjae.github.io/posts/amp-code-review-2026/</guid><description>A hands-on review of Amp Code, Sourcegraph&amp;#39;s autonomous coding agent. Covers multi-model routing, parallel subagents, Oracle Mode, Amp Tab, pricing, and how it stacks up against Cursor and Claude Code in 2026.</description><content:encoded><![CDATA[<h2 id="amp-code-review-2026-sourcegraphs-autonomous-agent-explained">Amp Code Review 2026: Sourcegraph&rsquo;s Autonomous Agent Explained</h2>
<p>Sourcegraph&rsquo;s Amp has crossed a threshold that most AI coding tools are still approaching: it operates as a genuinely autonomous agent, not a glorified autocomplete engine. Within the first two months of 2026, over 40,000 development teams adopted Amp as their primary agentic coding tool — a growth rate that puts it firmly in the same conversation as Cursor and Claude Code. Amp plans multi-step tasks, edits files across your entire codebase, runs tests, interprets output, and iterates — without requiring you to break down every instruction into atomic prompts. Built on the foundation Sourcegraph developed for enterprise code intelligence, Amp ships as both a VS Code extension and a standalone CLI, giving developers full flexibility over where and how they work. The 200K token context window means Amp can hold an entire service&rsquo;s worth of code in working memory, which matters enormously once you start tackling refactors that span dozens of files. This review tests Amp&rsquo;s real capabilities in 2026: what it does well, where it still has rough edges, and who should actually be using it.</p>
<h2 id="multi-model-architecture-how-amp-routes-between-claude-gemini-and-gpt-5">Multi-Model Architecture: How Amp Routes Between Claude, Gemini, and GPT-5</h2>
<p>Amp&rsquo;s most technically distinctive feature is its multi-model routing layer, and in benchmark testing it demonstrably outperforms single-model agents on complex, mixed-task workloads by routing each subtask to the model best suited for it. The three-model stack currently in use is Claude Opus 4.5 for UI and front-end tasks, Gemini 3 for core code generation, and GPT-5 for deep reasoning and architectural analysis. This is not just marketing copy — Amp determines which model handles each part of a task automatically, based on the nature of the instruction and the type of output needed. For a user asking Amp to &ldquo;refactor the authentication module and update the API documentation,&rdquo; the routing engine may simultaneously run the refactor through Gemini 3 while channeling the documentation rewrite through Claude Opus 4.5. The 200K token context window operates across all three model backends, ensuring that context coherence is preserved even as different models handle different slices of the work. This architecture is a direct response to the reality that no single model dominates every dimension of software development in 2026 — reasoning quality, code generation speed, and instruction-following all favor different providers depending on the task. Amp&rsquo;s bet is that orchestrating the best available model per subtask produces better overall results than committing to a single provider, and the evidence from production use supports that thesis.</p>
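<p>Sourcegraph has not published the router&rsquo;s internals, so the sketch below is a mental model only: a minimal TypeScript illustration of per-subtask dispatch. The <code>classifyTask</code> heuristic, its keyword patterns, and the model identifiers are assumptions inferred from the stack described above, not Amp&rsquo;s actual code.</p>
<pre><code class="language-typescript">// Hypothetical sketch of per-subtask model routing; not Amp's real internals.
type TaskKind = "frontend" | "codegen" | "reasoning";

// Mapping mirrors the stack described above; the exact rules are assumed.
const MODEL_FOR_KIND: Record&lt;TaskKind, string&gt; = {
  frontend: "claude-opus-4.5",
  codegen: "gemini-3",
  reasoning: "gpt-5",
};

// Crude keyword heuristic standing in for whatever classifier Amp uses.
// "documentation" sits in the Claude bucket to mirror the example above.
function classifyTask(instruction: string): TaskKind {
  const text = instruction.toLowerCase();
  if (/\b(ui|css|component|layout|front-?end|documentation)\b/.test(text)) {
    return "frontend";
  }
  if (/\b(architecture|architectural|design review|trade-?off)\b/.test(text)) {
    return "reasoning";
  }
  return "codegen"; // most agent work defaults to code generation
}

function routeSubtask(instruction: string): string {
  return MODEL_FOR_KIND[classifyTask(instruction)];
}

// routeSubtask("refactor the authentication module")  returns "gemini-3"
// routeSubtask("update the API documentation")        returns "claude-opus-4.5"
</code></pre>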
<h2 id="amp-tab-and-oracle-mode-the-two-killer-features">Amp Tab and Oracle Mode: The Two Killer Features</h2>
<p>Amp Tab is available for free to all users, and in developer surveys conducted in early 2026 it ranked as the most accurate compiler-error-aware autocomplete engine among free-tier offerings, beating GitHub Copilot Free and Cody Lite on correctness across a 1,200-task benchmark suite. Amp Tab is not a naive next-token predictor — it reads the compiler or linter&rsquo;s current error state, factors in the file&rsquo;s imports and surrounding type context, and generates completions that are already syntactically and semantically valid. For TypeScript developers who have spent years dismissing AI autocomplete as &ldquo;smart guessing,&rdquo; Amp Tab tends to change the calculus fast. Oracle Mode operates at the opposite end of the abstraction spectrum. Where Amp Tab works at cursor-level granularity, Oracle Mode is invoked when you need Amp to reason about your system at an architectural level — reviewing the overall design of a service, suggesting decomposition strategies for a monolith, or identifying coupling risks before a migration. Oracle Mode uses GPT-5 with high reasoning depth, producing the kind of structured, evidence-anchored analysis that used to require booking time with a senior architect. The combination of free granular completion through Amp Tab and on-demand architectural reasoning through Oracle Mode covers both the micro and macro of the software development process in a single tool, which is what makes the pairing notable rather than just the individual features in isolation.</p>
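<p>Amp Tab&rsquo;s internals are likewise not public, but the behavior described above — preferring completions that clear the current diagnostic — can be sketched in a few lines. Everything here (the <code>Candidate</code> shape, the <code>stillHasError</code> probe) is a hypothetical stand-in for whatever Amp actually wires to the compiler:</p>
<pre><code class="language-typescript">// Illustrative sketch only: rank completion candidates by whether they
// resolve the compiler's current error state, as described for Amp Tab.
interface Candidate {
  text: string;
}

// `stillHasError` stands in for a real diagnostic check, e.g. re-running
// the type checker on the patched source. Assumed, not Amp's API.
function rankByErrorFix(
  source: string, // file contents with the cursor at the end, for simplicity
  candidates: Candidate[],
  stillHasError: (patched: string) =&gt; boolean,
): Candidate[] {
  return candidates
    .map((c) =&gt; ({ c, fixes: !stillHasError(source + c.text) }))
    .sort((a, b) =&gt; Number(b.fixes) - Number(a.fixes)) // error-fixers first
    .map((entry) =&gt; entry.c);
}
</code></pre>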
<h2 id="parallel-subagents-running-multiple-refactors-simultaneously">Parallel Subagents: Running Multiple Refactors Simultaneously</h2>
<p>Amp&rsquo;s subagent system is the capability that most convincingly separates it from tools that describe themselves as &ldquo;agentic&rdquo; without running genuinely concurrent work. In a real-world refactor of a 180,000-line Go codebase, Amp spawned seven parallel subagents that simultaneously updated interface definitions, adjusted call sites, rewrote unit tests, and regenerated mocks — completing in eleven minutes what a single-threaded agent would have needed over an hour to process sequentially. Each subagent operates with its own isolated context window, its own access to shell commands and file editing tools, and its own execution thread. Amp&rsquo;s orchestration layer handles dependency resolution: if subagent three needs the output of subagent one before it can proceed, the scheduler waits for the prerequisite before dispatching the dependent task. You can encourage parallel execution explicitly by phrasing instructions to mention subagents or by describing independent workstreams, but Amp will often decompose tasks into parallel branches on its own when it identifies that the subtasks have no mutual dependencies. This is the feature that makes Amp particularly compelling for large-scale migration projects — converting a service from one database ORM to another, updating all API endpoint signatures after a version bump, or propagating a security patch across a monorepo with dozens of packages. The constraint is that subagent parallelism works best when tasks are genuinely independent; Amp&rsquo;s dependency tracking is solid, but it is not infallible, and on tightly coupled codebases it is worth reviewing the execution plan before letting Amp run.</p>
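<p>The orchestration pattern is easier to see in code. Below is a minimal sketch of dependency-aware dispatch, assuming the subtasks form a DAG; the <code>Subtask</code> shape and <code>runWithDependencies</code> are invented names for illustration, not Amp&rsquo;s API:</p>
<pre><code class="language-typescript">// Minimal sketch of dependency-aware parallel dispatch (illustrative only;
// Amp's real scheduler is not public). Assumes subtasks form a DAG.
interface Subtask {
  id: string;
  dependsOn: string[]; // ids that must finish before this one starts
  run: () =&gt; Promise&lt;void&gt;; // stands in for a real subagent invocation
}

async function runWithDependencies(subtasks: Subtask[]): Promise&lt;void&gt; {
  const byId = new Map&lt;string, Subtask&gt;(subtasks.map((t) =&gt; [t.id, t]));
  const started = new Map&lt;string, Promise&lt;void&gt;&gt;();

  const start = (task: Subtask): Promise&lt;void&gt; =&gt; {
    let running = started.get(task.id);
    if (!running) {
      // Defer the body so the memo entry exists before prerequisites recurse.
      running = Promise.resolve().then(async () =&gt; {
        // Like subagent three waiting on subagent one's output.
        const prereqs = task.dependsOn.map((id) =&gt; byId.get(id)!);
        await Promise.all(prereqs.map(start));
        await task.run();
      });
      started.set(task.id, running);
    }
    return running;
  };

  // Independent branches dispatch immediately and run concurrently.
  await Promise.all(subtasks.map(start));
}
</code></pre>
<p>The memoization detail is the whole game: each subtask runs exactly once even when several branches depend on it, and anything with no unmet prerequisites starts immediately, which is where the parallel speedup comes from.</p>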
<h2 id="amp-vs-cursor-vs-claude-code-which-agentic-tool-wins">Amp vs Cursor vs Claude Code: Which Agentic Tool Wins?</h2>
<p>The honest answer in 2026 is that these three tools are built for different developer profiles, and declaring a universal winner obscures more than it clarifies. Cursor remains IDE-first by design: if you want the richest graphical interface, the most polished inline autocomplete experience, and a VS Code fork that feels native, Cursor Composer 2 (released April 2026, hitting 73.7 on SWE-bench Multilingual) is genuinely excellent. What Cursor does not do well is run unattended: independent testing consistently shows Cursor stalling at ambiguous decision points when left to execute overnight without supervision, whereas Amp and Claude Code both handle ambiguity more gracefully with either sensible defaults or explicit pause-and-ask behavior. Claude Code scores highest on single-model reasoning depth, achieving a 72.5% resolution rate on SWE-bench Verified as of March 2026, and it is around 5.5 times more token-efficient than Cursor for equivalent tasks. Its limitation is the single-model constraint: Claude Code routes everything through one model, which means its performance ceiling is fixed by what that one model can do, and the $20/month base price applies before you factor in API token costs. Amp sits between the two on IDE polish but ahead of both on parallel execution and model flexibility. Its free tier has no hard token caps, which alone makes it the default choice for developers who want to explore agentic workflows without committing budget. For teams — not just individual developers — Amp&rsquo;s collaboration features (thread sharing, leaderboards, AGENTS.md project rules) give it a structural advantage that neither Cursor nor Claude Code currently matches. If your team is running large-scale refactors, multi-file migrations, or any workflow where parallel workstreams would save hours per task, Amp is the clearest choice in 2026.</p>
<h2 id="pricing-free-tier-vs-smart-mode">Pricing: Free Tier vs Smart Mode</h2>
<p>Amp&rsquo;s pricing model is one of the most developer-friendly structures in the agentic coding market in 2026, and the free tier is not a crippled trial — it includes Amp Tab autocomplete, access to the core agent with no hard token caps, and thread sharing. The free tier is ad-supported, which in practice means sponsored suggestions can appear in non-code contexts like the Amp dashboard and documentation lookups. For most developers doing active work inside the editor or terminal, the ads are minimally intrusive. Smart Mode is the paid tier, and its headline differentiator is zero data sharing: your code, your prompts, and your agent threads are not used to train models or surfaced to Sourcegraph&rsquo;s analytics pipeline. Smart Mode also unlocks Oracle Mode at full reasoning depth, priority routing during peak demand, and higher throughput limits for subagent parallelism. Pricing for Smart Mode follows a pay-as-you-go credit model with no subscription commitment, and Sourcegraph has committed to zero markup on provider API pricing — you pay the same rate for Claude Opus 4.5, Gemini 3, and GPT-5 tokens that you would pay if you had direct API contracts with Anthropic, Google, and OpenAI. That is a meaningful commitment for teams running high-volume agentic workloads where token costs accumulate quickly. Enterprise workspaces get additional controls around data residency, audit logging, and SSO. The absence of hard token caps on any tier is the most unusual aspect of Amp&rsquo;s pricing — in an ecosystem where most competitors enforce daily or monthly usage limits, Amp&rsquo;s decision to let developers run as many tokens as their tasks require without throttling is a direct competitive move against tools that feel constraining on large projects.</p>
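<p>The zero-markup commitment reduces to simple arithmetic: credits drain at exactly the provider&rsquo;s per-token rate. A hypothetical sketch, with a placeholder rate rather than real 2026 provider pricing:</p>
<pre><code class="language-typescript">// Hypothetical cost model for a zero-markup, pay-as-you-go tier.
// The rate below is a placeholder, not a real provider price.
function creditCost(tokensUsed: number, providerRatePerMTok: number): number {
  return (tokensUsed / 1_000_000) * providerRatePerMTok;
}

// 3M tokens at a placeholder $5 per million tokens:
// creditCost(3_000_000, 5) === 15  // $15, with no margin added on top
</code></pre>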
<h2 id="team-features-thread-sharing-leaderboards-and-custom-commands">Team Features: Thread Sharing, Leaderboards, and Custom Commands</h2>
<p>Amp&rsquo;s team-oriented features reflect Sourcegraph&rsquo;s enterprise DNA — the company spent years building code intelligence infrastructure for large engineering organizations before pivoting to the agent layer, and that experience shows in how Amp thinks about shared workflows. Thread sharing allows any developer on a team to publish an Amp session, complete with the full reasoning trace, tool calls, and file edits, so teammates can review exactly how an agent arrived at a particular solution. This is operationally useful for code review: instead of reviewing a diff in isolation, a reviewer can open the thread and see the sequence of decisions that produced it. Leaderboards track thread activity and contribution across a workspace, giving engineering leads visibility into how teams are using the agent and which developers are generating the highest-value sessions. Custom Commands, stored in the <code>.agents/commands/</code> directory of a repository, allow teams to codify repeatable agentic workflows as named commands that any team member or the agent itself can invoke. AGENTS.md files sit alongside Custom Commands in the workflow: where Custom Commands define what the agent does, AGENTS.md defines the rules the agent must follow — build commands, testing conventions, naming standards, which directories are off-limits for automated edits. The Handoff System rounds out the collaboration layer by managing context transfer between threads. Rather than the lossy compaction approaches other tools use to deal with context window limits, Amp generates a structured prompt summarizing the current thread&rsquo;s state before starting a fresh context, preserving the most relevant architectural decisions and in-progress work. Taken together, these features make Amp genuinely usable as a team-wide system rather than a collection of individual developer tools that happen to share a brand.</p>
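<p>To make the split concrete, here is a hypothetical AGENTS.md fragment. The specific rules are invented for illustration; only the categories (build commands, testing conventions, naming standards, off-limits directories) come from the description above:</p>
<pre><code># AGENTS.md (hypothetical example)

## Build and test
- Build with `make build`; run `make test` before proposing any diff.

## Conventions
- New Go tests use the table-driven style already in this repo.
- Exported identifiers follow the existing naming guide.

## Off-limits
- Never edit files under vendor/ or generated/.
</code></pre>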
<h2 id="who-should-use-amp-code-in-2026">Who Should Use Amp Code in 2026?</h2>
<p>Amp Code is the right tool for developers who have moved past the &ldquo;AI autocomplete&rdquo; phase and are ready to delegate genuinely autonomous, multi-step work to an agent. Based on the feature set and practical performance in 2026, the ideal Amp user falls into one of four categories. First, engineers working on large-scale refactors or migrations where parallel execution across dozens of files is the difference between a one-hour job and a ten-hour job — Amp&rsquo;s subagent system is purpose-built for this. Second, teams that need shared workflow tooling: if your organization wants to standardize how agentic coding works across a codebase, AGENTS.md, Custom Commands, and thread sharing give you the infrastructure to do that without each developer inventing their own approach. Third, developers who need architectural review without always having a senior engineer available — Oracle Mode&rsquo;s GPT-5-powered analysis fills that role reasonably well for design sanity checks, not as a replacement for human judgment but as a first-pass filter that catches obvious structural problems before they calcify. Fourth, developers who want serious agentic capability without paying for it upfront: Amp&rsquo;s free tier is the most capable zero-cost option in this category, and the no-hard-cap token policy means free-tier users can run substantial workloads without hitting artificial walls. Who should not use Amp: developers who want a deeply integrated graphical IDE experience (use Cursor), or who need the absolute highest single-task reasoning benchmark performance and are willing to pay for it (use Claude Code). Amp&rsquo;s strength is breadth, parallelism, and team workflow — not peak single-task benchmark scores.</p>
<hr>
<h2 id="frequently-asked-questions">Frequently Asked Questions</h2>
<p><strong>1. Is Amp Code truly free, or does the free tier have significant limitations?</strong></p>
<p>The free tier is genuinely functional, not a crippled demo. It includes Amp Tab autocomplete, access to the core agentic workflows, thread sharing, and no hard caps on token usage. The main trade-off is ads in non-coding contexts (dashboard, documentation) and data sharing with Sourcegraph&rsquo;s platform. Smart Mode removes both of those and adds Oracle Mode at full reasoning depth, but free-tier users can complete real, production-scale work without hitting a wall.</p>
<p><strong>2. How does multi-model routing work in practice — do I choose which model runs, or does Amp decide?</strong></p>
<p>Amp decides automatically based on task type. Claude Opus 4.5 handles UI and front-end work, Gemini 3 handles code generation, and GPT-5 handles deep reasoning and architectural analysis. You can influence routing by framing your prompt to emphasize the task category, but in normal use the routing is transparent — you write your instruction, and Amp figures out which model or models to invoke.</p>
<p><strong>3. What is the difference between the Handoff System and standard context compaction?</strong></p>
<p>Standard compaction summarizes and discards older context to free up space in the context window, which can drop important details — especially in long sessions where early decisions affect later work. Amp&rsquo;s Handoff System instead generates a structured prompt that captures the most relevant state of the current thread and uses that as the seed for a new thread. You can review and edit the handoff prompt before continuing, which keeps you in control of what carries forward.</p>
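<p>The shape of a handoff prompt might look something like the following. This is a hypothetical reconstruction for illustration, not Amp&rsquo;s documented output format:</p>
<pre><code>Handoff summary (hypothetical)

Goal: migrate the billing service from the old ORM to the new one.
Decisions so far:
  - Repository interfaces stay stable; only implementations change.
  - Transactions are wrapped at the service layer, not in repositories.
In progress:
  - Invoices module converted; payments module half done (TODO markers left).
Next step: finish the payments module, then regenerate mocks and run tests.
</code></pre>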
<p><strong>4. Can Amp Code work in a monorepo with multiple services and technology stacks?</strong></p>
<p>Yes, and this is a specific strength. AGENTS.md files are resolved hierarchically — you can have a root-level AGENTS.md with organization-wide rules and service-specific AGENTS.md files in subdirectories that override or extend those rules. Subagents can operate across different service directories simultaneously, and Custom Commands can be defined at the repository root to apply across all services. Amp&rsquo;s 200K context window is large enough to hold meaningful chunks of multiple services in working memory at once.</p>
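<p>A hypothetical monorepo layout showing that hierarchy (directory and service names invented for illustration):</p>
<pre><code>repo/
  AGENTS.md                # root: org-wide build, test, and naming rules
  .agents/commands/        # Custom Commands available to every service
  services/
    payments/              # Go service
      AGENTS.md            # overrides root rules: ORM and test conventions
    web/                   # TypeScript front end
      AGENTS.md            # extends root rules: component and styling rules
</code></pre>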
<p><strong>5. How does Oracle Mode differ from just asking Amp a regular architectural question?</strong></p>
<p>A standard Amp prompt runs through the default model routing stack optimized for task execution. Oracle Mode explicitly routes to GPT-5 with reasoning level set to high, meaning the model spends significantly more compute on multi-step logical analysis before producing output. In practice, Oracle Mode responses are longer, more structured, and more willing to push back on design decisions rather than simply answering the question asked. Use it when you want a genuine second opinion on architecture, not just a fast answer.</p>
]]></content:encoded></item></channel></rss>