<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Autonomous-Coding on RockB</title><link>https://baeseokjae.github.io/tags/autonomous-coding/</link><description>Recent content in Autonomous-Coding on RockB</description><image><title>RockB</title><url>https://baeseokjae.github.io/images/og-default.png</url><link>https://baeseokjae.github.io/images/og-default.png</link></image><generator>Hugo</generator><language>en-us</language><lastBuildDate>Tue, 21 Apr 2026 13:05:10 +0000</lastBuildDate><atom:link href="https://baeseokjae.github.io/tags/autonomous-coding/index.xml" rel="self" type="application/rss+xml"/><item><title>Cursor Background Agents Guide 2026: Run Autonomous Coding Tasks in the Background</title><link>https://baeseokjae.github.io/posts/cursor-background-agents-2026/</link><pubDate>Tue, 21 Apr 2026 13:05:10 +0000</pubDate><guid>https://baeseokjae.github.io/posts/cursor-background-agents-2026/</guid><description>Complete guide to Cursor background agents in 2026: setup, pricing, Computer Use, best practices, and when to use them vs Claude Code or Codex.</description><content:encoded><![CDATA[<p>Cursor background agents let you fire off a coding task — a bug fix, test suite, refactor, or feature — and walk away while a cloud VM handles it asynchronously, returning a pull request when it&rsquo;s done. Unlike in-editor Agent Mode that runs interactively beside you, background agents run in parallel on isolated remote machines, freeing you to work on something else entirely.</p>
<h2 id="what-are-cursor-background-agents">What Are Cursor Background Agents?</h2>
<p>Cursor background agents are cloud-hosted autonomous coding workers that run on dedicated virtual machines outside your local editor. Each agent receives a task description, checks out your repository, executes file edits using its own model and toolchain, and opens a pull request with the results — entirely without you watching. This is the architectural break from traditional AI coding assistants: instead of a synchronous conversation where you approve every step, you submit a task once and the agent works asynchronously in a remote sandbox. As of early 2026, Cursor reports that 35% of their internal merged PRs are created by background agents — a figure that signals how much trust the company itself places in the workflow. The agents support custom Dockerfiles, multi-platform access (desktop, web, mobile, Slack, GitHub), and, since February 24, 2026, full Computer Use capabilities including browser access, video recording, and remote desktop screenshots. The key architectural components are: contextual codebase awareness (the agent reads your repo before starting), task planning (it reasons about scope before editing), and conflict avoidance (it isolates to a git worktree so parallel agents never collide).</p>
<h3 id="background-agents-vs-agent-mode-the-core-difference">Background Agents vs Agent Mode: The Core Difference</h3>
<p>Background agents run remotely on cloud VMs, work asynchronously without user supervision, and deliver output as a PR. Agent Mode runs locally inside your editor session, operates interactively with you in the loop, and applies edits directly to your workspace. Choose background agents when the task is well-defined and parallelizable. Choose Agent Mode when you need exploratory back-and-forth, complex debugging, or tasks that require your architectural judgment at each step.</p>
<h2 id="how-to-set-up-and-use-cursor-background-agents">How to Set Up and Use Cursor Background Agents</h2>
<p>Setting up Cursor background agents takes under five minutes from the Cursor editor and requires a Pro plan or higher. Navigate to <strong>cursor.com/agents</strong> or open the Background Agents panel in the editor (the cloud icon in the sidebar). Connect your GitHub or GitLab account, disable Privacy Mode for the target repository, select the repo, write a task description, choose your model, and click Submit. The agent clones your repo into an isolated cloud VM, plans its approach using codebase-aware search tools, executes file changes, runs tests if you specified a test command in the acceptance criteria, and opens a PR for your review. You can monitor progress in real time from the Agents dashboard — each agent shows its current step, active tool calls, token usage, and any errors encountered. If a task goes in the wrong direction, you can cancel mid-run from the dashboard. Tasks can also be triggered from the Cursor mobile app or the Slack integration, which is useful for delegating work during code review sessions, standups, or when you&rsquo;re away from your desk. Multiple agents can run simultaneously, each in its own isolated git worktree.</p>
<h3 id="writing-an-effective-task-description">Writing an Effective Task Description</h3>
<p>The quality of your task description determines 80% of the output quality. Background agents cannot ask clarifying questions mid-task the way interactive Agent Mode can. A strong prompt includes: the specific goal (&ldquo;Add unit tests for the <code>UserAuthService</code> class covering all public methods&rdquo;), the acceptance criteria (&ldquo;All tests must pass with <code>npm test</code>&rdquo;), any constraints (&ldquo;Do not modify the existing method signatures&rdquo;), and context pointers (&ldquo;See <code>src/auth/UserAuthService.ts</code> and <code>src/__tests__/</code> for existing patterns&rdquo;). Vague prompts like &ldquo;improve the tests&rdquo; lead to premature completion — the agent decides it&rsquo;s done when it isn&rsquo;t. Specific, verifiable goals with explicit acceptance criteria get dramatically better results.</p>
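<p>The anatomy above can be sketched as a small template helper. The interface and field names (<code>AgentTask</code>, <code>acceptanceCriteria</code>) are our own illustration, not any Cursor API:</p>

```typescript
// Illustrative helper: assemble a background-agent task description from
// the four ingredients a strong prompt needs. This shape is our own
// convention for organizing prompts, not a Cursor API.
interface AgentTask {
  goal: string;               // one specific, verifiable outcome
  acceptanceCriteria: string; // how the agent knows it is done
  constraints?: string[];     // what must not change
  contextPointers?: string[]; // files showing existing patterns
}

function renderTask(task: AgentTask): string {
  const lines = [
    `Goal: ${task.goal}`,
    `Acceptance: ${task.acceptanceCriteria}`,
  ];
  for (const c of task.constraints ?? []) lines.push(`Constraint: ${c}`);
  if (task.contextPointers?.length) {
    lines.push(`Context: see ${task.contextPointers.join(", ")}`);
  }
  return lines.join("\n");
}

const prompt = renderTask({
  goal: "Add unit tests for the UserAuthService class covering all public methods",
  acceptanceCriteria: "All tests must pass with `npm test`",
  constraints: ["Do not modify the existing method signatures"],
  contextPointers: ["src/auth/UserAuthService.ts", "src/__tests__/"],
});
console.log(prompt);
```

<p>Treating prompts as structured data like this also makes it easy to lint for a missing acceptance criterion before submitting a task.</p>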
<h3 id="enabling-privacy-mode-considerations">Privacy Mode Considerations</h3>
<p>Background agents require disabling Cursor&rsquo;s Privacy Mode because the agent needs to send your code to remote infrastructure. For regulated industries (healthcare, finance, government), verify with your compliance team before enabling cloud agents. Disabling Privacy Mode is a per-workspace setting, so you can keep it enabled for sensitive repos and disable it only for open-source or internal tooling work.</p>
<h2 id="cloud-agents-with-computer-use-the-february-2026-upgrade">Cloud Agents with Computer Use: The February 2026 Upgrade</h2>
<p>On February 24, 2026, Cursor launched Cloud Agents with Computer Use, extending background agents with a full browser environment, video recording of every session, and remote desktop access for visual verification tasks. This upgrade means agents can now run end-to-end tests that require a real browser — loading your app, clicking buttons, taking screenshots of UI states, and verifying visual regressions. Each Computer Use session produces a video recording that you can review after the agent completes, giving you proof of what the agent actually did rather than just what it claimed to do. This capability closes a major gap compared to purely code-only agents: a background agent can now write a Playwright test, run it in a headless browser, capture a screenshot of the result, and include that screenshot in the PR description. The practical result is that agents can handle full-stack UI validation tasks that previously required developer attention.</p>
<h3 id="what-computer-use-enables-that-code-only-agents-cannot">What Computer Use Enables That Code-Only Agents Cannot</h3>
<p>Computer Use unlocks tasks requiring visual confirmation: screenshot regression testing, UI component validation, form submission flows, OAuth redirect sequences, and any workflow that depends on a real browser rendering engine. Before February 2026, background agents could only validate through CLI tools and test runners. With Computer Use, agents can verify that a CSS change doesn&rsquo;t break the mobile layout, that a login form redirects correctly, or that a new dashboard component renders without console errors — then attach screenshot evidence to the PR.</p>
<h2 id="pricing-and-real-costs-plans-max-mode-and-budget-planning">Pricing and Real Costs: Plans, MAX Mode, and Budget Planning</h2>
<p>Cursor background agents always run in MAX mode, which applies a 20% surcharge on top of the underlying model credit cost. That surcharge adds up quickly: daily background agent users typically spend $60–$100/month in total Cursor costs according to Morph&rsquo;s 2026 analysis. Understanding the tier structure is essential before scaling usage. The <strong>Hobby</strong> plan has no background agent access. <strong>Pro</strong> at $20/month includes agents but overages are common once you exceed the included credits. <strong>Pro+</strong> at $60/month is a better fit for solo developers who run multiple agents per day. <strong>Ultra</strong> at $200/month is described as &ldquo;best value&rdquo; for heavy users, offering the highest credit allocation before per-credit overages kick in. <strong>Teams</strong> at $40/user/month works for organizations with shared credit pools. A practical budget rule: if you&rsquo;re running 3–5 background agent tasks per day, start with Pro+ and monitor credit consumption for the first week before committing to a tier.</p>
<h3 id="the-max-mode-surcharge-explained">The MAX Mode Surcharge Explained</h3>
<p>MAX mode ensures agents use the most capable model available for the task rather than defaulting to a cheaper model to save credits. While this produces better results, the 20% surcharge is already baked into the credit consumption you see in the dashboard. To control costs, write tightly scoped tasks that complete in fewer agent steps — a single 1,000-step agent run costs more than two 400-step runs that accomplish the same work. Scope tasks carefully: an overly broad prompt gives the agent license to keep touching code long after you would have called the task done, and those runs can spiral into large credit charges.</p>
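<p>A back-of-envelope model makes the scoping advice concrete. The per-credit price below is a made-up placeholder; only the 20% surcharge comes from Cursor&rsquo;s documented MAX-mode behavior:</p>

```typescript
// Back-of-envelope cost model for MAX-mode background agent runs.
// pricePerCredit is a placeholder, not Cursor's actual rate; the 20%
// surcharge is the documented MAX-mode markup.
const MAX_SURCHARGE = 0.2;

function runCost(baseCredits: number, pricePerCredit: number): number {
  // MAX mode bills the underlying model credits plus the 20% surcharge.
  return baseCredits * (1 + MAX_SURCHARGE) * pricePerCredit;
}

// One sprawling run vs two tightly scoped runs, same placeholder price:
const sprawling = runCost(1000, 0.01);    // → 12.00
const scoped = runCost(400, 0.01) * 2;    // → 9.60
console.log({ sprawling, scoped });
```

<p>The exact numbers are invented, but the shape holds: tighter scoping trims total steps, and the surcharge amplifies every step you save.</p>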
<h2 id="when-background-agents-shine--and-when-they-dont">When Background Agents Shine — and When They Don&rsquo;t</h2>
<p>Cursor background agents excel at tasks that are parallelizable, well-defined, have verifiable completion criteria, and don&rsquo;t require real-time architectural input. The best use cases from the 2026 developer workflow evidence are: (1) <strong>Test generation</strong> — write all missing unit tests for a module, following existing patterns; (2) <strong>Bug fixes with clear reproduction steps</strong> — &ldquo;this function throws TypeError when input is null, fix it and add a test&rdquo;; (3) <strong>Code migrations</strong> — &ldquo;update all API calls from v2 to v3 schema across the codebase&rdquo;; (4) <strong>Documentation</strong> — &ldquo;write JSDoc for all exported functions in <code>src/utils/</code>&rdquo;; (5) <strong>Pattern-following features</strong> — &ldquo;add a new endpoint <code>/api/v1/orders</code> that follows the same pattern as <code>/api/v1/users</code>&rdquo;. Background agents underperform on exploratory debugging without a reproduction case, architectural decisions that require codebase-wide judgment, tasks with ambiguous success criteria, and anything where the agent needs to ask you a question mid-task.</p>
<h3 id="decision-framework-background-vs-interactive">Decision Framework: Background vs Interactive</h3>
<p>Use this quick decision filter: Is the task fully specifiable in writing? → If yes, background agent is viable. Does completion have an objective pass/fail test? → If yes, background agent is preferred. Does the task require real-time architectural feedback? → If yes, use Agent Mode. Is the task exploratory (&ldquo;figure out why X is slow&rdquo;)? → Use Agent Mode. Is speed critical (need results in 2 minutes)? → Use Agent Mode for small tasks since background agents have startup overhead.</p>
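<p>For teams that want the filter to be mechanical, it can be written as a tiny routing function — the boolean trait names are our own shorthand for the questions above:</p>

```typescript
// Routing sketch for the background-vs-interactive decision filter.
// The trait names are our own shorthand, not any Cursor terminology.
type Route = "background-agent" | "agent-mode";

interface TaskTraits {
  fullySpecifiable: boolean;      // can you write the whole task down?
  objectivePassFail: boolean;     // is there a verifiable done-test?
  needsArchitecturalFeedback: boolean;
  exploratory: boolean;           // "figure out why X is slow"
  needsResultsInMinutes: boolean; // startup overhead would dominate
}

function route(t: TaskTraits): Route {
  // Agent Mode wins whenever the task needs you in the loop or speed
  // matters more than parallelism.
  if (t.needsArchitecturalFeedback || t.exploratory || t.needsResultsInMinutes) {
    return "agent-mode";
  }
  // A written-down, verifiable task is a background-agent fit.
  return t.fullySpecifiable && t.objectivePassFail
    ? "background-agent"
    : "agent-mode";
}
```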
<h2 id="cursor-background-agents-vs-claude-code">Cursor Background Agents vs Claude Code</h2>
<p>Cursor background agents and Claude Code represent two different philosophies in AI-assisted development. Cursor background agents run on remote cloud VMs, work asynchronously, operate with 70K–120K usable context tokens, and deliver output as pull requests. Claude Code runs locally in your terminal, operates as an interactive conversation, offers up to 200K token context, and modifies your local files directly. The usable context gap matters for large monorepos: Claude Code can hold more of your codebase in context at once, which helps on tasks that span many files or require understanding deeply interconnected systems. Cursor background agents compensate with parallelism — you can run five agents at once across five separate tasks, something Claude Code&rsquo;s interactive model doesn&rsquo;t support natively. Claude Code also excels at exploratory work: &ldquo;figure out why this test flakes&rdquo; is a conversation, not a specification. Cursor background agents excel at specification-first work where you can write down exactly what done looks like before starting.</p>
<table>
  <thead>
      <tr>
          <th>Dimension</th>
          <th>Cursor Background Agents</th>
          <th>Claude Code</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Execution location</td>
          <td>Remote cloud VM</td>
          <td>Local terminal</td>
      </tr>
      <tr>
          <td>Interaction model</td>
          <td>Async, fire-and-forget</td>
          <td>Interactive conversation</td>
      </tr>
      <tr>
          <td>Output format</td>
          <td>Pull request</td>
          <td>Local file edits</td>
      </tr>
      <tr>
          <td>Context window</td>
          <td>70K–120K tokens</td>
          <td>Up to 200K tokens</td>
      </tr>
      <tr>
          <td>Parallelism</td>
          <td>Native (multiple agents)</td>
          <td>Requires worktrees</td>
      </tr>
      <tr>
          <td>Best for</td>
          <td>Well-defined, parallelizable tasks</td>
          <td>Exploratory, context-heavy work</td>
      </tr>
      <tr>
          <td>Privacy</td>
          <td>Requires Privacy Mode off</td>
          <td>Fully local</td>
      </tr>
      <tr>
          <td>Internet access</td>
          <td>Yes (Computer Use)</td>
          <td>No by default</td>
      </tr>
  </tbody>
</table>
<h2 id="cursor-background-agents-vs-openai-codex">Cursor Background Agents vs OpenAI Codex</h2>
<p>Cursor background agents and OpenAI Codex CLI (the agentic version launched in 2025) both target async autonomous coding but differ significantly in architecture and capability. Cursor background agents support multiple model backends, include Computer Use with browser access, allow custom Dockerfiles, and integrate natively with the Cursor editor ecosystem. OpenAI Codex agents use the <code>codex-1</code> model exclusively, do not have internet access by default (network is disabled in sandboxes), and focus purely on code-level changes without visual verification. Cursor background agents also lack a public API for triggering runs from CI/CD pipelines, while Codex can be invoked via the OpenAI API. For teams with existing OpenAI API integrations, Codex offers better pipeline automation. For teams already using Cursor as their primary editor, background agents offer a superior end-to-end experience with Computer Use and multi-model support.</p>
<table>
  <thead>
      <tr>
          <th>Feature</th>
          <th>Cursor Background Agents</th>
          <th>OpenAI Codex Agents</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Internet access</td>
          <td>Yes (Computer Use)</td>
          <td>No (disabled by default)</td>
      </tr>
      <tr>
          <td>Model options</td>
          <td>Multiple (GPT-4o, Claude, etc.)</td>
          <td>codex-1 only</td>
      </tr>
      <tr>
          <td>Browser testing</td>
          <td>Yes (video + screenshots)</td>
          <td>No</td>
      </tr>
      <tr>
          <td>Custom environment</td>
          <td>Dockerfile support</td>
          <td>Sandbox configuration</td>
      </tr>
      <tr>
          <td>Programmatic API</td>
          <td>No public API</td>
          <td>Via OpenAI API</td>
      </tr>
      <tr>
          <td>Editor integration</td>
          <td>Native Cursor</td>
          <td>CLI / API-first</td>
      </tr>
      <tr>
          <td>Cost model</td>
          <td>Credits + 20% MAX surcharge</td>
          <td>Per-token API pricing</td>
      </tr>
  </tbody>
</table>
<h2 id="best-practices-for-background-agent-prompts">Best Practices for Background Agent Prompts</h2>
<p>Writing effective background agent prompts requires a different mindset than writing interactive Agent Mode prompts. The agent cannot interrupt you to ask questions, so every ambiguity in your prompt becomes a judgment call the agent makes on its own — often incorrectly. Follow this structure for reliable results: <strong>Goal</strong> (one sentence, specific outcome), <strong>Files/Scope</strong> (exact paths or patterns to touch), <strong>Constraints</strong> (what not to change), <strong>Acceptance Criteria</strong> (how to verify done), <strong>Reference Patterns</strong> (link to similar existing code). For example: &ldquo;Goal: Add rate limiting to the <code>/api/v1/auth/login</code> endpoint. Files: <code>src/api/auth/login.ts</code>, <code>src/middleware/</code>. Constraint: Do not change the request/response schema. Acceptance: <code>npm test</code> passes and <code>curl</code> to the endpoint returns 429 after 5 requests/minute. Reference: See <code>src/middleware/rateLimiter.ts</code> for existing pattern.&rdquo; This structure eliminates the top three failure modes: scope creep, schema changes, and untested implementations.</p>
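<p>To make the example&rsquo;s acceptance criteria concrete, here is a minimal sketch of the kind of limiter the agent might produce — a fixed-window counter; all names (<code>checkRateLimit</code>, <code>WINDOW_MS</code>) are hypothetical, not from any real repo:</p>

```typescript
// Minimal fixed-window rate limiter sketch matching the example
// acceptance criteria: 429 after 5 requests per minute per client.
// All names here are hypothetical illustrations.
const WINDOW_MS = 60_000;
const LIMIT = 5;

const hits = new Map<string, { windowStart: number; count: number }>();

// Returns the HTTP status the middleware should respond with.
function checkRateLimit(clientId: string, now: number = Date.now()): 200 | 429 {
  const entry = hits.get(clientId);
  if (!entry || now - entry.windowStart >= WINDOW_MS) {
    // First request in a fresh window: reset the counter.
    hits.set(clientId, { windowStart: now, count: 1 });
    return 200;
  }
  entry.count += 1;
  return entry.count > LIMIT ? 429 : 200;
}
```

<p>Note how the acceptance criterion (&ldquo;429 after 5 requests/minute&rdquo;) maps directly onto something the agent can verify with <code>curl</code> — that verifiability is what makes the task a good background-agent candidate.</p>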
<h3 id="using-plan-mode-before-background-agent-submission">Using Plan Mode Before Background Agent Submission</h3>
<p>Before submitting a task to a background agent, use Cursor&rsquo;s Plan Mode (Shift+Tab in Agent Mode) to let the agent research your codebase and outline its approach. Review the plan, correct misunderstandings, and then use that plan as the task description for the background agent. This two-step workflow catches the most expensive failure mode: an agent that confidently executes the wrong approach for 200 steps. Saving plans to <code>.cursor/plans/</code> also creates team documentation of intent that survives the agent session.</p>
<h2 id="customizing-agent-environments-with-dockerfiles">Customizing Agent Environments with Dockerfiles</h2>
<p>Cursor background agents support custom Dockerfiles for configuring the VM environment before the agent starts work. This is critical for projects with native dependencies, specific Node.js/Python versions, database services, or build tools not in the default image. Place a <code>Dockerfile</code> in your repository root or specify a path in the agent configuration. The agent builds the image, starts the container, and runs your setup scripts before executing the task. A practical example for a TypeScript/Postgres project:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-dockerfile" data-lang="dockerfile"><span style="display:flex;"><span><span style="color:#66d9ef">FROM</span><span style="color:#e6db74"> node:20-slim</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">RUN</span> apt-get update <span style="color:#f92672">&amp;&amp;</span> apt-get install -y postgresql-client
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">WORKDIR</span><span style="color:#e6db74"> /workspace</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">COPY</span> package*.json ./
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">RUN</span> npm ci
</span></span></code></pre></div><p>For monorepos with multiple services, create service-specific Dockerfiles and point the agent at the relevant one per task. Custom environments dramatically reduce &ldquo;works in agent, fails in CI&rdquo; scenarios by ensuring the agent runs in the same environment as your tests.</p>
<h2 id="running-multiple-agents-in-parallel-with-worktrees">Running Multiple Agents in Parallel with Worktrees</h2>
<p>Cursor background agents use native git worktrees to isolate parallel agent sessions — each agent works in its own branch and file tree, eliminating merge conflicts between concurrent tasks. You can submit five background agent tasks simultaneously and each will work independently. The practical workflow: submit Task A (add unit tests for module X), Task B (fix bug in module Y), and Task C (update documentation) all at once. Review the three PRs when they&rsquo;re done. This parallel execution model is background agents&rsquo; strongest advantage over interactive tools. A single developer can effectively manage 10–20 background agent tasks in a day, reviewing PRs rather than directing each agent step-by-step. The bottleneck shifts from &ldquo;time to write code&rdquo; to &ldquo;time to review code,&rdquo; which is a significantly more scalable constraint.</p>
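<p>The fan-out-and-review workflow has the shape of ordinary promise fan-out. Since Cursor exposes no public API for triggering agents, <code>submitAgentTask</code> below is a hypothetical stand-in; the sketch shows the workflow&rsquo;s shape, not a real client:</p>

```typescript
// Conceptual sketch of the parallel-delegation workflow. There is no
// public Cursor API, so submitAgentTask is a stand-in that resolves to
// a PR-like result; only the fan-out shape is the point.
interface AgentResult { task: string; prUrl: string }

async function submitAgentTask(task: string): Promise<AgentResult> {
  // A real integration would block here until the agent opens a PR.
  return { task, prUrl: `https://example.com/pr/${encodeURIComponent(task)}` };
}

async function delegate(tasks: string[]): Promise<AgentResult[]> {
  // Each task runs in its own isolated worktree on the agent side, so
  // firing them all at once is safe; we only wait for the PRs.
  const settled = await Promise.allSettled(tasks.map(submitAgentTask));
  return settled
    .filter((s): s is PromiseFulfilledResult<AgentResult> => s.status === "fulfilled")
    .map((s) => s.value);
}
```

<p>The design point is that a failed task doesn&rsquo;t block the others — you review whatever PRs landed and resubmit only the stragglers.</p>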
<h2 id="privacy-security-and-enterprise-considerations">Privacy, Security, and Enterprise Considerations</h2>
<p>Background agents send your code to Cursor&rsquo;s cloud infrastructure, which means Privacy Mode must be disabled for cloud agent use. For enterprise teams, Cursor&rsquo;s Teams plan ($40/user/month) includes organization-wide settings, SSO, and dedicated infrastructure options. Before enabling background agents for regulated codebases, verify: (1) your data processing agreement with Cursor covers cloud agent processing; (2) secrets are not stored in plain text in the repository (use <code>.env</code> files excluded from git, not hardcoded credentials); (3) your security team has reviewed the expanded attack surface from remote VM access. The security surface area includes the VM itself, the git credentials provided to the agent, and any API keys the agent accesses during testing. Use repository-scoped credentials and rotate them after large agent sessions on sensitive codebases.</p>
<h2 id="known-limitations-in-2026">Known Limitations in 2026</h2>
<p>Cursor background agents have several documented limitations that affect real-world usage. <strong>Premature completion</strong> is the most common issue: an agent decides a task is &ldquo;done&rdquo; when it has made some progress, even if the full acceptance criteria aren&rsquo;t met. This is most frequent with vague prompts. <strong>Cost surprises</strong> happen when a task&rsquo;s scope is broader than expected and the agent continues executing across hundreds of extra steps — always set a mental budget per task and check the Agents dashboard mid-run for long tasks. <strong>No programmatic API</strong> means you cannot trigger background agents from CI/CD pipelines, GitHub Actions, or custom scripts — they must be submitted via the Cursor UI or mobile app. <strong>The Privacy Mode requirement</strong> — cloud agents only run with Privacy Mode disabled — blocks enterprise use cases where code must stay fully local. <strong>The context window ceiling</strong> at 70K–120K tokens limits performance on very large monorepos compared to local tools with full context access.</p>
<h2 id="real-world-use-cases-what-teams-actually-delegate">Real-World Use Cases: What Teams Actually Delegate</h2>
<p>The highest-value background agent use cases observed in 2026 deployment patterns are: <strong>Test coverage fills</strong> — agents write tests for code with low coverage following existing patterns; <strong>API migration</strong> — updating call sites from a deprecated API to a new one across hundreds of files; <strong>Documentation generation</strong> — writing or updating JSDoc/docstrings across a module; <strong>Lint/formatting fixes</strong> — applying a new ESLint or Prettier config to the entire codebase; <strong>Security patches</strong> — applying a known fix pattern (like escaping a SQL parameter) across all affected query sites. These tasks share a key property: they&rsquo;re tedious, rule-based, and don&rsquo;t require architectural judgment — exactly where background agents outperform and human developers underperform. Teams at Fortune 500 companies (Cursor claims 50%+ adoption) use background agents most heavily for the backlog of technical debt tasks that never make sprint planning.</p>
<hr>
<h2 id="faq">FAQ</h2>
<p><strong>Can I use Cursor background agents on the Hobby plan?</strong>
No. Background agents require at least the Pro plan ($20/month). The Hobby plan has no background agent access. For regular background agent use, Pro+ ($60/month) or Ultra ($200/month) are better fits due to higher credit allocations.</p>
<p><strong>Do Cursor background agents work on private repositories?</strong>
Yes, background agents work with private repositories. You need to connect your GitHub/GitLab account with appropriate read/write permissions. Note that Privacy Mode must be disabled for cloud agents to access your code.</p>
<p><strong>How long does a typical background agent task take?</strong>
Simple bug fixes or small test additions typically complete in 5–15 minutes. Larger tasks like writing a full test suite for a module or performing a multi-file migration can take 30–90 minutes. Tasks with Computer Use (browser testing) add overhead from browser startup and screenshot capture.</p>
<p><strong>Can background agents run tests automatically?</strong>
Yes. Background agents can run your test suite as part of the task. Include the test command in your prompt&rsquo;s acceptance criteria (&ldquo;run <code>npm test</code> and all tests must pass&rdquo;). Agents with Computer Use can also run browser-based E2E tests and capture screenshots as verification.</p>
<p><strong>What&rsquo;s the difference between a background agent failing and succeeding poorly?</strong>
A failing agent returns an error and stops — you&rsquo;re charged for the work done to that point. A poorly succeeding agent opens a PR that doesn&rsquo;t meet your criteria, which is harder to catch. This is why verifiable acceptance criteria in your prompt (specific test commands, exact output expectations) are critical — they give the agent a reliable definition of done rather than relying on its own judgment.</p>
]]></content:encoded></item></channel></rss>