<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Claude Opus 4.7 on RockB</title><link>https://baeseokjae.github.io/tags/claude-opus-4.7/</link><description>Recent content in Claude Opus 4.7 on RockB</description><image><title>RockB</title><url>https://baeseokjae.github.io/images/og-default.png</url><link>https://baeseokjae.github.io/images/og-default.png</link></image><generator>Hugo</generator><language>en-us</language><lastBuildDate>Thu, 07 May 2026 09:04:49 +0000</lastBuildDate><atom:link href="https://baeseokjae.github.io/tags/claude-opus-4.7/index.xml" rel="self" type="application/rss+xml"/><item><title>Claude Opus 4.7 Developer Guide: xhigh Effort, Task Budgets, and Migration</title><link>https://baeseokjae.github.io/posts/claude-opus-4-7-developer-guide-2026/</link><pubDate>Thu, 07 May 2026 09:04:49 +0000</pubDate><guid>https://baeseokjae.github.io/posts/claude-opus-4-7-developer-guide-2026/</guid><description>Complete Claude Opus 4.7 developer guide: xhigh effort levels, task budgets, adaptive thinking, breaking API changes, and migration from Opus 4.6.</description><content:encoded><![CDATA[<p>Claude Opus 4.7 is Anthropic&rsquo;s most capable model as of April 2026, scoring 87.6% on SWE-bench Verified and introducing a redesigned thinking system that replaces manual <code>budget_tokens</code> with effort-based adaptive thinking. If you&rsquo;re upgrading from Opus 4.6, four breaking API changes require code updates before your apps will run.</p>
<h2 id="whats-new-in-claude-opus-47">What&rsquo;s New in Claude Opus 4.7</h2>
<p>Claude Opus 4.7, released April 16, 2026, represents a step-change in both coding capability and agentic architecture. The headline benchmark is SWE-bench Verified at 87.6% — up from 80.8% on Opus 4.6 — and SWE-bench Pro at 64.3% (up from 53.4%). On CursorBench, the real-world coding benchmark, Opus 4.7 scores 70% versus 58% for Opus 4.6. These gains come primarily from architectural improvements to multi-step reasoning: the model now plans across more steps before committing to an action, which matters most for complex debugging and refactoring tasks. Vision capability received an equally dramatic upgrade — visual acuity improved from 54.5% to 98.5%, and the model now supports 3.75MP images, three times the resolution of Opus 4.6. For computer use, Opus 4.7 scores 78.0% on OSWorld-Verified, the leading score among currently available models. Pricing stayed flat at $5/M input and $25/M output tokens, but a new tokenizer encodes the same text using up to 35% more tokens — so your actual bills will increase even without code changes.</p>
<table>
  <thead>
      <tr>
          <th>Benchmark</th>
          <th>Opus 4.6</th>
          <th>Opus 4.7</th>
          <th>Change</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>SWE-bench Verified</td>
          <td>80.8%</td>
          <td>87.6%</td>
          <td>+6.8pp</td>
      </tr>
      <tr>
          <td>SWE-bench Pro</td>
          <td>53.4%</td>
          <td>64.3%</td>
          <td>+10.9pp</td>
      </tr>
      <tr>
          <td>CursorBench</td>
          <td>58%</td>
          <td>70%</td>
          <td>+12pp</td>
      </tr>
      <tr>
          <td>Visual Acuity</td>
          <td>54.5%</td>
          <td>98.5%</td>
          <td>+44pp</td>
      </tr>
      <tr>
          <td>OSWorld-Verified</td>
          <td>—</td>
          <td>78.0%</td>
          <td>new</td>
      </tr>
  </tbody>
</table>
<h2 id="the-five-effort-levels-low-medium-high-xhigh-and-max">The Five Effort Levels: low, medium, high, xhigh, and max</h2>
<p>Claude Opus 4.7&rsquo;s effort levels are a ranked abstraction over the model&rsquo;s internal thinking budget, replacing the manual <code>budget_tokens</code> integer you had to guess at in Opus 4.6. The five levels — <code>low</code>, <code>medium</code>, <code>high</code>, <code>xhigh</code>, and <code>max</code> — map to progressively larger internal thinking allocations. <code>low</code> uses roughly the same thinking depth as a non-extended-thinking call and is best for simple Q&amp;A, summarization, or single-function code generation. <code>medium</code> suits multi-file edits or short agentic chains where you want some reasoning depth without excessive latency. <code>high</code> is the general-purpose pick for most production workloads: complex debugging, architecture reviews, and document analysis. <code>xhigh</code> is Claude Code&rsquo;s default for agentic tasks and gives the model significant headroom to plan before touching files. <code>max</code> is uncapped — use it only for research-grade tasks where cost is secondary and answer quality is paramount. The practical rule of thumb: start at <code>high</code>, step up to <code>xhigh</code> when the model visibly misses steps in multi-file tasks, and reserve <code>max</code> for tasks where a single wrong decision costs more than the extra inference spend.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> anthropic
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>client <span style="color:#f92672">=</span> anthropic<span style="color:#f92672">.</span>Anthropic()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Standard high-effort call</span>
</span></span><span style="display:flex;"><span>response <span style="color:#f92672">=</span> client<span style="color:#f92672">.</span>messages<span style="color:#f92672">.</span>create(
</span></span><span style="display:flex;"><span>    model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;claude-opus-4-7&#34;</span>,
</span></span><span style="display:flex;"><span>    max_tokens<span style="color:#f92672">=</span><span style="color:#ae81ff">8192</span>,
</span></span><span style="display:flex;"><span>    thinking<span style="color:#f92672">=</span>{
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;enabled&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;effort&#34;</span>: <span style="color:#e6db74">&#34;high&#34;</span>
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    messages<span style="color:#f92672">=</span>[{<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;user&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: <span style="color:#e6db74">&#34;Refactor this auth module for async support.&#34;</span>}]
</span></span><span style="display:flex;"><span>)
</span></span></code></pre></div><h3 id="effort-level-selection-guide">Effort Level Selection Guide</h3>
<table>
  <thead>
      <tr>
          <th>Effort</th>
          <th>Latency</th>
          <th>Cost</th>
          <th>Best For</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>low</td>
          <td>~1s</td>
          <td>$$</td>
          <td>Q&amp;A, summaries, lookup</td>
      </tr>
      <tr>
          <td>medium</td>
          <td>~3s</td>
          <td>$$$</td>
          <td>Single-file edits, short chains</td>
      </tr>
      <tr>
          <td>high</td>
          <td>~8s</td>
          <td>$$$$</td>
          <td>Complex debugging, arch reviews</td>
      </tr>
      <tr>
          <td>xhigh</td>
          <td>~20s</td>
          <td>$$$$$</td>
          <td>Multi-repo agentic tasks</td>
      </tr>
      <tr>
          <td>max</td>
          <td>varies</td>
          <td>$$$$$$</td>
          <td>Research, proof generation</td>
      </tr>
  </tbody>
</table>
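<p>The selection table above can be folded into a small routing helper. The sketch below is this guide&rsquo;s own invention (the function name and task categories are not part of the SDK), but it captures the rule of thumb: default to <code>high</code>, escalate to <code>xhigh</code> for multi-file work, and reserve <code>max</code> for research-grade tasks.</p>

```python
# Hypothetical helper mapping a coarse task profile to an effort level.
# The categories mirror the selection table above; adjust to your workload.

def choose_effort(task_type: str, files_touched: int = 1) -> str:
    if task_type in ("qa", "summary", "lookup"):
        return "low"
    if task_type == "edit":
        # Single-file edits stay cheap; cross-file edits need planning headroom.
        return "medium" if files_touched <= 1 else "xhigh"
    if task_type in ("debug", "review"):
        return "high"
    if task_type in ("research", "proof"):
        return "max"
    return "high"  # sensible default per the rule of thumb above

print(choose_effort("edit", files_touched=4))  # -> xhigh
```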
<h2 id="xhigh-effort-deep-dive-why-claude-code-defaults-to-it">xhigh Effort Deep Dive: Why Claude Code Defaults to It</h2>
<p><code>xhigh</code> is Claude Code&rsquo;s default effort level because agentic coding tasks require the model to hold a large working context simultaneously — current file state, test output, dependency graph, and user intent — before writing a single line. At <code>high</code> effort, the model reliably handles changes within one file but frequently misses implicit coupling between files (for example, updating a function signature but not its callers in sibling modules). At <code>xhigh</code>, the extra thinking headroom allows the model to trace call graphs before editing, which reduces multi-file regression rates by an estimated 40% based on internal Anthropic evaluations at the time of release. The cost difference between <code>high</code> and <code>xhigh</code> is roughly 2–3x on thinking-token spend; for a typical Claude Code session editing 5–10 files, this translates to a few cents of additional inference cost per session. The economic argument for defaulting to <code>xhigh</code> is straightforward: the cost of a missed cross-file bug (manual debugging time, CI churn, delayed PR) far exceeds the marginal thinking cost. When to step back to <code>high</code> or <code>medium</code>: read-only tasks (explaining code, generating docs), single-file changes with no external callers, and batch jobs where you&rsquo;re running thousands of identical operations and can afford occasional misses.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#75715e"># Claude Code equivalent — xhigh for agentic coding</span>
</span></span><span style="display:flex;"><span>response <span style="color:#f92672">=</span> client<span style="color:#f92672">.</span>messages<span style="color:#f92672">.</span>create(
</span></span><span style="display:flex;"><span>    model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;claude-opus-4-7&#34;</span>,
</span></span><span style="display:flex;"><span>    max_tokens<span style="color:#f92672">=</span><span style="color:#ae81ff">16384</span>,
</span></span><span style="display:flex;"><span>    thinking<span style="color:#f92672">=</span>{
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;enabled&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;effort&#34;</span>: <span style="color:#e6db74">&#34;xhigh&#34;</span>
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    system<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;You are an autonomous coding agent. Plan before you act.&#34;</span>,
</span></span><span style="display:flex;"><span>    messages<span style="color:#f92672">=</span>[{<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;user&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: <span style="color:#e6db74">&#34;Add rate limiting to all API endpoints.&#34;</span>}]
</span></span><span style="display:flex;"><span>)
</span></span></code></pre></div><h2 id="task-budgets-the-new-way-to-control-agentic-token-spend">Task Budgets: The New Way to Control Agentic Token Spend</h2>
<p>Task budgets are a beta feature in Claude Opus 4.7 that give the model a total token allowance for an agentic session; the model treats the remaining allowance as a countdown, self-regulating its pacing and wrapping up work gracefully before hitting the limit. A task budget is passed via the <code>anthropic-beta: task-budgets-2026-03</code> request header and an <code>output_config</code> block in the request body. The minimum task budget is 20,000 tokens; for agentic coding tasks, Anthropic recommends 50,000&ndash;128,000 tokens. When roughly 20% of the budget remains, the model begins summarizing instead of expanding, avoids opening new files, and writes transition notes so the next session can pick up cleanly. This is qualitatively different from simply setting <code>max_tokens</code>: a high <code>max_tokens</code> value tells the model how long a single response can be, while a task budget tells the model how much total computation the whole job is worth. The practical effect is that long-running agents wind down gracefully rather than truncating mid-sentence when they run out of tokens.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> anthropic
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>client <span style="color:#f92672">=</span> anthropic<span style="color:#f92672">.</span>Anthropic()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>response <span style="color:#f92672">=</span> client<span style="color:#f92672">.</span>messages<span style="color:#f92672">.</span>create(
</span></span><span style="display:flex;"><span>    model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;claude-opus-4-7&#34;</span>,
</span></span><span style="display:flex;"><span>    max_tokens<span style="color:#f92672">=</span><span style="color:#ae81ff">4096</span>,
</span></span><span style="display:flex;"><span>    extra_headers<span style="color:#f92672">=</span>{
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;anthropic-beta&#34;</span>: <span style="color:#e6db74">&#34;task-budgets-2026-03&#34;</span>
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    thinking<span style="color:#f92672">=</span>{
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;enabled&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;effort&#34;</span>: <span style="color:#e6db74">&#34;xhigh&#34;</span>
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    output_config<span style="color:#f92672">=</span>{
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;task_budget_tokens&#34;</span>: <span style="color:#ae81ff">80000</span>
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    messages<span style="color:#f92672">=</span>[{
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;user&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;content&#34;</span>: <span style="color:#e6db74">&#34;Audit and fix all security issues in this repository.&#34;</span>
</span></span><span style="display:flex;"><span>    }]
</span></span><span style="display:flex;"><span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Check budget usage in response metadata</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">if</span> hasattr(response, <span style="color:#e6db74">&#39;usage&#39;</span>):
</span></span><span style="display:flex;"><span>    print(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Budget used: </span><span style="color:#e6db74">{</span>response<span style="color:#f92672">.</span>usage<span style="color:#f92672">.</span>task_budget_tokens_used<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>    print(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Budget remaining: </span><span style="color:#e6db74">{</span>response<span style="color:#f92672">.</span>usage<span style="color:#f92672">.</span>task_budget_tokens_remaining<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span></code></pre></div><h3 id="when-to-set-a-task-budget">When to Set a Task Budget</h3>
<p>Set a task budget any time you&rsquo;re running an autonomous agent loop that could generate many tool calls — file reads, web searches, code execution — and you want cost predictability. Without a task budget, an agent loop on a large codebase can silently consume hundreds of thousands of tokens before you notice. With a task budget, the model treats the allowance as a resource to manage, not a limit to test. A reasonable heuristic: set the budget to 2–3x the token count of your initial prompt plus expected tool outputs.</p>
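<p>The 2&ndash;3x heuristic above can be sketched in a few lines. The <code>estimate_task_budget</code> helper and its chars-per-token approximation are illustrative assumptions, not SDK features; prefer a real token count in production.</p>

```python
# Minimal sketch of the budget heuristic: 2-3x the prompt plus expected tool
# output, floored at the documented 20,000-token minimum. The ~4 chars/token
# approximation is a rough stand-in for a real tokenizer count.

MIN_TASK_BUDGET = 20_000

def estimate_task_budget(prompt: str,
                         expected_tool_output_tokens: int,
                         multiplier: float = 2.5) -> int:
    approx_prompt_tokens = len(prompt) // 4  # crude chars-per-token estimate
    raw = int((approx_prompt_tokens + expected_tool_output_tokens) * multiplier)
    return max(raw, MIN_TASK_BUDGET)

print(estimate_task_budget("Audit and fix all security issues." * 50, 30_000))
```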
<h2 id="adaptive-thinking-what-replaced-manual-budget_tokens">Adaptive Thinking: What Replaced Manual budget_tokens</h2>
<p>Adaptive thinking is the internal mechanism behind Opus 4.7&rsquo;s effort levels — it is what actually replaced the manual <code>budget_tokens</code> integer from Opus 4.6&rsquo;s extended thinking API. In Opus 4.6, you had to supply an explicit <code>budget_tokens</code> integer (e.g., <code>32000</code>) and the model would use up to that many tokens on its internal reasoning chain before producing a response. In practice, this was difficult to tune: too low and the model skipped reasoning steps; too high and you paid for thinking tokens you didn&rsquo;t need. Adaptive thinking replaces this with a model-side allocation policy. When you select <code>effort: &quot;xhigh&quot;</code>, the model dynamically allocates thinking tokens based on task complexity — simple parts of a prompt get fewer thinking tokens, while novel or ambiguous sub-problems get more. Empirically, this produces better results at lower total thinking-token spend than a static <code>budget_tokens</code> value set to the theoretical maximum. The tradeoff: you lose direct control over thinking-token counts. If you were using <code>budget_tokens</code> for cost-bounding (capping spend on a cheap endpoint), use task budgets instead — they provide cost control at the session level rather than per-call.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#75715e"># Opus 4.6 style — deprecated in 4.7</span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># response = client.messages.create(</span>
</span></span><span style="display:flex;"><span><span style="color:#75715e">#     model=&#34;claude-opus-4-6&#34;,</span>
</span></span><span style="display:flex;"><span><span style="color:#75715e">#     thinking={&#34;type&#34;: &#34;enabled&#34;, &#34;budget_tokens&#34;: 32000},</span>
</span></span><span style="display:flex;"><span><span style="color:#75715e">#     ...</span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># )</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Opus 4.7 style — effort replaces budget_tokens</span>
</span></span><span style="display:flex;"><span>response <span style="color:#f92672">=</span> client<span style="color:#f92672">.</span>messages<span style="color:#f92672">.</span>create(
</span></span><span style="display:flex;"><span>    model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;claude-opus-4-7&#34;</span>,
</span></span><span style="display:flex;"><span>    max_tokens<span style="color:#f92672">=</span><span style="color:#ae81ff">8192</span>,
</span></span><span style="display:flex;"><span>    thinking<span style="color:#f92672">=</span>{
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;enabled&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;effort&#34;</span>: <span style="color:#e6db74">&#34;high&#34;</span>  <span style="color:#75715e"># model allocates thinking tokens adaptively</span>
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    messages<span style="color:#f92672">=</span>[{<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;user&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: <span style="color:#e6db74">&#34;Your prompt here&#34;</span>}]
</span></span><span style="display:flex;"><span>)
</span></span></code></pre></div><h2 id="migration-guide-breaking-changes-from-opus-46-to-47">Migration Guide: Breaking Changes from Opus 4.6 to 4.7</h2>
<p>There are four breaking changes when migrating from claude-opus-4-6 to claude-opus-4-7. Each requires a code update; none have automatic fallback behavior — requests with old-style parameters will return a 400 error.</p>
<p><strong>Breaking Change 1: <code>budget_tokens</code> removed from thinking config</strong></p>
<p>The <code>thinking.budget_tokens</code> field is no longer accepted. Replace with <code>thinking.effort</code>.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#75715e"># BEFORE (Opus 4.6)</span>
</span></span><span style="display:flex;"><span>thinking<span style="color:#f92672">=</span>{<span style="color:#e6db74">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;enabled&#34;</span>, <span style="color:#e6db74">&#34;budget_tokens&#34;</span>: <span style="color:#ae81ff">50000</span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># AFTER (Opus 4.7)</span>
</span></span><span style="display:flex;"><span>thinking<span style="color:#f92672">=</span>{<span style="color:#e6db74">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;enabled&#34;</span>, <span style="color:#e6db74">&#34;effort&#34;</span>: <span style="color:#e6db74">&#34;high&#34;</span>}
</span></span></code></pre></div><p><strong>Breaking Change 2: Model ID is <code>claude-opus-4-7</code> (not <code>claude-opus-4-7-20260416</code>)</strong></p>
<p>Opus 4.7 uses a simplified model ID without the date suffix. The dated variant is not accepted.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#75715e"># BEFORE</span>
</span></span><span style="display:flex;"><span>model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;claude-opus-4-6-20251201&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># AFTER</span>
</span></span><span style="display:flex;"><span>model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;claude-opus-4-7&#34;</span>
</span></span></code></pre></div><p><strong>Breaking Change 3: <code>output_config</code> replaces <code>max_tokens_to_sample</code> for agentic configs</strong></p>
<p>If you were passing the non-standard <code>max_tokens_to_sample</code> field, its replacement is <code>output_config.max_tokens</code>.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#75715e"># BEFORE — non-standard field some integrations used</span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># &#34;max_tokens_to_sample&#34;: 8192</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># AFTER — use output_config for advanced output settings</span>
</span></span><span style="display:flex;"><span>output_config<span style="color:#f92672">=</span>{
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;max_tokens&#34;</span>: <span style="color:#ae81ff">8192</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;task_budget_tokens&#34;</span>: <span style="color:#ae81ff">80000</span>  <span style="color:#75715e"># optional</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p><strong>Breaking Change 4: New tokenizer — recalibrate token budgets</strong></p>
<p>The Opus 4.7 tokenizer encodes the same English text using up to 35% more tokens than Opus 4.6. Any hardcoded token limits in your application need to be increased proportionally, or you will hit <code>max_tokens</code> truncation on responses that previously fit.</p>
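<p>Recalibrating a hardcoded limit is a one-line computation. The helper below is illustrative; the 35% factor is the worst case quoted above, so measure your own corpus before locking in new limits.</p>

```python
# Scale an Opus 4.6 token limit by the worst-case 35% tokenizer expansion,
# rounding up. Integer arithmetic avoids float rounding surprises at the edges.

EXPANSION_PCT = 35  # "up to 35% more tokens" per the migration notes

def recalibrated_limit(opus_46_limit: int) -> int:
    # ceiling of limit * 1.35 without floating point
    return (opus_46_limit * (100 + EXPANSION_PCT) + 99) // 100

for old in (4096, 8192, 32000):
    print(old, "->", recalibrated_limit(old))
```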
<table>
  <thead>
      <tr>
          <th>Opus 4.6 Token Budget</th>
          <th>Opus 4.7 Equivalent</th>
          <th>Adjustment</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>4,096</td>
          <td>~5,500 Opus 4.7 tokens</td>
          <td>Increase to 5,500+</td>
      </tr>
      <tr>
          <td>8,192</td>
          <td>~11,000 Opus 4.7 tokens</td>
          <td>Increase to 11,000+</td>
      </tr>
      <tr>
          <td>32,000</td>
          <td>~43,000 Opus 4.7 tokens</td>
          <td>Increase to 43,000+</td>
      </tr>
  </tbody>
</table>
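<p>Taken together, the first three changes can be applied mechanically before a request goes out. The shim below is a sketch, not official tooling, and the <code>budget_tokens</code>-to-effort thresholds are this guide&rsquo;s guess rather than a documented mapping.</p>

```python
# Hypothetical migration shim: rewrite an Opus 4.6-style request dict into the
# 4.7 shape. The budget_tokens -> effort thresholds are illustrative guesses.

def migrate_request(params: dict) -> dict:
    p = dict(params)
    # Change 2: drop the dated model ID.
    if p.get("model", "").startswith("claude-opus-4-6"):
        p["model"] = "claude-opus-4-7"
    # Change 1: budget_tokens -> effort.
    thinking = dict(p.get("thinking", {}))
    budget = thinking.pop("budget_tokens", None)
    if budget is not None:
        if budget < 16_000:
            thinking["effort"] = "medium"
        elif budget < 48_000:
            thinking["effort"] = "high"
        else:
            thinking["effort"] = "xhigh"
    if thinking:
        p["thinking"] = thinking
    # Change 3: fold the non-standard field into output_config.
    if "max_tokens_to_sample" in p:
        p.setdefault("output_config", {})["max_tokens"] = p.pop("max_tokens_to_sample")
    return p

old = {"model": "claude-opus-4-6-20251201",
       "thinking": {"type": "enabled", "budget_tokens": 32000},
       "max_tokens_to_sample": 8192}
print(migrate_request(old))
```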
<h2 id="api-setup-and-code-examples-python-sdk">API Setup and Code Examples (Python SDK)</h2>
<p>Claude Opus 4.7 API setup requires the Anthropic Python SDK version 0.52.0 or later, which added support for the <code>effort</code> field in the thinking config and the <code>output_config</code> block. Install or upgrade with <code>pip install &#34;anthropic&gt;=0.52.0&#34;</code> (quote the specifier so your shell doesn&rsquo;t treat <code>&gt;</code> as a redirect).</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> anthropic
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Verify SDK version</span>
</span></span><span style="display:flex;"><span>print(anthropic<span style="color:#f92672">.</span>__version__)  <span style="color:#75715e"># should be 0.52.0+</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>client <span style="color:#f92672">=</span> anthropic<span style="color:#f92672">.</span>Anthropic(api_key<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;your-api-key&#34;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Basic call with adaptive thinking</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">analyze_code</span>(code: str, effort: str <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;high&#34;</span>) <span style="color:#f92672">-&gt;</span> str:
</span></span><span style="display:flex;"><span>    response <span style="color:#f92672">=</span> client<span style="color:#f92672">.</span>messages<span style="color:#f92672">.</span>create(
</span></span><span style="display:flex;"><span>        model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;claude-opus-4-7&#34;</span>,
</span></span><span style="display:flex;"><span>        max_tokens<span style="color:#f92672">=</span><span style="color:#ae81ff">8192</span>,
</span></span><span style="display:flex;"><span>        thinking<span style="color:#f92672">=</span>{
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;enabled&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;effort&#34;</span>: effort
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>        messages<span style="color:#f92672">=</span>[{
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;user&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;content&#34;</span>: <span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Review this code for bugs and security issues:</span><span style="color:#ae81ff">\n\n</span><span style="color:#e6db74">{</span>code<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>        }]
</span></span><span style="display:flex;"><span>    )
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Extract text blocks (thinking blocks are separate)</span>
</span></span><span style="display:flex;"><span>    text_blocks <span style="color:#f92672">=</span> [b <span style="color:#66d9ef">for</span> b <span style="color:#f92672">in</span> response<span style="color:#f92672">.</span>content <span style="color:#66d9ef">if</span> b<span style="color:#f92672">.</span>type <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;text&#34;</span>]
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> text_blocks[<span style="color:#ae81ff">0</span>]<span style="color:#f92672">.</span>text <span style="color:#66d9ef">if</span> text_blocks <span style="color:#66d9ef">else</span> <span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Agentic session with task budget</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">run_agent_session</span>(task: str, budget: int <span style="color:#f92672">=</span> <span style="color:#ae81ff">80000</span>) <span style="color:#f92672">-&gt;</span> dict:
</span></span><span style="display:flex;"><span>    response <span style="color:#f92672">=</span> client<span style="color:#f92672">.</span>messages<span style="color:#f92672">.</span>create(
</span></span><span style="display:flex;"><span>        model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;claude-opus-4-7&#34;</span>,
</span></span><span style="display:flex;"><span>        max_tokens<span style="color:#f92672">=</span><span style="color:#ae81ff">4096</span>,
</span></span><span style="display:flex;"><span>        extra_headers<span style="color:#f92672">=</span>{<span style="color:#e6db74">&#34;anthropic-beta&#34;</span>: <span style="color:#e6db74">&#34;task-budgets-2026-03&#34;</span>},
</span></span><span style="display:flex;"><span>        thinking<span style="color:#f92672">=</span>{<span style="color:#e6db74">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;enabled&#34;</span>, <span style="color:#e6db74">&#34;effort&#34;</span>: <span style="color:#e6db74">&#34;xhigh&#34;</span>},
</span></span><span style="display:flex;"><span>        output_config<span style="color:#f92672">=</span>{<span style="color:#e6db74">&#34;task_budget_tokens&#34;</span>: budget},
</span></span><span style="display:flex;"><span>        messages<span style="color:#f92672">=</span>[{<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;user&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: task}]
</span></span><span style="display:flex;"><span>    )
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> {
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;response&#34;</span>: response<span style="color:#f92672">.</span>content,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;budget_used&#34;</span>: getattr(response<span style="color:#f92672">.</span>usage, <span style="color:#e6db74">&#34;task_budget_tokens_used&#34;</span>, <span style="color:#66d9ef">None</span>),
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;stop_reason&#34;</span>: response<span style="color:#f92672">.</span>stop_reason
</span></span><span style="display:flex;"><span>    }
</span></span></code></pre></div><h3 id="streaming-with-extended-thinking">Streaming with Extended Thinking</h3>
<p>When streaming Opus 4.7 responses with thinking enabled, thinking blocks arrive before text blocks. Filter them correctly:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#66d9ef">with</span> client<span style="color:#f92672">.</span>messages<span style="color:#f92672">.</span>stream(
</span></span><span style="display:flex;"><span>    model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;claude-opus-4-7&#34;</span>,
</span></span><span style="display:flex;"><span>    max_tokens<span style="color:#f92672">=</span><span style="color:#ae81ff">4096</span>,
</span></span><span style="display:flex;"><span>    thinking<span style="color:#f92672">=</span>{<span style="color:#e6db74">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;enabled&#34;</span>, <span style="color:#e6db74">&#34;effort&#34;</span>: <span style="color:#e6db74">&#34;high&#34;</span>},
</span></span><span style="display:flex;"><span>    messages<span style="color:#f92672">=</span>[{<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;user&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: <span style="color:#e6db74">&#34;Explain this architecture...&#34;</span>}]
</span></span><span style="display:flex;"><span>) <span style="color:#66d9ef">as</span> stream:
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">for</span> event <span style="color:#f92672">in</span> stream:
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">if</span> hasattr(event, <span style="color:#e6db74">&#39;type&#39;</span>):
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">if</span> event<span style="color:#f92672">.</span>type <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;content_block_start&#34;</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#66d9ef">if</span> event<span style="color:#f92672">.</span>content_block<span style="color:#f92672">.</span>type <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;thinking&#34;</span>:
</span></span><span style="display:flex;"><span>                    print(<span style="color:#e6db74">&#34;[Thinking block started]&#34;</span>)
</span></span><span style="display:flex;"><span>                <span style="color:#66d9ef">elif</span> event<span style="color:#f92672">.</span>content_block<span style="color:#f92672">.</span>type <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;text&#34;</span>:
</span></span><span style="display:flex;"><span>                    print(<span style="color:#e6db74">&#34;[Response started]&#34;</span>)
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">elif</span> event<span style="color:#f92672">.</span>type <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;content_block_delta&#34;</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#66d9ef">if</span> hasattr(event<span style="color:#f92672">.</span>delta, <span style="color:#e6db74">&#39;text&#39;</span>):
</span></span><span style="display:flex;"><span>                    print(event<span style="color:#f92672">.</span>delta<span style="color:#f92672">.</span>text, end<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;&#34;</span>, flush<span style="color:#f92672">=</span><span style="color:#66d9ef">True</span>)
</span></span></code></pre></div><h2 id="benchmarks-and-performance-vs-gpt-55-and-gemini-31-pro">Benchmarks and Performance vs. GPT-5.5 and Gemini 3.1 Pro</h2>
<p>Claude Opus 4.7 holds the top position on SWE-bench Verified at 87.6%, ahead of GPT-5.5 (approximately 83% based on public leaderboard data) and Gemini 3.1 Pro (approximately 79%). On SWE-bench Pro — a harder variant requiring changes across multiple files — Opus 4.7 scores 64.3%; competitors have not yet published Pro scores, so no direct comparison is available there. The CursorBench result of 70% is particularly notable because it measures performance on real-world developer tasks drawn from actual Cursor IDE sessions, not synthetic benchmarks. On context windows, Opus 4.7 and GPT-5.5 both support 1M tokens, while Gemini 3.1 Pro extends to 2M. On pricing, GPT-5.5 costs approximately $15/M input and $60/M output — roughly 3x the price of Opus 4.7 at $5/$25. Gemini 3.1 Pro is roughly equivalent to Opus 4.7 on input but higher on output. The practical conclusion for most developer teams: Opus 4.7 delivers the best coding benchmark results at mid-range frontier pricing.</p>
<table>
  <thead>
      <tr>
          <th>Model</th>
          <th>SWE-bench Verified</th>
          <th>Input $/M</th>
          <th>Output $/M</th>
          <th>Context</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Claude Opus 4.7</td>
          <td>87.6%</td>
          <td>$5</td>
          <td>$25</td>
          <td>1M</td>
      </tr>
      <tr>
          <td>GPT-5.5</td>
          <td>~83%</td>
          <td>$15</td>
          <td>$60</td>
          <td>1M</td>
      </tr>
      <tr>
          <td>Gemini 3.1 Pro</td>
          <td>~79%</td>
          <td>~$5</td>
          <td>~$30</td>
          <td>2M</td>
      </tr>
  </tbody>
</table>
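<p>To turn the table above into per-request dollar figures, a quick back-of-the-envelope helper is enough. This is a sketch using the published rates from the table; the token counts in the example are illustrative, and real Opus 4.7 counts will be inflated by the new tokenizer, covered in the next section.</p>

```python
# Rough per-request cost from the comparison table's rates.
# Prices are USD per million tokens; the example token counts are illustrative.
PRICES = {
    "claude-opus-4-7": (5.0, 25.0),
    "gpt-5.5": (15.0, 60.0),
    "gemini-3.1-pro": (5.0, 30.0),  # approximate, per the table
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single request at the table's rates."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Example: a 10k-input / 2k-output request across all three models
for model in PRICES:
    print(f"{model}: ${request_cost(model, 10_000, 2_000):.2f}")
# claude-opus-4-7: $0.10
# gpt-5.5: $0.27
# gemini-3.1-pro: $0.11
```

At these illustrative counts the 3x list-price gap against GPT-5.5 translates directly into a 2.7x per-request gap.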
<h2 id="real-cost-analysis-the-tokenizer-change-and-what-it-means">Real Cost Analysis: The Tokenizer Change and What It Means</h2>
<p>The Opus 4.7 tokenizer is a significant hidden cost factor that Anthropic&rsquo;s &ldquo;unchanged pricing&rdquo; announcement underemphasizes. The new tokenizer encodes English prose using up to 35% more tokens than the Opus 4.6 tokenizer — which means the same input text costs 35% more to process, even though the per-token price is unchanged. The cause is a shift toward a more granular tokenization strategy that improves model reasoning on code and structured data but increases token counts for natural language. The financial impact compounds across both input and output: a 10,000-token Opus 4.6 prompt may become a 13,500-token Opus 4.7 prompt, and the model&rsquo;s longer internal reasoning chains (from adaptive thinking) further increase output token counts. For teams running high-volume APIs — batch processing, document analysis, code review pipelines — expect effective cost increases of 25–45% when migrating from Opus 4.6, even if the per-token price looks the same. To measure your actual exposure: run 100 representative prompts through the Anthropic token counter API against both model IDs before migrating, and compute the average token ratio. Use this ratio to adjust your cost projections and rate-limit configurations before full rollout.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#75715e"># Measure tokenizer difference before migrating</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">compare_tokenization</span>(text: str) <span style="color:#f92672">-&gt;</span> dict:
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Uses the count_tokens endpoint (counts tokens without generating a message)</span>
</span></span><span style="display:flex;"><span>    result_46 <span style="color:#f92672">=</span> client<span style="color:#f92672">.</span>messages<span style="color:#f92672">.</span>count_tokens(
</span></span><span style="display:flex;"><span>        model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;claude-opus-4-6-20251201&#34;</span>,
</span></span><span style="display:flex;"><span>        messages<span style="color:#f92672">=</span>[{<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;user&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: text}]
</span></span><span style="display:flex;"><span>    )
</span></span><span style="display:flex;"><span>    result_47 <span style="color:#f92672">=</span> client<span style="color:#f92672">.</span>messages<span style="color:#f92672">.</span>count_tokens(
</span></span><span style="display:flex;"><span>        model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;claude-opus-4-7&#34;</span>,
</span></span><span style="display:flex;"><span>        messages<span style="color:#f92672">=</span>[{<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;user&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: text}]
</span></span><span style="display:flex;"><span>    )
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> {
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;opus_4_6_tokens&#34;</span>: result_46<span style="color:#f92672">.</span>input_tokens,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;opus_4_7_tokens&#34;</span>: result_47<span style="color:#f92672">.</span>input_tokens,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;ratio&#34;</span>: result_47<span style="color:#f92672">.</span>input_tokens <span style="color:#f92672">/</span> result_46<span style="color:#f92672">.</span>input_tokens
</span></span><span style="display:flex;"><span>    }
</span></span></code></pre></div><h2 id="when-to-use-opus-47-vs-sonnet-46">When to Use Opus 4.7 vs. Sonnet 4.6</h2>
<p>Model selection between Opus 4.7 and Sonnet 4.6 comes down to three factors: task complexity, latency tolerance, and cost budget. Opus 4.7 wins on any task that requires multi-file reasoning, long planning chains, or vision analysis — its performance gap on complex coding tasks is large enough to justify the price difference. Sonnet 4.6 wins when you need fast responses under 2 seconds, when the task is straightforward (single-function generation, classification, extraction), or when you&rsquo;re running high-volume pipelines where cost per call matters more than peak performance. A practical segmentation for most teams: use Sonnet 4.6 as the default for your product&rsquo;s real-time interactive features (chat, autocomplete, Q&amp;A), and reserve Opus 4.7 for background agentic tasks (automated code review, batch analysis, autonomous PR generation). This two-tier approach typically reduces inference spend by 60–70% compared to using Opus 4.7 everywhere, while preserving top-tier quality on the tasks that actually benefit from it. If you&rsquo;re already using Claude Code, xhigh effort on Opus 4.7 is the right default for coding agents and there&rsquo;s no reason to downgrade to Sonnet for agentic sessions — the task budget feature makes the cost predictable.</p>
<table>
  <thead>
      <tr>
          <th>Scenario</th>
          <th>Recommended Model</th>
          <th>Effort</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Autonomous coding agent</td>
          <td>Opus 4.7</td>
          <td>xhigh + task budget</td>
      </tr>
      <tr>
          <td>Code review (batch)</td>
          <td>Opus 4.7</td>
          <td>high</td>
      </tr>
      <tr>
          <td>Real-time chat</td>
          <td>Sonnet 4.6</td>
          <td>n/a</td>
      </tr>
      <tr>
          <td>Document Q&amp;A</td>
          <td>Sonnet 4.6</td>
          <td>n/a</td>
      </tr>
      <tr>
          <td>Complex architecture design</td>
          <td>Opus 4.7</td>
          <td>max</td>
      </tr>
      <tr>
          <td>Single-function generation</td>
          <td>Sonnet 4.6</td>
          <td>n/a</td>
      </tr>
  </tbody>
</table>
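<p>The two-tier split above can be wired up as a trivial router. This is a minimal sketch: the scenario labels mirror the table, but the mapping itself is an assumption to tune against your own workload, and the returned dict is just the keyword arguments for <code>client.messages.create</code>.</p>

```python
# Route each request to a model/effort tier per the scenario table above.
# The scenario -> (model, effort) mapping is illustrative, not official guidance.
ROUTES = {
    "coding_agent": ("claude-opus-4-7", "xhigh"),
    "code_review_batch": ("claude-opus-4-7", "high"),
    "architecture_design": ("claude-opus-4-7", "max"),
    "realtime_chat": ("claude-sonnet-4-6", None),
    "document_qa": ("claude-sonnet-4-6", None),
    "single_function": ("claude-sonnet-4-6", None),
}

def build_request(scenario: str, prompt: str) -> dict:
    """Build create() kwargs; unknown scenarios fall back to Sonnet."""
    model, effort = ROUTES.get(scenario, ("claude-sonnet-4-6", None))
    kwargs = {
        "model": model,
        "max_tokens": 4096,
        "messages": [{"role": "user", "content": prompt}],
    }
    if effort is not None:
        kwargs["thinking"] = {"type": "enabled", "effort": effort}
    return kwargs  # pass to client.messages.create(**kwargs)
```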
<hr>
<h2 id="faq">FAQ</h2>
<p><strong>Q: What is the model ID for Claude Opus 4.7?</strong>
The model ID is <code>claude-opus-4-7</code> — no date suffix. Using the old pattern <code>claude-opus-4-7-20260416</code> will return a 400 error. Update your model strings before migrating.</p>
<p><strong>Q: Does Claude Opus 4.7 still support <code>budget_tokens</code> in the thinking config?</strong>
No. The <code>budget_tokens</code> field was removed as a breaking change. Replace <code>thinking.budget_tokens</code> with <code>thinking.effort</code> using one of the five string values: <code>low</code>, <code>medium</code>, <code>high</code>, <code>xhigh</code>, or <code>max</code>. Requests with <code>budget_tokens</code> will return a 400 validation error.</p>
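<p>A small shim can ease the migration by mapping old numeric budgets onto the new effort strings. The thresholds below are illustrative guesses, not an official mapping published by Anthropic.</p>

```python
# Convert an Opus 4.6 thinking config to the Opus 4.7 effort-based form.
# The threshold-to-effort mapping is a guess; tune it against your workloads.
def migrate_thinking(cfg: dict) -> dict:
    budget = cfg.get("budget_tokens")
    if budget is None:
        return cfg  # already effort-based; nothing to do
    if budget < 8_000:
        effort = "low"
    elif budget < 24_000:
        effort = "medium"
    elif budget < 48_000:
        effort = "high"
    else:
        effort = "xhigh"
    return {"type": "enabled", "effort": effort}

print(migrate_thinking({"type": "enabled", "budget_tokens": 32_000}))
# {'type': 'enabled', 'effort': 'high'}
```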
<p><strong>Q: What is the minimum task budget for Claude Opus 4.7?</strong>
The minimum <code>task_budget_tokens</code> value is 20,000. Anthropic recommends 50,000–128,000 tokens for agentic coding sessions. You must also include the <code>anthropic-beta: task-budgets-2026-03</code> request header; without it, the <code>output_config</code> block is ignored.</p>
<p><strong>Q: Why is my Opus 4.7 bill higher even though pricing is the same?</strong>
The new tokenizer encodes the same text using up to 35% more tokens than the Opus 4.6 tokenizer. Since you pay per token, the same prompts cost more. Use the <code>messages.count_tokens</code> endpoint to measure the ratio for your specific workloads before full migration.</p>
<p><strong>Q: When should I use <code>xhigh</code> versus <code>max</code> effort?</strong>
Use <code>xhigh</code> for production agentic tasks where you want high-quality multi-step reasoning at a predictable cost. Use <code>max</code> only for research-grade tasks — proof generation, exhaustive security audits, or novel architecture design — where answer quality is the top priority and you&rsquo;ve explicitly decided cost is secondary. Max effort has no internal token cap on thinking, so costs are unpredictable on long tasks without a task budget.</p>
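<p>If you do reach for <code>max</code>, the safe pattern per the guidance above is to never send it without a task budget. A sketch of a guard that builds the request arguments, enforcing the 20,000-token minimum; the 120k default cap is an arbitrary example, not a recommended value.</p>

```python
# Build kwargs for a max-effort request that always carries a task budget cap.
# The 120k default cap is an arbitrary example, not a recommended value.
def max_effort_request(task: str, budget_cap: int = 120_000) -> dict:
    if budget_cap < 20_000:
        raise ValueError("task_budget_tokens minimum is 20,000")
    return {
        "model": "claude-opus-4-7",
        "max_tokens": 4096,
        "extra_headers": {"anthropic-beta": "task-budgets-2026-03"},
        "thinking": {"type": "enabled", "effort": "max"},
        "output_config": {"task_budget_tokens": budget_cap},
        "messages": [{"role": "user", "content": task}],
    }
```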
]]></content:encoded></item></channel></rss>