<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Reasoning on RockB</title><link>https://baeseokjae.github.io/tags/reasoning/</link><description>Recent content in Reasoning on RockB</description><image><title>RockB</title><url>https://baeseokjae.github.io/images/og-default.png</url><link>https://baeseokjae.github.io/images/og-default.png</link></image><generator>Hugo</generator><language>en-us</language><lastBuildDate>Thu, 30 Apr 2026 09:04:26 +0000</lastBuildDate><atom:link href="https://baeseokjae.github.io/tags/reasoning/index.xml" rel="self" type="application/rss+xml"/><item><title>GPT-5.4 API Developer Guide 2026: 1M Context, Computer Use, and 5 Reasoning Levels</title><link>https://baeseokjae.github.io/posts/gpt-5-4-api-developer-guide-2026/</link><pubDate>Thu, 30 Apr 2026 09:04:26 +0000</pubDate><guid>https://baeseokjae.github.io/posts/gpt-5-4-api-developer-guide-2026/</guid><description>Complete GPT-5.4 API guide: 1M token context, 5 reasoning effort levels, native computer use, pricing tiers, and migration from gpt-4o/gpt-5.2.</description><content:encoded><![CDATA[<p>GPT-5.4 is OpenAI&rsquo;s most capable general-purpose model as of 2026, combining a 1,050,000-token context window, native computer use at 75% OSWorld accuracy, and five tunable reasoning effort levels in a single Chat Completions API drop-in. Released March 5, 2026, it replaces gpt-5.2 for most production workloads with no endpoint change required.</p>
<h2 id="what-is-gpt-54-release-date-model-variants-and-whats-new">What Is GPT-5.4? Release Date, Model Variants, and What&rsquo;s New</h2>
<p>GPT-5.4 is OpenAI&rsquo;s flagship general-purpose language model released on March 5, 2026, and it represents the first mainline model to combine frontier reasoning, native computer control, and a 1-million-token context window in a single architecture. Unlike earlier specialized variants — o3 for reasoning or gpt-5.2 for general use — GPT-5.4 integrates GPT-5.3-codex coding capabilities directly, making it a unified backbone for agentic, analytical, and conversational workloads. On launch day, it scored 75.0% on the OSWorld-Verified computer use benchmark, surpassing the human expert baseline of 72.4% — a first for any general-purpose model. On knowledge work (GDPval), GPT-5.4 matches or outperforms industry professionals in 83% of comparisons across 44 occupations. There are three production variants. The two flagship options are <strong>gpt-5.4</strong> (the standard model, priced at $2.50/$15 per million input/output tokens) and <strong>gpt-5.4-pro</strong> (optimized for high-stakes enterprise tasks at $30/$180 per million input/output tokens); a lighter <strong>gpt-5.4-mini</strong> rounds out the lineup and is covered below. The two flagship variants share the same API surface and context window; the pro variant allocates more compute budget per inference by default.</p>
<h3 id="model-variants-at-a-glance">Model Variants at a Glance</h3>
<table>
  <thead>
      <tr>
          <th>Model</th>
          <th>Input Price</th>
          <th>Output Price</th>
          <th>Best For</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>gpt-5.4</td>
          <td>$2.50/M tokens</td>
          <td>$15/M tokens</td>
          <td>General API workloads, agents</td>
      </tr>
      <tr>
          <td>gpt-5.4-pro</td>
          <td>$30/M tokens</td>
          <td>$180/M tokens</td>
          <td>High-stakes enterprise, legal, finance</td>
      </tr>
      <tr>
          <td>gpt-5.4-mini</td>
          <td>See pricing page</td>
          <td>See pricing page</td>
          <td>Cost-sensitive high-volume use</td>
      </tr>
  </tbody>
</table>
<p>The <strong>gpt-5.4-mini</strong> variant is aimed at cost-sensitive, high-volume tasks like classification, extraction, and routing — where full reasoning headroom is unnecessary. For the majority of production API use, gpt-5.4 with tuned <code>reasoning.effort</code> is the right starting point.</p>
<h2 id="gpt-54-api-access-endpoints-authentication-and-drop-in-migration">GPT-5.4 API Access: Endpoints, Authentication, and Drop-In Migration</h2>
<p>GPT-5.4 is a drop-in replacement for any Chat Completions API caller — you change the <code>model</code> string and gain new capabilities immediately. The endpoint, authentication headers, message format, and streaming protocol are identical to gpt-4o and gpt-5.2. Organizations already on gpt-4o can migrate in under five minutes by updating a single environment variable. For gpt-5.2 users, the migration is equally straightforward: all previous parameters (<code>temperature</code>, <code>top_p</code>, <code>tool_choice</code>, <code>response_format</code>) continue to work unchanged. New GPT-5.4-specific parameters (<code>reasoning.effort</code>, <code>computer_use</code>) are optional — omitting them returns behavior equivalent to a standard chat completion. This backward compatibility is intentional: OpenAI&rsquo;s published migration guide classifies gpt-5.4 as a drop-in upgrade, not a breaking API change. Authentication uses the same <code>Authorization: Bearer sk-...</code> header, and OpenAI SDK versions ≥ 2.0.0 include native GPT-5.4 support.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">from</span> openai <span style="color:#f92672">import</span> OpenAI
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>client <span style="color:#f92672">=</span> OpenAI()  <span style="color:#75715e"># reads OPENAI_API_KEY from environment</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>response <span style="color:#f92672">=</span> client<span style="color:#f92672">.</span>chat<span style="color:#f92672">.</span>completions<span style="color:#f92672">.</span>create(
</span></span><span style="display:flex;"><span>    model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;gpt-5.4&#34;</span>,           <span style="color:#75715e"># only change needed from gpt-4o</span>
</span></span><span style="display:flex;"><span>    messages<span style="color:#f92672">=</span>[
</span></span><span style="display:flex;"><span>        {<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;user&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: <span style="color:#e6db74">&#34;Explain the 1M context window use cases.&#34;</span>}
</span></span><span style="display:flex;"><span>    ]
</span></span><span style="display:flex;"><span>)
</span></span><span style="display:flex;"><span>print(response<span style="color:#f92672">.</span>choices[<span style="color:#ae81ff">0</span>]<span style="color:#f92672">.</span>message<span style="color:#f92672">.</span>content)
</span></span></code></pre></div><p>For teams using the Node.js SDK:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-typescript" data-lang="typescript"><span style="display:flex;"><span><span style="color:#66d9ef">import</span> <span style="color:#a6e22e">OpenAI</span> <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#34;openai&#34;</span>;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">const</span> <span style="color:#a6e22e">client</span> <span style="color:#f92672">=</span> <span style="color:#66d9ef">new</span> <span style="color:#a6e22e">OpenAI</span>();
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">const</span> <span style="color:#a6e22e">response</span> <span style="color:#f92672">=</span> <span style="color:#66d9ef">await</span> <span style="color:#a6e22e">client</span>.<span style="color:#a6e22e">chat</span>.<span style="color:#a6e22e">completions</span>.<span style="color:#a6e22e">create</span>({
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">model</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#34;gpt-5.4&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">messages</span><span style="color:#f92672">:</span> [{ <span style="color:#a6e22e">role</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#34;user&#34;</span>, <span style="color:#a6e22e">content</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#34;What changed in GPT-5.4?&#34;</span> }],
</span></span><span style="display:flex;"><span>});
</span></span></code></pre></div><p>Both examples require zero structural changes beyond the model name.</p>
<h2 id="the-5-reasoning-effort-levels-explained-none-to-xhigh-with-code-examples">The 5 Reasoning Effort Levels Explained (none to xhigh) with Code Examples</h2>
<p>The <code>reasoning.effort</code> parameter is GPT-5.4&rsquo;s most important new knob for production developers, offering five distinct levels — <code>none</code>, <code>low</code>, <code>medium</code>, <code>high</code>, and <code>xhigh</code> — that directly control the model&rsquo;s internal chain-of-thought budget, latency, and per-token cost. At <code>none</code>, the model skips explicit reasoning steps and behaves like a fast instruct model; at <code>xhigh</code>, it allocates maximum compute to multi-step reasoning, which can cost 3–5x more than the same request at <code>low</code>. This replaces the blunt <code>temperature</code> dial for controlling response quality: you now tune <em>reasoning depth</em> rather than sampling randomness. For most production routing tasks, <code>low</code> or <code>medium</code> is sufficient. For code synthesis, complex analysis, or agentic planning where accuracy is critical, <code>high</code> or <code>xhigh</code> pays for itself by reducing error-correction loops. OpenAI benchmarks show that <code>xhigh</code> produces measurably better SWE-bench Pro results (57.7%) than <code>medium</code> or <code>low</code> settings on the same model weights.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#75715e"># Classification / routing — low effort, fast and cheap</span>
</span></span><span style="display:flex;"><span>response <span style="color:#f92672">=</span> client<span style="color:#f92672">.</span>chat<span style="color:#f92672">.</span>completions<span style="color:#f92672">.</span>create(
</span></span><span style="display:flex;"><span>    model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;gpt-5.4&#34;</span>,
</span></span><span style="display:flex;"><span>    reasoning<span style="color:#f92672">=</span>{<span style="color:#e6db74">&#34;effort&#34;</span>: <span style="color:#e6db74">&#34;low&#34;</span>},
</span></span><span style="display:flex;"><span>    messages<span style="color:#f92672">=</span>[{<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;user&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: <span style="color:#e6db74">&#34;Classify this support ticket: &#39;My invoice is wrong&#39;&#34;</span>}]
</span></span><span style="display:flex;"><span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Complex code synthesis — xhigh effort</span>
</span></span><span style="display:flex;"><span>response <span style="color:#f92672">=</span> client<span style="color:#f92672">.</span>chat<span style="color:#f92672">.</span>completions<span style="color:#f92672">.</span>create(
</span></span><span style="display:flex;"><span>    model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;gpt-5.4&#34;</span>,
</span></span><span style="display:flex;"><span>    reasoning<span style="color:#f92672">=</span>{<span style="color:#e6db74">&#34;effort&#34;</span>: <span style="color:#e6db74">&#34;xhigh&#34;</span>},
</span></span><span style="display:flex;"><span>    messages<span style="color:#f92672">=</span>[{<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;user&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: <span style="color:#e6db74">&#34;Refactor this 800-line Python module to async...&#34;</span>}]
</span></span><span style="display:flex;"><span>)
</span></span></code></pre></div><h3 id="choosing-the-right-effort-level">Choosing the Right Effort Level</h3>
<table>
  <thead>
      <tr>
          <th>Task Type</th>
          <th>Recommended Effort</th>
          <th>Relative Cost</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Classification, routing, extraction</td>
          <td>none / low</td>
          <td>1x</td>
      </tr>
      <tr>
          <td>Summarization, Q&amp;A over documents</td>
          <td>medium</td>
          <td>1.5–2x</td>
      </tr>
      <tr>
          <td>Code review, debugging, analysis</td>
          <td>high</td>
          <td>2–3x</td>
      </tr>
      <tr>
          <td>Complex agentic planning, synthesis</td>
          <td>xhigh</td>
          <td>3–5x</td>
      </tr>
  </tbody>
</table>
<p>A practical production strategy: default to <code>medium</code>, then gate <code>xhigh</code> behind a fallback path triggered when <code>medium</code> returns low-confidence outputs. This pattern cuts mean inference cost by 40–60% compared to running all requests at <code>xhigh</code>.</p>
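<p>A minimal sketch of that gating pattern follows. The self-reported confidence field here is an illustrative prompt convention, not an API feature; substitute whatever confidence signal your application already produces:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">import json

def answer_with_fallback(messages: list) -&gt; str:
    # First pass at medium effort; the prompt asks the model to self-report
    # confidence as JSON. This convention lives in the prompt, not the API.
    probe = messages + [{
        &#34;role&#34;: &#34;system&#34;,
        &#34;content&#34;: &#34;Respond as JSON: {\&#34;answer\&#34;: \&#34;...\&#34;, \&#34;confidence\&#34;: 0.0-1.0}&#34;,
    }]
    first = client.chat.completions.create(
        model=&#34;gpt-5.4&#34;,
        reasoning={&#34;effort&#34;: &#34;medium&#34;},
        response_format={&#34;type&#34;: &#34;json_object&#34;},
        messages=probe,
    )
    result = json.loads(first.choices[0].message.content)
    if result.get(&#34;confidence&#34;, 0.0) &gt;= 0.8:
        return str(result.get(&#34;answer&#34;, &#34;&#34;))
    # Low confidence: escalate the identical request to maximum reasoning depth.
    retry = client.chat.completions.create(
        model=&#34;gpt-5.4&#34;,
        reasoning={&#34;effort&#34;: &#34;xhigh&#34;},
        messages=messages,
    )
    return retry.choices[0].message.content
</code></pre></div>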
<h3 id="latency-expectations-by-effort-level">Latency Expectations by Effort Level</h3>
<p><code>none</code> and <code>low</code> typically return first tokens in under 1 second for short prompts. <code>xhigh</code> on a complex prompt may take 10–30 seconds before streaming begins. Design your UX accordingly — streaming with <code>stream: true</code> is strongly recommended for <code>high</code> and <code>xhigh</code> requests so users see incremental output.</p>
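<p>Streaming uses the standard Chat Completions pattern; the only new element is the <code>reasoning</code> parameter:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">stream = client.chat.completions.create(
    model=&#34;gpt-5.4&#34;,
    reasoning={&#34;effort&#34;: &#34;high&#34;},
    stream=True,
    messages=[{&#34;role&#34;: &#34;user&#34;, &#34;content&#34;: &#34;Audit this module for race conditions...&#34;}],
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:  # role and finish chunks carry no content
        print(delta, end=&#34;&#34;, flush=True)
</code></pre></div>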
<h2 id="1-million-token-context-window-how-it-works-and-when-to-use-it">1 Million Token Context Window: How It Works and When to Use It</h2>
<p>GPT-5.4 supports up to 1,050,000 input tokens in a single request, making it the first general-purpose model capable of ingesting an entire large codebase, a full book, or months of agent conversation history in one API call. The context window is priced in two tiers: input tokens up to 272,000 are billed at the standard rate ($2.50/M for gpt-5.4), while input tokens beyond that threshold are billed at 2x, with output for such requests billed at 1.5x. This means loading a 500K-token codebase costs roughly $1.82 in input fees ($0.68 for the first 272K tokens plus $1.14 for the remaining 228K) — a viable alternative to building complex chunking and retrieval pipelines for many teams. The practical effect is that tasks previously requiring RAG infrastructure (embedding databases, chunking logic, retrieval tuning) can now be handled with a single structured prompt. That said, long-context requests still increase latency proportionally, and the model&rsquo;s effective attention over very long inputs degrades for needles buried deep in a haystack — retrieval-augmented approaches remain superior for precise lookup across millions of documents.</p>
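<p>Because the surcharge kicks in at a hard threshold, it is worth counting tokens client-side before sending a large prompt. A sketch with <code>tiktoken</code>, assuming gpt-5.4 tokenizes with the <code>o200k_base</code> encoding (verify against your tiktoken version&rsquo;s model table):</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">import tiktoken

# Assumption: gpt-5.4 uses the o200k_base encoding; verify before
# trusting these counts for billing estimates.
enc = tiktoken.get_encoding(&#34;o200k_base&#34;)

LONG_CONTEXT_THRESHOLD = 272_000  # surcharge applies past this input size

def count_and_warn(prompt: str) -&gt; int:
    n_tokens = len(enc.encode(prompt))
    if n_tokens &gt; LONG_CONTEXT_THRESHOLD:
        print(f&#34;Warning: {n_tokens:,} input tokens, 2x long-context rate applies&#34;)
    return n_tokens
</code></pre></div>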
<h3 id="practical-1m-context-use-cases">Practical 1M Context Use Cases</h3>
<p><strong>Full codebase analysis:</strong> Load an entire monorepo (300K–600K tokens) and ask GPT-5.4 to trace a bug across files, identify dead code, or generate a migration plan. No chunking, no embedding index.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> os
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Load all Python files in a repo</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">load_repo</span>(path: str) <span style="color:#f92672">-&gt;</span> str:
</span></span><span style="display:flex;"><span>    files <span style="color:#f92672">=</span> []
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">for</span> root, _, fnames <span style="color:#f92672">in</span> os<span style="color:#f92672">.</span>walk(path):
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">for</span> fname <span style="color:#f92672">in</span> fnames:
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">if</span> fname<span style="color:#f92672">.</span>endswith(<span style="color:#e6db74">&#34;.py&#34;</span>):
</span></span><span style="display:flex;"><span>                fpath <span style="color:#f92672">=</span> os<span style="color:#f92672">.</span>path<span style="color:#f92672">.</span>join(root, fname)
</span></span><span style="display:flex;"><span>                content <span style="color:#f92672">=</span> open(fpath)<span style="color:#f92672">.</span>read()
</span></span><span style="display:flex;"><span>                files<span style="color:#f92672">.</span>append(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;### </span><span style="color:#e6db74">{</span>fpath<span style="color:#e6db74">}</span><span style="color:#ae81ff">\n</span><span style="color:#e6db74">{</span>content<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> <span style="color:#e6db74">&#34;</span><span style="color:#ae81ff">\n\n</span><span style="color:#e6db74">&#34;</span><span style="color:#f92672">.</span>join(files)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>repo_content <span style="color:#f92672">=</span> load_repo(<span style="color:#e6db74">&#34;./my-project&#34;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>response <span style="color:#f92672">=</span> client<span style="color:#f92672">.</span>chat<span style="color:#f92672">.</span>completions<span style="color:#f92672">.</span>create(
</span></span><span style="display:flex;"><span>    model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;gpt-5.4&#34;</span>,
</span></span><span style="display:flex;"><span>    messages<span style="color:#f92672">=</span>[
</span></span><span style="display:flex;"><span>        {<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;system&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: <span style="color:#e6db74">&#34;You are a senior code reviewer.&#34;</span>},
</span></span><span style="display:flex;"><span>        {<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;user&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: <span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Review this codebase for security issues:</span><span style="color:#ae81ff">\n\n</span><span style="color:#e6db74">{</span>repo_content<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>}
</span></span><span style="display:flex;"><span>    ]
</span></span><span style="display:flex;"><span>)
</span></span></code></pre></div><p><strong>Long agent session replay:</strong> Persist full conversation history across days or weeks of an agentic workflow and let GPT-5.4 reason over its own prior decisions. This eliminates the need for external memory stores in many agent architectures.</p>
<p><strong>Multi-document contract analysis:</strong> Load 50+ contract PDFs (converted to text) in one request and ask the model to identify conflicting clauses across documents — a task that was previously impractical without custom orchestration.</p>
<h3 id="when-not-to-use-1m-context">When Not to Use 1M Context</h3>
<ul>
<li>When you need sub-second latency — large contexts add seconds of prefill time</li>
<li>When the same document set is queried repeatedly — a vector index is more economical than re-sending 500K tokens each time</li>
<li>When the relevant information is localized — use targeted retrieval and a shorter context for 10x lower cost</li>
</ul>
<h2 id="native-computer-use-api-architecture-osworld-75-benchmark-and-getting-started">Native Computer Use API: Architecture, OSWorld 75% Benchmark, and Getting Started</h2>
<p>GPT-5.4&rsquo;s computer use capability is a first-class API feature that enables the model to observe a desktop environment through screenshots and issue mouse clicks, keyboard input, and application commands — without requiring custom automation code like Selenium or Playwright. At 75.0% on the OSWorld-Verified benchmark (surpassing the 72.4% human expert baseline), GPT-5.4 is the most capable general-purpose model for desktop automation as of 2026. The architecture is straightforward: you send the model a screenshot (base64 or URL), describe a task, and receive a structured action response that your client executes. The model then receives the updated screenshot and continues the loop until the task is complete. GPT-5.4 processes screenshots at full resolution with native vision — no downscaling or OCR preprocessing required — which significantly improves accuracy on small UI elements like checkboxes, dropdown menus, and text fields that trip up lower-resolution approaches.</p>
<h3 id="computer-use-api-quick-start">Computer Use API Quick Start</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> base64
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> PIL <span style="color:#f92672">import</span> ImageGrab
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">get_screenshot</span>() <span style="color:#f92672">-&gt;</span> str:
</span></span><span style="display:flex;"><span>    img <span style="color:#f92672">=</span> ImageGrab<span style="color:#f92672">.</span>grab()
</span></span><span style="display:flex;"><span>    img<span style="color:#f92672">.</span>save(<span style="color:#e6db74">&#34;/tmp/screen.png&#34;</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">with</span> open(<span style="color:#e6db74">&#34;/tmp/screen.png&#34;</span>, <span style="color:#e6db74">&#34;rb&#34;</span>) <span style="color:#66d9ef">as</span> f:
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> base64<span style="color:#f92672">.</span>b64encode(f<span style="color:#f92672">.</span>read())<span style="color:#f92672">.</span>decode()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">computer_use_step</span>(screenshot_b64: str, task: str, history: list) <span style="color:#f92672">-&gt;</span> dict:
</span></span><span style="display:flex;"><span>    response <span style="color:#f92672">=</span> client<span style="color:#f92672">.</span>chat<span style="color:#f92672">.</span>completions<span style="color:#f92672">.</span>create(
</span></span><span style="display:flex;"><span>        model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;gpt-5.4&#34;</span>,
</span></span><span style="display:flex;"><span>        tools<span style="color:#f92672">=</span>[{<span style="color:#e6db74">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;computer_use&#34;</span>}],
</span></span><span style="display:flex;"><span>        tool_choice<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;required&#34;</span>,
</span></span><span style="display:flex;"><span>        messages<span style="color:#f92672">=</span>history <span style="color:#f92672">+</span> [
</span></span><span style="display:flex;"><span>            {
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;user&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;content&#34;</span>: [
</span></span><span style="display:flex;"><span>                    {<span style="color:#e6db74">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;image_url&#34;</span>, <span style="color:#e6db74">&#34;image_url&#34;</span>: {<span style="color:#e6db74">&#34;url&#34;</span>: <span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;data:image/png;base64,</span><span style="color:#e6db74">{</span>screenshot_b64<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>}},
</span></span><span style="display:flex;"><span>                    {<span style="color:#e6db74">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;text&#34;</span>, <span style="color:#e6db74">&#34;text&#34;</span>: task}
</span></span><span style="display:flex;"><span>                ]
</span></span><span style="display:flex;"><span>            }
</span></span><span style="display:flex;"><span>        ]
</span></span><span style="display:flex;"><span>    )
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> response<span style="color:#f92672">.</span>choices[<span style="color:#ae81ff">0</span>]<span style="color:#f92672">.</span>message
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Example: automate a browser form</span>
</span></span><span style="display:flex;"><span>history <span style="color:#f92672">=</span> [{<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;system&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: <span style="color:#e6db74">&#34;You control a desktop. Complete tasks by clicking and typing.&#34;</span>}]
</span></span><span style="display:flex;"><span>task <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;Open Chrome, navigate to example.com, and fill in the contact form with test data.&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>screenshot <span style="color:#f92672">=</span> get_screenshot()
</span></span><span style="display:flex;"><span>action <span style="color:#f92672">=</span> computer_use_step(screenshot, task, history)
</span></span><span style="display:flex;"><span>print(action)  <span style="color:#75715e"># {&#34;tool_calls&#34;: [{&#34;type&#34;: &#34;computer_use&#34;, &#34;action&#34;: &#34;click&#34;, &#34;x&#34;: 450, &#34;y&#34;: 230}]}</span>
</span></span></code></pre></div><h3 id="what-computer-use-replaces">What Computer Use Replaces</h3>
<p>GPT-5.4&rsquo;s computer use scores above human baseline on OSWorld tasks that previously required brittle selector-based automation (Selenium, Playwright) or specialized RPA tools (UiPath, Automation Anywhere). For exploratory or exception-heavy workflows — where the UI state varies and hard-coded selectors break — the vision-driven approach is substantially more robust. For high-volume, predictable workflows with stable UIs, traditional automation remains faster and cheaper.</p>
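<p>Returning to the quick start: a full agent wraps <code>computer_use_step</code> in an observe-act loop. The sketch below assumes the action dictionary shape shown in the printed output above and uses <code>pyautogui</code> as the executor; both are illustrative choices, so adapt them to the official action schema and your own input-injection layer:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">import time

import pyautogui  # assumed executor; any input-injection library works

def run_agent(task: str, history: list, max_steps: int = 25) -&gt; None:
    # history is kept static here; a production loop would append each exchange
    for _ in range(max_steps):
        msg = computer_use_step(get_screenshot(), task, history)
        calls = getattr(msg, &#34;tool_calls&#34;, None)
        if not calls:  # no action requested: the model considers the task done
            print(&#34;Done:&#34;, msg.content)
            return
        for call in calls:  # action shape mirrors the printed example above
            if call[&#34;action&#34;] == &#34;click&#34;:
                pyautogui.click(call[&#34;x&#34;], call[&#34;y&#34;])
            elif call[&#34;action&#34;] == &#34;type&#34;:
                pyautogui.typewrite(call[&#34;text&#34;])
        time.sleep(1.0)  # let the UI settle before the next screenshot
    raise RuntimeError(&#34;step budget exhausted before the task completed&#34;)
</code></pre></div>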
<h2 id="full-tool-support-overview-function-calling-tool-search-mcp-hosted-shell">Full Tool Support Overview: Function Calling, Tool Search, MCP, Hosted Shell</h2>
<p>GPT-5.4 supports the complete OpenAI tool ecosystem: function calling (same syntax as gpt-4o), tool search for dynamic capability discovery, Model Context Protocol (MCP) for structured external integrations, hosted shell for sandboxed code execution, and computer use for desktop automation. All tools can be combined in a single request — the model reasons over which tools to invoke and in what order, making it a capable orchestrator for multi-step agentic pipelines. Tool search is new in gpt-5.4: rather than defining all available tools upfront, you can expose a tool catalog and let the model retrieve and call relevant tools dynamically, significantly reducing prompt size for large tool ecosystems.</p>
<p>Function calling syntax is unchanged from previous models:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>tools <span style="color:#f92672">=</span> [
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;function&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;function&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;get_weather&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;description&#34;</span>: <span style="color:#e6db74">&#34;Get current weather for a location&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;parameters&#34;</span>: {
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;object&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;properties&#34;</span>: {
</span></span><span style="display:flex;"><span>                    <span style="color:#e6db74">&#34;location&#34;</span>: {<span style="color:#e6db74">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;string&#34;</span>, <span style="color:#e6db74">&#34;description&#34;</span>: <span style="color:#e6db74">&#34;City name&#34;</span>},
</span></span><span style="display:flex;"><span>                    <span style="color:#e6db74">&#34;unit&#34;</span>: {<span style="color:#e6db74">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;string&#34;</span>, <span style="color:#e6db74">&#34;enum&#34;</span>: [<span style="color:#e6db74">&#34;celsius&#34;</span>, <span style="color:#e6db74">&#34;fahrenheit&#34;</span>]}
</span></span><span style="display:flex;"><span>                },
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;required&#34;</span>: [<span style="color:#e6db74">&#34;location&#34;</span>]
</span></span><span style="display:flex;"><span>            }
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>response <span style="color:#f92672">=</span> client<span style="color:#f92672">.</span>chat<span style="color:#f92672">.</span>completions<span style="color:#f92672">.</span>create(
</span></span><span style="display:flex;"><span>    model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;gpt-5.4&#34;</span>,
</span></span><span style="display:flex;"><span>    tools<span style="color:#f92672">=</span>tools,
</span></span><span style="display:flex;"><span>    messages<span style="color:#f92672">=</span>[{<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;user&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: <span style="color:#e6db74">&#34;What&#39;s the weather in Tokyo?&#34;</span>}]
</span></span><span style="display:flex;"><span>)
</span></span></code></pre></div><p>MCP integration allows GPT-5.4 to connect to external systems (databases, APIs, file systems) through a standardized protocol, enabling persistent agent sessions with consistent tool interfaces across turns.</p>
<h2 id="gpt-54-api-pricing-deep-dive-standard-vs-pro-vs-long-context-tiers">GPT-5.4 API Pricing Deep Dive: Standard vs. Pro vs. Long-Context Tiers</h2>
<p>GPT-5.4 pricing has three dimensions that developers must account for: model tier (standard vs. pro), context length (standard vs. long-context surcharge), and output volume. Standard gpt-5.4 costs $2.50 per million input tokens and $15 per million output tokens — roughly 2x the cost of gpt-4o at launch but with substantially higher capability. GPT-5.4-pro costs $30/$180 per million input/output tokens, a 12x premium over standard, justified for regulated industries (legal, finance, healthcare) where higher accuracy reduces downstream error costs. The long-context surcharge applies beyond 272,000 input tokens: input pricing doubles to $5/M and output pricing rises to $22.50/M for gpt-5.4 standard. For gpt-5.4-pro, long-context rates scale proportionally. This pricing structure rewards efficient context management — trimming system prompts, compressing conversation history, and using retrieval to avoid unnecessary long-context billing is directly profitable at scale.</p>
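<p>The tier rules above translate into a small estimator that reproduces the examples in the next table. A sketch, assuming the 1.5x output rate applies whenever input crosses the threshold (verify this reading against the official pricing page):</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">STANDARD = {&#34;input&#34;: 2.50, &#34;output&#34;: 15.00}   # USD per million tokens
LONG_CTX = {&#34;input&#34;: 5.00, &#34;output&#34;: 22.50}   # rates past the 272K input threshold
THRESHOLD = 272_000

def estimate_cost(input_tokens: int, output_tokens: int) -&gt; float:
    base_in = min(input_tokens, THRESHOLD)
    extra_in = max(input_tokens - THRESHOLD, 0)
    cost = base_in / 1e6 * STANDARD[&#34;input&#34;] + extra_in / 1e6 * LONG_CTX[&#34;input&#34;]
    # Assumption: output is surcharged only when input exceeds the threshold.
    out_rate = LONG_CTX[&#34;output&#34;] if extra_in else STANDARD[&#34;output&#34;]
    return cost + output_tokens / 1e6 * out_rate

print(f&#34;${estimate_cost(400_000, 5_000):.2f}&#34;)  # ~$1.43, the codebase review row below
</code></pre></div>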
<h3 id="cost-estimation-examples">Cost Estimation Examples</h3>
<table>
  <thead>
      <tr>
          <th>Workload</th>
          <th>Model</th>
          <th>Tokens</th>
          <th>Estimated Cost</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Simple Q&amp;A</td>
          <td>gpt-5.4</td>
          <td>1K in / 500 out</td>
            <td>~$0.01</td>
      </tr>
      <tr>
          <td>Document summary (50 pages)</td>
          <td>gpt-5.4</td>
          <td>40K in / 2K out</td>
            <td>~$0.13</td>
      </tr>
      <tr>
          <td>Full codebase review (400K tokens)</td>
          <td>gpt-5.4</td>
          <td>400K in / 5K out</td>
            <td>~$1.43</td>
      </tr>
      <tr>
          <td>Enterprise contract analysis</td>
          <td>gpt-5.4-pro</td>
          <td>200K in / 10K out</td>
          <td>~$7.80</td>
      </tr>
  </tbody>
</table>
<p>For high-volume workloads (&gt;10M tokens/day), OpenAI offers volume pricing tiers — contact OpenAI sales for enterprise pricing.</p>
<h3 id="when-to-choose-gpt-54-vs-gpt-54-pro">When to Choose gpt-5.4 vs. gpt-5.4-pro</h3>
<p>Choose <strong>gpt-5.4</strong> for: API integrations, code generation, agentic workflows, content generation, and any workload where errors are recoverable.</p>
<p>Choose <strong>gpt-5.4-pro</strong> for: legal document review, financial modeling, medical information synthesis, or any domain where a single error has significant downstream cost exceeding the $30/M input premium.</p>
<h2 id="gpt-54-vs-claude-opus-46-vs-gemini-31-pro-benchmark-comparison">GPT-5.4 vs. Claude Opus 4.6 vs. Gemini 3.1 Pro: Benchmark Comparison</h2>
<p>GPT-5.4 leads on computer use and knowledge work benchmarks but faces strong competition from Claude Opus 4.6 (Anthropic) and Gemini 3.1 Pro (Google) on coding and long-document reasoning tasks. On the OSWorld-Verified benchmark, GPT-5.4 scores 75.0% — Claude Opus 4.6 scores approximately 68% and Gemini 3.1 Pro reaches roughly 62%, making GPT-5.4 the clear leader for desktop automation. On SWE-bench Pro (real-world software engineering), GPT-5.4 at xhigh reasoning scores 57.7%, which is competitive with but not dominant over Claude Opus 4.6&rsquo;s reported 55–58% range. On GDPval (knowledge work across 44 occupations), GPT-5.4 leads at 83% versus the mid-70s for competitors. For developers choosing a flagship model, the decision typically hinges on use case: GPT-5.4 for computer use and knowledge work, Claude Opus 4.6 for long-document fidelity and nuanced instruction following.</p>
<h3 id="benchmark-comparison-table">Benchmark Comparison Table</h3>
<table>
  <thead>
      <tr>
          <th>Benchmark</th>
          <th>GPT-5.4</th>
          <th>Claude Opus 4.6</th>
          <th>Gemini 3.1 Pro</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>OSWorld (computer use)</td>
          <td><strong>75.0%</strong></td>
          <td>~68%</td>
          <td>~62%</td>
      </tr>
      <tr>
          <td>SWE-bench Pro (coding)</td>
          <td><strong>57.7%</strong></td>
          <td>~56%</td>
          <td>~53%</td>
      </tr>
      <tr>
          <td>GDPval (knowledge work)</td>
          <td><strong>83.0%</strong></td>
          <td>~76%</td>
          <td>~74%</td>
      </tr>
      <tr>
          <td>Human baseline (OSWorld)</td>
          <td>72.4%</td>
          <td>—</td>
          <td>—</td>
      </tr>
  </tbody>
</table>
<p>Note: Claude Opus 4.6 and Gemini 3.1 Pro figures are approximations based on published benchmarks as of Q1 2026; official numbers may vary by evaluation methodology.</p>
<h3 id="api-cost-comparison">API Cost Comparison</h3>
<p>At standard pricing, GPT-5.4 ($2.50/M input) is competitively priced against Claude Opus 4.6 ($3.00/M input) and Gemini 3.1 Pro ($2.00/M input). GPT-5.4-pro ($30/M input) is positioned above all standard competitor offerings and targets enterprise use cases where accuracy premium is justifiable.</p>
<h2 id="best-practices-and-production-recommendations">Best Practices and Production Recommendations</h2>
<p>Production-ready GPT-5.4 deployments share several patterns that distinguish robust integrations from fragile prototypes. The most important is <strong>matching reasoning effort to task complexity at the routing layer</strong>: use a lightweight classifier (even gpt-5.4-mini at <code>none</code> effort) to route incoming requests to the appropriate effort level before forwarding to gpt-5.4. This alone typically reduces per-request cost by 30–50% without measurable quality loss at the application level. Second, <strong>implement structured output validation</strong> for all tool-calling workflows — gpt-5.4 is highly reliable but not infallible, and downstream systems should validate JSON structure before acting on tool call arguments. Third, <strong>monitor long-context billing proactively</strong>: instrument token counts before each request and alert when a conversation history approaches the 272K threshold so you can summarize or truncate before incurring the 2x surcharge. Finally, for computer use agents, implement <strong>screenshot diffing</strong> to detect when an action had no visible effect — this prevents infinite loops where the model keeps clicking a button that isn&rsquo;t registering.</p>
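<p>One way to implement the screenshot diff with Pillow; the brightness threshold is a tunable guess that filters out near-invisible changes:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">from PIL import Image, ImageChops

def screen_changed(before_path: str, after_path: str, threshold: int = 10) -&gt; bool:
    # Pixel-wise difference between consecutive screenshots
    diff = ImageChops.difference(Image.open(before_path), Image.open(after_path))
    if diff.getbbox() is None:  # pixel-identical: the action had no visible effect
        return False
    # Ignore near-invisible changes such as a cursor blink or clock tick
    _, max_delta = diff.convert(&#34;L&#34;).getextrema()
    return max_delta &gt; threshold
</code></pre></div>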
<h3 id="production-checklist">Production Checklist</h3>
<ul>
<li>Set <code>reasoning.effort</code> explicitly — don&rsquo;t rely on defaults for cost-sensitive workloads</li>
<li>Use <code>stream: true</code> for all requests with <code>high</code> or <code>xhigh</code> effort to improve perceived latency</li>
<li>Track <code>usage.prompt_tokens</code> per request and alert as input approaches 272K</li>
<li>Validate tool call JSON with a schema before executing side effects (see the sketch after this list)</li>
<li>For computer use: implement action timeout + screenshot diff to detect stuck states</li>
<li>Cache frequent system prompts with the Prompt Caching API to reduce input token costs</li>
<li>Use <code>gpt-5.4-mini</code> for pre-filtering and routing, gpt-5.4 for execution</li>
</ul>
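<p>For the schema-validation item above, a sketch using the <code>jsonschema</code> package against the <code>get_weather</code> tool from the earlier section; <code>run_get_weather</code> is a hypothetical executor for your own tool implementation:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">import json

from jsonschema import ValidationError, validate

WEATHER_ARGS_SCHEMA = {
    &#34;type&#34;: &#34;object&#34;,
    &#34;properties&#34;: {
        &#34;location&#34;: {&#34;type&#34;: &#34;string&#34;},
        &#34;unit&#34;: {&#34;type&#34;: &#34;string&#34;, &#34;enum&#34;: [&#34;celsius&#34;, &#34;fahrenheit&#34;]},
    },
    &#34;required&#34;: [&#34;location&#34;],
    &#34;additionalProperties&#34;: False,
}

for call in response.choices[0].message.tool_calls or []:
    try:
        args = json.loads(call.function.arguments)
        validate(args, WEATHER_ARGS_SCHEMA)
    except (json.JSONDecodeError, ValidationError) as err:
        print(f&#34;Rejected tool call {call.id}: {err}&#34;)
        continue  # never execute a side effect on malformed arguments
    run_get_weather(**args)  # hypothetical executor for the validated call
</code></pre></div>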
<h3 id="migration-from-gpt-4o-in-3-steps">Migration from gpt-4o in 3 Steps</h3>
<ol>
<li>Update <code>model=&quot;gpt-4o&quot;</code> to <code>model=&quot;gpt-5.4&quot;</code> in all API calls</li>
<li>Add <code>reasoning={&quot;effort&quot;: &quot;medium&quot;}</code> as a default; tune per endpoint based on latency requirements</li>
<li>Test with your existing prompt suite — gpt-5.4 is backward-compatible but often returns longer, more detailed responses at <code>medium</code>+ effort, which may affect downstream parsers</li>
</ol>
<hr>
<h2 id="faq">FAQ</h2>
<p><strong>What is GPT-5.4 and when was it released?</strong></p>
<p>GPT-5.4 is OpenAI&rsquo;s most capable general-purpose model as of 2026, released on March 5, 2026. It combines a 1,050,000-token context window, five reasoning effort levels (<code>none</code> through <code>xhigh</code>), native computer use, and full tool support in a single Chat Completions API model.</p>
<p><strong>How do I access the GPT-5.4 API?</strong></p>
<p>Access GPT-5.4 through the standard OpenAI Chat Completions API by setting <code>model=&quot;gpt-5.4&quot;</code>. It requires an OpenAI API key with access to GPT-5.4 (available to all paid API tiers as of Q2 2026). No endpoint changes or new authentication methods are needed.</p>
<p><strong>What is the <code>reasoning.effort</code> parameter and which level should I use?</strong></p>
<p><code>reasoning.effort</code> controls how much internal chain-of-thought reasoning the model performs. Use <code>low</code> or <code>none</code> for classification and simple extraction, <code>medium</code> for most conversational and summarization tasks, and <code>high</code> or <code>xhigh</code> for complex code synthesis, agentic planning, or high-stakes analysis. <code>xhigh</code> costs 3–5x more than <code>low</code> but delivers measurably better accuracy on complex tasks.</p>
<p><strong>How much does GPT-5.4 cost compared to GPT-5.4-pro?</strong></p>
<p>Standard gpt-5.4 costs $2.50 per million input tokens and $15 per million output tokens. GPT-5.4-pro costs $30/$180 per million input/output tokens — 12x more expensive but optimized for accuracy-critical enterprise workloads. Input tokens beyond 272K are billed at 2x the standard rate for both variants.</p>
<p><strong>Can GPT-5.4 replace Selenium or Playwright for browser automation?</strong></p>
<p>For exploratory, exception-heavy, or visually complex automation tasks, GPT-5.4&rsquo;s computer use (75% OSWorld accuracy) is a strong alternative to selector-based automation. For high-volume, stable UI workflows where selectors are reliable, traditional automation remains faster and cheaper. Most teams use GPT-5.4 computer use for the hard cases — onboarding flows, exception handling, testing dynamic UIs — while keeping Playwright for predictable, high-frequency tasks.</p>
]]></content:encoded></item></channel></rss>