<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Claude API on RockB</title><link>https://baeseokjae.github.io/tags/claude-api/</link><description>Recent content in Claude API on RockB</description><image><title>RockB</title><url>https://baeseokjae.github.io/images/og-default.png</url><link>https://baeseokjae.github.io/images/og-default.png</link></image><generator>Hugo</generator><language>en-us</language><lastBuildDate>Sat, 18 Apr 2026 08:46:52 +0000</lastBuildDate><atom:link href="https://baeseokjae.github.io/tags/claude-api/index.xml" rel="self" type="application/rss+xml"/><item><title>How to Use Claude API in Python 2026: Complete Developer Guide</title><link>https://baeseokjae.github.io/posts/claude-api-python-guide-2026/</link><pubDate>Sat, 18 Apr 2026 08:46:52 +0000</pubDate><guid>https://baeseokjae.github.io/posts/claude-api-python-guide-2026/</guid><description>Step-by-step Claude API Python tutorial: install the SDK, send messages, use streaming, tool use, and prompt caching with real code examples.</description><content:encoded><![CDATA[<p>The Claude API lets you integrate Anthropic&rsquo;s Claude models into any Python application in under 10 lines of code. Install the <code>anthropic</code> package, set your API key, and call <code>client.messages.create()</code> — that&rsquo;s the entire setup. This guide covers everything from basic text generation to advanced features like streaming, tool use, vision, and prompt caching that can cut your costs by up to 90%.</p>
<h2 id="what-is-the-claude-api-and-why-use-it-in-2026">What Is the Claude API and Why Use It in 2026?</h2>
<p>The Claude API is Anthropic&rsquo;s REST interface for accessing Claude models — including Claude Opus 4.7, Claude Sonnet 4.6, and Claude Haiku 4.5 — programmatically. Unlike ChatGPT&rsquo;s API, Claude&rsquo;s API is built with safety-first architecture, a 200K-token context window (one of the largest available), and native tool-use support that lets agents take real actions. As of 2026, the Claude API powers production workloads at companies like Salesforce, Notion, and Slack, processing billions of tokens daily. The Python SDK (<code>anthropic</code>) wraps the REST API with type-safe client objects, automatic retries, and streaming support. Developers choose Claude over alternatives for three reasons: superior instruction following on long documents, better refusal calibration (fewer false positives), and prompt caching that makes repeated context tokens 90% cheaper. The API follows the Messages format — a list of role/content pairs — which maps cleanly to Python dicts and requires no special framework.</p>
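The Messages format mentioned above is just ordered role/content pairs. A minimal sketch in plain Python (the prompts shown here are illustrative, not from the API docs):

```python
import json

# A multi-turn exchange in the Messages format: an ordered list of
# role/content dicts, alternating between "user" and "assistant".
messages = [
    {"role": "user", "content": "What does HTTP 429 mean?"},
    {"role": "assistant", "content": "429 Too Many Requests: the server is rate limiting you."},
    {"role": "user", "content": "How should a client handle it?"},
]

# The request body is ordinary JSON-serializable data; no framework required.
payload = json.dumps({"model": "claude-sonnet-4-6", "max_tokens": 1024, "messages": messages})
```

Because the payload is plain data, conversation history is trivial to persist, truncate, or replay.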
<h2 id="how-to-install-the-anthropic-python-sdk">How to Install the Anthropic Python SDK</h2>
<p>Installing the Anthropic Python SDK takes one command and works with Python 3.8 and above. Run <code>pip install anthropic</code> to get the latest stable release. As of April 2026, the current SDK version is 0.51.x, which includes native support for Claude Opus 4.7, Claude Sonnet 4.6, and Claude Haiku 4.5 — Anthropic&rsquo;s full 2026 model lineup. The SDK ships with built-in prompt caching helpers, async streaming via <code>AsyncAnthropic</code>, and typed response objects that eliminate guessing about response structure. For production projects, always pin the version in your <code>requirements.txt</code> — breaking changes between major versions are documented in the official changelog, and unpinned installs have caused production outages when Anthropic ships major SDK updates. Beyond <code>httpx</code> and <code>pydantic</code>, the SDK pulls in no heavyweight runtime dependencies, keeping your container images lean. Virtual environments (via <code>venv</code> or <code>conda</code>) are strongly recommended to avoid conflicts with other packages. If you need async support — which most production apps do for FastAPI, Starlette, or asyncio-based services — <code>AsyncAnthropic</code> is included out of the box with no extra install step.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>pip install anthropic
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># For async support (already included, no extras needed)</span>
</span></span><span style="display:flex;"><span>pip install anthropic
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Pin for production</span>
</span></span><span style="display:flex;"><span>echo <span style="color:#e6db74">&#34;anthropic==0.51.0&#34;</span> &gt;&gt; requirements.txt
</span></span></code></pre></div><h3 id="setting-up-your-api-key">Setting Up Your API Key</h3>
<p>Get your API key from <a href="https://console.anthropic.com">console.anthropic.com</a> and store it as an environment variable — never hardcode it in source files.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>export ANTHROPIC_API_KEY<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;sk-ant-...&#34;</span>
</span></span></code></pre></div><p>The SDK automatically reads <code>ANTHROPIC_API_KEY</code> from the environment. You can also pass it explicitly: <code>Anthropic(api_key=&quot;sk-ant-...&quot;)</code>.</p>
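If you prefer to fail fast at startup rather than rely on the SDK&rsquo;s implicit environment lookup, resolve the key yourself before constructing the client. A minimal sketch (the helper name and error message are illustrative):

```python
import os

def require_api_key() -> str:
    """Return the Anthropic API key, raising a clear error if it is missing."""
    api_key = os.environ.get("ANTHROPIC_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError(
            "ANTHROPIC_API_KEY is not set; export it before starting the app"
        )
    return api_key

# Usage: client = anthropic.Anthropic(api_key=require_api_key())
```

A missing key then surfaces as one unambiguous error at startup instead of a confusing 401 on the first request.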
<h2 id="how-to-send-your-first-message-to-claude">How to Send Your First Message to Claude</h2>
<p>Sending your first message to Claude requires three objects: an <code>Anthropic</code> client, a model name, and a messages list. The response comes back as a <code>Message</code> object with a <code>content</code> list — each item is a <code>TextBlock</code> with a <code>.text</code> attribute. The pattern mirrors OpenAI&rsquo;s Chat Completions API but adds Anthropic-specific fields like <code>stop_reason</code> and <code>usage.cache_read_input_tokens</code>. In 2026, Claude Sonnet 4.6 (<code>claude-sonnet-4-6</code>) is the recommended default for most production workloads: it&rsquo;s approximately 3x faster than Opus at one-fifth the cost, while matching Opus on the majority of coding and instruction-following tasks. Use Haiku 4.5 for latency-sensitive tasks like autocomplete or real-time suggestions (under 300ms p95), and Opus 4.7 for complex, multi-step reasoning tasks like autonomous agents or long-document analysis. The <code>max_tokens</code> parameter sets the upper bound on output length — Claude will stop sooner if it naturally finishes. Always set it to a reasonable value; leaving it too high doesn&rsquo;t cost extra but can cause issues if downstream code assumes a short response.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> anthropic
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>client <span style="color:#f92672">=</span> anthropic<span style="color:#f92672">.</span>Anthropic()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>message <span style="color:#f92672">=</span> client<span style="color:#f92672">.</span>messages<span style="color:#f92672">.</span>create(
</span></span><span style="display:flex;"><span>    model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;claude-sonnet-4-6&#34;</span>,
</span></span><span style="display:flex;"><span>    max_tokens<span style="color:#f92672">=</span><span style="color:#ae81ff">1024</span>,
</span></span><span style="display:flex;"><span>    messages<span style="color:#f92672">=</span>[
</span></span><span style="display:flex;"><span>        {<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;user&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: <span style="color:#e6db74">&#34;Explain dependency injection in Python with a concrete example.&#34;</span>}
</span></span><span style="display:flex;"><span>    ]
</span></span><span style="display:flex;"><span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>print(message<span style="color:#f92672">.</span>content[<span style="color:#ae81ff">0</span>]<span style="color:#f92672">.</span>text)
</span></span><span style="display:flex;"><span>print(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Input tokens: </span><span style="color:#e6db74">{</span>message<span style="color:#f92672">.</span>usage<span style="color:#f92672">.</span>input_tokens<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>print(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Output tokens: </span><span style="color:#e6db74">{</span>message<span style="color:#f92672">.</span>usage<span style="color:#f92672">.</span>output_tokens<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span></code></pre></div><h3 id="understanding-the-response-object">Understanding the Response Object</h3>
<p>The <code>Message</code> object contains everything you need to build robust applications:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#75715e"># Full response structure</span>
</span></span><span style="display:flex;"><span>message<span style="color:#f92672">.</span>id          <span style="color:#75715e"># &#34;msg_01XFDUDYJgAACzvnptvVoYEL&#34;</span>
</span></span><span style="display:flex;"><span>message<span style="color:#f92672">.</span>type        <span style="color:#75715e"># &#34;message&#34;</span>
</span></span><span style="display:flex;"><span>message<span style="color:#f92672">.</span>role        <span style="color:#75715e"># &#34;assistant&#34;</span>
</span></span><span style="display:flex;"><span>message<span style="color:#f92672">.</span>content     <span style="color:#75715e"># [TextBlock(text=&#34;...&#34;, type=&#34;text&#34;)]</span>
</span></span><span style="display:flex;"><span>message<span style="color:#f92672">.</span>model       <span style="color:#75715e"># &#34;claude-sonnet-4-6&#34;</span>
</span></span><span style="display:flex;"><span>message<span style="color:#f92672">.</span>stop_reason <span style="color:#75715e"># &#34;end_turn&#34; | &#34;max_tokens&#34; | &#34;tool_use&#34; | &#34;stop_sequence&#34;</span>
</span></span><span style="display:flex;"><span>message<span style="color:#f92672">.</span>usage<span style="color:#f92672">.</span>input_tokens   <span style="color:#75715e"># 25</span>
</span></span><span style="display:flex;"><span>message<span style="color:#f92672">.</span>usage<span style="color:#f92672">.</span>output_tokens  <span style="color:#75715e"># 315</span>
</span></span></code></pre></div><h2 id="how-to-use-streaming-with-the-claude-api">How to Use Streaming with the Claude API</h2>
<p>Streaming is how you build responsive Claude-powered UIs — instead of waiting seconds for the full response, tokens arrive as they&rsquo;re generated. The Python SDK provides two streaming interfaces: <code>client.messages.stream()</code> for high-level event handling, and <code>client.messages.create(stream=True)</code> for raw SSE access. In 2026, streaming is the default approach for any user-facing application because it sharply cuts perceived latency — users see text appearing in under 200ms instead of waiting 3-5 seconds for a complete response. The streaming context manager handles connection cleanup automatically, so you don&rsquo;t need try/finally blocks.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> anthropic
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>client <span style="color:#f92672">=</span> anthropic<span style="color:#f92672">.</span>Anthropic()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># High-level streaming (recommended)</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">with</span> client<span style="color:#f92672">.</span>messages<span style="color:#f92672">.</span>stream(
</span></span><span style="display:flex;"><span>    model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;claude-sonnet-4-6&#34;</span>,
</span></span><span style="display:flex;"><span>    max_tokens<span style="color:#f92672">=</span><span style="color:#ae81ff">1024</span>,
</span></span><span style="display:flex;"><span>    messages<span style="color:#f92672">=</span>[{<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;user&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: <span style="color:#e6db74">&#34;Write a Python function to parse JSON safely.&#34;</span>}]
</span></span><span style="display:flex;"><span>) <span style="color:#66d9ef">as</span> stream:
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">for</span> text <span style="color:#f92672">in</span> stream<span style="color:#f92672">.</span>text_stream:
</span></span><span style="display:flex;"><span>        print(text, end<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;&#34;</span>, flush<span style="color:#f92672">=</span><span style="color:#66d9ef">True</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Get the final message (still inside the stream context)</span>
</span></span><span style="display:flex;"><span>    final_message <span style="color:#f92672">=</span> stream<span style="color:#f92672">.</span>get_final_message()
</span></span><span style="display:flex;"><span>    print(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;</span><span style="color:#ae81ff">\n</span><span style="color:#e6db74">Total tokens: </span><span style="color:#e6db74">{</span>final_message<span style="color:#f92672">.</span>usage<span style="color:#f92672">.</span>output_tokens<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span></code></pre></div><h3 id="async-streaming-for-fastapi-and-async-frameworks">Async Streaming for FastAPI and Async Frameworks</h3>
<p>For FastAPI, asyncio, or any async Python application, use <code>AsyncAnthropic</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> asyncio
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> anthropic
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">stream_response</span>(prompt: str):
</span></span><span style="display:flex;"><span>    client <span style="color:#f92672">=</span> anthropic<span style="color:#f92672">.</span>AsyncAnthropic()  <span style="color:#75715e"># in production, create once at startup and reuse</span>
</span></span><span style="display:flex;"><span>    
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">async</span> <span style="color:#66d9ef">with</span> client<span style="color:#f92672">.</span>messages<span style="color:#f92672">.</span>stream(
</span></span><span style="display:flex;"><span>        model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;claude-sonnet-4-6&#34;</span>,
</span></span><span style="display:flex;"><span>        max_tokens<span style="color:#f92672">=</span><span style="color:#ae81ff">1024</span>,
</span></span><span style="display:flex;"><span>        messages<span style="color:#f92672">=</span>[{<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;user&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: prompt}]
</span></span><span style="display:flex;"><span>    ) <span style="color:#66d9ef">as</span> stream:
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">async</span> <span style="color:#66d9ef">for</span> text <span style="color:#f92672">in</span> stream<span style="color:#f92672">.</span>text_stream:
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">yield</span> text
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># FastAPI example</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> fastapi <span style="color:#f92672">import</span> FastAPI
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> fastapi.responses <span style="color:#f92672">import</span> StreamingResponse
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>app <span style="color:#f92672">=</span> FastAPI()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@app.post</span>(<span style="color:#e6db74">&#34;/chat&#34;</span>)
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">chat</span>(prompt: str):
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> StreamingResponse(
</span></span><span style="display:flex;"><span>        stream_response(prompt),
</span></span><span style="display:flex;"><span>        media_type<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;text/plain&#34;</span>
</span></span><span style="display:flex;"><span>    )
</span></span></code></pre></div><h2 id="how-to-use-system-prompts-and-multi-turn-conversations">How to Use System Prompts and Multi-Turn Conversations</h2>
<p>System prompts define Claude&rsquo;s role, constraints, and behavior for your entire session. They sit outside the <code>messages</code> array at the top level of the API call, and Claude treats them as authoritative instructions that override user requests when they conflict. A well-crafted system prompt is the single highest-leverage tool for steering model behavior — it&rsquo;s where you define persona, output format, safety constraints, and domain knowledge. Multi-turn conversations work by appending each user message and assistant response to the messages list, giving Claude full conversation history. In 2026, most production apps store conversation history in Redis or a database and reconstruct the messages array on each request rather than keeping it in memory.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> anthropic
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>client <span style="color:#f92672">=</span> anthropic<span style="color:#f92672">.</span>Anthropic()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">chat_with_memory</span>():
</span></span><span style="display:flex;"><span>    conversation_history <span style="color:#f92672">=</span> []
</span></span><span style="display:flex;"><span>    
</span></span><span style="display:flex;"><span>    system_prompt <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;&#34;&#34;You are a senior Python engineer with 10 years of experience.
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    You give concise, production-ready code examples.
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    Always mention edge cases and potential issues.&#34;&#34;&#34;</span>
</span></span><span style="display:flex;"><span>    
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">while</span> <span style="color:#66d9ef">True</span>:
</span></span><span style="display:flex;"><span>        user_input <span style="color:#f92672">=</span> input(<span style="color:#e6db74">&#34;You: &#34;</span>)<span style="color:#f92672">.</span>strip()
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">if</span> user_input<span style="color:#f92672">.</span>lower() <span style="color:#f92672">in</span> (<span style="color:#e6db74">&#34;quit&#34;</span>, <span style="color:#e6db74">&#34;exit&#34;</span>):
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">break</span>
</span></span><span style="display:flex;"><span>        
</span></span><span style="display:flex;"><span>        conversation_history<span style="color:#f92672">.</span>append({
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;user&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;content&#34;</span>: user_input
</span></span><span style="display:flex;"><span>        })
</span></span><span style="display:flex;"><span>        
</span></span><span style="display:flex;"><span>        response <span style="color:#f92672">=</span> client<span style="color:#f92672">.</span>messages<span style="color:#f92672">.</span>create(
</span></span><span style="display:flex;"><span>            model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;claude-sonnet-4-6&#34;</span>,
</span></span><span style="display:flex;"><span>            max_tokens<span style="color:#f92672">=</span><span style="color:#ae81ff">2048</span>,
</span></span><span style="display:flex;"><span>            system<span style="color:#f92672">=</span>system_prompt,
</span></span><span style="display:flex;"><span>            messages<span style="color:#f92672">=</span>conversation_history
</span></span><span style="display:flex;"><span>        )
</span></span><span style="display:flex;"><span>        
</span></span><span style="display:flex;"><span>        assistant_message <span style="color:#f92672">=</span> response<span style="color:#f92672">.</span>content[<span style="color:#ae81ff">0</span>]<span style="color:#f92672">.</span>text
</span></span><span style="display:flex;"><span>        conversation_history<span style="color:#f92672">.</span>append({
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;assistant&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;content&#34;</span>: assistant_message
</span></span><span style="display:flex;"><span>        })
</span></span><span style="display:flex;"><span>        
</span></span><span style="display:flex;"><span>        print(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Claude: </span><span style="color:#e6db74">{</span>assistant_message<span style="color:#e6db74">}</span><span style="color:#ae81ff">\n</span><span style="color:#e6db74">&#34;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>chat_with_memory()
</span></span></code></pre></div><h2 id="how-to-use-tool-use-function-calling-with-claude-api">How to Use Tool Use (Function Calling) with Claude API</h2>
<p>Tool use — also called function calling — lets Claude invoke external functions, APIs, and databases to answer questions it couldn&rsquo;t handle from training data alone. You define tools as JSON schemas describing their name, description, and input parameters. Claude decides when to call a tool, extracts the arguments, and returns a <code>tool_use</code> block. Your code executes the function and returns the result back to Claude, which then generates a final response. This tool-use loop is the foundation of every AI agent built on Claude in 2026. The key insight: Claude&rsquo;s tool selection is driven by the description field — write descriptions like you&rsquo;re writing a docstring for a human engineer who needs to decide which function to call.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> anthropic
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> json
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>client <span style="color:#f92672">=</span> anthropic<span style="color:#f92672">.</span>Anthropic()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Define tools as JSON schemas</span>
</span></span><span style="display:flex;"><span>tools <span style="color:#f92672">=</span> [
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;get_weather&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;description&#34;</span>: <span style="color:#e6db74">&#34;Get current weather for a city. Returns temperature in Celsius, conditions, and humidity.&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;input_schema&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;object&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;properties&#34;</span>: {
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;city&#34;</span>: {
</span></span><span style="display:flex;"><span>                    <span style="color:#e6db74">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;string&#34;</span>,
</span></span><span style="display:flex;"><span>                    <span style="color:#e6db74">&#34;description&#34;</span>: <span style="color:#e6db74">&#34;City name (e.g., &#39;San Francisco&#39;, &#39;Tokyo&#39;)&#34;</span>
</span></span><span style="display:flex;"><span>                },
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;country_code&#34;</span>: {
</span></span><span style="display:flex;"><span>                    <span style="color:#e6db74">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;string&#34;</span>,
</span></span><span style="display:flex;"><span>                    <span style="color:#e6db74">&#34;description&#34;</span>: <span style="color:#e6db74">&#34;ISO 3166-1 alpha-2 country code (e.g., &#39;US&#39;, &#39;JP&#39;)&#34;</span>
</span></span><span style="display:flex;"><span>                }
</span></span><span style="display:flex;"><span>            },
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;required&#34;</span>: [<span style="color:#e6db74">&#34;city&#34;</span>]
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;search_database&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;description&#34;</span>: <span style="color:#e6db74">&#34;Search a product database by keyword. Returns list of matching products with prices.&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;input_schema&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;object&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;properties&#34;</span>: {
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;query&#34;</span>: {<span style="color:#e6db74">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;string&#34;</span>},
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;limit&#34;</span>: {<span style="color:#e6db74">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;integer&#34;</span>, <span style="color:#e6db74">&#34;default&#34;</span>: <span style="color:#ae81ff">10</span>}
</span></span><span style="display:flex;"><span>            },
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;required&#34;</span>: [<span style="color:#e6db74">&#34;query&#34;</span>]
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">execute_tool</span>(tool_name: str, tool_input: dict) <span style="color:#f92672">-&gt;</span> str:
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">if</span> tool_name <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;get_weather&#34;</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#75715e"># In production, call a real weather API</span>
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> json<span style="color:#f92672">.</span>dumps({
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;city&#34;</span>: tool_input[<span style="color:#e6db74">&#34;city&#34;</span>],
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;temperature&#34;</span>: <span style="color:#ae81ff">22</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;conditions&#34;</span>: <span style="color:#e6db74">&#34;Partly cloudy&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;humidity&#34;</span>: <span style="color:#ae81ff">65</span>
</span></span><span style="display:flex;"><span>        })
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">elif</span> tool_name <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;search_database&#34;</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> json<span style="color:#f92672">.</span>dumps([
</span></span><span style="display:flex;"><span>            {<span style="color:#e6db74">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;Product A&#34;</span>, <span style="color:#e6db74">&#34;price&#34;</span>: <span style="color:#ae81ff">29.99</span>},
</span></span><span style="display:flex;"><span>            {<span style="color:#e6db74">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;Product B&#34;</span>, <span style="color:#e6db74">&#34;price&#34;</span>: <span style="color:#ae81ff">49.99</span>}
</span></span><span style="display:flex;"><span>        ])
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> json<span style="color:#f92672">.</span>dumps({<span style="color:#e6db74">&#34;error&#34;</span>: <span style="color:#e6db74">&#34;Unknown tool&#34;</span>})
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">run_agent</span>(user_message: str):
</span></span><span style="display:flex;"><span>    messages <span style="color:#f92672">=</span> [{<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;user&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: user_message}]
</span></span><span style="display:flex;"><span>    
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">while</span> <span style="color:#66d9ef">True</span>:
</span></span><span style="display:flex;"><span>        response <span style="color:#f92672">=</span> client<span style="color:#f92672">.</span>messages<span style="color:#f92672">.</span>create(
</span></span><span style="display:flex;"><span>            model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;claude-sonnet-4-6&#34;</span>,
</span></span><span style="display:flex;"><span>            max_tokens<span style="color:#f92672">=</span><span style="color:#ae81ff">4096</span>,
</span></span><span style="display:flex;"><span>            tools<span style="color:#f92672">=</span>tools,
</span></span><span style="display:flex;"><span>            messages<span style="color:#f92672">=</span>messages
</span></span><span style="display:flex;"><span>        )
</span></span><span style="display:flex;"><span>        
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">if</span> response<span style="color:#f92672">.</span>stop_reason <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;end_turn&#34;</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#75715e"># No tool call — return final text response</span>
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">for</span> block <span style="color:#f92672">in</span> response<span style="color:#f92672">.</span>content:
</span></span><span style="display:flex;"><span>                <span style="color:#66d9ef">if</span> hasattr(block, <span style="color:#e6db74">&#34;text&#34;</span>):
</span></span><span style="display:flex;"><span>                    <span style="color:#66d9ef">return</span> block<span style="color:#f92672">.</span>text
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">return</span> <span style="color:#e6db74">&#34;&#34;</span>  <span style="color:#75715e"># No text block in the final response</span>
</span></span><span style="display:flex;"><span>        
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">if</span> response<span style="color:#f92672">.</span>stop_reason <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;tool_use&#34;</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#75715e"># Add assistant&#39;s response with tool_use blocks</span>
</span></span><span style="display:flex;"><span>            messages<span style="color:#f92672">.</span>append({<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;assistant&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: response<span style="color:#f92672">.</span>content})
</span></span><span style="display:flex;"><span>            
</span></span><span style="display:flex;"><span>            <span style="color:#75715e"># Execute all tool calls</span>
</span></span><span style="display:flex;"><span>            tool_results <span style="color:#f92672">=</span> []
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">for</span> block <span style="color:#f92672">in</span> response<span style="color:#f92672">.</span>content:
</span></span><span style="display:flex;"><span>                <span style="color:#66d9ef">if</span> block<span style="color:#f92672">.</span>type <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;tool_use&#34;</span>:
</span></span><span style="display:flex;"><span>                    result <span style="color:#f92672">=</span> execute_tool(block<span style="color:#f92672">.</span>name, block<span style="color:#f92672">.</span>input)
</span></span><span style="display:flex;"><span>                    tool_results<span style="color:#f92672">.</span>append({
</span></span><span style="display:flex;"><span>                        <span style="color:#e6db74">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;tool_result&#34;</span>,
</span></span><span style="display:flex;"><span>                        <span style="color:#e6db74">&#34;tool_use_id&#34;</span>: block<span style="color:#f92672">.</span>id,
</span></span><span style="display:flex;"><span>                        <span style="color:#e6db74">&#34;content&#34;</span>: result
</span></span><span style="display:flex;"><span>                    })
</span></span><span style="display:flex;"><span>            
</span></span><span style="display:flex;"><span>            <span style="color:#75715e"># Return tool results to Claude</span>
</span></span><span style="display:flex;"><span>            messages<span style="color:#f92672">.</span>append({<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;user&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: tool_results})
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">else</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#75715e"># Unexpected stop reason (e.g. max_tokens): stop instead of looping forever</span>
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">return</span> <span style="color:#66d9ef">None</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>result <span style="color:#f92672">=</span> run_agent(<span style="color:#e6db74">&#34;What&#39;s the weather like in Tokyo?&#34;</span>)
</span></span><span style="display:flex;"><span>print(result)
</span></span></code></pre></div><h2 id="how-to-use-vision-image-analysis-with-claude-api">How to Use Vision (Image Analysis) with Claude API</h2>
<p>Claude&rsquo;s vision capability lets you send images alongside text and get detailed analysis, code extraction, chart interpretation, or OCR results. You can pass images as base64-encoded data or as URLs. In production, base64 encoding is more reliable — URLs require Claude&rsquo;s servers to fetch the image, which adds latency and can fail on private resources. Claude supports JPEG, PNG, GIF, and WebP. The vision capability is particularly powerful for developer use cases: extract code from screenshots, analyze error logs from terminal screenshots, or process UI mockups for spec generation. Claude Sonnet 4.6 processes images in under 2 seconds for most use cases, making it practical for interactive applications.</p>
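<p>If the image is already hosted somewhere publicly reachable, you can skip base64 entirely and pass a URL source block. The sketch below assumes the API&rsquo;s <code>url</code> source type; the function names and URL are illustrative, not part of the SDK:</p>

```python
def build_image_message(image_url: str, question: str) -> list:
    # Hypothetical helper: pairs a URL-sourced image block with a text block
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "source": {"type": "url", "url": image_url}},
                {"type": "text", "text": question},
            ],
        }
    ]

def analyze_image_url(client, image_url: str, question: str) -> str:
    # Needs a configured anthropic.Anthropic() client and an API key to run
    message = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=build_image_message(
            image_url,  # placeholder, e.g. "https://example.com/chart.png"
            question,
        ),
    )
    return message.content[0].text
```

<p>URL sources trade the base64 encoding step for a server-side fetch, so prefer them only when the image is public and latency is not critical.</p>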
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> anthropic
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> base64
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> pathlib <span style="color:#f92672">import</span> Path
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>client <span style="color:#f92672">=</span> anthropic<span style="color:#f92672">.</span>Anthropic()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">analyze_image_file</span>(image_path: str, question: str) <span style="color:#f92672">-&gt;</span> str:
</span></span><span style="display:flex;"><span>    image_data <span style="color:#f92672">=</span> Path(image_path)<span style="color:#f92672">.</span>read_bytes()
</span></span><span style="display:flex;"><span>    base64_image <span style="color:#f92672">=</span> base64<span style="color:#f92672">.</span>standard_b64encode(image_data)<span style="color:#f92672">.</span>decode(<span style="color:#e6db74">&#34;utf-8&#34;</span>)
</span></span><span style="display:flex;"><span>    
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Detect media type from extension</span>
</span></span><span style="display:flex;"><span>    ext <span style="color:#f92672">=</span> Path(image_path)<span style="color:#f92672">.</span>suffix<span style="color:#f92672">.</span>lower()
</span></span><span style="display:flex;"><span>    media_type_map <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;.jpg&#34;</span>: <span style="color:#e6db74">&#34;image/jpeg&#34;</span>, <span style="color:#e6db74">&#34;.jpeg&#34;</span>: <span style="color:#e6db74">&#34;image/jpeg&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;.png&#34;</span>: <span style="color:#e6db74">&#34;image/png&#34;</span>, <span style="color:#e6db74">&#34;.gif&#34;</span>: <span style="color:#e6db74">&#34;image/gif&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;.webp&#34;</span>: <span style="color:#e6db74">&#34;image/webp&#34;</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>    media_type <span style="color:#f92672">=</span> media_type_map<span style="color:#f92672">.</span>get(ext, <span style="color:#e6db74">&#34;image/jpeg&#34;</span>)
</span></span><span style="display:flex;"><span>    
</span></span><span style="display:flex;"><span>    message <span style="color:#f92672">=</span> client<span style="color:#f92672">.</span>messages<span style="color:#f92672">.</span>create(
</span></span><span style="display:flex;"><span>        model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;claude-sonnet-4-6&#34;</span>,
</span></span><span style="display:flex;"><span>        max_tokens<span style="color:#f92672">=</span><span style="color:#ae81ff">1024</span>,
</span></span><span style="display:flex;"><span>        messages<span style="color:#f92672">=</span>[
</span></span><span style="display:flex;"><span>            {
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;user&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;content&#34;</span>: [
</span></span><span style="display:flex;"><span>                    {
</span></span><span style="display:flex;"><span>                        <span style="color:#e6db74">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;image&#34;</span>,
</span></span><span style="display:flex;"><span>                        <span style="color:#e6db74">&#34;source&#34;</span>: {
</span></span><span style="display:flex;"><span>                            <span style="color:#e6db74">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;base64&#34;</span>,
</span></span><span style="display:flex;"><span>                            <span style="color:#e6db74">&#34;media_type&#34;</span>: media_type,
</span></span><span style="display:flex;"><span>                            <span style="color:#e6db74">&#34;data&#34;</span>: base64_image,
</span></span><span style="display:flex;"><span>                        }
</span></span><span style="display:flex;"><span>                    },
</span></span><span style="display:flex;"><span>                    {
</span></span><span style="display:flex;"><span>                        <span style="color:#e6db74">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;text&#34;</span>,
</span></span><span style="display:flex;"><span>                        <span style="color:#e6db74">&#34;text&#34;</span>: question
</span></span><span style="display:flex;"><span>                    }
</span></span><span style="display:flex;"><span>                ]
</span></span><span style="display:flex;"><span>            }
</span></span><span style="display:flex;"><span>        ]
</span></span><span style="display:flex;"><span>    )
</span></span><span style="display:flex;"><span>    
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> message<span style="color:#f92672">.</span>content[<span style="color:#ae81ff">0</span>]<span style="color:#f92672">.</span>text
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Example: analyze a screenshot</span>
</span></span><span style="display:flex;"><span>result <span style="color:#f92672">=</span> analyze_image_file(
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;error_screenshot.png&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;What Python error is shown in this screenshot? How do I fix it?&#34;</span>
</span></span><span style="display:flex;"><span>)
</span></span><span style="display:flex;"><span>print(result)
</span></span></code></pre></div><h2 id="how-to-implement-prompt-caching-for-cost-reduction">How to Implement Prompt Caching for Cost Reduction</h2>
<p>Prompt caching is the single biggest cost-reduction lever for Claude API users in 2026. When you mark content blocks with <code>&quot;cache_control&quot;: {&quot;type&quot;: &quot;ephemeral&quot;}</code>, Anthropic caches those tokens on their servers for 5 minutes. Subsequent requests that hit the cache pay 90% less for those tokens — from $3/MTok to $0.30/MTok for Sonnet. Cache writes cost 25% more than base input tokens, so caching pays for itself from the second request onward. The pattern works best for system prompts, large document corpora, and conversation histories that stay stable across requests. A typical RAG application that injects 50K tokens of context into every request saves roughly $135 per thousand requests with caching enabled ($2.70/MTok saved on 50K tokens per request). The <code>usage.cache_read_input_tokens</code> field tells you exactly how many tokens were served from cache vs. recomputed.</p>
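<p>To put numbers on the savings, here is a back-of-the-envelope cost model. The defaults are assumed Sonnet list prices ($3/MTok input, 10% of that for cache reads, a 1.25x premium for cache writes); treat the output as an estimate, not a quote:</p>

```python
def caching_cost_usd(context_tokens: int, requests: int,
                     base_per_mtok: float = 3.00,
                     cache_write_mult: float = 1.25,
                     cache_read_mult: float = 0.10) -> dict:
    """Estimate input-token cost for a stable context, with and without caching."""
    per_tok = base_per_mtok / 1_000_000
    without = context_tokens * per_tok * requests
    # First request writes the cache at a premium; the rest read it at a discount
    with_cache = (context_tokens * per_tok * cache_write_mult
                  + context_tokens * per_tok * cache_read_mult * (requests - 1))
    return {"without": round(without, 2), "with": round(with_cache, 2),
            "saved": round(without - with_cache, 2)}

print(caching_cost_usd(50_000, 1_000))
```

<p>The model ignores output tokens and assumes every request after the first lands within the 5-minute cache window.</p>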
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> anthropic
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>client <span style="color:#f92672">=</span> anthropic<span style="color:#f92672">.</span>Anthropic()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Large system prompt that stays constant across requests</span>
</span></span><span style="display:flex;"><span>SYSTEM_PROMPT <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;&#34;&#34;You are an expert Python code reviewer with deep knowledge of:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">- PEP 8 style guidelines and modern Python idioms
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">- Security vulnerabilities (OWASP Top 10, injection attacks, secrets exposure)
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">- Performance optimization (profiling, algorithmic complexity, memory usage)
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">- Testing best practices (pytest, fixtures, mocking, coverage)
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">[... 10,000 more words of domain knowledge ...]&#34;&#34;&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">review_code_with_caching</span>(code: str) <span style="color:#f92672">-&gt;</span> str:
</span></span><span style="display:flex;"><span>    response <span style="color:#f92672">=</span> client<span style="color:#f92672">.</span>messages<span style="color:#f92672">.</span>create(
</span></span><span style="display:flex;"><span>        model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;claude-sonnet-4-6&#34;</span>,
</span></span><span style="display:flex;"><span>        max_tokens<span style="color:#f92672">=</span><span style="color:#ae81ff">2048</span>,
</span></span><span style="display:flex;"><span>        system<span style="color:#f92672">=</span>[
</span></span><span style="display:flex;"><span>            {
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;text&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;text&#34;</span>: SYSTEM_PROMPT,
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;cache_control&#34;</span>: {<span style="color:#e6db74">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;ephemeral&#34;</span>}  <span style="color:#75715e"># Cache this large prompt</span>
</span></span><span style="display:flex;"><span>            }
</span></span><span style="display:flex;"><span>        ],
</span></span><span style="display:flex;"><span>        messages<span style="color:#f92672">=</span>[
</span></span><span style="display:flex;"><span>            {<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;user&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: <span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Review this code:</span><span style="color:#ae81ff">\n\n</span><span style="color:#e6db74">```python</span><span style="color:#ae81ff">\n</span><span style="color:#e6db74">{</span>code<span style="color:#e6db74">}</span><span style="color:#ae81ff">\n</span><span style="color:#e6db74">```&#34;</span>}
</span></span><span style="display:flex;"><span>        ]
</span></span><span style="display:flex;"><span>    )
</span></span><span style="display:flex;"><span>    
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Log cache performance</span>
</span></span><span style="display:flex;"><span>    usage <span style="color:#f92672">=</span> response<span style="color:#f92672">.</span>usage
</span></span><span style="display:flex;"><span>    cached <span style="color:#f92672">=</span> getattr(usage, <span style="color:#e6db74">&#39;cache_read_input_tokens&#39;</span>, <span style="color:#ae81ff">0</span>)
</span></span><span style="display:flex;"><span>    created <span style="color:#f92672">=</span> getattr(usage, <span style="color:#e6db74">&#39;cache_creation_input_tokens&#39;</span>, <span style="color:#ae81ff">0</span>)
</span></span><span style="display:flex;"><span>    print(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Cache hit: </span><span style="color:#e6db74">{</span>cached<span style="color:#e6db74">}</span><span style="color:#e6db74"> tokens | Cache miss: </span><span style="color:#e6db74">{</span>created<span style="color:#e6db74">}</span><span style="color:#e6db74"> tokens&#34;</span>)
</span></span><span style="display:flex;"><span>    
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> response<span style="color:#f92672">.</span>content[<span style="color:#ae81ff">0</span>]<span style="color:#f92672">.</span>text
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># First call: creates cache (pays full price)</span>
</span></span><span style="display:flex;"><span>review1 <span style="color:#f92672">=</span> review_code_with_caching(<span style="color:#e6db74">&#34;def add(a, b): return a + b&#34;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Second call within 5 minutes: reads from cache (90% cheaper)</span>
</span></span><span style="display:flex;"><span>review2 <span style="color:#f92672">=</span> review_code_with_caching(<span style="color:#e6db74">&#34;def multiply(a, b): return a * b&#34;</span>)
</span></span></code></pre></div><h2 id="how-to-handle-errors-and-rate-limits-in-production">How to Handle Errors and Rate Limits in Production</h2>
<p>Production Claude API integrations must handle four error classes: authentication errors (401), rate limit errors (429), server errors (500/529), and invalid request errors (400). The SDK raises typed exceptions for each: <code>AuthenticationError</code>, <code>RateLimitError</code>, <code>APIStatusError</code>, and <code>BadRequestError</code>. For rate limits, implement exponential backoff with jitter — the SDK&rsquo;s <code>max_retries</code> parameter handles this automatically, but you need custom logic for quota exhaustion (as opposed to burst rate limits). In 2026, Anthropic&rsquo;s rate limits are per-tier: Tier 1 starts at 50,000 tokens/minute, scaling to 4,000,000 tokens/minute at Tier 4. Monitor the <code>anthropic-ratelimit-tokens-remaining</code> response header to implement proactive throttling before hitting limits.</p>
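<p>For proactive throttling, the SDK&rsquo;s <code>with_raw_response</code> wrapper exposes the HTTP headers alongside the parsed message. A hedged sketch: the header name follows Anthropic&rsquo;s <code>anthropic-ratelimit-*</code> family, and the helper names and the 10,000-token floor are arbitrary choices, not SDK conventions:</p>

```python
import time

def should_throttle(tokens_remaining, floor: int = 10_000) -> bool:
    # True when the advertised remaining token budget drops below the floor
    return tokens_remaining is not None and int(tokens_remaining) < floor

def create_with_headroom(client, **kwargs):
    # with_raw_response returns the raw HTTP response; .parse() yields the Message
    raw = client.messages.with_raw_response.create(**kwargs)
    remaining = raw.headers.get("anthropic-ratelimit-tokens-remaining")
    if should_throttle(remaining):
        time.sleep(5)  # crude pause; a token bucket would be smoother
    return raw.parse()
```

<p>Call it with a configured client and the usual <code>model</code>/<code>max_tokens</code>/<code>messages</code> keyword arguments; sleeping before you hit 429 is cheaper than retrying after.</p>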
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> anthropic
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> time
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> random
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> typing <span style="color:#f92672">import</span> Optional
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>client <span style="color:#f92672">=</span> anthropic<span style="color:#f92672">.</span>Anthropic(
</span></span><span style="display:flex;"><span>    max_retries<span style="color:#f92672">=</span><span style="color:#ae81ff">3</span>,  <span style="color:#75715e"># SDK handles transient errors automatically</span>
</span></span><span style="display:flex;"><span>    timeout<span style="color:#f92672">=</span><span style="color:#ae81ff">60.0</span>
</span></span><span style="display:flex;"><span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">robust_api_call</span>(
</span></span><span style="display:flex;"><span>    messages: list,
</span></span><span style="display:flex;"><span>    model: str <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;claude-sonnet-4-6&#34;</span>,
</span></span><span style="display:flex;"><span>    max_tokens: int <span style="color:#f92672">=</span> <span style="color:#ae81ff">1024</span>,
</span></span><span style="display:flex;"><span>    max_attempts: int <span style="color:#f92672">=</span> <span style="color:#ae81ff">5</span>
</span></span><span style="display:flex;"><span>) <span style="color:#f92672">-&gt;</span> Optional[str]:
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">for</span> attempt <span style="color:#f92672">in</span> range(max_attempts):
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">try</span>:
</span></span><span style="display:flex;"><span>            response <span style="color:#f92672">=</span> client<span style="color:#f92672">.</span>messages<span style="color:#f92672">.</span>create(
</span></span><span style="display:flex;"><span>                model<span style="color:#f92672">=</span>model,
</span></span><span style="display:flex;"><span>                max_tokens<span style="color:#f92672">=</span>max_tokens,
</span></span><span style="display:flex;"><span>                messages<span style="color:#f92672">=</span>messages
</span></span><span style="display:flex;"><span>            )
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">return</span> response<span style="color:#f92672">.</span>content[<span style="color:#ae81ff">0</span>]<span style="color:#f92672">.</span>text
</span></span><span style="display:flex;"><span>            
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">except</span> anthropic<span style="color:#f92672">.</span>RateLimitError <span style="color:#66d9ef">as</span> e:
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">if</span> attempt <span style="color:#f92672">==</span> max_attempts <span style="color:#f92672">-</span> <span style="color:#ae81ff">1</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#66d9ef">raise</span>
</span></span><span style="display:flex;"><span>            <span style="color:#75715e"># Exponential backoff with jitter</span>
</span></span><span style="display:flex;"><span>            wait <span style="color:#f92672">=</span> (<span style="color:#ae81ff">2</span> <span style="color:#f92672">**</span> attempt) <span style="color:#f92672">+</span> random<span style="color:#f92672">.</span>uniform(<span style="color:#ae81ff">0</span>, <span style="color:#ae81ff">1</span>)
</span></span><span style="display:flex;"><span>            print(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Rate limited. Waiting </span><span style="color:#e6db74">{</span>wait<span style="color:#e6db74">:</span><span style="color:#e6db74">.1f</span><span style="color:#e6db74">}</span><span style="color:#e6db74">s... (attempt </span><span style="color:#e6db74">{</span>attempt <span style="color:#f92672">+</span> <span style="color:#ae81ff">1</span><span style="color:#e6db74">}</span><span style="color:#e6db74">/</span><span style="color:#e6db74">{</span>max_attempts<span style="color:#e6db74">}</span><span style="color:#e6db74">)&#34;</span>)
</span></span><span style="display:flex;"><span>            time<span style="color:#f92672">.</span>sleep(wait)
</span></span><span style="display:flex;"><span>            
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">except</span> anthropic<span style="color:#f92672">.</span>APIStatusError <span style="color:#66d9ef">as</span> e:
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">if</span> e<span style="color:#f92672">.</span>status_code <span style="color:#f92672">==</span> <span style="color:#ae81ff">529</span>:  <span style="color:#75715e"># Overloaded</span>
</span></span><span style="display:flex;"><span>                wait <span style="color:#f92672">=</span> <span style="color:#ae81ff">10</span> <span style="color:#f92672">+</span> random<span style="color:#f92672">.</span>uniform(<span style="color:#ae81ff">0</span>, <span style="color:#ae81ff">5</span>)
</span></span><span style="display:flex;"><span>                print(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;API overloaded. Waiting </span><span style="color:#e6db74">{</span>wait<span style="color:#e6db74">:</span><span style="color:#e6db74">.1f</span><span style="color:#e6db74">}</span><span style="color:#e6db74">s...&#34;</span>)
</span></span><span style="display:flex;"><span>                time<span style="color:#f92672">.</span>sleep(wait)
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">elif</span> e<span style="color:#f92672">.</span>status_code <span style="color:#f92672">&gt;=</span> <span style="color:#ae81ff">500</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#66d9ef">raise</span>  <span style="color:#75715e"># Server errors after SDK retries</span>
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">else</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#66d9ef">raise</span>  <span style="color:#75715e"># 4xx errors are not retriable</span>
</span></span><span style="display:flex;"><span>                
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">except</span> anthropic<span style="color:#f92672">.</span>AuthenticationError:
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">raise</span>  <span style="color:#75715e"># Never retry auth errors</span>
</span></span><span style="display:flex;"><span>    
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> <span style="color:#66d9ef">None</span>
</span></span></code></pre></div><h2 id="how-to-build-a-complete-rag-pipeline-with-claude-api">How to Build a Complete RAG Pipeline with Claude API</h2>
<p>A Retrieval-Augmented Generation (RAG) pipeline with Claude works by injecting relevant document chunks into the prompt at query time. The Claude API&rsquo;s 200K context window means you can inject far more context than GPT-4 alternatives — up to ~150,000 words — without aggressive truncation. A production RAG pipeline has four stages: document ingestion (chunk + embed), retrieval (vector similarity search), prompt construction (inject chunks + user query), and generation (Claude API call). In 2026, the most common stack is Claude + pgvector (PostgreSQL) for teams that already run Postgres, or Claude + Pinecone for teams that need managed vector search. The key performance lever is retrieval recall: Claude can work with 20+ retrieved chunks, so cast a wide retrieval net rather than filtering aggressively for precision.</p>
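<p>The first stage, ingestion, usually reduces to a chunker. A minimal word-based sketch with overlap, so sentences spanning a boundary appear in both neighboring chunks (the sizes are arbitrary; production systems typically chunk by tokens or semantic boundaries):</p>

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list:
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap  # assumes overlap < chunk_size
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last chunk already covers the tail
    return chunks
```

<p>Each chunk then gets embedded and stored; at query time the nearest chunks become the <code>retrieved_chunks</code> the prompt builder consumes.</p>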
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> anthropic
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> typing <span style="color:#f92672">import</span> List
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>client <span style="color:#f92672">=</span> anthropic<span style="color:#f92672">.</span>Anthropic()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">build_rag_prompt</span>(query: str, retrieved_chunks: List[str]) <span style="color:#f92672">-&gt;</span> str:
</span></span><span style="display:flex;"><span>    context <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;</span><span style="color:#ae81ff">\n\n</span><span style="color:#e6db74">---</span><span style="color:#ae81ff">\n\n</span><span style="color:#e6db74">&#34;</span><span style="color:#f92672">.</span>join(
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;[Document </span><span style="color:#e6db74">{</span>i<span style="color:#f92672">+</span><span style="color:#ae81ff">1</span><span style="color:#e6db74">}</span><span style="color:#e6db74">]</span><span style="color:#ae81ff">\n</span><span style="color:#e6db74">{</span>chunk<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> 
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">for</span> i, chunk <span style="color:#f92672">in</span> enumerate(retrieved_chunks)
</span></span><span style="display:flex;"><span>    )
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> <span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;&#34;&#34;Answer the question based on the provided documents.
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">If the answer isn&#39;t in the documents, say so clearly.
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">DOCUMENTS:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74"></span><span style="color:#e6db74">{</span>context<span style="color:#e6db74">}</span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">QUESTION: </span><span style="color:#e6db74">{</span>query<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;&#34;&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">rag_query</span>(query: str, vector_db_results: List[str]) <span style="color:#f92672">-&gt;</span> str:
</span></span><span style="display:flex;"><span>    prompt <span style="color:#f92672">=</span> build_rag_prompt(query, vector_db_results)
</span></span><span style="display:flex;"><span>    
</span></span><span style="display:flex;"><span>    response <span style="color:#f92672">=</span> client<span style="color:#f92672">.</span>messages<span style="color:#f92672">.</span>create(
</span></span><span style="display:flex;"><span>        model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;claude-sonnet-4-6&#34;</span>,
</span></span><span style="display:flex;"><span>        max_tokens<span style="color:#f92672">=</span><span style="color:#ae81ff">2048</span>,
</span></span><span style="display:flex;"><span>        system<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;You are a precise research assistant. Cite document numbers when referencing sources.&#34;</span>,
</span></span><span style="display:flex;"><span>        messages<span style="color:#f92672">=</span>[{<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;user&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: prompt}]
</span></span><span style="display:flex;"><span>    )
</span></span><span style="display:flex;"><span>    
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> response<span style="color:#f92672">.</span>content[<span style="color:#ae81ff">0</span>]<span style="color:#f92672">.</span>text
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Example usage</span>
</span></span><span style="display:flex;"><span>chunks <span style="color:#f92672">=</span> [
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;Claude Sonnet 4.6 was released in Q1 2026 with improved reasoning...&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;The Claude API pricing is $3 per million input tokens for Sonnet...&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;Prompt caching reduces costs by 90</span><span style="color:#e6db74">% f</span><span style="color:#e6db74">or cached token reads...&#34;</span>
</span></span><span style="display:flex;"><span>]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>answer <span style="color:#f92672">=</span> rag_query(<span style="color:#e6db74">&#34;How much does Claude API cost?&#34;</span>, chunks)
</span></span><span style="display:flex;"><span>print(answer)
</span></span></code></pre></div><h2 id="claude-api-vs-openai-api-key-differences-for-python-developers">Claude API vs OpenAI API: Key Differences for Python Developers</h2>
<p>Choosing between Claude API and OpenAI API in Python comes down to three factors: context window size, cost structure, and ecosystem maturity. Claude offers a 200K-token context window versus GPT-4o&rsquo;s 128K, making it the clear choice for long-document workflows, large codebase analysis, and extended conversations. On cost, Claude Sonnet 4.6 at $3/MTok input is competitive with GPT-4o at $2.50/MTok, but Claude&rsquo;s prompt caching (90% discount on cached tokens) means real-world costs for cache-friendly workloads are dramatically lower. OpenAI has a larger third-party ecosystem — more tutorials, more LangChain examples, more community plugins — but the Claude SDK is syntactically similar enough that migration takes hours, not days. The table below shows the exact API surface differences Python developers encounter when switching.</p>
<table>
  <thead>
      <tr>
          <th>Feature</th>
          <th>Claude API</th>
          <th>OpenAI API</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Context window</td>
          <td>200K tokens</td>
          <td>128K tokens (GPT-4o)</td>
      </tr>
      <tr>
          <td>Streaming</td>
          <td><code>client.messages.stream()</code></td>
          <td><code>client.chat.completions.create(stream=True)</code></td>
      </tr>
      <tr>
          <td>Tool use</td>
          <td><code>tools</code> parameter</td>
          <td><code>tools</code> / <code>functions</code> parameter</td>
      </tr>
      <tr>
          <td>System prompt</td>
          <td>Top-level <code>system</code> param</td>
          <td>First message with <code>role: &quot;system&quot;</code></td>
      </tr>
      <tr>
          <td>Prompt caching</td>
          <td>Native, 90% discount</td>
          <td>Automatic caching, 50% discount</td>
      </tr>
      <tr>
          <td>Image input</td>
          <td><code>image</code> content block</td>
          <td>Same pattern</td>
      </tr>
      <tr>
          <td>Python SDK</td>
          <td><code>anthropic</code></td>
          <td><code>openai</code></td>
      </tr>
      <tr>
          <td>Rate limits</td>
          <td>Token-based per minute</td>
          <td>Request + token-based</td>
      </tr>
  </tbody>
</table>
<p>The migration from OpenAI to Claude API requires three changes: swap the client class, rename <code>chat.completions.create</code> to <code>messages.create</code>, and move the system message from the messages array to the top-level <code>system</code> parameter. Most production migrations take under 2 hours for a simple application.</p>
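<p>The third change is the only one that touches your data structures. As an illustrative sketch (this helper is hypothetical, not part of either SDK), a few lines of plain Python can split an OpenAI-style messages list into the <code>(system, messages)</code> pair Claude expects:</p>

```python
from typing import Dict, List, Tuple

def to_claude_format(
    openai_messages: List[Dict[str, str]]
) -> Tuple[str, List[Dict[str, str]]]:
    """Split an OpenAI-style messages list into Claude's top-level
    system string plus a messages array containing no system role."""
    system = "\n\n".join(
        m["content"] for m in openai_messages if m["role"] == "system"
    )
    messages = [m for m in openai_messages if m["role"] != "system"]
    return system, messages

system, messages = to_claude_format([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
# `system` and `messages` now map directly onto the `system` and
# `messages` parameters of client.messages.create(...)
```

<p>The other two changes are mechanical renames, which is why the swap rarely takes more than an afternoon for a simple application.</p>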
<h2 id="faq">FAQ</h2>
<p>The most common questions developers have about the Claude API in Python center on cost, rate limits, and framework compatibility. Below are answers to the five questions that come up in every developer forum thread and Stack Overflow post about Claude API Python integration. These answers reflect the current state of the API as of April 2026 — check the Anthropic documentation for the latest rate limits and pricing tiers since both change as Anthropic scales. If you&rsquo;re migrating from another LLM provider or just getting started, these answers will save you the 2-3 hours of trial and error that most developers go through when first integrating Claude into a Python project. All code examples in the answers below work with <code>anthropic</code> SDK version 0.51.x or newer and have been tested against the production Claude API endpoints.</p>
<h3 id="how-much-does-the-claude-api-cost-in-python">How much does the Claude API cost in Python?</h3>
<p>Claude API pricing as of 2026: Haiku 4.5 costs $0.80/MTok input and $4/MTok output. Sonnet 4.6 costs $3/MTok input and $15/MTok output. Opus 4.7 costs $15/MTok input and $75/MTok output. With prompt caching enabled, cached input tokens cost 90% less. A typical chatbot processing 1M tokens/month costs roughly $3-15 depending on model choice.</p>
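<p>To see how these rates combine in practice, here is a small estimator built from the per-MTok prices quoted above (an illustrative sketch only; check Anthropic's pricing page before budgeting, since rates change):</p>

```python
# USD per million tokens (input, output), per the rates quoted above
PRICING = {
    "claude-haiku-4-5": (0.80, 4.00),
    "claude-sonnet-4-6": (3.00, 15.00),
    "claude-opus-4-7": (15.00, 75.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  cached_fraction: float = 0.0) -> float:
    """Estimate request cost in USD. Cached input reads are billed
    at a 90% discount when prompt caching is enabled."""
    in_rate, out_rate = PRICING[model]
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    input_cost = (uncached * in_rate + cached * in_rate * 0.10) / 1_000_000
    output_cost = output_tokens * out_rate / 1_000_000
    return input_cost + output_cost

# 1M input tokens on Sonnet, no caching: $3.00
print(estimate_cost("claude-sonnet-4-6", 1_000_000, 0))
# Same workload with an 80%-cacheable prompt drops the input bill sharply
print(estimate_cost("claude-sonnet-4-6", 1_000_000, 0, cached_fraction=0.8))
```

<p>Running the same numbers across models is the quickest way to decide whether a task justifies Opus pricing or can run on Haiku.</p>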
<h3 id="is-there-a-free-tier-for-the-claude-api">Is there a free tier for the Claude API?</h3>
<p>Anthropic gives new accounts $5 in free API credits to test the API. Beyond that, it&rsquo;s pay-as-you-go with no ongoing free tier. For prototyping on a budget, use Claude Haiku 4.5: at $0.80/MTok input it runs at roughly a quarter of Sonnet&rsquo;s price while handling most straightforward tasks well.</p>
<h3 id="how-do-i-handle-claude-api-timeouts-in-python">How do I handle Claude API timeouts in Python?</h3>
<p>Set <code>timeout=60.0</code> on the <code>Anthropic()</code> client for a 60-second per-request timeout. For streaming requests that may run long, pass <code>timeout=httpx.Timeout(60.0, read=300.0)</code> to allow a longer read timeout while keeping the connection timeout short. The SDK raises <code>anthropic.APITimeoutError</code> on timeout, which you should catch and retry.</p>
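<p>The catch-and-retry pattern can be sketched with a small generic helper (hypothetical, not part of the SDK); the Claude-specific wiring is shown in comments so the helper itself stays library-agnostic:</p>

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")

def with_retries(fn: Callable[[], T],
                 retryable: Tuple[Type[BaseException], ...],
                 max_attempts: int = 3,
                 backoff: float = 2.0) -> T:
    """Call fn(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            time.sleep(backoff * 2 ** attempt)

# Wiring it to the SDK (requires the `anthropic` and `httpx` packages):
# client = anthropic.Anthropic(timeout=httpx.Timeout(60.0, read=300.0))
# response = with_retries(
#     lambda: client.messages.create(
#         model="claude-sonnet-4-6",
#         max_tokens=1024,
#         messages=[{"role": "user", "content": "Hello"}],
#     ),
#     retryable=(anthropic.APITimeoutError,),
# )
```

<p>Note that the SDK also performs its own automatic retries for transient errors; an application-level wrapper like this is a belt-and-suspenders layer for long-running jobs.</p>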
<h3 id="can-i-use-the-claude-api-with-langchain-or-llamaindex">Can I use the Claude API with LangChain or LlamaIndex?</h3>
<p>Yes. Both LangChain and LlamaIndex have native Claude integrations. LangChain uses <code>ChatAnthropic</code> from <code>langchain_anthropic</code>. LlamaIndex uses <code>Anthropic</code> from <code>llama_index.llms.anthropic</code>. Both support streaming and tool use. However, using the raw <code>anthropic</code> SDK directly gives you access to features like prompt caching that framework wrappers sometimes lag in supporting.</p>
<h3 id="what-is-the-maximum-context-window-for-claude-api-in-python">What is the maximum context window for Claude API in Python?</h3>
<p>Claude&rsquo;s maximum context window is 200,000 tokens (~150,000 words). This covers a full codebase, a legal document, or many hours of transcript. The <code>max_tokens</code> parameter controls the <em>output</em> size — set it based on expected response length, not the input size. You pay for input tokens regardless of <code>max_tokens</code>; you only pay for output tokens that are actually generated.</p>
]]></content:encoded></item></channel></rss>