<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Llm Function Calling on RockB</title><link>https://baeseokjae.github.io/tags/llm-function-calling/</link><description>Recent content in Llm Function Calling on RockB</description><image><title>RockB</title><url>https://baeseokjae.github.io/images/og-default.png</url><link>https://baeseokjae.github.io/images/og-default.png</link></image><generator>Hugo</generator><language>en-us</language><lastBuildDate>Mon, 27 Apr 2026 14:36:31 +0000</lastBuildDate><atom:link href="https://baeseokjae.github.io/tags/llm-function-calling/index.xml" rel="self" type="application/rss+xml"/><item><title>LLM Function Calling and Tool Use Guide 2026: OpenAI, Anthropic, Google</title><link>https://baeseokjae.github.io/posts/llm-function-calling-tool-use-guide-2026/</link><pubDate>Mon, 27 Apr 2026 14:36:31 +0000</pubDate><guid>https://baeseokjae.github.io/posts/llm-function-calling-tool-use-guide-2026/</guid><description>Complete 2026 guide to LLM function calling across OpenAI, Anthropic, and Google Gemini—with code, security, and production patterns.</description><content:encoded><![CDATA[<p>Function calling is the bridge between a language model&rsquo;s text output and the real world. Instead of asking a model to guess what the weather is, you hand it a <code>get_weather</code> tool definition, and it decides when to call it, what arguments to pass, and how to incorporate the result. As of 2026, every major provider—OpenAI, Anthropic, and Google—supports this pattern, but the APIs look meaningfully different. This guide walks through each one with working Python code and covers parallel calls, agent loops, security, and how to pick the right approach.</p>
<h2 id="what-is-llm-function-calling-and-why-does-it-matter">What Is LLM Function Calling and Why Does It Matter?</h2>
<p>LLM function calling is a structured protocol that lets a language model request execution of external functions during inference—passing typed arguments back to your application, which runs the function and returns results for the model to reason about. Unlike raw text completion, function calling gives the model a typed interface to the real world: databases, APIs, file systems, payment processors. OpenAI introduced the concept as &ldquo;function calling&rdquo; in June 2023 via a <code>functions</code> parameter, since replaced by <code>tools</code>. By 2026, this capability is the core primitive behind production AI agents. The LLMCompiler paper (ICML 2024) showed that parallel tool calls reduce end-to-end latency by up to 3.7x compared to sequential execution. Tool use also carries real costs: each tool definition adds 100–300 input tokens, so a system with 15 tools adds roughly 1,500–4,500 tokens to every request. Understanding format differences, security boundaries, and performance patterns is no longer optional for teams shipping AI features.</p>
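<p>The overhead figures above are easy to sanity-check. The sketch below is back-of-envelope arithmetic only; the 100–300 tokens-per-tool range is the estimate quoted in this section, not a measured constant.</p>

```python
# Rough input-token overhead of carrying tool definitions on every request.
# The per-tool range is an estimate from the text, not a measurement.
def tool_overhead(n_tools, per_tool_range=(100, 300)):
    low, high = per_tool_range
    return n_tools * low, n_tools * high

low, high = tool_overhead(15)
print(f"15 tools add roughly {low}-{high} input tokens per request")
```

<p>At scale this dominates: prune unused tools per request rather than sending your whole catalog every time.</p>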
<h3 id="why-all-three-providers-matter">Why All Three Providers Matter</h3>
<p>OpenAI, Anthropic, and Google each have production deployments with billions of API calls per month. They share the same conceptual model—define tool, detect call, execute, return result—but differ enough in JSON schema and response structure that copy-paste code will break. Teams that lock into one provider today often need to migrate later. Knowing all three formats also helps you choose correctly: Anthropic&rsquo;s server-side tools (web search, code execution) remove entire categories of infrastructure work; Google&rsquo;s streaming argument support cuts latency in real-time UIs; OpenAI&rsquo;s <code>strict: true</code> mode guarantees schema-valid outputs at the cost of parallel call support.</p>
<h2 id="the-universal-5-step-pattern-that-works-across-all-providers">The Universal 5-Step Pattern That Works Across All Providers</h2>
<p>Every function calling implementation follows the same five steps, regardless of which LLM you use. First, <strong>define your tools</strong> as structured schemas describing function names, descriptions, and parameter types. Second, <strong>send a request</strong> with both the user message and the tools array attached. Third, <strong>detect a tool call</strong> in the response—the model returns a structured object instead of plain text when it decides to invoke a function. Fourth, <strong>execute the function</strong> in your application code and capture the result. Fifth, <strong>return the result</strong> to the model in a follow-up request so it can formulate a final answer. This loop can repeat multiple times—multi-step agent patterns chain dozens of tool calls before returning to the user. The difference between providers is entirely in the JSON structure of steps 1, 3, and 5. Step 4 is always pure Python (or whatever language you use), and the result you return is just a string or structured object. Mastering the universal pattern first makes provider-specific syntax a minor detail rather than a conceptual hurdle.</p>
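<p>The five steps above can be sketched as one provider-agnostic loop. Everything here is a hypothetical shape: <code>call_model</code>, the <code>tool_calls</code> field names, and the <code>TOOL_REGISTRY</code> dispatch are placeholders you would adapt to your SDK&rsquo;s actual request and response objects.</p>

```python
# Provider-agnostic sketch of the 5-step pattern. `call_model` and the
# response field names are placeholders, not any real SDK's API.
import json

def get_weather(city, unit="celsius"):
    # Step 4 is plain application code; stubbed here.
    return {"city": city, "temperature": 22, "condition": "sunny"}

TOOL_REGISTRY = {"get_weather": get_weather}

def run_agent_loop(call_model, messages, max_turns=10):
    """Repeat tool-call / tool-result cycles until the model answers in text."""
    for _ in range(max_turns):
        reply = call_model(messages)              # steps 2-3: send, detect
        if not reply.get("tool_calls"):
            return reply["content"]               # final text answer
        messages.append(reply)
        for call in reply["tool_calls"]:
            fn = TOOL_REGISTRY[call["name"]]
            result = fn(**call["arguments"])      # step 4: execute
            messages.append({                     # step 5: return result
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(result),
            })
    raise RuntimeError("agent loop did not converge")
```

<p>The <code>max_turns</code> cap matters in production: a model that keeps requesting tools should fail loudly rather than loop forever.</p>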
<h3 id="core-vocabulary">Core Vocabulary</h3>
<table>
  <thead>
      <tr>
          <th>Term</th>
          <th>Meaning</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Tool / Function</td>
          <td>A callable action the model can request</td>
      </tr>
      <tr>
          <td>Tool definition</td>
          <td>JSON schema describing the function&rsquo;s name, description, and parameters</td>
      </tr>
      <tr>
          <td>Tool call</td>
          <td>The model&rsquo;s structured request to invoke a specific function</td>
      </tr>
      <tr>
          <td>Tool result</td>
          <td>Your application&rsquo;s response after executing the function</td>
      </tr>
      <tr>
          <td>Parallel tool calls</td>
          <td>Multiple tool calls in a single model turn</td>
      </tr>
      <tr>
          <td>Agent loop</td>
          <td>Repeated tool call / result cycles until the model produces a final answer</td>
      </tr>
  </tbody>
</table>
<h2 id="openai-implementation-tools-array-strict-mode-and-parallel-calls">OpenAI Implementation: Tools Array, Strict Mode, and Parallel Calls</h2>
<p>OpenAI&rsquo;s function calling API uses a <code>tools</code> parameter that accepts an array of objects, each with <code>type: &quot;function&quot;</code> and a nested <code>function</code> object containing <code>name</code>, <code>description</code>, and <code>parameters</code> (a JSON Schema object). When the model decides to call a function, the response&rsquo;s <code>choices[0].finish_reason</code> is <code>&quot;tool_calls&quot;</code> and <code>choices[0].message.tool_calls</code> contains an array of call objects. Each call has an <code>id</code>, a <code>type</code>, and a <code>function</code> sub-object with <code>name</code> and <code>arguments</code> (a JSON string). You execute the function, then append both the assistant message and a <code>tool</code> role message to the conversation history before the next request. OpenAI&rsquo;s <code>strict: true</code> mode—added in late 2024—enforces exact schema compliance on arguments, eliminating the hallucinated-field class of bugs. The tradeoff: <code>strict</code> mode is incompatible with parallel tool calls, so you pick one guarantee or the other. As of 2026, the <code>gpt-4.1</code> series and <code>o3</code> models all support parallel tool calls when <code>strict</code> is disabled, enabling the 3.7x latency improvement from the LLMCompiler paper.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> json
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> openai <span style="color:#f92672">import</span> OpenAI
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>client <span style="color:#f92672">=</span> OpenAI()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>tools <span style="color:#f92672">=</span> [
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;function&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;function&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;get_weather&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;description&#34;</span>: <span style="color:#e6db74">&#34;Get current weather for a city&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;parameters&#34;</span>: {
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;object&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;properties&#34;</span>: {
</span></span><span style="display:flex;"><span>                    <span style="color:#e6db74">&#34;city&#34;</span>: {<span style="color:#e6db74">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;string&#34;</span>, <span style="color:#e6db74">&#34;description&#34;</span>: <span style="color:#e6db74">&#34;City name&#34;</span>},
</span></span><span style="display:flex;"><span>                    <span style="color:#e6db74">&#34;unit&#34;</span>: {<span style="color:#e6db74">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;string&#34;</span>, <span style="color:#e6db74">&#34;enum&#34;</span>: [<span style="color:#e6db74">&#34;celsius&#34;</span>, <span style="color:#e6db74">&#34;fahrenheit&#34;</span>]}
</span></span><span style="display:flex;"><span>                },
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;required&#34;</span>: [<span style="color:#e6db74">&#34;city&#34;</span>]
</span></span><span style="display:flex;"><span>            }
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>response <span style="color:#f92672">=</span> client<span style="color:#f92672">.</span>chat<span style="color:#f92672">.</span>completions<span style="color:#f92672">.</span>create(
</span></span><span style="display:flex;"><span>    model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;gpt-4.1&#34;</span>,
</span></span><span style="display:flex;"><span>    messages<span style="color:#f92672">=</span>[{<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;user&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: <span style="color:#e6db74">&#34;What&#39;s the weather in Tokyo and Berlin?&#34;</span>}],
</span></span><span style="display:flex;"><span>    tools<span style="color:#f92672">=</span>tools,
</span></span><span style="display:flex;"><span>    tool_choice<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;auto&#34;</span>
</span></span><span style="display:flex;"><span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>message <span style="color:#f92672">=</span> response<span style="color:#f92672">.</span>choices[<span style="color:#ae81ff">0</span>]<span style="color:#f92672">.</span>message
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">if</span> message<span style="color:#f92672">.</span>tool_calls:
</span></span><span style="display:flex;"><span>    results <span style="color:#f92672">=</span> []
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">for</span> call <span style="color:#f92672">in</span> message<span style="color:#f92672">.</span>tool_calls:
</span></span><span style="display:flex;"><span>        args <span style="color:#f92672">=</span> json<span style="color:#f92672">.</span>loads(call<span style="color:#f92672">.</span>function<span style="color:#f92672">.</span>arguments)
</span></span><span style="display:flex;"><span>        <span style="color:#75715e"># Execute your actual function here</span>
</span></span><span style="display:flex;"><span>        result <span style="color:#f92672">=</span> {<span style="color:#e6db74">&#34;temperature&#34;</span>: <span style="color:#ae81ff">22</span>, <span style="color:#e6db74">&#34;condition&#34;</span>: <span style="color:#e6db74">&#34;sunny&#34;</span>}
</span></span><span style="display:flex;"><span>        results<span style="color:#f92672">.</span>append({
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;tool&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;tool_call_id&#34;</span>: call<span style="color:#f92672">.</span>id,
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;content&#34;</span>: json<span style="color:#f92672">.</span>dumps(result)
</span></span><span style="display:flex;"><span>        })
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    final <span style="color:#f92672">=</span> client<span style="color:#f92672">.</span>chat<span style="color:#f92672">.</span>completions<span style="color:#f92672">.</span>create(
</span></span><span style="display:flex;"><span>        model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;gpt-4.1&#34;</span>,
</span></span><span style="display:flex;"><span>        messages<span style="color:#f92672">=</span>[
</span></span><span style="display:flex;"><span>            {<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;user&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: <span style="color:#e6db74">&#34;What&#39;s the weather in Tokyo and Berlin?&#34;</span>},
</span></span><span style="display:flex;"><span>            message,
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">*</span>results
</span></span><span style="display:flex;"><span>        ],
</span></span><span style="display:flex;"><span>        tools<span style="color:#f92672">=</span>tools
</span></span><span style="display:flex;"><span>    )
</span></span><span style="display:flex;"><span>    print(final<span style="color:#f92672">.</span>choices[<span style="color:#ae81ff">0</span>]<span style="color:#f92672">.</span>message<span style="color:#f92672">.</span>content)
</span></span></code></pre></div><h3 id="openai-tool-choice-control">OpenAI Tool Choice Control</h3>
<p>The <code>tool_choice</code> parameter controls whether the model must use tools. <code>&quot;auto&quot;</code> lets it decide, <code>&quot;required&quot;</code> forces at least one call, <code>{&quot;type&quot;: &quot;function&quot;, &quot;function&quot;: {&quot;name&quot;: &quot;get_weather&quot;}}</code> forces a specific function. Setting <code>&quot;none&quot;</code> disables tool calling entirely, useful when you want a plain text response after results are returned.</p>
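<p>A common pattern is to vary <code>tool_choice</code> across turns: let the model decide on the first request, then pass <code>&quot;none&quot;</code> once results are in so the follow-up is guaranteed to be plain text. A minimal helper (the function and turn convention are illustrative, not part of the OpenAI API; only the returned values are real inputs):</p>

```python
# Illustrative helper for picking tool_choice per turn. Only the returned
# values ("auto", "none", and the forcing dict) are real API inputs --
# the helper itself is not part of any SDK.
def choose_tool_choice(turn, force_function=None):
    if force_function:
        # Force one specific function, regardless of turn.
        return {"type": "function", "function": {"name": force_function}}
    # First turn: let the model decide. Later turns: plain text only.
    return "auto" if turn == 0 else "none"
```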
<h2 id="anthropic-tool-use-architecture-input-schema-content-blocks-and-server-side-tools">Anthropic Tool Use Architecture: Input Schema, Content Blocks, and Server-Side Tools</h2>
<p>Anthropic&rsquo;s tool use API differs from OpenAI in three important ways: tools are defined with <code>input_schema</code> instead of <code>parameters</code>, responses use typed content blocks instead of a <code>tool_calls</code> array, and Anthropic offers server-side built-in tools that run inside the API without any execution code on your end. When Claude decides to use a tool, <code>stop_reason</code> is <code>&quot;tool_use&quot;</code> and the <code>content</code> array contains a block of <code>type: &quot;tool_use&quot;</code> alongside any text blocks. You match on <code>type == &quot;tool_use&quot;</code> to extract <code>name</code> and <code>input</code> (already a parsed dict, not a JSON string). Results go back as a <code>tool_result</code> content block in a new <code>user</code> turn. The most distinctive Anthropic feature is built-in server-side tools: <code>web_search</code>, <code>code_execution</code> (sandboxed Python), and <code>text_editor</code>. Declaring <code>{&quot;type&quot;: &quot;web_search_20250305&quot;, &quot;name&quot;: &quot;web_search&quot;}</code> in the tools list gives Claude live internet access with zero infrastructure on your end—Anthropic runs the search and returns results as part of the content stream. This removes an entire category of retrieval infrastructure you would otherwise have to build and operate yourself.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> anthropic
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>client <span style="color:#f92672">=</span> anthropic<span style="color:#f92672">.</span>Anthropic()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>tools <span style="color:#f92672">=</span> [
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;get_weather&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;description&#34;</span>: <span style="color:#e6db74">&#34;Get current weather for a city&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;input_schema&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;object&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;properties&#34;</span>: {
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;city&#34;</span>: {<span style="color:#e6db74">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;string&#34;</span>, <span style="color:#e6db74">&#34;description&#34;</span>: <span style="color:#e6db74">&#34;City name&#34;</span>},
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;unit&#34;</span>: {<span style="color:#e6db74">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;string&#34;</span>, <span style="color:#e6db74">&#34;enum&#34;</span>: [<span style="color:#e6db74">&#34;celsius&#34;</span>, <span style="color:#e6db74">&#34;fahrenheit&#34;</span>]}
</span></span><span style="display:flex;"><span>            },
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;required&#34;</span>: [<span style="color:#e6db74">&#34;city&#34;</span>]
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>response <span style="color:#f92672">=</span> client<span style="color:#f92672">.</span>messages<span style="color:#f92672">.</span>create(
</span></span><span style="display:flex;"><span>    model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;claude-opus-4-7&#34;</span>,
</span></span><span style="display:flex;"><span>    max_tokens<span style="color:#f92672">=</span><span style="color:#ae81ff">1024</span>,
</span></span><span style="display:flex;"><span>    tools<span style="color:#f92672">=</span>tools,
</span></span><span style="display:flex;"><span>    messages<span style="color:#f92672">=</span>[{<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;user&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: <span style="color:#e6db74">&#34;Weather in Paris?&#34;</span>}]
</span></span><span style="display:flex;"><span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">if</span> response<span style="color:#f92672">.</span>stop_reason <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;tool_use&#34;</span>:
</span></span><span style="display:flex;"><span>    tool_use <span style="color:#f92672">=</span> next(b <span style="color:#66d9ef">for</span> b <span style="color:#f92672">in</span> response<span style="color:#f92672">.</span>content <span style="color:#66d9ef">if</span> b<span style="color:#f92672">.</span>type <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;tool_use&#34;</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Execute your function with tool_use.input (already a dict)</span>
</span></span><span style="display:flex;"><span>    result <span style="color:#f92672">=</span> {<span style="color:#e6db74">&#34;temperature&#34;</span>: <span style="color:#ae81ff">18</span>, <span style="color:#e6db74">&#34;condition&#34;</span>: <span style="color:#e6db74">&#34;cloudy&#34;</span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    final <span style="color:#f92672">=</span> client<span style="color:#f92672">.</span>messages<span style="color:#f92672">.</span>create(
</span></span><span style="display:flex;"><span>        model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;claude-opus-4-7&#34;</span>,
</span></span><span style="display:flex;"><span>        max_tokens<span style="color:#f92672">=</span><span style="color:#ae81ff">1024</span>,
</span></span><span style="display:flex;"><span>        tools<span style="color:#f92672">=</span>tools,
</span></span><span style="display:flex;"><span>        messages<span style="color:#f92672">=</span>[
</span></span><span style="display:flex;"><span>            {<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;user&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: <span style="color:#e6db74">&#34;Weather in Paris?&#34;</span>},
</span></span><span style="display:flex;"><span>            {<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;assistant&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: response<span style="color:#f92672">.</span>content},
</span></span><span style="display:flex;"><span>            {
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;user&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;content&#34;</span>: [
</span></span><span style="display:flex;"><span>                    {
</span></span><span style="display:flex;"><span>                        <span style="color:#e6db74">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;tool_result&#34;</span>,
</span></span><span style="display:flex;"><span>                        <span style="color:#e6db74">&#34;tool_use_id&#34;</span>: tool_use<span style="color:#f92672">.</span>id,
</span></span><span style="display:flex;"><span>                        <span style="color:#e6db74">&#34;content&#34;</span>: str(result)
</span></span><span style="display:flex;"><span>                    }
</span></span><span style="display:flex;"><span>                ]
</span></span><span style="display:flex;"><span>            }
</span></span><span style="display:flex;"><span>        ]
</span></span><span style="display:flex;"><span>    )
</span></span><span style="display:flex;"><span>    print(final<span style="color:#f92672">.</span>content[<span style="color:#ae81ff">0</span>]<span style="color:#f92672">.</span>text)
</span></span></code></pre></div><h3 id="anthropic-server-side-tools">Anthropic Server-Side Tools</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#75715e"># Web search with no infrastructure required</span>
</span></span><span style="display:flex;"><span>response <span style="color:#f92672">=</span> client<span style="color:#f92672">.</span>messages<span style="color:#f92672">.</span>create(
</span></span><span style="display:flex;"><span>    model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;claude-opus-4-7&#34;</span>,
</span></span><span style="display:flex;"><span>    max_tokens<span style="color:#f92672">=</span><span style="color:#ae81ff">1024</span>,
</span></span><span style="display:flex;"><span>    tools<span style="color:#f92672">=</span>[{<span style="color:#e6db74">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;web_search_20250305&#34;</span>, <span style="color:#e6db74">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;web_search&#34;</span>}],
</span></span><span style="display:flex;"><span>    messages<span style="color:#f92672">=</span>[{<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;user&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: <span style="color:#e6db74">&#34;Latest AI news today?&#34;</span>}]
</span></span><span style="display:flex;"><span>)
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Claude handles search internally — no tool_use loop needed</span>
</span></span></code></pre></div><h2 id="google-gemini-function-declarations-protocol-buffer-types-and-streaming">Google Gemini Function Declarations: Protocol Buffer Types and Streaming</h2>
<p>Google Gemini uses <code>FunctionDeclaration</code> objects inside a <code>Tool</code> wrapper, passed via the <code>tools</code> parameter. The type system derives from Protocol Buffers: <code>STRING</code>, <code>NUMBER</code>, <code>BOOLEAN</code>, <code>ARRAY</code>, <code>OBJECT</code> (all caps, from <code>google.generativeai.types</code>). When Gemini returns a function call, <code>response.candidates[0].content.parts</code> contains a <code>Part</code> with a <code>function_call</code> attribute holding <code>name</code> and <code>args</code> (a dict). Results go back as a <code>Part</code> with a <code>function_response</code> attribute. Gemini 2.5+ adds streaming function call arguments—arguments arrive incrementally as tokens generate, enabling your UI to show progress before execution completes. This is particularly valuable for latency-sensitive applications where users wait for agent responses. Google also supports <code>ANY</code> mode for <code>tool_config</code> (equivalent to OpenAI&rsquo;s <code>&quot;required&quot;</code>) and specific function forcing via <code>allowed_function_names</code>. Gemini&rsquo;s automatic function calling mode can execute Python functions directly using reflection, though production systems should handle execution explicitly for auditability.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> google.generativeai <span style="color:#66d9ef">as</span> genai
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> json
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>genai<span style="color:#f92672">.</span>configure(api_key<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;YOUR_API_KEY&#34;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>tools <span style="color:#f92672">=</span> genai<span style="color:#f92672">.</span>types<span style="color:#f92672">.</span>Tool(
</span></span><span style="display:flex;"><span>    function_declarations<span style="color:#f92672">=</span>[
</span></span><span style="display:flex;"><span>        genai<span style="color:#f92672">.</span>types<span style="color:#f92672">.</span>FunctionDeclaration(
</span></span><span style="display:flex;"><span>            name<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;get_weather&#34;</span>,
</span></span><span style="display:flex;"><span>            description<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;Get current weather for a city&#34;</span>,
</span></span><span style="display:flex;"><span>            parameters<span style="color:#f92672">=</span>genai<span style="color:#f92672">.</span>types<span style="color:#f92672">.</span>Schema(
</span></span><span style="display:flex;"><span>                type<span style="color:#f92672">=</span>genai<span style="color:#f92672">.</span>types<span style="color:#f92672">.</span>Type<span style="color:#f92672">.</span>OBJECT,
</span></span><span style="display:flex;"><span>                properties<span style="color:#f92672">=</span>{
</span></span><span style="display:flex;"><span>                    <span style="color:#e6db74">&#34;city&#34;</span>: genai<span style="color:#f92672">.</span>types<span style="color:#f92672">.</span>Schema(type<span style="color:#f92672">=</span>genai<span style="color:#f92672">.</span>types<span style="color:#f92672">.</span>Type<span style="color:#f92672">.</span>STRING),
</span></span><span style="display:flex;"><span>                    <span style="color:#e6db74">&#34;unit&#34;</span>: genai<span style="color:#f92672">.</span>types<span style="color:#f92672">.</span>Schema(
</span></span><span style="display:flex;"><span>                        type<span style="color:#f92672">=</span>genai<span style="color:#f92672">.</span>types<span style="color:#f92672">.</span>Type<span style="color:#f92672">.</span>STRING,
</span></span><span style="display:flex;"><span>                        enum<span style="color:#f92672">=</span>[<span style="color:#e6db74">&#34;celsius&#34;</span>, <span style="color:#e6db74">&#34;fahrenheit&#34;</span>]
</span></span><span style="display:flex;"><span>                    )
</span></span><span style="display:flex;"><span>                },
</span></span><span style="display:flex;"><span>                required<span style="color:#f92672">=</span>[<span style="color:#e6db74">&#34;city&#34;</span>]
</span></span><span style="display:flex;"><span>            )
</span></span><span style="display:flex;"><span>        )
</span></span><span style="display:flex;"><span>    ]
</span></span><span style="display:flex;"><span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>model <span style="color:#f92672">=</span> genai<span style="color:#f92672">.</span>GenerativeModel(<span style="color:#e6db74">&#34;gemini-2.5-pro&#34;</span>, tools<span style="color:#f92672">=</span>[tools])
</span></span><span style="display:flex;"><span>chat <span style="color:#f92672">=</span> model<span style="color:#f92672">.</span>start_chat()
</span></span><span style="display:flex;"><span>response <span style="color:#f92672">=</span> chat<span style="color:#f92672">.</span>send_message(<span style="color:#e6db74">&#34;Weather in Sydney?&#34;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>part <span style="color:#f92672">=</span> response<span style="color:#f92672">.</span>candidates[<span style="color:#ae81ff">0</span>]<span style="color:#f92672">.</span>content<span style="color:#f92672">.</span>parts[<span style="color:#ae81ff">0</span>]
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">if</span> part<span style="color:#f92672">.</span>function_call:  <span style="color:#75715e"># hasattr is always True on SDK parts; check truthiness instead</span>
</span></span><span style="display:flex;"><span>    fc <span style="color:#f92672">=</span> part<span style="color:#f92672">.</span>function_call
</span></span><span style="display:flex;"><span>    result <span style="color:#f92672">=</span> {<span style="color:#e6db74">&#34;temperature&#34;</span>: <span style="color:#ae81ff">25</span>, <span style="color:#e6db74">&#34;condition&#34;</span>: <span style="color:#e6db74">&#34;sunny&#34;</span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">from</span> google.protobuf.struct_pb2 <span style="color:#f92672">import</span> Struct
</span></span><span style="display:flex;"><span>    response_struct <span style="color:#f92672">=</span> Struct()
</span></span><span style="display:flex;"><span>    response_struct<span style="color:#f92672">.</span>update(result)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    final <span style="color:#f92672">=</span> chat<span style="color:#f92672">.</span>send_message(
</span></span><span style="display:flex;"><span>        genai<span style="color:#f92672">.</span>types<span style="color:#f92672">.</span>content_types<span style="color:#f92672">.</span>to_contents(
</span></span><span style="display:flex;"><span>            genai<span style="color:#f92672">.</span>types<span style="color:#f92672">.</span>Part(
</span></span><span style="display:flex;"><span>                function_response<span style="color:#f92672">=</span>genai<span style="color:#f92672">.</span>types<span style="color:#f92672">.</span>FunctionResponse(
</span></span><span style="display:flex;"><span>                    name<span style="color:#f92672">=</span>fc<span style="color:#f92672">.</span>name,
</span></span><span style="display:flex;"><span>                    response<span style="color:#f92672">=</span>response_struct
</span></span><span style="display:flex;"><span>                )
</span></span><span style="display:flex;"><span>            )
</span></span><span style="display:flex;"><span>        )
</span></span><span style="display:flex;"><span>    )
</span></span><span style="display:flex;"><span>    print(final<span style="color:#f92672">.</span>text)
</span></span></code></pre></div><h2 id="side-by-side-provider-comparison">Side-by-Side Provider Comparison</h2>
<p>The three providers share conceptual structure but differ significantly in API surface, type systems, unique capabilities, and operational tradeoffs that matter in production. OpenAI&rsquo;s <code>tools</code> parameter uses standard JSON Schema with a <code>parameters</code> key, returns arguments as a JSON string that needs manual parsing, and uniquely offers <code>strict: true</code> mode for guaranteed schema-valid outputs. Anthropic uses <code>input_schema</code> instead of <code>parameters</code>, returns arguments as a pre-parsed Python dict (eliminating a common source of parse errors), and is the only provider offering server-side built-in tools like <code>web_search</code> and <code>code_execution</code> that require zero infrastructure. Google Gemini uses Protocol Buffer-derived types (<code>STRING</code>, <code>NUMBER</code>, <code>OBJECT</code>) rather than JSON Schema keywords, and Gemini 2.5+ uniquely supports streaming function call arguments—arguments arrive token-by-token as they generate, which cuts time-to-execution in latency-sensitive UIs. All three support parallel tool calls, though OpenAI&rsquo;s strict mode disables this feature.</p>
<table>
  <thead>
      <tr>
          <th>Feature</th>
          <th>OpenAI</th>
          <th>Anthropic</th>
          <th>Google Gemini</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Schema key</td>
          <td><code>parameters</code> (JSON Schema)</td>
          <td><code>input_schema</code> (JSON Schema)</td>
          <td><code>FunctionDeclaration</code> (Protobuf types)</td>
      </tr>
      <tr>
          <td>Tool call detection</td>
          <td><code>finish_reason == &quot;tool_calls&quot;</code></td>
          <td><code>stop_reason == &quot;tool_use&quot;</code></td>
          <td>Part has <code>function_call</code> attr</td>
      </tr>
      <tr>
          <td>Arguments format</td>
          <td>JSON string (needs parsing)</td>
          <td>Dict (pre-parsed)</td>
          <td>Proto Struct (dict-like)</td>
      </tr>
      <tr>
          <td>Result format</td>
          <td><code>tool</code> role message</td>
          <td><code>tool_result</code> content block</td>
          <td><code>function_response</code> Part</td>
      </tr>
      <tr>
          <td>Parallel calls</td>
          <td>Yes (not with strict mode)</td>
          <td>Yes</td>
          <td>Yes</td>
      </tr>
      <tr>
          <td>Strict schema mode</td>
          <td><code>strict: true</code></td>
          <td>N/A</td>
          <td>N/A</td>
      </tr>
      <tr>
          <td>Server-side tools</td>
          <td>No</td>
          <td>Yes (web_search, code_exec)</td>
          <td>No</td>
      </tr>
      <tr>
          <td>Streaming args</td>
          <td>No</td>
          <td>No</td>
          <td>Yes (Gemini 2.5+)</td>
      </tr>
      <tr>
          <td>Forced tool use</td>
          <td><code>tool_choice: &quot;required&quot;</code></td>
          <td><code>tool_choice: {&quot;type&quot;: &quot;any&quot;}</code></td>
          <td><code>tool_config: ANY</code></td>
      </tr>
  </tbody>
</table>
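<p>In code, the schema-key row of the table looks like this. A minimal sketch of the same <code>get_weather</code> tool in the OpenAI and Anthropic dictionary formats (Gemini&rsquo;s typed <code>FunctionDeclaration</code> form is shown in the Gemini example earlier); the description strings are illustrative:</p>

```python
# The JSON Schema body is identical for OpenAI and Anthropic;
# only the wrapper around it differs.
json_schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["city"],
}

# OpenAI: nested under "function", schema under the "parameters" key
openai_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": json_schema,
    },
}

# Anthropic: flat structure, schema under the "input_schema" key
anthropic_tool = {
    "name": "get_weather",
    "description": "Get current weather for a city",
    "input_schema": json_schema,
}
```

<p>Because the schema body is shared, a thin cross-provider abstraction layer only needs to rewrap the same JSON Schema object, not translate it.</p>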
<h3 id="token-cost-implications">Token Cost Implications</h3>
<p>Each tool definition consumes input tokens. A typical tool with a medium-complexity schema costs 100–300 tokens. A system with 20 tools adds 2,000–6,000 tokens per request—roughly $0.01–$0.06 at current pricing on GPT-4.1. For high-volume applications (millions of requests/day), tool count is a meaningful cost lever. Strategies: remove unused tools, consolidate related tools into one with an <code>action</code> enum parameter, and cache system prompts (Anthropic prompt caching reduces repeated tool definition costs by ~90%).</p>
<h2 id="parallel-tool-calls-37x-latency-improvement-in-practice">Parallel Tool Calls: 3.7x Latency Improvement in Practice</h2>
<p>Parallel tool calls are the single highest-impact performance optimization for agentic systems. The LLMCompiler paper (ICML 2024) demonstrated up to 3.7x latency reduction by executing independent tool calls concurrently rather than waiting for each to complete before starting the next. All three major providers return multiple tool call objects in a single response when the model determines calls are independent. Your application executes them concurrently using <code>asyncio.gather</code> or <code>ThreadPoolExecutor</code>, then returns all results in a single follow-up request. The model sees all results simultaneously and synthesizes a final answer. The pattern requires identifying which tool calls truly are independent—calls where output B depends on output A must remain sequential. The model usually gets this right, but you should validate in production and implement timeout handling so a slow tool doesn&rsquo;t block all parallel results.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> asyncio
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> json
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> openai <span style="color:#f92672">import</span> AsyncOpenAI
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>client <span style="color:#f92672">=</span> AsyncOpenAI()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">execute_tool</span>(call):
</span></span><span style="display:flex;"><span>    args <span style="color:#f92672">=</span> json<span style="color:#f92672">.</span>loads(call<span style="color:#f92672">.</span>function<span style="color:#f92672">.</span>arguments)
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">if</span> call<span style="color:#f92672">.</span>function<span style="color:#f92672">.</span>name <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;get_weather&#34;</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">await</span> asyncio<span style="color:#f92672">.</span>sleep(<span style="color:#ae81ff">0.1</span>)  <span style="color:#75715e"># Simulated API call</span>
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> {<span style="color:#e6db74">&#34;temperature&#34;</span>: <span style="color:#ae81ff">22</span>, <span style="color:#e6db74">&#34;city&#34;</span>: args[<span style="color:#e6db74">&#34;city&#34;</span>]}
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">if</span> call<span style="color:#f92672">.</span>function<span style="color:#f92672">.</span>name <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;get_stock_price&#34;</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">await</span> asyncio<span style="color:#f92672">.</span>sleep(<span style="color:#ae81ff">0.15</span>)
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> {<span style="color:#e6db74">&#34;price&#34;</span>: <span style="color:#ae81ff">150.5</span>, <span style="color:#e6db74">&#34;symbol&#34;</span>: args[<span style="color:#e6db74">&#34;symbol&#34;</span>]}
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> {}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">run_parallel_tools</span>():
</span></span><span style="display:flex;"><span>    response <span style="color:#f92672">=</span> <span style="color:#66d9ef">await</span> client<span style="color:#f92672">.</span>chat<span style="color:#f92672">.</span>completions<span style="color:#f92672">.</span>create(
</span></span><span style="display:flex;"><span>        model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;gpt-4.1&#34;</span>,
</span></span><span style="display:flex;"><span>        messages<span style="color:#f92672">=</span>[{<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;user&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: <span style="color:#e6db74">&#34;Weather in NYC and AAPL stock price?&#34;</span>}],
</span></span><span style="display:flex;"><span>        tools<span style="color:#f92672">=</span>tools  <span style="color:#75715e"># Define your tools array here</span>
</span></span><span style="display:flex;"><span>    )
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    message <span style="color:#f92672">=</span> response<span style="color:#f92672">.</span>choices[<span style="color:#ae81ff">0</span>]<span style="color:#f92672">.</span>message
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">if</span> message<span style="color:#f92672">.</span>tool_calls:
</span></span><span style="display:flex;"><span>        results <span style="color:#f92672">=</span> <span style="color:#66d9ef">await</span> asyncio<span style="color:#f92672">.</span>gather(
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">*</span>[execute_tool(call) <span style="color:#66d9ef">for</span> call <span style="color:#f92672">in</span> message<span style="color:#f92672">.</span>tool_calls]
</span></span><span style="display:flex;"><span>        )
</span></span><span style="display:flex;"><span>        tool_messages <span style="color:#f92672">=</span> [
</span></span><span style="display:flex;"><span>            {
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;tool&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;tool_call_id&#34;</span>: call<span style="color:#f92672">.</span>id,
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;content&#34;</span>: json<span style="color:#f92672">.</span>dumps(result)
</span></span><span style="display:flex;"><span>            }
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">for</span> call, result <span style="color:#f92672">in</span> zip(message<span style="color:#f92672">.</span>tool_calls, results)
</span></span><span style="display:flex;"><span>        ]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        final <span style="color:#f92672">=</span> <span style="color:#66d9ef">await</span> client<span style="color:#f92672">.</span>chat<span style="color:#f92672">.</span>completions<span style="color:#f92672">.</span>create(
</span></span><span style="display:flex;"><span>            model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;gpt-4.1&#34;</span>,
</span></span><span style="display:flex;"><span>            messages<span style="color:#f92672">=</span>[
</span></span><span style="display:flex;"><span>                {<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;user&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: <span style="color:#e6db74">&#34;Weather in NYC and AAPL stock price?&#34;</span>},
</span></span><span style="display:flex;"><span>                message,
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">*</span>tool_messages
</span></span><span style="display:flex;"><span>            ]
</span></span><span style="display:flex;"><span>        )
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> final<span style="color:#f92672">.</span>choices[<span style="color:#ae81ff">0</span>]<span style="color:#f92672">.</span>message<span style="color:#f92672">.</span>content
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>asyncio<span style="color:#f92672">.</span>run(run_parallel_tools())
</span></span></code></pre></div><h2 id="multi-step-agent-loops-building-production-ready-systems">Multi-Step Agent Loops: Building Production-Ready Systems</h2>
<p>A multi-step agent loop runs the tool call / result cycle until the model produces a final answer with no pending tool calls. Every production agent needs three safeguards that most tutorials skip: a <strong>maximum iteration limit</strong> (prevents infinite loops from hallucinated or broken tools), <strong>timeout handling</strong> (prevents one slow external API from hanging the entire agent), and <strong>loop detection</strong> (prevents the model from calling the same tool with the same arguments repeatedly). A reasonable production ceiling is 15 iterations with a 30-second per-tool timeout and a 120-second total loop timeout. Beyond these hard limits, implement soft controls: log every tool call with arguments and results, alert on loops that exceed 8 iterations (usually indicates a confused model or broken tool), and expose a kill switch for human override.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> json
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> time
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> openai <span style="color:#f92672">import</span> OpenAI
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>client <span style="color:#f92672">=</span> OpenAI()
</span></span><span style="display:flex;"><span>MAX_ITERATIONS <span style="color:#f92672">=</span> <span style="color:#ae81ff">15</span>
</span></span><span style="display:flex;"><span>LOOP_TIMEOUT <span style="color:#f92672">=</span> <span style="color:#ae81ff">120</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">run_agent</span>(user_message: str, tools: list, tool_registry: dict) <span style="color:#f92672">-&gt;</span> str:
</span></span><span style="display:flex;"><span>    messages <span style="color:#f92672">=</span> [{<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;user&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: user_message}]
</span></span><span style="display:flex;"><span>    start_time <span style="color:#f92672">=</span> time<span style="color:#f92672">.</span>time()
</span></span><span style="display:flex;"><span>    seen_calls <span style="color:#f92672">=</span> set()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">for</span> iteration <span style="color:#f92672">in</span> range(MAX_ITERATIONS):
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">if</span> time<span style="color:#f92672">.</span>time() <span style="color:#f92672">-</span> start_time <span style="color:#f92672">&gt;</span> LOOP_TIMEOUT:
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">return</span> <span style="color:#e6db74">&#34;Agent timeout: exceeded 120 seconds&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        response <span style="color:#f92672">=</span> client<span style="color:#f92672">.</span>chat<span style="color:#f92672">.</span>completions<span style="color:#f92672">.</span>create(
</span></span><span style="display:flex;"><span>            model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;gpt-4.1&#34;</span>,
</span></span><span style="display:flex;"><span>            messages<span style="color:#f92672">=</span>messages,
</span></span><span style="display:flex;"><span>            tools<span style="color:#f92672">=</span>tools
</span></span><span style="display:flex;"><span>        )
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        message <span style="color:#f92672">=</span> response<span style="color:#f92672">.</span>choices[<span style="color:#ae81ff">0</span>]<span style="color:#f92672">.</span>message
</span></span><span style="display:flex;"><span>        messages<span style="color:#f92672">.</span>append(message)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">if</span> response<span style="color:#f92672">.</span>choices[<span style="color:#ae81ff">0</span>]<span style="color:#f92672">.</span>finish_reason <span style="color:#f92672">!=</span> <span style="color:#e6db74">&#34;tool_calls&#34;</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">return</span> message<span style="color:#f92672">.</span>content
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">for</span> call <span style="color:#f92672">in</span> message<span style="color:#f92672">.</span>tool_calls:
</span></span><span style="display:flex;"><span>            call_key <span style="color:#f92672">=</span> <span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;</span><span style="color:#e6db74">{</span>call<span style="color:#f92672">.</span>function<span style="color:#f92672">.</span>name<span style="color:#e6db74">}</span><span style="color:#e6db74">:</span><span style="color:#e6db74">{</span>call<span style="color:#f92672">.</span>function<span style="color:#f92672">.</span>arguments<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">if</span> call_key <span style="color:#f92672">in</span> seen_calls:
</span></span><span style="display:flex;"><span>                <span style="color:#66d9ef">return</span> <span style="color:#e6db74">&#34;Loop detected: same tool called with same args twice&#34;</span>
</span></span><span style="display:flex;"><span>            seen_calls<span style="color:#f92672">.</span>add(call_key)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>            func <span style="color:#f92672">=</span> tool_registry<span style="color:#f92672">.</span>get(call<span style="color:#f92672">.</span>function<span style="color:#f92672">.</span>name)
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">if</span> <span style="color:#f92672">not</span> func:
</span></span><span style="display:flex;"><span>                result <span style="color:#f92672">=</span> <span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Error: unknown function </span><span style="color:#e6db74">{</span>call<span style="color:#f92672">.</span>function<span style="color:#f92672">.</span>name<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">else</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#66d9ef">try</span>:
</span></span><span style="display:flex;"><span>                    args <span style="color:#f92672">=</span> json<span style="color:#f92672">.</span>loads(call<span style="color:#f92672">.</span>function<span style="color:#f92672">.</span>arguments)
</span></span><span style="display:flex;"><span>                    result <span style="color:#f92672">=</span> json<span style="color:#f92672">.</span>dumps(func(<span style="color:#f92672">**</span>args))
</span></span><span style="display:flex;"><span>                <span style="color:#66d9ef">except</span> <span style="color:#a6e22e">Exception</span> <span style="color:#66d9ef">as</span> e:
</span></span><span style="display:flex;"><span>                    result <span style="color:#f92672">=</span> <span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Error: </span><span style="color:#e6db74">{</span>e<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>            messages<span style="color:#f92672">.</span>append({
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;tool&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;tool_call_id&#34;</span>: call<span style="color:#f92672">.</span>id,
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;content&#34;</span>: result
</span></span><span style="display:flex;"><span>            })
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> <span style="color:#e6db74">&#34;Agent exceeded maximum iterations&#34;</span>
</span></span></code></pre></div><h2 id="error-handling-malformed-arguments-hallucinated-functions-and-recovery">Error Handling: Malformed Arguments, Hallucinated Functions, and Recovery</h2>
<p>Production function calling systems fail in four distinct ways, each requiring a different recovery strategy. <strong>Malformed arguments</strong> occur when the model generates JSON that doesn&rsquo;t validate against your schema—use Pydantic validation and return a structured error string (not an exception) so the model can retry with corrected arguments. <strong>Hallucinated function names</strong> happen when the model calls a function that doesn&rsquo;t exist in your registry—detect with a dictionary lookup and return <code>&quot;Error: unknown function X&quot;</code> to prompt self-correction. <strong>Timeout failures</strong> occur when external APIs are slow—implement per-call timeouts with <code>asyncio.wait_for</code> and return a timeout error string so the model can acknowledge the failure. <strong>Unexpected results</strong> happen when a function returns data in an unexpected format—validate outputs before returning them to the model and normalize structure. One critical rule: never raise a Python exception inside the tool execution path of an agent loop. Always catch exceptions, convert to error strings, and return them as tool results. This keeps the conversation alive and gives the model a chance to recover or escalate to the user.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">from</span> pydantic <span style="color:#f92672">import</span> BaseModel, ValidationError
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> asyncio
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> json
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">class</span> <span style="color:#a6e22e">WeatherArgs</span>(BaseModel):
</span></span><span style="display:flex;"><span>    city: str
</span></span><span style="display:flex;"><span>    unit: str <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;celsius&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">safe_execute_tool</span>(name: str, arguments: str, timeout: float <span style="color:#f92672">=</span> <span style="color:#ae81ff">10.0</span>) <span style="color:#f92672">-&gt;</span> str:
</span></span><span style="display:flex;"><span>    validators <span style="color:#f92672">=</span> {<span style="color:#e6db74">&#34;get_weather&#34;</span>: WeatherArgs}
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># tool_registry maps tool names to callables; assumed defined at module scope</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">if</span> name <span style="color:#f92672">not</span> <span style="color:#f92672">in</span> tool_registry:
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> <span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Error: unknown function &#39;</span><span style="color:#e6db74">{</span>name<span style="color:#e6db74">}</span><span style="color:#e6db74">&#39;&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">try</span>:
</span></span><span style="display:flex;"><span>        args_dict <span style="color:#f92672">=</span> json<span style="color:#f92672">.</span>loads(arguments)
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">except</span> json<span style="color:#f92672">.</span>JSONDecodeError <span style="color:#66d9ef">as</span> e:
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> <span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Error: invalid JSON arguments — </span><span style="color:#e6db74">{</span>e<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">if</span> name <span style="color:#f92672">in</span> validators:
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">try</span>:
</span></span><span style="display:flex;"><span>            validated <span style="color:#f92672">=</span> validators[name](<span style="color:#f92672">**</span>args_dict)
</span></span><span style="display:flex;"><span>            args_dict <span style="color:#f92672">=</span> validated<span style="color:#f92672">.</span>model_dump()  <span style="color:#75715e"># Pydantic v2; use .dict() on v1</span>
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">except</span> ValidationError <span style="color:#66d9ef">as</span> e:
</span></span><span style="display:flex;"><span>            <span style="color:#66d9ef">return</span> <span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Error: argument validation failed — </span><span style="color:#e6db74">{</span>e<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">try</span>:
</span></span><span style="display:flex;"><span>        result <span style="color:#f92672">=</span> <span style="color:#66d9ef">await</span> asyncio<span style="color:#f92672">.</span>wait_for(
</span></span><span style="display:flex;"><span>            asyncio<span style="color:#f92672">.</span>to_thread(tool_registry[name], <span style="color:#f92672">**</span>args_dict),
</span></span><span style="display:flex;"><span>            timeout<span style="color:#f92672">=</span>timeout
</span></span><span style="display:flex;"><span>        )
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> json<span style="color:#f92672">.</span>dumps(result)
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">except</span> asyncio<span style="color:#f92672">.</span>TimeoutError:
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> <span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Error: </span><span style="color:#e6db74">{</span>name<span style="color:#e6db74">}</span><span style="color:#e6db74"> timed out after </span><span style="color:#e6db74">{</span>timeout<span style="color:#e6db74">}</span><span style="color:#e6db74">s&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">except</span> <span style="color:#a6e22e">Exception</span> <span style="color:#66d9ef">as</span> e:
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> <span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Error: </span><span style="color:#e6db74">{</span>name<span style="color:#e6db74">}</span><span style="color:#e6db74"> raised </span><span style="color:#e6db74">{</span>type(e)<span style="color:#f92672">.</span>__name__<span style="color:#e6db74">}</span><span style="color:#e6db74">: </span><span style="color:#e6db74">{</span>e<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span></code></pre></div><h2 id="security-first-prompt-injection-owasp-llm06-and-authorization">Security First: Prompt Injection, OWASP LLM06, and Authorization</h2>
<p>Function calling maps directly to OWASP LLM06 (Excessive Agency), and its primary attack vector is prompt injection via tool results. An attacker embeds instructions in data your tool retrieves (a web page, a database record, an email), and the model executes those instructions as if they came from the user. Defense requires three layers. First, <strong>input sanitization</strong>: strip HTML and markdown from tool results before returning them to the model. Second, <strong>authorization checks</strong>: validate that the current user has permission to call high-privilege tools before executing—never trust the model&rsquo;s decision alone. Third, <strong>audit logging</strong>: record every tool call with the user identity, arguments, and result. For destructive operations (database writes, API calls that spend money, outbound emails), require explicit human approval rather than autonomous execution: the model proposes the action, your system presents it to the user, and execution happens only after explicit approval. OWASP also recommends the principle of least privilege: expose only the minimum tool set needed for a given session, not every available tool at once.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> hashlib
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> json
</span></span><span style="display:flex;"><span><span style="color:#f92672">import</span> time
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>AUDIT_LOG <span style="color:#f92672">=</span> []
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">authorized_tool_call</span>(user_id: str, tool_name: str, args: dict, 
</span></span><span style="display:flex;"><span>                          user_permissions: set) <span style="color:#f92672">-&gt;</span> tuple[bool, str]:
</span></span><span style="display:flex;"><span>    tool_permissions <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;get_weather&#34;</span>: set(),           <span style="color:#75715e"># Public</span>
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;read_database&#34;</span>: {<span style="color:#e6db74">&#34;db_read&#34;</span>},   <span style="color:#75715e"># Requires permission</span>
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;write_database&#34;</span>: {<span style="color:#e6db74">&#34;db_write&#34;</span>}, <span style="color:#75715e"># Requires elevated permission</span>
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;send_email&#34;</span>: {<span style="color:#e6db74">&#34;email_send&#34;</span>},   <span style="color:#75715e"># Requires explicit permission</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>    required <span style="color:#f92672">=</span> tool_permissions<span style="color:#f92672">.</span>get(tool_name, {<span style="color:#e6db74">&#34;admin&#34;</span>})
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">if</span> <span style="color:#f92672">not</span> required<span style="color:#f92672">.</span>issubset(user_permissions):
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> <span style="color:#66d9ef">False</span>, <span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Unauthorized: </span><span style="color:#e6db74">{</span>tool_name<span style="color:#e6db74">}</span><span style="color:#e6db74"> requires </span><span style="color:#e6db74">{</span>required<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Sanitize arguments to prevent injection</span>
</span></span><span style="display:flex;"><span>    sanitized <span style="color:#f92672">=</span> {}
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">for</span> k, v <span style="color:#f92672">in</span> args<span style="color:#f92672">.</span>items():
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">if</span> isinstance(v, str):
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">import</span> re
</span></span><span style="display:flex;"><span>            v <span style="color:#f92672">=</span> re<span style="color:#f92672">.</span>sub(<span style="color:#e6db74">r</span><span style="color:#e6db74">&#39;&lt;[^&gt;]+&gt;&#39;</span>, <span style="color:#e6db74">&#39;&#39;</span>, v)   <span style="color:#75715e"># Strip HTML</span>
</span></span><span style="display:flex;"><span>            v <span style="color:#f92672">=</span> re<span style="color:#f92672">.</span>sub(<span style="color:#e6db74">r</span><span style="color:#e6db74">&#39;\[.*?\]\(.*?\)&#39;</span>, <span style="color:#e6db74">&#39;&#39;</span>, v)  <span style="color:#75715e"># Strip markdown links</span>
</span></span><span style="display:flex;"><span>        sanitized[k] <span style="color:#f92672">=</span> v
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    AUDIT_LOG<span style="color:#f92672">.</span>append({
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;timestamp&#34;</span>: time<span style="color:#f92672">.</span>time(),
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;user_id&#34;</span>: user_id,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;tool&#34;</span>: tool_name,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;args_hash&#34;</span>: hashlib<span style="color:#f92672">.</span>sha256(json<span style="color:#f92672">.</span>dumps(sanitized)<span style="color:#f92672">.</span>encode())<span style="color:#f92672">.</span>hexdigest(),
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;authorized&#34;</span>: <span style="color:#66d9ef">True</span>
</span></span><span style="display:flex;"><span>    })
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> <span style="color:#66d9ef">True</span>, json<span style="color:#f92672">.</span>dumps(sanitized)
</span></span></code></pre></div><h2 id="production-best-practices-validation-logging-and-cost-tracking">Production Best Practices: Validation, Logging, and Cost Tracking</h2>
<p>Production function calling deployments need observability that most prototypes skip. Track these four metrics for every tool call in production: <strong>latency</strong> (time from tool call detection to result return), <strong>error rate</strong> (fraction of calls that return error strings), <strong>token cost</strong> (input tokens for tool definitions × requests per day), and <strong>iteration depth</strong> (how many iterations agent loops require—outliers indicate a confused model). Build cost tracking from the start: <code>tool_definition_tokens × requests × price_per_token</code> adds up at scale. A system with 10 tools averaging 200 tokens each running 1M requests/day at $0.015/1K tokens spends $30,000/day on tool definitions alone—before the actual conversation tokens. Tool description quality directly affects both accuracy and cost: precise descriptions reduce the number of retries and multi-step loops needed. Keep descriptions under 200 characters, use active voice (&ldquo;Returns weather data for a city&rdquo;), and include one concrete example in the <code>description</code> field when behavior isn&rsquo;t obvious.</p>
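<p>As a sanity check on the arithmetic above, the cost formula can be wrapped in a small helper (the function name and parameters are illustrative, not part of any provider SDK):</p>

```python
def tool_definition_cost_per_day(num_tools: int, avg_tokens_per_tool: int,
                                 requests_per_day: int,
                                 price_per_1k_tokens: float) -> float:
    """Daily spend on tool-definition input tokens alone."""
    tokens_per_request = num_tools * avg_tokens_per_tool
    total_tokens = tokens_per_request * requests_per_day
    return total_tokens / 1000 * price_per_1k_tokens

# 10 tools x 200 tokens each, 1M requests/day, $0.015 per 1K input tokens
print(tool_definition_cost_per_day(10, 200, 1_000_000, 0.015))  # 30000.0
```

<p>Run the numbers for your own tool library before shipping; at high request volume the definition overhead alone can dominate conversation-token spend.</p>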
<h2 id="future-trends-mcp-standardization-and-whats-coming-in-2027">Future Trends: MCP Standardization and What&rsquo;s Coming in 2027</h2>
<p>Model Context Protocol (MCP) is the emerging standard for tool definitions that work across providers without per-provider schema translation. Anthropic released MCP as an open protocol in late 2024, and by mid-2026 OpenAI and Google have both announced MCP compatibility in their roadmaps. MCP moves tool definitions out of your API request and into a standardized server that any MCP-compatible model can query at runtime. Instead of embedding 20 tool schemas in every request (paying 3,000+ tokens each time), an MCP server hosts them and the model fetches only what it needs. This will significantly reduce the per-request token cost of large tool libraries. Server-side tools (Anthropic&rsquo;s current built-in web_search and code_execution) will expand across providers—expect Google and OpenAI to offer managed retrieval and execution environments by 2027. Streaming argument generation (currently Gemini-exclusive) will become universal, enabling real-time UI feedback during complex multi-tool agent runs.</p>
<h2 id="decision-framework-function-calling-vs-structured-outputs-vs-mcp">Decision Framework: Function Calling vs Structured Outputs vs MCP</h2>
<p>Choosing the right tool-integration pattern depends on three questions. <strong>Do you need the model to decide when to call an external system?</strong> If yes, use function calling—the model controls invocation. If you just need the output in a specific JSON format, use structured outputs instead (cheaper, simpler, no execution loop). <strong>Do you need execution to happen inside the API without your infrastructure?</strong> If yes, use Anthropic&rsquo;s server-side tools for web search and code execution. <strong>Do you need to share tool definitions across multiple models or teams?</strong> If yes, use MCP to define tools once and expose them to any compatible model. A common mistake is using function calling when structured outputs would suffice—if you&rsquo;re extracting entities from text or generating a form response, structured outputs give you the same schema guarantees at lower cost and complexity. Reserve function calling for cases where you genuinely need the model to call external systems based on dynamic reasoning.</p>
<table>
  <thead>
      <tr>
          <th>Use Case</th>
          <th>Recommended Pattern</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Extract structured data from text</td>
          <td>Structured outputs</td>
      </tr>
      <tr>
          <td>Query a live external API</td>
          <td>Function calling</td>
      </tr>
      <tr>
          <td>Web search without your infrastructure</td>
          <td>Anthropic server-side tools</td>
      </tr>
      <tr>
          <td>Cross-provider tool sharing</td>
          <td>MCP</td>
      </tr>
      <tr>
          <td>Real-time UI with streaming args</td>
          <td>Google Gemini 2.5+</td>
      </tr>
      <tr>
          <td>Guaranteed schema compliance</td>
          <td>OpenAI strict mode</td>
      </tr>
      <tr>
          <td>Parallel independent queries</td>
          <td>All providers (use asyncio)</td>
      </tr>
  </tbody>
</table>
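<p>The three questions above collapse into a priority-ordered check. This sketch uses illustrative labels for the patterns in the table, not any real API:</p>

```python
def choose_pattern(model_decides_invocation: bool,
                   needs_provider_hosted_execution: bool,
                   needs_cross_provider_sharing: bool) -> str:
    """Priority-ordered encoding of the decision framework above."""
    if needs_cross_provider_sharing:
        return "MCP"
    if needs_provider_hosted_execution:
        return "server-side tools"
    if model_decides_invocation:
        return "function calling"
    # No dynamic invocation needed: schema-only output is cheaper and simpler.
    return "structured outputs"

print(choose_pattern(False, False, False))  # structured outputs
print(choose_pattern(True, False, False))   # function calling
```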
<h2 id="faq">FAQ</h2>
<p><strong>Does function calling work with all models or just the most expensive ones?</strong>
All major paid-tier models from OpenAI (GPT-4.1, GPT-4o mini), Anthropic (Claude Haiku, Claude Sonnet, Claude Opus), and Google (Gemini 1.5 Flash, Gemini 2.5 Pro) support function calling. Smaller/cheaper models like GPT-4o mini and Claude Haiku support it well for simple single-tool use cases. For complex multi-step agent loops with many tools, frontier models (GPT-4.1, Claude Opus 4.7, Gemini 2.5 Pro) are significantly more reliable at choosing the right tools and arguments.</p>
<p><strong>Can I use function calling with streaming responses?</strong>
Yes, with caveats. OpenAI and Anthropic both support streaming with function calls—tool call arguments stream as tokens, letting you start processing before the argument JSON is complete. Google Gemini 2.5+ streams function call arguments natively. The practical complexity is that you must buffer the argument stream and parse the complete JSON before executing—streaming doesn&rsquo;t help with execution latency, only with time-to-first-token of the argument.</p>
<p><strong>How do I prevent the model from calling tools it shouldn&rsquo;t?</strong>
Use <code>tool_choice</code> (OpenAI), <code>tool_choice</code> with <code>{&quot;type&quot;: &quot;none&quot;}</code> (Anthropic), or remove the tool from the tools array entirely. Removing tools from the request is the most reliable approach for session-level restrictions because it&rsquo;s enforced by the API rather than relying on model compliance. For operation-level restrictions (allow tool X for read operations but not write), implement authorization checks in your execution layer—never trust the model to enforce permissions.</p>
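<p>Session-level restriction is then a one-line filter over the tools array before the request is built (the tool dict shape here is simplified; real definitions carry a full JSON schema):</p>

```python
def tools_for_session(all_tools: list, allowed_names: set) -> list:
    """The model can only call tools present in the request, so filter before sending."""
    return [t for t in all_tools if t["name"] in allowed_names]

ALL_TOOLS = [
    {"name": "get_weather", "description": "Returns weather data for a city"},
    {"name": "write_database", "description": "Writes a record to the database"},
]

# Read-only session: write_database is never sent, so it can never be called.
print([t["name"] for t in tools_for_session(ALL_TOOLS, {"get_weather"})])  # ['get_weather']
```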
<p><strong>What&rsquo;s the maximum number of tools I can include in a request?</strong>
OpenAI supports up to 128 tools per request. Anthropic supports a similar number (documented limit varies by model). Google Gemini&rsquo;s limit is lower and varies by model generation. In practice, you should stay well below these limits for cost and reliability reasons: more than 20–30 tools significantly increases both token costs and the likelihood of the model choosing the wrong tool. If you genuinely need more, use tool routing—a classifier model or heuristic that selects a relevant subset of tools for each request.</p>
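<p>A heuristic router can be as simple as keyword overlap between the user query and a per-tool keyword set. This is a sketch with made-up tool names, not a production classifier:</p>

```python
def route_tools(query: str, tool_index: dict, max_tools: int = 5) -> list:
    """Score each tool by keyword overlap with the query; return the top subset."""
    words = set(query.lower().split())
    scored = sorted(
        ((len(keywords & words), name) for name, keywords in tool_index.items()),
        reverse=True,
    )
    return [name for score, name in scored[:max_tools] if score > 0]

TOOL_INDEX = {
    "get_weather": {"weather", "forecast", "temperature"},
    "search_flights": {"flight", "airline", "travel"},
    "query_crm": {"customer", "account", "deal"},
}

print(route_tools("what is the weather forecast in paris", TOOL_INDEX))  # ['get_weather']
```

<p>An embedding-based ranker follows the same shape with better recall; either way, the model only ever sees the shortlisted subset.</p>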
<p><strong>How do I handle functions that take a long time to execute?</strong>
Use async execution with explicit timeouts. Set a per-tool timeout (typically 10–30 seconds) and a total agent loop timeout (60–120 seconds). When a timeout occurs, return a timeout error string as the tool result—don&rsquo;t let the exception propagate. The model can then decide to retry, use a different approach, or report the failure to the user. For genuinely long operations (batch jobs, file processing), return a job ID immediately and provide a separate <code>check_job_status</code> tool the model can poll in subsequent turns.</p>
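<p>The job-ID pattern reduces to a pair of tool implementations. The in-memory dict below stands in for a real job queue, and all names are illustrative:</p>

```python
import json
import uuid

JOBS = {}  # job_id -> job record (stand-in for a real job queue)

def start_batch_job(payload: dict) -> str:
    """Tool result returns immediately with a job ID instead of blocking the agent loop."""
    job_id = str(uuid.uuid4())
    JOBS[job_id] = {"status": "running", "payload": payload}
    return json.dumps({"job_id": job_id, "status": "running"})

def check_job_status(job_id: str) -> str:
    """Companion tool the model can poll on subsequent turns."""
    job = JOBS.get(job_id)
    if job is None:
        return json.dumps({"error": "unknown job_id"})
    return json.dumps({"job_id": job_id, "status": job["status"]})

resp = json.loads(start_batch_job({"rows": 50_000}))
print(resp["status"])  # running
```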
]]></content:encoded></item></channel></rss>