<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Z.ai on RockB</title><link>https://baeseokjae.github.io/tags/z.ai/</link><description>Recent content in Z.ai on RockB</description><image><title>RockB</title><url>https://baeseokjae.github.io/images/og-default.png</url><link>https://baeseokjae.github.io/images/og-default.png</link></image><generator>Hugo</generator><language>en-us</language><lastBuildDate>Fri, 15 May 2026 06:05:13 +0000</lastBuildDate><atom:link href="https://baeseokjae.github.io/tags/z.ai/index.xml" rel="self" type="application/rss+xml"/><item><title>Z.ai API Developer Guide 2026: GLM Models, Pricing, and Setup</title><link>https://baeseokjae.github.io/posts/z-ai-api-developer-guide-2026/</link><pubDate>Fri, 15 May 2026 06:05:13 +0000</pubDate><guid>https://baeseokjae.github.io/posts/z-ai-api-developer-guide-2026/</guid><description>Complete Z.ai API guide: GLM model lineup, Coding Plan pricing, OpenAI/Anthropic-compatible endpoints, and step-by-step Claude Code setup.</description><content:encoded><![CDATA[<p>Z.ai is Zhipu AI&rsquo;s international developer platform, offering access to the GLM model family — including GLM-5.1, the first open-weight model to top the SWE-bench Pro leaderboard — via OpenAI-compatible and Anthropic-compatible APIs. Coding Plan subscriptions start at $10/month, making it the cheapest frontier-adjacent coding setup available in 2026.</p>
<h2 id="what-is-zai-zhipu-ais-international-developer-platform-explained">What Is Z.ai? Zhipu AI&rsquo;s International Developer Platform Explained</h2>
<p>Z.ai is the international-facing developer API platform operated by Zhipu AI, a Beijing-based AI lab founded in 2019 as a spinout from Tsinghua University. The platform exposes Zhipu&rsquo;s GLM (General Language Model) series to developers worldwide through two API compatibility layers: an OpenAI-compatible endpoint at <code>https://api.z.ai/api/openai/v1</code> and an Anthropic-compatible endpoint at <code>https://api.z.ai/api/anthropic</code> — making Z.ai the only provider besides Anthropic itself that offers a true Anthropic API drop-in replacement. Zhipu AI trained the GLM models without Nvidia hardware, a geopolitical differentiator as export restrictions tighten in 2026. The platform offers free models (GLM-4.7-Flash, GLM-4.5-Flash) for prototyping, quota-based Coding Plan subscriptions for Claude Code users, and direct per-token billing for production workloads. As of May 2026, GLM-5.1 scores 58.4% on SWE-bench Pro, edging out GPT-5.4 (57.7%) and Claude Opus 4.6 (57.3%). For developers who need frontier-adjacent coding performance without the $200/month Claude Max bill, Z.ai is the most cost-effective path.</p>
<h3 id="why-developers-are-choosing-zai">Why Developers Are Choosing Z.ai</h3>
<p>The practical draw is simple: a $30/quarter Coding Plan ($10/month equivalent) lets you run Claude Code against GLM-5.1 using a quota system, replacing Anthropic&rsquo;s direct subscription at a fraction of the cost. The Anthropic-compatible endpoint means zero code changes — the same environment variables that point Claude Code at Anthropic&rsquo;s API can be redirected to Z.ai with a base URL swap.</p>
<h2 id="glm-model-family-overview-glm-51-glm-5-turbo-glm-47-and-free-flash-models">GLM Model Family Overview: GLM-5.1, GLM-5-Turbo, GLM-4.7, and Free Flash Models</h2>
<p>The Z.ai GLM model family spans six tiers in 2026, ranging from zero-cost flash models suitable for prototyping to the full GLM-5.1 flagship designed for long-horizon agentic tasks. GLM-5.1 is the headline model: 745 billion parameters, 200K token context window, and a demonstrated ability to run autonomous execution sessions up to 8 hours — in one published benchmark, it completed 655 iterations with 6,000+ tool calls to build a functional vector database at 21,500 QPS. GLM-5-Turbo is the speed-optimized variant for latency-sensitive applications. GLM-4.7 targets balanced cost-performance, priced at $0.60/M input and $2.20/M output tokens. GLM-4.7-Flash and GLM-4.5-Flash are fully free for all registered Z.ai accounts, with GLM-4.7-Flash offering a 203K context window — the largest free context window available from any major API provider. GLM-4-Air rounds out the family as an ultra-low-cost option for high-volume, simple tasks. The key architectural differentiator: all GLM models were trained on custom hardware without Nvidia GPUs, making Zhipu AI independent from Western export-controlled supply chains.</p>
<table>
  <thead>
      <tr>
          <th>Model</th>
          <th>Context</th>
          <th>Input Price</th>
          <th>Output Price</th>
          <th>Notes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>GLM-5.1</td>
          <td>200K</td>
          <td>$1.00/M</td>
          <td>$3.20/M</td>
          <td>SWE-bench Pro #1</td>
      </tr>
      <tr>
          <td>GLM-5-Turbo</td>
          <td>128K</td>
          <td>$0.80/M</td>
          <td>$2.80/M</td>
          <td>Speed-optimized</td>
      </tr>
      <tr>
          <td>GLM-4.7</td>
          <td>128K</td>
          <td>$0.60/M</td>
          <td>$2.20/M</td>
          <td>Best cost/performance</td>
      </tr>
      <tr>
          <td>GLM-4.7-Flash</td>
          <td>203K</td>
          <td>Free</td>
          <td>Free</td>
          <td>Largest free context</td>
      </tr>
      <tr>
          <td>GLM-4.5-Flash</td>
          <td>128K</td>
          <td>Free</td>
          <td>Free</td>
          <td>Prototyping</td>
      </tr>
      <tr>
          <td>GLM-4-Air</td>
          <td>128K</td>
          <td>$0.10/M</td>
          <td>$0.30/M</td>
          <td>Ultra-low cost</td>
      </tr>
  </tbody>
</table>
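<p>One practical use of the table above is a pre-flight check that a prompt actually fits the chosen model&rsquo;s context window. The sketch below is illustrative only: the limits come from the table, and the 4-characters-per-token estimate is a rough heuristic, not a real tokenizer:</p>

```python
# Context-window limits (tokens) taken from the GLM model table above.
CONTEXT_LIMITS = {
    "glm-5.1": 200_000,
    "glm-5-turbo": 128_000,
    "glm-4.7": 128_000,
    "glm-4.7-flash": 203_000,
    "glm-4.5-flash": 128_000,
    "glm-4-air": 128_000,
}

def fits_context(model: str, prompt_tokens: int, output_budget: int) -> bool:
    """Return True if the prompt plus reserved output fits the model's window."""
    return prompt_tokens + output_budget <= CONTEXT_LIMITS[model]

def estimate_tokens(text: str) -> int:
    """Crude heuristic: roughly 4 characters per token for English text."""
    return max(1, len(text) // 4)

# A 200K-token prompt fits GLM-4.7-Flash's 203K window but not a 128K window.
print(fits_context("glm-4.7-flash", 200_000, 2_000))  # True
print(fits_context("glm-5-turbo", 200_000, 2_000))    # False
```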
<h3 id="which-model-should-you-default-to">Which Model Should You Default To?</h3>
<p>For Claude Code integration, GLM-5.1 is the default recommendation — it&rsquo;s what Coding Plan quota is designed for, and the performance gap vs Claude Opus 4.6 is under 4% on SWE-bench Verified (77.8% vs 80.8%). For direct API workloads where cost matters more than peak coding performance, GLM-4.7 at $0.60/$2.20 per million tokens is the practical sweet spot.</p>
<h2 id="zai-pricing-coding-plans-vs-direct-per-token-api-access">Z.ai Pricing: Coding Plans vs Direct Per-Token API Access</h2>
<p>Z.ai pricing splits into two distinct models: quota-based Coding Plan subscriptions and direct per-token billing. Zhipu AI removed the $3/month promotional pricing on February 11, 2026; current Coding Plans use a quota-based system with three tiers. The Lite plan is $30/quarter (~$10/month) and provides enough quota for individual developers doing moderate coding assistance. The Pro plan is $90/quarter (~$30/month), targeting power users running multi-file refactors and agentic tasks daily. The Max plan is $240/quarter (~$80/month), designed for teams or developers running Claude Code as their primary coding environment throughout the workday. All Coding Plan quotas reset quarterly and are consumed by API calls routed through the Anthropic-compatible endpoint &mdash; there is no separate charge per token when using Coding Plan quota. Direct per-token API access is available for production workloads that need predictable billing without quota caps. For most Claude Code users, the Lite plan ($10/month) offers compelling value: it replaces a $20/month Claude Pro subscription for coding use cases while using a model that scores within 3 points of Claude Sonnet on SWE-bench Verified.</p>
<table>
  <thead>
      <tr>
          <th>Plan</th>
          <th>Quarterly Price</th>
          <th>Monthly Equivalent</th>
          <th>Best For</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Free</td>
          <td>$0</td>
          <td>$0</td>
          <td>Prototyping with Flash models</td>
      </tr>
      <tr>
          <td>Lite</td>
          <td>$30/quarter</td>
          <td>~$10/month</td>
          <td>Individual developers</td>
      </tr>
      <tr>
          <td>Pro</td>
          <td>$90/quarter</td>
          <td>~$30/month</td>
          <td>Daily power users</td>
      </tr>
      <tr>
          <td>Max</td>
          <td>$240/quarter</td>
          <td>~$80/month</td>
          <td>Full-time Claude Code users</td>
      </tr>
  </tbody>
</table>
<h3 id="direct-api-vs-coding-plan-cost-math">Direct API vs Coding Plan: Cost Math</h3>
<p>If you send 50M input tokens and 20M output tokens per month using GLM-5.1, direct API billing comes to ($1.00 × 50) + ($3.20 × 20) = $50 + $64 = $114/month. The Max Coding Plan caps that spend at ~$80/month, with its quota rather than token counts as the binding limit. For light usage that fits within the Lite plan&rsquo;s quota, you save 80-90% vs Anthropic direct.</p>
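<p>The same arithmetic in code, using GLM-5.1&rsquo;s published per-million-token rates (a minimal sketch; swap in the rates from the pricing table for other models):</p>

```python
# GLM-5.1 direct-API rates in USD per million tokens (from the pricing table above).
INPUT_RATE = 1.00
OUTPUT_RATE = 3.20

def direct_api_cost(input_millions: float, output_millions: float) -> float:
    """Monthly direct-billing cost in USD for a given token volume."""
    return input_millions * INPUT_RATE + output_millions * OUTPUT_RATE

# 50M input + 20M output tokens per month, as in the example above.
cost = direct_api_cost(50, 20)
print(f"Direct API: ${cost:.2f}/month")  # $114.00/month
```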
<h2 id="quick-start-openai-compatible-api-setup-in-python">Quick Start: OpenAI-Compatible API Setup in Python</h2>
<p>Z.ai&rsquo;s OpenAI-compatible endpoint is the fastest path from zero to working API calls. Register at z.ai, generate an API key from the dashboard, and you can reuse any OpenAI SDK code by changing two values: the base URL and the API key. The endpoint is <code>https://api.z.ai/api/openai/v1</code>, and model names follow the pattern <code>glm-5.1</code>, <code>glm-4.7</code>, <code>glm-4.7-flash</code>. No other configuration changes are required — chat completions, function calling, streaming, and embeddings all work identically to the OpenAI API contract. For teams already running OpenAI SDK in production, migrating to Z.ai for cost reduction requires only environment variable changes, not code refactoring. The recommended starting model is <code>glm-4.7-flash</code> for free exploration, then <code>glm-5.1</code> once you have a Coding Plan or direct billing configured.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">from</span> openai <span style="color:#f92672">import</span> OpenAI
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>client <span style="color:#f92672">=</span> OpenAI(
</span></span><span style="display:flex;"><span>    api_key<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;your-z-ai-api-key&#34;</span>,
</span></span><span style="display:flex;"><span>    base_url<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;https://api.z.ai/api/openai/v1&#34;</span>
</span></span><span style="display:flex;"><span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>response <span style="color:#f92672">=</span> client<span style="color:#f92672">.</span>chat<span style="color:#f92672">.</span>completions<span style="color:#f92672">.</span>create(
</span></span><span style="display:flex;"><span>    model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;glm-5.1&#34;</span>,
</span></span><span style="display:flex;"><span>    messages<span style="color:#f92672">=</span>[
</span></span><span style="display:flex;"><span>        {<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;user&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: <span style="color:#e6db74">&#34;Write a Python function to parse JWT tokens without a library.&#34;</span>}
</span></span><span style="display:flex;"><span>    ]
</span></span><span style="display:flex;"><span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>print(response<span style="color:#f92672">.</span>choices[<span style="color:#ae81ff">0</span>]<span style="color:#f92672">.</span>message<span style="color:#f92672">.</span>content)
</span></span></code></pre></div><h3 id="streaming-responses">Streaming Responses</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>stream <span style="color:#f92672">=</span> client<span style="color:#f92672">.</span>chat<span style="color:#f92672">.</span>completions<span style="color:#f92672">.</span>create(
</span></span><span style="display:flex;"><span>    model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;glm-4.7&#34;</span>,
</span></span><span style="display:flex;"><span>    messages<span style="color:#f92672">=</span>[{<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;user&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: <span style="color:#e6db74">&#34;Explain the actor model in distributed systems.&#34;</span>}],
</span></span><span style="display:flex;"><span>    stream<span style="color:#f92672">=</span><span style="color:#66d9ef">True</span>
</span></span><span style="display:flex;"><span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">for</span> chunk <span style="color:#f92672">in</span> stream:
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">if</span> chunk<span style="color:#f92672">.</span>choices[<span style="color:#ae81ff">0</span>]<span style="color:#f92672">.</span>delta<span style="color:#f92672">.</span>content:
</span></span><span style="display:flex;"><span>        print(chunk<span style="color:#f92672">.</span>choices[<span style="color:#ae81ff">0</span>]<span style="color:#f92672">.</span>delta<span style="color:#f92672">.</span>content, end<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;&#34;</span>, flush<span style="color:#f92672">=</span><span style="color:#66d9ef">True</span>)
</span></span></code></pre></div><h3 id="function-calling">Function Calling</h3>
<p>Z.ai&rsquo;s OpenAI-compatible endpoint supports the same <code>tools</code> parameter as OpenAI. Define tools using the standard JSON schema format:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>tools <span style="color:#f92672">=</span> [
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;function&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#e6db74">&#34;function&#34;</span>: {
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;name&#34;</span>: <span style="color:#e6db74">&#34;get_current_weather&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;description&#34;</span>: <span style="color:#e6db74">&#34;Get the current weather for a location&#34;</span>,
</span></span><span style="display:flex;"><span>            <span style="color:#e6db74">&#34;parameters&#34;</span>: {
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;object&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;properties&#34;</span>: {
</span></span><span style="display:flex;"><span>                    <span style="color:#e6db74">&#34;location&#34;</span>: {<span style="color:#e6db74">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;string&#34;</span>, <span style="color:#e6db74">&#34;description&#34;</span>: <span style="color:#e6db74">&#34;City name&#34;</span>},
</span></span><span style="display:flex;"><span>                    <span style="color:#e6db74">&#34;unit&#34;</span>: {<span style="color:#e6db74">&#34;type&#34;</span>: <span style="color:#e6db74">&#34;string&#34;</span>, <span style="color:#e6db74">&#34;enum&#34;</span>: [<span style="color:#e6db74">&#34;celsius&#34;</span>, <span style="color:#e6db74">&#34;fahrenheit&#34;</span>]}
</span></span><span style="display:flex;"><span>                },
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;required&#34;</span>: [<span style="color:#e6db74">&#34;location&#34;</span>]
</span></span><span style="display:flex;"><span>            }
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>response <span style="color:#f92672">=</span> client<span style="color:#f92672">.</span>chat<span style="color:#f92672">.</span>completions<span style="color:#f92672">.</span>create(
</span></span><span style="display:flex;"><span>    model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;glm-5.1&#34;</span>,
</span></span><span style="display:flex;"><span>    messages<span style="color:#f92672">=</span>[{<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;user&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: <span style="color:#e6db74">&#34;What&#39;s the weather in Beijing?&#34;</span>}],
</span></span><span style="display:flex;"><span>    tools<span style="color:#f92672">=</span>tools
</span></span><span style="display:flex;"><span>)
</span></span></code></pre></div><h2 id="claude-code--zai-drop-in-replacement-via-anthropic-compatible-endpoint">Claude Code + Z.ai: Drop-In Replacement via Anthropic-Compatible Endpoint</h2>
<p>Z.ai is the only provider besides Anthropic that offers a true Anthropic-compatible API endpoint, located at <code>https://api.z.ai/api/anthropic</code>. This makes it a genuine drop-in replacement for Claude Code: you redirect Claude Code&rsquo;s API calls to Z.ai using two environment variables, and Claude Code operates normally using GLM-5.1 instead of Claude Opus or Sonnet. This setup works because GLM-5.1 supports the Anthropic messages API contract, including extended thinking format and tool use. The practical result: a $10/month Coding Plan Lite subscription replaces a $20/month Claude Pro subscription for coding-focused workflows, with GLM-5.1 achieving 94.6% of Claude Opus 4.6&rsquo;s coding performance on standard benchmarks. Z.ai also provides <code>npx @z_ai/coding-helper</code> as an auto-configuration tool that sets the environment variables and verifies connectivity in one command.</p>
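<p>For a sense of what travels over the wire, the request below is assembled against Anthropic&rsquo;s public messages-API contract using only the standard library. It builds and prints the request without sending it; the <code>/v1/messages</code> path, the <code>x-api-key</code> header, and the <code>anthropic-version</code> value are assumptions carried over from Anthropic&rsquo;s own API, which the article says Z.ai mirrors:</p>

```python
import json

# Assumed to mirror Anthropic's messages API, per the compatibility claim above.
BASE_URL = "https://api.z.ai/api/anthropic"
API_KEY = "your-z-ai-api-key"  # placeholder, not a real key

url = f"{BASE_URL}/v1/messages"
headers = {
    "content-type": "application/json",
    "anthropic-version": "2023-06-01",  # Anthropic's standard version header
    "x-api-key": API_KEY,               # header name assumed from Anthropic's API
}
payload = {
    "model": "glm-5.1",
    "max_tokens": 512,
    "messages": [{"role": "user", "content": "Refactor this function for clarity."}],
}

# Inspect the request without performing any network I/O.
print(url)
print(json.dumps(payload, indent=2))
```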
<h3 id="manual-setup-two-environment-variables">Manual Setup (Two Environment Variables)</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>export ANTHROPIC_BASE_URL<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;https://api.z.ai/api/anthropic&#34;</span>
</span></span><span style="display:flex;"><span>export ANTHROPIC_AUTH_TOKEN<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;your-z-ai-api-key&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Optional: increase timeout for long-running agentic tasks</span>
</span></span><span style="display:flex;"><span>export API_TIMEOUT_MS<span style="color:#f92672">=</span><span style="color:#ae81ff">300000</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Launch Claude Code</span>
</span></span><span style="display:flex;"><span>claude
</span></span></code></pre></div><h3 id="auto-configuration-tool">Auto-Configuration Tool</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>npx @z_ai/coding-helper
</span></span></code></pre></div><p>This interactive tool prompts for your Z.ai API key, sets environment variables, and runs a connectivity test. It also configures the default model to <code>glm-5.1</code> for Coding Plan accounts.</p>
<h3 id="switching-models-inside-claude-code">Switching Models Inside Claude Code</h3>
<p>Once connected, use the <code>/model</code> command inside Claude Code to switch between GLM models:</p>



<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>/model glm-5.1       <span style="color:#75715e"># Full flagship — SWE-bench Pro leader</span>
</span></span><span style="display:flex;"><span>/model glm-5-turbo   <span style="color:#75715e"># Faster responses, lower quota consumption</span>
</span></span><span style="display:flex;"><span>/model glm-4.7       <span style="color:#75715e"># Best cost/performance for file-heavy …</span>
</span></span></code></pre></div>
<text text-anchor='middle' x='504' y='20' fill='currentColor' style='font-size:1em'>t</text>
<text text-anchor='middle' x='504' y='36' fill='currentColor' style='font-size:1em'>t</text>
<text text-anchor='middle' x='512' y='20' fill='currentColor' style='font-size:1em'>i</text>
<text text-anchor='middle' x='512' y='36' fill='currentColor' style='font-size:1em'>a</text>
<text text-anchor='middle' x='520' y='20' fill='currentColor' style='font-size:1em'>o</text>
<text text-anchor='middle' x='520' y='36' fill='currentColor' style='font-size:1em'>s</text>
<text text-anchor='middle' x='528' y='20' fill='currentColor' style='font-size:1em'>n</text>
<text text-anchor='middle' x='528' y='36' fill='currentColor' style='font-size:1em'>k</text>
<text text-anchor='middle' x='536' y='36' fill='currentColor' style='font-size:1em'>s</text>
</g>

    </svg>
  
</div>
<h3 id="verifying-the-setup">Verifying the Setup</h3>
<p>After launching Claude Code with Z.ai environment variables, check the session header — it should show the model name (e.g., <code>glm-5.1</code>) and confirm the base URL. A quick test:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>claude -p <span style="color:#e6db74">&#34;Write a one-line Python HTTP server&#34;</span>
</span></span></code></pre></div><p>If you receive a response, the routing is working correctly. If you see an authentication error, verify that <code>ANTHROPIC_AUTH_TOKEN</code> is set to your Z.ai API key (not an Anthropic key).</p>
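<p>For reference, a minimal environment sketch (the variable names assume Claude Code&rsquo;s standard Anthropic override variables; the token value is a placeholder):</p>

```shell
# Route Claude Code to Z.ai's Anthropic-compatible endpoint.
# ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN are Claude Code's standard
# override variables; replace the token with your actual Z.ai API key.
export ANTHROPIC_BASE_URL="https://api.z.ai/api/anthropic"
export ANTHROPIC_AUTH_TOKEN="your-zai-api-key"   # placeholder, not a real key
```

<p>With both variables exported in the same shell, <code>claude</code> sends its requests to Z.ai instead of Anthropic.</p>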
<h2 id="glm-51-benchmarks-how-it-compares-to-gpt-5-and-claude-opus-46">GLM-5.1 Benchmarks: How It Compares to GPT-5 and Claude Opus 4.6</h2>
<p>GLM-5.1 is the first open-weight model to top the SWE-bench Pro leaderboard, scoring 58.4% versus GPT-5.4&rsquo;s 57.7% and Claude Opus 4.6&rsquo;s 57.3% — a margin of 0.7 points over the previous leader. On SWE-bench Verified, which tests real-world GitHub issue resolution, GLM-5.1 scores 77.8% against Claude Opus 4.6&rsquo;s 80.8%, a gap of 3 percentage points. In practical terms, GLM-5.1 resolves approximately 96.3% as many coding tasks as Claude Opus 4.6 while costing $1.00/M input vs Claude Opus 4.6&rsquo;s $15.00/M input — a 15x price difference for a roughly 3.7% relative performance gap. The 8-hour sustained autonomous execution benchmark is the most differentiating result: in a published test, GLM-5.1 ran 655 consecutive iterations with more than 6,000 tool calls to build a working vector database serving 21,500 QPS. No other open-weight model has demonstrated equivalent long-horizon agentic performance.</p>
<table>
  <thead>
      <tr>
          <th>Benchmark</th>
          <th>GLM-5.1</th>
          <th>GPT-5.4</th>
          <th>Claude Opus 4.6</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>SWE-bench Pro</td>
          <td><strong>58.4%</strong></td>
          <td>57.7%</td>
          <td>57.3%</td>
      </tr>
      <tr>
          <td>SWE-bench Verified</td>
          <td>77.8%</td>
          <td>76.9%</td>
          <td><strong>80.8%</strong></td>
      </tr>
      <tr>
          <td>Input Price</td>
          <td>$1.00/M</td>
          <td>$15.00/M</td>
          <td>$15.00/M</td>
      </tr>
      <tr>
          <td>Output Price</td>
          <td>$3.20/M</td>
          <td>$60.00/M</td>
          <td>$75.00/M</td>
      </tr>
      <tr>
          <td>Context Window</td>
          <td>200K</td>
          <td>128K</td>
          <td>200K</td>
      </tr>
      <tr>
          <td>Open Weight</td>
          <td>Yes</td>
          <td>No</td>
          <td>No</td>
      </tr>
  </tbody>
</table>
<h3 id="training-without-nvidia-hardware">Training Without Nvidia Hardware</h3>
<p>Zhipu AI trained the GLM-5 series entirely without Nvidia GPUs, using alternative accelerator hardware. This is strategically significant: as US export controls on H100 and future Nvidia GPUs tighten, Zhipu AI&rsquo;s supply chain is not affected. For enterprise buyers concerned about geopolitical risk in their AI vendor stack, this is a material differentiator from models trained exclusively on Nvidia infrastructure.</p>
<h2 id="glm-coding-plan-vs-direct-api-which-should-you-use">GLM Coding Plan vs Direct API: Which Should You Use?</h2>
<p>The Coding Plan and direct per-token API access are optimized for different use cases, and choosing the wrong one wastes money. The Coding Plan uses quota — a fixed allocation of compute per billing period — designed specifically for interactive Claude Code sessions where request patterns are bursty and unpredictable. You pay a flat quarterly fee and consume quota as you code, without tracking individual token counts. Direct per-token API access is billed by actual consumption, making it predictable and auditable for production systems that process user requests at defined volumes. For Claude Code users who code 4-8 hours per day, the Coding Plan is almost always cheaper: the $80/month Max plan provides sustained access that would cost hundreds per month under direct per-token billing at GLM-5.1 rates. For batch processing pipelines, document analysis, or any workload where you can estimate monthly token volumes, direct per-token pricing lets you model costs precisely and avoid paying for unused quota.</p>
<table>
  <thead>
      <tr>
          <th>Scenario</th>
          <th>Recommended Plan</th>
          <th>Reason</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Claude Code daily coding (&lt;2h/day)</td>
          <td>Lite ($10/mo)</td>
          <td>Quota sufficient, flat cost</td>
      </tr>
      <tr>
          <td>Claude Code power user (4h+/day)</td>
          <td>Max ($80/mo)</td>
          <td>Quota covers sustained usage</td>
      </tr>
      <tr>
          <td>Production API — predictable volume</td>
          <td>Direct per-token</td>
          <td>Precise billing, no waste</td>
      </tr>
      <tr>
          <td>Prototyping / testing</td>
          <td>Free Flash models</td>
          <td>Zero cost, 203K context</td>
      </tr>
      <tr>
          <td>CI/CD pipeline, batch tasks</td>
          <td>Direct per-token (GLM-4-Air)</td>
          <td>Lowest per-token rate</td>
      </tr>
  </tbody>
</table>
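<p>To sanity-check the break-even claim for heavy Claude Code use, a rough sketch using the GLM-5.1 direct-billing rates from the table above (the monthly volumes of 150M input and 20M output tokens are illustrative assumptions, not measured figures):</p>

```shell
in_m=150   # assumed monthly input tokens, in millions
out_m=20   # assumed monthly output tokens, in millions
# GLM-5.1 direct-billing rates: $1.00/M input, $3.20/M output
per_token_cost=$(awk -v i="$in_m" -v o="$out_m" 'BEGIN { printf "%.2f", i*1.00 + o*3.20 }')
echo "Direct per-token: \$${per_token_cost} vs Max plan: \$80.00/mo"
# prints: Direct per-token: $214.00 vs Max plan: $80.00/mo
```

<p>At these assumed volumes, the quota-based Max plan comes out roughly 2.7x cheaper; the comparison flips once actual consumption falls well below the quota you pay for.</p>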
<h3 id="quota-reset-and-rollover">Quota Reset and Rollover</h3>
<p>Coding Plan quotas reset quarterly and do not roll over. If you consistently under-consume the Lite plan quota, dropping the subscription in favor of free Flash models for routine work, topped up with direct per-token billing during crunch periods, is a legitimate optimization. Z.ai does not currently offer monthly billing — all Coding Plans require quarterly payment upfront.</p>
<h2 id="frequently-asked-questions">Frequently Asked Questions</h2>
<p><strong>Q: Does Z.ai work as a Claude Code backend outside China?</strong>
Yes. Z.ai is Zhipu AI&rsquo;s international platform, accessible globally. The Anthropic-compatible endpoint (<code>https://api.z.ai/api/anthropic</code>) and OpenAI-compatible endpoint (<code>https://api.z.ai/api/openai/v1</code>) are both served from infrastructure reachable from the US, EU, and other regions without VPN. Account registration requires a valid email address — no China phone number required.</p>
<p><strong>Q: Is GLM-5.1 actually open-weight, and can I run it locally?</strong>
GLM-5.1 weights are released under an MIT license, making it fully open-weight. However, at 745 billion parameters, local inference requires significant hardware: at minimum 6-8 high-end GPUs (A100 or H100 class). Most developers access GLM-5.1 through the Z.ai API rather than local deployment. Smaller distilled variants are available for local use.</p>
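<p>The 6-8 GPU figure is consistent with a back-of-envelope VRAM estimate. The sketch below assumes 4-bit quantized weights (~0.5 bytes per parameter), roughly 20% overhead for KV cache and activations, and 80 GB cards — all assumptions, not published specifications:</p>

```shell
# Rough VRAM estimate for 745B parameters under 4-bit quantization (assumed)
estimate=$(awk 'BEGIN {
  params_b = 745                                 # billions of parameters
  gb = params_b * 0.5 * 1.2                      # ~0.5 bytes/param + 20% overhead, in GB
  gpus = int(gb / 80); if (gb % 80 > 0) gpus++   # 80 GB per A100/H100-class card
  printf "~%.0f GB -> %d x 80GB GPUs minimum", gb, gpus
}')
echo "$estimate"
# prints: ~447 GB -> 6 x 80GB GPUs minimum
```

<p>Longer contexts push the KV cache well past that 20% overhead assumption, which is how the practical minimum climbs toward the 8-GPU end of the range.</p>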
<p><strong>Q: How does the Anthropic-compatible endpoint handle Claude-specific features like extended thinking?</strong>
Z.ai&rsquo;s Anthropic-compatible endpoint supports the core Anthropic messages API contract including tool use and streaming. Claude-specific prompt formats (e.g., Claude system prompts with <code>&lt;thinking&gt;</code> blocks) pass through to GLM-5.1, which handles them in its own reasoning pipeline. Some Claude-specific behaviors may differ — in practice, most Claude Code workflows are unaffected because Claude Code communicates via standard API calls, not Claude-proprietary protocol extensions.</p>
<p><strong>Q: What happened to the $3/month Z.ai promotion?</strong>
Zhipu AI ended the $3/month promotional pricing on February 11, 2026. The promotion was a limited-time offer for early adopters during the Coding Plan launch. Current pricing starts at $30/quarter (~$10/month) for the Lite plan. Existing subscribers who locked in promotional pricing during the promotion window retained it until their current quarter expired.</p>
<p><strong>Q: Can I use Z.ai for image generation or multimodal inputs?</strong>
Yes. GLM-4V variants on Z.ai support image inputs for vision tasks, using the same OpenAI-compatible message format with <code>image_url</code> content blocks. The GLM-5v-Turbo model handles image analysis, document OCR, and visual question answering. Image generation is available through a separate CogView endpoint on the same API key — check Z.ai&rsquo;s documentation for model names and supported resolution formats.</p>
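<p>A request sketch for the OpenAI-compatible endpoint using an <code>image_url</code> content block (the model identifier <code>glm-5v-turbo</code>, the image URL, and the <code>ZAI_API_KEY</code> variable name are illustrative placeholders — check Z.ai&rsquo;s model list and your own key setup for exact values):</p>

```shell
# Build an OpenAI-style multimodal message with an image_url content block.
payload='{
  "model": "glm-5v-turbo",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "Describe this chart."},
      {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}}
    ]
  }]
}'
# POST only when a key is configured (placeholder variable name):
if [ -n "${ZAI_API_KEY:-}" ]; then
  curl -s https://api.z.ai/api/openai/v1/chat/completions \
    -H "Authorization: Bearer $ZAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$payload"
fi
```

<p>The same <code>Authorization</code> header and key also cover the separate CogView image-generation endpoint mentioned above.</p>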
]]></content:encoded></item></channel></rss>