
Claude 300K Output Tokens Guide: Batch API for Large Code Generation 2026
Claude’s Extended Output beta raises the max_tokens ceiling from 128K to 300,000 tokens — but only for requests sent through the Message Batches API. If you’re generating full codebases, book-length documentation, or exhaustive structured extractions in a single turn, this guide covers everything you need to get it working. What Is Extended Output and How Does It Work? Extended Output is a Claude API beta feature, activated via the anthropic-beta: output-300k-2026-03-24 header, that increases the maximum max_tokens limit per request from 128,000 to 300,000 tokens. As of June 2026, it is only available on the Message Batches API — the synchronous Messages API remains capped at 64K–128K depending on the model. The models that support extended output are Claude Opus 4.8, Opus 4.7, Opus 4.6, and Sonnet 4.6, all of which carry 1M-token context windows. Claude Fable 5 and Mythos 5 are explicitly excluded and remain at 128K output. A single 300K-token generation can take over an hour to complete, which is why the asynchronous batch architecture is a prerequisite. This is not a setting you flip on a chat endpoint — it’s a deliberate architectural tradeoff: accept latency, gain volume. The practical upside is book-length code scaffolds, full API documentation sets, and exhaustive data extraction jobs that previously required chaining multiple requests with fragile state management between them. ...

