
xAI Grok Speech API: STT and TTS Guide for Developers in 2026
The xAI Grok speech API gives developers separate STT and TTS endpoints for transcription, live captions, generated speech, and voice workflows. Use STT when audio becomes text, TTS when text becomes audio, and the Voice Agent API only when the product needs full two-way spoken conversation.

What Is the xAI Grok Speech API in 2026?

The xAI Grok speech API is a set of production voice endpoints for speech-to-text, text-to-speech, and conversational voice applications under the Grok developer platform. xAI announced standalone Grok STT and TTS APIs on April 17, 2026, with STT general availability listed on April 15, 2026 and TTS general availability listed on March 16, 2026.

For developers, the practical split matters more than the launch timeline: /v1/stt transcribes uploaded or streamed audio, /v1/tts generates audio from text, and the Voice Agent API handles full duplex speech workflows. The speech APIs target common app surfaces such as call analytics, meeting notes, accessibility captions, IVR prompts, podcast production, and voice agents.

The core takeaway is simple: treat Grok speech as composable audio infrastructure, not as one monolithic voice product. ...
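
The STT / TTS / Voice Agent split described above can be sketched as a small routing helper. This is an illustrative sketch only, not xAI's client library: the `Surface` enum, the `choose_surfaces` function, and its parameters are hypothetical names introduced here, and the `voice-agent` value is a placeholder label because the text gives no path for the Voice Agent API. Only the `/v1/stt` and `/v1/tts` paths come from the guide itself.

```python
from enum import Enum


class Surface(Enum):
    """Speech surfaces named in the guide (hypothetical enum for illustration)."""
    STT = "/v1/stt"              # audio in, text out (path from the guide)
    TTS = "/v1/tts"              # text in, audio out (path from the guide)
    VOICE_AGENT = "voice-agent"  # placeholder label; no path is given in the guide


def choose_surfaces(audio_in: bool, audio_out: bool,
                    live_conversation: bool = False) -> tuple[Surface, ...]:
    """Pick which speech surface(s) a feature needs.

    Assumption: a non-live feature with audio on both sides is composed
    from STT followed by TTS, while live two-way speech goes to the
    Voice Agent API.
    """
    if live_conversation:
        # Full two-way spoken conversation is the only case for the Voice Agent API.
        return (Surface.VOICE_AGENT,)
    if audio_in and audio_out:
        # Composable case: transcribe first, then synthesize.
        return (Surface.STT, Surface.TTS)
    if audio_in:
        return (Surface.STT,)
    if audio_out:
        return (Surface.TTS,)
    raise ValueError("Feature has no audio input or output; no speech surface needed.")


if __name__ == "__main__":
    # Meeting notes: recorded audio becomes text.
    print([s.value for s in choose_surfaces(audio_in=True, audio_out=False)])
    # IVR prompts: text becomes audio.
    print([s.value for s in choose_surfaces(audio_in=False, audio_out=True)])
    # Voice agent: live two-way speech.
    print([s.value for s in choose_surfaces(True, True, live_conversation=True)])
```

Returning a tuple rather than a single value keeps the "composable infrastructure" idea visible: a feature can chain STT and TTS without being forced onto the Voice Agent API.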