<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Agents on RockB</title><link>https://baeseokjae.github.io/tags/agents/</link><description>Recent content in Agents on RockB</description><image><title>RockB</title><url>https://baeseokjae.github.io/images/og-default.png</url><link>https://baeseokjae.github.io/images/og-default.png</link></image><generator>Hugo</generator><language>en-us</language><lastBuildDate>Wed, 22 Apr 2026 05:37:21 +0000</lastBuildDate><atom:link href="https://baeseokjae.github.io/tags/agents/index.xml" rel="self" type="application/rss+xml"/><item><title>Vercel AI SDK Guide 2026: Build Streaming AI Apps in TypeScript With One SDK</title><link>https://baeseokjae.github.io/posts/vercel-ai-sdk-guide-2026/</link><pubDate>Wed, 22 Apr 2026 05:37:21 +0000</pubDate><guid>https://baeseokjae.github.io/posts/vercel-ai-sdk-guide-2026/</guid><description>Complete guide to Vercel AI SDK in 2026 — streaming, tool calling, structured output, agents, and production deployment with code examples.</description><content:encoded><![CDATA[<p>The Vercel AI SDK is a unified TypeScript library that lets you build streaming AI applications across OpenAI, Anthropic, Google, and 13+ other providers without rewriting your core logic when you switch models. Install it once, pick your provider, and ship production-ready AI features in hours instead of days.</p>
<h2 id="what-is-the-vercel-ai-sdk-and-why-it-matters-in-2026">What Is the Vercel AI SDK and Why It Matters in 2026</h2>
<p>The Vercel AI SDK is an open-source TypeScript toolkit for building AI-powered web applications with a provider-agnostic API, first-class streaming support, and framework-native UI hooks. As of April 2026, it has 11.5 million weekly npm downloads, 23.7K GitHub stars, and 614+ contributors — making it the most widely adopted TypeScript AI library for web developers. The SDK is organized into three layers: AI SDK Core handles server-side text generation, object generation, and tool calling; AI SDK UI provides React/Vue/Svelte hooks like <code>useChat</code> and <code>useCompletion</code> for building chat interfaces without managing stream state; and AI SDK RSC integrates with React Server Components for edge-compatible generative UI. The SDK supports 100+ LLM models across 16+ providers via the Vercel AI Gateway, including OpenAI GPT-4o, Anthropic Claude, Google Gemini, and open models on Together/Groq. In 2026 Vercel added three major features on top: Workflows (long-running durable agents), Sandbox (secure agent code execution), and AI Elements (prebuilt UI components). OpenCode — one of the most popular open-source coding agents — is built entirely on AI SDK, which validates its production-grade viability.</p>
<h3 id="the-three-layer-architecture">The Three-Layer Architecture</h3>
<p>The SDK cleanly separates concerns: Core runs on the server or edge, UI runs on the client, and RSC bridges the two with streaming server components. This separation means you can adopt incrementally — start with Core for a simple API route, add UI hooks when you need chat state management, and layer in RSC if you need server-driven generative UI.</p>
<h3 id="how-it-fits-the-vercel-ecosystem">How It Fits the Vercel Ecosystem</h3>
<p>AI Gateway gives you one API key to access 100+ models with automatic fallbacks and rate limit management. Sandbox provides a secure Node.js environment for agents that need to execute code. Workflows lets agents suspend and resume across function invocations, solving the serverless timeout problem for long-running tasks.</p>
<h2 id="getting-started-installing-and-configuring-ai-sdk">Getting Started: Installing and Configuring AI SDK</h2>
<p>Getting started with the Vercel AI SDK requires installing the <code>ai</code> core package plus one or more provider adapters. The setup takes under five minutes for a Next.js project and works equally well in any Node.js or edge runtime environment. The provider adapter pattern is the key architectural decision: you import a model from its provider package and pass it to AI SDK functions, meaning you can swap from OpenAI to Anthropic by changing a single import and model string — your business logic stays untouched. This design was explicitly chosen to prevent vendor lock-in, and in practice it means you can A/B test models in production, build fallback chains, or migrate providers without refactoring your entire codebase. The package size is small — <code>ai</code> is under 200KB minified — and it is designed to run on Vercel Edge Functions, Cloudflare Workers, and standard Node.js without adaptation. For new projects, the recommended starting point is a Next.js App Router app with the <code>edge</code> runtime on API routes, which gives you global distribution and sub-100ms cold starts.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>npm install ai @ai-sdk/openai @ai-sdk/anthropic @ai-sdk/google @ai-sdk/react zod
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#75715e"># .env.local
</span></span></span><span style="display:flex;"><span>OPENAI_API_KEY=sk-...
</span></span><span style="display:flex;"><span>ANTHROPIC_API_KEY=sk-ant-...
</span></span><span style="display:flex;"><span>GOOGLE_GENERATIVE_AI_API_KEY=...
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-typescript" data-lang="typescript"><span style="display:flex;"><span><span style="color:#75715e">// app/api/chat/route.ts
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">import</span> { <span style="color:#a6e22e">streamText</span> } <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#39;ai&#39;</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">import</span> { <span style="color:#a6e22e">openai</span> } <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#39;@ai-sdk/openai&#39;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">export</span> <span style="color:#66d9ef">const</span> <span style="color:#a6e22e">runtime</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#39;edge&#39;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">export</span> <span style="color:#66d9ef">async</span> <span style="color:#66d9ef">function</span> <span style="color:#a6e22e">POST</span>(<span style="color:#a6e22e">req</span>: <span style="color:#66d9ef">Request</span>) {
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">const</span> { <span style="color:#a6e22e">messages</span> } <span style="color:#f92672">=</span> <span style="color:#66d9ef">await</span> <span style="color:#a6e22e">req</span>.<span style="color:#a6e22e">json</span>()
</span></span><span style="display:flex;"><span>  
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">const</span> <span style="color:#a6e22e">result</span> <span style="color:#f92672">=</span> <span style="color:#a6e22e">streamText</span>({
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">model</span>: <span style="color:#66d9ef">openai</span>(<span style="color:#e6db74">&#39;gpt-4o&#39;</span>),
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">messages</span>,
</span></span><span style="display:flex;"><span>  })
</span></span><span style="display:flex;"><span>  
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">return</span> <span style="color:#a6e22e">result</span>.<span style="color:#a6e22e">toDataStreamResponse</span>()
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><h3 id="ai-gateway-one-key-for-100-models">AI Gateway: One Key for 100+ Models</h3>
<p>Vercel AI Gateway lets you use a single Vercel-issued API key to access models from OpenAI, Anthropic, Google, Mistral, and more. It handles rate-limit rotation, cost tracking, and automatic retry logic. For teams that need to experiment with multiple providers without managing per-provider API keys and billing, Gateway is the fastest path to a multi-model setup.</p>
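<p>To make concrete what Gateway&rsquo;s automatic fallbacks replace, here is a dependency-free sketch of manual multi-provider failover. The &ldquo;providers&rdquo; are stubbed async functions, not AI SDK or Gateway APIs — the point is only the bookkeeping you no longer write yourself.</p>

```typescript
// Dependency-free sketch: manual provider failover, i.e. the bookkeeping
// that AI Gateway's automatic fallbacks handle for you. Each "provider"
// is stubbed as an async function — not a real SDK call.
type ProviderCall = (prompt: string) => Promise<string>

async function withFallback(
  providers: ProviderCall[],
  prompt: string,
): Promise<string> {
  let lastError: unknown = new Error('no providers configured')
  for (const call of providers) {
    try {
      return await call(prompt) // first provider to succeed wins
    } catch (err) {
      lastError = err // remember the failure, move on to the next provider
    }
  }
  throw lastError
}

// Stubs: the first provider is rate-limited, the second responds normally.
const rateLimited: ProviderCall = async () => {
  throw new Error('429 Too Many Requests')
}
const healthy: ProviderCall = async (p) => `echo: ${p}`

// Usage: await withFallback([rateLimited, healthy], 'Summarize this page.')
```

<p>With Gateway, that entire loop collapses into a single model reference — the retry ordering and error accounting live behind the key.</p>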
<h2 id="ai-sdk-core-text-generation-and-streaming">AI SDK Core: Text Generation and Streaming</h2>
<p>AI SDK Core is the server-side engine that converts provider-specific APIs into a consistent interface for generating text, streaming responses, and calling tools. The two primary functions are <code>generateText</code> and <code>streamText</code>. <code>generateText</code> is for synchronous operations — you send a prompt and wait for the full response, which is ideal for batch jobs, summarization pipelines, and any context where the user is not watching a UI render in real time. <code>streamText</code> is the streaming counterpart: it returns a result whose <code>textStream</code> is a <code>ReadableStream</code> you can iterate or pipe into a <code>Response</code>, and it integrates with UI hooks via the <code>toDataStreamResponse()</code> method. Both functions accept the same options object — <code>model</code>, <code>messages</code>, <code>system</code>, <code>tools</code>, <code>maxSteps</code>, <code>temperature</code>, and more — so switching between them is a one-word change. Provider switching is similarly simple: swapping <code>openai('gpt-4o')</code> for <code>anthropic('claude-opus-4-7')</code> is the only change needed. Transient errors are retried automatically (tunable via <code>maxRetries</code>), and fallback chains — trying a list of models in order when the primary errors — can be composed with the <code>wrapLanguageModel</code> middleware utility or a community fallback provider. The consistency across providers is the single biggest productivity gain AI SDK offers compared to using provider SDKs directly.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-typescript" data-lang="typescript"><span style="display:flex;"><span><span style="color:#66d9ef">import</span> { <span style="color:#a6e22e">generateText</span>, <span style="color:#a6e22e">streamText</span> } <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#39;ai&#39;</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">import</span> { <span style="color:#a6e22e">anthropic</span> } <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#39;@ai-sdk/anthropic&#39;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e">// One-shot generation
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">const</span> { <span style="color:#a6e22e">text</span> } <span style="color:#f92672">=</span> <span style="color:#66d9ef">await</span> <span style="color:#a6e22e">generateText</span>({
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">model</span>: <span style="color:#66d9ef">anthropic</span>(<span style="color:#e6db74">&#39;claude-sonnet-4-6&#39;</span>),
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">prompt</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#39;Summarize the key features of React 19 in 3 bullet points.&#39;</span>,
</span></span><span style="display:flex;"><span>})
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e">// Streaming
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">const</span> <span style="color:#a6e22e">result</span> <span style="color:#f92672">=</span> <span style="color:#a6e22e">streamText</span>({
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">model</span>: <span style="color:#66d9ef">anthropic</span>(<span style="color:#e6db74">&#39;claude-sonnet-4-6&#39;</span>),
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">prompt</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#39;Write a guide on async/await in TypeScript.&#39;</span>,
</span></span><span style="display:flex;"><span>})
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">for</span> <span style="color:#66d9ef">await</span> (<span style="color:#66d9ef">const</span> <span style="color:#a6e22e">chunk</span> <span style="color:#66d9ef">of</span> <span style="color:#a6e22e">result</span>.<span style="color:#a6e22e">textStream</span>) {
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">process</span>.<span style="color:#a6e22e">stdout</span>.<span style="color:#a6e22e">write</span>(<span style="color:#a6e22e">chunk</span>)
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><h3 id="built-in-fallbacks-and-retry-logic">Built-In Fallbacks and Retry Logic</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-typescript" data-lang="typescript"><span style="display:flex;"><span><span style="color:#75715e">// Fallback is not built into the core `ai` package; this sketch uses the
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">// community `ai-fallback` package (check its docs for the current API).
</span></span></span><span style="display:flex;"><span><span style="color:#66d9ef">import</span> { <span style="color:#a6e22e">createFallback</span> } <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#39;ai-fallback&#39;</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">import</span> { <span style="color:#a6e22e">openai</span> } <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#39;@ai-sdk/openai&#39;</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">import</span> { <span style="color:#a6e22e">anthropic</span> } <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#39;@ai-sdk/anthropic&#39;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">const</span> <span style="color:#a6e22e">resilientModel</span> <span style="color:#f92672">=</span> <span style="color:#a6e22e">createFallback</span>({
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">models</span><span style="color:#f92672">:</span> [<span style="color:#a6e22e">openai</span>(<span style="color:#e6db74">&#39;gpt-4o&#39;</span>), <span style="color:#a6e22e">anthropic</span>(<span style="color:#e6db74">&#39;claude-sonnet-4-6&#39;</span>)],
</span></span><span style="display:flex;"><span>})
</span></span></code></pre></div><h2 id="structured-output-with-zod-schemas">Structured Output with Zod Schemas</h2>
<p><code>generateObject</code> and <code>streamObject</code> are AI SDK&rsquo;s solution to one of the biggest pain points in production AI: getting reliable, type-safe structured data from LLMs instead of freeform text that you then parse with fragile regexes. These functions accept a Zod schema and use the model&rsquo;s native structured output mode — JSON mode for OpenAI, tool-use-based extraction for Anthropic — to steer the response toward the schema shape, then validate the result against the schema before returning it. Transient API errors are retried automatically, and output that still fails validation surfaces as a typed error rather than silently broken JSON. This is not just a developer convenience: structured output is essential for any AI pipeline where the response feeds into downstream logic, databases, or APIs. Teams using <code>generateObject</code> in production report near-zero JSON parsing errors compared to prompt-based extraction, and the Zod types flow through the entire TypeScript type system so you get autocomplete on the AI response object. The <code>streamObject</code> variant lets you stream partial structured objects, enabling progressive UI rendering as the AI fills in fields — useful for forms, dashboards, or any interface where showing partial data is better than a blank loading state. For data extraction tasks — pulling product specs from HTML, extracting entities from documents, or parsing unstructured API responses — structured output with Zod is the recommended production approach.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-typescript" data-lang="typescript"><span style="display:flex;"><span><span style="color:#66d9ef">import</span> { <span style="color:#a6e22e">generateObject</span>, <span style="color:#a6e22e">streamObject</span> } <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#39;ai&#39;</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">import</span> { <span style="color:#a6e22e">openai</span> } <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#39;@ai-sdk/openai&#39;</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">import</span> { <span style="color:#a6e22e">z</span> } <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#39;zod&#39;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">const</span> <span style="color:#a6e22e">BlogPostSchema</span> <span style="color:#f92672">=</span> <span style="color:#a6e22e">z</span>.<span style="color:#66d9ef">object</span>({
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">title</span>: <span style="color:#66d9ef">z.string</span>(),
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">summary</span>: <span style="color:#66d9ef">z.string</span>().<span style="color:#a6e22e">max</span>(<span style="color:#ae81ff">200</span>),
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">tags</span>: <span style="color:#66d9ef">z.array</span>(<span style="color:#a6e22e">z</span>.<span style="color:#66d9ef">string</span>()).<span style="color:#a6e22e">max</span>(<span style="color:#ae81ff">5</span>),
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">seoScore</span>: <span style="color:#66d9ef">z.number</span>().<span style="color:#a6e22e">min</span>(<span style="color:#ae81ff">0</span>).<span style="color:#a6e22e">max</span>(<span style="color:#ae81ff">100</span>),
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">sections</span>: <span style="color:#66d9ef">z.array</span>(<span style="color:#a6e22e">z</span>.<span style="color:#66d9ef">object</span>({
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">heading</span>: <span style="color:#66d9ef">z.string</span>(),
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">keyPoints</span>: <span style="color:#66d9ef">z.array</span>(<span style="color:#a6e22e">z</span>.<span style="color:#66d9ef">string</span>()),
</span></span><span style="display:flex;"><span>  })),
</span></span><span style="display:flex;"><span>})
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">const</span> { <span style="color:#66d9ef">object</span> } <span style="color:#f92672">=</span> <span style="color:#66d9ef">await</span> <span style="color:#a6e22e">generateObject</span>({
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">model</span>: <span style="color:#66d9ef">openai</span>(<span style="color:#e6db74">&#39;gpt-4o&#39;</span>),
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">schema</span>: <span style="color:#66d9ef">BlogPostSchema</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">prompt</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#39;Analyze this article and return structured metadata: ...&#39;</span>,
</span></span><span style="display:flex;"><span>})
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">console</span>.<span style="color:#a6e22e">log</span>(<span style="color:#66d9ef">object</span>.<span style="color:#a6e22e">title</span>) <span style="color:#75715e">// TypeScript knows the full type
</span></span></span></code></pre></div><h3 id="streaming-structured-objects">Streaming Structured Objects</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-typescript" data-lang="typescript"><span style="display:flex;"><span><span style="color:#66d9ef">const</span> { <span style="color:#a6e22e">partialObjectStream</span> } <span style="color:#f92672">=</span> <span style="color:#a6e22e">streamObject</span>({
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">model</span>: <span style="color:#66d9ef">openai</span>(<span style="color:#e6db74">&#39;gpt-4o&#39;</span>),
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">schema</span>: <span style="color:#66d9ef">BlogPostSchema</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">prompt</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#39;Generate a blog post outline for: &#34;AI agents in 2026&#34;&#39;</span>,
</span></span><span style="display:flex;"><span>})
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">for</span> <span style="color:#66d9ef">await</span> (<span style="color:#66d9ef">const</span> <span style="color:#a6e22e">partial</span> <span style="color:#66d9ef">of</span> <span style="color:#a6e22e">partialObjectStream</span>) {
</span></span><span style="display:flex;"><span>  <span style="color:#75715e">// partial.title appears as soon as the model generates it
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>  <span style="color:#a6e22e">updateUI</span>(<span style="color:#a6e22e">partial</span>)
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><h2 id="building-chat-uis-with-ai-sdk-ui-hooks">Building Chat UIs with AI SDK UI Hooks</h2>
<p>AI SDK UI is the client-side complement to Core, providing React hooks that manage chat state, streaming responses, and optimistic updates without requiring a single <code>useState</code> or <code>useEffect</code> for stream handling. The primary hook is <code>useChat</code>, which gives you <code>messages</code>, <code>input</code>, <code>handleInputChange</code>, <code>handleSubmit</code>, and <code>isLoading</code> — everything needed to build a ChatGPT-like interface in under 50 lines of React. Under the hood it connects to your AI route, handles stream parsing, and appends message chunks to state as they arrive. The <code>useCompletion</code> hook handles text completion use cases — autocomplete, writing suggestions, or any single-prompt UX. <code>useObject</code> streams structured objects from a <code>streamObject</code> route and exposes the partial object as it builds, enabling progressive form filling or AI-driven dashboard updates. All three hooks work with React, Vue, Svelte, and SolidJS — the framework-agnostic design means you can share backend patterns between projects built on different frontend stacks. The hooks integrate with React Suspense and Error Boundaries for graceful loading and error states without extra wiring.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-typescript" data-lang="typescript"><span style="display:flex;"><span><span style="color:#75715e">// app/components/Chat.tsx
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#e6db74">&#39;use client&#39;</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">import</span> { <span style="color:#a6e22e">useChat</span> } <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#39;@ai-sdk/react&#39;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">export</span> <span style="color:#66d9ef">function</span> <span style="color:#a6e22e">Chat() {</span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">const</span> { <span style="color:#a6e22e">messages</span>, <span style="color:#a6e22e">input</span>, <span style="color:#a6e22e">handleInputChange</span>, <span style="color:#a6e22e">handleSubmit</span>, <span style="color:#a6e22e">isLoading</span> } <span style="color:#f92672">=</span> <span style="color:#a6e22e">useChat</span>({
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">api</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#39;/api/chat&#39;</span>,
</span></span><span style="display:flex;"><span>  })
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">return</span> (
</span></span><span style="display:flex;"><span>    &lt;<span style="color:#f92672">div</span>&gt;
</span></span><span style="display:flex;"><span>      {<span style="color:#a6e22e">messages</span>.<span style="color:#a6e22e">map</span>(<span style="color:#a6e22e">m</span> <span style="color:#f92672">=&gt;</span> (
</span></span><span style="display:flex;"><span>        &lt;<span style="color:#f92672">div</span> <span style="color:#a6e22e">key</span><span style="color:#f92672">=</span>{<span style="color:#a6e22e">m</span>.<span style="color:#a6e22e">id</span>} <span style="color:#a6e22e">className</span><span style="color:#f92672">=</span>{<span style="color:#a6e22e">m</span>.<span style="color:#a6e22e">role</span> <span style="color:#f92672">===</span> <span style="color:#e6db74">&#39;user&#39;</span> <span style="color:#f92672">?</span> <span style="color:#e6db74">&#39;user&#39;</span> <span style="color:#f92672">:</span> <span style="color:#e6db74">&#39;assistant&#39;</span>}&gt;
</span></span><span style="display:flex;"><span>          {<span style="color:#a6e22e">m</span>.<span style="color:#a6e22e">content</span>}
</span></span><span style="display:flex;"><span>        &lt;/<span style="color:#f92672">div</span>&gt;
</span></span><span style="display:flex;"><span>      ))}
</span></span><span style="display:flex;"><span>      &lt;<span style="color:#f92672">form</span> <span style="color:#a6e22e">onSubmit</span><span style="color:#f92672">=</span>{<span style="color:#a6e22e">handleSubmit</span>}&gt;
</span></span><span style="display:flex;"><span>        &lt;<span style="color:#f92672">input</span> <span style="color:#a6e22e">value</span><span style="color:#f92672">=</span>{<span style="color:#a6e22e">input</span>} <span style="color:#a6e22e">onChange</span><span style="color:#f92672">=</span>{<span style="color:#a6e22e">handleInputChange</span>} <span style="color:#a6e22e">disabled</span><span style="color:#f92672">=</span>{<span style="color:#a6e22e">isLoading</span>} /&gt;
</span></span><span style="display:flex;"><span>        &lt;<span style="color:#f92672">button</span> <span style="color:#a6e22e">type</span><span style="color:#f92672">=</span><span style="color:#e6db74">&#34;submit&#34;</span> <span style="color:#a6e22e">disabled</span><span style="color:#f92672">=</span>{<span style="color:#a6e22e">isLoading</span>}&gt;<span style="color:#a6e22e">Send</span>&lt;/<span style="color:#f92672">button</span>&gt;
</span></span><span style="display:flex;"><span>      &lt;/<span style="color:#f92672">form</span>&gt;
</span></span><span style="display:flex;"><span>    &lt;/<span style="color:#f92672">div</span>&gt;
</span></span><span style="display:flex;"><span>  )
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><h3 id="framework-support-comparison">Framework Support Comparison</h3>
<table>
  <thead>
      <tr>
          <th>Hook</th>
          <th>React</th>
          <th>Vue</th>
          <th>Svelte</th>
          <th>SolidJS</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>useChat</code></td>
          <td>✅</td>
          <td>✅</td>
          <td>✅</td>
          <td>✅</td>
      </tr>
      <tr>
          <td><code>useCompletion</code></td>
          <td>✅</td>
          <td>✅</td>
          <td>✅</td>
          <td>✅</td>
      </tr>
      <tr>
          <td><code>useObject</code></td>
          <td>✅</td>
          <td>✅</td>
          <td>✅</td>
          <td>✅</td>
      </tr>
      <tr>
          <td><code>useAssistant</code></td>
          <td>✅</td>
          <td>❌</td>
          <td>❌</td>
          <td>❌</td>
      </tr>
  </tbody>
</table>
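<p>The hooks above all wrap the same underlying pattern: read a streamed response incrementally and surface the accumulated text as state. If you are outside one of the supported frameworks, you can do that part by hand with web-standard APIs. This sketch assumes a plain text stream — AI SDK&rsquo;s actual wire format is a richer data-stream protocol — so treat it as illustrative rather than a drop-in client.</p>

```typescript
// What the UI hooks manage internally, in dependency-free form: consume a
// ReadableStream chunk by chunk, decoding and accumulating text. Assumes a
// plain text stream (the real AI SDK protocol carries structured frames).
async function readTextStream(
  stream: ReadableStream<Uint8Array>,
  onUpdate: (textSoFar: string) => void,
): Promise<string> {
  const reader = stream.getReader()
  const decoder = new TextDecoder()
  let text = ''
  for (;;) {
    const { done, value } = await reader.read()
    if (done || value === undefined) break
    text += decoder.decode(value, { stream: true }) // append this chunk
    onUpdate(text) // in React this would be a setState call
  }
  return text
}

// Demo: an in-memory stream standing in for fetch('/api/chat').body.
function demoStream(parts: string[]): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder()
  return new ReadableStream({
    start(controller) {
      for (const part of parts) controller.enqueue(encoder.encode(part))
      controller.close()
    },
  })
}
```

<p>Everything <code>useChat</code> adds on top — message arrays, optimistic input handling, loading flags — is state management layered over this loop.</p>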
<h2 id="tool-calling-giving-your-ai-agent-superpowers">Tool Calling: Giving Your AI Agent Superpowers</h2>
<p>Tool calling in AI SDK is the mechanism that transforms a passive text generator into an active agent — the model describes which tools it wants to invoke, AI SDK executes them server-side, and the results feed back into the next model turn automatically. Tools are defined with the <code>tool</code> helper, which takes a <code>description</code> (natural language explanation for the model), <code>parameters</code> (a Zod schema for typed inputs), and an <code>execute</code> function (the actual implementation). The SDK handles the full tool-call cycle: formatting the tool description for the provider, parsing the model&rsquo;s structured tool-call output, executing the function with validated arguments, and appending the result to the conversation context. <code>maxSteps</code> controls how many tool-call cycles the agent can run before stopping, preventing infinite loops while allowing multi-step reasoning chains of 5–10 steps. Tool results stream to the client via <code>toDataStreamResponse()</code>, so users see intermediate tool outputs in real time rather than waiting for the final answer. In production applications, tool sets commonly include database query tools, web search, calculator functions, external API calls, and file operations — anything your server-side code can do, the agent can orchestrate. The Zod parameter schemas provide input validation at zero extra cost, catching malformed tool calls before they reach your database or external services.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-typescript" data-lang="typescript"><span style="display:flex;"><span><span style="color:#66d9ef">import</span> { <span style="color:#a6e22e">streamText</span>, <span style="color:#a6e22e">tool</span> } <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#39;ai&#39;</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">import</span> { <span style="color:#a6e22e">openai</span> } <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#39;@ai-sdk/openai&#39;</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">import</span> { <span style="color:#a6e22e">z</span> } <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#39;zod&#39;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">const</span> <span style="color:#a6e22e">result</span> <span style="color:#f92672">=</span> <span style="color:#a6e22e">streamText</span>({
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">model</span>: <span style="color:#66d9ef">openai</span>(<span style="color:#e6db74">&#39;gpt-4o&#39;</span>),
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">maxSteps</span>: <span style="color:#66d9ef">5</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">tools</span><span style="color:#f92672">:</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">getWeather</span>: <span style="color:#66d9ef">tool</span>({
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">description</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#39;Get current weather for a city&#39;</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">parameters</span>: <span style="color:#66d9ef">z.object</span>({ <span style="color:#a6e22e">city</span>: <span style="color:#66d9ef">z.string</span>() }),
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">execute</span>: <span style="color:#66d9ef">async</span> ({ <span style="color:#a6e22e">city</span> }) <span style="color:#f92672">=&gt;</span> {
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">const</span> <span style="color:#a6e22e">res</span> <span style="color:#f92672">=</span> <span style="color:#66d9ef">await</span> <span style="color:#a6e22e">fetch</span>(<span style="color:#e6db74">`https://api.weather.com/</span><span style="color:#e6db74">${</span><span style="color:#a6e22e">city</span><span style="color:#e6db74">}</span><span style="color:#e6db74">`</span>)
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> <span style="color:#a6e22e">res</span>.<span style="color:#a6e22e">json</span>()
</span></span><span style="display:flex;"><span>      },
</span></span><span style="display:flex;"><span>    }),
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">searchDatabase</span>: <span style="color:#66d9ef">tool</span>({
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">description</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#39;Search the product database&#39;</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">parameters</span>: <span style="color:#66d9ef">z.object</span>({ <span style="color:#a6e22e">query</span>: <span style="color:#66d9ef">z.string</span>(), <span style="color:#a6e22e">limit</span>: <span style="color:#66d9ef">z.number</span>().<span style="color:#66d9ef">default</span>(<span style="color:#ae81ff">5</span>) }),
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">execute</span>: <span style="color:#66d9ef">async</span> ({ <span style="color:#a6e22e">query</span>, <span style="color:#a6e22e">limit</span> }) <span style="color:#f92672">=&gt;</span> {
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> <span style="color:#a6e22e">db</span>.<span style="color:#a6e22e">products</span>.<span style="color:#a6e22e">search</span>(<span style="color:#a6e22e">query</span>, { <span style="color:#a6e22e">limit</span> })
</span></span><span style="display:flex;"><span>      },
</span></span><span style="display:flex;"><span>    }),
</span></span><span style="display:flex;"><span>  },
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">messages</span><span style="color:#f92672">:</span> [{ <span style="color:#a6e22e">role</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#39;user&#39;</span>, <span style="color:#a6e22e">content</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#39;What is the weather in Tokyo and do we sell umbrellas?&#39;</span> }],
</span></span><span style="display:flex;"><span>})
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">return</span> <span style="color:#a6e22e">result</span>.<span style="color:#a6e22e">toDataStreamResponse</span>()
</span></span></code></pre></div><h3 id="multi-step-agent-loop">Multi-Step Agent Loop</h3>
<p>With <code>maxSteps: 5</code>, the model can: call <code>getWeather</code> → get Tokyo weather → call <code>searchDatabase</code> for umbrellas → combine results → return final answer. Each step is visible to the user via streaming tool call indicators rendered automatically by <code>useChat</code>.</p>
<h2 id="building-ai-agents-with-multi-step-reasoning">Building AI Agents with Multi-Step Reasoning</h2>
<p>An AI agent in the Vercel AI SDK context is a <code>streamText</code> or <code>generateText</code> call with tools enabled and <code>maxSteps</code> set above 1 — the model reasons, calls tools, observes results, and reasons again until it reaches a conclusion or exhausts its step budget. This loop pattern is what separates agents from chatbots: rather than answering from static training knowledge, the agent actively queries databases, fetches URLs, or calls APIs to gather real-time information before formulating a response. The key to a production-grade agent is memory and context management: you control what goes in <code>messages</code>, so you can implement sliding window context, summarization, or retrieval-augmented generation by fetching relevant documents before calling the model. For RAG integration, the standard pattern is to embed the user query, retrieve top-k chunks from a vector store (Pinecone, Supabase pgvector, or Upstash), and prepend them as a system message. The agent then has access to both retrieved context and its tool-calling ability, so it can fetch additional information if the retrieved chunks are insufficient. OpenCode&rsquo;s architecture demonstrates this at scale: it uses AI SDK with file system tools, runs multi-step reasoning loops to understand a codebase, and streams results back to a terminal UI — all without custom streaming infrastructure because AI SDK handles it.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-typescript" data-lang="typescript"><span style="display:flex;"><span><span style="color:#75715e">// Research agent with RAG
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">function</span> <span style="color:#a6e22e">researchAgent</span>(<span style="color:#a6e22e">query</span>: <span style="color:#66d9ef">string</span>) {
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">const</span> <span style="color:#a6e22e">relevantDocs</span> <span style="color:#f92672">=</span> <span style="color:#66d9ef">await</span> <span style="color:#a6e22e">vectorStore</span>.<span style="color:#a6e22e">search</span>(<span style="color:#a6e22e">query</span>, { <span style="color:#a6e22e">topK</span>: <span style="color:#66d9ef">5</span> })
</span></span><span style="display:flex;"><span>  
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">const</span> <span style="color:#a6e22e">result</span> <span style="color:#f92672">=</span> <span style="color:#a6e22e">streamText</span>({
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">model</span>: <span style="color:#66d9ef">openai</span>(<span style="color:#e6db74">&#39;gpt-4o&#39;</span>),
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">maxSteps</span>: <span style="color:#66d9ef">8</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">system</span><span style="color:#f92672">:</span> <span style="color:#e6db74">`You are a research agent. Use context and tools to answer thoroughly.
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">Context from knowledge base:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74"></span><span style="color:#e6db74">${</span><span style="color:#a6e22e">relevantDocs</span>.<span style="color:#a6e22e">map</span>(<span style="color:#a6e22e">d</span> <span style="color:#f92672">=&gt;</span> <span style="color:#a6e22e">d</span>.<span style="color:#a6e22e">content</span>).<span style="color:#a6e22e">join</span>(<span style="color:#e6db74">&#39;\n\n&#39;</span>)<span style="color:#e6db74">}</span><span style="color:#e6db74">`</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">messages</span><span style="color:#f92672">:</span> [{ <span style="color:#a6e22e">role</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#39;user&#39;</span>, <span style="color:#a6e22e">content</span>: <span style="color:#66d9ef">query</span> }],
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">tools</span><span style="color:#f92672">:</span> {
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">searchWeb</span>: <span style="color:#66d9ef">tool</span>({ <span style="color:#75715e">/* web search implementation */</span> }),
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">fetchUrl</span>: <span style="color:#66d9ef">tool</span>({ <span style="color:#75715e">/* URL fetching implementation */</span> }),
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">saveNote</span>: <span style="color:#66d9ef">tool</span>({ <span style="color:#75715e">/* note saving implementation */</span> }),
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>  })
</span></span><span style="display:flex;"><span>  
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">return</span> <span style="color:#a6e22e">result</span>.<span style="color:#a6e22e">toDataStreamResponse</span>()
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><h2 id="vercel-workflows-long-running-agents-that-survive">Vercel Workflows: Long-Running Agents That Survive</h2>
<p>Vercel Workflows is a 2026 addition to the AI SDK ecosystem that solves the most critical limitation of serverless AI agents: function timeout. Standard serverless functions on Vercel time out after 30 seconds (Pro plan) or 5 minutes (Enterprise), which is insufficient for agents that need to search the web, process large documents, run multi-stage pipelines, or wait for human approval. Workflows introduces durable execution — agent tasks are broken into named steps that can suspend (persist state to managed storage), wait for external events, and resume exactly where they left off across multiple function invocations without losing context. This makes genuinely complex agentic pipelines feasible on serverless infrastructure: a content generation pipeline can run for 20+ minutes as it researches, drafts, and revises content, with the agent suspending between phases. The <code>@vercel/workflows</code> package integrates directly with AI SDK&rsquo;s <code>generateText</code> and <code>streamText</code> — you wrap agent logic in a <code>workflow</code> function and use <code>step.run()</code> to define resumable checkpoints. Human-in-the-loop approval is supported via <code>step.waitForEvent()</code>, which suspends the workflow until a webhook fires. In 2026, Workflows is the recommended architecture for any AI task that may exceed 30 seconds or requires coordination between multiple agents.</p>
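<p>The human-in-the-loop pattern mentioned above can be sketched as follows. Treat this as illustrative rather than canonical: the exact <code>step.waitForEvent</code> signature should be checked against the Workflows documentation, and the <code>approval-granted</code> event name and payload shape are assumptions.</p>

```typescript
import { workflow, step } from '@vercel/workflows'
import { generateText } from 'ai'
import { anthropic } from '@ai-sdk/anthropic'

// Hypothetical review pipeline: draft content, then suspend until a
// human approves via a webhook event. No compute is consumed while
// the workflow is suspended.
export const reviewPipeline = workflow(async ({ input }: { input: { topic: string } }) => {
  const draft = await step.run('draft', async () => {
    const { text } = await generateText({
      model: anthropic('claude-sonnet-4-6'),
      prompt: `Draft a short post about ${input.topic}.`,
    })
    return text
  })

  // Suspends here until the 'approval-granted' webhook event fires
  const approval = await step.waitForEvent('approval-granted')

  return { draft, approved: Boolean(approval) }
})
```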
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-typescript" data-lang="typescript"><span style="display:flex;"><span><span style="color:#66d9ef">import</span> { <span style="color:#a6e22e">workflow</span>, <span style="color:#a6e22e">step</span> } <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#39;@vercel/workflows&#39;</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">import</span> { <span style="color:#a6e22e">generateText</span> } <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#39;ai&#39;</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">import</span> { <span style="color:#a6e22e">anthropic</span> } <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#39;@ai-sdk/anthropic&#39;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">export</span> <span style="color:#66d9ef">const</span> <span style="color:#a6e22e">contentPipeline</span> <span style="color:#f92672">=</span> <span style="color:#a6e22e">workflow</span>(<span style="color:#66d9ef">async</span> ({ <span style="color:#a6e22e">input</span> }<span style="color:#f92672">:</span> { <span style="color:#a6e22e">input</span><span style="color:#f92672">:</span> { <span style="color:#a6e22e">topic</span>: <span style="color:#66d9ef">string</span> } }) <span style="color:#f92672">=&gt;</span> {
</span></span><span style="display:flex;"><span>  <span style="color:#75715e">// Each step is resumable — survives function timeout
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>  <span style="color:#66d9ef">const</span> <span style="color:#a6e22e">research</span> <span style="color:#f92672">=</span> <span style="color:#66d9ef">await</span> <span style="color:#a6e22e">step</span>.<span style="color:#a6e22e">run</span>(<span style="color:#e6db74">&#39;research&#39;</span>, <span style="color:#66d9ef">async</span> () <span style="color:#f92672">=&gt;</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">const</span> { <span style="color:#a6e22e">text</span> } <span style="color:#f92672">=</span> <span style="color:#66d9ef">await</span> <span style="color:#a6e22e">generateText</span>({
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">model</span>: <span style="color:#66d9ef">anthropic</span>(<span style="color:#e6db74">&#39;claude-opus-4-7&#39;</span>),
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">prompt</span><span style="color:#f92672">:</span> <span style="color:#e6db74">`Research: </span><span style="color:#e6db74">${</span><span style="color:#a6e22e">input</span>.<span style="color:#a6e22e">topic</span><span style="color:#e6db74">}</span><span style="color:#e6db74">. Return key facts and sources.`</span>,
</span></span><span style="display:flex;"><span>    })
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> <span style="color:#a6e22e">text</span>
</span></span><span style="display:flex;"><span>  })
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">const</span> <span style="color:#a6e22e">draft</span> <span style="color:#f92672">=</span> <span style="color:#66d9ef">await</span> <span style="color:#a6e22e">step</span>.<span style="color:#a6e22e">run</span>(<span style="color:#e6db74">&#39;draft&#39;</span>, <span style="color:#66d9ef">async</span> () <span style="color:#f92672">=&gt;</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">const</span> { <span style="color:#a6e22e">text</span> } <span style="color:#f92672">=</span> <span style="color:#66d9ef">await</span> <span style="color:#a6e22e">generateText</span>({
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">model</span>: <span style="color:#66d9ef">anthropic</span>(<span style="color:#e6db74">&#39;claude-sonnet-4-6&#39;</span>),
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">prompt</span><span style="color:#f92672">:</span> <span style="color:#e6db74">`Using this research: </span><span style="color:#e6db74">${</span><span style="color:#a6e22e">research</span><span style="color:#e6db74">}</span><span style="color:#e6db74">\n\nWrite a 1000-word article about </span><span style="color:#e6db74">${</span><span style="color:#a6e22e">input</span>.<span style="color:#a6e22e">topic</span><span style="color:#e6db74">}</span><span style="color:#e6db74">.`</span>,

</span></span><span style="display:flex;"><span>    })
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> <span style="color:#a6e22e">text</span>
</span></span><span style="display:flex;"><span>  })
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">return</span> { <span style="color:#a6e22e">research</span>, <span style="color:#a6e22e">draft</span> }
</span></span><span style="display:flex;"><span>})
</span></span></code></pre></div><h2 id="production-deployment-and-scaling">Production Deployment and Scaling</h2>
<p>Deploying a Vercel AI SDK application to production requires careful attention to runtime selection, cost management, and observability. For runtime selection, Edge Functions are the right choice for streaming chat routes because they have lower cold-start latency and are globally distributed across 30+ regions — users in Tokyo get a fast response without routing to a US datacenter. Node.js runtime is better for heavy tool execution, large file processing, or anything requiring Node-specific APIs. Cost management starts with the <code>maxTokens</code> parameter to cap spending per request, and AI Gateway adds team-level spend limits and per-model cost tracking with dashboards. For rate limiting on API routes, <code>@vercel/kv</code> with a sliding window counter is the standard pattern: each user or IP gets N requests per minute, excess requests return 429 with a <code>retry-after</code> header. Observability is critical for catching silent model failures: the <code>onFinish</code> callback in <code>streamText</code> and <code>generateText</code> lets you log token usage, model name, latency, and finish reason to your analytics pipeline, enabling cost attribution per feature and alerting on abnormal token consumption. Vercel&rsquo;s built-in function logs surface AI SDK error events automatically for debugging.</p>
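<p>The route handler below assumes a <code>ratelimit</code> object exposing a <code>limit(key)</code> method. A minimal in-memory sliding-window limiter with that shape can be sketched as follows — in production the counters would live in shared storage such as Vercel KV or Upstash so every serverless instance sees the same window; the names here are illustrative.</p>

```typescript
// Minimal sliding-window rate limiter exposing the { success } contract
// assumed by the route handler. In-memory only, for illustration —
// real deployments need shared storage across instances.
type LimitResult = { success: boolean; remaining: number }

function createRateLimiter(maxRequests: number, windowMs: number) {
  const hits = new Map<string, number[]>() // key -> request timestamps

  return {
    limit(key: string, now: number = Date.now()): LimitResult {
      // Drop timestamps that have aged out of the window
      const recent = (hits.get(key) ?? []).filter(t => now - t < windowMs)
      if (recent.length >= maxRequests) {
        hits.set(key, recent)
        return { success: false, remaining: 0 }
      }
      recent.push(now)
      hits.set(key, recent)
      return { success: true, remaining: maxRequests - recent.length }
    },
  }
}

const ratelimit = createRateLimiter(10, 60_000) // 10 requests per minute per key
```

<p>Because <code>await</code> on a non-promise value simply resolves it, this synchronous <code>limit</code> stays call-compatible with <code>await ratelimit.limit(ip)</code> in the route handler.</p>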
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-typescript" data-lang="typescript"><span style="display:flex;"><span><span style="color:#66d9ef">export</span> <span style="color:#66d9ef">async</span> <span style="color:#66d9ef">function</span> <span style="color:#a6e22e">POST</span>(<span style="color:#a6e22e">req</span>: <span style="color:#66d9ef">Request</span>) {
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">const</span> { <span style="color:#a6e22e">messages</span> } <span style="color:#f92672">=</span> <span style="color:#66d9ef">await</span> <span style="color:#a6e22e">req</span>.<span style="color:#a6e22e">json</span>()
</span></span><span style="display:flex;"><span>  
</span></span><span style="display:flex;"><span>  <span style="color:#75715e">// Rate limiting check
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>  <span style="color:#66d9ef">const</span> <span style="color:#a6e22e">ip</span> <span style="color:#f92672">=</span> <span style="color:#a6e22e">req</span>.<span style="color:#a6e22e">headers</span>.<span style="color:#66d9ef">get</span>(<span style="color:#e6db74">&#39;x-forwarded-for&#39;</span>) <span style="color:#f92672">??</span> <span style="color:#e6db74">&#39;anonymous&#39;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">const</span> { <span style="color:#a6e22e">success</span> } <span style="color:#f92672">=</span> <span style="color:#66d9ef">await</span> <span style="color:#a6e22e">ratelimit</span>.<span style="color:#a6e22e">limit</span>(<span style="color:#a6e22e">ip</span>)
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">if</span> (<span style="color:#f92672">!</span><span style="color:#a6e22e">success</span>) <span style="color:#66d9ef">return</span> <span style="color:#66d9ef">new</span> <span style="color:#a6e22e">Response</span>(<span style="color:#e6db74">&#39;Rate limit exceeded&#39;</span>, { <span style="color:#a6e22e">status</span>: <span style="color:#66d9ef">429</span> })
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">const</span> <span style="color:#a6e22e">result</span> <span style="color:#f92672">=</span> <span style="color:#a6e22e">streamText</span>({
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">model</span>: <span style="color:#66d9ef">openai</span>(<span style="color:#e6db74">&#39;gpt-4o&#39;</span>),
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">messages</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">maxTokens</span>: <span style="color:#66d9ef">2000</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">temperature</span>: <span style="color:#66d9ef">0.7</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">onFinish</span><span style="color:#f92672">:</span> ({ <span style="color:#a6e22e">usage</span>, <span style="color:#a6e22e">finishReason</span> }) <span style="color:#f92672">=&gt;</span> {
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">analytics</span>.<span style="color:#a6e22e">track</span>(<span style="color:#e6db74">&#39;ai_completion&#39;</span>, {
</span></span><span style="display:flex;"><span>        <span style="color:#a6e22e">tokens</span>: <span style="color:#66d9ef">usage.totalTokens</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#a6e22e">finishReason</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#a6e22e">model</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#39;gpt-4o&#39;</span>,
</span></span><span style="display:flex;"><span>      })
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>  })
</span></span><span style="display:flex;"><span>  
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">return</span> <span style="color:#a6e22e">result</span>.<span style="color:#a6e22e">toDataStreamResponse</span>()
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><h2 id="ai-sdk-vs-langchain-vs-mastra-framework-comparison">AI SDK vs LangChain vs Mastra: Framework Comparison</h2>
<p>Choosing between Vercel AI SDK, LangChain.js, and Mastra in 2026 depends primarily on your stack, agent complexity, and how important streaming and bundle size are to your application. Vercel AI SDK is the right choice for TypeScript web developers building streaming-first applications — it is the lightest of the three (under 200KB), has the best Next.js and Edge Function integration, and provides the most seamless streaming API with minimal boilerplate. LangChain.js has the broadest ecosystem: pre-built chains, 50+ vector store integrations, document loaders, memory modules, and a large community cookbook — making it better for teams needing to quickly assemble complex RAG pipelines from components rather than writing integration code themselves. Mastra, which emerged in late 2025, sits between the two: TypeScript-native like AI SDK but with an opinionated agent framework including built-in memory, durable workflow primitives, and multi-agent coordination, targeting developers who need more structure than AI SDK provides without LangChain&rsquo;s abstraction overhead. The bundle size difference is meaningful for edge and browser deployments where LangChain.js&rsquo;s 2MB+ footprint can impact cold start times. For most Next.js applications in 2026, AI SDK is the practical default and LangChain or Mastra are reached for only when specific missing features justify the additional complexity.</p>
<table>
  <thead>
      <tr>
          <th>Feature</th>
          <th>Vercel AI SDK</th>
          <th>LangChain.js</th>
          <th>Mastra</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Bundle size</td>
          <td>~200KB</td>
          <td>~2MB+</td>
          <td>~500KB</td>
      </tr>
      <tr>
          <td>Streaming</td>
          <td>First-class</td>
          <td>Good</td>
          <td>Good</td>
      </tr>
      <tr>
          <td>Tool calling</td>
          <td>Native</td>
          <td>Via chains</td>
          <td>Native</td>
      </tr>
      <tr>
          <td>Structured output</td>
          <td>Zod-native</td>
          <td>Manual</td>
          <td>Zod-native</td>
      </tr>
      <tr>
          <td>Long-running agents</td>
          <td>Via Workflows</td>
          <td>Partial</td>
          <td>Built-in</td>
      </tr>
      <tr>
          <td>Next.js/Edge</td>
          <td>Excellent</td>
          <td>Moderate</td>
          <td>Good</td>
      </tr>
      <tr>
          <td>Pre-built integrations</td>
          <td>16+ providers</td>
          <td>50+</td>
          <td>20+</td>
      </tr>
      <tr>
          <td>TypeScript types</td>
          <td>Excellent</td>
          <td>Good</td>
          <td>Excellent</td>
      </tr>
      <tr>
          <td>Learning curve</td>
          <td>Low</td>
          <td>High</td>
          <td>Medium</td>
      </tr>
  </tbody>
</table>
<h2 id="faq">FAQ</h2>
<p><strong>What is the difference between AI SDK Core and AI SDK UI?</strong>
AI SDK Core (<code>generateText</code>, <code>streamText</code>, <code>generateObject</code>) runs on the server and handles model calls. AI SDK UI (<code>useChat</code>, <code>useCompletion</code>, <code>useObject</code>) runs on the client and manages stream state, message history, and UI updates. In a Next.js app, Core lives in <code>app/api/</code> routes and UI hooks live in client components. You can use Core without UI (for backend pipelines) but UI requires a Core-powered API endpoint.</p>
<p><strong>Can I use Vercel AI SDK without Vercel hosting?</strong>
Yes. The AI SDK is a pure npm package with no dependency on Vercel&rsquo;s infrastructure. You can use it in any Node.js server, AWS Lambda, Cloudflare Workers, or on-premise environment. Vercel-specific features like Workflows and AI Gateway require Vercel hosting, but AI SDK Core and UI work on any JavaScript runtime.</p>
<p><strong>How do I switch between AI providers in Vercel AI SDK?</strong>
Change one line: swap the model import and the model string. Replace <code>openai('gpt-4o')</code> with <code>anthropic('claude-sonnet-4-6')</code>. The rest of your code — messages, tools, streaming — stays identical. This is the main design goal: provider portability without refactoring business logic.</p>
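<p>A minimal sketch of that swap, using model names that appear elsewhere in this guide — the <code>USE_CLAUDE</code> environment flag is illustrative:</p>

```typescript
import { streamText } from 'ai'
import { openai } from '@ai-sdk/openai'
import { anthropic } from '@ai-sdk/anthropic'

// The only expression that changes between providers is `model`;
// messages, tools, and streaming code stay identical.
const model = process.env.USE_CLAUDE === '1'
  ? anthropic('claude-sonnet-4-6')
  : openai('gpt-4o')

const result = streamText({
  model,
  messages: [{ role: 'user', content: 'Hello' }],
})
```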
<p><strong>What is the recommended way to add memory to an AI SDK agent?</strong>
The SDK does not manage memory itself — you control <code>messages</code>. Store conversation history in a database (KV, Postgres, or Upstash), retrieve the last N turns before each request, and pass them as <code>messages</code>. For long-term memory across sessions, embed user facts and retrieve them via vector search, prepending relevant memories to the system prompt before each request.</p>
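<p>A sliding-window sketch of this pattern — the helper name and turn accounting are illustrative, not an AI SDK API:</p>

```typescript
// Keep the system prompt plus the last N user/assistant turns before
// each model call. One turn = one user message + one assistant reply.
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string }

function trimHistory(history: ChatMessage[], maxTurns: number): ChatMessage[] {
  const system = history.filter(m => m.role === 'system')
  const turns = history.filter(m => m.role !== 'system')
  return [...system, ...turns.slice(-maxTurns * 2)]
}
```

<p>Fetch the stored history, run it through a trim like this, and pass the result as <code>messages</code>; long-term memories retrieved via vector search would be prepended to the system prompt instead.</p>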
<p><strong>Does Vercel AI SDK support multi-modal inputs like images and PDFs?</strong>
Yes. Models that support vision (GPT-4o, Claude Opus 4, Gemini Pro Vision) accept <code>content</code> arrays with <code>{ type: 'image', image: url }</code> or <code>{ type: 'file', data: base64, mimeType: 'application/pdf' }</code> parts alongside text. AI SDK normalizes these into the provider&rsquo;s expected format automatically, so you write the same code regardless of which vision model you use.</p>
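<p>A sketch of the message shape — this only constructs the content array; sending it requires a vision-capable model, and the image URL is a placeholder:</p>

```typescript
// Multi-modal user message: a text part and an image part in one
// content array. AI SDK normalizes this shape into each provider's
// native format.
type TextPart = { type: 'text'; text: string }
type ImagePart = { type: 'image'; image: string } // URL or base64 data

const message: { role: 'user'; content: (TextPart | ImagePart)[] } = {
  role: 'user',
  content: [
    { type: 'text', text: 'What is shown in this photo?' },
    { type: 'image', image: 'https://example.com/photo.jpg' },
  ],
}
```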
]]></content:encoded></item><item><title>Mastra AI: The TypeScript AI Agent Framework for 2026</title><link>https://baeseokjae.github.io/posts/mastra-ai-typescript-framework-2026/</link><pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate><guid>https://baeseokjae.github.io/posts/mastra-ai-typescript-framework-2026/</guid><description>A practical guide to Mastra AI, the TypeScript-first framework for building production AI agents. Covers setup, agents, tools, MCP, workflows, RAG, evals, deployment, and a head-to-head comparison with LangGraph and CrewAI.</description><content:encoded><![CDATA[<h2 id="introduction-why-mastra-is-the-typescript-ai-framework-to-watch-in-2026">Introduction: Why Mastra Is the TypeScript AI Framework to Watch in 2026</h2>
<p>The AI agent ecosystem has a Python problem. Not with Python itself—it works fine—but with the fact that most agents ship as web services, and the teams building those services increasingly write TypeScript. Sam Bhagwat, CEO of Mastra, noted on Hacker News that 60–70% of YC X25 agent startups are building in TypeScript, not Python. The tooling hasn&rsquo;t caught up. LangChain, CrewAI, and AutoGen all originated in Python, leaving TypeScript developers either wrapping Python services or cobbling together their own agent infrastructure.</p>
<p>Mastra was built to close that gap.</p>
<h3 id="the-shift-from-python-to-typescript-for-ai-agents">The shift from Python to TypeScript for AI agents</h3>
<p>The shift is practical, not ideological. When your production stack runs on Node.js or edge runtimes, reaching for a Python framework introduces serialization overhead, deployment complexity, and a skills mismatch. TypeScript gives you shared types between your agent logic and your API layer, native streaming support for Server-Sent Events and WebSocket responses, and a single runtime for your entire backend. The ergonomics matter: you can define a tool&rsquo;s input schema with Zod, pass that schema directly to the LLM as a function definition, and validate the LLM&rsquo;s output against the same schema—no JSON Schema translation layer required.</p>
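<p>A sketch of that single-schema flow, assuming Zod is installed — the schema and helper names here are illustrative:</p>

```typescript
import { z } from 'zod'

// One Zod schema serves three purposes: the static TypeScript type,
// the function definition handed to the LLM, and runtime validation
// of whatever arguments the model actually produced.
const weatherInput = z.object({
  city: z.string().describe('City name, e.g. "Tokyo"'),
  unit: z.enum(['celsius', 'fahrenheit']).default('celsius'),
})

type WeatherInput = z.infer<typeof weatherInput> // shared static type

// Validate the model's output against the same schema before it
// reaches any real service
function parseToolArgs(raw: unknown): WeatherInput {
  return weatherInput.parse(raw) // throws on malformed arguments
}
```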
<h3 id="what-is-mastra">What is Mastra?</h3>
<p>Mastra is an open-source (Apache 2.0) TypeScript framework for building AI agents. It was created by the team behind Gatsby, the React static-site generator that peaked at 50k+ GitHub stars. That team shipped a framework before; they understand the ergonomics of developer tooling. Mastra provides structured primitives for agents, tools, workflows, RAG pipelines, evals, and observability—all expressed as TypeScript code, not YAML DSLs or visual editors that generate unreadable files.</p>
<p>The project has accumulated 23,200+ GitHub stars, 14,334 commits, and 1,079 branches as of April 2026. The velocity is real. Mastra raised a $22M Series A led by Spark Capital in early 2026, bringing total funding to $35M.</p>
<h3 id="enterprise-adoption">Enterprise adoption</h3>
<p>The customer list is worth examining because it signals production readiness, not just developer enthusiasm:</p>
<table>
  <thead>
      <tr>
          <th>Company</th>
          <th>Use Case</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Docker</td>
          <td>Event-driven PR management agents with MCP</td>
      </tr>
      <tr>
          <td>Brex</td>
          <td>Financial agents that helped drive the $5.1B Capital One acquisition</td>
      </tr>
      <tr>
          <td>Marsh McLennan</td>
          <td>Enterprise search agent used by 100k+ people daily</td>
      </tr>
      <tr>
          <td>Elastic</td>
          <td>Agentic RAG with Elasticsearch</td>
      </tr>
      <tr>
          <td>SoftBank</td>
          <td>Enterprise productivity at scale</td>
      </tr>
      <tr>
          <td>Replit</td>
          <td>Agent 3 built on Mastra primitives</td>
      </tr>
      <tr>
          <td>MongoDB, Workday, Salesforce, Plaid</td>
          <td>Various production agent deployments</td>
      </tr>
  </tbody>
</table>
<p>That&rsquo;s not a &ldquo;coming soon&rdquo; list. Marsh McLennan&rsquo;s agent is in daily production use by over 100,000 people. Brex&rsquo;s agents contributed to a multi-billion-dollar acquisition. These are load-bearing systems.</p>
<h2 id="getting-started-setting-up-your-first-mastra-project">Getting Started: Setting Up Your First Mastra Project</h2>
<h3 id="prerequisites-and-installation">Prerequisites and installation</h3>
<p>You need Node.js 18+ and an LLM API key (OpenAI, Anthropic, or Google). Create a new project:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>npm create mastra@latest
</span></span></code></pre></div><p>The scaffold prompts you for a project name, your preferred LLM provider, and whether you want the Mastra Studio dev UI included. After setup:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>cd my-mastra-app
</span></span><span style="display:flex;"><span>npm install
</span></span><span style="display:flex;"><span>npm run dev
</span></span></code></pre></div><p>The dev server starts on port 4111 by default and opens Mastra Studio.</p>
<h3 id="project-structure-overview">Project structure overview</h3>
<p>A scaffolded Mastra project looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-text" data-lang="text">my-mastra-app/
├── src/
│   └── mastra/
│       ├── agents/
│       │   └── index.ts      # Agent definitions
│       ├── tools/
│       │   └── index.ts      # Tool definitions
│       ├── workflows/
│       │   └── index.ts      # Workflow definitions
│       ├── rag/
│       │   └── index.ts      # RAG pipeline config
│       └── index.ts          # Mastra instance entry point
├── mastra.config.ts          # Framework configuration
├── package.json
└── tsconfig.json
</code></pre></div>
<p>All agent configuration lives in TypeScript files under <code>src/mastra/</code>. The framework discovers and registers agents, tools, and workflows based on exports from these files. No YAML, no code generation.</p>
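<p>As a rough sketch of what that registration looks like in the entry file — the agent name is hypothetical, so check the Mastra docs for the exact constructor options:</p>

```typescript
// src/mastra/index.ts — minimal entry point (illustrative).
import { Mastra } from "@mastra/core";
import { researchAgent } from "./agents"; // an Agent exported from agents/index.ts

// Everything the framework serves — agents, tools, workflows — is
// registered here as plain TypeScript values, not a YAML manifest.
export const mastra = new Mastra({
  agents: { researchAgent },
});
```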
<h3 id="mastra-studio-the-interactive-dev-ui">Mastra Studio: the interactive dev UI</h3>
<p>Mastra Studio runs locally at <code>http://localhost:4111</code> and provides:</p>
<ul>
<li><strong>Agent playground</strong>: chat with any defined agent, inspect tool calls, and trace token usage in real time</li>
<li><strong>Workflow visualizer</strong>: see step DAGs, run workflows step by step, and inspect intermediate state</li>
<li><strong>RAG testing</strong>: query your knowledge base and verify retrieval quality</li>
<li><strong>Eval runner</strong>: execute model-graded and rule-based evaluations against agent outputs</li>
<li><strong>Logs and traces</strong>: structured view of every LLM call, tool invocation, and workflow transition</li>
</ul>
<p>Studio is not required in production—it&rsquo;s a dev-time tool. But it replaces the ad-hoc <code>console.log</code>-driven debugging loop that most agent developers fall into.</p>
<h2 id="building-your-first-ai-agent-with-mastra">Building Your First AI Agent with Mastra</h2>
<h3 id="defining-an-agent-with-system-prompts-and-tools">Defining an agent with system prompts and tools</h3>
<p>An agent in Mastra is a typed object with a system prompt, a model reference, and a set of tools. Here&rsquo;s a minimal research agent:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-typescript" data-lang="typescript"><span style="display:flex;"><span><span style="color:#66d9ef">import</span> { <span style="color:#a6e22e">Agent</span> } <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#34;@mastra/core/agent&#34;</span>;
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">import</span> { <span style="color:#a6e22e">createTool</span> } <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#34;@mastra/core/tools&#34;</span>;
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">import</span> { <span style="color:#a6e22e">z</span> } <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#34;zod&#34;</span>;
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">import</span> { <span style="color:#a6e22e">openai</span> } <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#34;@ai-sdk/openai&#34;</span>;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">const</span> <span style="color:#a6e22e">searchTool</span> <span style="color:#f92672">=</span> <span style="color:#a6e22e">createTool</span>({
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">id</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#34;web-search&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">description</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#34;Search the web for information about a topic&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">inputSchema</span>: <span style="color:#66d9ef">z.object</span>({
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">query</span>: <span style="color:#66d9ef">z.string</span>().<span style="color:#a6e22e">describe</span>(<span style="color:#e6db74">&#34;The search query&#34;</span>),
</span></span><span style="display:flex;"><span>  }),
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">execute</span>: <span style="color:#66d9ef">async</span> ({ <span style="color:#a6e22e">context</span> }) <span style="color:#f92672">=&gt;</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">const</span> <span style="color:#a6e22e">results</span> <span style="color:#f92672">=</span> <span style="color:#66d9ef">await</span> <span style="color:#a6e22e">fetchSearchResults</span>(<span style="color:#a6e22e">context</span>.<span style="color:#a6e22e">query</span>);
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> { <span style="color:#a6e22e">results</span> };
</span></span><span style="display:flex;"><span>  },
</span></span><span style="display:flex;"><span>});
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">const</span> <span style="color:#a6e22e">researchAgent</span> <span style="color:#f92672">=</span> <span style="color:#66d9ef">new</span> <span style="color:#a6e22e">Agent</span>({
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">name</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#34;research-agent&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">instructions</span><span style="color:#f92672">:</span> <span style="color:#e6db74">`You are a research assistant. When asked about a topic:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">1. Search the web for relevant information
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">2. Synthesize the findings into a concise summary
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">3. Cite your sources
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">Always use the search tool before answering factual questions.`</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">model</span>: <span style="color:#66d9ef">openai</span>(<span style="color:#e6db74">&#34;gpt-4o&#34;</span>),
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">tools</span><span style="color:#f92672">:</span> { <span style="color:#a6e22e">searchTool</span> },
</span></span><span style="display:flex;"><span>});
</span></span></code></pre></div><p>Key design decisions here: tools use Zod schemas for both input validation and LLM function-calling definition. The <code>instructions</code> field replaces the informal system-prompt string with a structured prompt that Mastra can version, evaluate against, and refactor across deployments.</p>
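<p>To make the example concrete, here is a sketch of invoking this agent for a one-shot answer. This assumes an OpenAI API key is configured in the environment; the <code>toolCalls</code> field name follows the AI SDK result shape Mastra builds on, so treat it as illustrative:</p>

```typescript
// Usage sketch: ask the research agent a question (requires an API key).
const response = await researchAgent.generate(
  "What changed in the latest TypeScript release?"
);

console.log(response.text);      // final synthesized answer
console.log(response.toolCalls); // which tools the agent invoked along the way
```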
<h3 id="adding-memory">Adding memory</h3>
<p>Mastra supports two memory primitives: <strong>working memory</strong> and <strong>semantic recall</strong>.</p>
<p>Working memory is a short-term scratchpad that persists within a conversation thread. It stores structured state the agent can read and write:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-typescript" data-lang="typescript"><span style="display:flex;"><span><span style="color:#66d9ef">import</span> { <span style="color:#a6e22e">Memory</span> } <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#34;@mastra/memory&#34;</span>;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">const</span> <span style="color:#a6e22e">memory</span> <span style="color:#f92672">=</span> <span style="color:#66d9ef">new</span> <span style="color:#a6e22e">Memory</span>({
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">options</span><span style="color:#f92672">:</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">lastMessages</span>: <span style="color:#66d9ef">10</span>,        <span style="color:#75715e">// Include last 10 messages in context
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>    <span style="color:#a6e22e">workingMemory</span><span style="color:#f92672">:</span> {
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">enabled</span>: <span style="color:#66d9ef">true</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">template</span><span style="color:#f92672">:</span> <span style="color:#e6db74">`
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74"># User Profile
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">- Name: unknown
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">- Preferences: unknown
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">- Current Task: none
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      `</span>,
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">semanticRecall</span><span style="color:#f92672">:</span> {
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">topK</span>: <span style="color:#66d9ef">3</span>,               <span style="color:#75715e">// Recall 3 most relevant past messages
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>      <span style="color:#a6e22e">messageRange</span>: <span style="color:#66d9ef">2</span>,       <span style="color:#75715e">// Include 2 messages around each match
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>    },
</span></span><span style="display:flex;"><span>  },
</span></span><span style="display:flex;"><span>});
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">const</span> <span style="color:#a6e22e">contextualAgent</span> <span style="color:#f92672">=</span> <span style="color:#66d9ef">new</span> <span style="color:#a6e22e">Agent</span>({
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">name</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#34;contextual-agent&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">instructions</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#34;You are a helpful assistant that remembers user context.&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">model</span>: <span style="color:#66d9ef">openai</span>(<span style="color:#e6db74">&#34;gpt-4o&#34;</span>),
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">memory</span>,
</span></span><span style="display:flex;"><span>});
</span></span></code></pre></div><p>Semantic recall embeds past conversation turns and retrieves the top-K most relevant ones when a new message arrives. This means the agent can reference a preference mentioned 50 turns ago without loading the entire history into the context window. Working memory lets the agent maintain structured state—user profile, task progress, preferences—that persists across messages in the same thread.</p>
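<p>Memory only does something useful when calls are scoped to a conversation. A sketch of that scoping, assuming the thread/resource option names used by earlier Mastra versions (exact names vary across releases, so check your version&rsquo;s docs):</p>

```typescript
// Hypothetical sketch: memory is scoped by a thread (the conversation) and a
// resource (typically the user). Option names are assumptions, not verified API.
await contextualAgent.generate("My name is Dana and I prefer dark mode.", {
  threadId: "thread-001",
  resourceId: "user-42",
});

// Later in the same thread, working memory + semantic recall supply context:
const reply = await contextualAgent.generate("What theme do I prefer?", {
  threadId: "thread-001",
  resourceId: "user-42",
});
// The agent can recover "dark mode" without the full history in the prompt.
```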
<h3 id="connecting-llm-providers">Connecting LLM providers</h3>
<p>Mastra uses the Vercel AI SDK model interface, so any provider that implements that interface works:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-typescript" data-lang="typescript"><span style="display:flex;"><span><span style="color:#66d9ef">import</span> { <span style="color:#a6e22e">openai</span> } <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#34;@ai-sdk/openai&#34;</span>;
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">import</span> { <span style="color:#a6e22e">anthropic</span> } <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#34;@ai-sdk/anthropic&#34;</span>;
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">import</span> { <span style="color:#a6e22e">google</span> } <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#34;@ai-sdk/google&#34;</span>;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e">// Swap models by changing one line
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">const</span> <span style="color:#a6e22e">agent</span> <span style="color:#f92672">=</span> <span style="color:#66d9ef">new</span> <span style="color:#a6e22e">Agent</span>({
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">name</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#34;flexible-agent&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">instructions</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#34;You are a versatile assistant.&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">model</span>: <span style="color:#66d9ef">anthropic</span>(<span style="color:#e6db74">&#34;claude-sonnet-4-20250514&#34;</span>),
</span></span><span style="display:flex;"><span>  <span style="color:#75715e">// model: openai(&#34;gpt-4o&#34;),
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>  <span style="color:#75715e">// model: google(&#34;gemini-2.0-flash&#34;),
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>});
</span></span></code></pre></div><p>Being model-agnostic matters operationally: you can run evals across models, fall back from one provider to another, and choose cost-effective models per task without rewriting agent logic.</p>
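<p>The fallback pattern mentioned above can be sketched as a small helper. The helper and its name are illustrative, not part of Mastra&rsquo;s API — the point is that a shared model interface makes both branches interchangeable:</p>

```typescript
// Illustrative helper, not part of Mastra: try a primary model call and
// fall back to a secondary on failure. Both callbacks share one signature,
// which is what a provider-agnostic model interface buys you.
async function withFallback<T>(
  primary: () => Promise<T>,
  fallback: () => Promise<T>,
): Promise<T> {
  try {
    return await primary();
  } catch (err) {
    console.warn("Primary provider failed, falling back:", err);
    return await fallback();
  }
}

// Usage sketch (agents assumed to be defined elsewhere):
// const text = await withFallback(
//   () => claudeAgent.generate(prompt).then((r) => r.text),
//   () => gptAgent.generate(prompt).then((r) => r.text),
// );
```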
<h2 id="tools-and-mcp-connecting-your-agent-to-the-real-world">Tools and MCP: Connecting Your Agent to the Real World</h2>
<h3 id="built-in-tool-types-in-mastra">Built-in tool types in Mastra</h3>
<p>Mastra&rsquo;s <code>createTool</code> API is the foundational primitive. Every tool has an <code>id</code>, a <code>description</code> (used in the LLM&rsquo;s function-calling prompt), an <code>inputSchema</code> (Zod), and an <code>execute</code> function:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-typescript" data-lang="typescript"><span style="display:flex;"><span><span style="color:#66d9ef">import</span> { <span style="color:#a6e22e">createTool</span> } <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#34;@mastra/core/tools&#34;</span>;
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">import</span> { <span style="color:#a6e22e">z</span> } <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#34;zod&#34;</span>;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">const</span> <span style="color:#a6e22e">calculateTool</span> <span style="color:#f92672">=</span> <span style="color:#a6e22e">createTool</span>({
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">id</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#34;calculate&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">description</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#34;Evaluate a mathematical expression&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">inputSchema</span>: <span style="color:#66d9ef">z.object</span>({
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">expression</span>: <span style="color:#66d9ef">z.string</span>().<span style="color:#a6e22e">describe</span>(<span style="color:#e6db74">&#34;Math expression to evaluate, e.g. &#39;2 + 2&#39;&#34;</span>),
</span></span><span style="display:flex;"><span>  }),
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">outputSchema</span>: <span style="color:#66d9ef">z.object</span>({
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">result</span>: <span style="color:#66d9ef">z.number</span>(),
</span></span><span style="display:flex;"><span>  }),
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">execute</span>: <span style="color:#66d9ef">async</span> ({ <span style="color:#a6e22e">context</span> }) <span style="color:#f92672">=&gt;</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#75715e">// Safe evaluation — no eval()
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>    <span style="color:#66d9ef">const</span> <span style="color:#a6e22e">result</span> <span style="color:#f92672">=</span> <span style="color:#a6e22e">safeMathEval</span>(<span style="color:#a6e22e">context</span>.<span style="color:#a6e22e">expression</span>);
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> { <span style="color:#a6e22e">result</span> };
</span></span><span style="display:flex;"><span>  },
</span></span><span style="display:flex;"><span>});
</span></span></code></pre></div><p>The <code>outputSchema</code> is optional but recommended. When provided, Mastra validates the tool&rsquo;s output against it before returning the result to the agent. This catches malformed tool outputs early and prevents cascading errors.</p>
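<p>The <code>safeMathEval</code> helper above is left undefined. One minimal way to write it — purely illustrative, with no input validation — is a tiny recursive-descent evaluator over numbers, <code>+ - * /</code>, and parentheses:</p>

```typescript
// Illustrative sketch of safeMathEval: a recursive-descent evaluator.
// No eval(), no Function constructor; malformed input is not handled.
function safeMathEval(expression: string): number {
  const tokens = expression.match(/\d+(?:\.\d+)?|[+\-*/()]/g) ?? [];
  let pos = 0;

  const peek = () => tokens[pos];
  const next = () => tokens[pos++];

  // expr := term (('+' | '-') term)*
  function expr(): number {
    let value = term();
    while (peek() === "+" || peek() === "-") {
      value = next() === "+" ? value + term() : value - term();
    }
    return value;
  }

  // term := factor (('*' | '/') factor)*
  function term(): number {
    let value = factor();
    while (peek() === "*" || peek() === "/") {
      value = next() === "*" ? value * factor() : value / factor();
    }
    return value;
  }

  // factor := number | '(' expr ')' | '-' factor
  function factor(): number {
    const tok = next();
    if (tok === "(") {
      const value = expr();
      next(); // consume ')'
      return value;
    }
    if (tok === "-") return -factor();
    return parseFloat(tok!);
  }

  return expr();
}
```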
<h3 id="mcp-model-context-protocol-integration">MCP (Model Context Protocol) integration</h3>
<p>MCP is Anthropic&rsquo;s open protocol for connecting LLMs to external tools and data sources. Mastra implements both the client and server sides. As a client, Mastra can connect to any MCP server and expose its tools to agents:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-typescript" data-lang="typescript"><span style="display:flex;"><span><span style="color:#66d9ef">import</span> { <span style="color:#a6e22e">MCPClient</span> } <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#34;@mastra/mcp&#34;</span>;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">const</span> <span style="color:#a6e22e">mcp</span> <span style="color:#f92672">=</span> <span style="color:#66d9ef">new</span> <span style="color:#a6e22e">MCPClient</span>({
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">servers</span><span style="color:#f92672">:</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">github</span><span style="color:#f92672">:</span> {
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">command</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#34;npx&#34;</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">args</span><span style="color:#f92672">:</span> [<span style="color:#e6db74">&#34;-y&#34;</span>, <span style="color:#e6db74">&#34;@modelcontextprotocol/server-github&#34;</span>],
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">env</span><span style="color:#f92672">:</span> {
</span></span><span style="display:flex;"><span>        <span style="color:#a6e22e">GITHUB_PERSONAL_ACCESS_TOKEN</span>: <span style="color:#66d9ef">process.env.GITHUB_TOKEN</span><span style="color:#f92672">!</span>,
</span></span><span style="display:flex;"><span>      },
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>  },
</span></span><span style="display:flex;"><span>});
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e">// MCP tools are automatically available to agents
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">const</span> <span style="color:#a6e22e">tools</span> <span style="color:#f92672">=</span> <span style="color:#66d9ef">await</span> <span style="color:#a6e22e">mcp</span>.<span style="color:#a6e22e">tools</span>();
</span></span></code></pre></div><p>This is how Docker connected their agents to GitHub. The GitHub MCP server provides tools for listing PRs, reading diffs, posting comments, and managing labels—all without writing custom API integration code.</p>
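<p>Wiring those tools into an agent is then a single field. A sketch, reusing the <code>mcp</code> client from above and assuming the toolset returned by <code>mcp.tools()</code> can be passed directly:</p>

```typescript
// Sketch: hand the MCP-provided tools to an agent. Assumes `mcp` is the
// MCPClient instance defined above and `openai` is imported as earlier.
const githubAgent = new Agent({
  name: "github-agent",
  instructions: "You help triage GitHub issues and pull requests.",
  model: openai("gpt-4o"),
  tools: { ...(await mcp.tools()) },
});
```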
<h3 id="real-example-github-mcp-server-for-pr-automation">Real example: GitHub MCP server for PR automation</h3>
<p>Docker&rsquo;s architecture is instructive. They built three sub-agents, each with a narrow responsibility:</p>
<ol>
<li><strong>Analyze PR agent</strong>: Reads the PR diff and generates a structured analysis</li>
<li><strong>Generate comment agent</strong>: Takes the analysis and writes a review comment</li>
<li><strong>Post and close agent</strong>: Posts the comment and manages PR labels</li>
</ol>
<p>These agents are orchestrated by a Mastra workflow that triggers on a GitHub webhook event. The key insight: rather than one monolithic agent trying to do everything, each agent has a focused system prompt and minimal tool access. This reduces error rates and makes the system auditable—if the posted comment is wrong, you check agent 2, not the entire pipeline.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-typescript" data-lang="typescript"><span style="display:flex;"><span><span style="color:#75715e">// Simplified Docker-style PR automation workflow trigger
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#a6e22e">app</span>.<span style="color:#a6e22e">post</span>(<span style="color:#e6db74">&#34;/webhook/github&#34;</span>, <span style="color:#66d9ef">async</span> (<span style="color:#a6e22e">req</span>, <span style="color:#a6e22e">res</span>) <span style="color:#f92672">=&gt;</span> {
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">const</span> { <span style="color:#a6e22e">action</span>, <span style="color:#a6e22e">pull_request</span> } <span style="color:#f92672">=</span> <span style="color:#a6e22e">req</span>.<span style="color:#a6e22e">body</span>;
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">if</span> (<span style="color:#a6e22e">action</span> <span style="color:#f92672">===</span> <span style="color:#e6db74">&#34;opened&#34;</span> <span style="color:#f92672">||</span> <span style="color:#a6e22e">action</span> <span style="color:#f92672">===</span> <span style="color:#e6db74">&#34;synchronize&#34;</span>) {
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">await</span> <span style="color:#a6e22e">prReviewWorkflow</span>.<span style="color:#a6e22e">run</span>({
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">prNumber</span>: <span style="color:#66d9ef">pull_request.number</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">repo</span>: <span style="color:#66d9ef">pull_request.base.repo.full_name</span>,
</span></span><span style="display:flex;"><span>    });
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">res</span>.<span style="color:#a6e22e">status</span>(<span style="color:#ae81ff">200</span>).<span style="color:#a6e22e">send</span>(<span style="color:#e6db74">&#34;ok&#34;</span>);
</span></span><span style="display:flex;"><span>});
</span></span></code></pre></div><h2 id="workflows-orchestrating-complex-agent-tasks">Workflows: Orchestrating Complex Agent Tasks</h2>
<h3 id="when-to-use-workflows-vs-agents">When to use workflows vs agents</h3>
<p>An agent with tools handles simple request-response patterns well. But when your task involves multiple sequential steps, conditional branching, parallel execution, or retry logic, you need a workflow. The distinction:</p>
<ul>
<li><strong>Agent</strong>: LLM decides what to do next based on context. Good for open-ended reasoning.</li>
<li><strong>Workflow</strong>: You define the control flow. Good for deterministic multi-step processes.</li>
</ul>
<p>If you can draw a flowchart for your process, use a workflow. If the process requires the LLM to decide its own path, use an agent. Many production systems combine both: a workflow orchestrates high-level steps, and agents handle LLM-driven reasoning within individual steps.</p>
<h3 id="building-step-based-workflows">Building step-based workflows</h3>
<p>Mastra workflows are defined as a series of steps, each with an input schema, an output schema, and an execute function:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-typescript" data-lang="typescript"><span style="display:flex;"><span><span style="color:#66d9ef">import</span> { <span style="color:#a6e22e">Workflow</span>, <span style="color:#a6e22e">Step</span> } <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#34;@mastra/core/workflows&#34;</span>;
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">import</span> { <span style="color:#a6e22e">z</span> } <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#34;zod&#34;</span>;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">const</span> <span style="color:#a6e22e">fetchContentStep</span> <span style="color:#f92672">=</span> <span style="color:#66d9ef">new</span> <span style="color:#a6e22e">Step</span>({
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">id</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#34;fetch-content&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">inputSchema</span>: <span style="color:#66d9ef">z.object</span>({ <span style="color:#a6e22e">url</span>: <span style="color:#66d9ef">z.string</span>() }),
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">outputSchema</span>: <span style="color:#66d9ef">z.object</span>({ <span style="color:#a6e22e">content</span>: <span style="color:#66d9ef">z.string</span>(), <span style="color:#a6e22e">title</span>: <span style="color:#66d9ef">z.string</span>() }),
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">execute</span>: <span style="color:#66d9ef">async</span> ({ <span style="color:#a6e22e">context</span> }) <span style="color:#f92672">=&gt;</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">const</span> <span style="color:#a6e22e">page</span> <span style="color:#f92672">=</span> <span style="color:#66d9ef">await</span> <span style="color:#a6e22e">fetch</span>(<span style="color:#a6e22e">context</span>.<span style="color:#a6e22e">url</span>).<span style="color:#a6e22e">then</span>((<span style="color:#a6e22e">r</span>) <span style="color:#f92672">=&gt;</span> <span style="color:#a6e22e">r</span>.<span style="color:#a6e22e">text</span>());
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">const</span> <span style="color:#a6e22e">title</span> <span style="color:#f92672">=</span> <span style="color:#a6e22e">extractTitle</span>(<span style="color:#a6e22e">page</span>);
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> { <span style="color:#a6e22e">content</span>: <span style="color:#66d9ef">page</span>, <span style="color:#a6e22e">title</span> };
</span></span><span style="display:flex;"><span>  },
</span></span><span style="display:flex;"><span>});
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">const</span> <span style="color:#a6e22e">summarizeStep</span> <span style="color:#f92672">=</span> <span style="color:#66d9ef">new</span> <span style="color:#a6e22e">Step</span>({
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">id</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#34;summarize&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">inputSchema</span>: <span style="color:#66d9ef">z.object</span>({ <span style="color:#a6e22e">content</span>: <span style="color:#66d9ef">z.string</span>() }),
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">outputSchema</span>: <span style="color:#66d9ef">z.object</span>({ <span style="color:#a6e22e">summary</span>: <span style="color:#66d9ef">z.string</span>() }),
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">execute</span>: <span style="color:#66d9ef">async</span> ({ <span style="color:#a6e22e">context</span>, <span style="color:#a6e22e">mastra</span> }) <span style="color:#f92672">=&gt;</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">const</span> <span style="color:#a6e22e">agent</span> <span style="color:#f92672">=</span> <span style="color:#a6e22e">mastra</span>.<span style="color:#a6e22e">getAgent</span>(<span style="color:#e6db74">&#34;research-agent&#34;</span>)<span style="color:#f92672">!</span>;
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">const</span> <span style="color:#a6e22e">result</span> <span style="color:#f92672">=</span> <span style="color:#66d9ef">await</span> <span style="color:#a6e22e">agent</span>.<span style="color:#a6e22e">generate</span>(
</span></span><span style="display:flex;"><span>      <span style="color:#e6db74">`Summarize this content concisely:\n\n</span><span style="color:#e6db74">${</span><span style="color:#a6e22e">context</span>.<span style="color:#a6e22e">content</span><span style="color:#e6db74">}</span><span style="color:#e6db74">`</span>
</span></span><span style="display:flex;"><span>    );
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> { <span style="color:#a6e22e">summary</span>: <span style="color:#66d9ef">result.text</span> };
</span></span><span style="display:flex;"><span>  },
</span></span><span style="display:flex;"><span>});
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">const</span> <span style="color:#a6e22e">contentWorkflow</span> <span style="color:#f92672">=</span> <span style="color:#66d9ef">new</span> <span style="color:#a6e22e">Workflow</span>({
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">name</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#34;content-summarizer&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">triggerSchema</span>: <span style="color:#66d9ef">z.object</span>({ <span style="color:#a6e22e">url</span>: <span style="color:#66d9ef">z.string</span>() }),
</span></span><span style="display:flex;"><span>});
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">contentWorkflow</span>
</span></span><span style="display:flex;"><span>  .<span style="color:#a6e22e">step</span>(<span style="color:#a6e22e">fetchContentStep</span>)
</span></span><span style="display:flex;"><span>  .<span style="color:#a6e22e">then</span>(<span style="color:#a6e22e">summarizeStep</span>)
</span></span><span style="display:flex;"><span>  .<span style="color:#a6e22e">commit</span>();
</span></span></code></pre></div><p>Steps can reference the Mastra instance to use agents, tools, or other workflows. The <code>context</code> object carries the output of the previous step(s).</p>
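<p>For example, a step&rsquo;s <code>execute</code> function can pull a registered agent off the Mastra instance and read the prior step&rsquo;s output from <code>context</code>. The sketch below assumes a <code>Step</code> constructor and an <code>execute({ context, mastra })</code> signature consistent with the steps above; treat the exact shape as illustrative:</p>

```typescript
// Illustrative sketch -- the Step constructor and execute signature are assumed.
const reviewStep = new Step({
  id: "review",
  execute: async ({ context, mastra }) => {
    // Reuse an agent registered on the Mastra instance.
    const reviewer = mastra.getAgent("researchAgent");
    // context carries the previous step's output (here, the summary).
    const result = await reviewer.generate(
      `Review this summary for accuracy:\n\n${context.summary}`
    );
    return { review: result.text };
  },
});
```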
<h3 id="parallel-execution-conditional-branching-and-nesting">Parallel execution, conditional branching, and nesting</h3>
<p>Workflows support <code>branch</code> for conditional paths and <code>parallel</code> for concurrent execution:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-typescript" data-lang="typescript"><span style="display:flex;"><span><span style="color:#a6e22e">contentWorkflow</span>
</span></span><span style="display:flex;"><span>  .<span style="color:#a6e22e">step</span>(<span style="color:#a6e22e">fetchContentStep</span>)
</span></span><span style="display:flex;"><span>  .<span style="color:#a6e22e">branch</span>([
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">condition</span><span style="color:#f92672">:</span> ({ <span style="color:#a6e22e">context</span> }) <span style="color:#f92672">=&gt;</span> <span style="color:#a6e22e">context</span>.<span style="color:#a6e22e">content</span>.<span style="color:#a6e22e">length</span> <span style="color:#f92672">&gt;</span> <span style="color:#ae81ff">5000</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">then</span>: <span style="color:#66d9ef">longContentStep</span>,
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>    {
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">condition</span><span style="color:#f92672">:</span> ({ <span style="color:#a6e22e">context</span> }) <span style="color:#f92672">=&gt;</span> <span style="color:#a6e22e">context</span>.<span style="color:#a6e22e">content</span>.<span style="color:#a6e22e">length</span> <span style="color:#f92672">&lt;=</span> <span style="color:#ae81ff">5000</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">then</span>: <span style="color:#66d9ef">shortContentStep</span>,
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>  ])
</span></span><span style="display:flex;"><span>  .<span style="color:#a6e22e">then</span>(<span style="color:#a6e22e">summarizeStep</span>)
</span></span><span style="display:flex;"><span>  .<span style="color:#a6e22e">commit</span>();
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e">// Parallel execution
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#a6e22e">analysisWorkflow</span>
</span></span><span style="display:flex;"><span>  .<span style="color:#a6e22e">parallel</span>([<span style="color:#a6e22e">sentimentStep</span>, <span style="color:#a6e22e">entityStep</span>, <span style="color:#a6e22e">keywordStep</span>])
</span></span><span style="display:flex;"><span>  .<span style="color:#a6e22e">then</span>(<span style="color:#a6e22e">mergeResultsStep</span>)
</span></span><span style="display:flex;"><span>  .<span style="color:#a6e22e">commit</span>();
</span></span></code></pre></div><p>Workflows can also nest: a step can invoke another workflow as a sub-routine. This lets you compose complex processes from smaller, testable workflow units. Each nested workflow maintains its own state and can be run and debugged independently.</p>
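<p>A nested workflow call looks like an ordinary step. The sketch below assumes a <code>mastra.getWorkflow</code> lookup and a <code>run({ triggerData })</code> method; the exact run API is an assumption:</p>

```typescript
// Illustrative sketch -- sub-workflow invocation from inside a step.
const analysisStep = new Step({
  id: "run-analysis",
  execute: async ({ context, mastra }) => {
    const analysis = mastra.getWorkflow("analysisWorkflow");
    // The nested workflow keeps its own state and can be run independently.
    const run = await analysis.run({ triggerData: { text: context.content } });
    return { analysis: run.results };
  },
});

contentWorkflow
  .step(fetchContentStep)
  .then(analysisStep)
  .then(summarizeStep)
  .commit();
```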
<h2 id="rag-with-mastra-giving-your-agent-knowledge">RAG with Mastra: Giving Your Agent Knowledge</h2>
<h3 id="embedding-and-vector-search-support">Embedding and vector search support</h3>
<p>Mastra includes built-in embedding and vector search through its RAG module. You define an embedder and a vector store, then index documents:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-typescript" data-lang="typescript"><span style="display:flex;"><span><span style="color:#66d9ef">import</span> { <span style="color:#a6e22e">MastraRAG</span> } <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#34;@mastra/rag&#34;</span>;
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">import</span> { <span style="color:#a6e22e">openai</span> } <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#34;@ai-sdk/openai&#34;</span>;
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">import</span> { <span style="color:#a6e22e">PineconeVector</span> } <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#34;@mastra/pinecone&#34;</span>;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">const</span> <span style="color:#a6e22e">rag</span> <span style="color:#f92672">=</span> <span style="color:#66d9ef">new</span> <span style="color:#a6e22e">MastraRAG</span>({
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">embedder</span>: <span style="color:#66d9ef">openai.embedding</span>(<span style="color:#e6db74">&#34;text-embedding-3-small&#34;</span>),
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">vectorStore</span>: <span style="color:#66d9ef">new</span> <span style="color:#a6e22e">PineconeVector</span>({
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">indexName</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#34;knowledge-base&#34;</span>,
</span></span><span style="display:flex;"><span>  }),
</span></span><span style="display:flex;"><span>});
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e">// Index documents
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">await</span> <span style="color:#a6e22e">rag</span>.<span style="color:#a6e22e">index</span>({
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">docs</span><span style="color:#f92672">:</span> [
</span></span><span style="display:flex;"><span>    { <span style="color:#a6e22e">id</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#34;doc-1&#34;</span>, <span style="color:#a6e22e">text</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#34;Mastra supports multiple LLM providers...&#34;</span> },
</span></span><span style="display:flex;"><span>    { <span style="color:#a6e22e">id</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#34;doc-2&#34;</span>, <span style="color:#a6e22e">text</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#34;Workflows enable conditional branching...&#34;</span> },
</span></span><span style="display:flex;"><span>  ],
</span></span><span style="display:flex;"><span>});
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e">// Query
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">const</span> <span style="color:#a6e22e">results</span> <span style="color:#f92672">=</span> <span style="color:#66d9ef">await</span> <span style="color:#a6e22e">rag</span>.<span style="color:#a6e22e">retrieve</span>({
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">query</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#34;How do I branch in a workflow?&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">topK</span>: <span style="color:#66d9ef">5</span>,
</span></span><span style="display:flex;"><span>});
</span></span></code></pre></div><p>Mastra handles chunking, embedding, and storage. You control the chunk size, overlap, and embedding model.</p>
<h3 id="mastra-built-in-rag-capabilities">Mastra built-in RAG capabilities</h3>
<p>Beyond basic retrieval, Mastra provides:</p>
<ul>
<li><strong>Query transformation</strong>: Automatically rewrites user queries for better retrieval</li>
<li><strong>Hybrid search</strong>: Combines vector similarity with keyword matching for improved recall</li>
<li><strong>Re-ranking</strong>: Applies a second-pass relevance model to filter and reorder results</li>
</ul>
<p>These are not external integrations—they ship with the framework and are configurable via the RAG constructor options.</p>
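<p>A hedged sketch of what that configuration might look like on the constructor from the previous section. The option names (<code>chunking</code>, <code>queryTransform</code>, <code>hybridSearch</code>, <code>rerank</code>) are illustrative assumptions, so check the Mastra RAG reference for the exact fields:</p>

```typescript
import { MastraRAG } from "@mastra/rag";
import { openai } from "@ai-sdk/openai";
import { PineconeVector } from "@mastra/pinecone";

// Illustrative sketch -- option names are assumptions, not confirmed API.
const rag = new MastraRAG({
  embedder: openai.embedding("text-embedding-3-small"),
  vectorStore: new PineconeVector({ indexName: "knowledge-base" }),
  chunking: { chunkSize: 512, chunkOverlap: 64 },    // tokens per chunk / shared tokens
  queryTransform: true,                              // rewrite queries before retrieval
  hybridSearch: { keywordWeight: 0.3 },              // blend vector + keyword scores
  rerank: { model: openai("gpt-4o-mini"), topK: 5 }, // second-pass reorder
});
```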
<h3 id="integration-with-elasticsearch-and-other-vector-stores">Integration with Elasticsearch and other vector stores</h3>
<p>Elastic published a detailed walkthrough of building agentic RAG with Mastra and Elasticsearch. The pattern:</p>
<ol>
<li>Index documents into Elasticsearch with dense vector fields</li>
<li>Use Mastra&rsquo;s RAG module with the Elasticsearch vector store adapter</li>
<li>Define an agent that retrieves context and generates answers</li>
</ol>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-typescript" data-lang="typescript"><span style="display:flex;"><span><span style="color:#66d9ef">import</span> { <span style="color:#a6e22e">MastraRAG</span> } <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#34;@mastra/rag&#34;</span>;
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">import</span> { <span style="color:#a6e22e">ElasticsearchVector</span> } <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#34;@mastra/elasticsearch&#34;</span>;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">const</span> <span style="color:#a6e22e">rag</span> <span style="color:#f92672">=</span> <span style="color:#66d9ef">new</span> <span style="color:#a6e22e">MastraRAG</span>({
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">embedder</span>: <span style="color:#66d9ef">openai.embedding</span>(<span style="color:#e6db74">&#34;text-embedding-3-small&#34;</span>),
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">vectorStore</span>: <span style="color:#66d9ef">new</span> <span style="color:#a6e22e">ElasticsearchVector</span>({
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">indexName</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#34;docs-index&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">client</span>: <span style="color:#66d9ef">esClient</span>,
</span></span><span style="display:flex;"><span>  }),
</span></span><span style="display:flex;"><span>});
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">const</span> <span style="color:#a6e22e">ragAgent</span> <span style="color:#f92672">=</span> <span style="color:#66d9ef">new</span> <span style="color:#a6e22e">Agent</span>({
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">name</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#34;rag-agent&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">instructions</span><span style="color:#f92672">:</span> <span style="color:#e6db74">`Answer questions based on the retrieved context.
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">If the context doesn&#39;t contain enough information, say so.
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">Always cite the source document.`</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">model</span>: <span style="color:#66d9ef">openai</span>(<span style="color:#e6db74">&#34;gpt-4o&#34;</span>),
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">tools</span><span style="color:#f92672">:</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">retrieve</span>: <span style="color:#66d9ef">rag.asTool</span>(),
</span></span><span style="display:flex;"><span>  },
</span></span><span style="display:flex;"><span>});
</span></span></code></pre></div><p>The <code>rag.asTool()</code> method wraps the RAG pipeline as a Mastra tool, making it available to any agent. The Elastic integration demonstrates that Mastra&rsquo;s RAG layer is vendor-agnostic—you can swap Pinecone for Elasticsearch for pgvector without changing agent code.</p>
<h2 id="productionizing-evals-observability-and-guardrails">Productionizing: Evals, Observability, and Guardrails</h2>
<h3 id="running-model-graded-and-rule-based-evals">Running model-graded and rule-based evals</h3>
<p>Mastra includes an evaluation framework that runs LLM outputs against criteria. Two eval types:</p>
<ul>
<li><strong>Model-graded</strong>: An LLM judges the output against a rubric. Useful for open-ended quality assessment.</li>
<li><strong>Rule-based</strong>: Deterministic checks on output structure, content, or behavior. Useful for guardrails.</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-typescript" data-lang="typescript"><span style="display:flex;"><span><span style="color:#66d9ef">import</span> { <span style="color:#a6e22e">Eval</span> } <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#34;@mastra/evals&#34;</span>;
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">import</span> { <span style="color:#a6e22e">z</span> } <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#34;zod&#34;</span>;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">const</span> <span style="color:#a6e22e">relevanceEval</span> <span style="color:#f92672">=</span> <span style="color:#66d9ef">new</span> <span style="color:#a6e22e">Eval</span>({
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">name</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#34;relevance&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">type</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#34;model-graded&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">prompt</span><span style="color:#f92672">:</span> <span style="color:#e6db74">`Rate the relevance of the answer to the question on a scale of 1-5.
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">Question: {{input}}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">Answer: {{output}}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">Respond with a number only.`</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">model</span>: <span style="color:#66d9ef">openai</span>(<span style="color:#e6db74">&#34;gpt-4o-mini&#34;</span>),
</span></span><span style="display:flex;"><span>});
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">const</span> <span style="color:#a6e22e">noPIIEval</span> <span style="color:#f92672">=</span> <span style="color:#66d9ef">new</span> <span style="color:#a6e22e">Eval</span>({
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">name</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#34;no-pii&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">type</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#34;rule-based&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">check</span><span style="color:#f92672">:</span> ({ <span style="color:#a6e22e">output</span> }) <span style="color:#f92672">=&gt;</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">const</span> <span style="color:#a6e22e">piiPatterns</span> <span style="color:#f92672">=</span> [<span style="color:#e6db74">/\b\d{3}-\d{2}-\d{4}\b/</span>, <span style="color:#e6db74">/\b[A-Z]\d{8}\b/</span>];
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> <span style="color:#f92672">!</span><span style="color:#a6e22e">piiPatterns</span>.<span style="color:#a6e22e">some</span>((<span style="color:#a6e22e">p</span>) <span style="color:#f92672">=&gt;</span> <span style="color:#a6e22e">p</span>.<span style="color:#a6e22e">test</span>(<span style="color:#a6e22e">output</span>));
</span></span><span style="display:flex;"><span>  },
</span></span><span style="display:flex;"><span>});
</span></span></code></pre></div><p>You run evals against datasets in Mastra Studio or programmatically. The results give you a quantifiable quality signal before deploying changes.</p>
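<p>The rule-based <code>check</code> above is plain TypeScript, so you can unit-test it without any Mastra runtime. A minimal standalone version (the <code>checkNoPII</code> helper is extracted here for illustration):</p>

```typescript
// Standalone version of the rule-based PII check from the eval above.
// Returns true when the output contains none of the matched PII patterns.
function checkNoPII(output: string): boolean {
  const piiPatterns = [/\b\d{3}-\d{2}-\d{4}\b/, /\b[A-Z]\d{8}\b/];
  return !piiPatterns.some((p) => p.test(output));
}

console.log(checkNoPII("Your order shipped today."));  // true
console.log(checkNoPII("SSN on file: 123-45-6789."));  // false
```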
<h3 id="tracing-agent-calls-and-token-usage">Tracing agent calls and token usage</h3>
<p>Every LLM call, tool invocation, and workflow step is automatically traced. In Mastra Studio, you see:</p>
<ul>
<li>Total latency per request</li>
<li>Token usage breakdown (prompt vs. completion)</li>
<li>Tool call sequences and their outputs</li>
<li>Memory retrieval operations and their relevance scores</li>
</ul>
<p>For production monitoring, Mastra integrates with OpenTelemetry. You can export traces to any OTel-compatible backend (Datadog, Grafana, Honeycomb, etc.):</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-typescript" data-lang="typescript"><span style="display:flex;"><span><span style="color:#75715e">// mastra.config.ts
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">export</span> <span style="color:#66d9ef">default</span> <span style="color:#a6e22e">defineConfig</span>({
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">observability</span><span style="color:#f92672">:</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">otel</span><span style="color:#f92672">:</span> {
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">enabled</span>: <span style="color:#66d9ef">true</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">serviceName</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#34;my-mastra-app&#34;</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">traceExporter</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#34;otlp&#34;</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">endpoint</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#34;http://localhost:4318/v1/traces&#34;</span>,
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>  },
</span></span><span style="display:flex;"><span>});
</span></span></code></pre></div><p>This is a meaningful differentiator. Most agent frameworks leave observability as an exercise for the developer. Mastra wires it into every primitive.</p>
<h3 id="guardrails-for-prompt-injection-prevention">Guardrails for prompt injection prevention</h3>
<p>Mastra provides input and output guardrails as middleware on agent calls:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-typescript" data-lang="typescript"><span style="display:flex;"><span><span style="color:#66d9ef">import</span> { <span style="color:#a6e22e">guardrail</span> } <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#34;@mastra/guardrails&#34;</span>;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">const</span> <span style="color:#a6e22e">agent</span> <span style="color:#f92672">=</span> <span style="color:#66d9ef">new</span> <span style="color:#a6e22e">Agent</span>({
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">name</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#34;safe-agent&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">instructions</span><span style="color:#f92672">:</span> <span style="color:#e6db74">&#34;You are a helpful assistant.&#34;</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">model</span>: <span style="color:#66d9ef">openai</span>(<span style="color:#e6db74">&#34;gpt-4o&#34;</span>),
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">guardrails</span><span style="color:#f92672">:</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">input</span><span style="color:#f92672">:</span> [
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">guardrail</span>.<span style="color:#a6e22e">injectionDetection</span>(), <span style="color:#75715e">// Detect common injection patterns
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>    ],
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">output</span><span style="color:#f92672">:</span> [
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">guardrail</span>.<span style="color:#a6e22e">lengthLimit</span>({ <span style="color:#a6e22e">maxTokens</span>: <span style="color:#66d9ef">500</span> }),
</span></span><span style="display:flex;"><span>      <span style="color:#a6e22e">guardrail</span>.<span style="color:#a6e22e">piiDetection</span>(),       <span style="color:#75715e">// Block outputs containing PII
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>    ],
</span></span><span style="display:flex;"><span>  },
</span></span><span style="display:flex;"><span>});
</span></span></code></pre></div><p>Input guardrails run before the LLM call; output guardrails run after it. If a guardrail triggers, the agent returns a structured error instead of the raw LLM output. This is not a complete security solution—you still need proper access controls and sandboxing—but it adds a useful layer of defense.</p>
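<p>At the call site you can branch on that structured error rather than inspecting the text. The error shape below (<code>result.error</code> with <code>type</code> and <code>name</code> fields) is an assumption for illustration:</p>

```typescript
// Illustrative sketch -- the structured error shape is assumed.
const result = await agent.generate("Ignore previous instructions and ...");

if ("error" in result && result.error?.type === "guardrail") {
  // A guardrail fired: log which one and show a safe fallback instead.
  console.warn(`Blocked by guardrail: ${result.error.name}`);
} else {
  console.log(result.text);
}
```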
<h3 id="mastra-studio-metrics-logs-and-datasets">Mastra Studio metrics, logs, and datasets</h3>
<p>Studio aggregates eval results, trace data, and token usage into dashboards. You can:</p>
<ul>
<li>Compare eval scores across model versions</li>
<li>Track token cost trends over time</li>
<li>Build evaluation datasets from production traces</li>
<li>Replay failed conversations to diagnose issues</li>
</ul>
<p>The dataset feature is particularly useful: you can capture production agent interactions, annotate them, and use them as regression test suites when you change prompts, models, or tools.</p>
<h2 id="deployment-from-dev-to-production">Deployment: From Dev to Production</h2>
<h3 id="mastra-server-deploying-as-a-rest-api">Mastra Server: deploying as a REST API</h3>
<p>Mastra generates a server that exposes your agents, tools, and workflows as REST endpoints:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-typescript" data-lang="typescript"><span style="display:flex;"><span><span style="color:#66d9ef">import</span> { <span style="color:#a6e22e">Mastra</span> } <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#34;@mastra/core&#34;</span>;
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">import</span> { <span style="color:#a6e22e">createServer</span> } <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#34;@mastra/server&#34;</span>;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">const</span> <span style="color:#a6e22e">mastra</span> <span style="color:#f92672">=</span> <span style="color:#66d9ef">new</span> <span style="color:#a6e22e">Mastra</span>({
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">agents</span><span style="color:#f92672">:</span> { <span style="color:#a6e22e">researchAgent</span>, <span style="color:#a6e22e">contextualAgent</span> },
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">workflows</span><span style="color:#f92672">:</span> { <span style="color:#a6e22e">contentWorkflow</span> },
</span></span><span style="display:flex;"><span>});
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">const</span> <span style="color:#a6e22e">server</span> <span style="color:#f92672">=</span> <span style="color:#a6e22e">createServer</span>(<span style="color:#a6e22e">mastra</span>);
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">server</span>.<span style="color:#a6e22e">listen</span>(<span style="color:#ae81ff">3000</span>);
</span></span></code></pre></div><p>This gives you REST endpoints like:</p>
<ul>
<li><code>POST /api/agents/researchAgent/generate</code> — single-turn generation</li>
<li><code>POST /api/agents/researchAgent/stream</code> — streaming generation</li>
<li><code>POST /api/workflows/contentWorkflow/run</code> — trigger a workflow</li>
<li><code>GET /api/workflows/contentWorkflow/runs/{runId}</code> — check workflow status</li>
</ul>
<p>The API is auto-generated from your Mastra instance definition. No manual route wiring.</p>
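<p>Calling the generated endpoints is plain HTTP. A sketch using <code>fetch</code>, with the request and response shapes inferred from the agent examples above (the <code>{ message }</code> body and <code>{ text }</code> response are assumptions):</p>

```typescript
// Single-turn generation against the auto-generated REST API.
const res = await fetch(
  "http://localhost:3000/api/agents/researchAgent/generate",
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message: "Summarize the latest run." }),
  }
);
const { text } = await res.json();
console.log(text);
```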
<h3 id="framework-integration">Framework integration</h3>
<p>Mastra integrates with popular Node.js frameworks as middleware:</p>
<p><strong>Next.js (App Router)</strong>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-typescript" data-lang="typescript"><span style="display:flex;"><span><span style="color:#75715e">// app/api/agents/[agentId]/route.ts
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">import</span> { <span style="color:#a6e22e">mastra</span> } <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#34;@/mastra&#34;</span>;
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">import</span> { <span style="color:#a6e22e">NextRequest</span> } <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#34;next/server&#34;</span>;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">export</span> <span style="color:#66d9ef">async</span> <span style="color:#66d9ef">function</span> <span style="color:#a6e22e">POST</span>(
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">req</span>: <span style="color:#66d9ef">NextRequest</span>,
</span></span><span style="display:flex;"><span>  { <span style="color:#a6e22e">params</span> }<span style="color:#f92672">:</span> { <span style="color:#a6e22e">params</span><span style="color:#f92672">:</span> { <span style="color:#a6e22e">agentId</span>: <span style="color:#66d9ef">string</span> } }
</span></span><span style="display:flex;"><span>) {
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">const</span> <span style="color:#a6e22e">agent</span> <span style="color:#f92672">=</span> <span style="color:#a6e22e">mastra</span>.<span style="color:#a6e22e">getAgent</span>(<span style="color:#a6e22e">params</span>.<span style="color:#a6e22e">agentId</span>);
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">const</span> { <span style="color:#a6e22e">message</span> } <span style="color:#f92672">=</span> <span style="color:#66d9ef">await</span> <span style="color:#a6e22e">req</span>.<span style="color:#a6e22e">json</span>();
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">const</span> <span style="color:#a6e22e">result</span> <span style="color:#f92672">=</span> <span style="color:#66d9ef">await</span> <span style="color:#a6e22e">agent</span>.<span style="color:#a6e22e">generate</span>(<span style="color:#a6e22e">message</span>);
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">return</span> <span style="color:#a6e22e">Response</span>.<span style="color:#a6e22e">json</span>({ <span style="color:#a6e22e">text</span>: <span style="color:#66d9ef">result.text</span> });
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p><strong>Express</strong>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-typescript" data-lang="typescript"><span style="display:flex;"><span><span style="color:#66d9ef">import</span> <span style="color:#a6e22e">express</span> <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#34;express&#34;</span>;
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">import</span> { <span style="color:#a6e22e">mastra</span> } <span style="color:#66d9ef">from</span> <span style="color:#e6db74">&#34;./mastra&#34;</span>;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">const</span> <span style="color:#a6e22e">app</span> <span style="color:#f92672">=</span> <span style="color:#a6e22e">express</span>();
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">app</span>.<span style="color:#a6e22e">use</span>(<span style="color:#a6e22e">express</span>.<span style="color:#a6e22e">json</span>());
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">app</span>.<span style="color:#a6e22e">post</span>(<span style="color:#e6db74">&#34;/api/agents/:agentId/generate&#34;</span>, <span style="color:#66d9ef">async</span> (<span style="color:#a6e22e">req</span>, <span style="color:#a6e22e">res</span>) <span style="color:#f92672">=&gt;</span> {
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">const</span> <span style="color:#a6e22e">agent</span> <span style="color:#f92672">=</span> <span style="color:#a6e22e">mastra</span>.<span style="color:#a6e22e">getAgent</span>(<span style="color:#a6e22e">req</span>.<span style="color:#a6e22e">params</span>.<span style="color:#a6e22e">agentId</span>);
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">const</span> <span style="color:#a6e22e">result</span> <span style="color:#f92672">=</span> <span style="color:#66d9ef">await</span> <span style="color:#a6e22e">agent</span>.<span style="color:#a6e22e">generate</span>(<span style="color:#a6e22e">req</span>.<span style="color:#a6e22e">body</span>.<span style="color:#a6e22e">message</span>);
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">res</span>.<span style="color:#a6e22e">json</span>({ <span style="color:#a6e22e">text</span>: <span style="color:#66d9ef">result.text</span> });
</span></span><span style="display:flex;"><span>});
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">app</span>.<span style="color:#a6e22e">listen</span>(<span style="color:#ae81ff">3000</span>);
</span></span></code></pre></div><p><strong>Hono and SvelteKit</strong> are similarly supported with adapter packages. The pattern is the same: import your Mastra instance, call <code>getAgent</code> or <code>getWorkflow</code>, and handle the request.</p>
<h3 id="mastra-platform-studio--server--memory-gateway">Mastra Platform: Studio + Server + Memory Gateway</h3>
<p>For teams that don&rsquo;t want to manage their own infrastructure, Mastra offers a hosted platform:</p>
<ul>
<li><strong>Mastra Studio (cloud)</strong>: Same dev UI, hosted and shared across your team</li>
<li><strong>Mastra Server (cloud)</strong>: Managed deployment of your agents and workflows</li>
<li><strong>Memory Gateway</strong>: Hosted memory service with persistent storage, semantic recall, and cross-session state</li>
</ul>
<h3 id="pricing-and-tiers">Pricing and tiers</h3>
<table>
  <thead>
      <tr>
          <th>Tier</th>
          <th>Price</th>
          <th>Key Limits</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Starter</td>
          <td>Free</td>
          <td>3 agents, 10k memory records, community support</td>
      </tr>
      <tr>
          <td>Teams</td>
          <td>$250/team/month</td>
          <td>Unlimited agents, 1M memory records, priority support</td>
      </tr>
      <tr>
          <td>Enterprise</td>
          <td>Custom</td>
          <td>Dedicated infra, SLA, SSO, custom integrations</td>
      </tr>
  </tbody>
</table>
<p>The free tier is genuinely usable for prototyping and personal projects. The Teams tier is where production deployments land.</p>
<h2 id="mastra-vs-other-ai-frameworks-typescript-first-comparison">Mastra vs Other AI Frameworks: TypeScript-First Comparison</h2>
<h3 id="mastra-vs-langgraph-vs-crewai">Mastra vs LangGraph vs CrewAI</h3>
<table>
  <thead>
      <tr>
          <th>Feature</th>
          <th>Mastra</th>
          <th>LangGraph</th>
          <th>CrewAI</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Language</td>
          <td>TypeScript</td>
          <td>Python</td>
          <td>Python</td>
      </tr>
      <tr>
          <td>Agent abstraction</td>
          <td><code>Agent</code> class with tools + memory</td>
          <td><code>StateGraph</code> with nodes/edges</td>
          <td><code>Crew</code> with <code>Agent</code> roles</td>
      </tr>
      <tr>
          <td>Workflow model</td>
          <td>Step-based with branch/parallel</td>
          <td>State graph with conditional edges</td>
          <td>Sequential/hierarchical process</td>
      </tr>
      <tr>
          <td>Memory</td>
          <td>Built-in (working + semantic recall)</td>
          <td>Manual (checkpointer interface)</td>
          <td>Short-term + long-term memory</td>
      </tr>
      <tr>
          <td>Observability</td>
          <td>Built-in OTel + Studio</td>
          <td>LangSmith (separate product)</td>
          <td>Manual or LangSmith</td>
      </tr>
      <tr>
          <td>Eval framework</td>
          <td>Built-in</td>
          <td>LangSmith (separate product)</td>
          <td>Not included</td>
      </tr>
      <tr>
          <td>MCP support</td>
          <td>Client + server</td>
          <td>Client (via langchain-mcp)</td>
          <td>Not native</td>
      </tr>
      <tr>
          <td>RAG</td>
          <td>Built-in module</td>
          <td>Manual (LangChain retrieval)</td>
          <td>Manual</td>
      </tr>
      <tr>
          <td>Deployment</td>
          <td>Built-in server + Platform</td>
          <td>Manual or LangServe</td>
          <td>Manual</td>
      </tr>
      <tr>
          <td>License</td>
          <td>Apache 2.0</td>
          <td>MIT</td>
          <td>MIT</td>
      </tr>
  </tbody>
</table>
<p>The core distinction: LangGraph and CrewAI are Python-first frameworks (LangGraph ships a JavaScript port, but its documentation, ecosystem, and examples center on Python; CrewAI is Python-only). If your stack is TypeScript, you&rsquo;ll typically write a Python service, wrap it in a Docker container, and communicate over HTTP. That works, but it adds operational overhead and prevents you from sharing types, tests, and utilities across your codebase.</p>
<h3 id="mastra-vs-vercel-ai-sdk">Mastra vs Vercel AI SDK</h3>
<p>The Vercel AI SDK focuses on LLM integration at the transport layer: streaming responses, managing tool calls, and providing React hooks for chat UIs. Mastra operates at a higher level:</p>
<ul>
<li><strong>Vercel AI SDK</strong>: &ldquo;How do I call an LLM and stream the response to a React component?&rdquo;</li>
<li><strong>Mastra</strong>: &ldquo;How do I build an agent with memory, tools, and guardrails, evaluate it, and deploy it as an API?&rdquo;</li>
</ul>
<p>They&rsquo;re complementary. Mastra uses the Vercel AI SDK&rsquo;s model interface under the hood for LLM calls. You can use the Vercel AI SDK for your frontend chat UI and Mastra for your backend agent logic.</p>
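<p>Concretely, a Mastra <code>Agent</code> takes a Vercel AI SDK provider instance as its model. A minimal configuration sketch, assuming the import paths documented by Mastra and the AI SDK (the agent name and instructions are illustrative; verify paths against your installed versions):</p>

```typescript
import { Agent } from "@mastra/core/agent";
import { openai } from "@ai-sdk/openai";

// The model slot takes an AI SDK provider model; Mastra layers memory,
// tools, and deployment around it.
export const supportAgent = new Agent({
  name: "support-agent",
  instructions: "Answer product questions concisely.",
  model: openai("gpt-4o"),
});
```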
<h3 id="when-to-choose-mastra-and-when-not-to">When to choose Mastra (and when not to)</h3>
<p>Choose Mastra when:</p>
<ul>
<li>Your backend is TypeScript/Node.js</li>
<li>You need structured agents with memory, tools, and guardrails</li>
<li>You want built-in evals and observability without extra tooling</li>
<li>You&rsquo;re building multi-step workflows with conditional logic</li>
</ul>
<p>Skip Mastra when:</p>
<ul>
<li>Your team is Python-first and you&rsquo;re happy with LangGraph or CrewAI</li>
<li>You only need raw LLM streaming (use Vercel AI SDK directly)</li>
<li>You need capabilities Mastra doesn&rsquo;t support yet (e.g., specialized multimodal agent patterns)</li>
</ul>
<h2 id="real-world-examples-and-case-studies">Real-World Examples and Case Studies</h2>
<h3 id="docker-event-driven-pr-management-agent">Docker: Event-driven PR management agent</h3>
<p>Docker built an event-driven agent system that responds to GitHub webhooks. When a PR is opened, their Mastra workflow:</p>
<ol>
<li>Triggers on the webhook payload</li>
<li>Routes through the Docker MCP Gateway to the GitHub MCP server</li>
<li>Runs the analyze PR agent on the diff</li>
<li>Passes the analysis to the generate comment agent</li>
<li>Posts the comment via the post and close agent</li>
</ol>
<p>This is not a chatbot. There&rsquo;s no human in the loop. The entire pipeline runs in response to an event, with no user interaction. That&rsquo;s a different deployment model from most demo agents, and it&rsquo;s where Mastra&rsquo;s workflow primitives matter—the pipeline is deterministic, observable, and can fail gracefully at any step.</p>
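<p>To make the shape concrete, here is a minimal sketch of that three-stage pipeline as plain sequential code, using only the <code>generate</code> call shown earlier in this post. The <code>AgentLike</code> interface and agent roles are illustrative, not Docker&rsquo;s actual code; in Mastra proper each stage would be a workflow step with its own retries and tracing:</p>

```typescript
// Minimal interface matching the agent.generate() calls used elsewhere in this post.
interface AgentLike {
  generate(input: string): Promise<{ text: string }>;
}

// Webhook handler body: analyze the diff, draft a comment, post it.
// This shows only the data flow between the three agents.
export async function handlePrOpened(
  agents: { analyze: AgentLike; comment: AgentLike; post: AgentLike },
  diff: string
): Promise<string> {
  const analysis = await agents.analyze.generate(diff);
  const comment = await agents.comment.generate(analysis.text);
  await agents.post.generate(comment.text);
  return comment.text;
}
```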
<h3 id="elastic-agentic-rag-with-elasticsearch">Elastic: Agentic RAG with Elasticsearch</h3>
<p>Elastic built a RAG assistant combining a Vite + React frontend, a Mastra backend, and Elasticsearch as the vector store. Their writeup highlights the developer experience of staying in a single language stack: the same TypeScript types that define the search index schema also define the agent&rsquo;s tool interface and the frontend&rsquo;s API contract.</p>
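<p>A small sketch of what that single-language payoff looks like, with a hypothetical <code>ProductDoc</code> type standing in for Elastic&rsquo;s actual schema:</p>

```typescript
// One TypeScript type shared by three layers; the shape is illustrative.
interface ProductDoc {
  id: string;
  title: string;
  body: string;
}

// 1) Index side: the field list for the search mapping is derived from the
//    same type, so schema drift becomes a compile error.
const indexedFields: Array<keyof ProductDoc> = ["id", "title", "body"];

// 2) Agent side: the search tool's result type reuses the shape.
type SearchToolResult = { hits: ProductDoc[] };

// 3) Frontend side: the API contract is the same shape again.
function renderHit(doc: ProductDoc): string {
  return `${doc.id}: ${doc.title}`;
}

export const example: SearchToolResult = {
  hits: [{ id: "p1", title: "Mastra and Elasticsearch", body: "RAG notes" }],
};
```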
<h3 id="softbank-enterprise-productivity-at-scale">SoftBank: Enterprise productivity at scale</h3>
<p>SoftBank deployed Mastra-based agents internally for productivity tools. Scale is the notable aspect—this isn&rsquo;t a prototype serving 10 users. Mastra&rsquo;s memory gateway and observability infrastructure handle the traffic.</p>
<h3 id="replit-agent-3-building-mastra-agents">Replit: Agent 3 building Mastra agents</h3>
<p>Replit&rsquo;s Agent 3 can scaffold and deploy Mastra agents. This meta-pattern—an AI agent building other AI agents—validates Mastra&rsquo;s code-first API design. If an LLM can generate valid Mastra code from a description, the abstractions are well-defined enough to be machine-writable.</p>
<h2 id="conclusion-and-next-steps">Conclusion and Next Steps</h2>
<p>Mastra addresses a real gap in the AI agent tooling ecosystem: a production-grade, TypeScript-native framework with first-class support for the primitives that matter—agents, tools, workflows, RAG, evals, and observability. The enterprise adoption numbers (100k+ daily users at Marsh McLennan, Brex&rsquo;s $5.1B acquisition involvement) confirm that it&rsquo;s not just developer-friendly but production-ready.</p>
<h3 id="key-takeaways">Key takeaways</h3>
<ol>
<li><strong>TypeScript-first is now a viable choice for AI agents.</strong> With 60–70% of YC X25 agent startups choosing TypeScript, the ecosystem demand is clear. Mastra provides the framework primitives that Python alternatives have had for years.</li>
<li><strong>MCP integration is a differentiator.</strong> Mastra&rsquo;s ability to connect to any MCP server as a tool source gives agents access to external systems without custom integration code. Docker&rsquo;s PR automation demonstrates this in production.</li>
<li><strong>Built-in evals and observability are not optional extras.</strong> If you can&rsquo;t measure agent quality, you can&rsquo;t improve it. Mastra&rsquo;s eval framework and OpenTelemetry integration give you measurement from day one.</li>
<li><strong>Workflows complement agents.</strong> Not every problem needs an LLM deciding what to do next. Mastra&rsquo;s workflow engine handles the structured part of your pipeline while agents handle the reasoning part.</li>
<li><strong>The Gatsby team&rsquo;s framework experience shows.</strong> The DX decisions—Zod schemas, code-first configuration, Studio as a dev tool—reflect experience shipping a framework used by tens of thousands of developers.</li>
</ol>
<h3 id="resources">Resources</h3>
<ul>
<li><strong>Mastra docs</strong>: <a href="https://mastra.ai/docs">mastra.ai/docs</a></li>
<li><strong>Mastra GitHub</strong>: <a href="https://github.com/mastra-ai/mastra">github.com/mastra-ai/mastra</a></li>
<li><strong>Mastra templates</strong>: <a href="https://mastra.ai/templates">mastra.ai/templates</a></li>
<li><strong>Agent Book</strong>: <a href="https://mastra.ai/agentbook">mastra.ai/agentbook</a> — community-contributed agent examples</li>
<li><strong>Community Discord</strong>: <a href="https://mastra.ai/community">mastra.ai/community</a></li>
</ul>
<h3 id="the-future-of-typescript-ai-development">The future of TypeScript AI development</h3>
<p>The trajectory is clear. As agent deployments move from demos to production, the operational requirements—evals, observability, guardrails, memory management, workflow orchestration—become the differentiating factors. Mastra builds these into the framework rather than leaving them as integration exercises. For TypeScript teams building AI agents in 2026, Mastra is the framework that matches the language, runtime, and operational demands of the job.</p>
]]></content:encoded></item></channel></rss>