<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Ag2 on RockB</title><link>https://baeseokjae.github.io/tags/ag2/</link><description>Recent content in Ag2 on RockB</description><image><title>RockB</title><url>https://baeseokjae.github.io/images/og-default.png</url><link>https://baeseokjae.github.io/images/og-default.png</link></image><generator>Hugo</generator><language>en-us</language><lastBuildDate>Sun, 19 Apr 2026 16:31:58 +0000</lastBuildDate><atom:link href="https://baeseokjae.github.io/tags/ag2/index.xml" rel="self" type="application/rss+xml"/><item><title>AG2 (AutoGen v0.4) Guide: Event-Driven Multi-Agent Framework for Python Developers</title><link>https://baeseokjae.github.io/posts/ag2-autogen-v0-4-guide-2026/</link><pubDate>Sun, 19 Apr 2026 16:31:58 +0000</pubDate><guid>https://baeseokjae.github.io/posts/ag2-autogen-v0-4-guide-2026/</guid><description>Complete guide to AG2 (AutoGen v0.4): architecture, ConversableAgent, GroupChat, async messaging, and production best practices for Python developers.</description><content:encoded><![CDATA[<p>AG2 (formerly Microsoft AutoGen, now maintained by the ag2ai community) is a Python framework for building multi-agent AI systems where multiple LLM-powered agents collaborate, debate, and execute tasks autonomously. The v0.4 rewrite introduced an async-first, event-driven architecture that makes AG2 one of the most capable frameworks for complex conversational agent pipelines in 2026.</p>
<h2 id="what-is-ag2-autogen-v04-and-why-it-matters-in-2026">What Is AG2 (AutoGen v0.4) and Why It Matters in 2026</h2>
<p>AG2 is an open-source Python framework that enables developers to build networks of LLM-powered agents that communicate with each other through structured message passing to solve complex tasks collaboratively. Originally released as Microsoft AutoGen, the project transitioned to the independent ag2ai organization in November 2024 with over 54,000 GitHub stars and millions of cumulative downloads. The v0.4 release was a complete architectural redesign — not an incremental update — focused on async-first execution, improved code quality, robustness, and scalability for production workloads. In 2026, AG2 powers document review pipelines at enterprise scale, code generation workflows in CI/CD systems, and research automation for data teams. The framework supports Python 3.10 through 3.13 and integrates with OpenAI, Anthropic, Google Gemini, Alibaba DashScope, and local models via Ollama. What makes AG2 distinctive is its conversation-centric model: agents don&rsquo;t just call tools — they argue, critique, refine, and reach consensus through structured dialogue, which is fundamentally different from how LangGraph or CrewAI approach orchestration.</p>
<p>The shift from v0.2 to v0.4 wasn&rsquo;t just about adding features. The v0.2 API was synchronous by default and relied heavily on <code>initiate_chat()</code> as the entry point for everything. V0.4 separates concerns into three distinct layers — Core, AgentChat, and Extensions — and makes asynchronous execution the primary pattern. If you&rsquo;re running AutoGen in production on v0.2, migration requires meaningful refactoring. If you&rsquo;re starting fresh in 2026, use AG2 v0.4 from the beginning.</p>
<h3 id="why-the-community-fork-happened">Why the Community Fork Happened</h3>
<p>Microsoft Research originally developed AutoGen as a research project. When the ag2ai community took over maintenance, it signaled a shift toward production stability over research experimentation. The AG2 team committed to semantic versioning, a stable public API, and a clear deprecation policy — things the research-focused AutoGen lacked. The community responded: AG2 drew 20,000+ Discord members, and the ag2ai/ag2 repo gained 3,000+ GitHub forks within months of the transition.</p>
<h2 id="ag2-architecture-deep-dive-core-agentchat-and-extensions-layers">AG2 Architecture Deep Dive: Core, AgentChat, and Extensions Layers</h2>
<p>AG2&rsquo;s v0.4 architecture is organized into three layers that each serve a distinct purpose, allowing developers to work at the abstraction level that fits their use case — from low-level message control to high-level team orchestration. The <strong>Core layer</strong> (<code>autogen_core</code>) provides the fundamental runtime: the actor model, message routing, async event loop, and subscription system. The <strong>AgentChat layer</strong> (<code>autogen_agentchat</code>) builds on Core with pre-built agent types — <code>AssistantAgent</code>, <code>UserProxyAgent</code>, <code>ConversableAgent</code> — and team coordination patterns like <code>RoundRobinGroupChat</code> and <code>SelectorGroupChat</code>. The <strong>Extensions layer</strong> (<code>autogen_ext</code>) provides integrations with external systems: vector databases, code executors, LLM clients for different providers, and tool adapters.</p>
<p>This layered design matters practically: if you need custom routing logic or want to implement a novel agent communication pattern, you work at the Core layer. If you&rsquo;re building a standard multi-agent pipeline, AgentChat has everything you need. If you&rsquo;re integrating with Qdrant, running code in Docker, or using Azure OpenAI, Extensions handles it. Most developers will work entirely within AgentChat with occasional dips into Extensions.</p>
<p>The Core layer implements the <strong>actor model</strong>: each agent is an independent actor with its own message inbox, local state, and processing loop. Agents don&rsquo;t call each other directly — they publish messages to a runtime that routes them based on topic subscriptions. This is what makes AG2&rsquo;s event-driven pattern different from simple function chaining. An agent can subscribe to multiple message types, emit messages that trigger other agents asynchronously, and handle failures without blocking the entire pipeline.</p>
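<p>To make the actor model concrete, here is a minimal publish/subscribe sketch against the <code>autogen_core</code> API described above. The <code>Task</code> message type, the <code>Worker</code> agent, and the agent type string are illustrative names, not framework built-ins:</p>
<pre><code class="language-python">import asyncio
from dataclasses import dataclass

from autogen_core import (
    DefaultTopicId,
    MessageContext,
    RoutedAgent,
    SingleThreadedAgentRuntime,
    default_subscription,
    message_handler,
)

@dataclass
class Task:              # a plain dataclass serves as a typed message
    content: str

@default_subscription    # subscribe this agent type to the default topic
class Worker(RoutedAgent):
    def __init__(self) -&gt; None:
        super().__init__(&#34;A worker that reacts to published Task messages&#34;)

    @message_handler     # dispatched by message type, never by direct call
    async def on_task(self, message: Task, ctx: MessageContext) -&gt; None:
        print(f&#34;{self.id} received: {message.content}&#34;)

async def main() -&gt; None:
    runtime = SingleThreadedAgentRuntime()
    await Worker.register(runtime, &#34;worker&#34;, lambda: Worker())
    runtime.start()  # launches the message-processing loop
    # Publish an event; the runtime routes it to every subscriber.
    await runtime.publish_message(Task(&#34;index the new docs&#34;), DefaultTopicId())
    await runtime.stop_when_idle()

asyncio.run(main())
</code></pre>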
<h3 id="understanding-the-runtime">Understanding the Runtime</h3>
<p>The <code>SingleThreadedAgentRuntime</code> is the default for local development. For production distributed systems, AG2 provides distributed runtime support. The runtime manages agent lifecycle, handles message queuing, and enforces the subscription model. You register agents with the runtime, define their topic subscriptions, and then publish events — the runtime handles the rest.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">from</span> autogen_core <span style="color:#f92672">import</span> SingleThreadedAgentRuntime
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>runtime <span style="color:#f92672">=</span> SingleThreadedAgentRuntime()
</span></span><span style="display:flex;"><span>runtime<span style="color:#f92672">.</span>start()  <span style="color:#75715e"># start() is synchronous; it launches the processing loop</span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Register agents and publish messages</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">await</span> runtime<span style="color:#f92672">.</span>stop_when_idle()
</span></span></code></pre></div><h2 id="key-concepts-conversableagent-assistantagent-and-event-driven-messaging">Key Concepts: ConversableAgent, AssistantAgent, and Event-Driven Messaging</h2>
<p>AG2&rsquo;s agent model centers on <code>ConversableAgent</code> — the base class that every agent in the AgentChat layer inherits from — which implements the core protocol for sending, receiving, and responding to messages within a multi-agent conversation. Every agent in AG2 can initiate a conversation, respond to messages, call tools, and delegate subtasks to other agents. <code>AssistantAgent</code> extends <code>ConversableAgent</code> with LLM-backed reasoning: it takes messages, constructs prompts, calls the configured LLM, and returns structured responses. <code>UserProxyAgent</code> acts as a human-in-the-loop stand-in: it can execute code, request human input, or auto-reply based on configured rules.</p>
<p>The event-driven messaging model in v0.4 works differently from the synchronous <code>initiate_chat()</code> pattern in v0.2. Instead of one agent kicking off a blocking conversation, agents publish messages to typed topics. Other agents that have subscribed to those topic types receive the messages and process them in their own async loops. This enables genuinely parallel agent execution — multiple agents can process messages simultaneously without waiting for each other.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">from</span> autogen_agentchat.agents <span style="color:#f92672">import</span> AssistantAgent
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> autogen_agentchat.teams <span style="color:#f92672">import</span> RoundRobinGroupChat
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> autogen_ext.models.openai <span style="color:#f92672">import</span> OpenAIChatCompletionClient
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>model_client <span style="color:#f92672">=</span> OpenAIChatCompletionClient(model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;gpt-4o&#34;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>planner <span style="color:#f92672">=</span> AssistantAgent(
</span></span><span style="display:flex;"><span>    name<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;planner&#34;</span>,
</span></span><span style="display:flex;"><span>    model_client<span style="color:#f92672">=</span>model_client,
</span></span><span style="display:flex;"><span>    system_message<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;You break complex tasks into actionable steps.&#34;</span>
</span></span><span style="display:flex;"><span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>executor <span style="color:#f92672">=</span> AssistantAgent(
</span></span><span style="display:flex;"><span>    name<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;executor&#34;</span>,
</span></span><span style="display:flex;"><span>    model_client<span style="color:#f92672">=</span>model_client,
</span></span><span style="display:flex;"><span>    system_message<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;You implement the steps provided by the planner.&#34;</span>
</span></span><span style="display:flex;"><span>)
</span></span></code></pre></div><h3 id="tools-and-function-calling">Tools and Function Calling</h3>
<p>AG2 agents call Python functions as tools through the standard function-calling API. You define tools as regular Python functions with type annotations, register them with an agent, and the agent decides when to call them based on conversation context. AG2 supports OpenAI&rsquo;s function calling format and automatically generates the JSON schema from Python type hints.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">search_docs</span>(query: str) <span style="color:#f92672">-&gt;</span> str:
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;&#34;&#34;Search internal documentation for the given query.&#34;&#34;&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># stub implementation; swap in a real search backend</span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> <span style="color:#e6db74">&#34;No results found for: &#34;</span> <span style="color:#f92672">+</span> query
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>agent <span style="color:#f92672">=</span> AssistantAgent(
</span></span><span style="display:flex;"><span>    name<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;researcher&#34;</span>,
</span></span><span style="display:flex;"><span>    model_client<span style="color:#f92672">=</span>model_client,
</span></span><span style="display:flex;"><span>    tools<span style="color:#f92672">=</span>[search_docs]
</span></span><span style="display:flex;"><span>)
</span></span></code></pre></div><h2 id="getting-started-installing-ag2-and-your-first-multi-agent-system">Getting Started: Installing AG2 and Your First Multi-Agent System</h2>
<p>Installing AG2 and running your first multi-agent conversation requires Python 3.10+ and two pip packages — <code>autogen-agentchat</code> for the high-level agent API and <code>autogen-ext</code> for LLM provider clients — plus <code>autogen-core</code> if you need direct runtime access. The separation into multiple packages is intentional: it keeps dependency footprints small. A project that only needs OpenAI doesn&rsquo;t pull in Anthropic or Gemini client libraries.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>pip install autogen-agentchat autogen-ext<span style="color:#f92672">[</span>openai<span style="color:#f92672">]</span>
</span></span></code></pre></div><p>For Anthropic Claude or Google Gemini:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>pip install autogen-ext<span style="color:#f92672">[</span>anthropic<span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span>pip install autogen-ext<span style="color:#f92672">[</span>gemini<span style="color:#f92672">]</span>
</span></span></code></pre></div><p>For local models via Ollama:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>pip install autogen-ext<span style="color:#f92672">[</span>ollama<span style="color:#f92672">]</span>
</span></span></code></pre></div><p>Here&rsquo;s a minimal two-agent system that solves a coding task:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> asyncio
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> autogen_agentchat.agents <span style="color:#f92672">import</span> AssistantAgent, CodeExecutorAgent
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> autogen_agentchat.teams <span style="color:#f92672">import</span> RoundRobinGroupChat
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> autogen_agentchat.conditions <span style="color:#f92672">import</span> TextMentionTermination
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> autogen_ext.models.openai <span style="color:#f92672">import</span> OpenAIChatCompletionClient
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> autogen_ext.code_executors.local <span style="color:#f92672">import</span> LocalCommandLineCodeExecutor
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">main</span>():
</span></span><span style="display:flex;"><span>    model_client <span style="color:#f92672">=</span> OpenAIChatCompletionClient(model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;gpt-4o-mini&#34;</span>)
</span></span><span style="display:flex;"><span>    
</span></span><span style="display:flex;"><span>    assistant <span style="color:#f92672">=</span> AssistantAgent(
</span></span><span style="display:flex;"><span>        name<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;assistant&#34;</span>,
</span></span><span style="display:flex;"><span>        model_client<span style="color:#f92672">=</span>model_client,
</span></span><span style="display:flex;"><span>        system_message<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;You are a helpful Python developer. Solve the task and say TERMINATE when done.&#34;</span>
</span></span><span style="display:flex;"><span>    )
</span></span><span style="display:flex;"><span>    
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># v0.4 runs code through a CodeExecutorAgent; the v0.2-style</span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># human_input_mode/code_execution_config kwargs no longer apply</span>
</span></span><span style="display:flex;"><span>    code_executor <span style="color:#f92672">=</span> CodeExecutorAgent(
</span></span><span style="display:flex;"><span>        name<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;code_executor&#34;</span>,
</span></span><span style="display:flex;"><span>        code_executor<span style="color:#f92672">=</span>LocalCommandLineCodeExecutor(work_dir<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;coding&#34;</span>)
</span></span><span style="display:flex;"><span>    )
</span></span><span style="display:flex;"><span>    
</span></span><span style="display:flex;"><span>    termination <span style="color:#f92672">=</span> TextMentionTermination(<span style="color:#e6db74">&#34;TERMINATE&#34;</span>)
</span></span><span style="display:flex;"><span>    team <span style="color:#f92672">=</span> RoundRobinGroupChat([assistant, code_executor], termination_condition<span style="color:#f92672">=</span>termination)
</span></span><span style="display:flex;"><span>    
</span></span><span style="display:flex;"><span>    result <span style="color:#f92672">=</span> <span style="color:#66d9ef">await</span> team<span style="color:#f92672">.</span>run(task<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;Write a Python function that finds all prime numbers up to N using the Sieve of Eratosthenes.&#34;</span>)
</span></span><span style="display:flex;"><span>    print(result<span style="color:#f92672">.</span>messages[<span style="color:#f92672">-</span><span style="color:#ae81ff">1</span>]<span style="color:#f92672">.</span>content)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>asyncio<span style="color:#f92672">.</span>run(main())
</span></span></code></pre></div><h3 id="configuring-llm-providers">Configuring LLM Providers</h3>
<p>AG2 uses provider-specific client classes from <code>autogen_ext.models</code>. This is different from v0.2&rsquo;s config list approach. You instantiate a client for your provider and pass it to agents directly:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#75715e"># OpenAI</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> autogen_ext.models.openai <span style="color:#f92672">import</span> OpenAIChatCompletionClient
</span></span><span style="display:flex;"><span>client <span style="color:#f92672">=</span> OpenAIChatCompletionClient(model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;gpt-4o&#34;</span>, api_key<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;sk-...&#34;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Anthropic</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> autogen_ext.models.anthropic <span style="color:#f92672">import</span> AnthropicChatCompletionClient
</span></span><span style="display:flex;"><span>client <span style="color:#f92672">=</span> AnthropicChatCompletionClient(model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;claude-sonnet-4-6&#34;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Ollama (local)</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> autogen_ext.models.ollama <span style="color:#f92672">import</span> OllamaChatCompletionClient
</span></span><span style="display:flex;"><span>client <span style="color:#f92672">=</span> OllamaChatCompletionClient(model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;llama3.2&#34;</span>)
</span></span></code></pre></div><h2 id="building-real-world-pipelines-groupchat-swarms-and-nested-chats">Building Real-World Pipelines: GroupChat, Swarms, and Nested Chats</h2>
<p>AG2&rsquo;s power emerges in multi-agent orchestration patterns — GroupChat for turn-based collaboration, Swarms for dynamic handoffs, and nested chats for hierarchical task decomposition. These patterns let you build pipelines where agents specialize, delegate, and verify each other&rsquo;s work rather than relying on a single LLM to do everything. A 4-agent GroupChat with 5 rounds generates at least 20 LLM calls, so pattern selection has direct cost implications. Choosing the right orchestration pattern for your task type is one of the most important architectural decisions in an AG2 system.</p>
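<p>To see how those numbers translate into dollars, here is a back-of-envelope cost model; the token count and price are illustrative assumptions, not benchmarks:</p>
<pre><code class="language-python"># Rough input-token cost model for a turn-based GroupChat run.
agents, rounds = 4, 5
calls = agents * rounds                  # at least 20 LLM calls per run
avg_context_tokens = 2_000               # assumption; history grows every turn
price_per_million_input = 2.50           # e.g. a gpt-4o-class model
cost = calls * avg_context_tokens / 1_000_000 * price_per_million_input
print(f&#34;~${cost:.2f} input-token cost per run&#34;)  # ~$0.10 at these assumptions
# Long histories, retries, and output tokens push real pipelines far higher.
</code></pre>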
<p><strong>RoundRobinGroupChat</strong> cycles through agents in fixed order — simple, predictable, good for sequential workflows where each agent has a distinct phase:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">from</span> autogen_agentchat.teams <span style="color:#f92672">import</span> RoundRobinGroupChat
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>team <span style="color:#f92672">=</span> RoundRobinGroupChat(
</span></span><span style="display:flex;"><span>    participants<span style="color:#f92672">=</span>[researcher, writer, reviewer],
</span></span><span style="display:flex;"><span>    termination_condition<span style="color:#f92672">=</span>TextMentionTermination(<span style="color:#e6db74">&#34;APPROVED&#34;</span>)
</span></span><span style="display:flex;"><span>)
</span></span></code></pre></div><p><strong>SelectorGroupChat</strong> uses an LLM to dynamically select the next speaker based on conversation context — better for complex workflows where the optimal next step depends on what&rsquo;s happened so far:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">from</span> autogen_agentchat.teams <span style="color:#f92672">import</span> SelectorGroupChat
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>team <span style="color:#f92672">=</span> SelectorGroupChat(
</span></span><span style="display:flex;"><span>    participants<span style="color:#f92672">=</span>[planner, coder, tester, debugger],
</span></span><span style="display:flex;"><span>    model_client<span style="color:#f92672">=</span>model_client,
</span></span><span style="display:flex;"><span>    selector_prompt<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;Based on the conversation, select the most appropriate next agent.&#34;</span>
</span></span><span style="display:flex;"><span>)
</span></span></code></pre></div><p><strong>Swarm</strong> implements handoff-based routing: agents pass control to each other explicitly using <code>HandoffMessage</code>. This is the pattern for customer service bots, triage systems, or any workflow where each agent knows when to escalate or delegate:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">from</span> autogen_agentchat.teams <span style="color:#f92672">import</span> Swarm
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> autogen_agentchat.messages <span style="color:#f92672">import</span> HandoffMessage
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Agents use HandoffMessage to transfer control</span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Swarm routes to the specified agent automatically</span>
</span></span><span style="display:flex;"><span>team <span style="color:#f92672">=</span> Swarm(participants<span style="color:#f92672">=</span>[triage_agent, billing_agent, support_agent])
</span></span></code></pre></div><h3 id="nested-chats-for-complex-decomposition">Nested Chats for Complex Decomposition</h3>
<p>Nested chats let a parent agent kick off an entire sub-conversation as part of its own reasoning. This is powerful for research tasks where an agent needs to gather information from multiple specialized sub-agents before synthesizing a response. In v0.4, one straightforward implementation is a tool that runs a sub-team internally (for example, <code>await sub_team.run(task=...)</code>), creating a new conversation context; the sketch below shows the shape.</p>
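<p>A hedged sketch of that pattern: a tool that runs an inner two-agent team and returns its final message to the parent. The agent names, system messages, and the six-message cap are illustrative choices:</p>
<pre><code class="language-python">from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.conditions import MaxMessageTermination
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_ext.models.openai import OpenAIChatCompletionClient

model_client = OpenAIChatCompletionClient(model=&#34;gpt-4o-mini&#34;)

researcher = AssistantAgent(
    name=&#34;researcher&#34;,
    model_client=model_client,
    system_message=&#34;Gather facts relevant to the question.&#34;
)
critic = AssistantAgent(
    name=&#34;critic&#34;,
    model_client=model_client,
    system_message=&#34;Point out gaps or errors in the researcher&#39;s answer.&#34;
)

async def research_subtask(question: str) -&gt; str:
    &#34;&#34;&#34;Run a nested researcher/critic conversation and return the result.&#34;&#34;&#34;
    sub_team = RoundRobinGroupChat(
        [researcher, critic],
        termination_condition=MaxMessageTermination(6),  # cap the sub-chat
    )
    result = await sub_team.run(task=question)
    return result.messages[-1].content  # hand the synthesis back to the parent

parent = AssistantAgent(
    name=&#34;synthesizer&#34;,
    model_client=model_client,
    tools=[research_subtask],  # the nested chat runs inside this tool call
    system_message=&#34;Answer questions, using research_subtask when needed.&#34;
)
</code></pre>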
<h2 id="ag2-vs-langgraph-vs-crewai-choosing-the-right-framework-in-2026">AG2 vs LangGraph vs CrewAI: Choosing the Right Framework in 2026</h2>
<p>AG2 excels at multi-party conversational workflows, consensus-building, and scenarios where agents need to debate or critique each other — LangGraph is better for deterministic state machines with complex branching logic, and CrewAI is better for simple role-based pipelines where ease of setup matters more than flexibility. This is the practical decision guide based on actual production use patterns in 2026. All three frameworks are mature enough for production, but they optimize for fundamentally different problem shapes. The wrong choice means fighting the framework; the right choice means the framework amplifies your design.</p>
<table>
  <thead>
      <tr>
          <th>Criteria</th>
          <th>AG2</th>
          <th>LangGraph</th>
          <th>CrewAI</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>Primary pattern</strong></td>
          <td>Conversational agents</td>
          <td>State machine graphs</td>
          <td>Role-based crews</td>
      </tr>
      <tr>
          <td><strong>Learning curve</strong></td>
          <td>Medium</td>
          <td>High</td>
          <td>Low</td>
      </tr>
      <tr>
          <td><strong>Async support</strong></td>
          <td>Native (v0.4)</td>
          <td>Yes</td>
          <td>Limited</td>
      </tr>
      <tr>
          <td><strong>Human-in-loop</strong></td>
          <td>Built-in</td>
          <td>Manual</td>
          <td>Basic</td>
      </tr>
      <tr>
          <td><strong>Debugging</strong></td>
          <td>Conversation logs</td>
          <td>Graph visualization</td>
          <td>Simple logs</td>
      </tr>
      <tr>
          <td><strong>Best for</strong></td>
          <td>Group debates, consensus</td>
          <td>Complex branching workflows</td>
          <td>Simple automation</td>
      </tr>
      <tr>
          <td><strong>Python skill needed</strong></td>
          <td>Intermediate</td>
          <td>Advanced</td>
          <td>Beginner-friendly</td>
      </tr>
      <tr>
          <td><strong>Cost per run</strong></td>
          <td>High (many LLM calls)</td>
          <td>Controllable</td>
          <td>Medium</td>
      </tr>
  </tbody>
</table>
<p><strong>Choose AG2 when:</strong></p>
<ul>
<li>Your task benefits from agents critiquing each other&rsquo;s work (code review, document editing, research validation)</li>
<li>You need flexible conversation routing that depends on semantic content</li>
<li>You&rsquo;re building customer service, tutoring, or debate-style applications</li>
<li>You want native async with multi-provider LLM support</li>
</ul>
<p><strong>Choose LangGraph when:</strong></p>
<ul>
<li>Your workflow has predictable branches with clear state transitions</li>
<li>You need fine-grained control over every execution step</li>
<li>You&rsquo;re building workflows where correctness is more important than flexibility</li>
<li>Your team has strong Python and graph-theory background</li>
</ul>
<p><strong>Choose CrewAI when:</strong></p>
<ul>
<li>You need to ship fast and the workflow is straightforward</li>
<li>Non-engineers are defining the agent roles and tasks</li>
<li>The task doesn&rsquo;t require complex inter-agent negotiation</li>
</ul>
<h3 id="migration-from-autogen-v02-to-ag2-v04">Migration from AutoGen v0.2 to AG2 v0.4</h3>
<p>The v0.2 to v0.4 migration involves breaking changes at every level. Key changes, with a before/after sketch after the list:</p>
<ol>
<li><strong>Import paths changed</strong>: <code>from autogen import AssistantAgent</code> → <code>from autogen_agentchat.agents import AssistantAgent</code></li>
<li><strong>Config list removed</strong>: Replace <code>llm_config={&quot;config_list&quot;: [...]}</code> with provider-specific client objects</li>
<li><strong><code>initiate_chat()</code> deprecated</strong>: Use team-based APIs with <code>await team.run(task=...)</code></li>
<li><strong>Synchronous code won&rsquo;t work</strong>: Everything is async — wrap entry points with <code>asyncio.run()</code> or await from inside an existing event loop</li>
</ol>
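<p>A condensed before/after sketch of those four changes; the v0.2 half is shown as comments, and the task string is arbitrary:</p>
<pre><code class="language-python"># v0.2 (synchronous, config-list based):
#
#   from autogen import AssistantAgent, UserProxyAgent
#   assistant = AssistantAgent(&#34;assistant&#34;, llm_config={&#34;config_list&#34;: [...]})
#   user = UserProxyAgent(&#34;user&#34;)
#   user.initiate_chat(assistant, message=&#34;Summarize this report.&#34;)

# v0.4 equivalent (async, client objects, team API):
import asyncio
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.conditions import MaxMessageTermination
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_ext.models.openai import OpenAIChatCompletionClient

async def main() -&gt; None:
    client = OpenAIChatCompletionClient(model=&#34;gpt-4o-mini&#34;)
    assistant = AssistantAgent(&#34;assistant&#34;, model_client=client)
    team = RoundRobinGroupChat([assistant], termination_condition=MaxMessageTermination(4))
    result = await team.run(task=&#34;Summarize this report.&#34;)
    print(result.messages[-1].content)

asyncio.run(main())
</code></pre>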
<h2 id="production-best-practices-cost-control-state-management-and-observability">Production Best Practices: Cost Control, State Management, and Observability</h2>
<p>Running AG2 in production requires explicit strategies for controlling LLM costs, persisting conversation state across sessions, and observing agent behavior — because the default configuration optimizes for flexibility, not cost or reliability. A 4-agent GroupChat with 5 rounds generates at least 20 LLM calls, each sending the full conversation history as context. Without cost controls, a single complex task can consume $5–$20 in API calls. With the right patterns, you can cut that by 60–80% while maintaining output quality.</p>
<p><strong>Cost Control Strategies:</strong></p>
<ol>
<li>
<p><strong>Use cheaper models for simple agents</strong>: Route tool-calling agents to <code>gpt-4o-mini</code> or <code>claude-haiku-4-5</code> and reserve expensive models for reasoning-heavy agents</p>
</li>
<li>
<p><strong>Set max_turns explicitly</strong>: Always cap GroupChat rounds:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>team <span style="color:#f92672">=</span> RoundRobinGroupChat(participants<span style="color:#f92672">=</span>[<span style="color:#f92672">...</span>], max_turns<span style="color:#f92672">=</span><span style="color:#ae81ff">5</span>)
</span></span></code></pre></div></li>
<li>
<p><strong>Cache LLM responses</strong>: For deterministic subtasks (document classification, entity extraction), cache results to avoid redundant LLM calls (see the sketch after this list)</p>
</li>
<li>
<p><strong>Use selective context</strong>: AG2 v0.4 supports message filtering — don&rsquo;t send the entire conversation history to every agent for every turn</p>
</li>
</ol>
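<p>For strategy 3, a hedged caching sketch using the cache wrapper shipped in <code>autogen_ext</code>; the import paths follow the v0.4 documentation, so verify them against your installed version, and note that <code>diskcache</code> is a separate third-party dependency:</p>
<pre><code class="language-python">from diskcache import Cache  # pip install diskcache
from autogen_ext.cache_store.diskcache import DiskCacheStore
from autogen_ext.models.cache import CHAT_CACHE_VALUE_TYPE, ChatCompletionCache
from autogen_ext.models.openai import OpenAIChatCompletionClient

base_client = OpenAIChatCompletionClient(model=&#34;gpt-4o-mini&#34;)
store = DiskCacheStore[CHAT_CACHE_VALUE_TYPE](Cache(&#34;./llm_cache&#34;))
cached_client = ChatCompletionCache(base_client, store)
# Pass cached_client wherever a model_client is expected; repeated
# identical prompts (classification, extraction) are served from disk.
</code></pre>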
<p><strong>State Persistence:</strong></p>
<p>AG2 v0.4 introduces <code>save_state()</code> and <code>load_state()</code> on team objects, enabling conversation checkpointing:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#75715e"># Save after completion</span>
</span></span><span style="display:flex;"><span>state <span style="color:#f92672">=</span> <span style="color:#66d9ef">await</span> team<span style="color:#f92672">.</span>save_state()
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">with</span> open(<span style="color:#e6db74">&#34;checkpoint.json&#34;</span>, <span style="color:#e6db74">&#34;w&#34;</span>) <span style="color:#66d9ef">as</span> f:
</span></span><span style="display:flex;"><span>    json<span style="color:#f92672">.</span>dump(state, f)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Resume from checkpoint</span>
</span></span><span style="display:flex;"><span>new_team <span style="color:#f92672">=</span> RoundRobinGroupChat(participants<span style="color:#f92672">=</span>[<span style="color:#f92672">...</span>])
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">with</span> open(<span style="color:#e6db74">&#34;checkpoint.json&#34;</span>) <span style="color:#66d9ef">as</span> f:
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">await</span> new_team<span style="color:#f92672">.</span>load_state(json<span style="color:#f92672">.</span>load(f))
</span></span><span style="display:flex;"><span>result <span style="color:#f92672">=</span> <span style="color:#66d9ef">await</span> new_team<span style="color:#f92672">.</span>run(task<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;Continue from where we left off&#34;</span>)
</span></span></code></pre></div><p><strong>Observability:</strong></p>
<p>AG2 integrates with OpenTelemetry for distributed tracing. Each LLM call, tool invocation, and agent message is a traceable span. For production systems, connect to Jaeger, Datadog, or Honeycomb:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">from</span> opentelemetry <span style="color:#f92672">import</span> trace
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> opentelemetry.sdk.trace <span style="color:#f92672">import</span> TracerProvider
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>tracer_provider <span style="color:#f92672">=</span> TracerProvider()
</span></span><span style="display:flex;"><span>trace<span style="color:#f92672">.</span>set_tracer_provider(tracer_provider)
</span></span><span style="display:flex;"><span><span style="color:#75715e"># AG2 automatically instruments LLM calls and agent messages</span>
</span></span></code></pre></div><h3 id="error-handling-and-retries">Error Handling and Retries</h3>
<p>AG2 agents can fail silently if LLM calls time out or return malformed responses. Implement explicit retry logic at the team level and validate agent outputs before passing them downstream. The <code>on_messages_stream()</code> method lets you inspect messages in real-time and terminate early if an agent enters a failure loop.</p>
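<p>At the team level, <code>run_stream()</code> plays the same role. Here is a hedged watchdog sketch; the repeated-message heuristic is illustrative, and a production system would encode the stop condition as a proper termination condition instead:</p>
<pre><code class="language-python">from autogen_agentchat.base import TaskResult

async def run_with_watchdog(team, task: str):
    &#34;&#34;&#34;Stream a team run and stop consuming if agents appear stuck.&#34;&#34;&#34;
    previous = None
    async for event in team.run_stream(task=task):
        if isinstance(event, TaskResult):  # stream ends with the final result
            return event
        content = getattr(event, &#34;content&#34;, None)
        if content is not None and content == previous:
            print(&#34;Agent repeated itself; aborting run early.&#34;)
            break  # naive failure heuristic: two identical messages in a row
        previous = content
    return None
</code></pre>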
<h2 id="ag2-beta-and-the-road-to-v10-what-python-developers-need-to-know">AG2 Beta and the Road to v1.0: What Python Developers Need to Know</h2>
<p>AG2 Beta (<code>autogen.beta</code>) previews the v1.0 architecture, which introduces streaming-first agent responses, improved memory systems, and a unified tool registry that works across all agent types — changes that will affect how you build production systems starting in late 2026. The Beta track is importable today as <code>from autogen.beta import ...</code> alongside the stable v0.4 API. The ag2ai team has committed to not breaking stable v0.4 APIs before a 6-month deprecation window, but Beta APIs can change without notice. The most significant v1.0 changes for Python developers are:</p>
<p><strong>Streaming responses</strong>: V1.0 makes streaming the default for all LLM calls, enabling real-time output for user-facing applications. In v0.4, streaming requires explicit configuration per agent. In v1.0, it&rsquo;s automatic with a unified <code>on_token()</code> callback.</p>
<p><strong>Memory architecture</strong>: V1.0 introduces pluggable memory backends. Agents can store and retrieve context from vector databases (Qdrant, Pinecone, Chroma) without custom tool implementations. This replaces the manual retrieval patterns required in v0.4.</p>
<p><strong>Unified tool registry</strong>: In v0.4, each agent has its own tool list. V1.0 introduces a shared registry where tools can be discovered and used by any agent in the system, reducing code duplication in large multi-agent pipelines.</p>
<p><strong>What to do now</strong>: Build on stable v0.4 APIs for production systems. Experiment with <code>autogen.beta</code> in development to prepare for migration. Watch the ag2ai/ag2 GitHub releases for the v1.0 roadmap — the community is active and the release cadence is roughly quarterly.</p>
<hr>
<h2 id="faq">FAQ</h2>
<p><strong>Q: Is AG2 the same as AutoGen?</strong>
AG2 is the community continuation of Microsoft AutoGen. After the ag2ai organization took over in November 2024, they published the package as <code>ag2</code> on PyPI while maintaining the <code>autogen</code> namespace for backward compatibility. The codebase is the same project, now with community governance instead of Microsoft Research ownership.</p>
<p><strong>Q: Can I use AG2 with local LLMs?</strong>
Yes. AG2 v0.4 supports Ollama via <code>autogen_ext.models.ollama.OllamaChatCompletionClient</code>. Install <code>pip install autogen-ext[ollama]</code>, start Ollama locally with <code>ollama serve</code>, and configure an <code>OllamaChatCompletionClient</code> pointing to <code>http://localhost:11434</code>. This enables fully offline multi-agent systems with models like Llama 3.2 or Mistral.</p>
<p><strong>Q: How does AG2 v0.4 differ from v0.2 in practice?</strong>
V0.4 requires async code everywhere — you can&rsquo;t run <code>initiate_chat()</code> synchronously. The import paths changed (now <code>autogen_agentchat</code>, <code>autogen_core</code>, <code>autogen_ext</code> instead of just <code>autogen</code>). LLM configuration moved from config lists to provider-specific client objects. Team-based APIs replaced the direct <code>initiate_chat()</code> pattern. Plan for a meaningful refactoring effort when migrating from v0.2.</p>
<p><strong>Q: How much does running AG2 cost in production?</strong>
Cost depends heavily on model choice and GroupChat configuration. A 4-agent GroupChat with 5 rounds generates at least 20 LLM calls. Using <code>gpt-4o-mini</code> ($0.15/1M input tokens) instead of <code>gpt-4o</code> ($2.50/1M input tokens) can reduce costs by 94% for agents that don&rsquo;t require advanced reasoning. Budget for 50–200 tokens of conversation history per message multiplied by the number of agents and rounds.</p>
<p><strong>Q: Is AG2 ready for production in 2026?</strong>
Yes, with caveats. The stable v0.4 API is production-ready. The ag2ai community has implemented semantic versioning, a deprecation policy, and a stable public API contract. Large-scale enterprise deployment requires custom work for state persistence, observability, and cost management — AG2 provides the building blocks but doesn&rsquo;t solve these problems out of the box. For most teams building internal tools, automation pipelines, or customer-facing agents, v0.4 is stable enough to ship.</p>
]]></content:encoded></item></channel></rss>