Microsoft on RockB

Microsoft Agent Governance Toolkit: Open-Source Runtime Security for AI Agents

Fri, 15 May 2026 12:07:30 +0000

Released on April 2, 2026, the Microsoft Agent Governance Toolkit is the first open-source runtime security framework to address all ten risks on the OWASP Agentic AI Top 10. Shipped under the MIT license, it provides deterministic policy enforcement at the agent action layer with less than 5ms overhead per evaluated action. As the agentic AI security market grows from a projected $1.65 billion in 2026 toward an estimated $13.52 billion by 2032 at roughly 42% CAGR, this toolkit arrives at exactly the moment enterprises need a vendor-neutral, community-owned standard for governing what their AI agents are actually permitted to do.

Microsoft Agent Governance Toolkit: Open-Source Runtime Security for All OWASP Agentic AI Risks

With 88% of organizations reporting confirmed or suspected AI agent security incidents in the past year, the pressure to govern agentic AI behavior has reached a breaking point. Microsoft launched the Agent Governance Toolkit on April 2, 2026 as a direct response to this crisis — an open-source framework released under the MIT license that intercepts every agent action before execution and evaluates it against a configurable policy ruleset. The toolkit integrates natively with the four most widely deployed agent frameworks: LangChain, AutoGen, Semantic Kernel, and the OpenAI Agents SDK. This means teams can add governance without rewriting existing agent code — a practical necessity given that most enterprise agent deployments are already in production. The toolkit’s architecture is modular: the core policy engine, called Agent OS, can be deployed independently, and additional modules for identity management, reliability engineering, and regulatory compliance can be layered in as operational requirements grow. Most significantly, the toolkit is the first open-source project to explicitly demonstrate coverage of all ten OWASP Agentic AI Top 10 risks with deterministic controls, making it the only framework where a security team can trace every item on the published threat taxonomy to a specific, testable enforcement mechanism.

Why Runtime Enforcement Is Different from Prompt-Level Safety

Prompt-level safety instructions — telling the model what not to do inside the system prompt — remain the most common approach to agent safety, but they have measurable failure rates under adversarial conditions. Runtime enforcement operates at the code layer, outside the model’s reasoning process, which means a policy blocking unauthorized external writes will block them regardless of what the model was instructed or what it decides to attempt. That architectural separation is the core distinction between the Agent Governance Toolkit and prompt-engineering-based safety approaches.

The OWASP Agentic AI Top 10: The Risks the Toolkit Addresses

The OWASP Agentic AI Top 10 is the published industry taxonomy of AI agent security risks, and 80% of organizations have already encountered at least one of these risks through unauthorized data exposure or other dangerous agent behaviors observed in the past year. The Microsoft Agent Governance Toolkit was architected with this list as its primary specification: every design decision traces back to a named OWASP risk, and every OWASP risk maps to at least one concrete enforcement control inside the toolkit. This bidirectional traceability matters enormously for security teams, because it allows compliance posture to be audited against a recognized external standard rather than informally asserted. The ten risks span the full lifecycle of an agent deployment — from what the agent is prompted to do, through how it uses its tools and memory, all the way to how it interacts with other agents and external systems.

OWASP Agentic AI Risk	Toolkit Control Mechanism
Prompt Injection	Semantic intent classifier at the policy evaluation layer
Excessive Agency	Least-privilege policy defaults; action scope limits per agent role
Data Exfiltration	Outbound data flow rules; domain allowlists for HTTP actions
Privilege Escalation	Role-scoped action permissions; escalation attempt detection
Insecure Agent Memory	Cross-agent memory validation and isolation controls
Supply Chain Risk	Plugin signing, verification, and SBOM generation
Opaque Execution	Full audit trail with structured logging for every action
Trust Propagation	Scoped trust chains between agent identities
Compliance Drift	Regulatory framework policy mappings with automated reporting
Irreversible Actions	Circuit breaker patterns and rollback hooks for destructive operations

The structured mapping in the table above is not just documentation — it is operationalized. Each row corresponds to one or more policy rules that can be loaded into the Agent OS engine, evaluated in real time against every agent action, and audited by name in the toolkit’s logs. For organizations required to demonstrate OWASP alignment to a board, regulator, or enterprise customer, this traceability is a meaningful differentiator from toolkits that address security in general terms.

Agent OS: The Stateless Policy Engine at the Core

The most important architectural decision in the entire toolkit is that Agent OS, the core policy engine, is stateless. Every agent action is evaluated independently against the configured ruleset, with no session context or accumulated state that an adversary could manipulate over time to relax future policy decisions. This stateless design is precisely why the toolkit achieves less than 5ms overhead per evaluated action — there is no shared state to synchronize, no distributed cache to invalidate, and no coordination overhead that would create bottlenecks in concurrent multi-agent deployments. Agent OS positions itself in the execution path between an agent’s decision to perform an action and the actual execution of that action against an external system. If the policy evaluation returns DENY, the action is blocked before it touches any file, API, database, or downstream agent. If evaluation returns ALLOW, execution proceeds immediately. The entire interception and evaluation cycle adds less than 5ms to the agent’s action latency — a rounding error compared to the hundreds of milliseconds typically consumed by LLM inference or external API calls.

The stateless architecture has a second important implication: horizontal scalability. Because each policy evaluation is entirely self-contained, Agent OS instances can be replicated across as many nodes as needed without any coordination overhead whatsoever. For enterprises running large fleets of AI agents distributed across multiple cloud regions and availability zones, governance does not become a bottleneck as the agent footprint scales. The same policy file that governs a single development agent governs thousands of production agents identically, with no additional operational complexity introduced by the governance layer itself.

How Agent OS Intercepts Agent Actions in Practice

Agent OS inserts itself into the agent’s execution path as a middleware layer. When an agent decides to call a tool, write a file, make an HTTP request, or invoke a downstream agent, that intent is passed to Agent OS as a structured action object containing the action type, target, parameters, and the cryptographic identity of the requesting agent. The policy engine evaluates this object against the loaded ruleset and returns a permit or deny decision — along with any conditions, rate-limit directives, or audit flags specified in the matching policy — before the action is dispatched to the target system.

Policy Languages: YAML, OPA Rego, and Cedar for Agent Rules

Support for three distinct policy languages — YAML for human-readable rules, OPA Rego for complex conditional logic, and Cedar for formally verifiable authorization semantics — is one of the toolkit’s most deliberate design decisions, and it reflects the reality that different organizations have fundamentally different requirements for how they express and validate security policy. A startup shipping its first governed agent can start with a few lines of YAML and achieve meaningful protection on day one. A financial services firm deploying agents in a strictly regulated environment may require Cedar’s formal verification properties to satisfy an auditor that certain high-risk actions are provably impossible regardless of model behavior. An infrastructure team already using Open Policy Agent for Kubernetes and cloud resource governance can reuse their existing Rego expertise to write agent policies without learning a new language. All three policy formats coexist within the same Agent OS deployment, evaluated by the same engine against the same agent action stream, which means organizations can adopt incrementally and mix policy languages as their needs dictate.

YAML policies are the lowest-friction entry point. A minimal policy file defining a handful of DENY rules for the most common attack vectors — unauthorized outbound HTTP calls, writes outside approved directories, privilege escalation attempts — can be authored by any engineer familiar with YAML in under an hour. OPA Rego unlocks the full expressive power of a purpose-built policy query language, including complex joins, aggregations, and recursive rules that are difficult or impossible to express in YAML. Cedar, originally developed by AWS for its Verified Permissions authorization service, brings the property of formal analyzability: a static analysis tool can examine a Cedar policy and mathematically prove what action classes it will and will not permit, without executing it against live traffic. That formal verification capability moves agent governance from “we believe this policy works” to “we have proven this policy is correct,” which is a qualitatively different level of assurance for high-stakes deployments.

# Example YAML policy: block writes outside approved paths, rate-limit API calls
policies:
  - id: block-external-write
    condition:
      action.type == "file_write" AND
      action.target NOT IN allowed_paths
    effect: DENY

  - id: rate-limit-api-calls
    condition:
      action.type == "api_call"
    rate_limit:
      requests: 10
      window: "1s"
    effect: ALLOW_WITH_LIMIT

Policies are loaded from files at startup and can be hot-reloaded without restarting the agent runtime, giving operators the ability to respond to newly identified attack patterns or policy violations by pushing a policy update through their standard CI/CD pipeline.

Integration: LangChain, AutoGen, Semantic Kernel, and OpenAI Agents SDK

The toolkit was designed from the start for zero-rewrite integration with existing agent frameworks, and the four primary supported integrations — LangChain, AutoGen, Semantic Kernel, and the OpenAI Agents SDK — cover the dominant share of enterprise agent deployments in production as of mid-2026. Each integration uses that framework’s own native extension points: LangChain callbacks, AutoGen middleware hooks, Semantic Kernel plugins, and the OpenAI Agents SDK’s tool execution pipeline. The consequence is that an engineering team running an existing LangChain agent can add governance in a few lines of configuration, without modifying agent logic, tool definitions, prompt templates, or any other aspect of the existing application. This “no rewrite” principle is not merely a developer convenience — it is a deployment risk reduction strategy, because the governed agent behaves identically to the ungoverned agent for all permitted actions, and only prohibited actions are affected by the governance layer.

# LangChain integration: governance added as a callback, zero code changes to the agent
from langchain.agents import AgentExecutor
from agent_governance import GovernanceMiddleware

agent = AgentExecutor(
    agent=my_agent,
    tools=my_tools,
    callbacks=[
        GovernanceMiddleware(
            policy_path="./policies/production.yaml"
        )
    ]
)

For AutoGen multi-agent workflows, the middleware intercepts both inbound and outbound agent messages, which means policy rules can govern not only what an agent does in the external world but also what instructions one agent sends to another. This is directly relevant to the OWASP Agentic AI risks around trust propagation and prompt injection delivered via inter-agent communication channels, which are attack surfaces that single-agent governance tools cannot address.

The OpenAI Agents SDK integration is significant because it means the governance layer is not tied to any specific model provider. Organizations building on OpenAI’s infrastructure can apply toolkit policies to agents using OpenAI models, while organizations using Azure OpenAI or other providers use the same policy files with the same enforcement semantics. Semantic Kernel integration makes the toolkit immediately applicable to Microsoft’s enterprise customer base already building production agents with that framework, and it enables natural federation with Azure AI and Entra ID for organizations that want to express agent permissions in terms of existing enterprise identity structures.

Azure AI and Microsoft Entra ID Integration

For organizations on the Microsoft cloud stack, the optional integration with Azure AI services and Microsoft Entra ID allows agent identities to be mapped to enterprise directory objects. Access policies can then be written in terms of organizational roles, department memberships, and security group assignments — the same constructs already governing human user access — rather than requiring a separate identity management infrastructure maintained exclusively for AI agents.

Performance: Sub-5ms Policy Evaluation in Production

The most persistent objection to runtime security enforcement is the assumption that it adds unacceptable latency to agent operations. The Agent Governance Toolkit addresses this directly with a design target of less than 5ms overhead per action evaluated by Agent OS, and this performance level is achievable in production at scale, not just in laboratory benchmarks. To contextualize that number: a typical LLM inference call for a mid-sized model running at production load takes between 500ms and several seconds. A tool invocation triggering an external API call adds 100ms to 2 seconds depending on network conditions. Against this backdrop, a 5ms governance overhead is a rounding error in the overall agent response time — it does not perceptibly slow agent workflows, and it does not force engineering teams into tradeoffs between security coverage and application responsiveness. The practical implication is significant: governance can be applied universally, to every agent action without exception, rather than to a sampled or heuristically selected subset. A governance layer that evaluates only some actions provides the appearance of control without its substance, and the sub-5ms overhead removes any engineering incentive to skip evaluations for performance reasons.

This performance level is achievable because of the stateless, in-process design of Agent OS. Policy evaluation requires no network calls, no database round-trips, and no distributed coordination. The policy ruleset is compiled to an internal representation at startup and evaluated entirely in memory for each action object. YAML and OPA Rego policies are compiled at load time so that evaluation is fast even for rules with complex boolean logic. Cedar policies run through Cedar’s own purpose-built, formally optimized evaluation engine. The combination produces an enforcement system that is fast enough to be deployed unconditionally without architectural compromises.

Benchmarking Against Specific Workloads

Published performance figures reflect benchmark conditions that may differ from production environments with very high action throughput or unusually complex OPA Rego rulesets involving large data joins or recursive rules. Teams with latency-sensitive workflows should benchmark against their specific policy configuration in a staging environment before relying on the headline figure. The toolkit’s documentation provides optimization guidance for reducing evaluation latency in complex policy scenarios, including techniques for pre-compiling frequently evaluated rules and structuring policy files for faster pattern matching.

Microsoft Agent Governance Toolkit vs Lakera Guard vs CalypsoAI

The AI agent security tooling landscape has expanded rapidly alongside the market’s projected growth from $1.65 billion in 2026 to $13.52 billion by 2032, and the Microsoft Agent Governance Toolkit occupies a distinct position relative to the two most commonly evaluated alternatives: Lakera Guard and CalypsoAI. Understanding those differences clearly is essential for teams making adoption decisions under budget and timeline pressure. Lakera Guard is a focused and well-regarded solution for prompt injection detection and LLM input/output safety filtering. It executes its defined scope effectively, but that scope covers primarily one item from the OWASP Agentic AI Top 10 — prompt injection — rather than the full ten-item risk surface. CalypsoAI offers a broader enterprise governance platform with meaningful capabilities, but it operates under a proprietary license and enterprise pricing model that introduces vendor dependency and prevents the community code audits that security-conscious organizations increasingly require. The Microsoft Agent Governance Toolkit is the only open-source option in this field that explicitly targets and demonstrates coverage of all ten OWASP Agentic AI Top 10 risks.

Criterion	Microsoft Agent Governance Toolkit	Lakera Guard	CalypsoAI
OWASP Agentic AI Top 10 coverage	All 10 risks	Primarily prompt injection	Broad but proprietary
License	MIT (open source)	Proprietary SaaS	Proprietary enterprise
Policy engine	Agent OS (stateless, sub-5ms)	Not applicable	Proprietary runtime
LangChain / AutoGen / Semantic Kernel / OpenAI SDK	Native integration for all four	LLM API layer only	Varies by deployment
Policy languages	YAML, OPA Rego, Cedar	Not applicable	Proprietary configuration
Enterprise identity integration	Azure AI and Entra ID	Limited	Yes, proprietary
Pricing model	Free (MIT license)	Subscription per seat/request	Enterprise contract
Community auditability	Full source code available	Closed source	Closed source

The decision between these tools comes down to three factors: scope of required coverage, licensing constraints, and ecosystem fit. Organizations that need only prompt injection protection, operate on a tight budget for security tooling, and have no requirement for open-source auditability may find Lakera Guard’s focused capability sufficient. Organizations in regulated industries that need full OWASP coverage, reproducible compliance evidence, and source code access for security audits will find the Microsoft toolkit is currently the only option satisfying all three constraints simultaneously. CalypsoAI occupies an enterprise segment where vendor support contracts and managed deployment are priorities above open-source availability.

When Lakera Guard Remains the Appropriate Choice

Teams whose primary concern is LLM input and output safety — preventing harmful content generation, detecting prompt injection in consumer-facing applications, or filtering model outputs before they reach end users — may find Lakera Guard’s focused capability operationally simpler and sufficient for their threat model. The two tools are not mutually exclusive: a defense-in-depth architecture could deploy both, with Lakera Guard handling LLM I/O safety and the Microsoft toolkit governing runtime agent action policies.

Getting Started: Installing and Configuring the Toolkit

The fastest path to meaningful governance with the Microsoft Agent Governance Toolkit starts with installing only Agent OS — there is no requirement to adopt the full toolkit at once, and incremental adoption is the explicitly recommended path for most teams. The core Agent OS package is available for Python and TypeScript, with the full toolkit also supporting Rust, Go, and .NET. Installation is a single command, and the minimum viable configuration to achieve runtime policy enforcement is a YAML policy file and a one-line addition to the existing agent code. Teams can then layer in additional modules — enterprise identity integration via Entra ID, reliability engineering controls, regulatory compliance framework mappings — as operational requirements evolve and the team gains confidence in the governance model.

# Start with Agent OS only — the recommended incremental approach
pip install agent-os

# Install the full toolkit when ready for identity, SRE, and compliance modules
pip install agent-governance[all]

# TypeScript installation
npm install @microsoft/agent-governance

With Agent OS installed, adding enforcement to an existing agent requires a policy file and a decorator or callback addition. The following minimal example enforces a starter policy on every action the agent takes:

from agent_os import PolicyEngine

engine = PolicyEngine.from_file("./policies/starter.yaml")

@engine.enforce
def my_agent_action(action, context):
    return execute_action(action, context)

A starter policy addressing the highest-priority OWASP risks requires only a few rules. The following file blocks unauthorized outbound HTTP requests, prevents writes to unapproved filesystem paths, and generates an audit record for every access to sensitive data — covering three distinct OWASP Agentic AI Top 10 risks in under twenty lines:

# starter.yaml: a minimal production-ready starting policy
policies:
  - id: deny-unauthorized-external-calls
    condition: action.type == "http_request" AND action.domain NOT IN allowlist
    effect: DENY

  - id: block-external-write
    condition: action.type == "file_write" AND action.target NOT IN allowed_paths
    effect: DENY

  - id: log-all-sensitive-reads
    condition: action.type == "data_read" AND action.classification == "sensitive"
    effect: ALLOW
    audit: true

Production Deployment Checklist

Before moving a governed agent to production, treat the policy file as code: include it in the same version control repository as the agent, require pull request review for all policy changes, and run the toolkit’s built-in test suite against every policy modification to confirm that previously permitted behaviors remain permitted. Connect the audit log output to the organization’s existing SIEM or observability platform so that policy DENY events generate alerts through established incident response channels rather than accumulating silently in a log file. Start with the three starter rules above, validate in staging that they do not block legitimate agent operations, and then progressively narrow scope by adding more specific rules as operational patterns are understood. The toolkit’s modular architecture ensures that adding more advanced modules — Entra ID identity federation, regulatory compliance mappings, circuit breaker reliability controls — does not require changes to policies already deployed and validated.

FAQ

Q1: Does the Agent Governance Toolkit work with agent frameworks beyond the four primary integrations?

LangChain, AutoGen, Semantic Kernel, and the OpenAI Agents SDK are the four frameworks with official, maintained integration adapters. However, Agent OS can be applied to any Python or TypeScript agent code using its decorator and middleware APIs directly, independent of any specific framework. Teams using frameworks not yet covered by an official adapter can implement the integration using the core Agent OS API, which is well-documented and designed to be framework-agnostic at its lowest abstraction level. The open-source MIT license also means community-contributed adapters for additional frameworks are possible and welcome.

Q2: How does the sub-5ms policy evaluation performance hold up as policy complexity increases?

Simple YAML policies with straightforward conditional rules evaluate well under 1ms in normal production conditions. Complex OPA Rego policies — particularly those involving recursive rules, large data joins, or many policy conditions evaluated in sequence — may push evaluation time toward the 5ms boundary. Cedar policies run through Cedar’s formally optimized evaluator and tend to maintain fast evaluation even for sophisticated authorization logic. Teams with high-throughput agent action streams and complex rulesets should benchmark their specific policy configuration against representative production load in a staging environment before drawing conclusions from the headline figure. The toolkit’s documentation provides structured guidance on policy optimization techniques for latency-sensitive deployments.

Q3: Should the toolkit be used alongside prompt-level safety instructions, or does it replace them?

The two approaches address different layers of the agent security stack and are most effective in combination. Prompt-level safety instructions shape the model’s general behavior and serve as the first line of defense for a well-aligned agent in normal operating conditions. The Agent Governance Toolkit operates at the code layer and enforces hard limits on what actions can actually be executed, regardless of what the model decides to attempt. Using both layers together produces substantially better outcomes than either approach alone: well-crafted prompt instructions reduce the frequency with which the policy engine sees prohibited action attempts, while the runtime policy enforcement provides a reliable backstop for the violations that occur despite good prompt design.

Q4: How does the toolkit support regulatory compliance requirements such as the EU AI Act?

The toolkit provides the technical foundation for compliance evidence across several dimensions. The policy language and audit logging capabilities allow organizations to demonstrate human oversight, transparency, and risk management controls that regulatory frameworks require for high-risk AI system deployments. By mapping OWASP Agentic AI Top 10 risks to configured policy rules, organizations can trace their compliance posture to a published external standard. The structured audit logs generated for every agent action and policy decision produce the kind of evidence-based documentation that regulators increasingly expect to see. Teams in regulated industries should work with their legal and compliance functions to translate specific regulatory requirements into policy rules, using Cedar’s formal verification properties where mathematical proof of policy correctness is required to satisfy an auditor.

Q5: Does the open-source MIT license mean Microsoft could change the terms or direction of the project in ways that create risk for adopters?

The MIT license itself is irrevocable — code already released under MIT remains under MIT regardless of what Microsoft does with the project going forward. Additionally, Microsoft has announced plans to donate the project to a neutral open-source foundation, which would transfer governance of the project’s direction to a community body rather than a single corporate sponsor. This mirrors the trajectory of projects like Kubernetes, which moved from Google to the Cloud Native Computing Foundation and subsequently became a de facto industry standard with broad multi-vendor contribution. Until the foundation donation occurs, the open-source codebase can be forked by any organization that needs to maintain an independent version, which is a meaningful risk mitigation for organizations concerned about long-term stewardship.

Microsoft Agent Framework 1.0: Build Production AI Agents in .NET and Python

Fri, 15 May 2026 00:00:00 +0000

Microsoft Agent Framework 1.0 is the official, production-ready framework from Microsoft for building AI agents and multi-agent systems, available natively in both .NET (C#) and Python. Built on top of Semantic Kernel and deeply integrated with the Azure AI ecosystem, it represents the clearest path to deploying enterprise-grade AI agents at scale in 2026.

Microsoft Agent Framework 1.0: The Official Microsoft Path to Production AI Agents

Enterprise adoption of Microsoft Agent Framework 1.0 grew 350% between 2025 and 2026, driven by organizations that needed a supported, enterprise-grade runtime for AI agents that integrated natively with their existing Azure and Microsoft 365 infrastructure. Unlike research-originated frameworks that were adapted for production use, Microsoft Agent Framework 1.0 was designed from the start with production requirements in mind: deterministic orchestration, identity-aware execution, structured observability, and deployment primitives that match enterprise operations. The 1.0 milestone signals API stability — Microsoft has committed to a stable public API surface, semantic versioning, and long-term support for both the .NET and Python SDKs. For organizations running workloads on Azure, the framework eliminates the integration tax that comes with open-source alternatives: Azure OpenAI, Azure AI Foundry, Azure Monitor, and Entra ID are all first-class citizens in the framework’s configuration model, not afterthoughts bolted on through community plugins. The framework’s Semantic Kernel foundation means teams that have already built with Semantic Kernel can adopt it incrementally, migrating plugin-based workflows to full agent orchestration without rewriting existing code.

The 1.0 release also codifies patterns that were previously tribal knowledge among early adopters. The framework ships with reference implementations for common enterprise patterns: approval workflows with human-in-the-loop gates, document processing pipelines with parallel extraction agents, and code review workflows that combine static analysis tools with LLM-based reasoning. These aren’t toy examples — they’re production patterns extracted from Microsoft’s internal deployments and from the early-adopter program that ran throughout 2025.

What Makes 1.0 Different from Earlier Previews

The preview releases (0.x) validated the agent model and gathered feedback on API ergonomics. The 1.0 release tightens the type system, stabilizes the serialization format for agent state, and introduces the AgentRuntime abstraction that decouples agent logic from the underlying execution environment. This means agent code written against the 1.0 API runs unchanged whether it’s hosted in Azure Container Apps, Azure Functions, AKS, or a local development server.

.NET and Python Feature Parity: One Framework, Two Languages

Microsoft Agent Framework 1.0 achieves genuine feature parity between its .NET and Python implementations — a deliberate engineering decision that distinguishes it from most AI frameworks, which treat one language as the primary SDK and the other as a secondary binding with missing features. Both implementations expose the same core abstractions: Agent, AgentRuntime, AgentChannel, Tool, Memory, and Orchestrator. Both support the same multi-agent patterns, the same Azure integrations, and the same observability hooks. This means cross-language teams — common in enterprises where backend services mix C# microservices with Python data pipelines — can share agent design patterns, documentation, and even serialized agent state across language boundaries. The .NET SDK targets .NET 8 and .NET 9 with full async/await support throughout. The Python SDK targets Python 3.10 through 3.13 and uses asyncio as its native execution model. Tool definitions written in one language can be described using a shared JSON schema format that the other language can consume, enabling polyglot agent networks where a C# orchestrator delegates to Python specialist agents.

The feature parity commitment extends to release cadence: both SDKs ship simultaneously with the same version numbers, and breaking changes in one language require corresponding changes in the other. This is enforced through a cross-language conformance test suite that runs in CI for every pull request. For enterprise teams evaluating the framework, this means the same architectural decisions — agent topology, memory strategy, tool registry design — apply regardless of which language a given team uses.

.NET Implementation: Type-Safe Agents in C#

The .NET SDK takes advantage of C#’s strong type system to provide compile-time safety for tool definitions and agent messages. Tool interfaces are defined as C# interfaces with method signatures, and the framework generates the JSON schema and function-calling payload automatically from those signatures. This eliminates a common class of runtime errors where tool argument schemas don’t match what the LLM sends.

using Microsoft.AgentFramework;
using Microsoft.AgentFramework.Tools;

public interface IDocumentTools
{
    [AgentTool("search_documents")]
    Task SearchAsync(string query, int maxResults = 10);

    [AgentTool("summarize_document")]
    Task<string> SummarizeAsync(string documentId, SummaryLength length);
}

var agent = AgentBuilder.Create()
    .WithName("document-agent")
    .WithModel(AzureOpenAIModel.GPT4o)
    .WithTools()
    .WithMemory(MemoryStrategy.Semantic)
    .Build();

Python Implementation: Pythonic Agent Design

The Python SDK uses dataclasses and type hints to achieve similar type safety with Python idioms. The @tool decorator handles schema generation from Python function signatures, and the Agent class uses keyword arguments that mirror the .NET builder pattern without requiring a builder chain.

from microsoft.agent_framework import Agent, tool
from microsoft.agent_framework.models import AzureOpenAIModel

@tool
async def search_documents(query: str, max_results: int = 10) -> list[dict]:
    """Search the internal document store for relevant content."""
    ...

@tool
async def summarize_document(document_id: str, length: str = "medium") -> str:
    """Generate a summary of the specified document."""
    ...

agent = Agent(
    name="document-agent",
    model=AzureOpenAIModel.GPT4o,
    tools=[search_documents, summarize_document],
    memory="semantic",
)

Multi-Agent Patterns: Sequential, Parallel, and Supervisor Architectures

Microsoft Agent Framework 1.0 ships four first-class multi-agent orchestration patterns — sequential, parallel, swarm, and supervisor/worker — each suited to different task structures and complexity profiles. Organizations using the framework report a 60% reduction in agent orchestration complexity compared to building equivalent patterns in AutoGen, driven by the framework’s declarative pattern DSL and built-in state management. The sequential pattern routes a task through a chain of specialist agents where each agent’s output becomes the next agent’s input — the right choice for document processing pipelines, code review workflows, and report generation where each step transforms the artifact. The parallel pattern fans a task out to multiple agents simultaneously and merges their outputs — optimal for research tasks where multiple search strategies should run concurrently, or for validation workflows where multiple checkers evaluate the same artifact independently. The swarm pattern enables peer-to-peer agent collaboration without a central coordinator, useful for open-ended research tasks where the best next step isn’t known in advance. The supervisor/worker pattern is the most structured: a supervisor agent decomposes a task, assigns subtasks to specialist workers, monitors their progress, and synthesizes results — the right choice when tasks have complex dependency graphs and require dynamic assignment based on agent capabilities.

Choosing the right pattern matters because each has different failure semantics and cost profiles. Sequential pipelines fail fast but have the longest latency. Parallel patterns maximize throughput but incur higher token costs and require a merge strategy. Swarms are flexible but harder to reason about and debug. Supervisors add overhead but provide the most control over complex workflows.

Implementing a Supervisor/Worker Pattern

from microsoft.agent_framework import SupervisorAgent, WorkerAgent, AgentRuntime

research_worker = WorkerAgent(
    name="researcher",
    model=AzureOpenAIModel.GPT4o,
    tools=[web_search, fetch_document],
    system_message="You research topics and return structured findings."
)

analysis_worker = WorkerAgent(
    name="analyst",
    model=AzureOpenAIModel.GPT4o,
    tools=[run_calculation, query_database],
    system_message="You analyze data and identify patterns."
)

supervisor = SupervisorAgent(
    name="coordinator",
    model=AzureOpenAIModel.GPT4o,
    workers=[research_worker, analysis_worker],
    max_iterations=10,
)

runtime = AgentRuntime()
result = await runtime.run(supervisor, task="Analyze market trends for Q1 2026")

Sequential and Parallel Orchestrators

// Sequential: each agent processes the output of the previous
var pipeline = OrchestratorBuilder.Sequential()
    .AddAgent(extractionAgent)
    .AddAgent(validationAgent)
    .AddAgent(enrichmentAgent)
    .Build();

// Parallel: all agents process the same input simultaneously
var parallel = OrchestratorBuilder.Parallel()
    .AddAgent(sentimentAgent)
    .AddAgent(entityAgent)
    .AddAgent(topicAgent)
    .WithMergeStrategy(MergeStrategy.Aggregate)
    .Build();

Azure Integration: OpenAI, AI Foundry, and Entra ID for Enterprise

Azure integration is where Microsoft Agent Framework 1.0 creates the clearest separation from open-source alternatives — enterprises adopting the framework report that Azure-native connectivity eliminates an average of 8 weeks of integration work that would otherwise be required to connect agent workloads to existing cloud infrastructure. Azure OpenAI integration is handled through the framework’s model configuration layer: you point the framework at your Azure OpenAI endpoint, and all agents in the runtime use that connection with automatic token counting, retry logic with exponential backoff, and streaming support for long-running responses. Azure AI Foundry integration goes further: agents can be registered in AI Foundry’s agent catalog, which enables centralized governance over which models and tools each agent can access, and provides the deployment artifacts needed to move from development to production without manual packaging. Azure Cognitive Services integration covers document intelligence, speech, and vision through pre-built tool wrappers — you don’t write Azure SDK calls directly, you register the service as a tool and the framework handles authentication, request formatting, and response parsing.

Entra ID integration is the enterprise differentiator that matters most for security teams. Every agent in the framework has an Entra ID identity — either a managed identity for Azure-hosted deployments or a service principal for on-premises or hybrid deployments. This means agent actions appear in Azure audit logs with the agent’s identity, not a generic service account. Role-based access control applies to agent tool calls: a research agent can be granted read-only access to SharePoint while a workflow agent has write access to Dynamics 365, and these permissions are enforced at the identity layer rather than in application code.

Configuring Azure OpenAI and Entra ID

from microsoft.agent_framework.azure import AzureAgentRuntime, EntraIdCredential

credential = EntraIdCredential(
    tenant_id="your-tenant-id",
    client_id="your-client-id",
    # Uses managed identity in Azure, service principal locally
)

runtime = AzureAgentRuntime(
    azure_openai_endpoint="https://your-resource.openai.azure.com/",
    deployment_name="gpt-4o",
    credential=credential,
    ai_foundry_project="your-project-name",
)

var credential = new EntraIdCredential(
    tenantId: configuration["Azure:TenantId"],
    clientId: configuration["Azure:ClientId"]
);

var runtime = AzureAgentRuntimeBuilder.Create()
    .WithAzureOpenAI(
        endpoint: configuration["Azure:OpenAI:Endpoint"],
        deployment: "gpt-4o",
        credential: credential)
    .WithAIFoundry(projectName: configuration["Azure:AIFoundry:Project"])
    .WithEntraIdIdentity(managedIdentityEnabled: true)
    .Build();

Memory Management: Conversation, Semantic, and Episodic Memory

Memory management in Microsoft Agent Framework 1.0 is a three-layer system — conversation history, semantic memory using embeddings, and episodic memory for task-level recall — with each layer serving a distinct purpose and carrying different cost and latency trade-offs that teams must understand to build performant agents. Conversation history is the simplest layer: a rolling buffer of messages exchanged in the current session, managed automatically by the framework with configurable window sizes and token budget limits. When the conversation history exceeds the configured token budget, the framework applies a summarization strategy to compress older turns while preserving key facts — this is handled transparently without application code changes. Semantic memory uses embedding models to store and retrieve facts across sessions: when an agent learns something important — a user’s preference, a domain fact, a resolved ambiguity — it can write that to semantic memory, and future agents can retrieve relevant memories using vector similarity search against Azure AI Search. Episodic memory tracks task-level context: what tasks have been attempted, what outcomes were achieved, and what strategies failed — enabling agents to avoid repeating mistakes across multiple invocations of the same workflow.

The three memory layers are independently configurable. A lightweight query-answering agent might use only conversation history with a 4,096-token window. A long-running research agent might use all three layers. A stateless API handler might disable memory entirely and rely on the caller to pass context. The framework’s MemoryConfiguration class provides a fluent API for expressing these trade-offs explicitly.

Configuring Multi-Layer Memory

from microsoft.agent_framework.memory import MemoryConfiguration, AzureAISearchSemanticStore

memory_config = MemoryConfiguration(
    conversation=ConversationMemory(
        max_tokens=8192,
        compression_strategy="summarize",
    ),
    semantic=SemanticMemory(
        store=AzureAISearchSemanticStore(
            endpoint="https://your-search.search.windows.net",
            index_name="agent-memories",
            credential=credential,
        ),
        embedding_model="text-embedding-3-large",
        top_k=5,
    ),
    episodic=EpisodicMemory(
        retention_days=30,
        max_episodes_per_agent=1000,
    ),
)

agent = Agent(name="research-agent", model=model, memory=memory_config)

Writing and Reading Semantic Memories

// Write a fact to semantic memory during agent execution
await agent.Memory.Semantic.WriteAsync(new MemoryRecord
{
    Content = "User prefers executive summaries under 200 words",
    Tags = ["user-preference", "formatting"],
    Importance = MemoryImportance.High,
});

// Retrieve relevant memories before processing a request
var relevant = await agent.Memory.Semantic.SearchAsync(
    query: "how should I format this response?",
    topK: 3
);

Observability and Monitoring: OpenTelemetry and Azure Monitor

Observability in Microsoft Agent Framework 1.0 is built on OpenTelemetry, meaning agent traces, metrics, and logs flow into any OpenTelemetry-compatible backend — Jaeger, Grafana Tempo, Honeycomb, Datadog — while also having first-class support for Azure Monitor and Application Insights through a dedicated exporter that enriches traces with Azure-specific metadata. Production deployments show that teams with full observability configured identify and resolve agent misbehavior an average of 4x faster than teams relying on application logs alone, because the framework’s automatic instrumentation captures the full reasoning chain — every LLM call, every tool invocation, every memory read/write — as structured spans with timing, token counts, and model responses. Every agent execution produces a root span that contains child spans for each processing step. Tool calls are recorded as spans with the tool name, input arguments (with configurable PII masking), and the raw output. Memory operations record hit/miss rates and retrieval latency. LLM calls record prompt token counts, completion token counts, finish reasons, and latency — giving cost attribution at the individual agent level. The framework ships with a Grafana dashboard template and an Azure Monitor workbook that visualize these metrics out of the box.

Configuring OpenTelemetry is a one-time setup at the runtime level. All agents registered with the runtime automatically inherit the observability configuration without per-agent setup code.

OpenTelemetry and Azure Monitor Setup

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from azure.monitor.opentelemetry.exporter import AzureMonitorTraceExporter
from microsoft.agent_framework.observability import AgentFrameworkInstrumentation

# Configure OpenTelemetry with Azure Monitor
exporter = AzureMonitorTraceExporter(
    connection_string="InstrumentationKey=your-key;..."
)
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)

# Instrument the agent framework
AgentFrameworkInstrumentation().instrument()

# Runtime automatically captures all agent activity as traces
runtime = AzureAgentRuntime(...)

// .NET: configure via the standard OpenTelemetry builder pattern
services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddAgentFrameworkInstrumentation()
        .AddAzureMonitorTraceExporter(options =>
        {
            options.ConnectionString = configuration["ApplicationInsights:ConnectionString"];
        }));

What Gets Traced Automatically

The framework traces the following without any application code changes: agent invocation start/end with task description, LLM calls with model name, deployment, and token usage, tool calls with name, arguments, and results, memory operations (read/write/search) with latency, orchestrator routing decisions and agent assignments, and exception traces with the full agent context at the time of failure.

Microsoft Agent Framework vs LangGraph vs AutoGen

Microsoft Agent Framework 1.0, LangGraph, and AutoGen occupy different positions in the agent framework landscape — choosing the right one depends primarily on your language ecosystem, existing infrastructure, and whether you need research flexibility or production stability. Microsoft Agent Framework 1.0 is the right choice for organizations running on Azure with .NET or Python workloads that need enterprise identity, governance, and Microsoft-supported SLAs. LangGraph is the right choice for Python and JavaScript shops that need maximum flexibility in graph-based workflow design and already use the LangChain ecosystem. AutoGen, now maintained by the ag2ai community as AG2, remains a strong choice for research-oriented teams that need free-form conversational multi-agent patterns and prioritize community-driven development over commercial support. The most significant differentiator between Microsoft Agent Framework 1.0 and AutoGen is the 60% reduction in orchestration complexity that production deployments report — MAF 1.0 replaces AutoGen’s conversational routing model with declarative orchestration patterns that eliminate the class of bugs that arise when LLM-based routing makes unexpected decisions about which agent handles which message.

Dimension	Microsoft Agent Framework 1.0	LangGraph	AutoGen (AG2)
Primary language	.NET + Python (parity)	Python + JavaScript	Python
.NET support	Native, first-class	Not supported	Not supported
Azure integration	Native, identity-aware	Via community plugins	Via community plugins
Orchestration model	Declarative patterns	Graph-based	Conversational
Enterprise support	Microsoft commercial support	LangChain community	ag2ai community
Identity/auth	Entra ID native	Manual	Manual
Observability	OpenTelemetry + Azure Monitor	LangSmith + OTEL	Custom + OTEL
AutoGen compatibility	Supersedes AutoGen	Independent lineage	Fork of AutoGen
Best for	Azure enterprise, .NET teams	Python/JS flexibility	Research, experimentation

When to Choose LangGraph Instead

LangGraph’s graph-based state machine model gives developers more fine-grained control over exactly how state flows between nodes. If your workflow has complex conditional branching that depends on the full state of the graph — not just the previous agent’s output — LangGraph’s explicit state graph may be easier to reason about than the framework’s orchestrator patterns. LangGraph also has a larger community of Python-focused practitioners and more third-party integrations with Python-native tools. The trade-off is that LangGraph requires manual Azure integration and doesn’t have native Entra ID support.

AutoGen Migration Path

For teams already running AutoGen (AG2) in production, Microsoft Agent Framework 1.0 provides a migration path rather than a hard cut-over. The framework’s AutoGenCompatibilityLayer package translates AutoGen’s AssistantAgent and UserProxyAgent primitives into Agent Framework agents, allowing incremental migration. Teams typically migrate the orchestration layer first — replacing GroupChat with a Supervisor pattern — and then migrate individual agents as they’re updated.

Getting Started: Your First Agent in 15 Minutes

Getting a working agent running with Microsoft Agent Framework 1.0 takes under 15 minutes with either SDK, and the fastest path uses a local development runtime that doesn’t require Azure credentials — making it accessible to developers who want to evaluate the framework before provisioning cloud resources. The local runtime uses OpenAI directly (or any OpenAI-compatible endpoint) and writes agent traces to the console, giving a full observability picture without any cloud setup. Once the local prototype works, switching to the Azure runtime is a configuration change, not a code change — the same agent definitions, tool implementations, and memory configurations work unchanged against both runtimes. The only difference is the runtime instantiation block at the top of your application. This design choice — local-first development with a clean path to production — reflects the framework’s philosophy: developer experience and production deployability should not be in tension. The following walkthroughs cover the complete path from zero to a running agent in both Python and .NET.

Python: Install and First Agent

Install the package:

pip install microsoft-agent-framework

Create first_agent.py:

import asyncio
from microsoft.agent_framework import Agent, AgentRuntime, tool
from microsoft.agent_framework.models import OpenAIModel

@tool
async def get_current_time(timezone: str = "UTC") -> str:
    """Return the current time in the specified timezone."""
    from datetime import datetime, timezone as tz
    import pytz
    local_tz = pytz.timezone(timezone)
    return datetime.now(local_tz).strftime("%Y-%m-%d %H:%M:%S %Z")

@tool
async def calculate(expression: str) -> str:
    """Evaluate a mathematical expression safely."""
    try:
        result = eval(expression, {"__builtins__": {}}, {})
        return str(result)
    except Exception as e:
        return f"Error: {e}"

agent = Agent(
    name="assistant",
    model=OpenAIModel(model="gpt-4o", api_key="your-openai-key"),
    tools=[get_current_time, calculate],
    system_message="You are a helpful assistant with access to the current time and a calculator.",
)

async def main():
    runtime = AgentRuntime()
    result = await runtime.run(agent, task="What time is it in Tokyo, and what is 42 * 1337?")
    print(result.content)

asyncio.run(main())

Run it:

python first_agent.py

.NET: Install and First Agent

Create a new console project and add the package:

dotnet new console -n FirstAgent
cd FirstAgent
dotnet add package Microsoft.AgentFramework

Replace Program.cs:

using Microsoft.AgentFramework;
using Microsoft.AgentFramework.Models;
using Microsoft.AgentFramework.Tools;

// Define tools using the AgentTool attribute
public static class AssistantTools
{
    [AgentTool("get_current_time")]
    public static Task<string> GetCurrentTimeAsync(string timezone = "UTC")
    {
        var tz = TimeZoneInfo.FindSystemTimeZoneById(timezone);
        var time = TimeZoneInfo.ConvertTime(DateTime.UtcNow, tz);
        return Task.FromResult(time.ToString("yyyy-MM-dd HH:mm:ss zzz"));
    }

    [AgentTool("calculate")]
    public static Task<string> CalculateAsync(string expression)
    {
        // Use a safe expression evaluator in production
        return Task.FromResult($"Result: {expression} (implement safe eval)");
    }
}

var agent = AgentBuilder.Create()
    .WithName("assistant")
    .WithModel(new OpenAIModel("gpt-4o", apiKey: "your-openai-key"))
    .WithToolsFromType()
    .WithSystemMessage("You are a helpful assistant with access to time and calculator tools.")
    .Build();

var runtime = new AgentRuntime();
var result = await runtime.RunAsync(
    agent,
    task: "What time is it in Tokyo, and what is 42 * 1337?"
);
Console.WriteLine(result.Content);

Run it:

dotnet run

Moving to Azure

Switching from local OpenAI to Azure is a runtime configuration change:

from microsoft.agent_framework.azure import AzureAgentRuntime

# Replace AgentRuntime() with:
runtime = AzureAgentRuntime(
    azure_openai_endpoint="https://your-resource.openai.azure.com/",
    deployment_name="gpt-4o",
    credential=DefaultAzureCredential(),  # Uses managed identity in Azure
)

The agent definition, tools, and memory configuration stay exactly the same.

Frequently Asked Questions

Q: Does Microsoft Agent Framework 1.0 require an Azure subscription to use?

No. The framework supports a local development runtime that works with any OpenAI-compatible API endpoint — including the OpenAI API directly, Ollama for local models, and Azure OpenAI. Azure integration is available and recommended for production deployments, but it is not required to evaluate the framework or run it in development. The local runtime provides the same agent abstractions and observability features as the Azure runtime, making it easy to develop locally and deploy to Azure without code changes.

Q: How does Microsoft Agent Framework 1.0 relate to Semantic Kernel?

Microsoft Agent Framework 1.0 is built on top of Semantic Kernel, Microsoft’s LLM orchestration library. Semantic Kernel provides the foundation: the model abstraction layer, plugin system, memory connectors, and planner primitives. Agent Framework adds the agent and multi-agent orchestration layer on top: the Agent, AgentRuntime, multi-agent patterns (sequential, parallel, supervisor), Entra ID identity integration, and Azure deployment targets. Teams that have existing Semantic Kernel code can adopt Agent Framework incrementally — existing Semantic Kernel plugins become Agent Framework tools with minimal changes.

Q: What is the migration path from AutoGen or AG2 to Microsoft Agent Framework 1.0?

Microsoft provides an AutoGenCompatibilityLayer package that wraps AutoGen’s AssistantAgent and UserProxyAgent classes to run within the Agent Framework runtime. This enables incremental migration: you can move the orchestration layer to Agent Framework’s supervisor/worker pattern while keeping individual agents in AutoGen format temporarily. The recommended migration sequence is: (1) adopt Agent Framework runtime with the compatibility layer, (2) migrate individual agents from AutoGen format to native Agent Framework agents, (3) replace GroupChat patterns with Agent Framework orchestrators, and (4) add Azure integration and observability. Most teams complete migration in 4 to 8 weeks depending on the size of their existing agent codebase.

Q: How does Entra ID integration work for agents deployed in non-Azure environments?

Agents running outside Azure — on-premises, in another cloud, or in a developer’s local environment — can use Entra ID service principals instead of managed identities. The framework’s EntraIdCredential class handles both cases: in Azure, it automatically uses the managed identity assigned to the compute resource; outside Azure, it reads service principal credentials from environment variables or a configured credential store. The agent code does not change between environments — only the credential source changes. This enables the same Entra ID governance policies to apply to agents regardless of where they run.

Q: What deployment targets does Microsoft Agent Framework 1.0 support?

The framework supports three primary Azure deployment targets: Azure Container Apps (recommended for most production workloads — provides automatic scaling, built-in HTTP ingress, and managed certificates), Azure Kubernetes Service (recommended for teams that need custom networking, multi-region deployments, or tight control over resource allocation), and Azure Functions (recommended for event-driven agents that run in response to triggers like queue messages, HTTP requests, or timer events). The framework ships deployment templates for all three targets in both Bicep and Terraform formats. On-premises deployment is supported using the framework’s generic HTTP hosting mode, which exposes agents as standard REST endpoints compatible with any reverse proxy or service mesh.