<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Persistent-Memory on RockB</title><link>https://baeseokjae.github.io/tags/persistent-memory/</link><description>Recent content in Persistent-Memory on RockB</description><image><title>RockB</title><url>https://baeseokjae.github.io/images/og-default.png</url><link>https://baeseokjae.github.io/images/og-default.png</link></image><generator>Hugo</generator><language>en-us</language><lastBuildDate>Thu, 07 May 2026 12:00:00 +0000</lastBuildDate><atom:link href="https://baeseokjae.github.io/tags/persistent-memory/index.xml" rel="self" type="application/rss+xml"/><item><title>Mem0 Guide 2026: Add Persistent Memory to Your AI Agents</title><link>https://baeseokjae.github.io/posts/mem0-agent-memory-guide-2026/</link><pubDate>Thu, 07 May 2026 12:00:00 +0000</pubDate><guid>https://baeseokjae.github.io/posts/mem0-agent-memory-guide-2026/</guid><description>Add persistent memory to AI agents with Mem0: architecture, quick start, memory scoping, LangChain integration, pricing, and production best practices.</description><content:encoded><![CDATA[<p>AI agents without persistent memory lose 80% of context between interactions — every session starts cold, the agent has no recollection of user preferences, past decisions, or accumulated knowledge, and users pay both in frustration and in token costs. Mem0 solves this with a managed memory layer that combines vector search, knowledge graph storage, and key-value caching into a single API. With ~48,000 GitHub stars, a $24M Series A closed in October 2025, and YC backing, Mem0 has become the default choice for teams that want to bolt production-grade memory onto an existing agent in under a day. This guide covers everything you need to go from zero to a memory-enabled agent: architecture internals, quick start code, memory scoping patterns, integration with LangChain and AutoGen, pricing tiers, and how Mem0 compares to Zep and LangGraph Store.</p>
<h2 id="what-is-mem0-and-why-ai-agents-need-persistent-memory">What Is Mem0 and Why AI Agents Need Persistent Memory</h2>
<p>Mem0 is an open-source, managed memory layer for AI agents that persists information across sessions, retrieves relevant context on demand, and updates stored facts adaptively without duplicating entries. AI agents without memory are stateless by design — each LLM call is independent, and anything learned in session one is gone by session two unless the developer manually reconstructs context. At small scale this is annoying; at production scale it is expensive. Persistent memory reduces token usage by 30–60% for repeated tasks by replacing verbose context reconstruction with targeted memory retrieval. The practical impact is measurable: a customer support bot that remembers a user&rsquo;s past tickets cuts average handle time, reduces escalations, and eliminates the user&rsquo;s need to repeat themselves. A coding agent that persists codebase conventions and architectural decisions stops recommending patterns the team has already explicitly rejected. Memory is not a nice-to-have — it is the difference between an agent that improves with use and one that stays perpetually ignorant of everything it has encountered before.</p>
<h2 id="mem0-architecture-vector-graph-and-key-value-combined">Mem0 Architecture: Vector, Graph, and Key-Value Combined</h2>
<p>Mem0&rsquo;s storage architecture is a deliberate hybrid that matches retrieval strategy to information type. Vector storage handles semantic memories — free-form facts, preferences, and conversational context — using embedding similarity to surface relevant entries when a query arrives. Graph storage handles structured entity relationships: &ldquo;Alice works at Acme Corp&rdquo; and &ldquo;Acme Corp uses AWS&rdquo; are two separate facts linked through a shared entity node, making multi-hop retrieval possible without complex query engineering. Key-value storage handles exact-match lookups — flags, configurations, and short factual fields that should be retrieved deterministically rather than semantically. Most memory systems force engineers to pick one of these strategies. Mem0 runs all three simultaneously and applies a routing layer that selects the appropriate backend per retrieval request. This architecture is why Mem0 handles the personalization memory problem well: it can retrieve &ldquo;Alice prefers concise Python answers&rdquo; semantically, look up &ldquo;Alice&rsquo;s subscription tier&rdquo; exactly, and traverse &ldquo;Alice&rsquo;s relationship to her team&rsquo;s codebase conventions&rdquo; through the graph — all in a single memory call. The storage backends are pluggable; the hosted Mem0 Cloud service uses Qdrant for vectors, Neo4j for graph, and Redis for key-value, but the open-source library supports alternative backends including Chroma, Weaviate, Pinecone, and PostgreSQL with pgvector.</p>
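<p>For teams running the open-source library, swapping backends is a config change rather than a code change. The sketch below shows the shape of that config, assuming mem0&rsquo;s <code>Memory.from_config()</code> entry point; the provider names and connection fields follow the documented config schema but can shift between versions, so verify the keys against the release you install:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">from mem0 import Memory

# Illustrative config: pgvector for vector search, Anthropic for the
# fact-extraction LLM. Verify field names against your mem0 version.
config = {
    "vector_store": {
        "provider": "pgvector",
        "config": {
            "dbname": "mem0",
            "user": "postgres",
            "password": "postgres",  # example credentials only
            "host": "localhost",
            "port": 5432,
        },
    },
    "llm": {
        "provider": "anthropic",
        "config": {"model": "claude-3-5-sonnet-20241022"},
    },
}

m = Memory.from_config(config)
m.add("Alice prefers concise answers", user_id="alice")
</code></pre></div>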
<h2 id="quick-start-add-mem0-to-your-ai-agent-in-10-minutes">Quick Start: Add Mem0 to Your AI Agent in 10 Minutes</h2>
<p>Install Mem0 with a single pip command and you have a working memory store in under ten minutes — the hosted cloud tier requires only an API key, no infrastructure setup. The open-source library runs fully locally with default backends configured out of the box. Here is the minimal pattern for adding, searching, and using memory in an agent:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>pip install mem0ai
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">from</span> mem0 <span style="color:#f92672">import</span> Memory
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>m <span style="color:#f92672">=</span> Memory()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Store a memory for a specific user</span>
</span></span><span style="display:flex;"><span>m<span style="color:#f92672">.</span>add(<span style="color:#e6db74">&#34;User prefers Python and concise answers&#34;</span>, user_id<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;alice&#34;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Store multiple facts in one call (list input)</span>
</span></span><span style="display:flex;"><span>m<span style="color:#f92672">.</span>add([
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;Alice is building a FastAPI backend&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;Alice&#39;s team uses PostgreSQL, not MySQL&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;Alice prefers async/await over threading&#34;</span>
</span></span><span style="display:flex;"><span>], user_id<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;alice&#34;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Retrieve relevant memories via semantic search</span>
</span></span><span style="display:flex;"><span>results <span style="color:#f92672">=</span> m<span style="color:#f92672">.</span>search(<span style="color:#e6db74">&#34;user coding preferences&#34;</span>, user_id<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;alice&#34;</span>)
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">for</span> r <span style="color:#f92672">in</span> results:
</span></span><span style="display:flex;"><span>    print(r[<span style="color:#e6db74">&#34;memory&#34;</span>], r[<span style="color:#e6db74">&#34;score&#34;</span>])
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Retrieve all memories for a user</span>
</span></span><span style="display:flex;"><span>memories <span style="color:#f92672">=</span> m<span style="color:#f92672">.</span>get_all(user_id<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;alice&#34;</span>)
</span></span><span style="display:flex;"><span>context <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;</span><span style="color:#ae81ff">\n</span><span style="color:#e6db74">&#34;</span><span style="color:#f92672">.</span>join([mem[<span style="color:#e6db74">&#34;memory&#34;</span>] <span style="color:#66d9ef">for</span> mem <span style="color:#f92672">in</span> memories])
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Inject memory context into an LLM prompt</span>
</span></span><span style="display:flex;"><span>system_prompt <span style="color:#f92672">=</span> <span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;&#34;&#34;You are a coding assistant.
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">Known facts about this user:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74"></span><span style="color:#e6db74">{</span>context<span style="color:#e6db74">}</span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">Answer concisely and use Python unless the user asks otherwise.&#34;&#34;&#34;</span>
</span></span></code></pre></div><p>For the Mem0 Cloud API (required for graph memory and production rate limits), swap in the <code>MemoryClient</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">from</span> mem0 <span style="color:#f92672">import</span> MemoryClient
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>client <span style="color:#f92672">=</span> MemoryClient(api_key<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;your-api-key&#34;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>client<span style="color:#f92672">.</span>add(<span style="color:#e6db74">&#34;User is migrating from Flask to FastAPI&#34;</span>, user_id<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;alice&#34;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>results <span style="color:#f92672">=</span> client<span style="color:#f92672">.</span>search(<span style="color:#e6db74">&#34;framework preferences&#34;</span>, user_id<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;alice&#34;</span>, limit<span style="color:#f92672">=</span><span style="color:#ae81ff">5</span>)
</span></span></code></pre></div><p>The <code>add()</code> call is where Mem0 does its real work: it runs the input through an LLM extraction pipeline to identify discrete facts, embeds them, checks for conflicts with existing memories, and either inserts new entries or updates existing ones. This extraction step is why Mem0 can accept raw conversation turns rather than pre-formatted facts.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#75715e"># Add a raw conversation turn — Mem0 extracts facts automatically</span>
</span></span><span style="display:flex;"><span>messages <span style="color:#f92672">=</span> [
</span></span><span style="display:flex;"><span>    {<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;user&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: <span style="color:#e6db74">&#34;I always use black for Python formatting and isort for imports&#34;</span>},
</span></span><span style="display:flex;"><span>    {<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;assistant&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: <span style="color:#e6db74">&#34;Got it, I&#39;ll follow black + isort conventions in all code I write for you.&#34;</span>}
</span></span><span style="display:flex;"><span>]
</span></span><span style="display:flex;"><span>m<span style="color:#f92672">.</span>add(messages, user_id<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;alice&#34;</span>)
</span></span></code></pre></div><h2 id="memory-scoping-user-session-and-agent-level-memory">Memory Scoping: User, Session, and Agent-Level Memory</h2>
<p>Mem0 supports three memory scopes — user-level, session-level, and agent-level — and the scope you choose determines both what gets shared and what gets isolated. User-level memory persists across all sessions for a given user ID: preferences, biographical facts, and long-term behavioral patterns belong here. Session-level memory is scoped to a single interaction window and is garbage-collected (or archived) when the session ends: transient context, in-progress task state, and decisions made during a specific conversation belong here. Agent-level memory is scoped to an agent ID independent of any user: shared knowledge about tools, external APIs, codebase conventions, and domain facts that all users of a given agent should benefit from belong here. Getting scoping wrong is a common source of bugs. If you store architectural decisions at session scope, the agent forgets them the next time it runs. If you store sensitive user data at agent scope, you create unintended data sharing across users. The right pattern for most production agents is a three-layer architecture: agent-level facts as a global knowledge base, user-level facts for personalization, and session-level facts for short-term task state.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">from</span> mem0 <span style="color:#f92672">import</span> Memory
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>m <span style="color:#f92672">=</span> Memory()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Agent-level: shared codebase knowledge</span>
</span></span><span style="display:flex;"><span>m<span style="color:#f92672">.</span>add(
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;This codebase uses domain-driven design with hexagonal architecture&#34;</span>,
</span></span><span style="display:flex;"><span>    agent_id<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;coding-assistant-v2&#34;</span>
</span></span><span style="display:flex;"><span>)
</span></span><span style="display:flex;"><span>m<span style="color:#f92672">.</span>add(
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;All database access must go through repository classes, never direct ORM queries in handlers&#34;</span>,
</span></span><span style="display:flex;"><span>    agent_id<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;coding-assistant-v2&#34;</span>
</span></span><span style="display:flex;"><span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># User-level: personal preferences</span>
</span></span><span style="display:flex;"><span>m<span style="color:#f92672">.</span>add(
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;Alice prefers verbose docstrings with type hints&#34;</span>,
</span></span><span style="display:flex;"><span>    user_id<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;alice&#34;</span>,
</span></span><span style="display:flex;"><span>    agent_id<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;coding-assistant-v2&#34;</span>
</span></span><span style="display:flex;"><span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Session-level: transient task context</span>
</span></span><span style="display:flex;"><span>m<span style="color:#f92672">.</span>add(
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;Currently refactoring the payment module — do not suggest changes to auth&#34;</span>,
</span></span><span style="display:flex;"><span>    user_id<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;alice&#34;</span>,
</span></span><span style="display:flex;"><span>    agent_id<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;coding-assistant-v2&#34;</span>,
</span></span><span style="display:flex;"><span>    run_id<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;session-20260507-payment-refactor&#34;</span>
</span></span><span style="display:flex;"><span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Retrieve combining user + agent context (scopes merge automatically)</span>
</span></span><span style="display:flex;"><span>results <span style="color:#f92672">=</span> m<span style="color:#f92672">.</span>search(
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;how should I structure this service class?&#34;</span>,
</span></span><span style="display:flex;"><span>    user_id<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;alice&#34;</span>,
</span></span><span style="display:flex;"><span>    agent_id<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;coding-assistant-v2&#34;</span>
</span></span><span style="display:flex;"><span>)
</span></span></code></pre></div><p>Session IDs can be any string; UUIDs, timestamps, and task-descriptive slugs all work. The <code>run_id</code> parameter maps to session scope internally.</p>
<h2 id="adaptive-memory-updates-how-mem0-avoids-memory-bloat">Adaptive Memory Updates: How Mem0 Avoids Memory Bloat</h2>
<p>Adaptive memory updates are Mem0&rsquo;s answer to the memory bloat problem that plagues naive vector-store approaches. When you call <code>m.add()</code>, Mem0 does not blindly append a new embedding to the store. Instead, it runs a three-step pipeline: extract discrete facts from the input text, check each fact against existing memories for semantic overlap, and then decide whether to insert a new entry, update an existing entry, or discard the input as a duplicate. This behavior is controlled by the underlying LLM (configurable — OpenAI, Anthropic, or local models all work) and produces a resolution event for each fact: <code>ADD</code>, <code>UPDATE</code>, <code>DELETE</code>, or <code>NONE</code>. The result is a memory store that stays compact and accurate rather than growing unbounded with redundant entries. A user who says &ldquo;I prefer Python&rdquo; in session one and &ldquo;use Python, not JavaScript&rdquo; in session five doesn&rsquo;t accumulate two conflicting entries — Mem0 detects the semantic overlap and updates the single canonical preference. This adaptive behavior is particularly important for long-running agents in customer support and personal assistant contexts, where the same user may interact hundreds of times. Without update semantics, memory stores grow linearly with sessions and retrieval quality degrades as noise accumulates. Mem0&rsquo;s adaptive pipeline keeps the store pruned and coherent.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#75715e"># Demonstrate adaptive update behavior</span>
</span></span><span style="display:flex;"><span>m <span style="color:#f92672">=</span> Memory()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>m<span style="color:#f92672">.</span>add(<span style="color:#e6db74">&#34;Alice prefers Python&#34;</span>, user_id<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;alice&#34;</span>)
</span></span><span style="display:flex;"><span>memories_before <span style="color:#f92672">=</span> m<span style="color:#f92672">.</span>get_all(user_id<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;alice&#34;</span>)
</span></span><span style="display:flex;"><span>print(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Memories before: </span><span style="color:#e6db74">{</span>len(memories_before)<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)  <span style="color:#75715e"># 1</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Adding a related fact triggers UPDATE, not INSERT</span>
</span></span><span style="display:flex;"><span>m<span style="color:#f92672">.</span>add(<span style="color:#e6db74">&#34;Alice only uses Python, never JavaScript or TypeScript&#34;</span>, user_id<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;alice&#34;</span>)
</span></span><span style="display:flex;"><span>memories_after <span style="color:#f92672">=</span> m<span style="color:#f92672">.</span>get_all(user_id<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;alice&#34;</span>)
</span></span><span style="display:flex;"><span>print(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Memories after: </span><span style="color:#e6db74">{</span>len(memories_after)<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)  <span style="color:#75715e"># Still 1, updated</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Inspect the memory history for a specific memory ID</span>
</span></span><span style="display:flex;"><span>memory_id <span style="color:#f92672">=</span> memories_after[<span style="color:#ae81ff">0</span>][<span style="color:#e6db74">&#34;id&#34;</span>]
</span></span><span style="display:flex;"><span>history <span style="color:#f92672">=</span> m<span style="color:#f92672">.</span>history(memory_id)
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">for</span> h <span style="color:#f92672">in</span> history:
</span></span><span style="display:flex;"><span>    print(h[<span style="color:#e6db74">&#34;event&#34;</span>], h[<span style="color:#e6db74">&#34;old_memory&#34;</span>], <span style="color:#e6db74">&#34;-&gt;&#34;</span>, h[<span style="color:#e6db74">&#34;new_memory&#34;</span>])
</span></span></code></pre></div><p>The <code>m.history()</code> call returns the full audit trail for any memory entry, which is important for debugging unexpected agent behavior and for compliance requirements in regulated environments.</p>
<h2 id="integrating-mem0-with-langchain-autogen-and-custom-agents">Integrating Mem0 with LangChain, AutoGen, and Custom Agents</h2>
<p>Mem0 ships first-party integrations for LangChain and AutoGen, and the REST API makes it straightforward to integrate with any custom agent architecture. Rather than relying on a framework wrapper, the pattern below wires Mem0 into a LangChain pipeline directly with <code>MemoryClient</code>: retrieve relevant memories, inject them into the prompt, and persist each exchange after the model responds. This is the same retrieve-inject-persist loop that memory integrations generally implement under the hood:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">from</span> mem0.integrations.langchain <span style="color:#f92672">import</span> ZepMemory  <span style="color:#75715e"># LangChain-compatible wrapper</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> langchain_openai <span style="color:#f92672">import</span> ChatOpenAI
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> langchain.chains <span style="color:#f92672">import</span> ConversationChain
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Mem0 as a LangChain message history backend</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> mem0 <span style="color:#f92672">import</span> MemoryClient
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> langchain.memory <span style="color:#f92672">import</span> ConversationBufferMemory
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>client <span style="color:#f92672">=</span> MemoryClient(api_key<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;your-api-key&#34;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">get_memory_context</span>(user_id: str, query: str) <span style="color:#f92672">-&gt;</span> str:
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;&#34;&#34;Retrieve relevant memories as a formatted string for LangChain prompts.&#34;&#34;&#34;</span>
</span></span><span style="display:flex;"><span>    results <span style="color:#f92672">=</span> client<span style="color:#f92672">.</span>search(query, user_id<span style="color:#f92672">=</span>user_id, limit<span style="color:#f92672">=</span><span style="color:#ae81ff">10</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">if</span> <span style="color:#f92672">not</span> results:
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> <span style="color:#e6db74">&#34;No prior context.&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> <span style="color:#e6db74">&#34;</span><span style="color:#ae81ff">\n</span><span style="color:#e6db74">&#34;</span><span style="color:#f92672">.</span>join([<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;- </span><span style="color:#e6db74">{</span>r[<span style="color:#e6db74">&#39;memory&#39;</span>]<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#66d9ef">for</span> r <span style="color:#f92672">in</span> results])
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>llm <span style="color:#f92672">=</span> ChatOpenAI(model<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;gpt-4o&#34;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">run_agent_turn</span>(user_id: str, user_message: str) <span style="color:#f92672">-&gt;</span> str:
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Pull relevant memory</span>
</span></span><span style="display:flex;"><span>    memory_context <span style="color:#f92672">=</span> get_memory_context(user_id, user_message)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Build prompt with memory context injected</span>
</span></span><span style="display:flex;"><span>    messages <span style="color:#f92672">=</span> [
</span></span><span style="display:flex;"><span>        (<span style="color:#e6db74">&#34;system&#34;</span>, <span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;You are a helpful assistant.</span><span style="color:#ae81ff">\n\n</span><span style="color:#e6db74">What you know about this user:</span><span style="color:#ae81ff">\n</span><span style="color:#e6db74">{</span>memory_context<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>),
</span></span><span style="display:flex;"><span>        (<span style="color:#e6db74">&#34;human&#34;</span>, user_message)
</span></span><span style="display:flex;"><span>    ]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    response <span style="color:#f92672">=</span> llm<span style="color:#f92672">.</span>invoke(messages)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Persist this interaction to memory</span>
</span></span><span style="display:flex;"><span>    client<span style="color:#f92672">.</span>add([
</span></span><span style="display:flex;"><span>        {<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;user&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: user_message},
</span></span><span style="display:flex;"><span>        {<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;assistant&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: response<span style="color:#f92672">.</span>content}
</span></span><span style="display:flex;"><span>    ], user_id<span style="color:#f92672">=</span>user_id)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> response<span style="color:#f92672">.</span>content
</span></span></code></pre></div><p>For AutoGen (AG2), Mem0 integrates as a custom <code>ConversableAgent</code> that intercepts message history and injects retrieved memories into the system prompt:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">from</span> autogen <span style="color:#f92672">import</span> ConversableAgent
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> mem0 <span style="color:#f92672">import</span> MemoryClient
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>client <span style="color:#f92672">=</span> MemoryClient(api_key<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;your-api-key&#34;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">class</span> <span style="color:#a6e22e">Mem0Agent</span>(ConversableAgent):
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">def</span> __init__(self, user_id: str, <span style="color:#f92672">*</span>args, <span style="color:#f92672">**</span>kwargs):
</span></span><span style="display:flex;"><span>        self<span style="color:#f92672">.</span>mem0_user_id <span style="color:#f92672">=</span> user_id
</span></span><span style="display:flex;"><span>        super()<span style="color:#f92672">.</span>__init__(<span style="color:#f92672">*</span>args, <span style="color:#f92672">**</span>kwargs)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">generate_reply</span>(self, messages, sender<span style="color:#f92672">=</span><span style="color:#66d9ef">None</span>, <span style="color:#f92672">**</span>kwargs):
</span></span><span style="display:flex;"><span>        <span style="color:#75715e"># Inject memory context before generating reply</span>
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">if</span> messages:
</span></span><span style="display:flex;"><span>            last_user_msg <span style="color:#f92672">=</span> messages[<span style="color:#f92672">-</span><span style="color:#ae81ff">1</span>]<span style="color:#f92672">.</span>get(<span style="color:#e6db74">&#34;content&#34;</span>, <span style="color:#e6db74">&#34;&#34;</span>)
</span></span><span style="display:flex;"><span>            memories <span style="color:#f92672">=</span> client<span style="color:#f92672">.</span>search(last_user_msg, user_id<span style="color:#f92672">=</span>self<span style="color:#f92672">.</span>mem0_user_id, limit<span style="color:#f92672">=</span><span style="color:#ae81ff">8</span>)
</span></span><span style="display:flex;"><span>            memory_text <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;</span><span style="color:#ae81ff">\n</span><span style="color:#e6db74">&#34;</span><span style="color:#f92672">.</span>join([m[<span style="color:#e6db74">&#34;memory&#34;</span>] <span style="color:#66d9ef">for</span> m <span style="color:#f92672">in</span> memories])
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>            <span style="color:#75715e"># Prepend memory context to the message list</span>
</span></span><span style="display:flex;"><span>            memory_message <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;system&#34;</span>,
</span></span><span style="display:flex;"><span>                <span style="color:#e6db74">&#34;content&#34;</span>: <span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Relevant context from prior sessions:</span><span style="color:#ae81ff">\n</span><span style="color:#e6db74">{</span>memory_text<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>            }
</span></span><span style="display:flex;"><span>            messages <span style="color:#f92672">=</span> [memory_message] <span style="color:#f92672">+</span> list(messages)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        reply <span style="color:#f92672">=</span> super()<span style="color:#f92672">.</span>generate_reply(messages, sender, <span style="color:#f92672">**</span>kwargs)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#75715e"># Store this exchange</span>
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">if</span> reply <span style="color:#f92672">and</span> messages:
</span></span><span style="display:flex;"><span>            client<span style="color:#f92672">.</span>add([
</span></span><span style="display:flex;"><span>                {<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;user&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: messages[<span style="color:#f92672">-</span><span style="color:#ae81ff">1</span>]<span style="color:#f92672">.</span>get(<span style="color:#e6db74">&#34;content&#34;</span>, <span style="color:#e6db74">&#34;&#34;</span>)},
</span></span><span style="display:flex;"><span>                {<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;assistant&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: reply}
</span></span><span style="display:flex;"><span>            ], user_id<span style="color:#f92672">=</span>self<span style="color:#f92672">.</span>mem0_user_id)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> reply
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>agent <span style="color:#f92672">=</span> Mem0Agent(
</span></span><span style="display:flex;"><span>    user_id<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;alice&#34;</span>,
</span></span><span style="display:flex;"><span>    name<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;MemoryAgent&#34;</span>,
</span></span><span style="display:flex;"><span>    system_message<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;You are a helpful coding assistant.&#34;</span>,
</span></span><span style="display:flex;"><span>    llm_config<span style="color:#f92672">=</span>{<span style="color:#e6db74">&#34;model&#34;</span>: <span style="color:#e6db74">&#34;gpt-4o&#34;</span>}
</span></span><span style="display:flex;"><span>)
</span></span></code></pre></div><p>For custom agents built on raw LLM API calls, the pattern is the same: call <code>client.search()</code> before the LLM call, inject results into the system prompt, and call <code>client.add()</code> after the response is generated.</p>
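<p>A compact sketch of that loop against the raw OpenAI SDK follows. The model name and helper names are illustrative, but the sequence (search before the call, inject into the system prompt, add after the response) is the pattern described above:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">from openai import OpenAI
from mem0 import MemoryClient

llm = OpenAI()
memory = MemoryClient(api_key="your-api-key")

def chat_turn(user_id: str, user_message: str) -&gt; str:
    # 1. Search memory before the LLM call
    memories = memory.search(user_message, user_id=user_id, limit=8)
    context = "\n".join(f"- {r['memory']}" for r in memories)

    # 2. Inject retrieved memories into the system prompt
    response = llm.chat.completions.create(
        model="gpt-4o",  # any chat-completion model works here
        messages=[
            {"role": "system", "content": f"Known facts about this user:\n{context}"},
            {"role": "user", "content": user_message},
        ],
    )
    answer = response.choices[0].message.content

    # 3. Persist the exchange after the response is generated
    memory.add([
        {"role": "user", "content": user_message},
        {"role": "assistant", "content": answer},
    ], user_id=user_id)
    return answer
</code></pre></div>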
<h2 id="mem0-pricing-free-tier-pro-and-when-the-graph-features-matter">Mem0 Pricing: Free Tier, Pro, and When the Graph Features Matter</h2>
<p>Mem0&rsquo;s pricing has three tiers that map cleanly to deployment stage: Free, Developer at $19/month, and Pro at $249/month. The Free tier gives you 10,000 memories with vector search, full SDK access, and no credit card required — enough to validate a use case and run a non-trivial pilot. The Developer tier at $19/month raises the ceiling to 50,000 memories and adds priority support. The Pro tier at $249/month is where graph memory features unlock: unlimited memories, knowledge graph queries, multi-hop entity resolution, and the SOC 2 Type II and HIPAA compliance documentation required for enterprise procurement. The key decision point is whether your use case requires graph memory. If you are building a personalization layer — storing user preferences, past conversation context, behavioral patterns — the vector tier at Developer pricing handles it efficiently. If you are building an agent that needs to reason about relationships between entities (customer X has contract with vendor Y, who has a known issue with product Z), the graph features in Pro become load-bearing. HIPAA compliance is a hard requirement for any healthcare-adjacent deployment; Mem0 Cloud Pro is one of the few managed memory services with the compliance documentation to satisfy enterprise security reviews. Self-hosting the open-source library is always an option for teams that cannot send data to a third-party cloud regardless of compliance certifications.</p>
<table>
  <thead>
      <tr>
          <th>Tier</th>
          <th>Price</th>
          <th>Memories</th>
          <th>Graph Memory</th>
          <th>Compliance</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Free</td>
          <td>$0</td>
          <td>10K</td>
          <td>No</td>
          <td>—</td>
      </tr>
      <tr>
          <td>Developer</td>
          <td>$19/mo</td>
          <td>50K</td>
          <td>No</td>
          <td>—</td>
      </tr>
      <tr>
          <td>Pro</td>
          <td>$249/mo</td>
          <td>Unlimited</td>
          <td>Yes</td>
          <td>SOC 2 Type II, HIPAA</td>
      </tr>
  </tbody>
</table>
<h2 id="mem0-vs-zep-vs-langgraph-store-when-to-use-each">Mem0 vs Zep vs LangGraph Store: When to Use Each</h2>
<p>Mem0, Zep, and LangGraph Store solve overlapping but distinct problems, and choosing the wrong one creates technical debt that compounds over time. Mem0 leads on developer experience and deployment speed — the API is simple, the free tier is generous, the SDK covers Python and TypeScript, and you can be in production in hours rather than days. Zep&rsquo;s Graphiti engine scores 63.8% on LongMemEval versus Mem0&rsquo;s 49.0%, a nearly 15-point gap driven largely by temporal fact tracking: Zep stores when facts became true and when they stopped being true, which is essential for any use case where the world changes and historical accuracy matters. LangGraph Store is the right answer if your team is already committed to LangGraph for orchestration — it shares state management with your graph nodes, reduces the number of moving parts, and the integration overhead is near zero. The decision framework: pick Mem0 if you need personalization memory deployed quickly across any LLM or framework. Pick Zep if your agent needs to reason about entities whose facts change over time (regulatory compliance, product catalogs, organizational structures). Pick LangGraph Store if your agent is a LangGraph graph and keeping the stack homogeneous matters more than specialized memory features.</p>
<table>
  <thead>
      <tr>
          <th>Criterion</th>
          <th>Mem0</th>
          <th>Zep</th>
          <th>LangGraph Store</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>LongMemEval score</td>
          <td>49.0%</td>
          <td>63.8%</td>
          <td>Not published</td>
      </tr>
      <tr>
          <td>Temporal fact tracking</td>
          <td>Limited</td>
          <td>Native (Graphiti)</td>
          <td>Limited</td>
      </tr>
      <tr>
          <td>Developer experience</td>
          <td>Excellent</td>
          <td>Good</td>
          <td>Good (within LangGraph)</td>
      </tr>
      <tr>
          <td>Standalone deployment</td>
          <td>Yes</td>
          <td>Yes</td>
          <td>LangGraph-only</td>
      </tr>
      <tr>
          <td>Graph memory</td>
          <td>Pro tier</td>
          <td>Yes</td>
          <td>No</td>
      </tr>
      <tr>
          <td>Open-source</td>
          <td>Yes (MIT)</td>
          <td>Yes (Apache 2.0)</td>
          <td>Yes (MIT)</td>
      </tr>
      <tr>
          <td>Best fit</td>
          <td>Personalization, fast deployment</td>
          <td>Entity-heavy, temporal facts</td>
          <td>LangGraph-first teams</td>
      </tr>
  </tbody>
</table>
<p>For teams evaluating both Mem0 and Zep, the practical test is to run LongMemEval on your own data. The benchmark uses multi-session conversation histories that require temporal fact retrieval — if your use case resembles the benchmark (customer support, personal assistants, research agents), Zep&rsquo;s 15-point lead will show up in production metrics. If your use case is simpler (preference retrieval, conversation summarization), Mem0&rsquo;s lower complexity and better developer experience often win.</p>
<h2 id="production-considerations-performance-privacy-and-compliance">Production Considerations: Performance, Privacy, and Compliance</h2>
<p>Running Mem0 in production surfaces three categories of concern that are easy to overlook during prototyping: latency budgets, data residency, and memory quality degradation. On latency: Mem0 Cloud&rsquo;s vector search returns in 10–50ms for typical memory stores under 100K entries; graph queries run 80–200ms for single-hop lookups and 200–500ms for multi-hop. If your agent is latency-sensitive, kick off memory retrieval as an async task so it overlaps with other per-request work (authentication, moderation, tool routing) rather than adding its full latency to a sequential pipeline. The <code>add()</code> call — which runs LLM-based fact extraction and deduplication — takes 300–800ms and should always happen asynchronously after the response is returned to the user, never in the hot path.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> asyncio
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> mem0 <span style="color:#f92672">import</span> AsyncMemoryClient
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>client <span style="color:#f92672">=</span> AsyncMemoryClient(api_key<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;your-api-key&#34;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">async</span> <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">agent_turn</span>(user_id: str, user_message: str) <span style="color:#f92672">-&gt;</span> str:
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Retrieve memory and call LLM in parallel</span>
</span></span><span style="display:flex;"><span>    memory_task <span style="color:#f92672">=</span> asyncio<span style="color:#f92672">.</span>create_task(
</span></span><span style="display:flex;"><span>        client<span style="color:#f92672">.</span>search(user_message, user_id<span style="color:#f92672">=</span>user_id, limit<span style="color:#f92672">=</span><span style="color:#ae81ff">8</span>)
</span></span><span style="display:flex;"><span>    )
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Start LLM call with placeholder context, or await memory first</span>
</span></span><span style="display:flex;"><span>    memories <span style="color:#f92672">=</span> <span style="color:#66d9ef">await</span> memory_task
</span></span><span style="display:flex;"><span>    context <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;</span><span style="color:#ae81ff">\n</span><span style="color:#e6db74">&#34;</span><span style="color:#f92672">.</span>join([m[<span style="color:#e6db74">&#34;memory&#34;</span>] <span style="color:#66d9ef">for</span> m <span style="color:#f92672">in</span> memories])
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Run LLM (simplified)</span>
</span></span><span style="display:flex;"><span>    response <span style="color:#f92672">=</span> <span style="color:#66d9ef">await</span> call_llm(user_message, context)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Store the exchange asynchronously — don&#39;t await in hot path</span>
</span></span><span style="display:flex;"><span>    asyncio<span style="color:#f92672">.</span>create_task(
</span></span><span style="display:flex;"><span>        client<span style="color:#f92672">.</span>add([
</span></span><span style="display:flex;"><span>            {<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;user&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: user_message},
</span></span><span style="display:flex;"><span>            {<span style="color:#e6db74">&#34;role&#34;</span>: <span style="color:#e6db74">&#34;assistant&#34;</span>, <span style="color:#e6db74">&#34;content&#34;</span>: response}
</span></span><span style="display:flex;"><span>        ], user_id<span style="color:#f92672">=</span>user_id)
</span></span><span style="display:flex;"><span>    )
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> response
</span></span></code></pre></div><p>On data residency: Mem0 Cloud stores data on AWS infrastructure. If your use case involves EU user data, verify that Mem0&rsquo;s data processing agreement covers GDPR requirements for your jurisdiction. For regulated data (PHI under HIPAA, financial data under SOC 2 requirements), the Pro tier&rsquo;s compliance certifications are the starting point — not the ending point. Review the shared responsibility model, configure data deletion hooks for right-to-erasure requests, and audit which LLM provider Mem0 uses for its fact extraction pipeline (OpenAI by default; this can be configured to Anthropic or local models for data residency reasons).</p>
<p>On memory quality: memory stores degrade when agents add low-quality or contradictory entries at high volume. Three mitigations work well in practice. First, filter what you add — not every message turn deserves to be stored. Use a lightweight classifier (even a simple regex or keyword filter) to identify turns with extractable facts before calling <code>m.add()</code>. Second, set TTLs on session-scoped memories to prevent stale task context from polluting future sessions. Third, review memory stores periodically using <code>m.get_all()</code> and build a manual curation interface for production deployments where memory quality directly affects user outcomes.</p>
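<p>A minimal sketch of the first mitigation: a cheap keyword gate in front of <code>m.add()</code>. The pattern list is illustrative and would be tuned per domain, or replaced with a small classifier; the point is that the gate runs before any extraction LLM call is spent:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">import re

# Naive gate: only persist turns that plausibly contain a durable fact.
# These patterns are illustrative; tune or replace them for your domain.
FACT_PATTERNS = re.compile(
    r"\b(prefer|always|never|my name|we decided|we use|our team|convention)\b",
    re.IGNORECASE,
)

def maybe_remember(m, text: str, user_id: str) -&gt; bool:
    """Store the turn only if it looks like it contains an extractable fact."""
    if not FACT_PATTERNS.search(text):
        return False  # skip acknowledgements, small talk, throwaway turns
    m.add(text, user_id=user_id)
    return True

maybe_remember(m, "thanks, looks good!", user_id="alice")        # skipped
maybe_remember(m, "we decided to use Postgres", user_id="alice") # stored
</code></pre></div>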
<hr>
<h2 id="faq">FAQ</h2>
<p><strong>Q: Does Mem0 work with local LLMs like Ollama or llama.cpp?</strong></p>
<p>Yes. Mem0&rsquo;s fact extraction and deduplication pipeline uses a configurable LLM. You can point it at any OpenAI-compatible endpoint, including Ollama running locally. Set the <code>llm</code> config block in your <code>Memory()</code> constructor to use <code>openai</code> provider with a custom <code>base_url</code> and <code>model</code> matching your local server. Note that fact extraction quality depends on the LLM&rsquo;s instruction-following capability; models under 7B parameters often produce lower-quality extractions.</p>
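<p>A sketch of that config, assuming a local Ollama server on its default port. The exact key names (<code>openai_base_url</code>, the embedder block) vary across mem0 versions, so treat this as a starting point and check the config reference for the release you run:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">from mem0 import Memory

# Route fact extraction through Ollama's OpenAI-compatible endpoint.
config = {
    "llm": {
        "provider": "openai",
        "config": {
            "model": "llama3.1:8b",                         # local model tag
            "openai_base_url": "http://localhost:11434/v1",
            "api_key": "ollama",                            # dummy; Ollama ignores it
        },
    },
    # Embeddings must also be generated locally if data cannot leave the box
    "embedder": {
        "provider": "ollama",
        "config": {"model": "nomic-embed-text"},
    },
}

m = Memory.from_config(config)
m.add("User prefers fully local inference", user_id="alice")
</code></pre></div>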
<p><strong>Q: How does Mem0 handle PII in stored memories?</strong></p>
<p>Mem0 does not automatically redact PII before storage. If users say &ldquo;my social security number is X,&rdquo; that fact will be extracted and stored. For production deployments handling PII, implement a pre-processing step that runs a PII detection library (Microsoft Presidio is a common choice) on user input before calling <code>m.add()</code>. On the cloud tier, configure data deletion webhooks to handle right-to-erasure requests from users.</p>
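<p>A minimal sketch of that pre-processing step with Presidio. The entity list is illustrative; production deployments usually widen it and add custom recognizers:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

def redact_then_remember(m, text: str, user_id: str) -&gt; None:
    # Detect common PII types before anything reaches the memory store
    findings = analyzer.analyze(
        text=text,
        entities=["US_SSN", "CREDIT_CARD", "EMAIL_ADDRESS", "PHONE_NUMBER"],
        language="en",
    )
    # Default anonymizer replaces each span with a placeholder like &lt;US_SSN&gt;
    redacted = anonymizer.anonymize(text=text, analyzer_results=findings).text
    m.add(redacted, user_id=user_id)

redact_then_remember(m, "My email is alice@example.com", user_id="alice")
</code></pre></div>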
<p><strong>Q: What happens when two agents access the same user&rsquo;s memories simultaneously?</strong></p>
<p>Mem0 Cloud handles concurrent reads safely — multiple agents can call <code>m.search()</code> for the same user simultaneously without conflict. Concurrent writes to the same user&rsquo;s memory store are serialized internally to prevent race conditions in the deduplication pipeline, which means two simultaneous <code>m.add()</code> calls for the same user will queue rather than execute in parallel. In high-throughput scenarios, batch your <code>add()</code> calls or use a queue to control write concurrency.</p>
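<p>One way to implement that queue: a single background writer task that drains <code>add()</code> calls in order. This is a hypothetical sketch of application-side plumbing, not a Mem0 feature:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">import asyncio
from mem0 import AsyncMemoryClient

client = AsyncMemoryClient(api_key="your-api-key")
write_queue: asyncio.Queue = asyncio.Queue()

async def memory_writer():
    """Single consumer: serializes writes so handlers never block on add()."""
    while True:
        messages, user_id = await write_queue.get()
        try:
            await client.add(messages, user_id=user_id)
        finally:
            write_queue.task_done()

async def record_exchange(user_message: str, reply: str, user_id: str):
    # Enqueue instead of awaiting add() in the request path
    await write_queue.put((
        [{"role": "user", "content": user_message},
         {"role": "assistant", "content": reply}],
        user_id,
    ))

# At application startup, run the writer once:
#     asyncio.create_task(memory_writer())
</code></pre></div>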
<p><strong>Q: Can I migrate from Mem0 Cloud to self-hosted without losing memories?</strong></p>
<p>Yes. Use <code>m.get_all()</code> with pagination to export all memories to JSON, then re-import them into a self-hosted Mem0 instance using <code>m.add()</code>. The memory IDs will change (the self-hosted instance generates new IDs), so update any external references accordingly. Mem0 does not currently offer a native export/import CLI command, so the migration script needs to be written manually — it is around 30 lines of Python.</p>
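<p>A sketch of that migration script. Whether the cloud <code>get_all()</code> call needs pagination parameters for very large stores is an assumption here; check the current API reference before running it against production data:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">import json
from mem0 import Memory, MemoryClient

cloud = MemoryClient(api_key="your-api-key")
local = Memory()  # self-hosted instance with your own backends configured

def migrate_user(user_id: str) -&gt; int:
    # Export every memory for this user (paginate here if the store is large;
    # the exact pagination kwargs are an assumption -- check the API docs)
    exported = cloud.get_all(user_id=user_id)
    with open(f"backup-{user_id}.json", "w") as f:
        json.dump(exported, f)  # keep a backup before re-importing

    # Re-import: the self-hosted instance generates new memory IDs,
    # so update any external references afterward
    for entry in exported:
        local.add(entry["memory"], user_id=user_id)
    return len(exported)

print(migrate_user("alice"))
</code></pre></div>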
<p><strong>Q: How does Mem0 compare to simply storing chat history in a database and summarizing it?</strong></p>
<p>Naive chat history + summarization works at small scale and fails in three ways at production scale. Summarization is lossy — facts get dropped or distorted as conversation length grows. Summaries don&rsquo;t support semantic retrieval — you can&rsquo;t efficiently answer &ldquo;what did this user say about their database preferences?&rdquo; against a summary blob. And the token cost of passing full summaries into every LLM call grows linearly with session count. Mem0&rsquo;s approach — discrete extracted facts with semantic retrieval and adaptive deduplication — solves all three problems. The tradeoff is that Mem0 adds an LLM call per <code>add()</code> operation (for fact extraction), whereas raw summary storage does not. For most use cases, the token savings from targeted retrieval more than offset the extraction overhead.</p>
]]></content:encoded></item></channel></rss>