MemPalace is an open-source AI memory framework that scored 96.6% on the LongMemEval benchmark — the highest result ever recorded by a free, self-hosted memory system. It launched on April 5, 2026, gained 23,000+ GitHub stars within 48 hours, and now powers persistent memory for thousands of Claude Code, LangChain, and custom agent deployments. This review covers how it works, what the benchmark score actually means, how to set it up in five minutes, and when to pick a paid alternative instead.
What Is MemPalace? The Open-Source AI Memory System Taking GitHub by Storm
MemPalace is a free, MIT-licensed AI memory framework that gives agents persistent, queryable memory across sessions — without sending data to any external API. Unlike Mem0 ($19–249/month) or Zep ($25+/month), MemPalace runs entirely on your own hardware using ChromaDB for vector search and SQLite for metadata storage. It was built by developer Ben Sigman and is named after the actress Milla Jovovich, who reportedly inspired the “Method of Loci” metaphor at the core of its architecture. The project launched on April 5, 2026, and hit 23,000+ GitHub stars in 48 hours — becoming the fastest-growing AI memory repository of the year. Its core differentiator is verbatim storage: instead of asking an LLM to summarize or extract facts from conversations (which introduces lossy transformations), MemPalace stores full conversation chunks verbatim and retrieves the most relevant ones at query time using vector similarity. This approach trades storage efficiency for accuracy — and the benchmark results show it works. The system exposes 19 MCP (Model Context Protocol) tools covering memory search, storage, knowledge graph queries, and agent diaries, making it plug-and-play with Claude Code and any MCP-compatible agent framework.
How MemPalace Works: The Method of Loci Meets AI Agent Memory
MemPalace works by organizing agent memory into a four-level spatial hierarchy — Wings, Rooms, Halls, and Drawers — inspired by the ancient “Method of Loci” technique where orators memorized speeches by mentally placing information in distinct rooms of an imagined building. In MemPalace, a Wing holds an agent’s entire memory domain (e.g., one Wing per user or per project), Rooms represent topic clusters within that domain, Halls are individual conversation threads, and Drawers are the atomic memory chunks that store verbatim text. This hierarchy isn’t just branding: it directly shapes how ChromaDB indexes and retrieves vectors. When a query arrives, MemPalace first narrows scope to the relevant Wing and Room before running vector similarity search — a structural optimization that reduces retrieval latency significantly compared to a flat vector scan over every stored chunk. The system initializes with just 170 tokens of startup context, making it one of the lightest-weight memory frameworks available. ChromaDB handles all vector operations on CPU via FAISS, while SQLite manages the palace structure metadata and drawer-to-room-to-wing relationships. No GPU is required, and no data is transmitted to external services at any point in the pipeline.
Verbatim Storage vs. LLM Extraction
The biggest architectural decision in MemPalace is storing conversations verbatim rather than asking an LLM to extract facts. Mem0, by contrast, runs every conversation through an LLM call to extract structured facts — which costs money per interaction and introduces transformation errors. If the LLM misses a nuance (“user prefers dark mode in VS Code only, not web apps”), that nuance is permanently lost. MemPalace avoids this by storing the raw text and letting the retrieval step surface the relevant passage when it’s actually needed. The tradeoff: verbatim storage grows fast. A year of daily AI conversations produces roughly 10 million tokens of uncompressed memory data. MemPalace’s AAAK compression dialect compresses 19.5M tokens to ~650K (30x reduction), but comes with a 12.4% accuracy regression on LongMemEval.
Startup Cost and Context Footprint
One underappreciated advantage: MemPalace initializes with just 170 tokens of startup context. Many memory frameworks inject large system prompts or full memory dumps into every conversation. MemPalace’s lightweight initialization means you’re not burning context window or increasing cost on every agent turn — only the retrieved memory chunks relevant to the current query are injected.
MemPalace Benchmark Results: What the 96.6% LongMemEval Score Really Means
MemPalace scored 96.6% on LongMemEval in raw (uncompressed) mode — the highest score ever recorded by a free, open-source AI memory framework, and higher than most paid alternatives as well. Mem0 scores approximately 85% on the same benchmark; Zep hovers around 82%; and the broader field of commercial memory services clusters in the 80–92% range. The 11-percentage-point gap over Mem0 is substantial in practice: in memory-intensive agent tasks, that gap translates directly to fewer hallucinations, better task continuity, and more reliable behavior across long-horizon multi-session workflows. LongMemEval is the standard benchmark for evaluating AI memory systems — it tests how accurately an agent can answer questions about information disclosed in prior sessions, simulating real use cases such as a coding assistant remembering a user’s TypeScript preferences, a customer service agent recalling an account history, or a personal productivity tool tracking long-running project goals. The benchmark was designed specifically to expose the failure modes of LLM-extracted summary approaches, which tend to lose nuance during the extraction step. MemPalace’s verbatim storage sidesteps this class of error entirely, which explains its benchmark advantage at the cost of greater storage consumption.
The 100% Controversy: What Actually Happened
The original launch materials claimed a 100% LongMemEval score. Within 24 hours, independent researchers identified that the benchmark methodology didn’t involve the palace hierarchy at all — the 100% result came from a controlled test that didn’t reflect production retrieval. The corrected figure of 96.6% is the score under realistic conditions. The creator acknowledged this publicly and the README was updated. The 96.6% figure is real and independently reproducible — but it applies only in raw (non-compressed) mode. Enabling AAAK compression drops accuracy to 84.2%, which is still competitive with many paid systems but significantly lower than the headline figure.
What LongMemEval Doesn’t Test
LongMemEval measures retrieval accuracy at moderate scale. It doesn’t test SQLite stability at 116K+ drawers (where crashes are documented), concurrent write throughput under multi-agent load, or latency under high query volume. The benchmark is a useful proxy for retrieval quality — not a complete production readiness test.
Setting Up MemPalace: Step-by-Step Integration with Claude Code
MemPalace installs in under five minutes via pip and integrates with Claude Code as an MCP server that auto-discovers all 19 memory tools automatically — no custom tool definitions, no YAML configuration files, and no API keys required. This setup guide covers the full path from pip install to your first memory query inside a Claude Code session. MemPalace requires Python 3.10+ and approximately 500MB of initial disk space for the ChromaDB installation. All vector operations run on CPU using FAISS, so there is no GPU requirement. The system stores everything locally in ~/.mempalace/ by default — a ChromaDB vector database for embeddings and a SQLite file for palace structure metadata. When you connect MemPalace to Claude Code via MCP, Claude can read from and write to memory during any conversation without any additional prompting — the tools are automatically available for Claude to use when relevant context needs to be stored or retrieved. This is the fastest path to giving a Claude Code agent persistent cross-session memory at zero ongoing cost.
Installation
pip install mempalace
mempalace init --wing my-agent
The init command creates a local palace structure, initializes the ChromaDB vector store, and creates the SQLite metadata database — all in ~/.mempalace/ by default.
MCP Integration with Claude Code
Add MemPalace as an MCP server in your Claude Code settings (~/.claude/settings.json):
{
"mcpServers": {
"mempalace": {
"command": "mempalace",
"args": ["serve", "--wing", "my-agent"]
}
}
}
Restart Claude Code. It will auto-discover all 19 MCP tools including memory_search, memory_store, diary_write, and knowledge_graph_query. No additional configuration required.
Basic Usage in an Agent
from mempalace import Palace
palace = Palace(wing="my-agent")
# Store a memory verbatim
palace.store(
content="User prefers TypeScript strict mode with eslint-airbnb config",
room="preferences",
hall="coding-setup"
)
# Retrieve relevant memories for a query
memories = palace.search("what does the user prefer for TypeScript?", top_k=3)
for mem in memories:
print(mem.content)
Environment Requirements
MemPalace requires Python 3.10+ and approximately 500MB of disk space for the base ChromaDB installation. No GPU is required — vector operations run on CPU via FAISS. There’s no API key, no external dependency, and no data leaves your machine.
MemPalace vs. Mem0 vs. Zep: Full Feature and Cost Comparison
MemPalace is the only fully free, self-hosted option among the three major AI memory frameworks — and for most individual developers and small teams, the cost difference alone makes it the default choice. Mem0 charges $19/month for its Starter plan and up to $249/month for the Business tier, with cloud-managed infrastructure and LLM-extraction-based memory. Zep starts at $25/month and scales based on user count, offering hybrid memory storage with strong GraphQL query support. MemPalace costs $0 in perpetuity, runs entirely on your own hardware, and outperforms both on the LongMemEval accuracy benchmark in raw mode. The meaningful trade-offs emerge around scale, concurrency, and semantic consolidation — areas where the paid services invest heavily in engineering. The table below compares the three frameworks across dimensions that matter most for production decision-making, followed by concrete recommendations for when each makes sense.
| Feature | MemPalace | Mem0 | Zep |
|---|---|---|---|
| Price | Free (MIT) | $19–249/month | $25+/month |
| LongMemEval score | 96.6% (raw) | ~85% | ~82% |
| Storage approach | Verbatim | LLM extraction | Hybrid |
| Startup context cost | 170 tokens | ~800 tokens | ~600 tokens |
| MCP tools | 19 | 6 | 4 |
| Knowledge graph | Yes | Yes (paid tiers) | Yes (paid) |
| Self-hosted | Yes (only option) | Optional (paid) | Optional (paid) |
| Max scale (free) | ~116K drawers | N/A | N/A |
| Concurrent writes | Limited (SQLite) | High (cloud) | High (cloud) |
| AAAK compression | Yes (accuracy cost) | N/A | N/A |
When MemPalace Wins
For individual developers, small teams, and projects where data privacy matters (legal, medical, personal tools), MemPalace is the clear choice. Zero cost, 96.6% retrieval accuracy, and full local control are hard to beat when you don’t need cloud scale.
When Mem0 or Zep Win
If you’re running a multi-tenant SaaS product where thousands of users each have persistent memory, SQLite becomes the bottleneck. Mem0 and Zep are purpose-built for high-concurrency, high-user-count deployments. Zep in particular offers GraphQL-based memory queries that are easier to integrate with complex data pipelines. The per-month cost becomes negligible at scale compared to the engineering cost of working around SQLite limits.
MemPalace Limitations: Scalability, Storage Growth, and AAAK Compression Trade-offs
MemPalace has three documented production limitations that developers must understand before committing to it at scale: SQLite crashes above 116,000 drawers, unbounded storage growth from verbatim memory accumulation, and a 12.4% accuracy regression when AAAK compression is enabled to control that growth. These aren’t theoretical risks — all three are documented in the MemPalace GitHub issue tracker with reproducible test cases. For individual developers and small teams operating within the documented limits, none of these are blockers. For multi-user production deployments or high-throughput agent pipelines, they require active mitigation strategies or a migration to a cloud-managed alternative. Understanding these failure modes upfront lets you design around them from the start — periodic drawer archiving, storage capacity planning, and clear criteria for when to switch to Mem0 or Zep. The MemPalace maintainers are actively working on SQLite fixes and a pluggable backend interface that would allow swapping ChromaDB/SQLite for PostgreSQL with pgvector, which would eliminate both the concurrency and the scale limit issues. That work is tracked in the public roadmap but not yet merged as of May 2026.
SQLite Crashes at 116K+ Drawers
The most serious production issue is a documented SQLite crash that occurs when the drawer count exceeds 116,000 (SQLITE_MAX_VARIABLE_NUMBER exceeded). GitHub issue #1106 tracks this — the root cause is SQLite’s limit on bound parameters in a single query. The MemPalace maintainers are working on a batched query fix, but as of May 2026 it’s not merged. If you’re running a high-throughput agent that stores many small memories, you’ll hit this limit faster than you expect. Mitigation: periodically archive old drawers to cold storage and reset active palace state.
Verbatim Storage Growth Rate
A year of daily AI conversations at typical usage produces roughly 10 million tokens of raw text. With ChromaDB’s overhead, this translates to several gigabytes of disk per year per user. For single-user setups on developer hardware, this is manageable. For multi-user deployments — even small teams — disk growth becomes a planning concern. AAAK compression addresses this (30x reduction), but introduces the 12.4% accuracy regression documented below.
The AAAK Compression Trade-off
AAAK (Adaptive Asymmetric Attention Kernel) is MemPalace’s built-in compression dialect that reduces token storage by 30x. The trade-off is accuracy: enabling AAAK drops LongMemEval performance from 96.6% to 84.2%. That 12.4% regression means the system occasionally fails to retrieve the most relevant memory and surfaces an older, less relevant one instead. For most conversational AI use cases, 84.2% is acceptable. For high-stakes applications where memory accuracy is critical — medical history, financial preferences, legal context — stick with raw mode and plan for storage growth.
No Semantic Consolidation
MemPalace doesn’t consolidate semantically similar memories. If a user mentions “I prefer dark mode” in 50 separate conversations, MemPalace stores 50 separate chunks rather than updating a single preference record. This means the knowledge graph can become cluttered over time, and retrieval may surface redundant information. Mem0’s LLM-extraction approach naturally handles this (each new mention updates the existing fact), though at the cost of LLM API calls per interaction.
Who Should Use MemPalace in 2026? (And Who Should Look Elsewhere)
MemPalace is the best free AI memory system available in 2026 for individual developers and small teams, full stop. It scores 96.6% on LongMemEval — outperforming every free alternative and most paid ones — at zero ongoing cost, with full local data control and 19 MCP tools that integrate seamlessly with Claude Code. Independent benchmarks confirm the score is real and reproducible in raw mode. But best-in-class accuracy on a benchmark doesn’t automatically mean the right fit for every production scenario. The correct decision between MemPalace, Mem0, and Zep comes down to three factors: your user count (single vs. multi-tenant), your data privacy requirements (self-hosted vs. cloud-managed), and your tolerance for the known SQLite scalability limits. Below is a structured breakdown of which deployments should use MemPalace and which should pay for managed alternatives — with specific thresholds rather than vague recommendations.
Use MemPalace If:
- You’re building a personal AI assistant, developer tool, or small-team internal tool
- Data privacy is a hard requirement (legal, medical, enterprise compliance)
- You want MCP integration with Claude Code without API cost overhead
- You’re prototyping an agent and want to validate memory architecture before paying for cloud services
- Your drawer count will stay under 100K (safe margin before the SQLite limit)
Look Elsewhere If:
- You’re building multi-tenant SaaS with thousands of users each needing persistent memory
- You need concurrent write throughput from multiple parallel agents
- Your use case demands semantic consolidation (tracking single preferences that update over time)
- You need production SLAs, managed infrastructure, or enterprise support contracts
The Verdict
For the $0 price point, MemPalace delivers remarkable accuracy. The SQLite scalability ceiling is real but avoidable at most usage levels. If you’re running agents on Claude Code today and want persistent memory that actually works — not a 70% benchmark from two years ago — install MemPalace, use raw mode, and start building. Switch to Mem0 or Zep when your user count or memory volume genuinely demands it.
FAQ
Is MemPalace really free? Yes. MemPalace is MIT-licensed and runs entirely on local hardware. There are no API costs, subscription fees, or usage limits. The only cost is disk space and compute for ChromaDB vector operations, which run on CPU without GPU requirements.
What’s the difference between MemPalace raw mode and AAAK compression mode? Raw mode stores conversations verbatim without compression, achieving 96.6% LongMemEval accuracy but consuming more disk space (~10M tokens/year for daily use). AAAK compression reduces storage 30x but drops accuracy to 84.2%. Use raw mode for best results; enable AAAK only when storage is genuinely constrained.
How does MemPalace integrate with Claude Code?
MemPalace acts as an MCP server, and Claude Code auto-discovers all 19 memory tools when you add mempalace serve to your MCP server config. No custom tool definitions needed. The integration covers memory search, storage, knowledge graph queries, and agent diary writing.
Is MemPalace production-ready? For single-user or small-team deployments with under 100K memory drawers, yes. For high-concurrency multi-tenant deployments, SQLite becomes a bottleneck and the 116K+ drawer crash is a known production risk. The maintainers are actively working on fixes, but as of May 2026 this is an open issue.
How does MemPalace compare to Mem0 on accuracy? MemPalace scores 96.6% on LongMemEval in raw mode; Mem0 scores approximately 85%. The 11-point gap is meaningful for long-horizon agent tasks. However, Mem0 handles semantic consolidation (updating existing facts rather than appending new chunks) and scales better for multi-user deployments.
