Archon AI Benchmark: Open-Source Harness Builder for Reproducible AI Coding

Archon is an open-source AI coding harness builder that wraps AI coding tools like Claude Code and OpenAI Codex inside deterministic YAML workflows, lifting the PR acceptance rate from a raw 6.7% to nearly 70% without changing the underlying model. If you’ve ever wondered why AI-generated code works brilliantly one day and fails catastrophically the next, the answer is the absence of structure. Archon provides that structure.

What Is Archon? The First Open-Source AI Coding Harness Builder

Archon is an open-source framework that converts ad-hoc AI coding sessions into reproducible, version-controlled workflows by wrapping LLM calls in a directed acyclic graph (DAG) of YAML-defined steps. Released by Cole Medin in early 2026 and rewritten entirely in TypeScript in April 2026, Archon reached 21,600+ GitHub stars, briefly trending #1 on GitHub, because it addresses a problem every developer using AI coding tools encounters immediately: the same prompt produces wildly different results across runs.

Instead of accepting that variance as inevitable, Archon treats the workflow itself as a first-class engineering artifact. A .archon/workflows/ directory in your repository holds YAML files that define exactly how the AI plans, implements, tests, reviews, and submits a change. These workflow files are reviewed in pull requests alongside the code they generate. The analogy to Dockerfiles is deliberate: Archon aims to do for AI-generated code what Dockerfiles did for reproducible environments. ...
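To make the "DAG of steps" idea concrete, here is a minimal Python sketch of how a plan, implement, test, review, submit workflow could be ordered and run so that each step only starts after its dependencies finish. This is not Archon's code or its YAML schema: the step names, the WORKFLOW mapping, and the execution_order and run helpers are all hypothetical, chosen only to illustrate dependency-ordered execution.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each step maps to the steps it depends on.
# These names mirror the stages named in the text, not Archon's real schema.
WORKFLOW = {
    "plan": [],
    "implement": ["plan"],
    "test": ["implement"],
    "review": ["implement"],
    "submit": ["test", "review"],
}


def execution_order(workflow):
    """Return step names in an order that respects every dependency.

    TopologicalSorter takes {node: predecessors} and raises
    graphlib.CycleError if the graph is not acyclic.
    """
    return list(TopologicalSorter(workflow).static_order())


def run(workflow, handlers):
    """Run each step's handler once, passing in its dependencies' results."""
    results = {}
    for step in execution_order(workflow):
        inputs = {dep: results[dep] for dep in workflow[step]}
        results[step] = handlers[step](inputs)
    return results
```

In this sketch the graph, rather than the model, decides what runs when: "submit" cannot start until both "test" and "review" have produced results, no matter what the handlers do internally.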

May 19, 2026 · 10 min · baeseokjae