Jellyfish AI Coding Productivity Study 2026: More Tokens ≠ Better Output

The Jellyfish AI Engineering Trends study of 7,548 engineers found a stark pattern: the heaviest AI token users produced twice the PR throughput but consumed ten times the token budget. More tokens do not equal more productivity; they equal a steeper cost curve that most engineering leaders aren’t measuring.

What Is the Jellyfish AI Engineering Benchmark, and Why Should You Care?

The Jellyfish AI Engineering Benchmark is the largest continuous dataset of real-world AI coding behavior ever assembled: as of early 2026 it covers 1,000+ companies, 200,000 engineers, and 37 million pull requests analyzed over rolling quarters. Unlike survey-based studies that capture developer sentiment, Jellyfish pulls instrumented telemetry (actual PRs merged, code churn rates, token consumption logs, and review cycles), making it a ground-truth view of what AI coding tools actually produce rather than what developers believe they produce. The benchmark is updated quarterly and published at jellyfish.co/ai-engineering-trends. ...

June 7, 2026 · 11 min · baeseokjae
How to Measure AI Coding ROI: Beyond Vanity Metrics

Most teams measuring AI coding ROI are looking at the wrong numbers. Developers feel faster, acceptance rates look great, and vendor dashboards show impressive gains, but when you trace those numbers back to shipped features and business outcomes, the story falls apart. The disconnect is real. The METR study found developers expected AI coding tools to make them 24% faster but were actually 19% slower, and afterward they still believed the tools had sped them up by 20%. That gap between perception and reality isn’t just a curiosity; it’s where your ROI evaporates. ...

June 1, 2026 · 15 min · baeseokjae
Tokenmaxxing: The Hidden AI Coding Productivity Trap Costing Millions

Tokenmaxxing is the practice of maximizing AI token consumption as a proxy for engineering productivity, and it’s quietly destroying code quality, blowing AI budgets, and making developers measurably less effective. If your team celebrates high token usage without tracking what that code actually does downstream, you’re already in the trap.

What Is Tokenmaxxing? The AI Productivity Myth That’s Costing Millions

Tokenmaxxing refers to the organizational pattern where engineers and teams treat raw AI token consumption (the volume of text fed to and generated by AI models) as evidence of productivity and AI adoption. First surfaced in enterprise engineering analytics reports in early 2026, the term describes a management antipattern analogous to measuring developer output by lines of code: plausible on the surface, actively harmful in practice. In a Jellyfish Q1 2026 study of 7,548 engineers, teams with the largest AI token budgets spent 10x as many tokens as disciplined peers yet achieved only 2x the throughput, meaning they paid ten times more for twice the output.

Organizations embracing tokenmaxxing have burned through enterprise AI budgets at catastrophic rates. Uber exhausted its entire $3.4 billion annual AI budget in just four months. Meta created a public leaderboard ranking 85,000 employees by token consumption, crowning one developer a “Token Legend” after they burned 281 billion tokens in 30 days. The incentive structure is broken: when token consumption is rewarded, engineers optimize for token consumption rather than outcomes. The result is inflated AI spend, degraded code quality, and a productivity illusion that evaporates the moment you track downstream metrics. ...
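
To make the "ten times more for twice the output" arithmetic concrete, here is a minimal Python sketch. The function name `relative_cost_per_unit` and its parameters are hypothetical, chosen for illustration only; the sketch assumes token spend scales linearly with cost and uses the 10x and 2x multiples quoted from the Jellyfish study above.

```python
def relative_cost_per_unit(token_multiple: float, throughput_multiple: float) -> float:
    """Return how many times more tokens a team spends per unit of output,
    relative to a baseline team whose multiples are both 1.0.

    Hypothetical helper: assumes cost is proportional to tokens consumed.
    """
    if throughput_multiple <= 0:
        raise ValueError("throughput_multiple must be positive")
    return token_multiple / throughput_multiple


# Multiples quoted in the excerpt: 10x the tokens for 2x the throughput.
heavy_users = relative_cost_per_unit(token_multiple=10.0, throughput_multiple=2.0)
print(f"Tokens per unit of throughput vs. baseline: {heavy_users:.1f}x")
```

Under these assumptions, each unit of throughput costs the heaviest users five times as many tokens as it costs the baseline group, which is the number a token-consumption leaderboard never shows.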

June 1, 2026 · 15 min · baeseokjae
AI Developer Productivity Metrics 2026: Real Data From TELUS, Zapier, and Stripe

AI developer productivity in 2026 is no longer theoretical. Companies like TELUS, Stripe, and Zapier have published hard numbers showing 30–250% productivity improvements, though the data reveals a troubling pattern: individual gains rarely translate to organizational delivery wins without deliberate measurement and workflow redesign.

Why Developer Productivity Metrics Are Broken in the AI Era

Developer productivity measurement in the AI era is fundamentally broken because the tools that generate value are also the tools that break traditional measurement. DORA metrics (deployment frequency, lead time for changes, change failure rate, time to restore) were designed for human-paced engineering workflows. When Stripe’s autonomous agents merge 1,300 pull requests per week with zero human-written code, deployment frequency spikes without reflecting genuine human productivity. When AI generates 41–46% of all code (GitHub’s 2026 data), lines of code per developer becomes meaningless as a baseline metric.

The Harness engineering report found 89% of teams believe their current metrics accurately reflect AI’s impact, yet 94% of those same teams admit key factors like tech debt accumulation, AI validation time, and developer burnout are completely absent from their dashboards. This contradiction is the central measurement crisis in 2026 engineering: organizations feel productive, their tools tell them they’re productive, but the underlying delivery system is flying partially blind. The gap between self-reported and actual gains is real: METR’s survey of 349 technical workers found median self-reported speed increases of 3x, while organizational delivery metrics showed far more modest improvements. Understanding this paradox is the starting point for building measurement that actually works. ...
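
One way to see why an aggregate deployment frequency can mislead is to split two of the DORA metrics by who authored the change. The Python sketch below is illustrative only: the `Deploy` record, the `"human"` and `"agent"` labels, and the sample data are all hypothetical and are not drawn from Stripe or any of the reports cited above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Deploy:
    week: int          # week number in which the deploy shipped
    author_type: str   # hypothetical label: "human" or "agent"
    failed: bool       # True if the deploy caused a failure in production


def deploys_per_week(deploys: list[Deploy], author_type: Optional[str] = None) -> float:
    """Deployment frequency, optionally restricted to one author type.

    The denominator is always the number of distinct weeks in the full
    data set, so the per-author figures add up to the overall figure.
    """
    weeks = {d.week for d in deploys}
    if not weeks:
        return 0.0
    selected = [d for d in deploys if author_type is None or d.author_type == author_type]
    return len(selected) / len(weeks)


def change_failure_rate(deploys: list[Deploy], author_type: Optional[str] = None) -> float:
    """Share of deploys that failed, optionally restricted to one author type."""
    selected = [d for d in deploys if author_type is None or d.author_type == author_type]
    if not selected:
        return 0.0
    return sum(d.failed for d in selected) / len(selected)


# Made-up sample: two weeks, 2 human deploys and 6 agent deploys.
sample = [
    Deploy(1, "human", False), Deploy(2, "human", False),
    Deploy(1, "agent", False), Deploy(1, "agent", True), Deploy(1, "agent", False),
    Deploy(2, "agent", False), Deploy(2, "agent", False), Deploy(2, "agent", False),
]

print("overall deploys/week:", deploys_per_week(sample))
print("human deploys/week:  ", deploys_per_week(sample, "human"))
print("agent failure rate:  ", round(change_failure_rate(sample, "agent"), 3))
```

In this made-up sample the overall frequency is four deploys per week, but only one of those per week is human-authored, which is the kind of gap a single blended number hides.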

May 16, 2026 · 17 min · baeseokjae