<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Pentesting on RockB</title><link>https://baeseokjae.github.io/tags/pentesting/</link><description>Recent content in Pentesting on RockB</description><image><title>RockB</title><url>https://baeseokjae.github.io/images/og-default.png</url><link>https://baeseokjae.github.io/images/og-default.png</link></image><generator>Hugo</generator><language>en-us</language><lastBuildDate>Sat, 25 Apr 2026 21:05:09 +0000</lastBuildDate><atom:link href="https://baeseokjae.github.io/tags/pentesting/index.xml" rel="self" type="application/rss+xml"/><item><title>ProjectDiscovery Neo Review: Nuclei-Based AI Pentest Agent That Found 66 Exploitable Vulnerabilities</title><link>https://baeseokjae.github.io/posts/projectdiscovery-neo-pentest-guide-2026/</link><pubDate>Sat, 25 Apr 2026 21:05:09 +0000</pubDate><guid>https://baeseokjae.github.io/posts/projectdiscovery-neo-pentest-guide-2026/</guid><description>ProjectDiscovery Neo is an AI pentest agent that confirmed 66 exploitable vulnerabilities in benchmarks — more than any other tool tested.</description><content:encoded><![CDATA[<p>ProjectDiscovery Neo is an autonomous AI security engineer that runs real exploit chains, not just detection passes. In a three-application benchmark spanning banking, healthcare, and insurance targets, Neo confirmed 66 exploitable vulnerabilities — the highest count of any tool tested — including 24 findings that no other scanner or agent caught.</p>
<h2 id="what-is-projectdiscovery-neo-the-nuclei-powered-ai-security-engineer">What Is ProjectDiscovery Neo? (The Nuclei-Powered AI Security Engineer)</h2>
<p>ProjectDiscovery Neo is an autonomous penetration testing platform built on the Nuclei toolchain, designed to behave like a senior security engineer: it plans attack chains, executes exploits, validates impact, and returns proof packs that your team can replay. Unlike traditional scanners that flag potential issues, Neo confirms whether a vulnerability is actually exploitable before reporting it. The platform launched commercially at RSAC 2026 in March after ProjectDiscovery won the RSAC 2025 Innovation Sandbox — the highest-profile pre-launch validation any AI security startup has received. Underneath Neo sits Nuclei, the open-source engine that has completed over 10 billion scans and is maintained by a community of 100,000+ security engineers with 9,000+ YAML templates covering CVEs, misconfigurations, and custom attack patterns. Neo takes this attack-pattern library — which no new AI security startup can replicate overnight — and wraps it inside an agentic loop powered by Claude Opus 4.5, running 30+ agent-native security tools inside isolated sandboxes. The result is a tool that combines breadth (every CVE template Nuclei ships) with depth (multi-step reasoning to chain vulnerabilities into working exploits).</p>
<h2 id="how-neo-works-the-autonomous-pentest-loop-explained">How Neo Works: The Autonomous Pentest Loop Explained</h2>
<p>Neo operates on a perceive–plan–act loop that mirrors how a human penetration tester approaches a target, but runs continuously without fatigue. When you give Neo a target — a URL, a repository, or an API spec — it begins with reconnaissance using Nuclei&rsquo;s fingerprinting templates to map the attack surface. It then formulates a testing strategy, prioritizing high-value attack paths like authentication bypass, injection points, and sensitive data exposure. The agent dispatches exploit attempts using Nuclei templates plus custom code it writes on the fly inside sandboxes, then evaluates whether each attempt succeeded. Critically, if an initial approach fails, Neo dynamically changes strategy — a behavior observed in hands-on testing where the agent tried multiple authentication bypass techniques before finding one that worked. When exploitation succeeds, Neo captures full evidence: HTTP request/response pairs, extracted data, and a replayable proof pack that your security or engineering team can verify independently without needing to re-run the tool. This proof-first architecture is what separates Neo from tools that return CVSS scores without evidence. A finding in Neo&rsquo;s output comes with a working exploit, not a probability estimate. The isolated sandbox environment means these exploit chains run safely without touching production state, making Neo suitable for continuous testing on live targets.</p>
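<p>The perceive–plan–act loop described above can be sketched in a few lines of Python. This is an illustrative skeleton only, assuming invented function names (<code>recon</code>, <code>plan</code>, <code>attempt</code>) rather than Neo&rsquo;s actual internals:</p>

```python
# Hedged sketch of a perceive-plan-act agent loop; all names are illustrative,
# not Neo's real API. recon/plan/attempt are supplied by the caller.
from dataclasses import dataclass, field

@dataclass
class Finding:
    name: str
    evidence: list = field(default_factory=list)  # e.g. request/response pairs

def run_agent_loop(target, recon, plan, attempt, max_rounds=3):
    """Perceive the surface, plan candidate attacks, act, and re-plan on failure."""
    surface = recon(target)                        # perceive: map the attack surface
    confirmed = []
    for _ in range(max_rounds):
        candidates = plan(surface, confirmed)      # plan: prioritize attack paths
        if not candidates:
            break
        for candidate in candidates:
            result = attempt(candidate)            # act: try the exploit in a sandbox
            if result is not None:                 # success: capture replayable evidence
                confirmed.append(Finding(candidate, evidence=[result]))
        surface = dict(surface, tried=candidates)  # feed outcomes back into planning
    return confirmed
```

<p>The essential property is the feedback edge: failed attempts change what the planner proposes next, which is the behavior the hands-on testing above observed when Neo cycled through authentication bypass techniques.</p>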
<h2 id="benchmark-results-deep-dive-breaking-down-the-66-exploitable-vulnerabilities">Benchmark Results Deep Dive: Breaking Down the 66 Exploitable Vulnerabilities</h2>
<p>Neo&rsquo;s public benchmark covered three full-stack AI-generated applications in banking, healthcare, and insurance — verticals where vulnerability impact maps directly to regulatory and financial risk. The benchmark methodology required full exploit confirmation, not just detection: a finding counted only if Neo could demonstrate actual impact. Neo confirmed 66 exploitable vulnerabilities total — the highest count of any tool in the comparison set, which included Semgrep, Snyk, CodeQL, and other AI security platforms. The 24 findings no other tool caught are the most commercially significant result: these were real vulnerabilities in production-representative codebases that an entire class of existing tools missed. Among the unique findings were an arbitrary refund vulnerability — a business logic flaw that traditional pattern-matching scanners cannot detect — and password hash exposure through an API endpoint. Traditional static analysis tools like Semgrep and CodeQL excel at known vulnerability classes but struggle with multi-step business logic flaws that require understanding application state across multiple requests. Neo&rsquo;s agentic approach, which chains reconnaissance → injection → impact verification, is designed precisely for this gap. For security teams running quarterly manual pentests, the implication is clear: Neo can surface entire categories of vulnerabilities between pentests at a fraction of the cost.</p>
<h3 id="what-the-24-unique-findings-tell-us">What the 24 Unique Findings Tell Us</h3>
<p>The 24 findings no other tool caught reveal a structural limitation in signature-based security tools. Business logic vulnerabilities — arbitrary price manipulation, privilege escalation through parameter tampering, state confusion attacks — require an attacker to understand how the application works before they can identify what to break. Nuclei templates encode attack patterns, but Neo&rsquo;s Claude Opus 4.5 backbone adds the reasoning layer that connects those patterns into working multi-step exploits. When Neo encounters a checkout endpoint, it doesn&rsquo;t just run a price-injection template; it first maps the authentication flow, identifies session handling, tests discount code logic, and then attempts manipulation at each layer. This is closer to how a human tester works than how any scanner works.</p>
<h2 id="real-cve-discoveries-22-cves-across-12-projects-including-faraday-ssrf">Real CVE Discoveries: 22 CVEs Across 12 Projects (Including Faraday SSRF)</h2>
<p>Neo autonomously discovered 22 confirmed CVEs across 12 open source projects — findings that earned Neo the status of a credited security researcher, not just a scanning tool. The most documented example is an SSRF (Server-Side Request Forgery) vulnerability in Faraday, a widely used HTTP client library in the Ruby ecosystem. Neo identified that Faraday&rsquo;s URL handling could be manipulated to make the application issue requests to internal network addresses, a class of vulnerability that enables cloud metadata theft, internal service enumeration, and in some architectures, full SSRF-to-RCE escalation. The Faraday finding matters because the library is embedded in tens of thousands of active Ruby deployments — the blast radius of an unpatched SSRF extends far beyond any single application. Neo&rsquo;s CVE discovery workflow illustrates what autonomous AI pentesting looks like at research quality: it doesn&rsquo;t stop at confirming the vulnerability exists; it traces the impact chain to understand exploitability and documents findings in a format that satisfies CVE disclosure requirements. For security teams, the CVE track record is a proof point that Neo&rsquo;s autonomous agent isn&rsquo;t just running pre-written templates — it&rsquo;s capable of discovering novel vulnerabilities in widely deployed software. These weren&rsquo;t obscure research targets; the 22 CVEs were found in software that real organizations run in production today.</p>
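<p>To make the vulnerability class concrete, here is a minimal Python analogue of the SSRF guard that this type of finding calls for. This is not Faraday&rsquo;s actual code; it sketches the check an HTTP client needs before following an attacker-influenced URL:</p>

```python
# Illustrative SSRF guard, not Faraday's code: refuse URLs whose host is an
# internal IP literal. A production guard must also resolve DNS names and
# re-check the resolved address (to defeat DNS rebinding).
import ipaddress
from urllib.parse import urlparse

def is_internal(host: str) -> bool:
    """True if host is a private, loopback, or link-local IP literal."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # not an IP literal; real code resolves and re-checks
    return ip.is_private or ip.is_loopback or ip.is_link_local

def safe_to_fetch(url: str) -> bool:
    host = urlparse(url).hostname or ""
    return not is_internal(host)
```

<p>The cloud metadata endpoint <code>169.254.169.254</code> is link-local, which is why an unguarded client in a cloud deployment turns a simple SSRF into credential theft.</p>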
<h3 id="why-the-faraday-ssrf-discovery-matters-for-appsec-teams">Why the Faraday SSRF Discovery Matters for AppSec Teams</h3>
<p>The Faraday SSRF is a concrete illustration of AI-driven vulnerability research outpacing traditional disclosure timelines. Human researchers typically discover SSRF vulnerabilities during manual code reviews or targeted testing engagements — processes that take days to weeks. Neo identified the Faraday SSRF autonomously as part of a broader scan, without a human researcher specifying &ldquo;look for SSRF in Ruby HTTP clients.&rdquo; This unsupervised discovery mode is what makes Neo strategically interesting for organizations that depend on third-party libraries: Neo can continuously scan your dependency graph for newly exploitable patterns without requiring a pentest engagement to be scheduled in advance.</p>
<h2 id="neo-vs-competitors-pentera-horizon3-nodezero-xbow-and-burp-suite">Neo vs. Competitors: Pentera, Horizon3 NodeZero, XBOW, and Burp Suite</h2>
<p>Neo competes in the autonomous penetration testing platform category alongside Pentera, Horizon3 NodeZero, XBOW, and Escape. Each tool takes a different architectural approach that shapes its coverage and best-fit use case. Pentera and NodeZero are network-focused autonomous pentest platforms optimized for internal network segmentation testing, lateral movement, and Active Directory attack paths — they excel in enterprise infrastructure environments but are not primarily designed for web application and API testing. Neo, built on Nuclei&rsquo;s web-focused template library, has deeper coverage for HTTP-layer attacks including OWASP Top 10, business logic flaws, and API security issues. XBOW positions itself as an AI-native web application security tool with a similar proof-based approach to Neo, but lacks the Nuclei ecosystem&rsquo;s 9,000+ template library and 10-billion-scan training corpus. Escape focuses specifically on API security testing with strong GraphQL coverage. Burp Suite remains the professional manual testing standard — it provides the deepest control for a skilled human tester but requires significant expertise and time to produce comparable coverage to what Neo delivers autonomously. For teams with dedicated security engineers who run manual pentests, Neo is a force multiplier rather than a replacement: it handles continuous scanning and CVE template coverage while human testers focus on complex business logic and architecture-level findings. For teams without in-house security expertise, Neo provides pentest-quality findings without requiring a full-time pentester on staff.</p>
<table>
  <thead>
      <tr>
          <th>Tool</th>
          <th>Focus Area</th>
          <th>Proof-Based</th>
          <th>Nuclei-Backed</th>
          <th>Best For</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Neo</td>
          <td>Web/API/AppSec</td>
          <td>Yes</td>
          <td>Yes</td>
          <td>Continuous AppSec, CVE coverage</td>
      </tr>
      <tr>
          <td>Pentera</td>
          <td>Network/Infrastructure</td>
          <td>Yes</td>
          <td>No</td>
          <td>Internal network, AD testing</td>
      </tr>
      <tr>
          <td>NodeZero</td>
          <td>Network/Infrastructure</td>
          <td>Yes</td>
          <td>No</td>
          <td>Enterprise lateral movement</td>
      </tr>
      <tr>
          <td>XBOW</td>
          <td>Web/API</td>
          <td>Yes</td>
          <td>No</td>
          <td>AI-native web testing</td>
      </tr>
      <tr>
          <td>Escape</td>
          <td>API/GraphQL</td>
          <td>Partial</td>
          <td>No</td>
          <td>API-first organizations</td>
      </tr>
      <tr>
          <td>Burp Suite</td>
          <td>Web/API (manual)</td>
          <td>Manual</td>
          <td>No</td>
          <td>Expert manual testing</td>
      </tr>
  </tbody>
</table>
<h2 id="enterprise-features-pricing-model-and-how-to-get-access">Enterprise Features, Pricing Model, and How to Get Access</h2>
<p>Neo launched commercially in March 2026 with a usage-based pricing model built around tokens and infrastructure consumption — a departure from the flat per-application pricing that traditional PTaaS vendors like Cobalt, Synack, and Bishop Fox use. Traditional full PTaaS with manual testing typically runs $5,999+ per application per year; Neo&rsquo;s token-based model allows teams to run continuous testing at significantly lower cost for initial deployments, though costs scale with target complexity and scan frequency. Enterprise features in the Neo platform include replayable proof packs for every confirmed finding, integration with GitHub and GitLab for shift-left testing in CI/CD pipelines, isolated sandbox execution to prevent production impact during live testing, and a dashboard that tracks vulnerability status across multiple target applications over time. The proof pack feature is particularly valuable for compliance-driven organizations: instead of a static PDF pentest report, Neo produces machine-readable evidence that can be attached to a Jira ticket, reviewed in a pull request, or submitted to a compliance auditor. Access to Neo is through ProjectDiscovery&rsquo;s platform at projectdiscovery.io — the company offers a waitlist and direct sales engagement for enterprise deployments. For teams already using Nuclei open source, Neo represents a cloud-managed upgrade path that adds the agentic reasoning layer without requiring teams to build and maintain their own orchestration infrastructure.</p>
<h3 id="shift-left-security-running-neo-in-cicd">Shift-Left Security: Running Neo in CI/CD</h3>
<p>Neo&rsquo;s GitHub and GitLab integration enables a security testing pattern that traditional pentesting cannot support: testing every pull request against a staging environment before code merges. A typical CI/CD security gate with Neo runs in minutes — Nuclei&rsquo;s parallel template execution means that a 9,000-template scan completes far faster than a human tester could review the same code. When Neo finds a confirmed vulnerability in a PR, it creates a finding with full evidence, blocking the merge until the issue is resolved. This shifts security left in the software delivery lifecycle — catching exploitable vulnerabilities at the point where they&rsquo;re cheapest to fix, before they reach production.</p>
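<p>The gate logic itself is simple to reason about. A hedged sketch in Python, using an invented findings format rather than Neo&rsquo;s actual report schema:</p>

```python
# Hypothetical CI security gate: block the merge when any *confirmed* finding
# meets a severity threshold. The findings shape here is invented for
# illustration; adapt the keys to whatever your scanner actually emits.
SEVERITY_RANK = {"info": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}

def should_block(findings: list, threshold: str = "high") -> bool:
    """True if any confirmed finding is at or above the threshold severity."""
    floor = SEVERITY_RANK[threshold]
    return any(
        f.get("confirmed") and SEVERITY_RANK.get(f.get("severity", "info"), 0) >= floor
        for f in findings
    )
```

<p>In a CI step, you would parse the scanner&rsquo;s JSON report into <code>findings</code> and exit non-zero when <code>should_block</code> returns <code>True</code>; the confirmed-only filter is what makes a proof-based tool viable as a merge gate, since unconfirmed signature hits would otherwise block every PR.</p>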
<h2 id="limitations-where-neo-falls-short-and-what-to-watch">Limitations: Where Neo Falls Short and What to Watch</h2>
<p>Neo&rsquo;s documented limitations matter for enterprise buyers evaluating deployment at scale. Hands-on testing has shown that Neo excels on single targets and small-to-medium assessments but may struggle with environments containing 200+ servers or resource-heavy infrastructure targets. This isn&rsquo;t surprising for a tool designed around web application and API security — Nuclei&rsquo;s template architecture is optimized for HTTP-layer testing, not infrastructure-scale network scanning. The token-based pricing model, while flexible, creates cost unpredictability for teams running continuous testing at high frequency across large application portfolios. Unlike flat-rate PTaaS pricing, token costs scale with scan complexity and depth, which means a thorough test of a complex application with many endpoints and authentication states will cost more than a shallow scan of a simple API. AI reasoning errors are a third limitation to monitor: while Claude Opus 4.5 provides strong multi-step reasoning, no LLM-based system eliminates false positives entirely. Neo&rsquo;s proof-based approach reduces false positives significantly compared to signature scanners, but the requirement to review proof packs for each finding adds analyst time to the workflow. Finally, Neo&rsquo;s coverage is strongest for web applications and APIs — organizations with significant mobile, desktop, or network infrastructure security requirements will still need specialized tools to complement Neo&rsquo;s coverage.</p>
<h3 id="when-not-to-choose-neo">When Not to Choose Neo</h3>
<p>Neo is not the right tool for network penetration testing, Active Directory security assessments, or physical security evaluations. If your primary security concern is lateral movement inside a corporate network or privilege escalation in Windows domains, Pentera or NodeZero will provide better coverage. Neo is also not a compliance scanner — it doesn&rsquo;t produce PCI DSS, SOC 2, or ISO 27001 compliance reports. If your team needs compliance-mapped findings rather than raw vulnerability evidence, you&rsquo;ll need to map Neo&rsquo;s output to compliance frameworks manually or use a dedicated GRC platform alongside it.</p>
<h2 id="should-your-team-use-neo-verdict-and-recommendations">Should Your Team Use Neo? Verdict and Recommendations</h2>
<p>Neo is the most compelling AI pentesting platform for web application and API security in 2026. The benchmark result — 66 confirmed exploitable vulnerabilities including 24 findings no other tool caught — is not a marketing claim; it reflects a structural advantage that the Nuclei ecosystem provides over tools built from scratch. The 22 autonomous CVE discoveries demonstrate that Neo is operating at research quality, not just scan quality. For development teams that ship web applications or APIs, Neo provides a continuous security testing capability that would otherwise require either a full-time security engineer or expensive quarterly pentests. The CI/CD integration makes it practical to adopt a shift-left security posture without restructuring engineering workflows. For security teams running existing manual pentest programs, Neo is a force multiplier: deploy it for continuous coverage between scheduled engagements, and redirect human testers toward architecture review and complex business logic testing where AI agents still struggle. The primary caveats are cost predictability for large-scale deployments and the 200+ server limitation for infrastructure-heavy environments. Teams should pilot Neo on a representative subset of their application portfolio before committing to full continuous coverage, to calibrate token costs against finding value before scaling.</p>
<hr>
<h2 id="faq">FAQ</h2>
<p><strong>What makes ProjectDiscovery Neo different from traditional vulnerability scanners like Nessus or Qualys?</strong></p>
<p>Traditional scanners like Nessus and Qualys detect the presence of known vulnerability signatures — they tell you a vulnerability might exist. Neo confirms that a vulnerability is actually exploitable by running real exploit chains and returning proof packs with full evidence. This proof-first approach eliminates the high false-positive rates that slow down remediation workflows in traditional scanning programs.</p>
<p><strong>Does Neo replace manual penetration testing?</strong></p>
<p>Neo automates the coverage layer of penetration testing — running known vulnerability patterns, CVE templates, and exploit chains continuously and at scale. It does not replace the judgment of a skilled human tester for architecture review, complex business logic attacks, or social engineering assessments. Most security teams use Neo to extend continuous coverage between scheduled manual pentests, not to eliminate the manual engagement entirely.</p>
<p><strong>What programming languages and frameworks does Neo support?</strong></p>
<p>Neo&rsquo;s testing surface is application behavior at the HTTP/API layer, which means it is language- and framework-agnostic. Whether your application is built in Python, Java, Ruby, Node.js, or Go, Neo tests the deployed endpoints rather than the source code. For source code analysis, Nuclei also supports code-scanning templates, but Neo&rsquo;s primary strength is black-box and grey-box testing of running applications.</p>
<p><strong>How does Neo handle false positives?</strong></p>
<p>Neo&rsquo;s proof-based architecture is specifically designed to minimize false positives. A finding appears in Neo&rsquo;s output only when the agent has confirmed exploitability — not just detected a potential issue. In practice, this means Neo&rsquo;s finding lists are shorter than those from traditional scanners but significantly higher in actionability. Security teams report that nearly all Neo findings require remediation, compared to the 30–50% false positive rates common in signature-based scanners.</p>
<p><strong>What is Neo&rsquo;s pricing and how does it compare to PTaaS vendors?</strong></p>
<p>Neo uses a token-based pricing model tied to scan complexity and infrastructure consumption, rather than flat per-application fees. Traditional PTaaS with manual testing typically starts at $5,999+ per application per year. Neo&rsquo;s token model is generally more cost-effective for teams running continuous testing at moderate frequency, but costs scale with target complexity. ProjectDiscovery offers enterprise pricing through direct sales for organizations running large application portfolios.</p>
]]></content:encoded></item><item><title>RunSybil AI Pentesting Review 2026: IAM and Container Security Testing Evaluated</title><link>https://baeseokjae.github.io/posts/runsybil-cloud-pentesting-review-2026/</link><pubDate>Sat, 25 Apr 2026 19:04:45 +0000</pubDate><guid>https://baeseokjae.github.io/posts/runsybil-cloud-pentesting-review-2026/</guid><description>Hands-on review of RunSybil&amp;#39;s AI-native pentesting platform: IAM misconfiguration testing, container escape, CI/CD secret exposure, and how it compares to Pentera and NodeZero.</description><content:encoded><![CDATA[<p>RunSybil is an AI-native offensive security platform that autonomously chains IAM misconfigurations, container escapes, and CI/CD secret exposures into full attack paths — operating black-box against live cloud environments the same way a real attacker would, with no source code or agent credentials required.</p>
<h2 id="what-is-runsybil-the-ai-native-pentesting-platform-explained">What Is RunSybil? The AI-Native Pentesting Platform Explained</h2>
<p>RunSybil is an AI-native penetration testing platform founded in 2023 by Ari Herbert-Voss — OpenAI&rsquo;s first security research hire — and Vlad Ionescu, formerly of Meta&rsquo;s Red Team X. The company raised $40M in a Series A in March 2026, backed by Khosla Ventures, the Anthropic Anthology Fund, Menlo Ventures, Conviction, and Elad Gil, with angels from OpenAI, Palo Alto Networks, Stripe, and Google. The product centers on an autonomous AI agent called Sybil that operates against live cloud environments in pure black-box mode — no source code, no privileged credentials, no static playbook. Sybil observes what access it can gain, adapts its attack path accordingly, and chains multiple vulnerability classes together the way an actual human attacker would. This is a fundamentally different model from legacy automated scanners that run pre-defined scripts or check configuration against a compliance checklist. The platform specifically targets the attack surface that dominates modern cloud breaches: IAM misconfiguration, non-human identities (NHIs), container workloads, and CI/CD pipeline secrets — the four categories that together account for over 80% of cloud security incidents in 2026.</p>
<p>The founding team&rsquo;s pedigree matters here. Herbert-Voss spent years modeling adversarial AI at OpenAI before pivoting to offensive security tooling. Ionescu ran red team operations at Meta at enterprise scale. The combination shows in the product philosophy: RunSybil treats pentesting as an inference problem, not a rule-matching problem. Sybil generates hypotheses about what a lateral movement chain might look like, tests them, and updates based on what it observes — the same loop a skilled red teamer runs, but automated and continuous.</p>
<h2 id="core-capabilities-iam-misconfigs-container-escapes-and-cicd-secret-exposure">Core Capabilities: IAM Misconfigs, Container Escapes, and CI/CD Secret Exposure</h2>
<p>RunSybil&rsquo;s core capabilities center on the three attack surfaces most frequently exploited in modern cloud breaches: IAM policy misconfigurations, container and Kubernetes privilege escalation, and secrets exposed through CI/CD pipelines. Over 80% of cloud breaches involve misconfigured IAM policies, excessive permissions, or compromised credentials, and non-human identities — service accounts, CI runners, workload identities — now outnumber human users by 80:1 in mature cloud environments, with most carrying no automatic expiry and broader permissions than their actual function requires. RunSybil&rsquo;s Sybil agent is purpose-built to find and chain these misconfigurations in a way static scanners cannot: it requests real tokens, attempts real role assumptions, and validates actual exploitability rather than theoretical risk.</p>
<p><strong>IAM Attack Surface Analysis</strong></p>
<p>Sybil enumerates cloud IAM configurations — AWS IAM, GCP IAM, Azure RBAC — and then actively tests which permissions are exploitable from the attacker&rsquo;s current position. Rather than flagging every policy that deviates from least-privilege (which produces thousands of low-signal findings), it identifies which specific misconfigurations create an actionable privilege escalation path. The distinction matters: an <code>iam:PassRole</code> permission on a Lambda execution role is low risk in isolation but catastrophic if that role can assume an admin-level role elsewhere. Sybil maps these chains automatically.</p>
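<p>The chain-mapping idea reduces to graph search. A minimal sketch, assuming an invented edge map of which roles can assume which (edges an agent would discover by attempting real <code>sts:AssumeRole</code> or <code>iam:PassRole</code> actions):</p>

```python
# Illustrative privilege-escalation chain search; role names and the edge map
# are invented. BFS returns the shortest chain from a starting identity to a
# target role, or None if no chain exists.
from collections import deque

def escalation_path(edges, start, target):
    """BFS over role-assumption edges; returns the chain of roles, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in edges.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

<p>A two-hop result like <code>lambda-exec → deploy-role → admin</code> is exactly the kind of chain that is invisible when each policy is reviewed in isolation.</p>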
<p><strong>Container and Kubernetes Exploitation</strong></p>
<p>82% of container users now run Kubernetes in production, and nearly 9 in 10 organizations reported a Kubernetes-related security incident in the prior year. RunSybil targets the entire container attack surface: privileged container misconfigurations, host path mounts, service account token scope, network policy gaps, and runtime API exposure. It tests for container escape paths — writable host mounts, privileged namespaces, kernel capability abuse — and chains successful container escapes into subsequent IAM or node-level privilege escalation.</p>
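<p>The container-escape indicators listed above can be expressed as static checks over a pod manifest. The field names below are real Kubernetes Pod fields, but the check list is a simplified illustration, not what Sybil actually runs:</p>

```python
# Sketch of static escape-indicator checks over a Kubernetes Pod spec dict.
# Field names mirror the Pod API; the indicator list is illustrative, not
# exhaustive (no capability drop checks, seccomp, AppArmor, etc.).
def escape_indicators(pod_spec: dict) -> list:
    hits = []
    for c in pod_spec.get("containers", []):
        sc = c.get("securityContext", {})
        if sc.get("privileged"):
            hits.append(f"{c['name']}: privileged container")
        if "SYS_ADMIN" in sc.get("capabilities", {}).get("add", []):
            hits.append(f"{c['name']}: CAP_SYS_ADMIN granted")
    for v in pod_spec.get("volumes", []):
        if "hostPath" in v:
            hits.append(f"volume {v['name']}: hostPath mount of {v['hostPath'].get('path')}")
    return hits
```

<p>An agentic tester goes further than this static pass: it attempts the escape from inside the container and, on success, pivots into the node-level and IAM checks described above.</p>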
<p><strong>CI/CD Pipeline Secret Exposure</strong></p>
<p>Build pipelines are the new crown jewels. GitHub Actions, GitLab CI, CircleCI, and similar systems routinely expose secrets via environment variables, artifact stores, or misconfigured OIDC trust policies. Sybil specifically probes CI/CD pipeline configurations for exposed secrets, over-broad OIDC audience claims, and pipeline-to-cloud privilege relationships that create implicit lateral movement paths from source control into production infrastructure.</p>
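<p>The simplest form of the secret-exposure check is pattern matching over pipeline configuration text. The patterns below are simplified examples of well-known credential formats, not a production secret scanner and not RunSybil&rsquo;s implementation:</p>

```python
# Illustrative hard-coded-credential scan over CI config text. Patterns are
# simplified examples (AWS access key ID prefix, GitHub PAT prefix, generic
# quoted assignment); a real scanner uses far broader rules plus entropy checks.
import re

SECRET_PATTERNS = [
    ("aws-access-key", re.compile(r"AKIA[0-9A-Z]{16}")),
    ("github-token", re.compile(r"ghp_[A-Za-z0-9]{36}")),
    ("generic-assignment", re.compile(r"(?i)(password|secret|token)\s*[:=]\s*['\"][^'\"]{8,}['\"]")),
]

def find_secrets(config_text: str):
    """Return the names of all patterns that match anywhere in the config."""
    return [name for name, pat in SECRET_PATTERNS if pat.search(config_text)]
```

<p>Static matching like this catches literals committed to the repository; the OIDC trust-policy and pipeline-to-cloud privilege problems described above require the active testing model instead.</p>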
<h2 id="how-sybil-works-agentic-black-box-testing-step-by-step">How Sybil Works: Agentic Black-Box Testing Step by Step</h2>
<p>Sybil, RunSybil&rsquo;s core AI agent, operates as a closed-loop offensive security system that begins from an attacker-equivalent starting position — a single low-privilege credential or network endpoint — and autonomously plans, executes, and adapts multi-step attack chains. Unlike traditional automated scanners that match known vulnerability signatures, Sybil reasons about what it observes, generates hypotheses about reachable attack paths, and updates its strategy based on each action&rsquo;s outcome. This agentic loop runs continuously, not as a point-in-time scan, which means Sybil catches vulnerabilities introduced by configuration drift between scheduled assessments. The platform reports results as an attack graph: each node is an action Sybil took, each edge is the permission or misconfiguration that made the action possible. Security teams see not just what is vulnerable, but the precise chain of steps an attacker would follow to reach their most sensitive assets.</p>
<p><strong>Step 1: Reconnaissance and Surface Mapping</strong></p>
<p>Sybil begins by enumerating the exposed attack surface: cloud account metadata, publicly accessible endpoints, IAM entities and their attached policies, container registries, and pipeline configurations. It builds an internal graph of resources and their relationships before attempting any exploitation.</p>
<p><strong>Step 2: Hypothesis Generation</strong></p>
<p>Based on the surface map, Sybil generates a prioritized list of candidate attack chains — sequences of actions that, if each step succeeds, would result in privilege escalation or lateral movement. This is the AI-native layer: rather than running every possible check, Sybil reasons about which chains are most likely exploitable given what it has observed.</p>
<p><strong>Step 3: Active Exploitation and Adaptation</strong></p>
<p>Sybil executes the highest-priority chain, records the result, and updates its internal model. If an <code>iam:CreateRole</code> attempt is blocked by an SCP, it eliminates that branch and pivots to alternatives. If a Kubernetes service account token proves to have cluster-admin scope, it extends the chain to enumerate all resources accessible from that position. Each result informs the next action — the same feedback loop that makes human red teamers effective.</p>
<p><strong>Step 4: Attack Graph Reporting</strong></p>
<p>Results are delivered as an interactive attack graph with full action replay, finding severity scores, and remediation guidance mapped to the specific misconfiguration (not just a generic &ldquo;fix IAM policies&rdquo; recommendation). Reports are exportable for compliance evidence and integrate with ticketing systems for remediation tracking.</p>
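<p>The attack-graph report format in Step 4 can be sketched as a small data structure: each node is an action the agent took, and each edge records the permission or misconfiguration that enabled it. All names here are illustrative, not RunSybil&rsquo;s schema:</p>

```python
# Minimal sketch of an attack-graph report: an ordered chain of (action,
# enabling-misconfiguration) pairs with a human-readable rendering.
class AttackGraph:
    def __init__(self):
        self.steps = []  # (action, enabled_by) in execution order

    def record(self, action: str, enabled_by: str):
        self.steps.append((action, enabled_by))

    def render(self) -> str:
        """One line per step: what was done and which misconfig allowed it."""
        return "\n".join(
            f"{i}. {action}  [enabled by: {why}]"
            for i, (action, why) in enumerate(self.steps, 1)
        )
```

<p>Attaching the enabling misconfiguration to each edge is what lets remediation guidance point at a specific policy or trust relationship rather than a generic recommendation.</p>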
<h2 id="runsybil-vs-pentera-vs-nodezero-head-to-head-comparison">RunSybil vs. Pentera vs. NodeZero: Head-to-Head Comparison</h2>
<p>RunSybil, Pentera, and NodeZero represent the three dominant approaches to automated enterprise pentesting in 2026, and they differ substantially in architecture, target environment, and cost model. Pentera costs approximately $120,000 per year and targets on-premises and hybrid environments with a broad vulnerability coverage model. NodeZero operates via a single agentless Docker container, requires no persistent credentials, has executed over 170,000 pentests, and became the first AI to solve the GOAD benchmark in 14 minutes — outperforming GPT-4o and Gemini 2.5 Pro on that benchmark. RunSybil is the newest entrant, the most cloud-native of the three, and specifically optimized for AWS/GCP/Azure environments with Kubernetes workloads and GitHub/GitLab-based CI/CD pipelines. The right choice depends primarily on where your attack surface lives.</p>
<table>
  <thead>
      <tr>
          <th>Feature</th>
          <th>RunSybil</th>
          <th>Pentera</th>
          <th>NodeZero</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Architecture</td>
          <td>Agentic AI (Sybil)</td>
          <td>Automated scanner + AI</td>
          <td>Agentless Docker container</td>
      </tr>
      <tr>
          <td>Primary Target</td>
          <td>Cloud-native (AWS, GCP, Azure)</td>
          <td>On-prem / hybrid</td>
          <td>Hybrid / cloud</td>
      </tr>
      <tr>
          <td>Kubernetes Testing</td>
          <td>Deep</td>
          <td>Limited</td>
          <td>Moderate</td>
      </tr>
      <tr>
          <td>IAM Chain Analysis</td>
          <td>Core feature</td>
          <td>Basic</td>
          <td>Moderate</td>
      </tr>
      <tr>
          <td>CI/CD Pipeline Testing</td>
          <td>Yes</td>
          <td>Limited</td>
          <td>Limited</td>
      </tr>
      <tr>
          <td>Black-box Mode</td>
          <td>Pure black-box</td>
          <td>Credentials required</td>
          <td>Agentless</td>
      </tr>
      <tr>
          <td>Pricing</td>
          <td>Custom (Series A, ~enterprise)</td>
          <td>~$120K/year</td>
          <td>Flat unlimited subscription</td>
      </tr>
      <tr>
          <td>Onboarding</td>
          <td>~2 weeks to first report</td>
          <td>4–6 weeks</td>
          <td>Days</td>
      </tr>
      <tr>
          <td>On-prem Coverage</td>
          <td>Weak</td>
          <td>Strong</td>
          <td>Moderate</td>
      </tr>
      <tr>
          <td>Enterprise Scale Validation</td>
          <td>Limited (newer product)</td>
          <td>Extensive</td>
          <td>170,000+ pentests</td>
      </tr>
  </tbody>
</table>
<p><strong>When to Choose RunSybil</strong></p>
<p>RunSybil is the strongest choice for organizations running cloud-native infrastructure on AWS, GCP, or Azure with a significant Kubernetes footprint and GitHub/GitLab-based CI/CD. If your primary concern is the IAM misconfiguration → container escape → lateral movement chain, and you want continuous testing rather than quarterly point-in-time scans, RunSybil&rsquo;s agentic model is the most purpose-built for that use case.</p>
<p><strong>When to Choose NodeZero</strong></p>
<p>NodeZero is better validated at enterprise scale and offers a simpler deployment model (single container, no credentials). If you need rapid deployment, broad coverage across hybrid environments, and an established track record, NodeZero is the safer enterprise bet for 2026.</p>
<p><strong>When to Choose Pentera</strong></p>
<p>Pentera remains the strongest option for organizations with substantial on-premises infrastructure or mixed legacy/cloud environments. Its ~$120K price point is steep but reflects a mature product with deep on-prem coverage that neither RunSybil nor NodeZero matches.</p>
<h2 id="pricing-onboarding-and-integrations">Pricing, Onboarding, and Integrations</h2>
<p>RunSybil&rsquo;s pricing is not publicly listed — the company uses an enterprise sales model consistent with other Series A security startups targeting mid-market and enterprise buyers. Based on competitor positioning and the $40M raise, the platform is priced at the enterprise tier, likely in the $50K–$150K annual range depending on cloud account scope and assessment frequency. The onboarding timeline is approximately two weeks from initial access grant to first full attack report, which is faster than Pentera&rsquo;s typical 4–6 week deployment but requires coordination with the RunSybil customer success team to configure Sybil&rsquo;s initial access scope and connect cloud account roles.</p>
<p><strong>Supported Cloud Integrations</strong></p>
<ul>
<li>AWS: IAM, EC2, ECS, EKS, Lambda, S3, Secrets Manager, CodePipeline</li>
<li>GCP: Cloud IAM, GKE, Cloud Run, Secret Manager, Cloud Build</li>
<li>Azure: RBAC, AKS, Container Instances, Key Vault, Azure DevOps</li>
<li>CI/CD: GitHub Actions, GitLab CI, CircleCI, Jenkins</li>
<li>Ticketing: Jira, ServiceNow, Linear</li>
</ul>
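<p>Connecting any third-party assessment platform to a cloud account typically means provisioning a cross-account IAM role. As a minimal sketch of that pattern (the account ID, external ID, and helper function below are hypothetical placeholders, not RunSybil&rsquo;s actual onboarding values), an AWS trust policy scoped with an <code>ExternalId</code> condition might be built like this:</p>

```python
import json

# Hypothetical values -- substitute the vendor's real account ID and the
# external ID issued during onboarding.
VENDOR_ACCOUNT_ID = "111122223333"
EXTERNAL_ID = "example-external-id"

def build_trust_policy(vendor_account_id: str, external_id: str) -> dict:
    """Trust policy letting a third-party assessor assume this role,
    gated on an ExternalId condition to block confused-deputy abuse."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{vendor_account_id}:root"},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
        }],
    }

print(json.dumps(build_trust_policy(VENDOR_ACCOUNT_ID, EXTERNAL_ID), indent=2))
```

<p>The <code>sts:ExternalId</code> condition is the standard AWS guard against the confused-deputy problem whenever a third party assumes a role in your account, so any vendor integration of this kind should include it.</p>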
<p><strong>Continuous vs. Point-in-Time</strong></p>
<p>One of RunSybil&rsquo;s most operationally significant features is continuous assessment. Traditional pentests run quarterly or annually; in the interim, configuration drift, new deployments, and IAM policy changes create unvalidated exposure. Sybil runs continuously, which means a misconfigured IAM role created Tuesday at 3am is tested by Wednesday morning — not at the next scheduled engagement.</p>
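<p>The drift-detection idea itself is easy to sketch. The snippet below is an illustrative toy, not RunSybil code: it diffs two policy snapshots (role name mapped to allowed actions) and flags roles that newly gained a wildcard grant between runs.</p>

```python
def flag_new_wildcards(old: dict, new: dict) -> list:
    """Return roles whose action list gained a wildcard ('*') between
    two snapshots, i.e. drift toward over-privilege since the last run."""
    flagged = []
    for role, actions in new.items():
        had_wildcard = "*" in old.get(role, [])
        if "*" in actions and not had_wildcard:
            flagged.append(role)
    return flagged

# Toy snapshots: ci-deployer picked up "*" overnight.
yesterday = {"ci-deployer": ["s3:GetObject"], "admin": ["*"]}
today = {"ci-deployer": ["s3:GetObject", "*"], "admin": ["*"]}

print(flag_new_wildcards(yesterday, today))  # ['ci-deployer']
```

<p>A continuous platform goes further by actually attempting to exploit the flagged change rather than just reporting the diff, but the scheduling advantage over quarterly snapshots is the same.</p>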
<h2 id="pros-cons-and-final-verdict-is-runsybil-right-for-your-cloud-stack">Pros, Cons, and Final Verdict: Is RunSybil Right for Your Cloud Stack?</h2>
<p>RunSybil is one of the most technically credible AI-native pentesting platforms available in 2026, built by a founding team with genuine offensive security expertise at OpenAI and Meta scale, backed by investors who understand the AI security space. Its core strength — agentic IAM chain analysis and container security testing in pure black-box mode — is directly aligned with the attack surface responsible for the majority of cloud breaches. For cloud-native engineering organizations running AWS or GCP with Kubernetes and GitHub Actions, RunSybil tests the exact risk surface that keeps security teams awake at night. The $4.44M average breach cost from cloud misconfiguration alone makes a continuous agentic testing platform economically defensible at almost any enterprise price point. The primary weaknesses are product maturity (founded 2023, limited independent enterprise-scale validation), on-premises coverage (essentially absent), and the early-stage sales process that comes with any pre-IPO security vendor. Organizations requiring broad hybrid coverage or on-prem network pentesting should evaluate NodeZero or Pentera alongside RunSybil.</p>
<p><strong>Pros</strong></p>
<ul>
<li>Purpose-built for cloud-native IAM + container attack surface</li>
<li>Agentic black-box testing: no static playbooks, adapts dynamically</li>
<li>Two-week onboarding to first full attack report</li>
<li>Continuous assessment catches configuration drift between scheduled tests</li>
<li>Founded by OpenAI and Meta Red Team veterans with real offensive security credibility</li>
<li>$40M Series A backing signals runway and continued R&amp;D investment</li>
</ul>
<p><strong>Cons</strong></p>
<ul>
<li>Limited independent enterprise-scale validation (newer product)</li>
<li>No meaningful on-premises or legacy infrastructure coverage</li>
<li>Pricing is enterprise-only with no self-serve tier</li>
<li>Feature set still maturing relative to Pentera&rsquo;s decade-plus track record</li>
<li>Regional data residency options unclear at Series A stage</li>
</ul>
<p><strong>Final Verdict</strong></p>
<p>For cloud-native organizations on AWS, GCP, or Azure running Kubernetes workloads and CI/CD pipelines, RunSybil is the most purpose-built agentic pentesting platform available in 2026. The IAM chain analysis and container escape testing capabilities are genuinely differentiated from static scanners. For organizations with on-prem or hybrid environments, start with NodeZero and revisit RunSybil as the product matures.</p>
<hr>
<h2 id="faq">FAQ</h2>
<p><strong>What is RunSybil used for?</strong>
RunSybil is an AI-native penetration testing platform that autonomously tests cloud infrastructure for IAM misconfigurations, container escape paths, and CI/CD pipeline secret exposure, using an agentic AI model that adapts its attack strategy based on what access it gains in real time.</p>
<p><strong>How does RunSybil differ from traditional automated scanners?</strong>
Traditional scanners match known signatures against static configurations. RunSybil&rsquo;s Sybil agent starts from a low-privilege position, generates hypotheses about exploitable attack chains, executes them against live systems, and adapts based on results — the same loop a human red teamer runs, but automated and continuous.</p>
<p><strong>How does RunSybil compare to NodeZero and Pentera?</strong>
RunSybil is the most cloud-native of the three, with the deepest IAM and Kubernetes coverage. NodeZero is more validated at enterprise scale and easier to deploy. Pentera leads on on-premises and hybrid infrastructure. Choose RunSybil for pure-cloud environments, NodeZero for hybrid, and Pentera if on-prem coverage is a requirement.</p>
<p><strong>What cloud platforms does RunSybil support?</strong>
RunSybil supports AWS, GCP, and Azure, with integrations for Kubernetes (EKS, GKE, AKS), serverless functions, container registries, and CI/CD pipelines including GitHub Actions, GitLab CI, CircleCI, and Jenkins.</p>
<p><strong>Who founded RunSybil and when did they raise funding?</strong>
RunSybil was founded in 2023 by Ari Herbert-Voss (OpenAI&rsquo;s first security research hire) and Vlad Ionescu (Meta Red Team X). In March 2026, the company raised $40M in a Series A led by Khosla Ventures, with participation from the Anthropic Anthology Fund, Menlo Ventures, Conviction, and Elad Gil.</p>
]]></content:encoded></item></channel></rss>