Whitepaper · June 2026

Autonomous Pentesting
Benchmark Report 2026

We ran an autonomous pentest on a live app, then measured it against the field. One public target, independent ground truth, every result backed by 31,400 logged telemetry events and independently validated.


Strobes AI · Benchmark 2026

One live target. Independent validation.
Full run telemetry.

Results at a glance

45 · Validated findings
0 · False positives
37 · Exploitable issues
189s · To verified admin takeover
~$1.1k · Est. total cost
strobes.co · © 2026
17 PAGES · FULL TELEMETRY

01 · What we ran

One target. Independent ground truth.

On June 10 and 11, 2026, Strobes AI ran a fully autonomous assessment of Fider v0.33.0, a production-grade open-source feedback platform with real authentication, file uploads, webhooks, OAuth, and an admin console. The security firm Doyensec independently assessed the same application and published validated results for two commercial AI security platforms. That shared, third-party reference is what makes the comparison checkable.

Every platform was evaluated against the same target and ground truth, within the same testing window and the same evaluation standard. Findings were independently validated, false positives independently reviewed, and results deduplicated before comparison.
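
The deduplication step can be sketched roughly as follows. This is a minimal illustration, not the benchmark's actual pipeline: the `Finding` fields, the `(endpoint, weakness)` key, and the sample paths are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    source: str      # tool that reported the finding (hypothetical field)
    endpoint: str    # affected route (hypothetical field)
    weakness: str    # weakness class label (hypothetical field)
    validated: bool  # outcome of independent validation


def dedupe_validated(findings):
    """Keep one validated finding per (endpoint, weakness) pair; first seen wins."""
    seen = {}
    for f in findings:
        if not f.validated:
            continue  # rejected findings never reach the comparison
        # Normalise trailing slashes and case so equivalent reports collide.
        key = (f.endpoint.rstrip("/").lower(), f.weakness.lower())
        seen.setdefault(key, f)
    return list(seen.values())


# Made-up sample data, not taken from the benchmark.
reports = [
    Finding("scanner-1", "/example/items", "xss", True),
    Finding("scanner-2", "/example/items/", "XSS", True),  # same issue, other tool
    Finding("scanner-2", "/example/admin", "sqli", False),  # rejected in validation
]
unique = dedupe_validated(reports)
```

Here two tools reporting the same issue collapse into one entry, and the rejected report is dropped before counting.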

01

A real, public target

Fider is a production-grade, open-source, multi-tenant SaaS application, not a CTF box or a deliberately vulnerable training app. Anyone can stand up the same instance and reproduce the work.

02

Independent third party

Doyensec assessed the same application and published validated results for two commercial AI platforms. The AI platform figures are Doyensec's, not ours.

03

Every result from telemetry

The cost basis is per-layer: every agent run, tool call, reasoning step, and credit was logged, 31,400 events in total. The figures are computed directly from that trace.
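
A per-layer cost rollup over such a trace might look like the sketch below. The event schema (`layer`, `credits`) and the sample values are assumptions for illustration only; the report does not publish the actual log format.

```python
from collections import defaultdict


def credits_by_layer(events):
    """Sum logged credits per layer from a list of telemetry events."""
    totals = defaultdict(float)
    for event in events:
        # Events without a credit charge count as zero.
        totals[event["layer"]] += event.get("credits", 0.0)
    return dict(totals)


# Made-up sample events, not taken from the benchmark trace.
events = [
    {"layer": "agent_run", "credits": 2.0},
    {"layer": "tool_call", "credits": 0.5},
    {"layer": "tool_call", "credits": 0.25},
    {"layer": "reasoning_step", "credits": 1.0},
]
totals = credits_by_layer(events)
total_credits = sum(totals.values())
```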

02 · The marquee result

189s to verified admin takeover

A confirmed multi-step attack chain, executed end to end with zero human intervention. The combined six-scanner field confirmed zero exploitable findings on the same target.

Step 01

OTP endpoint, no rate limit

Recon surfaces a one-time-password endpoint with no rate limit, the entry point into the admin account.

Step 02 · 189s

Code brute-forced

The code is brute-forced against the unprotected endpoint, and the admin session is captured in 189 seconds.

Step 03

Session verified

The captured session is confirmed to be a genuine admin session and replayed.

Step 04

Pivot to webhooks

With admin access established, the run pivots into the webhook functionality.

Step 05

SSRF

Blind SSRF → AWS IMDS. An outbound webhook reaches the cloud metadata service, exposing instance credentials.
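
The two weaknesses this chain relies on, an OTP endpoint with no attempt limit and a webhook that can reach internal addresses, have well-known defensive counterparts. The sketch below is illustrative only and is not Fider's code: `OtpAttemptLimiter`, `webhook_target_allowed`, and the failure threshold are hypothetical names and values. The one fixed fact is that AWS IMDS is served from the link-local address 169.254.169.254.

```python
import ipaddress
import socket
from urllib.parse import urlparse


class OtpAttemptLimiter:
    """Block further OTP guesses after too many failures (in-memory sketch)."""

    def __init__(self, max_failures=5):  # threshold is an assumed policy
        self.max_failures = max_failures
        self._failures = {}

    def allowed(self, account):
        return self._failures.get(account, 0) < self.max_failures

    def record_failure(self, account):
        self._failures[account] = self._failures.get(account, 0) + 1

    def reset(self, account):
        self._failures.pop(account, None)


def is_public_address(host_ip):
    """True only for addresses outside private, loopback, and link-local ranges."""
    ip = ipaddress.ip_address(host_ip.split("%")[0])  # drop any IPv6 scope id
    return not (ip.is_private or ip.is_loopback or ip.is_link_local
                or ip.is_reserved or ip.is_multicast or ip.is_unspecified)


def webhook_target_allowed(url):
    """Reject webhook URLs that resolve to any non-public address."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False  # unresolvable hosts are rejected
    return all(is_public_address(info[4][0]) for info in infos)
```

A check like this only covers the moment of validation. A production guard would also need to pin the resolved address for the actual request, since DNS answers can change between the check and the connection.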

03 · The comparison

Measured against six scanners and two AI pentesting platforms

All figures are post-validation and deduplicated. The AI platform numbers come from Doyensec's independent study of the same target. Lower false-positive count is better.

Source                         Validated   False positives   Exploitable
Six scanners (deduplicated)    13          14                3
AI platform · Aikido           17          4                 not confirmed
AI platform · XBOW             26          1                 not confirmed
Strobes AI                     45          0                 37

The six-scanner field deduplicated overlapping findings to 13 validated, with 14 false positives, and confirmed 3 as exploitable. The two AI platforms flagged probable issues but did not confirm exploitability.
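
Two ratios make these counts easier to compare. The sketch below assumes that validated findings and false positives are disjoint and together make up everything a source reported; the report does not state that explicitly, so treat the output as indicative. Only the two rows with a confirmed exploitable count are used.

```python
def precision(validated, false_positives):
    """Share of reported findings that survived validation."""
    reported = validated + false_positives
    return validated / reported if reported else 0.0


def exploitable_share(exploitable, validated):
    """Share of validated findings that were confirmed exploitable."""
    return exploitable / validated if validated else 0.0


# (validated, false positives, exploitable) as quoted in the comparison.
rows = {
    "Six scanners": (13, 14, 3),
    "Strobes AI": (45, 0, 37),
}

for name, (validated, false_positives, exploitable) in rows.items():
    print(
        f"{name}: precision {precision(validated, false_positives):.0%}, "
        f"exploitable share {exploitable_share(exploitable, validated):.0%}"
    )
```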

04 · The economics

70 to 75% lower cost per validated finding

The figures below are per validated finding. Scanner pricing reflects public list rates; the Strobes figure is measured AI-credit consumption on the same target. The cost bases differ, so read the comparison as directional even though the gap is large.

Scanner A
$235

13 findings · $4,000 / scan

XBOW
$167

24 findings · $4,000 / scan

Strobes AI · 70–75% lower
$22–27

45 findings · ~$1.1k credits

Based on public scanner pricing and measured Strobes AI credit consumption on the same target.
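
The per-finding figure is the total cost divided by the validated finding count. A minimal sketch of that arithmetic, using the XBOW and Strobes AI inputs quoted above; the ~$1.1k total is taken as exactly $1,100 here, which is an assumption.

```python
def cost_per_finding(total_cost_usd, validated_findings):
    """Total cost divided by the number of validated findings."""
    if validated_findings <= 0:
        raise ValueError("validated_findings must be positive")
    return total_cost_usd / validated_findings


xbow = cost_per_finding(4000, 24)     # list price per scan / quoted findings
strobes = cost_per_finding(1100, 45)  # ~$1.1k in credits, assumed exactly $1,100

print(f"XBOW: ${xbow:.0f} per validated finding")
print(f"Strobes AI: ${strobes:.0f} per validated finding")
```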

Why the gap is structural, not tuning

A signature scanner emits fixed candidates. Strobes AI authenticates, holds session state, chains weaknesses into a single exploit, reasons about business logic, and routes each result to a confirm-or-discard step: every candidate runs through a separate validation agent that must confirm it before it is recorded. That validation step is the mechanism behind the zero false positives.
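
In outline, a confirm-or-discard gate looks like the sketch below. This is a generic pattern, not Strobes AI's implementation: `confirm_or_discard`, `replay_validator`, and the `reproduced` flag are hypothetical stand-ins for a validation agent that would re-run the exploit against the target.

```python
from typing import Callable, Dict, List, Tuple

Candidate = Dict[str, object]


def confirm_or_discard(
    candidates: List[Candidate],
    validator: Callable[[Candidate], bool],
) -> Tuple[List[Candidate], List[Candidate]]:
    """Record only candidates the validator confirms; discard the rest."""
    recorded, discarded = [], []
    for candidate in candidates:
        if validator(candidate):
            recorded.append(candidate)
        else:
            discarded.append(candidate)
    return recorded, discarded


def replay_validator(candidate: Candidate) -> bool:
    # Stand-in check: a real validator would independently reproduce the issue.
    return bool(candidate.get("reproduced"))


# Made-up candidates for illustration.
candidates = [
    {"title": "candidate A", "reproduced": True},
    {"title": "candidate B", "reproduced": False},
]
recorded, discarded = confirm_or_discard(candidates, replay_validator)
```

Because nothing is recorded without a confirmation, an unconfirmed candidate can be missed but cannot become a reported false positive.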

From the benchmark to your stack

From quarterly pentest to continuous validation

If autonomous assessment costs what this benchmark shows, your release cadence becomes the only limit. Talk to us about running this against your own target: network, API, cloud, or source-level review in one workspace.