Whitepaper · June 2026

Autonomous Pentesting Benchmark Report 2026

We ran an autonomous pentest on a live app, then measured it against the field. One public target, independent ground truth, every result backed by 31,400 logged telemetry events.

One live target. Independent validation. Full run telemetry.

Results at a Glance

45 Validated Findings

0 False Positives

37 Exploitable Issues

189s To Admin Takeover

~$1.1k Total Cost

strobes.co

01 · What we ran

One target. Independent ground truth.

On June 10 and 11, 2026, Strobes AI ran a fully autonomous assessment of Fider v0.33.0, a production-grade open-source feedback platform with real authentication, file uploads, webhooks, OAuth, and an admin console. The security firm Doyensec independently assessed the same application and published validated results for two commercial AI security platforms. That shared third-party reference is what makes the comparison checkable.

Every platform was evaluated against the same target and ground truth, within the same testing window and under the same evaluation standard. Findings were independently validated, false positives were independently reviewed, and results were deduplicated before comparison.
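Deduplication before comparison can be sketched as follows. This is a minimal illustration, not the benchmark's actual procedure: the `Finding` fields, the placeholder endpoint paths, and the choice of (platform, endpoint, weakness class) as the duplicate key are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    # Hypothetical record shape; the report does not publish its schema.
    platform: str
    endpoint: str
    weakness: str


def deduplicate(findings):
    """Keep the first finding per (platform, endpoint, weakness) key."""
    seen = set()
    unique = []
    for f in findings:
        # Normalise trailing slashes and case so trivial variants collapse.
        key = (f.platform, f.endpoint.rstrip("/").lower(), f.weakness.lower())
        if key not in seen:
            seen.add(key)
            unique.append(f)
    return unique


# Placeholder data, not results from the benchmark.
raw = [
    Finding("tool-a", "/api/v1/example", "SSRF"),
    Finding("tool-a", "/api/v1/example/", "ssrf"),  # same issue reported twice
    Finding("tool-a", "/api/v1/other", "IDOR"),
]
unique = deduplicate(raw)
print(len(unique))  # 2
```

Counting after this step means a tool is not credited twice for reporting one issue in two spellings.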

02 · The marquee result

189s to verified admin takeover

A confirmed multi-step attack chain, executed end to end with zero human intervention. The combined six-scanner field confirmed zero exploitable findings on the same target.

OTP endpoint, no rate limit

Recon surfaces an unprotected one-time-password endpoint, the entry point to the admin account.

Code brute-forced in 189s

With no rate limit on code attempts, the one-time code is brute-forced and an admin session is captured in 189 seconds.

Session verified and replayed

The captured admin session is confirmed as genuine, then replayed for continued access.

Pivot to webhooks

With persistent admin access, the chain pivots to the outbound webhook functionality.

SSRF to cloud metadata

A blind SSRF through the outbound webhook reaches the AWS Instance Metadata Service (IMDS), exposing instance credentials.
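The final step of the chain works because an outbound webhook is allowed to call an internal address. As a hedged sketch of the kind of check whose absence makes this possible, the snippet below rejects webhook URLs that resolve to non-public addresses such as the IMDS address 169.254.169.254. The function name `is_safe_webhook_target` and its `resolve` parameter are hypothetical and are not taken from Fider or Strobes AI. A resolve-then-check guard like this does not by itself stop DNS rebinding, where the host resolves differently at request time.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_safe_webhook_target(url, resolve=socket.getaddrinfo):
    """Return True only if every address the host resolves to is public."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = resolve(parsed.hostname, None)
    except OSError:
        # Resolution failed, so treat the target as unsafe.
        return False
    for info in infos:
        try:
            addr = ipaddress.ip_address(info[4][0])
        except ValueError:
            return False
        # is_global is False for loopback, private, and link-local
        # ranges, which includes the metadata address 169.254.169.254.
        if not addr.is_global:
            return False
    return bool(infos)


print(is_safe_webhook_target("http://169.254.169.254/latest/meta-data/"))  # False
```

Checking every resolved address, rather than only the first, avoids accepting a hostname that maps to both a public and an internal address.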

03 · The comparison

Measured against six scanners and two AI pentesting platforms

Assessment Dimension

6 scanners

AI platforms

Strobes AI

Validated findings (post-dedup)

False positives (lower is better)

1 to 4

Confirmed exploitable

Time to admin takeover

none

189s

04 · The economics

70 to 75% lower cost per validated finding

Figures are per validated finding. Scanner pricing reflects public list rates; the Strobes figure is measured AI-credit consumption on the same target. The cost bases differ, so read the comparison as directional even though the gap is large.

$235 per finding

Scanner A: 13 validated findings at $4,000 per scan.

$167 per finding

XBOW: 24 validated findings at $4,000 per scan.

$22 to $27 per finding

Strobes AI: 45 findings on about $1.1k of credits, 70 to 75% lower cost per validated finding.
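The per-finding arithmetic can be reproduced in a few lines. This sketch uses only the XBOW and Strobes AI inputs quoted above; the function name `cost_per_finding` is illustrative, and $1,100 stands in for "about $1.1k", so the Strobes result is approximate.

```python
def cost_per_finding(total_cost_usd, validated_findings):
    """Total spend divided by the number of validated findings."""
    if validated_findings <= 0:
        raise ValueError("validated_findings must be positive")
    return total_cost_usd / validated_findings


# Inputs as quoted in this section.
xbow = cost_per_finding(4000, 24)
strobes = cost_per_finding(1100, 45)  # "about $1.1k" taken as $1,100

# Credit spend implied by the quoted $22 to $27 range at 45 findings.
implied_spend = (22 * 45, 27 * 45)

print(round(xbow))         # 167
print(round(strobes, 2))   # 24.44
print(implied_spend)       # (990, 1215)
```

The implied spend of $990 to $1,215 brackets the roughly $1.1k credit figure, which is consistent with the quoted per-finding range.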

Why the gap is structural, not tuning

A signature scanner emits fixed candidates. Strobes AI authenticates, holds session state, chains weaknesses into a single exploit path, and reasons about business logic. It then routes every candidate through a separate validation agent, which must confirm the candidate before it is recorded; anything unconfirmed is discarded. That confirm-or-discard step is the mechanism behind the zero false positives.
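The confirm-or-discard gate can be pictured with a small sketch. Everything here is hypothetical: the `Candidate` shape, the `record_confirmed` function, and the toy validator are stand-ins for illustration, not the Strobes AI implementation.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Candidate:
    # Hypothetical shape of a finding proposed by the testing agent.
    title: str
    evidence: str


def record_confirmed(
    candidates: Iterable[Candidate],
    validate: Callable[[Candidate], bool],
) -> List[Candidate]:
    """Record only the candidates that a separate validator confirms."""
    confirmed = []
    for candidate in candidates:
        if validate(candidate):
            confirmed.append(candidate)
        # Unconfirmed candidates are dropped and never reach the report.
    return confirmed


def toy_validator(candidate: Candidate) -> bool:
    # Stand-in check; a real validator would re-run the exploit itself.
    return "reproduced" in candidate.evidence


proposed = [
    Candidate("Example issue A", "reproduced against live target"),
    Candidate("Example issue B", "pattern match only"),
]
report = record_confirmed(proposed, toy_validator)
print([c.title for c in report])  # ['Example issue A']
```

Keeping the validator separate from the component that proposes candidates is the design point: a finding has to survive an independent check before it counts.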

From quarterly pentest to continuous validation

If autonomous assessment costs what this benchmark shows, your release cadence becomes the only limit. Talk to us about running this against your own target across network, API, cloud, or source-level review in one workspace.