Back to Blog
Agentic Pentesting with Strobes AI — 32 tasks, 21 WSTG phases, 42 confirmed vulnerabilities, fully autonomous

Agentic Pentesting with Strobes AI

Prakash Ashok · March 25, 2026 · 9 min read

Ask any pentester what kills an engagement, and it's rarely the technical difficulty. It's the clock. You scope the target, fire up Burp, start crawling, find something interesting on endpoint #3, spend forty minutes confirming it, write it up, then realize you've burned half a day and haven't touched authentication testing yet.

We've all been there. The coverage gaps aren't from lack of skill; they're from the simple fact that one person can only type one command at a time.

That's the problem our Research & Engineering team set out to solve with Strobes AI. Not by replacing pentesters, but by giving them something they've never had: persistent, multi-agent workspaces that can run an entire OWASP WSTG assessment autonomously while you focus on the parts that actually need a human brain.

How Does Strobes Agentic AI Make Pentests Better?

The term gets thrown around a lot, so let's be specific. When we say agentic pentesting, we mean AI agents that don't just answer your questions about CVEs or suggest payloads. They actually execute. They run the scan, read the response, decide what to try next, confirm the vulnerability with a working proof-of-concept, and file the finding. No hand-holding required.

The key design decision — and this was deliberate from day one — was to engineer the AI the way an expert pentester thinks. Strobes AI loads default skills and methodologies upfront, breaks the assessment into sub-agent tasks, crawls the target, and operates on what we internally call a "PoC approach": confirm it's exploitable, document the evidence, move on. No wasting cycles on theoretical findings that can't be demonstrated.

Here's how Strobes AI is set up to solve the pentest frictions:

  • Agents — Specialized AI personas, each built for a specific job. Web pentesting, network pentesting, API testing, code review, login & auth handling. They don't share a single monolithic prompt — each one has targeted capabilities and tooling.
  • Workspaces — Think of these as your engagement folder with a set of pentest projects, but persistent and queryable. Every asset, credential, file, shared table, finding, and task lives here. Day 3 of an engagement has full context from day 1.
  • Skills — Modular instruction sets following the open SKILL.md standard. They teach agents how to use specific tools or follow specific methodologies. You can write your own or use the built-in library.
  • Learnings — Knowledge extracted from prior workspace activity. If the agent mapped your auth flow on Monday, it remembers that on Wednesday without being told again.
  • Human in the Loop (HITL) — A governed approval layer. The agents are autonomous, not unsupervised. More on this later.
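To make the Skills concept concrete, here is a purely hypothetical sketch of what a skill file could look like. The blog only tells us that skills follow the open SKILL.md pattern of front-matter metadata plus plain-language instructions; the name, fields, and tool identifiers below are invented for illustration, not the actual Strobes schema.

```markdown
---
name: union-sqli-confirmation
description: Confirm UNION-based SQL injection on a suspected parameter
tools: [http_client, sqlmap]
---

# UNION-based SQLi confirmation

1. Capture a baseline response for the target parameter.
2. Probe the column count with incremental `ORDER BY` payloads.
3. Attempt a `UNION SELECT` that injects a unique marker string.
4. If the marker is reflected, extract one row as evidence, file the
   finding, and stop -- no theoretical write-ups without a working PoC.
```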
Strobes AI overview screen — SMART mode enabled, HITL toggle
The chat interface where it all starts. Quick actions surface common workflows, so you're not typing prompts from scratch.

The Real Test: A Full Web App Pentest

To demonstrate what this looks like in practice, the security research & engineering team pointed Strobes AI at a real-world hosted app — a common benchmark target that makes results comparable across tools and testers.

Here's the important part: the team gave minimal instructions and no hand-crafted prompts. No step-by-step playbooks fed into the chat. They selected the Web App Pentest workflow template and let the agents figure out the rest. The whole point was to test whether the platform could operate the way an expert pentester would — load the right skills, break the assessment into phases, design test cases, execute them, and report findings — without someone babysitting every step.

It ran 32 independent agent tasks across 21 structured web app pentesting phases, pre-loaded according to industry standards and the security skills built into Strobes, all fully autonomous.

  • 32 — Tasks Completed
  • 21 — Workflow Phases
  • 42 — Vulnerabilities
  • 41 — Evidence Files
  • 4 — Shared Tables

Workspace overview — all 21 phases completed, 42 findings, 41 files. 6.8 AI credits consumed for the entire engagement.


The Web App Pentest workflow template — this is what gets loaded when you select the workflow. The agents take it from here.

How the Phases Played Out

The workflow maps directly to OWASP WSTG v4.2 categories & the Strobes security knowledge base. Each phase feeds into the next — the output of reconnaissance becomes the input for test case design, which becomes the task list for execution.
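The hand-off described above can be sketched in a few lines. This is an illustrative model, not Strobes code: the function names and data shapes are invented, and the real platform obviously does far more per phase. The point is the data flow, where each phase's output is the next phase's input.

```python
# Hypothetical sketch of the phase hand-off: recon output feeds test-case
# design, and designed test cases become the executable task list.

def recon(target: str) -> dict:
    # Stand-in for crawling/fingerprinting; returns discovered endpoints.
    return {"target": target, "endpoints": ["/login.php", "/search.php"]}

def design_tests(recon_out: dict) -> list[dict]:
    # One test case per endpoint per relevant WSTG category (simplified).
    return [{"endpoint": e, "category": "INPV"} for e in recon_out["endpoints"]]

def execute(test_cases: list[dict]) -> list[dict]:
    # Stand-in for execution; real agents attach evidence and status here.
    return [{"task": tc, "status": "done"} for tc in test_cases]

def run_engagement(target: str) -> list[dict]:
    # The pipeline: each phase consumes the previous phase's output.
    return execute(design_tests(recon(target)))

results = run_engagement("http://example.test")
```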

| Phase | Description | Time |
| --- | --- | --- |
| Phase 0 — Scope & Auth | Define scope, authenticate & understand target | 12m 48s |
| Phase 1 — Info Gathering | Tech stack analysis & fingerprinting | 5m 4s |
| Phase 2 — Dynamic Crawling | Endpoint discovery & attack surface mapping | 5m 26s |
| Phase 3a — Attack Surface | Endpoint categorization & credential sweep | 5m 4s |
| Phase 3b — WSTG Design | 11 WSTG categories designed in parallel | ~15 min |
| Phase 3c — Test Plan Merge | Merge test cases & create workspace tasks | 2s |
| Phases 6–17 — Full WSTG | Config, auth, session, injection, client-side, crypto, API, business logic | Parallel |
| Phases 18–20 — Wrap-up | Finding validation, submission, and the pentest report | 14s total |

Phase 3b is where it gets interesting. The platform spun up 11 concurrent sub-agents, one for each WSTG test category — CONF, IDNT, ATHN, SESS, ATHZ, INPV, CLNT, CRYP, ERRH, BUSL, APIT — all designing test cases in parallel. A human pentester would work through these one at a time. That's not a minor speedup; it's a different operating model.
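The fan-out pattern is easy to picture with a thread pool. This is a minimal sketch under stated assumptions: `design_category` stands in for a sub-agent, and the real platform runs full agent processes rather than threads, but the shape of the parallelism is the same.

```python
from concurrent.futures import ThreadPoolExecutor

# The 11 WSTG categories from Phase 3b, designed concurrently.
WSTG_CATEGORIES = ["CONF", "IDNT", "ATHN", "SESS", "ATHZ", "INPV",
                   "CLNT", "CRYP", "ERRH", "BUSL", "APIT"]

def design_category(category: str) -> dict:
    # Stand-in for a sub-agent designing test cases for one category.
    return {"category": category,
            "test_cases": [f"{category}-01", f"{category}-02"]}

# Fan out: one worker per category, all designing in parallel.
with ThreadPoolExecutor(max_workers=len(WSTG_CATEGORIES)) as pool:
    plans = list(pool.map(design_category, WSTG_CATEGORIES))

# Phase 3c: merge the parallel outputs into one workspace task list.
merged = [tc for plan in plans for tc in plan["test_cases"]]
```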


Workflow execution — phase progression with individual timings. Every phase supports restart if you need to re-run.


WSTG test execution running across multiple categories simultaneously.


Configuration, authentication, and session management testing phases.


Input validation, client-side testing, and cryptography checks.


Business logic, API testing, and the final reporting phase — 14 seconds to validate, submit, and generate the report.


Inside a task — the agent's actual tool invocations and decision chain during execution.


Structured findings output with WSTG test IDs and evidence.


Vulnerability detail view — request/response pairs, working payloads, severity classification.

42 Vulnerabilities. With Working Payloads.

Not theoretical findings. Not "possible" vulnerabilities flagged by a scanner with a confidence score. Every one of these 42 findings came with a working payload, request/response evidence, an OWASP WSTG test ID, and a severity classification. The "PoC" approach in action.
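The platform's internals aren't public, so as a minimal illustration only: the PoC-first rule amounts to gating every finding on demonstrated behavior rather than inference. For a UNION-based SQLi, that could be a differential check like the sketch below (the marker string and function are hypothetical).

```python
def confirm_union_sqli(baseline_body: str, injected_body: str,
                       marker: str) -> bool:
    """File a finding only if the injected marker actually appears in the
    response AND was absent from the baseline -- demonstrable, not inferred."""
    return marker in injected_body and marker not in baseline_body

# Simulated responses (no live target needed for the sketch):
baseline = "<td>Artist: r4w8173</td>"
injected = "<td>Artist: M4RK3R_8841</td>"  # UNION SELECT reflected the marker

confirmed = confirm_union_sqli(baseline, injected, "M4RK3R_8841")
```

A scanner's confidence score answers "does this look injectable?"; the differential check answers "did the injected query actually execute?", which is the distinction the paragraph above draws.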

| Severity | Count | Examples |
| --- | --- | --- |
| Critical | 22 | UNION SQLi on /artists.php, /listproducts.php, /categories.php, /cart.php, /userinfo.php, /guestbook.php, /search.php; Error-based SQLi on /AJAX/; Auth Bypass on /login.php; Plaintext Cookie Forgery |
| High | 8 | IDOR on /userinfo.php (Horizontal Privilege Escalation); Stored XSS on /guestbook.php; Path Traversal/LFI on /showimage.php; Admin directory listing exposing /admin/create.sql |
| Medium | 12 | Reflected XSS on /search.php, /artists.php, /listproducts.php; CSRF on guestbook & login forms; Missing HttpOnly/Secure/SameSite cookie flags |

22 criticals is a big number, but keep in mind this is a deliberately vulnerable app. The real takeaway isn't the count — it's that the agents found UNION-based SQLi across 7 different endpoints, each confirmed with extracted data. That's the kind of thoroughness that usually requires a pentester to manually test each parameter individually.


The findings list — 42 documented vulnerabilities with severity, WSTG mapping, and status tracking.


Individual finding detail — payload, evidence, remediation guidance, all filed directly into the CTEM pipeline.

What Gets Left Behind (In a Good Way)

A pentest that finds vulnerabilities but doesn't produce clean evidence is only half done. Here's what the workspace contained when the agents finished:

  • 41 files organized across /access, /auth, /discovery, /docs, /phase2-crawl, /scope, and /test_cases — with markdown summaries for each phase
  • 4 shared tables: auth_tokens, Pentest Findings, Auth Testing, Attack Surface — Endpoints — all queryable by any agent in the workspace
  • 2 Learnings the platform extracted automatically: "Authentication Flow" (form-based login, no MFA, no CSRF protection) and "Attack Surface Map" (19 endpoints across 11 WSTG categories, with top SQLi candidates flagged)
  • A full pentest report generated in the Dashboard INSIGHT widget — executive summary, metrics, remediation guidance. The kind of deliverable that usually takes a day to write after the engagement ends.

Workspace file tree — 41 evidence files organized into structured folders.


Shared tables — structured data that persists across the engagement and is accessible to all agents.


The auto-generated pentest report in the Dashboard INSIGHT widget.

Why This Matters (Beyond the Demo)

It's easy to be impressed by a demo against a deliberately vulnerable app. We get that. But the architecture underneath is what matters for real-world use.

Minimal instructions, maximum coverage. The Strobes team didn't write a 500-line prompt. They selected a workflow template and the platform did what an expert pentester would do: loaded the right skills, broke the work into phases, designed test cases at the sub-agent level, and executed. The "PoC" philosophy means the agents don't waste cycles on theoretical findings; they confirm exploitability and move on.

Parallel execution changes the math. When 11 WSTG categories get tested simultaneously instead of sequentially, you're not saving 10% of the time. You're compressing what would be a multi-day manual effort into minutes. And unlike a human context-switching between test categories, each sub-agent has full focus on its specific domain.

Persistent memory across the engagement. Every crawled endpoint, every tested parameter, and every credential attempt is stored in shared tables and learnings. When a sub-agent picks up a new task on day 3, it has the full context from day 1 without anyone re-briefing it.
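A toy model of the shared-table idea, using SQLite purely as an illustration (the table name and columns are invented; Strobes's actual storage layer is not documented here): any agent writes a crawled endpoint once, and any later task queries it without re-briefing.

```python
import sqlite3

# Hypothetical workspace shared table: persistent, queryable by any agent.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE attack_surface (
    endpoint TEXT PRIMARY KEY,
    category TEXT,
    tested   INTEGER DEFAULT 0)""")

# Day 1: the crawler agent records what it found.
db.executemany(
    "INSERT INTO attack_surface (endpoint, category) VALUES (?, ?)",
    [("/userinfo.php", "ATHZ"), ("/search.php", "INPV")])

# Day 3: an execution agent picks up only the untested work.
todo = db.execute(
    "SELECT endpoint FROM attack_surface WHERE tested = 0").fetchall()
```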

The evidence package writes itself. No more spending the day after an engagement assembling screenshots and writing the report. The workspace contains organized files, structured findings, and a generated report by the time the agents mark the engagement complete.

Knowledge-based & skill-based testing. Strobes AI is engineered to run efficiently by default, meeting industry pentest standards out of the box, while staying customizable: you can add agent skills, knowledge bases, and learnings over the course of an engagement to push efficiency further.

Where This Is Going

What we showed here is a sample web app pentest. The same architecture powers network pentesting, API testing, cloud security reviews, and code analysis, all within the same workspace model, and all with the same PoC-first, evidence-driven approach built on inherited pentester skills.

For pentesters: this doesn't replace what you do. It removes the ceiling on how much of your expertise you can apply at once. You're still the one who decides what's in scope, reviews edge cases, and makes the judgment calls that require experience. The agents handle the breadth. You handle the depth.

For security teams: continuous offensive validation stops being something you budget for quarterly and starts being something that runs alongside your CI/CD pipeline. Same methodology, same rigor, without the scheduling bottleneck.

Want to go deeper on the architecture powering this? Read how we built the AI harness for offensive security — the orchestration, tooling, and validation layers that make agentic pentesting reliable at production scale. And if you're curious how the crawling phase works in detail, check out why crawling is the hardest part of AI-powered pen testing.

Based on live workspace data from the Strobes AI exposure management platform. Engagement led by Prakash Ashok and the Agentic Security Engineering team.