
By Venu Rao, CEO, Strobes Security
If you run security, you've watched this play out. A frontier model ships, and by lunchtime your inbox has three emails saying the same thing. "AI can now find and exploit vulnerabilities faster than you can patch." "Attackers already have this capability." "You're behind, and here's the demo link." This month was the loudest it's been.
I get why it works. It's built to make you feel late, and feeling late makes you buy. But after a lot of conversations with security leaders who are actually living this, I'll tell you what I see. The story is mostly theater, and the teams that fall for it end up worse off than when they started. The gap between the pitch and the reality is where a lot of budget is about to die.
The narrative goes like this. A new class of AI models can autonomously discover zero-days and write working exploits. Therefore, we've entered a "vulnerability AI race," and the only winning move is to acquire AI-powered offensive capability immediately, before the same capability gets turned against you.
It's a clean fear loop. New capability appears, vendors amplify the threat, buyers panic-purchase to feel covered. The problem is that the loop optimizes for the vendor's quarter, not your risk posture. It compresses a complex operational problem into a single binary: buy now or be breached. Real exposure management doesn't work that way.
Strip away the marketing and a much calmer picture emerges. Here's how I read the current moment.
This is now a real category, not a moment. LLM-driven vulnerability discovery, what's becoming known as agentic application security testing, is turning into a defined space, with frontier model providers and a wave of startups all competing in it. The right way to think about it is as a new testing capability you evaluate like any other, on outcomes, integration, and cost. It is not a launch-day emergency.
The model is not the moat. The harness is. This is the point the hype consistently buries. The effectiveness of an agentic testing product depends far more on the orchestration layer, the software that instruments the models and coordinates the agents, than on raw model performance. Two products built on the same underlying model can produce wildly different results depending on how the agents are directed, fed context, and validated. The headline benchmark on a model card tells you almost nothing about what a product will actually find in your environment.
Attackers are using AI, but not the way the pitch implies. Most adversaries are leaning on AI during the preparation phases of an attack, and the more advanced ones during post-breach activity like lateral movement and privilege escalation. That's meaningfully different from the cinematic image of an autonomous zero-day cannon aimed at you specifically. The realistic near-term threat is scale and speed applied to known, already-published vulnerabilities, not a flood of novel zero-days.
The thing that will actually hit your team is volume. The unglamorous near-term consequence of all this is more patches, more patching SLA exception requests, and more organizational chokepoints. And here's the uncomfortable part most vendor pitches skip. Your security team usually doesn't even own patching. Solution delivery and engineering teams do. A faster way to find problems does nothing for the bottleneck that actually determines your risk, which is how fast the right fixes get shipped.
Put those together and the "race" framing falls apart. The constraint was never discovery speed. It was, and still is, deciding what actually matters and getting it remediated.
If the problem were simply finding more issues faster, then sure, point an AI at your codebase and let it rip. But pour more findings into an already-overloaded program and you don't reduce risk. You inflate the backlog, deepen alert fatigue, and give your team more reasons to distrust the tooling.
I've sat across from enough security leaders to know the real metric on their mind isn't "how many vulnerabilities did we find this quarter." It's "can I stand in front of the board and credibly say our risk is going down." Those are very different problems. A tool that 10x's your finding count while remediation throughput stays flat has made your reporting worse, not better. You now have a larger number to explain and the same unresolved exposure underneath it.
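To make that arithmetic concrete, here is a minimal sketch. Every number in it is hypothetical, picked only to show the shape of the problem: the function name, the weekly rates, and the starting backlog are all assumptions, not data from any real program.

```python
def backlog_after(weeks, found_per_week, fixed_per_week, starting_backlog=0):
    """Return open findings after `weeks`, never dropping below zero."""
    backlog = starting_backlog
    for _ in range(weeks):
        # New findings arrive; the team closes what its throughput allows.
        backlog = max(0, backlog + found_per_week - fixed_per_week)
    return backlog


# Hypothetical program: 200 open findings, 40 fixes shipped per week.
before = backlog_after(weeks=12, found_per_week=50, fixed_per_week=40,
                       starting_backlog=200)

# Same program after a tool multiplies discovery by ten; throughput is unchanged.
after = backlog_after(weeks=12, found_per_week=500, fixed_per_week=40,
                      starting_backlog=200)

print(before, after)  # 320 5720
```

Under these assumed rates, one quarter takes the open count from 320 to 5,720 without a single extra fix shipping. The number on the board slide got bigger; the remediation rate did not move.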
This is why I'm skeptical of any AI security pitch that leads with discovery volume. Volume is the easy part now. The hard part, the part that actually moves risk, is everything downstream of the finding.
The framework that holds up here is Continuous Threat Exposure Management (CTEM). Not because it's fashionable, but because it forces the program to optimize for the right thing: a defensible, continuously validated view of which exposures genuinely put the business at risk, and a closed loop to drive those down.
Two disciplines inside CTEM matter most in the AI era.
Prioritization that reflects reality. A CVSS score in isolation is noise. Prioritization has to weigh exploitability, reachability, the surrounding control context, and live threat intelligence. Is this vulnerability actually reachable in your architecture? Is there a working exploit in the wild? Do compensating controls already blunt it? The Cyentia and FIRST study of EPSS data found that of roughly 238,000 published CVEs, only about 6% have ever shown exploitation activity. Without that context, you're triaging by severity label and hoping, which is how teams end up patching things attackers were never going to touch while a genuinely reachable exposure sits open.
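A rough sketch of what context-aware ranking can look like, compared with sorting by CVSS alone. This is an illustration under stated assumptions, not a scoring standard and not how any particular product works: the `Finding` fields, the `priority` function, the 0.5 discount for compensating controls, and the placeholder identifiers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    finding_id: str             # placeholder identifier, not a real CVE
    cvss: float                 # base severity, 0.0 to 10.0
    epss: float                 # estimated exploitation probability, 0.0 to 1.0
    reachable: bool             # can an attacker reach it in this architecture?
    exploited_in_wild: bool     # is there known exploitation activity?
    compensating_control: bool  # does an existing control already blunt it?


def priority(f: Finding) -> float:
    """Hypothetical score: severity scaled by likelihood and context."""
    if not f.reachable:
        return 0.0  # unreachable findings drop to the bottom
    likelihood = 1.0 if f.exploited_in_wild else f.epss
    score = (f.cvss / 10.0) * likelihood
    if f.compensating_control:
        score *= 0.5  # assumed discount, tune per environment
    return round(score, 3)


findings = [
    Finding("EXAMPLE-A", 9.8, 0.02, False, False, False),
    Finding("EXAMPLE-B", 7.5, 0.10, True, True, False),
    Finding("EXAMPLE-C", 8.8, 0.40, True, False, True),
]

ranked = sorted(findings, key=priority, reverse=True)
for f in ranked:
    print(f.finding_id, priority(f))
```

In this made-up set, the finding with the highest CVSS score (EXAMPLE-A) lands last because nothing can reach it, while a lower-severity finding that is reachable and actively exploited goes to the top. The specific weights matter less than the fact that severity is only one input.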
Validation through exploitation. This is the step the market keeps skipping, and it's the one that separates a program that reduces risk from one that just produces reports. Prioritization tells you what might matter. Validation proves what actually does, by safely attempting to exploit the exposure and confirming it's reachable and impactful before a single ticket is filed. Prioritization without validation is just a prettier backlog. Validation is what lets you tell the board "these are the exposures that are genuinely exploitable in our environment, and here's the trend line," with evidence behind it.
When you run the full cycle, the frontier-model news stops being a threat to react to and becomes raw material you can fold into a discipline you already have. The questions shift from "do we need to panic-buy AI" to "where does AI-driven testing fit into our existing portfolio, and what does it cost at scale to run it well?" Those are answerable, unhurried questions, exactly the posture a serious leader should hold.
One more thing worth saying outright. Don't rip out your existing static and offensive testing to replace it with agentic testing. Add it to the portfolio. The mature move is additive, not a religious conversion.
The vendors selling you a race want you sprinting. The leaders who'll come out ahead are the ones who slowed down long enough to fix the part that was actually broken.
Source: Cyentia Institute and FIRST, "A Visual Exploration of Exploitation in the Wild: The Inaugural Study of EPSS Data and Performance"