
TL;DR
On April 7, 2026, Anthropic officially unveiled Claude Mythos Preview - a frontier model that sits in a new tier called Capybara, above Opus. Within weeks of internal testing, Mythos autonomously discovered thousands of zero-day vulnerabilities across every major operating system and web browser, many of which had survived decades of human review. Cybersecurity stocks tanked. The industry panicked.
The question on everyone's mind: If a model can find zero-days in the Linux kernel, does anything else matter?
The answer is yes - and understanding why requires separating three layers that most people conflate: the model, the harness, and the platform.

A model is raw intelligence. It can reason, read code, hypothesize about vulnerabilities, and even write proof-of-concept exploits. Mythos is extraordinary at this - arguably a step-change over everything before it.
But a model alone is a brain in a jar. It has:
What Mythos demonstrated - finding bugs in the core C/C++ code of Linux, browsers, and Apache - is genuinely remarkable. Fuzzing and auditing low-level system code has always been one of the hardest problems in security. Models are making that tractable for the first time.
But finding a memory corruption bug in a .c file is fundamentally different from executing a penetration test against a live enterprise environment.
A harness wraps a model with the infrastructure it needs to actually do things. Anthropic's own Mythos testing used a harness: they launched containers, invoked Claude Code, pointed it at source files, let it run experiments, and piped results through a validation agent.
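The propose, experiment, and validate loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Anthropic's actual harness. `Harness`, `propose`, `run_experiment`, and `validate` are hypothetical names, and the lambdas stand in for a model call, a sandboxed container run, and a validation agent.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Finding:
    target: str
    hypothesis: str
    evidence: str


@dataclass
class Harness:
    # Hypothetical stand-ins: a model call, a sandboxed experiment, and a validator.
    propose: Callable[[str], list[str]]
    run_experiment: Callable[[str, str], str]
    validate: Callable[[str, str], bool]

    def assess(self, target: str) -> list[Finding]:
        """Propose hypotheses, test each one, and keep only validated findings."""
        confirmed = []
        for hypothesis in self.propose(target):
            evidence = self.run_experiment(target, hypothesis)
            # Nothing is reported unless the validation step accepts the evidence.
            if self.validate(hypothesis, evidence):
                confirmed.append(Finding(target, hypothesis, evidence))
        return confirmed


# Toy stand-ins so the sketch runs without a model or a container.
harness = Harness(
    propose=lambda target: ["buffer overflow in parse()", "use-after-free in free_node()"],
    run_experiment=lambda target, h: "crash" if "overflow" in h else "no crash",
    validate=lambda h, evidence: evidence == "crash",
)
print([f.hypothesis for f in harness.assess("libfoo.c")])  # ['buffer overflow in parse()']
```

The model only generates hypotheses here. Deciding what counts as a confirmed finding belongs to the harness, which is the separation of concerns the paragraph describes.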
A harness provides:
Think of it this way: Claude Code is a harness. It gives a model a terminal, file system access, and the ability to iterate. But Claude Code is a general-purpose harness. It knows nothing about cybersecurity workflows, attack methodologies, or how to maintain an engagement across multiple targets over multiple days.
A cybersecurity-specific harness needs to understand:
A platform is everything around the harness that makes it operationally useful:
No model provides this. No harness provides this. This is what a platform like Strobes provides.
Strobes is not a model. Strobes is not competing with Mythos.
Strobes is a Continuous Threat Exposure Management (CTEM) platform with an AI-native harness layer. Here's what that means in practice:
Attack Surface as the Foundation. Strobes' thesis is that the breadth and accuracy of attack surface discovery is the single most important differentiator in automated security. The best model in the world is useless if it's pointed at the wrong targets. Strobes' ASM continuously maps external and internal attack surfaces, including assets that organizations don't even know they have.
Harness-Level Intelligence. Strobes builds the orchestration layer that models need to perform real security work - not just code review in a container, but coordinated multi-tool, multi-stage assessments against live infrastructure. This includes:
Model Agnosticism. Strobes integrates with the best available models - today that includes Claude Opus 4.6 via AWS Bedrock, with the architecture ready to incorporate Mythos-class capabilities as they become available. When models get smarter, Strobes gets smarter. The harness and platform amplify whatever brain you plug in.
The Pentest-to-Patch Data Flywheel. Every engagement Strobes runs generates structured data about real-world vulnerabilities, attack paths, and remediation outcomes. This data compounds over time, creating a proprietary advantage that no model - however intelligent - can replicate from first principles. This is the moat.
Operational Reality. Enterprises don't buy models. They buy outcomes: fewer vulnerabilities, faster remediation, audit-ready reports, and continuous visibility into their exposure. Strobes delivers this as a platform, with AI as the engine - not the product.
There's a subtler distinction that gets lost in the Mythos hype, and it matters more than most people realize: the difference between a tool and an autonomous reasoning system.

Take Burp Suite, Nuclei, or any established security scanner. These tools are fundamentally deterministic. You configure a target, select your test cases or templates, hit run, and the tool executes a predefined sequence of checks. Nuclei runs YAML templates against endpoints. Burp's scanner crawls and fires payloads from a known library. OWASP ZAP follows the same pattern.
The workflow is linear: configure, execute, report. The tool doesn't think. It doesn't adapt mid-scan based on what it's finding. It doesn't reason about whether a 403 response on one endpoint implies a misconfigured access control pattern that might be exploitable on a different endpoint. It doesn't decide to pivot from web application testing to API enumeration because it noticed an undocumented GraphQL endpoint in a JavaScript bundle.
These tools are powerful - they've been the backbone of application security for over a decade. But they are fundamentally execution engines, not reasoning engines.
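To make the "configure, execute, report" pattern concrete, here is a minimal sketch of a template-driven scanner. The `Template` type, `linear_scan`, and the injected `send` function are hypothetical. They do not mirror Nuclei's YAML schema or Burp's engine. The point is only that every template runs against every target, and no response influences what runs next.

```python
from typing import Callable, NamedTuple


class Template(NamedTuple):
    name: str
    path: str
    match: str  # substring whose presence in the response marks a hit


def linear_scan(targets: list[str], templates: list[Template],
                send: Callable[[str, str], str]) -> list[tuple[str, str]]:
    """Configure, execute, report: every template runs against every target,
    and nothing learned from one response changes what runs next."""
    report = []
    for target in targets:
        for t in templates:
            if t.match in send(target, t.path):
                report.append((target, t.name))
    return report


# Canned responses stand in for real HTTP requests.
RESPONSES = {("https://a.example", "/.git/HEAD"): "ref: refs/heads/main"}


def fake_send(target: str, path: str) -> str:
    return RESPONSES.get((target, path), "")


templates = [
    Template("exposed-git", "/.git/HEAD", "ref: refs/"),
    Template("phpinfo", "/phpinfo.php", "PHP Version"),
]
print(linear_scan(["https://a.example", "https://b.example"], templates, fake_send))
# [('https://a.example', 'exposed-git')]
```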
Mythos represents the other extreme. It's pure reasoning with a minimal execution scaffold. Anthropic's own testing setup was remarkably simple: launch a container with source code, point Claude Code at it, and say "find vulnerabilities." The model reads code, forms hypotheses, writes test cases, runs them, observes results, adjusts its approach, and iterates - all autonomously.
This is genuinely impressive. But it's also bounded in important ways:
Mythos is a tool - an incredibly intelligent one - but it's still a tool. A better hammer doesn't become a construction company.
AI-native pentesting is neither of these things. It's not a traditional scanner with a chatbot bolted on, and it's not a raw model pointed at code in a sandbox. It's a fundamentally different product category: an autonomous decision-making system for security testing.
Here's what that looks like in practice:
Reasoning about what to test, not just how to test it. A traditional scanner tests everything in its template library against every endpoint it can find. An AI-native system reasons about the target's architecture, technology stack, and business context to decide what matters. It prioritizes testing the OAuth implementation over the static marketing pages - not because a human configured it to, but because it understands the relative risk.
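As a rough illustration of risk-ordered testing, the sketch below ranks endpoints before any test runs. The keyword weights in `RISK_HINTS` are invented. A hand-written table is only a crude stand-in for the contextual judgment the paragraph attributes to an AI-native system, and `risk_score` and `prioritize` are hypothetical helpers.

```python
# Invented keyword weights standing in for contextual risk judgment.
RISK_HINTS = {"oauth": 9, "login": 8, "admin": 8, "api": 6, "static": 1}
DEFAULT_RISK = 3


def risk_score(path: str) -> int:
    """Highest weight among matching keywords, or a neutral default."""
    lowered = path.lower()
    return max((w for key, w in RISK_HINTS.items() if key in lowered), default=DEFAULT_RISK)


def prioritize(endpoints: list[str]) -> list[str]:
    """Order endpoints so the riskiest are tested first."""
    return sorted(endpoints, key=risk_score, reverse=True)


print(prioritize(["/static/about.html", "/oauth/authorize", "/api/v1/users"]))
# ['/oauth/authorize', '/api/v1/users', '/static/about.html']
```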
Adapting in real-time based on findings. When an AI-native system discovers a misconfigured CORS policy on one subdomain, it doesn't just log it and move on. It reasons: "If CORS is misconfigured here, the same team likely deployed other services with similar patterns. Let me expand my testing to related subdomains and check for the same class of issue." This kind of lateral reasoning is impossible with template-based tools and impractical with raw models that lack the orchestration to act on it.
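The CORS example amounts to a work queue whose scope grows when a finding suggests a pattern. Here is a minimal sketch, assuming a hypothetical `check` function that returns an issue label or `None`, and treating hosts that share a parent domain as "related". `adaptive_scan` is an illustrative name, not a Strobes API.

```python
from collections import deque
from typing import Callable, Optional


def adaptive_scan(seeds: list[str], known_hosts: list[str],
                  check: Callable[[str], Optional[str]]) -> list[tuple[str, str, str]]:
    """Test seed hosts; when one shows an issue, queue its sibling hosts for the same check."""
    queue = deque((host, "seed") for host in seeds)
    seen: set[str] = set()
    findings = []
    while queue:
        host, reason = queue.popleft()
        if host in seen:
            continue
        seen.add(host)
        issue = check(host)
        if issue is None:
            continue
        findings.append((host, issue, reason))
        # Assumption: hosts sharing a parent domain were likely deployed by the same team.
        _, _, parent = host.partition(".")
        for sibling in known_hosts:
            if parent and sibling.endswith("." + parent) and sibling not in seen:
                queue.append((sibling, f"related to {host}"))
    return findings


MISCONFIGURED = {"api.example.com", "admin.example.com"}


def cors_check(host: str) -> Optional[str]:
    return "permissive CORS" if host in MISCONFIGURED else None


hosts = ["api.example.com", "admin.example.com", "www.example.com", "shop.other.net"]
for finding in adaptive_scan(["api.example.com"], hosts, cors_check):
    print(finding)
```

Only `api.example.com` is seeded, but the first finding causes `admin.example.com` to be tested and flagged. `www.example.com` is tested and comes back clean, and `shop.other.net` is never touched.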
Chaining tools, techniques, and findings into attack paths. Traditional tools generate isolated findings: "SQLi on /api/users," "exposed .git directory," "default credentials on admin panel." An AI-native system chains these: "The exposed .git directory reveals the database schema. The SQLi on /api/users can be used to extract credentials. Those credentials may grant access to the admin panel." This is what human pentesters do - and it's what requires a harness with memory, state, and an attack graph.
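The chain in that paragraph maps naturally onto a graph search. Treat each finding as an edge from a capability the attacker already has to one it unlocks, then search for a path to the goal. This is a simplified sketch that allows one prerequisite per finding. The capability names and `attack_path` are illustrative, not a Strobes data model.

```python
from collections import deque

# Each finding is an edge: holding capability `pre`, exploiting it yields `post`.
FINDINGS = [
    ("external", "db_schema", "exposed .git directory"),
    ("db_schema", "credentials", "SQLi on /api/users"),
    ("credentials", "admin_access", "login to admin panel with extracted credentials"),
]


def attack_path(findings, start: str, goal: str):
    """Breadth-first search for the shortest chain of findings from start to goal."""
    edges: dict[str, list[tuple[str, str]]] = {}
    for pre, post, name in findings:
        edges.setdefault(pre, []).append((post, name))
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for nxt, name in edges.get(state, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [name]))
    return None  # no chain of findings reaches the goal


for step in attack_path(FINDINGS, "external", "admin_access"):
    print(step)
```

If the `.git` finding is removed, `attack_path` returns `None`. The three issues only form an attack path when they are considered together, which isolated scanner output cannot show.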
Operating continuously, not episodically. A Burp Suite scan is a point-in-time event. An AI-native system runs continuously against the full attack surface, incorporating new assets as they appear, re-testing after deployments, and correlating findings over time. This requires a platform layer that no model or traditional tool provides.
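One small piece of continuous operation is deciding what to re-test. The sketch assumes each asset snapshot maps a hostname to a deployment identifier, here a hypothetical build hash. Under that assumption, new or redeployed assets fall out of a simple diff. `assets_to_test` is an illustrative helper.

```python
def assets_to_test(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return assets that are new or redeployed since the previous snapshot.

    Values are assumed to be deployment identifiers (e.g. a build hash);
    assets that disappeared are ignored here.
    """
    return sorted(asset for asset, build in current.items() if previous.get(asset) != build)


yesterday = {"api.example.com": "b1", "www.example.com": "a7"}
today = {"api.example.com": "b2", "www.example.com": "a7", "beta.example.com": "c1"}
print(assets_to_test(yesterday, today))  # ['api.example.com', 'beta.example.com']
```

`api.example.com` was redeployed and `beta.example.com` is new, so both are queued. The unchanged `www.example.com` is skipped.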
This framework clarifies who is actually threatened by Mythos-class models and who isn't:
Threatened: Traditional SaaS scanners. Tools that run static pattern-matching - SAST tools that grep for eval(), DAST tools that fire the same XSS payloads at every input field, dependency scanners that just check version numbers - are increasingly redundant. A model that can reason about code semantics will outperform them on accuracy, and the cost of model inference will continue to fall. These tools were already commoditized; Mythos accelerates their obsolescence.
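The weakness of grep-style SAST is easy to reproduce. The rule below is a deliberately naive example, not any real product's rule set. It flags `eval(` in a comment and in a string literal, both false positives, and it misses the actual call made through an alias on the last line.

```python
import re

# A deliberately naive rule of the kind described above: flag any "eval(".
EVAL_RULE = re.compile(r"\beval\(")


def grep_sast(source: str) -> list[int]:
    """Return 1-based line numbers matching the rule, with no semantic analysis."""
    return [n for n, line in enumerate(source.splitlines(), start=1)
            if EVAL_RULE.search(line)]


sample = '''\
# never call eval(user_input) here
msg = "eval( is dangerous"
run = eval
result = run(request_body)
'''
print(grep_sast(sample))  # [1, 2]
```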
Not threatened: Intelligent harnesses and platforms. Systems that provide the orchestration, memory, attack surface context, and operational infrastructure that models need to function in real-world engagements become more valuable as models improve. A better brain makes the body more capable, not less necessary.
The new differentiator: Evaluation and benchmarking. As models become interchangeable commodities, the ability to evaluate which model performs best for which security task becomes critical. This is why benchmarking infrastructure - like Strobes' pentest-bench - matters. Not all models reason equally well about authentication bypasses versus memory corruption versus business logic flaws. The harness that can dynamically select the right model for the right task, and prove that selection with data, has a structural advantage.
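Per-task model routing can be as simple as picking the best benchmark score for the vulnerability class at hand. In this sketch, the model names and scores are placeholders invented for illustration, not pentest-bench results. `select_model` is a hypothetical helper.

```python
# Placeholder scores (share of benchmark tasks solved) per vulnerability class.
# Invented for illustration only; these are not real benchmark results.
SCORES = {
    "model-a": {"auth_bypass": 0.62, "memory_corruption": 0.81, "business_logic": 0.40},
    "model-b": {"auth_bypass": 0.74, "memory_corruption": 0.55, "business_logic": 0.58},
}


def select_model(task_class: str, scores: dict[str, dict[str, float]] = SCORES) -> str:
    """Route a task to the model with the highest recorded score for its class."""
    candidates = {model: s[task_class] for model, s in scores.items() if task_class in s}
    if not candidates:
        raise ValueError(f"no benchmark data for {task_class!r}")
    return max(candidates, key=candidates.get)


for task in ("memory_corruption", "auth_bypass", "business_logic"):
    print(task, "->", select_model(task))
```

A real router would likely also weigh cost and latency. This sketch shows only the selection step.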
Claude Mythos is a genuine breakthrough. It proves that AI can find vulnerabilities that decades of human effort missed. Every security company - Strobes included - should be excited about integrating these capabilities.
But the panic is misplaced. A smarter brain doesn't eliminate the need for a body, a nervous system, and an environment to operate in.
Models find bugs. Harnesses execute engagements. Platforms run programs.
Strobes builds the harness and the platform. When the next Mythos drops - and the one after that - Strobes becomes more powerful, not obsolete.
The companies that should worry are the ones selling dumb pattern-matching wrapped in a dashboard. The companies that will thrive are the ones with the orchestration, data, and operational infrastructure to put increasingly powerful models to work.
That's Strobes.
Want to see how Strobes combines AI models with a purpose-built harness for real-world engagements? Read more about agentic pentesting with Strobes AI and our AI harness architecture for offensive security.