What Is a Red Team Assessment? (And How It Differs From Pentesting)
Offensive Security · Penetration Testing

Akhil Reni · January 5, 2026 · 8 min read

Table of Contents

  • What is a red team assessment?
  • What does a real red team assessment look like?
  • The most valuable output is the detection-gap report, not the flag
  • How does a red team assessment differ from a penetration test?
  • How do you score a red team assessment?
  • What is threat-led penetration testing (TIBER-EU, CBEST, DORA)?
  • Why a red team assessment matters for defenders
  • Frequently asked questions
  • Sources and references

TL;DR
  • A red team assessment is a goal-based attack simulation that emulates a real adversary to test whether your people, process, and technology detect and respond to an intrusion.
  • It is scored by what the SOC saw, not by a count of vulnerabilities: dwell time, detection rate, and mean time to respond are the headline numbers.
  • Operators work under realistic constraints (stealth, OPSEC, a defined threat profile) and map every action to MITRE ATT&CK so each step is a behavior defenders can hunt.
  • Penetration testing answers 'what is broken here?'; red teaming answers 'would we catch a real attacker, and how far would they get before we stopped them?'

In Verizon's 2024 Data Breach Investigations Report, the median time for an organization to detect a breach is still measured in days to weeks, while the attacker needs only hours to reach their objective. That gap, between when an intruder lands and when anyone notices, is the single thing a red team assessment is built to measure. It is not a hunt for vulnerabilities. It is a test of whether you would see a capable adversary moving through your network in time to stop them.

This guide walks through what a red team assessment actually involves: a real attack narrative from phishing to objective, the detection gaps it surfaces, the metrics that score it, and where it differs from a penetration test. If you have a SOC, EDR, and an incident-response process you have never tested under realistic pressure, this is the assessment built for that.

What is a red team assessment?

A red team assessment is an objective-driven exercise where testers emulate the tactics, techniques, and procedures (TTPs) of a real threat actor to reach a specific goal without being detected. Instead of enumerating flaws across a scope, the team picks a target outcome agreed in scoping and works toward it across whatever vectors are in play: external infrastructure, phishing, physical access, or a supplied foothold.

The objective is written down before anything starts, as a flag the white team can verify. Defining it precisely is what keeps the engagement honest and safe:

OBJECTIVE  Demonstrate ability to initiate a wire transfer
           from the treasury application (TREASURY-WEB01).
FLAG       Screenshot of payment-initiation screen
           + contents of \\fin-fs01\treasury\flag.txt
OUT OF     Real funds movement, DoS, destruction of
SCOPE      production data, any action on TREASURY-PROD.
WIN        Flag captured, OR red team detected and
CONDITION  ejected before capture (a win for blue).

That last line matters: getting caught is a result, not a failure. The blue team usually does not know the exercise is happening, or knows only that one may occur in a window, because a red team measures real detection. TTPs are mapped to MITRE ATT&CK so a foothold via phishing becomes T1566, reuse of stolen credentials becomes T1078 (Valid Accounts), host-to-host pivoting becomes T1021 (Remote Services), and credential theft from memory becomes T1003, each a behavior a defender can build a detection around.
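As a minimal sketch of that mapping, the Python below tags operator actions with the technique IDs named above. The `ATTACK_MAP` dictionary, the action labels, and the `tag_actions` helper are hypothetical names chosen for illustration; only the technique IDs come from the text.

```python
# Hypothetical action labels mapped to the ATT&CK technique IDs named in the text.
ATTACK_MAP = {
    "phishing foothold": "T1566",
    "stolen credential reuse": "T1078",
    "host-to-host pivot": "T1021",
    "credential dump from memory": "T1003",
}


def tag_actions(actions):
    """Return (action, technique_id) pairs; unknown actions get 'UNMAPPED'."""
    return [(action, ATTACK_MAP.get(action, "UNMAPPED")) for action in actions]


# "badge cloning" is deliberately absent from the map to show the fallback.
timeline = tag_actions(["phishing foothold", "host-to-host pivot", "badge cloning"])
for action, technique in timeline:
    print(f"{technique:<9} {action}")
```

Anything that comes back as UNMAPPED is a prompt to pick a technique ID before the debrief, so every step in the report stays huntable.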

Why detection, not coverage, is the test
  • 5: ATT&CK techniques in a typical phishing-to-objective chain (T1566, T1078, T1021, T1003, T1486)
  • Days: median dwell time before breaches are detected (Verizon DBIR)
  • Hours: time a capable attacker needs to reach the objective once inside
  • 30 days: window to convert each missed technique into a durable detection

What does a real red team assessment look like?

The clearest way to understand a red team assessment is to watch one unfold against the treasury objective above. Here is a condensed narrative of how the campaign actually runs.

The team spends the first week on passive OSINT: harvesting employee names from LinkedIn, mapping the external attack surface, and identifying who handles finance. They send a spearphishing lure (T1566.001, Spearphishing Attachment) to three of those staff. One opens it, and a beacon checks in to a Sliver command-and-control server hosted behind a redirector, the channel shaped to look like routine HTTPS traffic. From that foothold the operators run BloodHound to collect Active Directory data, which renders the path to the objective as a graph:

// BloodHound shortest-path query (Cypher), followed by the abridged result
MATCH p=shortestPath((u:User {name:'JDOE@CORP'})
  -[*1..]->(g:Group {name:'DOMAIN ADMINS@CORP'}))
RETURN p

JDOE  --MemberOf-->  IT-SUPPORT
IT-SUPPORT  --GenericAll-->  SVC-BACKUP   <- over-privileged service acct
SVC-BACKUP  --AdminTo-->  FIN-JUMP01     <- finance jump host
FIN-JUMP01  --HasSession-->  treasury operator session

That GenericAll edge on a service account is the whole game. The operators harvest the SVC-BACKUP credential (T1078, Valid Accounts), use it to move laterally (T1021) to FIN-JUMP01, dump credentials from memory there (T1003), and ride an existing treasury-operator session to the payment screen. They never trigger ransomware-style impact (T1486) because the rules of engagement forbid it; they capture the flag file and screenshot the screen.
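To make the graph idea concrete, here is a small sketch that walks the same four edges with a breadth-first search. It is not how BloodHound itself is implemented; it only illustrates why one over-privileged edge is enough to connect a phished user to the objective. The node label `TREASURY-OPERATOR` is an assumed stand-in for the treasury operator session, and `shortest_path` is a hypothetical helper.

```python
from collections import deque

# Edges copied from the abridged BloodHound output: (source, relationship, target).
# "TREASURY-OPERATOR" is an assumed label for the treasury operator session.
EDGES = [
    ("JDOE", "MemberOf", "IT-SUPPORT"),
    ("IT-SUPPORT", "GenericAll", "SVC-BACKUP"),
    ("SVC-BACKUP", "AdminTo", "FIN-JUMP01"),
    ("FIN-JUMP01", "HasSession", "TREASURY-OPERATOR"),
]


def shortest_path(edges, start, goal):
    """Breadth-first search over directed edges; returns the hop list or None."""
    graph = {}
    for src, rel, dst in edges:
        graph.setdefault(src, []).append((rel, dst))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, rel, nxt)]))
    return None


path = shortest_path(EDGES, "JDOE", "TREASURY-OPERATOR")
for src, rel, dst in path:
    print(f"{src} --{rel}--> {dst}")
```

Deleting the GenericAll edge from `EDGES` makes the function return None, which is the defensive takeaway: removing one excessive permission breaks the whole path.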

The result that matters is not the capture. It is the silence. Several steps SHOULD have fired an alert and did not.

The most valuable output is the detection-gap report, not the flag

The deliverable that earns a red team its fee is the gap analysis: a line-by-line account of what the attacker did, the ATT&CK technique behind it, and the detection that should have fired. For the treasury campaign above, the core of that table appears below under "Detection gaps the treasury scenario exposed". Each missed row is a concrete piece of detection-engineering work, not a vague recommendation.

The pattern is almost always the same. Perimeter and email controls catch the loud, well-known stuff. The interior, the lateral movement, the credential reuse, the service-account abuse, is where coverage collapses, and that is exactly where a real breach turns into a headline. A clean win for the defenders is not zero compromise; it is detecting the campaign at credential dumping (T1003) and containing it before the operators reach the finance zone. Mapping each gap to a technique ID lets you show coverage moving from red to green over successive engagements rather than guessing whether you improved.

Detection gaps the treasury scenario exposed

Attacker action                    | ATT&CK ID     | Detection that SHOULD fire
-----------------------------------+---------------+-----------------------------------------------
Spearphishing attachment opened    | T1566.001     | Email gateway + macro/child-process EDR alert
Beacon C2 over HTTPS to redirector | T1071.001     | Outbound beaconing / JA3 anomaly on proxy logs
BloodHound AD collection           | T1087 / T1069 | High-volume LDAP query from a workstation
Service-account reuse to pivot     | T1078 / T1021 | Logon from unusual host for a service identity
Credential dump on jump host       | T1003         | LSASS handle-access / suspicious process alert
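One way to show coverage moving from red to green is to store each engagement as a map from technique ID to detected or missed, then diff two engagements. The sketch below assumes that layout. The technique IDs come from the table above, but the detected and missed values are invented for illustration and are not results from a real engagement; `coverage_delta` is a hypothetical helper.

```python
# Technique IDs from the gap table. The True/False values are illustrative
# assumptions, not findings from a real engagement.
engagement_1 = {"T1566.001": True, "T1071.001": False, "T1087": False,
                "T1078": False, "T1003": False}
engagement_2 = {"T1566.001": True, "T1071.001": False, "T1087": True,
                "T1078": True, "T1003": True}


def coverage_delta(before, after):
    """Classify each technique by how its detection status changed."""
    delta = {}
    for technique in sorted(before):
        was, now = before[technique], after.get(technique, False)
        if was and now:
            delta[technique] = "still detected"
        elif not was and now:
            delta[technique] = "gap closed"
        elif was and not now:
            delta[technique] = "regressed"
        else:
            delta[technique] = "still missed"
    return delta


delta = coverage_delta(engagement_1, engagement_2)
for technique, status in delta.items():
    print(f"{technique:<10} {status}")
```

The "regressed" bucket is worth keeping even when it is empty: detections that silently stop firing between engagements are as important as the gaps that closed.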

How does a red team assessment differ from a penetration test?

A penetration test is coverage-based and asks 'what vulnerabilities exist in this scope?', while a red team assessment is goal-based and asks 'can a real adversary reach this objective without us noticing?'. A pentest wants breadth across a defined target; a red team wants depth toward one outcome, and treats getting caught as a finding in itself.

That difference cascades through everything else:

  • Scope: a pentest has a tight agreed list (these IPs, this app); a red team has a broad scope and narrow objective spanning network, social engineering, and physical vectors.
  • Stealth: a pentester works loudly and efficiently and the blue team usually knows; a red team prioritizes evasion because detection is what they are testing.
  • Duration: pentests run days to a couple of weeks; red team engagements run weeks to months to mirror a patient attacker.
  • Output: a pentest delivers a ranked vulnerability list; a red team delivers an attack narrative, a detection-and-response gap analysis, and a timeline of what fired and what did not.

If you are still deciding which fits, our guide to the types of penetration testing covers where each sits, and the penetration testing overview sets the baseline a red team builds on. A common sequencing mistake: buying a red team before you have any detection to test. If the SOC cannot see anything, the team simply walks to the objective and the report tells you what you already knew.

How do you score a red team assessment?

You score a red team by detection and response, not by whether the flag was captured. Three numbers carry the verdict, and each has a concrete formula:

  • Dwell time = (timestamp of first SOC detection) minus (timestamp of initial access). In the narrative above, initial access was day 8 and the first true detection never came, so dwell time was the entire engagement, the worst possible result.
  • Detection rate = (ATT&CK techniques that generated an alert) divided by (techniques executed). Five techniques ran (T1566, T1078, T1021, T1003, ride-the-session); if only the phishing email flagged, detection rate is 1 of 5, or 20%.
  • MTTR (mean time to respond) = (timestamp of containment) minus (timestamp of first detection), averaged across incidents when there is more than one. It tests whether your incident-response playbook actually works under pressure, separate from whether you noticed at all.
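The three formulas above can be sketched in a few lines of Python. The function names and the calendar dates are assumptions for illustration (the narrative gives day numbers, not dates), and treating "never detected" as dwell time running to the end of the engagement is one reasonable reading of the text rather than a fixed convention.

```python
from datetime import datetime


def dwell_time(initial_access, first_detection, engagement_end):
    """Dwell time; with no detection it runs until the engagement ends."""
    end = first_detection if first_detection is not None else engagement_end
    return end - initial_access


def detection_rate(executed, alerted):
    """Share of executed techniques that generated an alert."""
    return len(set(alerted) & set(executed)) / len(executed)


def time_to_respond(first_detection, containment):
    """Time from first detection to containment for one incident."""
    return containment - first_detection


# Illustrative timestamps only.
access = datetime(2026, 1, 8, 9, 0)
end = datetime(2026, 2, 2, 17, 0)
executed = ["T1566", "T1078", "T1021", "T1003", "ride-the-session"]
alerted = ["T1566"]

rate = detection_rate(executed, alerted)
undetected_dwell = dwell_time(access, None, end)
response = time_to_respond(datetime(2026, 1, 8, 11, 0), datetime(2026, 1, 8, 15, 0))

print(f"detection rate: {rate:.0%}")
print(f"dwell time (never detected): {undetected_dwell}")
print(f"time to respond (hypothetical detection): {response}")
```

Keeping time to respond separate from dwell time matters: the first measures the playbook, the second measures the telemetry, and an engagement can fail on either one independently.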

A team that captures the objective in eight hours but is detected at hour two has given you a better outcome than one caught only in the final debrief, because the metrics, not the flag, are the product. The number that compounds is the conversion rate afterward: how many missed techniques became durable detections within 30 days. Track all four across engagements and you get a trend line for real-world resilience instead of a one-off war story.
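The conversion rate can be computed the same way. This sketch assumes you record, per missed technique, the date a detection for it shipped; the `conversion_rate` helper, the debrief date, and the ship dates are all hypothetical.

```python
from datetime import date, timedelta


def conversion_rate(missed, shipped, debrief, window_days=30):
    """Fraction of missed techniques with a detection shipped inside the window.

    Returns None when nothing was missed, since there is nothing to convert.
    """
    if not missed:
        return None
    deadline = debrief + timedelta(days=window_days)
    converted = [t for t in missed if t in shipped and shipped[t] <= deadline]
    return len(converted) / len(missed)


# Hypothetical follow-up data for the four techniques that did not alert.
missed = ["T1078", "T1021", "T1003", "ride-the-session"]
debrief = date(2026, 2, 2)
shipped = {
    "T1078": date(2026, 2, 10),  # inside the 30-day window
    "T1003": date(2026, 2, 20),  # inside the 30-day window
    "T1021": date(2026, 3, 15),  # shipped, but after the window closed
}

converted_share = conversion_rate(missed, shipped, debrief)
print(f"conversion rate: {converted_share:.0%}")
```

A detection that ships late still counts as coverage in the next engagement, but it does not count toward this metric, which is what keeps the 30-day window meaningful.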

Strobes insight
If your team can patch every finding from the last pentest and still has no answer to 'how fast would we detect a foothold?', you have outgrown coverage testing. That gap is exactly what dwell time measures.

What is threat-led penetration testing (TIBER-EU, CBEST, DORA)?

Threat-led penetration testing (TLPT) is a regulated form of red teaming that uses real cyber threat intelligence to shape the scenario, so the simulated attack mirrors the actors most likely to target your organization. Instead of a generic adversary, a threat-intelligence provider profiles relevant groups, and the red team emulates their specific TTPs against live production systems.

The best-known frameworks are TIBER-EU (the European Central Bank's model, now reinforced by DORA for EU financial entities), the Bank of England's CBEST, and similar programs elsewhere. They share a structure: a threat-intelligence phase, a red team phase against production, and a tightly controlled white team coordinating both sides. These engagements are heavily governed precisely because they hit real systems, and they are usually reserved for systemically important institutions. In our experience the threat-intel phase is also where many programs first learn their actual perimeter is larger than their asset inventory said.

Why a red team assessment matters for defenders

A red team assessment matters because it tests the one thing a vulnerability list cannot: whether your defenders would actually catch and stop an intrusion in progress. You can patch every CVE a pentest finds and still lose to an attacker who phishes a credential, lands a foothold, and moves laterally for weeks because nobody was watching the right telemetry.

The lasting value is the debrief and the blue-team uplift that follows. A good red team hands defenders a timeline of every action mapped to MITRE ATT&CK, showing which techniques were detected, which were missed, and which alerts fired but were ignored. That feeds detection engineering directly: a Sigma-style rule for the BloodHound LDAP storm, an alert on service-account logons from unusual hosts, an LSASS-access detection for credential dumping, tighter segmentation around the finance zone. This is the foundation of purple teaming, where red and blue close each gap on the spot rather than waiting weeks for a report. Run continuously rather than once a year, that loop is where agentic pentesting changes the economics, keeping detections honest between set-piece exercises.
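To illustrate one of those detections, the sketch below flags service-account logons from hosts outside a known baseline. It is plain Python rather than actual Sigma syntax, and the event fields, the `BASELINE` mapping, and the host names `BACKUP01` and `WKSTN-042` are assumptions; a real rule would be written against your own log schema.

```python
# Hypothetical baseline: hosts each service account normally logs on from.
BASELINE = {"SVC-BACKUP": {"BACKUP01"}}


def unusual_service_logons(events, baseline):
    """Yield logon events where a baselined account appears on an unexpected host."""
    for event in events:
        allowed = baseline.get(event["account"])
        # Accounts without a baseline entry are skipped, not flagged.
        if allowed is not None and event["host"] not in allowed:
            yield event


# Assumed event shape; real logon telemetry carries many more fields.
events = [
    {"account": "SVC-BACKUP", "host": "BACKUP01"},
    {"account": "SVC-BACKUP", "host": "FIN-JUMP01"},
    {"account": "JDOE", "host": "WKSTN-042"},
]

alerts = list(unusual_service_logons(events, BASELINE))
for alert in alerts:
    print(f"ALERT: {alert['account']} logged on from {alert['host']}")
```

In the treasury scenario this is the check that would have caught the pivot to FIN-JUMP01, because a backup service identity has no routine reason to appear on a finance jump host.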

Frequently asked questions

What is the difference between a red team assessment and a penetration test?
A penetration test is coverage-based: it finds and ranks as many vulnerabilities as possible in a defined scope. A red team assessment is goal-based: it emulates a real adversary trying to reach a specific objective without being detected, testing your detection and response rather than your vulnerability count. Pentests run days to a couple of weeks; red team engagements run weeks to months.

How long does a red team assessment take and what does it cost?
Most engagements run three to twelve weeks, and pricing typically tracks that duration plus the seniority of the operators, so a full red team usually costs several times a comparable pentest. Threat-led engagements under TIBER-EU or CBEST run longer and cost more because they add a separate threat-intelligence phase before any attack begins.

Does a red team assessment use real attacks on production?
Yes, that is the point. A red team operates against live production systems and a real, unaware defending team, because detection and response can only be tested under realistic conditions. The work is governed by tight rules of engagement, a coordinating white team, and pre-agreed safety guardrails so destructive techniques like T1486 stay strictly out of scope.

Do I need a SOC before running a red team assessment?
Practically, yes. A red team measures whether your defenders detect and respond, so without a SOC, EDR, or logging the exercise succeeds unopposed and teaches little. Organizations without detection capability get more value from penetration testing first, then graduate to red teaming once there are real defenses to measure.

What is MITRE ATT&CK's role in a red team assessment?
MITRE ATT&CK is a public knowledge base of real-world attacker tactics and techniques. Red teams map their actions to ATT&CK technique IDs so the report ties each step to a specific behavior the blue team can hunt for, and so the debrief clearly shows which techniques were detected and which slipped through. It also turns coverage into something measurable across engagements.

How is a red team assessment scored if not by findings?
By detection and response metrics: dwell time (time from initial access to first detection), detection rate (share of executed techniques that fired an alert), and mean time to respond (time from detection to containment). The most durable metric is the conversion rate afterward, meaning how many missed techniques became real detections within 30 days.

Sources and references

  • MITRE ATT&CK
  • Verizon 2024 Data Breach Investigations Report
  • TIBER-EU Framework (ECB)
  • Bank of England CBEST
Akhil Reni
Co-founder and CTO, Strobes
Akhil Reni is co-founder and CTO of Strobes, building AI-driven penetration testing and exposure management for security teams.
Tags: Red Teaming, Offensive Security, Penetration Testing
