What Is a Red Team Assessment? (And How It Differs From Pentesting)

Akhil Reni | January 5, 2026 | 8 min read

TL;DR

✓ A red team assessment is a goal-based attack simulation that emulates a real adversary to test whether your people, process, and technology detect and respond to an intrusion.
✓ It is scored by what the SOC saw, not by a count of vulnerabilities: dwell time, detection rate, and mean time to respond are the headline numbers.
✓ Operators work under realistic constraints (stealth, OPSEC, a defined threat profile) and map every action to MITRE ATT&CK so each step is a behavior defenders can hunt.
✓ Penetration testing answers 'what is broken here?'; red teaming answers 'would we catch a real attacker, and how far would they get before we stopped them?'

In Verizon's 2024 Data Breach Investigations Report, the median time for an organization to detect a breach is still measured in days to weeks, while the attacker needs only hours to reach their objective. That gap, between when an intruder lands and when anyone notices, is the single thing a red team assessment is built to measure. It is not a hunt for vulnerabilities. It is a test of whether you would see a capable adversary moving through your network in time to stop them.

This guide walks through what a red team assessment actually involves: a real attack narrative from phishing to objective, the detection gaps it surfaces, the metrics that score it, and where it differs from a penetration test. If you have a SOC, EDR, and an incident-response process you have never tested under realistic pressure, this is the assessment built for that.

Table of contents

What is a red team assessment?
What does a real red team assessment look like?
The most valuable output is the detection-gap report, not the flag
How does a red team assessment differ from a penetration test?
How do you score a red team assessment?
What is threat-led penetration testing (TIBER-EU, CBEST, DORA)?
Why a red team assessment matters for defenders

What is a red team assessment?

A red team assessment is an objective-driven exercise where testers emulate the tactics, techniques, and procedures (TTPs) of a real threat actor to reach a specific goal without being detected. Instead of enumerating flaws across a scope, the team picks a target outcome agreed in scoping and works toward it across whatever vectors are in play: external infrastructure, phishing, physical access, or a supplied foothold.

The objective is written down before anything starts, as a flag the white team can verify. Defining it precisely is what keeps the engagement honest and safe:

OBJECTIVE  Demonstrate ability to initiate a wire transfer
           from the treasury application (TREASURY-WEB01).
FLAG       Screenshot of payment-initiation screen
           + contents of \\fin-fs01\treasury\flag.txt
OUT OF     Real funds movement, DoS, destruction of
SCOPE      production data, any action on TREASURY-PROD.
WIN        Flag captured, OR red team detected and
CONDITION  ejected before capture (a win for blue).

That last line matters: getting caught is a result, not a failure. The blue team usually does not know the exercise is happening, or knows only that one may occur in a window, because a red team measures real detection. TTPs are mapped to MITRE ATT&CK so a foothold via phishing becomes T1566, reuse of stolen credentials becomes T1078 (Valid Accounts), host-to-host pivoting becomes T1021 (Remote Services), and credential theft from memory becomes T1003, each a behavior a defender can build a detection around.
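
The mapping described above can be kept as a small lookup so that every logged operator action carries a technique ID. This is a minimal Python sketch: the technique IDs are the ones named in this section, while the action labels and the `tag_actions` helper are hypothetical names chosen for illustration.

```python
# Technique IDs come from the text above; the action labels are hypothetical.
ATTACK_MAP = {
    "phishing_foothold": ("T1566", "Phishing"),
    "stolen_credential_reuse": ("T1078", "Valid Accounts"),
    "host_to_host_pivot": ("T1021", "Remote Services"),
    "memory_credential_theft": ("T1003", "OS Credential Dumping"),
}


def tag_actions(actions):
    """Attach an ATT&CK ID to each logged action; unknown actions stay untagged."""
    tagged = []
    for action in actions:
        technique_id, name = ATTACK_MAP.get(action, (None, "unmapped"))
        tagged.append({"action": action, "technique": technique_id, "name": name})
    return tagged


# "badge_cloning" is deliberately absent from the map to show the fallback.
timeline = tag_actions(["phishing_foothold", "host_to_host_pivot", "badge_cloning"])
for row in timeline:
    print(row["action"], "->", row["technique"] or "needs manual mapping")
```

Leaving unmapped actions visible, rather than dropping them, keeps the debrief honest: anything without an ID is a step the blue team cannot yet track as a named behavior.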

Why detection, not coverage, is the test

ATT&CK techniques in a typical phishing-to-objective chain: T1566, T1078, T1021, T1003, T1486
Days: median dwell time before breaches are detected (Verizon DBIR)
Hours: time a capable attacker needs to reach the objective once inside
30 days: window to convert each missed technique into a durable detection

What does a real red team assessment look like?

The clearest way to understand a red team assessment is to watch one unfold against the treasury objective above. Here is a condensed narrative of how the campaign actually runs.

The team spends the first week on passive OSINT: harvesting employee names from LinkedIn, mapping the external attack surface, and identifying who handles finance. They send a spearphishing lure (T1566.001, Spearphishing Attachment) to three of those staff. One opens it, and a beacon checks in to a Sliver command-and-control server hosted behind a redirector, the channel shaped to look like routine HTTPS traffic. From that foothold the operators run BloodHound to collect Active Directory data, which renders the path to the objective as a graph:

// BloodHound shortest-path query (Cypher); abridged result below
MATCH p=shortestPath((u:User {name:'JDOE@CORP'})
  -[*1..]->(g:Group {name:'DOMAIN ADMINS@CORP'}))
RETURN p

JDOE  --MemberOf-->  IT-SUPPORT
IT-SUPPORT  --GenericAll-->  SVC-BACKUP   <- over-privileged service acct
SVC-BACKUP  --AdminTo-->  FIN-JUMP01     <- finance jump host
FIN-JUMP01  --HasSession-->  treasury operator session

That GenericAll edge on a service account is the whole game. The operators harvest the SVC-BACKUP credential (T1078, Valid Accounts), use it to move laterally (T1021) to FIN-JUMP01, dump credentials from memory there (T1003), and ride an existing treasury-operator session to the payment screen. They never trigger ransomware-style impact (T1486) because the rules of engagement forbid it; they capture the flag file and screenshot the payment-initiation screen.
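
To make the graph idea concrete, the path above can be reproduced with a plain breadth-first search. This is a sketch of the concept, not BloodHound's implementation: the edges are copied from the abridged output, `TREASURY-OPERATOR` is a hypothetical label for the treasury operator session, and `shortest_attack_path` is an illustrative helper.

```python
from collections import deque

# Edges copied from the abridged output above: (source, relationship, target).
# "TREASURY-OPERATOR" is a hypothetical node label for the operator session.
EDGES = [
    ("JDOE", "MemberOf", "IT-SUPPORT"),
    ("IT-SUPPORT", "GenericAll", "SVC-BACKUP"),
    ("SVC-BACKUP", "AdminTo", "FIN-JUMP01"),
    ("FIN-JUMP01", "HasSession", "TREASURY-OPERATOR"),
]


def shortest_attack_path(edges, start, goal):
    """Breadth-first search over directed edges; returns a list of hops or None."""
    graph = {}
    for source, relation, target in edges:
        graph.setdefault(source, []).append((relation, target))

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, path + [(node, relation, target)]))
    return None


path = shortest_attack_path(EDGES, "JDOE", "TREASURY-OPERATOR")
for source, relation, target in path:
    print(f"{source} --{relation}--> {target}")

# Removing the over-privileged edge leaves no route to the objective.
hardened = [edge for edge in EDGES if edge[1] != "GenericAll"]
print(shortest_attack_path(hardened, "JDOE", "TREASURY-OPERATOR"))
```

The second call returns `None`, which is the remediation argument in miniature: in this four-edge graph, revoking one permission on one service account severs the whole chain.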

The result that matters is not the capture. It is the silence. Several steps SHOULD have fired an alert and did not.

The most valuable output is the detection-gap report, not the flag

The deliverable that earns a red team its fee is the gap analysis: a line-by-line account of what the attacker did, the ATT&CK technique behind it, and the detection that should have fired. For the treasury campaign above, the core of that table appears below under 'Detection gaps the treasury scenario exposed'. Each missed row is a concrete piece of detection-engineering work, not a vague recommendation.

The pattern is almost always the same. Perimeter and email controls catch the loud, well-known stuff. The interior (lateral movement, credential reuse, service-account abuse) is where coverage collapses, and that is exactly where a real breach turns into a headline. A clean win for the defenders is not zero compromise; it is detecting the campaign at credential dumping (T1003) and containing it before the operators reach the finance zone. Mapping each gap to a technique ID lets you show coverage moving from red to green over successive engagements rather than guessing whether you improved.

Detection gaps the treasury scenario exposed

Attacker action	ATT&CK ID	Detection that SHOULD fire
Spearphishing attachment opened	T1566.001	Email gateway + macro/child-process EDR alert
Beacon C2 over HTTPS to redirector	T1071.001	Outbound beaconing / JA3 anomaly on proxy logs
BloodHound AD collection	T1087 / T1069	High-volume LDAP query from a workstation
Service-account reuse to pivot	T1078 / T1021	Logon from unusual host for a service identity
Credential dump on jump host	T1003	LSASS handle-access / suspicious process alert
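
As an illustration of turning one table row into detection logic, the 'High-volume LDAP query from a workstation' row could be approximated with a sliding-window count per source host. This is a sketch under stated assumptions: the `(timestamp, host)` event layout, the threshold of 500 queries, the five-minute window, and the host names are all hypothetical and would need tuning against real directory logs.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def ldap_burst_hosts(events, threshold=500, window=timedelta(minutes=5)):
    """Return hosts that issued more than `threshold` LDAP queries within `window`.

    `events` is an iterable of (timestamp, source_host) tuples sorted by time.
    The tuple layout, threshold, and window are illustrative assumptions.
    """
    recent = defaultdict(deque)  # host -> timestamps still inside the window
    flagged = set()
    for timestamp, host in events:
        times = recent[host]
        times.append(timestamp)
        # Drop timestamps that have aged out of the sliding window.
        while timestamp - times[0] > window:
            times.popleft()
        if len(times) > threshold:
            flagged.add(host)
    return flagged


start = datetime(2026, 1, 5, 9, 0, 0)
# A workstation issuing 600 queries in one minute, next to a quiet host.
noisy = [(start + timedelta(milliseconds=100 * i), "WKSTN-042") for i in range(600)]
quiet = [(start + timedelta(minutes=i), "WKSTN-007") for i in range(10)]
events = sorted(noisy + quiet)
print(ldap_burst_hosts(events))
```

A fixed threshold is the simplest possible choice; a per-host baseline would reduce false positives from legitimate admin tooling, at the cost of needing history to learn from.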

How does a red team assessment differ from a penetration test?

A penetration test is coverage-based and asks 'what vulnerabilities exist in this scope?', while a red team assessment is goal-based and asks 'can a real adversary reach this objective without us noticing?'. A pentest wants breadth across a defined target; a red team wants depth toward one outcome, and treats getting caught as a finding in itself.

That difference cascades through everything else:

Scope: a pentest has a tight agreed list (these IPs, this app); a red team has a broad scope and a narrow objective spanning network, social engineering, and physical vectors.
Stealth: a pentester works loudly and efficiently, and the blue team usually knows; a red team prioritizes evasion because detection is what it is testing.
Duration: pentests run days to a couple of weeks; red team engagements run weeks to months to mirror a patient attacker.
Output: a pentest delivers a ranked vulnerability list; a red team delivers an attack narrative, a detection-and-response gap analysis, and a timeline of what fired and what did not.

If you are still deciding which fits, our guide to the types of penetration testing covers where each sits, and the penetration testing overview sets the baseline a red team builds on. A common sequencing mistake: buying a red team before you have any detection to test. If the SOC cannot see anything, the team simply walks to the objective and the report tells you what you already knew.

How do you score a red team assessment?

You score a red team by detection and response, not by whether the flag was captured. Three numbers carry the verdict, and each has a concrete formula:

Dwell time = (timestamp of first SOC detection) minus (timestamp of initial access). In the narrative above, initial access was day 8 and the first true detection never came, so dwell time was the entire engagement, the worst possible result.
Detection rate = (ATT&CK techniques that generated an alert) divided by (techniques executed). Five techniques ran (T1566, T1078, T1021, T1003, ride-the-session); if only the phishing email flagged, detection rate is 1 of 5, or 20%.
MTTR = (timestamp of containment) minus (timestamp of first detection). It tests whether your incident-response playbook actually works under pressure, separate from whether you noticed at all.
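
The three formulas above translate directly into code. The sketch below uses the values from the narrative where they exist; the kickoff date and the six-week engagement length are assumptions added only so that the undetected case has an end point to count to.

```python
from datetime import datetime, timedelta


def dwell_time(initial_access, first_detection, engagement_end):
    """First SOC detection minus initial access.

    If nothing was ever detected (first_detection is None), the dwell time
    is counted up to the end of the engagement.
    """
    detected_at = first_detection if first_detection is not None else engagement_end
    return detected_at - initial_access


def detection_rate(executed, alerted):
    """Techniques that generated an alert divided by techniques executed."""
    if not executed:
        return 0.0
    return sum(1 for technique in executed if technique in alerted) / len(executed)


def time_to_respond(first_detection, containment):
    """Containment minus first detection for a single incident."""
    return containment - first_detection


# Values from the narrative, plus an assumed six-week engagement.
kickoff = datetime(2026, 1, 5)                  # assumption, not from the text
initial_access = kickoff + timedelta(days=7)    # "day 8" of the engagement
engagement_end = kickoff + timedelta(days=42)   # assumption, not from the text
executed = ["T1566", "T1078", "T1021", "T1003", "ride-the-session"]

print(dwell_time(initial_access, None, engagement_end))   # never detected
print(f"{detection_rate(executed, {'T1566'}):.0%}")       # only the phish alerted
```

`time_to_respond` returns one interval; MTTR proper is the mean of those intervals across every detection that led to containment.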

A team that captures the objective in eight hours but is detected at hour two has given you a better outcome than one caught only in the final debrief, because the metrics, not the flag, are the product. The number that compounds is the conversion rate afterward: how many missed techniques became durable detections within 30 days. Track all four across engagements and you get a trend line for real-world resilience instead of a one-off war story.
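
The conversion rate can be tracked the same way. In this sketch the two engagement records are hypothetical data, invented to show the trend line, and `conversion_rate` is an illustrative helper rather than a standard metric definition.

```python
def conversion_rate(missed, converted):
    """Share of missed techniques that became durable detections in the window."""
    missed_set = set(missed)
    if not missed_set:
        return 1.0  # nothing was missed, so there is nothing left to convert
    return len(missed_set & set(converted)) / len(missed_set)


# Hypothetical results from two successive engagements.
history = [
    {"engagement": "first",
     "missed": ["T1078", "T1021", "T1003", "ride-the-session"],
     "converted": ["T1078", "T1003"]},
    {"engagement": "second",
     "missed": ["T1021", "ride-the-session"],
     "converted": ["T1021", "ride-the-session"]},
]

trend = [(run["engagement"], conversion_rate(run["missed"], run["converted"]))
         for run in history]
for name, rate in trend:
    print(f"{name}: {rate:.0%} of missed techniques converted within 30 days")
```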

Strobes insight

If your team can patch every finding from the last pentest and still has no answer to 'how fast would we detect a foothold?', you have outgrown coverage testing. That gap is exactly what dwell time measures.

What is threat-led penetration testing (TIBER-EU, CBEST, DORA)?

Threat-led penetration testing (TLPT) is a regulated form of red teaming that uses real cyber threat intelligence to shape the scenario, so the simulated attack mirrors the actors most likely to target your organization. Instead of a generic adversary, a threat-intelligence provider profiles relevant groups, and the red team emulates their specific TTPs against live production systems.

The best-known frameworks are TIBER-EU (the European Central Bank's model, now reinforced by DORA for EU financial entities), the Bank of England's CBEST, and similar programs elsewhere. They share a structure: a threat-intelligence phase, a red team phase against production, and a tightly controlled white team coordinating both sides. These engagements are heavily governed precisely because they hit real systems, and they are usually reserved for systemically important institutions. In our experience the threat-intel phase is also where many programs first learn their actual perimeter is larger than their asset inventory said.

Why a red team assessment matters for defenders

A red team assessment matters because it tests the one thing a vulnerability list cannot: whether your defenders would actually catch and stop an intrusion in progress. You can patch every CVE a pentest finds and still lose to an attacker who phishes a credential, lands a foothold, and moves laterally for weeks because nobody was watching the right telemetry.

The lasting value is the debrief and the blue-team uplift that follows. A good red team hands defenders a timeline of every action mapped to MITRE ATT&CK, showing which techniques were detected, which were missed, and which alerts fired but were ignored. That feeds detection engineering directly: a Sigma-style rule for the BloodHound LDAP storm, an alert on service-account logons from unusual hosts, an LSASS-access detection for credential dumping, tighter segmentation around the finance zone. This is the foundation of purple teaming, where red and blue close each gap on the spot rather than waiting weeks for a report. Run continuously rather than once a year, that loop is where agentic pentesting changes the economics, keeping detections honest between set-piece exercises.
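
One of the detections named above, an alert on service-account logons from unusual hosts, can be sketched as a comparison against a per-account baseline. Everything specific here is an assumption for illustration: the `SVC-` naming prefix, the `(account, host)` event layout, and the `BACKUP01` and `WKSTN-042` host names do not come from the engagement.

```python
def unusual_service_logons(baseline, logons, service_prefix="SVC-"):
    """Flag service-account logons from hosts outside that account's baseline.

    baseline: dict mapping account name -> set of hosts it normally logs on from
    logons:   iterable of (account, host) tuples from authentication logs
    """
    alerts = []
    for account, host in logons:
        if not account.upper().startswith(service_prefix):
            continue  # only service identities are in scope for this rule
        if host not in baseline.get(account, set()):
            alerts.append((account, host))
    return alerts


# Hypothetical baseline: the backup account normally logs on from one server.
baseline = {"SVC-BACKUP": {"BACKUP01"}}
logons = [
    ("SVC-BACKUP", "BACKUP01"),    # expected
    ("SVC-BACKUP", "FIN-JUMP01"),  # the lateral move from the narrative
    ("JDOE", "WKSTN-042"),         # ordinary user, ignored by this rule
]
print(unusual_service_logons(baseline, logons))
```

Service accounts suit this kind of rule because their behavior is narrow and repetitive, so a logon from a new host is a strong signal rather than routine noise.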

Frequently asked questions

What is the difference between a red team assessment and a penetration test?

A penetration test is coverage-based: it finds and ranks as many vulnerabilities as possible in a defined scope. A red team assessment is goal-based: it emulates a real adversary trying to reach a specific objective without being detected, testing your detection and response rather than your vulnerability count. Pentests run days to a couple of weeks; red team engagements run weeks to months.

How long does a red team assessment take and what does it cost?

Most engagements run three to twelve weeks, and pricing typically tracks that duration plus the seniority of the operators, so a full red team usually costs several times a comparable pentest. Threat-led engagements under TIBER-EU or CBEST run longer and cost more because they add a separate threat-intelligence phase before any attack begins.

Does a red team assessment use real attacks on production?

Yes, that is the point. A red team operates against live production systems and a real, unaware defending team, because detection and response can only be tested under realistic conditions. The work is governed by tight rules of engagement, a coordinating white team, and pre-agreed safety guardrails so destructive techniques like T1486 stay strictly out of scope.

Do I need a SOC before running a red team assessment?

Practically, yes. A red team measures whether your defenders detect and respond, so without a SOC, EDR, or logging the exercise succeeds unopposed and teaches little. Organizations without detection capability get more value from penetration testing first, then graduate to red teaming once there are real defenses to measure.

What is MITRE ATT&CK's role in a red team assessment?

MITRE ATT&CK is a public knowledge base of real-world attacker tactics and techniques. Red teams map their actions to ATT&CK technique IDs so the report ties each step to a specific behavior the blue team can hunt for, and so the debrief clearly shows which techniques were detected and which slipped through. It also turns coverage into something measurable across engagements.

How is a red team assessment scored if not by findings?

By detection and response metrics: dwell time (time from initial access to first detection), detection rate (share of executed techniques that fired an alert), and mean time to respond (time from detection to containment). The most durable metric is the conversion rate afterward, meaning how many missed techniques became real detections within 30 days.

Akhil Reni

Co-founder and CTO, Strobes

Akhil Reni is co-founder and CTO of Strobes, building AI-driven penetration testing and exposure management for security teams.
