Security · Topic 12 of 23 · Part II — Catalog of testing methodologies

Red teaming

A goal-driven adversary simulation against the whole organization — measuring not just vulnerabilities, but detection and response.

Syllabus: § Testing Methodologies and Tools of the Trade (2, 4) → Red teaming

Topic 12 · Adversary simulation

What pentest cannot answer

By the end of this topic you can:

Define red teaming and distinguish it sharply from penetration testing
Explain why red teaming exists — what question it answers that pentest cannot
Describe how a red team engagement is scoped, authorized, and run, including the role of trusted agents
Place red team operations within an attacker framework (Cyber Kill Chain, MITRE ATT&CK)
State the legal, ethical, and operational risks specific to red teaming

What red teaming is

Goal-driven — given a specific adversary objective, not a target list
Whole-organization scope — technical, human, physical, supply chain; the team chooses attack paths
Adversarial realism — TTPs derived from a defined threat actor tier, not a generic toolkit
Detection and response in scope — the defenders' ability to see, contain, and respond is what's being measured
Covert by default — only a small trusted-agent group knows the test is running
Long duration — 4–12 weeks of active operation plus planning and reporting

The core question: If a real adversary wanted to compromise us, how would they do it, would we notice, and could we stop them?

Contrast: A pentest answers "are exploitable vulnerabilities present?"

Red team vs. pentest — precisely

Aspect	Pentest	Red team
Objective	Find exploitable vulnerabilities	Achieve a specific adversary goal
Scope	Defined target (app, network, etc.)	The organization
Duration	Days to weeks	Weeks to months
SOC awareness	Usually open; SOC knows	Usually covert; only trusted agents know
Methods	Defined toolkit, catalog techniques	Actor-matched TTPs; evasion is an explicit goal
Deliverable	Finding list + report	Attack narrative + detection/response evaluation
Frequency	Routine — annually, per release	Occasional — yearly at most, often less
Cost	Moderate	High

Red team is not "pentest with a larger scope" — it is a categorically different methodology.

What red teaming is not

Not "pentest with extra steps" — covert nature, detection evaluation, and goal-driven planning are categorical differences
Not appropriate for every organization — without a SOC or IR capability, the red team wins trivially and the report is expensive but not actionable
Not licence to do anything — covert engagements still operate within authorized scope, legal limits, and ethical constraints; the trusted-agent agreement defines the boundary
Not an annual compliance check — DORA TLPT and TIBER-EU call for threat-led testing, but most routine annual pentest requirements do not call for a red team

Common misuse: Calling a scoped pentest a "red team" inflates expectations, misleads clients, and misallocates budget.

Threat-led red teaming — the modern standard

The engagement begins with threat intelligence about who actually targets organizations like the client. The red team simulates that actor — not a generic "hacker".

Key frameworks

TIBER-EU — ECB framework for financial infrastructure; threat-intel phase, red team execution, replay, reporting; national variants TIBER-CH, TIBER-DE
CBEST — Bank of England predecessor; influenced TIBER-EU
DORA TLPT — mandatory threat-led penetration testing for certain EU financial entities; applicable since 17 January 2025
iCAST / AASE — Asian-region analogues

Actor tier vocabulary

Tier 1 opportunistic, mass scanning, basic phishing
Tier 2 organized criminal, ransomware-as-a-service
Tier 3 well-resourced APT, sometimes nation-state

Too low a tier is unrealistic; too high tells the client nothing actionable about their actual risk.

The kill-chain mental model

Red team operators plan and report by attacker phases. The Lockheed Martin Cyber Kill Chain provides the coarse skeleton — red teams plan backward from phase 7.

1. Reconnaissance
OSINT, attack surface, target ID

2. Weaponization
Tools, payloads, infrastructure

3. Delivery
Initial access: phishing, supply-chain, exposed services

4. Exploitation
Code execution or access

5. Installation
Persistence on foothold

6. Command and Control (C2)
Operator communication channel

7. Actions on Objectives
Exfiltration, disruption, escalation

MITRE ATT&CK (Topic 18) offers a far more granular technique catalogue, but the seven phases remain a useful planning and reporting skeleton.
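To make "planning backward from phase 7" concrete, here is a minimal sketch that assumes the seven phases are modelled as an ordered enumeration. The names `Phase` and `plan_backward` are illustrative only and do not come from any framework tooling.

```python
from enum import IntEnum


class Phase(IntEnum):
    """The seven Cyber Kill Chain phases, in attacker order."""
    RECONNAISSANCE = 1
    WEAPONIZATION = 2
    DELIVERY = 3
    EXPLOITATION = 4
    INSTALLATION = 5
    COMMAND_AND_CONTROL = 6
    ACTIONS_ON_OBJECTIVES = 7


def plan_backward(objective: str) -> list[str]:
    """Return planning prompts, starting at the objective (phase 7)
    and walking back to reconnaissance (phase 1)."""
    steps = []
    for phase in sorted(Phase, reverse=True):
        label = phase.name.replace("_", " ").title()
        if phase is Phase.ACTIONS_ON_OBJECTIVES:
            prompt = objective
        else:
            prompt = f"what must be true here to enable phase {phase.value + 1}?"
        steps.append(f"{phase.value}. {label}: {prompt}")
    return steps


if __name__ == "__main__":
    # Hypothetical objective, for illustration only.
    for line in plan_backward("demonstrate access to the payment database"):
        print(line)
```

The same enumeration can double as the section order of the final report, which keeps planning and reporting on one skeleton.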

Team composition and tradecraft

Roles

Operators — technical core; conduct offensive operations
Social engineers — design and execute SE phases
Threat intel analysts — characterize the simulated actor; keep TTPs current
Infrastructure engineers — C2, phishing domains, redirectors, payloads
Tool developers — custom tooling for specific defender environments
Engagement lead — single point of contact with the client trusted agent

Operational security (opsec)

C2 infrastructure — domain fronting, redirectors, evasion of detection signatures
Payload engineering — AV/EDR bypass, LOLBins, custom loaders
Pacing — no noisy mass scans; gradual, timed escalation
Identity hygiene — separate identities and accounts per operation
Cleanup — documented artefact inventory; removal after engagement
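The cleanup item above depends on knowing exactly what was left behind. The following is a minimal sketch of an artefact inventory, assuming the team records every artefact at creation time. The class names, fields, and example values are hypothetical, not a standard format.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Artefact:
    host: str
    location: str          # file path, registry key, account name, ...
    kind: str              # e.g. "payload", "scheduled task", "account"
    created_at: datetime
    removed_at: Optional[datetime] = None


@dataclass
class ArtefactInventory:
    items: list = field(default_factory=list)

    def record(self, host: str, location: str, kind: str) -> Artefact:
        """Log an artefact at the moment it is placed."""
        artefact = Artefact(host, location, kind, datetime.now(timezone.utc))
        self.items.append(artefact)
        return artefact

    def mark_removed(self, artefact: Artefact) -> None:
        """Log the removal time once cleanup has been verified."""
        artefact.removed_at = datetime.now(timezone.utc)

    def outstanding(self) -> list:
        """Artefacts still present in the client environment."""
        return [a for a in self.items if a.removed_at is None]


if __name__ == "__main__":
    inventory = ArtefactInventory()
    loader = inventory.record("ws-01", r"C:\Temp\loader.exe", "payload")
    inventory.record("dc-01", "svc-backup2", "account")
    inventory.mark_removed(loader)
    for a in inventory.outstanding():
        print(f"still present: {a.kind} on {a.host} at {a.location}")
```

An engagement would only be closed once `outstanding()` returns an empty list, and the full inventory can be handed to the client as part of the report.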

Common C2: Cobalt Strike, Sliver, Mythic. BloodHound + SharpHound for AD; Impacket for Windows protocols.

Trusted agents and authorization

Because the engagement is covert, the trusted-agent agreement carries unusually heavy weight:

Typically: CISO, head of incident response, and a senior executive sponsor
They know the engagement is running, control its boundaries, and serve as the "panic button"
Explicit duty to not interfere with normal detection and response — if the SOC investigates the red team as an incident, the trusted agent must not tip them off
Must be able to stop the engagement at any moment via an out-of-band channel (phone that works even if email is compromised)

Legal weight: Contracts must explicitly authorize each technique class: phishing employees, social engineering, physical access attempts, specific tooling. Generic authorization is not sufficient.
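The requirement that each technique class be explicitly authorized amounts to a default-deny check before every planned action. Here is a minimal sketch under that assumption. `RulesOfEngagement`, its fields, and the class labels are hypothetical, and a real engagement is governed by the signed contract rather than by code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RulesOfEngagement:
    authorized_classes: frozenset       # technique classes named in the contract
    excluded_targets: frozenset = frozenset()

    def check(self, technique_class: str, target: str) -> tuple:
        """Default deny: anything not explicitly authorized is refused."""
        if technique_class not in self.authorized_classes:
            return False, f"technique class '{technique_class}' is not explicitly authorized"
        if target in self.excluded_targets:
            return False, f"target '{target}' is excluded from scope"
        return True, "authorized"


if __name__ == "__main__":
    # Hypothetical scope, for illustration only.
    roe = RulesOfEngagement(
        authorized_classes=frozenset({"phishing", "external-network"}),
        excluded_targets=frozenset({"scada-gw"}),
    )
    for technique, target in [
        ("phishing", "finance-staff"),
        ("physical-access", "hq-lobby"),
        ("external-network", "scada-gw"),
    ]:
        allowed, reason = roe.check(technique, target)
        print(f"{technique} -> {target}: {reason}")
```

The point of the design is that an operator who wants to use an unlisted technique has to go back to the trusted agents for a scope change, rather than relying on a generic authorization.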

Detection and response — the actual measurement

The deliverable is two parallel narratives, aligned in time:

Attacker narrative: What the red team did, what worked, what failed, what they learned about the environment.

Defender narrative: What the SOC and IR team saw, when, what they did, what they missed. It is reconstructed after the engagement from logs and tickets, without telling defenders what to expect.
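Aligning the two narratives in time can be pictured as merging two timestamped event lists into one timeline. This is a minimal sketch, assuming both sides record events with comparable timestamps. The `Event` type and the sample events are hypothetical.

```python
from datetime import datetime
from heapq import merge
from typing import NamedTuple


class Event(NamedTuple):
    at: datetime
    side: str     # "red" (attacker) or "blue" (defender)
    what: str


def aligned_timeline(red, blue):
    """Merge attacker and defender events into one chronological list."""
    by_time = lambda e: e.at
    # heapq.merge expects each input to be sorted already, so sort first.
    return list(merge(sorted(red, key=by_time), sorted(blue, key=by_time), key=by_time))


if __name__ == "__main__":
    day = dict(year=2025, month=3, day=3)
    red = [
        Event(datetime(**day, hour=9, minute=0), "red", "phishing email delivered"),
        Event(datetime(**day, hour=9, minute=5), "red", "payload executed on workstation"),
    ]
    blue = [
        Event(datetime(**day, hour=9, minute=35), "blue", "EDR alert triaged"),
    ]
    for e in aligned_timeline(red, blue):
        print(f"{e.at:%H:%M} [{e.side:4}] {e.what}")
```

Reading the merged list top to bottom shows where a defender event follows an attacker action and where it never arrives.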

Key metrics

Detection coverage — percentage of red team actions detected at all
Time to detect — from action to first defender awareness
Time to respond — from detection to containment
Mean time to compromise — from initial access to objective achieved
Failure points — controls expected to detect that did not
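The first three metrics can be computed directly once each red team action is paired with its detection and containment times. Here is a minimal sketch, assuming one record per action. The `Action` fields and the sample data are hypothetical, and real engagements may define the metrics differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Action:
    name: str
    performed_at: datetime
    detected_at: Optional[datetime] = None    # None: never detected
    contained_at: Optional[datetime] = None   # None: never contained


def detection_coverage(actions) -> float:
    """Percentage of red team actions detected at all."""
    if not actions:
        return 0.0
    detected = sum(1 for a in actions if a.detected_at is not None)
    return 100.0 * detected / len(actions)


def times_to_detect(actions) -> list:
    """Action to first defender awareness, for detected actions only."""
    return [a.detected_at - a.performed_at for a in actions if a.detected_at]


def times_to_respond(actions) -> list:
    """Detection to containment, for contained actions only."""
    return [a.contained_at - a.detected_at
            for a in actions if a.detected_at and a.contained_at]


def mean_delta(deltas) -> Optional[timedelta]:
    """Average of a list of durations, or None if the list is empty."""
    return sum(deltas, timedelta()) / len(deltas) if deltas else None


if __name__ == "__main__":
    t = lambda h, m: datetime(2025, 3, 3, h, m)
    actions = [
        Action("phishing email", t(9, 0)),
        Action("payload execution", t(9, 5), t(9, 35), t(10, 35)),
        Action("lateral movement", t(11, 0)),
        Action("data staging", t(14, 0), t(14, 10)),
    ]
    print("coverage %:", detection_coverage(actions))
    print("mean time to detect:", mean_delta(times_to_detect(actions)))
    print("mean time to respond:", mean_delta(times_to_respond(actions)))
```

The undetected actions in such a table are the starting point for the failure-points list: each one is a control that was expected to fire and did not.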

The replay / purple-team phase

After the covert phase, the engagement closes with a replay (TIBER vocabulary) or purple-team session:

1. Red team walks blue team through every action, chronologically and in technical detail

2. Selected actions are re-run while blue team watches, to identify exactly what was and was not visible

3. Blue team leaves with a concrete improvement list: specific detections to build, logs to add, runbooks to update

Why this matters: A red team report without a replay teaches abstractly. The replay turns every red team action into a concrete, testable candidate detection. This is where the engagement's defensive value lands.
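The step from replayed actions to an improvement list can be sketched as a simple triage rule: if no telemetry existed, add logging. If telemetry existed but nothing alerted, build a detection. This is a minimal sketch under that assumption. `ReplayedAction`, its fields, and the technique labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReplayedAction:
    technique: str             # label chosen by the team, e.g. an ATT&CK technique name
    log_source_present: bool   # did any telemetry record the action during replay?
    alert_fired: bool          # did a detection actually alert on it?


def improvement_item(action: ReplayedAction) -> Optional[str]:
    """Turn one replayed action into a candidate backlog item, if any."""
    if action.alert_fired:
        return None  # already detected, nothing to add
    if not action.log_source_present:
        return f"Add logging that would record: {action.technique}"
    return f"Build detection for: {action.technique}"


def backlog(actions) -> list:
    """Collect the improvement items for every action that was missed."""
    return [item for a in actions if (item := improvement_item(a)) is not None]


if __name__ == "__main__":
    replay = [
        ReplayedAction("Kerberoasting", log_source_present=True, alert_fired=False),
        ReplayedAction("LSASS memory read", log_source_present=False, alert_fired=False),
        ReplayedAction("Phishing attachment", log_source_present=True, alert_fired=True),
    ]
    for item in backlog(replay):
        print(item)
```

Each item is testable: the same action can be re-run later to confirm that the new log source or detection behaves as intended.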

Adjacent practices

Practice	How it differs from red team	Best fit
Purple teaming	Red and blue work together from the start; collaborative, no covert phase	Orgs not yet ready for full red team; faster, cheaper, more directly actionable
Adversary emulation	Scripted scenarios mapped to specific actor profiles (CALDERA, ATT&CK Evaluations)	Between purple and red team; systematic TTP coverage
Tabletop exercise	No technical execution; IR team talks through scenarios verbally	Decision-making practice; cheapest; no detection coverage
Cyber range	Both sides operate in a realistic but artificial environment	Training-focused; no production risk

A mature program uses several of these in rotation. None of them is "the answer".

When to commission — and the ethical risks

Preconditions

Mature vulnerability management and pentest history
Functioning SOC and IR capability
Executive sponsorship including legal counsel
Budget for months of expert time
Appetite for hearing bad news

Without these, spend on more pentests, a purple-team exercise, or foundational detection engineering.
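The preconditions work as an all-or-nothing gate, which can be sketched as a checklist. This is a minimal illustration only. The function name and the wording of the returned recommendation are assumptions, and a real decision involves judgment that a checklist cannot capture.

```python
PRECONDITIONS = (
    "mature vulnerability management and pentest history",
    "functioning SOC and IR capability",
    "executive sponsorship including legal counsel",
    "budget for months of expert time",
    "appetite for hearing bad news",
)


def recommend(met: set) -> str:
    """Recommend a red team only when every precondition is met."""
    missing = [p for p in PRECONDITIONS if p not in met]
    if not missing:
        return "red team engagement is a reasonable next step"
    return (
        "start elsewhere (more pentests, a purple-team exercise, "
        "or foundational detection engineering); missing: " + "; ".join(missing)
    )


if __name__ == "__main__":
    # Hypothetical organization with budget and sponsorship but no SOC.
    print(recommend({
        "executive sponsorship including legal counsel",
        "budget for months of expert time",
    }))
```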

Specific risks

Employees tested without consent — debrief required; no shaming
Real incidents — the operation can crash systems, or be misidentified as a live attack and cause genuine disruption
Legal accumulation — each technique class must be explicitly authorized
Defender burnout — repeated covert tests demoralize teams; long-running programs must manage this
Public discovery — physical phase caught on camera, police called; plan the cover story in advance

Check — methodology fit

Reflection

A small SME with no SOC asks for a "red team test". What is your recommendation — and why?

Push back politely. A red team measures detection and response — an SME with no SOC or IR capability will get a report saying "you were compromised within hours and detected nothing", which is expensive and produces nothing actionable. Recommend instead: a focused pentest, a purple-team exercise, and foundational investment in prevention (MFA, EDR, patch management).

What you take home

Red teaming is goal-driven, covert, whole-organization adversary simulation — not pentest with a larger scope
The defining difference: detection and response are in scope; most defenders do not know the test is running
Threat-led framing anchors the simulated actor to real intelligence about who targets organizations like the client
Trusted agents hold the authorization boundary and the panic button; their duty not to interfere is explicit and binding
The deliverable is two parallel narratives (attacker + defender); the replay is where defensive improvement actually lands
Red team requires SOC, IR, and pentest maturity — organizations without these should start elsewhere
Purple teaming, adversary emulation, and tabletop exercises serve different maturity levels and goals

Next: Topic 13 — Blue teaming. From offensive simulation back to the defender's side — how blue teams are organized, what they measure, and how red team output drives detection engineering.

END · TOPIC 12

Simulate the threat. Measure the response.

Before next session: identify an organization you know — what would its defender narrative look like if a red team operated against it today?