Security · Topic 12 of 23 · Part II — Catalog of testing methodologies

Red teaming

A goal-driven adversary simulation against the whole organization — measuring not just vulnerabilities, but detection and response.

Syllabus: § Testing Methodologies and Tools of the Trade (2, 4) → Red teaming
Topic 12 · Adversary simulation

What pentest cannot answer

By the end of this topic you can:
  • Define red teaming and distinguish it sharply from penetration testing
  • Explain why red teaming exists — what question it answers that pentest cannot
  • Describe how a red team engagement is scoped, authorized, and run, including the role of trusted agents
  • Place red team operations within an attacker framework (Cyber Kill Chain, MITRE ATT&CK)
  • State the legal, ethical, and operational risks specific to red teaming

What red teaming is

  • Goal-driven — given a specific adversary objective, not a target list
  • Whole-organization scope — technical, human, physical, supply chain; the team chooses attack paths
  • Adversarial realism — TTPs derived from a defined threat actor tier, not a generic toolkit
  • Detection and response in scope — the defenders' ability to see, contain, and respond is what's being measured
  • Covert by default — only a small trusted-agent group knows the test is running
  • Long duration — 4–12 weeks of active operation plus planning and reporting
The core question: If a real adversary wanted to compromise us, how would they do it, would we notice, and could we stop them?
Contrast: A pentest answers "are exploitable vulnerabilities present?"

Red team vs. pentest — precisely

Aspect | Pentest | Red team
Objective | Find exploitable vulnerabilities | Achieve a specific adversary goal
Scope | Defined target (app, network, etc.) | The organization
Duration | Days to weeks | Weeks to months
SOC awareness | Usually open; SOC knows | Usually covert; only trusted agents know
Methods | Defined toolkit, catalog techniques | Actor-matched TTPs; evasion is an explicit goal
Deliverable | Finding list + report | Attack narrative + detection/response evaluation
Frequency | Routine: annually, per release | Occasional: yearly at most, often less
Cost | Moderate | High

Red team is not "pentest with a larger scope" — it is a categorically different methodology.

What red teaming is not

  • Not "pentest with extra steps" — covert nature, detection evaluation, and goal-driven planning are categorical differences
  • Not appropriate for every organization — without a SOC or IR capability, the red team wins trivially and the report is expensive but not actionable
  • Not licence to do anything — covert engagements still operate within authorized scope, legal limits, and ethical constraints; the trusted-agent agreement defines the boundary
  • Not an annual compliance check — DORA TLPT and TIBER-EU require threat-led testing, but most routine annual pentest requirements are not red team
Common misuse: Calling a scoped pentest a "red team" inflates expectations, misleads clients, and misallocates budget.

Threat-led red teaming — the modern standard

The engagement begins with threat intelligence about who actually targets organizations like the client. The red team simulates that actor — not a generic "hacker".

Key frameworks

  • TIBER-EU — ECB framework for financial infrastructure; threat-intel phase, red team execution, replay, reporting; national variants TIBER-CH, TIBER-DE
  • CBEST — Bank of England predecessor; influenced TIBER-EU
  • DORA TLPT — mandatory threat-led penetration testing for certain EU financial entities; DORA has applied since 17 January 2025
  • iCAST / AASE — Asian-region analogues (iCAST from the Hong Kong Monetary Authority, AASE from the Association of Banks in Singapore)

Actor tier vocabulary

  • Tier 1: opportunistic, mass scanning, basic phishing
  • Tier 2: organized criminal, ransomware-as-a-service
  • Tier 3: well-resourced APT, sometimes nation-state

Too low a tier is unrealistic; too high tells the client nothing actionable about their actual risk.

The kill-chain mental model

Red team operators plan and report by attacker phases. The Lockheed Martin Cyber Kill Chain provides the coarse skeleton — red teams plan backward from phase 7.

  1. Reconnaissance: OSINT, attack surface, target ID
  2. Weaponization: tools, payloads, infrastructure
  3. Delivery: initial access via phishing, supply chain, exposed services
  4. Exploitation: code execution or access
  5. Installation: persistence on foothold
  6. C2: operator communication channel
  7. Actions on Objectives: exfiltration, disruption, escalation

MITRE ATT&CK (Topic 18) replaces this with a far more granular technique catalogue — but the seven phases remain a useful planning and reporting skeleton.
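One way to make the backward planning concrete is to treat the seven phases as an ordered enumeration and walk it from phase 7 down to phase 1. The sketch below is illustrative only: `KillChainPhase`, `plan_backward`, and the example objective and notes are hypothetical names and values, not part of any framework's tooling.

```python
from enum import IntEnum
from typing import Dict, List, Tuple


class KillChainPhase(IntEnum):
    """The seven Cyber Kill Chain phases, numbered 1 to 7."""
    RECONNAISSANCE = 1
    WEAPONIZATION = 2
    DELIVERY = 3
    EXPLOITATION = 4
    INSTALLATION = 5
    C2 = 6
    ACTIONS_ON_OBJECTIVES = 7


def plan_backward(objective: str,
                  notes: Dict[KillChainPhase, str]) -> List[Tuple[KillChainPhase, str]]:
    """Return (phase, requirement) pairs ordered from phase 7 down to phase 1."""
    plan = []
    for phase in sorted(KillChainPhase, reverse=True):
        if phase is KillChainPhase.ACTIONS_ON_OBJECTIVES:
            # Phase 7 is the engagement objective itself.
            requirement = objective
        else:
            requirement = notes.get(phase, "to be decided")
        plan.append((phase, requirement))
    return plan


# Hypothetical objective and planning notes (illustration only).
example_plan = plan_backward(
    "demonstrate access to the payment test database",
    {
        KillChainPhase.C2: "channel that blends with normal web traffic",
        KillChainPhase.DELIVERY: "initial access route agreed in the rules of engagement",
    },
)
for phase, requirement in example_plan:
    print(f"{phase.value}. {phase.name}: {requirement}")
```

Phases without a note print as "to be decided", which makes the open planning questions visible before the engagement starts.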

Team composition and tradecraft

Roles

  • Operators — technical core; conduct offensive operations
  • Social engineers — design and execute SE phases
  • Threat intel analysts — characterize the simulated actor; keep TTPs current
  • Infrastructure engineers — C2, phishing domains, redirectors, payloads
  • Tool developers — custom tooling for specific defender environments
  • Engagement lead — single point of contact with the client trusted agent

Operational security (opsec)

  • C2 infrastructure — domain fronting, redirectors, evasion of detection signatures
  • Payload engineering — AV/EDR bypass, LOLBins, custom loaders
  • Pacing — no noisy mass scans; gradual, timed escalation
  • Identity hygiene — separate identities and accounts per operation
  • Cleanup — documented artefact inventory; removal after engagement
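
The cleanup item depends on the inventory being kept up to date during the operation. A minimal sketch of such an inventory follows, assuming one record per artefact; the `Artefact` fields, the `outstanding` helper, and the host names are hypothetical choices for illustration, not a standard format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Artefact:
    host: str
    kind: str                 # e.g. "file", "account", "scheduled task"
    location: str
    created_at: datetime
    removed_at: Optional[datetime] = None  # None means still present


def outstanding(inventory: List[Artefact]) -> List[Artefact]:
    """Artefacts that still have to be removed before the engagement closes."""
    return [a for a in inventory if a.removed_at is None]


# Hypothetical inventory entries (illustration only).
inventory = [
    Artefact("ws-017", "file", "C:\\Users\\Public\\notes.tmp",
             created_at=datetime(2024, 1, 15, 10, 20),
             removed_at=datetime(2024, 2, 1, 16, 0)),
    Artefact("srv-files-02", "account", "svc-backup-test",
             created_at=datetime(2024, 1, 18, 9, 5)),
]

for artefact in outstanding(inventory):
    print(f"still present: {artefact.kind} on {artefact.host} ({artefact.location})")
```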

Common C2 frameworks: Cobalt Strike, Sliver, Mythic. For Active Directory attack-path mapping: BloodHound with its SharpHound collector. For Windows network protocols: Impacket.

Trusted agents and authorization

Because the engagement is covert, the trusted-agent agreement carries unusually heavy weight:

  • Typically: CISO, head of incident response, and a senior executive sponsor
  • They know the engagement is running, control its boundaries, and serve as the "panic button"
  • Explicit duty to not interfere with normal detection and response — if the SOC investigates the red team as an incident, the trusted agent must not tip them off
  • Must be able to stop the engagement at any moment via an out-of-band channel (phone that works even if email is compromised)
Legal weight: Contracts must explicitly authorize each technique class: phishing employees, social engineering, physical access attempts, specific tooling. Generic authorization is not sufficient.
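
The per-class authorization can be mirrored in the team's own tooling as a default-deny check that operators run before acting. The sketch below assumes the contract has been transcribed into a simple record; `RulesOfEngagement`, `is_authorized`, and the class and target names are hypothetical and would need to match the wording of the actual agreement.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class RulesOfEngagement:
    authorized_classes: FrozenSet[str]            # technique classes named in the contract
    excluded_targets: FrozenSet[str] = frozenset()  # targets that are always off limits


def is_authorized(roe: RulesOfEngagement, technique_class: str, target: str) -> bool:
    """Default deny: only explicitly listed classes against non-excluded targets pass."""
    if target in roe.excluded_targets:
        return False
    return technique_class in roe.authorized_classes


# Hypothetical agreement contents (illustration only).
roe = RulesOfEngagement(
    authorized_classes=frozenset({"phishing", "social-engineering"}),
    excluded_targets=frozenset({"hr-payroll"}),
)

print(is_authorized(roe, "phishing", "mail-gateway"))
print(is_authorized(roe, "physical-access", "head-office"))
print(is_authorized(roe, "phishing", "hr-payroll"))
```

A technique class that is missing from the record is refused, which matches the point that generic authorization is not sufficient.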

Detection and response — the actual measurement

The deliverable is two parallel narratives, aligned in time:

Attacker narrative: What the red team did, what worked, what failed, what they learned about the environment.
Defender narrative: What the SOC and IR team saw, when, what they did, what they missed. It is reconstructed after the engagement from logs and tickets, without telling defenders what to expect.
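
Aligning the two narratives in time amounts to merging two timestamped event lists into one chronological view. The following is a minimal sketch under that assumption; the `Event` record, the `align` function, and the sample events are hypothetical.

```python
import heapq
from datetime import datetime
from typing import List, NamedTuple


class Event(NamedTuple):
    at: datetime
    side: str   # "red" for the attacker narrative, "blue" for the defender narrative
    what: str


def _by_time(event: Event) -> datetime:
    return event.at


def align(attacker: List[Event], defender: List[Event]) -> List[Event]:
    """Merge both narratives into a single timeline ordered by timestamp."""
    return list(heapq.merge(sorted(attacker, key=_by_time),
                            sorted(defender, key=_by_time),
                            key=_by_time))


# Hypothetical events (illustration only).
attacker = [
    Event(datetime(2024, 1, 15, 9, 0), "red", "phishing email sent"),
    Event(datetime(2024, 1, 15, 10, 15), "red", "first foothold on a workstation"),
]
defender = [
    Event(datetime(2024, 1, 15, 9, 40), "blue", "mail gateway alert triaged"),
    Event(datetime(2024, 1, 15, 13, 5), "blue", "EDR ticket opened"),
]

timeline = align(attacker, defender)
for event in timeline:
    print(f"{event.at:%H:%M} [{event.side}] {event.what}")
```

Reading the merged timeline top to bottom shows, for each red team action, whether a defender event followed it and how long that took.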

Key metrics

  • Detection coverage — percentage of red team actions detected at all
  • Time to detect — from action to first defender awareness
  • Time to respond — from detection to containment
  • Mean time to compromise — from initial access to objective achieved
  • Failure points — controls expected to detect that did not
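
These metrics can be computed directly from a per-action log once the defender timestamps have been reconstructed. The sketch below assumes each red team action is recorded with the time it was performed and, where applicable, when it was detected and contained; the `Action` record, the function names, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import List, Optional


@dataclass
class Action:
    name: str
    performed_at: datetime
    detected_at: Optional[datetime] = None   # None: never detected
    contained_at: Optional[datetime] = None  # None: never contained


def detection_coverage(actions: List[Action]) -> float:
    """Percentage of red team actions that were detected at all."""
    if not actions:
        return 0.0
    detected = sum(1 for a in actions if a.detected_at is not None)
    return 100.0 * detected / len(actions)


def mean_time_to_detect(actions: List[Action]) -> Optional[timedelta]:
    """Average delay from action to first defender awareness, over detected actions."""
    delays = [(a.detected_at - a.performed_at).total_seconds()
              for a in actions if a.detected_at is not None]
    return timedelta(seconds=mean(delays)) if delays else None


def mean_time_to_respond(actions: List[Action]) -> Optional[timedelta]:
    """Average delay from detection to containment, over contained actions."""
    delays = [(a.contained_at - a.detected_at).total_seconds()
              for a in actions
              if a.detected_at is not None and a.contained_at is not None]
    return timedelta(seconds=mean(delays)) if delays else None


def at(hour: int, minute: int) -> datetime:
    return datetime(2024, 1, 15, hour, minute)


# Hypothetical engagement log (illustration only).
actions = [
    Action("phishing email delivered", at(9, 0), detected_at=at(9, 30), contained_at=at(11, 30)),
    Action("credential reuse on VPN", at(10, 0), detected_at=at(11, 30)),
    Action("lateral movement", at(12, 0)),
    Action("data staging", at(14, 0)),
]

print(f"coverage: {detection_coverage(actions):.0f}%")
print(f"mean time to detect: {mean_time_to_detect(actions)}")
print(f"mean time to respond: {mean_time_to_respond(actions)}")
```

Averages are taken only over actions that were detected or contained, so they should always be reported alongside the coverage figure; a short time to detect means little if most actions were never seen.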

The replay / purple-team phase

After the covert phase, the engagement closes with a replay (TIBER vocabulary) or purple-team session:

  1. Red team walks blue team through every action, chronologically and in technical detail
  2. Selected actions are re-run while blue team watches, to identify exactly what was and was not visible
  3. Blue team leaves with a concrete improvement list: specific detections to build, logs to add, runbooks to update
Why this matters: A red team report without a replay teaches abstractly. The replay turns every red team action into a concrete, testable candidate detection. This is where the engagement's defensive value lands.
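
The improvement list can be derived mechanically from what the replay showed for each action. The sketch below uses a simple triage rule that is an assumption for illustration: no telemetry means a log source to add, telemetry without an alert means a detection to build, and an alert without a response means a runbook to update. `ReplayObservation` and `improvement` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReplayObservation:
    action: str
    telemetry_present: bool  # did any log record the action during the replay?
    alert_fired: bool        # did a detection raise an alert?
    responded: bool          # did the team act on the alert as the runbook expects?


def improvement(obs: ReplayObservation) -> Optional[str]:
    """Map one replay observation to at most one improvement item."""
    if not obs.telemetry_present:
        return f"Add logging that records: {obs.action}"
    if not obs.alert_fired:
        return f"Build a detection for: {obs.action}"
    if not obs.responded:
        return f"Update the runbook for: {obs.action}"
    return None  # visible, alerted, and handled: nothing to add


# Hypothetical replay results (illustration only).
observations = [
    ReplayObservation("new scheduled task on a server", False, False, False),
    ReplayObservation("sign-in from an unusual location", True, False, False),
    ReplayObservation("mass file read on a share", True, True, False),
    ReplayObservation("phishing email reported by a user", True, True, True),
]

backlog: List[str] = [item for item in map(improvement, observations) if item]
for item in backlog:
    print(item)
```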

Adjacent practices

Practice | How it differs from red team | Best fit
Purple teaming | Red and blue work together from the start; collaborative, no covert phase | Orgs not yet ready for full red team; faster, cheaper, more directly actionable
Adversary emulation | Scripted scenarios mapped to specific actor profiles (CALDERA, ATT&CK Evaluations) | Between purple and red team; systematic TTP coverage
Tabletop exercise | No technical execution; IR team talks through scenarios verbally | Decision-making practice; cheapest; no detection coverage
Cyber range | Both sides operate in a realistic but artificial environment | Training-focused; no production risk

A mature program uses several of these in rotation. None of them is "the answer".

When to commission — and the ethical risks

Preconditions

  • Mature vulnerability management and pentest history
  • Functioning SOC and IR capability
  • Executive sponsorship including legal counsel
  • Budget for months of expert time
  • Appetite for hearing bad news

Without these, spend on more pentests, a purple-team exercise, or foundational detection engineering.
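
The preconditions can be treated as a checklist that gates the decision to commission. The sketch below is a simplification that assumes each precondition is a yes/no answer and that any gap means starting elsewhere; `PRECONDITIONS`, `recommend`, and the example organization are hypothetical.

```python
from typing import Dict, List, Tuple

PRECONDITIONS = (
    "mature vulnerability management and pentest history",
    "functioning SOC and IR capability",
    "executive sponsorship including legal counsel",
    "budget for months of expert time",
    "appetite for hearing bad news",
)


def recommend(status: Dict[str, bool]) -> Tuple[str, List[str]]:
    """Return a recommendation and the list of unmet preconditions."""
    missing = [p for p in PRECONDITIONS if not status.get(p, False)]
    if missing:
        return ("start elsewhere: pentests, a purple-team exercise, "
                "or foundational detection engineering", missing)
    return ("red team engagement is a reasonable next step", [])


# Hypothetical organization that meets only one precondition (illustration only).
sme = {p: False for p in PRECONDITIONS}
sme["appetite for hearing bad news"] = True

recommendation, gaps = recommend(sme)
print(recommendation)
for gap in gaps:
    print(f"  missing: {gap}")
```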

Specific risks

  • Employees tested without consent — debrief required; no shaming
  • Real incidents — operation can crash systems or be misidentified as a live attack causing genuine disruption
  • Legal accumulation — each technique class must be explicitly authorized
  • Defender burnout — repeated covert tests demoralize teams; long-running programs must manage this
  • Public discovery — physical phase caught on camera, police called; plan the cover story in advance

Check — methodology fit

Reflection

A small SME with no SOC asks for a "red team test". What is your recommendation — and why?

Answer

Push back politely. A red team measures detection and response — an SME with no SOC or IR capability will get a report saying "you were compromised within hours and detected nothing", which is expensive and produces nothing actionable. Recommend instead: a focused pentest, a purple-team exercise, and foundational investment in prevention (MFA, EDR, patch management).

What you take home

  • Red teaming is goal-driven, covert, whole-organization adversary simulation — not pentest with a larger scope
  • The defining difference: detection and response are in scope; most defenders do not know the test is running
  • Threat-led framing anchors the simulated actor to real intelligence about who targets organizations like the client
  • Trusted agents hold the authorization boundary and the panic button; their duty not to interfere is explicit and binding
  • The deliverable is two parallel narratives (attacker + defender); the replay is where defensive improvement actually lands
  • Red team requires SOC, IR, and pentest maturity — organizations without these should start elsewhere
  • Purple teaming, adversary emulation, and tabletop exercises serve different maturity levels and goals

Next: Topic 13 — Blue teaming. From offensive simulation back to the defender's side — how blue teams are organized, what they measure, and how red team output drives detection engineering.

END · TOPIC 12

Simulate the threat. Measure the response.

Before next session: identify an organization you know — what would its defender narrative look like if a red team operated against it today?