Security · Topic 19 of 23 · Part IV — From findings to value

Evidence & traceability

A finding is a claim; evidence is what turns that claim into something a developer, an executive, and a lawyer can all act on.

§ Documenting and Gathering Test Evidence (5) → Traceability of test activities

Topic 19 · Evidence discipline

The work is only as good as what you captured

By the end of this topic you can:

define what counts as evidence in a security engagement and explain why each property matters
maintain a traceable record sufficient for client trust, report credibility, and legal defensibility
capture per-finding evidence in a form a developer can reproduce and a report can present
apply evidence discipline consistently across pentest, OSINT, social engineering, scanning, and manual review
recognise the habits that make evidence cheap to gather as work happens, vs. expensive to reconstruct later

What evidence is, formally

Evidence is the documented record of what was observed, what was done, when, by whom, with what tools, and with what result.

Desirable properties

Timestamped — every artefact has a date/time from an explicit, ideally agreed source
Reproducible — a developer or another tester can follow the steps and see substantially the same result
Specific — names actual hosts, URLs, parameters, payloads, accounts
Complete enough — the finding stands without "see notes elsewhere"
Minimal — sensitive data appears only to the extent required; the rest is redacted
Sourced — which command produced which output, which screenshot at what moment
Tamper-resistant where it matters — hashes, write-once storage, signed records for high-stakes engagements
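One way to make these properties concrete is to attach a small metadata record to each artefact at capture time. The sketch below is illustrative only: the `EvidenceRecord` class, its field names, and the example target are assumptions, not a schema prescribed by this topic.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class EvidenceRecord:
    """One captured artefact plus the metadata that keeps it usable later."""
    target: str          # Specific: the actual host, URL, or parameter
    source_command: str  # Sourced: the command or tool that produced the output
    content: bytes       # the raw output exactly as captured
    tester: str          # who captured it
    captured_at: datetime = field(
        # Timestamped: recorded in UTC at the moment the record is created
        default_factory=lambda: datetime.now(timezone.utc)
    )

    @property
    def sha256(self) -> str:
        """Tamper evidence: digest of the raw content."""
        return hashlib.sha256(self.content).hexdigest()


# Hypothetical example values
record = EvidenceRecord(
    target="https://app.example.com/login",
    source_command="curl -i https://app.example.com/login",
    content=b"HTTP/1.1 200 OK\r\n",
    tester="tester-01",
)
print(record.captured_at.isoformat(), record.sha256[:16])
```

The record is frozen so that a captured artefact cannot be edited in place; a correction becomes a new record rather than a silent change.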

Core principle: A finding without evidence is an opinion. A finding with evidence is a defensible claim.

Layer 1 — The activity log

A continuous, exhaustive, unedited record of what the tester did across the entire engagement.

What it records

Commands run, with timestamps and source host
Tool invocations and their parameters
Hosts and URLs touched, with methods and times
Authentication events (account, time, location)
Significant decisions ("stopped scanning X because Y")
Client communications ("contact Z confirmed additional scope")

Practical mechanisms

script / asciinema — terminal session capture
Start-Transcript — PowerShell equivalent
Burp / ZAP — full HTTP traffic logs
Engagement note platform (Notion, Obsidian, Joplin, AttackForge, Faraday)

This log is the tester's defence in any scope-dispute or after-the-fact question. It is often boring; it is essential.
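Session recorders such as script or Start-Transcript remain the primary mechanism. As a supplement, commands launched from tooling can write their own log entries. The following is a minimal sketch under stated assumptions: the `run_logged` helper and the JSON Lines layout are hypothetical, chosen only to show what a single entry might record.

```python
import json
import socket
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def run_logged(argv: list[str], log_path: Path) -> subprocess.CompletedProcess:
    """Run a command and append what was run, when, from where, and the result."""
    started = datetime.now(timezone.utc)
    result = subprocess.run(argv, capture_output=True, text=True)
    entry = {
        "started_utc": started.isoformat(),
        "source_host": socket.gethostname(),
        "argv": argv,
        "exit_code": result.returncode,
        "stdout": result.stdout,
        "stderr": result.stderr,
    }
    # Append-only: earlier entries are never rewritten by this helper.
    with log_path.open("a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return result
```

A call such as `run_logged(["nmap", "-sV", "10.0.0.5"], Path("notes/activity.jsonl"))` would append one line per invocation; the target address and log path here are placeholders.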

Layer 2 — The per-finding artefact

A focused, curated set of artefacts demonstrating a single finding — this is what ends up in the report.

1. Pre-state
The configuration or behaviour observed before the tester acted

2. Action
The exact request, command, payload, or steps taken

3. Observable effect
The response, output, screenshot, or file contents produced

4. Interpretation
Therefore X is true about the system — and what it lets an attacker do
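The four parts can be treated as required fields, so that an incomplete finding is visible before it reaches the report. This is a sketch, not a prescribed format: the `FindingArtefact` class and `missing_parts` method are hypothetical names.

```python
from dataclasses import dataclass, fields


@dataclass
class FindingArtefact:
    title: str
    pre_state: str
    action: str
    observable_effect: str
    interpretation: str

    def missing_parts(self) -> list[str]:
        """Names of the parts that are empty or whitespace only."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# A finding that records only its end state
finding = FindingArtefact(
    title="Privilege escalation on build host",
    pre_state="",
    action="",
    observable_effect="whoami returned root",
    interpretation="",
)
print(finding.missing_parts())  # prints ['pre_state', 'action', 'interpretation']
```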

The key distinction: The activity log is exhaustive and unedited; the finding artefact is selective and presented. Both are needed; neither replaces the other.

Evidence shapes by methodology

Penetration testing

Pre-state: banner, configuration, observed behaviour
Exact request, command, or payload
Response, output, screenshot
Impact demonstration: what this access enables
ATT&CK technique mapping (Topic 18)

Vulnerability scanning

Scan config: tool, version, profile, target list, time window
Raw output in original format (.nessus, XML, JSON)
Triage notes: verified / false-positive / uninvestigated

OSINT

Source of each observation (URL, archive snapshot, CT log entry, query)
Captured content (screenshot, archived HTML)
Date observed — content changes over time
Linkage to the engagement objective

Manual review

File, line range, commit/version reviewed
Relevant code snippet with sufficient context
Reasoning trace: input X → component Y → sink Z

The shape of evidence changes by methodology; the obligation to capture it does not.

Discipline: capture as you go

The single most important habit: capture before you move on.

Common failure modes

"I'll screenshot when I find the next thing" — by then the previous evidence is gone
"I'll write from memory tonight" — four hours of testing have layered over it
"The terminal log captured it" — but the output went through less, so only the screens you paged through were recorded
"I'll re-run it for the screenshot" — the system has now changed

Cheap habits that work

Screenshot every meaningful state change — disk is free
Save tool output to file, even if you also read it on screen
Maintain a running findings document, updated continuously
Save HTTP requests and responses in the proxy
At end of each work session, write a brief progress note — five minutes, high payoff
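Saving output to file is cheaper when the filename carries the capture time automatically. The helper names below (`evidence_filename`, `save_output`) and the naming pattern are assumptions for illustration; any consistent scheme would serve.

```python
import re
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def evidence_filename(label: str, suffix: str, when: Optional[datetime] = None) -> str:
    """Build a sortable name: UTC timestamp, then a slug of the label."""
    # Naive datetimes are interpreted as local time by astimezone().
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    slug = re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")
    return f"{when:%Y%m%dT%H%M%SZ}_{slug}.{suffix}"


def save_output(directory: Path, label: str, text: str) -> Path:
    """Write tool output to a timestamped file and return its path."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / evidence_filename(label, "txt")
    path.write_text(text, encoding="utf-8")
    return path
```

Because the timestamp comes first and is in UTC, a plain directory listing sorts the artefacts in capture order.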

Handling sensitive data in evidence

Evidence routinely contains material that is itself sensitive: personal data, credentials, customer content, client environment logs.

In the report

Redact what is not needed for the finding to land
Customer names → "Customer A"; real emails → user@example.com; SSNs → "XXX-XX-XXXX"
Minimum collection: one header row + one record proves a table is exposed
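The substitutions above can be applied mechanically before anything is pasted into the report. The patterns below are a rough sketch: they are deliberately simple, will not catch every email or identifier format, and do not handle customer names, which need a known list. The function names are hypothetical, and a manual check of the result is still needed.

```python
import re

# Simplified patterns, assumed adequate only for illustration
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact(text: str) -> str:
    """Replace real emails and SSN-shaped values with fixed placeholders."""
    text = EMAIL.sub("user@example.com", text)
    return SSN.sub("XXX-XX-XXXX", text)


def minimal_sample(rows: list[list[str]]) -> list[list[str]]:
    """Keep only the header row and one record, both redacted."""
    return [[redact(cell) for cell in row] for row in rows[:2]]


table = [
    ["name", "email", "ssn"],
    ["Jane Doe", "jane.doe@corp.test", "123-45-6789"],
    ["John Roe", "john.roe@corp.test", "987-65-4321"],
]
print(minimal_sample(table))
```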

In the repository

Keep originals separately, encrypted, time-limited
Destroy at engagement end — per DPA (Topic 05)
Issue a certificate of destruction to the client

Critical: The evidence repository, in aggregate, is more dangerous than any individual finding. Treat it accordingly.

Evidence storage and chain of custody

Standard engagement storage

Single working directory (or repository) per engagement
Subdirs: raw/, screenshots/, notes/, findings/, report/
Encrypted at rest — full-disk encryption is the floor
Backed up — losing evidence on day 9 of 10 is a real risk
Versioned where possible; access-controlled
Retention policy aligned to the DPA
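The directory layout can be created the same way for every engagement so that nothing depends on memory. This sketch covers only the layout and owner-only permissions on POSIX systems; encryption at rest, backup, and retention are assumed to be handled elsewhere. The function name and engagement identifier are hypothetical.

```python
from pathlib import Path

SUBDIRS = ("raw", "screenshots", "notes", "findings", "report")


def create_engagement_dir(base: Path, engagement_id: str) -> Path:
    """Create the per-engagement working directory with its standard subdirs."""
    root = base / engagement_id
    for name in SUBDIRS:
        # mode applies to the new leaf directory, subject to the process umask
        (root / name).mkdir(mode=0o700, parents=True, exist_ok=True)
    # Parents created via parents=True get default permissions, so tighten the root.
    root.chmod(0o700)
    return root
```

For example, `create_engagement_dir(Path.home() / "engagements", "2024-acme-webapp")` would produce the five subdirectories under one root.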

Chain of custody (high-stakes)

Applies to regulator-facing or potentially adversarial engagements.

SHA-256 (or stronger) hashes recorded for major artefacts at capture time
Records of who handled artefacts and when, and for what purpose
WORM or cryptographically signed storage
Independent senior reviewer countersigns key claims

Most pentest engagements do not reach this threshold. Red team engagements supporting regulatory testing sometimes do.
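For engagements that do reach the threshold, the hash and handling records can be kept in a simple append-only manifest. The sketch below shows only the hashing and record-keeping; it does not provide WORM storage or signatures, which need separate infrastructure. The function names and manifest layout are assumptions.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hash a file in chunks so large artefacts do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_custody(manifest: Path, artefact: Path, handler: str, purpose: str) -> dict:
    """Append who handled which artefact, when, why, and its digest at that time."""
    entry = {
        "artefact": artefact.name,
        "sha256": sha256_file(artefact),
        "handler": handler,
        "purpose": purpose,
        "recorded_utc": datetime.now(timezone.utc).isoformat(),
    }
    with manifest.open("a", encoding="utf-8") as out:
        out.write(json.dumps(entry) + "\n")
    return entry


def verify(manifest: Path, artefact: Path) -> bool:
    """True if the artefact still matches the first digest recorded for it."""
    current = sha256_file(artefact)
    with manifest.open(encoding="utf-8") as lines:
        for line in lines:
            entry = json.loads(line)
            if entry["artefact"] == artefact.name:
                return entry["sha256"] == current
    return False
```

Comparing against the first recorded digest, rather than the latest, means a later entry cannot quietly replace the capture-time value.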

Reproducibility and evidence in the report

Reproducibility test

Can the client's developer, given only the finding's evidence, recreate the behaviour?

Common gaps

Exact request paraphrased instead of quoted
Pre-conditions implicit ("must be logged in as role Y")
System state undocumented ("works on a freshly created cart")
Timing-dependent findings with no reproduction instructions
Payload "redacted for sensitivity" with no sanitised template provided in its place

Evidence in the report

Show, don't claim — a screenshot of the admin page beats asserting access was achieved
Sanitize — real credentials and personal data do not appear
Frame — surround every artefact with interpretation
Selectivity — three chosen screenshots beat ten generic ones
Captions — every figure conveys what it shows without forcing re-reads

Common evidence anti-patterns

Anti-pattern	What is missing
Screenshot of a terminal window with no context	Which command? Which host? When?
Pasted text not distinguishing input from output	Clear request/response formatting
Finding asserted without evidence	Which endpoint, which payload, which response?
Raw tool-output dump	Analysis and triage — not the same as evidence
`whoami → root` screenshot only	Path to root, exploitation steps, pre-state, impact
Inconsistent timestamps across artefacts	Single timezone throughout; UTC is the conservative default
Tester credentials visible in screenshots	Redact VPN profile, tester IP, tester account name
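Inconsistent timestamps are the easiest of these to fix mechanically: convert every artefact's time to UTC before building the timeline. The sketch below assumes timestamps are available as ISO 8601 strings with an explicit offset; the artefact names and values are invented for illustration.

```python
from datetime import datetime, timezone


def to_utc(stamp: str) -> datetime:
    """Parse an ISO 8601 timestamp and convert it to UTC."""
    parsed = datetime.fromisoformat(stamp)
    if parsed.tzinfo is None:
        # Refuse to guess: a timestamp without an offset is ambiguous evidence.
        raise ValueError(f"timestamp has no UTC offset: {stamp!r}")
    return parsed.astimezone(timezone.utc)


# Hypothetical artefacts captured by tools using different local offsets
artefacts = {
    "burp-request-17": "2024-03-05T16:07:09+02:00",
    "terminal-capture": "2024-03-05T14:05:00+00:00",
    "screenshot-admin": "2024-03-05T09:10:30-05:00",
}
timeline = sorted((to_utc(ts), name) for name, ts in artefacts.items())
for when, name in timeline:
    print(when.isoformat(), name)
```

Sorted in UTC, the terminal capture comes first, then the Burp request, then the screenshot, which is not the order the raw local times suggest.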

Tools that support evidence discipline

Capture tools

script / asciinema — terminal session with full replay
PowerShell Start-Transcript — Windows equivalent
Burp Suite project files — full HTTP history, notes, scans
OWASP ZAP sessions
OS screenshot tools with timestamps in filename

Management platforms

AttackForge, Faraday, PlexTrac, Dradis, Serpico — engagement management with built-in finding/evidence workflows
CherryTree, Obsidian, Notion, Joplin, OneNote — note-taking; engagement-specific instance

Shared property: evidence is captured automatically as a side effect of working, not as a separate manual step.

Check — per-finding evidence completeness

Reflection

A finding's evidence consists of one screenshot showing a whoami returning root. What is missing for this to be useful as a client deliverable?

Missing: the path to root (foothold, privilege-escalation steps); the exact commands run; the pre-state (configuration that enabled it); the impact (what root access exposes); and reproduction steps so a developer can verify the fix. A whoami screenshot is the end-state proof-of-concept — the finding needs the chain that led there.

What you take home

Evidence is the substrate of engagement value — findings are claims; evidence makes them defensible
Two layers always required: the exhaustive activity log, and the curated per-finding artefact
Evidence shape varies by methodology; the obligation to capture it does not
Capture as you go — reconstruction at engagement end is always worse than contemporaneous notes
The evidence repository is itself sensitive: apply encryption, access control, retention, and destruction discipline
Chain of custody applies in regulator-facing and potentially adversarial contexts — know the threshold
Reproducibility is the test: can a developer follow your evidence alone and recreate the behaviour?

Next: Topic 20 — Vulnerability scoring. How findings are rated, compared, and communicated using CVSS and related frameworks.

Capture it now or lose it.

Before your next session: set up terminal logging and a running findings document — make evidence capture automatic from day one.