Security · Topic 19 of 23 · Part IV — From findings to value

Evidence & traceability

A finding is a claim; evidence is what turns that claim into something a developer, an executive, and a lawyer can all act on.

§ Documenting and Gathering Test Evidence (5) → Traceability of test activities

The work is only as good as what you captured

By the end of this topic you can:
  • define what counts as evidence in a security engagement and why each property matters
  • maintain a traceable record sufficient for client trust, report credibility, and legal defensibility
  • capture per-finding evidence in a form a developer can reproduce and a report can present
  • apply evidence discipline consistently across pentest, OSINT, social engineering, scanning, and manual review
  • recognise the habits that make evidence cheap to gather as work happens, vs. expensive to reconstruct later

What evidence is, formally

Evidence is the documented record of what was observed, what was done, when, by whom, with what tools, and with what result.

Desirable properties

  • Timestamped: every artefact has a date/time from an explicit, ideally agreed source
  • Reproducible: a developer or another tester can follow the steps and see substantially the same result
  • Specific: names actual hosts, URLs, parameters, payloads, accounts
  • Complete enough: the finding stands without "see notes elsewhere"
  • Minimal: sensitive data appears only to the extent required; the rest is redacted
  • Sourced: which command produced which output, which screenshot at what moment
  • Tamper-resistant where it matters: hashes, write-once storage, signed records for high-stakes engagements

Core principle: A finding without evidence is an opinion. A finding with evidence is a defensible claim.
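
As a rough illustration of how these properties can travel with an artefact, the sketch below bundles captured content with its timestamp, source, target, and a content hash. The class name EvidenceItem, its fields, and the example host and tester names are hypothetical; it is one possible shape, not a prescribed format.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class EvidenceItem:
    """One captured artefact plus the metadata that makes it usable later."""
    content: bytes   # raw output, screenshot bytes, or an HTTP exchange
    source: str      # the command or tool that produced the content
    target: str      # the specific host, URL, or parameter involved
    tester: str      # who captured it
    captured_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)  # explicit UTC source
    )

    @property
    def sha256(self) -> str:
        # Content hash, recorded so that later modification is detectable.
        return hashlib.sha256(self.content).hexdigest()


# Hypothetical example values for illustration only.
item = EvidenceItem(
    content=b"uid=0(root) gid=0(root) groups=0(root)\n",
    source="id",
    target="app01.example.test",
    tester="tester-a",
)
print(item.captured_at.isoformat(), item.sha256[:16], item.source)
```

Freezing the dataclass means a record cannot be silently altered after capture, which loosely mirrors the tamper-resistance property.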

Layer 1 — The activity log

A continuous, exhaustive, unedited record of what the tester did across the entire engagement.

What it records

  • Commands run, with timestamps and source host
  • Tool invocations and their parameters
  • Hosts and URLs touched, with methods and times
  • Authentication events (account, time, location)
  • Significant decisions ("stopped scanning X because Y")
  • Client communications ("contact Z confirmed additional scope")

Practical mechanisms

  • script / asciinema — terminal session capture
  • Start-Transcript — PowerShell equivalent
  • Burp / ZAP — full HTTP traffic logs
  • Engagement note platform (Notion, Obsidian, Joplin, AttackForge, Faraday)

This log is the tester's defence in any scope dispute or after-the-fact question. It is often boring; it is essential.
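
Terminal and proxy capture cover commands and traffic, but decisions and client communications usually need a deliberate entry. A minimal sketch of an append-only log for those entries follows; the function name log_activity, the JSON Lines layout, and the field names are assumptions made for illustration, and the target address is from a documentation-only range.

```python
import json
import socket
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def log_activity(log_path: Path, kind: str, detail: str) -> dict:
    """Append one timestamped entry to the engagement activity log."""
    entry = {
        "time_utc": datetime.now(timezone.utc).isoformat(),
        "source_host": socket.gethostname(),
        "kind": kind,      # e.g. "command", "decision", "client-comms"
        "detail": detail,
    }
    # Append mode: earlier entries are never rewritten by this function.
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "activity.jsonl"
    log_activity(log, "command", "nmap -sV 192.0.2.10")
    log_activity(log, "decision", "stopped scanning X because Y")
    lines = log.read_text(encoding="utf-8").splitlines()

print(len(lines))  # 2
```

One JSON object per line keeps the log easy to search later without any special tooling.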

Layer 2 — The per-finding artefact

A focused, curated set of artefacts demonstrating a single finding — this is what ends up in the report.

  1. Pre-state: the configuration or behaviour observed before the tester acted
  2. Action: the exact request, command, payload, or steps taken
  3. Observable effect: the response, output, screenshot, or file contents produced
  4. Interpretation: "therefore X is true about the system", and what it lets an attacker do

The key distinction: The activity log is exhaustive and unedited; the finding artefact is selective and presented. Both are needed; neither replaces the other.
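
The four parts lend themselves to a simple completeness check before a finding goes into the report. The sketch below is one way to express that; FindingEvidence and missing_parts are hypothetical names, and a real platform would likely store attachments rather than plain strings.

```python
from dataclasses import dataclass, fields


@dataclass
class FindingEvidence:
    pre_state: str          # what was observed before the tester acted
    action: str             # exact request, command, or payload
    observable_effect: str  # response, output, or screenshot reference
    interpretation: str     # what this proves and what it enables


def missing_parts(ev: FindingEvidence) -> list[str]:
    """Return the names of any parts that were left empty."""
    return [f.name for f in fields(ev) if not getattr(ev, f.name).strip()]


# A finding backed by nothing but an end-state screenshot.
whoami_only = FindingEvidence(
    pre_state="",
    action="",
    observable_effect="screenshot: whoami returns root",
    interpretation="",
)
print(missing_parts(whoami_only))  # ['pre_state', 'action', 'interpretation']
```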

Evidence shapes by methodology

Penetration testing

  • Pre-state: banner, configuration, observed behaviour
  • Exact request, command, or payload
  • Response, output, screenshot
  • Impact demonstration: what this access enables
  • ATT&CK technique mapping (Topic 18)

Vulnerability scanning

  • Scan config: tool, version, profile, target list, time window
  • Raw output in original format (.nessus, XML, JSON)
  • Triage notes: verified / false-positive / uninvestigated

OSINT

  • Source of each observation (URL, archive snapshot, CT log entry, query)
  • Captured content (screenshot, archived HTML)
  • Date observed — content changes over time
  • Linkage to the engagement objective

Manual review

  • File, line range, commit/version reviewed
  • Relevant code snippet with sufficient context
  • Reasoning trace: input X → component Y → sink Z
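
For manual review, a snippet is easier to trust when it carries line numbers and a little surrounding context. The helper below is a small sketch of that idea; the function name snippet and the sample handler code are invented for illustration.

```python
def snippet(source: str, start: int, end: int, context: int = 2) -> str:
    """Return lines start..end (1-indexed) plus surrounding context, numbered."""
    lines = source.splitlines()
    first = max(1, start - context)
    last = min(len(lines), end + context)
    out = []
    for n in range(first, last + 1):
        marker = ">" if start <= n <= end else " "  # flag the lines under review
        out.append(f"{marker} {n:4d}  {lines[n - 1]}")
    return "\n".join(out)


# Hypothetical code under review: input reaches a SQL sink unescaped.
code = """def handler(request):
    name = request.args["name"]
    query = "SELECT * FROM users WHERE name = '" + name + "'"
    return db.execute(query)
"""
excerpt = snippet(code, 3, 3, context=1)
print(excerpt)
```

Recording the file path and commit alongside the excerpt ties the numbered lines back to an exact version of the code.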

The shape of evidence changes by methodology; the obligation to capture it does not.

Discipline: capture as you go

The single most important habit: capture before you move on.

Common failure modes

  • "I'll screenshot when I find the next thing": by then the previous evidence is gone
  • "I'll write from memory tonight": four hours of testing have layered over it
  • "The terminal log captured it": but you piped the output through less, so the log holds only the pages you viewed, mixed with terminal control codes
  • "I'll re-run it for the screenshot": the system has now changed

Cheap habits that work

  • Screenshot every meaningful state change; disk space is cheap
  • Save tool output to file, even if you also read it on screen
  • Maintain a running findings document, updated continuously
  • Save HTTP requests and responses in the proxy
  • At end of each work session, write a brief progress note — five minutes, high payoff
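
The "save tool output to file" habit can be made nearly automatic with a small wrapper. The following is a sketch only: run_and_save is a hypothetical helper, the file-naming scheme is an assumption, and it records stdout followed by stderr rather than preserving their original interleaving.

```python
import subprocess
import sys
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def run_and_save(cmd: list[str], out_dir: Path) -> Path:
    """Run cmd, show its output, and keep a timestamped copy on disk."""
    started = datetime.now(timezone.utc)
    result = subprocess.run(cmd, capture_output=True, text=True)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Second-level resolution: two runs in the same second would collide.
    out_file = out_dir / f"{started:%Y%m%dT%H%M%SZ}_{Path(cmd[0]).name}.txt"
    out_file.write_text(
        f"# command: {' '.join(cmd)}\n"
        f"# started_utc: {started.isoformat()}\n"
        f"# exit_code: {result.returncode}\n"
        f"{result.stdout}{result.stderr}",
        encoding="utf-8",
    )
    print(result.stdout, end="")  # output is still readable on screen
    return out_file


# Demonstration with a harmless command in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    saved = run_and_save([sys.executable, "-c", "print('hello')"], Path(tmp) / "raw")
    saved_text = saved.read_text(encoding="utf-8")
```

The header lines record which command produced the output and when, so the saved file remains sourced and timestamped even when read in isolation.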

Handling sensitive data in evidence

Evidence routinely contains material that is itself sensitive: personal data, credentials, customer content, client environment logs.

In the report

  • Redact what is not needed for the finding to land
  • Customer names → "Customer A"; real emails → user@example.com; SSNs → "XXX-XX-XXXX"
  • Minimum collection: one header row + one record proves a table is exposed

In the repository

  • Keep originals separately, encrypted, time-limited
  • Destroy at engagement end — per DPA (Topic 05)
  • Issue a certificate of destruction to the client

Critical: The evidence repository, in aggregate, is more dangerous than any individual finding. Treat it accordingly.

Evidence storage and chain of custody

Standard engagement storage

  • Single working directory (or repository) per engagement
  • Subdirs: raw/, screenshots/, notes/, findings/, report/
  • Encrypted at rest — full-disk encryption is the floor
  • Backed up — losing evidence on day 9 of 10 is a real risk
  • Versioned where possible; access-controlled
  • Retention policy aligned to the DPA
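
Creating the same layout at the start of every engagement removes one decision from day one. A minimal sketch, assuming the subdirectory names listed above; init_engagement and the engagement name are hypothetical, and the script does not provide encryption, backup, or versioning, which still have to come from the underlying storage.

```python
import tempfile
from pathlib import Path

SUBDIRS = ("raw", "screenshots", "notes", "findings", "report")


def init_engagement(base: Path, name: str) -> Path:
    """Create the per-engagement working directory and its subdirectories."""
    root = base / name
    for sub in SUBDIRS:
        (root / sub).mkdir(parents=True, exist_ok=True)
    # Owner-only permissions on POSIX systems; limited effect on Windows.
    root.chmod(0o700)
    return root


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = init_engagement(Path(tmp), "client-a-2024")
    created = sorted(p.name for p in root.iterdir())

print(created)  # ['findings', 'notes', 'raw', 'report', 'screenshots']
```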

Chain of custody (high-stakes)

Applies to regulator-facing or potentially adversarial engagements.

  • SHA-256 (or stronger) hashes recorded for major artefacts at capture time
  • Records of who handled artefacts and when, and for what purpose
  • WORM or cryptographically signed storage
  • Independent senior reviewer countersigns key claims

Most pentest engagements do not reach this threshold. Red team engagements supporting regulatory testing sometimes do.
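
Where chain of custody does apply, the hash-at-capture and handling-record items might be sketched as follows. The function names and record fields are assumptions; a real process would also place these records in write-once or signed storage, which this example does not attempt.

```python
import hashlib
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hash a file in chunks so large artefacts do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def custody_entry(path: Path, handler: str, purpose: str) -> dict:
    """Record who handled an artefact, when, why, and its hash at that moment."""
    return {
        "artefact": path.name,
        "sha256": sha256_file(path),
        "handler": handler,
        "purpose": purpose,
        "time_utc": datetime.now(timezone.utc).isoformat(),
    }


def verify(path: Path, entry: dict) -> bool:
    """True if the artefact still matches the hash recorded in the entry."""
    return sha256_file(path) == entry["sha256"]


# Demonstration: a later modification is detected.
with tempfile.TemporaryDirectory() as tmp:
    artefact = Path(tmp) / "scan.xml"
    artefact.write_bytes(b"<scan>original</scan>")
    entry = custody_entry(artefact, "tester-a", "captured from scanner export")
    intact = verify(artefact, entry)
    artefact.write_bytes(b"<scan>edited</scan>")
    tampered_detected = not verify(artefact, entry)

print(intact, tampered_detected)  # True True
```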

Reproducibility and evidence in the report

Reproducibility test

Can the client's developer, given only the finding's evidence, recreate the behaviour?

Common gaps

  • Exact request paraphrased instead of quoted
  • Pre-conditions implicit ("must be logged in as role Y")
  • System state undocumented ("works on a freshly created cart")
  • Timing-dependent findings with no reproduction instructions
  • Payload "redacted for sensitivity" — no template version provided

Evidence in the report

  • Show, don't claim: a screenshot of the admin page beats asserting that access was achieved
  • Sanitise: real credentials and personal data do not appear
  • Frame: surround every artefact with interpretation
  • Select: three chosen screenshots beat ten generic ones
  • Caption: every figure conveys what it shows without forcing re-reads

Common evidence anti-patterns

Anti-pattern | What is missing
--- | ---
Screenshot of a terminal window with no context | Which command? Which host? When?
Pasted text not distinguishing input from output | Clear request/response formatting
Finding asserted without evidence | Which endpoint, which payload, which response?
Raw tool-output dump | Analysis and triage; raw output is not the same as evidence
whoami → root screenshot only | Path to root, exploitation steps, pre-state, impact
Inconsistent timestamps across artefacts | Single timezone throughout; UTC is the conservative default
Tester credentials visible in screenshots | Redact VPN profile, tester IP, tester account name
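
The inconsistent-timestamp anti-pattern is usually cheap to fix at write-up time if each artefact's original offset is known. A small sketch of normalising local timestamps to UTC follows; to_utc is a hypothetical helper, and the dates and offsets are made-up examples.

```python
from datetime import datetime, timedelta, timezone


def to_utc(local_text: str, utc_offset_hours: float) -> str:
    """Convert 'YYYY-MM-DD HH:MM:SS' at a known UTC offset to an ISO 8601 UTC string."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    local = datetime.strptime(local_text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=tz)
    return local.astimezone(timezone.utc).isoformat()


# Two artefacts captured at the same instant on machines in different zones.
print(to_utc("2024-03-01 14:30:00", 1))   # 2024-03-01T13:30:00+00:00
print(to_utc("2024-03-01 08:30:00", -5))  # 2024-03-01T13:30:00+00:00
```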

Tools that support evidence discipline

Capture tools

  • script / asciinema — terminal session with full replay
  • PowerShell Start-Transcript — Windows equivalent
  • Burp Suite project files — full HTTP history, notes, scans
  • OWASP ZAP sessions
  • OS screenshot tools with timestamps in filename

Management platforms

  • AttackForge, Faraday, PlexTrac, Dradis, Serpico: engagement management with built-in finding/evidence workflows
  • CherryTree, Obsidian, Notion, Joplin, OneNote: note-taking tools; use a dedicated instance per engagement

Shared property: Evidence is captured automatically as a side effect of working, not as a separate manual step.

Check — per-finding evidence completeness

Reflection

A finding's evidence consists of one screenshot showing a whoami returning root. What is missing for this to be useful as a client deliverable?

Answer

Missing: the path to root (foothold, privilege-escalation steps); the exact commands run; the pre-state (configuration that enabled it); the impact (what root access exposes); and reproduction steps so a developer can verify the fix. A whoami screenshot is the end-state proof-of-concept — the finding needs the chain that led there.

What you take home

  • Evidence is the substrate of engagement value — findings are claims; evidence makes them defensible
  • Two layers always required: the exhaustive activity log, and the curated per-finding artefact
  • Evidence shape varies by methodology; the obligation to capture it does not
  • Capture as you go — reconstruction at engagement end is always worse than contemporaneous notes
  • The evidence repository is itself sensitive: apply encryption, access control, retention, and destruction discipline
  • Chain of custody applies in regulator-facing and potentially adversarial contexts — know the threshold
  • Reproducibility is the test: can a developer follow your evidence alone and recreate the behaviour?

Next: Topic 20 — Vulnerability scoring. How findings are rated, compared, and communicated using CVSS and related frameworks.

Capture it now or lose it.

Before your next session: set up terminal logging and a running findings document — make evidence capture automatic from day one.