Security · Topic 10 of 23 · Part II — Catalog of testing methodologies

Manual review

A skilled human inspects the artefact itself — source code, configuration, architecture, policy — and reasons about its security. The finding comes from judgement, not a pattern database.

Syllabus: § Testing Methodologies and Tools of the Trade (2, 4) → Manual review
Topic 10 · Human judgement

What tools cannot do

By the end of this topic you can:
  • Define manual review and contrast it with automated scanning
  • Identify vulnerabilities and weaknesses that only manual review finds reliably
  • Plan and execute a focused review of code, configuration, architecture, or process documentation
  • Use structured techniques — threat modeling, checklist, abuse-case analysis — to direct review effort
  • Configure, run, and triage SAST output without being overwhelmed by false positives
  • Combine SAST and DAST output with manual review without becoming dependent on tool output

What manual review is

Systematic, human-driven inspection of an artefact for security weaknesses. The artefact varies; the discipline is the same.

Code
Reading the implementation for vulnerabilities
Config
Web server, DB, OS, identity, network settings
Architecture
Components, trust boundaries, and data flows
IaC
Terraform, CloudFormation, Helm, Kubernetes manifests
Policy
IR playbooks, vendor management, patch SLAs
Key point: Tools may assist with each category, but interpreting their output and finding what they missed remain human work.

What manual review finds that scanners do not

A scanner finds known patterns. Manual review finds:

  • Logic flaws — e.g., passing amount=-100 increases a balance
  • Authorization bugs — IDOR, privilege escalation across workflows
  • Race conditions — two simultaneous requests bypass a spending limit
  • Insecure design — client trusted to validate price
  • Crypto misuse — ECB mode, no IV, MAC-then-encrypt, custom protocols
  • Trust boundary errors — untrusted data crosses to trusted context without validation
  • Missing controls — no rate limit, no log, no validation (finding absence)
  • Dependency interactions — individually safe libraries combined unsafely
  • Compliance gaps — logs exist but omit required fields
  • Process weaknesses — vendor checklist skips security review

Common property: every one of these requires understanding what the system is supposed to do.
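The negative-amount logic flaw from the list above can be sketched as follows. The handler names, the in-memory `balances` dict, and the account names are hypothetical and exist only for illustration; a real system would sit behind a database and an API layer.

```python
from decimal import Decimal


def transfer_vulnerable(balances: dict, src: str, dst: str, amount: Decimal) -> None:
    """Hypothetical handler with a logic flaw: the sign of `amount` is never checked."""
    if balances[src] < amount:      # a negative amount always passes this check
        raise ValueError("insufficient funds")
    balances[src] -= amount         # subtracting a negative number adds money
    balances[dst] += amount


def transfer_reviewed(balances: dict, src: str, dst: str, amount: Decimal) -> None:
    """Same handler after review: the business rule is enforced server-side."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    if balances[src] < amount:
        raise ValueError("insufficient funds")
    balances[src] -= amount
    balances[dst] += amount
```

Nothing in the vulnerable version matches a dangerous-API pattern. The flaw is visible only to a reader who knows that a transfer must never move a negative amount.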

Manual review and the SDLC

Review is most powerful when it happens early. Each stage finds different things; none substitutes for another.

1. Design review — before code is written; cheapest place to fix; typically threat modeling
2. Code review — during development, integrated with pull-request workflows
3. Pre-deployment review — before changes go live; catch what slipped through
4. Periodic review — scheduled or triggered by major changes; existing systems
Shift left: A design flaw found at stage 1 can cost orders of magnitude less to fix than the same flaw found post-deployment.

Threat modeling — structured design-stage review

Systematically reason about how a system could be attacked, given its design. Based on Adam Shostack's Four Question Framework:

1. What are we working on? Decompose into components; map trust boundaries and data flows
2. What can go wrong? Enumerate threats — STRIDE, PASTA, LINDDUN
3. What are we going to do about it? For each threat: mitigate, accept, or investigate further
4. Did we do a good job? Review the model itself for completeness
Output: Data-flow diagram + threat list + decisions — a structured artefact the team can act on, share, and revisit when the system changes.
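One way to picture the threat list as a structured artefact is the minimal sketch below. The class names, the example flows, and the choice to apply every STRIDE category to each boundary-crossing flow are assumptions made for illustration; real threat models are usually more selective, and STRIDE-per-element is a common alternative.

```python
from dataclasses import dataclass

STRIDE = (
    "Spoofing", "Tampering", "Repudiation",
    "Information disclosure", "Denial of service", "Elevation of privilege",
)


@dataclass(frozen=True)
class DataFlow:
    source: str
    target: str
    crosses_trust_boundary: bool


@dataclass
class Threat:
    flow: DataFlow
    category: str
    decision: str = "investigate"   # later set to "mitigate" or "accept"


def enumerate_threats(flows):
    """Question 2: one candidate threat per STRIDE category per boundary-crossing flow."""
    return [
        Threat(flow, category)
        for flow in flows
        if flow.crosses_trust_boundary
        for category in STRIDE
    ]


# Question 1: a hypothetical two-flow decomposition.
flows = [
    DataFlow("browser", "web app", crosses_trust_boundary=True),
    DataFlow("web app", "database", crosses_trust_boundary=False),
]
threats = enumerate_threats(flows)
```

Each `Threat` starts as "investigate"; question 3 is the act of replacing that default with a recorded decision.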

Code review — structured approaches

Techniques

  • Top-down by data flow — follow untrusted input from entry points to sinks; the dominant technique
  • Top-down by feature — start at a feature (auth, payment, file upload) and read everything related
  • Pattern hunting — locate dangerous APIs (exec, eval, weak crypto), read surrounding context
  • Checklist — walk OWASP ASVS systematically; floor, not ceiling
  • Bottom-up — read code as written; useful only for small modules

Common workflow

Combine: Pattern-hunt (or run SAST) to identify hot spots, then apply top-down data-flow review to confirm whether each is a real vulnerability in context.
Language matters: Review depth depends on reviewer fluency in the language and framework. Scope honestly.
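The pattern-hunting step can be sketched with Python's standard `ast` module. The list of dangerous calls is illustrative and far from exhaustive, and the helper names are hypothetical. The output is a reading list for the reviewer, not a list of findings.

```python
import ast

# Illustrative, non-exhaustive set of calls worth reading in context.
DANGEROUS_CALLS = {"eval", "exec", "os.system", "pickle.loads"}


def call_name(node: ast.Call) -> str:
    """Return a dotted name such as 'os.system' for simple calls, else ''."""
    func = node.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
        return f"{func.value.id}.{func.attr}"
    return ""


def find_hot_spots(source: str) -> list:
    """List (line, call) pairs, sorted by line, for a reviewer to inspect."""
    tree = ast.parse(source)
    hits = [
        (node.lineno, call_name(node))
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and call_name(node) in DANGEROUS_CALLS
    ]
    return sorted(hits)


SAMPLE = '''
import os
def run(cmd):
    os.system(cmd)
def calc(expr):
    return eval(expr)
'''

hot_spots = find_hot_spots(SAMPLE)   # [(4, 'os.system'), (6, 'eval')]
```

Whether either hit is a vulnerability depends on who controls `cmd` and `expr`, which is what the follow-up data-flow review has to establish.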

Configuration and IaC review

Configuration review

Inspect the settings an operator chose, not the code itself:

  • Web server (Apache, Nginx, IIS)
  • Application frameworks (Django, Spring, Express)
  • Database (pg_hba.conf, MongoDB binding)
  • OS hardening — sysctl, AppArmor, sudoers
  • Identity provider — SSO, MFA policy, conditional access
  • Network — firewall rulesets, ACLs, segmentation

Compare against a baseline: CIS Benchmark, vendor hardening guide, or organizational standard.
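A baseline comparison can be sketched as a simple diff between expected and observed settings. The keys below are real `sshd_config` option names, but the baseline values are assumptions chosen for the example and are not quoted from any CIS Benchmark.

```python
def compare_to_baseline(actual: dict, baseline: dict) -> list:
    """Return one line per setting that is missing or differs from the baseline."""
    deviations = []
    for key, expected in baseline.items():
        if key not in actual:
            deviations.append(f"{key}: missing (baseline expects {expected!r})")
        elif actual[key] != expected:
            deviations.append(f"{key}: {actual[key]!r} (baseline expects {expected!r})")
    return deviations


# Hypothetical baseline and observed configuration.
BASELINE = {"PermitRootLogin": "no", "PasswordAuthentication": "no", "X11Forwarding": "no"}
ACTUAL = {"PermitRootLogin": "yes", "PasswordAuthentication": "no"}

deviations = compare_to_baseline(ACTUAL, BASELINE)
```

A "missing" result is a prompt, not a finding: the software's default applies, and the reviewer still has to look up what that default is and whether it is acceptable.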

Infrastructure-as-code review

Read the Terraform / CloudFormation / Helm / Ansible source rather than the deployed result.

Advantages: Findings fixed in code; catches drift; scales across many environments built from the same modules.

Tools: tfsec, Checkov, KICS, OPA / Conftest. They handle easy cases; humans handle architecture and design.
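The "easy cases" such tools handle look roughly like the sketch below, which checks a Kubernetes Pod manifest that has already been parsed into a dict. The function name and finding messages are hypothetical, and the three checks are a small illustrative sample rather than a complete policy.

```python
def review_pod_manifest(manifest: dict) -> list:
    """Flag a few well-known risky settings in a parsed Pod manifest."""
    findings = []
    spec = manifest.get("spec", {})
    pod_ctx = spec.get("securityContext") or {}
    if spec.get("hostNetwork"):
        findings.append("pod shares the host network namespace (hostNetwork: true)")
    for container in spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        ctx = container.get("securityContext") or {}
        if ctx.get("privileged"):
            findings.append(f"{name}: runs privileged")
        # A container-level value overrides the pod-level one.
        run_as_non_root = ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot", False))
        if not run_as_non_root:
            findings.append(f"{name}: runAsNonRoot is not enforced")
    return findings
```

A check like this cannot say whether the workload should exist in that cluster, talk to that database, or hold those credentials. Those questions stay with the human reviewer.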

Architecture and policy review

Architecture review

Read diagrams and design documents to find:

  • Trust boundaries placed wrong — browser trusted to validate
  • Single points of failure for security controls
  • Sensitive data crossing networks or jurisdictions unprotected
  • Defence-in-depth gaps — one layer between attacker and crown jewels
  • Operational weaknesses — no detection path for attacks on component X

Architecture review benefits enormously from interactive sessions with system designers.

Policy and process review

Read the documents:

  • IR runbooks — contacts current, steps complete, dependencies named?
  • Vendor risk management — when reviewed, by whom, to what depth?
  • Patch management — SLAs per severity class; realistic?
  • Access management — joiners/leavers/movers, periodic recertification
High impact: Clients can change a process in weeks, whereas rewriting a system can take years.

Combining manual and automated

The mature stance: automation triages, humans decide. Neither replaces the other.

1. Run SAST — produce a long list with many false positives
2. Filter by category — start with injection, deserialization, command execution
3. Triage each candidate — read surrounding code; is it real, exploitable, worth reporting?
4. Review independently — entry points and sensitive flows that SAST may have missed
5. Combine — report contains verified SAST findings plus human-found issues

The same shape applies to CSPM + manual cloud review, SCA + manual dependency analysis, and web scanner + manual application test.

SAST — how tools work and which to choose

How SAST works

  • Pattern matching — regex / AST rules for dangerous API calls
  • Data-flow analysis — tracks source-to-sink across multiple lines
  • Type and scope analysis — reduces false positives
  • Configuration — rules enabled/disabled; sensitivity adjusted; scope limited
Treat as: A pattern matcher with a high false-positive rate, not a security oracle.
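The difference between pattern matching and data-flow analysis can be illustrated with a toy taint tracker. This is a deliberately minimal sketch, not how any particular product works: it handles only top-level statements and simple `name = expr` assignments, and the source and sink sets are assumptions.

```python
import ast

SOURCES = {"input"}          # calls whose result is attacker-controlled (assumed)
SINKS = {"eval", "exec"}     # calls that must not receive tainted data (assumed)


def _is_call_to(node: ast.AST, names: set) -> bool:
    return (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id in names
    )


def _is_tainted(expr: ast.AST, tainted: set) -> bool:
    """An expression is tainted if it calls a source or reads a tainted variable."""
    for node in ast.walk(expr):
        if _is_call_to(node, SOURCES):
            return True
        if isinstance(node, ast.Name) and node.id in tainted:
            return True
    return False


def tainted_sink_lines(source: str) -> list:
    """Return line numbers where a sink receives data derived from a source."""
    tainted = set()
    findings = []
    for stmt in ast.parse(source).body:      # top-level statements, in order
        # Report sinks first, using the taint state from before this statement.
        for node in ast.walk(stmt):
            if _is_call_to(node, SINKS) and any(_is_tainted(a, tainted) for a in node.args):
                findings.append(node.lineno)
        # Then propagate taint through simple assignments.
        if isinstance(stmt, ast.Assign):
            value_tainted = _is_tainted(stmt.value, tainted)
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    if value_tainted:
                        tainted.add(target.id)
                    else:
                        tainted.discard(target.id)
    return findings


SAMPLE = '''
user = input()
query = "1 + " + user
safe = "2 + 2"
eval(safe)
eval(query)
'''

flagged = tainted_sink_lines(SAMPLE)   # [6]
```

A pure pattern rule for `eval` would flag both line 5 and line 6. Following the data from `input()` to the sink leaves only line 6, which is how data-flow analysis reduces false positives.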

Tool landscape

  • SonarQube — enterprise standard, comprehensive, noisier
  • Semgrep — rule-as-code, lower false-positive rate, customizable
  • CodeQL — powerful query language, very precise, steep learning curve
  • Checkmarx / Veracode / Fortify — commercial; deep data-flow; expensive
  • GitHub Advanced Security — runs CodeQL natively; third-party tools such as Semgrep can upload results in SARIF format
  • Language-specific — Bandit (Python), Gosec (Go), SpotBugs (Java)

Running and triaging SAST effectively

Running SAST

  • Start with default config; understand the output before tuning
  • Baseline on existing code; track new findings per commit
  • Tune incrementally — disable noisy rules with a comment explaining why
  • Integrate into CI on every pull request, not just periodically
  • Gate only on high-confidence findings; use informational for the rest
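Baselining and per-commit tracking can be sketched as a set difference over finding fingerprints. The field names (`rule_id`, `path`, `snippet`, `line`) are hypothetical and do not correspond to any specific tool's output format.

```python
import hashlib


def fingerprint(finding: dict) -> str:
    """Stable ID built from rule, file, and flagged code text.

    The line number is left out on purpose: it shifts whenever
    unrelated code above the finding changes.
    """
    key = "|".join([finding["rule_id"], finding["path"], finding["snippet"].strip()])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def new_findings(baseline: list, current: list) -> list:
    """Return findings in `current` that were not present in `baseline`."""
    known = {fingerprint(f) for f in baseline}
    return [f for f in current if fingerprint(f) not in known]
```

With this shape, a CI gate would fail only on what `new_findings` returns, while the baseline backlog is worked down separately.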

Triaging output

  1. Filter by severity — critical / high first
  2. Filter by category — injection, hardcoded secrets, unsafe deserialization first
  3. Read code around the finding — 10–20 lines of context
  4. Check for framework protection — ORM, sanitizer, type constraint
  5. Verify exploitability — can an attacker actually control the input?
  6. Document decisions — especially false-positive reasoning

False-positive fatigue destroys adoption: 10 high-confidence findings beat 100 noisy ones every time.
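The first two triage steps amount to a sort order, sketched below. The severity labels, category names, and finding fields are assumptions for illustration; real tools use their own vocabularies.

```python
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}
PRIORITY_CATEGORIES = ("injection", "hardcoded-secret", "unsafe-deserialization")


def triage_key(finding: dict) -> tuple:
    """Sort by severity first, then by whether the category is a priority one."""
    severity_rank = SEVERITY_ORDER.get(finding["severity"], len(SEVERITY_ORDER))
    if finding["category"] in PRIORITY_CATEGORIES:
        category_rank = PRIORITY_CATEGORIES.index(finding["category"])
    else:
        category_rank = len(PRIORITY_CATEGORIES)
    return (severity_rank, category_rank)


def triage_queue(findings: list) -> list:
    """Return findings in the order a reviewer should read them."""
    return sorted(findings, key=triage_key)
```

Steps 3 to 6 cannot be automated in the same way: reading context, checking framework protection, and judging exploitability are the manual part of the workflow.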

Discipline of evidence

Manual review must produce evidence the way every other methodology does.

Not a finding: "I noticed the code looks insecure."
A finding: "Lines 142–158 of payment.py accept a client-supplied amount without server-side validation, demonstrated by request X."

For SAST findings, evidence must include:

  • Location — file, line, function
  • Tool used and its configuration — ruleset, version
  • Whether the finding was manually verified or accepted from tool output
  • If triaged as false positive — the documented reasoning
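The evidence fields listed above could be captured in a small record type that refuses incomplete entries. This is a sketch under assumed names; the class, its fields, and the validation rules are hypothetical, not a prescribed reporting format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SastEvidence:
    path: str                  # location: file
    line: int                  # location: line
    function: str              # location: function
    tool: str
    tool_version: str
    ruleset: str
    manually_verified: bool    # False means accepted from tool output as-is
    false_positive: bool = False
    false_positive_reason: str = ""

    def __post_init__(self):
        if self.line < 1:
            raise ValueError("line must be a positive line number")
        if self.false_positive and not self.false_positive_reason.strip():
            raise ValueError("a false positive needs documented reasoning")
```

Making the reasoning mandatory at construction time means a finding cannot be dismissed without leaving a trace for the next reviewer.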

Topic 19 covers evidence and traceability in depth.

Check — what scanners miss

Reflection

A client says: "We already run SAST in CI — do we still need manual review?" How do you respond?

Answer

SAST is necessary but not sufficient. It catches patterns (injection sinks, hardcoded secrets, deprecated APIs) reliably. It misses authorization bugs, business-logic flaws, race conditions, design weaknesses, and missing controls. Continuous SAST sets the floor; periodic manual review — and threat modeling at design time — provides the ceiling.

What you take home

  • Manual review covers five artefact types: code, configuration, IaC, architecture, and policy/process
  • Logic flaws, authorization bugs, missing controls, and design weaknesses require human judgement — scanners cannot reliably find them
  • Review is most powerful early in the SDLC; design-stage threat modeling is the cheapest place to fix issues
  • Threat modeling follows a four-question loop and produces a structured, reusable artefact
  • SAST is a pattern matcher with a high false-positive rate — configure, baseline, and triage before trusting output
  • Mature posture: automation triages, humans decide; neither replaces the other
  • Every finding — manual or tool-assisted — requires specific, attributable evidence: file, line, context

Next: Topic 11 — Penetration testing (as methodology). Where manual review ends, controlled exploitation begins.

END · TOPIC 10

Tools triage. Humans decide.

Pick one artefact from a project you know and think through what a manual review of it would actually look for.