Security · Topic 10 of 23 · Part II — Catalog of testing methodologies

Manual review

A skilled human inspects the artefact itself — source code, configuration, architecture, policy — and reasons about its security. The finding comes from judgement, not a pattern database.

Syllabus: § Testing Methodologies and Tools of the Trade (2, 4) → Manual review

Topic 10 · Human judgement

What tools cannot do

By the end of this topic you can:

Define manual review and contrast it with automated scanning
Identify vulnerabilities and weaknesses that only manual review finds reliably
Plan and execute a focused review of code, configuration, architecture, or process documentation
Use structured techniques — threat modeling, checklists, abuse-case analysis — to direct review effort
Configure, run, and triage SAST output without being overwhelmed by false positives
Combine SAST and DAST output with manual review without becoming dependent on tool output

What manual review is

Systematic, human-driven inspection of an artefact for security weaknesses. The artefact varies; the discipline is the same.

Code

Reading the implementation for vulnerabilities

Config

Web server, DB, OS, identity, network settings

Architecture

Components, trust boundaries, and data flows

IaC

Terraform, CloudFormation, Helm, Kubernetes manifests

Policy

IR playbooks, vendor management, patch SLAs

Key point: Tools may assist with each category, but interpreting their output and finding what they missed remains human work.

What manual review finds that scanners do not

A scanner finds known patterns. Manual review finds:

Logic flaws — e.g., passing amount=-100 increases a balance
Authorization bugs — IDOR, privilege escalation across workflows
Race conditions — two simultaneous requests bypass a spending limit
Insecure design — client trusted to validate price
Crypto misuse — ECB mode, no IV, MAC-then-encrypt, custom protocols

Trust boundary errors — untrusted data crosses to trusted context without validation
Missing controls — no rate limit, no log, no validation (finding absence)
Dependency interactions — individually safe libraries combined unsafely
Compliance gaps — logs exist but omit required fields
Process weaknesses — vendor checklist skips security review

Common property: every one of these requires understanding what the system is supposed to do.
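As an illustration of the first item (amount=-100 increases a balance), here is a minimal sketch. The Account class and both function names are hypothetical, invented for this example. The flawed version contains no dangerous API call for a scanner to match; the defect is only visible to someone who knows that a withdrawal must be positive.

```python
from dataclasses import dataclass


@dataclass
class Account:
    balance: int  # minor units, e.g. cents


def withdraw_flawed(account: Account, amount: int) -> None:
    """Checks only for overdraft, so a negative amount passes the check."""
    if amount > account.balance:
        raise ValueError("insufficient funds")
    account.balance -= amount  # amount=-100 adds 100 to the balance


def withdraw_reviewed(account: Account, amount: int) -> None:
    """Same operation with the business rule stated explicitly."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    if amount > account.balance:
        raise ValueError("insufficient funds")
    account.balance -= amount
```

Both functions are syntactically unremarkable; only the intended behaviour of the system tells the reviewer which one is wrong.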

Manual review and the SDLC

Review is most powerful when it happens early. Each stage finds different things; none substitutes for another.

1. Design review — before code is written; cheapest place to fix; typically threat modeling

2. Code review — during development, integrated with pull-request workflows

3. Pre-deployment review — before changes go live; catch what slipped through

4. Periodic review — scheduled or triggered by major changes; existing systems

Shift left: A design flaw found at stage 1 is commonly estimated to cost orders of magnitude less to fix than the same flaw found post-deployment.

Threat modeling — structured design-stage review

Systematically reason about how a system could be attacked, given its design. Based on Adam Shostack's Four Question Framework:

1. What are we working on? Decompose into components; map trust boundaries and data flows

2. What can go wrong? Enumerate threats — STRIDE, PASTA, LINDDUN

3. What are we going to do about it? For each threat: mitigate, accept, or investigate further

4. Did we do a good job? Review the model itself for completeness

Output: Data-flow diagram + threat list + decisions — a structured artefact the team can act on, share, and revisit when the system changes.
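One way to keep the threat list as a structured artefact is to record it as data. The sketch below is an assumption about how that might look, not a prescribed format: DataFlow, Threat, and candidate_threats are hypothetical names, and the zone labels are invented. It generates one STRIDE prompt per data flow that crosses a trust boundary, which the team then works through by hand.

```python
from dataclasses import dataclass

STRIDE = (
    "Spoofing",
    "Tampering",
    "Repudiation",
    "Information disclosure",
    "Denial of service",
    "Elevation of privilege",
)


@dataclass(frozen=True)
class DataFlow:
    source: str
    target: str
    source_zone: str  # trust zone of the sender (hypothetical labels)
    target_zone: str  # trust zone of the receiver

    def crosses_boundary(self) -> bool:
        return self.source_zone != self.target_zone


@dataclass
class Threat:
    flow: DataFlow
    category: str
    decision: str = "undecided"  # later: mitigate / accept / investigate


def candidate_threats(flows):
    """Question 2: one STRIDE prompt per boundary-crossing flow."""
    return [
        Threat(flow, category)
        for flow in flows
        if flow.crosses_boundary()
        for category in STRIDE
    ]
```

The generated list is only a set of prompts. Deciding which prompts describe a real threat, and what to do about each, is the human part of the exercise.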

Code review — structured approaches

Techniques

Top-down by data flow — follow untrusted input from entry points to sinks; the dominant technique
Top-down by feature — start at a feature (auth, payment, file upload) and read everything related
Pattern hunting — locate dangerous APIs (exec, eval, weak crypto), read surrounding context
Checklist — walk OWASP ASVS systematically; floor, not ceiling
Bottom-up — read code as written; useful only for small modules

Common workflow

Combine: Pattern-hunt (or run SAST) to identify hot spots, then apply top-down data-flow review to confirm whether each is a real vulnerability in context.
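The pattern-hunting step can be sketched for Python source with the standard ast module. The list of dangerous calls below is illustrative rather than complete, and find_hot_spots is a hypothetical helper name. The output is a list of places to read, not a list of vulnerabilities.

```python
import ast

# Illustrative, not exhaustive: calls that usually deserve a closer look.
DANGEROUS_CALLS = {"eval", "exec", "os.system", "pickle.loads", "subprocess.call"}


def dotted_name(node):
    """Return 'a.b.c' for a Name/Attribute chain, or None for anything else."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = dotted_name(node.value)
        return f"{base}.{node.attr}" if base else None
    return None


def find_hot_spots(source: str):
    """Return sorted (line, call_name) pairs for calls in DANGEROUS_CALLS."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in DANGEROUS_CALLS:
                hits.append((node.lineno, name))
    return sorted(hits)
```

This matches names as written in the source, so an aliased import (for example `from os import system`) would slip past it. That limitation is one reason the data-flow read that follows cannot be skipped.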

Language matters: Review depth depends on reviewer fluency in the language and framework. Scope honestly.

Configuration and IaC review

Configuration review

Inspect the settings an operator chose, not the code itself:

Web server (Apache, Nginx, IIS)
Application frameworks (Django, Spring, Express)
Database (pg_hba.conf, MongoDB binding)
OS hardening — sysctl, AppArmor, sudoers
Identity provider — SSO, MFA policy, conditional access
Network — firewall rulesets, ACLs, segmentation

Compare against a baseline: CIS Benchmark, vendor hardening guide, or organizational standard.
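Comparing against a baseline can be partly mechanised. The sketch below assumes a simple "Keyword value" file format and an ILLUSTRATIVE_BASELINE invented for this example; its values are not quoted from any CIS Benchmark or vendor guide, and the function names are hypothetical.

```python
# Invented baseline for illustration only; take real values from the
# benchmark or hardening guide agreed for the engagement.
ILLUSTRATIVE_BASELINE = {
    "PermitRootLogin": "no",
    "PasswordAuthentication": "no",
    "X11Forwarding": "no",
}


def parse_settings(text: str) -> dict:
    """Parse 'Keyword value' lines, ignoring comments and blank lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        # Keeps the first occurrence of a keyword. This is an assumption
        # about how the target software resolves duplicates; verify it.
        settings.setdefault(key, value.strip())
    return settings


def deviations(actual: dict, baseline: dict) -> list:
    """Return (keyword, expected, observed) for every mismatch."""
    result = []
    for key in sorted(baseline):
        observed = actual.get(key, "<not set>")
        if observed != baseline[key]:
            result.append((key, baseline[key], observed))
    return result
```

A "<not set>" result is not automatically a finding: the reviewer still has to look up the software's default for that keyword and judge whether the deviation matters in this environment.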

Infrastructure-as-code review

Read the Terraform / CloudFormation / Helm / Ansible source rather than the deployed result.

Advantages: Findings are fixed in code, the fix reaches every environment built from the same modules, and the reviewed source gives a reference for detecting drift in what is deployed.

Tools: tfsec, Checkov, KICS, OPA / Conftest. They handle easy cases; humans handle architecture and design.
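To show what an "easy case" looks like, here is a minimal sketch of a check on a Kubernetes Pod manifest that has already been parsed into a Python dict (for example by a YAML loader, which is not shown). The function name review_pod is hypothetical. The two fields it reads, spec.hostNetwork and securityContext.privileged, are standard Pod fields.

```python
def review_pod(manifest: dict) -> list:
    """Flag two well-known risky settings in a parsed Pod manifest."""
    findings = []
    spec = manifest.get("spec", {})
    if spec.get("hostNetwork") is True:
        findings.append("pod uses hostNetwork")
    for container in spec.get("containers", []):
        ctx = container.get("securityContext", {})
        if ctx.get("privileged") is True:
            findings.append(f"container {container.get('name', '?')} is privileged")
    return findings
```

A check like this can say that a container is privileged. It cannot say whether that workload has a legitimate reason to be, or whether the surrounding design limits the damage if it is compromised.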

Architecture and policy review

Architecture review

Read diagrams and design documents to find:

Trust boundaries placed wrong — browser trusted to validate
Single points of failure for security controls
Sensitive data crossing networks or jurisdictions unprotected
Defence-in-depth gaps — one layer between attacker and crown jewels
Operational weaknesses — no detection path for attacks on component X

Architecture review benefits enormously from interactive sessions with the system's designers.

Policy and process review

Read the documents:

IR runbooks — contacts current, steps complete, dependencies named?
Vendor risk management — when reviewed, by whom, to what depth?
Patch management — SLAs per severity class; realistic?
Access management — joiners/leavers/movers, periodic recertification

High impact: Clients can change a process in weeks where they cannot rewrite a system in years.

Combining manual and automated

The mature stance: automated triages, human decides. Neither replaces the other.

1. Run SAST — produce a long list with many false positives

2. Filter by category — start with injection, deserialization, command execution

3. Triage each candidate — read surrounding code; is it real, exploitable, worth reporting?

4. Review independently — entry points and sensitive flows that SAST may have missed

5. Combine — report contains verified SAST findings plus human-found issues

The same shape applies to CSPM + manual cloud review, SCA + manual dependency analysis, and web scanner + manual application test.

SAST — how tools work and which to choose

How SAST works

Pattern matching — regex / AST rules for dangerous API calls
Data-flow analysis — tracks source-to-sink across multiple lines
Type and scope analysis — reduces false positives
Configuration — rules enabled/disabled; sensitivity adjusted; scope limited

Treat SAST as a pattern matcher with a high false-positive rate, not a security oracle.

Tool landscape

SonarQube — enterprise standard, comprehensive, noisier
Semgrep — rule-as-code, lower false-positive rate, customizable
CodeQL — powerful query language, very precise, steep learning curve
Checkmarx / Veracode / Fortify — commercial; deep data-flow; expensive
GitHub Advanced Security — code scanning built on CodeQL; accepts results from third-party tools such as Semgrep via SARIF upload
Language-specific — Bandit (Python), Gosec (Go), SpotBugs (Java)

Running and triaging SAST effectively

Running SAST

Start with default config; understand the output before tuning
Baseline on existing code; track new findings per commit
Tune incrementally — disable noisy rules with a comment explaining why
Integrate into CI on every pull request, not just periodically
Gate only on high-confidence findings; report the rest as informational

Triaging output

Filter by severity — critical / high first
Filter by category — injection, hardcoded secrets, unsafe deserialization first
Read code around the finding — 10–20 lines of context
Check for framework protection — ORM, sanitizer, type constraint
Verify exploitability — can an attacker actually control the input?
Document decisions — especially false-positive reasoning
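The first two triage steps amount to a sort order, sketched below. The finding dictionaries, severity names, and category labels are assumptions made for this example; real tools use their own schemas and labels, so the mapping would need adjusting.

```python
# Hypothetical labels; adapt to the output of the tool actually in use.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}
PRIORITY_CATEGORIES = ("injection", "hardcoded-secret", "unsafe-deserialization")


def triage_queue(findings):
    """Order findings by severity, then priority category, then location."""

    def sort_key(finding):
        # Unknown severities sort last rather than raising an error.
        severity = SEVERITY_RANK.get(finding["severity"], len(SEVERITY_RANK))
        category = 0 if finding["category"] in PRIORITY_CATEGORIES else 1
        return (severity, category, finding["file"], finding["line"])

    return sorted(findings, key=sort_key)
```

The queue only decides what to read first. Reading the context, checking framework protection, and verifying exploitability still happen one finding at a time.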

False-positive fatigue destroys adoption: 10 high-confidence findings beat 100 noisy ones every time.

Walk through the SQL injection example: f"SELECT * FROM users WHERE id = {user_id}" where the framework already validates the integer type. The finding is real but has low exploitability, because an attacker cannot get non-integer input past the framework. Document it and remediate with a parameterized query.
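A minimal sketch of the remediation for the f-string query, assuming Python's standard sqlite3 module and a hypothetical users table. The function names are invented for this example, and other database drivers use different placeholder syntax.

```python
import sqlite3


def get_user_unsafe(conn: sqlite3.Connection, user_id):
    """String-built query: safe only while something upstream validates user_id."""
    return conn.execute(f"SELECT * FROM users WHERE id = {user_id}").fetchall()


def get_user(conn: sqlite3.Connection, user_id):
    """Parameterized query: the driver passes user_id as data, never as SQL."""
    return conn.execute("SELECT * FROM users WHERE id = ?", (user_id,)).fetchall()
```

The parameterized version stays safe even if the framework's integer validation is later removed or bypassed on another code path, which is why the finding is worth remediating despite its low exploitability today.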

Discipline of evidence

Manual review must produce evidence the way every other methodology does.

Not a finding: "I noticed the code looks insecure."

A finding: "Lines 142–158 of payment.py accept a client-supplied amount without server-side validation, demonstrated by request X."

For SAST findings, evidence must include:

Location — file, line, function
Tool used and its configuration — ruleset, version
Whether the finding was manually verified or accepted from tool output
If triaged as false positive — the documented reasoning
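The four evidence requirements above can be captured as a record that refuses to be incomplete. This is a sketch under assumptions: SastEvidence and its field names are hypothetical, not a standard schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SastEvidence:
    file: str
    line: int
    function: str
    tool: str
    tool_version: str
    ruleset: str
    manually_verified: bool  # False means accepted from tool output as-is
    triaged_false_positive: bool = False
    false_positive_reason: Optional[str] = None

    def problems(self) -> list:
        """Return a list of reasons this record is not yet usable as evidence."""
        issues = []
        for name in ("file", "function", "tool", "tool_version", "ruleset"):
            if not getattr(self, name):
                issues.append(f"missing {name}")
        if self.line < 1:
            issues.append("line must be >= 1")
        if self.triaged_false_positive and not self.false_positive_reason:
            issues.append("false positive without documented reasoning")
        return issues
```

An empty problems() list means the record is complete, not that the finding is correct; the verification itself remains the reviewer's responsibility.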

Topic 19 covers evidence and traceability in depth.

Check — what scanners miss

Reflection

A client says: "We already run SAST in CI — do we still need manual review?" How do you respond?

Answer

SAST is necessary but not sufficient. It catches patterns (injection sinks, hardcoded secrets, deprecated APIs) reliably. It misses authorization bugs, business-logic flaws, race conditions, design weaknesses, and missing controls. Continuous SAST sets the floor; periodic manual review — and threat modeling at design time — provides the ceiling.

What you take home

Manual review covers five artefact types: code, configuration, IaC, architecture, and policy/process
Logic flaws, authorization bugs, missing controls, and design weaknesses require human judgement — scanners cannot reliably find them
Review is most powerful early in the SDLC; design-stage threat modeling is the cheapest place to fix issues
Threat modeling follows a four-question loop and produces a structured, reusable artefact
SAST is a pattern matcher with a high false-positive rate — configure, baseline, and triage before trusting output
Mature posture: automated triages, human decides; neither replaces the other
Every finding — manual or tool-assisted — requires specific, attributable evidence: file, line, context

Next: Topic 11 — Penetration testing (as methodology). Where manual review ends, controlled exploitation begins.

END · TOPIC 10

Tools triage. Humans decide.

Pick one artefact from a project you know and think through what a manual review of it would actually look for.