Security · Topic 07 of 23 · Part II — Catalog of testing methodologies

OSINT

Passive, quiet, and disproportionately effective — OSINT is the first methodology in the catalog because it is the closest to what real attackers do before the target ever notices.

Syllabus: § Testing Methodologies and Tools of the Trade (2, 4) → OSINT
Topic 07 · OSINT basics

What the world already knows about the target

By the end of this topic you can:
  • Define Open Source Intelligence and distinguish it from active reconnaissance, social engineering, and illegal data acquisition
  • Enumerate the major OSINT source categories and the types of information each yields
  • Plan an OSINT collection effort with clear objectives, sources, and stopping criteria
  • Recognize the legal and ethical limits of OSINT, including where "publicly accessible" stops meaning "free to use"
  • Evaluate the quality and risk of OSINT findings before acting on them

What OSINT is — and is not

OSINT: intelligence from publicly available sources. The defining characteristic is passivity: no packet reaches the target, no call is made, and the tester observes only what the world already has.

Gather what is freely indexed and intended to be discoverable; analyze for security relevance; document the trail; act only within scope and authorization.

What crosses the line

  • Active reconnaissance — any packet sent to target infrastructure (port scan, banner grab, even an obscure URL on the target's own server)
  • Social engineering — calling the helpdesk to ask questions is active
  • Illegal data acquisition — trawling stolen credential dumps from criminal forums
  • Surveillance — tracking movements or building a behavioural dossier on a real person

What an OSINT assessment is trying to find

  • Attack surface — domains, subdomains, IP ranges, exposed services, cloud assets, wireless networks
  • Technologies in use — stacks, CMS platforms, third-party SaaS, code repositories
  • People and identities — employees, roles, email patterns, public statements about internal systems
  • Operational information — locations, org chart, suppliers, recent hiring or layoffs
  • Business ecosystem — clients, subsidiaries, managed service providers, data processors and sub-processors; privacy notices and DPIAs often name the entire chain explicitly
  • Sensitive disclosures — source code or config leaked to public repos, credentials on paste sites, misconfigured cloud storage
  • Historical exposure — previous breaches, incident databases, archived site versions
Principle: Gather what is needed for the engagement's objectives, not everything.

Source categories — part 1

Category | Examples | What it yields
Search engines & indices | Google, Bing, Yandex, Wayback Machine | Cached pages, historical snapshots, exposed documents via dorks
Domain & infrastructure | WHOIS, crt.sh, Censys, passive DNS, ASN lookup | Subdomains, IP ranges, certificate history, routing data
Internet-wide scan indexes | Shodan, Censys, ZoomEye, FOFA | Exposed services, banners, open ports; the index did the scanning
Code & developer platforms | GitHub, GitLab, npm, Docker Hub | Leaked config, API keys, internal product names, commit history
Social networks | LinkedIn, X/Twitter, Mastodon | Org structure, employee roles, technologies named in job postings
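
A minimal sketch of how the "Domain & infrastructure" row turns into a subdomain list. It assumes certificate transparency records have already been exported from a third-party service as JSON, with hostnames in a newline-separated name_value field (the shape crt.sh's JSON output is understood to use; verify against the live service before relying on it). The function name, the sample records, and example.com are illustrative only, and nothing here touches the network.

```python
import json


def subdomains_from_ct(records, apex):
    """Collect unique hostnames at or under `apex` from CT-log style records."""
    apex = apex.lower().rstrip(".")
    found = set()
    for rec in records:
        # One certificate can list several names, separated by newlines.
        for name in rec.get("name_value", "").splitlines():
            name = name.strip().lower().rstrip(".")
            if name.startswith("*."):
                name = name[2:]  # a wildcard entry still reveals its parent name
            if name == apex or name.endswith("." + apex):
                found.add(name)
    return sorted(found)


# Hypothetical sample in the assumed record shape (not real data).
SAMPLE = json.loads('''[
  {"name_value": "example.com\\nwww.example.com", "not_before": "2024-01-10T00:00:00"},
  {"name_value": "*.dev.example.com", "not_before": "2023-06-01T00:00:00"},
  {"name_value": "notexample.com", "not_before": "2023-06-01T00:00:00"}
]''')

print(subdomains_from_ct(SAMPLE, "example.com"))
```

A name in a CT log shows that a certificate was issued for it at some point. Whether the host still exists is an inference, and confirming it by connecting to the host would be active reconnaissance.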

Source categories — part 2

Category | Examples | What it yields
Public records | Company registers, regulatory filings, DPIAs, privacy notices | Ownership, subsidiaries, data processors; privacy notices often name sub-processors explicitly
Document repositories | SlideShare, Scribd, public file-sharing indexes | Internal documents that escaped through misuse
Specialized tools | Maltego, SpiderFoot, theHarvester, Recon-ng, Amass | Automated aggregation, email and subdomain gathering, visual link analysis
Breach indexes (caution) | Have I Been Pwned (query only) | Email exposure; use query services, never handle raw dumps
Wireless passive recon | airodump-ng (passive mode), SDR receivers | SSIDs, encryption types, Bluetooth devices, RF signals; tester transmits nothing
Geospatial & visual | Mapping services, street-view imagery, public photos | Physical layout, visible screens, badge designs, building plans

A simple OSINT workflow

1. Define the objective: attack surface? phishing targets? third-party dependencies?
2. Identify seed information: company name, domain, employee list
3. Expand outward from each seed; document every linkage
4. Validate: cross-check important findings across multiple sources
5. Stop at the boundary: when collection requires active probing, OSINT is over
6. Document and minimize: note every source and query; discard unnecessary personal data
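
Steps 2, 3, and 5 of the workflow can be sketched as a bounded breadth-first expansion that records where each link came from. This is one possible shape, not a prescribed method: the expand function, the max_depth stopping criterion, and the canned lookup table are all assumptions for illustration. In practice the lookup would wrap queries to third-party sources only.

```python
from collections import deque


def expand(seeds, lookup, max_depth=2):
    """Expand outward from seeds, recording (from, to, source) for every link.

    `lookup(entity)` must return (entity, source) pairs drawn from
    third-party data; it must never contact the target itself.
    """
    seen = set(seeds)
    links = []
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        entity, depth = queue.popleft()
        if depth >= max_depth:
            continue  # stopping criterion: do not expand beyond this depth
        for found, source in lookup(entity):
            links.append((entity, found, source))  # document every linkage
            if found not in seen:
                seen.add(found)
                queue.append((found, depth + 1))
    return seen, links


# Hypothetical canned results standing in for real third-party queries.
CANNED = {
    ("domain", "example.com"): [(("subdomain", "vpn.example.com"), "CT log")],
    ("subdomain", "vpn.example.com"): [(("ip", "192.0.2.10"), "passive DNS")],
}


def canned_lookup(entity):
    return CANNED.get(entity, [])


entities, links = expand([("domain", "example.com")], canned_lookup, max_depth=2)
for src, dst, source in links:
    print(src, "->", dst, "via", source)
```

Keeping the source next to every link is what makes step 6 possible later: each entry can be traced, re-checked, or discarded without guesswork.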

Quality, freshness, and the noise problem

Public information has a quality distribution. A subdomain taken from a recently issued certificate in a certificate transparency log is usually current. A LinkedIn post from 2019 may be obsolete. An employee blog post may be accurate, aspirational, or wrong.

Antipattern: a tool dump as a report. Tools like SpiderFoot generate hundreds of results, most of them irrelevant. The tester still has to read, filter, and interpret the output.

Discipline rules

  • Date every finding — when was this published or last updated?
  • Source every finding — can another tester reproduce the lookup?
  • Cross-reference important findings — one LinkedIn post is not a reliable claim
  • Distinguish observed from inferred — "CT log shows dev.example.com" is observed; "a dev environment is exposed" is inferred and could be wrong
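
The four rules above can be enforced mechanically with a small finding record. The sketch below is one way to do it; the Finding fields, the review function, and the 365-day staleness threshold are assumptions chosen for illustration, not values from the syllabus.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Finding:
    claim: str
    kind: str                         # "observed" or "inferred"
    sources: List[str]                # lookups another tester can repeat
    published: Optional[date] = None  # when the source was published or updated


def review(finding, today, max_age_days=365):
    """Return the discipline rules this finding does not yet satisfy."""
    issues = []
    if finding.kind not in ("observed", "inferred"):
        issues.append("label as observed or inferred")
    if not finding.sources:
        issues.append("no source recorded")
    elif len(finding.sources) < 2:
        issues.append("single source: cross-reference before relying on it")
    if finding.published is None:
        issues.append("undated")
    elif (today - finding.published).days > max_age_days:
        issues.append("possibly stale")
    return issues


today = date(2024, 6, 1)
ct = Finding("CT log shows dev.example.com", "observed",
             ["CT log query for example.com", "passive DNS"], date(2024, 5, 1))
post = Finding("Company runs product X internally", "inferred",
               ["LinkedIn post"], date(2019, 3, 1))

print(review(ct, today))    # no open issues
print(review(post, today))  # single source and possibly stale
```

The check does not judge whether a claim is true; it only flags findings that lack the date, source, or label needed before anyone acts on them.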

Legal and ethical limits

  • Personal data — employee names, email addresses, home addresses are personal data under FADP/GDPR; processing requires a lawful basis and data minimisation
  • Stolen data dumps — trawling leaked-credential files is in a grey zone at best; use professional query services (Have I Been Pwned) instead
  • Terms of service — platforms such as LinkedIn and Facebook prohibit scraping; automated mass scraping can attract legal claims
  • Identifiable individuals — stop before the profile looks like what a stalker would compile, regardless of legality
  • Cross-border data flow — OSINT may aggregate data from multiple jurisdictions; the export and storage rules of each apply
  • Authorization — OSINT must be explicitly in scope; some clients want it, some explicitly do not
Core principle: "Publicly accessible" does not mean "legally and ethically free to gather and use." When in doubt, ask the client.

OSINT vs. active reconnaissance — the bright line

The defining act is the packet sent to the target. OSINT pulls data that is already aggregated by third parties. Active reconnaissance creates new traffic on the target.

Why the line matters

  • Legal authorization — OSINT is rarely controversial; active scanning requires explicit authorization
  • Detection profile — the target may notice active scanning; they cannot notice OSINT
  • Skill demands — patience and source breadth vs. network protocols and tooling
Canonical example: Querying Shodan for the target's open ports is OSINT, because Shodan did the scanning. Running Nmap against the target's IP is active reconnaissance. Both can reveal much the same information, although the index's view may be older; the legal and ethical profile is entirely different.
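
The bright line can be turned into a simple pre-flight check on any URL a script is about to fetch. The sketch below is an assumption-laden illustration: touches_target is a hypothetical helper, it matches hostnames only, and it knows nothing about the target's IP ranges or assets hosted under third-party names, which need separate handling.

```python
from urllib.parse import urlsplit


def touches_target(url, target_domains):
    """True if fetching `url` would send a request to a host under the target's domains."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    targets = [d.lower().rstrip(".") for d in target_domains]
    return any(host == d or host.endswith("." + d) for d in targets)


scope = ["example.com"]  # hypothetical target domain

# The index or archive answers; no packet goes to the target.
print(touches_target("https://www.shodan.io/search?query=example.com", scope))
print(touches_target("https://web.archive.org/web/2020/https://example.com/", scope))

# The target's own server would answer: this is past the line.
print(touches_target("https://dev.example.com/admin", scope))
```

The first two calls print False and the third prints True. A URL without a scheme has no parseable hostname and also returns False, so inputs should be full URLs.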

Active reconnaissance is covered in Part III — Topic 14.

Key tools in the syllabus

Shodan: Index of internet-exposed services. Search by banner, port, region, organization, certificate. Reveals what the target has exposed without the tester ever scanning.
Maltego: Visual link-analysis platform. Combines data-lookup transforms into graphs. Strong for showing entity relationships clearly in a report.
Google Dorking: Targeted queries using operators: site:, inurl:, filetype:, intitle:. The Google Hacking Database (GHDB) collects useful pre-built queries.
Recon-ng: Framework for organized OSINT collection with pluggable modules. Keeps a structured workspace; results feed into a local database for later analysis.
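
To make the dorking operators concrete, here is a small sketch that assembles a query string from the four operators named above. The build_dork helper and its parameters are hypothetical, and the function only builds text: it sends nothing, and the query is meant to be pasted into a search engine, which answers from its own index.

```python
def build_dork(site=None, filetype=None, inurl=None, intitle=None, terms=()):
    """Assemble a search query from common dork operators."""
    parts = []
    for operator, value in (("site", site), ("filetype", filetype),
                            ("inurl", inurl), ("intitle", intitle)):
        if value:
            if " " in value:
                value = f'"{value}"'  # quote multi-word values
            parts.append(f"{operator}:{value}")
    parts.extend(terms)  # free-text keywords go last
    return " ".join(parts)


print(build_dork(site="example.com", filetype="pdf"))
print(build_dork(site="example.com", intitle="index of"))
print(build_dork(site="example.com", inurl="login", terms=["password"]))
```

Every query used in an engagement belongs in the notes alongside the date it was run, so that another tester can reproduce the lookup.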

Active recon tools (Nmap, etc.) are introduced in Topic 14.

Check — the bright line

Reflection

What is the precise difference between OSINT and active reconnaissance? Give one example that sits right on the border.

Answer

OSINT pulls from third-party aggregates; active recon sends packets directly to the target. Querying Shodan for the target's open ports is OSINT, because Shodan did the scanning; running Nmap against the same IP is active. Border example: visiting the target's public website is technically active traffic. Most engagements treat ordinary browsing as low-risk, but the scope document should specify.

What you take home

  • OSINT is intelligence from publicly available sources; passivity — no packet to the target — is the defining characteristic
  • The bright line with active recon is the packet sent to the target, not the type of information retrieved
  • Source categories span search engines, DNS and CT logs, internet-scan indexes, code platforms, social networks, public records, breach indexes, and wireless passive observation
  • Business ecosystem mapping — including privacy notices and DPIAs — expands the attack surface far beyond the direct target
  • "Publicly accessible" does not mean "legally or ethically free to gather and use"
  • Every finding needs a date, a source, and a clear observed/inferred label
  • Counter-OSINT — limiting what the world can discover about the organization — is the defender's mirror of this topic

Next: Topic 08 — Social engineering. OSINT provides the intelligence that makes social engineering effective; the two are rarely run in isolation.

END · TOPIC 07

Look before you touch.

Before the next session: pick one public domain and find three facts about its infrastructure using only OSINT sources — document your sources.