Security · Topic 07 of 23 · Part II — Catalog of testing methodologies

OSINT

Passive, quiet, and disproportionately effective — OSINT is the first methodology in the catalog because it is the closest to what real attackers do before the target ever notices.

Syllabus: § Testing Methodologies and Tools of the Trade (2, 4) → OSINT

Topic 07 · OSINT basics

What the world already knows about the target

By the end of this topic you can:

Define Open Source Intelligence and distinguish it from active reconnaissance, social engineering, and illegal data acquisition
Enumerate the major OSINT source categories and the types of information each yields
Plan an OSINT collection effort with clear objectives, sources, and stopping criteria
Recognize the legal and ethical limits of OSINT, including where "publicly accessible" stops meaning "free to use"
Evaluate the quality and risk of OSINT findings before acting on them

What OSINT is — and is not

OSINT: Open Source Intelligence, meaning intelligence gathered from publicly available sources. The defining characteristic is passivity: no packet reaches the target, no call is made; the tester observes what the world already has.

Gather what is freely indexed and intended to be discoverable; analyze for security relevance; document the trail; act only within scope and authorization.

What crosses the line

Active reconnaissance — any packet sent to target infrastructure (port scan, banner grab, even an obscure URL on the target's own server)
Social engineering — calling the helpdesk to ask questions is active
Illegal data acquisition — trawling stolen credential dumps from criminal forums
Surveillance — tracking movements or building a behavioural dossier on a real person

What an OSINT assessment is trying to find

Attack surface — domains, subdomains, IP ranges, exposed services, cloud assets, wireless networks
Technologies in use — stacks, CMS platforms, third-party SaaS, code repositories
People and identities — employees, roles, email patterns, public statements about internal systems
Operational information — locations, org chart, suppliers, recent hiring or layoffs

Business ecosystem — clients, subsidiaries, managed service providers, data processors and sub-processors; privacy notices and DPIAs often name the entire chain explicitly
Sensitive disclosures — source code or config leaked to public repos, credentials on paste sites, misconfigured cloud storage
Historical exposure — previous breaches, incident databases, archived site versions

Principle: gather what is needed for the engagement's objectives, not everything.
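The email pattern listed above can often be inferred from a few addresses found in public sources such as press releases, conference talks, or commit metadata. The sketch below assumes Python 3 and a hand-picked, non-exhaustive list of candidate formats; the pattern names and the `infer_email_pattern` helper are hypothetical. It ranks each format by how many observed addresses it explains.

```python
from collections import Counter

# Candidate local-part formats (hypothetical, non-exhaustive).
PATTERNS = {
    "first.last": lambda first, last: f"{first}.{last}",
    "f.last": lambda first, last: f"{first[0]}.{last}",
    "flast": lambda first, last: f"{first[0]}{last}",
    "firstlast": lambda first, last: f"{first}{last}",
    "first_last": lambda first, last: f"{first}_{last}",
    "first": lambda first, last: first,
}


def infer_email_pattern(observed):
    """Rank candidate formats by how many observed addresses they explain.

    observed: iterable of (first_name, last_name, email) tuples collected
    from public sources.
    """
    counts = Counter()
    for first, last, email in observed:
        if not first or not last:
            continue  # skip incomplete records instead of guessing
        local = email.split("@", 1)[0].lower()
        for name, build in PATTERNS.items():
            if build(first.lower(), last.lower()) == local:
                counts[name] += 1
    return counts.most_common()
```

A single match is weak evidence, so the cross-referencing rule applies here as well; names with diacritics, hyphens, or middle names would need extra candidate formats.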

Source categories — part 1

Category	Examples	What it yields
Search engines & indices	Google, Bing, Yandex, Wayback Machine	Cached pages, historical snapshots, exposed documents via dorks
Domain & infrastructure	WHOIS, crt.sh, Censys, passive DNS, ASN lookup	Subdomains, IP ranges, certificate history, routing data
Internet-wide scan indexes	Shodan, Censys, ZoomEye, FOFA	Exposed services, banners, open ports — the index did the scanning
Code & developer platforms	GitHub, GitLab, npm, Docker Hub	Leaked config, API keys, internal product names, commit history
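Certificate transparency logs are among the richest domain sources in this table. The sketch below assumes crt.sh's JSON output (`output=json`), in which each entry carries a `name_value` field holding one or more newline-separated names; verify that format against the live service before relying on it. Querying crt.sh is still OSINT, because the request goes to crt.sh and not to the target. Both helper names are hypothetical.

```python
import json
import urllib.parse


def crtsh_url(domain):
    """Build a crt.sh search URL for every name under `domain`."""
    # "%" is crt.sh's wildcard; urlencode turns it into %25.
    query = urllib.parse.urlencode({"q": f"%.{domain}", "output": "json"})
    return f"https://crt.sh/?{query}"


def subdomains_from_crtsh(json_text, domain):
    """Return the unique names under `domain` found in crt.sh JSON output."""
    names = set()
    for entry in json.loads(json_text):
        # One certificate can list several names, one per line.
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower().lstrip("*.")  # drop wildcard prefixes
            if name == domain or name.endswith("." + domain):
                names.add(name)
    return sorted(names)
```

To fetch live data, pass `crtsh_url("example.com")` to `urllib.request.urlopen` and decode the body. Certificates stay in the logs after they expire, so a name found this way is a lead, not proof that the host still exists.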
Social networks	LinkedIn, X/Twitter, Mastodon	Org structure, employee roles, technologies named in job postings

Source categories — part 2

Category	Examples	What it yields
Public records	Company registers, regulatory filings, DPIAs, privacy notices	Ownership, subsidiaries, data processors; privacy notices often name sub-processors explicitly
Document repositories	SlideShare, Scribd, public file-sharing indexes	Internal documents that escaped through misuse
Specialized tools	Maltego, SpiderFoot, theHarvester, Recon-ng, Amass	Automated aggregation, email and subdomain gathering, visual link analysis
Breach indexes (caution)	Have I Been Pwned (query only)	Email exposure; use query services, never handle raw dumps
Wireless passive recon	airodump-ng (monitor mode, capture only), SDR receivers	SSIDs, encryption types, Bluetooth devices, RF signals; the tester transmits nothing
Geospatial & visual	Mapping services, street-view imagery, public photos	Physical layout, visible screens, badge designs, building plans
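The "query only" rule for breach indexes can be made concrete: Have I Been Pwned's v3 API answers per-account lookups, so the tester never holds a dump. The sketch below only builds the request for the breached-account endpoint and does not send it. It assumes an HIBP API key (the service requires one for this endpoint) and a descriptive user agent; the `hibp_request` helper and its default user-agent string are hypothetical. Checking employees' addresses is still processing personal data, so it belongs in the engagement only if the scope covers it.

```python
import urllib.parse
import urllib.request

HIBP_BASE = "https://haveibeenpwned.com/api/v3/breachedaccount/"


def hibp_request(email, api_key, user_agent="osint-course-exercise"):
    """Build (but do not send) a breached-account query for one address."""
    # The address goes into the URL path, so "@" must be percent-encoded.
    url = HIBP_BASE + urllib.parse.quote(email, safe="")
    return urllib.request.Request(url, headers={
        "hibp-api-key": api_key,   # HIBP requires an API key for this endpoint
        "user-agent": user_agent,  # HIBP asks clients to identify themselves
    })
```

Sending the request with `urllib.request.urlopen` is a separate, deliberate step, taken only once the scope document allows it.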

A simple OSINT workflow

1. Define the objective: attack surface? phishing targets? third-party dependencies?
2. Identify seed information: company name, domain, employee list
3. Expand outward from each seed; document every linkage
4. Validate: cross-check important findings across multiple sources
5. Stop at the boundary: when collection requires active probing, OSINT is over
6. Document and minimize: note every source and query; discard unnecessary personal data
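Steps 2, 3, and 5 form a loop that can be sketched as a breadth-first expansion. This is a minimal sketch, assuming every lookup function wraps a passive, third-party source (a CT log, WHOIS, a scan index); the code cannot enforce that, so choosing only passive lookups remains the tester's job. The `expand` helper is hypothetical, and its depth limit stands in for whatever stopping criteria the plan defines.

```python
from collections import deque


def expand(seeds, lookups, max_depth=2):
    """Breadth-first expansion from seed items using passive lookups.

    lookups: mapping of source name -> function(item) returning related items.
    Returns (every item discovered, every (item, related, source) link).
    """
    seen = set(seeds)
    links = []
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        item, depth = queue.popleft()
        if depth >= max_depth:  # stopping criterion: bounded expansion
            continue
        for source, lookup in lookups.items():
            for related in lookup(item):
                links.append((item, related, source))  # document every linkage
                if related not in seen:
                    seen.add(related)
                    queue.append((related, depth + 1))
    return seen, links
```

Returning the links alongside the items keeps the trail that step 6 asks for; validation (step 4) still happens on the output, not inside the loop.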

Quality, freshness, and the noise problem

Public information has a quality distribution. A subdomain on a recently issued certificate in a certificate transparency log is probably current, but the logs also keep certificates that expired years ago. A LinkedIn post from 2019 may be obsolete. An employee blog post may be accurate, aspirational, or wrong.

Antipattern: a tool dump as a report. Tools like SpiderFoot generate hundreds of results, most of them irrelevant. The tester still has to read them.
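A first triage pass can at least reduce a raw dump to unique hostnames and split them by scope. This is a minimal sketch, assuming the tool's results have already been exported as plain hostname strings (real export formats differ per tool); `triage_hostnames` is hypothetical. Out-of-scope names are returned rather than dropped, because third-party hosts can reveal the business ecosystem.

```python
def triage_hostnames(raw_results, target_domains):
    """Split raw hostname results into sorted in-scope and out-of-scope lists."""
    in_scope, out_of_scope = set(), set()
    for raw in raw_results:
        host = raw.strip().lower().rstrip(".")  # normalise case and trailing dot
        if not host:
            continue
        if any(host == d or host.endswith("." + d) for d in target_domains):
            in_scope.add(host)
        else:
            out_of_scope.add(host)
    return sorted(in_scope), sorted(out_of_scope)
```

Deduplication shrinks the pile; deciding which of the remaining names matter is still reading work.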

Discipline rules

Date every finding — when was this published or last updated?
Source every finding — can another tester reproduce the lookup?
Cross-reference important findings — one LinkedIn post is not a reliable claim
Distinguish observed from inferred — "CT log shows dev.example.com" is observed; "a dev environment is exposed" is inferred and could be wrong
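The four rules translate naturally into a record that every finding must fill in. A sketch, assuming Python 3.7+ dataclasses; the `Finding` fields and the one-year staleness threshold in `needs_review` are illustrative choices, not a standard.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Finding:
    claim: str
    source: str           # URL or exact query, so another tester can reproduce it
    published: date       # when the source was published or last updated
    retrieved: date       # when the tester performed the lookup
    observed: bool        # True: seen directly in the source; False: inferred
    corroborated_by: tuple = ()  # further independent sources, if any

    def age_days(self, today):
        return (today - self.published).days

    def needs_review(self, today, max_age_days=365):
        # Inferred, single-source, or old findings get a second look before use.
        return (not self.observed
                or not self.corroborated_by
                or self.age_days(today) > max_age_days)
```

Using the example from the last rule, "CT log shows dev.example.com" would be recorded with `observed=True`, while "a dev environment is exposed" would be a separate record with `observed=False`.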

Legal and ethical limits

Personal data — employee names, email addresses, home addresses are personal data under FADP/GDPR; processing requires a lawful basis and data minimisation
Stolen data dumps — trawling leaked-credential files is in a grey zone at best; use professional query services (Have I Been Pwned) instead
Terms of service — platforms such as LinkedIn and Facebook prohibit scraping; automated mass scraping can attract legal claims

Identifiable individuals — stop before the profile looks like what a stalker would compile, regardless of legality
Cross-border data flow — OSINT may aggregate data from multiple jurisdictions; the export and storage rules of each apply
Authorization — OSINT must be explicitly in scope; some clients want it, some explicitly do not
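Data minimisation can be applied mechanically to collected email addresses. This sketch replaces the local part with a keyed hash (HMAC-SHA-256), so repeated sightings of one address still match while the name itself stays out of the report. The helper name, the 12-character token length, and the `user-` prefix are assumptions. Pseudonymised data is still personal data under the GDPR, because whoever holds the key can re-identify people by hashing candidate names; the key should be destroyed at the end of the engagement.

```python
import hashlib
import hmac


def pseudonymise_email(email, key):
    """Replace the local part of an address with a keyed, repeatable token.

    key: engagement-specific secret (bytes), destroyed when the engagement ends.
    """
    local, _, domain = email.strip().lower().partition("@")
    token = hmac.new(key, local.encode("utf-8"), hashlib.sha256).hexdigest()[:12]
    return f"user-{token}@{domain}"
```

Keeping the domain lets the report still say how many addresses per organisation were exposed without naming anyone.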

Core principle: "Publicly accessible" does not mean "legally and ethically free to gather and use." When in doubt, ask the client.

OSINT vs. active reconnaissance — the bright line

The defining act is the packet sent to the target. OSINT pulls data that is already aggregated by third parties. Active reconnaissance creates new traffic on the target.
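The bright line can be checked before a tool runs: does the host it will contact belong to the target? A minimal sketch using Python's standard `ipaddress` module; `touches_target` is a hypothetical helper, and the target lists would come from the scope document. It matches by name only, so a hostname outside the list that resolves into the target's ranges would be missed; a False result is not proof that an action is passive.

```python
import ipaddress


def touches_target(destination, target_domains, target_networks):
    """Return True if contacting `destination` would reach target infrastructure.

    destination: hostname or IP literal of the host a tool would contact.
    """
    try:
        ip = ipaddress.ip_address(destination)
    except ValueError:
        # Not an IP literal: compare the hostname against the target domains.
        host = destination.lower().rstrip(".")
        return any(host == d or host.endswith("." + d) for d in target_domains)
    return any(ip in ipaddress.ip_network(net) for net in target_networks)
```

A Shodan lookup contacts Shodan's own servers, so it returns False here; an Nmap run against an address in the target's range returns True.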

Why the line matters

Legal authorization: OSINT is rarely controversial; active scanning requires explicit authorization
Detection profile: the target may notice active scanning, but it generally cannot see queries made to third-party sources (some platforms, such as LinkedIn, do show who viewed a profile)
Skill demands: patience and source breadth for OSINT vs. network protocols and tooling for active recon

Canonical example: querying Shodan for the target's open ports is OSINT, because Shodan did the scanning. Running Nmap against the target's IP is active reconnaissance. Both can reveal the same open ports (Shodan's copy may be out of date), yet the legal and ethical profile is entirely different.

Active reconnaissance is covered in Part III — Topic 14.

Key tools in the syllabus

Shodan: index of internet-exposed services. Search by banner, port, region, organization, or certificate. Reveals what the target has exposed without the tester ever scanning it.
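Shodan searches are composed from filters. The sketch below builds candidate query strings from scope data using the `org:`, `hostname:`, `ssl.cert.subject.cn:`, and `net:` filters; the `shodan_queries` helper is hypothetical, and Shodan's current filter reference is the authority on syntax. Note that `org:` matches Shodan's own attribution, which for cloud-hosted assets is often the hosting provider rather than the client.

```python
def shodan_queries(org_name=None, domain=None, networks=()):
    """Build candidate Shodan search strings from scope data."""
    queries = []
    if org_name:
        # Quote the value so multi-word organization names stay together.
        queries.append(f'org:"{org_name}"')
    if domain:
        queries.append(f"hostname:{domain}")
        queries.append(f"ssl.cert.subject.cn:{domain}")
    for net in networks:
        queries.append(f"net:{net}")  # CIDR range, e.g. from the scope document
    return queries
```

The official `shodan` Python package can run each string via `shodan.Shodan(api_key).search(query)`, which requires an API key.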

Maltego: visual link-analysis platform. Combines data-lookup transforms into graphs; strong for showing entity relationships clearly in a report.

Google Dorking: targeted search queries using operators such as site:, inurl:, filetype:, and intitle:. The Google Hacking Database (GHDB) collects useful pre-built queries.
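A small set of dorks for one domain can be generated from the operators just named. The file types, keywords, and the `google_dorks` helper are illustrative assumptions, not a GHDB selection. Google's terms restrict automated querying, so these strings are meant to be pasted into the search box by hand.

```python
def google_dorks(domain, filetypes=("pdf", "docx", "xlsx"),
                 keywords=("confidential", "internal")):
    """Compose Google queries restricted to one domain."""
    dorks = [f"site:{domain} filetype:{ft}" for ft in filetypes]
    dorks += [f'site:{domain} intitle:"{kw}"' for kw in keywords]
    dorks.append(f"site:{domain} inurl:login")
    # Exclude the main site so other indexed subdomains surface.
    dorks.append(f"site:{domain} -site:www.{domain}")
    return dorks
```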

Recon-ng: framework for organized OSINT collection with pluggable modules. It keeps a structured workspace; results feed into a local database for later analysis.
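The workspace idea, one local database that every collection step writes into, is what makes later analysis possible. The sketch below imitates it with Python's built-in `sqlite3`; the table layout and helper names are hypothetical and are not Recon-ng's actual schema. The unique constraint keeps one row per host and source, so counting sources per host gives a quick cross-reference check.

```python
import sqlite3


def open_workspace(path=":memory:"):
    """Open (or create) a minimal findings store."""
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS hosts (
                      host TEXT NOT NULL,
                      source TEXT NOT NULL,
                      retrieved TEXT NOT NULL,
                      UNIQUE (host, source))""")
    return db


def add_host(db, host, source, retrieved):
    # INSERT OR IGNORE keeps one row per (host, source) across repeated runs.
    db.execute("INSERT OR IGNORE INTO hosts VALUES (?, ?, ?)",
               (host, source, retrieved))
    db.commit()


def hosts_seen_in_multiple_sources(db, minimum=2):
    """List (host, source_count) for hosts confirmed by several sources."""
    rows = db.execute("""SELECT host, COUNT(*) FROM hosts GROUP BY host
                         HAVING COUNT(*) >= ? ORDER BY host""", (minimum,))
    return rows.fetchall()
```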

Active recon tools (Nmap, etc.) are introduced in Topic 14.

Check — the bright line

Reflection

What is the precise difference between OSINT and active reconnaissance? Give one example that sits right on the border.

Answer

OSINT pulls from third-party aggregates; active recon sends packets directly to the target. Border example: querying Shodan for the target's open ports is OSINT — Shodan did the scanning. Running Nmap against the same IP is active. Visiting the target's public website is technically active traffic — most engagements treat ordinary browsing as low-risk, but the scope document should specify.

What you take home

OSINT is intelligence from publicly available sources; passivity — no packet to the target — is the defining characteristic
The bright line with active recon is the packet sent to the target, not the type of information retrieved
Source categories span search engines, DNS and CT logs, internet-scan indexes, code platforms, social networks, public records, breach indexes, and wireless passive observation
Business ecosystem mapping — including privacy notices and DPIAs — expands the attack surface far beyond the direct target
"Publicly accessible" does not mean "legally or ethically free to gather and use"
Every finding needs a date, a source, and a clear observed/inferred label
Counter-OSINT — limiting what the world can discover about the organization — is the defender's mirror of this topic

Next: Topic 08 — Social engineering. OSINT provides the intelligence that makes social engineering effective; the two are rarely run in isolation.

END · TOPIC 07

Look before you touch.

Before the next session: pick one public domain and find three facts about its infrastructure using only OSINT sources — document your sources.