
On a recent test of a fintech client, the entire compromise traced back to one line of recon output: a TLS certificate transparency log listing jenkins-old.target.com, a build server everyone had forgotten. It ran an unauthenticated dashboard, leaked AWS keys in a job log, and from there the path to production took an afternoon. None of it would have surfaced if the tester had jumped straight to the login page on day one.
That is the case for understanding the phases. Every credible pentest follows the same arc: reconnaissance, scanning, exploitation, post-exploitation, and reporting. That arc is drawn from standards like the Penetration Testing Execution Standard (PTES). This post walks through each phase with the real tools and output a tester sees, and shows where teams quietly lose the value they paid for.
Reconnaissance is intelligence gathering: the tester maps the target's footprint before touching it aggressively. Passive recon uses OSINT (WHOIS, certificate transparency, LinkedIn, leaked-credential dumps) to find subdomains and exposed assets without sending a hostile packet. Tools such as Amass, subfinder, and theHarvester automate much of that collection, and active recon then resolves and probes what they return. The single highest-value habit is pulling every certificate-transparency entry, because TLS certs leak internal hostnames teams forget exist:
$ subfinder -d target.com -silent | tee subs.txt
api.target.com
vpn-test.target.com <- not in any inventory
jenkins-old.target.com <- forgotten build server
staging-pay.target.com <- shares prod database
Three of those four hosts were never handed to the tester. A scanner pointed at a single production IP never sees them. Strong recon is the foundation of real network testing, and it mirrors the broader penetration testing process.
The why behind recon is attack-surface math. Every forgotten host, exposed email format, and leaked credential is a door the defender does not know exists, which means it is unmonitored and unpatched by definition. A common false negative here is trusting the client's asset inventory. Inventories are aspirational; certificate transparency, passive DNS, and old marketing campaigns are reality. The discipline that pays off is treating passive recon as exhaustive before sending a single active packet, because the noisiest scan in the world will not reveal a host you never knew to point it at.
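The inventory gap described above can be checked mechanically. The sketch below is a minimal illustration, not a recon tool: it normalizes hostnames collected from passive sources and lists the ones missing from the client's inventory. The function names and sample data are hypothetical, and the sample hostnames simply mirror the ones used earlier in this post.

```python
def normalize(name):
    """Lower-case a hostname, drop a trailing dot, and strip a leading wildcard label."""
    name = name.strip().lower().rstrip(".")
    if name.startswith("*."):
        name = name[2:]
    return name


def unknown_hosts(discovered, inventory):
    """Return discovered hostnames that are missing from the client's inventory."""
    known = {normalize(h) for h in inventory}
    found = {normalize(h) for h in discovered if h.strip()}
    return sorted(found - known)


# Hypothetical sample data: passive recon results versus the handed-over inventory.
discovered = [
    "api.target.com",
    "VPN-test.target.com",
    "*.jenkins-old.target.com",
    "staging-pay.target.com.",
    "api.target.com",
]
inventory = ["api.target.com"]

for host in unknown_hosts(discovered, inventory):
    print("not in inventory:", host)
```

Anything this prints is a candidate for the scope conversation with the client, not an automatic target.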
Scanning turns the recon map into a precise inventory of live hosts, open ports, services, and versions. nmap is the workhorse, and the telltale lines are unauthenticated services and stale software:
$ nmap -sV -sC -p- --min-rate 2000 -oA full vpn-test.target.com
80/tcp open http Apache 2.4.49 <- path traversal CVE-2021-41773
443/tcp open ssl/http Apache 2.4.49
8080/tcp open http Jenkins 2.289 (login disabled) <- anon access
That Apache 2.4.49 banner is a lead, not a conclusion. The tester verifies the host is actually exploitable before trusting it, because a version match alone produces false positives and this CVE also depends on how the server is configured. For web apps, the tester crawls with Burp Suite and fuzzes hidden paths with ffuf or feroxbuster. The discipline that separates a tester from a scan operator: every scanner hit gets reachability-tested before it reaches the report.
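One way to keep "lead, not conclusion" honest is to load scan results into a list where every entry starts out unverified. The `-oA full` flag in the command above also writes an XML file (`full.xml`). The sketch below parses a small hand-written sample in that XML layout; the sample content, the `open_services` helper, and the `status` field are illustrative assumptions, not output from a real scan.

```python
import xml.etree.ElementTree as ET

# Hand-written, trimmed sample in the shape of nmap XML output (illustrative only).
SAMPLE_XML = """<nmaprun>
  <host>
    <address addr="203.0.113.10" addrtype="ipv4"/>
    <ports>
      <port protocol="tcp" portid="80">
        <state state="open"/>
        <service name="http" product="Apache httpd" version="2.4.49"/>
      </port>
      <port protocol="tcp" portid="22">
        <state state="closed"/>
        <service name="ssh"/>
      </port>
      <port protocol="tcp" portid="8080">
        <state state="open"/>
        <service name="http" product="Jenkins" version="2.289"/>
      </port>
    </ports>
  </host>
</nmaprun>"""


def open_services(xml_text):
    """Yield (address, port, product, version) for every open port in the XML."""
    root = ET.fromstring(xml_text)
    for host in root.iter("host"):
        addr_el = host.find("address")
        addr = addr_el.get("addr") if addr_el is not None else "unknown"
        for port in host.iter("port"):
            state = port.find("state")
            if state is None or state.get("state") != "open":
                continue
            svc = port.find("service")
            product = svc.get("product", "") if svc is not None else ""
            version = svc.get("version", "") if svc is not None else ""
            yield addr, int(port.get("portid")), product, version


# Every scanner hit enters the worklist as unverified until a tester confirms it.
leads = [
    {"host": h, "port": p, "service": f"{prod} {ver}".strip(), "status": "unverified"}
    for h, p, prod, ver in open_services(SAMPLE_XML)
]
for lead in leads:
    print(lead)
```

On a real engagement you would read the file with `ET.parse("full.xml").getroot()` instead of the inline sample.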
Enumeration is the step within scanning that teams confuse. Scanning tells you a service is live; enumeration digs into it for the specifics an attacker acts on: usernames, SMB shares, exact build numbers, anonymous bind on LDAP. The classic false positive is a vulnerability scanner rating a backported patch as vulnerable because the banner still shows an old version. Distributions like Debian and Red Hat patch in place without bumping the visible version, so a tester always confirms against behavior, not the string. Skip that and your report fills with high-severity findings that a developer will close in five minutes, losing trust in everything else you wrote.
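The "behavior, not the string" rule is easy to encode as a gate in front of the report. This is a minimal sketch under assumed names: the `Finding` fields and the three triage labels are hypothetical, and the second sample finding is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    title: str
    banner_match: bool        # a scanner matched a version string
    behavior_confirmed: bool  # a tester reproduced the vulnerable behavior


def triage(finding):
    """Classify a finding by the strength of its evidence."""
    if finding.behavior_confirmed:
        return "report"
    if finding.banner_match:
        return "verify"  # version string only: could be a backported patch
    return "drop"


findings = [
    Finding("Apache path traversal", banner_match=True, behavior_confirmed=True),
    Finding("Outdated SSH banner", banner_match=True, behavior_confirmed=False),
]
for f in findings:
    print(f.title, "->", triage(f))
```

Only items that come back as `report` carry the kind of evidence a developer cannot close in five minutes.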
Exploitation turns a candidate vulnerability into real access. This might be SQL injection driven with sqlmap, an insecure deserialization bug, a misconfigured S3 bucket, or reused credentials from a leak. Against the Apache box above, the tester confirms the traversal actually reads files rather than trusting the banner:
$ curl --path-as-is "http://vpn-test.target.com/cgi-bin/.%2e/.%2e/.%2e/.%2e/etc/passwd"
root:x:0:0:root:/root:/bin/bash
deploy:x:1000:1000::/home/deploy:/bin/bash <- confirms read, not a false positive
The discipline here is proof without damage. A good tester demonstrates impact, pulling one file or one record, rather than dumping a production database. Every successful exploit is logged with the exact request, response, and repro steps, which is what makes the eventual validated findings defensible when an engineer disputes them.
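Logging "the exact request and response" is simple to automate. The sketch below checks that a response body really has the seven-field passwd layout, rather than being an error page, and stores a hash of it alongside the request. The `Evidence` record, helper names, and the placeholder request string are assumptions for illustration; the sample body reuses the two lines shown above.

```python
import hashlib
import re
from dataclasses import dataclass

# Seven colon-separated fields, with numeric UID and GID in positions three and four.
PASSWD_LINE = re.compile(r"^[^:\n]+:[^:\n]*:\d+:\d+:[^:\n]*:[^:\n]*:[^:\n]*$")


def looks_like_passwd(body):
    """True if every non-empty line has the passwd layout and a root entry exists."""
    lines = [ln for ln in body.splitlines() if ln.strip()]
    return (
        bool(lines)
        and all(PASSWD_LINE.match(ln) for ln in lines)
        and any(ln.startswith("root:") for ln in lines)
    )


@dataclass(frozen=True)
class Evidence:
    request: str          # the exact request as sent
    response_sha256: str  # fingerprint of the raw response body
    confirmed: bool       # did the response prove the read?


def record(request, body):
    """Build an evidence entry for one exploitation attempt."""
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return Evidence(request, digest, looks_like_passwd(body))


sample = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "deploy:x:1000:1000::/home/deploy:/bin/bash\n"
)
entry = record("curl --path-as-is <exact URL as sent>", sample)
print(entry.confirmed, entry.response_sha256[:12])
```

Storing a hash rather than the body keeps the log small; the raw response would still be saved separately under whatever evidence-handling rules the engagement sets.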
Exploitation is also where ethics and scope earn their keep. The moment a tester gets code execution or database access, they can cause real harm, so the rules of engagement defined at kickoff dictate exactly how far to go. A SQL injection gets proven by extracting the database version and one row, not by dumping every customer record, which would create a real breach you would have to disclose. If exploitation might destabilize production, the tester pauses and calls the contact rather than pressing on. That restraint is not timidity; it is what separates an authorized test from an incident.
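Scope is the one piece of the rules of engagement that is easy to enforce in code before any request goes out. The following is a minimal sketch, assuming scope is expressed as a list of domains and a list of networks; the `in_scope` helper and the sample values are hypothetical, and real rules of engagement cover far more than this (time windows, prohibited techniques, escalation contacts).

```python
import ipaddress


def in_scope(target, scope_domains, scope_networks):
    """True when target is an in-scope IP or a hostname under an in-scope domain."""
    try:
        ip = ipaddress.ip_address(target)
    except ValueError:
        # Not an IP literal, so treat it as a hostname and match on label boundaries.
        host = target.lower().rstrip(".")
        return any(host == d or host.endswith("." + d) for d in scope_domains)
    return any(ip in ipaddress.ip_network(n) for n in scope_networks)


# Hypothetical scope as agreed at kickoff.
SCOPE_DOMAINS = ["target.com"]
SCOPE_NETWORKS = ["203.0.113.0/24"]

for candidate in ["vpn-test.target.com", "target.com.evil.example", "198.51.100.7"]:
    verdict = "ok" if in_scope(candidate, SCOPE_DOMAINS, SCOPE_NETWORKS) else "STOP"
    print(candidate, "->", verdict)
```

Matching on `"." + domain` rather than a bare suffix is deliberate: it keeps a lookalike such as `nottarget.com` from passing as in scope.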
Post-exploitation answers the question the business actually cares about: now that I am in, how far can I go? The tester escalates privileges, harvests credentials, and moves laterally. On a Windows network that means BloodHound to map the shortest path to Domain Admin and Rubeus or Mimikatz to abuse Kerberos tickets and credentials in memory:
$ bloodhound-python -d target.local -u svc_scan -p ... -c All
[+] shortest path to DA: svc_scan -> CanRDP -> FILESRV01
-> LocalAdmin -> admin creds in memory -> DA (3 hops)
A low-severity SSRF that lets you reach the cloud metadata endpoint, steal IAM credentials, and pivot into the whole account is not low severity, and only post-exploitation reveals that. This maps directly to Active Directory penetration testing.
This is why post-exploitation, not exploitation, drives the real risk rating. The initial foothold is rarely the interesting part. The question the business pays to answer is blast radius: from this one compromised service account, can an attacker reach the crown jewels, and how many hops does it take? The tester also measures detection along the way. Did the blue team notice the BloodHound collection? Did the lateral RDP trigger an alert? A finding that says "we reached Domain Admin in three hops and your SOC never paged" is two findings: the path, and the detection gap that let it run silent.
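"How many hops" is a shortest-path question on a graph of access relationships. The sketch below runs a plain breadth-first search over a toy edge list modeled on the path shown earlier. It illustrates the idea only and makes no claim about how BloodHound computes paths; the node names `admin_session` and `PRINTSRV02` are invented for the example.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Breadth-first search over directed access edges; returns a node list or None."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Toy access graph: each edge means "whoever controls the source can reach the target".
edges = [
    ("svc_scan", "FILESRV01"),          # CanRDP
    ("FILESRV01", "admin_session"),     # LocalAdmin, admin credentials in memory
    ("admin_session", "Domain Admin"),
    ("svc_scan", "PRINTSRV02"),         # dead end
]
path = shortest_path(edges, "svc_scan", "Domain Admin")
print(" -> ".join(path), "| hops:", len(path) - 1)
```

Breadth-first search returns the fewest-hop route, which is the number a risk rating usually quotes; it does not account for how hard or how noisy each hop is.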
Reporting is the deliverable you actually pay for. The tester converts everything into a prioritized list, each finding carrying a CVSS score, evidence, repro steps, and a concrete fix. The best reports prioritize by real exploitability, often weighting CVSS with EPSS (exploitation probability) and the CISA KEV catalog so your team fixes what attackers actually use first.
One report-quality tell: if every finding is rated purely by CVSS base score with no exploitability context, the tester skipped the judgment you paid for. Real prioritization weighs reachability and the chained impact. A strong report leads with an executive summary a non-technical leader can act on (what is the risk, what does it cost us, what do we fix first) and a technical section an engineer can reproduce line by line.
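To make "exploitability context" concrete, here is one way a ranking could blend CVSS with EPSS, KEV membership, and reachability. The weights, the KEV floor of 0.9, and the sample findings are all illustrative assumptions, not a standard formula and not scores for real CVEs.

```python
def priority(cvss, epss, in_kev, reachable):
    """Blend severity with exploitability; the weights are illustrative assumptions."""
    if not reachable:
        return 0.0  # an unreachable finding cannot be exploited from this position
    score = (cvss / 10.0) * 0.5 + epss * 0.5
    if in_kev:
        score = max(score, 0.9)  # known-exploited issues jump the queue
    return round(score, 3)


# Hypothetical findings: (name, CVSS base score, EPSS probability, in KEV, reachable).
findings = [
    ("A", 9.8, 0.02, False, False),
    ("B", 7.5, 0.90, True, True),
    ("C", 9.1, 0.05, False, True),
]
ranked = sorted(findings, key=lambda f: priority(*f[1:]), reverse=True)
for name, *factors in ranked:
    print(name, priority(*factors))
```

Sorting by CVSS alone would put finding A first even though nothing can reach it, which is exactly the tell described above.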
What teams get wrong is treating the report as the end. It is the start of remediation and retest, and the retest of high-severity items should be part of the original deal, not an upsell. Many teams now augment point-in-time reports with continuous coverage. Agentic pentesting keeps validating fixes and new exposure between engagements, so the gap between this report and next year's does not become the window an attacker walks through.
The phases overlap constantly: they run roughly in order, but good testers loop. Fresh access in post-exploitation kicks off new recon against internal systems, and a single new subdomain found late in the test can reopen scanning days later. Think of it as a funnel that occasionally refills: recon and scanning widen the picture, exploitation narrows to what is breakable, and post-exploitation plus reporting measure and communicate the damage.
This looping is also why time estimates are ranges, not fixed blocks. A black box engagement shifts far more time into recon, while a gray box test with credentials in hand can start probing authorization on day one. The phases are a methodology, not a stopwatch.
If you want to practice the phases safely, build a lab you own and run them end to end. Stand up a vulnerable target like OWASP Juice Shop or a Metasploitable VM, then move deliberately: enumerate with nmap, crawl with Burp, exploit one bug with proof, and write the finding up as if a developer had to fix it. The skill that compounds is not memorizing tool flags; it is the discipline of confirming reachability before believing a finding and measuring blast radius after landing one. That habit is what separates a phase you ran from a phase you understood, and it is the same discipline a formal penetration testing process enforces on a real engagement.