
Verizon's 2024 Data Breach Investigations Report found that breaches starting with exploitation of a vulnerability jumped 180% year over year, fueled by mass attacks on unpatched edge devices and web apps. The uncomfortable part: most of those organizations had vulnerability scanners running. A scanner tells you a door might be unlocked. A penetration test sends someone to walk through it and show you what is on the other side.
This guide covers what penetration testing actually is, the types you can run, the five-phase process testers follow, and what a real finding looks like on paper. You will see annotated tool output, a sample report excerpt, and the places where teams quietly waste their budget. The goal is that you finish able to scope the right test for your risk.
Penetration testing is the practice of attacking your own systems, with written permission, to find and exploit security flaws the way a real adversary would. The deliverable is not a list of vulnerabilities. It is proof: which weaknesses an attacker can actually use, how they chain together, and what the business impact is (data theft, account takeover, lateral movement to Domain Admin).
Here is the distinction in practice. A scanner flags a host running an old library and rates it High off the version banner. A pentester checks whether the vulnerable code path is even reachable, builds a working payload, and shows it pulling a live session token. One is a guess; the other is evidence. The reason this matters is prioritization: a security team with 800 open scanner findings and a two-person backlog needs to know which five will actually get them breached, and only exploitation answers that. For where each fits, see penetration testing vs vulnerability scanning.
A tester combines automated tooling with human judgment. Scanners surface candidates fast, but a person decides what is a false positive, what is reachable, and what is genuinely exploitable. That judgment is the product. A junior tester who runs a scanner and pastes the output has delivered nothing you could not have bought for a subscription fee. A senior one tells you the exact request that breaks your app and the one-line change that fixes it.
Engagements are always authorized and scoped in writing. Testing without explicit permission is a crime in most jurisdictions, which is why a signed rules-of-engagement document and a defined scope come first, before a single packet goes out. Get that wrong and the most damaging finding could be your own legal exposure.
Before any tooling runs, a real engagement nails down scope and rules of engagement in writing. This is not paperwork theater. It defines exactly what is in bounds, what is off limits, the testing window, and what the tester does the moment they find a live breach already in progress. A minimal version reads like this:
In-scope: app.acme.com, api.acme.com, 10.0.0.0/24
Out-of-scope: billing.acme.com, all third-party SaaS
Window: Mon-Fri 18:00-06:00 UTC
Rules: no DoS, no data destruction
Stop & call: live breach, real PII access, instability
Authorized by: ___________ (name, title, date)
The single line that protects everyone is the authorization signature. Cloud adds a wrinkle: AWS lets you test most of your own resources without prior approval, but other providers and certain test types still need notification. We walk through the full kickoff in how to prepare for a penetration test.
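One way to keep a tool from wandering outside the signed scope is to encode the scope as data and check every target against it before anything is sent. The sketch below is a minimal illustration under assumptions: the hosts and network mirror the sample rules of engagement above, while the function name `is_in_scope` and the default-deny behavior are hypothetical choices, not part of any standard tool.

```python
import ipaddress

# Hypothetical scope definition mirroring the sample rules of engagement.
IN_SCOPE_HOSTS = {"app.acme.com", "api.acme.com"}
IN_SCOPE_NETWORKS = [ipaddress.ip_network("10.0.0.0/24")]
OUT_OF_SCOPE_HOSTS = {"billing.acme.com"}


def is_in_scope(target: str) -> bool:
    """Return True only if the target is explicitly listed as in scope."""
    target = target.strip().lower()
    # An explicit exclusion always wins.
    if target in OUT_OF_SCOPE_HOSTS:
        return False
    if target in IN_SCOPE_HOSTS:
        return True
    try:
        address = ipaddress.ip_address(target)
    except ValueError:
        # Not an IP and not a listed hostname: default deny.
        return False
    return any(address in network for network in IN_SCOPE_NETWORKS)


if __name__ == "__main__":
    for candidate in ["api.acme.com", "billing.acme.com", "10.0.0.57", "10.10.0.14"]:
        print(candidate, is_in_scope(candidate))
```

Default deny is the design choice that matters here: anything not named in the signed document is treated as off limits, which matches how the authorization itself works.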
The type of pentest is defined by the attack surface you point it at. Most programs run several over a year because each surface fails differently, and a tester who is excellent at web apps is not automatically strong on Active Directory or cloud IAM.
What teams get wrong is buying one annual network pentest and assuming the apps and APIs behind it are covered. They are not. A perimeter-focused test rarely touches the authenticated application logic where account takeover and payment tampering live. Scope each surface you actually expose.
The other common mistake is mismatched seniority. A cloud test is won or lost in the IAM layer: an over-permissive role, a wildcard policy, a function that can assume a more privileged role. A web specialist who has never read an AWS trust policy will miss the exact privilege-escalation path that matters most. When you scope, ask the vendor who specifically will run each surface and what they have broken before on that surface, not just the firm's logo wall.
Most engagements follow five phases: reconnaissance, scanning, exploitation, post-exploitation, and reporting. Recon gathers intel on the target. Scanning enumerates live hosts, ports, and versions. A typical opening sweep, and what it surfaces, looks like this:
$ nmap -sV -sC -p- --min-rate 2000 -oA scan 10.10.0.0/24
10.10.0.14
22/tcp open ssh OpenSSH 8.2p1
443/tcp open ssl/http nginx 1.18.0
8080/tcp open http Jetty 9.4.31 <- forgotten staging admin
9200/tcp open http Elasticsearch 6.8 (no auth) <- jackpot
That open Elasticsearch on 9200 with no auth is the line a tester circles immediately: a full data store reachable without a password. Exploitation turns that into proof, post-exploitation measures how far it reaches, and reporting translates everything into fixable findings.
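On a real /24 the sweep returns far more than four lines, so testers often triage the XML file that `-oA` writes rather than reading the terminal. The following is a rough sketch of that triage, not a definitive parser: `SAMPLE_XML` is a hand-written, trimmed approximation of nmap's XML output, and the `INTERESTING_SERVICES` watch list and both function names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical watch list of products a tester would want to look at first.
INTERESTING_SERVICES = {"elasticsearch", "jetty", "redis", "mongodb"}

# Hand-written sample shaped like nmap XML output, trimmed for brevity.
SAMPLE_XML = """<nmaprun>
  <host>
    <address addr="10.10.0.14" addrtype="ipv4"/>
    <ports>
      <port protocol="tcp" portid="22">
        <state state="open"/>
        <service name="ssh" product="OpenSSH" version="8.2p1"/>
      </port>
      <port protocol="tcp" portid="8080">
        <state state="open"/>
        <service name="http" product="Jetty" version="9.4.31"/>
      </port>
      <port protocol="tcp" portid="9200">
        <state state="open"/>
        <service name="http" product="Elasticsearch" version="6.8"/>
      </port>
      <port protocol="tcp" portid="3306">
        <state state="closed"/>
      </port>
    </ports>
  </host>
</nmaprun>"""


def open_services(nmap_xml: str) -> list[tuple[str, int, str]]:
    """Return (host, port, product) for every open port in the XML."""
    root = ET.fromstring(nmap_xml)
    results = []
    for host in root.iter("host"):
        address = host.find("address")
        if address is None:
            continue
        ip = address.get("addr", "")
        for port in host.iter("port"):
            state = port.find("state")
            if state is None or state.get("state") != "open":
                continue
            service = port.find("service")
            product = service.get("product", "") if service is not None else ""
            results.append((ip, int(port.get("portid")), product))
    return results


def flag_interesting(services: list[tuple[str, int, str]]) -> list[tuple[str, int, str]]:
    """Keep only services whose product name matches the watch list."""
    return [
        entry for entry in services
        if any(name in entry[2].lower() for name in INTERESTING_SERVICES)
    ]


if __name__ == "__main__":
    for ip, port, product in flag_interesting(open_services(SAMPLE_XML)):
        print(f"{ip}:{port} {product}")
```

A watch list only surfaces candidates. Whether the Elasticsearch instance really answers without credentials is still something the tester confirms by hand.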
The phases are not a rigid waterfall. A good tester loops: fresh access in post-exploitation opens new recon against internal systems, and a single subdomain found late can reopen scanning days later. The phase teams undervalue most is recon. The engagements that surface the worst bugs are almost always the ones where the tester spent day one mapping forgotten assets, like that 8080 staging admin, instead of jumping straight to the login page. We break down each step in the five phases of penetration testing, and the box model you pick (black, gray, or white) decides how much time recon eats, covered in black box vs white box vs gray box.
A finding is only useful if an engineer can reproduce it and a leader can prioritize it. That means a CVSS score, evidence, and exact repro steps, not a CVE dump. Here is how a single broken-access-control finding reads in a real report:
[High, CVSS 8.1] IDOR on GET /api/v1/invoices/{id}
Evidence: auth as user A, request invoice 8842 (owned by user B)
-> HTTP 200, returns full invoice + payment metadata, no 403
Impact: any authenticated user reads any tenant's billing data
Fix: enforce object ownership server-side, e.g.
WHERE invoice.id = :id AND invoice.tenant_id = :ctx.tenant
The fix line matters. Generic advice like "use strong access control" is useless to the developer who has to ship Tuesday. A concrete tenant-scoped query, or an enforced authorization check on the object, is something they can paste in. The findings table below shows how a handful of these get prioritized together.
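To make the tenant-scoped fix concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table layout, the `get_invoice` helper, and the tenant identifiers are assumptions made for the example; the real application's schema and framework are not described in the finding.

```python
import sqlite3


def get_invoice(conn: sqlite3.Connection, invoice_id: int, tenant_id: str):
    """Fetch an invoice only if it belongs to the caller's tenant.

    tenant_id must come from the authenticated session on the server,
    never from a request parameter the client controls.
    """
    return conn.execute(
        "SELECT id, tenant_id, amount_cents FROM invoices "
        "WHERE id = ? AND tenant_id = ?",
        (invoice_id, tenant_id),
    ).fetchone()  # None means "not found or not yours"


# Hypothetical data: invoice 8842 belongs to tenant B, as in the finding.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (id INTEGER PRIMARY KEY, tenant_id TEXT, amount_cents INTEGER)"
)
conn.execute("INSERT INTO invoices VALUES (8842, 'tenant-b', 129900)")
conn.execute("INSERT INTO invoices VALUES (8843, 'tenant-a', 4500)")

if __name__ == "__main__":
    print(get_invoice(conn, 8842, "tenant-a"))  # cross-tenant read is refused
    print(get_invoice(conn, 8842, "tenant-b"))  # the owner still gets the row
```

Putting the ownership condition in the query itself means the check cannot be skipped by a handler that forgets to call a separate authorization function.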
Severity is where weak reports fall apart. A finding rated purely on CVSS base score, with no note on whether it is reachable or what it chains into, tells you nothing about urgency. Strong reports weight CVSS with EPSS (the estimated probability that a flaw will be exploited in the wild within the next 30 days) and the CISA KEV catalog (flaws confirmed exploited in the wild), so a medium-CVSS bug that attackers are actively using outranks a high-CVSS one nobody can reach. If your report does not show that reasoning, you paid for a list, not an assessment.
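The weighting described above can be sketched as a small scoring function. Treat this strictly as an illustration: the formula, the KEV bonus, and the reachability penalty are assumptions chosen to show the ordering effect, not a published standard or any vendor's method, and the sample findings are invented.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    title: str
    cvss: float       # CVSS base score, 0.0 to 10.0
    epss: float       # EPSS probability, 0.0 to 1.0
    in_kev: bool      # listed in the CISA KEV catalog
    reachable: bool   # tester confirmed the vulnerable path is reachable


def priority(finding: Finding) -> float:
    """Illustrative priority score; every weight here is an assumption."""
    score = finding.cvss * (0.5 + finding.epss)
    if finding.in_kev:
        score += 10.0   # confirmed exploitation in the wild dominates
    if not finding.reachable:
        score *= 0.1    # unreachable code paths drop to the bottom
    return score


# Invented sample findings for the demonstration.
findings = [
    Finding("High CVSS, unreachable library", 9.8, 0.02, False, False),
    Finding("Medium CVSS, actively exploited", 6.1, 0.90, True, True),
    Finding("High CVSS IDOR, reachable", 8.1, 0.05, False, True),
]
ranked = sorted(findings, key=priority, reverse=True)

if __name__ == "__main__":
    for item in ranked:
        print(f"{priority(item):6.2f}  {item.title}")
```

With these assumed weights the actively exploited medium lands first and the unreachable 9.8 lands last, which is the ordering a base-score-only report would get backwards.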
Scanners miss anything that requires understanding intent. They match versions and patterns, so they never test business logic: a checkout that accepts a negative quantity to mint a refund, a password-reset token that is predictable, or a coupon endpoint with no rate limit. On a recent assessment of a logistics SaaS, we found exactly this kind of flaw, in the form of a chain. A standard user account could read its own user object, which leaked an internal UUID, and an adjacent role-assignment endpoint had no authorization check. Two findings that any scanner would rate Low individually combined into full admin takeover in under 30 requests.
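Both the negative-quantity bug and the unguarded role-assignment endpoint come down to a server-side check that was never written. The sketch below shows what those two checks might look like; the `User` class, the function names, and the "admin" role string are hypothetical, since the assessed application's code is not part of this guide.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    user_id: str
    roles: set[str] = field(default_factory=set)


def validate_quantity(quantity: int) -> int:
    """Reject quantities that would turn a purchase into a refund."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(quantity, bool) or not isinstance(quantity, int):
        raise ValueError("quantity must be an integer")
    if quantity < 1:
        raise ValueError("quantity must be at least 1")
    return quantity


def assign_role(actor: User, target: User, role: str) -> None:
    """Grant a role only when the caller is already an admin."""
    # Knowing the target's internal UUID must never be enough on its own.
    if "admin" not in actor.roles:
        raise PermissionError("only admins may assign roles")
    target.roles.add(role)


if __name__ == "__main__":
    standard = User("u-1001", {"user"})
    try:
        assign_role(standard, standard, "admin")
    except PermissionError as error:
        print("blocked:", error)
```

Neither check is sophisticated, which is the point: a tool matching versions and patterns has no way to know these rules were supposed to exist.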
Scanners cannot connect those dots because no signature exists for the chain: each finding looks minor in isolation, and only a person reasoning about how they combine sees the takeover. This is why a clean scan report is not a clean bill of health, and why teams increasingly add reasoning, not just more signatures, to the loop. We compare the approaches in DAST vs penetration testing vs agentic pentesting.
Run a full pentest at least annually and after any major change: a new feature, an infrastructure migration, or a merger. PCI DSS requires penetration testing at least annually and after significant changes; SOC 2 does not prescribe a pentest outright, but auditors commonly expect one as evidence. Annual testing alone still leaves long blind windows: a bug shipped the week after your test sits undetected until the next one.
That gap is why teams are moving toward continuous testing. AI-driven approaches like agentic pentesting keep probing your attack surface as it changes, so a risky deploy on Tuesday gets caught Tuesday, not at next year's audit. Use it to cover the gap between deep human-led engagements, not to replace them.
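At its simplest, watching an attack surface as it changes means comparing what is exposed now with what was exposed last time. The sketch below shows only that comparison step, under assumptions: the snapshots are hand-written sets of (host, port) pairs and `new_exposure` is a hypothetical helper, not a description of how any agentic pentesting product works.

```python
def new_exposure(
    previous: set[tuple[str, int]],
    current: set[tuple[str, int]],
) -> list[tuple[str, int]]:
    """Return (host, port) pairs that are open now but were not before."""
    return sorted(current - previous)


# Hypothetical snapshots taken before and after a Tuesday deploy.
monday = {("10.10.0.14", 22), ("10.10.0.14", 443)}
tuesday = {("10.10.0.14", 22), ("10.10.0.14", 443), ("10.10.0.14", 9200)}

if __name__ == "__main__":
    for host, port in new_exposure(monday, tuesday):
        print(f"new since last snapshot: {host}:{port}")
```

A diff like this only says that something changed. Deciding whether the new port is exploitable is the part that still needs testing.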
If you want to build the skill rather than buy it, the safest way to practice is a deliberately vulnerable target you own or are explicitly permitted to attack. Stand up OWASP Juice Shop or use a lab like HackTheBox or TryHackMe, point your tools at it, and reproduce the IDOR and traversal findings shown above end to end. The discipline you are practicing is not running the tool. It is reading the response and deciding whether it is real, which is exactly the judgment that separates a pentest from a scan. When you are ready for a real engagement, prepare properly first, covered in how to prepare for a penetration test.