
Pick up almost any professional web application pentest report and you will find findings tagged WSTG-INPV-05 or WSTG-ATHZ-02. That is the OWASP Web Security Testing Guide doing its job: a shared vocabulary so a tester in Berlin and a reviewer in Bangalore mean the exact same check when they write down a test ID. WSTG 4.2 catalogs roughly a hundred discrete tests across 12 categories, and each one carries an identifier that stays stable across minor revisions.
This guide is built around how you actually use the WSTG, not just what it lists. You will see how the categories and IDs work, how WSTG differs from the Top 10 and ASVS, what a single test looks like end to end with real request and response bytes, where scanners go blind, and how a mature finding pairs a WSTG ID with an OWASP Top 10 2021 mapping and a CVSS score.
The OWASP Web Security Testing Guide is a free, community-maintained framework that defines how to test a web application for security flaws end to end. It is published by the Open Worldwide Application Security Project, is currently at stable version 4.2 (released 2020), and has a 5.0 rewrite in active development on GitHub.
WSTG is descriptive and procedural, not a scanner you run. For each test it gives a summary of the issue, black-box and gray-box test steps, example payloads, and remediation notes. It assumes you are working through an application methodically rather than reading a tool's alert pane, which is exactly why it pairs with a hands-on web application pentesting checklist and a defined set of penetration testing steps and test cases.
WSTG groups its tests into 12 categories, each with a short code, and every individual test has a stable ID of the form WSTG-<CATEGORY>-<NUMBER>. The categories run roughly in the order you would test an app, from passive recon through to client-side issues, and the ID never changes between minor revisions, which is what makes coverage auditable.
For example, WSTG-INPV-01 is Testing for Reflected Cross Site Scripting, WSTG-INPV-05 is SQL Injection, WSTG-ATHN-03 is Weak Lockout Mechanism, and WSTG-SESS-02 is Cookie Attributes. The first eleven categories have existed since the 4.x line; API Testing (APIT) was formalized as the surface shifted toward services. If you test APIs heavily, treat APIT as a starting point and pair it with a dedicated API penetration testing methodology.
WSTG-INFO-* Information Gathering
WSTG-CONF-* Configuration & Deployment
WSTG-IDNT-* Identity Management
WSTG-ATHN-* Authentication
WSTG-ATHZ-* Authorization
WSTG-SESS-* Session Management
WSTG-INPV-* Input Validation
WSTG-ERRH-* Error Handling
WSTG-CRYP-* Cryptography
WSTG-BUSL-* Business Logic
WSTG-CLNT-* Client-side
WSTG-APIT-* API Testing
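To make the ID scheme concrete, here is a minimal Python sketch that validates an ID and resolves its category from the list above. The helper name parse_wstg_id and the two-digit number pattern are assumptions for illustration, based on the example IDs in this guide; they are not part of WSTG itself.

```python
import re
from typing import NamedTuple

# Category codes and names, copied from the list above.
CATEGORIES = {
    "INFO": "Information Gathering",
    "CONF": "Configuration & Deployment",
    "IDNT": "Identity Management",
    "ATHN": "Authentication",
    "ATHZ": "Authorization",
    "SESS": "Session Management",
    "INPV": "Input Validation",
    "ERRH": "Error Handling",
    "CRYP": "Cryptography",
    "BUSL": "Business Logic",
    "CLNT": "Client-side",
    "APIT": "API Testing",
}

# Assumption: four-letter category code and two-digit test number,
# matching IDs such as WSTG-INPV-05.
_ID_PATTERN = re.compile(r"WSTG-([A-Z]{4})-(\d{2})")


class WstgId(NamedTuple):
    category: str
    number: int
    category_name: str


def parse_wstg_id(text: str) -> WstgId:
    """Parse an ID such as 'WSTG-INPV-05'; raise ValueError if malformed."""
    match = _ID_PATTERN.fullmatch(text.strip().upper())
    if match is None:
        raise ValueError(f"not a WSTG test ID: {text!r}")
    code, number = match.group(1), int(match.group(2))
    if code not in CATEGORIES:
        raise ValueError(f"unknown WSTG category: {code}")
    return WstgId(code, number, CATEGORIES[code])
```

Rejecting unknown category codes at parse time is a cheap way to catch typos such as WSTG-AUTH-03 before they reach a report.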
The OWASP Top 10 is an awareness document that ranks the ten most critical web risk categories; WSTG is the methodology you use to actually find instances of those risks. They are complementary, not competing. The Top 10 answers "what should I worry about," WSTG answers "how do I test for it," and ASVS answers "what requirement must this app meet."
In practice you test with WSTG and report against both. A SQL injection found via WSTG-INPV-05 maps to A03:2021 Injection in the OWASP Top 10, and you attach a CVSS score for severity. That triple, WSTG ID plus Top 10 category plus CVSS, is what separates a mature finding from a vague claim, and it is the structure clients and auditors expect to see.
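The triple can be modeled as a small record. The Python sketch below is one possible shape, not a prescribed format: the Finding class, its field names, and the sample score are illustrative assumptions. The severity bands follow the CVSS v3.x qualitative rating scale.

```python
from dataclasses import dataclass


def cvss_severity(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


@dataclass(frozen=True)
class Finding:
    title: str
    wstg_id: str  # the test that produced the finding
    top10: str    # OWASP Top 10 2021 category
    cvss: float   # CVSS base score

    def summary(self) -> str:
        severity = cvss_severity(self.cvss)
        return f"[{severity} {self.cvss:.1f}] {self.title} ({self.wstg_id}, {self.top10})"


def rank(findings: list[Finding]) -> list[Finding]:
    """Order findings from highest to lowest CVSS score."""
    return sorted(findings, key=lambda f: f.cvss, reverse=True)


# Illustrative values only; the score is a placeholder, not a real assessment.
sqli = Finding("SQL injection in search", "WSTG-INPV-05", "A03:2021 Injection", 9.8)
print(sqli.summary())
```

Keeping all three identifiers on one record means a finding cannot reach the report with its test ID or Top 10 mapping missing.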
Take WSTG-INPV-01, Testing for Reflected XSS. You find a parameter that echoes into the response, inject a context-breaking probe, and capture the request and response together. The proof is not that your string came back; it is that it came back unencoded into an executable context. Here is the parameter q reflecting unencoded into a value attribute and breaking out of it:
GET /search?q=%22%3E%3Csvg%20onload%3Dalert(document.domain)%3E HTTP/1.1
Host: shop.target.tld
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
...
<input name="q" value=""><svg onload=alert(document.domain)>"> <-- payload broke out of the value="" attribute
That unencoded <svg onload> landing inside the markup, not as text, is the finding. Contrast it with a server that does its job: the same probe comes back inert, which is itself reportable assurance. Capturing both the request and the raw response, and pointing at the telltale line, is what makes a WSTG result reproducible rather than a screenshot of a popup.
GET /search?q=%22%3E%3Csvg%20onload%3Dalert(1)%3E HTTP/1.1
Host: shop.target.tld
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
...
<input name="q" value="&quot;&gt;&lt;svg onload=alert(1)&gt;"> <-- correctly entity-encoded, NOT a finding
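The raw-versus-encoded distinction can be triaged mechanically before a human confirms the context. This minimal Python sketch uses a hypothetical helper, classify_reflection. It assumes the server encodes with the same named entities that Python's html.escape produces. A server using numeric entities such as &#34; would need extra handling, and a "raw" result still needs a manual check that the reflection lands in an executable context.

```python
import html


def classify_reflection(payload: str, body: str) -> str:
    """Classify how a decoded probe string appears in a response body.

    Returns 'raw' when the payload is present unencoded, 'encoded' when only
    its HTML-entity-encoded form is present, and 'absent' otherwise.
    """
    if payload in body:
        return "raw"
    # html.escape with quote=True also encodes double and single quotes.
    if html.escape(payload, quote=True) in body:
        return "encoded"
    return "absent"


probe = '"><svg onload=alert(1)>'
vulnerable = '<input name="q" value=""><svg onload=alert(1)>">'
safe = '<input name="q" value="&quot;&gt;&lt;svg onload=alert(1)&gt;">'

print(classify_reflection(probe, vulnerable))
print(classify_reflection(probe, safe))
```

Logging the "encoded" result alongside the WSTG ID turns a clean response into evidence rather than an absence of evidence.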
Work the categories in order, map each test to a tool or manual technique, and log the WSTG ID against every result whether it passes, fails, or is not applicable. That last part matters: an auditable matrix of pass/fail/N-A per ID is what a client pays for, not a pile of scanner alerts. Start with Information Gathering (INFO), fingerprinting the stack with whatweb and Wappalyzer, review deployment configuration under CONF, then move into the authenticated surface.
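A pass/fail/N-A matrix is easy to keep honest with a few lines of code. The Python sketch below is illustrative: the Result enum, the coverage_report function, and the sample plan are assumptions, not a WSTG artifact. It counts outcomes and lists any planned ID that has no logged result.

```python
from collections import Counter
from enum import Enum


class Result(Enum):
    PASS = "pass"
    FAIL = "fail"
    NA = "n/a"


def coverage_report(planned: list[str], results: dict[str, Result]) -> dict:
    """Summarise logged results against the planned IDs and list gaps."""
    counts = Counter(results[test_id].value for test_id in planned if test_id in results)
    missing = [test_id for test_id in planned if test_id not in results]
    return {"counts": dict(counts), "missing": missing}


# Hypothetical engagement plan and partial results.
planned = ["WSTG-INFO-02", "WSTG-ATHZ-02", "WSTG-INPV-01", "WSTG-CRYP-01"]
results = {
    "WSTG-INFO-02": Result.PASS,
    "WSTG-INPV-01": Result.FAIL,
    "WSTG-CRYP-01": Result.PASS,
}

report = coverage_report(planned, results)
print(report["counts"], report["missing"])
```

An empty "missing" list at the end of the engagement is what distinguishes a completed matrix from a pile of tool output.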
The bulk of high-severity findings cluster in two places. INPV is where injection and XSS live (Burp Suite Intruder, sqlmap, ffuf for content discovery). ATHZ and BUSL are where the money is: insecure direct object references, horizontal privilege escalation, and workflow abuse that no payload reveals. For access control, lean on Burp's Autorize extension to replay each request as a lower-privileged session and diff the responses. WSTG is the manual backbone, but keeping that coverage current between point-in-time engagements is the case for agentic pentesting that chains automated and reasoning-based checks continuously.
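The replay-and-diff idea can be sketched without any tooling. The Python below is not Autorize's implementation; it is a simplified assumption of the comparison logic, with a hypothetical Response type and authz_verdict function, and it treats only 401 and 403 as an enforced denial.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    status: int
    body: str


def authz_verdict(privileged: Response, low_priv: Response) -> str:
    """Compare a replayed low-privilege response with the original.

    'enforced' : the replay was rejected with 401 or 403
    'bypassed' : same status and body, so the lower role saw the same data
    'review'   : anything else needs a human to look at it
    """
    if low_priv.status in (401, 403):
        return "enforced"
    if low_priv.status == privileged.status and low_priv.body == privileged.body:
        return "bypassed"
    return "review"


# Hypothetical responses for the same request replayed under two sessions.
admin = Response(200, '{"invoiceId": 1043, "customer": "acme"}')
print(authz_verdict(admin, Response(403, "Forbidden")))
print(authz_verdict(admin, admin))
```

The "review" bucket matters in practice: applications that return a 200 with an error page, or a redirect to login, defeat a status-code check and need a tester's judgment.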
WSTG forces you to test the categories scanners are structurally blind to: Authorization (ATHZ), Business Logic (BUSL), and Identity Management (IDNT), where every request is well-formed and only the context makes it abuse. A scanner can flag a missing HttpOnly flag under SESS, but it cannot reason that order #1043 belongs to another tenant, or that the password-reset flow lets you skip email verification.
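By contrast, the cookie-flag check really is mechanical, which is why scanners handle it well. A minimal Python sketch follows, with missing_cookie_flags as a hypothetical helper and a deliberately simple parse that splits a single Set-Cookie value on semicolons.

```python
def missing_cookie_flags(set_cookie: str) -> list[str]:
    """Return which of Secure, HttpOnly, SameSite are absent from a Set-Cookie value."""
    # Everything after the first ';' is an attribute; names are case-insensitive.
    attrs = {
        part.split("=", 1)[0].strip().lower()
        for part in set_cookie.split(";")[1:]
    }
    return [flag for flag in ("Secure", "HttpOnly", "SameSite") if flag.lower() not in attrs]


print(missing_cookie_flags("sid=abc123; Path=/; Secure"))
```

Nothing in this check requires knowing who owns the session, which is exactly the context an authorization or business-logic test depends on.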
On a recent assessment of a B2B logistics portal, the automated pass came back clean. Manually following WSTG-ATHZ-02, we incremented a numeric invoiceId in a JSON body and the API returned a different customer's billing record with a 200 and no error. That horizontal IDOR, a class WSTG also covers specifically under WSTG-ATHZ-04, was worth more than every INPV finding combined. Watch the false positives too: a parameter reflected into a sanitized template is not XSS, and a verbose stack trace under ERRH is only a finding if it discloses something exploitable. Map the negatives as carefully as the positives, because a passed WSTG-CRYP-01 with TLS 1.3 and HSTS is assurance the client can show an auditor.
A WSTG-anchored report ranks findings by CVSS, ties each to its test ID and Top 10 category, and gives evidence a reviewer can reproduce. The findings table below is the shape of what lands in the executive summary; the detail pages then carry the full request and response. For more on what reviewers expect to see, our guide on the key elements of a pentest report walks through the structure.
The trap is mistaking tool output for coverage. A clean Burp Scanner pass tells you the input-validation surface looks fine; it says nothing about WSTG-BUSL or WSTG-ATHZ, where the test is whether a valid request should have been allowed. Map every WSTG ID you intend to cover before you start, and you get a defensible matrix. For when to reach for each tool, see our breakdown of web application penetration testing tools.