
The fastest way to waste a week on an API pentest is to look for the one tool that does it all. There isn't one, and the reason is structural: the bugs that dominate API findings (BOLA, BFLA, mass assignment, broken logic) all depend on knowing who should have access, and no scanner has that model. So you build a kit by phase, with Burp Suite at the center as the intercepting proxy, and a handful of focused open-source tools filling discovery, spec fuzzing, token attacks, and GraphQL.
This guide names the tools we actually reach for, shows the real output each one produces, and is honest about where every one stops and a human has to take over. It is organized so you can map a tool to every phase of an API pentest checklist rather than collecting tools you never run.
For discovery you want tools that enumerate endpoints, versions, and parameters from both specs and brute force, because missed endpoints (and old versions) are where breaches hide. The goal is a complete inventory before you send a single payload.
The workhorses are ffuf for raw content discovery, Kiterunner for API-aware route brute forcing, and Postman for importing the spec into a working collection. Kiterunner earns its place over a generic fuzzer because it replays Assetnote's route database with the correct method and content type:
$ kr scan https://api.target.com -w routes-large.kite
GET 200 /api/v2/users/me
GET 403 /api/v2/admin/users # <- exists, gated: revisit in BFLA
POST 200 /api/v2/internal/users # <- a path-only GET fuzzer marks this 405 and drops it
GET 401 /api/v1/accounts/{id} # <- old version still live (API9)The 403 and that undocumented POST are the lines worth your attention. The practical move is to run ffuf for breadth, Kiterunner for API-aware depth, then diff both against the imported OpenAPI file to surface the routes that never made it into the docs.
Burp Suite is the default intercepting proxy for API testing, with mitmproxy as the scriptable, CLI-first alternative. You need a proxy to capture, modify, and replay every request and to feed traffic into the rest of your kit.
The single most useful Burp extension for API work is Autorize: it replays everything you do with a second account's token and flags any request that should have been a 403 but came back 200. That one extension is your fastest path to BOLA and BFLA. For mobile apps that ignore the system proxy, see intercepting proxy-unaware traffic.
For authentication you want token-focused tools, not a general scanner, because broken auth (API2) is the master key to every other category. Two tools carry the phase: jwt_tool for the structural attacks (alg:none, signature stripping, kid injection) and hashcat for cracking a weak HMAC secret offline.
$ hashcat -a 0 -m 16500 token.jwt rockyou.txt # -m 16500 = JWT mode
eyJhbGciOiJIUzI1NiJ9...<snip>:devsecret123 # <- secret recovered in seconds
Status...........: CrackedThat single recovered string lets you re-sign any claim you want, including "role":"admin". jwt_tool handles the rest of the battery and tells you immediately when a server honors an unsigned token. The thing no token tool decides for you is whether the escalated claim actually unlocks something sensitive, which is where the human reads the response.
For breadth across an entire API, drive the spec through Schemathesis, which generates test cases directly from an OpenAPI or GraphQL schema and catches 500s and contract violations at scale. It is the closest thing to push-button coverage, and it is excellent at flushing out the endpoints worth a manual look:
$ schemathesis run https://api.target.com/openapi.json \
--checks all -H "Authorization: Bearer <token>"
FAILED GET /api/v1/search status_code_conformance # 500 on empty q
FAILED POST /api/v1/orders response_schema # <- leaks internal_cost field (BOPLA lead)
2 passed, 2 failedThat response_schema failure is a lead: the endpoint returned a field the spec never declared, which is exactly the excessive-data-exposure pattern worth chasing by hand. Pair Schemathesis with Burp Intruder or ffuf for targeted parameter fuzzing on the endpoints it flags. Schemathesis finds the crash; you decide whether the crash is a vulnerability.
Two practical caveats save hours. First, Schemathesis is only as good as the spec, so an API whose OpenAPI file is stale or hand-edited will leave whole branches untested; feed it the spec you reconstructed from live traffic, not the one in the repo. Second, treat every 500 as a lead, not a finding. A stack trace in the body is a real issue (API8, security misconfiguration), but a generic 500 on malformed input is often just brittle validation. The tool gives you the haystack; the needle is still a human judgment call.
GraphQL needs its own toolchain because a single endpoint hides the entire schema behind introspection. The kit centers on fingerprinting, schema recovery, and IDE-style query building: graphw00f fingerprints the engine (Apollo, Hasura, graphql-ruby), InQL parses introspection inside Burp and generates queries, Clairvoyance recovers the schema even when introspection is disabled by abusing field-suggestion error messages, and GraphQL Voyager visualizes the type graph.
The workflow ties them together: graphw00f tells you the engine, an introspection query (or Clairvoyance, if it's off) recovers the schema, InQL turns that schema into ready-to-fire queries, and Voyager shows which types stitch into which so you know where nested-authorization gaps will live. From there the GraphQL-specific abuse (aliasing hundreds of operations into one request to defeat a rate limit, deeply nested queries for cost limits, batching to bypass login throttles) is hand-crafted in Repeater, because no scanner understands the schema relationships. Full coverage is in our GraphQL security testing guide, and the OWASP API methodology covers the why behind each test.
Automated scanners miss every bug that depends on knowing who should have access, which is most of the serious ones. A DAST tool fires payloads and grades responses; it has no model of tenancy, ownership, or business state, so it cannot tell that a 200 returned tenant B's invoice, and it will not invent a second account to compare against.
On a recent assessment, a client's nightly DAST run had been green for a year while a standard user token could read every other tenant's records by incrementing one integer. The scanner never had two accounts, so it never saw the bug. The findings table below shows which tool surfaced which class of finding on that engagement, and where the tool stopped and a human had to confirm.
On the open-source question: use mostly open source, with Burp Suite Professional as the one commercial anchor. The essential kit (ffuf, Kiterunner, mitmproxy, Schemathesis, jwt_tool, hashcat, InQL, Clairvoyance, graphw00f) is free and actively maintained. Spend money where it buys workflow (Burp Pro's Repeater, Intruder, extensions) and invest human or agentic time where judgment is required. Point-in-time tool runs also miss drift between tests, which is why the automated versus manual debate increasingly lands on continuous, agentic approaches for APIs.