Secure Code Review: A Complete Guide

Shubham JhaMarch 6, 20268 min read

Authors

Shubham Jha

TL;DR

✓Secure code review reads source to find injection, broken auth, and unsafe deserialization that black-box testing never reaches because the dangerous path sits behind a flag or an admin role.
✓The whole exercise is one algorithm: find a dangerous sink, walk backward to an attacker-controlled source, and check whether a real sanitizer sits in between.
✓Semgrep matches syntactic patterns cheaply in CI; CodeQL proves reachability across functions by modeling sources, sinks, and sanitizers as a data-flow query.
✓Most exploitable bugs cluster in a few sinks: SQL string concatenation, shell command execution, and untrusted deserialization like pickle.loads.
✓Review the diff at pull-request time and block merge only on high-confidence rules, or developers stop reading the output entirely.

Veracode's 2024 State of Software Security report found that around 80% of applications carry at least one flaw on first scan, and that flaws left unfixed have a half-life measured in months, not days. That backlog does not come from exotic zero-days. It comes from the same handful of sinks getting concatenated, exec'd, and deserialized in pull request after pull request, faster than anyone reads them. Secure code review is how you catch those before they merge.

This is a working guide, not a definition. You will trace a single tainted parameter from an HTTP source all the way to a SQL sink, see three vulnerable-versus-fixed pairs, read a real Semgrep rule and the CodeQL source/sink/sanitizer model, and wire the whole thing into a PR gate developers will actually trust. The OWASP Code Review Guide is the reference standard underneath all of it.

Table of contents

Every review is the same backward walk from sink to source
Which vulnerable-versus-fixed pairs show up constantly?
How do Semgrep and CodeQL actually find these?
A findings excerpt from a real review
What can scanners never see on their own?
How do you gate reviews at pull-request time?

Every review is the same backward walk from sink to source

Strip away the tooling and secure code review is one repeatable move: find a dangerous operation (a sink), walk backward through the code to where the data came from (a source), and decide whether anything trustworthy sanitized it on the way. If an attacker-controlled source reaches a sink with no sanitizer between them, you have a finding. That backward walk is faster than reading top to bottom because there are far fewer sinks than inputs.

Sources are anything an attacker influences: query parameters, headers, cookies, JSON bodies, uploaded files, queue messages, and even rows read back from a database that was poisoned earlier. Sinks are where that data does damage: a SQL statement, an OS command, an HTML response, a file path, a deserializer. Sanitizers are the trustworthy things in between: parameterized queries, output encoding, allow-list validation, safe library APIs. Here is the walk made concrete on one request, annotated line by line:

# 1. SOURCE   attacker controls the raw value
order_id = request.args.get("order")        # ?order=5 OR 1=1

# 2. PROPAGATION   value flows untouched into a string
q = "SELECT total FROM orders WHERE id = " + order_id

# 3. (no sanitizer: nothing validates or binds order_id)

# 4. SINK   the string is parsed as SQL by the driver
cursor.execute(q)        # <- 'OR 1=1' makes the WHERE always true

Reading it backward: the sink is line 4, the source is line 1, and the gap where a parameterized query should live is empty. That is the entire judgment call, repeated for every sink in the diff. This is the same mental model that threat modeling with STRIDE applies one layer up at the design, where a missing trust boundary predicts exactly this kind of missing sanitizer.

Why reviews catch what tests miss

~80%

of apps carry at least one flaw on first scan (Veracode 2024)

CWE-89

SQL injection, still a top sink in source review

CWE-502

unsafe deserialization, RCE from one bad loads()

100x

rough cost gap between fixing in PR vs in production

Which vulnerable-versus-fixed pairs show up constantly?

Most exploitable bugs are not subtle. They are the same three sinks fed raw input, and learning their before/after shapes lets you spot them in seconds. First, SQL built by string concatenation versus a bound parameter. The driver only stops parsing input as code when you bind it.

# VULNERABLE: userId is spliced into the SQL text
userId = request.args.get("id")
query = "SELECT * FROM accounts WHERE id = '" + userId + "'"
cursor.execute(query)              # injection: id = ' OR '1'='1

# FIXED: value travels as data, never parsed as SQL
cursor.execute("SELECT * FROM accounts WHERE id = %s", (userId,))

Second, command injection. os.system hands a single string to /bin/sh, so any shell metacharacter in the input runs. The fix is to skip the shell entirely and pass an argument list, which removes word-splitting and metacharacter parsing.

# VULNERABLE: "x.png; rm -rf /" runs as two shell commands
filename = request.form["file"]
os.system("convert " + filename + " out.png")

# FIXED: no shell, args passed as a list (no word-splitting)
subprocess.run(["convert", filename, "out.png"], shell=False, check=True)

Third, unsafe deserialization. pickle.loads will execute code embedded in a crafted payload, so any pickle crossing a trust boundary is remote code execution waiting to happen. Switch to a data-only format that has no code-execution semantics.

# VULNERABLE: a malicious pickle runs code on load
data = pickle.loads(request.data)        # RCE sink

# FIXED: json parses data only, never instantiates objects
data = json.loads(request.data)

The path-traversal and request-forgery sinks follow the same pattern of raw input reaching a dangerous API, which is why a finding here usually predicts a finding when you later run SSRF testing against the running system.

A secure code review pass, end to end

Scope

Pin the languages, frameworks, and highest-risk modules: auth, payments, file handling, deserialization.

Map entry points

List trust boundaries and where untrusted input enters: routes, uploads, queue consumers, third-party callbacks.

Automated pass

Semgrep in CI for syntactic sinks, CodeQL for reachability across functions.

Manual sink walk

Read backward from each flagged sink, plus auth and logic the tools cannot see.

Triage and gate

Confirm true positives, drop noise, score with CVSS, file fixes with paste-ready snippets.

How do Semgrep and CodeQL actually find these?

Semgrep matches patterns that read like the code itself, and CodeQL runs interprocedural data-flow queries against a database it builds from your code. Use Semgrep for cheap, fast CI gates on known-bad shapes, and CodeQL when you need to prove a tainted path actually connects across several functions before you believe the finding.

A real Semgrep rule for the pickle sink is small enough to read in one breath. It declares the language, severity, the message a reviewer sees, and the pattern to match:

rules:
  - id: python-unsafe-pickle-load
    languages: [python]
    severity: ERROR
    message: >
      pickle.loads on untrusted data allows remote code execution.
      Use json.loads or a schema-validated format instead.
    metadata:
      cwe: "CWE-502: Deserialization of Untrusted Data"
      owasp: "A08:2021 Software and Data Integrity Failures"
    patterns:
      - pattern-either:
          - pattern: pickle.loads(...)
          - pattern: pickle.load(...)

That rule is syntactic: it fires on the call no matter where the data came from, which is why it can produce false positives on a pickle of trusted local data. CodeQL solves that by asking the harder question, does attacker-controlled data actually reach this sink? You model three things and let its data-flow library connect them:

// CodeQL taint-tracking model (conceptual)
// SOURCE    where untrusted data enters
//   e.g. a Flask request.args / request.form access
// SINK      the dangerous operation
//   e.g. an argument to cursor.execute() or os.system()
// SANITIZER a node that makes data safe, stopping the flow
//   e.g. an int() cast, an allow-list check, parameter binding
//
// CodeQL reports a result only when a path exists from a
// SOURCE to a SINK with NO SANITIZER on it, tracing across
// function and file boundaries automatically.

The practical difference: Semgrep tells you the dangerous call exists; CodeQL tells you it is reachable by an attacker. Reach for the first to block obvious patterns at scale, the second to confirm a high-severity finding before you page someone. Both reflect the broader split between fast automation and careful judgment covered in automated versus manual penetration testing.

A findings excerpt from a real review

Findings only matter if they reach a developer with severity, evidence, and a fix they can paste. Below is the shape of what lands at the end of a review: each row pairs a CVSS-scored issue with the exact line of evidence and the config-level remediation, not advice to "validate input." The compare visual after this section shows a representative excerpt from one such report.

The remediation column is the part teams skip and the part that actually closes the bug. "Use a parameterized query" becomes cursor.execute(sql, params); "avoid the shell" becomes shell=False with an argument list; "do not deserialize untrusted data" becomes json.loads plus a schema check. A finding without that concrete snippet gets reopened next sprint.

Findings excerpt from a review report

Finding	Severity (CVSS)	Evidence	Remediation
SQL injection in order lookup	9.1 Critical	orders.py:42 cursor.execute on concatenated order_id	Bind parameters: cursor.execute(sql, (order_id,))
Command injection in image convert	8.8 High	media.py:88 os.system("convert "+filename)	subprocess.run([...], shell=False, check=True)
Untrusted deserialization	9.8 Critical	tasks.py:17 pickle.loads(request.data)	Replace with json.loads + schema validation
Broken object-level authorization	8.1 High	transfer.py:60 trusts fromAccount from body	Derive account from session, verify ownership
Hardcoded API secret	7.5 High	config.py:12 STRIPE_KEY = "sk_live_..."	Move to secrets manager, rotate the leaked key

What can scanners never see on their own?

SAST is blind to anything that depends on intent rather than syntax: broken authorization, business-logic abuse, and whether a flagged path is reachable at all. A green scan is not proof of safety; it is proof that no pattern matched. That gap is exactly where a human reviewer earns the time.

On a recent review of a fintech payments service, every Semgrep and CodeQL rule passed clean. Reading the transfer endpoint by hand, we found it trusted a fromAccount ID supplied in the request body and never checked it against the authenticated session, so any user could move money out of any account by changing one number. No tool flags that, because the code is syntactically perfect; only a reviewer who understands the trust model sees the missing ownership check. That class of broken object-level authorization overlaps directly with what surfaces during business logic and payment tampering testing on the running app.

The recurring human wins: authorization (the missing ownership check on an object), multi-step logic (a coupon applied twice, a state machine skipped), trust assumptions (a client-set role field believed server-side), and reachability (deciding whether a tool's flagged path can actually be hit). Budget your manual time on anything touching auth, crypto, deserialization, file handling, and money. Let the scanners own the rote sinks.

Strobes insight

A scanner points at sinks; it never makes the call. Every critical finding still needs a human to confirm the source is attacker-controlled and the sanitizer is genuinely missing, which is exactly the broken-auth bug no rule will ever match.

How do you gate reviews at pull-request time?

Run the security review on the diff, in the pull request, before merge, and block only on rules you trust completely. Reviewing the changed lines plus their immediate context is cheaper and more accurate than a quarterly full-repo audit, and the feedback lands while the author still has the change in their head.

A setup that holds up in practice: Semgrep runs on every PR and posts findings as inline comments, with merge blocked only on a curated high-confidence ruleset (the three sinks above, hardcoded secrets, disabled TLS verification). CodeQL runs on a schedule or on the default branch for the heavier interprocedural analysis, where a few minutes of runtime is acceptable. A human reviewer is auto-requested on any diff touching auth, crypto, deserialization, or query building, tool-flagged or not. Keep PRs small so the security-relevant line is not buried in a thousand-line refactor.

The failure mode to avoid is a noisy gate. The moment a scanner blocks merges on findings developers do not believe, they blanket-mute it and you are worse off than with no scanner. Tune the blocking set down, encode your framework's safe wrappers as custom rules so they stop firing, and feed every confirmed bug back as a regression rule. This per-change, continuous posture is the same one behind agentic pentesting, which keeps assessment running against every build instead of once a quarter.

Frequently asked questions

What is the difference between secure code review and SAST?

SAST is the automated tooling that scans source with static rules; secure code review is the human-led process that uses SAST as one input. SAST gives breadth and consistency, while manual review adds the judgment for authorization, business logic, and reachability that tools handle poorly. A mature program runs both and feeds confirmed bugs back as new rules.

What are sources and sinks in code review?

A source is any point where attacker-controllable data enters, such as a request parameter, header, or uploaded file. A sink is a dangerous operation where that data causes harm, like a SQL query, an OS command, or a deserializer. A vulnerability exists when data flows from a source to a sink with no sanitizer (parameter binding, output encoding, allow-list validation) in between.

Should I use Semgrep or CodeQL for secure code review?

Use Semgrep for fast, cheap CI gates on known-bad syntactic patterns, since its rules read like the code and run in seconds. Use CodeQL when you need to prove a tainted path is actually reachable across functions, because it builds a queryable database and runs interprocedural data-flow analysis. Many teams run Semgrep on every PR and CodeQL on the default branch.

When in the SDLC should code review happen?

Run the security review at pull-request time on the changed diff, before merge. Feedback is cheapest and most accurate while the author still has context, and it stops vulnerabilities from compounding. Periodic full-repo audits still help for legacy code that predates the practice, but the diff is where you get the most value per minute.

How do I reduce false positives in a SAST scan?

Confirm each finding by checking whether the source is genuinely attacker-controlled and the sanitizer is genuinely absent, then tune the ruleset. Encode your framework's safe wrappers as custom rules so they stop firing, suppress confirmed false positives with annotated reasons, and only block merges on high-confidence rules. A gate developers do not trust gets blanket-muted, which is worse than no gate.

Does secure code review replace penetration testing?

No. Code review reads logic you cannot always reach from outside, while penetration testing exercises the running system including configuration, deployment, and chained exploits. The two find overlapping but distinct issues, which is why mature programs do both and route findings between them. A broken-auth bug found in review, for example, is worth confirming with a live test.

Sources and references

Shubham Jha

Security Researcher, Strobes

Shubham Jha leads offensive security research at Strobes, focused on web and API exploitation and red team tradecraft.

Back to Blog

Application Security

Secure Code Review: A Complete Guide

Shubham JhaMarch 6, 20268 min read

Authors

Shubham Jha

TL;DR

✓Secure code review reads source to find injection, broken auth, and unsafe deserialization that black-box testing never reaches because the dangerous path sits behind a flag or an admin role.
✓The whole exercise is one algorithm: find a dangerous sink, walk backward to an attacker-controlled source, and check whether a real sanitizer sits in between.
✓Semgrep matches syntactic patterns cheaply in CI; CodeQL proves reachability across functions by modeling sources, sinks, and sanitizers as a data-flow query.
✓Most exploitable bugs cluster in a few sinks: SQL string concatenation, shell command execution, and untrusted deserialization like pickle.loads.
✓Review the diff at pull-request time and block merge only on high-confidence rules, or developers stop reading the output entirely.

Table of contents

Every review is the same backward walk from sink to source
Which vulnerable-versus-fixed pairs show up constantly?
How do Semgrep and CodeQL actually find these?
A findings excerpt from a real review
What can scanners never see on their own?
How do you gate reviews at pull-request time?

Every review is the same backward walk from sink to source

# 1. SOURCE   attacker controls the raw value
order_id = request.args.get("order")        # ?order=5 OR 1=1

# 2. PROPAGATION   value flows untouched into a string
q = "SELECT total FROM orders WHERE id = " + order_id

# 3. (no sanitizer: nothing validates or binds order_id)

# 4. SINK   the string is parsed as SQL by the driver
cursor.execute(q)        # <- 'OR 1=1' makes the WHERE always true

Why reviews catch what tests miss

~80%

of apps carry at least one flaw on first scan (Veracode 2024)

CWE-89

SQL injection, still a top sink in source review

CWE-502

unsafe deserialization, RCE from one bad loads()

100x

rough cost gap between fixing in PR vs in production

Which vulnerable-versus-fixed pairs show up constantly?

# VULNERABLE: userId is spliced into the SQL text
userId = request.args.get("id")
query = "SELECT * FROM accounts WHERE id = '" + userId + "'"
cursor.execute(query)              # injection: id = ' OR '1'='1

# FIXED: value travels as data, never parsed as SQL
cursor.execute("SELECT * FROM accounts WHERE id = %s", (userId,))

# VULNERABLE: "x.png; rm -rf /" runs as two shell commands
filename = request.form["file"]
os.system("convert " + filename + " out.png")

# FIXED: no shell, args passed as a list (no word-splitting)
subprocess.run(["convert", filename, "out.png"], shell=False, check=True)

# VULNERABLE: a malicious pickle runs code on load
data = pickle.loads(request.data)        # RCE sink

# FIXED: json parses data only, never instantiates objects
data = json.loads(request.data)

A secure code review pass, end to end

Scope

Pin the languages, frameworks, and highest-risk modules: auth, payments, file handling, deserialization.

Map entry points

List trust boundaries and where untrusted input enters: routes, uploads, queue consumers, third-party callbacks.

Automated pass

Semgrep in CI for syntactic sinks, CodeQL for reachability across functions.

Manual sink walk

Read backward from each flagged sink, plus auth and logic the tools cannot see.

Triage and gate

Confirm true positives, drop noise, score with CVSS, file fixes with paste-ready snippets.

How do Semgrep and CodeQL actually find these?

A real Semgrep rule for the pickle sink is small enough to read in one breath. It declares the language, severity, the message a reviewer sees, and the pattern to match:

rules:
  - id: python-unsafe-pickle-load
    languages: [python]
    severity: ERROR
    message: >
      pickle.loads on untrusted data allows remote code execution.
      Use json.loads or a schema-validated format instead.
    metadata:
      cwe: "CWE-502: Deserialization of Untrusted Data"
      owasp: "A08:2021 Software and Data Integrity Failures"
    patterns:
      - pattern-either:
          - pattern: pickle.loads(...)
          - pattern: pickle.load(...)

// CodeQL taint-tracking model (conceptual)
// SOURCE    where untrusted data enters
//   e.g. a Flask request.args / request.form access
// SINK      the dangerous operation
//   e.g. an argument to cursor.execute() or os.system()
// SANITIZER a node that makes data safe, stopping the flow
//   e.g. an int() cast, an allow-list check, parameter binding
//
// CodeQL reports a result only when a path exists from a
// SOURCE to a SINK with NO SANITIZER on it, tracing across
// function and file boundaries automatically.

A findings excerpt from a real review

Findings excerpt from a review report

Finding	Severity (CVSS)	Evidence	Remediation
SQL injection in order lookup	9.1 Critical	orders.py:42 cursor.execute on concatenated order_id	Bind parameters: cursor.execute(sql, (order_id,))
Command injection in image convert	8.8 High	media.py:88 os.system("convert "+filename)	subprocess.run([...], shell=False, check=True)
Untrusted deserialization	9.8 Critical	tasks.py:17 pickle.loads(request.data)	Replace with json.loads + schema validation
Broken object-level authorization	8.1 High	transfer.py:60 trusts fromAccount from body	Derive account from session, verify ownership
Hardcoded API secret	7.5 High	config.py:12 STRIPE_KEY = "sk_live_..."	Move to secrets manager, rotate the leaked key

What can scanners never see on their own?

Strobes insight

How do you gate reviews at pull-request time?

Frequently asked questions

What is the difference between secure code review and SAST?

What are sources and sinks in code review?

Should I use Semgrep or CodeQL for secure code review?

When in the SDLC should code review happen?

How do I reduce false positives in a SAST scan?

Does secure code review replace penetration testing?

Sources and references

Shubham Jha

Security Researcher, Strobes

Shubham Jha leads offensive security research at Strobes, focused on web and API exploitation and red team tradecraft.

Secure Code Review: A Complete Guide

Table of Contents

Authors

Share

Every review is the same backward walk from sink to source

Which vulnerable-versus-fixed pairs show up constantly?

How do Semgrep and CodeQL actually find these?

A findings excerpt from a real review

What can scanners never see on their own?

How do you gate reviews at pull-request time?

Frequently asked questions

Sources and references

Secure Code Review: A Complete Guide

Table of Contents

Authors

Share

Every review is the same backward walk from sink to source

Which vulnerable-versus-fixed pairs show up constantly?

How do Semgrep and CodeQL actually find these?

A findings excerpt from a real review

What can scanners never see on their own?

How do you gate reviews at pull-request time?

Frequently asked questions

Sources and references