Social Engineering Penetration Testing Guide

Likhil Chekuri · December 21, 2025 · 8 min read

Authors

Likhil Chekuri

TL;DR

✓ The Verizon DBIR consistently attributes the majority of breaches to a human element, which is why a test that skips people skips the most-used entry point.
✓ A pretext built from real OSINT (a current vendor, a real project name, a recent reorg) consistently beats a generic lure.
✓ The whole exercise maps to MITRE ATT&CK initial access, primarily Phishing (T1566), so the blue team can measure detection against the exact techniques you ran.
✓ Report rate and mean time to report matter more than click rate, because resilience is detection and response, not a perfect zero-click score.
✓ Everything runs under written rules of engagement, with aggregate reporting, no operational fraud, and a hard de-escalation rule on physical work.

Year after year, the Verizon Data Breach Investigations Report attributes the majority of breaches to a human element: a clicked link, a convincing phone call, someone holding a door. Attackers still prefer the path of least resistance, and that path runs through people. A security program that hardens servers but never tests the human layer is defending the wrong wall.

Social engineering penetration testing measures whether your people and processes hold up against those techniques, under written authorization and aggregate reporting so no individual is punished. This guide walks the engagement as it actually runs: building a believable pretext from open-source intelligence, standing up an authorized GoPhish campaign, mapping the work to MITRE ATT&CK, and reading the metrics that separate real resilience from a vanity zero-click number.

Table of contents

What is social engineering penetration testing?
How do you build a pretext from OSINT?
How do you run an authorized GoPhish campaign?
Social engineering is the initial-access stage of MITRE ATT&CK
Why do rules of engagement matter more here than anywhere else?
How do you measure whether the organization is actually improving?

What is social engineering penetration testing?

Social engineering penetration testing is an authorized assessment that uses deception to test whether employees and processes can be manipulated into granting access, revealing credentials, or performing harmful actions. It targets the human layer rather than (or before) the technical one, simulating how a real attacker gains an initial foothold across four channels: phishing, vishing, smishing, and physical pretexting.

On its own it produces a phishing or pretexting assessment; combined with technical exploitation and post-compromise activity it becomes a full red team assessment. It complements rather than replaces the technical work across the types of penetration testing an organization runs, because the human foothold is what the rest of the attack chain builds on.

How do you build a pretext from OSINT?

A pretext is the believable story behind the attack, and its quality is the single biggest predictor of success. You build it from open-source intelligence the target can verify, so when someone checks, the story holds. The aim is not to fabricate something elaborate; it is to anchor the lure to something real and current the recipient already half-expects.

A solid pretext build for an authorized engagement layers four verifiable signals into one coherent story:

PRETEXT BUILD (authorized engagement)
  Org structure   -> LinkedIn: target's manager, team, recent reorg
  Live context    -> press release: ongoing vendor migration (real project name)
  Tech footprint  -> job posts naming the exact SSO/VPN product in use
  Lookalike       -> registered domain matching the brand, valid TLS
  ----------------------------------------------------------------
  Lure            -> "IT: action required before the [real project] cutover"

The lure lands because every element checks out: the manager is real, the project is real, the SSO product is the one they actually use. Generic "your account is locked" blasts get reported; a message that references this week's real migration gets clicked. The same psychological levers (authority, urgency, familiarity) drive every channel; only the delivery changes between email, phone, and SMS.
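One way to hold that standard during pretext review is to record each signal with the evidence behind it and fail the draft if anything is unsourced or touches a hard exclusion. This is a minimal sketch under our own assumptions: the `Signal` type, the signal names, the project name "Atlas", and the `EXCLUDED_THEMES` keywords are all hypothetical, and the authoritative exclusion list comes from the signed rules of engagement, not a keyword set.

```python
from dataclasses import dataclass

# The four signal types from the pretext build above (names are illustrative).
REQUIRED_SIGNALS = {"org_structure", "live_context", "tech_footprint", "lookalike"}

# Hypothetical hard exclusions; the authoritative list lives in the signed RoE.
EXCLUDED_THEMES = ("layoff", "bereavement", "termination", "medical")


@dataclass(frozen=True)
class Signal:
    kind: str      # one of REQUIRED_SIGNALS
    claim: str     # what the lure asserts or relies on
    evidence: str  # OSINT artifact or engagement record backing it; "" if none


def review_pretext(signals: list[Signal], lure: str) -> list[str]:
    """Return review problems for a pretext draft; an empty list means it passes."""
    problems = []
    present = {s.kind for s in signals}
    for missing in sorted(REQUIRED_SIGNALS - present):
        problems.append(f"missing signal: {missing}")
    for s in signals:
        if not s.evidence.strip():
            problems.append(f"unsourced claim: {s.claim}")
    text = lure.lower()
    for theme in EXCLUDED_THEMES:
        if theme in text:  # crude substring match, a prompt for human review
            problems.append(f"excluded theme in lure: {theme}")
    return problems


draft = [
    Signal("org_structure", "manager leads the platform team", "LinkedIn profile"),
    Signal("live_context", "vendor migration underway", "press release"),
    Signal("tech_footprint", "SSO product in use", "job posting"),
    Signal("lookalike", "brand-matching domain with valid TLS", "engagement registration record"),
]
print(review_pretext(draft, "IT: action required before the Atlas cutover"))  # []
```

The keyword match is deliberately crude; it exists to force a second reviewer to look, not to decide on its own whether a lure is acceptable.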

Social engineering channels at a glance

Channel	Delivery	Primary control tested	ATT&CK mapping
Phishing	Email lure	Filtering, user reporting, MFA	T1566.001 / .002
Vishing	Phone call	Helpdesk identity verification	T1566.004 (Spearphishing Voice)
Smishing	SMS message	Mobile awareness, link filtering	T1566.003
Physical	On-site presence	Access control, reception, tailgating	T1078 / T1133 follow-on

How do you run an authorized GoPhish campaign?

For email phishing simulations, GoPhish handles template design, sending, landing pages, and per-target tracking of opens, clicks, and submissions. It runs as a single binary with an admin API, so an authorized campaign is mostly configuration: a sending profile pointed at your relay, an email template carrying a tracking pixel and link, a landing page, and an in-scope target group. You launch it, then read the results table per target.

$ ./gophish    # admin UI on :3333, then drive via API against the in-scope group

# Campaign results (GoPhish dashboard export)
Target               Email Sent   Opened   Clicked   Submitted Data   Reported
a.lee@inscope.test   yes          yes      yes       yes              no
m.ross@inscope.test  yes          yes      no        no               yes   <- reported it
j.kim@inscope.test   yes          no       no        no               no
Totals: 3 sent / 2 opened / 1 clicked / 1 submitted / 1 reported

The columns are the whole story: who opened, who clicked, who actually submitted credentials, and crucially who reported the message. The m.ross row is the win you want to see more of: opened but reported instead of clicking. Where the target has MFA, reverse-proxy phishing frameworks can capture session tokens and sidestep the second factor, which is precisely why the recommended fix in nearly every report is phishing-resistant authentication (FIDO2/WebAuthn). We keep operational tradecraft out of client reports; the point that lands is the control gap, not the recipe. This is the same continuous-validation thinking we cover for technical testing in our DAST versus pentesting versus agentic pentesting guide.
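The dashboard table above can be rebuilt from the campaign's raw timeline, which helps when results feed a report or a ticketing queue. This sketch assumes the JSON shape GoPhish returns from `GET /api/campaigns/:id/results` (a `results` list of targets with `email`, and a `timeline` list of events with `email` and `message`) and the event names in `STAGES`; verify both against your GoPhish version. Fetching the payload is left out so the parsing can be tested offline, and the sample data simply mirrors the three-target table.

```python
import json

# GoPhish timeline event names in funnel order (assumed; check your version).
STAGES = ["Email Sent", "Email Opened", "Clicked Link", "Submitted Data"]
REPORTED = "Email Reported"


def dashboard_rows(results_json: str) -> dict[str, dict[str, bool]]:
    """Rebuild the per-target yes/no columns from a campaign results payload."""
    payload = json.loads(results_json)
    furthest = {r["email"]: -1 for r in payload.get("results", [])}
    reported = dict.fromkeys(furthest, False)
    for event in payload.get("timeline", []):
        email, message = event.get("email"), event.get("message")
        if email not in furthest:
            continue  # campaign-level events carry no in-scope target
        if message in STAGES:
            furthest[email] = max(furthest[email], STAGES.index(message))
        elif message == REPORTED:
            reported[email] = True
    rows = {}
    for email, idx in furthest.items():
        # Cumulative: reaching a later stage implies every earlier one.
        row = {stage: i <= idx for i, stage in enumerate(STAGES)}
        row[REPORTED] = reported[email]
        rows[email] = row
    return rows


sample = json.dumps({
    "results": [{"email": "a.lee@inscope.test"}, {"email": "m.ross@inscope.test"},
                {"email": "j.kim@inscope.test"}],
    "timeline": [
        {"email": "", "message": "Campaign Created"},
        {"email": "a.lee@inscope.test", "message": "Email Sent"},
        {"email": "a.lee@inscope.test", "message": "Clicked Link"},
        {"email": "a.lee@inscope.test", "message": "Submitted Data"},
        {"email": "m.ross@inscope.test", "message": "Email Sent"},
        {"email": "m.ross@inscope.test", "message": "Email Opened"},
        {"email": "m.ross@inscope.test", "message": "Email Reported"},
        {"email": "j.kim@inscope.test", "message": "Email Sent"},
    ],
})
for email, row in dashboard_rows(sample).items():
    print(email, row)
```

Stages are treated as cumulative, so a click with no recorded open (common when images are blocked and the tracking pixel never loads) still counts as opened, which keeps the funnel monotonic.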

War story

On a logistics-firm engagement we skipped the generic lure entirely and anchored the email to a real vendor migration named in the company's own press release. Click rate jumped from a typical 8% to 31% in one send, purely because the pretext was verifiable.

Social engineering is the initial-access stage of MITRE ATT&CK

Social engineering maps to the MITRE ATT&CK initial-access tactic, which describes how adversaries get their first foothold. Phishing (T1566) is the headline technique, with sub-techniques for spearphishing attachment (T1566.001), link (T1566.002), via service (T1566.003), and voice (T1566.004). Framing the test against ATT&CK lets the blue team measure detection and response against the exact techniques you ran, not a vague "phishing" category.

That mapping also connects the human entry point to everything downstream: once initial access succeeds, a real adversary moves to execution, persistence, and lateral movement. Reporting the technique IDs lets the defenders trace which detections fired and which did not at each step. The continuity is why social engineering belongs inside an adversary-emulation program rather than as a standalone stunt, the same logic behind the staged approach in our red team methodology.

T1566.001 spearphishing attachment: a weaponized document as the lure.
T1566.002 spearphishing link: credential-capture or payload delivery via URL.
T1566.003 spearphishing via service: lures over LinkedIn, SMS, or chat rather than corporate email.
T1566.004 spearphishing voice: the pretext delivered by phone call, which is the vishing channel.
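To let defenders trace which detections fired, each simulated action can be tagged with its technique ID and rolled up per technique. A minimal sketch: the channel keys and the `Action` record are hypothetical, and the mapping follows the sub-techniques listed above, with Spearphishing Voice (T1566.004) for vishing.

```python
from dataclasses import dataclass

# Channel -> ATT&CK technique ID (channel keys are hypothetical labels).
TECHNIQUES = {
    "phishing_attachment": "T1566.001",
    "phishing_link": "T1566.002",
    "via_service": "T1566.003",  # LinkedIn, SMS, or chat lures
    "vishing": "T1566.004",      # Spearphishing Voice
}


@dataclass(frozen=True)
class Action:
    channel: str    # key into TECHNIQUES
    detected: bool  # did any defensive control alert on this action?


def coverage(actions: list[Action]) -> dict[str, tuple[int, int]]:
    """Return {technique_id: (detected, total)} for every technique exercised."""
    out: dict[str, tuple[int, int]] = {}
    for a in actions:
        tid = TECHNIQUES[a.channel]
        hit, total = out.get(tid, (0, 0))
        out[tid] = (hit + int(a.detected), total + 1)
    return out


def blind_spots(actions: list[Action]) -> list[str]:
    """Techniques that were run but never detected, sorted by ID."""
    return sorted(tid for tid, (hit, _) in coverage(actions).items() if hit == 0)


run = [Action("phishing_link", True), Action("phishing_link", False), Action("vishing", False)]
print(coverage(run))     # {'T1566.002': (1, 2), 'T1566.004': (0, 1)}
print(blind_spots(run))  # ['T1566.004']
```

Reporting per technique rather than per campaign is what lets the blue team say "link lures are detected half the time, voice not at all" instead of "phishing: partial".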

Why do rules of engagement matter more here than anywhere else?

Because social engineering targets people, it creates legal, ethical, and HR risk that technical testing usually does not. Written authorization is non-negotiable, and the rules of engagement must define exactly what is in scope, what is off-limits, and how the assessment protects the individuals involved. The aim is always to measure the organization, never to entrap or punish an individual, so findings are reported in aggregate.

At minimum, agree on the in-scope channels and target groups, hard exclusions (no threats, no real harm, no exploiting personal crises like layoffs or bereavement), how captured credentials and PII are stored and destroyed, and for physical work a safe word, live emergency contacts, and a get-out-of-jail letter. The first-person rule I never break on physical engagements: if challenged, de-escalate and produce the letter rather than push the pretext, because no finding is worth a confrontation with real security or police. Vishing against the helpdesk is the test organizations most underestimate; a confident caller who knows an employee's manager and start date can often talk an agent into a reset, which is why callback-based identity verification is the control that pays back fastest.
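Target scope is easiest to enforce mechanically: check every address against the signed rules of engagement before a campaign is created, and refuse to launch if anything fails. A minimal sketch with hypothetical names (`Scope`, `check_targets`) and example addresses; it matches domains exactly, so subdomains must be listed explicitly, which errs toward rejecting rather than over-sending.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    domains: frozenset[str]   # in-scope email domains from the signed RoE, lowercase
    excluded: frozenset[str]  # named opt-outs (e.g. staff on leave), lowercase


def check_targets(targets: list[str], scope: Scope) -> tuple[list[str], list[str]]:
    """Split a target list into (approved, rejected) before any campaign exists."""
    approved, rejected = [], []
    for addr in targets:
        normalized = addr.strip().lower()
        local, sep, domain = normalized.rpartition("@")
        in_scope = bool(sep and local) and domain in scope.domains
        if in_scope and normalized not in scope.excluded:
            approved.append(normalized)
        else:
            rejected.append(addr)
    return approved, rejected


scope = Scope(frozenset({"inscope.test"}), frozenset({"j.kim@inscope.test"}))
ok, bad = check_targets(
    ["A.Lee@inscope.test", "j.kim@inscope.test", "ceo@partner.example", "not-an-address"], scope
)
if bad:
    # Fail closed: fix the list or the RoE first; never silently launch a partial campaign.
    print("refusing to launch; out of scope:", bad)
```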

How do you measure whether the organization is actually improving?

Measure the funnel, not just the failures. For phishing, the core rates show how far an attack progressed, but the most important number is the report rate, because real resilience is detection and response, not a perfect score. A high click rate paired with fast, widespread reporting is a healthier outcome than a low click rate with near-zero reporting.

click rate          = unique clicks / messages delivered
credential rate     = credential submissions / messages delivered
report rate         = reports to security / messages delivered
mean time to report = average minutes from delivery to each recipient's first report
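The definitions above translate directly into code once each recipient's outcome is recorded. This sketch assumes one `Outcome` record per unique delivered message (so clicks are already de-duplicated), with hypothetical field names; the sample mirrors the three-target GoPhish table. It returns the median alongside the mean because a few late reporters can drag the mean well past what most reporters actually did.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean, median
from typing import Optional


@dataclass(frozen=True)
class Outcome:
    delivered_at: datetime
    clicked: bool
    submitted: bool
    reported_at: Optional[datetime]  # recipient's first report to security, if any


def funnel(outcomes: list[Outcome]) -> dict[str, Optional[float]]:
    """Apply the rate definitions to one record per delivered message."""
    delivered = len(outcomes)
    if delivered == 0:
        raise ValueError("no delivered messages to measure")
    minutes = [(o.reported_at - o.delivered_at).total_seconds() / 60
               for o in outcomes if o.reported_at is not None]
    return {
        "click_rate": sum(o.clicked for o in outcomes) / delivered,
        "credential_rate": sum(o.submitted for o in outcomes) / delivered,
        "report_rate": len(minutes) / delivered,
        "mean_minutes_to_report": mean(minutes) if minutes else None,
        "median_minutes_to_report": median(minutes) if minutes else None,
    }


t0 = datetime(2025, 1, 6, 9, 0)
campaign = [  # mirrors the three-target GoPhish table
    Outcome(t0, clicked=True, submitted=True, reported_at=None),
    Outcome(t0, clicked=False, submitted=False, reported_at=t0 + timedelta(minutes=8)),
    Outcome(t0, clicked=False, submitted=False, reported_at=None),
]
print(funnel(campaign))
```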

The trap is treating a 0% click rate as the goal; that only trains people to fear every email and teaches you nothing about response. Track trends across repeated tests rather than obsessing over one campaign, segment by department to target training, and feed results into both behavioral training and technical fixes: phishing-resistant MFA, better email authentication (DMARC, DKIM, SPF), and tighter filtering. Tie remediation into the same queue you use for findings across your other manual and automated testing, so the human layer is governed like every other control.
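Segmenting by department is where aggregate reporting is easiest to break: a two-person team's click rate effectively names both people. A minimal sketch that pools any department below a minimum size into "Other" and drops the pool if it is still too small; the threshold of 5 and the `(department, clicked, reported)` row shape are assumptions to set per engagement.

```python
from collections import defaultdict

MIN_GROUP = 5  # assumed smallest group size we will publish; set per engagement


def by_department(rows: list[tuple[str, bool, bool]],
                  min_group: int = MIN_GROUP) -> dict[str, dict[str, float]]:
    """rows are (department, clicked, reported) per recipient; returns per-group rates."""
    groups: dict[str, list[tuple[bool, bool]]] = defaultdict(list)
    for dept, clicked, reported in rows:
        groups[dept].append((clicked, reported))
    # Pool undersized departments so no published rate describes a handful of people.
    pooled: dict[str, list[tuple[bool, bool]]] = defaultdict(list)
    for dept, members in groups.items():
        pooled[dept if len(members) >= min_group else "Other"].extend(members)
    report = {}
    for dept, members in pooled.items():
        n = len(members)
        if n < min_group:
            continue  # even the pooled remainder is too small to publish
        report[dept] = {
            "recipients": n,
            "click_rate": sum(c for c, _ in members) / n,
            "report_rate": sum(r for _, r in members) / n,
        }
    return report
```

Suppressing rather than rounding small groups is the conservative choice: a rounded rate over three people still points at individuals.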

Phishing funnel: what good looks like

<10% click rate on a mature program
>60% report rate, the real target
<10 min mean time to first report
100% of results reported in aggregate, no individual named

Frequently asked questions

What is social engineering penetration testing?

It is an authorized security assessment that uses deception to test whether employees and processes can be tricked into granting access, sharing credentials, or taking harmful actions. It targets the human layer through phishing, vishing, smishing, and physical pretexting, simulating how real attackers gain an initial foothold, and it always runs under written rules of engagement with aggregate reporting.

What is the difference between phishing, vishing, and smishing?

Phishing is delivered by email, vishing by phone call, and smishing by SMS. All three use the same psychological levers (authority, urgency, familiarity), but each tests different controls: email filtering and user reporting for phishing, helpdesk identity verification for vishing, and mobile awareness plus link filtering for smishing. A good program tests more than one because attackers do.

How do you build a phishing pretext ethically?

Build it from open-source intelligence the target can verify (a real manager, a current project, the actual SSO product), then anchor the lure to something genuine the recipient already half-expects. Keep it strictly within the authorized scope, avoid exploiting personal crises, and never include real threats or coercion. The goal is to test a realistic scenario, not to maximize harm.

Can social engineering bypass multi-factor authentication?

Yes. Reverse-proxy phishing frameworks sit between the victim and the real login page and can capture the authenticated session token, which defeats most app-based and SMS MFA. The durable fix is phishing-resistant authentication such as FIDO2 or WebAuthn hardware keys, which bind the credential to the real domain and cannot be relayed.
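The domain binding comes from the browser, not the page: during a WebAuthn login the browser writes the real page origin into `clientDataJSON`, and the authenticator's signature covers a hash of it. A proxy on a lookalike domain either cannot obtain an assertion at all (the relying-party ID does not match its origin) or obtains one stamped with the lookalike origin, which the server rejects. The sketch below shows only that origin and challenge check, with hypothetical function names and example domains; a real verifier, ideally a maintained library, must also validate `rpIdHash`, flags, and the signature.

```python
import base64
import json


def b64url_decode(text: str) -> bytes:
    """Decode unpadded base64url, as WebAuthn uses for clientDataJSON."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def client_data_ok(client_data_b64: str, expected_origin: str, expected_challenge: str) -> bool:
    """Partial assertion check: type, challenge and origin recorded by the browser.

    Not a full verifier: rpIdHash, user-presence flags and the signature over
    authenticatorData || SHA-256(clientDataJSON) must also be validated.
    """
    data = json.loads(b64url_decode(client_data_b64))
    return (
        data.get("type") == "webauthn.get"
        and data.get("challenge") == expected_challenge
        and data.get("origin") == expected_origin
    )


def encode(obj: dict) -> str:
    """Test helper: build an unpadded base64url clientDataJSON."""
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()


genuine = encode({"type": "webauthn.get", "challenge": "c2VydmVy", "origin": "https://login.example.com"})
relayed = encode({"type": "webauthn.get", "challenge": "c2VydmVy", "origin": "https://login-example.com"})
print(client_data_ok(genuine, "https://login.example.com", "c2VydmVy"))  # True
print(client_data_ok(relayed, "https://login.example.com", "c2VydmVy"))  # False
```

The relayed case carries the same challenge the proxy fetched from the real site, so only the origin differs, and that single field is what app-based and SMS codes have no equivalent of.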

How is a social engineering test kept legal and ethical?

Through signed rules of engagement that define scope, target groups, and hard exclusions, plus a get-out-of-jail letter, safe word, and emergency contacts for physical work. Captured credentials and PII are encrypted, minimized, and destroyed on a set schedule, results are reported in aggregate so no individual is punished, and assessors de-escalate rather than push a pretext when challenged.

What metrics matter in a phishing assessment?

Click rate and credential-submit rate show how far an attack progressed, but the report rate and mean time to report are the most telling, because resilience is about detection and response. A program with a higher click rate but fast, widespread reporting is healthier than one with a low click rate and almost no reporting. Track trends across repeated tests, not a single campaign.

How does social engineering relate to MITRE ATT&CK?

It maps to the initial-access tactic, primarily Phishing (T1566) with sub-techniques for spearphishing attachment, link, via service, and voice. Mapping the test to ATT&CK lets the blue team measure detection against the exact techniques you ran and connects the human foothold to the downstream execution, persistence, and lateral movement an adversary would attempt next.

Likhil Chekuri

Application Security Engineer, Strobes

Likhil Chekuri is an AppSec engineer at Strobes who has run hundreds of web, mobile, and cloud penetration tests for regulated industries.
