
Year after year, the Verizon Data Breach Investigations Report attributes the majority of breaches to a human element: a clicked link, a convincing phone call, someone holding a door. Attackers still prefer the path of least resistance, and that path runs through people. A security program that hardens servers but never tests the human layer is defending the wrong wall.
Social engineering penetration testing measures whether your people and processes hold up against those techniques. It runs under written authorization, and results are reported in aggregate so no individual is punished. This guide walks through the engagement as it actually runs: building a believable pretext from open-source intelligence, standing up an authorized GoPhish campaign, mapping the work to MITRE ATT&CK, and reading the metrics that separate real resilience from a vanity zero-click number.
Social engineering penetration testing is an authorized assessment that uses deception to test whether employees and processes can be manipulated into granting access, revealing credentials, or performing harmful actions. It targets the human layer rather than (or before) the technical one, simulating how a real attacker gains an initial foothold across four channels: phishing, vishing, smishing, and physical pretexting.
On its own it produces a phishing or pretexting assessment; combined with technical exploitation and post-compromise activity it becomes a full red team assessment. It complements rather than replaces the technical work across the types of penetration testing an organization runs, because the human foothold is what the rest of the attack chain builds on.
A pretext is the believable story behind the attack, and its quality is the single biggest predictor of success. You build it from open-source intelligence the target can verify, so when someone checks, the story holds. The aim is not to fabricate something elaborate; it is to anchor the lure to something real and current the recipient already half-expects.
A solid pretext build for an authorized engagement layers four verifiable signals into one coherent story:
PRETEXT BUILD (authorized engagement)
Org structure -> LinkedIn: target's manager, team, recent reorg
Live context -> press release: ongoing vendor migration (real project name)
Tech footprint -> job posts naming the exact SSO/VPN product in use
Lookalike -> registered domain matching the brand, valid TLS
----------------------------------------------------------------
Lure -> "IT: action required before the [real project] cutover"
The lure lands because every element checks out: the manager is real, the project is real, and the SSO product is the one they actually use. Generic "your account is locked" blasts get reported; a message that references this week's real migration gets clicked. The same psychological levers (authority, urgency, familiarity) drive every channel; only the delivery changes between email, phone, and SMS.
For email phishing simulations, GoPhish handles template design, sending, landing pages, and per-target tracking of opens, clicks, and submissions. It runs as a single binary with an admin API, so an authorized campaign is mostly configuration: a sending profile pointed at your relay, an email template carrying a tracking pixel and link, a landing page, and an in-scope target group. You launch it, then read the results table per target.
$ ./gophish # admin UI on :3333, then drive via API against the in-scope group
# Campaign results (GoPhish dashboard export)
Target               Email Sent  Opened  Clicked  Submitted Data  Reported
a.lee@inscope.test   yes         yes     yes      yes             no
m.ross@inscope.test  yes         yes     no       no              yes   <- reported it
j.kim@inscope.test   yes         no      no       no              no
Totals: 3 sent / 2 opened / 1 clicked / 1 submitted / 1 reported
The columns are the whole story: who opened, who clicked, who actually submitted credentials, and crucially who reported the message. The m.ross row is the win you want to see more of: opened but reported instead of clicking. Where the target has MFA, reverse-proxy phishing frameworks can capture session tokens and sidestep the second factor, which is precisely why the recommended fix in nearly every report is phishing-resistant authentication (FIDO2/WebAuthn). We keep operational tradecraft out of client reports; the point that lands is the control gap, not the recipe. This is the same continuous-validation thinking we cover for technical testing in our DAST versus pentesting versus agentic pentesting guide.
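Before a results table goes into a report, you can cheaply sanity-check it by re-deriving the Totals line from the per-target rows. The sketch below assumes the whitespace-separated layout of the export shown above: an address followed by five yes/no columns, with any trailing note ignored. `TargetResult`, `parse_row`, `summarize`, and `reported_without_clicking` are hypothetical helpers, not part of GoPhish. A real export may use a different format, such as CSV, so treat the parser as an assumption to adapt.

```python
from dataclasses import dataclass


# Hypothetical in-memory shape of one results row; not GoPhish's own data model.
@dataclass(frozen=True)
class TargetResult:
    email: str
    sent: bool
    opened: bool
    clicked: bool
    submitted: bool
    reported: bool


def parse_row(line: str) -> TargetResult:
    """Parse 'address' plus five yes/no columns; trailing notes like '<- reported it' are ignored."""
    parts = line.split()
    if len(parts) < 6:
        raise ValueError(f"expected an address and five yes/no columns: {line!r}")
    flags = parts[1:6]
    if any(f not in ("yes", "no") for f in flags):
        raise ValueError(f"non yes/no value in row: {line!r}")
    return TargetResult(parts[0], *(f == "yes" for f in flags))


def summarize(rows: list[TargetResult]) -> str:
    """Rebuild the Totals line by counting True values at each funnel stage."""
    sent = sum(r.sent for r in rows)
    opened = sum(r.opened for r in rows)
    clicked = sum(r.clicked for r in rows)
    submitted = sum(r.submitted for r in rows)
    reported = sum(r.reported for r in rows)
    return (f"Totals: {sent} sent / {opened} opened / {clicked} clicked / "
            f"{submitted} submitted / {reported} reported")


def reported_without_clicking(rows: list[TargetResult]) -> list[str]:
    """The outcome to grow: targets who reported the message and never clicked."""
    return [r.email for r in rows if r.reported and not r.clicked]


# The three rows from the export above.
SAMPLE = [
    "a.lee@inscope.test yes yes yes yes no",
    "m.ross@inscope.test yes yes no no yes <- reported it",
    "j.kim@inscope.test yes no no no no",
]

if __name__ == "__main__":
    rows = [parse_row(line) for line in SAMPLE]
    print(summarize(rows))
    print(reported_without_clicking(rows))
```

On the sample rows, `summarize` reproduces the export's Totals line exactly. That makes it a useful check whenever rows are merged or edited by hand.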
Social engineering maps to the MITRE ATT&CK initial-access tactic, which describes how adversaries get their first foothold. Phishing (T1566) is the headline technique, with sub-techniques for spearphishing attachment (T1566.001), link (T1566.002), and via service (T1566.003). Framing the test against ATT&CK lets the blue team measure detection and response against the exact techniques you ran, not a vague "phishing" category.
That mapping also connects the human entry point to everything downstream: once initial access succeeds, a real adversary moves on to execution, persistence, and lateral movement. Reporting the technique IDs lets defenders trace which detections fired and which did not at each step. That continuity is why social engineering belongs inside an adversary-emulation program rather than as a standalone stunt; it is the same logic behind the staged approach in our red team methodology.
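One simple way to report against technique IDs is a coverage list: for each technique actually run, record whether the blue team detected it. The sketch below uses the three T1566 sub-technique IDs and names published in ATT&CK. The `detected` set is hypothetical input you would fill from the SOC's alert log. `coverage_report` and `detection_gaps` are illustrative names, not part of any ATT&CK tooling.

```python
# Sub-techniques of T1566 (Phishing) under the ATT&CK Initial Access tactic.
TECHNIQUES = {
    "T1566.001": "Spearphishing Attachment",
    "T1566.002": "Spearphishing Link",
    "T1566.003": "Spearphishing via Service",
}


def coverage_report(executed: set[str], detected: set[str]) -> list[str]:
    """One line per executed technique, marked detected or MISSED, sorted by ID."""
    unknown = (executed | detected) - set(TECHNIQUES)
    if unknown:
        raise ValueError(f"unmapped technique IDs: {sorted(unknown)}")
    return [
        f"{tid} {TECHNIQUES[tid]}: {'detected' if tid in detected else 'MISSED'}"
        for tid in sorted(executed)
    ]


def detection_gaps(executed: set[str], detected: set[str]) -> list[str]:
    """Techniques that were run but produced no detection: the remediation list."""
    return sorted(executed - detected)


if __name__ == "__main__":
    # Hypothetical engagement: link and attachment lures were sent; only the link was detected.
    executed = {"T1566.001", "T1566.002"}
    detected = {"T1566.002"}
    for line in coverage_report(executed, detected):
        print(line)
    print("gaps:", detection_gaps(executed, detected))
```

Keying everything on the technique ID, rather than a free-text "phishing" label, is what lets the same table extend to downstream steps once you add their IDs.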
Because social engineering targets people, it creates legal, ethical, and HR risk that technical testing usually does not. Written authorization is non-negotiable, and the rules of engagement must define exactly what is in scope, what is off-limits, and how the assessment protects the individuals involved. The aim is always to measure the organization, never to entrap or punish an individual, so findings are reported in aggregate.
At minimum, agree on four things: the in-scope channels and target groups; the hard exclusions (no threats, no real harm, no exploiting personal crises such as layoffs or bereavement); how captured credentials and PII are stored and destroyed; and, for physical work, a safe word, live emergency contacts, and a get-out-of-jail letter. The one rule I never break on physical engagements: if challenged, de-escalate and produce the letter rather than push the pretext, because no finding is worth a confrontation with real security or police. Vishing against the helpdesk is the test organizations most underestimate. A confident caller who knows an employee's manager and start date can often talk an agent into a reset, which is why callback-based identity verification is the control that pays back fastest.
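Scope is easiest to enforce before anything is sent. The sketch below is a hypothetical pre-launch check, not a GoPhish feature. It assumes the target-related parts of the rules of engagement reduce to two things: a set of in-scope email domains, and the named individuals the client has excluded (for example, staff on bereavement leave). Any target that fails either test is rejected with a reason, so the rejections can be reviewed rather than silently dropped.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RulesOfEngagement:
    """Hypothetical, simplified scope: in-scope domains plus excluded individuals."""
    in_scope_domains: set[str]
    excluded_addresses: set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        # Lowercase everything so comparisons are case-insensitive.
        self.in_scope_domains = {d.lower() for d in self.in_scope_domains}
        self.excluded_addresses = {a.lower() for a in self.excluded_addresses}


def preflight(targets: list[str],
              roe: RulesOfEngagement) -> tuple[list[str], list[tuple[str, str]]]:
    """Split targets into (approved, rejected-with-reason) before a campaign launches."""
    approved: list[str] = []
    rejected: list[tuple[str, str]] = []
    for raw in targets:
        addr = raw.strip().lower()
        local, at, domain = addr.rpartition("@")
        if not at or not local:
            rejected.append((raw, "malformed address"))
        elif domain not in roe.in_scope_domains:
            rejected.append((raw, "out-of-scope domain"))
        elif addr in roe.excluded_addresses:
            rejected.append((raw, "excluded by rules of engagement"))
        else:
            approved.append(addr)
    return approved, rejected


if __name__ == "__main__":
    roe = RulesOfEngagement({"inscope.test"}, {"ceo@inscope.test"})
    ok, blocked = preflight(["a.lee@inscope.test", "ceo@inscope.test", "x@other.test"], roe)
    print("approved:", ok)
    print("rejected:", blocked)
```

Returning rejected targets with a reason, instead of filtering them out, leaves an audit trail. That trail shows the exclusion list was honored, which matters if an individual later asks whether they were tested.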
Measure the funnel, not just the failures. For phishing the core rates show how far an attack progressed, but the most important number is the report rate, because real resilience is detection and response, not a perfect score. A high click rate paired with fast, high reporting is a healthier outcome than a low click rate with near-zero reporting.
click rate = unique clicks / messages delivered
credential rate = credential submissions / messages delivered
report rate = reports to security / messages delivered
median time to report = median minutes from delivery to each reporter's first report
The trap is treating a 0% click rate as the goal; that only trains people to fear every email and teaches you nothing about response. Track trends across repeated tests rather than obsessing over one campaign, segment by department to target training, and feed results into both behavioral training and technical fixes: phishing-resistant MFA, better email authentication (DMARC, DKIM, SPF), and tighter filtering. Tie remediation into the same queue you use for findings across your other manual and automated testing, so the human layer is governed like every other control.
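The metric definitions translate directly into code. This is a minimal sketch that assumes you already have per-campaign counts and, for each reporting target, the minutes between delivery and their first report. The department breakdown also assumes you can join results to a department field, for example from HR data. `phishing_metrics` and `report_rate_by_department` are illustrative names.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import median


def phishing_metrics(delivered: int, unique_clicks: int, submissions: int,
                     reports: int, minutes_to_report: list[float]) -> dict[str, float | None]:
    """Funnel rates over delivered messages, plus median time to report."""
    if delivered <= 0:
        raise ValueError("delivered must be positive")
    return {
        "click_rate": unique_clicks / delivered,
        "credential_rate": submissions / delivered,
        "report_rate": reports / delivered,
        # No reports means no time to report, rather than a misleading zero.
        "median_minutes_to_report": median(minutes_to_report) if minutes_to_report else None,
    }


def report_rate_by_department(results: list[tuple[str, bool]]) -> dict[str, float]:
    """results: one (department, reported) pair per delivered message."""
    delivered: dict[str, int] = defaultdict(int)
    reported: dict[str, int] = defaultdict(int)
    for dept, did_report in results:
        delivered[dept] += 1
        reported[dept] += int(did_report)
    return {dept: reported[dept] / delivered[dept] for dept in sorted(delivered)}


if __name__ == "__main__":
    # Counts from the three-target campaign above, treating every sent message as delivered;
    # the 12-minute report time is a made-up value for illustration.
    print(phishing_metrics(3, 1, 1, 1, [12]))
    print(report_rate_by_department([("finance", True), ("finance", False), ("it", False)]))
```

Returning `None` instead of 0 when nobody reported keeps the absence of response visible, which is exactly the failure mode the report rate is meant to surface.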