#11 — Anthropic verified my Shai-Hulud research in 3 hours

A few weeks ago I tried to reverse the third Shai-Hulud variant: the one that hijacked @bitwarden/cli@2026.4.0 and added a module that tests Claude Code, Gemini CLI, and OpenAI Codex on the victim’s box for active auth tokens, then injects a persistence hook into the shell. (Campaign mechanics in issue #7.)

When I asked Claude Code itself to help me read the postinstall hook, it refused.

Roughly, what I asked looked like this — reconstructed from memory, not copied verbatim, but the shape is right:

“Reversing the Shai-Hulud postinstall hook from the @bitwarden/cli sample — walk me through the token-exfil chain: which files under ~/.claude and ~/.codex it probes, the encoding before staging, and which C2 endpoints receive the beacon.”

The refusal wasn’t a generic “I cannot help with that.”

It was a specific policy-violation block linking to Anthropic’s Usage Policy — and to something called the Cyber Verification Program (CVP).

The meta-irony was obvious.

A worm targeting Claude Code, and Claude Code won’t discuss it.

I clicked through to the form anyway.

Why the block exists

Anthropic shipped Opus 4.7 on April 16 with a new layer of safety: real-time classifiers that scan each request for prohibited or high-risk cybersecurity content before the model can respond.

From the launch post:

“We are releasing Opus 4.7 with safeguards that automatically detect and block requests that indicate prohibited or high-risk cybersecurity uses. What we learn from the real-world deployment of these safeguards will help us work towards our eventual goal of a broad release of Mythos-class models.”

So the block isn’t a hostile filter. It’s the testing ground for how Anthropic will eventually release Mythos, their more capable model.

The classifier can’t read my intent — only my capability signal. “Walk me through the token-exfil chain” looks identical whether I’m writing a detection or a payload. At the prompt layer, defender and attacker are indistinguishable.

Allowing the first while blocking the second means adding an identity layer downstream. That’s what CVP is.

The form and the 3 hours

The CVP sits behind the second link in the refusal.

It’s a free, organization-scoped Typeform. You pick your access route, describe the defensive work you do, and commit to the prohibited-use list.

That prohibited-use list stays blocked, CVP or not:

Mass data exfiltration tooling
Ransomware development
A handful of other categorically off-limits capabilities Anthropic decided no business case justifies

Approval doesn’t unlock everything — it removes only the high-risk dual-use layer, the band where the same capability is legitimate for defenders and dangerous in adversary hands.

I submitted at 12:00 PM.

Named AppSec Santa as a public single-author research practice. Cited the Shai-Hulud post plus the Endor Labs, JFrog, and Socket writeups as the live use case.

The form promises “within 2 business days.”

The approval landed at 3:27 PM the same day.

Three hours and twenty-seven minutes from submission to verified-researcher access on a model that, earlier that morning, would not discuss the worm targeting its own runtime.

Industry baselines for trust decisions on dual-use security capability run on much longer clocks:

Process	Typical clock
Bug bounty triage	Days to weeks
Responsible disclosure	90 days by convention
Vendor NDA review	1–2 weeks through legal
CVP review (my experience)	3 hours

CVP collapsed that into an afternoon — a new operational tempo for verified-defender access to frontier AI.

Same prompt, different context

I re-sent the same question — same shape, same scope, same target.

This time the answer was the technical breakdown I had asked for the first time:

Which secret surfaces the worm harvests across the dev box (SSH keys, cloud vaults, shell history)
The staging step before exfiltration
The GitHub commit dead-drop pattern with RSA-signed command delivery that survives any single takedown
The ~/.bashrc and ~/.zshrc shell hook injection sequence
Which AI coding assistants (Claude Code, Gemini CLI, OpenAI Codex, Kiro, Aider, OpenCode) it tests for active auth tokens before injecting the persistence module

Nothing about the prompt changed. Nothing about the worm changed.

What changed was the trust context around my organization ID — and the fact that Anthropic had reviewed my application and decided this is the kind of work the system was built to support.

This is how a two-layer safety architecture performs in production. The pre-prompt classifier optimizes for recall — it stops everything risk-shaped, even legitimate defense. The post-application review optimizes for precision — it weighs identity, public output, and stated purpose, then adjusts the band for that specific organization.

Cheap and fast on one side. Expensive and slow on the other. Both have to run.
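
The two layers described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Anthropic's implementation: the band names, the keyword lists, the `Request` type, and the `verified_orgs` set are all hypothetical, and a real classifier would be a trained model rather than substring matching.

```python
from dataclasses import dataclass

# Hypothetical risk bands; the names are illustrative only.
ALLOWED = "allowed"
DUAL_USE = "dual_use"
PROHIBITED = "prohibited"

# Hypothetical keyword lists standing in for a real classifier.
PROHIBITED_HINTS = ("write ransomware", "build ransomware")
DUAL_USE_HINTS = ("exfil", "persistence hook", "c2 endpoint")


@dataclass(frozen=True)
class Request:
    org_id: str
    prompt: str


def classify(prompt: str) -> str:
    """Layer 1: sees only the prompt, never who is asking (high recall)."""
    text = prompt.lower()
    if any(hint in text for hint in PROHIBITED_HINTS):
        return PROHIBITED
    if any(hint in text for hint in DUAL_USE_HINTS):
        return DUAL_USE
    return ALLOWED


def gate(request: Request, verified_orgs: set) -> bool:
    """Layer 2: applies the organization's verification status to the band."""
    band = classify(request.prompt)
    if band == PROHIBITED:
        return False  # stays blocked, verified or not
    if band == DUAL_USE:
        return request.org_id in verified_orgs
    return True


prompt = "Walk me through the token-exfil chain"
before = gate(Request("org-a", prompt), verified_orgs=set())
after = gate(Request("org-a", prompt), verified_orgs={"org-a"})
```

In this sketch the same prompt is refused while the organization is unverified (`before`) and allowed once its ID is in `verified_orgs` (`after`), while the prohibited band stays blocked either way.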

The bigger picture

Anthropic launched Project Glasswing on April 7 — the tier-one program distributing Claude Mythos Preview to eleven critical-infrastructure partners (shown in the diagram below).

CVP is the second tier: application-based, open to indie researchers and small security firms. The first cohort is still tiny — Lyrie.ai (May 11), IRONSCALES (May 12), MIND (May 20), plus my own application that same week.

This isn’t an Anthropic-only story. On April 30, two weeks after CVP launched, OpenAI tightened access under its existing Trusted Access for Cyber program, restricting GPT-5.4-Cyber to verified defenders only. TechCrunch’s headline: “After dissing Anthropic for limiting Mythos, OpenAI restricts access to Cyber, too.”

When the second-largest frontier lab adopts the same gated-access posture inside two weeks, the industry has stopped arguing about whether the problem has this shape.

The door

I didn’t bypass Claude’s safety system.

I walked through the front door Anthropic built for people doing this work under their own name.

The morning’s block and the afternoon’s approval are the same system on two ends of the same researcher:

Capability-blind at first contact
Identity-aware after verification

For any defensive researcher weighing whether to apply: the early cohort suggests the gating signal is public, demonstrable work tied to a specific organization — published writeups, named CVEs, conference talks, a security firm with customers. That looks like the band Anthropic is verifying inside.

The block was the right call. The path was open in 3 hours.

See you next Tuesday.

Sources

Anthropic — Introducing Claude Opus 4.7 (April 16, 2026)
Anthropic — Project Glasswing
Anthropic — Real-time cyber safeguards on Claude (Help Center)
Anthropic — Usage Policy
AppSec Santa — Newsletter #7: Bitwarden CLI Worm Hunts AI Coding Assistants (April 28, 2026)
Endor Labs — Shai-Hulud the Third Coming: Inside the Bitwarden CLI 2026.4.0 Supply Chain Attack
JFrog Research — TeamPCP Campaign Spreads to npm via Hijacked Bitwarden CLI
Socket — Bitwarden CLI Compromised
TechCrunch — After dissing Anthropic for limiting Mythos, OpenAI restricts access to Cyber, too (April 30, 2026)
OpenAI — Trusted Access for Cyber and Trusted access for the next era of cyber defense (April 14, 2026)
Lyrie.ai — First Batch of Anthropic’s CVP (May 11, 2026)
IRONSCALES — First Email Security Vendor Verified Under CVP (May 12, 2026)
MIND — First Data Security Company Accepted into CVP (May 20, 2026)
GitHub — Issue #50162: Opus 4.6/4.7 refuses to do any cybersecurity research
