Shield Blog Audit Registry GitHub

What the Claude Code Source Leak Teaches Us About AI Agent Security

By Joerg Michno · April 3, 2026 · 8 min read

512K

lines of TypeScript leaked via a missing .npmignore entry

On March 31, 2026, Anthropic accidentally shipped a 59.8 MB source map file inside the @anthropic-ai/claude-code npm package. Within hours, the complete client-side source code was extracted, analyzed, and forked 82,000+ times on GitHub — making it the fastest-growing repository in GitHub history.

The irony is hard to miss: a company that positions its AI as a code review expert leaked its own source code through a packaging oversight that any linter could have caught.

But the real story isn't the leak itself. It's what happened after — and what it reveals about the state of AI agent security.

What Was Inside

The source map contained 1,906 files spanning the complete agentic harness — the system that orchestrates Claude's tool use, permissions, and execution flow. Key findings:

Feature flags for unreleased capabilities

1,906

Files including all system prompts

Undercover Mode

~90 lines of code that strip all Anthropic references when Claude operates in public repositories. Removes "Co-Authored-By" attributions, hides the Claude Code identity. An AI operating in open-source projects without disclosing its nature.

Anti-Distillation via Decoy Tools

A feature flag (tengu_anti_distill_fake_tool_injection) that injects fake tool definitions into system prompts. Purpose: poison training data for competitors who scrape Claude's outputs to train their own models.

Internal Codenames

"Capybara" (Claude 4.6 variant), "Fennec" (Opus 4.6), "Numbat" (unreleased) — revealing Anthropic's model pipeline months ahead of announcements.

The Security Cascade

Within 72 hours of the leak, security researchers found real vulnerabilities in Claude Code's architecture. Two CVEs were assigned:

March 31 — The Leak

Source map discovered in npm package v2.1.88 by researcher Chaofan Shou. Root cause: Bun runtime generates source maps by default; .npmignore didn't exclude them.

April 1 — CVE-2025-59536

Remote Code Execution via malicious repository configurations. Attackers can craft repo configs, hooks, MCP server definitions, and environment variables that execute arbitrary code when Claude Code opens the project.

April 2 — CVE-2026-21852 (CVSS 5.3)

Information Disclosure: malicious repositories can exfiltrate Anthropic API keys through the project-load flow. Your API credentials — leaked by opening a repository.

April 2 — Malware Campaign

Fake "Claude Code Leak" repositories on GitHub delivering Vidar (infostealer: passwords, credit cards, browser data) and GhostSocks (RAT/proxy). Part of a broader campaign imitating 25+ software brands.

The MCP Attack Vector

Both CVEs share a common thread: MCP servers as the attack surface.

When an AI agent connects to an MCP server, it trusts that server's tool definitions, descriptions, and responses. A malicious MCP server can:

Attack	How It Works	Impact
Tool Poisoning	Hidden instructions in tool descriptions	Agent executes attacker-controlled actions
Credential Theft	MCP server reads env vars during handshake	API keys, tokens exfiltrated
Command Injection	Malicious input through unvalidated parameters	Arbitrary code execution on host
Prompt Injection	Tool responses contain adversarial instructions	Agent behavior hijacked

Adversa AI demonstrated a particularly elegant attack: Claude Code's 50-subcommand limit can be bypassed by sending 50 no-op commands followed by one malicious command. Instead of blocking it, the agent prompts the user for approval — a social engineering vector.

Why This Matters for Your Organization

If Anthropic — with a world-class security team — ships a source map by accident, what are the chances that the MCP servers your AI agents connect to are secure? Our research shows 7.2% of public MCP servers contain vulnerabilities. With 11,500+ servers in the ecosystem, that's 800+ potential attack surfaces.

Lessons for AI Teams

1. Scan Before You Connect

Every MCP server your agent uses is a trust boundary. Treat it like a third-party dependency: scan it before integrating, monitor it continuously.

2. Source Maps Are Not Secrets — But They Reveal Them

The Claude Code leak wasn't a breach. It was a build artifact. Check your .npmignore, your Docker layers, your CI artifacts. The most dangerous leaks are the ones nobody intended.

3. The EU AI Act Requires Technical Evidence

Starting August 2, 2026, high-risk AI systems must demonstrate security compliance. The attack vectors exposed by this leak — prompt injection, credential theft, tool poisoning — are exactly what regulators will assess. You need documentation that shows you've tested for these.

4. Defense in Depth, Not Defense in Hope

The arxiv-mcp-server maintainer got it right: after our security advisory, they added content warnings to tool responses, documenting that external content is untrusted. No single layer prevents prompt injection. But labeling untrusted data at the boundary gives the model signal to be cautious.

How ClawGuard Helps

We built ClawGuard specifically for this threat landscape:

225

Security patterns covering OWASP LLM Top 10, Agentic Top 10, and MCP-specific attacks

Languages scanned for prompt injection, tool poisoning, and credential exposure

<10ms

Scan time per request — fast enough for CI/CD integration

Security advisories filed against major MCP servers (336k+ GitHub stars)

Scan your MCP server for free

Get an instant security score in seconds. No signup required.

Try ClawGuard Shield →

For organizations preparing for the EU AI Act, our Compliance Report generates a full PDF mapping your scan results to EU AI Act articles — the kind of technical evidence that holds up in an audit.

Timeline of Events

Date	Event
Mar 31, 04:23 ET	Source map discovered in npm package by Chaofan Shou
Mar 31, ~06:00	GitHub repo with extracted source reaches 10k stars
Mar 31, ~12:00	Anthropic confirms packaging error, issues statement
Apr 1	CVE-2025-59536 assigned (RCE via malicious repos)
Apr 2	CVE-2026-21852 assigned (API key exfiltration, CVSS 5.3)
Apr 2	Fake "Claude Code Leak" repos spreading Vidar/GhostSocks malware
Apr 3	84,000+ stars, 82,000+ forks on leaked source repo

What's Next

The Claude Code leak is a watershed moment. Not because of what was leaked — most of it was already suspected by the security community — but because it made the attack surface visible.

When you can see the exact permission system, the exact prompt structure, the exact tool orchestration — you can find the exact vulnerabilities. Two CVEs in 72 hours proves that.

The question isn't whether your AI agent infrastructure has similar issues. It's whether you've looked.

Don't wait for your own leak.

Scan your AI infrastructure now → | Get your EU AI Act Compliance Report →