InversePrompt: Turning Claude Against Itself, One Prompt at a Time
(CVE-2025-54794 & CVE-2025-54795)

Introduction: When AI Helps You Hack AI
As Anthropic’s Claude Code gains traction as a powerful AI coding assistant, it promises developers a safe and streamlined way to build with Claude’s capabilities. But what happens when the same assistant meant to enforce restrictions unknowingly reveals how to bypass them?
During Anthropic’s Research Preview phase, I discovered two high-severity vulnerabilities in Claude Code, which were quickly addressed by the team. These issues allowed me to escape its intended restrictions and execute unauthorized actions, all with Claude’s own help.
By turning the tool inward and exploring how it interprets and validates inputs, I uncovered flaws that led to:
- Path restriction bypass.
- Code execution via command injection.
Both are exploitable through simple prompt crafting. These findings highlight the risks of blindly trusting LLM-powered developer tools, especially when the same system meant to enforce the rules can also be used to break them.

What is Claude Code?
Claude Code is Anthropic’s AI-powered coding assistant, designed to help developers write, analyze, and run code through natural language prompts. It’s part of the Claude ecosystem, focused specifically on software development tasks like generating functions, debugging code, and interacting with files on the developer’s machine.
By design, Claude Code runs with the current user’s privileges and can perform most actions the user can, depending on approval and the user’s settings. It is generally scoped to a current working directory (CWD) and asks for user consent when accessing unfamiliar files or executing commands outside of a predefined set.
Within this model, Claude Code can:
- Read or write files
- List directory contents
- Execute specific permitted commands
- Interact with your local dev setup (with explicit user permission)
Claude asks for user consent when accessing unfamiliar files or locations outside the CWD, or executing commands that are not part of the pre-approved (whitelisted) commands, aiming to enforce clear boundaries between the assistant and the underlying system. But as we’ll see next, those boundaries can be crossed, sometimes with nothing more than a cleverly crafted prompt.


How Does Claude Code Work?
Claude Code operates through a secure local interface that connects the Claude language model to your development environment. Its design aims to give the AI just enough access to be useful while keeping the rest of your system off-limits unless permitted by the user.
At the core of this security model are two primary controls:
1. CWD Restriction (Current Working Directory)
Claude Code is limited to a specific directory, the current working directory (CWD). This means all file-related operations, like reading or writing files, listing contents, or exploring subdirectories, are limited to this defined scope.
If you try to access files outside of that directory, Claude will either:
- Ask for explicit user permission through a system prompt.
- Deny the request.

This mechanism is intended to prevent unauthorized file access or leakage of sensitive data.
2. Pre-Approved Commands (Whitelisting)
Claude Code can execute bash commands, but only from a strictly defined list of pre-approved commands. These are typically limited to safe, development-focused operations such as:
- ls
- cat
- touch
- mkdir
- echo

More dangerous commands (e.g., rm, curl, dd) are either blocked or require explicit user confirmation before execution. This provides a layer of protection against command injection and privilege escalation, at least in theory.

Under the Hood
Internally, Claude Code uses one of Anthropic’s Claude LLMs via API. The tool can be configured to connect through multiple providers, depending on your setup:
- Anthropic API
- AWS Bedrock
- GCP Vertex AI
Each platform offers access to different Claude model versions, like Opus, Sonnet, or Haiku, which the user can select during setup. The connection is authenticated using an API key for the chosen provider, and Claude Code routes all LLM prompts through that backend.

When you type a prompt like “List all files in the cwd”, Claude Code interprets it, maps it to a whitelisted command (like ls -la), and executes it inside the CWD. All interactions are logged and constrained, or at least, they’re supposed to be.
But as we’ll see next, a few small oversights in path validation and input sanitization open the door to high-impact vulnerabilities.
Reverse Engineering Claude Code: How I Started Exploring the Code
Unlike many LLM developer tools, Claude Code isn’t open source, which makes vulnerability research a bit trickier. My journey started with a simple question: Could I still figure out how it works behind the scenes?
I began by searching for public or deobfuscated versions of Claude Code, which led me to a great blog post by Geoffrey Huntley. His analysis provided helpful insight into how Claude Code operates and interacts with the system, and it pointed me to his deobfuscation project on GitHub, which contains a readable snapshot of Claude Code’s frontend logic.
Instead of relying directly on that version, I decided to go hands-on myself, using Claude integrated inside the Cursor editor to better understand the app’s behavior. With Claude's help, I began unpacking the obfuscated JavaScript bundles, asking questions about Webpack internals, deobfuscation techniques, and how best to untangle the structure.
Eventually, I landed on a powerful tool: WebCrack, which enabled me to automate the process and get a clean mapping of the real code behind Claude Code’s interface. From there, I had a working structure, and a target-rich environment to start hunting for security vulnerabilities.
After generating the deobfuscated file using WebCrack, I just started asking Claude questions, anything that popped into my mind. And it worked like magic.
Claude literally helped reverse-engineer Claude Code, by itself.
For example, this part of the deobfuscated code shows some of the commands Claude Code can execute, along with the regex patterns that constrain their arguments.
var In5 = new Set([
  /^date\b[^<>()$`]*$/,
  /^cal\b[^<>()$`]*$/,
  /^uptime\b[^<>()$`]*$/,
  /^echo\s+(?:'[^']*'|"[^"$<>]*"|[^|;&`$(){}><#\\\s!]+?)*$/,
  /^claude -h$/,
  /^claude --help$/,
  /^git diff(?!\s+.*--ext-diff)(?!\s+.*--extcmd)[^<>()$`]*$/,
  /^git log[^<>()$`]*$/,
  /^git show[^<>()$`]*$/,
  /^git status[^<>()$`]*$/,
  /^git blame[^<>()$`]*$/,
  /^git reflog[^<>()$`]*$/,
  /^git stash list[^<>()$`]*$/,
  /^git ls-files[^<>()$`]*$/,
  /^git ls-remote[^<>()$`]*$/,
  /^git config --get[^<>()$`]*$/,
  /^git remote -v$/,
  /^git remote show[^<>()$`]*$/,
  /^git tag$/,
  /^git tag -l[^<>()$`]*$/,
  /^git branch$/,
  /^git branch (?:-v|-vv|--verbose)$/,
  /^git branch (?:-a|--all)$/,
  /^git branch (?:-r|--remotes)$/,
  /^git branch (?:-l|--list)(?:\s+"[^"]*"|'[^']*')?$/,
  /^git branch (?:--color|--no-color|--column|--no-column)$/,
  /^git branch --sort=\S+$/,
  /^git branch --show-current$/,
  /^git branch (?:--contains|--no-contains)\s+\S+$/,
  /^git branch (?:--merged|--no-merged)(?:\s+\S+)?$/,
  /^head[^<>()$`]*$/,
  /^tail[^<>()$`]*$/,
  /^wc[^<>()$`]*$/,
  /^stat[^<>()$`]*$/,
  /^file[^<>()$`]*$/,
  /^strings[^<>()$`]*$/,
  /^hexdump[^<>()$`]*$/,
  /^sort(?!\s+.*-o\b)(?!\s+.*--output)[^<>()$`]*$/,
  /^grep\s+(-[a-zA-Z]+\s+)*(?:'[^']*'|"[^"]*"|\S+)\s*$/,
  /^pwd$/,
  /^whoami$/,
  /^id[^<>()$`]*$/,
  /^uname[^<>()$`]*$/,
  /^free[^<>()$`]*$/,
  /^df[^<>()$`]*$/,
  /^du[^<>()$`]*$/,
  /^ps(?!\s+.*-o)[^<>()$`]*$/,
  /^locale[^<>()$`]*$/,
  /^node -v$/,
  /^npm -v$/,
  /^npm list[^<>()$`]*$/,
  /^python --version$/,
  /^python3 --version$/,
  /^pip list[^<>()$`]*$/,
  /^docker ps[^<>()$`]*$/,
  /^docker images[^<>()$`]*$/,
  /^ping\s+(?:-c\s+\d+\s+)[^<>()$`]*$/,
  /^host[^<>()$`]*$/,
  /^nslookup[^<>()$`]*$/,
  /^dig[^<>()$`]*$/,
  /^netstat(?!\s+.*-p)[^<>()$`]*$/,
  /^ip addr[^<>()$`]*$/,
  /^ifconfig[^<>()$`]*$/,
  /^man(?!\s+.*-P)(?!\s+.*--pager)[^<>()$`]*$/,
  /^info[^<>()$`]*$/,
  /^help[^<>()$`]*$/,
  /^sleep[^<>()$`]*$/,
  /^tree$/,
  /^which[^<>()$`]*$/,
  /^type[^<>()$`]*$/,
  /^history(?!\s+-c)[^<>()$`]*$/,
  /^alias$/,
  /^compgen[^<>()$`]*$/,
  /^yes$/
]);
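To make the gating concrete, here is a minimal sketch of how a regex allowlist like the one above can decide whether a command runs without confirmation. The function and variable names are mine, not Claude Code’s; only a few patterns from the set are reproduced.

```javascript
// Minimal sketch of regex-based command gating. Names are illustrative,
// not taken from Claude Code's actual source.
const allowlist = [
  /^pwd$/,
  /^whoami$/,
  /^git status[^<>()$`]*$/,
  /^echo\s+(?:'[^']*'|"[^"$<>]*"|[^|;&`$(){}><#\\\s!]+?)*$/,
];

// A command is pre-approved if any allowlist pattern matches it in full.
function isPreApproved(command) {
  return allowlist.some((re) => re.test(command.trim()));
}

console.log(isPreApproved("pwd"));        // true: runs without a prompt
console.log(isPreApproved("echo hello")); // true
console.log(isPreApproved("rm -rf /"));   // false: would require confirmation
```

Note that the patterns are anchored with `^` and `$` and exclude shell metacharacters like `|`, `;`, and backticks, which is exactly why a successful injection has to smuggle its payload past the pattern rather than simply appending a second command.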
CVE-2025-54794: Path Restriction Bypass, Breaking Out of the Scope (CVSS Score 7.7)
Claude Code is designed to restrict file operations to a predefined current working directory (CWD).
This containment model is supposed to prevent the assistant from accessing files outside its designated scope unless explicitly authorized.
My first instinct was to check whether Claude Code suffered from the same types of vulnerabilities I previously discovered in Anthropic’s Filesystem MCP Server, detailed in my blog post here:
CVE-2025-53109 & CVE-2025-53110 – EscapeRoute: Breaking Anthropic’s Sandbox
This was a natural starting point: both Claude Code and the Filesystem MCP Server were designed and developed by Anthropic, and possibly even by the same team of developers. That means architectural patterns, and security oversights, might be repeated.
As it turns out, they were.
Both vulnerability classes were present in Claude Code as well. Although the symlink-based vulnerability (CVE-2025-53109) fell outside the scope of what Claude Code allows, the directory containment bypass (CVE-2025-53110) was fully in scope.

In fact, the exact same naïve prefix-based path validation flaw was found here too, allowing the same style of containment bypass I had previously demonstrated in Anthropic’s Filesystem MCP Server.
How the Validation Works
Here is the deobfuscated check:
function ww(Z) {
  return pB(Z).startsWith(pB(t9()));
}
This translates to:
function isPathWithinCwd(filePath) {
  return path.resolve(filePath).startsWith(path.resolve(originalCwd));
}
Let’s assume Claude Code is running with its working directory set to:
“/Users/eladbeber/Documents/claude_code”
Now, an attacker creates another directory with a similar prefix:
“/Users/eladbeber/Documents/claude_code_evil”
Because this new path starts with the allowed prefix, Claude Code allows read access, even though the target directory is clearly outside the intended boundary. This lets the attacker:
- Read sensitive files
- Bypass permission prompts
- Break out of the intended sandbox restrictions
Even worse, when combined with symbolic links (symlinks), this flaw can enable full access to arbitrary locations in the file system. In environments where Claude Code is used with elevated privileges, this flaw can serve as a gateway to privilege escalation.
For example, reading “/etc/passwd” without explicit permission.

PoC Video
CVE-2025-54795: Code Execution via Command Injection (CVSS Score 8.7)
While Claude Code claims to restrict command execution to a predefined whitelist, I discovered a way to inject and execute arbitrary commands, completely bypassing the system’s approval prompts.
This vulnerability arises from improper input sanitization in how Claude Code handles whitelisted commands. By crafting a string that breaks out of the intended command context, an attacker can execute unintended shell commands, no user confirmation required.
After validating the two containment vulnerabilities I had previously discovered, I noticed another interesting attack path, one that was explicitly listed within the scope of Anthropic’s own security model.

This caught my attention. I quickly realized that Claude Code has two types of terminal commands:
- Whitelisted commands that run without asking the user (e.g., echo, pwd, whoami)
- Restricted commands that require explicit approval (e.g., rm, dd, sudo, kill)
With this in mind, I began exploring ways to abuse the permissive commands, and turn them into something far more dangerous. After another round of deep deobfuscation and analysis (with Claude’s help again), I asked the model to generate payloads that use a whitelisted command, like “echo”, as a wrapper for command injection.
One of the first prompts I tried was as simple as:

That didn’t trigger execution, but it sparked an idea. I kept refining the payloads, one prompt at a time, until finally… I hit the one that worked.

Once I landed on the working payload, the door was wide open, I could execute anything I wanted, without any user prompt or restriction.

A template for this injection could be:
echo "\"; <COMMAND>; echo \""
How does the Injection work?
Claude Code supports commands like:
echo "Hello, world!"
However, when user input is passed directly into the echo command without proper escaping, an attacker can smuggle additional commands by closing the string and injecting arbitrary shell instructions. For example:
echo "\"; open -a Calculator; echo \""
Here’s what this payload does:
- Ends the current echo string.
- Injects a new command (open -a Calculator).
- Then resumes echo, making the payload look legitimate.
Because the entire string is treated as part of a valid request, Claude doesn’t prompt for confirmation, and executes the injected command immediately. This technique works with any whitelisted command, not just echo. As long as the base command is permitted without a user prompt, injection is possible.
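Why does such a payload slip through at all? One plausible failure mode for this class of bug (a general sketch, not Claude Code’s confirmed internals) is double interpretation: the string validates as a single quoted echo, but a later processing layer consumes the backslash escapes, so the shell ultimately sees three separate commands:

```javascript
// Illustrative double-interpretation sketch (not Claude Code's actual
// source). The payload below is the template from the article with `id`
// as the injected command.
const payload = 'echo "\\"; id; echo \\""'; // i.e.  echo "\"; id; echo \""

// A shallow allowlist view: the string starts with `echo "` and ends
// with `"`, so it looks like a single echo invocation.

// If a later layer strips the escapes (for example, by re-serializing
// the string), the escaped quotes become real ones:
const whatTheShellSees = payload.replace(/\\"/g, '"');
console.log(whatTheShellSees); // echo ""; id; echo ""

// The shell now parses THREE commands: an empty echo, the injected
// `id`, and another empty echo.
```

The general lesson is that validation and execution must operate on the exact same representation of the command; any unescaping or re-quoting between the two steps reopens the injection.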
At least Claude realized the issue and openly admitted the presence of a command injection vulnerability, even listing the exact steps it should have taken to prevent it.

This vulnerability enables Local Privilege Escalation (LPE) by providing arbitrary code execution through a trusted interface. In environments where Claude Code runs with elevated privileges, that can lead to complete system compromise. Practical exploitation, however, requires that Claude Code be run in a context where it holds elevated privileges and is exposed to untrusted input or code.
Patch & Mitigation
Thanks to Anthropic’s security and engineering teams for a fast, professional response and smooth coordination during disclosure.
- Path restriction bypass (CVE-2025-54794)
  - Affected: versions < v0.2.111
  - Fixed: v0.2.111
  - Action: update to ≥ v0.2.111
- Command injection (CVE-2025-54795)
  - Affected: versions < v1.0.20
  - Fixed: v1.0.20
  - Action: update to ≥ v1.0.20
Summary
This research underscores two broad lessons. First, vulnerabilities repeat: when products share architecture, code, or teams, the same flaw can resurface, so once you find a weakness in one tool, proactively test its siblings. Second, LLMs can be turned inward: with inverse prompting, the model can help map its own behavior and expose bypasses, effectively becoming both assistant and adversary. In short, an LLM can be both the lock and the pick, and that makes it an exceptionally powerful tool, especially against itself, when steered with inverse prompts.