Waxell

Logan Kelly

Jul 2, 2026

GuardFall Shell Injection: How 10 of 11 Popular AI Coding Agents Bypass Their Own Safety Guards

GuardFall defeats shell guards in 10 of 11 popular open-source AI coding agents using decades-old bash tricks. Affected tools include Aider, Cline, Goose, Plandex, and more.


The safety check that's supposed to stop an AI coding agent from running a dangerous command is the exact reason developers feel safe enough to switch off human review. Turn on --auto-exec, set up the CI pipeline, trust the blocklist. GuardFall is what happens when the blocklist doesn't work.

On June 30, 2026, Adversa AI researcher Omer Ben Simon published research showing that 10 of the 11 most popular open-source AI coding agents — accounting for roughly 548,000 combined GitHub stars — can be bypassed with shell tricks that have been documented in security research for decades. The affected tools include Aider, Cline, Roo-Code, Goose, Plandex, Open Interpreter, OpenHands, SWE-agent, opencode, and NousResearch's Hermes. Only one agent, Continue, was built to actually defend against this class of bypass.

GuardFall is the name Adversa AI gave to a class of bypasses against pattern-based shell guards in agentic coding tools: the guard inspects raw command text, bash rewrites that text before executing it, and the two never see the same thing. The gap is structural, not incidental — adding more denylist patterns closes none of it.

What Actually Happened?

The research started with NousResearch's open-source Hermes project. Adversa AI found that Hermes's 30-pattern regex denylist could be bypassed with decades-old shell quoting techniques. The filter treats r''m as different from rm; bash removes the empty quotes and runs rm anyway. From there, the researchers tested ten more of the most popular coding agents, ranked by GitHub star count.
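To make the gap concrete, here is a minimal Python sketch. It assumes a hypothetical single-pattern denylist (not Hermes's actual 30 patterns). It uses the standard library's shlex module in POSIX mode as a stand-in for bash's word splitting and quote removal. shlex performs none of bash's other expansions, so this models only the quoting trick.

```python
import re
import shlex

# Hypothetical denylist entry matched against the raw command text.
# Illustrative only; not taken from any real agent.
DENY_RM_RF = re.compile(r"\brm\s+-rf\b")


def raw_text_guard_blocks(command: str) -> bool:
    """Return True if a raw-string guard would block this command."""
    return DENY_RM_RF.search(command) is not None


def argv_after_quote_removal(command: str) -> list[str]:
    """Approximate the argv bash builds after word splitting and quote removal.

    shlex in POSIX mode strips quotes the way bash does for simple words; it
    does not perform parameter, command, or arithmetic expansion.
    """
    return shlex.split(command)


if __name__ == "__main__":
    for cmd in ["rm -rf /", "r''m -rf /"]:
        print(f"{cmd!r}: blocked={raw_text_guard_blocks(cmd)}, "
              f"argv={argv_after_quote_removal(cmd)}")
```

Both strings yield the same argv, `['rm', '-rf', '/']`, but only the first one trips the pattern. That is the whole of the GuardFall gap in miniature.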

The attack requires two conditions, neither of which is unusual in production environments. First, the AI model must emit the malicious command. A blunt rm -rf / is typically refused, but the same command embedded in a Makefile target, an MCP tool's documentation response, or a config file the agent reads as part of a repository is emitted as routine work. Second, an auto-execution flag (--auto-exec, --auto-run, --auto-test, dangerously-skip-permissions) must be on, or the container sandbox must be switched off. Both are standard in CI pipelines and developer laptop setups.

Adversa demonstrated the full attack chain end-to-end against the production Plandex binary and confirmed equivalent attack paths against eight other agents. In the demonstrated attack scenario: a developer's CI pipeline includes a coding agent running on pull requests; the agent reads an attacker-submitted Makefile that defines a clean target wiping ~/.aws/credentials; the agent's auto-exec flag runs it without human review. The IAM credentials are gone.

Five bypass classes cover the structural problem:

Class A: quote removal. r''m becomes rm.

Class B: $IFS expansion. rm$IFS-rf$IFS/ looks like one word to a pattern filter but three arguments to bash.

Class C: command substitution. $(echo rm) -rf / emits the binary name dynamically, invisible to a string matcher.

Class D: base64 piped to a shell. Each segment is benign alone and destructive in combination.

Class E: alternative destructive argv shapes, such as find /x -delete, dd of=/dev/sda, or install -m 4755 payload /usr/bin/backdoor.
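The sketch below runs one illustrative payload per class through a small, hypothetical raw-string denylist. The three patterns are assumptions chosen for the demo, not any agent's real list. Only the unobfuscated baseline is caught. Note that a pattern written for `dd if=` says nothing about the `dd of=` shape. The payloads are plain strings, and nothing is executed.

```python
import base64
import re

# Hypothetical raw-string denylist (regex over the verbatim command).
DENYLIST = [
    re.compile(r"\brm\s+-rf\b"),
    re.compile(r"\bmkfs\b"),
    re.compile(r"\bdd\s+if="),
]

encoded = base64.b64encode(b"rm -rf /").decode()

# One payload per bypass class. These are strings only; nothing is executed.
PAYLOADS = {
    "baseline": "rm -rf /",
    "A: quote removal": "r''m -rf /",
    "B: $IFS expansion": "rm$IFS-rf$IFS/",
    "C: command substitution": "$(echo rm) -rf /",
    "D: base64 piped to a shell": f"echo {encoded} | base64 -d | sh",
    "E: find -delete": "find / -delete",
    "E: dd of=": "dd of=/dev/sda",
}


def denylist_blocks(command: str) -> bool:
    """Return True if any denylist regex matches the raw command text."""
    return any(p.search(command) for p in DENYLIST)


if __name__ == "__main__":
    for label, cmd in PAYLOADS.items():
        print(f"{label:28} blocked={denylist_blocks(cmd)}  {cmd}")
```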

There is no single CVE assigned to GuardFall. That is the point: it is not a patchable bug in a single component. It is a design convention — agent to shell, gated by string matching — that fails structurally.

Why Pattern-Based Shell Guards Can't Win This Fight

The fundamental problem is that a text matcher operating on raw command strings cannot model bash's expansion rules. Any rewrite that survives the matcher but is bash-equivalent to a matched pattern is a bypass. The breadth of the blocklist is irrelevant.

The Adversa survey identified four architectural failure modes across the affected agents:

Mode 1: Regex over raw string. Hermes, opencode, and Goose compile a fixed regex set and test it against the verbatim command. This design fails against all five bypass classes.

Mode 2: Tokenized but raw-text matching. Cline (opt-in mode) and Roo-Code tokenize the command but still match against the rebuilt segment text. This partially closes Class A, but Class C (command substitution inside quotes) and Class E remain open.
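A minimal sketch of the Mode 2 shape, under the same assumptions as before: a hypothetical single-pattern denylist, with shlex standing in for the tokenizer. It illustrates the pattern, not Cline's or Roo-Code's actual code. Tokenizing undoes the r''m trick. A substitution wrapped in double quotes, however, survives as one opaque token, so the rebuilt text still never contains `rm -rf`.

```python
import re
import shlex

# Hypothetical denylist pattern, same style as a raw-string guard.
DENY_RM_RF = re.compile(r"\brm\s+-rf\b")


def tokenized_guard_blocks(command: str) -> bool:
    """Tokenize (removing quotes), rebuild the segment text, then regex-match it."""
    rebuilt = " ".join(shlex.split(command))
    return DENY_RM_RF.search(rebuilt) is not None
```

Here `tokenized_guard_blocks("r''m -rf /")` is now caught, because the rebuilt text is `rm -rf /`. By contrast, `'"$(echo rm)" -rf /'` rebuilds to `$(echo rm) -rf /` and passes. So does `find / -delete`.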

Mode 3: No static guard plus auto-yes. Aider, Plandex, Open Interpreter, and Cline (default mode) do no pattern matching. The only defense is a human in the loop. That defense holds until workflow pressure leads someone to turn on --auto-test or --auto-exec, or until a malicious repository ships a config file that flips the flag before any operator does. A .aider.conf.yml file inside a cloned repository can set auto-test: true and plant a payload in the test-cmd field. The agent reads it as normal project configuration.

Mode 4: Container sandbox with a documented local opt-out. OpenHands and SWE-agent sandbox by default — sound when the workspace is disposable. Both ship a documented local-mode configuration that disables the container and provides zero fallback. Local mode is common in CI, developer laptops, and self-hosted runners: exactly the deployments where the credentials are real.

Continue is the exception. Its evaluator tokenizes with shell-quote, detects variable expansion, recurses into command substitutions, checks pipe destinations, and maintains an explicit disabled list for canonical destructive patterns. In Adversa's tests, none of the 21 bypass payloads reached allowedWithoutPermission in the default IDE mode.

The common thread across every failing design: the decision sits somewhere that cannot hold it. The model cannot be relied upon — the same payload refused as a direct request is emitted without hesitation when wrapped in operational context. The operator confirmation cannot be relied upon — auto-yes flags get set the first time the workflow slows down. The container cannot be relied upon — it defends until the local opt-out is set, which is common in real deployments.

What Should You Check Before Your Next CI Run?

None of these is a complete fix. They're compensating controls while the architectural problem gets addressed upstream.

Redirect $HOME for every agent invocation. A one-line shell wrapper, HOME=$HOME/.agent-sandbox-$RANDOM agent …, keeps the project directory in place. It also points ~ away from ~/.ssh/, ~/.aws/, shell history, and the rest of the credential surface the attack most wants. This is always-on and has no single-flag opt-out.
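A Python version of that wrapper, as a sketch. It differs from the one-liner in one deliberate way, which is our assumption rather than the researchers' recommendation. The sandbox home is a fresh directory under the system temp location rather than inside the real $HOME, and it is deleted when the agent exits.

```python
import os
import subprocess
import tempfile


def run_with_redirected_home(argv: list[str], **run_kwargs) -> subprocess.CompletedProcess:
    """Run an agent command with HOME pointed at a fresh, empty directory.

    The working directory (the project) is unchanged, but ~/.ssh, ~/.aws, and
    shell history no longer resolve through $HOME or ~. The directory is
    created with owner-only permissions and removed when the command exits.
    """
    run_kwargs.setdefault("check", False)
    with tempfile.TemporaryDirectory(prefix="agent-sandbox-") as sandbox_home:
        # Copy the current environment and override only HOME.
        env = dict(os.environ, HOME=sandbox_home)
        return subprocess.run(argv, env=env, **run_kwargs)
```

Call it as `run_with_redirected_home(agent_argv)`, where `agent_argv` is the command line you already use. It changes only HOME. Credentials exported as environment variables, and files reachable by absolute path, remain visible to the agent.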

Disable auto-execute flags unless unavoidable. This applies to --auto-exec, --auto-run, --auto-test, dangerously-skip-permissions, and auto-mode: full. In CI, treat these as privileged subprocess invocations requiring explicit justification — not defaults.
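One way to make "explicit justification" mechanical is a pre-flight check on the agent's command line. In this sketch, the environment variable name `AGENT_AUTO_EXEC_JUSTIFICATION` is hypothetical. The flag names are matched without leading dashes because spellings vary between tools. auto-mode: full is a config-file setting rather than a flag, so this check does not cover it.

```python
import os
from typing import Mapping, Optional

# Auto-execution switches named in the GuardFall write-up (dashes stripped).
AUTO_EXEC_NAMES = {"auto-exec", "auto-run", "auto-test", "dangerously-skip-permissions"}

# Hypothetical variable a pipeline must set to justify auto-execution.
JUSTIFICATION_VAR = "AGENT_AUTO_EXEC_JUSTIFICATION"


def auto_exec_flags_in(argv: list[str]) -> list[str]:
    """Return the arguments in argv that switch on auto-execution."""
    found = []
    for arg in argv:
        if not arg.startswith("-"):
            continue
        name = arg.lstrip("-").split("=", 1)[0]  # also handles --flag=value
        if name in AUTO_EXEC_NAMES:
            found.append(arg)
    return found


def check_agent_invocation(argv: list[str], env: Optional[Mapping[str, str]] = None) -> None:
    """Raise PermissionError if auto-execution is requested without a justification."""
    env = os.environ if env is None else env
    flags = auto_exec_flags_in(argv)
    if flags and not env.get(JUSTIFICATION_VAR, "").strip():
        raise PermissionError(f"{flags} requested without {JUSTIFICATION_VAR} set")
```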

Disable agent execution on fork pull requests. An attacker-controlled README or test config file is the most direct path from untrusted content to privileged shell execution. Fork PRs should not run coding agents with full account access.
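On GitHub Actions, a pre-flight step can inspect the event payload before the agent starts. This sketch relies on the `pull_request.head.repo` and `pull_request.base.repo` fields of the pull request event and on the `GITHUB_EVENT_PATH` variable that Actions sets. Other CI systems expose fork status differently. A missing head repository, for example a deleted fork, is treated as untrusted.

```python
import json
import os


def is_fork_pull_request(event: dict) -> bool:
    """Return True if a pull_request event's head repo differs from its base repo."""
    pr = event.get("pull_request")
    if pr is None:
        return False  # not a pull request event
    head_repo = (pr.get("head") or {}).get("repo")
    base_repo = (pr.get("base") or {}).get("repo") or {}
    if head_repo is None:
        return True  # head repo unknown or deleted: treat as untrusted
    return head_repo.get("full_name") != base_repo.get("full_name")


def should_run_agent() -> bool:
    """Fail closed: run the agent only for a readable, non-fork event payload."""
    path = os.environ.get("GITHUB_EVENT_PATH")
    if not path:
        return False
    with open(path, encoding="utf-8") as f:
        return not is_fork_pull_request(json.load(f))
```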

Audit repository config files. A malicious .aider.conf.yml shipped inside a cloned repository can flip auto-test: true and plant a payload in the test-cmd field. Treat repo-shipped agent configuration as untrusted code.
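A minimal audit sketch for the Aider case described above. It scans a checkout for .aider.conf.yml files and flags an enabled auto-test or any test-cmd. It is line-based on purpose so it needs no YAML library, which means it can miss settings written in YAML flow style. Other agents use other file names and keys, so extend the lists for the tools you run.

```python
import re
from pathlib import Path

# File name and keys from the GuardFall write-up for Aider; extend as needed.
CONFIG_NAMES = [".aider.conf.yml"]
AUTO_TEST_ON = re.compile(r"^\s*auto-test\s*:\s*(true|yes|on)\s*(#.*)?$", re.IGNORECASE)
TEST_CMD_SET = re.compile(r"^\s*test-cmd\s*:", re.IGNORECASE)


def audit_repo(root: str) -> list[tuple[str, int, str]]:
    """Flag repo-shipped agent config lines that enable or define auto-run commands.

    A clean result means "nothing obvious", not "safe".
    """
    findings = []
    for name in CONFIG_NAMES:
        for path in sorted(Path(root).rglob(name)):
            if not path.is_file():
                continue
            text = path.read_text(encoding="utf-8", errors="replace")
            for lineno, line in enumerate(text.splitlines(), start=1):
                if AUTO_TEST_ON.match(line):
                    findings.append((str(path), lineno, "auto-test enabled"))
                elif TEST_CMD_SET.match(line):
                    findings.append((str(path), lineno, "test-cmd set; review the command"))
    return findings
```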

For builders and maintainers: the Continue-style evaluator is the only sound, always-on, flag-independent defense the survey identifies. Implementing the five components — tokenize, detect expansion, recurse into substitutions, check pipe destinations, maintain an explicit disabled list — is described by the researchers as roughly a two-day exercise for an experienced engineer.
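A compact Python sketch of those five components follows. It is not Continue's implementation: Continue is written in TypeScript and uses the shell-quote library, while here shlex stands in for the tokenizer. The disabled list, interpreter set, expansion characters, recursion limit, and three-level verdict are all illustrative assumptions. The design choice worth copying is that any expansion downgrades the command to needs-permission instead of being guessed at, because its final argv cannot be known statically. The sketch is deliberately incomplete. It does not unwrap prefix commands such as sudo, env, command, or xargs, and it does not handle process substitution or here-documents.

```python
import os
import shlex
from enum import IntEnum
from typing import Optional


class Verdict(IntEnum):
    # Ordered so that max() picks the most restrictive outcome.
    ALLOWED = 0
    NEEDS_PERMISSION = 1
    DISABLED = 2


SEPARATOR_CHARS = set(";&|()")    # tokens made only of these split segments
PIPE_OPS = {"|", "|&"}
EXPANSION_CHARS = set("$`{}*?[")  # variable, command, brace, and glob expansion
# Illustrative policy: piping into an interpreter is never auto-approved.
INTERPRETERS = {"sh", "bash", "zsh", "dash", "ksh", "python", "python3", "perl"}
MAX_DEPTH = 5                     # recursion limit for nested substitutions


def is_disabled(argv: list[str]) -> bool:
    """Step 5: explicit disabled list (illustrative and deliberately incomplete)."""
    if not argv:
        return False
    cmd, args = os.path.basename(argv[0]), argv[1:]
    if cmd == "rm":  # any recursive rm: -r, -R, -rf, -fR, --recursive
        return any(a == "--recursive" or (a.startswith("-") and not a.startswith("--")
                                          and ("r" in a or "R" in a)) for a in args)
    if cmd == "find":
        return "-delete" in args
    if cmd == "dd":
        return any(a.startswith("of=/dev/") for a in args)
    return cmd.startswith("mkfs")


def substitutions(command: str) -> list[str]:
    """Step 3 helper: inner text of each $(...) and `...` span (quoting ignored)."""
    found, i = [], 0
    while i < len(command):
        if command.startswith("$(", i):
            depth, j = 1, i + 2
            while j < len(command) and depth:
                depth += {"(": 1, ")": -1}.get(command[j], 0)
                j += 1
            found.append(command[i + 2 : j - 1 if depth == 0 else j])
            i = j
        elif command[i] == "`":
            end = command.find("`", i + 1)
            end = len(command) if end == -1 else end
            found.append(command[i + 1 : end])
            i = end + 1
        else:
            i += 1
    return found


def segments(command: str) -> list[tuple[list[str], Optional[str]]]:
    """Step 1: tokenize with quote removal, split into (argv, preceding operator)."""
    lex = shlex.shlex(command.replace("\n", ";"), posix=True, punctuation_chars=True)
    lex.whitespace_split = True
    lex.commenters = ""  # bash starts a comment only at word start; keep '#' literal
    result, argv, prev_op = [], [], None
    for tok in lex:  # raises ValueError on unbalanced quotes
        if tok and set(tok) <= SEPARATOR_CHARS:
            result.append((argv, prev_op))
            argv, prev_op = [], tok
        else:
            argv.append(tok)
    result.append((argv, prev_op))
    return result


def evaluate(command: str, depth: int = 0) -> Verdict:
    """Return the most restrictive verdict produced by the five steps."""
    if depth > MAX_DEPTH:
        return Verdict.DISABLED
    verdict = Verdict.ALLOWED
    for inner in substitutions(command):  # step 3: recurse into substitutions
        verdict = max(verdict, evaluate(inner, depth + 1))
    if any(c in EXPANSION_CHARS for c in command):  # step 2: argv not static
        verdict = max(verdict, Verdict.NEEDS_PERMISSION)
    try:
        segs = segments(command)
    except ValueError:  # unparseable input is refused
        return Verdict.DISABLED
    for argv, prev_op in segs:
        if is_disabled(argv):  # step 5
            verdict = Verdict.DISABLED
        if prev_op in PIPE_OPS and argv and os.path.basename(argv[0]) in INTERPRETERS:
            verdict = Verdict.DISABLED  # step 4: pipe destination
    return verdict
```

With these rules, both `r''m -rf /` and the base64-to-sh pipeline come back DISABLED. The `$IFS` and `$(echo rm)` variants come back NEEDS_PERMISSION. None of the Class A to D payloads is auto-approved.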

How Waxell Handles This

The structural problem GuardFall exposes is that the agent-to-shell boundary has no governance layer. The agent emits a command, the shell runs it, and the string match in between has no model of what bash will actually do. Adding more patterns to the blocklist does not solve a lexer-versus-evaluator problem.

Waxell Observe sits at the agent execution layer, not in front of the shell. Initialized with two lines of code, it intercepts execution at the tool-call level and evaluates each call against 50+ policy categories before anything reaches the shell. The Content policy category validates what can be ingested as input; the Safety category enforces hard limits on what tool calls the agent is permitted to emit; the Identity category ensures each action is scoped to the identity that authorized it, not the full account. These policies run pre-execution. They evaluate before bash -c ever sees the string.

The contrast with pattern-based blocklists is direct: a blocklist runs on text the agent has already decided to emit. Waxell's policy engine runs on tool calls before the shell sees them — and on inputs before the model decides anything. A quote-removal or $IFS bypass that defeats a text matcher does not affect a policy that evaluates the semantics of the action, not the characters in the command string.

For teams running agents connected to external services via MCP, Waxell MCP Gateway adds a second layer. It scans tool descriptions at fingerprint time — when the tool is registered, before any agent calls it — using a prompt injection scanner that catches embedded instructions in tool metadata. A Makefile instruction planted in an MCP tool's documentation field is flagged before it ever reaches the model's context. PII and secrets are redacted in flight before they enter the agent. These are controlled inputs and validated data interfaces: the governance decision sits somewhere that can actually hold it.

You can explore the full agent security model. Start with Waxell Observe in two lines:

→ waxell.dev/signup

Frequently Asked Questions

Which AI coding agents are affected by GuardFall?

The ten affected agents in the June 30, 2026 Adversa AI survey are: Aider, Cline, Goose, Open Interpreter, OpenHands, opencode, Plandex, Roo-Code, SWE-agent, and NousResearch Hermes. Together they account for roughly 548,000 GitHub stars. Only Continue was built with a guard architecture that closes the structural majority of the bypass surface.

Is there a CVE assigned to GuardFall?

No. Adversa AI explicitly states there is no single CVE to track or patch. GuardFall is a class of bypasses — a design convention, not a discrete vulnerability in a single component. Adding more denylist patterns does not fix it; the architectural pattern (agent to bash, gated by string matching) fails structurally regardless of how many patterns are in the list.

What does GuardFall mean for CI/CD pipelines?

CI pipelines typically run with auto-execute flags on — that is the point of CI. GuardFall is exploitable the moment that flag is active. Every CI pipeline that uses an affected coding agent and processes untrusted content (fork pull requests, third-party dependency files, attacker-accessible config files) sits in the attack surface. Redirect $HOME and disable agent execution on fork PRs as immediate compensating controls.

Does disabling auto-execute protect against GuardFall?

Partially. The attack requires the model to emit a malicious command AND the shell to run it automatically. Removing the auto-execute flag removes the second condition, but only until something turns it back on. A malicious repo can ship a config file (.aider.conf.yml) that re-enables auto-exec on first edit, circumventing the operator's setting. Disabling auto-execute is a compensating control, not a complete defense.

How does Waxell's policy enforcement prevent GuardFall-style attacks?

Waxell Observe intercepts at the tool-call level before the shell sees any command. Its 50+ policy categories — including Content (input validation), Safety (tool-call limits), and Identity (scope enforcement) — run pre-execution against the semantics of the action, not the text of the command string. A quote-removal or $IFS bypass that defeats a text matcher does not affect a policy evaluating what the call is attempting to do. Waxell MCP Gateway adds a second layer for externally connected tools: it scans tool descriptions at fingerprint time for embedded instructions and redacts credentials and PII before they reach the agent.

What should I do before my next CI run?

Immediately: redirect $HOME for every agent invocation, disable auto-execute flags on fork PRs, and audit repository-shipped agent configuration files for planted auto-exec settings. This week: disable all auto-execution in automated pipelines processing untrusted content. This quarter: evaluate the Continue-style tokenize-and-canonicalize evaluator for any coding agent or computer-use shell channel you build or operate.

Sources

Adversa AI, "GuardFall: a universal shell injection vulnerability in open-source AI agents," Omer Ben Simon, June 30, 2026: https://adversa.ai/blog/opensource-ai-coding-agents-shell-injection-vulnerability/
The Hacker News, "GuardFall Exposes Open-Source AI Coding Agents to Decades-Old Shell Injection Risks," Swati Khandelwal, June 30, 2026: https://thehackernews.com/2026/06/guardfall-exposes-open-source-ai-coding.html
SecurityAffairs, "GuardFall Flaw Hits 10 of 11 Popular Open-Source AI Agents": https://securityaffairs.com/194546/ai/guardfall-flaw-hits-10-of-11-popular-open-source-ai-agents.html
SecurityWeek, "Decades-Old Bash Tricks Expose AI Coding Agents to Supply Chain Attacks": https://www.securityweek.com/decades-old-bash-tricks-expose-ai-coding-agents-to-supply-chain-attacks/
SC Media, "Shell injection flaw found in 10 of 11 open-source AI agents": https://www.scworld.com/brief/shell-injection-flaw-found-in-10-of-11-open-source-ai-agents

Waxell

Waxell provides observability and governance for AI agents in production. Bring your own framework.
