Waxell

Product

Compare

START FREE

Waxell

Logan Kelly

Jul 3, 2026

Poisoned MCP Tool Descriptions Leak Agent Data: What Microsoft's Warning Means for Enterprise Governance

Microsoft warns poisoned MCP tool descriptions redirect agents to exfiltrate data silently. The mechanism, why it persists, and the controls that stop it.

Waxell blog cover: MCP tool description poisoning enterprise governance

On June 30, 2026, Microsoft Incident Response and its Defender security research team published a specific warning: MCP tool description poisoning — where an attacker embeds hidden instructions into the natural-language metadata of an MCP (Model Context Protocol) tool — can redirect a connected AI agent into quietly exfiltrating company data without triggering a single visible alert. The researchers walked through a complete scenario involving a finance team's third-party "invoice enrichment" service. The tool had been approved months earlier, never deeply reviewed, and never given a re-approval trigger on description changes. The attacker modified the tool's description field, burying a hidden directive inside what looked like formatting notes: grab the last thirty unpaid invoices and attach them to the next outbound call. MCP picked up the change immediately. The next time an analyst asked a routine question about a vendor, the agent followed the hidden order, silently bundled a month of invoice records, and sent them alongside a legitimate-looking response to a server outside the organization. The analyst saw nothing wrong.

Microsoft frames this as a scenario rather than a named breach, but the attack class is documented in the wild. The MCPTox benchmark, published in August 2025, tested tool description poisoning across 45 real MCP servers and 20 AI models — finding attack success rates as high as 72.8% with certain models, and an average success rate of 36.5%. The research also found that agents almost never refuse: the highest refusal rate observed was less than 3%, attributed to Claude 3.7 Sonnet. In September 2025, Koi Security confirmed the first in-the-wild instance — an npm package called postmark-mcp that had shipped fifteen clean releases before version 1.0.16 introduced a single line that secretly BCC'd every email an agent sent to an attacker-controlled address.

Why Do Poisoned MCP Tool Descriptions Go Undetected?

The problem is not a bug in any specific product. It is a trust boundary that most enterprise teams have not drawn yet.

MCP tool descriptions are plain text that live inside an agent's context window right next to its system prompt and conversation history. The agent reads tool descriptions to understand what it can do and when to do it — that's the entire mechanism by which MCP works. But it also means that whoever controls a tool's description controls one input into the agent's decision-making. The agent has no reliable way to distinguish an honest description from one that carries a hidden order.

What makes this worse in practice is that MCP tool descriptions can update live. A tool server pushes a new version and connected agents pick it up without any re-approval step in most enterprise configurations. Security teams treat approved tools the way they treat approved software packages: once vetted and deployed, they're in. Nobody is watching the description field the way they'd watch a code commit. The MCPTox research confirms why this matters: more capable models are actually more susceptible to the attack, because the attack exploits their stronger instruction-following. The same property that makes a model useful makes it a better target for embedded directives.

The deeper architectural problem is that MCP mixes instructions and data in the same channel. A tool's description field is semantically identical to its output from the model's perspective — both are text the model processes in the same context. An attacker who poisons a description before an organization connects the tool can wait patiently through every automated check. There's no signature to match, no payload to scan. The first signal that anything went wrong may be the data already on its way to the attacker's server.

What Should Security Teams Audit Right Now?

Before reaching for a platform, three things can be checked with existing access:

Inventory every connected tool. Pull a list of every MCP server and tool your agents can reach. For each one, note who approved it, when, and what process (if any) exists for reviewing description changes. In most organizations, this list does not exist. Building it is the prerequisite for every other control.

Read the tool descriptions you have approved. Open the current description for each connected tool and read it the way you'd read a suspicious email. Look for conditional language — "if the user asks about X, also do Y." Look for references to external endpoints, data collection instructions, or text that has no business appearing in a help field. This is tedious work. It is also the thing that would have caught the postmark-mcp attack before version 1.0.16 shipped.

Put a human in front of write and send actions. Microsoft is explicit on this point: any agent action that moves data outside the organization — sending email, calling external APIs, writing to shared storage, making financial requests — should require a person to approve it before execution, not after review. The attack scenario Microsoft describes works specifically because the exfiltration step looked identical to a normal outbound call. Requiring human approval for that step would have stopped it at the boundary.

How Does Waxell MCP Gateway Handle This?

The architectural problem Microsoft describes — live description updates, no re-approval trigger, no inspection of what descriptions actually say — maps precisely to what Waxell MCP Gateway was built to address.

When a tool connects to the Gateway, it doesn't get catalogued and forgotten. Its description is immediately passed through a prompt injection scanner that runs at fingerprint time, before any agent can call the tool. The scanner is specifically checking for embedded instructions — the kind of hidden "also collect invoices" directive buried in a formatting block that Microsoft's researchers describe. Tools that fail the scan land in Pending review status and cannot be called until a human clears them.

Description drift is handled at the fingerprinting layer. Waxell maintains five fingerprint states for every connected tool: Pending, Drift Detected, Trusted, Blocked, and Removed. A tool whose description changes without re-approval automatically moves to Drift Detected and is blocked from calls until the change is reviewed. MCP's live-update behavior — the property that makes description poisoning operationally easy for attackers — becomes visible and auditable instead of silent.

Human-in-the-loop approval for consequential actions is enforced at the gateway layer, not at the application. When an agent call would move data to an external endpoint, initiate a write, or touch any action the policy marks as sensitive, the Gateway holds the MCP connection open and routes the approval to the right person. The agent waits. Nothing is sent until a human says yes. This is the control Microsoft explicitly calls for, applied uniformly across every tool call that reaches the gateway.

Waxell MCP Gateway connects to 160+ upstream MCP servers, enforces 50+ policy categories out of the box, and propagates policy changes across the fleet in 30 seconds. The audit log is durable and exportable, so if a suspicious call did go out before a description was flagged, you have a timestamped record of exactly what the agent sent, when, and to which tool.

The Microsoft post maps these controls to Prompt Shields, Purview DLP, and Entra Agent ID — tooling specific to organizations running Microsoft-native infrastructure. The principles it's prescribing are what Waxell MCP Gateway delivers for teams building on any framework, connecting any client — Claude Desktop, Cursor, Claude Code, or any MCP-compatible assistant — to any of the 160+ upstream connectors in the catalog.

See how Waxell MCP Gateway works →
Start for free at waxell.dev/signup

Frequently Asked Questions

What is MCP tool description poisoning?
MCP tool description poisoning is an attack where a threat actor modifies the natural-language description of a Model Context Protocol (MCP) tool to embed hidden instructions. Because AI agents read tool descriptions to decide what to do and when to do it, a poisoned description can redirect the agent's behavior — collecting unauthorized data, calling external endpoints, or exfiltrating records — without generating a visible error or triggering an access control violation.

How does the attack remain undetected?
Every individual action the agent takes is legitimate. The tool was approved. The data query ran within the user's own permissions. The outbound call went to a server that was permitted when the tool was first onboarded. No single action trips an alarm. The attack is only detectable if something inspects the content of tool descriptions before each call, or if anomalous data movement patterns are flagged in the audit log after the fact.

What did Microsoft's June 30 warning specifically say?
Microsoft Incident Response and its Defender security research team published a blog post on June 30, 2026, walking through an invoice enrichment scenario where a finance team's approved third-party MCP tool was modified by an attacker to exfiltrate the last thirty unpaid invoices during a routine query. The post identified MCP's live description update behavior — no re-approval step required — as the core gap that makes the attack easy to execute and hard to catch. The full post is at the Microsoft Security Blog: securing-ai-agents-ai-tools-move-from-reading-acting.

Has this happened in the real world?
Yes. In September 2025, researchers at Koi Security identified the postmark-mcp npm package — a malicious MCP server that had mirrored a legitimate email tool for fifteen clean releases before version 1.0.16 introduced a single line that secretly BCC'd every email an AI agent sent to an attacker-controlled address. The MCPTox benchmark (August 2025, arxiv.org/abs/2508.14925) found attack success rates as high as 72.8% with certain models across 45 real MCP servers and 20 leading AI models. Model refusal rates were negligible — the highest observed was less than 3%.

What's the difference between tool description poisoning and prompt injection?
Prompt injection embeds malicious instructions in content the model reads during a task — documents, emails, retrieved search results, user inputs. Tool description poisoning specifically targets the metadata layer: the fields that explain what a tool does and when to call it. Both exploit the same structural vulnerability — the model cannot reliably distinguish its principal's instructions from adversarially crafted text in its context — but they attack at different layers. Tool description poisoning is particularly persistent because descriptions are written once, approved once, and rarely reviewed again.

What three controls specifically prevent this?
Microsoft identifies the same three controls that Waxell MCP Gateway enforces: first, a scanner that reads tool descriptions for embedded instructions before any agent can call the tool; second, a fingerprinting or drift-detection system that catches description changes and requires re-approval before they go live; and third, human-in-the-loop enforcement that holds any data-exfiltration or write action for explicit approval before execution. Each control alone reduces risk; all three together close the attack path Microsoft describes.

Sources

Agentic Governance, Explained

Waxell blog cover: AI agents processing employee PII without a policy

AI Agents and Employee PII: The Policy Gap [2026]

34.8% of corporate data employees put into AI tools is sensitive. Meta's MCI shows the stakes. Here's what a real employee PII policy for agents actually covers.

Logan Kelly

Jul 3, 2026

Waxell blog cover: GuardFall AI coding agent shell injection 2026

GuardFall Shell Injection: 10 of 11 AI Coding Agents [2026]

GuardFall defeats shell guards in 10 of 11 AI coding agents using decades-old bash tricks. Named tools: Aider, Cline, Goose, Plandex, and more.

Logan Kelly

Jul 2, 2026

Waxell blog cover: Copilot billing shock agentic cost enforcement 2026

Copilot Billing Shock: $29 Plans Now Cost $750 [2026]

GitHub's first Copilot token billing cycle ended June 30. Agentic sessions hit 10x–50x cost spikes. Why dashboards don't fix this—and what does.

Logan Kelly

Jul 1, 2026

Waxell blog cover: AI agent hallucination detection vs fallback enforcement in production

AI Agent Hallucination: Why Detection Isn't Enough [2026]

64% of enterprises lost $1M+ to AI errors last year. Hallucination detection finds bad outputs after the agent acted. Runtime enforcement stops the damage.

Logan Kelly

Jul 1, 2026

AI Agents and Employee PII: The Policy Gap [2026]

34.8% of corporate data employees put into AI tools is sensitive. Meta's MCI shows the stakes. Here's what a real employee PII policy for agents actually covers.

Logan Kelly

Jul 3, 2026

GuardFall Shell Injection: 10 of 11 AI Coding Agents [2026]

GuardFall defeats shell guards in 10 of 11 AI coding agents using decades-old bash tricks. Named tools: Aider, Cline, Goose, Plandex, and more.

Logan Kelly

Jul 2, 2026

Waxell

Waxell provides observability and governance for AI agents in production. Bring your own framework.

Product

Connect

Observe

Runtime