PII and Secret Scanning for MCP Tools
When your agent calls MCP tools, the arguments it sends and the results it receives may contain sensitive data -- social security numbers, API keys, credit card numbers, or other personally identifiable information. PII scanning intercepts these tool calls automatically, detecting sensitive patterns in both directions and letting you block, warn, or redact before data reaches an external server.
This guide walks you through enabling PII scanning, configuring per-type actions, building a custom scanner, and understanding what appears in your traces.
How PII Scanning Works
PII scanning runs at two points during every MCP tool call:
- Input scanning (before execution): Inspects the tool's arguments before the call is made. If a blocking-level finding is detected, the tool call is stopped entirely -- the MCP server never sees the data.
- Output scanning (after execution): Inspects the tool's result text after the call returns. Because the tool has already executed, output scanning warns but does not block -- you cannot un-send data that was already processed by the server.
Input scanning can block a tool call because the data has not left your process yet. Output scanning can only warn because blocking would discard a result that was already computed. This asymmetry is intentional -- it gives you defense-in-depth without throwing away valid results.
Both scans truncate text to 4 KB before pattern matching for performance, so sensitive data beyond that limit is not examined. The scan results are recorded on the span as attributes so you can query for PII events across all your traces.
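The two-phase flow described above can be sketched in plain Python. This is not the library's implementation: governed_call, truncate_for_scan, and BlockedCall are hypothetical names, and treating "4 KB" as the first 4096 bytes of UTF-8 text is an assumption.

```python
from __future__ import annotations

from typing import Callable, Optional

MAX_SCAN_BYTES = 4096  # assumption: "4 KB" means the first 4096 UTF-8 bytes


class BlockedCall(Exception):
    """Hypothetical stand-in for the library's PolicyViolationError."""


def truncate_for_scan(text: str, limit: int = MAX_SCAN_BYTES) -> str:
    """Keep at most `limit` bytes, dropping any character cut in half."""
    return text.encode("utf-8")[:limit].decode("utf-8", errors="ignore")


def governed_call(
    tool: Callable[[str], str],
    arguments: str,
    scan: Callable[[str], Optional[str]],
) -> tuple[str, list[str]]:
    """Run `tool` with a scan before it and a scan after it.

    `scan` returns "block", "warn", or None for a piece of text.
    """
    warnings: list[str] = []

    # Input scan: the data has not left the process, so blocking is possible.
    action = scan(truncate_for_scan(arguments))
    if action == "block":
        raise BlockedCall("sensitive data in tool arguments")
    if action == "warn":
        warnings.append("input")

    result = tool(arguments)

    # Output scan: the tool already ran, so any finding becomes a warning.
    if scan(truncate_for_scan(result)) is not None:
        warnings.append("output")
    return result, warnings
```

Note that the output branch never raises, even when the scan reports "block", which mirrors the warn-only behavior of output scanning.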
Detectable PII Types
The built-in RegexPIIScanner detects 11 types of sensitive data across two categories:
Personal Information (PII)
| Type | Pattern | Example Match |
|---|---|---|
| ssn | \d{3}-\d{2}-\d{4} | 123-45-6789 |
| email | Standard email format | user@example.com |
| phone | US phone numbers with optional +1 | (555) 123-4567 |
| credit_card | 16-digit card numbers with optional separators | 4111-1111-1111-1111 |
Credentials and Secrets
| Type | Pattern | Example Match |
|---|---|---|
| password | password=, pwd:, passwd= followed by value | password=s3cret |
| api_key | api_key=, api-secret: followed by value | api_key=abc123xyz |
| secret | secret_key=, access_key=, client_secret= | client_secret=xyz789 |
| aws_key | AWS access key IDs (AKIA prefix + 16 chars) | AKIAIOSFODNN7EXAMPLE |
| generic_token | Tokens with known prefixes (sk-, pk_live_, etc.) | sk-abc123def456ghi789jkl |
| github_pat | GitHub personal access tokens (ghp_ prefix) | ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx |
| waxell_key | Waxell API keys (wax_sk_ prefix) | wax_sk_abc123 |
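To make the tables concrete, here is a minimal sketch of how a few of these types could be matched with regular expressions. The expressions are illustrative guesses based on the descriptions above, not the ones RegexPIIScanner actually uses; in particular, the 36-character body for GitHub tokens is an assumption, and find_types is a hypothetical helper.

```python
import re

# Illustrative patterns only; the library's real expressions may differ.
PATTERNS = {
    "ssn": ("pii", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    "aws_key": ("credential", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
    "github_pat": ("credential", re.compile(r"\bghp_[A-Za-z0-9]{36}\b")),
}


def find_types(text: str) -> list:
    """Return (type, category) for every pattern that matches `text`."""
    return [
        (name, category)
        for name, (category, pattern) in PATTERNS.items()
        if pattern.search(text)
    ]


print(find_types("key AKIAIOSFODNN7EXAMPLE and SSN 123-45-6789"))
```

The word boundaries keep a longer digit run such as 1234-56-7890 from being reported as an SSN.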
PII Actions
Each PII type can be assigned one of three actions. When multiple types are detected, the highest-severity action is recorded on the span:
| Action | Severity | Behavior |
|---|---|---|
| warn | 0 (lowest) | Log a warning, record finding on span, continue with tool call |
| redact | 1 | Replace detected pattern with ##TYPE## placeholder (e.g., ##SSN##) |
| block | 2 (highest) | Stop the tool call entirely, raise PolicyViolationError |
The default action for all types is warn. You override actions per type using the pii_actions dictionary.
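The severity ordering and the warn default can be sketched as follows. resolve_action and redact are hypothetical helpers written for illustration, not part of the library's API; only the severity values and the ##TYPE## placeholder format come from the table above.

```python
import re

SEVERITY = {"warn": 0, "redact": 1, "block": 2}


def resolve_action(detected_types, pii_actions):
    """Return the highest-severity action among the detected types.

    Types missing from `pii_actions` fall back to "warn".
    Returns None when nothing was detected.
    """
    actions = [pii_actions.get(t, "warn") for t in detected_types]
    return max(actions, key=SEVERITY.__getitem__, default=None)


def redact(text, pii_type, pattern):
    """Replace every match with a ##TYPE## placeholder."""
    return pattern.sub(f"##{pii_type.upper()}##", text)


print(resolve_action(["email", "ssn"], {"ssn": "block"}))
print(redact("SSN 123-45-6789", "ssn", re.compile(r"\d{3}-\d{2}-\d{4}")))
```

With this ordering, a single block-level finding outranks any number of warn or redact findings on the same call.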
Step 1: Enable PII Scanning
PII scanning is part of MCP governance. Enable it by calling configure_session() with a governance_config that includes pii_actions:
import waxell_observe as waxell
from waxell_observe.instrumentors.mcp_instrumentor import configure_session

waxell.init()

# After session.initialize()
configure_session(
    session,
    server_name="my-server",
    governance_config={
        "agent_name": "my-agent",
        "scan_inputs": True,   # Scan tool arguments (default: True)
        "scan_outputs": True,  # Scan tool results (default: True)
        "pii_actions": {
            "ssn": "block",          # Block calls containing SSNs
            "credit_card": "block",  # Block calls containing credit cards
            "email": "warn",         # Warn on email addresses
            "api_key": "block",      # Block calls containing API keys
            "aws_key": "block",      # Block AWS access keys
        },
    },
)
With this configuration:
- A tool call with {"query": "SSN 123-45-6789"} in the arguments is blocked before it reaches the server.
- A tool call with {"query": "contact user@example.com"} proceeds but logs a warning and records the finding on the span.
- A tool result containing an AWS key triggers a warning in the output scan (output scans never block).
If you want all-warn scanning (detect but never block), pass an empty pii_actions:
governance_config={
    "agent_name": "my-agent",
    "pii_actions": {},  # All types default to "warn"
}
Step 2: Use a Custom PII Scanner
The built-in RegexPIIScanner covers common patterns. For more advanced detection -- ML-based NER, domain-specific patterns, or integration with tools like Microsoft Presidio -- you can provide your own scanner.
A custom scanner must implement the PIIScanner protocol:
from waxell_observe.scanning import PIIScanner


class MyScanner:
    """Custom PII scanner using Presidio."""

    def scan(self, text: str) -> dict:
        """Scan text and return findings.

        Must return:
        {
            "detected": bool,
            "count": int,
            "findings": [
                {"type": "ssn", "category": "pii", "action": "block"},
                {"type": "email", "category": "pii", "action": "warn"},
            ]
        }
        """
        # Your detection logic here
        findings = self._run_presidio(text)
        return {
            "detected": bool(findings),
            "count": len(findings),
            "findings": findings,
        }

    def _run_presidio(self, text: str) -> list:
        """Placeholder: replace with real detection that returns finding dicts."""
        return []
Each finding dict must have:
- type (str): The PII type name (e.g., "ssn", "email", "custom_id")
- category (str): "pii" or "credential"
- action (str): "warn", "block", or "redact"
Pass your custom scanner in the governance config:
configure_session(
    session,
    server_name="my-server",
    governance_config={
        "agent_name": "my-agent",
        "pii_scanner": MyScanner(),  # Your custom scanner
    },
)
When pii_scanner is provided, it replaces the default RegexPIIScanner entirely. If you want to combine them, call both inside your custom scanner's scan() method:
from waxell_observe.scanning import RegexPIIScanner


class CombinedScanner:
    def __init__(self):
        self._regex = RegexPIIScanner(actions={"ssn": "block"})

    def scan(self, text: str) -> dict:
        # Run built-in regex scanner
        regex_result = self._regex.scan(text)
        # Run your own custom logic
        custom_findings = self._detect_custom_patterns(text)
        # Merge findings
        all_findings = regex_result["findings"] + custom_findings
        return {
            "detected": bool(all_findings),
            "count": len(all_findings),
            "findings": all_findings,
        }

    def _detect_custom_patterns(self, text: str) -> list:
        """Placeholder: replace with your own logic that returns finding dicts."""
        return []
PIIScanner is a runtime_checkable Protocol. You do not need to inherit from it -- any class with a scan(text: str) -> dict method works.
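Structural typing of this kind can be demonstrated with the standard typing module alone. ScannerProtocol below is a local stand-in for PIIScanner, defined here only so the sketch runs without waxell_observe installed; KeywordScanner and its internal-id: marker are hypothetical.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class ScannerProtocol(Protocol):
    """Local stand-in for PIIScanner so this sketch runs on its own."""

    def scan(self, text: str) -> dict: ...


class KeywordScanner:
    """Does not inherit from the protocol; it only provides a scan() method."""

    def scan(self, text: str) -> dict:
        findings = []
        if "internal-id:" in text:  # hypothetical domain-specific marker
            findings.append(
                {"type": "custom_id", "category": "pii", "action": "warn"}
            )
        return {
            "detected": bool(findings),
            "count": len(findings),
            "findings": findings,
        }


class NotAScanner:
    def check(self, text: str) -> bool:
        return False


print(isinstance(KeywordScanner(), ScannerProtocol))
print(isinstance(NotAScanner(), ScannerProtocol))
```

An isinstance check against a runtime_checkable protocol only confirms that a scan attribute exists. It does not validate the signature or the shape of the returned dict, so the return format is still the scanner author's responsibility.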
What Appears in Traces
When PII scanning runs, the following attributes are recorded on the MCP tool span:
| Span Attribute | Type | Description |
|---|---|---|
| waxell.mcp.pii_detected | bool | True if any PII or credentials were found |
| waxell.mcp.pii_count | int | Total number of findings across input and output scans |
| waxell.mcp.pii_scan_direction | string | Which scan found PII: "inputs", "outputs", or "both" |
| waxell.mcp.pii_action_taken | string | Highest-severity action across all findings: "warn", "block", or "redact" |
| waxell.mcp.governance_checked | bool | True when governance ran (always true when PII scanning is active) |
| waxell.mcp.governance_timestamp | string | ISO 8601 timestamp of the governance check |
The actual PII values (SSN digits, email addresses, etc.) are intentionally not recorded on spans. Only summary attributes (detected/count/action) are stored. This prevents your tracing backend from becoming a PII liability.
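As a rough illustration of how summary-only attributes could be derived from two scan results, consider the sketch below. span_attributes is a hypothetical helper, not the instrumentor's code; it omits the timestamp attribute and assumes each scan result uses the dict format shown in the custom scanner section.

```python
SEVERITY = {"warn": 0, "redact": 1, "block": 2}


def span_attributes(input_result: dict, output_result: dict) -> dict:
    """Summarize two scan results as span attributes, omitting matched values."""
    attrs = {"waxell.mcp.governance_checked": True}
    findings = input_result["findings"] + output_result["findings"]
    if not findings:
        # PII attributes are only set when something was detected.
        return attrs

    if input_result["detected"] and output_result["detected"]:
        direction = "both"
    elif input_result["detected"]:
        direction = "inputs"
    else:
        direction = "outputs"

    attrs.update({
        "waxell.mcp.pii_detected": True,
        "waxell.mcp.pii_count": len(findings),
        "waxell.mcp.pii_scan_direction": direction,
        "waxell.mcp.pii_action_taken": max(
            (f["action"] for f in findings), key=SEVERITY.__getitem__
        ),
    })
    return attrs
```

Because the findings carry only a type, category, and action, nothing in the resulting attributes can leak the matched text itself.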
You can query for PII events in Grafana using TraceQL:
{ span.waxell.mcp.pii_detected = true }
Or filter by action:
{ span.waxell.mcp.pii_action_taken = "block" }
Full Example
Complete runnable example with PII scanning
"""MCP tool call with PII scanning enabled.

Demonstrates:
- Blocking SSNs and credit cards in tool inputs
- Warning on email addresses
- Output scanning (warn-only)
"""
import asyncio

import waxell_observe as waxell
from waxell_observe.instrumentors.mcp_instrumentor import configure_session

waxell.init()

# Import MCP after init() so the instrumentor patches it
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main():
    server_params = StdioServerParameters(
        command="npx",
        args=["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
    )
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Configure PII scanning
            configure_session(
                session,
                server_name="filesystem",
                governance_config={
                    "agent_name": "file-assistant",
                    "scan_inputs": True,
                    "scan_outputs": True,
                    "pii_actions": {
                        "ssn": "block",
                        "credit_card": "block",
                        "email": "warn",
                        "api_key": "block",
                        "aws_key": "block",
                        "github_pat": "block",
                    },
                },
            )

            # This call succeeds -- no PII in arguments
            result = await session.call_tool(
                name="read_file",
                arguments={"path": "/tmp/readme.txt"},
            )
            print("Safe call succeeded:", result.content[0].text[:100])

            # This call is BLOCKED -- SSN in arguments
            try:
                result = await session.call_tool(
                    name="write_file",
                    arguments={
                        "path": "/tmp/output.txt",
                        "content": "User SSN: 123-45-6789",
                    },
                )
            except Exception as e:  # a block raises PolicyViolationError
                print(f"Blocked as expected: {e}")


if __name__ == "__main__":
    asyncio.run(main())
Troubleshooting
PII scanning is not blocking anything
- Check that governance_config is set. PII scanning only runs when governance_config is passed to configure_session(). Without it, no scanning occurs.
- Check the action level. The default action is "warn", not "block". You must explicitly set "block" for types you want to stop: "pii_actions": {"ssn": "block", "credit_card": "block"}
- Check that scan_inputs is not disabled. It defaults to True, but if you set "scan_inputs": False, input scanning is skipped.
Output scanning detected PII but didn't block
This is expected behavior. Output scanning runs after the tool has already executed, so it cannot block. It records a warning on the span and logs the finding. To prevent sensitive data from being returned, block the input patterns that would cause the server to fetch sensitive data.
Custom scanner is not being used
Make sure you pass the scanner instance in governance_config["pii_scanner"], not pii_actions:
# Correct
governance_config={"pii_scanner": MyScanner()}
# Wrong -- this creates a RegexPIIScanner with these actions
governance_config={"pii_actions": {"ssn": "block"}}
When pii_scanner is provided, pii_actions is ignored. The custom scanner is responsible for its own action mapping.
Scanner errors are silently ignored
PII scanning is fail-open by design. If your custom scanner raises an exception, the tool call proceeds as if no PII was detected, and a warning is logged:
MCP PII scan failed: <error> -- treating as no PII (fail-open)
Check your application logs for these warnings. Fix the scanner bug, then redeploy.
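The fail-open behavior can be pictured as a thin wrapper around the scanner. This is a sketch under assumptions, not the instrumentor's source: scan_fail_open and the logger name are hypothetical, and only the log message format is taken from the line above.

```python
import logging

logger = logging.getLogger("pii_sketch")  # hypothetical logger name


def scan_fail_open(scanner, text: str) -> dict:
    """Run scanner.scan(); on any exception, log and report no PII."""
    try:
        return scanner.scan(text)
    except Exception as exc:
        logger.warning(
            "MCP PII scan failed: %s -- treating as no PII (fail-open)", exc
        )
        return {"detected": False, "count": 0, "findings": []}


class BrokenScanner:
    """Simulates a custom scanner with a bug."""

    def scan(self, text: str) -> dict:
        raise RuntimeError("model not loaded")


print(scan_fail_open(BrokenScanner(), "SSN 123-45-6789"))
```

The trade-off is availability over strictness: a scanner bug never takes the agent down, but it also means sensitive data can pass unnoticed until the logged warning is acted on. If you would rather fail closed, one option is to catch exceptions inside your own scan() method and return a finding whose action is "block".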
TraceQL queries return no PII results
PII attributes are only set when PII is actually detected. If waxell.mcp.pii_detected is absent from a span, it means no PII was found (not that scanning was skipped). To verify scanning is active, check for waxell.mcp.governance_checked = true on the span.