# Prompt Guard
Prompt Guard intercepts LLM calls before they are sent and scans prompts for PII, credentials, and prompt injection patterns. It runs client-side with zero network overhead (regex-based), with an optional server-side ML tier for deeper detection.
## Quick Start
Enable prompt guard in `init()`:

```python
import waxell_observe

waxell_observe.init(
    api_key="wax_sk_...",
    api_url="https://acme.waxell.dev",
    prompt_guard=True,             # Enable client-side guard
    prompt_guard_action="block",   # "block", "warn", or "redact"
)
```
Now every auto-instrumented LLM call is scanned automatically:

```python
from openai import OpenAI

client = OpenAI()

# This will raise PromptGuardError if the prompt contains PII
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "My SSN is 123-45-6789"}],
)
```
## Actions
| Action | Behavior |
|---|---|
| `"block"` | Raises `PromptGuardError`. The LLM call is never made. |
| `"warn"` | Logs violations as warnings. The LLM call proceeds with the original prompt. |
| `"redact"` | Replaces sensitive data with `##TYPE##` placeholders. The LLM call proceeds with the sanitized prompt. |
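As a rough sketch of how the three actions might be dispatched, here is a minimal illustration using a single SSN regex. The `apply_action` function, the regex, and the local `PromptGuardError` class are all illustrative stand-ins, not the SDK's internals:

```python
import re

# Stand-in for the SDK's PromptGuardError, for illustration only.
class PromptGuardError(Exception):
    pass

# Single illustrative pattern; the real guard checks many more.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def apply_action(content: str, action: str) -> str:
    """Sketch of how the three actions might be dispatched."""
    violations = SSN_RE.findall(content)
    if not violations:
        return content
    if action == "block":
        # The LLM call is never made.
        raise PromptGuardError(f"{len(violations)} violation(s) detected")
    if action == "warn":
        # Log and let the original prompt through.
        print(f"warning: {len(violations)} violation(s) detected")
        return content
    # "redact": replace each match with a ##TYPE## placeholder.
    return SSN_RE.sub("##SSN##", content)
```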
## What Gets Detected

### PII
| Type | Pattern | Example |
|---|---|---|
| SSN | XXX-XX-XXXX | 123-45-6789 |
| Email | Standard email format | user@example.com |
| Phone | US phone numbers | (555) 123-4567 |
| Credit Card | 16-digit card numbers | 4111-1111-1111-1111 |
### Credentials
| Type | Pattern | Example |
|---|---|---|
| Password | password=, pwd: | password=hunter2 |
| API Key | api_key=, apikey: | api_key=sk-abc123 |
| Secret | secret_key=, client_secret: | secret_key=mySecret |
| AWS Key | AKIA prefix | AKIAIOSFODNN7EXAMPLE |
| Generic Token | sk-, pk_live_, etc. | sk-abc123def456... |
| GitHub PAT | ghp_ prefix | ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx |
| Waxell Key | wax_sk_ prefix | wax_sk_abc123 |
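The PII and credential patterns above could be approximated with regexes along these lines. These are illustrative approximations only; the SDK's actual patterns are not published in this document:

```python
import re

# Illustrative approximations of a few of the credential patterns above.
CREDENTIAL_PATTERNS = {
    "AWS_KEY": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "GITHUB_PAT": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "WAXELL_KEY": re.compile(r"\bwax_sk_[A-Za-z0-9]+"),
    "PASSWORD": re.compile(r"(?i)(?:password|pwd)\s*[:=]\s*\S+"),
}

def find_credentials(text: str) -> list[str]:
    """Return the names of all credential types matched in the text."""
    return [name for name, pat in CREDENTIAL_PATTERNS.items() if pat.search(text)]
```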
### Prompt Injection
Detects common prompt injection patterns including:
- "Ignore previous instructions"
- "You are now a..."
- "Forget your instructions"
- "New instructions:"
- System prompt markers (`[system]:`, `<|system|>`)
- Jailbreak patterns (DAN mode, developer mode)
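A regex-based check over the patterns listed above might look like the following sketch. The pattern list and `looks_like_injection` helper are illustrative, assuming case-insensitive matching:

```python
import re

# Illustrative patterns approximating the injection list above.
INJECTION_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"you are now a",
    r"forget your instructions",
    r"new instructions:",
    r"\[system\]:",
    r"<\|system\|>",
    r"\bDAN mode\b",
    r"\bdeveloper mode\b",
]

def looks_like_injection(text: str) -> bool:
    """True if any known injection pattern appears in the text."""
    return any(re.search(p, text, re.IGNORECASE) for p in INJECTION_PATTERNS)
```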
## Configuration

### Via `init()`
```python
waxell_observe.init(
    prompt_guard=True,             # Enable local regex guard
    prompt_guard_server=True,      # Also check server-side ML
    prompt_guard_action="redact",  # "block", "warn", or "redact"
)
```
### Via Environment Variables
```bash
export WAXELL_PROMPT_GUARD="true"
export WAXELL_PROMPT_GUARD_SERVER="true"
export WAXELL_PROMPT_GUARD_ACTION="block"
```
## Handling Blocks

When `prompt_guard_action="block"`, a `PromptGuardError` is raised:
```python
from waxell_observe import PromptGuardError

try:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "My SSN is 123-45-6789"}],
    )
except PromptGuardError as e:
    print(f"Blocked: {e}")
    print(f"Violations: {e.result.violations}")
    print(f"Action: {e.result.action}")
```
The `PromptGuardError.result` attribute is a `PromptGuardResult`:
| Field | Type | Description |
|---|---|---|
| `passed` | `bool` | `True` if no violations (or action is `"warn"`) |
| `action` | `str` | The action taken: `"allow"`, `"block"`, `"warn"`, or `"redact"` |
| `violations` | `list[str]` | List of violation descriptions |
| `source` | `str` | `"local"`, `"server"`, or `"both"` |
| `redacted_messages` | `list \| None` | Redacted messages (only when action is `"redact"`) |
## Redaction

When `prompt_guard_action="redact"`, sensitive data is replaced with `##TYPE##` placeholders before the LLM call:

```
Input:  "My email is user@example.com and SSN is 123-45-6789"
Output: "My email is ##EMAIL## and SSN is ##SSN##"
```
The LLM receives the redacted version. The original is never sent.
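Multi-type redaction of this kind can be sketched as a sequence of substitutions. The regexes below are illustrative, not the SDK's own:

```python
import re

# Illustrative (TYPE, pattern) pairs; the real guard covers more types.
REDACTORS = [
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
]

def redact(text: str) -> str:
    """Replace each detected value with a ##TYPE## placeholder."""
    for label, pattern in REDACTORS:
        text = pattern.sub(f"##{label}##", text)
    return text
```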
## Manual Checking

You can also check prompts manually using `check_prompt()`:
```python
from waxell_observe.instrumentors._guard import check_prompt

result = check_prompt(
    messages=[{"role": "user", "content": "My SSN is 123-45-6789"}],
    model="gpt-4o",
)

if result and not result.passed:
    print(f"Violations found: {result.violations}")
```
## Server-Side ML Detection

The optional server-side tier uses Presidio and HuggingFace models for deeper detection beyond regex patterns. Enable it with `prompt_guard_server=True`.
Server-side detection adds:
- Named entity recognition for PII
- Contextual credential detection
- ML-powered injection classification
Server results are merged with local regex results — violations from both sources are deduplicated and combined.
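One way the merge described above could work is to combine and deduplicate the violation lists while preserving order, tagging where the findings came from. The `merge_violations` helper is purely illustrative:

```python
# Illustrative sketch of merging local and server violation lists.
def merge_violations(local: list[str], server: list[str]) -> tuple[list[str], str]:
    # dict.fromkeys dedupes while keeping first-seen order
    combined = list(dict.fromkeys(local + server))
    if local and server:
        source = "both"
    elif server:
        source = "server"
    else:
        source = "local"
    return combined, source
```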
## Next Steps
- Auto-Instrumentation -- How prompt guard integrates with auto-instrumentation
- Policy & Governance -- Server-side policy enforcement
- Python SDK Reference -- `PromptGuardError` and `PromptGuardResult` types