Content Policy
The content policy category scans agent inputs and outputs for sensitive content using regex-based detection. It covers:
- PII detection -- SSN, email, phone, credit card
- Credential detection -- API keys, passwords, AWS keys, GitHub PATs, Waxell keys
- Prompt injection guard -- 14 patterns detecting common injection attempts
- Custom patterns -- user-defined regex with configurable actions
- Blocked phrases -- case-insensitive substring matching (always blocks)
- Length limits -- max input and output character counts
Unlike the Safety Policy (which also has content filters but always returns WARN), the content handler supports configurable actions per detection type: warn, redact, or block.
Rules
| Rule | Type | Default | Description |
|---|---|---|---|
| scan_inputs | boolean | true | Scan agent inputs for content violations |
| scan_outputs | boolean | true | Scan agent outputs for content violations |
| pii_detection | object | {enabled: false} | PII scanning configuration |
| pii_detection.enabled | boolean | false | Enable PII detection |
| pii_detection.action | string | "warn" | Action on PII: "warn", "redact", or "block" |
| pii_detection.types | string[] | all types | PII types to scan: ssn, email, phone, credit_card |
| credential_detection | object | {enabled: false} | Credential scanning configuration |
| credential_detection.enabled | boolean | false | Enable credential detection |
| credential_detection.action | string | "block" | Action on credentials: "warn", "redact", or "block" |
| credential_detection.patterns | string[] | all patterns | Patterns to scan (see table below) |
| prompt_injection_guard | object | {enabled: false} | Prompt injection detection configuration |
| prompt_injection_guard.enabled | boolean | false | Enable injection guard |
| prompt_injection_guard.action | string | "block" | Action on injection: "warn", "redact", or "block" |
| custom_patterns | object[] | [] | Custom regex patterns (see below) |
| blocked_phrases | string[] | [] | Phrases to block (case-insensitive substring match) |
| max_input_length | integer | (none) | Maximum characters in input |
| max_output_length | integer | (none) | Maximum characters in output |
How It Works
The content handler runs at all three enforcement phases, scanning different data at each:
| Phase | What It Scans | Context Attribute |
|---|---|---|
before_workflow | Agent inputs | context.inputs |
mid_execution | LLM prompt and response | context.prompt_preview, context.response_preview |
after_workflow | Final output | result parameter |
At each phase, the handler runs all enabled checks (PII, credentials, injection, custom patterns, blocked phrases) against the text. If multiple violations are found, the worst action wins: warn < redact < block.
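The "worst action wins" escalation can be sketched in a few lines of Python. This is an illustration of the ordering, not the handler's actual code:

```python
# Assumed severity ordering from the docs: warn < redact < block.
SEVERITY = {"warn": 0, "redact": 1, "block": 2}

def worst_action(actions):
    """Return the most severe action among the detected violations."""
    if not actions:
        return "allow"
    return max(actions, key=lambda a: SEVERITY[a])
```

For example, `worst_action(["warn", "block", "redact"])` escalates to `"block"`, because a single blocking violation outweighs any number of warnings.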
PII Detection
| Type | Pattern | Example Match |
|---|---|---|
| ssn | \b\d{3}-\d{2}-\d{4}\b | 123-45-6789 |
| email | \b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z\|a-z]{2,}\b | user@example.com |
| phone | \b(?:\+1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b | (555) 123-4567 |
| credit_card | \b(?:\d{4}[-\s]?){3}\d{4}\b | 4111-1111-1111-1111 |
When pii_detection.action is "redact", matched text is replaced with [REDACTED:ssn], [REDACTED:email], etc.
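As an illustration, the regexes from the table can be combined into a small redaction helper. This is a sketch assuming one `re.sub` pass per type, not the SDK's actual `redact_content` implementation:

```python
import re

# PII regexes copied from the table above; the handler's compiled
# patterns may differ slightly.
PII_PATTERNS = {
    "ssn": r"\b\d{3}-\d{2}-\d{4}\b",
    "email": r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b",
    "credit_card": r"\b(?:\d{4}[-\s]?){3}\d{4}\b",
}

def redact_pii(text):
    """Replace each PII match with a [REDACTED:<type>] marker."""
    for pii_type, pattern in PII_PATTERNS.items():
        text = re.sub(pattern, f"[REDACTED:{pii_type}]", text)
    return text
```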
Credential Detection
| Pattern | What It Matches | Example |
|---|---|---|
| password | password=, passwd=, pwd= assignments | password=hunter2 |
| api_key | api_key=, apikey=, api_secret= assignments | api_key=abc123 |
| secret | secret_key=, access_key=, client_secret= assignments | secret_key=xyz |
| aws_key | AKIA prefix + 16 uppercase alphanumeric chars | AKIAIOSFODNN7EXAMPLE |
| generic_token | sk-, pk_live_, sk_live_, rk_live_, sk_test_ prefix + 20+ chars | sk-abc123456789012345678901 |
| github_pat | ghp_ prefix + 36 alphanumeric chars | ghp_abcdefghijklmnopqrstuvwxyz0123456789 |
| waxell_key | wax_sk_ prefix + alphanumeric chars | wax_sk_abc123 |
When credential_detection.action is "redact", matched text is replaced with [REDACTED:password], [REDACTED:api_key], etc.
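A minimal detection sketch, using regexes reconstructed from the table's prose descriptions (the handler's exact patterns are assumptions here):

```python
import re

# Credential regexes approximated from the table above; the handler's
# actual patterns may be stricter or looser.
CREDENTIAL_PATTERNS = {
    "aws_key": r"\bAKIA[0-9A-Z]{16}\b",
    "github_pat": r"\bghp_[A-Za-z0-9]{36}\b",
    "waxell_key": r"\bwax_sk_[A-Za-z0-9]+\b",
}

def detect_credentials(text):
    """Return the names of credential patterns found in the text."""
    return [name for name, pat in CREDENTIAL_PATTERNS.items()
            if re.search(pat, text)]
```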
Prompt Injection Guard
Detects 14 common prompt injection patterns (English-only):
| Pattern | Example |
|---|---|
| "ignore previous instructions" | "Ignore all previous instructions and..." |
| "ignore above instructions" | "Ignore the above instructions" |
| "you are now a..." | "You are now a helpful hacker" |
| "forget your instructions" | "Forget all your previous instructions" |
| "disregard previous" | "Disregard all previous context" |
| "new instructions:" | "New instructions: do something else" |
| "override system" | "Override your system instructions" |
| "```system" | Code block system injection |
| "[system]:" | Bracket system injection |
| "<|system|>" | Tag system injection |
| "ADMIN MODE ENABLED" | Fake admin mode activation |
| "developer mode enabled" | Fake developer mode |
| "jailbreak" | Direct jailbreak attempt |
| "DAN mode" | "Do Anything Now" mode attempt |
All patterns are case-insensitive.
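For illustration, a few of these patterns approximated as case-insensitive regexes. This is a sketch, not the handler's full 14-pattern list:

```python
import re

# A subset of the built-in injection patterns, approximated as regexes;
# the real guard's pattern list is not reproduced here.
INJECTION_PATTERNS = [
    r"ignore\s+(all\s+)?previous\s+instructions",
    r"you\s+are\s+now\s+a",
    r"new\s+instructions:",
    r"developer\s+mode\s+enabled",
]

def is_injection(text):
    """Case-insensitive scan, mirroring the guard's matching behavior."""
    return any(re.search(p, text, re.IGNORECASE)
               for p in INJECTION_PATTERNS)
```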
Custom Patterns
Define your own regex patterns with configurable actions:
{
"custom_patterns": [
{
"name": "Internal IPs",
"pattern": "10\\.\\d+\\.\\d+\\.\\d+",
"action": "warn"
},
{
"name": "Internal URLs",
"pattern": "https?://internal\\.",
"action": "block"
}
]
}
Each pattern object requires:
- `name` -- human-readable label (appears in violation messages)
- `pattern` -- regex string (compiled with `re.IGNORECASE`)
- `action` -- `"warn"`, `"redact"`, or `"block"` (default: `"warn"`)
Invalid regex patterns are silently skipped. Matched text is truncated to 100 characters in violation reports.
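The skip-on-invalid behavior can be sketched like this. `compile_custom_patterns` is a hypothetical helper written for illustration, not an SDK API:

```python
import re

def compile_custom_patterns(custom_patterns):
    """Compile user patterns with re.IGNORECASE, silently skipping
    invalid regexes, as the handler does."""
    compiled = []
    for spec in custom_patterns:
        try:
            regex = re.compile(spec["pattern"], re.IGNORECASE)
        except re.error:
            continue  # invalid regex: skipped without raising
        compiled.append((spec["name"], regex, spec.get("action", "warn")))
    return compiled
```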
Blocked Phrases
Case-insensitive substring matching. The action is always "block" (hardcoded).
{
"blocked_phrases": [
"ignore previous instructions",
"reveal system prompt",
"jailbreak"
]
}
Unlike PII and credential detection where you can choose warn/redact/block, blocked phrases always produce a block action. There is no way to configure them as warn-only.
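A sketch of the matching logic. `check_blocked_phrases` is a hypothetical helper, not an SDK API:

```python
def check_blocked_phrases(text, blocked_phrases):
    """Case-insensitive substring match; any hit is always a 'block'."""
    lowered = text.lower()
    hits = [p for p in blocked_phrases if p.lower() in lowered]
    return ("block", hits) if hits else ("allow", [])
```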
Length Limits
- `max_input_length` -- checked at before_workflow, before content scanning. Returns BLOCK if exceeded.
- `max_output_length` -- checked at mid_execution (response_preview) and after_workflow (final result). Returns WARN if exceeded.
max_input_length is evaluated before any content scanning. If the input exceeds the limit, the handler returns BLOCK immediately without scanning for PII, credentials, or other violations.
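The short-circuit ordering can be sketched as follows. `check_input` and `scan_content` are hypothetical names, and the placeholder scan always allows:

```python
def scan_content(text, rules):
    """Placeholder for the real PII/credential/injection pipeline."""
    return "allow"

def check_input(text, rules):
    """Sketch of before_workflow ordering: the length limit is
    evaluated first and short-circuits all content scanning."""
    max_len = rules.get("max_input_length")
    if max_len is not None and len(text) > max_len:
        return "block"  # no PII/credential/injection scan runs
    return scan_content(text, rules)
```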
Redaction
When any detection type has action: "redact", the `ContentHandler.redact_content(text, rules)` method can be called to replace sensitive content with markers:
- PII: `[REDACTED:ssn]`, `[REDACTED:email]`, `[REDACTED:phone]`, `[REDACTED:credit_card]`
- Credentials: `[REDACTED:password]`, `[REDACTED:api_key]`, `[REDACTED:secret]`, etc.
Only categories whose action is explicitly "redact" are redacted. A policy with pii_detection.action: "block" and credential_detection.action: "redact" will only redact credentials, not PII.
Example Policies
PII-Only Scanning (Warn)
Detect PII but don't block:
{
"scan_inputs": true,
"scan_outputs": true,
"pii_detection": {
"enabled": true,
"action": "warn",
"types": ["ssn", "credit_card"]
}
}
Full Security (All Checks, Block)
Enable all detection types with block action:
{
"scan_inputs": true,
"scan_outputs": true,
"pii_detection": {
"enabled": true,
"action": "block",
"types": ["ssn", "email", "phone", "credit_card"]
},
"credential_detection": {
"enabled": true,
"action": "block"
},
"prompt_injection_guard": {
"enabled": true,
"action": "block"
},
"blocked_phrases": ["jailbreak", "reveal system prompt"],
"max_input_length": 50000,
"max_output_length": 10000
}
Custom Pattern (Internal IPs)
Warn on internal IP addresses in outputs:
{
"scan_outputs": true,
"custom_patterns": [
{
"name": "Internal IPs",
"pattern": "10\\.\\d+\\.\\d+\\.\\d+",
"action": "warn"
},
{
"name": "Private IPs",
"pattern": "192\\.168\\.\\d+\\.\\d+",
"action": "warn"
}
]
}
Redact Mode
Redact PII and credentials instead of blocking:
{
"pii_detection": {
"enabled": true,
"action": "redact",
"types": ["ssn", "credit_card"]
},
"credential_detection": {
"enabled": true,
"action": "redact"
}
}
SDK Integration
Using the Context Manager
import waxell_observe as waxell
from waxell_observe.errors import PolicyViolationError
waxell.init()
try:
async with waxell.WaxellContext(
agent_name="support-agent",
enforce_policy=True,
inputs={"query": user_query},
) as ctx:
# before_workflow: content handler scans inputs
# If PII/credential/injection detected -> BLOCK/WARN/REDACT
response = await process_query(user_query)
ctx.record_llm_call(
model="gpt-4o-mini",
prompt_preview=user_query[:200],
response_preview=response[:200],
)
# mid_execution: scans prompt_preview and response_preview
ctx.set_result(response)
# after_workflow: scans final result
except PolicyViolationError as e:
print(f"Content block: {e}")
# e.g. "Input content violations: [input] PII detected: ssn"
Using the Decorator
@waxell.observe(
agent_name="support-agent",
enforce_policy=True,
)
async def handle_support(query: str):
# Content scans happen at all three phases
return await process_query(query)
Enforcement Flow
Agent starts (WaxellContext.__aenter__)
|
+-- before_workflow
| |
| +-- scan_inputs disabled? -> ALLOW (skip)
| +-- No input text? -> ALLOW
| +-- max_input_length exceeded? -> BLOCK
| +-- Scan input text:
| +-- PII detection -> action per config
| +-- Credential detection -> action per config
| +-- Prompt injection guard -> action per config
| +-- Custom patterns -> action per pattern
| +-- Blocked phrases -> always BLOCK
| +-- Worst action wins (warn < redact < block)
|
+-- Agent executes...
|
+-- mid_execution (per LLM call)
| |
| +-- scan_inputs? -> scan prompt_preview
| +-- scan_outputs? -> scan response_preview
| +-- scan_outputs? -> check max_output_length on response
| +-- Worst action wins
|
+-- Agent finishes
|
+-- after_workflow
|
+-- scan_outputs disabled? -> ALLOW (skip)
+-- No result? -> ALLOW
+-- max_output_length exceeded? -> WARN
+-- Scan result text (same checks as before_workflow)
+-- Worst action wins
Creating via Dashboard
- Navigate to Governance > Policies
- Click New Policy
- Select category Content
- Enable detection types (PII, credentials, injection guard)
- Set action per type (warn, redact, block)
- Optionally add custom patterns and blocked phrases
- Set input/output length limits
- Set scope to target specific agents
- Enable the policy
Creating via API
curl -X POST \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://acme.waxell.dev/waxell/v1/policies/ \
-d '{
"name": "Content Security",
"category": "content",
"rules": {
"scan_inputs": true,
"scan_outputs": true,
"pii_detection": {
"enabled": true,
"action": "block",
"types": ["ssn", "email", "phone", "credit_card"]
},
"credential_detection": {
"enabled": true,
"action": "block"
},
"prompt_injection_guard": {
"enabled": true,
"action": "block"
},
"blocked_phrases": ["jailbreak", "reveal system prompt"],
"max_input_length": 50000,
"max_output_length": 10000
},
"scope": {
"agents": ["support-agent"]
},
"enabled": true
}'
Observability
Governance Tab
Content evaluations appear with:
| Field | Example (ALLOW) |
|---|---|
| Policy name | Content Security |
| Action | allow |
| Category | content |
| Reason | "Input content scan passed (PII, credentials, injection guard, 2 blocked phrases)" |
For violations:
| Field | Example (BLOCK) |
|---|---|
| Action | block |
| Reason | "Input content violations: [input] PII detected: ssn; [input] Credential detected: api_key" |
| Metadata | {"violations": [{"type": "pii", "message": "[input] PII detected: ssn", "action": "block"}], "scan_target": "input"} |
For prompt injection:
| Field | Example |
|---|---|
| Reason | "Input content violations: [input] Prompt injection pattern: 'Ignore all previous instructions'" |
| Metadata | {"violations": [{"type": "prompt_injection", "message": "...", "action": "block"}]} |
Combining with Other Policies
- Content + Safety: Safety has simpler content filters (pii/profanity/credentials with WARN-only). Content provides more granular control with configurable actions per detection type
- Content + Compliance: HIPAA compliance can require `content.pii_detection.enabled: true` as a required rule
- Content + Privacy: Privacy handles data access controls; content handles data leakage detection in text
Common Gotchas
- `blocked_phrases` action is always `"block"`. You cannot configure them as warn-only. The action is hardcoded in the handler.
- PII detection is regex-based, not ML-based. It can miss edge cases (e.g., SSNs without dashes) and may false-positive on patterns that look like PII (e.g., formatted dates).
- Prompt injection patterns are English-only. Non-English injection attempts will not be detected by the built-in patterns. Use `custom_patterns` for other languages.
- `scan_inputs: false` disables before_workflow entirely. The handler returns ALLOW immediately without checking anything, including `max_input_length`.
- `scan_outputs: false` disables after_workflow entirely. No output scanning or length checking occurs.
- Custom pattern regex is case-insensitive. All custom patterns are compiled with `re.IGNORECASE`. You cannot make them case-sensitive.
- `max_input_length` is checked BEFORE content scanning. If input exceeds the limit, the handler returns BLOCK without running any content detection. This is intentional -- there's no point scanning very large inputs.
- Action priority: warn < redact < block. If PII detection is set to "warn" but credential detection is set to "block", and both trigger, the overall action is "block".
- Redaction only applies to categories with action `"redact"`. A policy with `pii_detection.action: "block"` will not redact PII -- it will block. Set action to `"redact"` explicitly.
- The `_worst_action` escalation applies per-phase. Each phase independently determines its action. A WARN at before_workflow does not prevent a BLOCK at mid_execution.
Next Steps
- Safety Policy -- Broader safety controls including step/tool limits
- Policy & Governance -- How policy enforcement works
- Compliance Policy -- Meta-validator for regulatory frameworks
- Policy Categories & Templates -- All 26 categories