Input Validation Policy
The input-validation policy category is a pre-flight data validator -- it checks inbound data before the agent begins execution. It validates emptiness, size limits, input type, HTML/script injection, and JSON schema compliance.
Use it when you need to ensure agents only process well-formed, safe, appropriately-sized inputs.
Rules
| Rule | Type | Default | Description |
|---|---|---|---|
| validate_schema | boolean | false | Enable JSON schema validation against input_schema |
| input_schema | object | {} | JSON Schema-like object with a required field list |
| max_input_size_kb | integer | 100 | Maximum input size in kilobytes |
| allowed_input_types | string[] | [] (allow all) | Allowed input types: text, json, binary |
| reject_empty_input | boolean | true | Block empty/null/empty-dict inputs |
| sanitize_html | boolean | true | Block inputs containing <script or javascript: |
| action_on_violation | string | "block" | Action when validation fails: block or warn |
How It Works
The input validation handler runs at before_workflow only. It reads context.inputs -- the dict or string passed to WaxellContext(inputs=...). There is no mid_execution phase. The after_workflow phase is a stub that always returns ALLOW.
Validation Order
Checks run in this exact order, short-circuiting on first failure:
- Empty check -- is the input null, empty string, or empty dict?
- Size check -- does the serialized input exceed max_input_size_kb?
- Type check -- is the input type in allowed_input_types?
- HTML check -- does the input contain <script or javascript:?
- Schema check -- are all required fields present?
If all checks pass, the handler returns ALLOW with reason "Input validation passed".
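The short-circuit behavior described above can be sketched as a loop over an ordered list of checks. This is an illustration only, not the handler's source: run_in_order and the two toy checks are hypothetical names, and only two of the five checks are stubbed in to keep the sketch short.

```python
from typing import Any, Callable, List, Optional, Tuple

# A check returns a violation reason, or None when the input passes.
Check = Callable[[Any], Optional[str]]


def run_in_order(inputs: Any, checks: List[Tuple[str, Check]]) -> Tuple[str, str]:
    """Run checks in list order and stop at the first one that fails."""
    for name, check in checks:
        reason = check(inputs)
        if reason is not None:
            return name, reason  # short-circuit: later checks never run
    return "allow", "Input validation passed"


# Toy stand-ins for the documented empty and schema checks (hypothetical).
def empty_check(inputs: Any) -> Optional[str]:
    return "Empty input rejected" if inputs in (None, "", {}) else None


def schema_check(inputs: Any) -> Optional[str]:
    if isinstance(inputs, dict) and "query" not in inputs:
        return "Required field 'query' missing from input"
    return None


CHECKS = [("empty", empty_check), ("schema", schema_check)]

# {} fails both checks, but only the first failure is reported.
print(run_in_order({}, CHECKS))
print(run_in_order({"user_id": "u1"}, CHECKS))
print(run_in_order({"query": "hi"}, CHECKS))
```

Because evaluation stops at the first failure, a single evaluation reports at most one reason; fixing that problem may surface the next one on the following run.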
Validation Details
Empty Check
The handler considers these values empty:
- None
- "" (empty string)
- {} (empty dict)
{"query": ""} is NOT empty. A dict with keys -- even if the values are empty strings -- passes the empty check. Only None, "", and {} are considered empty. Schema validation does not close this gap: it checks only that required keys are present, so {"query": ""} also passes a schema that requires query. To reject empty values, check them in your agent code.
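A minimal sketch of the empty check as documented, assuming plain equality comparisons. The function name is_empty_input is hypothetical and the real handler may be written differently.

```python
def is_empty_input(inputs):
    """True only for the three documented empty values: None, "" and {}."""
    return inputs is None or inputs == "" or inputs == {}


for candidate in (None, "", {}, {"query": ""}):
    print(repr(candidate), "->", is_empty_input(candidate))
```

The last candidate, a dict with one key and a blank value, is reported as not empty, which matches the note above.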
Size Check
The handler serializes inputs to a JSON string (for dicts/lists) or converts them with str(), then measures the UTF-8 byte length in kilobytes. The comparison is strict (>), so an input exactly at the limit passes:
size_kb = len(json.dumps(inputs).encode("utf-8")) / 1024
if size_kb > max_input_size_kb: BLOCK
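The size rule can be tried out with a small sketch that follows the description above. The helper names input_size_kb and exceeds_limit are hypothetical. The sketch calls json.dumps with default arguments, as in the snippet above; with those defaults non-ASCII characters inside dicts and lists are escaped as \uXXXX sequences, so such inputs measure larger than their raw UTF-8 form.

```python
import json


def input_size_kb(inputs):
    """Serialize dicts/lists with json.dumps, everything else with str()."""
    serialized = json.dumps(inputs) if isinstance(inputs, (dict, list)) else str(inputs)
    return len(serialized.encode("utf-8")) / 1024


def exceeds_limit(inputs, max_input_size_kb=100):
    """Strict comparison: an input exactly at the limit is allowed."""
    return input_size_kb(inputs) > max_input_size_kb


exactly_100kb = "a" * (100 * 1024)
print(exceeds_limit(exactly_100kb))        # False: 100.0 is not > 100
print(exceeds_limit(exactly_100kb + "a"))  # True: one byte over the limit

# JSON escaping inflates non-ASCII content inside dicts.
print(len("é".encode("utf-8")))                     # 2 bytes as a plain string
print(len(json.dumps({"q": "é"}).encode("utf-8")))  # 15 bytes once serialized
```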
Type Check
Type detection is based on Python types:
- dict or list = "json"
- Everything else = "text"
If allowed_input_types is an empty list [], all types are allowed. Only a non-empty list restricts types.
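The type rule is small enough to sketch directly. The names detect_input_type and type_allowed are hypothetical; the classification follows the two bullets above, and the empty-list behavior follows the sentence after them.

```python
def detect_input_type(inputs):
    """dict or list is "json"; everything else is "text"."""
    return "json" if isinstance(inputs, (dict, list)) else "text"


def type_allowed(inputs, allowed_input_types):
    """An empty allow-list places no restriction on input type."""
    if not allowed_input_types:
        return True
    return detect_input_type(inputs) in allowed_input_types


print(detect_input_type({"a": 1}))         # json
print(detect_input_type('{"a": 1}'))       # text: a JSON-looking string is still a str
print(type_allowed("hello", []))           # True: empty list allows everything
print(type_allowed("hello", ["json"]))     # False
```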
HTML/Script Check
Case-insensitive substring match in the serialized input string:
- <script (catches <script>, <SCRIPT>, <Script type="...">)
- javascript: (catches javascript:alert(), JAVASCRIPT:void(0))
This is a simple substring check, not a full HTML parser or sanitizer.
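To make the limits of substring matching concrete, here is a sketch of such a check. The function contains_script_marker is a hypothetical name, and lowercasing the serialized input is an assumption about how the case-insensitive match is done.

```python
import json


def contains_script_marker(inputs):
    """Case-insensitive search for "<script" or "javascript:" in the serialized input."""
    serialized = json.dumps(inputs) if isinstance(inputs, (dict, list)) else str(inputs)
    lowered = serialized.lower()
    return "<script" in lowered or "javascript:" in lowered


# Caught: mixed-case tag inside a dict value.
print(contains_script_marker({"q": "<SCRIPT>alert(1)</SCRIPT>"}))
# False positive: a harmless question that mentions the tag.
print(contains_script_marker("How do I use the <script> tag?"))
# Missed: an event-handler payload contains neither marker.
print(contains_script_marker("<img src=x onerror=alert(1)>"))
```

The second and third calls show why this check should be treated as a coarse filter rather than as XSS protection.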
Schema Check
Only checks the required field from input_schema. For each field name in the required list, verifies the key exists in the input dict. Does not validate field types, formats, or nested structures.
{
  "input_schema": {
    "required": ["query", "user_id"]
  }
}
With this schema, {"query": "hello", "user_id": "u1"} passes but {"query": "hello"} fails with "Required field 'user_id' missing from input".
The input_schema only checks required field presence in dict inputs. It does not perform full JSON Schema validation (no type checks, no pattern matching, no nested object validation). For complex validation needs, implement custom logic in your agent.
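A sketch of a required-keys check with the behavior described above. The function missing_required_field is a hypothetical name, and skipping non-dict inputs is an assumption drawn from the statement that only dict inputs are checked.

```python
def missing_required_field(inputs, input_schema):
    """Return the first required key absent from inputs, or None if all are present."""
    if not isinstance(inputs, dict):
        return None  # assumption: non-dict inputs are not schema-checked
    for field in input_schema.get("required", []):
        if field not in inputs:
            return field
    return None


schema = {"required": ["query", "user_id"]}

print(missing_required_field({"query": "hello", "user_id": "u1"}, schema))  # None
print(missing_required_field({"query": "hello"}, schema))                   # user_id
# Presence only: wrong types and null values still pass.
print(missing_required_field({"query": 123, "user_id": None}, schema))      # None
```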
Example Policies
Strict API Input
Full validation -- schema, size limit, type restriction, and HTML sanitization:
{
  "validate_schema": true,
  "input_schema": {
    "required": ["query"]
  },
  "max_input_size_kb": 100,
  "allowed_input_types": ["json"],
  "reject_empty_input": true,
  "sanitize_html": true,
  "action_on_violation": "block"
}
Lenient (Reject Empty Only)
Minimal validation -- only reject null/empty inputs:
{
  "reject_empty_input": true,
  "sanitize_html": false,
  "validate_schema": false,
  "allowed_input_types": [],
  "action_on_violation": "warn"
}
Size-Limited (Large Payload Protection)
Protect against oversized inputs without other restrictions:
{
  "max_input_size_kb": 50,
  "reject_empty_input": true,
  "sanitize_html": true,
  "validate_schema": false,
  "allowed_input_types": [],
  "action_on_violation": "block"
}
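The Lenient example omits max_input_size_kb. Assuming omitted rules fall back to the defaults listed in the Rules table (the source does not state this explicitly), the effective policy can be previewed with a sketch like the following. DEFAULT_RULES, effective_rules, and unknown_rule_names are hypothetical helpers, not part of the Waxell SDK.

```python
# Defaults copied from the Rules table.
DEFAULT_RULES = {
    "validate_schema": False,
    "input_schema": {},
    "max_input_size_kb": 100,
    "allowed_input_types": [],
    "reject_empty_input": True,
    "sanitize_html": True,
    "action_on_violation": "block",
}


def effective_rules(policy_rules):
    """Overlay a policy on the defaults (assumed fallback behavior)."""
    return {**DEFAULT_RULES, **policy_rules}


def unknown_rule_names(policy_rules):
    """Spot misspelled rule names before submitting a policy."""
    return sorted(set(policy_rules) - set(DEFAULT_RULES))


lenient = {
    "reject_empty_input": True,
    "sanitize_html": False,
    "validate_schema": False,
    "allowed_input_types": [],
    "action_on_violation": "warn",
}

# Under the fallback assumption, the 100 KB size limit still applies.
print(effective_rules(lenient)["max_input_size_kb"])
print(unknown_rule_names({"max_input_size": 50}))
```

If the fallback assumption holds, a policy meant to check nothing but emptiness would also need a generous max_input_size_kb set explicitly.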
SDK Integration
Using the Context Manager
Input validation reads context.inputs, so the inputs dict passed to WaxellContext is what gets validated:
import waxell_observe as waxell
from waxell_observe.errors import PolicyViolationError

waxell.init()

# Run this inside a coroutine (async def); user_input and process_data are your own.
try:
    async with waxell.WaxellContext(
        agent_name="processor",
        inputs={"query": user_input, "format": "json"},
        enforce_policy=True,
    ) as ctx:
        # If input validation policy is active and inputs fail
        # validation -> PolicyViolationError raised here
        # (before any agent work happens)
        result = await process_data(user_input)
        ctx.set_result(result)
except PolicyViolationError as e:
    print(f"Input validation block: {e}")
    # e.g. "Empty input rejected"
    # e.g. "Input size (150.3KB) exceeds limit (100KB)"
    # e.g. "HTML/script content detected in input"
    # e.g. "Required field 'query' missing from input"
Using the Decorator
@waxell.observe(
    agent_name="processor",
    enforce_policy=True,
)
async def process_input(query: str):
    # Input validation happens before this function body runs
    # The inputs dict is constructed from the function arguments
    return await process_data(query)
Enforcement Flow
Agent starts (WaxellContext.__aenter__ or decorator entry)
|
+-- before_workflow governance runs
    |
    +-- Input validation handler reads context.inputs
        |
        +-- reject_empty_input?
        |   +-- inputs is None / "" / {} -> violation
        |   +-- otherwise -> continue
        |
        +-- max_input_size_kb?
        |   +-- serialize to JSON/str, measure UTF-8 bytes
        |   +-- size > limit -> violation
        |   +-- otherwise -> continue
        |
        +-- allowed_input_types (non-empty)?
        |   +-- dict/list = "json", else "text"
        |   +-- type not in list -> violation
        |   +-- otherwise -> continue
        |
        +-- sanitize_html?
        |   +-- "<script" or "javascript:" in input -> violation
        |   +-- otherwise -> continue
        |
        +-- validate_schema + input_schema?
        |   +-- check required fields in dict
        |   +-- missing field -> violation
        |   +-- otherwise -> continue
        |
        +-- All checks pass -> ALLOW
        |
        +-- Violation?
            +-- action_on_violation = "block" -> BLOCK (PolicyViolationError)
            +-- action_on_violation = "warn" -> WARN (agent continues)
The input validation handler only runs at before_workflow. There is no mid_execution check. The after_workflow phase always returns ALLOW with reason "Input validation audit complete".
Creating via Dashboard
- Navigate to Governance > Policies
- Click New Policy
- Select category Input Validation
- Configure the validation rules:
  - Enable reject_empty_input to block null/empty inputs
  - Set max_input_size_kb for size limits
  - Set allowed_input_types to restrict input formats
  - Enable sanitize_html to block script injection
  - Enable validate_schema and set input_schema with required fields
- Set action_on_violation to block or warn
- Set scope to target specific agents (e.g., input-validation-agent)
- Enable the policy
Creating via API
curl -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://acme.waxell.dev/waxell/v1/policies/ \
  -d '{
    "name": "Input Validation",
    "category": "input-validation",
    "rules": {
      "validate_schema": true,
      "input_schema": {"required": ["query"]},
      "max_input_size_kb": 100,
      "allowed_input_types": ["json"],
      "reject_empty_input": true,
      "sanitize_html": true,
      "action_on_violation": "block"
    },
    "scope": {
      "agents": ["processor"]
    },
    "enabled": true
  }'
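The same request can be assembled from Python. This sketch only builds the request with the standard library and does not send it. build_policy_request is a hypothetical helper, the token value is a placeholder, and the URL, headers, and payload shape are copied from the curl example above.

```python
import json
import urllib.request


def build_policy_request(base_url, token, rules, agents):
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {
        "name": "Input Validation",
        "category": "input-validation",
        "rules": rules,
        "scope": {"agents": agents},
        "enabled": True,
    }
    return urllib.request.Request(
        url=f"{base_url}/waxell/v1/policies/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_policy_request(
    base_url="https://acme.waxell.dev",
    token="example-token",  # placeholder, not a real credential
    rules={
        "validate_schema": True,
        "input_schema": {"required": ["query"]},
        "max_input_size_kb": 100,
        "allowed_input_types": ["json"],
        "reject_empty_input": True,
        "sanitize_html": True,
        "action_on_violation": "block",
    },
    agents=["processor"],
)

print(request.get_method(), request.full_url)
# Passing the request to urllib.request.urlopen would send it; omitted so the sketch runs offline.
```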
Observability
Governance Tab
Input validation evaluations appear with:
| Field | Example |
|---|---|
| Policy name | Input Validation |
| Action | allow, warn, or block |
| Category | input-validation |
| Reason | "Input validation passed" |
For violations:
| Field | Example |
|---|---|
| Reason | "Empty input rejected" |
| Reason | "Input size (150.3KB) exceeds limit (100KB)" |
| Reason | "Input type 'json' not in allowed types: text" |
| Reason | "HTML/script content detected in input" |
| Reason | "Required field 'query' missing from input" |
| Metadata | {"reject_empty_input": true} |
| Metadata | {"input_size_kb": 150.3, "limit_kb": 100} |
| Metadata | {"input_type": "json", "allowed_types": ["text"]} |
| Metadata | {"sanitize_html": true} |
| Metadata | {"missing_field": "query", "required": ["query"]} |
After-Workflow
The after_workflow phase always returns ALLOW with reason "Input validation audit complete". This is a stub -- no post-execution validation is performed.
Common Gotchas
- {"query": ""} is NOT empty. Only None, "", and {} are treated as empty. A dict with any keys passes the empty check, even if the values are empty strings.
- Schema validation is basic. It only checks that required field names exist as keys in the input dict. It does not validate types, patterns, or nested structures. {"query": 123} passes a schema requiring "query".
- allowed_input_types: [] means ALL types allowed. An empty list does not block anything. To restrict types, you must explicitly list the allowed types (e.g., ["json"]).
- HTML check is simple substring matching. It searches for <script and javascript: (case-insensitive) in the serialized input. It does not parse HTML, does not catch all XSS vectors, and may produce false positives on legitimate content mentioning those strings.
- No mid_execution phase. All validation happens at before_workflow. Once inputs pass validation, the agent runs without further input checks. If your agent accepts additional inputs during execution, they are not validated by this handler.
- after_workflow is a stub. It always returns ALLOW. Do not rely on it for post-execution validation.
- Size is measured on serialized form. Dicts and lists are serialized with json.dumps() before measuring. The JSON serialization adds characters (braces, quotes, colons), so the measured size is slightly larger than the raw content.
- Type detection uses Python isinstance. dict and list are "json", everything else is "text". If you pass a string that contains JSON, it is classified as "text", not "json".
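Several of these gotchas can be covered by checks in your own agent code before inputs are handed to the SDK. The sketch below is one possible approach, not a Waxell feature: normalize_inputs and agent_side_problems are hypothetical helpers. The first parses JSON-looking strings so they are passed as a dict or list; the second rejects blank or wrongly typed values that the handler lets through.

```python
import json


def normalize_inputs(raw):
    """Parse strings holding a JSON object/array so they are passed as dict/list."""
    if isinstance(raw, str):
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            return raw  # not JSON: keep the original string
        if isinstance(parsed, (dict, list)):
            return parsed
    return raw


def agent_side_problems(inputs, required_strings):
    """Stricter checks than the policy handler: type and non-blank value per field."""
    if not isinstance(inputs, dict):
        return ["inputs must be a dict"]
    problems = []
    for field in required_strings:
        value = inputs.get(field)
        if not isinstance(value, str):
            problems.append(f"{field}: expected str, got {type(value).__name__}")
        elif not value.strip():
            problems.append(f"{field}: must not be blank")
    return problems


print(normalize_inputs('{"query": "hi"}'))
print(agent_side_problems({"query": ""}, ["query"]))
print(agent_side_problems({"query": 123}, ["query"]))
print(agent_side_problems({"query": "hi"}, ["query"]))
```

Running the checks before constructing the inputs dict keeps failures in your own error handling instead of relying on the policy to catch them.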
Next Steps
- Policy & Governance -- How policy enforcement works
- Compliance Policy -- Meta-validator for regulatory frameworks
- Content Policy -- Govern output content
- Code Execution Policy -- Govern generated code
- Policy Categories & Templates -- All 26 categories