Input Validation Policy

The input-validation policy category is a pre-flight data validator: it checks inbound data before the agent begins execution. It rejects empty input, enforces a size limit, restricts input types, detects HTML/script injection markers, and verifies that the required fields named in a JSON schema are present.

Use it when you need to ensure agents only process well-formed, safe, appropriately-sized inputs.

Rules

Rule	Type	Default	Description
`validate_schema`	boolean	`false`	Enable JSON schema validation against `input_schema`
`input_schema`	object	`{}`	JSON Schema-like object with a `required` field list
`max_input_size_kb`	integer	`100`	Maximum input size in kilobytes
`allowed_input_types`	string[]	`[]` (allow all)	Allowed input types: `text`, `json`, `binary`
`reject_empty_input`	boolean	`true`	Block empty/null/empty-dict inputs
`sanitize_html`	boolean	`true`	Block inputs containing `<script` or `javascript:`
`action_on_violation`	string	`"block"`	Action when validation fails: `block` or `warn`
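
To see which rules are in effect when a policy sets only some of them, the defaults from the table can be layered under the configured values. This is a minimal sketch that assumes omitted rules fall back to the documented defaults; `DEFAULT_RULES` and `effective_rules` are illustrative names, not part of the Waxell SDK.

```python
# Defaults copied from the Rules table.
DEFAULT_RULES = {
    "validate_schema": False,
    "input_schema": {},
    "max_input_size_kb": 100,
    "allowed_input_types": [],
    "reject_empty_input": True,
    "sanitize_html": True,
    "action_on_violation": "block",
}


def effective_rules(configured: dict) -> dict:
    """Return the configured rules layered over the documented defaults.

    Hypothetical helper: assumes omitted rules fall back to their defaults.
    """
    unknown = set(configured) - set(DEFAULT_RULES)
    if unknown:
        # Catch typos such as "max_input_size" early.
        raise KeyError(f"Unknown rule(s): {sorted(unknown)}")
    return {**DEFAULT_RULES, **configured}
```

Under that assumption, a policy that sets only `max_input_size_kb` still has `reject_empty_input` and `sanitize_html` enabled.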

How It Works

The input validation handler runs at before_workflow only. It reads context.inputs -- the dict or string passed to WaxellContext(inputs=...). There is no mid_execution phase. The after_workflow phase is a stub that always returns ALLOW.

Validation Order

Checks run in this exact order, short-circuiting on first failure:

Empty check -- is the input null, empty string, or empty dict?
Size check -- does the serialized input exceed max_input_size_kb?
Type check -- is the input type in allowed_input_types?
HTML check -- does the input contain <script or javascript:?
Schema check -- are all required fields present?

If all checks pass, the handler returns ALLOW with reason "Input validation passed".
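
The short-circuit behavior means that an input with several problems reports only the first one in the order above. The sketch below illustrates the idea with two toy checks standing in for the size and HTML steps; `first_violation` and the toy checks are hypothetical and are not the handler's actual code.

```python
from typing import Any, Callable, Optional

# Each check returns a reason string on failure, or None when the input passes.
Check = Callable[[Any], Optional[str]]


def first_violation(
    inputs: Any, checks: list[tuple[str, Check]]
) -> Optional[tuple[str, str]]:
    """Run checks in list order and stop at the first one that fails."""
    for name, check in checks:
        reason = check(inputs)
        if reason is not None:
            return name, reason  # short-circuit: later checks never run
    return None


# Toy stand-ins for the size and HTML steps (limits chosen for illustration).
def toy_size_check(inputs: Any) -> Optional[str]:
    return "too large" if len(str(inputs)) > 20 else None


def toy_html_check(inputs: Any) -> Optional[str]:
    return "script detected" if "<script" in str(inputs).lower() else None


ORDERED_CHECKS = [("size", toy_size_check), ("html", toy_html_check)]
```

An input that is both oversized and contains `<script` is reported as a size violation only, because the size step runs before the HTML step.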

Validation Details

Empty Check

The handler considers these values empty:

None
"" (empty string)
{} (empty dict)

{"query": ""} is NOT empty

A dict with keys -- even if the values are empty strings -- passes the empty check. Only None, "", and {} are considered empty. If you need to reject dicts with empty values, use schema validation with required fields.
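
The rule can be sketched as a small predicate. `is_empty` is an illustrative name, and the sketch follows only the three cases listed above; it is not the handler's actual code.

```python
from typing import Any


def is_empty(inputs: Any) -> bool:
    """True only for None, the empty string, and the empty dict."""
    if inputs is None:
        return True
    if isinstance(inputs, str):
        return inputs == ""
    if isinstance(inputs, dict):
        return len(inputs) == 0
    return False  # every other value counts as non-empty in this sketch
```

With this predicate, `is_empty({"query": ""})` is `False`, which matches the behavior described above.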

Size Check

The handler serializes the input to a JSON string (for dicts and lists) or converts it with str(), then measures the UTF-8 byte length in kilobytes. The comparison is a strict >, so an input whose size equals max_input_size_kb exactly is allowed:

size_kb = len(json.dumps(inputs).encode("utf-8")) / 1024
if size_kb > max_input_size_kb: BLOCK
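
A runnable version of the measurement, as a sketch under the description above (JSON for dicts and lists, `str()` for everything else). The function names are illustrative, and the use of `json.dumps` with default options is an assumption.

```python
import json
from typing import Any


def input_size_kb(inputs: Any) -> float:
    """Serialize the input and return its UTF-8 size in kilobytes."""
    if isinstance(inputs, (dict, list)):
        serialized = json.dumps(inputs)  # assumption: default json.dumps options
    else:
        serialized = str(inputs)
    return len(serialized.encode("utf-8")) / 1024


def exceeds_limit(inputs: Any, max_input_size_kb: int = 100) -> bool:
    """Strict comparison: an input exactly at the limit is still accepted."""
    return input_size_kb(inputs) > max_input_size_kb
```

A 102,400-byte string measures exactly 100 KB and passes a 100 KB limit; one more byte exceeds it.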

Type Check

Type detection is based on Python types:

dict or list = "json"
Everything else = "text"

If allowed_input_types is an empty list [], all types are allowed. Only a non-empty list restricts types.
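
The detection rule and the empty-list behavior can be sketched as follows. `detect_input_type` and `type_allowed` are illustrative names, not SDK functions.

```python
from typing import Any, Sequence


def detect_input_type(inputs: Any) -> str:
    """dict and list are "json"; everything else is "text"."""
    return "json" if isinstance(inputs, (dict, list)) else "text"


def type_allowed(inputs: Any, allowed_input_types: Sequence[str]) -> bool:
    """An empty allow-list permits every type; otherwise the type must be listed."""
    if not allowed_input_types:
        return True
    return detect_input_type(inputs) in allowed_input_types
```

Under the rule as documented here, a string containing JSON is "text", and so is a `bytes` value, since only `dict` and `list` map to "json".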

HTML/Script Check

The handler runs a case-insensitive substring match against the serialized input string, looking for:

<script (catches <script>, <SCRIPT>, <Script type="...">)
javascript: (catches javascript:alert(), JAVASCRIPT:void(0))

This is a simple substring check, not a full HTML parser or sanitizer.
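
A minimal sketch of such a substring check, useful for predicting false positives before enabling the rule. It assumes the input is serialized the same way as for the size check; `contains_script_marker` is an illustrative name.

```python
import json
from typing import Any

SCRIPT_MARKERS = ("<script", "javascript:")


def contains_script_marker(inputs: Any) -> bool:
    """Case-insensitive search for the two markers in the serialized input."""
    if isinstance(inputs, (dict, list)):
        serialized = json.dumps(inputs)  # assumption: same serialization as the size check
    else:
        serialized = str(inputs)
    lowered = serialized.lower()
    return any(marker in lowered for marker in SCRIPT_MARKERS)
```

Because the match is purely textual, a harmless question such as "How do I use a javascript: URL?" is flagged as well.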

Schema Check

The schema check reads only the required list from input_schema. For each field name in that list, it verifies that the key exists in the input dict. It does not validate field types, formats, or nested structures.

{
  "input_schema": {
    "required": ["query", "user_id"]
  }
}

With this schema, {"query": "hello", "user_id": "u1"} passes but {"query": "hello"} fails with "Required field 'user_id' missing from input".

Schema validation is basic

The input_schema only checks required field presence in dict inputs. It does not perform full JSON Schema validation (no type checks, no pattern matching, no nested object validation). For complex validation needs, implement custom logic in your agent.
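
The required-field check amounts to a key lookup per field. In the sketch below, `missing_required_field` is an illustrative name, and skipping non-dict inputs is an assumption, since the behavior for strings and lists is not specified here.

```python
from typing import Any, Optional


def missing_required_field(inputs: Any, input_schema: dict) -> Optional[str]:
    """Return a reason for the first missing required key, or None if all are present."""
    if not isinstance(inputs, dict):
        return None  # assumption: non-dict inputs are not schema-checked
    for field in input_schema.get("required", []):
        if field not in inputs:
            return f"Required field '{field}' missing from input"
    return None
```

Only key presence matters: `{"query": 123}` and `{"query": ""}` both satisfy a schema that requires `query`.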

Example Policies

Strict API Input

Full validation -- schema, size limit, type restriction, and HTML sanitization:

{
  "validate_schema": true,
  "input_schema": {
    "required": ["query"]
  },
  "max_input_size_kb": 100,
  "allowed_input_types": ["json"],
  "reject_empty_input": true,
  "sanitize_html": true,
  "action_on_violation": "block"
}

Lenient (Reject Empty Only)

Minimal validation -- only reject null/empty inputs:

{
  "reject_empty_input": true,
  "sanitize_html": false,
  "validate_schema": false,
  "allowed_input_types": [],
  "action_on_violation": "warn"
}

Size-Limited (Large Payload Protection)

Protect against oversized inputs without other restrictions:

{
  "max_input_size_kb": 50,
  "reject_empty_input": true,
  "sanitize_html": true,
  "validate_schema": false,
  "allowed_input_types": [],
  "action_on_violation": "block"
}

SDK Integration

Using the Context Manager

Input validation reads context.inputs, so the inputs dict passed to WaxellContext is what gets validated:

import waxell_observe as waxell
from waxell_observe.errors import PolicyViolationError

waxell.init()

async def handle(user_input: str):
    # process_data is your own coroutine that does the agent's work
    try:
        async with waxell.WaxellContext(
            agent_name="processor",
            inputs={"query": user_input, "format": "json"},
            enforce_policy=True,
        ) as ctx:
            # If an input validation policy is active and the inputs fail
            # validation, PolicyViolationError is raised here,
            # before any agent work happens.

            result = await process_data(user_input)
            ctx.set_result(result)

    except PolicyViolationError as e:
        print(f"Input validation blocked: {e}")
        # e.g. "Empty input rejected"
        # e.g. "Input size (150.3KB) exceeds limit (100KB)"
        # e.g. "HTML/script content detected in input"
        # e.g. "Required field 'query' missing from input"

Using the Decorator

@waxell.observe(
    agent_name="processor",
    enforce_policy=True,
)
async def process_input(query: str):
    # Input validation happens before this function body runs
    # The inputs dict is constructed from the function arguments
    return await process_data(query)
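
How the decorator builds the inputs dict is not detailed here. As a rough mental model only, the sketch below binds call arguments to parameter names with the standard library's `inspect` module; `arguments_as_inputs` and `summarize` are hypothetical, and the SDK's actual construction may differ.

```python
import inspect
from typing import Any, Callable


def arguments_as_inputs(func: Callable[..., Any], *args: Any, **kwargs: Any) -> dict:
    """Map positional and keyword arguments to parameter names, including defaults."""
    bound = inspect.signature(func).bind(*args, **kwargs)
    bound.apply_defaults()
    return dict(bound.arguments)


# Hypothetical agent function used only to demonstrate the mapping.
async def summarize(query: str, limit: int = 10):
    ...
```

If the inputs are built this way, calling the function with an empty string yields `{"query": "", "limit": 10}`, a dict with keys, which would pass the empty check.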

Enforcement Flow

Agent starts (WaxellContext.__aenter__ or decorator entry)
    |
    +-- before_workflow governance runs
        |
        +-- Input validation handler reads context.inputs
        |
        +-- reject_empty_input?
        |   +-- inputs is None / "" / {} -> violation
        |   +-- otherwise -> continue
        |
        +-- max_input_size_kb?
        |   +-- serialize to JSON/str, measure UTF-8 bytes
        |   +-- size > limit -> violation
        |   +-- otherwise -> continue
        |
        +-- allowed_input_types (non-empty)?
        |   +-- dict/list = "json", else "text"
        |   +-- type not in list -> violation
        |   +-- otherwise -> continue
        |
        +-- sanitize_html?
        |   +-- "<script" or "javascript:" in input -> violation
        |   +-- otherwise -> continue
        |
        +-- validate_schema + input_schema?
        |   +-- check required fields in dict
        |   +-- missing field -> violation
        |   +-- otherwise -> continue
        |
        +-- All checks pass -> ALLOW
        |
        +-- Violation?
            +-- action_on_violation = "block" -> BLOCK (PolicyViolationError)
            +-- action_on_violation = "warn" -> WARN (agent continues)

The input validation handler only runs at before_workflow. There is no mid_execution check. The after_workflow phase always returns ALLOW with reason "Input validation audit complete".
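
The last branch of the flow, where a violation becomes BLOCK or WARN, can be sketched as a small mapping. `Decision` and `decide` are illustrative names, and raising on an unrecognized `action_on_violation` value is an assumption rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    action: str  # "allow", "warn", or "block"
    reason: str


def decide(violation_reason: Optional[str], action_on_violation: str = "block") -> Decision:
    """Turn the outcome of the checks into a governance decision."""
    if violation_reason is None:
        return Decision("allow", "Input validation passed")
    if action_on_violation not in ("block", "warn"):
        # Assumption: only the two documented values are accepted.
        raise ValueError(f"Unsupported action_on_violation: {action_on_violation!r}")
    return Decision(action_on_violation, violation_reason)
```

With `action_on_violation` set to `warn`, the same violation reason is recorded but the agent continues to run.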

Creating via Dashboard

Navigate to Governance > Policies
Click New Policy
Select category Input Validation
Configure the validation rules:
- Enable reject_empty_input to block null/empty inputs
- Set max_input_size_kb for size limits
- Set allowed_input_types to restrict input formats
- Enable sanitize_html to block script injection
- Enable validate_schema and set input_schema with required fields
Set action_on_violation to block or warn
Set scope to target specific agents (e.g., input-validation-agent)
Enable the policy

Creating via API

curl -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://acme.waxell.dev/waxell/v1/policies/ \
  -d '{
    "name": "Input Validation",
    "category": "input-validation",
    "rules": {
      "validate_schema": true,
      "input_schema": {"required": ["query"]},
      "max_input_size_kb": 100,
      "allowed_input_types": ["json"],
      "reject_empty_input": true,
      "sanitize_html": true,
      "action_on_violation": "block"
    },
    "scope": {
      "agents": ["processor"]
    },
    "enabled": true
  }'
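
The same request can be prepared from Python. The sketch below uses only the standard library and mirrors the curl call above; `build_create_policy_request` is an illustrative helper, and the response format is not covered here, so the example stops at building the request.

```python
import json
import urllib.request


def build_create_policy_request(
    base_url: str, token: str, rules: dict, agents: list
) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates the policy."""
    payload = {
        "name": "Input Validation",
        "category": "input-validation",
        "rules": rules,
        "scope": {"agents": agents},
        "enabled": True,
    }
    return urllib.request.Request(
        url=f"{base_url}/waxell/v1/policies/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Pass the returned object to `urllib.request.urlopen` to send it, with `base_url` set to your own deployment (for example `https://acme.waxell.dev`).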

Observability

Governance Tab

Input validation evaluations appear with:

Field	Example
Policy name	Input Validation
Action	`allow`, `warn`, or `block`
Category	`input-validation`
Reason	"Input validation passed"

For violations:

Check	Reason	Metadata
Empty	"Empty input rejected"	`{"reject_empty_input": true}`
Size	"Input size (150.3KB) exceeds limit (100KB)"	`{"input_size_kb": 150.3, "limit_kb": 100}`
Type	"Input type 'json' not in allowed types: text"	`{"input_type": "json", "allowed_types": ["text"]}`
HTML/script	"HTML/script content detected in input"	`{"sanitize_html": true}`
Schema	"Required field 'query' missing from input"	`{"missing_field": "query", "required": ["query"]}`

After-Workflow

The after_workflow phase always returns ALLOW with reason "Input validation audit complete". This is a stub -- no post-execution validation is performed.

Common Gotchas

{"query": ""} is NOT empty. Only None, "", and {} are treated as empty. A dict with any keys passes the empty check, even if the values are empty strings.
Schema validation is basic. It only checks that required field names exist as keys in the input dict. It does not validate types, patterns, or nested structures. {"query": 123} passes a schema requiring "query".
allowed_input_types: [] means ALL types allowed. An empty list does not block anything. To restrict types, you must explicitly list the allowed types (e.g., ["json"]).
HTML check is simple substring matching. It searches for <script and javascript: (case-insensitive) in the serialized input. It does not parse HTML, does not catch all XSS vectors, and may produce false positives on legitimate content mentioning those strings.
No mid_execution phase. All validation happens at before_workflow. Once inputs pass validation, the agent runs without further input checks. If your agent accepts additional inputs during execution, they are not validated by this handler.
after_workflow is a stub. It always returns ALLOW. Do not rely on it for post-execution validation.
Size is measured on serialized form. Dicts and lists are serialized with json.dumps() before measuring. The JSON serialization adds characters (braces, quotes, colons), so the measured size is slightly larger than the raw content.
Type detection uses Python isinstance. dict and list are "json", everything else is "text". If you pass a string that contains JSON, it is classified as "text", not "json".

Next Steps

Policy & Governance -- How policy enforcement works
Compliance Policy -- Meta-validator for regulatory frameworks
Content Policy -- Govern output content
Code Execution Policy -- Govern generated code
Policy Categories & Templates -- All 26 categories
