Quality Policy

The quality policy category validates agent output quality after execution. It inspects the result text against configurable checks: template-based (contains, regex, length, JSON schema) and LLM-based (judge scoring).

Use it when you need to enforce output standards -- required keywords, forbidden phrases, length constraints, structured output validation, or AI-judged quality scoring.

Rules

| Rule | Type | Default | Description |
|---|---|---|---|
| min_confidence_score | float (0-1) | None | Minimum confidence score threshold. Informational only -- not enforced |
| require_sources | boolean | false | Whether sources are required. Informational only -- not enforced |
| max_hallucination_score | float (0-1) | None | Maximum hallucination score. Informational only -- not enforced |
| validate_json_output | boolean | false | Whether to validate output as JSON against output_schema |
| output_schema | object | None | JSON Schema to validate output against (requires validate_json_output: true) |
| template_checks | array | [] | Deterministic template-based checks on output text |
| llm_checks | array | [] | LLM judge-based quality evaluation |
| retry_config | object | {} | Retry configuration: max_retries, feedback_template |
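For example, structured-output validation combines the two schema-related rules above. The schema itself is illustrative:

```json
{
  "validate_json_output": true,
  "output_schema": {
    "type": "object",
    "required": ["summary", "recommendation"],
    "properties": {
      "summary": {"type": "string"},
      "recommendation": {"type": "string"}
    }
  }
}
```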

Template Check Types

| Type | Fields | Behavior |
|---|---|---|
| contains | value, action, message | Fails if output does NOT contain value (case-insensitive) |
| not_contains | value, action, message | Fails if output DOES contain value (case-insensitive) |
| regex | pattern, action, invert, message | Matches regex against output. If invert: true, fails when pattern IS found |
| json_schema | schema, action, message | Parses output as JSON and validates against provided schema |
| length | min, max, action, message | Checks len(output) is within [min, max] range |

Each check has an action field: "warn", "error", or "retry".

LLM Check Configuration

| Field | Type | Default | Description |
|---|---|---|---|
| criteria | string | required | What to evaluate (e.g. "Response is factually accurate") |
| action | string | "warn" | "warn", "error", or "retry" |
| model | string | "gpt-4o-mini" | Model to use for judging |
| threshold | float (0-1) | 0.5 | Score below this threshold = fail |
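A full llm_checks entry using these fields might look like the following; the criteria text is illustrative:

```json
{
  "llm_checks": [
    {
      "criteria": "Response is factually accurate and cites evidence",
      "action": "error",
      "model": "gpt-4o-mini",
      "threshold": 0.7
    }
  ]
}
```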

How It Works

Enforcement Phases

| Phase | Behavior |
|---|---|
| before_workflow | Stores rules in context. Always returns ALLOW |
| mid_execution | Not implemented |
| after_workflow | Runs template checks, LLM checks, JSON validation. Returns ALLOW/WARN/BLOCK/RETRY |

No Mid-Execution Phase

Quality has no mid_execution phase. The agent always runs to completion before quality checks happen. This means the agent produces output regardless of whether it will pass quality validation.

Action Escalation

Failures are collected, then the worst action determines the result:

  1. Any "error" or "retry" action + retry_config.max_retries > 0 --> RETRY (with feedback)
  2. Any "error" action without retry config --> BLOCK
  3. Only "warn" actions --> WARN
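The escalation order above can be sketched as a single function. This is a simplified model (failures are assumed to be (action, message) pairs), and it also reflects gotcha 5: a "retry" action without a usable retry config falls through to BLOCK:

```python
def escalate(failures: list[tuple[str, str]], max_retries: int) -> str:
    """Pick the final decision from collected check failures."""
    if not failures:
        return "ALLOW"
    actions = {action for action, _ in failures}
    if "error" in actions or "retry" in actions:
        # Retries are only available when retry_config.max_retries > 0
        return "RETRY" if max_retries > 0 else "BLOCK"
    return "WARN"
```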

Retry Flow

When a check with action: "retry" or action: "error" fails and retry_config.max_retries > 0:

  1. Quality handler returns RETRY with feedback
  2. Feedback is built from retry_config.feedback_template with a {failures} placeholder
  3. The runtime (or SDK) can use this feedback to re-prompt the agent
  4. Up to max_retries attempts before falling back to BLOCK
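On the SDK side, the retry flow above could be driven by a loop along these lines. `generate_fn` and `check_fn` are hypothetical callables standing in for the agent and the quality handler; this is a sketch, not the runtime's actual implementation:

```python
def run_with_quality_retries(generate_fn, check_fn, retry_config: dict) -> str:
    """Re-prompt the agent with feedback until checks pass or retries run out."""
    max_retries = retry_config.get("max_retries", 0)
    template = retry_config.get("feedback_template", "Failed checks: {failures}")
    feedback = None
    for _ in range(max_retries + 1):  # first attempt + up to max_retries retries
        output = generate_fn(feedback)
        failures = check_fn(output)  # list of failure messages
        if not failures:
            return output  # ALLOW
        # Build feedback from the template's {failures} placeholder
        feedback = template.format(failures="; ".join(failures))
    raise RuntimeError(f"BLOCK: {feedback}")  # retries exhausted
```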

Example Policies

Simple Keyword Requirement

Ensure reports include a recommendation:

{
  "template_checks": [
    {
      "type": "contains",
      "value": "recommendation",
      "action": "error",
      "message": "Report must include a recommendation"
    }
  ]
}

Length Constraints

Require output between 100 and 5000 characters:

{
  "template_checks": [
    {
      "type": "length",
      "min": 100,
      "max": 5000,
      "action": "warn",
      "message": "Output should be between 100-5000 characters"
    }
  ]
}

Forbidden Content

Block outputs containing uncertain language:

{
  "template_checks": [
    {
      "type": "not_contains",
      "value": "I don't know",
      "action": "error",
      "message": "Output must not contain uncertain language"
    },
    {
      "type": "not_contains",
      "value": "I'm not sure",
      "action": "error",
      "message": "Output must not express uncertainty"
    }
  ]
}

Combined Quality Gate

Full quality validation with retry:

{
  "template_checks": [
    {
      "type": "contains",
      "value": "recommendation",
      "action": "error",
      "message": "Report must include a recommendation"
    },
    {
      "type": "not_contains",
      "value": "I don't know",
      "action": "error",
      "message": "Report must not contain uncertain language"
    },
    {
      "type": "length",
      "min": 100,
      "max": 5000,
      "action": "warn",
      "message": "Report should be between 100-5000 characters"
    }
  ],
  "retry_config": {
    "max_retries": 2,
    "feedback_template": "Previous response failed: {failures}. Please regenerate."
  }
}

SDK Integration

Using the Context Manager

import waxell_observe as waxell
from waxell_observe.errors import PolicyViolationError

waxell.init()

try:
    async with waxell.WaxellContext(
        agent_name="report-generator",
        enforce_policy=True,
    ) as ctx:
        report = await generate_report(query)

        # Quality checks run on this result at after_workflow
        ctx.set_result({"report": report})

except PolicyViolationError as e:
    print(f"Quality block: {e}")
    # e.g. "Report must include a recommendation"

Using the Decorator

@waxell.observe(
    agent_name="report-generator",
    enforce_policy=True,
)
async def generate_report(query: str):
    report = await llm_generate(query)
    return {"report": report}

# Quality checks run after this function returns

Enforcement Flow

Agent starts (WaxellContext.__aenter__)
|
+-- before_workflow: stores quality rules in context
|
+-- Agent generates output (no mid_execution checks)
| |
| +-- LLM calls, tool calls, etc.
| +-- ctx.set_result(output)
|
+-- Agent completes
|
+-- after_workflow: quality validation
|
+-- JSON schema validation (if validate_json_output=true)
+-- Template checks (contains, not_contains, regex, length)
+-- LLM checks (if llm_judge_fn available)
|
+-- Collect all failures
|
+-- Any errors + retry config? -> RETRY
+-- Any errors, no retry? -> BLOCK
+-- Only warnings? -> WARN
+-- No failures? -> ALLOW

Creating via Dashboard

  1. Navigate to Governance > Policies
  2. Click New Policy
  3. Select category Quality
  4. Configure template_checks with desired check types
  5. Optionally add LLM checks and retry configuration
  6. Set scope to target specific agents (e.g., report-generator)
  7. Enable

Creating via API

curl -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://acme.waxell.dev/waxell/v1/policies/ \
  -d '{
    "name": "Report Quality Standards",
    "category": "quality",
    "rules": {
      "template_checks": [
        {"type": "contains", "value": "recommendation", "action": "error"},
        {"type": "not_contains", "value": "I don'\''t know", "action": "error"},
        {"type": "length", "min": 100, "max": 5000, "action": "warn"}
      ]
    },
    "scope": {
      "agents": ["report-generator"]
    },
    "enabled": true
  }'

Observability

Governance Tab

Quality evaluations appear with:

| Field | Example |
|---|---|
| Policy name | Report Quality Standards |
| Action | allow, warn, block, or retry |
| Category | quality |
| Reason | "Report must include a recommendation" |
| Metadata | {"failures": ["Report must include a recommendation", "Output length 15 not in range [100, 5000]"]} |

For retries:

| Field | Example |
|---|---|
| Reason | "Report must include a recommendation; Output length 15 not in range [100, 5000]" |
| Metadata | {"failures": [...], "retry_feedback": "Previous response failed: ...", "max_retries": 2} |

Common Gotchas

  1. min_confidence_score, require_sources, max_hallucination_score are NOT enforced. They are informational only -- the handler does not check these values. Use template_checks or llm_checks for actual enforcement.

  2. No mid_execution phase. The agent always runs to completion before quality checks happen. If you need to stop the agent mid-execution, use a different policy category (e.g., content for text scanning).

  3. LLM checks require _llm_judge_fn to be injected. This callback is only available in the controlplane flow. In demos and standalone scripts, LLM checks are skipped. Template checks always work.

  4. Template contains check is case-insensitive. "Recommendation" matches "RECOMMENDATION" and "recommendation". Design your checks accordingly.

  5. retry action requires retry_config.max_retries > 0 to actually retry. If retry_config is missing or max_retries is 0, the retry action falls through to BLOCK.

  6. Quality checks operate on str(result). The handler converts the result to a string before running checks. For structured results (dicts), this means the checks run against the string representation.
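A quick illustration of gotcha 6, assuming the handler simply calls str() on the result:

```python
result = {"report": "Final recommendation: approve"}
text = str(result)

# The dict's Python repr is what the checks see, quotes and braces included
assert text == "{'report': 'Final recommendation: approve'}"

# A case-insensitive contains check on "recommendation" still passes...
assert "recommendation" in text.lower()

# ...but a length check now measures the repr, not the report value itself
assert len(text) != len(result["report"])
```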

Next Steps