Quality Policy

The quality policy category validates agent output quality after execution. It inspects the result text against configurable checks: template-based (contains, regex, length, JSON schema) and LLM-based (judge scoring).

Use it when you need to enforce output standards -- required keywords, forbidden phrases, length constraints, structured output validation, or AI-judged quality scoring.

Rules

| Rule | Type | Default | Description |
|---|---|---|---|
| min_confidence_score | float (0-1) | None | Minimum confidence score threshold. Informational only -- not enforced |
| require_sources | boolean | false | Whether sources are required. Informational only -- not enforced |
| max_hallucination_score | float (0-1) | None | Maximum hallucination score. Informational only -- not enforced |
| validate_json_output | boolean | false | Whether to validate output as JSON against output_schema |
| output_schema | object | None | JSON Schema to validate output against (requires validate_json_output: true) |
| template_checks | array | [] | Deterministic template-based checks on output text |
| llm_checks | array | [] | LLM judge-based quality evaluation |
| retry_config | object | {} | Retry configuration: max_retries, feedback_template |
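For example, structured-output validation combines the two schema-related rules above. The schema itself is illustrative:

```json
{
  "validate_json_output": true,
  "output_schema": {
    "type": "object",
    "required": ["summary", "recommendation"],
    "properties": {
      "summary": {"type": "string"},
      "recommendation": {"type": "string"}
    }
  }
}
```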

Template Check Types

| Type | Fields | Behavior |
|---|---|---|
| contains | value, action, message | Fails if output does NOT contain value (case-insensitive) |
| not_contains | value, action, message | Fails if output DOES contain value (case-insensitive) |
| regex | pattern, action, invert, message | Matches regex against output. If invert: true, fails when pattern IS found |
| json_schema | schema, action, message | Parses output as JSON and validates against provided schema |
| length | min, max, action, message | Checks len(output) is within [min, max] range |

Each check has an action field: "warn", "error", or "retry".

LLM Check Configuration

| Field | Type | Default | Description |
|---|---|---|---|
| criteria | string | required | What to evaluate (e.g. "Response is factually accurate") |
| action | string | "warn" | "warn", "error", or "retry" |
| model | string | "gpt-4o-mini" | Model to use for judging |
| threshold | float (0-1) | 0.5 | Score below this threshold = fail |
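A full llm_checks entry using these fields might look like the following; the criteria text is illustrative:

```json
{
  "llm_checks": [
    {
      "criteria": "Response is factually accurate and cites evidence",
      "action": "error",
      "model": "gpt-4o-mini",
      "threshold": 0.7
    }
  ]
}
```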

How It Works

Enforcement Phases

| Phase | Behavior |
|---|---|
| before_workflow | Stores rules in context. Always returns ALLOW |
| mid_execution | Not implemented |
| after_workflow | Runs template checks, LLM checks, JSON validation. Returns ALLOW/WARN/BLOCK/RETRY |

No Mid-Execution Phase

Quality has no mid_execution phase. The agent always runs to completion before quality checks happen. This means the agent produces output regardless of whether it will pass quality validation.

Action Escalation

Failures are collected, then the worst action determines the result:

  1. Any "error" or "retry" action + retry_config.max_retries > 0 --> RETRY (with feedback)
  2. Any "error" action without retry config --> BLOCK
  3. Only "warn" actions --> WARN
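The escalation order above can be sketched as a single function. This is a simplified model (failures are assumed to be (action, message) pairs), and it also reflects gotcha 5: a "retry" action without a usable retry config falls through to BLOCK:

```python
def escalate(failures: list[tuple[str, str]], max_retries: int) -> str:
    """Pick the final decision from collected check failures."""
    if not failures:
        return "ALLOW"
    actions = {action for action, _ in failures}
    if "error" in actions or "retry" in actions:
        # Retries are only available when retry_config.max_retries > 0
        return "RETRY" if max_retries > 0 else "BLOCK"
    return "WARN"
```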

Retry Flow

When a check with action: "retry" or action: "error" fails and retry_config.max_retries > 0:

  1. Quality handler returns RETRY with feedback
  2. Feedback is built from retry_config.feedback_template with a {failures} placeholder
  3. The runtime (or SDK) can use this feedback to re-prompt the agent
  4. Up to max_retries attempts before falling back to BLOCK
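On the SDK side, the retry flow above could be driven by a loop along these lines. `generate_fn` and `check_fn` are hypothetical callables standing in for the agent and the quality handler; this is a sketch, not the runtime's actual implementation:

```python
def run_with_quality_retries(generate_fn, check_fn, retry_config: dict) -> str:
    """Re-prompt the agent with feedback until checks pass or retries run out."""
    max_retries = retry_config.get("max_retries", 0)
    template = retry_config.get("feedback_template", "Failed checks: {failures}")
    feedback = None
    for _ in range(max_retries + 1):  # first attempt + up to max_retries retries
        output = generate_fn(feedback)
        failures = check_fn(output)  # list of failure messages
        if not failures:
            return output  # ALLOW
        # Build feedback from the template's {failures} placeholder
        feedback = template.format(failures="; ".join(failures))
    raise RuntimeError(f"BLOCK: {feedback}")  # retries exhausted
```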

Example Policies

Simple Keyword Requirement

Ensure reports include a recommendation:

{
  "template_checks": [
    {
      "type": "contains",
      "value": "recommendation",
      "action": "error",
      "message": "Report must include a recommendation"
    }
  ]
}

Length Constraints

Require output between 100 and 5000 characters:

{
  "template_checks": [
    {
      "type": "length",
      "min": 100,
      "max": 5000,
      "action": "warn",
      "message": "Output should be between 100-5000 characters"
    }
  ]
}

Forbidden Content

Block outputs containing uncertain language:

{
  "template_checks": [
    {
      "type": "not_contains",
      "value": "I don't know",
      "action": "error",
      "message": "Output must not contain uncertain language"
    },
    {
      "type": "not_contains",
      "value": "I'm not sure",
      "action": "error",
      "message": "Output must not express uncertainty"
    }
  ]
}

Combined Quality Gate

Full quality validation with retry:

{
  "template_checks": [
    {
      "type": "contains",
      "value": "recommendation",
      "action": "error",
      "message": "Report must include a recommendation"
    },
    {
      "type": "not_contains",
      "value": "I don't know",
      "action": "error",
      "message": "Report must not contain uncertain language"
    },
    {
      "type": "length",
      "min": 100,
      "max": 5000,
      "action": "warn",
      "message": "Report should be between 100-5000 characters"
    }
  ],
  "retry_config": {
    "max_retries": 2,
    "feedback_template": "Previous response failed: {failures}. Please regenerate."
  }
}

SDK Integration

Using the Context Manager

import waxell_observe as waxell
from waxell_observe.errors import PolicyViolationError

waxell.init()

try:
    async with waxell.WaxellContext(
        agent_name="report-generator",
        enforce_policy=True,
    ) as ctx:
        report = await generate_report(query)

        # Quality checks run on this result at after_workflow
        ctx.set_result({"report": report})

except PolicyViolationError as e:
    print(f"Quality block: {e}")
    # e.g. "Report must include a recommendation"

Using the Decorator

@waxell.observe(
    agent_name="report-generator",
    enforce_policy=True,
)
async def generate_report(query: str):
    report = await llm_generate(query)
    return {"report": report}

# Quality checks run after this function returns

Enforcement Flow

Agent starts (WaxellContext.__aenter__)
|
+-- before_workflow: stores quality rules in context
|
+-- Agent generates output (no mid_execution checks)
| |
| +-- LLM calls, tool calls, etc.
| +-- ctx.set_result(output)
|
+-- Agent completes
|
+-- after_workflow: quality validation
|
+-- JSON schema validation (if validate_json_output=true)
+-- Template checks (contains, not_contains, regex, length)
+-- LLM checks (if llm_judge_fn available)
|
+-- Collect all failures
|
+-- Any errors + retry config? -> RETRY
+-- Any errors, no retry? -> BLOCK
+-- Only warnings? -> WARN
+-- No failures? -> ALLOW

Creating via Dashboard

  1. Navigate to Governance > Policies
  2. Click New Policy
  3. Select category Quality
  4. Configure template_checks with desired check types
  5. Optionally add LLM checks and retry configuration
  6. Set scope to target specific agents (e.g., report-generator)
  7. Enable

Creating via API

curl -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://acme.waxell.dev/waxell/v1/policies/ \
  -d '{
    "name": "Report Quality Standards",
    "category": "quality",
    "rules": {
      "template_checks": [
        {"type": "contains", "value": "recommendation", "action": "error"},
        {"type": "not_contains", "value": "I don'\''t know", "action": "error"},
        {"type": "length", "min": 100, "max": 5000, "action": "warn"}
      ]
    },
    "scope": {
      "agents": ["report-generator"]
    },
    "enabled": true
  }'

Observability

Governance Tab

Quality evaluations appear with:

| Field | Example |
|---|---|
| Policy name | Report Quality Standards |
| Action | allow, warn, block, or retry |
| Category | quality |
| Reason | "Report must include a recommendation" |
| Metadata | {"failures": ["Report must include a recommendation", "Output length 15 not in range [100, 5000]"]} |

For retries:

| Field | Example |
|---|---|
| Reason | "Report must include a recommendation; Output length 15 not in range [100, 5000]" |
| Metadata | {"failures": [...], "retry_feedback": "Previous response failed: ...", "max_retries": 2} |

Common Gotchas

  1. min_confidence_score, require_sources, max_hallucination_score are NOT enforced. They are informational only -- the handler does not check these values. Use template_checks or llm_checks for actual enforcement.

  2. No mid_execution phase. The agent always runs to completion before quality checks happen. If you need to stop the agent mid-execution, use a different policy category (e.g., content for text scanning).

  3. LLM checks require _llm_judge_fn to be injected. This callback is only available in the controlplane flow. In demos and standalone scripts, LLM checks are skipped. Template checks always work.

  4. Template contains check is case-insensitive. "Recommendation" matches "RECOMMENDATION" and "recommendation". Design your checks accordingly.

  5. retry action requires retry_config.max_retries > 0 to actually retry. If retry_config is missing or max_retries is 0, the retry action falls through to BLOCK.

  6. Quality checks operate on str(result). The handler converts the result to a string before running checks. For structured results (dicts), this means the checks run against the string representation.
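A quick illustration of gotcha 6, assuming the handler simply calls str() on the result:

```python
result = {"report": "Final recommendation: approve"}
text = str(result)

# The dict's Python repr is what the checks see, quotes and braces included
assert text == "{'report': 'Final recommendation: approve'}"

# A case-insensitive contains check on "recommendation" still passes...
assert "recommendation" in text.lower()

# ...but a length check now measures the repr, not the report value itself
assert len(text) != len(result["report"])
```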

Next Steps