Quality Policy
The quality policy category validates agent output quality after execution. It inspects the result text against configurable checks: template-based (contains, regex, length, JSON schema) and LLM-based (judge scoring).
Use it when you need to enforce output standards -- required keywords, forbidden phrases, length constraints, structured output validation, or AI-judged quality scoring.
Rules
| Rule | Type | Default | Description |
|---|---|---|---|
| `min_confidence_score` | float (0-1) | None | Minimum confidence score threshold. Informational only -- not enforced |
| `require_sources` | boolean | false | Whether sources are required. Informational only -- not enforced |
| `max_hallucination_score` | float (0-1) | None | Maximum hallucination score. Informational only -- not enforced |
| `validate_json_output` | boolean | false | Whether to validate output as JSON against `output_schema` |
| `output_schema` | object | None | JSON Schema to validate output against (requires `validate_json_output: true`) |
| `template_checks` | array | [] | Deterministic template-based checks on output text |
| `llm_checks` | array | [] | LLM judge-based quality evaluation |
| `retry_config` | object | {} | Retry configuration: `max_retries`, `feedback_template` |
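For example, a rules object that enforces structured JSON output might look like this (the schema fields `summary` and `recommendation` are illustrative, not required names):

```json
{
  "validate_json_output": true,
  "output_schema": {
    "type": "object",
    "properties": {
      "summary": {"type": "string"},
      "recommendation": {"type": "string"}
    },
    "required": ["summary", "recommendation"]
  }
}
```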
Template Check Types
| Type | Fields | Behavior |
|---|---|---|
| `contains` | value, action, message | Fails if output does NOT contain value (case-insensitive) |
| `not_contains` | value, action, message | Fails if output DOES contain value (case-insensitive) |
| `regex` | pattern, action, invert, message | Matches regex against output. If `invert: true`, fails when the pattern IS found |
| `json_schema` | schema, action, message | Parses output as JSON and validates against the provided schema |
| `length` | min, max, action, message | Checks that `len(output)` is within the [min, max] range |
Each check has an `action` field: `"warn"`, `"error"`, or `"retry"`.
LLM Check Configuration
| Field | Type | Default | Description |
|---|---|---|---|
| `criteria` | string | required | What to evaluate (e.g. "Response is factually accurate") |
| `action` | string | `"warn"` | One of `"warn"`, `"error"`, or `"retry"` |
| `model` | string | `"gpt-4o-mini"` | Model to use for judging |
| `threshold` | float (0-1) | 0.5 | A score below this threshold counts as a failure |
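Putting those fields together, an `llm_checks` array might look like this (the criteria text and threshold value are illustrative):

```json
{
  "llm_checks": [
    {
      "criteria": "Response is factually accurate and cites evidence",
      "action": "retry",
      "model": "gpt-4o-mini",
      "threshold": 0.7
    }
  ]
}
```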
How It Works
Enforcement Phases
| Phase | Behavior |
|---|---|
| `before_workflow` | Stores rules in context. Always returns ALLOW |
| `mid_execution` | Not implemented |
| `after_workflow` | Runs template checks, LLM checks, and JSON validation. Returns ALLOW/WARN/BLOCK/RETRY |
Quality has no `mid_execution` phase: the agent always runs to completion before quality checks happen, so it produces output regardless of whether that output will pass validation.
Action Escalation
Failures are collected, then the worst action determines the result:
- Any `"error"` or `"retry"` action + `retry_config.max_retries > 0` --> RETRY (with feedback)
- Any `"error"` action without retry config --> BLOCK
- Only `"warn"` actions --> WARN
Retry Flow
When a check with `action: "retry"` or `action: "error"` fails and `retry_config.max_retries > 0`:
- The quality handler returns RETRY with feedback
- Feedback is built from `retry_config.feedback_template` with a `{failures}` placeholder
- The runtime (or SDK) can use this feedback to re-prompt the agent
- Up to `max_retries` attempts are made before falling back to BLOCK
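Assuming the `{failures}` placeholder is filled with the joined failure messages (the Observability examples on this page show them joined with "; "), the feedback string is built roughly like:

```python
failures = [
    "Report must include a recommendation",
    "Output length 15 not in range [100, 5000]",
]
feedback_template = "Previous response failed: {failures}. Please regenerate."

# Assumption: the handler joins failure messages with "; " before substitution
feedback = feedback_template.format(failures="; ".join(failures))
```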
Example Policies
Simple Keyword Requirement
Ensure reports include a recommendation:
```json
{
  "template_checks": [
    {
      "type": "contains",
      "value": "recommendation",
      "action": "error",
      "message": "Report must include a recommendation"
    }
  ]
}
```
Length Constraints
Require output between 100 and 5000 characters:
```json
{
  "template_checks": [
    {
      "type": "length",
      "min": 100,
      "max": 5000,
      "action": "warn",
      "message": "Output should be between 100-5000 characters"
    }
  ]
}
```
Forbidden Content
Block outputs containing uncertain language:
```json
{
  "template_checks": [
    {
      "type": "not_contains",
      "value": "I don't know",
      "action": "error",
      "message": "Output must not contain uncertain language"
    },
    {
      "type": "not_contains",
      "value": "I'm not sure",
      "action": "error",
      "message": "Output must not express uncertainty"
    }
  ]
}
```
Combined Quality Gate
Full quality validation with retry:
```json
{
  "template_checks": [
    {
      "type": "contains",
      "value": "recommendation",
      "action": "error",
      "message": "Report must include a recommendation"
    },
    {
      "type": "not_contains",
      "value": "I don't know",
      "action": "error",
      "message": "Report must not contain uncertain language"
    },
    {
      "type": "length",
      "min": 100,
      "max": 5000,
      "action": "warn",
      "message": "Report should be between 100-5000 characters"
    }
  ],
  "retry_config": {
    "max_retries": 2,
    "feedback_template": "Previous response failed: {failures}. Please regenerate."
  }
}
```
SDK Integration
Using the Context Manager
```python
import waxell_observe as waxell
from waxell_observe.errors import PolicyViolationError

waxell.init()

try:
    async with waxell.WaxellContext(
        agent_name="report-generator",
        enforce_policy=True,
    ) as ctx:
        report = await generate_report(query)
        # Quality checks run on this result at after_workflow
        ctx.set_result({"report": report})
except PolicyViolationError as e:
    print(f"Quality block: {e}")
    # e.g. "Report must include a recommendation"
```
Using the Decorator
```python
@waxell.observe(
    agent_name="report-generator",
    enforce_policy=True,
)
async def generate_report(query: str):
    report = await llm_generate(query)
    return {"report": report}

# Quality checks run after this function returns
```
Enforcement Flow
```text
Agent starts (WaxellContext.__aenter__)
|
+-- before_workflow: stores quality rules in context
|
+-- Agent generates output (no mid_execution checks)
|   |
|   +-- LLM calls, tool calls, etc.
|   +-- ctx.set_result(output)
|
+-- Agent completes
|
+-- after_workflow: quality validation
    |
    +-- JSON schema validation (if validate_json_output=true)
    +-- Template checks (contains, not_contains, regex, length)
    +-- LLM checks (if llm_judge_fn available)
    |
    +-- Collect all failures
        |
        +-- Any errors + retry config? -> RETRY
        +-- Any errors, no retry?     -> BLOCK
        +-- Only warnings?            -> WARN
        +-- No failures?              -> ALLOW
```
Creating via Dashboard
- Navigate to Governance > Policies
- Click New Policy
- Select category Quality
- Configure `template_checks` with desired check types
- Optionally add LLM checks and retry configuration
- Set scope to target specific agents (e.g., `report-generator`)
- Enable the policy
Creating via API
```shell
curl -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://acme.waxell.dev/waxell/v1/policies/ \
  -d '{
    "name": "Report Quality Standards",
    "category": "quality",
    "rules": {
      "template_checks": [
        {"type": "contains", "value": "recommendation", "action": "error"},
        {"type": "not_contains", "value": "I don'\''t know", "action": "error"},
        {"type": "length", "min": 100, "max": 5000, "action": "warn"}
      ]
    },
    "scope": {
      "agents": ["report-generator"]
    },
    "enabled": true
  }'
```
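The same request can be made from Python. This sketch uses the third-party `requests` library and assumes the same endpoint and bearer token as the curl example:

```python
# Payload mirroring the curl example above
policy = {
    "name": "Report Quality Standards",
    "category": "quality",
    "rules": {
        "template_checks": [
            {"type": "contains", "value": "recommendation", "action": "error"},
            {"type": "not_contains", "value": "I don't know", "action": "error"},
            {"type": "length", "min": 100, "max": 5000, "action": "warn"},
        ]
    },
    "scope": {"agents": ["report-generator"]},
    "enabled": True,
}

def create_policy(base_url: str, token: str) -> dict:
    """POST the policy to the API and return the created record."""
    import requests  # third-party: pip install requests

    resp = requests.post(
        f"{base_url}/waxell/v1/policies/",
        headers={"Authorization": f"Bearer {token}"},
        json=policy,  # requests serializes and sets Content-Type for us
    )
    resp.raise_for_status()
    return resp.json()
```

Called as, e.g., `create_policy("https://acme.waxell.dev", os.environ["TOKEN"])`.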
Observability
Governance Tab
Quality evaluations appear with:
| Field | Example |
|---|---|
| Policy name | Report Quality Standards |
| Action | allow, warn, block, or retry |
| Category | quality |
| Reason | "Report must include a recommendation" |
| Metadata | {"failures": ["Report must include a recommendation", "Output length 15 not in range [100, 5000]"]} |
For retries:
| Field | Example |
|---|---|
| Reason | "Report must include a recommendation; Output length 15 not in range [100, 5000]" |
| Metadata | {"failures": [...], "retry_feedback": "Previous response failed: ...", "max_retries": 2} |
Common Gotchas
- `min_confidence_score`, `require_sources`, and `max_hallucination_score` are NOT enforced. They are informational only -- the handler does not check these values. Use `template_checks` or `llm_checks` for actual enforcement.
- No `mid_execution` phase. The agent always runs to completion before quality checks happen. If you need to stop the agent mid-execution, use a different policy category (e.g., content for text scanning).
- LLM checks require `_llm_judge_fn` to be injected. This callback is only available in the controlplane flow. In demos and standalone scripts, LLM checks are skipped. Template checks always work.
- The template `contains` check is case-insensitive. "Recommendation" matches "RECOMMENDATION" and "recommendation". Design your checks accordingly.
- The `retry` action requires `retry_config.max_retries > 0` to actually retry. If `retry_config` is missing or `max_retries` is 0, the retry action falls through to BLOCK.
- Quality checks operate on `str(result)`. The handler converts the result to a string before running checks. For structured results (dicts), the checks run against the string representation.
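The last gotcha is easy to demonstrate: checks see the string form of a structured result, not the payload fields themselves:

```python
# Quality checks run against str(result), i.e. the dict's repr
result = {"report": "Final recommendation: approve the proposal."}
text = str(result)

# A case-insensitive "contains" check for "recommendation" still passes...
assert "recommendation" in text.lower()
# ...but a length check measures the repr, braces and quotes included
assert len(text) > len(result["report"])
```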
Next Steps
- Policy & Governance -- How policy enforcement works
- LLM Policy -- Model allowlists and token limits
- Operations Policy -- Timeout enforcement
- Policy Categories & Templates -- All 26 categories