Guardrails Agent

A multi-agent guardrails validation pipeline with two cooperating agents: a guardrails-runner (input validation, output validation, and PII detection via @waxell.tool(tool_type="guardrail_validator")) and a guardrails-evaluator (reasoning over validation results, fix-strategy decisions, and scoring). Built with OpenAI and waxell-observe decorator patterns.

Environment variables

This example runs in dry-run mode by default (no API key needed). For live mode, set OPENAI_API_KEY, WAXELL_API_KEY, and WAXELL_API_URL.
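The mode switch can be made explicit; a minimal sketch (resolve_mode is a hypothetical helper, not part of the demo), assuming live mode requires all three variables to be set:

```python
def resolve_mode(env: dict) -> str:
    """Hypothetical helper: pick live mode only when every required key is present."""
    required = ("OPENAI_API_KEY", "WAXELL_API_KEY", "WAXELL_API_URL")
    return "live" if all(env.get(k) for k in required) else "dry-run"

import os
print(resolve_mode(dict(os.environ)))
```

Passing the environment as a plain dict keeps the helper trivially testable without mutating os.environ.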

Key Code

Guardrail validator tool decorators

The tool_type="guardrail_validator" value marks these as guardrails validation calls in the trace, showing which validators ran and whether they passed or required fixes.

@waxell.tool(tool_type="guardrail_validator")
def validate_input(query: str, validators: list) -> dict:
    """Run input validators against the query."""
    results = [{"validator": v, "passed": True, "action": "pass"} for v in validators]
    return {"framework": "guardrails_ai", "stage": "input", "all_passed": True, "results": results}

@waxell.tool(tool_type="guardrail_validator")
def validate_output(answer: str, validators: list) -> dict:
    """Run output validators against the generated answer."""
    results = [{"validator": v, "passed": True, "action": "pass"} for v in validators]
    return {"framework": "guardrails_ai", "stage": "output", "all_passed": True, "results": results}

@waxell.tool(tool_type="guardrail_validator")
def check_pii(text: str) -> dict:
    """Check text for PII and redact if found."""
    has_pii = "@" in text or "email" in text.lower()
    return {
        "framework": "guardrails_ai",
        "validator": "pii_detection",
        "passed": not has_pii,
        "action": "fix" if has_pii else "pass",
        "pii_found": ["email"] if has_pii else [],
    }
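When check_pii returns action "fix", the pipeline applies auto-redaction. One way to sketch that step in plain Python (redact_emails and its regex are illustrative, not the demo's implementation):

```python
import re

# Rough email pattern for illustration; real PII detection is more involved.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def redact_emails(text: str) -> str:
    """Hypothetical fix step: mask anything that looks like an email address."""
    return EMAIL_RE.sub("<EMAIL_REDACTED>", text)

print(redact_emails("Reach me at alice@example.com today."))
# -> Reach me at <EMAIL_REDACTED> today.
```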

Validation level decision

The orchestrator decides how strict validation should be based on query content, recorded as a decision span.

@waxell.decision(
    name="select_validation_level",
    options=["strict", "standard", "relaxed"],
)
async def select_validation_level(query: str) -> dict:
    has_sensitive_terms = any(w in query.lower() for w in ["pii", "personal", "email"])
    has_safety_terms = any(w in query.lower() for w in ["safety", "guardrail", "protect"])

    if has_sensitive_terms:
        return {"chosen": "strict", "reasoning": "Query references sensitive data"}
    elif has_safety_terms:
        return {"chosen": "standard", "reasoning": "Safety-related query"}
    else:
        return {"chosen": "relaxed", "reasoning": "General query"}
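The branch order matters: sensitive-data terms take priority over safety terms, so a query mentioning both gets strict validation. Stripped of the decorator, the same logic can be exercised directly:

```python
def select_validation_level(query: str) -> dict:
    # Same branching as the demo's decision span, decorator omitted.
    has_sensitive = any(w in query.lower() for w in ("pii", "personal", "email"))
    has_safety = any(w in query.lower() for w in ("safety", "guardrail", "protect"))
    if has_sensitive:
        return {"chosen": "strict", "reasoning": "Query references sensitive data"}
    if has_safety:
        return {"chosen": "standard", "reasoning": "Safety-related query"}
    return {"chosen": "relaxed", "reasoning": "General query"}

# "protect" alone would be standard, but "PII" forces strict.
print(select_validation_level("How do you protect PII?")["chosen"])  # strict
```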

Evaluator with reasoning and inline decision

The evaluator reasons about combined validation results, then decides on a fix strategy using the inline waxell.decide() convenience function.

@waxell.observe(agent_name="guardrails-evaluator", workflow_name="guardrails-evaluation")
async def run_guardrails_evaluator(validation_results, answer, waxell_ctx=None):
    waxell.tag("agent_role", "evaluator")
    waxell.tag("framework", "guardrails_ai")

    # input_result, output_result, pii_result, and fixes_applied are
    # derived from validation_results earlier in the function (elided here).
    quality = await evaluate_validation_results(  # @reasoning
        input_result, output_result, pii_result
    )

    waxell.decide(
        "fix_strategy",
        chosen="auto_redact" if fixes_applied > 0 else "no_action",
        options=["auto_redact", "reject_output", "no_action"],
        reasoning=f"{fixes_applied} PII items found -- auto-redaction applied",
        confidence=0.92,
    )

    waxell.score("validation_pass_rate", 1.0, comment="All 7 validators passed or fixed")
    waxell.score("pii_clean", pii_result["passed"], data_type="boolean")
    return {"answer": answer, "validations_run": 7, "all_passed": True}
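End to end, the runner and evaluator reduce to a small decision chain. A plain-Python sketch of that flow under the demo's stub heuristics, with no waxell or LLM calls (run_pipeline is illustrative, not the demo's code):

```python
def run_pipeline(query: str, answer: str) -> dict:
    """Sketch of runner -> evaluator: stub validators pass, PII check drives the fix."""
    # Runner stage: same PII heuristic as the check_pii stub above.
    pii_found = "@" in answer or "email" in answer.lower()
    fixes_applied = 1 if pii_found else 0
    # Evaluator stage: same branching as the waxell.decide() call.
    strategy = "auto_redact" if fixes_applied > 0 else "no_action"
    return {"fix_strategy": strategy, "fixes_applied": fixes_applied}

print(run_pipeline("contact question", "Mail me at a@b.com"))
# {'fix_strategy': 'auto_redact', 'fixes_applied': 1}
```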

What this demonstrates

  • @waxell.tool(tool_type="guardrail_validator") -- three guardrails validation tools (input, output, PII) recorded with the guardrail_validator tool type for clear trace attribution.
  • @waxell.decision -- validation level selection (strict/standard/relaxed) based on query analysis, recorded with options and reasoning.
  • @waxell.reasoning_dec -- combined evaluation of all validation stages (input, output, PII) with structured thought, evidence, and conclusion.
  • waxell.decide() -- inline fix strategy decision (auto_redact/reject_output/no_action) using the top-level convenience function.
  • @waxell.step_dec -- validation config preparation recorded as an execution step.
  • waxell.score() with mixed types -- numeric pass rate plus boolean PII cleanliness score.
  • Auto-instrumented LLM calls -- OpenAI response generation captured automatically via waxell.init().
  • Nested @waxell.observe -- orchestrator is parent; guardrails-runner and guardrails-evaluator are child agents with automatic lineage.
  • Guardrails AI integration pattern -- shows how to wrap Guardrails AI validators with waxell-observe for full validation observability.

Run it

# Dry-run (no API key needed)
python -m app.demos.guardrails_agent --dry-run

# Live mode with OpenAI
OPENAI_API_KEY=sk-... python -m app.demos.guardrails_agent

Source

dev/waxell-dev/app/demos/guardrails_agent.py