LLM Guard Agent
A multi-agent LLM Guard pipeline that coordinates a guard-scanner (runs 4 input scanners and 4 output scanners via @waxell.tool(tool_type="guardrail_scanner")) and a guard-evaluator (reasons about scan results, determines the sanitization action, and scores pass rates). Demonstrates per-scanner pass/fail attribution and risk-score tracking across the full LLM Guard scanner pipeline.
This example runs in dry-run mode by default (no API key needed). For live mode, set OPENAI_API_KEY, WAXELL_API_KEY, and WAXELL_API_URL.
Architecture
Key Code
Input and output scanner tools
The tool_type="guardrail_scanner" argument marks each scanner invocation in the trace, showing which scanners ran and the risk score each one reported.
@waxell.tool(tool_type="guardrail_scanner")
def run_input_scanner(scanner_name: str, passed: bool, action: str, risk: float) -> dict:
    """Run a single input scanner from the LLM Guard pipeline."""
    return {
        "passed": passed,
        "action": action,
        "scanner": scanner_name,
        "scan_type": "input",
        "risk_score": risk,
    }

@waxell.tool(tool_type="guardrail_scanner")
def run_output_scanner(scanner_name: str, passed: bool, action: str, risk: float) -> dict:
    """Run a single output scanner from the LLM Guard pipeline."""
    return {
        "passed": passed,
        "action": action,
        "scanner": scanner_name,
        "scan_type": "output",
        "risk_score": risk,
    }
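In dry-run mode the scanner tools just record structured results rather than loading LLM Guard models. A minimal sketch of how the guard-scanner might fan out over the eight scanners and aggregate their results (the scanner names and zero-risk dry-run values below are illustrative assumptions, not the demo's actual configuration):

```python
# Hypothetical dry-run driver: fan out over 4 input + 4 output scanners
# and collect results in the same dict shape the tools above return.
INPUT_SCANNERS = ["PromptInjection", "Toxicity", "Secrets", "TokenLimit"]
OUTPUT_SCANNERS = ["Sensitive", "NoRefusal", "Relevance", "MaliciousURLs"]

def run_scanner(scanner_name: str, scan_type: str, passed: bool, action: str, risk: float) -> dict:
    # Mirrors the return shape of run_input_scanner / run_output_scanner.
    return {
        "passed": passed,
        "action": action,
        "scanner": scanner_name,
        "scan_type": scan_type,
        "risk_score": risk,
    }

def run_pipeline() -> dict:
    # Dry-run assumption: every scanner passes with zero risk.
    input_results = [run_scanner(n, "input", True, "allow", 0.0) for n in INPUT_SCANNERS]
    output_results = [run_scanner(n, "output", True, "allow", 0.0) for n in OUTPUT_SCANNERS]
    return {"input_results": input_results, "output_results": output_results}

results = run_pipeline()
print(len(results["input_results"]) + len(results["output_results"]))  # 8
```

The aggregated dict is what gets handed to the guard-evaluator as scan_results.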
Evaluator with reasoning and inline decision
The evaluator reasons about combined scanner results and decides whether to sanitize, pass through, or block the response.
@waxell.observe(agent_name="guard-evaluator", workflow_name="llm-guard-evaluation")
async def run_guard_evaluator(scan_results: dict, answer: str, waxell_ctx=None):
    waxell.tag("agent_role", "evaluator")
    waxell.tag("framework", "llm_guard")

    quality = await evaluate_scan_results(
        input_results=scan_results["input_results"],
        output_results=scan_results["output_results"],
        total_scanners=total_scanners,
    )

    # total_scanners and sanitized are derived from scan_results (excerpted)
    waxell.decide(
        "sanitization_action",
        chosen="apply_sanitization" if sanitized > 0 else "pass_through",
        options=["apply_sanitization", "pass_through", "block_response"],
        reasoning=f"{sanitized} scanner(s) flagged output",
        confidence=0.88,
    )
    waxell.score("scanner_pass_rate", (total_scanners - sanitized) / total_scanners)
    waxell.score("all_clean", sanitized == 0, data_type="boolean")
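The counts the evaluator feeds into waxell.decide() and waxell.score() can be derived directly from the scanner result dicts. A plain-Python sketch of that aggregation, with no waxell dependency (the summarize helper and the sample data are assumptions for illustration):

```python
# Hypothetical aggregation step: count failed scanners across the combined
# input/output results and pick the sanitization action accordingly.
def summarize(scan_results: dict) -> dict:
    all_results = scan_results["input_results"] + scan_results["output_results"]
    total_scanners = len(all_results)
    sanitized = sum(1 for r in all_results if not r["passed"])
    return {
        "total": total_scanners,
        "sanitized": sanitized,
        "pass_rate": (total_scanners - sanitized) / total_scanners,
        "action": "apply_sanitization" if sanitized > 0 else "pass_through",
    }

# Example: 4 clean input scans, 3 clean output scans, 1 flagged output scan.
demo = {
    "input_results": [{"passed": True}] * 4,
    "output_results": [{"passed": True}] * 3 + [{"passed": False}],
}
print(summarize(demo))
# {'total': 8, 'sanitized': 1, 'pass_rate': 0.875, 'action': 'apply_sanitization'}
```

With zero flagged scanners the pass rate is 1.0 and the chosen action falls back to pass_through, matching the inline decision above.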
What this demonstrates
- @waxell.tool(tool_type="guardrail_scanner") -- 8 LLM Guard scanners (4 input, 4 output) recorded with per-scanner risk scores and pass/fail status.
- @waxell.step_dec -- scanner config preparation recorded as an execution step.
- @waxell.decision -- scan mode selection (full_pipeline/input_only/output_only) based on query content analysis.
- @waxell.reasoning_dec -- combined evaluation of all scanner results with thought, evidence, and conclusion.
- waxell.decide() -- inline sanitization action decision (apply_sanitization/pass_through/block_response).
- waxell.score() with mixed types -- numeric pass rate plus boolean all_clean score.
- Auto-instrumented LLM calls -- OpenAI response generation captured automatically.
- Nested @waxell.observe -- orchestrator is parent; guard-scanner and guard-evaluator are child agents with automatic lineage.
Run it
# Dry-run (no API key needed)
python -m app.demos.llm_guard_agent --dry-run
# Live mode with OpenAI
OPENAI_API_KEY=sk-... python -m app.demos.llm_guard_agent