LLM Guard Agent

A multi-agent LLM Guard pipeline that coordinates a guard-scanner (runs 4 input scanners and 4 output scanners via @waxell.tool(tool_type="guardrail_scanner")) and a guard-evaluator (reasons about scan results, determines sanitization action, scores pass rates). Demonstrates per-scanner pass/fail attribution and risk score tracking across the full LLM Guard scanner pipeline.

Environment variables

This example runs in dry-run mode by default (no API key needed). For live mode, set OPENAI_API_KEY, WAXELL_API_KEY, and WAXELL_API_URL.

Architecture

Key Code

Input and output scanner tools

The tool_type="guardrail_scanner" marks each scanner invocation in the trace, showing which scanners ran and their risk scores.

@waxell.tool(tool_type="guardrail_scanner")
def run_input_scanner(scanner_name: str, passed: bool, action: str, risk: float) -> dict:
    """Run a single input scanner from the LLM Guard pipeline."""
    return {
        "passed": passed,
        "action": action,
        "scanner": scanner_name,
        "scan_type": "input",
        "risk_score": risk,
    }

@waxell.tool(tool_type="guardrail_scanner")
def run_output_scanner(scanner_name: str, passed: bool, action: str, risk: float) -> dict:
    """Run a single output scanner from the LLM Guard pipeline."""
    return {
        "passed": passed,
        "action": action,
        "scanner": scanner_name,
        "scan_type": "output",
        "risk_score": risk,
    }
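
To show how these tools are driven, here is a minimal sketch of invoking the input-scanner tool across a pipeline of scanners. The decorator is omitted so the sketch runs standalone, and the scanner names are illustrative (they mirror common LLM Guard scanners, not necessarily the ones this demo configures).

```python
def run_input_scanner(scanner_name: str, passed: bool, action: str, risk: float) -> dict:
    # Same return shape as the decorated tool above, without @waxell.tool.
    return {
        "passed": passed,
        "action": action,
        "scanner": scanner_name,
        "scan_type": "input",
        "risk_score": risk,
    }

# Hypothetical dry-run results for the 4 input scanners.
INPUT_SCANS = [
    ("Anonymize", True, "allow", 0.0),
    ("PromptInjection", True, "allow", 0.1),
    ("TokenLimit", True, "allow", 0.0),
    ("Toxicity", False, "sanitize", 0.7),
]

input_results = [
    run_input_scanner(name, passed, action, risk)
    for name, passed, action, risk in INPUT_SCANS
]

# Per-scanner pass/fail attribution: which scanners flagged the input.
flagged = [r["scanner"] for r in input_results if not r["passed"]]
print(flagged)  # → ['Toxicity']
```

The output scanners follow the same pattern with scan_type="output", which is how the trace can attribute each risk score to a specific scanner and direction.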

Evaluator with reasoning and inline decision

The evaluator reasons about combined scanner results and decides whether to sanitize, pass through, or block the response.

@waxell.observe(agent_name="guard-evaluator", workflow_name="llm-guard-evaluation")
async def run_guard_evaluator(scan_results: dict, answer: str, waxell_ctx=None):
    waxell.tag("agent_role", "evaluator")
    waxell.tag("framework", "llm_guard")

    input_results = scan_results["input_results"]
    output_results = scan_results["output_results"]
    total_scanners = len(input_results) + len(output_results)

    quality = await evaluate_scan_results(
        input_results=input_results,
        output_results=output_results,
        total_scanners=total_scanners,
    )

    # Count the output scanners that flagged the response for sanitization.
    sanitized = sum(1 for r in output_results if not r["passed"])

    waxell.decide(
        "sanitization_action",
        chosen="apply_sanitization" if sanitized > 0 else "pass_through",
        options=["apply_sanitization", "pass_through", "block_response"],
        reasoning=f"{sanitized} scanner(s) flagged output",
        confidence=0.88,
    )

    waxell.score("scanner_pass_rate", (total_scanners - sanitized) / total_scanners)
    waxell.score("all_clean", sanitized == 0, data_type="boolean")
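
The scoring arithmetic is plain division over the pipeline size. A standalone sketch, assuming `total_scanners` is the full pipeline (8 in this demo: 4 input + 4 output) and `sanitized` counts scanners that flagged the output:

```python
total_scanners = 8   # 4 input scanners + 4 output scanners
sanitized = 1        # e.g. one output scanner requested sanitization

# Mirrors the waxell.score / waxell.decide values in the evaluator above.
pass_rate = (total_scanners - sanitized) / total_scanners
action = "apply_sanitization" if sanitized > 0 else "pass_through"
all_clean = sanitized == 0

print(pass_rate, action, all_clean)  # → 0.875 apply_sanitization False
```

With all scanners clean, pass_rate is 1.0, the decision falls through to pass_through, and the boolean all_clean score is True.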

What this demonstrates

  • @waxell.tool(tool_type="guardrail_scanner") -- 8 LLM Guard scanners (4 input, 4 output) recorded with per-scanner risk scores and pass/fail status.
  • @waxell.step_dec -- scanner config preparation recorded as an execution step.
  • @waxell.decision -- scan mode selection (full_pipeline/input_only/output_only) based on query content analysis.
  • @waxell.reasoning_dec -- combined evaluation of all scanner results with thought, evidence, and conclusion.
  • waxell.decide() -- inline sanitization action decision (apply_sanitization/pass_through/block_response).
  • waxell.score() with mixed types -- numeric pass rate plus boolean all_clean score.
  • Auto-instrumented LLM calls -- OpenAI response generation captured automatically.
  • Nested @waxell.observe -- orchestrator is parent; guard-scanner and guard-evaluator are child agents with automatic lineage.

Run it

# Dry-run (no API key needed)
python -m app.demos.llm_guard_agent --dry-run

# Live mode with OpenAI
OPENAI_API_KEY=sk-... python -m app.demos.llm_guard_agent

Source

dev/waxell-dev/app/demos/llm_guard_agent.py