LLM Guard Agent

A multi-agent LLM Guard pipeline that coordinates a guard-scanner (runs 4 input scanners and 4 output scanners via @waxell.tool(tool_type="guardrail_scanner")) and a guard-evaluator (reasons about scan results, determines sanitization action, scores pass rates). Demonstrates per-scanner pass/fail attribution and risk score tracking across the full LLM Guard scanner pipeline.

Environment variables

This example runs in dry-run mode by default (no API key needed). For live mode, set OPENAI_API_KEY, WAXELL_API_KEY, and WAXELL_API_URL.

Architecture

Key Code

Input and output scanner tools

The tool_type="guardrail_scanner" marks each scanner invocation in the trace, showing which scanners ran and their risk scores.

@waxell.tool(tool_type="guardrail_scanner")
def run_input_scanner(scanner_name: str, passed: bool, action: str, risk: float) -> dict:
    """Run a single input scanner from the LLM Guard pipeline."""
    return {
        "passed": passed,
        "action": action,
        "scanner": scanner_name,
        "scan_type": "input",
        "risk_score": risk,
    }

@waxell.tool(tool_type="guardrail_scanner")
def run_output_scanner(scanner_name: str, passed: bool, action: str, risk: float) -> dict:
    """Run a single output scanner from the LLM Guard pipeline."""
    return {
        "passed": passed,
        "action": action,
        "scanner": scanner_name,
        "scan_type": "output",
        "risk_score": risk,
    }
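
To show how these tools are driven, here is a minimal sketch of invoking the input-scanner tool across a pipeline of scanners. The decorator is omitted so the sketch runs standalone, and the scanner names are illustrative (they mirror common LLM Guard scanners, not necessarily the ones this demo configures).

```python
def run_input_scanner(scanner_name: str, passed: bool, action: str, risk: float) -> dict:
    # Same return shape as the decorated tool above, without @waxell.tool.
    return {
        "passed": passed,
        "action": action,
        "scanner": scanner_name,
        "scan_type": "input",
        "risk_score": risk,
    }

# Hypothetical dry-run results for the 4 input scanners.
INPUT_SCANS = [
    ("Anonymize", True, "allow", 0.0),
    ("PromptInjection", True, "allow", 0.1),
    ("TokenLimit", True, "allow", 0.0),
    ("Toxicity", False, "sanitize", 0.7),
]

input_results = [
    run_input_scanner(name, passed, action, risk)
    for name, passed, action, risk in INPUT_SCANS
]

# Per-scanner pass/fail attribution: which scanners flagged the input.
flagged = [r["scanner"] for r in input_results if not r["passed"]]
print(flagged)  # → ['Toxicity']
```

The output scanners follow the same pattern with scan_type="output", which is how the trace can attribute each risk score to a specific scanner and direction.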

Evaluator with reasoning and inline decision

The evaluator reasons about combined scanner results and decides whether to sanitize, pass through, or block the response.

@waxell.observe(agent_name="guard-evaluator", workflow_name="llm-guard-evaluation")
async def run_guard_evaluator(scan_results: dict, answer: str, waxell_ctx=None):
    waxell.tag("agent_role", "evaluator")
    waxell.tag("framework", "llm_guard")

    input_results = scan_results["input_results"]
    output_results = scan_results["output_results"]
    total_scanners = len(input_results) + len(output_results)

    quality = await evaluate_scan_results(
        input_results=input_results,
        output_results=output_results,
        total_scanners=total_scanners,
    )

    # Count the output scanners that flagged the response for sanitization.
    sanitized = sum(1 for r in output_results if not r["passed"])

    waxell.decide(
        "sanitization_action",
        chosen="apply_sanitization" if sanitized > 0 else "pass_through",
        options=["apply_sanitization", "pass_through", "block_response"],
        reasoning=f"{sanitized} scanner(s) flagged output",
        confidence=0.88,
    )

    waxell.score("scanner_pass_rate", (total_scanners - sanitized) / total_scanners)
    waxell.score("all_clean", sanitized == 0, data_type="boolean")
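
The scoring arithmetic is plain division over the pipeline size. A standalone sketch, assuming `total_scanners` is the full pipeline (8 in this demo: 4 input + 4 output) and `sanitized` counts scanners that flagged the output:

```python
total_scanners = 8   # 4 input scanners + 4 output scanners
sanitized = 1        # e.g. one output scanner requested sanitization

# Mirrors the waxell.score / waxell.decide values in the evaluator above.
pass_rate = (total_scanners - sanitized) / total_scanners
action = "apply_sanitization" if sanitized > 0 else "pass_through"
all_clean = sanitized == 0

print(pass_rate, action, all_clean)  # → 0.875 apply_sanitization False
```

With all scanners clean, pass_rate is 1.0, the decision falls through to pass_through, and the boolean all_clean score is True.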

What this demonstrates

  • @waxell.tool(tool_type="guardrail_scanner") -- 8 LLM Guard scanners (4 input, 4 output) recorded with per-scanner risk scores and pass/fail status.
  • @waxell.step_dec -- scanner config preparation recorded as an execution step.
  • @waxell.decision -- scan mode selection (full_pipeline/input_only/output_only) based on query content analysis.
  • @waxell.reasoning_dec -- combined evaluation of all scanner results with thought, evidence, and conclusion.
  • waxell.decide() -- inline sanitization action decision (apply_sanitization/pass_through/block_response).
  • waxell.score() with mixed types -- numeric pass rate plus boolean all_clean score.
  • Auto-instrumented LLM calls -- OpenAI response generation captured automatically.
  • Nested @waxell.observe -- orchestrator is parent; guard-scanner and guard-evaluator are child agents with automatic lineage.

Run it

# Dry-run (no API key needed)
python -m app.demos.llm_guard_agent --dry-run

# Live mode with OpenAI
OPENAI_API_KEY=sk-... python -m app.demos.llm_guard_agent

Source

dev/waxell-dev/app/demos/llm_guard_agent.py