Safety Guardrails Agent

A multi-agent safety comparison pipeline that runs user input through 4 safety frameworks side by side -- Lakera Guard, Presidio, PolyGuard, and Azure Content Safety. A safety-scanner child agent invokes all 8 tool methods (check, classify, analyze, anonymize, validate, call, analyze_text, analyze_image), while a safety-evaluator compares detection capabilities, reasons about the threat level, and generates safe content from anonymized input.

Environment variables

This example runs in dry-run mode by default (no API key needed). For live mode, set OPENAI_API_KEY, WAXELL_API_KEY, and WAXELL_API_URL.
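A minimal sketch of how that switch could be derived from the environment (the is_live_mode helper is an illustrative assumption, not the demo's actual logic):

import os

def is_live_mode() -> bool:
    # Hypothetical check: live mode only when all required variables are set.
    required = ("OPENAI_API_KEY", "WAXELL_API_KEY", "WAXELL_API_URL")
    return all(os.environ.get(name) for name in required)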

Architecture
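In lieu of a diagram, a minimal sketch of the two-agent flow (run_pipeline and run_safety_scanner are assumed names for illustration; run_safety_evaluator is shown under Key Code below):

# Sketch only: the scanner child agent fans out to all 8 tool methods across
# the 4 frameworks, then the evaluator compares verdicts and generates safely.
async def run_pipeline(query: str, openai_client) -> dict:
    scan_results = await run_safety_scanner(query)  # assumed scanner entry point
    return await run_safety_evaluator(query, scan_results, openai_client)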

Key Code

Multi-framework safety scanning tools

Each wrapper declares a distinct tool_type that categorizes the underlying framework in the trace (safety_scanner, pii_scanner, pii_anonymizer, content_safety).

@waxell.tool(tool_type="safety_scanner")
def lakera_check(lakera: MockLakeraGuard, text: str) -> dict:
    result = lakera.check(prompt=text)
    flagged_categories = [
        cat for cat, flagged in result.results[0].categories.items() if flagged
    ]
    return {
        "framework": "lakera_guard",
        "flagged": result.flagged,
        "detected_categories": flagged_categories,
    }

@waxell.tool(tool_type="pii_scanner")
def presidio_analyze(analyzer: MockAnalyzerEngine, text: str) -> dict:
    pii_results = analyzer.analyze(text=text, language="en")
    return {"framework": "presidio", "entities_detected": len(pii_results)}

@waxell.tool(tool_type="content_safety")
def azure_analyze_text(azure_client: MockContentSafetyClient) -> dict:
    result = azure_client.analyze_text()
    # Highest severity across categories; assumes the mock mirrors the
    # Azure SDK's categories_analysis response shape.
    max_severity = max(item.severity for item in result.categories_analysis)
    return {"framework": "azure_content_safety", "modality": "text", "flagged": max_severity >= 2}

Cross-tool comparison and safe generation

The evaluator compares the verdicts of all 4 frameworks and generates a response from Presidio-anonymized input.

@waxell.observe(agent_name="safety-evaluator", workflow_name="safety-evaluation")
async def run_safety_evaluator(query, scan_results, openai_client, waxell_ctx=None):
    comparison = {
        "lakera_guard": {"flagged": scan_results["lakera_check"]["flagged"]},
        "presidio": {"flagged": scan_results["presidio_analyze"]["entities_detected"] > 0},
        "polyguard": {"flagged": not scan_results["polyguard_validate"]["passed"]},
        "azure_content_safety": {"flagged": scan_results["azure_text"]["flagged"]},
    }

    quality = await evaluate_safety_results(comparison=comparison, ...)
    waxell.decide("safe_content_action", chosen="generate_with_anonymized", ...)
    waxell.score("safety_coverage", 0.95, comment="4 safety tools with full coverage")
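
The safe-generation step itself is not shown above; a minimal sketch of the idea, assuming Presidio-style analyzer/anonymizer mocks and an async OpenAI-compatible client (the function name and model are illustrative):

# Illustrative sketch: anonymize detected PII before the LLM sees the input.
async def generate_with_anonymized(query, analyzer, anonymizer, openai_client):
    entities = analyzer.analyze(text=query, language="en")
    safe_text = anonymizer.anonymize(text=query, analyzer_results=entities).text
    response = await openai_client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model for illustration
        messages=[{"role": "user", "content": safe_text}],
    )
    return response.choices[0].message.content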

What this demonstrates

  • @waxell.tool with 4 tool types -- safety_scanner, pii_scanner, pii_anonymizer, content_safety -- across 8 tool invocations covering all wrapped methods.
  • @waxell.decision -- safety strategy selection (full_scan/pii_only/content_only).
  • @waxell.reasoning_dec -- cross-tool threat level assessment.
  • waxell.decide() -- safe content action decision after comparing all 4 frameworks (see the sketch after this list).
  • waxell.score() -- safety coverage score and boolean content_safe flag.
  • 4 safety frameworks compared -- Lakera Guard, Presidio, PolyGuard, Azure Content Safety.
  • PII anonymization before LLM -- Presidio anonymizes PII before sending to OpenAI.
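
A rough sketch of how these primitives compose (the @waxell.decision signature is an assumption; the decide()/score() calls mirror the snippets above):

# Illustrative composition of the tracing primitives listed above.
@waxell.decision("safety_strategy")  # decorator arguments are assumed
def choose_strategy(query: str) -> str:
    return "full_scan"  # one of full_scan / pii_only / content_only

waxell.decide("safe_content_action", chosen="generate_with_anonymized")
waxell.score("safety_coverage", 0.95, comment="4 safety tools with full coverage")
waxell.score("content_safe", True)  # boolean flag from the demo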

Run it

# Dry-run (no API key needed)
python -m app.demos.safety_guardrails_agent --dry-run

# Live mode with OpenAI
OPENAI_API_KEY=sk-... python -m app.demos.safety_guardrails_agent

Source

dev/waxell-dev/app/demos/safety_guardrails_agent.py