# Safety Guardrails Agent
A multi-agent safety comparison pipeline that runs user input through 4 safety tools side by side -- Lakera Guard, Presidio, PolyGuard, and Azure Content Safety. A safety-scanner child agent invokes all 8 tool methods (`check`, `classify`, `analyze`, `anonymize`, `validate`, `call`, `analyze_text`, `analyze_image`), while a safety-evaluator compares detection capabilities, reasons about threat level, and generates safe content from anonymized input.

This example runs in dry-run mode by default (no API key needed). For live mode, set `OPENAI_API_KEY`, `WAXELL_API_KEY`, and `WAXELL_API_URL`.
## Architecture
## Key Code
### Multi-framework safety scanning tools
Each tool uses a distinct `tool_type` to categorize the safety tool in the trace (`safety_scanner`, `pii_scanner`, `pii_anonymizer`, `content_safety`).
```python
@waxell.tool(tool_type="safety_scanner")
def lakera_check(lakera: MockLakeraGuard, text: str) -> dict:
    result = lakera.check(prompt=text)
    flagged_categories = [
        cat for cat, flagged in result.results[0].categories.items() if flagged
    ]
    return {"framework": "lakera_guard", "flagged": result.flagged, "detected_categories": flagged_categories}
```
```python
@waxell.tool(tool_type="pii_scanner")
def presidio_analyze(analyzer: MockAnalyzerEngine, text: str) -> dict:
    pii_results = analyzer.analyze(text=text, language="en")
    return {"framework": "presidio", "entities_detected": len(pii_results)}
```
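The `pii_anonymizer` tool type mentioned above (the `anonymize` method) is not shown here. A plausible shape for that step, with regex-based redaction standing in for Presidio's real engine -- the function name, patterns, and return keys are all assumptions for illustration:

```python
import re

# Hypothetical stand-in for Presidio anonymization: replace emails and
# phone numbers with entity placeholders before the text reaches an LLM.
_PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE_NUMBER": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def presidio_anonymize(text: str) -> dict:
    entities = 0
    for label, pattern in _PATTERNS.items():
        text, n = pattern.subn(f"<{label}>", text)
        entities += n
    return {"framework": "presidio", "anonymized_text": text, "entities_replaced": entities}
```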
```python
@waxell.tool(tool_type="content_safety")
def azure_analyze_text(azure_client: MockContentSafetyClient) -> dict:
    result = azure_client.analyze_text()
    # Azure reports a severity per category; flag if any category reaches 2+
    max_severity = max(item.severity for item in result.categories_analysis)
    return {"framework": "azure_content_safety", "modality": "text", "flagged": max_severity >= 2}
```
### Cross-tool comparison and safe generation
The evaluator compares all 4 tools and generates a response using Presidio-anonymized input.
```python
@waxell.observe(agent_name="safety-evaluator", workflow_name="safety-evaluation")
async def run_safety_evaluator(query, scan_results, openai_client, waxell_ctx=None):
    comparison = {
        "lakera_guard": {"flagged": scan_results["lakera_check"]["flagged"]},
        "presidio": {"flagged": scan_results["presidio_analyze"]["entities_detected"] > 0},
        "polyguard": {"flagged": not scan_results["polyguard_validate"]["passed"]},
        "azure_content_safety": {"flagged": scan_results["azure_text"]["flagged"]},
    }
    quality = await evaluate_safety_results(comparison=comparison, ...)
    waxell.decide("safe_content_action", chosen="generate_with_anonymized", ...)
    waxell.score("safety_coverage", 0.95, comment="4 safety tools with full coverage")
```
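The cross-tool threat-level reasoning can be sketched as a simple vote over the `comparison` dict built above. The thresholds and bucket names here are illustrative assumptions, not the demo's actual policy:

```python
def assess_threat_level(comparison: dict) -> str:
    """Count how many of the 4 frameworks flagged the input and bucket the result."""
    flagged = sum(1 for verdict in comparison.values() if verdict["flagged"])
    if flagged == 0:
        return "none"
    if flagged == 1:
        return "low"    # a single framework alone may be a false positive
    if flagged <= 3:
        return "medium"
    return "high"       # unanimous agreement across all 4 tools
```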
## What this demonstrates
- `@waxell.tool` with 4 tool types -- `safety_scanner`, `pii_scanner`, `pii_anonymizer`, `content_safety` -- across 8 tool invocations covering all wrapped methods.
- `@waxell.decision` -- safety strategy selection (`full_scan`/`pii_only`/`content_only`).
- `@waxell.reasoning_dec` -- cross-tool threat level assessment.
- `waxell.decide()` -- safe content action decision after comparing all 4 frameworks.
- `waxell.score()` -- safety coverage score and boolean `content_safe` flag.
- 4 safety frameworks compared -- Lakera Guard, Presidio, PolyGuard, Azure Content Safety.
- PII anonymization before LLM -- Presidio anonymizes PII before sending to OpenAI.
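The anonymize-before-LLM ordering in the last bullet amounts to: redact entities first, then send only the redacted text to the model. A minimal sketch of that flow, with the anonymizer and LLM call passed in as plain callables -- the function names and prompt wording are hypothetical:

```python
def generate_safe_response(query: str, anonymize, llm_complete) -> str:
    """Send only the anonymized text to the LLM -- raw PII never reaches the model."""
    redacted = anonymize(query)
    prompt = f"Respond helpfully and safely to: {redacted}"
    return llm_complete(prompt)
```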
## Run it
```shell
# Dry-run (no API key needed)
python -m app.demos.safety_guardrails_agent --dry-run

# Live mode with OpenAI
OPENAI_API_KEY=sk-... python -m app.demos.safety_guardrails_agent
```