Strands Agents

An AWS Strands agent demonstrating multi-step DevOps diagnostics across three agents: the orchestrator categorizes requests and selects actions; the runner executes the health_check, metrics_query, and alert_status tools and correlates their findings; and the evaluator assesses severity and generates a status report.
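The three-stage flow can be sketched in plain Python. This is a hypothetical illustration of the pattern, not the demo's actual API: the function names, the keyword-based categorization, and the stubbed tool results are all assumptions.

```python
def orchestrate(query: str) -> list[str]:
    """Orchestrator: categorize the request and select diagnostic actions."""
    actions = ["health_check", "alert_status"]
    if any(kw in query.lower() for kw in ("latency", "slow", "metrics")):
        actions.append("metrics_query")
    return actions

def run_diagnostics(actions: list[str], services: list[str]) -> list[dict]:
    """Runner: execute the selected tools and correlate results per service."""
    # Stubbed result; the real runner dispatches to the tools in `actions`.
    down = {"model-server"}
    return [{"service": s, "needs_attention": s in down} for s in services]

def evaluate(findings: list[dict]) -> str:
    """Evaluator: reduce correlated findings to an overall status."""
    if any(f["needs_attention"] for f in findings):
        return "Immediate attention required"
    return "All systems healthy"

def diagnose(query: str, services: list[str]) -> str:
    actions = orchestrate(query)
    findings = run_diagnostics(actions, services)
    return evaluate(findings)
```

Each stage only consumes the previous stage's output, which is what lets the real demo swap in LLM-backed categorization and tool execution without changing the pipeline shape.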

Environment variables

This example requires OPENAI_API_KEY, WAXELL_API_KEY, and WAXELL_API_URL. Use --dry-run to run without any API keys.

Architecture

Key Code

Runner with diagnostic @tool decorators and @retrieval correlation

The runner executes DevOps tools and correlates findings across health, metrics, and alerts.

@waxell.tool(tool_type="function")
def health_check(services: list) -> dict:
    """Check health status of specified services."""
    return {svc: MOCK_HEALTH_RESULTS.get(svc, {"status": "unknown"}) for svc in services}

@waxell.tool(tool_type="function")
def metrics_query(services: list, window_minutes: int = 15) -> dict:
    """Query performance metrics for specified services."""
    return {svc: MOCK_METRICS.get(svc, {}) for svc in services}

@waxell.retrieval(source="diagnostics")
def correlate_findings(health: dict, metrics: dict, alerts: list) -> list[dict]:
    """Correlate health, metrics, and alerts into actionable findings."""
    findings = []
    for service, health_data in health.items():
        status = health_data.get("status", "unknown")
        severity = ("critical" if status == "down"
                    else "warning" if status == "degraded" else "ok")
        findings.append({"service": service, "status": status, "severity": severity,
                         "active_alerts": sum(1 for a in alerts if a.get("service") == service),
                         "needs_attention": severity != "ok"})
    return sorted(findings, key=lambda f: {"critical": 0, "warning": 1, "ok": 2}[f["severity"]])
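Stripped of the @waxell.retrieval decorator, the severity-ranking logic can be exercised directly with mock data. The service names below are illustrative:

```python
def correlate_findings(health: dict, metrics: dict, alerts: list) -> list[dict]:
    """Plain-Python version of the correlation above, for illustration."""
    findings = []
    for service, health_data in health.items():
        status = health_data.get("status", "unknown")
        severity = ("critical" if status == "down"
                    else "warning" if status == "degraded" else "ok")
        findings.append({"service": service, "severity": severity,
                         "needs_attention": severity != "ok"})
    # Critical services sort first, then warnings, then healthy services.
    return sorted(findings, key=lambda f: {"critical": 0, "warning": 1, "ok": 2}[f["severity"]])

health = {
    "api-gateway":  {"status": "degraded"},
    "model-server": {"status": "down"},
    "cache":        {"status": "healthy"},
}
findings = correlate_findings(health, metrics={}, alerts=[])
# findings[0] is model-server (critical), findings[-1] is cache (ok)
```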

Evaluator with @reasoning severity assessment

The evaluator assesses overall system severity and generates a status report.

@waxell.reasoning_dec(step="severity_assessment")
async def assess_severity(findings: list) -> dict:
    degraded = [f for f in findings if f["needs_attention"]]
    total_alerts = sum(f["active_alerts"] for f in findings)
    return {
        "thought": f"Found {len(degraded)}/{len(findings)} services needing attention "
                   f"and {total_alerts} active alerts.",
        "evidence": [f"{f['service']}: {f['status']}" for f in findings],
        "conclusion": "Immediate attention required"
        if any(f["severity"] == "critical" for f in findings)
        else "Monitoring recommended" if degraded else "All systems healthy",
    }

healthy_count = sum(1 for f in findings if not f["needs_attention"])
waxell.score("system_health", healthy_count / len(findings))
waxell.score("response_quality", 0.91, comment="report completeness")
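The system_health score is just the ratio of healthy services. For instance, with three of four services needing no attention (a made-up example, not the demo's data):

```python
findings = [{"needs_attention": False}] * 3 + [{"needs_attention": True}]
healthy_count = sum(1 for f in findings if not f["needs_attention"])
system_health = healthy_count / len(findings)
# 3 healthy out of 4 services -> 0.75
```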

What this demonstrates

  • @waxell.observe -- parent-child agent hierarchy with automatic lineage
  • @waxell.step_dec -- request categorization recorded as execution step
  • @waxell.tool -- health check, metrics query, and alert status tools
  • @waxell.retrieval -- finding correlation with source="diagnostics"
  • @waxell.decision -- action selection via OpenAI
  • waxell.decide() -- manual execution strategy decision
  • @waxell.reasoning_dec -- severity assessment chain-of-thought
  • waxell.score() -- system health ratio and response quality scores
  • Auto-instrumented LLM calls -- interpretation and report generation captured
  • Strands pattern -- DevOps agent with tool-based diagnostics and correlation
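The decorators above are specific to the waxell SDK. To experiment with the tool/retrieval pattern without the SDK installed, they can be stubbed as metadata-accepting pass-throughs. This is a hypothetical sketch, not waxell's implementation:

```python
from functools import wraps

def passthrough_decorator(**_meta):
    """Accepts keyword metadata (e.g. tool_type=, source=, step=) and
    returns the decorated function unchanged, preserving its name/docstring."""
    def wrap(fn):
        @wraps(fn)
        def inner(*args, **kwargs):
            return fn(*args, **kwargs)
        return inner
    return wrap

# Stand-ins for @waxell.tool / @waxell.retrieval during local experiments.
tool = passthrough_decorator
retrieval = passthrough_decorator

@tool(tool_type="function")
def health_check(services: list) -> dict:
    return {svc: {"status": "healthy"} for svc in services}
```

Because the stubs discard the metadata, the decorated functions behave exactly like plain functions, which is also roughly what --dry-run mode relies on.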

Run it

# Dry-run (no API keys needed)
cd dev/waxell-dev
python -m app.demos.strands_agent --dry-run

# Live (real OpenAI)
export OPENAI_API_KEY="sk-..."
python -m app.demos.strands_agent

# Custom query
python -m app.demos.strands_agent --query "Investigate high latency on the model-server"

Source

dev/waxell-dev/app/demos/strands_agent.py