# Code Review Agent
A multi-agent code review pipeline that coordinates a code-analyzer (static analysis tools, linter, security scanner, coding standards retrieval) and a code-evaluator (reasoning chain, decision, LLM-generated review with retry). Built with Anthropic and waxell-observe decorator patterns.
This example runs in dry-run mode by default (no API key needed). For live mode, set ANTHROPIC_API_KEY, WAXELL_API_KEY, and WAXELL_API_URL.
## Architecture
## Key Code
### Tool decorators -- static analysis tools

Each tool call is automatically recorded as a span with its inputs, outputs, and timing.
```python
@waxell.tool(tool_type="function")
def parse_diff(diff_text: str) -> dict:
    """Parse a code diff and extract file change metadata."""
    return {
        "files_changed": 1,
        "lines_added": 25,
        "lines_removed": 0,
        "files": [{"path": "src/auth/handler.py", "status": "added"}],
    }


@waxell.tool(tool_type="function")
def run_security_scan(file_path: str, scan_type: str = "full") -> dict:
    """Run security scanner on a file."""
    return {
        "vulnerabilities": [
            {"severity": "critical", "type": "sql_injection", "cwe": "CWE-89"},
            {"severity": "high", "type": "hardcoded_secret", "cwe": "CWE-798"},
            {"severity": "medium", "type": "weak_hash", "cwe": "CWE-328"},
        ],
        "risk_score": 9.2,
    }
```
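The demo does not show what the decorator does internally. As a rough sketch, a span-recording tool decorator could look like the following -- the `tool` function, its `spans` list, and the span fields here are illustrative assumptions, not the real `waxell.tool` API:

```python
import functools
import time

def tool(tool_type="function"):
    """Illustrative span-recording decorator (not the real waxell.tool)."""
    def decorator(fn):
        spans = []  # each call appends one span dict

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            spans.append({
                "name": fn.__name__,
                "tool_type": tool_type,
                "inputs": {"args": args, "kwargs": kwargs},
                "output": result,
                "duration_ms": (time.perf_counter() - start) * 1000,
            })
            return result

        wrapper.spans = spans  # exposed for inspection in this sketch
        return wrapper
    return decorator

@tool(tool_type="function")
def parse_diff(diff_text: str) -> dict:
    return {"files_changed": 1}

parse_diff("diff --git a/f b/f")
print(parse_diff.spans[0]["name"])  # parse_diff
```

The point is only that the decorated function's behavior is unchanged while a structured record of each call accumulates on the side.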
### Retrieval decorator -- coding standards lookup

The `@waxell.retrieval` decorator records the source, query, and returned documents automatically.
```python
@waxell.retrieval(source="standards_db")
def retrieve_coding_standards(query: str) -> list[dict]:
    """Retrieve relevant coding standards from the standards database."""
    return [
        {"id": "std-001", "title": "Python Security Best Practices", "score": 0.96},
        {"id": "std-002", "title": "Authentication Implementation Guide", "score": 0.91},
        {"id": "std-003", "title": "JWT Token Standards", "score": 0.87},
    ]
```
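Downstream code might post-filter the retrieved standards by relevance score before prompting the evaluator; a minimal sketch (the `top_standards` helper and the 0.9 threshold are hypothetical, not part of the demo):

```python
def top_standards(docs: list[dict], min_score: float = 0.9) -> list[dict]:
    """Keep only standards above a relevance threshold (hypothetical post-filter)."""
    return [d for d in docs if d["score"] >= min_score]

docs = [
    {"id": "std-001", "title": "Python Security Best Practices", "score": 0.96},
    {"id": "std-002", "title": "Authentication Implementation Guide", "score": 0.91},
    {"id": "std-003", "title": "JWT Token Standards", "score": 0.87},
]
print([d["id"] for d in top_standards(docs)])  # ['std-001', 'std-002']
```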
### Reasoning chain and decision

Four reasoning steps build up evidence, then a decision decorator records the review outcome with its options and confidence.
```python
@waxell.reasoning_dec(step="overall_assessment")
async def overall_assessment(linter_result, test_result, security_result, standards):
    return {
        "thought": "Authentication logic has fundamental security flaws...",
        "evidence": ["linter:2_warnings", "tests:1_failed", "security:3_vulnerabilities"],
        "conclusion": "Request changes: fix SQL injection, use bcrypt, externalize secrets",
    }


@waxell.decision(
    name="review_outcome",
    options=["approve", "request_changes", "suggest_improvements"],
)
async def decide_review_outcome(security_result, test_result) -> dict:
    return {
        "chosen": "request_changes",
        "reasoning": "Critical SQL injection and hardcoded secret make PR unsafe",
        "confidence": 0.95,
    }
```
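The contract this implies -- `chosen` must be one of the declared `options`, confidence must lie in [0, 1] -- can be sketched as a validating decorator. This is illustrative only, not the real `waxell.decision` internals:

```python
import asyncio
import functools

def decision(name, options):
    """Validate the decision contract (illustrative, not waxell.decision)."""
    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            result = await fn(*args, **kwargs)
            # chosen must be one of the declared options
            assert result["chosen"] in options, result["chosen"]
            # confidence is a probability-like score
            assert 0.0 <= result["confidence"] <= 1.0
            result["decision_name"] = name  # tag the result with the decision name
            return result
        return wrapper
    return decorator

@decision(name="review_outcome",
          options=["approve", "request_changes", "suggest_improvements"])
async def decide_review_outcome(security_result, test_result):
    return {"chosen": "request_changes", "confidence": 0.95}

out = asyncio.run(decide_review_outcome({}, {}))
print(out["decision_name"])  # review_outcome
```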
### Orchestrator with child agent lineage

The orchestrator uses `@waxell.observe` to create the parent run. Child agents (code-analyzer, code-evaluator) automatically link via `WaxellContext` lineage.
```python
@waxell.observe(agent_name="code-review-orchestrator", workflow_name="pr-review")
async def run_pipeline(pr_description: str, dry_run: bool = True, waxell_ctx=None):
    waxell.tag("demo", "code-review")
    waxell.metadata("pr_number", 42)
    preprocessed = await preprocess_pr(pr_description)                  # @step
    analysis = await run_code_analyzer(pr_description, preprocessed)    # child agent
    result = await run_code_evaluator(pr_description, analysis)         # child agent
    return result
```
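One plausible mechanism for this parent/child lineage is a `contextvars`-based parent pointer: the sketch below is an assumption about how something like `WaxellContext` might link runs, not its actual implementation.

```python
import asyncio
import contextvars
import functools

current_agent = contextvars.ContextVar("current_agent", default=None)
runs = []  # recorded (agent, parent) lineage

def observe(agent_name):
    """Illustrative lineage tracking via contextvars (not the real WaxellContext)."""
    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            parent = current_agent.get()  # whoever is running right now
            runs.append({"agent": agent_name, "parent": parent})
            token = current_agent.set(agent_name)
            try:
                return await fn(*args, **kwargs)
            finally:
                current_agent.reset(token)
        return wrapper
    return decorator

@observe("code-analyzer")
async def run_code_analyzer():
    return "analysis"

@observe("code-review-orchestrator")
async def run_pipeline():
    return await run_code_analyzer()

asyncio.run(run_pipeline())
print([(r["agent"], r["parent"]) for r in runs])
# [('code-review-orchestrator', None), ('code-analyzer', 'code-review-orchestrator')]
```

Because the parent is read from the ambient context rather than passed explicitly, child agents link to whichever run invoked them without any extra wiring.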
## What this demonstrates
- `@waxell.tool(tool_type="function")` -- four static analysis tools (diff parser, linter, test runner, security scanner), each auto-recorded with inputs, outputs, and timing.
- `@waxell.retrieval(source="standards_db")` -- coding standards lookup recorded as a retrieval span with source attribution.
- `@waxell.reasoning_dec` -- four-step reasoning chain (evaluate warnings, assess test failure, check security, overall assessment) building up structured evidence.
- `@waxell.decision` -- review outcome decision recorded with options, chosen value, reasoning, and confidence score.
- `@waxell.retry_dec(max_attempts=3)` -- LLM review generation with automatic retry tracking (simulates a timeout on the first attempt).
- `waxell.score()` -- five quality metrics (code_quality, test_coverage, security, style_compliance, overall) attached to the evaluator run.
- Nested `@waxell.observe` -- orchestrator is the parent; code-analyzer and code-evaluator are child agents with automatic lineage.
- `waxell.tag()` and `waxell.metadata()` -- per-agent tags (agent_role, language, provider) and metadata (pr_number, repo) scoped to each run.
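The retry behavior listed above (succeed on a later attempt after a simulated timeout) can be sketched as a small decorator; this is a hedged approximation of what `@waxell.retry_dec` might do, since the demo does not show its implementation:

```python
import asyncio
import functools

def retry_dec(max_attempts=3):
    """Illustrative retry decorator (not the real waxell.retry_dec)."""
    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return await fn(*args, **kwargs)
                except TimeoutError:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the error
        return wrapper
    return decorator

calls = {"n": 0}

@retry_dec(max_attempts=3)
async def generate_review():
    calls["n"] += 1
    if calls["n"] == 1:
        raise TimeoutError("simulated LLM timeout on first attempt")
    return "LLM-generated review"

print(asyncio.run(generate_review()), calls["n"])  # LLM-generated review 2
```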
## Run it
```bash
# Dry-run (no API key needed)
python -m app.demos.code_review_agent --dry-run

# Live mode with Anthropic
ANTHROPIC_API_KEY=sk-ant-... python -m app.demos.code_review_agent
```