# Code Review Agent
A multi-agent code review pipeline that coordinates a code-analyzer (static analysis tools, linter, security scanner, coding standards retrieval) and a code-evaluator (reasoning chain, decision, LLM-generated review with retry). Built with Anthropic and waxell-observe decorator patterns.
This example runs in dry-run mode by default (no API key needed). For live mode, set ANTHROPIC_API_KEY, WAXELL_API_KEY, and WAXELL_API_URL.
## Architecture
## Key Code
### Tool decorators -- static analysis tools

Each tool call is automatically recorded as a span with its inputs, outputs, and timing.
```python
@waxell.tool(tool_type="function")
def parse_diff(diff_text: str) -> dict:
    """Parse a code diff and extract file change metadata."""
    return {
        "files_changed": 1,
        "lines_added": 25,
        "lines_removed": 0,
        "files": [{"path": "src/auth/handler.py", "status": "added"}],
    }


@waxell.tool(tool_type="function")
def run_security_scan(file_path: str, scan_type: str = "full") -> dict:
    """Run security scanner on a file."""
    return {
        "vulnerabilities": [
            {"severity": "critical", "type": "sql_injection", "cwe": "CWE-89"},
            {"severity": "high", "type": "hardcoded_secret", "cwe": "CWE-798"},
            {"severity": "medium", "type": "weak_hash", "cwe": "CWE-328"},
        ],
        "risk_score": 9.2,
    }
```
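The demo does not show what the decorator does internally. As a rough sketch, a span-recording tool decorator could look like the following -- the `tool` function, its `spans` list, and the span fields here are illustrative assumptions, not the real `waxell.tool` API:

```python
import functools
import time

def tool(tool_type="function"):
    """Illustrative span-recording decorator (not the real waxell.tool)."""
    def decorator(fn):
        spans = []  # each call appends one span dict

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            spans.append({
                "name": fn.__name__,
                "tool_type": tool_type,
                "inputs": {"args": args, "kwargs": kwargs},
                "output": result,
                "duration_ms": (time.perf_counter() - start) * 1000,
            })
            return result

        wrapper.spans = spans  # exposed for inspection in this sketch
        return wrapper
    return decorator

@tool(tool_type="function")
def parse_diff(diff_text: str) -> dict:
    return {"files_changed": 1}

parse_diff("diff --git a/f b/f")
print(parse_diff.spans[0]["name"])  # parse_diff
```

The point is only that the decorated function's behavior is unchanged while a structured record of each call accumulates on the side.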
### Retrieval decorator -- coding standards lookup

The `@waxell.retrieval` decorator records the source, query, and returned documents automatically.
```python
@waxell.retrieval(source="standards_db")
def retrieve_coding_standards(query: str) -> list[dict]:
    """Retrieve relevant coding standards from the standards database."""
    return [
        {"id": "std-001", "title": "Python Security Best Practices", "score": 0.96},
        {"id": "std-002", "title": "Authentication Implementation Guide", "score": 0.91},
        {"id": "std-003", "title": "JWT Token Standards", "score": 0.87},
    ]
```
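Downstream code might post-filter the retrieved standards by relevance score before prompting the evaluator; a minimal sketch (the `top_standards` helper and the 0.9 threshold are hypothetical, not part of the demo):

```python
def top_standards(docs: list[dict], min_score: float = 0.9) -> list[dict]:
    """Keep only standards above a relevance threshold (hypothetical post-filter)."""
    return [d for d in docs if d["score"] >= min_score]

docs = [
    {"id": "std-001", "title": "Python Security Best Practices", "score": 0.96},
    {"id": "std-002", "title": "Authentication Implementation Guide", "score": 0.91},
    {"id": "std-003", "title": "JWT Token Standards", "score": 0.87},
]
print([d["id"] for d in top_standards(docs)])  # ['std-001', 'std-002']
```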
### Reasoning chain and decision

Four reasoning steps build up evidence, then a decision decorator records the review outcome with its options and confidence.
```python
@waxell.reasoning_dec(step="overall_assessment")
async def overall_assessment(linter_result, test_result, security_result, standards):
    return {
        "thought": "Authentication logic has fundamental security flaws...",
        "evidence": ["linter:2_warnings", "tests:1_failed", "security:3_vulnerabilities"],
        "conclusion": "Request changes: fix SQL injection, use bcrypt, externalize secrets",
    }


@waxell.decision(
    name="review_outcome",
    options=["approve", "request_changes", "suggest_improvements"],
)
async def decide_review_outcome(security_result, test_result) -> dict:
    return {
        "chosen": "request_changes",
        "reasoning": "Critical SQL injection and hardcoded secret make PR unsafe",
        "confidence": 0.95,
    }
```
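The contract this implies -- `chosen` must be one of the declared `options`, confidence must lie in [0, 1] -- can be sketched as a validating decorator. This is illustrative only, not the real `waxell.decision` internals:

```python
import asyncio
import functools

def decision(name, options):
    """Validate the decision contract (illustrative, not waxell.decision)."""
    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            result = await fn(*args, **kwargs)
            # chosen must be one of the declared options
            assert result["chosen"] in options, result["chosen"]
            # confidence is a probability-like score
            assert 0.0 <= result["confidence"] <= 1.0
            result["decision_name"] = name  # tag the result with the decision name
            return result
        return wrapper
    return decorator

@decision(name="review_outcome",
          options=["approve", "request_changes", "suggest_improvements"])
async def decide_review_outcome(security_result, test_result):
    return {"chosen": "request_changes", "confidence": 0.95}

out = asyncio.run(decide_review_outcome({}, {}))
print(out["decision_name"])  # review_outcome
```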
### Orchestrator with child agent lineage

The orchestrator uses `@waxell.observe` to create the parent run. Child agents (code-analyzer, code-evaluator) automatically link via `WaxellContext` lineage.
```python
@waxell.observe(agent_name="code-review-orchestrator", workflow_name="pr-review")
async def run_pipeline(pr_description: str, dry_run: bool = True, waxell_ctx=None):
    waxell.tag("demo", "code-review")
    waxell.metadata("pr_number", 42)
    preprocessed = await preprocess_pr(pr_description)                  # @step
    analysis = await run_code_analyzer(pr_description, preprocessed)    # child agent
    result = await run_code_evaluator(pr_description, analysis)         # child agent
    return result
```
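One plausible mechanism for this parent/child lineage is a `contextvars`-based parent pointer: the sketch below is an assumption about how something like `WaxellContext` might link runs, not its actual implementation.

```python
import asyncio
import contextvars
import functools

current_agent = contextvars.ContextVar("current_agent", default=None)
runs = []  # recorded (agent, parent) lineage

def observe(agent_name):
    """Illustrative lineage tracking via contextvars (not the real WaxellContext)."""
    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            parent = current_agent.get()  # whoever is running right now
            runs.append({"agent": agent_name, "parent": parent})
            token = current_agent.set(agent_name)
            try:
                return await fn(*args, **kwargs)
            finally:
                current_agent.reset(token)
        return wrapper
    return decorator

@observe("code-analyzer")
async def run_code_analyzer():
    return "analysis"

@observe("code-review-orchestrator")
async def run_pipeline():
    return await run_code_analyzer()

asyncio.run(run_pipeline())
print([(r["agent"], r["parent"]) for r in runs])
# [('code-review-orchestrator', None), ('code-analyzer', 'code-review-orchestrator')]
```

Because the parent is read from the ambient context rather than passed explicitly, child agents link to whichever run invoked them without any extra wiring.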
## What this demonstrates
- `@waxell.tool(tool_type="function")` -- four static analysis tools (diff parser, linter, test runner, security scanner), each auto-recorded with inputs, outputs, and timing.
- `@waxell.retrieval(source="standards_db")` -- coding standards lookup recorded as a retrieval span with source attribution.
- `@waxell.reasoning_dec` -- four-step reasoning chain (evaluate warnings, assess test failure, check security, overall assessment) building up structured evidence.
- `@waxell.decision` -- review outcome decision recorded with options, chosen value, reasoning, and confidence score.
- `@waxell.retry_dec(max_attempts=3)` -- LLM review generation with automatic retry tracking (simulates a timeout on the first attempt).
- `waxell.score()` -- five quality metrics (code_quality, test_coverage, security, style_compliance, overall) attached to the evaluator run.
- Nested `@waxell.observe` -- orchestrator is the parent; code-analyzer and code-evaluator are child agents with automatic lineage.
- `waxell.tag()` and `waxell.metadata()` -- per-agent tags (agent_role, language, provider) and metadata (pr_number, repo) scoped to each run.
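The retry behavior listed above (succeed on a later attempt after a simulated timeout) can be sketched as a small decorator; this is a hedged approximation of what `@waxell.retry_dec` might do, since the demo does not show its implementation:

```python
import asyncio
import functools

def retry_dec(max_attempts=3):
    """Illustrative retry decorator (not the real waxell.retry_dec)."""
    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return await fn(*args, **kwargs)
                except TimeoutError:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the error
        return wrapper
    return decorator

calls = {"n": 0}

@retry_dec(max_attempts=3)
async def generate_review():
    calls["n"] += 1
    if calls["n"] == 1:
        raise TimeoutError("simulated LLM timeout on first attempt")
    return "LLM-generated review"

print(asyncio.run(generate_review()), calls["n"])  # LLM-generated review 2
```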
## Run it
```bash
# Dry-run (no API key needed)
python -m app.demos.code_review_agent --dry-run

# Live mode with Anthropic
ANTHROPIC_API_KEY=sk-ant-... python -m app.demos.code_review_agent
```