Claude Agents
A Claude-based security-analysis agent system with tool use and multi-turn reasoning across three agents: the orchestrator analyzes the task and selects tools via Anthropic, the runner executes the code_scanner and dependency_checker tools and generates fix suggestions, and the evaluator assesses fix quality and coverage.
Environment variables
This example requires ANTHROPIC_API_KEY, WAXELL_API_KEY, and WAXELL_API_URL. Use --dry-run to run without any API keys.
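A minimal sketch of how an entry point might gate live calls on these variables. The variable names come from this README; the argparse/os gating logic and the resolve_mode helper are hypothetical, not the demo's actual code:

```python
import argparse
import os

def resolve_mode(argv=None) -> str:
    """Return "dry-run" or "live". Hypothetical helper, not part of the demo."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--dry-run", action="store_true")
    args = parser.parse_args(argv)
    if args.dry_run:
        return "dry-run"  # mock results, no API keys required
    # Live mode needs all three variables listed in this README.
    required = ("ANTHROPIC_API_KEY", "WAXELL_API_KEY", "WAXELL_API_URL")
    missing = [k for k in required if not os.environ.get(k)]
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
    return "live"
```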
Architecture
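As a rough sketch of the orchestrator-to-runner-to-evaluator handoff, every function below is an illustrative stub, not the demo's real agent code (the real demo delegates each stage to a Claude-backed agent):

```python
# Illustrative stubs only; none of these names come from the demo itself.
def orchestrate(query: str) -> list[str]:
    """Orchestrator stub: pick tools by keyword (the demo asks Claude instead)."""
    tools = [t for t, kw in [("code_scanner", "code"), ("dependency_checker", "dependenc")]
             if kw in query.lower()]
    return tools or ["code_scanner"]

def run_tools(tools: list[str]) -> dict:
    """Runner stub: execute the selected tools and draft fix suggestions."""
    findings = {tool: {"issues": []} for tool in tools}  # mocked empty findings
    return {"findings": findings, "fixes": "No issues found; nothing to fix."}

def evaluate(result: dict) -> dict:
    """Evaluator stub: score how well the fixes cover the findings."""
    total = sum(len(f["issues"]) for f in result["findings"].values())
    return {"coverage": 1.0 if total == 0 else 0.0}

report = evaluate(run_tools(orchestrate("Scan this code for vulnerabilities")))
```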
Key Code
Tool selection via Anthropic and security scanning tools
The orchestrator uses Anthropic to decide which tools to run, then the runner executes them.
import json

@waxell.decision(name="select_tools", options=["code_scanner", "dependency_checker", "both"])
async def select_tools(query: str, anthropic_client) -> dict:
    response = await anthropic_client.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=512,
        messages=[{"role": "user", "content": f"Decide which tools to use. Request: {query}"}],
    )
    return json.loads(response.content[0].text)
@waxell.tool(tool_type="function")
def code_scanner(path: str, scan_type: str = "security") -> dict:
    """Scan source code for security vulnerabilities."""
    return MOCK_SCAN_RESULTS

@waxell.tool(tool_type="function")
def dependency_checker(manifest_path: str = "requirements.txt") -> dict:
    """Check project dependencies for known CVEs."""
    return MOCK_DEPENDENCY_RESULTS
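The runner's dispatch of the orchestrator's decision to these tools might look roughly like this; the run_selected helper and the mock payloads are assumptions for illustration, not the demo's actual data:

```python
# Stand-ins for the decorated demo tools above; payloads are invented examples.
MOCK_SCAN_RESULTS = {"issues": [
    {"severity": "critical", "type": "sql_injection", "file": "app/db.py", "line": 42},
]}
MOCK_DEPENDENCY_RESULTS = {"issues": []}

def code_scanner(path: str, scan_type: str = "security") -> dict:
    return MOCK_SCAN_RESULTS

def dependency_checker(manifest_path: str = "requirements.txt") -> dict:
    return MOCK_DEPENDENCY_RESULTS

def run_selected(selection: str) -> dict:
    """Hypothetical dispatch over the decision options: code_scanner, dependency_checker, both."""
    results = {}
    if selection in ("code_scanner", "both"):
        results["scan"] = code_scanner(path="src/")
    if selection in ("dependency_checker", "both"):
        results["deps"] = dependency_checker()
    return results
```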
Evaluator with @reasoning_dec fix assessment
The evaluator assesses whether all critical issues are addressed in the suggested fixes.
@waxell.reasoning_dec(step="fix_assessment")
async def assess_fixes(fixes_text: str, scan_results: dict) -> dict:
    issues = scan_results.get("issues", [])
    critical_count = sum(1 for i in issues if i["severity"] == "critical")
    high_count = sum(1 for i in issues if i["severity"] == "high")
    return {
        "thought": f"Evaluating fixes for {len(issues)} issues "
                   f"({critical_count} critical, {high_count} high).",
        "evidence": [f"{i['type']} in {i['file']}:{i['line']}" for i in issues],
        "conclusion": "All critical issues addressed" if critical_count == 0
                      else "Critical issues need review",
    }

waxell.score("fix_quality", 0.88, comment="coverage of critical issues")
waxell.score("coverage", 0.92, data_type="float", comment="issues with suggested fixes")
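Driving that assessment against a mock scan result might look like this; the payload and the critical-count threshold are illustrative assumptions in this simplified, standalone version of the logic:

```python
import asyncio

async def assess_fixes(fixes_text: str, scan_results: dict) -> dict:
    """Simplified standalone version of the evaluator's assessment logic."""
    issues = scan_results.get("issues", [])
    critical = sum(1 for i in issues if i["severity"] == "critical")
    return {
        "evidence": [f"{i['type']} in {i['file']}:{i['line']}" for i in issues],
        "conclusion": "All critical issues addressed" if critical == 0
                      else "Critical issues need review",
    }

# Invented example payload: one high-severity finding, zero critical.
mock_scan = {"issues": [
    {"severity": "high", "type": "xss", "file": "app/views.py", "line": 10},
]}
result = asyncio.run(assess_fixes("Escape user input before rendering.", mock_scan))
# Zero critical findings -> conclusion is "All critical issues addressed"
```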
What this demonstrates
- @waxell.observe -- parent-child agent hierarchy with automatic lineage
- @waxell.step_dec -- task analysis recorded as an execution step
- @waxell.tool -- code scanner and dependency checker tools
- @waxell.decision -- tool selection via Anthropic Claude
- @waxell.reasoning_dec -- fix quality assessment chain-of-thought
- waxell.score() -- fix quality and coverage scores
- waxell.tag() / waxell.metadata() -- provider (anthropic), tools-executed metadata
- Auto-instrumented LLM calls -- two Anthropic claude-sonnet calls captured automatically
- Claude Agents pattern -- security analysis with code scanning and fix generation
Run it
# Dry-run (no API keys needed)
cd dev/waxell-dev
python -m app.demos.claude_agents_agent --dry-run
# Live (real Anthropic)
export ANTHROPIC_API_KEY="sk-ant-..."
python -m app.demos.claude_agents_agent
# Custom query
python -m app.demos.claude_agents_agent --query "Review authentication flow for vulnerabilities"