Claude Agents
A Claude-based security-analysis agent system with tool use and multi-turn reasoning across three agents: the orchestrator analyzes the task and selects tools via Anthropic, the runner executes the code_scanner and dependency_checker tools and generates fix suggestions, and the evaluator assesses fix quality and coverage.
Environment variables
This example requires ANTHROPIC_API_KEY, WAXELL_API_KEY, and WAXELL_API_URL. Use --dry-run to run without any API keys.
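A minimal sketch of how an entry point might gate live calls on these variables. The variable names come from this README; the argparse/os gating logic and the resolve_mode helper are hypothetical, not the demo's actual code:

```python
import argparse
import os

def resolve_mode(argv=None) -> str:
    """Return "dry-run" or "live". Hypothetical helper, not part of the demo."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--dry-run", action="store_true")
    args = parser.parse_args(argv)
    if args.dry_run:
        return "dry-run"  # mock results, no API keys required
    # Live mode needs all three variables listed in this README.
    required = ("ANTHROPIC_API_KEY", "WAXELL_API_KEY", "WAXELL_API_URL")
    missing = [k for k in required if not os.environ.get(k)]
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
    return "live"
```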
Architecture
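As a rough sketch of the orchestrator-to-runner-to-evaluator handoff, every function below is an illustrative stub, not the demo's real agent code (the real demo delegates each stage to a Claude-backed agent):

```python
# Illustrative stubs only; none of these names come from the demo itself.
def orchestrate(query: str) -> list[str]:
    """Orchestrator stub: pick tools by keyword (the demo asks Claude instead)."""
    tools = [t for t, kw in [("code_scanner", "code"), ("dependency_checker", "dependenc")]
             if kw in query.lower()]
    return tools or ["code_scanner"]

def run_tools(tools: list[str]) -> dict:
    """Runner stub: execute the selected tools and draft fix suggestions."""
    findings = {tool: {"issues": []} for tool in tools}  # mocked empty findings
    return {"findings": findings, "fixes": "No issues found; nothing to fix."}

def evaluate(result: dict) -> dict:
    """Evaluator stub: score how well the fixes cover the findings."""
    total = sum(len(f["issues"]) for f in result["findings"].values())
    return {"coverage": 1.0 if total == 0 else 0.0}

report = evaluate(run_tools(orchestrate("Scan this code for vulnerabilities")))
```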
Key Code
Tool selection via Anthropic and security scanning tools
The orchestrator uses Anthropic to decide which tools to run, then the runner executes them.
import json

@waxell.decision(name="select_tools", options=["code_scanner", "dependency_checker", "both"])
async def select_tools(query: str, anthropic_client) -> dict:
    response = await anthropic_client.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=512,
        messages=[{"role": "user", "content": f"Decide which tools to use. Request: {query}"}],
    )
    return json.loads(response.content[0].text)
@waxell.tool(tool_type="function")
def code_scanner(path: str, scan_type: str = "security") -> dict:
    """Scan source code for security vulnerabilities."""
    return MOCK_SCAN_RESULTS

@waxell.tool(tool_type="function")
def dependency_checker(manifest_path: str = "requirements.txt") -> dict:
    """Check project dependencies for known CVEs."""
    return MOCK_DEPENDENCY_RESULTS
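The runner's dispatch of the orchestrator's decision to these tools might look roughly like this; the run_selected helper and the mock payloads are assumptions for illustration, not the demo's actual data:

```python
# Stand-ins for the decorated demo tools above; payloads are invented examples.
MOCK_SCAN_RESULTS = {"issues": [
    {"severity": "critical", "type": "sql_injection", "file": "app/db.py", "line": 42},
]}
MOCK_DEPENDENCY_RESULTS = {"issues": []}

def code_scanner(path: str, scan_type: str = "security") -> dict:
    return MOCK_SCAN_RESULTS

def dependency_checker(manifest_path: str = "requirements.txt") -> dict:
    return MOCK_DEPENDENCY_RESULTS

def run_selected(selection: str) -> dict:
    """Hypothetical dispatch over the decision options: code_scanner, dependency_checker, both."""
    results = {}
    if selection in ("code_scanner", "both"):
        results["scan"] = code_scanner(path="src/")
    if selection in ("dependency_checker", "both"):
        results["deps"] = dependency_checker()
    return results
```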
Evaluator with @reasoning_dec fix assessment
The evaluator assesses whether all critical issues are addressed in the suggested fixes.
@waxell.reasoning_dec(step="fix_assessment")
async def assess_fixes(fixes_text: str, scan_results: dict) -> dict:
    issues = scan_results.get("issues", [])
    critical_count = sum(1 for i in issues if i["severity"] == "critical")
    high_count = sum(1 for i in issues if i["severity"] == "high")
    return {
        "thought": f"Evaluating fixes for {len(issues)} issues "
                   f"({critical_count} critical, {high_count} high).",
        "evidence": [f"{i['type']} in {i['file']}:{i['line']}" for i in issues],
        "conclusion": "All critical issues addressed" if critical_count == 0
                      else "Critical issues need review",
    }

waxell.score("fix_quality", 0.88, comment="coverage of critical issues")
waxell.score("coverage", 0.92, data_type="float", comment="issues with suggested fixes")
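Driving that assessment against a mock scan result might look like this; the payload and the critical-count threshold are illustrative assumptions in this simplified, standalone version of the logic:

```python
import asyncio

async def assess_fixes(fixes_text: str, scan_results: dict) -> dict:
    """Simplified standalone version of the evaluator's assessment logic."""
    issues = scan_results.get("issues", [])
    critical = sum(1 for i in issues if i["severity"] == "critical")
    return {
        "evidence": [f"{i['type']} in {i['file']}:{i['line']}" for i in issues],
        "conclusion": "All critical issues addressed" if critical == 0
                      else "Critical issues need review",
    }

# Invented example payload: one high-severity finding, zero critical.
mock_scan = {"issues": [
    {"severity": "high", "type": "xss", "file": "app/views.py", "line": 10},
]}
result = asyncio.run(assess_fixes("Escape user input before rendering.", mock_scan))
# Zero critical findings -> conclusion is "All critical issues addressed"
```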
What this demonstrates
- @waxell.observe -- parent-child agent hierarchy with automatic lineage
- @waxell.step_dec -- task analysis recorded as an execution step
- @waxell.tool -- code scanner and dependency checker tools
- @waxell.decision -- tool selection via Anthropic Claude
- @waxell.reasoning_dec -- fix quality assessment chain-of-thought
- waxell.score() -- fix quality and coverage scores
- waxell.tag() / waxell.metadata() -- provider (anthropic), tools-executed metadata
- Auto-instrumented LLM calls -- two Anthropic claude-sonnet calls captured automatically
- Claude Agents pattern -- security analysis with code scanning and fix generation
Run it
# Dry-run (no API keys needed)
cd dev/waxell-dev
python -m app.demos.claude_agents_agent --dry-run
# Live (real Anthropic)
export ANTHROPIC_API_KEY="sk-ant-..."
python -m app.demos.claude_agents_agent
# Custom query
python -m app.demos.claude_agents_agent --query "Review authentication flow for vulnerabilities"