Agno

An Agno-style research agent system with multi-step reasoning and tool use across three agents. The orchestrator preprocesses queries and selects research strategies, the runner executes the web_search and summarizer tools, and the evaluator synthesizes findings and scores research coverage.

Environment variables

This example requires OPENAI_API_KEY, WAXELL_API_KEY, and WAXELL_API_URL. Use --dry-run to run without any API keys.
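For a live run, the three variables can be exported before launching the demo. The values below are placeholders, not real key formats:

```shell
# Placeholder values -- substitute your real credentials and endpoint.
export OPENAI_API_KEY="sk-..."
export WAXELL_API_KEY="..."
export WAXELL_API_URL="..."
```

With `--dry-run`, none of these need to be set.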

Architecture

Key Code

Runner with @tool and LLM-driven tool selection

The runner reasons about which tools to use, then executes web_search and summarizer in sequence.

@waxell.tool(tool_type="function")
def web_search(query: str, max_results: int = 5) -> dict:
    """Simulate a web search tool (Agno-style)."""
    return {"results": MOCK_SEARCH_RESULTS[:max_results],
            "result_count": min(max_results, len(MOCK_SEARCH_RESULTS))}

@waxell.tool(tool_type="function")
def summarizer(texts: list, max_length: int = 200) -> dict:
    """Summarize a list of texts into a concise overview."""
    combined = " | ".join(texts[:3])
    return {"summary": combined[:max_length], "input_count": len(texts)}

# Agent reasoning -- LLM decides tool use (auto-instrumented)
resp = await openai_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "system", "content": "Decide which tool to use."},
              {"role": "user", "content": query}],
)

Evaluator with @reasoning and quality scores

The evaluator synthesizes a final response and assesses research quality.

@waxell.reasoning_dec(step="research_quality_assessment")
async def assess_research_quality(summary: str, search_results: list, query: str) -> dict:
    result_count = len(search_results)
    summary_covers_query = any(
        keyword in summary.lower() for keyword in query.lower().split() if len(keyword) > 3
    )
    return {
        "thought": f"Research produced {result_count} results synthesized into a summary.",
        "evidence": [f"Search results: {result_count}", f"Summary length: {len(summary)}"],
        "conclusion": "Research adequately covers the topic"
        if result_count >= 2 and summary_covers_query
        else "Research may need additional sources",
    }

waxell.score("research_coverage", 0.85, comment="query term coverage")
waxell.score("source_diversity", len(search_results) >= 3, data_type="boolean")
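The assessment heuristic is simple: at least two results, and at least one query keyword longer than three characters appearing in the summary. A runnable sketch with the decorator and `async` stripped, using hypothetical inputs:

```python
def assess_research_quality(summary: str, search_results: list, query: str) -> dict:
    """Heuristic quality check: enough sources and keyword overlap with the query."""
    result_count = len(search_results)
    summary_covers_query = any(
        keyword in summary.lower() for keyword in query.lower().split() if len(keyword) > 3
    )
    adequate = result_count >= 2 and summary_covers_query
    return {
        "thought": f"Research produced {result_count} results synthesized into a summary.",
        "evidence": [f"Search results: {result_count}", f"Summary length: {len(summary)}"],
        "conclusion": "Research adequately covers the topic"
        if adequate else "Research may need additional sources",
    }

# Hypothetical inputs: "governance" (length > 3) appears in the summary,
# and three results clear the >= 2 threshold.
verdict = assess_research_quality(
    summary="Governance frameworks for AI are maturing.",
    search_results=[{"title": "a"}, {"title": "b"}, {"title": "c"}],
    query="AI governance trends",
)
```

Note that short query tokens like "AI" are ignored by the `len(keyword) > 3` filter, so coverage hinges on the longer terms.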

What this demonstrates

  • @waxell.observe -- parent-child agent hierarchy with automatic lineage
  • @waxell.step_dec -- research query preprocessing recorded as step
  • @waxell.tool -- web search and summarizer tools with tool_type="function"
  • @waxell.decision -- research strategy selection via OpenAI
  • waxell.decide() -- manual tool execution order decision
  • @waxell.reasoning_dec -- research quality assessment chain-of-thought
  • waxell.score() -- research coverage and source diversity scores
  • Auto-instrumented LLM calls -- reasoning, synthesis, and evaluation calls captured
  • Agno pattern -- multi-step tool reasoning with LLM-driven tool selection

Run it

# Dry-run (no API keys needed)
cd dev/waxell-dev
python -m app.demos.agno_agent --dry-run

# Live (real OpenAI)
export OPENAI_API_KEY="sk-..."
python -m app.demos.agno_agent

# Custom query
python -m app.demos.agno_agent --query "Research developments in AI governance"

Source

dev/waxell-dev/app/demos/agno_agent.py