Agno

An Agno-style research agent system with multi-step reasoning and tool use across three agents. The orchestrator preprocesses queries and selects research strategies, the runner executes the web_search and summarizer tools, and the evaluator synthesizes findings and scores research coverage.

Environment variables

This example requires OPENAI_API_KEY, WAXELL_API_KEY, and WAXELL_API_URL. Use --dry-run to run without any API keys.
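For a live run, the three variables can be exported before launching the demo. The values below are placeholders, not real key formats:

```shell
# Placeholder values -- substitute your real credentials and endpoint.
export OPENAI_API_KEY="sk-..."
export WAXELL_API_KEY="..."
export WAXELL_API_URL="..."
```

With `--dry-run`, none of these need to be set.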

Architecture

Key Code

Runner with @tool and LLM-driven tool selection

The runner reasons about which tools to use, then executes web_search and summarizer in sequence.

@waxell.tool(tool_type="function")
def web_search(query: str, max_results: int = 5) -> dict:
    """Simulate a web search tool (Agno-style)."""
    return {"results": MOCK_SEARCH_RESULTS[:max_results],
            "result_count": min(max_results, len(MOCK_SEARCH_RESULTS))}

@waxell.tool(tool_type="function")
def summarizer(texts: list, max_length: int = 200) -> dict:
    """Summarize a list of texts into a concise overview."""
    combined = " | ".join(texts[:3])
    return {"summary": combined[:max_length], "input_count": len(texts)}

# Agent reasoning -- LLM decides tool use (auto-instrumented)
resp = await openai_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "system", "content": "Decide which tool to use."},
              {"role": "user", "content": query}],
)

Evaluator with @reasoning and quality scores

The evaluator synthesizes a final response and assesses research quality.

@waxell.reasoning_dec(step="research_quality_assessment")
async def assess_research_quality(summary: str, search_results: list, query: str) -> dict:
    result_count = len(search_results)
    summary_covers_query = any(
        keyword in summary.lower() for keyword in query.lower().split() if len(keyword) > 3
    )
    return {
        "thought": f"Research produced {result_count} results synthesized into a summary.",
        "evidence": [f"Search results: {result_count}", f"Summary length: {len(summary)}"],
        "conclusion": "Research adequately covers the topic"
        if result_count >= 2 and summary_covers_query
        else "Research may need additional sources",
    }

waxell.score("research_coverage", 0.85, comment="query term coverage")
waxell.score("source_diversity", len(search_results) >= 3, data_type="boolean")
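The assessment heuristic is simple: at least two results, and at least one query keyword longer than three characters appearing in the summary. A runnable sketch with the decorator and `async` stripped, using hypothetical inputs:

```python
def assess_research_quality(summary: str, search_results: list, query: str) -> dict:
    """Heuristic quality check: enough sources and keyword overlap with the query."""
    result_count = len(search_results)
    summary_covers_query = any(
        keyword in summary.lower() for keyword in query.lower().split() if len(keyword) > 3
    )
    adequate = result_count >= 2 and summary_covers_query
    return {
        "thought": f"Research produced {result_count} results synthesized into a summary.",
        "evidence": [f"Search results: {result_count}", f"Summary length: {len(summary)}"],
        "conclusion": "Research adequately covers the topic"
        if adequate else "Research may need additional sources",
    }

# Hypothetical inputs: "governance" (length > 3) appears in the summary,
# and three results clear the >= 2 threshold.
verdict = assess_research_quality(
    summary="Governance frameworks for AI are maturing.",
    search_results=[{"title": "a"}, {"title": "b"}, {"title": "c"}],
    query="AI governance trends",
)
```

Note that short query tokens like "AI" are ignored by the `len(keyword) > 3` filter, so coverage hinges on the longer terms.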

What this demonstrates

  • @waxell.observe -- parent-child agent hierarchy with automatic lineage
  • @waxell.step_dec -- research query preprocessing recorded as step
  • @waxell.tool -- web search and summarizer tools with tool_type="function"
  • @waxell.decision -- research strategy selection via OpenAI
  • waxell.decide() -- manual tool execution order decision
  • @waxell.reasoning_dec -- research quality assessment chain-of-thought
  • waxell.score() -- research coverage and source diversity scores
  • Auto-instrumented LLM calls -- reasoning, synthesis, and evaluation calls captured
  • Agno pattern -- multi-step tool reasoning with LLM-driven tool selection

Run it

# Dry-run (no API keys needed)
cd dev/waxell-dev
python -m app.demos.agno_agent --dry-run

# Live (real OpenAI)
export OPENAI_API_KEY="sk-..."
python -m app.demos.agno_agent

# Custom query
python -m app.demos.agno_agent --query "Research developments in AI governance"

Source

dev/waxell-dev/app/demos/agno_agent.py