
PydanticAI

A PydanticAI-style agent pipeline spanning three agents, with typed tools and structured output validation. The runner executes a typed tool (search_docs) and synthesizes answers, while the evaluator validates the output as a ResearchResult Pydantic model and scores confidence.

Environment variables

This example requires OPENAI_API_KEY, WAXELL_API_KEY, and WAXELL_API_URL. Use --dry-run to run without any API keys.

Architecture

Key Code

Runner with @tool, @retrieval, and auto-instrumented LLM synthesis

The runner executes typed tools and synthesizes answers from retrieved documents.

@waxell.tool(tool_type="function")
def search_docs(query: str, top_k: int = 3) -> dict:
    """PydanticAI typed tool: search the document corpus."""
    docs = retrieve_documents(query, top_k=top_k)
    return {"docs_found": len(docs), "titles": [d["title"] for d in docs], "documents": docs}

@waxell.retrieval(source="pydanticai-corpus")
def retrieve_and_rank(query: str, documents: list) -> list[dict]:
    """Rank retrieved documents by keyword relevance."""
    query_words = {w.lower() for w in query.split() if len(w) > 2}
    ranked = []
    for doc in documents:
        searchable = f"{doc.get('title', '')} {doc.get('content', '')}".lower()
        score = sum(1 for w in query_words if w in searchable)
        ranked.append({**doc, "relevance_score": round(score / max(len(query_words), 1), 4)})
    ranked.sort(key=lambda d: d["relevance_score"], reverse=True)
    return ranked
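As a worked example of the scoring above, here is the same keyword-overlap logic run standalone, with the @waxell.retrieval decorator omitted so it executes without the waxell package (the documents are illustrative):

```python
# Standalone sketch of retrieve_and_rank's keyword-overlap scoring.
def rank(query: str, documents: list[dict]) -> list[dict]:
    # Query terms longer than 2 characters, lowercased.
    query_words = {w.lower() for w in query.split() if len(w) > 2}
    ranked = []
    for doc in documents:
        searchable = f"{doc.get('title', '')} {doc.get('content', '')}".lower()
        # Fraction of query terms that appear in title + content.
        score = sum(1 for w in query_words if w in searchable)
        ranked.append({**doc, "relevance_score": round(score / max(len(query_words), 1), 4)})
    ranked.sort(key=lambda d: d["relevance_score"], reverse=True)
    return ranked

docs = [
    {"title": "Typed tools", "content": "PydanticAI tools are typed Python functions."},
    {"title": "Deployment", "content": "Ship the agent behind an API."},
]
top = rank("typed tools in PydanticAI", docs)
# query_words = {"typed", "tools", "pydanticai"} ("in" is too short); the
# first doc matches all three (3/3 = 1.0), the second matches none (0.0).
print(top[0]["title"], top[0]["relevance_score"])  # Typed tools 1.0
```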

Evaluator with structured output validation and @waxell.reasoning_dec

The evaluator validates the output as a PydanticAI-style typed result and assesses quality.

@waxell.reasoning_dec(step="output_quality_assessment")
async def assess_output_quality(answer: str, documents: list) -> dict:
    doc_titles = [d.get("title", "") for d in documents]
    coverage = len([t for t in doc_titles if t.lower() in answer.lower()])
    return {
        "thought": f"Answer references {coverage}/{len(documents)} source documents.",
        "evidence": [f"Source: {t}" for t in doc_titles],
        "conclusion": "Answer adequately covers source material" if coverage > 0
                      else "Answer may need more grounding",
    }

# Validate structured output (PydanticAI ResearchResult)
output = {
    "answer": answer, "sources": [...], "confidence": 0.92,
    "valid": True, "output_type": "ResearchResult",
}

waxell.score("confidence", 0.92, comment="answer length and doc coverage")
waxell.score("output_quality", 0.88, comment="structured output validation")
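The structured output above maps naturally onto a Pydantic model. A minimal sketch, with field names assumed from the dict above; the demo's actual ResearchResult definition lives in the source file and may differ:

```python
# Hypothetical ResearchResult model mirroring the output dict's fields.
from pydantic import BaseModel, Field, ValidationError

class ResearchResult(BaseModel):
    answer: str
    sources: list[str]
    confidence: float = Field(ge=0.0, le=1.0)  # reject out-of-range scores
    valid: bool = True
    output_type: str = "ResearchResult"

result = ResearchResult(
    answer="Typed tools return validated structures.",
    sources=["Typed tools"],
    confidence=0.92,
)
print(result.output_type)  # ResearchResult

# Validation fails loudly instead of silently passing bad data downstream.
try:
    ResearchResult(answer="x", sources=[], confidence=1.5)
except ValidationError:
    print("out-of-range confidence rejected")
```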

What this demonstrates

  • @waxell.observe -- parent-child agent hierarchy with automatic lineage
  • @waxell.step_dec -- query preprocessing recorded as execution step
  • @waxell.tool -- typed tool invocation (search_docs) with tool_type="function"
  • @waxell.retrieval -- document retrieval and ranking with source="pydanticai-corpus"
  • @waxell.decision -- agent reasoning for tool selection via OpenAI
  • @waxell.reasoning_dec -- output quality assessment
  • waxell.score() -- confidence and output quality scores
  • Auto-instrumented LLM calls -- OpenAI synthesis call captured automatically
  • PydanticAI pattern -- typed tools, structured output validation, confidence scoring
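Putting the pieces together, the dry-run flow can be sketched end to end with the waxell decorators stubbed out as a no-op. Everything here is illustrative (function bodies, corpus contents, and the confidence heuristic are assumptions, not the demo's actual code):

```python
# Hypothetical dry-run sketch: tool call -> synthesis -> structured output,
# runnable with no API keys and no waxell package installed.
def observe(fn):  # stand-in for @waxell.observe / @waxell.tool
    return fn

CORPUS = [
    {"title": "Typed tools", "content": "PydanticAI tools are plain typed functions."},
    {"title": "Structured output", "content": "Results validate against a Pydantic model."},
]

@observe
def search_docs(query: str, top_k: int = 3) -> dict:
    words = query.lower().split()
    docs = [d for d in CORPUS if any(w in d["content"].lower() for w in words)][:top_k]
    return {"docs_found": len(docs), "titles": [d["title"] for d in docs], "documents": docs}

@observe
def run_pipeline(query: str) -> dict:
    found = search_docs(query)
    answer = f"Based on {found['docs_found']} documents: " + "; ".join(found["titles"])
    # Toy confidence heuristic: more supporting documents, higher confidence.
    confidence = min(0.5 + 0.2 * found["docs_found"], 1.0)
    return {"answer": answer, "sources": found["titles"], "confidence": confidence,
            "valid": True, "output_type": "ResearchResult"}

print(run_pipeline("typed tools")["output_type"])  # ResearchResult
```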

Run it

# Dry-run (no API keys needed)
cd dev/waxell-dev
python -m app.demos.pydanticai_agent --dry-run

# Live (real OpenAI)
export OPENAI_API_KEY="sk-..."
python -m app.demos.pydanticai_agent

# Custom query
python -m app.demos.pydanticai_agent --query "Explain multi-agent architectures"

Source

dev/waxell-dev/app/demos/pydanticai_agent.py