LiteLLM

A multi-provider comparison pipeline using LiteLLM's unified API to call OpenAI (gpt-4o-mini), Anthropic (claude-sonnet-4-5), and Groq (llama-3.3-70b-versatile) within a single trace. The orchestrator creates one child agent per provider, compares token usage and response quality in a @reasoning step, and tracks total cost across all providers. Demonstrates LiteLLM's model routing via the provider/model naming convention.
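The routing convention embeds the provider in the model string itself, so one `completion` call path serves every backend. A minimal sketch of the model list this demo iterates over (the `tier` labels are illustrative, not taken from the demo source):

```python
# LiteLLM routes on "<provider>/<model>" strings; splitting on the first "/"
# recovers the provider prefix. Tiers here are illustrative assumptions.
models_to_run = [
    {"provider": "OpenAI", "model": "openai/gpt-4o-mini", "tier": "fast"},
    {"provider": "Anthropic", "model": "anthropic/claude-sonnet-4-5", "tier": "premium"},
    {"provider": "Groq", "model": "groq/llama-3.3-70b-versatile", "tier": "fast"},
]

prefixes = [m["model"].split("/", 1)[0] for m in models_to_run]
```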

Environment variables

This example requires at least one of OPENAI_API_KEY, ANTHROPIC_API_KEY, or GROQ_API_KEY, plus WAXELL_API_KEY and WAXELL_API_URL. Use --dry-run to run without any API keys.

Architecture

Key Code

LiteLLM unified API with @tool(llm)

The call_llm function wraps LiteLLM's acompletion with the @tool(llm) decorator for automatic span creation and token tracking.

@waxell_observe.tool(tool_type="llm")
async def call_llm(model: str, messages: list, dry_run: bool):
    """Call LiteLLM or return a mock response."""
    if dry_run:
        content = _contextual_response(messages)
        return MockChatCompletion(content=content, model=model.split("/")[-1])

    import litellm
    return await litellm.acompletion(model=model, messages=messages)
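For `--dry-run` to work, `MockChatCompletion` must mirror the parts of the OpenAI response shape that callers touch (`choices[0].message.content` and `usage`). A plausible minimal stand-in, assuming only those two access paths are needed (this is a sketch, not the demo's actual class):

```python
from dataclasses import dataclass, field

@dataclass
class _Message:
    content: str

@dataclass
class _Choice:
    message: _Message

@dataclass
class _Usage:
    # Fixed illustrative token counts for dry runs.
    prompt_tokens: int = 12
    completion_tokens: int = 48

@dataclass
class MockChatCompletion:
    content: str
    model: str
    usage: _Usage = field(default_factory=_Usage)

    @property
    def choices(self):
        # Match the response shape callers index into: choices[0].message.content
        return [_Choice(_Message(self.content))]
```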

Per-provider child agents with dynamic creation

Each provider gets its own child agent via a closure inside a loop, enabling per-provider tagging and scoring.

for i, model_config in enumerate(models_to_run, 1):
    model = model_config["model"]
    provider = model_config["provider"]
    tier = model_config["tier"]

    @observe(agent_name="litellm-provider-caller", workflow_name=f"call-{provider.lower()}",
             session_id=session, client=observe_client, enforce_policy=False)
    async def call_provider(query: str, model=model, provider=provider, tier=tier) -> dict:
        waxell_observe.tag("provider", provider.lower())
        waxell_observe.tag("tier", tier)
        waxell_observe.metadata("model", model)

        messages = [
            {"role": "system", "content": f"You are an AI safety expert. Provide a {tier}-tier analysis."},
            {"role": "user", "content": query},
        ]
        response = await call_llm(model, messages, dry_run=args.dry_run)
        content = response.choices[0].message.content
        tokens = response.usage.prompt_tokens + response.usage.completion_tokens

        waxell_observe.score("tokens_in", float(response.usage.prompt_tokens))
        return {"provider": provider, "content": content, "tokens": tokens}

    result = await call_provider(query)
    results.append(result)
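The `model=model, provider=provider, tier=tier` default arguments are what make this loop safe: defaults are evaluated when the function is defined, so each iteration's values are captured at that point. Without them, Python's late-binding closures would make every `call_provider` see the last iteration's values. A minimal demonstration of the difference:

```python
models = ["openai/gpt-4o-mini", "groq/llama-3.3-70b-versatile"]

# Late binding: every closure reads the shared loop variable, which ends up
# holding the final value after the loop finishes.
late = [lambda: m for m in models]

# Default-argument binding: each value is frozen at definition time,
# which is the pattern the loop above uses.
bound = [lambda m=m: m for m in models]
```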

What this demonstrates

  • @waxell.observe -- parent orchestrator with dynamically created child agents per provider
  • @waxell.tool(llm) -- unified LLM call wrapper for LiteLLM
  • @waxell.step_dec -- query preprocessing with complexity detection
  • @waxell.decision -- comparison strategy selection (all_models vs subset)
  • @waxell.reasoning_dec -- cross-provider token and quality comparison
  • waxell.tag() -- per-provider and per-tier tagging
  • waxell.score() -- token counts and model comparison metrics
  • waxell.metadata() -- model config and comparison conclusions
  • LiteLLM unified API -- provider/model routing convention
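Because each child agent returns a uniform dict, the cross-provider comparison reduces to aggregating those results. A hypothetical sketch of what the @reasoning comparison step might compute (helper name and output keys are assumptions, not from the demo):

```python
def summarize(results: list) -> dict:
    """Hypothetical cross-provider comparison over the per-provider result dicts."""
    total = sum(r["tokens"] for r in results)
    # Pick the provider that used the fewest tokens for this query.
    leanest = min(results, key=lambda r: r["tokens"])
    return {"total_tokens": total, "fewest_tokens": leanest["provider"]}
```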

Run it

cd dev/waxell-dev
python -m app.demos.litellm_agent --dry-run

Source

dev/waxell-dev/app/demos/litellm_agent.py