# LiteLLM
A multi-provider comparison pipeline using LiteLLM's unified API to call OpenAI (gpt-4o-mini), Anthropic (claude-sonnet-4-5), and Groq (llama-3.3-70b-versatile) within a single trace. The orchestrator creates a child agent per provider, compares token usage and response quality with @reasoning, and tracks total cost across all providers. Demonstrates LiteLLM's model routing with the provider/model naming convention.
This example requires at least one of `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, or `GROQ_API_KEY`, plus `WAXELL_API_KEY` and `WAXELL_API_URL`. Use `--dry-run` to run without any API keys.
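The cost tracking mentioned above can be sketched as a simple aggregation over per-call token counts. This is illustrative only: the price table below uses placeholder numbers, not authoritative provider pricing, and `call_cost`/`total_cost` are hypothetical helpers, not part of the demo's actual code.

```python
# Sketch: aggregate token usage and cost across providers.
# Prices are illustrative placeholders -- check each provider for real rates.
PRICE_PER_1M = {  # model -> (input USD, output USD) per 1M tokens
    "openai/gpt-4o-mini": (0.15, 0.60),
    "anthropic/claude-sonnet-4-5": (3.00, 15.00),
    "groq/llama-3.3-70b-versatile": (0.59, 0.79),
}

def call_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Cost of one call, given the per-1M-token price pair for its model."""
    inp, out = PRICE_PER_1M[model]
    return (prompt_tokens * inp + completion_tokens * out) / 1_000_000

def total_cost(calls: list) -> float:
    """Sum costs across all providers in the comparison run."""
    return sum(
        call_cost(c["model"], c["prompt_tokens"], c["completion_tokens"])
        for c in calls
    )
```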
## Architecture
## Key Code
### LiteLLM unified API with `@tool(llm)`
The `call_llm` function wraps LiteLLM's `acompletion` with the `@tool(tool_type="llm")` decorator for automatic span creation and token tracking.
```python
@waxell_observe.tool(tool_type="llm")
async def call_llm(model: str, messages: list, dry_run: bool):
    """Call LiteLLM or return a mock response."""
    if dry_run:
        content = _contextual_response(messages)
        return MockChatCompletion(content=content, model=model.split("/")[-1])
    import litellm
    return await litellm.acompletion(model=model, messages=messages)
```
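The `MockChatCompletion` and `_contextual_response` helpers aren't shown in this excerpt. A minimal sketch of what the mock might look like, assuming it mirrors the OpenAI-style response shape that LiteLLM normalizes every provider to, so the dry-run path supports the same `response.choices[0].message.content` and `response.usage` accesses as a real call:

```python
from dataclasses import dataclass, field

# Hypothetical mock mirroring the OpenAI-style response shape
# (choices / message / usage) that LiteLLM returns for every provider.

@dataclass
class MockMessage:
    content: str

@dataclass
class MockChoice:
    message: MockMessage

@dataclass
class MockUsage:
    prompt_tokens: int = 42       # placeholder counts for dry runs
    completion_tokens: int = 128

@dataclass
class MockChatCompletion:
    content: str
    model: str
    usage: MockUsage = field(default_factory=MockUsage)

    @property
    def choices(self):
        # Single-choice list, matching response.choices[0].message.content
        return [MockChoice(message=MockMessage(content=self.content))]
```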
### Per-provider child agents with dynamic creation
Each provider gets its own child agent via a closure defined inside the loop; the default arguments (`model=model`, `provider=provider`, `tier=tier`) pin each iteration's values, enabling per-provider tagging and scoring.
```python
for i, model_config in enumerate(models_to_run, 1):
    model = model_config["model"]
    provider = model_config["provider"]
    tier = model_config["tier"]

    @observe(agent_name="litellm-provider-caller", workflow_name=f"call-{provider.lower()}",
             session_id=session, client=observe_client, enforce_policy=False)
    async def call_provider(query: str, model=model, provider=provider, tier=tier) -> dict:
        waxell_observe.tag("provider", provider.lower())
        waxell_observe.tag("tier", tier)
        waxell_observe.metadata("model", model)
        messages = [
            {"role": "system", "content": f"You are an AI safety expert. Provide a {tier}-tier analysis."},
            {"role": "user", "content": query},
        ]
        response = await call_llm(model, messages, dry_run=args.dry_run)
        content = response.choices[0].message.content
        tokens = response.usage.prompt_tokens + response.usage.completion_tokens
        waxell_observe.score("tokens_in", float(response.usage.prompt_tokens))
        return {"provider": provider, "content": content, "tokens": tokens}

    result = await call_provider(query)
    results.append(result)
```
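The default-argument binding in `call_provider` matters because Python closures capture loop variables by reference, not by value: without the defaults, every child agent would see the final iteration's model. A standalone sketch of the pitfall and the fix:

```python
# Closures created in a loop capture variables by reference (late binding),
# so without default arguments every function sees the final loop value.

def make_callers_broken(models):
    # All lambdas share the same 'model' variable.
    return [lambda: model for model in models]

def make_callers_fixed(models):
    # Default argument evaluates now, pinning each iteration's value.
    return [lambda model=model: model for model in models]

models = ["openai/gpt-4o-mini", "anthropic/claude-sonnet-4-5"]
broken = [f() for f in make_callers_broken(models)]  # both return the last model
fixed = [f() for f in make_callers_fixed(models)]    # one model per closure
```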
## What this demonstrates
- `@waxell.observe` -- parent orchestrator with dynamically-created child agents per provider
- `@waxell.tool(llm)` -- unified LLM call wrapper for LiteLLM
- `@waxell.step_dec` -- query preprocessing with complexity detection
- `@waxell.decision` -- comparison strategy selection (all_models vs subset)
- `@waxell.reasoning_dec` -- cross-provider token and quality comparison
- `waxell.tag()` -- per-provider and per-tier tagging
- `waxell.score()` -- token counts and model comparison metrics
- `waxell.metadata()` -- model config and comparison conclusions
- LiteLLM unified API -- `provider/model` routing convention
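The `provider/model` routing convention can be illustrated with a small parser. `split_route` is a hypothetical helper, not LiteLLM's API; the fallback to OpenAI for bare model names is an assumption about the convention, flagged in the comment below.

```python
def split_route(model: str) -> tuple:
    """Split a LiteLLM 'provider/model' string into its two parts."""
    provider, sep, name = model.partition("/")
    if not sep:
        # Assumption: a bare model name (no prefix) is treated as OpenAI.
        return ("openai", model)
    return (provider, name)
```

For example, `split_route("groq/llama-3.3-70b-versatile")` yields the provider `groq`, which is how LiteLLM knows which backend to dispatch the call to.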
## Run it
```bash
cd dev/waxell-dev
python -m app.demos.litellm_agent --dry-run
```