LiteLLM

A multi-provider comparison pipeline using LiteLLM's unified API to call OpenAI (gpt-4o-mini), Anthropic (claude-sonnet-4-5), and Groq (llama-3.3-70b-versatile) within a single trace. The orchestrator creates one child agent per provider, compares token usage and response quality in a @reasoning step, and tracks total cost across all providers. Demonstrates LiteLLM's model routing via the provider/model naming convention.
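The routing convention embeds the provider in the model string itself, so one `completion` call path serves every backend. A minimal sketch of the model list this demo iterates over (the `tier` labels are illustrative, not taken from the demo source):

```python
# LiteLLM routes on "<provider>/<model>" strings; splitting on the first "/"
# recovers the provider prefix. Tiers here are illustrative assumptions.
models_to_run = [
    {"provider": "OpenAI", "model": "openai/gpt-4o-mini", "tier": "fast"},
    {"provider": "Anthropic", "model": "anthropic/claude-sonnet-4-5", "tier": "premium"},
    {"provider": "Groq", "model": "groq/llama-3.3-70b-versatile", "tier": "fast"},
]

prefixes = [m["model"].split("/", 1)[0] for m in models_to_run]
```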

Environment variables

This example requires at least one of OPENAI_API_KEY, ANTHROPIC_API_KEY, or GROQ_API_KEY, plus WAXELL_API_KEY and WAXELL_API_URL. Use --dry-run to run without any API keys.

Architecture

Key Code

LiteLLM unified API with @tool(llm)

The call_llm function wraps LiteLLM's acompletion with the @tool(llm) decorator for automatic span creation and token tracking.

@waxell_observe.tool(tool_type="llm")
async def call_llm(model: str, messages: list, dry_run: bool):
    """Call LiteLLM or return a mock response."""
    if dry_run:
        content = _contextual_response(messages)
        return MockChatCompletion(content=content, model=model.split("/")[-1])

    import litellm
    return await litellm.acompletion(model=model, messages=messages)
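For `--dry-run` to work, `MockChatCompletion` must mirror the parts of the OpenAI response shape that callers touch (`choices[0].message.content` and `usage`). A plausible minimal stand-in, assuming only those two access paths are needed (this is a sketch, not the demo's actual class):

```python
from dataclasses import dataclass, field

@dataclass
class _Message:
    content: str

@dataclass
class _Choice:
    message: _Message

@dataclass
class _Usage:
    # Fixed illustrative token counts for dry runs.
    prompt_tokens: int = 12
    completion_tokens: int = 48

@dataclass
class MockChatCompletion:
    content: str
    model: str
    usage: _Usage = field(default_factory=_Usage)

    @property
    def choices(self):
        # Match the response shape callers index into: choices[0].message.content
        return [_Choice(_Message(self.content))]
```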

Per-provider child agents with dynamic creation

Each provider gets its own child agent via a closure inside a loop, enabling per-provider tagging and scoring.

for i, model_config in enumerate(models_to_run, 1):
    model = model_config["model"]
    provider = model_config["provider"]
    tier = model_config["tier"]

    @observe(agent_name="litellm-provider-caller", workflow_name=f"call-{provider.lower()}",
             session_id=session, client=observe_client, enforce_policy=False)
    async def call_provider(query: str, model=model, provider=provider, tier=tier) -> dict:
        waxell_observe.tag("provider", provider.lower())
        waxell_observe.tag("tier", tier)
        waxell_observe.metadata("model", model)

        messages = [
            {"role": "system", "content": f"You are an AI safety expert. Provide a {tier}-tier analysis."},
            {"role": "user", "content": query},
        ]
        response = await call_llm(model, messages, dry_run=args.dry_run)
        content = response.choices[0].message.content
        tokens = response.usage.prompt_tokens + response.usage.completion_tokens

        waxell_observe.score("tokens_in", float(response.usage.prompt_tokens))
        return {"provider": provider, "content": content, "tokens": tokens}

    result = await call_provider(query)
    results.append(result)
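The `model=model, provider=provider, tier=tier` default arguments are what make this loop safe: defaults are evaluated when the function is defined, so each iteration's values are captured at that point. Without them, Python's late-binding closures would make every `call_provider` see the last iteration's values. A minimal demonstration of the difference:

```python
models = ["openai/gpt-4o-mini", "groq/llama-3.3-70b-versatile"]

# Late binding: every closure reads the shared loop variable, which ends up
# holding the final value after the loop finishes.
late = [lambda: m for m in models]

# Default-argument binding: each value is frozen at definition time,
# which is the pattern the loop above uses.
bound = [lambda m=m: m for m in models]
```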

What this demonstrates

  • @waxell.observe -- parent orchestrator with dynamically created child agents per provider
  • @waxell.tool(llm) -- unified LLM call wrapper for LiteLLM
  • @waxell.step_dec -- query preprocessing with complexity detection
  • @waxell.decision -- comparison strategy selection (all_models vs subset)
  • @waxell.reasoning_dec -- cross-provider token and quality comparison
  • waxell.tag() -- per-provider and per-tier tagging
  • waxell.score() -- token counts and model comparison metrics
  • waxell.metadata() -- model config and comparison conclusions
  • LiteLLM unified API -- provider/model routing convention
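Because each child agent returns a uniform dict, the cross-provider comparison reduces to aggregating those results. A hypothetical sketch of what the @reasoning comparison step might compute (helper name and output keys are assumptions, not from the demo):

```python
def summarize(results: list) -> dict:
    """Hypothetical cross-provider comparison over the per-provider result dicts."""
    total = sum(r["tokens"] for r in results)
    # Pick the provider that used the fewest tokens for this query.
    leanest = min(results, key=lambda r: r["tokens"])
    return {"total_tokens": total, "fewest_tokens": leanest["provider"]}
```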

Run it

cd dev/waxell-dev
python -m app.demos.litellm_agent --dry-run

Source

dev/waxell-dev/app/demos/litellm_agent.py