LiteLLM Integration

Use LiteLLM's unified API to call multiple LLM providers (OpenAI, Anthropic, Groq, etc.) with consistent observability. waxell.init() patches LiteLLM automatically -- every litellm.completion(...) and litellm.acompletion(...) call is captured with no manual recording.

What is LiteLLM?

LiteLLM provides a unified interface to 100+ LLM providers. Instead of learning each provider's SDK, you use one API:

import litellm

# OpenAI
response = litellm.completion(model="gpt-4o", messages=[...])

# Anthropic
response = litellm.completion(model="anthropic/claude-sonnet-4", messages=[...])

# Groq
response = litellm.completion(model="groq/llama-3.3-70b-versatile", messages=[...])

Quick Start

Call init() before importing LiteLLM -- that's all the setup auto-instrumentation needs.

import waxell_observe as waxell
waxell.init(api_key="wax_sk_...", api_url="https://waxell.dev")

# Import AFTER init() so LiteLLM is auto-instrumented
import litellm

@waxell.observe(agent_name="litellm-agent", workflow_name="multi-provider")
async def ask(query: str) -> str:
    # LLM call auto-captured -- model, tokens, cost all recorded
    response = await litellm.acompletion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": query}],
    )
    return response.choices[0].message.content


# Run it
answer = await ask("Hello!", session_id="sess_001", user_id="user_alice")
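
The bare `await` above works wherever top-level await is available, such as a notebook or an async REPL. In a plain script, one option is to wrap the call in `asyncio.run`. The sketch below uses a stand-in `ask` coroutine (an assumption, so that it runs without credentials); in a real script it would be the decorated function from the Quick Start.

```python
import asyncio


async def ask(query: str) -> str:
    # Stand-in for the decorated ask() above so this sketch runs offline.
    return f"echo: {query}"


async def main() -> str:
    # Await the coroutine from inside an async entry point.
    return await ask("Hello!")


if __name__ == "__main__":
    print(asyncio.run(main()))
```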

Multi-Provider Comparison

Compare responses across providers in a single trace. Each litellm.acompletion(...) is auto-captured with the right provider and model.

import waxell_observe as waxell
waxell.init(api_key="wax_sk_...")

import litellm

MODELS = [
    {"model": "gpt-4o-mini", "provider": "OpenAI", "tier": "fast"},
    {"model": "anthropic/claude-sonnet-4", "provider": "Anthropic", "tier": "premium"},
    {"model": "groq/llama-3.3-70b-versatile", "provider": "Groq", "tier": "open-source"},
]


@waxell.observe(agent_name="litellm-demo", workflow_name="multi-provider")
async def compare_providers(query: str) -> dict:
    waxell.tag("demo", "litellm")
    waxell.tag("providers", "openai,anthropic,groq")
    waxell.metadata("num_models", len(MODELS))

    results = []
    for config in MODELS:
        messages = [
            {"role": "system", "content": f"Provide a {config['tier']}-tier analysis."},
            {"role": "user", "content": query},
        ]

        # Each call auto-captured by init()
        response = await litellm.acompletion(model=config["model"], messages=messages)
        content = response.choices[0].message.content
        # Total tokens for this call, reused in the step output and the result row
        tokens = response.usage.prompt_tokens + response.usage.completion_tokens

        waxell.step(
            f"call_{config['provider'].lower()}",
            output={
                "model": config["model"],
                "tokens": tokens,
                "content_length": len(content),
            },
        )
        results.append({
            "provider": config["provider"],
            "model": config["model"],
            "tier": config["tier"],
            "content": content,
            "tokens": tokens,
        })

    comparison = {
        r["provider"]: {
            "model": r["model"],
            "tokens": r["tokens"],
            "content_length": len(r["content"]),
        }
        for r in results
    }
    waxell.step("compare_providers", output=comparison)
    return {"comparison": comparison, "results": results}


# Run comparison
results = await compare_providers(
    "Compare AI safety approaches",
    session_id="sess_compare_001",
)
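
Once the comparison dict is returned, it can be post-processed like any other Python mapping. As a minimal sketch, the hypothetical helper `rank_by_tokens` below orders providers by total tokens. The sample values are made up for illustration and are not real measurements.

```python
def rank_by_tokens(comparison: dict) -> list:
    """Return provider names ordered from fewest to most total tokens."""
    return sorted(comparison, key=lambda provider: comparison[provider]["tokens"])


# Illustrative numbers only, shaped like the "comparison" dict built above.
sample = {
    "OpenAI": {"model": "gpt-4o-mini", "tokens": 412, "content_length": 1530},
    "Anthropic": {"model": "anthropic/claude-sonnet-4", "tokens": 655, "content_length": 2410},
    "Groq": {"model": "groq/llama-3.3-70b-versatile", "tokens": 388, "content_length": 1475},
}
ranking = rank_by_tokens(sample)
```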

Model Name Conventions

LiteLLM uses prefixes to identify providers:

| Provider | Model Format | Example |
| --- | --- | --- |
| OpenAI | model-name | gpt-4o, gpt-4o-mini |
| Anthropic | anthropic/model | anthropic/claude-sonnet-4 |
| Groq | groq/model | groq/llama-3.3-70b-versatile |
| Bedrock | bedrock/model | bedrock/anthropic.claude-3 |
| Azure | azure/deployment | azure/gpt-4-deployment |
| Cohere | cohere/model | cohere/command-r |
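
To make the prefix convention concrete, here is a small sketch that splits a model string into a provider and a model name. `split_model` is a hypothetical helper written for this page, not part of LiteLLM, and treating unprefixed names as OpenAI is a simplifying assumption taken from the table; LiteLLM's own resolution may be more involved.

```python
def split_model(model: str) -> tuple:
    """Split 'provider/model' into (provider, model)."""
    provider, sep, name = model.partition("/")
    if not sep:
        # Assumption from the table above: no prefix means an OpenAI model.
        return ("openai", model)
    return (provider, name)
```

Only the first slash is treated as the separator, so anything after it stays part of the model name.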

Costs and Tokens

Auto-instrumentation pulls model, tokens_in, tokens_out, and cost straight from the LiteLLM response. You don't need to record anything manually -- inspect them in the dashboard or read response.usage.* in code if you need the values inline.

@waxell.observe(agent_name="cost-aware")
async def ask(query: str) -> str:
    response = await litellm.acompletion(
        model="groq/llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": query}],
    )
    # response.usage.prompt_tokens, response.usage.completion_tokens are still available
    # but they're already captured in the trace -- no need to record again
    return response.choices[0].message.content
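
If the token counts are needed inline (for logging or a budget check, say), they can be read from `response.usage`. The sketch below uses a hypothetical helper, `usage_summary`, and a stand-in object with the same attribute names, so it runs without a network call. The `tokens_in` and `tokens_out` key names mirror the fields mentioned above and are otherwise an assumption.

```python
from types import SimpleNamespace


def usage_summary(response) -> dict:
    """Collect the token counts from a response's usage block."""
    usage = response.usage
    return {
        "tokens_in": usage.prompt_tokens,
        "tokens_out": usage.completion_tokens,
        "tokens_total": usage.prompt_tokens + usage.completion_tokens,
    }


# Stand-in with the same attribute names; not a real LiteLLM response.
fake_response = SimpleNamespace(
    usage=SimpleNamespace(prompt_tokens=12, completion_tokens=30)
)
summary = usage_summary(fake_response)
```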

Fallback Chains

LiteLLM's fallbacks feature works transparently -- the actual model used is captured in the auto-instrumented span.

@waxell.observe(agent_name="fallback-agent")
async def ask_with_fallback(query: str) -> str:
    response = await litellm.acompletion(
        model="gpt-4o",
        messages=[{"role": "user", "content": query}],
        fallbacks=["anthropic/claude-sonnet-4", "groq/llama-3.3-70b-versatile"],
    )
    # Tag with the model that actually answered
    waxell.tag("actual_model", response.model)
    return response.choices[0].message.content
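
Conceptually, a fallback chain tries each model in order and returns the first successful result. The sketch below illustrates that idea in plain Python; it is not LiteLLM's implementation, and `first_success` and `fake_call` are hypothetical names used only here.

```python
from typing import Callable, Sequence


def first_success(models: Sequence[str], call: Callable[[str], str]) -> tuple:
    """Try each model in order; return (model, result) for the first that succeeds."""
    last_error = None
    for model in models:
        try:
            return model, call(model)
        except Exception as exc:  # broad on purpose: any failure moves to the next model
            last_error = exc
    raise RuntimeError("all models failed") from last_error


def fake_call(model: str) -> str:
    # Hypothetical stand-in for a provider call: the primary model is "down".
    if model == "gpt-4o":
        raise TimeoutError("simulated outage")
    return f"answer from {model}"


used, answer = first_success(
    ["gpt-4o", "anthropic/claude-sonnet-4", "groq/llama-3.3-70b-versatile"],
    fake_call,
)
```

Because the answering model can differ from the requested one, recording it explicitly (as the tag above does) keeps traces easy to filter.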

Streaming with LiteLLM

Auto-instrumentation handles LiteLLM streams the same way it handles OpenAI / Anthropic streams -- iterate the stream as normal:

@waxell.observe(agent_name="streaming-litellm")
async def stream_litellm(query: str) -> str:
    response = await litellm.acompletion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": query}],
        stream=True,
    )

    content = ""
    async for chunk in response:
        if chunk.choices[0].delta.content:
            content += chunk.choices[0].delta.content
            print(chunk.choices[0].delta.content, end="", flush=True)

    return content

See Streaming Integration for advanced patterns when you need custom stream processing.
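
The accumulation loop can be exercised offline with a stand-in stream. In the sketch below, `fake_chunk`, `fake_stream`, and `collect` are hypothetical helpers that mimic the `chunk.choices[0].delta.content` shape used above, including a final chunk with no content, which is the case the `if` check guards against.

```python
import asyncio
from types import SimpleNamespace


def fake_chunk(text):
    # Mimics the chunk.choices[0].delta.content shape used above.
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
    )


async def fake_stream():
    # The last chunk carries no content, so the consumer must skip it.
    for text in ["Hel", "lo", None]:
        yield fake_chunk(text)


async def collect(stream) -> str:
    """Join the non-empty content deltas of a stream into one string."""
    parts = []
    async for chunk in stream:
        piece = chunk.choices[0].delta.content
        if piece:
            parts.append(piece)
    return "".join(parts)


result = asyncio.run(collect(fake_stream()))
```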

Environment Variables

LiteLLM reads API keys from the environment:

export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GROQ_API_KEY="gsk_..."
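
A missing key usually surfaces only when the first call to that provider fails, so a startup check can save a round trip. This is a minimal sketch: `missing_keys` and the `KEY_FOR_PREFIX` mapping are hypothetical, cover only the three variables shown above, and assume unprefixed model names belong to OpenAI.

```python
import os

# Assumed mapping from model prefix to the variable names shown above.
KEY_FOR_PREFIX = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "groq": "GROQ_API_KEY",
}


def missing_keys(models, env=None) -> list:
    """Return the env var names that the given models need but that are unset."""
    env = os.environ if env is None else env
    missing = []
    for model in models:
        # Unprefixed names are treated as OpenAI (assumption).
        prefix = model.split("/", 1)[0] if "/" in model else "openai"
        var = KEY_FOR_PREFIX.get(prefix)
        if var and not env.get(var) and var not in missing:
            missing.append(var)
    return missing
```

Calling `missing_keys(["gpt-4o", "groq/llama-3.3-70b-versatile"])` before the first request returns the names of any unset variables, and an empty list when everything is configured.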

Best Practices

  1. Call init() before importing litellm -- patches the SDK for auto-instrumentation
  2. Use provider prefixes -- anthropic/claude-sonnet-4, not just claude-sonnet-4
  3. Tag with response.model for fallbacks -- waxell.tag("actual_model", response.model)
  4. Pass session_id at call time -- correlate multi-provider comparisons
  5. Skip manual record_llm_call -- auto-instrumentation already captures model, tokens, and cost

Supported Providers

LiteLLM supports 100+ providers. Common ones with cost tracking in Waxell:

| Provider | Cost Tracking | Notes |
| --- | --- | --- |
| OpenAI | Full | GPT-4, GPT-4o, etc. |
| Anthropic | Full | Claude models |
| Groq | Full | Llama, Mixtral on Groq |
| Google | Full | Gemini models |
| Mistral | Full | Mistral models |
| Cohere | Partial | Command-R |
| Bedrock | Full | AWS Bedrock models |

Next Steps