LiteLLM Integration
Use LiteLLM's unified API to call multiple LLM providers (OpenAI, Anthropic, Groq, etc.) with consistent observability. waxell.init() patches LiteLLM automatically -- every litellm.completion(...) and litellm.acompletion(...) call is captured with no manual recording.
What is LiteLLM?
LiteLLM provides a unified interface to 100+ LLM providers. Instead of learning each provider's SDK, you use one API:
import litellm
# OpenAI
response = litellm.completion(model="gpt-4o", messages=[...])
# Anthropic
response = litellm.completion(model="anthropic/claude-sonnet-4", messages=[...])
# Groq
response = litellm.completion(model="groq/llama-3.3-70b-versatile", messages=[...])
Quick Start
Call init() before importing LiteLLM -- that's all the setup auto-instrumentation needs.
import waxell_observe as waxell
waxell.init(api_key="wax_sk_...", api_url="https://waxell.dev")
# Import AFTER init() so LiteLLM is auto-instrumented
import litellm
@waxell.observe(agent_name="litellm-agent", workflow_name="multi-provider")
async def ask(query: str) -> str:
    # LLM call auto-captured -- model, tokens, cost all recorded
    response = await litellm.acompletion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": query}],
    )
    return response.choices[0].message.content
# Run it from an async context (for example a notebook); in a script, wrap the call in asyncio.run(...)
answer = await ask("Hello!", session_id="sess_001", user_id="user_alice")
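Because the import order matters, a small guard can catch the mistake early. The sketch below uses only the standard library. `is_already_imported` is a hypothetical helper, not part of waxell_observe, and it only reports whether a module was loaded before the check ran; what init() does with a module that is already loaded is not covered here.

```python
import sys
from typing import Mapping, Optional


def is_already_imported(
    module_name: str, modules: Optional[Mapping[str, object]] = None
) -> bool:
    """Return True if module_name is already loaded in this interpreter."""
    # Hypothetical helper: `modules` exists only so the check can be tested.
    loaded = sys.modules if modules is None else modules
    return module_name in loaded


# Run this just before waxell.init(); a True result means litellm was
# imported earlier than the ordering described above recommends.
if is_already_imported("litellm"):
    print("warning: litellm was imported before init()")
```

Placing the check in the application entry point keeps it out of library code and makes the ordering requirement visible to whoever edits the imports next.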
Multi-Provider Comparison
Compare responses across providers in a single trace. Each litellm.acompletion(...) is auto-captured with the right provider and model.
import waxell_observe as waxell
waxell.init(api_key="wax_sk_...")
import litellm
MODELS = [
    {"model": "gpt-4o-mini", "provider": "OpenAI", "tier": "fast"},
    {"model": "anthropic/claude-sonnet-4", "provider": "Anthropic", "tier": "premium"},
    {"model": "groq/llama-3.3-70b-versatile", "provider": "Groq", "tier": "open-source"},
]
@waxell.observe(agent_name="litellm-demo", workflow_name="multi-provider")
async def compare_providers(query: str) -> dict:
    waxell.tag("demo", "litellm")
    waxell.tag("providers", "openai,anthropic,groq")
    waxell.metadata("num_models", len(MODELS))
    results = []
    for config in MODELS:
        messages = [
            {"role": "system", "content": f"Provide a {config['tier']}-tier analysis."},
            {"role": "user", "content": query},
        ]
        # Each call auto-captured by init()
        response = await litellm.acompletion(model=config["model"], messages=messages)
        content = response.choices[0].message.content
        waxell.step(
            f"call_{config['provider'].lower()}",
            output={
                "model": config["model"],
                "tokens": response.usage.prompt_tokens + response.usage.completion_tokens,
                "content_length": len(content),
            },
        )
        results.append({
            "provider": config["provider"],
            "model": config["model"],
            "tier": config["tier"],
            "content": content,
            "tokens": response.usage.prompt_tokens + response.usage.completion_tokens,
        })
    comparison = {
        r["provider"]: {
            "model": r["model"],
            "tokens": r["tokens"],
            "content_length": len(r["content"]),
        }
        for r in results
    }
    waxell.step("compare_providers", output=comparison)
    return {"comparison": comparison, "results": results}
# Run comparison from an async context; in a script, wrap the call in asyncio.run(...)
results = await compare_providers(
    "Compare AI safety approaches",
    session_id="sess_compare_001",
)
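Once the comparison dictionary exists, summarizing it is plain Python. This is a minimal sketch, assuming the shape returned above (provider name mapped to a dict with a tokens field). `fewest_tokens` is a hypothetical helper, and the sample numbers are invented for illustration.

```python
from typing import Any, Mapping


def fewest_tokens(comparison: Mapping[str, Mapping[str, Any]]) -> str:
    """Return the provider whose entry reports the fewest tokens."""
    if not comparison:
        raise ValueError("comparison is empty")
    return min(comparison, key=lambda provider: comparison[provider]["tokens"])


# Invented sample data in the same shape as the "comparison" dict above.
sample = {
    "OpenAI": {"model": "gpt-4o-mini", "tokens": 412, "content_length": 1830},
    "Anthropic": {"model": "anthropic/claude-sonnet-4", "tokens": 530, "content_length": 2210},
    "Groq": {"model": "groq/llama-3.3-70b-versatile", "tokens": 398, "content_length": 1705},
}
print(fewest_tokens(sample))
```

Token counts are not directly comparable across tokenizers, so treat the result as a rough signal rather than a cost ranking.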
Model Name Conventions
LiteLLM identifies the provider from a prefix on the model name; OpenAI models take no prefix:
| Provider | Model Format | Example |
|---|---|---|
| OpenAI | model-name | gpt-4o, gpt-4o-mini |
| Anthropic | anthropic/model | anthropic/claude-sonnet-4 |
| Groq | groq/model | groq/llama-3.3-70b-versatile |
| Bedrock | bedrock/model | bedrock/anthropic.claude-3 |
| Azure | azure/deployment | azure/gpt-4-deployment |
| Cohere | cohere/model | cohere/command-r |
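The naming convention in the table can be mirrored with a few lines of string handling. The sketch below is illustrative only: `split_model_name` is a hypothetical helper, and treating an unprefixed name as OpenAI is an assumption taken from the table, not a description of how LiteLLM resolves providers internally.

```python
from typing import Tuple


def split_model_name(model: str, default_provider: str = "openai") -> Tuple[str, str]:
    """Split 'provider/model' into (provider, model)."""
    provider, sep, name = model.partition("/")
    if not sep:
        # No prefix: assume the default provider, as in the OpenAI row above.
        return default_provider, model
    return provider, name


for example in ["gpt-4o", "anthropic/claude-sonnet-4", "azure/gpt-4-deployment"]:
    print(example, "->", split_model_name(example))
```

A helper like this is handy for building tag values from the same model strings that are passed to LiteLLM.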
Costs and Tokens
Auto-instrumentation pulls model, tokens_in, tokens_out, and cost straight from the LiteLLM response. You don't need to record anything manually -- inspect them in the dashboard or read response.usage.* in code if you need the values inline.
@waxell.observe(agent_name="cost-aware")
async def ask(query: str) -> str:
    response = await litellm.acompletion(
        model="groq/llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": query}],
    )
    # response.usage.prompt_tokens, response.usage.completion_tokens are still available
    # but they're already captured in the trace -- no need to record again
    return response.choices[0].message.content
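If the token counts are needed inline, for example to enforce a per-request budget, they can be read from the response without recording anything. This is a minimal sketch: `token_summary` is a hypothetical helper, and the stand-in object only imitates the `response.usage` attributes named above.

```python
from types import SimpleNamespace


def token_summary(response) -> dict:
    """Collect the usage fields of a response into a plain dict."""
    usage = response.usage
    return {
        "tokens_in": usage.prompt_tokens,
        "tokens_out": usage.completion_tokens,
        "total": usage.prompt_tokens + usage.completion_tokens,
    }


# Stand-in with the same attribute names as response.usage; values are invented.
fake_response = SimpleNamespace(
    usage=SimpleNamespace(prompt_tokens=12, completion_tokens=30)
)
print(token_summary(fake_response))
```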
Fallback Chains
LiteLLM's fallbacks feature works transparently -- the actual model used is captured in the auto-instrumented span.
@waxell.observe(agent_name="fallback-agent")
async def ask_with_fallback(query: str) -> str:
    response = await litellm.acompletion(
        model="gpt-4o",
        messages=[{"role": "user", "content": query}],
        fallbacks=["anthropic/claude-sonnet-4", "groq/llama-3.3-70b-versatile"],
    )
    # Tag with the model that actually answered
    waxell.tag("actual_model", response.model)
    return response.choices[0].message.content
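Conceptually, a fallback chain tries each model in order and returns the first successful answer, which is why the model that answered can differ from the one requested. The sketch below illustrates that idea with a fake provider call. It is not LiteLLM's implementation, and `first_successful` and `fake_call` are hypothetical names.

```python
from typing import Callable, Sequence, Tuple


def first_successful(call: Callable[[str], str], models: Sequence[str]) -> Tuple[str, str]:
    """Try each model in order; return (model, result) for the first success."""
    last_error = None
    for model in models:
        try:
            return model, call(model)
        except Exception as exc:  # broad on purpose: any failure moves to the next model
            last_error = exc
    raise RuntimeError("all models failed") from last_error


def fake_call(model: str) -> str:
    # Hypothetical stand-in for a provider call; the primary model always fails.
    if model == "gpt-4o":
        raise TimeoutError("primary unavailable")
    return f"answer from {model}"


used, text = first_successful(fake_call, ["gpt-4o", "anthropic/claude-sonnet-4"])
print(used)
```

Returning the model alongside the result plays the same role as tagging with response.model: the caller always knows which entry in the chain produced the answer.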
Streaming with LiteLLM
Auto-instrumentation handles LiteLLM streams the same way it handles OpenAI / Anthropic streams -- iterate the stream as normal:
@waxell.observe(agent_name="streaming-litellm")
async def stream_litellm(query: str) -> str:
    response = await litellm.acompletion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": query}],
        stream=True,
    )
    content = ""
    async for chunk in response:
        if chunk.choices[0].delta.content:
            content += chunk.choices[0].delta.content
            print(chunk.choices[0].delta.content, end="", flush=True)
    return content
See Streaming Integration for advanced patterns when you need custom stream processing.
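The accumulation loop can be pulled into a reusable helper and exercised without a network call. This is a sketch under assumptions: `collect_stream` and `fake_stream` are hypothetical names, the fake chunks only imitate the `chunk.choices[0].delta.content` path used above, and the guard against an empty choices list is a defensive addition rather than documented behavior.

```python
import asyncio
from types import SimpleNamespace
from typing import AsyncIterator


async def collect_stream(stream: AsyncIterator) -> str:
    """Join the text deltas of a chunk stream, skipping empty ones."""
    parts = []
    async for chunk in stream:
        if not chunk.choices:
            continue  # defensive: ignore chunks that carry no choices
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
    return "".join(parts)


async def fake_stream():
    # Hypothetical chunks shaped like the ones iterated in the loop above.
    for text in ["Hel", None, "lo"]:
        yield SimpleNamespace(
            choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
        )
    yield SimpleNamespace(choices=[])


print(asyncio.run(collect_stream(fake_stream())))
```

Collecting parts in a list and joining once avoids repeated string concatenation on long responses.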
Environment Variables
LiteLLM reads provider API keys from environment variables:
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GROQ_API_KEY="gsk_..."
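A missing key usually surfaces only when the first call to that provider fails, so checking at startup can save a debugging round trip. This is a minimal sketch using the three variable names above; `missing_keys` is a hypothetical helper, not part of LiteLLM or waxell_observe.

```python
import os
from typing import List, Mapping, Optional, Sequence


def missing_keys(
    required: Sequence[str], env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Return the names in `required` that are unset or empty."""
    source = os.environ if env is None else env
    return [name for name in required if not source.get(name)]


REQUIRED = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GROQ_API_KEY"]
missing = missing_keys(REQUIRED)
if missing:
    print("missing API keys:", ", ".join(missing))
```

Trim the required list to the providers the application actually calls, since unused providers do not need a key.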
Best Practices
- Call init() before importing litellm -- patches the SDK for auto-instrumentation
- Use provider prefixes -- anthropic/claude-sonnet-4, not just claude-sonnet-4
- Tag with response.model for fallbacks -- waxell.tag("actual_model", response.model)
- Pass session_id at call time -- correlate multi-provider comparisons
- Skip manual record_llm_call -- auto-instrumentation already captures model, tokens, and cost
Supported Providers
LiteLLM supports 100+ providers. Common ones with cost tracking in Waxell:
| Provider | Cost Tracking | Notes |
|---|---|---|
| OpenAI | Full | GPT-4, GPT-4o, etc. |
| Anthropic | Full | Claude models |
| Groq | Full | Llama, Mixtral on Groq |
| Google | Full | Gemini models |
| Mistral | Full | Mistral models |
| Cohere | Partial | Command-R |
| Bedrock | Full | AWS Bedrock models |
Next Steps
- Multi-Agent -- Coordinate agents across providers
- Streaming Integration -- Detailed streaming patterns
- Cost Management -- Track costs across providers