LiteLLM Integration
Use LiteLLM's unified API to call multiple LLM providers (OpenAI, Anthropic, Groq, etc.) with consistent observability.
What is LiteLLM?
LiteLLM provides a unified interface to 100+ LLM providers. Instead of learning each provider's SDK, you use one API:
import litellm
# OpenAI
response = litellm.completion(model="gpt-4o", messages=[...])
# Anthropic
response = litellm.completion(model="anthropic/claude-sonnet-4", messages=[...])
# Groq
response = litellm.completion(model="groq/llama-3.3-70b-versatile", messages=[...])
Quick Start
import waxell_observe
waxell_observe.init(api_key="wax_sk_...", api_url="https://waxell.dev")
from waxell_observe import WaxellContext, generate_session_id
import litellm
async with WaxellContext(
agent_name="litellm-agent",
workflow_name="multi-provider",
session_id=generate_session_id(),
) as ctx:
response = await litellm.acompletion(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Hello!"}]
)
ctx.record_llm_call(
model=response.model,
tokens_in=response.usage.prompt_tokens,
tokens_out=response.usage.completion_tokens,
)
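The block above uses top-level `await`/`async with`, which works in a notebook or inside an already-async app. In a plain script, one way to run it is to wrap the block in an async function and call `asyncio.run`, for example:
import asyncio
import litellm
import waxell_observe
from waxell_observe import WaxellContext, generate_session_id

waxell_observe.init(api_key="wax_sk_...", api_url="https://waxell.dev")

async def main():
    async with WaxellContext(
        agent_name="litellm-agent",
        workflow_name="multi-provider",
        session_id=generate_session_id(),
    ) as ctx:
        response = await litellm.acompletion(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": "Hello!"}],
        )
        ctx.record_llm_call(
            model=response.model,
            tokens_in=response.usage.prompt_tokens,
            tokens_out=response.usage.completion_tokens,
        )

asyncio.run(main())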
Multi-Provider Comparison
Compare responses across providers in a single trace:
import waxell_observe
waxell_observe.init(api_key="wax_sk_...")
from waxell_observe import WaxellContext, generate_session_id
import litellm
MODELS = [
{"model": "gpt-4o-mini", "provider": "OpenAI", "tier": "fast"},
{"model": "anthropic/claude-sonnet-4", "provider": "Anthropic", "tier": "premium"},
{"model": "groq/llama-3.3-70b-versatile", "provider": "Groq", "tier": "open-source"},
]
async def compare_providers(query: str):
session = generate_session_id()
async with WaxellContext(
agent_name="litellm-demo",
workflow_name="multi-provider",
inputs={"query": query, "models": [m["model"] for m in MODELS]},
session_id=session,
) as ctx:
ctx.set_tag("demo", "litellm")
ctx.set_tag("providers", "openai,anthropic,groq")
ctx.set_metadata("num_models", len(MODELS))
results = []
for i, config in enumerate(MODELS, 1):
model = config["model"]
provider = config["provider"]
tier = config["tier"]
messages = [
{"role": "system", "content": f"Provide a {tier}-tier analysis."},
{"role": "user", "content": query},
]
response = await litellm.acompletion(model=model, messages=messages)
content = response.choices[0].message.content
tokens_in = response.usage.prompt_tokens
tokens_out = response.usage.completion_tokens
# Record step for this provider
ctx.record_step(f"call_{provider.lower()}", output={
"model": model,
"provider": provider,
"tier": tier,
"tokens_in": tokens_in,
"tokens_out": tokens_out,
"content_length": len(content),
})
# Record LLM call for cost tracking
ctx.record_llm_call(
model=response.model,
tokens_in=tokens_in,
tokens_out=tokens_out,
task=f"call_{provider.lower()}",
prompt_preview=query[:200],
response_preview=content[:200],
)
results.append({
"provider": provider,
"model": model,
"tier": tier,
"content": content,
"tokens": tokens_in + tokens_out,
})
# Comparison step
comparison = {
r["provider"]: {
"model": r["model"],
"tokens": r["tokens"],
"content_length": len(r["content"]),
}
for r in results
}
ctx.record_step("compare_providers", output=comparison)
ctx.set_result({
"comparison": comparison,
"models_tested": len(MODELS),
})
return results
# Run comparison
results = await compare_providers("Compare AI safety approaches")
Model Name Conventions
LiteLLM uses prefixes to identify providers:
| Provider | Model Format | Example |
|---|---|---|
| OpenAI | model-name | gpt-4o, gpt-4o-mini |
| Anthropic | anthropic/model | anthropic/claude-sonnet-4 |
| Groq | groq/model | groq/llama-3.3-70b-versatile |
| Bedrock | bedrock/model | bedrock/anthropic.claude-3 |
| Azure | azure/deployment | azure/gpt-4-deployment |
| Cohere | cohere/model | cohere/command-r |
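If you tag traces by provider, as in the comparison example above, a small helper can derive the provider from the model string. This helper is illustrative, not part of LiteLLM or Waxell:
def provider_from_model(model: str) -> str:
    """Derive the provider from a LiteLLM model string (illustrative helper)."""
    if "/" in model:
        # Prefixed models: "anthropic/claude-sonnet-4" -> "anthropic"
        return model.split("/", 1)[0]
    # Per the table above, un-prefixed names are OpenAI models
    return "openai"

# Usage inside a trace:
# ctx.set_tag("provider", provider_from_model("groq/llama-3.3-70b-versatile"))  # "groq"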
Recording Costs
LiteLLM normalizes the response format, so token fields are consistent:
response = await litellm.acompletion(model="groq/llama-3.3-70b-versatile", messages=[...])
ctx.record_llm_call(
model=response.model, # Returns the actual model name
tokens_in=response.usage.prompt_tokens,
tokens_out=response.usage.completion_tokens,
# Cost is auto-estimated based on model
)
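Waxell estimates cost from the model name and token counts. If you also want LiteLLM's own figure on the trace, one option (a sketch; `litellm_cost_usd` is just an illustrative metadata key) is to compute it with `litellm.completion_cost` and attach it as metadata:
import litellm

response = await litellm.acompletion(model="gpt-4o-mini", messages=messages)

# LiteLLM's own per-call cost estimate (USD), computed from the response
cost_usd = litellm.completion_cost(completion_response=response)

ctx.record_llm_call(
    model=response.model,
    tokens_in=response.usage.prompt_tokens,
    tokens_out=response.usage.completion_tokens,
)
ctx.set_metadata("litellm_cost_usd", cost_usd)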
Fallback Chains
Use LiteLLM's fallback feature with observability:
async with WaxellContext(agent_name="fallback-agent") as ctx:
try:
response = await litellm.acompletion(
model="gpt-4o",
messages=messages,
fallbacks=["anthropic/claude-sonnet-4", "groq/llama-3.3-70b-versatile"],
)
# Record which model was actually used
ctx.set_tag("actual_model", response.model)
ctx.record_llm_call(
model=response.model,
tokens_in=response.usage.prompt_tokens,
tokens_out=response.usage.completion_tokens,
)
except Exception as e:
ctx.set_tag("error", str(e))
raise
Streaming with LiteLLM
async with WaxellContext(agent_name="streaming-litellm") as ctx:
response = await litellm.acompletion(
model="gpt-4o-mini",
messages=[{"role": "user", "content": query}],
stream=True,
)
content = ""
async for chunk in response:
if chunk.choices[0].delta.content:
content += chunk.choices[0].delta.content
# Estimate tokens for streaming (LiteLLM may not provide usage in stream)
tokens_in = 150 # Estimate based on prompt
tokens_out = len(content.split()) * 2 # Rough estimate
ctx.record_llm_call(
model="gpt-4o-mini",
tokens_in=tokens_in,
tokens_out=tokens_out,
)
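For a tighter estimate than word counts, LiteLLM's `token_counter` helper can count tokens with the model's tokenizer where available. A sketch, reusing `query` and the accumulated `content` from the streaming example above:
import litellm

# Tokenizer-based counts: prompt tokens from the request messages,
# completion tokens from the accumulated streamed text
tokens_in = litellm.token_counter(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": query}],
)
tokens_out = litellm.token_counter(model="gpt-4o-mini", text=content)

ctx.record_llm_call(
    model="gpt-4o-mini",
    tokens_in=tokens_in,
    tokens_out=tokens_out,
)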
Environment Variables
LiteLLM reads API keys from environment:
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GROQ_API_KEY="gsk_..."
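Before running a multi-provider comparison, it can help to fail fast if a key is missing. A minimal check in Python:
import os

REQUIRED_KEYS = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GROQ_API_KEY"]

missing = [key for key in REQUIRED_KEYS if not os.environ.get(key)]
if missing:
    raise RuntimeError(f"Missing API keys: {', '.join(missing)}")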
Best Practices
- Use provider prefixes -- `anthropic/claude-sonnet-4`, not just `claude-sonnet-4`
- Record the actual model -- `response.model` returns what was used (important for fallbacks)
- Tag by provider -- Makes filtering easy in the UI
- Handle token estimation for streaming -- LiteLLM may not provide usage in streams
- Use session_id for comparisons -- Correlate multi-provider runs (see the sketch after this list)
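For the last point, correlation comes from reusing one session_id across runs. A minimal sketch, assuming each run otherwise records its calls as in the examples above:
from waxell_observe import WaxellContext, generate_session_id

session = generate_session_id()

# Two separate traces share the same session_id, so they can be
# viewed together as one multi-provider comparison
async with WaxellContext(agent_name="litellm-agent", session_id=session) as ctx:
    ...  # call gpt-4o-mini and ctx.record_llm_call(...)

async with WaxellContext(agent_name="litellm-agent", session_id=session) as ctx:
    ...  # call anthropic/claude-sonnet-4 and ctx.record_llm_call(...)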
Supported Providers
LiteLLM supports 100+ providers. Common ones with cost tracking in Waxell:
| Provider | Cost Tracking | Notes |
|---|---|---|
| OpenAI | Full | GPT-4, GPT-4o, etc. |
| Anthropic | Full | Claude models |
| Groq | Full | Llama, Mixtral on Groq |
| Google | Full | Gemini models |
| Mistral | Full | Mistral models |
| Cohere | Partial | Command-R |
| Bedrock | Full | AWS Bedrock models |
Next Steps
- Multi-Agent -- Coordinate agents across providers
- Streaming Integration -- Detailed streaming patterns
- Cost Management -- Track costs across providers