LiteLLM Integration

Use LiteLLM's unified API to call multiple LLM providers (OpenAI, Anthropic, Groq, etc.) with consistent observability.

What is LiteLLM?

LiteLLM provides a unified interface to 100+ LLM providers. Instead of learning each provider's SDK, you use one API:

import litellm

# OpenAI
response = litellm.completion(model="gpt-4o", messages=[...])

# Anthropic
response = litellm.completion(model="anthropic/claude-sonnet-4", messages=[...])

# Groq
response = litellm.completion(model="groq/llama-3.3-70b-versatile", messages=[...])

Quick Start

import waxell_observe
waxell_observe.init(api_key="wax_sk_...", api_url="https://waxell.dev")

from waxell_observe import WaxellContext, generate_session_id
import litellm

async with WaxellContext(
    agent_name="litellm-agent",
    workflow_name="multi-provider",
    session_id=generate_session_id(),
) as ctx:
    response = await litellm.acompletion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello!"}],
    )

    ctx.record_llm_call(
        model=response.model,
        tokens_in=response.usage.prompt_tokens,
        tokens_out=response.usage.completion_tokens,
    )

Multi-Provider Comparison

Compare responses across providers in a single trace:

import waxell_observe
waxell_observe.init(api_key="wax_sk_...")

from waxell_observe import WaxellContext, generate_session_id
import litellm

MODELS = [
    {"model": "gpt-4o-mini", "provider": "OpenAI", "tier": "fast"},
    {"model": "anthropic/claude-sonnet-4", "provider": "Anthropic", "tier": "premium"},
    {"model": "groq/llama-3.3-70b-versatile", "provider": "Groq", "tier": "open-source"},
]

async def compare_providers(query: str):
    session = generate_session_id()

    async with WaxellContext(
        agent_name="litellm-demo",
        workflow_name="multi-provider",
        inputs={"query": query, "models": [m["model"] for m in MODELS]},
        session_id=session,
    ) as ctx:
        ctx.set_tag("demo", "litellm")
        ctx.set_tag("providers", "openai,anthropic,groq")
        ctx.set_metadata("num_models", len(MODELS))

        results = []

        for config in MODELS:
            model = config["model"]
            provider = config["provider"]
            tier = config["tier"]

            messages = [
                {"role": "system", "content": f"Provide a {tier}-tier analysis."},
                {"role": "user", "content": query},
            ]

            response = await litellm.acompletion(model=model, messages=messages)

            content = response.choices[0].message.content
            tokens_in = response.usage.prompt_tokens
            tokens_out = response.usage.completion_tokens

            # Record a step for this provider
            ctx.record_step(f"call_{provider.lower()}", output={
                "model": model,
                "provider": provider,
                "tier": tier,
                "tokens_in": tokens_in,
                "tokens_out": tokens_out,
                "content_length": len(content),
            })

            # Record the LLM call for cost tracking
            ctx.record_llm_call(
                model=response.model,
                tokens_in=tokens_in,
                tokens_out=tokens_out,
                task=f"call_{provider.lower()}",
                prompt_preview=query[:200],
                response_preview=content[:200],
            )

            results.append({
                "provider": provider,
                "model": model,
                "tier": tier,
                "content": content,
                "tokens": tokens_in + tokens_out,
            })

        # Comparison step
        comparison = {
            r["provider"]: {
                "model": r["model"],
                "tokens": r["tokens"],
                "content_length": len(r["content"]),
            }
            for r in results
        }

        ctx.record_step("compare_providers", output=comparison)
        ctx.set_result({
            "comparison": comparison,
            "models_tested": len(MODELS),
        })

    return results

# Run comparison
results = await compare_providers("Compare AI safety approaches")

Model Name Conventions

LiteLLM uses prefixes to identify providers:

| Provider  | Model Format      | Example                        |
|-----------|-------------------|--------------------------------|
| OpenAI    | model-name        | gpt-4o, gpt-4o-mini            |
| Anthropic | anthropic/model   | anthropic/claude-sonnet-4      |
| Groq      | groq/model        | groq/llama-3.3-70b-versatile   |
| Bedrock   | bedrock/model     | bedrock/anthropic.claude-3     |
| Azure     | azure/deployment  | azure/gpt-4-deployment         |
| Cohere    | cohere/model      | cohere/command-r               |
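Because the prefix is just the part of the model string before the first slash, you can derive the provider for tagging with a small helper. This is a minimal sketch (provider_of is not part of LiteLLM or waxell_observe); unprefixed names are assumed to be OpenAI models, matching the table above:

```python
def provider_of(model: str) -> str:
    """Extract the provider prefix from a LiteLLM model string.

    Unprefixed names (e.g. "gpt-4o") are treated as OpenAI,
    which matches LiteLLM's default routing.
    """
    if "/" in model:
        return model.split("/", 1)[0]
    return "openai"

print(provider_of("anthropic/claude-sonnet-4"))  # anthropic
print(provider_of("gpt-4o-mini"))                # openai
```

A helper like this is handy for ctx.set_tag("provider", ...) so traces can be filtered by provider in the UI.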

Recording Costs

LiteLLM normalizes the response format, so token fields are consistent:

response = await litellm.acompletion(model="groq/llama-3.3-70b-versatile", messages=[...])

ctx.record_llm_call(
    model=response.model,  # the actual model name used
    tokens_in=response.usage.prompt_tokens,
    tokens_out=response.usage.completion_tokens,
    # Cost is auto-estimated based on the model
)

Fallback Chains

Use LiteLLM's fallback feature with observability:

async with WaxellContext(agent_name="fallback-agent") as ctx:
    try:
        response = await litellm.acompletion(
            model="gpt-4o",
            messages=messages,
            fallbacks=["anthropic/claude-sonnet-4", "groq/llama-3.3-70b-versatile"],
        )

        # Record which model was actually used
        ctx.set_tag("actual_model", response.model)
        ctx.record_llm_call(
            model=response.model,
            tokens_in=response.usage.prompt_tokens,
            tokens_out=response.usage.completion_tokens,
        )
    except Exception as e:
        ctx.set_tag("error", str(e))
        raise

Streaming with LiteLLM

async with WaxellContext(agent_name="streaming-litellm") as ctx:
    response = await litellm.acompletion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": query}],
        stream=True,
    )

    content = ""
    async for chunk in response:
        if chunk.choices[0].delta.content:
            content += chunk.choices[0].delta.content

    # Estimate tokens for streaming (LiteLLM may not provide usage in the stream)
    tokens_in = 150  # Estimate based on prompt length
    tokens_out = len(content.split()) * 2  # Rough estimate

    ctx.record_llm_call(
        model="gpt-4o-mini",
        tokens_in=tokens_in,
        tokens_out=tokens_out,
    )
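If the word-count heuristic above is too coarse, a character-based estimate is another common rule of thumb: roughly 4 characters per token for English text. The rough_token_count helper below is a hypothetical sketch, not a LiteLLM API (for exact counts, LiteLLM also ships a token_counter utility you may prefer):

```python
def rough_token_count(text: str) -> int:
    """Approximate token count using the ~4 chars/token rule of thumb
    for English text. Good enough for dashboards, not for billing."""
    return max(1, len(text) // 4)

prompt = "Summarize the latest AI safety research."
print(rough_token_count(prompt))  # 10
```

Either estimate can feed tokens_in / tokens_out when the stream does not report usage.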

Environment Variables

LiteLLM reads API keys from environment:

export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GROQ_API_KEY="gsk_..."

Best Practices

  1. Use provider prefixes -- anthropic/claude-sonnet-4, not just claude-sonnet-4
  2. Record the actual model -- response.model returns what was used (important for fallbacks)
  3. Tag by provider -- Makes filtering easy in the UI
  4. Handle token estimation for streaming -- LiteLLM may not provide usage in streams
  5. Use session_id for comparisons -- Correlate multi-provider runs
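Practices 1-3 can be folded into one small wrapper that tags the trace by provider and records the call using the actual model LiteLLM reports. This is a hedged sketch: record_litellm_response is not part of waxell_observe, just a convenience over the ctx methods shown above:

```python
def record_litellm_response(ctx, response, task: str = "llm_call"):
    """Tag the trace by provider and record the LLM call using the
    actual model reported by LiteLLM (important when fallbacks fire)."""
    model = response.model
    # Provider is the prefix before the first "/"; unprefixed = OpenAI
    provider = model.split("/", 1)[0] if "/" in model else "openai"
    ctx.set_tag("provider", provider)
    ctx.set_tag("actual_model", model)
    ctx.record_llm_call(
        model=model,
        tokens_in=response.usage.prompt_tokens,
        tokens_out=response.usage.completion_tokens,
        task=task,
    )
```

Calling record_litellm_response(ctx, response) after each acompletion keeps tagging consistent across providers.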

Supported Providers

LiteLLM supports 100+ providers. Common ones with cost tracking in Waxell:

| Provider  | Cost Tracking | Notes                  |
|-----------|---------------|------------------------|
| OpenAI    | Full          | GPT-4, GPT-4o, etc.    |
| Anthropic | Full          | Claude models          |
| Groq      | Full          | Llama, Mixtral on Groq |
| Google    | Full          | Gemini models          |
| Mistral   | Full          | Mistral models         |
| Cohere    | Partial       | Command-R              |
| Bedrock   | Full          | AWS Bedrock models     |

Next Steps