LiteLLM Integration
Use LiteLLM's unified API to call multiple LLM providers (OpenAI, Anthropic, Groq, etc.) with consistent observability.
What is LiteLLM?
LiteLLM provides a unified interface to 100+ LLM providers. Instead of learning each provider's SDK, you use one API:
import litellm
# OpenAI
response = litellm.completion(model="gpt-4o", messages=[...])
# Anthropic
response = litellm.completion(model="anthropic/claude-sonnet-4", messages=[...])
# Groq
response = litellm.completion(model="groq/llama-3.3-70b-versatile", messages=[...])
Quick Start
import waxell_observe
waxell_observe.init(api_key="wax_sk_...", api_url="https://waxell.dev")
from waxell_observe import WaxellContext, generate_session_id
import litellm
async with WaxellContext(
agent_name="litellm-agent",
workflow_name="multi-provider",
session_id=generate_session_id(),
) as ctx:
response = await litellm.acompletion(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Hello!"}]
)
ctx.record_llm_call(
model=response.model,
tokens_in=response.usage.prompt_tokens,
tokens_out=response.usage.completion_tokens,
)
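The block above uses top-level `await`/`async with`, which works in a notebook or inside an already-async app. In a plain script, one way to run it is to wrap the block in an async function and call `asyncio.run`, for example:
import asyncio
import litellm
import waxell_observe
from waxell_observe import WaxellContext, generate_session_id

waxell_observe.init(api_key="wax_sk_...", api_url="https://waxell.dev")

async def main():
    async with WaxellContext(
        agent_name="litellm-agent",
        workflow_name="multi-provider",
        session_id=generate_session_id(),
    ) as ctx:
        response = await litellm.acompletion(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": "Hello!"}],
        )
        ctx.record_llm_call(
            model=response.model,
            tokens_in=response.usage.prompt_tokens,
            tokens_out=response.usage.completion_tokens,
        )

asyncio.run(main())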
Multi-Provider Comparison
Compare responses across providers in a single trace:
import waxell_observe
waxell_observe.init(api_key="wax_sk_...")
from waxell_observe import WaxellContext, generate_session_id
import litellm
MODELS = [
{"model": "gpt-4o-mini", "provider": "OpenAI", "tier": "fast"},
{"model": "anthropic/claude-sonnet-4", "provider": "Anthropic", "tier": "premium"},
{"model": "groq/llama-3.3-70b-versatile", "provider": "Groq", "tier": "open-source"},
]
async def compare_providers(query: str):
session = generate_session_id()
async with WaxellContext(
agent_name="litellm-demo",
workflow_name="multi-provider",
inputs={"query": query, "models": [m["model"] for m in MODELS]},
session_id=session,
) as ctx:
ctx.set_tag("demo", "litellm")
ctx.set_tag("providers", "openai,anthropic,groq")
ctx.set_metadata("num_models", len(MODELS))
results = []
for i, config in enumerate(MODELS, 1):
model = config["model"]
provider = config["provider"]
tier = config["tier"]
messages = [
{"role": "system", "content": f"Provide a {tier}-tier analysis."},
{"role": "user", "content": query},
]
response = await litellm.acompletion(model=model, messages=messages)
content = response.choices[0].message.content
tokens_in = response.usage.prompt_tokens
tokens_out = response.usage.completion_tokens
# Record step for this provider
ctx.record_step(f"call_{provider.lower()}", output={
"model": model,
"provider": provider,
"tier": tier,
"tokens_in": tokens_in,
"tokens_out": tokens_out,
"content_length": len(content),
})
# Record LLM call for cost tracking
ctx.record_llm_call(
model=response.model,
tokens_in=tokens_in,
tokens_out=tokens_out,
task=f"call_{provider.lower()}",
prompt_preview=query[:200],
response_preview=content[:200],
)
results.append({
"provider": provider,
"model": model,
"tier": tier,
"content": content,
"tokens": tokens_in + tokens_out,
})
# Comparison step
comparison = {
r["provider"]: {
"model": r["model"],
"tokens": r["tokens"],
"content_length": len(r["content"]),
}
for r in results
}
ctx.record_step("compare_providers", output=comparison)
ctx.set_result({
"comparison": comparison,
"models_tested": len(MODELS),
})
return results
# Run comparison
results = await compare_providers("Compare AI safety approaches")
Model Name Conventions
LiteLLM uses prefixes to identify providers:
| Provider | Model Format | Example |
|---|---|---|
| OpenAI | model-name | gpt-4o, gpt-4o-mini |
| Anthropic | anthropic/model | anthropic/claude-sonnet-4 |
| Groq | groq/model | groq/llama-3.3-70b-versatile |
| Bedrock | bedrock/model | bedrock/anthropic.claude-3 |
| Azure | azure/deployment | azure/gpt-4-deployment |
| Cohere | cohere/model | cohere/command-r |
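If you tag traces by provider, as in the comparison example above, a small helper can derive the provider from the model string. This helper is illustrative, not part of LiteLLM or Waxell:
def provider_from_model(model: str) -> str:
    """Derive the provider from a LiteLLM model string (illustrative helper)."""
    if "/" in model:
        # Prefixed models: "anthropic/claude-sonnet-4" -> "anthropic"
        return model.split("/", 1)[0]
    # Per the table above, un-prefixed names are OpenAI models
    return "openai"

# Usage inside a trace:
# ctx.set_tag("provider", provider_from_model("groq/llama-3.3-70b-versatile"))  # "groq"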
Recording Costs
LiteLLM normalizes the response format, so token fields are consistent:
response = await litellm.acompletion(model="groq/llama-3.3-70b-versatile", messages=[...])
ctx.record_llm_call(
model=response.model, # Returns the actual model name
tokens_in=response.usage.prompt_tokens,
tokens_out=response.usage.completion_tokens,
# Cost is auto-estimated based on model
)
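Waxell estimates cost from the model name and token counts. If you also want LiteLLM's own figure on the trace, one option (a sketch; `litellm_cost_usd` is just an illustrative metadata key) is to compute it with `litellm.completion_cost` and attach it as metadata:
import litellm

response = await litellm.acompletion(model="gpt-4o-mini", messages=messages)

# LiteLLM's own per-call cost estimate (USD), computed from the response
cost_usd = litellm.completion_cost(completion_response=response)

ctx.record_llm_call(
    model=response.model,
    tokens_in=response.usage.prompt_tokens,
    tokens_out=response.usage.completion_tokens,
)
ctx.set_metadata("litellm_cost_usd", cost_usd)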
Fallback Chains
Use LiteLLM's fallback feature with observability:
async with WaxellContext(agent_name="fallback-agent") as ctx:
try:
response = await litellm.acompletion(
model="gpt-4o",
messages=messages,
fallbacks=["anthropic/claude-sonnet-4", "groq/llama-3.3-70b-versatile"],
)
# Record which model was actually used
ctx.set_tag("actual_model", response.model)
ctx.record_llm_call(
model=response.model,
tokens_in=response.usage.prompt_tokens,
tokens_out=response.usage.completion_tokens,
)
except Exception as e:
ctx.set_tag("error", str(e))
raise
Streaming with LiteLLM
async with WaxellContext(agent_name="streaming-litellm") as ctx:
response = await litellm.acompletion(
model="gpt-4o-mini",
messages=[{"role": "user", "content": query}],
stream=True,
)
content = ""
async for chunk in response:
if chunk.choices[0].delta.content:
content += chunk.choices[0].delta.content
# Estimate tokens for streaming (LiteLLM may not provide usage in stream)
tokens_in = 150 # Estimate based on prompt
tokens_out = len(content.split()) * 2 # Rough estimate
ctx.record_llm_call(
model="gpt-4o-mini",
tokens_in=tokens_in,
tokens_out=tokens_out,
)
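For a tighter estimate than word counts, LiteLLM's `token_counter` helper can count tokens with the model's tokenizer where available. A sketch, reusing `query` and the accumulated `content` from the streaming example above:
import litellm

# Tokenizer-based counts: prompt tokens from the request messages,
# completion tokens from the accumulated streamed text
tokens_in = litellm.token_counter(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": query}],
)
tokens_out = litellm.token_counter(model="gpt-4o-mini", text=content)

ctx.record_llm_call(
    model="gpt-4o-mini",
    tokens_in=tokens_in,
    tokens_out=tokens_out,
)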
Environment Variables
LiteLLM reads API keys from environment:
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GROQ_API_KEY="gsk_..."
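Before running a multi-provider comparison, it can help to fail fast if a key is missing. A minimal check in Python:
import os

REQUIRED_KEYS = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GROQ_API_KEY"]

missing = [key for key in REQUIRED_KEYS if not os.environ.get(key)]
if missing:
    raise RuntimeError(f"Missing API keys: {', '.join(missing)}")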
Best Practices
- Use provider prefixes -- `anthropic/claude-sonnet-4`, not just `claude-sonnet-4`
- Record the actual model -- `response.model` returns what was used (important for fallbacks)
- Tag by provider -- Makes filtering easy in the UI
- Handle token estimation for streaming -- LiteLLM may not provide usage in streams
- Use session_id for comparisons -- Correlate multi-provider runs (see the sketch after this list)
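For the last point, correlation comes from reusing one session_id across runs. A minimal sketch, assuming each run otherwise records its calls as in the examples above:
from waxell_observe import WaxellContext, generate_session_id

session = generate_session_id()

# Two separate traces share the same session_id, so they can be
# viewed together as one multi-provider comparison
async with WaxellContext(agent_name="litellm-agent", session_id=session) as ctx:
    ...  # call gpt-4o-mini and ctx.record_llm_call(...)

async with WaxellContext(agent_name="litellm-agent", session_id=session) as ctx:
    ...  # call anthropic/claude-sonnet-4 and ctx.record_llm_call(...)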
Supported Providers
LiteLLM supports 100+ providers. Common ones with cost tracking in Waxell:
| Provider | Cost Tracking | Notes |
|---|---|---|
| OpenAI | Full | GPT-4, GPT-4o, etc. |
| Anthropic | Full | Claude models |
| Groq | Full | Llama, Mixtral on Groq |
| Google | Full | Gemini models |
| Mistral | Full | Mistral models |
| Cohere | Partial | Command-R |
| Bedrock | Full | AWS Bedrock models |
Next Steps
- Multi-Agent -- Coordinate agents across providers
- Streaming Integration -- Detailed streaming patterns
- Cost Management -- Track costs across providers