LLM Call Tracking
Waxell Observe records every LLM API call made by your agents, capturing the model, token counts, estimated cost, and optional previews of prompts and responses. This data powers dashboards, cost analysis, and governance enforcement.
What Data is Captured
Each LLM call record includes:
| Field | Type | Description |
|---|---|---|
| `model` | str | Model identifier (e.g., `"gpt-4o"`, `"claude-sonnet-4"`) |
| `tokens_in` | int | Number of input/prompt tokens |
| `tokens_out` | int | Number of output/completion tokens |
| `cost` | float | Cost in USD (auto-estimated if not provided) |
| `task` | str | Optional label describing the call's purpose |
| `prompt_preview` | str | Optional preview of the prompt text |
| `response_preview` | str | Optional preview of the response text |
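The table above can be sketched as a plain data class. `LLMCallRecord` is a hypothetical name for illustration, not the SDK's actual type; it simply mirrors the documented fields:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LLMCallRecord:
    """Hypothetical shape of one captured LLM call (mirrors the field table)."""
    model: str
    tokens_in: int
    tokens_out: int
    cost: float = 0.0               # auto-estimated if not provided
    task: Optional[str] = None
    prompt_preview: Optional[str] = None
    response_preview: Optional[str] = None

# Example record: 1,200 prompt tokens and 350 completion tokens on gpt-4o
record = LLMCallRecord(
    model="gpt-4o",
    tokens_in=1200,
    tokens_out=350,
    cost=0.0065,
    task="summarize",
)
```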
Two Ways to Capture
Automatic: LangChain Handler
The WaxellLangChainHandler intercepts LangChain's callback system and automatically extracts:
- Model name from the serialized LLM configuration
- Token counts from LangChain's `token_usage` in the LLM response
- Prompt preview (first 500 characters of the prompt)
- Response preview (first 500 characters of the generated text)
- Cost estimated automatically from token counts and model pricing
```python
from waxell_observe.integrations.langchain import WaxellLangChainHandler

handler = WaxellLangChainHandler(agent_name="my-agent")
result = chain.invoke(input, config={"callbacks": [handler]})
handler.flush_sync(result={"output": result})
```
No manual recording needed -- every LLM call in the chain is captured.
Manual: Context Methods
For non-LangChain frameworks, record LLM calls explicitly using `record_llm_call`:
```python
from waxell_observe import WaxellContext

async with WaxellContext(agent_name="my-agent") as ctx:
    # Call your LLM
    response = await openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": query}],
    )

    # Record the call
    ctx.record_llm_call(
        model="gpt-4o",
        tokens_in=response.usage.prompt_tokens,
        tokens_out=response.usage.completion_tokens,
        task="answer_question",
        prompt_preview=query[:500],
        response_preview=response.choices[0].message.content[:500],
    )
```
Or with the decorator and context injection:
```python
from waxell_observe import waxell_agent

@waxell_agent(agent_name="my-agent")
async def answer(query: str, waxell_ctx=None) -> str:
    response = await openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": query}],
    )
    if waxell_ctx:
        waxell_ctx.record_llm_call(
            model="gpt-4o",
            tokens_in=response.usage.prompt_tokens,
            tokens_out=response.usage.completion_tokens,
            task="answer_question",
        )
    return response.choices[0].message.content
```
Supported Models
Cost estimation is built in for the following models. For unlisted models, provide the `cost` parameter manually or configure a tenant override on the server (see Cost Management).
OpenAI
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| gpt-4o | $2.50 | $10.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| gpt-4-turbo | $10.00 | $30.00 |
| gpt-4 | $30.00 | $60.00 |
| gpt-3.5-turbo | $0.50 | $1.50 |
| o1 | $15.00 | $60.00 |
| o1-mini | $3.00 | $12.00 |
| o3-mini | $1.10 | $4.40 |
Anthropic
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| claude-opus-4 | $15.00 | $75.00 |
| claude-sonnet-4 | $3.00 | $15.00 |
| claude-3-5-sonnet | $3.00 | $15.00 |
| claude-3-5-haiku | $0.80 | $4.00 |
| claude-3-haiku | $0.25 | $1.25 |
Google
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| gemini-2.0-flash | $0.10 | $0.40 |
| gemini-1.5-pro | $1.25 | $5.00 |
| gemini-1.5-flash | $0.075 | $0.30 |
Meta (via Groq, Together, etc.)
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| llama-3.3-70b | $0.59 | $0.79 |
| llama-3.1-8b | $0.05 | $0.08 |
Mistral
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| mistral-large | $2.00 | $6.00 |
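The arithmetic behind these tables is linear per-token pricing. As a minimal sketch (the helper name `estimate_cost` is an assumption, not the SDK's actual API):

```python
def estimate_cost(tokens_in: int, tokens_out: int,
                  input_per_m: float, output_per_m: float) -> float:
    """Cost in USD given per-1M-token prices from the tables above."""
    return (tokens_in / 1_000_000) * input_per_m + (tokens_out / 1_000_000) * output_per_m

# gpt-4o: $2.50 input / $10.00 output per 1M tokens
cost = estimate_cost(1_000, 500, 2.50, 10.00)  # 0.0025 + 0.0050 = $0.0075
```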
Model Name Matching
The cost estimator uses prefix matching, so versioned model names work automatically:
"gpt-4o-2024-08-06"matches"gpt-4o""claude-3-5-sonnet-20241022"matches"claude-3-5-sonnet"
Longer prefixes are matched first for specificity, so "gpt-4o-mini" matches before "gpt-4o".
If no match is found, cost is 0.0. You can provide the cost explicitly or set up a server-side tenant override.
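Longest-prefix-first lookup can be sketched as follows. The `MODEL_COSTS` excerpt and the `lookup_pricing` helper are illustrative assumptions, not the SDK's internals:

```python
# Excerpt of a pricing table: (input, output) USD per 1M tokens
MODEL_COSTS = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-3-5-sonnet": (3.00, 15.00),
}

def lookup_pricing(model: str):
    # Sort prefixes longest-first so "gpt-4o-mini-..." never
    # falls through to the shorter "gpt-4o" entry.
    for prefix in sorted(MODEL_COSTS, key=len, reverse=True):
        if model.startswith(prefix):
            return MODEL_COSTS[prefix]
    return None  # unknown model: caller treats cost as 0.0
```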
How Data Flows
- Your agent records LLM calls (manually or via LangChain handler)
- Calls are buffered in memory during execution
- On flush/context exit, calls are sent to the control plane via `POST /api/v1/observe/runs/{run_id}/llm-calls/`
- The control plane stores the data, recalculates costs if server-side pricing is available, and updates dashboards
- Dashboards show per-agent, per-model, and per-run cost breakdowns
Server-side cost calculation takes precedence over client-side estimates. If you configure tenant-level model cost overrides on the control plane, those prices are used instead of the client-side `MODEL_COSTS` table.
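The flush step above uses the documented endpoint. As a rough sketch of the request the SDK builds for you (the `"llm_calls"` payload key and bearer-token auth are assumptions; only the URL path comes from the docs):

```python
import json
import urllib.request

def build_flush_request(base_url: str, run_id: str,
                        calls: list, token: str) -> urllib.request.Request:
    """Build the POST that ships buffered LLM calls to the control plane.

    The SDK normally constructs and sends this on flush/context exit;
    the payload shape here is a hypothetical illustration.
    """
    return urllib.request.Request(
        f"{base_url}/api/v1/observe/runs/{run_id}/llm-calls/",
        data=json.dumps({"llm_calls": calls}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )

req = build_flush_request(
    "https://observe.example.com", "run-123",
    [{"model": "gpt-4o", "tokens_in": 1200, "tokens_out": 350}],
    "my-api-token",
)
```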
Next Steps
- Cost Management -- Budget enforcement and tenant overrides
- Decorator Pattern -- Quick integration with `@waxell_agent`
- LangChain Integration -- Automatic capture with LangChain