LLM Call Tracking

Waxell Observe records every LLM API call made by your agents, capturing the model, token counts, estimated cost, and optional previews of prompts and responses. This data powers dashboards, cost analysis, and governance enforcement.

What Data is Captured

Each LLM call record includes:

| Field | Type | Description |
| --- | --- | --- |
| model | str | Model identifier (e.g., "gpt-4o", "claude-sonnet-4") |
| tokens_in | int | Number of input/prompt tokens |
| tokens_out | int | Number of output/completion tokens |
| cost | float | Cost in USD (auto-estimated if not provided) |
| task | str | Optional label describing the call's purpose |
| prompt_preview | str | Optional preview of the prompt text |
| response_preview | str | Optional preview of the response text |
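
For orientation, the fields above can be pictured as a plain Python dataclass. This is an illustrative sketch only: `LLMCallRecord` is a hypothetical name, not a class exported by waxell_observe, and using `None` for an unset `cost` is an assumption based on the "auto-estimated if not provided" note.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class LLMCallRecord:
    """Hypothetical mirror of the fields in the table above (not a Waxell class)."""

    model: str
    tokens_in: int
    tokens_out: int
    cost: Optional[float] = None            # assumption: None means "estimate it for me"
    task: Optional[str] = None              # optional label
    prompt_preview: Optional[str] = None    # optional preview text
    response_preview: Optional[str] = None  # optional preview text


# Only the three required fields plus a task label are supplied here.
record = LLMCallRecord(
    model="gpt-4o",
    tokens_in=1200,
    tokens_out=350,
    task="answer_question",
)
payload = asdict(record)  # plain dict, convenient for logging or JSON encoding
```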

How to Capture LLM Calls

Call waxell.init() before importing your LLM SDK. This patches the SDK so every call is captured automatically with model, tokens, cost, latency, and previews -- no manual recording needed. Wrap your agent function with @observe to group calls into a tracked run.

import waxell_observe as waxell

waxell.init()  # patches LLM SDKs -- call BEFORE importing them

import openai
client = openai.OpenAI()

@waxell.observe(agent_name="my-agent")
async def answer(query: str) -> str:
    response = client.chat.completions.create(  # auto-captured
        model="gpt-4o",
        messages=[{"role": "user", "content": query}],
    )
    return response.choices[0].message.content

Auto-instrumentation supports OpenAI, Anthropic, LiteLLM, Groq, Mistral, Together, Cohere, Bedrock, Vertex AI, Gemini, and 190+ other libraries. See Auto-Instrumentation for the full list.

LangChain

For LangChain pipelines, init() instruments LangChain LLM calls too. You can also use the explicit callback handler if you need finer control:

from waxell_observe.integrations.langchain import WaxellLangChainHandler

handler = WaxellLangChainHandler(agent_name="my-agent")
result = chain.invoke(input, config={"callbacks": [handler]})
handler.flush_sync(result={"output": result})

Advanced: Manual Recording for Unsupported Providers

If you're calling an LLM API that isn't in the supported provider list (a custom HTTP endpoint, a niche provider, or a model proxy you've built), you can record the call manually with WaxellContext:

from waxell_observe import WaxellContext

async with WaxellContext(agent_name="my-agent") as ctx:
    # Call your custom/unsupported LLM endpoint
    response = await custom_llm_client.complete(
        model="my-custom-model",
        prompt=query,
    )

    # Manually record the call
    ctx.record_llm_call(
        model="my-custom-model",
        tokens_in=response.usage.prompt_tokens,
        tokens_out=response.usage.completion_tokens,
        task="answer_question",
        prompt_preview=query[:500],
        response_preview=response.text[:500],
    )

For everything in the supported list, prefer the @observe pattern above -- it's less code and harder to get wrong.

Supported Models

Cost estimation is built in for the following models. For unlisted models, provide the cost parameter manually or configure a tenant override on the server (see Cost Management).

OpenAI

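| Model | Input (per 1M tokens) | Output (per 1M tokens) |
| --- | --- | --- |
| gpt-4o | $2.50 | $10.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| gpt-4-turbo | $10.00 | $30.00 |
| gpt-4 | $30.00 | $60.00 |
| gpt-3.5-turbo | $0.50 | $1.50 |
| o1 | $15.00 | $60.00 |
| o1-mini | $3.00 | $12.00 |
| o3-mini | $1.10 | $4.40 |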

Anthropic

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
| --- | --- | --- |
| claude-opus-4 | $15.00 | $75.00 |
| claude-sonnet-4 | $3.00 | $15.00 |
| claude-3-5-sonnet | $3.00 | $15.00 |
| claude-3-5-haiku | $0.80 | $4.00 |
| claude-3-haiku | $0.25 | $1.25 |

Google

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
| --- | --- | --- |
| gemini-2.0-flash | $0.10 | $0.40 |
| gemini-1.5-pro | $1.25 | $5.00 |
| gemini-1.5-flash | $0.075 | $0.30 |

Meta (via Groq, Together, etc.)

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
| --- | --- | --- |
| llama-3.3-70b | $0.59 | $0.79 |
| llama-3.1-8b | $0.05 | $0.08 |

Mistral

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
| --- | --- | --- |
| mistral-large | $2.00 | $6.00 |
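
As a worked example of how the per-1M-token prices in the tables above turn into a per-call cost, the sketch below applies the usual formula: input tokens times the input price plus output tokens times the output price, each divided by one million. The names `PRICES` and `estimate_cost` are illustrative and copy a few rows from the tables; the library's internal table and rounding may differ.

```python
# (input_price, output_price) in USD per 1M tokens, copied from the tables above.
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "claude-sonnet-4": (3.00, 15.00),
    "gemini-1.5-flash": (0.075, 0.30),
}


def estimate_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    """Return the estimated USD cost of one call for an exactly-listed model."""
    input_price, output_price = PRICES[model]
    return (tokens_in / 1_000_000) * input_price + (tokens_out / 1_000_000) * output_price


# 1,200 input tokens and 350 output tokens on gpt-4o:
#   1,200 / 1M * $2.50  = $0.0030
#     350 / 1M * $10.00 = $0.0035
cost = estimate_cost("gpt-4o", tokens_in=1_200, tokens_out=350)
print(f"${cost:.4f}")  # about $0.0065
```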

Model Name Matching

The cost estimator uses prefix matching, so versioned model names work automatically:

  • "gpt-4o-2024-08-06" matches "gpt-4o"
  • "claude-3-5-sonnet-20241022" matches "claude-3-5-sonnet"

Longer prefixes are matched first for specificity, so "gpt-4o-mini" matches before "gpt-4o".

If no match is found, cost is 0.0. You can provide the cost explicitly or set up a server-side tenant override.
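
The matching rule described above can be sketched in a few lines. This is a minimal illustration, assuming a simple `str.startswith` check over known prefixes sorted longest first; `KNOWN_PREFIXES`, `match_model`, and `price_or_zero` are hypothetical names, not the library's actual implementation.

```python
from typing import Optional, Tuple

# A few entries from the pricing tables: (input, output) USD per 1M tokens.
KNOWN_PREFIXES = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-3-5-sonnet": (3.00, 15.00),
}


def match_model(name: str) -> Optional[str]:
    """Return the longest known prefix of `name`, or None if nothing matches."""
    # Longest prefixes first, so "gpt-4o-mini" is tried before "gpt-4o".
    for prefix in sorted(KNOWN_PREFIXES, key=len, reverse=True):
        if name.startswith(prefix):
            return prefix
    return None


def price_or_zero(name: str) -> Tuple[float, float]:
    """Look up prices via prefix match; unmatched models price at 0.0."""
    prefix = match_model(name)
    return KNOWN_PREFIXES[prefix] if prefix is not None else (0.0, 0.0)


for name in ("gpt-4o-2024-08-06", "gpt-4o-mini-2024-07-18", "my-custom-model"):
    print(name, "->", match_model(name), price_or_zero(name))
```

Sorting by length is what keeps "gpt-4o-mini-2024-07-18" from being priced as "gpt-4o"; without it, the result would depend on dictionary order.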

How Data Flows

  1. Your agent makes an LLM call (auto-captured by init() or recorded manually for unsupported providers)
  2. Calls are buffered in memory during execution
  3. On flush/context exit, calls are sent to the control plane via POST /api/v1/observe/runs/{run_id}/llm-calls/
  4. The control plane stores the data, recalculates costs if server-side pricing is available, and updates dashboards
  5. Dashboards show per-agent, per-model, and per-run cost breakdowns

Note: Server-side cost calculation takes precedence over client-side estimates. If you configure tenant-level model cost overrides on the control plane, those prices are used instead of the client-side MODEL_COSTS table.
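
The buffering and flush steps in the list above can be pictured with a small in-memory buffer. This is a sketch under assumptions, not the client's real implementation: `CallBuffer`, its `send` callable, and the `{"calls": [...]}` payload shape are hypothetical; only the endpoint path is taken from step 3.

```python
from typing import Callable, List


class CallBuffer:
    """Hypothetical buffer: collect call records, send them once on exit."""

    def __init__(self, run_id: str, send: Callable[[str, dict], None]):
        self.run_id = run_id
        self._send = send            # stand-in for an HTTP POST helper
        self._calls: List[dict] = []

    def record(self, **call) -> None:
        # Step 2: keep the record in memory while the agent is running.
        self._calls.append(call)

    def flush(self) -> None:
        # Step 3: send everything buffered so far in one request.
        if not self._calls:
            return
        path = f"/api/v1/observe/runs/{self.run_id}/llm-calls/"
        self._send(path, {"calls": self._calls})  # payload shape is an assumption
        self._calls = []

    def __enter__(self) -> "CallBuffer":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.flush()   # flush on context exit, even if the body raised
        return False   # do not swallow exceptions


# Demo with a fake sender that just remembers what would have been posted.
sent = []
with CallBuffer("run-123", send=lambda path, body: sent.append((path, body))) as buf:
    buf.record(model="gpt-4o", tokens_in=1200, tokens_out=350)
    buf.record(model="gpt-4o-mini", tokens_in=400, tokens_out=90)
```

Batching on exit means a run with many LLM calls costs one request to the control plane rather than one per call, at the price of losing buffered records if the process dies before the flush.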

Next Steps