LLM Call Tracking
Waxell Observe records every LLM API call made by your agents, capturing the model, token counts, estimated cost, and optional previews of prompts and responses. This data powers dashboards, cost analysis, and governance enforcement.
What Data is Captured
Each LLM call record includes:
| Field | Type | Description |
|---|---|---|
| model | str | Model identifier (e.g., "gpt-4o", "claude-sonnet-4") |
| tokens_in | int | Number of input/prompt tokens |
| tokens_out | int | Number of output/completion tokens |
| cost | float | Cost in USD (auto-estimated if not provided) |
| task | str | Optional label describing the call's purpose |
| prompt_preview | str | Optional preview of the prompt text |
| response_preview | str | Optional preview of the response text |
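To make the shape of a record concrete, the fields above can be mirrored in a small dataclass. This is only an illustrative sketch: `LLMCallRecord` is a hypothetical name, not a class exported by waxell_observe, and the `None` defaults for the optional fields are assumptions.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class LLMCallRecord:
    """Hypothetical mirror of the fields listed in the table above."""

    model: str                              # e.g. "gpt-4o"
    tokens_in: int                          # input/prompt tokens
    tokens_out: int                         # output/completion tokens
    cost: Optional[float] = None            # USD; None stands for "not provided"
    task: Optional[str] = None              # optional purpose label
    prompt_preview: Optional[str] = None    # optional prompt excerpt
    response_preview: Optional[str] = None  # optional response excerpt


record = LLMCallRecord(
    model="gpt-4o",
    tokens_in=1200,
    tokens_out=350,
    task="answer_question",
)
print(asdict(record))
```

Only the first three fields are required in this sketch, matching the table's description of the rest as optional or auto-estimated.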
How to Capture LLM Calls
Recommended: Auto-Instrumentation with @observe
Call waxell.init() before importing your LLM SDK. This patches the SDK so every call is captured automatically with model, tokens, cost, latency, and previews -- no manual recording needed. Wrap your agent function with @observe to group calls into a tracked run.
import waxell_observe as waxell

waxell.init()  # patches LLM SDKs -- call BEFORE importing them

import openai

client = openai.OpenAI()

@waxell.observe(agent_name="my-agent")
async def answer(query: str) -> str:
    response = client.chat.completions.create(  # auto-captured
        model="gpt-4o",
        messages=[{"role": "user", "content": query}],
    )
    return response.choices[0].message.content
Auto-instrumentation supports OpenAI, Anthropic, LiteLLM, Groq, Mistral, Together, Cohere, Bedrock, Vertex AI, Gemini, and 190+ other libraries. See Auto-Instrumentation for the full list.
LangChain
For LangChain pipelines, init() instruments LangChain LLM calls too. You can also use the explicit callback handler if you need finer control:
from waxell_observe.integrations.langchain import WaxellLangChainHandler
handler = WaxellLangChainHandler(agent_name="my-agent")
result = chain.invoke(input, config={"callbacks": [handler]})
handler.flush_sync(result={"output": result})
Advanced: Manual Recording for Unsupported Providers
If you're calling an LLM API that isn't in the supported provider list (a custom HTTP endpoint, a niche provider, or a model proxy you've built), you can record the call manually with WaxellContext:
from waxell_observe import WaxellContext

# Run this inside an async function; custom_llm_client and query are placeholders.
async with WaxellContext(agent_name="my-agent") as ctx:
    # Call your custom/unsupported LLM endpoint
    response = await custom_llm_client.complete(
        model="my-custom-model",
        prompt=query,
    )
    # Manually record the call
    ctx.record_llm_call(
        model="my-custom-model",
        tokens_in=response.usage.prompt_tokens,
        tokens_out=response.usage.completion_tokens,
        task="answer_question",
        prompt_preview=query[:500],
        response_preview=response.text[:500],
    )
For everything in the supported list, prefer the @observe pattern above -- it's less code and harder to get wrong.
Supported Models
Cost estimation is built in for the following models. For unlisted models, provide the cost parameter manually or configure a tenant override on the server (see Cost Management).
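Per-1M-token prices translate into a per-call cost with simple arithmetic: multiply each token count by its price and divide by one million. The sketch below shows that calculation, which could also produce a value to pass as the cost parameter for an unlisted model. The helper name `estimate_cost` is hypothetical and not part of the library; the example rates are the gpt-4o prices from the OpenAI table.

```python
def estimate_cost(
    tokens_in: int,
    tokens_out: int,
    input_per_million: float,
    output_per_million: float,
) -> float:
    """Return the USD cost of one call given per-1M-token prices."""
    return (tokens_in * input_per_million + tokens_out * output_per_million) / 1_000_000


# 1,200 input tokens at $2.50/1M plus 350 output tokens at $10.00/1M.
call_cost = estimate_cost(
    tokens_in=1_200,
    tokens_out=350,
    input_per_million=2.50,
    output_per_million=10.00,
)
print(f"${call_cost:.4f}")  # prints "$0.0065"
```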
OpenAI
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| gpt-4o | $2.50 | $10.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| gpt-4-turbo | $10.00 | $30.00 |
| gpt-4 | $30.00 | $60.00 |
| gpt-3.5-turbo | $0.50 | $1.50 |
| o1 | $15.00 | $60.00 |
| o1-mini | $3.00 | $12.00 |
| o3-mini | $1.10 | $4.40 |
Anthropic
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| claude-opus-4 | $15.00 | $75.00 |
| claude-sonnet-4 | $3.00 | $15.00 |
| claude-3-5-sonnet | $3.00 | $15.00 |
| claude-3-5-haiku | $0.80 | $4.00 |
| claude-3-haiku | $0.25 | $1.25 |
Google
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| gemini-2.0-flash | $0.10 | $0.40 |
| gemini-1.5-pro | $1.25 | $5.00 |
| gemini-1.5-flash | $0.075 | $0.30 |
Meta (via Groq, Together, etc.)
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| llama-3.3-70b | $0.59 | $0.79 |
| llama-3.1-8b | $0.05 | $0.08 |
Mistral
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| mistral-large | $2.00 | $6.00 |
Model Name Matching
The cost estimator uses prefix matching, so versioned model names work automatically:
- "gpt-4o-2024-08-06" matches "gpt-4o"
- "claude-3-5-sonnet-20241022" matches "claude-3-5-sonnet"
Longer prefixes are matched first for specificity, so "gpt-4o-mini" matches before "gpt-4o".
If no prefix matches, the estimated cost is 0.0. To avoid recording a zero cost for such models, provide the cost explicitly or set up a server-side tenant override.
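The matching rules described here can be sketched as a longest-prefix lookup. This is a minimal illustration of the technique, not the library's actual implementation: `PRICES`, `match_model`, and `cost_for_model` are hypothetical names, and the price dictionary is only an excerpt of the tables above.

```python
from typing import Optional

# Excerpt of (input, output) USD prices per 1M tokens from the tables above.
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "gpt-4": (30.00, 60.00),
    "claude-3-5-sonnet": (3.00, 15.00),
}


def match_model(name: str) -> Optional[str]:
    """Return the longest known prefix of `name`, or None if nothing matches."""
    # Trying longer prefixes first keeps "gpt-4o-mini" from resolving to "gpt-4o".
    for prefix in sorted(PRICES, key=len, reverse=True):
        if name.startswith(prefix):
            return prefix
    return None


def cost_for_model(name: str, tokens_in: int, tokens_out: int) -> float:
    """Estimate USD cost; models with no matching prefix fall back to 0.0."""
    key = match_model(name)
    if key is None:
        return 0.0
    price_in, price_out = PRICES[key]
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000


print(match_model("gpt-4o-2024-08-06"))       # prints "gpt-4o"
print(match_model("gpt-4o-mini-2024-07-18"))  # prints "gpt-4o-mini"
print(cost_for_model("my-custom-model", 1000, 1000))  # prints 0.0
```

Sorting by prefix length on every call is fine for a table this small; a real implementation might precompute the ordering once.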
How Data Flows
- Your agent makes an LLM call (auto-captured by init() or recorded manually for unsupported providers)
- Calls are buffered in memory during execution
- On flush/context exit, calls are sent to the control plane via POST /api/v1/observe/runs/{run_id}/llm-calls/
- The control plane stores the data, recalculates costs if server-side pricing is available, and updates dashboards
- Dashboards show per-agent, per-model, and per-run cost breakdowns
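The buffer-then-flush part of this flow can be sketched with the standard library alone. Everything here other than the endpoint path is an assumption made for illustration: `CallBuffer`, `BASE_URL`, and the `{"calls": [...]}` payload shape are hypothetical, and the request is only constructed, never sent.

```python
import json
from typing import Any, Dict, List
from urllib.request import Request

BASE_URL = "https://waxell.example.com"  # hypothetical control-plane host


class CallBuffer:
    """Hypothetical in-memory buffer for the LLM call records of one run."""

    def __init__(self, run_id: str) -> None:
        self.run_id = run_id
        self.calls: List[Dict[str, Any]] = []

    def record(self, **call: Any) -> None:
        # During execution the call is only kept in memory; nothing is sent yet.
        self.calls.append(call)

    def build_flush_request(self) -> Request:
        # On flush, package every buffered call into a single POST request.
        url = f"{BASE_URL}/api/v1/observe/runs/{self.run_id}/llm-calls/"
        body = json.dumps({"calls": self.calls}).encode("utf-8")  # assumed payload shape
        request = Request(
            url,
            data=body,
            method="POST",
            headers={"Content-Type": "application/json"},
        )
        self.calls = []  # the buffer is empty once its contents are packaged
        return request


buffer = CallBuffer(run_id="run-123")
buffer.record(model="gpt-4o", tokens_in=1200, tokens_out=350)
request = buffer.build_flush_request()
print(request.get_method(), request.full_url)
```

Batching calls into one request per flush keeps network traffic off the agent's critical path, which is presumably why the real client buffers as well.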
Server-side cost calculation takes precedence over client-side estimates. If you configure tenant-level model cost overrides on the control plane, those prices are used instead of the client-side MODEL_COSTS table.
Next Steps
- Cost Management -- Budget enforcement and tenant overrides
- Decorator Pattern -- Quick integration with @waxell_agent
- LangChain Integration -- Automatic capture with LangChain