LLM Call Tracking

Waxell Observe records every LLM API call made by your agents, capturing the model, token counts, estimated cost, and optional previews of prompts and responses. This data powers dashboards, cost analysis, and governance enforcement.

What Data is Captured

Each LLM call record includes:

| Field | Type | Description |
| --- | --- | --- |
| model | str | Model identifier (e.g., "gpt-4o", "claude-sonnet-4") |
| tokens_in | int | Number of input/prompt tokens |
| tokens_out | int | Number of output/completion tokens |
| cost | float | Cost in USD (auto-estimated if not provided) |
| task | str | Optional label describing the call's purpose |
| prompt_preview | str | Optional preview of the prompt text |
| response_preview | str | Optional preview of the response text |
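The record shape above can be sketched as a small dataclass; this is an illustrative mirror of the documented fields, not the library's actual class.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LLMCallRecord:
    """Illustrative shape of a single LLM call record (sketch only)."""
    model: str                               # e.g. "gpt-4o"
    tokens_in: int                           # input/prompt tokens
    tokens_out: int                          # output/completion tokens
    cost: float = 0.0                        # USD; auto-estimated if not provided
    task: Optional[str] = None               # optional purpose label
    prompt_preview: Optional[str] = None     # optional prompt text preview
    response_preview: Optional[str] = None   # optional response text preview
```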

Two Ways to Capture

Automatic: LangChain Handler

The WaxellLangChainHandler intercepts LangChain's callback system and automatically extracts:

  • Model name from the serialized LLM configuration
  • Token counts from LangChain's token_usage in the LLM response
  • Prompt preview (first 500 characters of the prompt)
  • Response preview (first 500 characters of the generated text)
  • Cost estimated automatically from token counts and model pricing

from waxell_observe.integrations.langchain import WaxellLangChainHandler

handler = WaxellLangChainHandler(agent_name="my-agent")
result = chain.invoke(input, config={"callbacks": [handler]})
handler.flush_sync(result={"output": result})

No manual recording is needed: every LLM call in the chain is captured.

Manual: Context Methods

For non-LangChain frameworks, record LLM calls explicitly using record_llm_call:

from waxell_observe import WaxellContext

async with WaxellContext(agent_name="my-agent") as ctx:
    # Call your LLM
    response = await openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": query}],
    )

    # Record the call
    ctx.record_llm_call(
        model="gpt-4o",
        tokens_in=response.usage.prompt_tokens,
        tokens_out=response.usage.completion_tokens,
        task="answer_question",
        prompt_preview=query[:500],
        response_preview=response.choices[0].message.content[:500],
    )

Or with the decorator and context injection:

from waxell_observe import waxell_agent

@waxell_agent(agent_name="my-agent")
async def answer(query: str, waxell_ctx=None) -> str:
    response = await openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": query}],
    )

    if waxell_ctx:
        waxell_ctx.record_llm_call(
            model="gpt-4o",
            tokens_in=response.usage.prompt_tokens,
            tokens_out=response.usage.completion_tokens,
            task="answer_question",
        )

    return response.choices[0].message.content

Supported Models

Cost estimation is built in for the following models. For unlisted models, provide the cost parameter manually or configure a tenant override on the server (see Cost Management).

OpenAI

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
| --- | --- | --- |
| gpt-4o | $2.50 | $10.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| gpt-4-turbo | $10.00 | $30.00 |
| gpt-4 | $30.00 | $60.00 |
| gpt-3.5-turbo | $0.50 | $1.50 |
| o1 | $15.00 | $60.00 |
| o1-mini | $3.00 | $12.00 |
| o3-mini | $1.10 | $4.40 |

Anthropic

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
| --- | --- | --- |
| claude-opus-4 | $15.00 | $75.00 |
| claude-sonnet-4 | $3.00 | $15.00 |
| claude-3-5-sonnet | $3.00 | $15.00 |
| claude-3-5-haiku | $0.80 | $4.00 |
| claude-3-haiku | $0.25 | $1.25 |

Google

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
| --- | --- | --- |
| gemini-2.0-flash | $0.10 | $0.40 |
| gemini-1.5-pro | $1.25 | $5.00 |
| gemini-1.5-flash | $0.075 | $0.30 |

Meta (via Groq, Together, etc.)

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
| --- | --- | --- |
| llama-3.3-70b | $0.59 | $0.79 |
| llama-3.1-8b | $0.05 | $0.08 |

Mistral

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
| --- | --- | --- |
| mistral-large | $2.00 | $6.00 |

Model Name Matching

The cost estimator uses prefix matching, so versioned model names work automatically:

  • "gpt-4o-2024-08-06" matches "gpt-4o"
  • "claude-3-5-sonnet-20241022" matches "claude-3-5-sonnet"

Longer prefixes are matched first for specificity, so "gpt-4o-mini" matches before "gpt-4o".

If no match is found, cost is 0.0. You can provide the cost explicitly or set up a server-side tenant override.
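The matching and pricing logic can be sketched as follows. The entries and function name here are illustrative, using a few prices from the tables above; the real MODEL_COSTS table lives inside the library and may differ.

```python
# Sketch of prefix-based cost estimation (illustrative, not the library's code).
MODEL_COSTS = {
    # model prefix: (input $/1M tokens, output $/1M tokens)
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-3-5-sonnet": (3.00, 15.00),
}

def estimate_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    # Longest prefix wins, so "gpt-4o-mini-2024-07-18" matches "gpt-4o-mini"
    # rather than the shorter "gpt-4o".
    for prefix in sorted(MODEL_COSTS, key=len, reverse=True):
        if model.startswith(prefix):
            in_price, out_price = MODEL_COSTS[prefix]
            return (tokens_in / 1_000_000) * in_price + (tokens_out / 1_000_000) * out_price
    return 0.0  # unknown model: pass cost explicitly or use a server-side override
```

For example, a "gpt-4o-2024-08-06" call with 1,000 input and 500 output tokens estimates to (1,000 / 1M) × $2.50 + (500 / 1M) × $10.00 = $0.0075.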

How Data Flows

  1. Your agent records LLM calls (manually or via LangChain handler)
  2. Calls are buffered in memory during execution
  3. On flush/context exit, calls are sent to the control plane via POST /api/v1/observe/runs/{run_id}/llm-calls/
  4. The control plane stores the data, recalculates costs if server-side pricing is available, and updates dashboards
  5. Dashboards show per-agent, per-model, and per-run cost breakdowns
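Steps 2 and 3 above (buffer in memory, then send on flush) can be sketched as below. The endpoint path comes from this page, but the payload shape, base URL, and auth header are assumptions for illustration.

```python
import json
import urllib.request

class LLMCallBuffer:
    """Sketch of the buffer-then-flush flow (payload shape and auth assumed)."""

    def __init__(self, base_url: str, run_id: str, api_key: str):
        self.base_url = base_url
        self.run_id = run_id
        self.api_key = api_key
        self.calls: list = []  # step 2: calls buffered in memory during execution

    def record(self, **call) -> None:
        self.calls.append(call)

    def flush(self) -> None:
        # Step 3: send all buffered calls to the control plane in one request.
        url = f"{self.base_url}/api/v1/observe/runs/{self.run_id}/llm-calls/"
        req = urllib.request.Request(
            url,
            data=json.dumps({"llm_calls": self.calls}).encode(),
            headers={
                "Content-Type": "application/json",
                "Authorization": f"Bearer {self.api_key}",  # assumed auth scheme
            },
            method="POST",
        )
        urllib.request.urlopen(req)  # step 4: control plane stores and recalculates
        self.calls.clear()
```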
Note: Server-side cost calculation takes precedence over client-side estimates. If you configure tenant-level model cost overrides on the control plane, those prices are used instead of the client-side MODEL_COSTS table.

Next Steps