Cost Management

Waxell Observe provides a layered cost management system: client-side estimation for immediate visibility, server-side calculation for accuracy, and tenant-level overrides for custom pricing.

How Cost Estimation Works

Client-Side: estimate_cost()

The estimate_cost function provides instant cost estimates based on built-in pricing data:

from waxell_observe.cost import estimate_cost

cost = estimate_cost("gpt-4o", tokens_in=1000, tokens_out=500)
print(f"Estimated cost: ${cost:.6f}") # $0.007500

Parameters:

Parameter     Type   Description
model         str    Model name or prefix
tokens_in     int    Input/prompt token count
tokens_out    int    Output/completion token count

Returns: float -- estimated cost in USD. Returns 0.0 for unknown models.

The function uses a two-step matching strategy:

  1. Exact match -- looks for the model name in the pricing table
  2. Prefix match -- tries longer prefixes first for versioned model names (e.g., "gpt-4o-2024-08-06" matches "gpt-4o")
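The two-step strategy above can be sketched as follows. This is a simplified illustration, not the library's actual internals, and the pricing values in the table are hypothetical:

```python
# Simplified sketch of the two-step matching strategy.
# PRICING maps model names to (input, output) cost per million tokens;
# the values here are illustrative only.
PRICING = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
}

def estimate_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    entry = PRICING.get(model)  # 1. exact match
    if entry is None:
        # 2. prefix match -- try longer keys first so a versioned name
        # like "gpt-4o-2024-08-06" resolves to "gpt-4o-..." variants
        # before shorter prefixes
        for key in sorted(PRICING, key=len, reverse=True):
            if model.startswith(key):
                entry = PRICING[key]
                break
    if entry is None:
        return 0.0  # unknown model
    in_price, out_price = entry
    return tokens_in * in_price / 1e6 + tokens_out * out_price / 1e6
```

Trying longer prefixes first matters: without it, "gpt-4o-mini-2024-07-18" could incorrectly match "gpt-4o" instead of "gpt-4o-mini".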

Automatic Cost Estimation

When you record an LLM call without specifying a cost, the client automatically estimates it:

# Cost is auto-estimated
ctx.record_llm_call(model="gpt-4o", tokens_in=1000, tokens_out=500)

# Or provide an explicit cost to override estimation
ctx.record_llm_call(model="gpt-4o", tokens_in=1000, tokens_out=500, cost=0.0085)

The LangChain handler always uses automatic estimation -- no manual cost input is needed.

Server-Side Calculation

The control plane maintains its own model pricing database that can differ from client-side estimates. When the server processes LLM call records, it can recalculate costs using:

  • System defaults -- baseline pricing maintained by the platform
  • Tenant overrides -- custom pricing set by your organization

Server-side costs take precedence over client-side estimates in dashboards and reports.

Built-In Model Pricing

The client includes pricing for 20+ models across major providers:

Provider    Models
OpenAI      gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-4, gpt-3.5-turbo, o1, o1-mini, o3-mini
Anthropic   claude-opus-4, claude-sonnet-4, claude-3-5-sonnet, claude-3-5-haiku, claude-3-haiku
Google      gemini-2.0-flash, gemini-1.5-pro, gemini-1.5-flash
Meta        llama-3.3-70b, llama-3.1-8b
Mistral     mistral-large

For the full pricing table with per-token costs, see LLM Call Tracking.

Tenant-Level Cost Overrides

If your organization has negotiated pricing or uses a provider with different rates, you can set custom costs per model via the REST API.

Set a Custom Cost

curl -X PUT "https://acme.waxell.dev/api/v1/observe/model-costs/gpt-4o/" \
  -H "X-Wax-Key: wax_sk_..." \
  -H "Content-Type: application/json" \
  -d '{
    "input_cost_per_million": 2.00,
    "output_cost_per_million": 8.00
  }'
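The same PUT can be issued from Python with the standard library. This is a sketch: the endpoint path and X-Wax-Key header come from the curl example above, and the key value is a placeholder:

```python
import json
import urllib.request

# Build the same PUT request as the curl example; the API key is a placeholder.
body = json.dumps({
    "input_cost_per_million": 2.00,
    "output_cost_per_million": 8.00,
}).encode()

req = urllib.request.Request(
    "https://acme.waxell.dev/api/v1/observe/model-costs/gpt-4o/",
    data=body,
    method="PUT",
    headers={
        "X-Wax-Key": "wax_sk_...",
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(req)  # uncomment to send the request
```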

View Current Costs

Retrieve the merged system + tenant cost table:

curl "https://acme.waxell.dev/api/v1/observe/model-costs/" \
  -H "X-Wax-Key: wax_sk_..."

This returns all model costs, with tenant overrides applied on top of system defaults.
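The merge semantics can be pictured as a simple dictionary overlay, where tenant entries shadow system defaults per model. This is an illustrative sketch with hypothetical pricing values, not the server's implementation:

```python
# Illustrative sketch of overrides applied on top of system defaults.
# Costs are per million tokens; all values here are hypothetical.
system_defaults = {
    "gpt-4o": {"input_cost_per_million": 2.50, "output_cost_per_million": 10.00},
    "gpt-4o-mini": {"input_cost_per_million": 0.15, "output_cost_per_million": 0.60},
}
tenant_overrides = {
    "gpt-4o": {"input_cost_per_million": 2.00, "output_cost_per_million": 8.00},
}

# Tenant entries win for the models they cover; all other
# models fall through to the system defaults.
merged = {**system_defaults, **tenant_overrides}
```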

Remove a Custom Cost

Delete a tenant override to revert to system defaults:

curl -X DELETE "https://acme.waxell.dev/api/v1/observe/model-costs/gpt-4o/" \
  -H "X-Wax-Key: wax_sk_..."

Budget Enforcement

Cost management integrates with the policy and governance system. You can configure policies on the control plane that:

  • Block execution when daily or monthly spend exceeds a threshold
  • Warn when approaching budget limits
  • Throttle execution rate when costs are high

These policies are evaluated during the check_policy call that occurs before agent execution (when enforce_policy=True).

Example flow:

  1. Agent attempts to run with enforce_policy=True
  2. Control plane evaluates cost-based policies
  3. If daily token spend exceeds the configured limit, the policy returns action: "block" with a reason like "Daily token budget exceeded"
  4. A PolicyViolationError is raised and execution does not proceed

from waxell_observe import waxell_agent
from waxell_observe.errors import PolicyViolationError

@waxell_agent(agent_name="expensive-agent", enforce_policy=True)
async def run_expensive_task(query: str) -> str:
    ...

try:
    result = await run_expensive_task("analyze everything")
except PolicyViolationError as e:
    print(f"Budget exceeded: {e}")
    # e.policy_result.metadata may contain budget details

Cost Tracking Workflow

1. Agent makes LLM call
|
2. Client estimates cost (MODEL_COSTS table)
|
3. LLM call record sent to control plane
|
4. Server recalculates with tenant overrides (if any)
|
5. Cost aggregated in dashboards
|
6. Budget policies evaluated on next agent run
Tip: Even if you do not set up tenant overrides, the built-in client-side estimates provide useful cost visibility from day one. You can refine pricing later without changing any agent code.

Next Steps