Cost Management

Waxell Observe provides a layered cost management system: client-side estimation for immediate visibility, server-side calculation for accuracy, and tenant-level overrides for custom pricing.

How Cost Estimation Works

Client-Side: estimate_cost()

The estimate_cost function provides instant cost estimates based on built-in pricing data:

from waxell_observe.cost import estimate_cost

cost = estimate_cost("gpt-4o", tokens_in=1000, tokens_out=500)
print(f"Estimated cost: ${cost:.6f}") # $0.007500

Parameters:

Parameter     Type   Description
model         str    Model name or prefix
tokens_in     int    Input/prompt token count
tokens_out    int    Output/completion token count

Returns: float -- estimated cost in USD. Returns 0.0 for unknown models.

The function uses a two-step matching strategy:

  1. Exact match -- looks for the model name in the pricing table
  2. Prefix match -- tries longer prefixes first for versioned model names (e.g., "gpt-4o-2024-08-06" matches "gpt-4o")
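The two-step strategy above can be sketched as follows. This is a simplified illustration, not the library's actual internals, and the pricing values in the table are hypothetical:

```python
# Simplified sketch of the two-step matching strategy.
# PRICING maps model names to (input, output) cost per million tokens;
# the values here are illustrative only.
PRICING = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
}

def estimate_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    entry = PRICING.get(model)  # 1. exact match
    if entry is None:
        # 2. prefix match -- try longer keys first so a versioned name
        # like "gpt-4o-2024-08-06" resolves to "gpt-4o-..." variants
        # before shorter prefixes
        for key in sorted(PRICING, key=len, reverse=True):
            if model.startswith(key):
                entry = PRICING[key]
                break
    if entry is None:
        return 0.0  # unknown model
    in_price, out_price = entry
    return tokens_in * in_price / 1e6 + tokens_out * out_price / 1e6
```

Trying longer prefixes first matters: without it, "gpt-4o-mini-2024-07-18" could incorrectly match "gpt-4o" instead of "gpt-4o-mini".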

Automatic Cost Estimation

When you record an LLM call without specifying a cost, the client automatically estimates it:

# Cost is auto-estimated
ctx.record_llm_call(model="gpt-4o", tokens_in=1000, tokens_out=500)

# Or provide an explicit cost to override estimation
ctx.record_llm_call(model="gpt-4o", tokens_in=1000, tokens_out=500, cost=0.0085)

The LangChain handler always uses automatic estimation -- no manual cost input is needed.

Server-Side Calculation

The control plane maintains its own model pricing database that can differ from client-side estimates. When the server processes LLM call records, it can recalculate costs using:

  • System defaults -- baseline pricing maintained by the platform
  • Tenant overrides -- custom pricing set by your organization

Server-side costs take precedence over client-side estimates in dashboards and reports.

Built-In Model Pricing

The client includes pricing for 20+ models across major providers:

Provider    Models
OpenAI      gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-4, gpt-3.5-turbo, o1, o1-mini, o3-mini
Anthropic   claude-opus-4, claude-sonnet-4, claude-3-5-sonnet, claude-3-5-haiku, claude-3-haiku
Google      gemini-2.0-flash, gemini-1.5-pro, gemini-1.5-flash
Meta        llama-3.3-70b, llama-3.1-8b
Mistral     mistral-large

For the full pricing table with per-token costs, see LLM Call Tracking.

Tenant-Level Cost Overrides

If your organization has negotiated pricing or uses a provider with different rates, you can set custom costs per model via the REST API.

Set a Custom Cost

curl -X PUT "https://acme.waxell.dev/api/v1/observe/model-costs/gpt-4o/" \
  -H "X-Wax-Key: wax_sk_..." \
  -H "Content-Type: application/json" \
  -d '{
    "input_cost_per_million": 2.00,
    "output_cost_per_million": 8.00
  }'
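The same PUT can be issued from Python with the standard library. This is a sketch: the endpoint path and X-Wax-Key header come from the curl example above, and the key value is a placeholder:

```python
import json
import urllib.request

# Build the same PUT request as the curl example; the API key is a placeholder.
body = json.dumps({
    "input_cost_per_million": 2.00,
    "output_cost_per_million": 8.00,
}).encode()

req = urllib.request.Request(
    "https://acme.waxell.dev/api/v1/observe/model-costs/gpt-4o/",
    data=body,
    method="PUT",
    headers={
        "X-Wax-Key": "wax_sk_...",
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(req)  # uncomment to send the request
```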

View Current Costs

Retrieve the merged system + tenant cost table:

curl "https://acme.waxell.dev/api/v1/observe/model-costs/" \
  -H "X-Wax-Key: wax_sk_..."

This returns all model costs, with tenant overrides applied on top of system defaults.
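The merge semantics can be pictured as a simple dictionary overlay, where tenant entries shadow system defaults per model. This is an illustrative sketch with hypothetical pricing values, not the server's implementation:

```python
# Illustrative sketch of overrides applied on top of system defaults.
# Costs are per million tokens; all values here are hypothetical.
system_defaults = {
    "gpt-4o": {"input_cost_per_million": 2.50, "output_cost_per_million": 10.00},
    "gpt-4o-mini": {"input_cost_per_million": 0.15, "output_cost_per_million": 0.60},
}
tenant_overrides = {
    "gpt-4o": {"input_cost_per_million": 2.00, "output_cost_per_million": 8.00},
}

# Tenant entries win for the models they cover; all other
# models fall through to the system defaults.
merged = {**system_defaults, **tenant_overrides}
```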

Remove a Custom Cost

Delete a tenant override to revert to system defaults:

curl -X DELETE "https://acme.waxell.dev/api/v1/observe/model-costs/gpt-4o/" \
  -H "X-Wax-Key: wax_sk_..."

Budget Enforcement

Cost management integrates with the policy and governance system. You can configure policies on the control plane that:

  • Block execution when daily or monthly spend exceeds a threshold
  • Warn when approaching budget limits
  • Throttle execution rate when costs are high

These policies are evaluated during the check_policy call that occurs before agent execution (when enforce_policy=True).

Example flow:

  1. Agent attempts to run with enforce_policy=True
  2. Control plane evaluates cost-based policies
  3. If daily token spend exceeds the configured limit, the policy returns action: "block" with a reason like "Daily token budget exceeded"
  4. A PolicyViolationError is raised and execution does not proceed

from waxell_observe import waxell_agent
from waxell_observe.errors import PolicyViolationError

@waxell_agent(agent_name="expensive-agent", enforce_policy=True)
async def run_expensive_task(query: str) -> str:
    ...

try:
    result = await run_expensive_task("analyze everything")
except PolicyViolationError as e:
    print(f"Budget exceeded: {e}")
    # e.policy_result.metadata may contain budget details

Cost Tracking Workflow

1. Agent makes LLM call
|
2. Client estimates cost (MODEL_COSTS table)
|
3. LLM call record sent to control plane
|
4. Server recalculates with tenant overrides (if any)
|
5. Cost aggregated in dashboards
|
6. Budget policies evaluated on next agent run
Tip: Even if you do not set up tenant overrides, the built-in client-side estimates provide useful cost visibility from day one. You can refine pricing later without changing any agent code.

Next Steps