# Cost Management
Waxell Observe provides a layered cost management system: client-side estimation for immediate visibility, server-side calculation for accuracy, and tenant-level overrides for custom pricing.
## How Cost Estimation Works

### Client-Side: `estimate_cost()`

The `estimate_cost` function provides instant cost estimates based on built-in pricing data:
```python
from waxell_observe.cost import estimate_cost

cost = estimate_cost("gpt-4o", tokens_in=1000, tokens_out=500)
print(f"Estimated cost: ${cost:.6f}")  # $0.007500
```
Parameters:
| Parameter | Type | Description |
|---|---|---|
| `model` | `str` | Model name or prefix |
| `tokens_in` | `int` | Input/prompt token count |
| `tokens_out` | `int` | Output/completion token count |
Returns: `float` -- estimated cost in USD. Returns `0.0` for unknown models.
The function uses a two-step matching strategy:
- Exact match -- looks for the model name in the pricing table
- Prefix match -- tries longer prefixes first for versioned model names (e.g., `"gpt-4o-2024-08-06"` matches `"gpt-4o"`)
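As a rough illustration, the matching strategy could be implemented like this. `estimate_cost_sketch` and its `MODEL_COSTS` entries are hypothetical stand-ins, not the library's actual code: the `gpt-4o` rates are inferred from the `$0.007500` example above, and the others are illustrative only.

```python
# Hypothetical sketch of the lookup; MODEL_COSTS mirrors the shape of the real
# pricing table, with rates in USD per million tokens. Numbers are illustrative.
MODEL_COSTS = {
    "gpt-4o": (2.50, 10.00),       # (input, output) USD per million tokens
    "gpt-4o-mini": (0.15, 0.60),
}

def estimate_cost_sketch(model: str, tokens_in: int, tokens_out: int) -> float:
    # Step 1: exact match against the pricing table.
    entry = MODEL_COSTS.get(model)
    if entry is None:
        # Step 2: prefix match, longest keys first, so "gpt-4o-mini-..." is
        # checked before "gpt-4o" and resolves to the more specific entry.
        for key in sorted(MODEL_COSTS, key=len, reverse=True):
            if model.startswith(key):
                entry = MODEL_COSTS[key]
                break
    if entry is None:
        return 0.0  # unknown model
    input_rate, output_rate = entry
    return tokens_in / 1_000_000 * input_rate + tokens_out / 1_000_000 * output_rate
```

Checking longer keys first matters for versioned names: `"gpt-4o-mini-2024-07-18"` must resolve to `gpt-4o-mini` rather than stopping at `gpt-4o`.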
### Automatic Cost Estimation
When you record an LLM call without specifying a cost, the client automatically estimates it:
```python
# Cost is auto-estimated
ctx.record_llm_call(model="gpt-4o", tokens_in=1000, tokens_out=500)

# Or provide an explicit cost to override estimation
ctx.record_llm_call(model="gpt-4o", tokens_in=1000, tokens_out=500, cost=0.0085)
```
The LangChain handler always uses automatic estimation -- no manual cost input is needed.
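The fallback logic amounts to "explicit cost wins, otherwise estimate". A minimal sketch, using a stand-in estimator rather than the real `estimate_cost` (the rates inside it are illustrative):

```python
def estimate_cost(model, tokens_in, tokens_out):
    # Stand-in for waxell_observe.cost.estimate_cost; illustrative gpt-4o rates.
    if model == "gpt-4o":
        return tokens_in * 2.5e-6 + tokens_out * 1.0e-5
    return 0.0

def resolve_cost(model, tokens_in, tokens_out, cost=None):
    # An explicit cost always wins; None triggers automatic estimation.
    return cost if cost is not None else estimate_cost(model, tokens_in, tokens_out)
```

Note that `cost=0.0` is still an explicit value here and suppresses estimation; only omitting the argument (leaving it `None`) triggers the automatic path.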
### Server-Side Calculation
The control plane maintains its own model pricing database that can differ from client-side estimates. When the server processes LLM call records, it can recalculate costs using:
- System defaults -- baseline pricing maintained by the platform
- Tenant overrides -- custom pricing set by your organization
Server-side costs take precedence over client-side estimates in dashboards and reports.
## Built-In Model Pricing
The client includes pricing for 20+ models across major providers:
| Provider | Models |
|---|---|
| OpenAI | gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-4, gpt-3.5-turbo, o1, o1-mini, o3-mini |
| Anthropic | claude-opus-4, claude-sonnet-4, claude-3-5-sonnet, claude-3-5-haiku, claude-3-haiku |
| Google | gemini-2.0-flash, gemini-1.5-pro, gemini-1.5-flash |
| Meta | llama-3.3-70b, llama-3.1-8b |
| Mistral | mistral-large |
For the full pricing table with per-token costs, see LLM Call Tracking.
## Tenant-Level Cost Overrides
If your organization has negotiated pricing or uses a provider with different rates, you can set custom costs per model via the REST API.
### Set a Custom Cost
```bash
curl -X PUT "https://acme.waxell.dev/api/v1/observe/model-costs/gpt-4o/" \
  -H "X-Wax-Key: wax_sk_..." \
  -H "Content-Type: application/json" \
  -d '{
    "input_cost_per_million": 2.00,
    "output_cost_per_million": 8.00
  }'
```
### View Current Costs
Retrieve the merged system + tenant cost table:
```bash
curl "https://acme.waxell.dev/api/v1/observe/model-costs/" \
  -H "X-Wax-Key: wax_sk_..."
```
This returns all model costs, with tenant overrides applied on top of system defaults.
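The merge semantics can be sketched as a simple per-model override, reusing the rates from the PUT example above (`merged_costs` is a hypothetical helper for illustration, not part of the API, and the system-default numbers are illustrative):

```python
def merged_costs(system_defaults: dict, tenant_overrides: dict) -> dict:
    # A tenant override replaces the system entry for that model; models
    # without an override keep the system default.
    return {**system_defaults, **tenant_overrides}

system = {
    "gpt-4o": {"input_cost_per_million": 2.50, "output_cost_per_million": 10.00},
    "gpt-4o-mini": {"input_cost_per_million": 0.15, "output_cost_per_million": 0.60},
}
tenant = {
    "gpt-4o": {"input_cost_per_million": 2.00, "output_cost_per_million": 8.00},
}
table = merged_costs(system, tenant)
```

Overrides replace whole model entries, so `gpt-4o` takes the tenant's rates while `gpt-4o-mini` keeps the system defaults.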
### Remove a Custom Cost
Delete a tenant override to revert to system defaults:
```bash
curl -X DELETE "https://acme.waxell.dev/api/v1/observe/model-costs/gpt-4o/" \
  -H "X-Wax-Key: wax_sk_..."
```
## Budget Enforcement
Cost management integrates with the policy and governance system. You can configure policies on the control plane that:
- Block execution when daily or monthly spend exceeds a threshold
- Warn when approaching budget limits
- Throttle execution rate when costs are high
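The block/warn behaviors above amount to comparing current spend against configured thresholds. A hypothetical sketch of that decision (the function name, `warn_fraction` default, and result shape are illustrative, not the actual policy schema):

```python
def evaluate_budget_policy(daily_spend: float, daily_limit: float,
                           warn_fraction: float = 0.8) -> dict:
    """Map current spend against a limit to a policy action (illustrative)."""
    if daily_spend >= daily_limit:
        return {"action": "block", "reason": "Daily token budget exceeded"}
    if daily_spend >= warn_fraction * daily_limit:
        return {"action": "warn", "reason": "Approaching daily budget limit"}
    return {"action": "allow", "reason": None}
```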
These policies are evaluated during the `check_policy` call that occurs before agent execution (when `enforce_policy=True`).
Example flow:
1. Agent attempts to run with `enforce_policy=True`
2. Control plane evaluates cost-based policies
3. If daily token spend exceeds the configured limit, the policy returns `action: "block"` with a reason like `"Daily token budget exceeded"`
4. A `PolicyViolationError` is raised and execution does not proceed
```python
from waxell_observe import waxell_agent
from waxell_observe.errors import PolicyViolationError

@waxell_agent(agent_name="expensive-agent", enforce_policy=True)
async def run_expensive_task(query: str) -> str:
    ...

try:
    result = await run_expensive_task("analyze everything")
except PolicyViolationError as e:
    print(f"Budget exceeded: {e}")
    # e.policy_result.metadata may contain budget details
```
## Cost Tracking Workflow
```
1. Agent makes LLM call
        |
2. Client estimates cost (MODEL_COSTS table)
        |
3. LLM call record sent to control plane
        |
4. Server recalculates with tenant overrides (if any)
        |
5. Cost aggregated in dashboards
        |
6. Budget policies evaluated on next agent run
```
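Putting the steps together, a toy simulation of the client-side estimate (step 2) and the server recalculation with a tenant override applied (step 4). All names here are hypothetical; the built-in rates are illustrative and the override matches the PUT example earlier on this page.

```python
def cost_from_rates(rates, model, tokens_in, tokens_out):
    # Rates are (input, output) USD per million tokens; unknown models cost 0.
    input_rate, output_rate = rates.get(model, (0.0, 0.0))
    return tokens_in / 1_000_000 * input_rate + tokens_out / 1_000_000 * output_rate

BUILT_IN = {"gpt-4o": (2.50, 10.00)}          # illustrative client-side rates
TENANT_OVERRIDES = {"gpt-4o": (2.00, 8.00)}   # custom rates from the PUT example

# Step 2: client-side estimate using built-in rates.
client_est = cost_from_rates(BUILT_IN, "gpt-4o", tokens_in=1000, tokens_out=500)

# Step 4: server recalculates with tenant overrides layered on top.
server_cost = cost_from_rates({**BUILT_IN, **TENANT_OVERRIDES}, "gpt-4o",
                              tokens_in=1000, tokens_out=500)
```

The two numbers can legitimately differ; as noted above, the server-side figure is the one that takes precedence in dashboards and reports.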
Even if you do not set up tenant overrides, the built-in client-side estimates provide useful cost visibility from day one. You can refine pricing later without changing any agent code.
## Next Steps
- LLM Call Tracking -- Full model pricing table and capture details
- Policy & Governance -- Budget enforcement and policy configuration
- REST API Reference -- Model cost API endpoints