Budget Policy

The budget policy category enforces token and cost ceilings on workflow execution. Use it to cap daily LLM spend across all agents, set per-workflow guardrails, emit warnings as usage approaches a cap, and apply per-model limits to expensive frontier models. The category is also known as cost.

Rules

| Rule | Type | Default | Description |
| --- | --- | --- | --- |
| daily_token_limit | integer | (none) | Maximum tokens per day across all workflows |
| daily_cost_limit | number | (none) | Maximum spend per day in dollars |
| per_workflow_token_limit | integer | (none) | Token cap for a single workflow execution |
| per_workflow_cost_limit | number | (none) | Cost cap for a single workflow execution |
| warning_threshold_percent | integer | 80 | Emit WARN when this percent of the budget is used |
| action_on_exceed | string | "block" | One of block, warn, throttle |
| model_limits | object | {} | Per-model daily limits, e.g. {"gpt-4": {"daily_cost_limit": 5.00}} |
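
The defaults in the table can be applied client-side before a policy is submitted. The following is a minimal sketch, not part of the SDK: normalize_budget_rules and RULE_DEFAULTS are hypothetical names, and the range check on warning_threshold_percent is an assumption rather than documented behavior.

```python
import copy
from typing import Any

# Defaults copied from the Rules table; limits without a default stay None.
RULE_DEFAULTS: dict[str, Any] = {
    "daily_token_limit": None,
    "daily_cost_limit": None,
    "per_workflow_token_limit": None,
    "per_workflow_cost_limit": None,
    "warning_threshold_percent": 80,
    "action_on_exceed": "block",
    "model_limits": {},
}
VALID_ACTIONS = {"block", "warn", "throttle"}


def normalize_budget_rules(rules: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: merge user-supplied rules over the documented defaults."""
    unknown = set(rules) - set(RULE_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown budget rules: {sorted(unknown)}")
    merged = copy.deepcopy(RULE_DEFAULTS)
    merged.update(rules)
    if merged["action_on_exceed"] not in VALID_ACTIONS:
        raise ValueError(f"action_on_exceed must be one of {sorted(VALID_ACTIONS)}")
    # Assumed sanity check; the documentation does not state an allowed range.
    if not 0 <= merged["warning_threshold_percent"] <= 100:
        raise ValueError("warning_threshold_percent must be between 0 and 100")
    return merged
```

Catching an invalid action or a misspelled rule name at authoring time is cheaper than discovering it when the handler evaluates the policy.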

How It Works

The budget handler runs at before_workflow, mid_execution, and after_workflow.

| Phase | What It Checks | Actions |
| --- | --- | --- |
| before_workflow | Aggregates daily token/cost usage (today), compares to daily limits and per-model limits | BLOCK / WARN / THROTTLE on exceed, WARN at threshold |
| mid_execution | Re-queries daily usage AND checks per-workflow tokens_used/cost_used | BLOCK / WARN / THROTTLE on exceed |
| after_workflow | Final per-workflow token/cost vs limits | WARN on exceed (audit-only) |
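
The per-phase behavior in the table can be sketched as a single decision function. This is an illustration under stated assumptions, not the handler's actual code: the ALLOW member, the evaluate_limit name, and treating "exceed" as used >= limit are all assumptions.

```python
from enum import Enum
from typing import Optional


class PolicyAction(Enum):
    # ALLOW is an assumed member; the page only names block, warn, throttle.
    ALLOW = "allow"
    WARN = "warn"
    THROTTLE = "throttle"
    BLOCK = "block"


def evaluate_limit(
    phase: str,
    used: float,
    limit: Optional[float],
    threshold_percent: int = 80,
    action_on_exceed: str = "block",
) -> PolicyAction:
    """Hypothetical decision for one limit in one phase."""
    if limit is None:
        return PolicyAction.ALLOW  # no limit configured, nothing to enforce
    if used >= limit:  # assumption: reaching the cap counts as exceeding it
        if phase == "after_workflow":
            return PolicyAction.WARN  # audit-only: the workflow already ran
        return PolicyAction[action_on_exceed.upper()]
    # Per the table, only before_workflow warns at the threshold.
    if phase == "before_workflow" and used * 100 >= limit * threshold_percent:
        return PolicyAction.WARN
    return PolicyAction.ALLOW
```

For example, evaluate_limit("before_workflow", 41.0, 50.0) yields WARN because 82% of the daily budget is used, while the same usage at mid_execution yields ALLOW.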

Context Attributes Read

| Attribute | Phase | Purpose |
| --- | --- | --- |
| context.model | before_workflow | Match against model_limits keys |
| context.tokens_used | mid_execution, after_workflow | Per-workflow token count |
| context.cost_used | mid_execution, after_workflow | Per-workflow cost |
| context._policy_is_global | (internal) | Global scope vs agent-scoped aggregation |

Daily usage comes from a Redis-backed aggregator that the runtime plane injects via BudgetHandler._usage_query_fn (Phase 0c.3). When no aggregator has been injected, the handler falls back to a Django ORM query of LlmCallRecord (observe plane).
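
The injection-with-fallback pattern described above might look roughly like the sketch below. Only the attribute name _usage_query_fn comes from this page; the class BudgetHandlerSketch, the function signature, the returned dictionary shape, and install_runtime_aggregator are assumptions, and the ORM query is replaced by a stub.

```python
from typing import Callable, Optional

# Assumed signature: agent name in, {"tokens": ..., "cost": ...} out.
UsageQueryFn = Callable[[str], dict]


class BudgetHandlerSketch:
    """Stand-in for BudgetHandler; illustrates the injection pattern only."""

    # Class-level hook. None means no runtime aggregator has been installed.
    _usage_query_fn: Optional[UsageQueryFn] = None

    def daily_usage(self, agent: str) -> dict:
        # Read the hook from the class so a plain function is not bound as a method.
        query = type(self)._usage_query_fn
        if query is not None:
            return query(agent)
        return self._orm_fallback(agent)

    def _orm_fallback(self, agent: str) -> dict:
        # Placeholder for the Django ORM query of LlmCallRecord.
        return {"tokens": 0, "cost": 0.0, "source": "orm"}


def install_runtime_aggregator(query: UsageQueryFn) -> None:
    """Hypothetical installer that the runtime plane would call once at startup."""
    BudgetHandlerSketch._usage_query_fn = query
```

Keeping the hook at class level means every handler instance picks up the aggregator as soon as it is installed, without changing how handlers are constructed.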

Example Policy

{
  "name": "Engineering Daily Budget",
  "category": "budget",
  "rules": {
    "daily_token_limit": 1000000,
    "daily_cost_limit": 50.00,
    "per_workflow_token_limit": 50000,
    "per_workflow_cost_limit": 2.00,
    "warning_threshold_percent": 80,
    "action_on_exceed": "block",
    "model_limits": {
      "gpt-4-turbo": {"daily_cost_limit": 20.00},
      "claude-opus-4": {"daily_cost_limit": 15.00}
    }
  },
  "scope": {"agents": ["research-agent"]},
  "enabled": true
}
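
To see which daily cost limits from this policy would apply to a given model, the lookup can be sketched as follows. The function applicable_daily_cost_limits is hypothetical, and exact string matching of context.model against model_limits keys is an assumption; the handler may match differently.

```python
import json

POLICY_JSON = """
{
  "name": "Engineering Daily Budget",
  "category": "budget",
  "rules": {
    "daily_cost_limit": 50.00,
    "model_limits": {
      "gpt-4-turbo": {"daily_cost_limit": 20.00},
      "claude-opus-4": {"daily_cost_limit": 15.00}
    }
  }
}
"""


def applicable_daily_cost_limits(rules: dict, model: str) -> list[tuple[str, float]]:
    """Hypothetical helper: list every daily cost limit that covers this model."""
    limits = []
    if rules.get("daily_cost_limit") is not None:
        limits.append(("daily", rules["daily_cost_limit"]))
    # Assumption: model_limits keys are matched by exact string equality.
    model_rule = rules.get("model_limits", {}).get(model)
    if model_rule and "daily_cost_limit" in model_rule:
        limits.append((f"model:{model}", model_rule["daily_cost_limit"]))
    return limits


policy = json.loads(POLICY_JSON)
opus_limits = applicable_daily_cost_limits(policy["rules"], "claude-opus-4")
```

Under these assumptions a call made with claude-opus-4 is subject to both the overall daily limit and the tighter per-model limit, while a model with no entry in model_limits is covered by the overall limit alone.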

SDK Integration

import waxell_observe as waxell

waxell.init()


@waxell.observe(agent_name="research-agent", enforce_policy=True)
async def research(query: str) -> str:
    # before_workflow: aggregates today's spend; blocks if daily cap reached.
    # mid_execution: per LLM call, checks running tokens_used/cost_used.
    # after_workflow: final audit; WARN if per-workflow cap exceeded.
    return await llm_call(query)

Observability

| Field | Example |
| --- | --- |
| Category | budget |
| Action | block |
| Reason | "Daily cost budget exceeded ($52.4731/$50.0000)" |
| Metadata | {"current": 52.4731, "limit": 50.0, "scope": "daily"} |

| Field | Example (WARN at threshold) |
| --- | --- |
| Action | warn |
| Reason | "Approaching token budget (82% used)" |
| Metadata | {"percent_used": 82.0} |
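
The example records above imply a particular formatting: four decimal places for dollar amounts and a whole-number percentage in the reason text. The sketch below reproduces those strings; exceed_record and threshold_record are hypothetical names, and the record layout is inferred from the examples rather than taken from the handler.

```python
from typing import Any


def exceed_record(scope: str, current: float, limit: float, action: str = "block") -> dict[str, Any]:
    """Hypothetical builder for a cost-exceeded decision record."""
    return {
        "category": "budget",
        "action": action,
        # Dollar amounts are shown with four decimal places, as in the example.
        "reason": f"{scope.capitalize()} cost budget exceeded (${current:.4f}/${limit:.4f})",
        "metadata": {"current": current, "limit": limit, "scope": scope},
    }


def threshold_record(used_tokens: int, limit_tokens: int) -> dict[str, Any]:
    """Hypothetical builder for a WARN-at-threshold decision record."""
    percent_used = round(used_tokens * 100 / limit_tokens, 1)
    return {
        "category": "budget",
        "action": "warn",
        "reason": f"Approaching token budget ({percent_used:.0f}% used)",
        "metadata": {"percent_used": percent_used},
    }
```

Knowing the shape of the reason and metadata fields makes it easier to write alert rules or dashboards that filter on scope or percent_used instead of parsing the reason text.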

Common Gotchas

  1. supported_planes = ["observe"] by default. The Django ORM lookup in the eval path makes this observe-only until the runtime plane installs the Redis aggregator via BudgetHandler._usage_query_fn. Without the injection, governed-runtime agents will not enforce budgets.
  2. action_on_exceed must name a valid action; letter case does not matter. The handler does PolicyAction[action_on_exceed.upper()], so block, Block, and BLOCK all resolve, but any value other than block, warn, or throttle raises KeyError.
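  3. Daily usage is "today" in UTC, not in the tenant's local time. The fallback aggregator uses timezone.now().date(), which returns the UTC date when Django's USE_TZ setting is enabled (with USE_TZ disabled, it follows the server's TIME_ZONE instead). For tenants in other timezones, the day boundary will not match local midnight.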
  4. per_workflow_*_limit requires the SDK to populate context.tokens_used/cost_used. If your agent doesn't record token usage, mid_execution and after_workflow will see 0 and never trip.
  5. mid_execution re-queries the database for daily usage on every LLM call. With many agents and no Redis aggregator, this is the most expensive policy in the catalog. Inject the Redis aggregator before enabling it broadly.
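
Gotchas 2 and 3 can be illustrated with a short sketch. The PolicyAction members are inferred from the allowed values, and parse_action and utc_budget_day are hypothetical helpers. The sketch shows the name-based enum lookup and how a UTC day boundary differs from a tenant's local date.

```python
from datetime import date, datetime, timedelta, timezone
from enum import Enum


class PolicyAction(Enum):
    # Members inferred from the documented values block, warn, throttle.
    BLOCK = "block"
    WARN = "warn"
    THROTTLE = "throttle"


def parse_action(value: str) -> PolicyAction:
    """Hypothetical validator: same lookup as the handler, with a clearer error."""
    try:
        return PolicyAction[value.upper()]  # lookup by member NAME, not value
    except KeyError:
        allowed = ", ".join(a.value for a in PolicyAction)
        raise ValueError(f"action_on_exceed must be one of: {allowed}") from None


def utc_budget_day(moment: datetime) -> date:
    """Return the UTC calendar day that a timezone-aware moment falls on."""
    return moment.astimezone(timezone.utc).date()


# A tenant at UTC-8 on the evening of 1 March is already in the next UTC day.
pacific = timezone(timedelta(hours=-8))
local_evening = datetime(2024, 3, 1, 17, 30, tzinfo=pacific)
budget_day = utc_budget_day(local_evening)
```

In this example the tenant's local date is 1 March while budget_day is 2 March, so spend from that evening counts against the next day's budget.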

Next Steps