Budget Policy
The budget policy category enforces token and cost ceilings on workflow execution. Use it to cap daily LLM spend across all agents, set per-workflow guardrails, warn as usage approaches a cap, and apply per-model limits to expensive frontier models. The category is also referred to as cost.
Rules
| Rule | Type | Default | Description |
|---|---|---|---|
| daily_token_limit | integer | (none) | Maximum tokens per day across all workflows |
| daily_cost_limit | number | (none) | Maximum spend per day in dollars |
| per_workflow_token_limit | integer | (none) | Token cap for a single workflow execution |
| per_workflow_cost_limit | number | (none) | Cost cap for a single workflow execution |
| warning_threshold_percent | integer | 80 | Emit WARN when this percent of the budget is used |
| action_on_exceed | string | "block" | One of block, warn, throttle |
| model_limits | object | {} | Per-model daily limits, e.g. {"gpt-4": {"daily_cost_limit": 5.00}} |
How It Works
The budget handler runs at before_workflow, mid_execution, and after_workflow.
| Phase | What It Checks | Actions |
|---|---|---|
| before_workflow | Aggregates daily token/cost usage (today), compares to daily limits and per-model limits | BLOCK / WARN / THROTTLE on exceed, WARN at threshold |
| mid_execution | Re-queries daily usage and checks per-workflow tokens_used/cost_used | BLOCK / WARN / THROTTLE on exceed |
| after_workflow | Final per-workflow token/cost vs limits | WARN on exceed (audit-only) |
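The per-phase behavior in the table can be approximated with one small pure function. This is a minimal sketch, not the handler's actual code: `evaluate_budget` and `Decision` are hypothetical names, and treating usage equal to the limit as "exceeded" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    """Hypothetical result type; the real handler's return value may differ."""
    action: str  # "allow", "warn", "block", or "throttle"
    reason: str


def evaluate_budget(
    phase: str,
    used: float,
    limit: Optional[float],
    action_on_exceed: str = "block",
    warning_threshold_percent: int = 80,
) -> Decision:
    """Compare non-negative usage against one limit for the given phase."""
    if limit is None:
        # Assumption: an unset limit means the check is skipped.
        return Decision("allow", "no limit configured")

    if used >= limit:
        # after_workflow is audit-only in the table, so it can only WARN.
        action = "warn" if phase == "after_workflow" else action_on_exceed
        return Decision(action, f"budget exceeded ({used}/{limit})")

    # Reaching this point with used >= 0 implies limit > 0, so the
    # division below is safe. Only before_workflow warns at the threshold.
    percent_used = 100.0 * used / limit
    if phase == "before_workflow" and percent_used >= warning_threshold_percent:
        return Decision("warn", f"approaching budget ({percent_used:.0f}% used)")

    return Decision("allow", "within budget")
```

For example, `evaluate_budget("before_workflow", 41.0, 50.0)` yields a warn decision under the default 80 percent threshold, while the same usage in `mid_execution` is allowed because that phase only reacts to an exceeded limit.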
Context Attributes Read
| Attribute | Phase | Purpose |
|---|---|---|
| context.model | before_workflow | Match against model_limits keys |
| context.tokens_used | mid_execution, after_workflow | Per-workflow token count |
| context.cost_used | mid_execution, after_workflow | Per-workflow cost |
| context._policy_is_global | (internal) | Global scope vs agent-scoped aggregation |
Daily usage comes from a Redis-backed aggregator that the runtime plane injects via BudgetHandler._usage_query_fn (Phase 0c.3). When no aggregator is injected, the handler falls back to a Django ORM query over LlmCallRecord (observe plane).
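The injection-with-fallback pattern can be sketched as follows. This is an illustration only: the source names `BudgetHandler._usage_query_fn` but does not document its signature, so `SketchBudgetHandler`, the `(tokens, cost)` return shape, and the single `agent` argument are all assumptions.

```python
from typing import Callable, Optional, Tuple

# Assumed shape: takes an agent name (None for global scope) and returns
# (tokens_used_today, cost_used_today).
UsageQueryFn = Callable[[Optional[str]], Tuple[int, float]]


class SketchBudgetHandler:
    """Stand-in for BudgetHandler; only the injection pattern is modeled."""

    # Class-level hook; stays None until a runtime installs an aggregator.
    _usage_query_fn: Optional[UsageQueryFn] = None

    def _fallback_query(self, agent: Optional[str]) -> Tuple[int, float]:
        # Placeholder for the ORM query over LlmCallRecord.
        return (0, 0.0)

    def daily_usage(self, agent: Optional[str]) -> Tuple[int, float]:
        # Read the hook from the class so a plain function assigned to it
        # is not bound as a method of this instance.
        query = type(self)._usage_query_fn
        if query is not None:
            return query(agent)
        return self._fallback_query(agent)


handler = SketchBudgetHandler()
usage_before = handler.daily_usage("research-agent")  # fallback path

# A runtime would install its aggregator once at startup.
SketchBudgetHandler._usage_query_fn = lambda agent: (820_000, 41.0)
usage_after = handler.daily_usage("research-agent")   # injected path
```

Keeping the hook optional lets the same handler run in both planes; the cost is that a missing injection fails silently into the slower path rather than raising.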
Example Policy

    {
      "name": "Engineering Daily Budget",
      "category": "budget",
      "rules": {
        "daily_token_limit": 1000000,
        "daily_cost_limit": 50.00,
        "per_workflow_token_limit": 50000,
        "per_workflow_cost_limit": 2.00,
        "warning_threshold_percent": 80,
        "action_on_exceed": "block",
        "model_limits": {
          "gpt-4-turbo": {"daily_cost_limit": 20.00},
          "claude-opus-4": {"daily_cost_limit": 15.00}
        }
      },
      "scope": {"agents": ["research-agent"]},
      "enabled": true
    }
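To see what this policy implies in absolute numbers, the sketch below computes the usage at which the 80 percent warning would fire. The helper name `warning_points` is hypothetical, and it assumes the threshold applies to each daily limit independently.

```python
def warning_points(rules: dict) -> dict:
    """Return the usage level at which each daily limit hits the threshold."""
    percent = rules.get("warning_threshold_percent", 80)  # documented default
    points = {}
    for key in ("daily_token_limit", "daily_cost_limit"):
        limit = rules.get(key)
        if limit is not None:  # limits default to unset
            points[key] = limit * percent / 100
    return points


# The daily limits from the example policy above.
example_rules = {
    "daily_token_limit": 1_000_000,
    "daily_cost_limit": 50.00,
    "warning_threshold_percent": 80,
}
points = warning_points(example_rules)
# points == {"daily_token_limit": 800000.0, "daily_cost_limit": 40.0}
```

With these values, a WARN would be expected from 800,000 tokens or $40.00 of daily spend onward, and the configured block action from 1,000,000 tokens or $50.00.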
SDK Integration

    import waxell_observe as waxell

    waxell.init()


    @waxell.observe(agent_name="research-agent", enforce_policy=True)
    async def research(query: str) -> str:
        # before_workflow: aggregates today's spend; blocks if daily cap reached.
        # mid_execution: per LLM call, checks running tokens_used/cost_used.
        # after_workflow: final audit; WARN if per-workflow cap exceeded.
        # llm_call stands in for your own LLM client call.
        return await llm_call(query)
Observability
| Field | Example |
|---|---|
| Category | budget |
| Action | block |
| Reason | "Daily cost budget exceeded ($52.4731/$50.0000)" |
| Metadata | {"current": 52.4731, "limit": 50.0, "scope": "daily"} |

| Field | Example (WARN at threshold) |
|---|---|
| Action | warn |
| Reason | "Approaching token budget (82% used)" |
| Metadata | {"percent_used": 82.0} |
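The BLOCK example above is consistent with formatting both amounts to four decimal places. The sketch below reproduces that record shape; the function name `block_record` and the lowercase dictionary keys are assumptions made for illustration, not the platform's schema.

```python
def block_record(current: float, limit: float, scope: str = "daily") -> dict:
    """Build a record shaped like the BLOCK example (hypothetical helper)."""
    return {
        "category": "budget",
        "action": "block",
        # Four decimal places matches the "$52.4731/$50.0000" example.
        "reason": f"Daily cost budget exceeded (${current:.4f}/${limit:.4f})",
        "metadata": {"current": current, "limit": limit, "scope": scope},
    }


record = block_record(52.4731, 50.0)
```

Alerting rules are usually easier to write against the numeric `metadata` values than against the human-readable reason string.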
Common Gotchas
- `supported_planes = ["observe"]` by default. The Django ORM lookup in the eval path makes this observe-only until the runtime plane installs the Redis aggregator via `BudgetHandler._usage_query_fn`. Without the injection, governed-runtime agents will not enforce budget.
- `action_on_exceed` must name a valid action. The handler does `PolicyAction[action_on_exceed.upper()]`, so letter case is normalized, but the value must still be one of `block`, `warn`, `throttle`; anything else raises `KeyError`.
- Daily usage is "today" in server time, not tenant time. The fallback aggregator uses `timezone.now().date()`, which in Django is the UTC date when `USE_TZ` is enabled and the server-local date otherwise. For tenants in other timezones, the day boundary will not match local midnight.
- `per_workflow_*_limit` requires the SDK to populate `context.tokens_used` / `cost_used`. If your agent doesn't record token usage, mid_execution and after_workflow will see `0` and never trip.
- `mid_execution` re-queries the database for daily usage on every LLM call. With many agents and no Redis aggregator, this is the most expensive policy in the catalog. Inject the Redis aggregator before enabling it broadly.
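Two of these gotchas are easy to reproduce in isolation. The sketch below uses a stand-in `PolicyAction` enum (member names are assumed from the documented values) and a fixed UTC-8 offset as a hypothetical tenant timezone; neither is taken from the platform's code.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class PolicyAction(Enum):
    """Stand-in enum; the real PolicyAction may define more members."""
    BLOCK = "block"
    WARN = "warn"
    THROTTLE = "throttle"


def resolve_action(value: str) -> PolicyAction:
    # Lookup by member NAME after upper(), as the gotcha describes.
    return PolicyAction[value.upper()]


mixed_case = resolve_action("Block")  # resolves: upper() normalizes case

try:
    resolve_action("deny")            # not a member name
except KeyError as exc:
    unknown_error = exc

# Day boundary: the same instant falls on different calendar days.
instant = datetime(2025, 1, 15, 1, 30, tzinfo=timezone.utc)
utc_day = instant.date()
tenant_tz = timezone(timedelta(hours=-8))  # hypothetical tenant at UTC-8
tenant_day = instant.astimezone(tenant_tz).date()
```

Here `utc_day` is 15 January while `tenant_day` is still 14 January, so a tenant at UTC-8 would see the daily counters reset at 16:00 local time. Validating `action_on_exceed` when a policy is saved avoids hitting the `KeyError` at enforcement time.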
Next Steps
- Rate-Limit Policy -- Request/second throttling complements budget caps
- Chargeback Attribution -- Tag every run with cost-center for finance reporting
- Policy Categories