Budget Policy

The budget policy category enforces token and cost ceilings on workflow execution. Use it to cap daily LLM spend across all agents, set per-workflow guardrails, emit a warning as usage approaches a cap, and apply per-model limits to expensive frontier models. The category is also known as cost.

Rules

Rule	Type	Default	Description
`daily_token_limit`	integer	(none)	Maximum tokens per day across all workflows
`daily_cost_limit`	number	(none)	Maximum spend per day in dollars
`per_workflow_token_limit`	integer	(none)	Token cap for a single workflow execution
`per_workflow_cost_limit`	number	(none)	Cost cap for a single workflow execution
`warning_threshold_percent`	integer	`80`	Emit WARN when this percent of the budget is used
`action_on_exceed`	string	`"block"`	One of `block`, `warn`, `throttle`
`model_limits`	object	`{}`	Per-model daily limits, e.g. `{"gpt-4": {"daily_cost_limit": 5.00}}`
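
As a rough illustration of how the defaults in the table combine with a partial rules object, here is a minimal sketch. `normalize_rules` is a hypothetical helper, not part of the product; rejecting unknown keys and restricting the threshold to 0-100 are assumptions made for the example.

```python
from typing import Any

# Defaults copied from the Rules table; limits without a default stay unset (None).
RULE_DEFAULTS: dict[str, Any] = {
    "daily_token_limit": None,
    "daily_cost_limit": None,
    "per_workflow_token_limit": None,
    "per_workflow_cost_limit": None,
    "warning_threshold_percent": 80,
    "action_on_exceed": "block",
    "model_limits": {},
}
ALLOWED_ACTIONS = {"block", "warn", "throttle"}


def normalize_rules(rules: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: apply table defaults and reject obviously bad values."""
    unknown = set(rules) - set(RULE_DEFAULTS)
    if unknown:  # assumption: unknown keys are treated as errors
        raise ValueError(f"unknown rule(s): {sorted(unknown)}")
    merged = {**RULE_DEFAULTS, **rules}
    # Copy so callers never share the module-level default dict.
    merged["model_limits"] = dict(merged["model_limits"])
    if str(merged["action_on_exceed"]).lower() not in ALLOWED_ACTIONS:
        raise ValueError("action_on_exceed must be block, warn, or throttle")
    if not 0 <= merged["warning_threshold_percent"] <= 100:  # assumed range
        raise ValueError("warning_threshold_percent must be between 0 and 100")
    return merged


rules = normalize_rules({"daily_cost_limit": 50.0})
```

With only `daily_cost_limit` supplied, the result keeps the other limits unset and falls back to an 80 percent warning threshold and the `block` action.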

How It Works

The budget handler runs in three phases: before_workflow, mid_execution, and after_workflow.

Phase	What It Checks	Actions
`before_workflow`	Aggregates daily token/cost usage (today), compares to daily limits and per-model limits	BLOCK / WARN / THROTTLE on exceed, WARN at threshold
`mid_execution`	Re-queries daily usage AND checks per-workflow `tokens_used`/`cost_used`	BLOCK / WARN / THROTTLE on exceed
`after_workflow`	Final per-workflow token/cost vs limits	WARN on exceed (audit-only)
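
The decision logic in the table can be sketched as a single comparison function. This is an illustration under stated assumptions, not the handler's actual code: `Decision`, `check_budget`, and the `audit_only` flag are hypothetical names, and treating usage equal to the limit as "exceeded" is a guess.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    action: str  # "allow", "warn", "block", or "throttle"
    reason: str


def check_budget(
    used: float,
    limit: Optional[float],
    threshold_percent: int = 80,
    action_on_exceed: str = "block",
    audit_only: bool = False,
) -> Decision:
    """Compare usage to one limit and pick an action."""
    if limit is None:
        return Decision("allow", "no limit configured")
    if used >= limit:  # assumption: reaching the limit counts as exceeding it
        # after_workflow is audit-only, so an exceed is downgraded to a warning.
        action = "warn" if audit_only else action_on_exceed
        return Decision(action, f"budget exceeded ({used}/{limit})")
    percent = used * 100 / limit
    if percent >= threshold_percent:
        return Decision("warn", f"approaching budget ({percent:.0f}% used)")
    return Decision("allow", "within budget")


blocked = check_budget(52.4731, 50.0)                 # before_workflow style check
warned = check_budget(41.0, 50.0)                     # 82% of the daily cap
audited = check_budget(2.5, 2.0, audit_only=True)     # after_workflow style check
```

The same function covers all three phases; only the inputs change (daily aggregates versus per-workflow `tokens_used`/`cost_used`) and whether the result is enforced or merely recorded.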

Context Attributes Read

Attribute	Phase	Purpose
`context.model`	before_workflow	Match against `model_limits` keys
`context.tokens_used`	mid_execution, after_workflow	Per-workflow token count
`context.cost_used`	mid_execution, after_workflow	Per-workflow cost
`context._policy_is_global`	(internal)	Global scope vs agent-scoped aggregation

Daily usage comes from a Redis-backed aggregator that the runtime plane injects via BudgetHandler._usage_query_fn (Phase 0c.3). When no aggregator has been injected, the handler falls back to a Django ORM query over LlmCallRecord (observe plane).
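
The injection-with-fallback pattern described above might look roughly like the following. Only the attribute name `_usage_query_fn` comes from this page; the class `SketchBudgetHandler`, the callable's signature, and the shape of the returned usage dict are assumptions made for illustration.

```python
from typing import Callable, Optional

# Assumed shape of a usage reading; the real return type is not documented here.
Usage = dict[str, float]
UsageQueryFn = Callable[[str], Usage]


class SketchBudgetHandler:
    """Illustrative stand-in for BudgetHandler, not the real class."""

    # Set once by the runtime plane; None means no aggregator was injected.
    _usage_query_fn: Optional[UsageQueryFn] = None

    def _orm_fallback(self, agent_name: str) -> Usage:
        # Placeholder for the Django ORM aggregate over LlmCallRecord.
        return {"tokens": 0.0, "cost": 0.0}

    def daily_usage(self, agent_name: str) -> Usage:
        # Read the hook from the class so the function is not bound as a method.
        query_fn = type(self)._usage_query_fn
        if query_fn is not None:
            return query_fn(agent_name)
        return self._orm_fallback(agent_name)


def fake_redis_usage(agent_name: str) -> Usage:
    # Stands in for a Redis-backed aggregator.
    return {"tokens": 120_000.0, "cost": 6.25}


handler = SketchBudgetHandler()
before = handler.daily_usage("research-agent")    # fallback path
SketchBudgetHandler._usage_query_fn = fake_redis_usage
after = handler.daily_usage("research-agent")     # injected path
```

Keeping the hook on the class means one injection at startup affects every handler instance, which matches the idea of the runtime plane installing the aggregator once.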

Example Policy

{
  "name": "Engineering Daily Budget",
  "category": "budget",
  "rules": {
    "daily_token_limit": 1000000,
    "daily_cost_limit": 50.00,
    "per_workflow_token_limit": 50000,
    "per_workflow_cost_limit": 2.00,
    "warning_threshold_percent": 80,
    "action_on_exceed": "block",
    "model_limits": {
      "gpt-4-turbo": {"daily_cost_limit": 20.00},
      "claude-opus-4": {"daily_cost_limit": 15.00}
    }
  },
  "scope": {"agents": ["research-agent"]},
  "enabled": true
}
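
To make the numbers in this policy concrete, the short calculation below derives the warning points and the number of maximum-size workflow runs each daily cap allows. It assumes the warning threshold applies to both the token and the cost limit, which the rule description suggests but does not state outright.

```python
import json

# The rules from the example policy above (name, scope, and enabled omitted).
policy = json.loads("""
{
  "rules": {
    "daily_token_limit": 1000000,
    "daily_cost_limit": 50.00,
    "per_workflow_token_limit": 50000,
    "per_workflow_cost_limit": 2.00,
    "warning_threshold_percent": 80,
    "model_limits": {
      "gpt-4-turbo": {"daily_cost_limit": 20.00},
      "claude-opus-4": {"daily_cost_limit": 15.00}
    }
  }
}
""")
rules = policy["rules"]
pct = rules["warning_threshold_percent"]

# Usage levels at which a WARN would be emitted (assumed to apply to both limits).
warn_at_tokens = rules["daily_token_limit"] * pct // 100   # 800000
warn_at_cost = rules["daily_cost_limit"] * pct / 100       # 40.0

# How many workflows fit in a day if each one runs right up to its own cap.
max_runs_by_tokens = rules["daily_token_limit"] // rules["per_workflow_token_limit"]    # 20
max_runs_by_cost = int(rules["daily_cost_limit"] // rules["per_workflow_cost_limit"])   # 25

# Combined per-model caps, to compare against the overall daily cost limit.
model_cap_total = sum(m["daily_cost_limit"] for m in rules["model_limits"].values())    # 35.0
```

With these values the token cap binds first: 20 maximum-size runs exhaust the daily token budget before the 25 runs the cost budget would allow. The two model caps add up to $35, leaving $15 of the $50 daily limit for any other models.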

SDK Integration

import waxell_observe as waxell
waxell.init()

@waxell.observe(agent_name="research-agent", enforce_policy=True)
async def research(query: str) -> str:
    # before_workflow: aggregates today's spend; blocks if daily cap reached.
    # mid_execution: per LLM call, checks running tokens_used/cost_used.
    # after_workflow: final audit; WARN if per-workflow cap exceeded.
    # llm_call is a placeholder for your own LLM client call.
    return await llm_call(query)

Observability

Field	Example
Category	`budget`
Action	`block`
Reason	"Daily cost budget exceeded ($52.4731/$50.0000)"
Metadata	`{"current": 52.4731, "limit": 50.0, "scope": "daily"}`

Field	Example (WARN at threshold)
Action	`warn`
Reason	"Approaching token budget (82% used)"
Metadata	`{"percent_used": 82.0}`
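
The reason strings and metadata shown in these tables are consistent with four-decimal currency formatting and a whole-number percentage. The sketch below reproduces them; `exceeded_record` and `threshold_record` are hypothetical helpers, and the real handler may build its records differently.

```python
from typing import Any


def exceeded_record(current: float, limit: float) -> dict[str, Any]:
    """Build a record shaped like the 'block' example above."""
    return {
        "category": "budget",
        "action": "block",
        "reason": f"Daily cost budget exceeded (${current:.4f}/${limit:.4f})",
        "metadata": {"current": current, "limit": limit, "scope": "daily"},
    }


def threshold_record(used_tokens: int, token_limit: int) -> dict[str, Any]:
    """Build a record shaped like the 'warn at threshold' example above."""
    percent = round(used_tokens * 100 / token_limit, 1)
    return {
        "category": "budget",
        "action": "warn",
        "reason": f"Approaching token budget ({percent:.0f}% used)",
        "metadata": {"percent_used": percent},
    }


block_rec = exceeded_record(52.4731, 50.0)
warn_rec = threshold_record(820_000, 1_000_000)
```

Keeping the raw numbers in `metadata` and the formatted ones in `reason` lets dashboards filter on values without parsing the message text.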

Common Gotchas

supported_planes = ["observe"] by default. The Django ORM lookup in the eval path keeps this policy observe-only until the runtime plane installs the Redis aggregator via BudgetHandler._usage_query_fn. Without that injection, governed-runtime agents do not enforce budget limits.
action_on_exceed is validated at the Python layer, not by the schema. The handler does PolicyAction[action_on_exceed.upper()], so letter case does not matter, but the value must be one of block, warn, throttle; anything else raises KeyError.
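Daily usage is "today" as the server sees it, not as the tenant sees it. The fallback aggregator uses timezone.now().date(). With Django's USE_TZ = True, timezone.now() returns an aware UTC datetime, so the day rolls over at UTC midnight regardless of TIME_ZONE; with USE_TZ = False it returns naive server-local time. Either way, for tenants in other timezones the day boundary will not match local midnight.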
per_workflow_*_limit requires the SDK to populate context.tokens_used/cost_used. If your agent doesn't record token usage, mid_execution and after_workflow will see 0 and never trip.
mid_execution re-queries the database for daily usage on every LLM call. With many agents and no Redis aggregator, this is the most expensive policy in the catalog. Inject the Redis aggregator before enabling it broadly.
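
Two of the gotchas above, the enum lookup and the day boundary, can be reproduced in a few lines. The `PolicyAction` enum here is a stand-in with only the three documented members, and `utc_day` is a hypothetical helper that assumes the UTC case of the day boundary.

```python
from datetime import date, datetime, timedelta, timezone
from enum import Enum


class PolicyAction(Enum):
    """Stand-in for the real enum; only the three documented actions."""
    BLOCK = "block"
    WARN = "warn"
    THROTTLE = "throttle"


def resolve_action(value: str) -> PolicyAction:
    # Name lookup after upper(): case is ignored, unknown names raise KeyError.
    return PolicyAction[value.upper()]


def utc_day(moment: datetime) -> date:
    # The "budget day" a timestamp falls in when days are cut at UTC midnight.
    return moment.astimezone(timezone.utc).date()


ok = resolve_action("Block")

try:
    resolve_action("pause")
    bad_value_error = None
except KeyError as exc:
    bad_value_error = exc

# 20:00 on 1 March in a UTC-8 zone is already 04:00 on 2 March in UTC.
utc_minus_8 = timezone(timedelta(hours=-8))
evening_local = datetime(2024, 3, 1, 20, 0, tzinfo=utc_minus_8)
budget_day = utc_day(evening_local)
```

For a tenant at UTC-8 this means the daily budget resets at 16:00 local time, which is worth stating in any user-facing description of the limit.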

Next Steps

Rate-Limit Policy -- Request/second throttling complements budget caps
Chargeback Attribution -- Tag every run with cost-center for finance reporting
Policy Categories
