
Rate Limit Policy

The rate-limit policy category enforces execution frequency limits on agent workflows. Unlike content-based policies (safety, compliance), rate limiting is purely about how often an agent runs, not what the agent does.

Use it when you need to:

  • Prevent runaway agents from consuming excessive resources
  • Enforce fair usage across teams or user groups
  • Protect downstream APIs from being overwhelmed
  • Limit burst activity during peak periods

Rules

| Rule | Type | Default | Description |
| --- | --- | --- | --- |
| max_per_minute | int | 10 | Maximum workflow executions per minute (fixed-window bucket) |
| max_per_hour | int | 100 | Maximum workflow executions per hour (fixed-window bucket) |
| max_per_day | int | (none) | Maximum workflow executions per day (fixed-window bucket) |
| max_concurrent | int | 5 | Maximum concurrent workflow executions (incr/decr counter) |
| burst_limit | int | (none) | Maximum executions in burst window (sliding window via sorted set) |
| burst_window_seconds | int | 10 | Time window for burst limit, in seconds |
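
The rule names and types in the table can be checked before a policy is submitted. The following is a minimal sketch, not part of the SDK: `validate_rules` and `RULE_NAMES` are hypothetical names, and the requirement that values be positive is an assumption. This page does not say whether an omitted rule falls back to its default or is left disabled, so the sketch does not fill in defaults.

```python
# Hypothetical helper, not part of waxell_observe.
RULE_NAMES = {
    "max_per_minute", "max_per_hour", "max_per_day",
    "max_concurrent", "burst_limit", "burst_window_seconds",
}


def validate_rules(rules: dict) -> list[str]:
    """Return a list of problems; an empty list means the dict looks valid."""
    problems = []
    for name, value in rules.items():
        if name not in RULE_NAMES:
            problems.append(f"unknown rule: {name}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(value, bool) or not isinstance(value, int):
            problems.append(f"{name} must be an int, got {type(value).__name__}")
        # Assumption: zero or negative limits are never meaningful.
        elif value <= 0:
            problems.append(f"{name} must be positive, got {value}")
    return problems
```

A typo such as `max_per_minit` is reported as an unknown rule instead of being silently ignored.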

How It Works

The rate limit handler runs at before_workflow to check limits and at after_workflow / on_failure to clean up concurrent counters. All counters are stored in Redis and scoped by <agent_name>:<workflow_name>.

Rate Limit Types

Per-Minute / Per-Hour / Per-Day (Fixed-Window Buckets)

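Time-window limits use fixed buckets. The bucket index is int(now // window_seconds), where now is the current Unix timestamp in seconds. The counter for the current bucket increments on each execution, and counting starts again from zero when the time crosses a bucket boundary.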

| Window | Bucket Size | Reset Behavior |
| --- | --- | --- |
| Per-minute | 60s | Resets at the start of each minute (wall clock) |
| Per-hour | 3600s | Resets at the start of each hour |
| Per-day | 86400s | Resets at the start of each day |

Fixed-Window Boundary Burst

Fixed-window buckets can allow up to 2x the configured limit at a window boundary. For example, with max_per_minute=10, an agent could execute 10 times at 12:00:59 and 10 times at 12:01:00 -- 20 executions in 2 seconds. Use burst_limit for tighter short-term control.
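
The boundary effect described above can be reproduced with a small in-memory counter. This is a sketch under stated assumptions: `FixedWindowCounter` is a hypothetical stand-in for the Redis counters, and timestamps are passed in explicitly so the result is deterministic.

```python
from collections import defaultdict


class FixedWindowCounter:
    """In-memory stand-in for the Redis counters; illustrative only."""

    def __init__(self, limit: int, window_seconds: int) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.counts = defaultdict(int)  # bucket index -> executions seen

    def try_acquire(self, now: float) -> bool:
        bucket = int(now // self.window_seconds)
        if self.counts[bucket] >= self.limit:
            return False  # would map to BLOCK
        self.counts[bucket] += 1
        return True


# 12:00:59 and 12:01:00, expressed as seconds since midnight.
t_before = 12 * 3600 + 59   # falls in bucket 720
t_after = 12 * 3600 + 60    # falls in bucket 721

counter = FixedWindowCounter(limit=10, window_seconds=60)
allowed = [counter.try_acquire(t_before) for _ in range(11)]
allowed += [counter.try_acquire(t_after) for _ in range(11)]
total_allowed = sum(allowed)  # 20: ten per bucket, one second apart
```

Each bucket rejects its eleventh attempt, yet 20 executions pass within about a second because the two bursts land in different buckets.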

Concurrent (Incr/Decr Counter)

The concurrent limit tracks how many workflow executions are running simultaneously. The counter is incremented at before_workflow and decremented at after_workflow or on_failure. A TTL of 300 seconds acts as a safety net in case the decrement is missed (e.g., process crash).
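
To illustrate how the TTL recovers a slot after a missed decrement, here is a minimal in-memory sketch. `ConcurrentCounter` is a hypothetical class, not the handler's actual code, and it assumes the TTL is refreshed on every increment.

```python
class ConcurrentCounter:
    """In-memory stand-in for an incr/decr counter with an expiry."""

    def __init__(self, max_concurrent: int, ttl_seconds: float = 300.0) -> None:
        self.max_concurrent = max_concurrent
        self.ttl_seconds = ttl_seconds
        self.count = 0
        self.expires_at = 0.0

    def _expire_if_stale(self, now: float) -> None:
        # Mimics key expiry: once the TTL passes, the counter is gone.
        if self.count and now >= self.expires_at:
            self.count = 0

    def acquire(self, now: float) -> bool:
        self._expire_if_stale(now)
        if self.count >= self.max_concurrent:
            return False  # would map to THROTTLE
        self.count += 1
        self.expires_at = now + self.ttl_seconds  # assumption: TTL reset on increment
        return True

    def release(self, now: float) -> None:
        self._expire_if_stale(now)
        self.count = max(0, self.count - 1)  # never go negative


counter = ConcurrentCounter(max_concurrent=1)
counter.acquire(now=0.0)                  # run starts, then the process crashes
stuck = counter.acquire(now=120.0)        # False: slot still held by the dead run
recovered = counter.acquire(now=300.0)    # True: TTL expired, counter reset
```

In this sketch the slot is unavailable for the full 300 seconds after a crash, which matches the inflated-counter behavior the safety net implies.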

Burst (Sliding Window)

Burst limits use a Redis sorted set as a sliding window. Entries older than burst_window_seconds are pruned on each check. This provides more accurate short-term rate limiting than fixed-window buckets.
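
The prune-then-count logic can be sketched in memory with a deque in place of the sorted set. `SlidingWindowLimiter` is a hypothetical class; the sketch assumes timestamps arrive in non-decreasing order and treats an entry exactly burst_window_seconds old as expired, which the source does not specify.

```python
from collections import deque


class SlidingWindowLimiter:
    """In-memory analogue of the sorted-set approach; illustrative only."""

    def __init__(self, burst_limit: int, burst_window_seconds: float) -> None:
        self.burst_limit = burst_limit
        self.window = burst_window_seconds
        self.timestamps = deque()  # execution times, oldest first

    def try_acquire(self, now: float) -> bool:
        # Prune entries that have fallen out of the window.
        cutoff = now - self.window
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.burst_limit:
            return False  # would map to THROTTLE
        self.timestamps.append(now)
        return True


limiter = SlidingWindowLimiter(burst_limit=10, burst_window_seconds=10)
first = sum(limiter.try_acquire(59.0) for _ in range(10))   # 10 allowed
second = sum(limiter.try_acquire(60.0) for _ in range(10))  # 0 allowed: window still full
```

Unlike a fixed-window bucket, the second burst is rejected even though it falls in a new clock minute, because the window is measured back from the current time.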

Enforcement Phases

| Phase | Behavior |
| --- | --- |
| before_workflow | Checks concurrent, burst, and time-window limits. Returns BLOCK or THROTTLE if exceeded |
| mid_execution | Not implemented |
| after_workflow | Decrements concurrent counter |
| on_failure | Decrements concurrent counter |

Actions

| Action | When |
| --- | --- |
| ALLOW | Under all configured limits |
| THROTTLE | Concurrent limit or burst limit exceeded |
| BLOCK | Time-window limit exceeded (max_per_minute, max_per_hour, max_per_day) |

THROTTLE vs BLOCK

THROTTLE is returned for concurrent and burst limits -- the client should retry after a short delay. BLOCK is returned for time-window limits -- the client must wait for the window to reset.
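
A client can turn that distinction into a wait time. The helper below is a sketch under stated assumptions: `suggested_wait` and `seconds_until_reset` are hypothetical names, the backoff constants are arbitrary, and the caller is assumed to know which time-window rule triggered the BLOCK (for example from the reason text).

```python
WINDOW_SECONDS = {"max_per_minute": 60, "max_per_hour": 3600, "max_per_day": 86400}


def seconds_until_reset(rule: str, now: float) -> float:
    """Time left in the current fixed-window bucket for a time-window rule."""
    window = WINDOW_SECONDS[rule]
    return window - (now % window)


def suggested_wait(action: str, now: float, rule: str = "max_per_minute",
                   attempt: int = 0) -> float:
    """Hypothetical client-side helper: seconds to wait before retrying."""
    if action == "throttle":
        # Short exponential backoff, capped at 5 seconds (arbitrary choice).
        return min(5.0, 0.25 * (2 ** attempt))
    if action == "block":
        # Retrying earlier cannot succeed: the bucket has to roll over first.
        return seconds_until_reset(rule, now)
    return 0.0  # "allow"
```

For a per-minute BLOCK received 59 seconds into the minute, the helper suggests waiting 1 second; for a THROTTLE it starts at 0.25 seconds and doubles per attempt.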

Example Policies

Strict Rate Limit (Batch Jobs)

Low limits for batch processing agents that should run infrequently:

{
  "max_per_minute": 3,
  "max_per_hour": 50,
  "max_per_day": 500,
  "max_concurrent": 1,
  "burst_limit": 3,
  "burst_window_seconds": 10
}

Interactive Agent (High Throughput)

Higher limits for user-facing agents:

{
  "max_per_minute": 30,
  "max_per_hour": 500,
  "max_concurrent": 10,
  "burst_limit": 15,
  "burst_window_seconds": 5
}

API Protection (Burst Only)

Only limit burst activity, no per-minute/hour caps:

{
  "burst_limit": 10,
  "burst_window_seconds": 5
}

SDK Integration

Using the Context Manager

import waxell_observe as waxell
from waxell_observe.errors import PolicyViolationError

waxell.init()


# "async with" is only valid inside a coroutine, so the example is wrapped in one.
async def main(query: str) -> None:
    try:
        async with waxell.WaxellContext(
            agent_name="analyst",
            workflow_name="quick-analysis",
            enforce_policy=True,
        ) as ctx:
            # If a rate limit is exceeded, PolicyViolationError is raised
            # on entry, before any agent work happens.
            result = await analyze_data(query)  # analyze_data is your own agent logic
            ctx.set_result(result)
    except PolicyViolationError as e:
        print(f"Rate limited: {e}")
        # e.g. "Max Per Minute limit reached (3/3)"

Using the Decorator

@waxell.observe(
    agent_name="analyst",
    workflow_name="quick-analysis",
    enforce_policy=True,
)
async def run_analysis(query: str):
    # Rate limit check happens before this function body runs
    return await analyze_data(query)

Enforcement Flow

Agent starts (WaxellContext.__aenter__ or decorator entry)
  |
  +-- before_workflow governance runs
  |     |
  |     +-- Check concurrent limit
  |     |     +-- current >= max_concurrent? -> THROTTLE
  |     |     +-- Otherwise: increment counter, set 300s TTL
  |     |
  |     +-- Check burst limit (sliding window)
  |     |     +-- Prune entries older than burst_window_seconds
  |     |     +-- count >= burst_limit? -> THROTTLE
  |     |     +-- Otherwise: add entry to sorted set
  |     |
  |     +-- Check time-window limits (minute, hour, day)
  |           +-- For each configured limit:
  |           |     +-- bucket = int(now // window)
  |           |     +-- current >= max? -> BLOCK
  |           |     +-- Otherwise: increment counter
  |           +-- All under limit -> ALLOW
  |
  +-- Agent executes...
  |
  +-- after_workflow (or on_failure)
        +-- Decrement concurrent counter
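
The order of checks in the flow above can be expressed as a pure decision function. This is a sketch, not the handler's code: `evaluate` is a hypothetical name, it only reads counter values and does not update them, and it checks only rules that are explicitly present. The "Max Per Hour" and "Max Per Day" labels are assumed by analogy with the "Max Per Minute" reason text shown on this page.

```python
from typing import Mapping, Optional, Tuple

# Assumption: hour/day labels follow the same pattern as "Max Per Minute".
LABELS = {
    "max_per_minute": "Max Per Minute",
    "max_per_hour": "Max Per Hour",
    "max_per_day": "Max Per Day",
}


def evaluate(rules: Mapping[str, int], concurrent: int, burst_count: int,
             window_counts: Mapping[str, int]) -> Tuple[str, Optional[str]]:
    """Return (action, reason) for the current counter values, in flow order."""
    # 1. Concurrent limit -> THROTTLE
    max_concurrent = rules.get("max_concurrent")
    if max_concurrent is not None and concurrent >= max_concurrent:
        return "throttle", f"Concurrent limit reached ({concurrent}/{max_concurrent})"

    # 2. Burst limit (count is taken after pruning old entries) -> THROTTLE
    burst_limit = rules.get("burst_limit")
    if burst_limit is not None and burst_count >= burst_limit:
        window = rules.get("burst_window_seconds", 10)
        return "throttle", f"Burst limit reached ({burst_count}/{burst_limit} in {window}s)"

    # 3. Time-window limits: minute, then hour, then day -> BLOCK
    for rule, label in LABELS.items():
        limit = rules.get(rule)
        current = window_counts.get(rule, 0)
        if limit is not None and current >= limit:
            return "block", f"{label} limit reached ({current}/{limit})"

    return "allow", None
```

Because the concurrent check comes first, a request that violates several limits at once is reported as a concurrent THROTTLE rather than a time-window BLOCK.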

Redis Required

Rate limits require Redis for distributed counting. When running with WAXELL_OBSERVE=false or without a live server connection, rate limits are not enforced -- every execution is allowed. This is by design for local development.

Creating via Dashboard

  1. Navigate to Governance > Policies
  2. Click New Policy
  3. Select category Rate Limit
  4. Configure limits (per-minute, per-hour, concurrent, burst)
  5. Set scope to target specific agents (e.g., rate-limited-analyst)
  6. Enable

Creating via API

curl -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://acme.waxell.dev/waxell/v1/policies/ \
  -d '{
    "name": "Strict Rate Limit",
    "category": "rate-limit",
    "rules": {
      "max_per_minute": 3,
      "max_per_hour": 100,
      "max_concurrent": 2,
      "burst_limit": 5,
      "burst_window_seconds": 10
    },
    "scope": {
      "agents": ["analyst"]
    },
    "enabled": true
  }'
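
The same request can be built from Python with only the standard library. This is a sketch mirroring the curl call above: `build_policy_request` is a hypothetical helper, the token value is a placeholder, and the response format is not documented here, so the sketch stops short of sending or parsing anything.

```python
import json
import urllib.request


def build_policy_request(base_url: str, token: str, policy: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a policy."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/waxell/v1/policies/",
        data=json.dumps(policy).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


policy = {
    "name": "Strict Rate Limit",
    "category": "rate-limit",
    "rules": {
        "max_per_minute": 3,
        "max_per_hour": 100,
        "max_concurrent": 2,
        "burst_limit": 5,
        "burst_window_seconds": 10,
    },
    "scope": {"agents": ["analyst"]},
    "enabled": True,
}

request = build_policy_request("https://acme.waxell.dev", "example-token", policy)
# To send it: urllib.request.urlopen(request)
```

Keeping construction separate from sending makes the payload easy to inspect or log before it reaches the server.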

Observability

Governance Tab

Rate limit evaluations appear with:

| Field | Example |
| --- | --- |
| Policy name | Strict Rate Limit |
| Action | allow, throttle, or block |
| Category | rate-limit |
| Reason | "Max Per Minute limit reached (3/3)" |
| Metadata | {"current": 3, "limit": 3} |

For throttle (concurrent):

| Field | Example |
| --- | --- |
| Reason | "Concurrent limit reached (2/2)" |
| Metadata | {"current": 2, "limit": 2} |

For throttle (burst):

| Field | Example |
| --- | --- |
| Reason | "Burst limit reached (5/5 in 10s)" |
| Metadata | {"current": 5, "limit": 5, "window": 10} |

Combining with Other Policies

Rate Limit + Kill Switch: Defense in depth. Rate limits prevent overuse under normal conditions. If errors spike despite rate limiting, the kill switch activates as a circuit breaker.

Rate Limit + Budget: Rate limits control frequency; budget limits control total cost. An agent might be allowed 10 executions per minute but blocked if it exceeds $50/day in LLM costs.

Rate Limit + Compliance: A compliance policy can require that rate limiting is configured as part of a regulatory framework (e.g., SOC 2 operations policy).

Common Gotchas

  1. Fixed-window buckets can allow 2x burst at window boundary. A per-minute limit of 10 can allow 20 executions in 2 seconds if they span a minute boundary. Use burst_limit for tighter short-term control.

  2. Concurrent counter TTL is a 300-second safety net. If a process crashes without decrementing the counter, the TTL ensures it eventually resets. However, during those 300 seconds, the counter is inflated.

  3. After restarts, concurrent counter may be stale. Redis counters persist across process restarts. If an agent crashes mid-execution, the concurrent counter stays incremented until the 300s TTL expires.

  4. Rate limits are scoped by agent_name + workflow_name. A policy targeting analyst with workflow quick-analysis does not affect analyst with workflow deep-analysis. Each combination has independent counters.

  5. No Redis = no rate limiting. When running with WAXELL_OBSERVE=false or without a live server connection, rate limits are not enforced and every execution is allowed. This is intentional for local development.

  6. max_per_day uses UTC day boundaries. The day bucket is int(now // 86400), which aligns with UTC midnight, not local time.

  7. THROTTLE and BLOCK are different. THROTTLE (concurrent/burst) means "try again shortly." BLOCK (time-window) means "wait for the window to reset."
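
To make gotcha 6 concrete, the sketch below shows where the current day bucket starts for a user in a UTC-8 time zone. It assumes now is a Unix timestamp in seconds; `day_bucket` and `day_bucket_start` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone


def day_bucket(now: float) -> int:
    """Index of the per-day bucket, as described in gotcha 6."""
    return int(now // 86400)


def day_bucket_start(now: float) -> datetime:
    """UTC datetime at which the current day bucket began."""
    return datetime.fromtimestamp(day_bucket(now) * 86400, tz=timezone.utc)


# 23:30 on 2024-03-10 in a UTC-8 zone is already 07:30 UTC on 2024-03-11.
local = datetime(2024, 3, 10, 23, 30, tzinfo=timezone(timedelta(hours=-8)))
start = day_bucket_start(local.timestamp())
local_reset = start.astimezone(local.tzinfo)  # 2024-03-10 16:00 local time
```

For this user the daily counter resets at 16:00 local time, not at local midnight, so a max_per_day quota exhausted in the morning becomes available again mid-afternoon.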

Next Steps