Rate Limit Policy
The rate-limit policy category enforces execution frequency limits on agent workflows. Unlike content-based policies (safety, compliance), rate limiting is purely about how often an agent runs, not what the agent does.
Use it when you need to:
- Prevent runaway agents from consuming excessive resources
- Enforce fair usage across teams or user groups
- Protect downstream APIs from being overwhelmed
- Limit burst activity during peak periods
Rules
| Rule | Type | Default | Description |
|---|---|---|---|
| `max_per_minute` | int | 10 | Maximum workflow executions per minute (fixed-window bucket) |
| `max_per_hour` | int | 100 | Maximum workflow executions per hour (fixed-window bucket) |
| `max_per_day` | int | (none) | Maximum workflow executions per day (fixed-window bucket) |
| `max_concurrent` | int | 5 | Maximum concurrent workflow executions (incr/decr counter) |
| `burst_limit` | int | (none) | Maximum executions in the burst window (sliding window via sorted set) |
| `burst_window_seconds` | int | 10 | Time window for the burst limit, in seconds |
How It Works
The rate limit handler runs at `before_workflow` to check limits and at `after_workflow` / `on_failure` to clean up concurrent counters. All counters are stored in Redis and scoped by `<agent_name>:<workflow_name>`.
Rate Limit Types
Per-Minute / Per-Hour / Per-Day (Fixed-Window Buckets)
Time-window limits use fixed buckets: `bucket = int(now // window_seconds)`. The counter increments on each execution and resets when time crosses a bucket boundary.
| Window | Bucket Size | Reset Behavior |
|---|---|---|
| Per-minute | 60s | Resets at the start of each minute (wall clock) |
| Per-hour | 3600s | Resets at the start of each hour |
| Per-day | 86400s | Resets at the start of each day |
Fixed-window buckets can allow up to 2x the configured limit at a window boundary. For example, with `max_per_minute=10`, an agent could execute 10 times at 12:00:59 and 10 more at 12:01:00 -- 20 executions in 2 seconds. Use `burst_limit` for tighter short-term control.
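The bucket arithmetic can be sketched in plain Python, with a dict standing in for Redis. The names here (`check_fixed_window`, `counters`) are illustrative, not part of the SDK:

```python
import time

# In-memory stand-in for Redis counters, keyed by (scope, bucket).
counters = {}

def check_fixed_window(scope, window_seconds, limit, now=None):
    """Return True if the execution is allowed under the fixed-window limit."""
    now = time.time() if now is None else now
    bucket = int(now // window_seconds)   # new bucket at each wall-clock boundary
    key = (scope, bucket)
    current = counters.get(key, 0)
    if current >= limit:
        return False                      # BLOCK: window limit exceeded
    counters[key] = current + 1
    return True

# Boundary effect: 10 calls at t=59s and 10 more at t=60s all pass with
# limit 10, because they fall into different minute buckets.
allowed_before = sum(check_fixed_window("analyst:quick-analysis", 60, 10, now=59.0)
                     for _ in range(10))
allowed_after = sum(check_fixed_window("analyst:quick-analysis", 60, 10, now=60.0)
                    for _ in range(10))
print(allowed_before + allowed_after)  # 20 executions allowed across the boundary
```

In the real handler the increment would be a Redis `INCR` with an `EXPIRE` on the bucket key; the boundary behavior is the same.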
Concurrent (Incr/Decr Counter)
The concurrent limit tracks how many workflow executions are running simultaneously. The counter is incremented at `before_workflow` and decremented at `after_workflow` or `on_failure`. A TTL of 300 seconds acts as a safety net in case the decrement is missed (e.g., process crash).
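A minimal sketch of the incr/decr pattern with its TTL safety net, again using an in-memory stand-in for Redis. `ConcurrentLimiter` is illustrative, not an SDK class:

```python
import time

class ConcurrentLimiter:
    """In-memory stand-in for a Redis INCR/DECR counter with an EXPIRE TTL."""

    def __init__(self, limit, ttl_seconds=300):
        self.limit = limit
        self.ttl = ttl_seconds
        self.count = 0
        self.expires_at = 0.0

    def acquire(self, now=None):
        now = time.time() if now is None else now
        if now >= self.expires_at:        # TTL elapsed: reset a stale counter
            self.count = 0
        if self.count >= self.limit:
            return False                  # THROTTLE: retry after a short delay
        self.count += 1
        self.expires_at = now + self.ttl  # refresh the safety-net TTL
        return True

    def release(self):
        # Called at after_workflow or on_failure.
        self.count = max(0, self.count - 1)

limiter = ConcurrentLimiter(limit=2)
print(limiter.acquire(now=0))    # True
print(limiter.acquire(now=1))    # True
print(limiter.acquire(now=2))    # False: at max_concurrent
limiter.release()
print(limiter.acquire(now=3))    # True again after a slot frees up
# A crash that skips release() is healed once the 300s TTL expires:
print(limiter.acquire(now=400))  # True: stale counter was reset
```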
Burst (Sliding Window)
Burst limits use a Redis sorted set as a sliding window. Entries older than `burst_window_seconds` are pruned on each check. This provides more accurate short-term rate limiting than fixed-window buckets.
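The sliding-window check can be sketched with a sorted list of timestamps standing in for the Redis sorted set (the prune step corresponds to `ZREMRANGEBYSCORE`, the count to `ZCARD`, the add to `ZADD`). `BurstLimiter` is illustrative:

```python
import time
from bisect import bisect_left, insort

class BurstLimiter:
    """Sliding-window limiter over a sorted list of execution timestamps."""

    def __init__(self, burst_limit, window_seconds):
        self.limit = burst_limit
        self.window = window_seconds
        self.events = []   # sorted execution timestamps

    def allow(self, now=None):
        now = time.time() if now is None else now
        # Prune entries older than the window (ZREMRANGEBYSCORE equivalent).
        cutoff = now - self.window
        self.events = self.events[bisect_left(self.events, cutoff):]
        if len(self.events) >= self.limit:
            return False               # THROTTLE: burst limit exceeded
        insort(self.events, now)       # record this execution (ZADD equivalent)
        return True

burst = BurstLimiter(burst_limit=3, window_seconds=10)
print([burst.allow(now=t) for t in (0, 1, 2)])   # [True, True, True]
print(burst.allow(now=3))                         # False: 3 events in last 10s
print(burst.allow(now=12))                        # True: events at t=0,1 aged out
```

Unlike fixed buckets, the window here always covers exactly the last `window_seconds`, so there is no boundary doubling.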
Enforcement Phases
| Phase | Behavior |
|---|---|
| `before_workflow` | Checks concurrent, burst, and time-window limits; returns BLOCK or THROTTLE if exceeded |
| `mid_execution` | Not implemented |
| `after_workflow` | Decrements the concurrent counter |
| `on_failure` | Decrements the concurrent counter |
Actions
| Action | When |
|---|---|
| ALLOW | Under all configured limits |
| THROTTLE | Concurrent limit or burst limit exceeded |
| BLOCK | Time-window limit exceeded (`max_per_minute`, `max_per_hour`, `max_per_day`) |
THROTTLE is returned for concurrent and burst limits -- the client should retry after a short delay. BLOCK is returned for time-window limits -- the client must wait for the window to reset.
Example Policies
Strict Rate Limit (Batch Jobs)
Low limits for batch processing agents that should run infrequently:
```json
{
  "max_per_minute": 3,
  "max_per_hour": 50,
  "max_per_day": 500,
  "max_concurrent": 1,
  "burst_limit": 3,
  "burst_window_seconds": 10
}
```
Interactive Agent (High Throughput)
Higher limits for user-facing agents:
```json
{
  "max_per_minute": 30,
  "max_per_hour": 500,
  "max_concurrent": 10,
  "burst_limit": 15,
  "burst_window_seconds": 5
}
```
API Protection (Burst Only)
Only limit burst activity, no per-minute/hour caps:
```json
{
  "burst_limit": 10,
  "burst_window_seconds": 5
}
```
SDK Integration
Using the Context Manager
```python
import waxell_observe as waxell
from waxell_observe.errors import PolicyViolationError

waxell.init()

try:
    async with waxell.WaxellContext(
        agent_name="analyst",
        workflow_name="quick-analysis",
        enforce_policy=True,
    ) as ctx:
        # If the rate limit is exceeded, PolicyViolationError
        # is raised here (before any agent work happens)
        result = await analyze_data(query)
        ctx.set_result(result)
except PolicyViolationError as e:
    print(f"Rate limited: {e}")
    # e.g. "Max Per Minute limit reached (3/3)"
```
Using the Decorator
```python
@waxell.observe(
    agent_name="analyst",
    workflow_name="quick-analysis",
    enforce_policy=True,
)
async def run_analysis(query: str):
    # Rate limit check happens before this function body runs
    return await analyze_data(query)
```
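Since THROTTLE-style rejections are meant to be retried after a short delay, a caller can wrap a governed workflow in a simple backoff loop. This is a hypothetical helper, not part of the SDK; a local stand-in for `PolicyViolationError` is defined so the sketch is self-contained:

```python
import asyncio

class PolicyViolationError(Exception):
    """Stand-in for waxell_observe.errors.PolicyViolationError."""

async def run_with_retry(fn, *args, attempts=3, delay_seconds=0.5):
    """Retry a rate-limited async workflow with linear backoff (illustrative)."""
    for attempt in range(attempts):
        try:
            return await fn(*args)
        except PolicyViolationError:
            if attempt == attempts - 1:
                raise                      # give up after the last attempt
            await asyncio.sleep(delay_seconds * (attempt + 1))

# Demo with a workflow that is throttled twice, then succeeds:
calls = {"n": 0}

async def flaky(query):
    calls["n"] += 1
    if calls["n"] < 3:
        raise PolicyViolationError("Burst limit reached (5/5 in 10s)")
    return f"ok: {query}"

print(asyncio.run(run_with_retry(flaky, "q3 revenue", delay_seconds=0.0)))  # ok: q3 revenue
```

Note that retrying is only sensible for THROTTLE (concurrent/burst); a BLOCK from a time-window limit will keep failing until the window resets.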
Enforcement Flow
```
Agent starts (WaxellContext.__aenter__ or decorator entry)
 |
 +-- before_workflow governance runs
 |    |
 |    +-- Check concurrent limit
 |    |    +-- current >= max_concurrent? -> THROTTLE
 |    |    +-- Otherwise: increment counter, set 300s TTL
 |    |
 |    +-- Check burst limit (sliding window)
 |    |    +-- Prune entries older than burst_window_seconds
 |    |    +-- count >= burst_limit? -> THROTTLE
 |    |    +-- Otherwise: add entry to sorted set
 |    |
 |    +-- Check time-window limits (minute, hour, day)
 |         +-- For each configured limit:
 |         |    +-- bucket = int(now // window)
 |         |    +-- current >= max? -> BLOCK
 |         |    +-- Otherwise: increment counter
 |         +-- All under limit -> ALLOW
 |
 +-- Agent executes...
 |
 +-- after_workflow (or on_failure)
      +-- Decrement concurrent counter
```
Rate limits require Redis for distributed counting. When running with WAXELL_OBSERVE=false or without a live server connection, rate limits are not enforced -- all queries succeed. This is by design for local development.
Creating via Dashboard
- Navigate to Governance > Policies
- Click New Policy
- Select category Rate Limit
- Configure limits (per-minute, per-hour, concurrent, burst)
- Set scope to target specific agents (e.g., `rate-limited-analyst`)
- Enable the policy
Creating via API
```bash
curl -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://acme.waxell.dev/waxell/v1/policies/ \
  -d '{
    "name": "Strict Rate Limit",
    "category": "rate-limit",
    "rules": {
      "max_per_minute": 3,
      "max_per_hour": 100,
      "max_concurrent": 2,
      "burst_limit": 5,
      "burst_window_seconds": 10
    },
    "scope": {
      "agents": ["analyst"]
    },
    "enabled": true
  }'
```
Observability
Governance Tab
Rate limit evaluations appear with:
| Field | Example |
|---|---|
| Policy name | Strict Rate Limit |
| Action | allow, throttle, or block |
| Category | rate-limit |
| Reason | "Max Per Minute limit reached (3/3)" |
| Metadata | {"current": 3, "limit": 3} |
For throttle (concurrent):
| Field | Example |
|---|---|
| Reason | "Concurrent limit reached (2/2)" |
| Metadata | {"current": 2, "limit": 2} |
For throttle (burst):
| Field | Example |
|---|---|
| Reason | "Burst limit reached (5/5 in 10s)" |
| Metadata | {"current": 5, "limit": 5, "window": 10} |
Combining with Other Policies
Rate Limit + Kill Switch: Defense in depth. Rate limits prevent overuse under normal conditions. If errors spike despite rate limiting, the kill switch activates as a circuit breaker.
Rate Limit + Budget: Rate limits control frequency; budget limits control total cost. An agent might be allowed 10 executions per minute but blocked if it exceeds $50/day in LLM costs.
Rate Limit + Compliance: A compliance policy can require that rate limiting is configured as part of a regulatory framework (e.g., SOC 2 operations policy).
Common Gotchas
- **Fixed-window buckets can allow 2x burst at a window boundary.** A per-minute limit of 10 can allow 20 executions in 2 seconds if they span a minute boundary. Use `burst_limit` for tighter short-term control.
- **Concurrent counter TTL is a 300-second safety net.** If a process crashes without decrementing the counter, the TTL ensures it eventually resets. During those 300 seconds, however, the counter is inflated.
- **After restarts, the concurrent counter may be stale.** Redis counters persist across process restarts. If an agent crashes mid-execution, the concurrent counter stays incremented until the 300s TTL expires.
- **Rate limits are scoped by agent_name + workflow_name.** A policy targeting `analyst` with workflow `quick-analysis` does not affect `analyst` with workflow `deep-analysis`. Each combination has independent counters.
- **No Redis = no rate limiting.** When running with `WAXELL_OBSERVE=false` or without a live server, rate limits are not enforced and all queries succeed. This is intentional for local development.
- **`max_per_day` uses UTC day boundaries.** The day bucket is `int(now // 86400)`, which aligns with UTC midnight, not local time.
- **THROTTLE and BLOCK are different.** THROTTLE (concurrent/burst) means "try again shortly." BLOCK (time-window) means "wait for the window to reset."
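The UTC alignment of the day bucket can be verified directly, since `int(now // 86400)` counts whole days since the Unix epoch and the epoch starts at 00:00 UTC:

```python
from datetime import datetime, timezone

# One second before and at UTC midnight, as epoch timestamps.
just_before = datetime(2024, 6, 1, 23, 59, 59, tzinfo=timezone.utc).timestamp()
just_after = datetime(2024, 6, 2, 0, 0, 0, tzinfo=timezone.utc).timestamp()

print(int(just_before // 86400))  # 19875: day bucket for 2024-06-01 (UTC)
print(int(just_after // 86400))   # 19876: new bucket starts at UTC midnight
```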
Next Steps
- Policy & Governance -- How policy enforcement works
- Kill Switch Policy -- Circuit breaker for error-rate protection
- Compliance Policy -- Meta-validator for regulatory frameworks
- Policy Categories & Templates -- All 26 categories