Rate Limit Policy
The rate-limit policy category enforces execution frequency limits on agent workflows. Unlike content-based policies (safety, compliance), rate limiting is purely about how often an agent runs, not what the agent does.
Use it when you need to:
- Prevent runaway agents from consuming excessive resources
- Enforce fair usage across teams or user groups
- Protect downstream APIs from being overwhelmed
- Limit burst activity during peak periods
Rules
| Rule | Type | Default | Description |
|---|---|---|---|
| `max_per_minute` | int | 10 | Maximum workflow executions per minute (fixed-window bucket) |
| `max_per_hour` | int | 100 | Maximum workflow executions per hour (fixed-window bucket) |
| `max_per_day` | int | (none) | Maximum workflow executions per day (fixed-window bucket) |
| `max_concurrent` | int | 5 | Maximum concurrent workflow executions (incr/decr counter) |
| `burst_limit` | int | (none) | Maximum executions in the burst window (sliding window via sorted set) |
| `burst_window_seconds` | int | 10 | Time window for the burst limit, in seconds |
How It Works
The rate limit handler runs at `before_workflow` to check limits and at `after_workflow` / `on_failure` to clean up concurrent counters. All counters are stored in Redis and scoped by `<agent_name>:<workflow_name>`.
Rate Limit Types
Per-Minute / Per-Hour / Per-Day (Fixed-Window Buckets)
Time-window limits use fixed buckets: `bucket = int(now // window_seconds)`. The counter increments on each execution and resets when time crosses a bucket boundary.
| Window | Bucket Size | Reset Behavior |
|---|---|---|
| Per-minute | 60s | Resets at the start of each minute (wall clock) |
| Per-hour | 3600s | Resets at the start of each hour |
| Per-day | 86400s | Resets at the start of each day |
Fixed-window buckets can allow up to 2x the configured limit at a window boundary. For example, with `max_per_minute=10`, an agent could execute 10 times at 12:00:59 and 10 more at 12:01:00 -- 20 executions in 2 seconds. Use `burst_limit` for tighter short-term control.
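The bucket arithmetic can be sketched in plain Python, with a dict standing in for Redis. The names here (`check_fixed_window`, `counters`) are illustrative, not part of the SDK:

```python
import time

# In-memory stand-in for Redis counters, keyed by (scope, bucket).
counters = {}

def check_fixed_window(scope, window_seconds, limit, now=None):
    """Return True if the execution is allowed under the fixed-window limit."""
    now = time.time() if now is None else now
    bucket = int(now // window_seconds)   # new bucket at each wall-clock boundary
    key = (scope, bucket)
    current = counters.get(key, 0)
    if current >= limit:
        return False                      # BLOCK: window limit exceeded
    counters[key] = current + 1
    return True

# Boundary effect: 10 calls at t=59s and 10 more at t=60s all pass with
# limit 10, because they fall into different minute buckets.
allowed_before = sum(check_fixed_window("analyst:quick-analysis", 60, 10, now=59.0)
                     for _ in range(10))
allowed_after = sum(check_fixed_window("analyst:quick-analysis", 60, 10, now=60.0)
                    for _ in range(10))
print(allowed_before + allowed_after)  # 20 executions allowed across the boundary
```

In the real handler the increment would be a Redis `INCR` with an `EXPIRE` on the bucket key; the boundary behavior is the same.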
Concurrent (Incr/Decr Counter)
The concurrent limit tracks how many workflow executions are running simultaneously. The counter is incremented at `before_workflow` and decremented at `after_workflow` or `on_failure`. A TTL of 300 seconds acts as a safety net in case the decrement is missed (e.g., process crash).
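A minimal sketch of the incr/decr pattern with its TTL safety net, again using an in-memory stand-in for Redis. `ConcurrentLimiter` is illustrative, not an SDK class:

```python
import time

class ConcurrentLimiter:
    """In-memory stand-in for a Redis INCR/DECR counter with an EXPIRE TTL."""

    def __init__(self, limit, ttl_seconds=300):
        self.limit = limit
        self.ttl = ttl_seconds
        self.count = 0
        self.expires_at = 0.0

    def acquire(self, now=None):
        now = time.time() if now is None else now
        if now >= self.expires_at:        # TTL elapsed: reset a stale counter
            self.count = 0
        if self.count >= self.limit:
            return False                  # THROTTLE: retry after a short delay
        self.count += 1
        self.expires_at = now + self.ttl  # refresh the safety-net TTL
        return True

    def release(self):
        # Called at after_workflow or on_failure.
        self.count = max(0, self.count - 1)

limiter = ConcurrentLimiter(limit=2)
print(limiter.acquire(now=0))    # True
print(limiter.acquire(now=1))    # True
print(limiter.acquire(now=2))    # False: at max_concurrent
limiter.release()
print(limiter.acquire(now=3))    # True again after a slot frees up
# A crash that skips release() is healed once the 300s TTL expires:
print(limiter.acquire(now=400))  # True: stale counter was reset
```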
Burst (Sliding Window)
Burst limits use a Redis sorted set as a sliding window. Entries older than `burst_window_seconds` are pruned on each check. This provides more accurate short-term rate limiting than fixed-window buckets.
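The sliding-window check can be sketched with a sorted list of timestamps standing in for the Redis sorted set (the prune step corresponds to `ZREMRANGEBYSCORE`, the count to `ZCARD`, the add to `ZADD`). `BurstLimiter` is illustrative:

```python
import time
from bisect import bisect_left, insort

class BurstLimiter:
    """Sliding-window limiter over a sorted list of execution timestamps."""

    def __init__(self, burst_limit, window_seconds):
        self.limit = burst_limit
        self.window = window_seconds
        self.events = []   # sorted execution timestamps

    def allow(self, now=None):
        now = time.time() if now is None else now
        # Prune entries older than the window (ZREMRANGEBYSCORE equivalent).
        cutoff = now - self.window
        self.events = self.events[bisect_left(self.events, cutoff):]
        if len(self.events) >= self.limit:
            return False               # THROTTLE: burst limit exceeded
        insort(self.events, now)       # record this execution (ZADD equivalent)
        return True

burst = BurstLimiter(burst_limit=3, window_seconds=10)
print([burst.allow(now=t) for t in (0, 1, 2)])   # [True, True, True]
print(burst.allow(now=3))                         # False: 3 events in last 10s
print(burst.allow(now=12))                        # True: events at t=0,1 aged out
```

Unlike fixed buckets, the window here always covers exactly the last `window_seconds`, so there is no boundary doubling.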
Enforcement Phases
| Phase | Behavior |
|---|---|
| `before_workflow` | Checks concurrent, burst, and time-window limits; returns BLOCK or THROTTLE if exceeded |
| `mid_execution` | Not implemented |
| `after_workflow` | Decrements the concurrent counter |
| `on_failure` | Decrements the concurrent counter |
Actions
| Action | When |
|---|---|
| ALLOW | Under all configured limits |
| THROTTLE | Concurrent limit or burst limit exceeded |
| BLOCK | Time-window limit exceeded (`max_per_minute`, `max_per_hour`, `max_per_day`) |
THROTTLE is returned for concurrent and burst limits -- the client should retry after a short delay. BLOCK is returned for time-window limits -- the client must wait for the window to reset.
Example Policies
Strict Rate Limit (Batch Jobs)
Low limits for batch processing agents that should run infrequently:
```json
{
  "max_per_minute": 3,
  "max_per_hour": 50,
  "max_per_day": 500,
  "max_concurrent": 1,
  "burst_limit": 3,
  "burst_window_seconds": 10
}
```
Interactive Agent (High Throughput)
Higher limits for user-facing agents:
```json
{
  "max_per_minute": 30,
  "max_per_hour": 500,
  "max_concurrent": 10,
  "burst_limit": 15,
  "burst_window_seconds": 5
}
```
API Protection (Burst Only)
Only limit burst activity, no per-minute/hour caps:
```json
{
  "burst_limit": 10,
  "burst_window_seconds": 5
}
```
SDK Integration
Using the Context Manager
```python
import waxell_observe as waxell
from waxell_observe.errors import PolicyViolationError

waxell.init()

try:
    async with waxell.WaxellContext(
        agent_name="analyst",
        workflow_name="quick-analysis",
        enforce_policy=True,
    ) as ctx:
        # If the rate limit is exceeded, PolicyViolationError
        # is raised here (before any agent work happens)
        result = await analyze_data(query)
        ctx.set_result(result)
except PolicyViolationError as e:
    print(f"Rate limited: {e}")
    # e.g. "Max Per Minute limit reached (3/3)"
```
Using the Decorator
```python
@waxell.observe(
    agent_name="analyst",
    workflow_name="quick-analysis",
    enforce_policy=True,
)
async def run_analysis(query: str):
    # Rate limit check happens before this function body runs
    return await analyze_data(query)
```
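Since THROTTLE-style rejections are meant to be retried after a short delay, a caller can wrap a governed workflow in a simple backoff loop. This is a hypothetical helper, not part of the SDK; a local stand-in for `PolicyViolationError` is defined so the sketch is self-contained:

```python
import asyncio

class PolicyViolationError(Exception):
    """Stand-in for waxell_observe.errors.PolicyViolationError."""

async def run_with_retry(fn, *args, attempts=3, delay_seconds=0.5):
    """Retry a rate-limited async workflow with linear backoff (illustrative)."""
    for attempt in range(attempts):
        try:
            return await fn(*args)
        except PolicyViolationError:
            if attempt == attempts - 1:
                raise                      # give up after the last attempt
            await asyncio.sleep(delay_seconds * (attempt + 1))

# Demo with a workflow that is throttled twice, then succeeds:
calls = {"n": 0}

async def flaky(query):
    calls["n"] += 1
    if calls["n"] < 3:
        raise PolicyViolationError("Burst limit reached (5/5 in 10s)")
    return f"ok: {query}"

print(asyncio.run(run_with_retry(flaky, "q3 revenue", delay_seconds=0.0)))  # ok: q3 revenue
```

Note that retrying is only sensible for THROTTLE (concurrent/burst); a BLOCK from a time-window limit will keep failing until the window resets.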
Enforcement Flow
```
Agent starts (WaxellContext.__aenter__ or decorator entry)
 |
 +-- before_workflow governance runs
 |    |
 |    +-- Check concurrent limit
 |    |    +-- current >= max_concurrent? -> THROTTLE
 |    |    +-- Otherwise: increment counter, set 300s TTL
 |    |
 |    +-- Check burst limit (sliding window)
 |    |    +-- Prune entries older than burst_window_seconds
 |    |    +-- count >= burst_limit? -> THROTTLE
 |    |    +-- Otherwise: add entry to sorted set
 |    |
 |    +-- Check time-window limits (minute, hour, day)
 |         +-- For each configured limit:
 |         |    +-- bucket = int(now // window)
 |         |    +-- current >= max? -> BLOCK
 |         |    +-- Otherwise: increment counter
 |         +-- All under limit -> ALLOW
 |
 +-- Agent executes...
 |
 +-- after_workflow (or on_failure)
      +-- Decrement concurrent counter
```
Rate limits require Redis for distributed counting. When running with WAXELL_OBSERVE=false or without a live server connection, rate limits are not enforced -- all queries succeed. This is by design for local development.
Creating via Dashboard
- Navigate to Governance > Policies
- Click New Policy
- Select category Rate Limit
- Configure limits (per-minute, per-hour, concurrent, burst)
- Set scope to target specific agents (e.g., `rate-limited-analyst`)
- Enable the policy
Creating via API
```bash
curl -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://acme.waxell.dev/waxell/v1/policies/ \
  -d '{
    "name": "Strict Rate Limit",
    "category": "rate-limit",
    "rules": {
      "max_per_minute": 3,
      "max_per_hour": 100,
      "max_concurrent": 2,
      "burst_limit": 5,
      "burst_window_seconds": 10
    },
    "scope": {
      "agents": ["analyst"]
    },
    "enabled": true
  }'
```
Observability
Governance Tab
Rate limit evaluations appear with:
| Field | Example |
|---|---|
| Policy name | Strict Rate Limit |
| Action | allow, throttle, or block |
| Category | rate-limit |
| Reason | "Max Per Minute limit reached (3/3)" |
| Metadata | {"current": 3, "limit": 3} |
For throttle (concurrent):
| Field | Example |
|---|---|
| Reason | "Concurrent limit reached (2/2)" |
| Metadata | {"current": 2, "limit": 2} |
For throttle (burst):
| Field | Example |
|---|---|
| Reason | "Burst limit reached (5/5 in 10s)" |
| Metadata | {"current": 5, "limit": 5, "window": 10} |
Combining with Other Policies
Rate Limit + Kill Switch: Defense in depth. Rate limits prevent overuse under normal conditions. If errors spike despite rate limiting, the kill switch activates as a circuit breaker.
Rate Limit + Budget: Rate limits control frequency; budget limits control total cost. An agent might be allowed 10 executions per minute but blocked if it exceeds $50/day in LLM costs.
Rate Limit + Compliance: A compliance policy can require that rate limiting is configured as part of a regulatory framework (e.g., SOC 2 operations policy).
Common Gotchas
- **Fixed-window buckets can allow 2x burst at a window boundary.** A per-minute limit of 10 can allow 20 executions in 2 seconds if they span a minute boundary. Use `burst_limit` for tighter short-term control.
- **Concurrent counter TTL is a 300-second safety net.** If a process crashes without decrementing the counter, the TTL ensures it eventually resets. During those 300 seconds, however, the counter is inflated.
- **After restarts, the concurrent counter may be stale.** Redis counters persist across process restarts. If an agent crashes mid-execution, the concurrent counter stays incremented until the 300s TTL expires.
- **Rate limits are scoped by agent_name + workflow_name.** A policy targeting `analyst` with workflow `quick-analysis` does not affect `analyst` with workflow `deep-analysis`. Each combination has independent counters.
- **No Redis = no rate limiting.** When running with `WAXELL_OBSERVE=false` or without a live server, rate limits are not enforced and all queries succeed. This is intentional for local development.
- **`max_per_day` uses UTC day boundaries.** The day bucket is `int(now // 86400)`, which aligns with UTC midnight, not local time.
- **THROTTLE and BLOCK are different.** THROTTLE (concurrent/burst) means "try again shortly." BLOCK (time-window) means "wait for the window to reset."
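The UTC alignment of the day bucket can be verified directly, since `int(now // 86400)` counts whole days since the Unix epoch and the epoch starts at 00:00 UTC:

```python
from datetime import datetime, timezone

# One second before and at UTC midnight, as epoch timestamps.
just_before = datetime(2024, 6, 1, 23, 59, 59, tzinfo=timezone.utc).timestamp()
just_after = datetime(2024, 6, 2, 0, 0, 0, tzinfo=timezone.utc).timestamp()

print(int(just_before // 86400))  # 19875: day bucket for 2024-06-01 (UTC)
print(int(just_after // 86400))   # 19876: new bucket starts at UTC midnight
```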
Next Steps
- Policy & Governance -- How policy enforcement works
- Kill Switch Policy -- Circuit breaker for error-rate protection
- Compliance Policy -- Meta-validator for regulatory frameworks
- Policy Categories & Templates -- All 26 categories