
Spawn Limit Policy

The spawn-limit policy puts a tenant-level ceiling on concurrent ctx.spawn children. It is the operator's rail, separate from the author's: the agent-level cap (declared by the author in guards: [{type: spawn_concurrent, limit: N}]) is enforced per parent tree in the BudgetLedger, whereas this handler is policy-driven, scoped via the standard policy scope filter (tenant / agent / workflow), and halts the spawn before any children dispatch.

Runs only on the runtime plane -- there is no observe-plane before_spawn call site.

Rules

| Rule | Type | Default | Description |
|---|---|---|---|
| concurrent_spawn_limit | integer | 50 | Maximum concurrent spawned children for this scope |
| action | string | "block" | Either block or warn when the cap would be breached |

How It Works

The spawn-limit handler fires from the before_spawn hook on PolicyGovernanceHook. The dispatcher (ProductionSpawnDispatcher) is responsible for incrementing the counter on enqueue_children and decrementing on on_child_complete. This handler only reads the counter and rejects on breach.

| Phase | What It Checks | Actions |
|---|---|---|
| before_workflow | No-op (ALLOW) | ALLOW |
| mid_execution | If context._pending_spawn is set, delegates to before_spawn; otherwise ALLOW | BLOCK / WARN on cap breach |
| before_spawn | Reads Redis counter for (agent, workflow), checks current + total_children > limit | BLOCK or WARN |
| after_workflow | No-op (ALLOW) | ALLOW |
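
The before_spawn check reduces to a small piece of arithmetic. The following is a minimal sketch of that comparison, not the handler's actual code: the names SpawnDecision and check_spawn are hypothetical, and the Redis read is replaced by a plain integer argument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpawnDecision:
    action: str    # "allow", "block", or "warn"
    would_be: int  # in-flight count if the batch were admitted


def check_spawn(current: int, total_children: int,
                limit: int = 50, action: str = "block") -> SpawnDecision:
    """Hypothetical stand-in for the before_spawn comparison.

    current        -- in-flight children read from the counter
    total_children -- children requested by this spawn call
    limit          -- concurrent_spawn_limit from the policy rules
    action         -- "block" or "warn", applied only on breach
    """
    would_be = current + total_children
    # A limit of 0 or below disables the check entirely.
    if limit <= 0:
        return SpawnDecision("allow", would_be)
    # Breach only when the total would strictly exceed the cap.
    if would_be > limit:
        return SpawnDecision(action, would_be)
    return SpawnDecision("allow", would_be)
```

Because the comparison is strict, a batch that lands exactly on the cap is admitted: 20 in flight plus 5 requested against a cap of 25 is allowed, while 23 plus 5 is rejected.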

Context Attributes Read

| Attribute | Phase | Purpose |
|---|---|---|
| context.agent_name | before_spawn | Counter key narrowing |
| context.workflow_name | before_spawn | Counter key narrowing |
| context._pending_spawn | mid_execution | Set by PolicyGovernanceHook.before_spawn with {child_agent, total_children} |
| context.run_id | before_spawn | Recorded on durable EnforcementEvent |

Counter Key

spawn_concurrent:{agent_name}:{workflow_name}

Auto-prefixed with tenant by TenantAwareRedis. The SpawnLimitHandler.build_counter_key() staticmethod is the canonical builder -- the dispatcher uses the same function to ensure key parity.
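
As an illustration of the key format, here is a minimal sketch of a builder that follows the template above. It is a free function standing in for SpawnLimitHandler.build_counter_key(), whose real signature is not shown in this page, and the tenant prefix layout in with_tenant_prefix is purely an assumption made for the example; TenantAwareRedis may format it differently.

```python
def build_counter_key(agent_name: str, workflow_name: str) -> str:
    """Build the un-prefixed counter key from the documented template."""
    return f"spawn_concurrent:{agent_name}:{workflow_name}"


def with_tenant_prefix(tenant_id: str, key: str) -> str:
    """Hypothetical tenant prefixing; the real layout is an assumption."""
    return f"{tenant_id}:{key}"


key = build_counter_key("research-agent", "batch-research")
print(key)
print(with_tenant_prefix("tenant-a", key))
```

Sharing one builder between the handler and the dispatcher matters because a mismatch in either name segment would make the handler read a counter that nothing ever increments.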

Example Policy

{
  "name": "Research Fleet Concurrency Cap",
  "category": "spawn-limit",
  "rules": {
    "concurrent_spawn_limit": 25,
    "action": "block"
  },
  "scope": {
    "agents": ["research-agent"],
    "workflows": ["batch-research"]
  },
  "enabled": true
}

SDK Integration

import waxell_observe as waxell

waxell.init()

@waxell.observe(agent_name="research-agent", enforce_policy=True)
async def batch_research(topics: list[str]) -> list[str]:
    # ctx.spawn calls trigger before_spawn: the policy reads the
    # Redis counter and halts if this batch + in-flight would exceed
    # concurrent_spawn_limit for (research-agent, batch-research).
    # Note: ctx is the spawn context supplied by the runtime; how it
    # is obtained is not shown in this snippet.
    return await ctx.spawn_many("worker-agent", topics)

Observability

| Field | Example |
|---|---|
| Category | spawn-limit (recorded as concurrency in EnforcementEvent) |
| Action | block |
| Reason | "concurrent spawn limit breached: 23 in flight + 5 requested would reach 28 (cap 25) for scope agent=research-agent workflow=batch-research" |
| Metadata | {"cap": 25, "current_in_flight": 23, "requested": 5, "would_be": 28, "counter_key": "spawn_concurrent:research-agent:batch-research", "workflow_name": "batch-research"} |

Every block / warn writes a durable EnforcementEvent (handler="spawn-limit", category="concurrency") so blocks-in-last-24h-by-agent is a cheap query in the admin surface.
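
To show how the Reason and Metadata fields relate to the underlying numbers, here is a minimal sketch that rebuilds both from the example values above. The function name build_block_details is hypothetical and this is not the handler's own formatting code; it only reproduces the shapes shown in the table.

```python
def build_block_details(cap: int, current_in_flight: int, requested: int,
                        agent_name: str, workflow_name: str):
    """Hypothetical helper reproducing the documented reason and metadata."""
    would_be = current_in_flight + requested
    counter_key = f"spawn_concurrent:{agent_name}:{workflow_name}"
    reason = (
        f"concurrent spawn limit breached: {current_in_flight} in flight + "
        f"{requested} requested would reach {would_be} (cap {cap}) "
        f"for scope agent={agent_name} workflow={workflow_name}"
    )
    metadata = {
        "cap": cap,
        "current_in_flight": current_in_flight,
        "requested": requested,
        "would_be": would_be,
        "counter_key": counter_key,
        "workflow_name": workflow_name,
    }
    return reason, metadata


reason, metadata = build_block_details(
    cap=25, current_in_flight=23, requested=5,
    agent_name="research-agent", workflow_name="batch-research",
)
```

The would_be value is always current_in_flight + requested, so the metadata alone is enough to tell how far over the cap a rejected batch would have gone.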

Common Gotchas

  1. supported_planes = ["runtime"]. This policy never fires on the observe plane. Agents instrumented purely via waxell-observe (no governed runtime) cannot enforce it. The before_spawn call site only exists in the runtime dispatcher.
  2. The handler does NOT increment / decrement the counter. That is the dispatcher's job. If you swap dispatchers, ensure the new one also calls inc on enqueue and dec on completion using the same key builder.
  3. Redis read failure fails open. If the Redis read raises, the handler logs a warning and returns ALLOW. This is intentional (a Redis outage should not block spawns) but means a degraded Redis cluster will silently disable the cap.
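  4. concurrent_spawn_limit <= 0 disables the check. Setting it to 0 or negative is a no-op ALLOW, not "block all spawns". A limit of 1 does not block all spawns either: the breach test is current + total_children > limit, so a single child requested with nothing in flight (0 + 1 > 1 is false) is still admitted. A limit of 1 rejects only batches of two or more children and any spawn requested while one child is already running, so this rule on its own cannot express "block all spawns".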
  5. Scope narrowing happens upstream. By the time the handler runs, DynamicPolicyManager has already filtered to in-scope policies via applies_to. The counter still reads the per-(agent, workflow) key, so a tenant-wide policy with no scope filter still counts per-agent-per-workflow buckets, not a single tenant-total counter.
  6. mid_execution is the entry point in Phase 1.0. The hook stamps context._pending_spawn then calls mid_execution_detailed. Non-spawn mid_execution invocations (tool calls, LLM calls) return ALLOW immediately.
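
Gotchas 2 and 3 can be sketched together: the dispatcher owns the counter writes, and the handler's read fails open. The sketch below assumes an in-memory stand-in for Redis. InMemoryCounters, SketchDispatcher, read_in_flight, and is_allowed are all hypothetical names; only enqueue_children and on_child_complete mirror the method names mentioned in this page.

```python
import logging
from collections import defaultdict

logger = logging.getLogger("spawn_limit_sketch")


class InMemoryCounters:
    """Stand-in for the Redis counter store (not the real client)."""

    def __init__(self) -> None:
        self._values = defaultdict(int)
        self.healthy = True  # flip to False to simulate an outage

    def incr_by(self, key: str, amount: int) -> None:
        self._values[key] += amount

    def decr(self, key: str) -> None:
        # Clamp at zero so a duplicate completion cannot go negative.
        self._values[key] = max(0, self._values[key] - 1)

    def get(self, key: str) -> int:
        if not self.healthy:
            raise ConnectionError("counter store unavailable")
        return self._values[key]


class SketchDispatcher:
    """Owns all counter writes; the handler never writes."""

    def __init__(self, counters: InMemoryCounters, key: str) -> None:
        self.counters = counters
        self.key = key

    def enqueue_children(self, count: int) -> None:
        self.counters.incr_by(self.key, count)

    def on_child_complete(self) -> None:
        self.counters.decr(self.key)


def read_in_flight(counters: InMemoryCounters, key: str):
    """Return the in-flight count, or None if the read fails."""
    try:
        return counters.get(key)
    except Exception as exc:
        logger.warning("spawn counter read failed, failing open: %s", exc)
        return None


def is_allowed(counters: InMemoryCounters, key: str,
               requested: int, limit: int) -> bool:
    """Read-only check: fail open on read errors, skip when limit <= 0."""
    if limit <= 0:
        return True
    current = read_in_flight(counters, key)
    if current is None:
        return True  # degraded store silently disables the cap
    return current + requested <= limit
```

A replacement dispatcher that forgets the decrement would leave the counter climbing until every spawn is rejected, while one that forgets the increment would leave the cap permanently unenforced, which is why both writes and the shared key builder need to move together.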

Next Steps