
Operations Policy

The operations policy category enforces operational controls on workflow execution. Currently it has a single rule: timeout_seconds. The handler monitors run duration and generates warnings when agents exceed configured time limits.

Use it when you need SLA compliance monitoring, performance alerting, or tracking slow-running agents.

Rules

Rule | Type | Default | Description
timeout_seconds | integer (min 1) | 300 | Maximum allowed run duration in seconds. Checked post-execution for observed agents.
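
The constraints in the rules table (integer, minimum 1, default 300) can be expressed as a small validation helper. This is a minimal sketch, not the platform's actual validator: the function name `resolve_timeout`, the constant name, and the error messages are hypothetical.

```python
DEFAULT_TIMEOUT_SECONDS = 300  # default taken from the rules table


def resolve_timeout(rules: dict) -> int:
    """Return the effective timeout for a rules dict (hypothetical helper)."""
    value = rules.get("timeout_seconds", DEFAULT_TIMEOUT_SECONDS)
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("timeout_seconds must be an integer")
    if value < 1:
        raise ValueError("timeout_seconds must be at least 1")
    return value
```

Calling `resolve_timeout({})` falls back to the default, while `resolve_timeout({"timeout_seconds": 0})` raises `ValueError`.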

How It Works

Post-Hoc Enforcement

The operations handler does NOT kill agents preemptively; the agent always runs to completion. Timeout violations are detected after the fact and recorded as governance incidents (WARN). This suits observe-path agents, where an external framework controls execution.

Enforcement Phases

Phase | Behavior
before_workflow | Stores timeout in context. Returns ALLOW with "Timeout set to Xs"
mid_execution | Checks context.duration if available. Returns WARN if exceeded, ALLOW otherwise
after_workflow | Final duration check. Returns WARN if exceeded, ALLOW otherwise

Context Data

Attribute | Phase | Purpose
context.duration | mid_execution, after_workflow | Elapsed time of the workflow run (float, seconds)

Actions Returned

  • ALLOW -- duration within timeout, or no timeout configured, or duration not available
  • WARN -- duration exceeds timeout_seconds

The handler never returns BLOCK. Timeout violations are always WARN. This is by design -- for observe-path agents, the execution has already happened.
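
The decision described above can be sketched as a single function. This is an illustrative stand-in rather than the handler's real code: `check_duration` is a hypothetical name, the reason strings for the "no timeout" and "no duration" cases are invented, and treating a run that lands exactly on the limit as within timeout is an assumption (the source only says WARN applies when the duration exceeds `timeout_seconds`).

```python
from typing import Optional, Tuple

ALLOW, WARN = "ALLOW", "WARN"


def check_duration(duration: Optional[float], timeout: Optional[int]) -> Tuple[str, str]:
    """Return (action, reason) for a run; never returns BLOCK."""
    if timeout is None:
        return ALLOW, "No timeout configured"  # hypothetical reason text
    if duration is None:
        return ALLOW, "Duration not available"  # hypothetical reason text
    if duration > timeout:
        return WARN, f"Run exceeded timeout ({duration:.1f}s/{timeout}s)"
    return ALLOW, f"Completed within timeout ({duration:.1f}s/{timeout}s)"
```

Because every branch returns either ALLOW or WARN, nothing in this check can interrupt the agent.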

Example Policies

Strict SLA (60 seconds)

Alert on any run exceeding 1 minute:

{
  "timeout_seconds": 60
}

Standard Monitoring (5 minutes)

Default timeout with monitoring:

{
  "timeout_seconds": 300
}

Long-Running Batch (1 hour)

For batch processing agents that legitimately run longer:

{
  "timeout_seconds": 3600
}

SDK Integration

Using the Context Manager

import waxell_observe as waxell

waxell.init()

async def main(query: str):
    # `async with` must run inside a coroutine
    async with waxell.WaxellContext(
        agent_name="data-processor",
        enforce_policy=True,
    ) as ctx:
        # Operations policy stores timeout at before_workflow
        # Agent runs normally -- no blocking

        # process_data is your own agent function
        result = await process_data(query)
        ctx.set_result(result)

    # after_workflow checks: if total duration > timeout_seconds -> WARN
    # WARN is recorded but does NOT raise PolicyViolationError

Using the Decorator

import waxell_observe as waxell

@waxell.observe(
    agent_name="data-processor",
    enforce_policy=True,
)
async def process_data(query: str):
    # Operations checks happen after this function returns
    return await long_running_analysis(query)

Enforcement Flow

Agent starts (WaxellContext.__aenter__)
  |
  +-- before_workflow: stores timeout (e.g., 60s) in context
  |     -> ALLOW "Timeout set to 60s"
  |
  +-- Agent executes (duration tracked automatically)
  |     |
  |     +-- mid_execution (if triggered):
  |     |     -> duration < timeout? ALLOW "Within timeout (30.0s/60s)"
  |     |     -> duration > timeout? WARN "Mid-run: approaching timeout (75.0s/60s)"
  |     |
  |     +-- Agent continues regardless of mid_execution result
  |
  +-- Agent completes
  |
  +-- after_workflow: final duration check
        -> duration < timeout? ALLOW "Completed within timeout (45.0s/60s)"
        -> duration > timeout? WARN "Run exceeded timeout (120.0s/60s)"
        -> WARN recorded as governance incident

Creating via Dashboard

  1. Navigate to Governance > Policies
  2. Click New Policy
  3. Select category Operations
  4. Set timeout_seconds to your desired limit
  5. Set scope to target specific agents (e.g., data-processor)
  6. Enable

Creating via API

curl -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://acme.waxell.dev/waxell/v1/policies/ \
  -d '{
    "name": "SLA Timeout Monitor",
    "category": "operations",
    "rules": {
      "timeout_seconds": 60
    },
    "scope": {
      "agents": ["data-processor"]
    },
    "enabled": true
  }'
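
The same request can be built from Python using only the standard library. This is a sketch that mirrors the curl call above: the URL, headers, and payload come from that example, while the helper name `build_policy_request` is hypothetical. The function only constructs the request and sends nothing.

```python
import json
import urllib.request


def build_policy_request(token: str, timeout_seconds: int = 60) -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl example."""
    payload = {
        "name": "SLA Timeout Monitor",
        "category": "operations",
        "rules": {"timeout_seconds": timeout_seconds},
        "scope": {"agents": ["data-processor"]},
        "enabled": True,
    }
    return urllib.request.Request(
        "https://acme.waxell.dev/waxell/v1/policies/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To submit it, pass the result to `urllib.request.urlopen(...)` with a valid token. The response format is not covered in this page, so handling it is left out here.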

Observability

Governance Tab

Operations evaluations appear with:

Field | Example
Policy name | SLA Timeout Monitor
Action | allow or warn
Category | operations
Reason | "Completed within timeout (45.0s/60s)" or "Run exceeded timeout (120.0s/60s)"
Metadata | {"duration": 120.0, "timeout": 60}
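
If you export these evaluations for your own reporting, the metadata is enough to compute how far a run overshot its limit. The sketch below assumes each metadata object has the `duration` and `timeout` keys shown in the example; the helper name `overrun_seconds` is hypothetical and not part of the SDK.

```python
def overrun_seconds(metadata: dict) -> float:
    """Seconds by which a run exceeded its timeout (0.0 if within the limit)."""
    return max(0.0, float(metadata["duration"]) - float(metadata["timeout"]))


print(overrun_seconds({"duration": 120.0, "timeout": 60}))  # 60.0
```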

Governance Incidents

Timeout violations create governance incidents visible in:

  • The trace's Governance tab
  • The Governance Incidents list
  • Compliance Console (if Insights is enabled)

Combining with Other Policies

Operations + Kill Switch: Use operations timeout warnings to feed into kill switch error rate monitoring. Repeated timeout violations may indicate an agent that should be killed.

Operations + Control: Combine timeout monitoring with cost threshold monitoring. Long-running agents often also consume more LLM tokens (higher cost).

Operations + Compliance: Include operations as a required category in a SOC 2 compliance profile to ensure all agents have timeout monitoring configured.

Common Gotchas

  1. Returns WARN, never BLOCK. The agent always completes. Timeout violations are informational -- they create governance incidents but do not prevent execution.

  2. context.duration may be None. If mid_execution fires before the duration attribute is populated, the handler returns ALLOW with a generic reason. Duration is reliably set by the time after_workflow runs.

  3. Duration is calculated when WaxellContext closes. For observe-path agents, the SDK measures elapsed time between __aenter__ and __aexit__. This includes all LLM calls, tool calls, and any processing time.

  4. Default timeout is 300 seconds (5 minutes). If you configure an operations policy without specifying timeout_seconds, it defaults to 300s.

  5. Simulated duration in dry-run requires manual setting. Demo agents set context._duration_override to simulate elapsed time. In production, actual wall-clock time is used automatically.
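
Gotchas 2, 3, and 5 can be illustrated with a small stand-in context manager. This is not the SDK's implementation: `TimedRun` and its `duration_override` argument are hypothetical, and the sketch only assumes what the list above states, namely that duration is unset until the context closes, is measured between `__aenter__` and `__aexit__`, and can be replaced by an override in demos.

```python
import asyncio
import time
from typing import Optional


class TimedRun:
    """Hypothetical stand-in that measures elapsed time across the context."""

    def __init__(self, duration_override: Optional[float] = None):
        self.duration: Optional[float] = None  # None until the context closes
        self._duration_override = duration_override
        self._start = 0.0

    async def __aenter__(self) -> "TimedRun":
        self._start = time.monotonic()
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        elapsed = time.monotonic() - self._start
        # An override, if set, replaces the measured wall-clock time
        if self._duration_override is not None:
            self.duration = self._duration_override
        else:
            self.duration = elapsed


async def demo() -> Optional[float]:
    async with TimedRun(duration_override=120.0) as run:
        assert run.duration is None  # not yet populated mid-run
        await asyncio.sleep(0)  # placeholder for agent work
    return run.duration
```

Running `asyncio.run(demo())` returns the overridden value; without an override, the result is the measured time inside the `async with` block.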

Next Steps