Operations Policy
The operations policy category enforces operational controls on workflow execution. Currently it has a single rule: timeout_seconds. The handler monitors run duration and generates warnings when agents exceed configured time limits.
Use it when you need SLA compliance monitoring, performance alerting, or tracking slow-running agents.
Rules
| Rule | Type | Default | Description |
|---|---|---|---|
| timeout_seconds | integer (min 1) | 300 | Maximum allowed run duration in seconds. Checked post-execution for observed agents. |
How It Works
The operations handler does NOT preemptively kill agents. The agent always runs to completion. Timeout violations are detected after the fact and recorded as governance incidents (WARN). This fits observe-path agents, where an external framework controls execution and the handler can only observe the run.
Enforcement Phases
| Phase | Behavior |
|---|---|
| before_workflow | Stores timeout in context. Returns ALLOW with "Timeout set to Xs" |
| mid_execution | Checks context.duration if available. Returns WARN if exceeded, ALLOW otherwise |
| after_workflow | Final duration check. Returns WARN if exceeded, ALLOW otherwise |
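The three phases in the table can be sketched as a small handler. This is an illustrative sketch, not the actual handler: the `OperationsSketch` class, the `Decision` tuple, the dict-based context, and the "Duration not available" reason are assumptions made for the example. Only the phase names, the actions, and the other reason strings are taken from this page.

```python
from typing import NamedTuple, Optional


class Decision(NamedTuple):
    action: str  # "ALLOW" or "WARN"; this sketch never produces BLOCK
    reason: str


class OperationsSketch:
    """Hypothetical stand-in for the operations handler (not the real class)."""

    def __init__(self, timeout_seconds: int = 300):
        if timeout_seconds < 1:
            raise ValueError("timeout_seconds must be >= 1")
        self.timeout = timeout_seconds

    def before_workflow(self, context: dict) -> Decision:
        # Store the limit so later phases can read it from the context.
        context["timeout"] = self.timeout
        return Decision("ALLOW", f"Timeout set to {self.timeout}s")

    def mid_execution(self, context: dict) -> Decision:
        return self._check(context, "Within timeout", "Mid-run: approaching timeout")

    def after_workflow(self, context: dict) -> Decision:
        return self._check(context, "Completed within timeout", "Run exceeded timeout")

    def _check(self, context: dict, ok_msg: str, warn_msg: str) -> Decision:
        duration: Optional[float] = context.get("duration")
        if duration is None:
            # Duration not populated yet: allow, with an assumed generic reason.
            return Decision("ALLOW", "Duration not available")
        timeout = context.get("timeout", self.timeout)
        detail = f"({duration:.1f}s/{timeout}s)"
        if duration > timeout:
            return Decision("WARN", f"{warn_msg} {detail}")
        return Decision("ALLOW", f"{ok_msg} {detail}")
```

In this sketch a run whose duration equals the limit exactly is allowed, because the page describes WARN as applying only when the duration exceeds timeout_seconds.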
Context Data
| Attribute | Phase | Purpose |
|---|---|---|
| context.duration | mid_execution, after_workflow | Elapsed time of the workflow run (float, seconds) |
Actions Returned
- ALLOW -- duration within timeout, or no timeout configured, or duration not available
- WARN -- duration exceeds timeout_seconds
The handler never returns BLOCK. Timeout violations are always WARN. This is by design -- for observe-path agents, the execution has already happened.
Example Policies
Strict SLA (60 seconds)
Alert on any run exceeding 1 minute:
{
  "timeout_seconds": 60
}
Standard Monitoring (5 minutes)
Default timeout with monitoring:
{
  "timeout_seconds": 300
}
Long-Running Batch (1 hour)
For batch processing agents that legitimately run longer:
{
  "timeout_seconds": 3600
}
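To make the rule's constraints concrete (integer, minimum 1, default 300), here is a minimal sketch of how a rules object like the ones above could be resolved to an effective timeout. The `resolve_timeout` helper is hypothetical and is not part of the SDK or the policy engine; it only mirrors the constraints listed in the Rules table.

```python
DEFAULT_TIMEOUT_SECONDS = 300  # default from the Rules table


def resolve_timeout(rules: dict) -> int:
    """Return the effective timeout for a rules dict (hypothetical helper)."""
    value = rules.get("timeout_seconds", DEFAULT_TIMEOUT_SECONDS)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("timeout_seconds must be an integer")
    if value < 1:
        raise ValueError("timeout_seconds must be at least 1")
    return value
```

For example, `resolve_timeout({})` falls back to 300, while `resolve_timeout({"timeout_seconds": 0})` raises `ValueError`.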
SDK Integration
Using the Context Manager
import waxell_observe as waxell

waxell.init()

async with waxell.WaxellContext(
    agent_name="data-processor",
    enforce_policy=True,
) as ctx:
    # Operations policy stores timeout at before_workflow
    # Agent runs normally -- no blocking
    result = await process_data(query)
    ctx.set_result(result)

# after_workflow checks: if total duration > timeout_seconds -> WARN
# WARN is recorded but does NOT raise PolicyViolationError
Using the Decorator
@waxell.observe(
    agent_name="data-processor",
    enforce_policy=True,
)
async def process_data(query: str):
    # Operations checks happen after this function returns
    return await long_running_analysis(query)
Enforcement Flow
Agent starts (WaxellContext.__aenter__)
  |
  +-- before_workflow: stores timeout (e.g., 60s) in context
  |     -> ALLOW "Timeout set to 60s"
  |
  +-- Agent executes (duration tracked automatically)
  |     |
  |     +-- mid_execution (if triggered):
  |     |     -> duration < timeout? ALLOW "Within timeout (30.0s/60s)"
  |     |     -> duration > timeout? WARN "Mid-run: approaching timeout (75.0s/60s)"
  |     |
  |     +-- Agent continues regardless of mid_execution result
  |
  +-- Agent completes
        |
        +-- after_workflow: final duration check
              -> duration < timeout? ALLOW "Completed within timeout (45.0s/60s)"
              -> duration > timeout? WARN "Run exceeded timeout (120.0s/60s)"
              -> WARN recorded as governance incident
Creating via Dashboard
- Navigate to Governance > Policies
- Click New Policy
- Select category Operations
- Set timeout_seconds to your desired limit
- Set scope to target specific agents (e.g., data-processor)
- Enable
Creating via API
curl -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://acme.waxell.dev/waxell/v1/policies/ \
  -d '{
    "name": "SLA Timeout Monitor",
    "category": "operations",
    "rules": {
      "timeout_seconds": 60
    },
    "scope": {
      "agents": ["data-processor"]
    },
    "enabled": true
  }'
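The same request can be built from Python. The sketch below uses only the standard library and reuses the URL, headers, and payload from the curl call above. The `build_request` helper is a name chosen for this example, and the response format is not described on this page, so the sketch stops at constructing the request.

```python
import json
import urllib.request

# Payload copied from the curl example above.
POLICY = {
    "name": "SLA Timeout Monitor",
    "category": "operations",
    "rules": {"timeout_seconds": 60},
    "scope": {"agents": ["data-processor"]},
    "enabled": True,
}


def build_request(token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request; hypothetical helper."""
    return urllib.request.Request(
        "https://acme.waxell.dev/waxell/v1/policies/",
        data=json.dumps(POLICY).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen(...)` with a valid token, for instance one read from the `TOKEN` environment variable used in the curl example.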
Observability
Governance Tab
Operations evaluations appear with:
| Field | Example |
|---|---|
| Policy name | SLA Timeout Monitor |
| Action | allow or warn |
| Category | operations |
| Reason | "Completed within timeout (45.0s/60s)" or "Run exceeded timeout (120.0s/60s)" |
| Metadata | {"duration": 120.0, "timeout": 60} |
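The metadata values are enough to work out how far a run went over its limit. A minimal sketch, assuming the metadata arrives as a dict with the two keys shown in the table; the `overrun_seconds` helper is hypothetical:

```python
def overrun_seconds(metadata: dict) -> float:
    """Seconds by which a run exceeded its timeout; 0.0 if it stayed within it."""
    return max(0.0, float(metadata["duration"]) - float(metadata["timeout"]))
```

For the example row above, 120.0 - 60 gives an overrun of 60.0 seconds, that is, the run took twice its allowed time.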
Governance Incidents
Timeout violations create governance incidents visible in:
- The trace's Governance tab
- The Governance Incidents list
- Compliance Console (if Insights is enabled)
Combining with Other Policies
Operations + Kill Switch: Use operations timeout warnings to feed into kill switch error rate monitoring. Repeated timeout violations may indicate an agent that should be killed.
Operations + Control: Combine timeout monitoring with cost threshold monitoring. Long-running agents often also consume more LLM tokens (higher cost).
Operations + Compliance: Include operations as a required category in a SOC 2 compliance profile to ensure all agents have timeout monitoring configured.
Common Gotchas
- Returns WARN, never BLOCK. The agent always completes. Timeout violations are informational -- they create governance incidents but do not prevent execution.
- context.duration may be None. If mid_execution fires before the duration attribute is populated, the handler returns ALLOW with a generic reason. Duration is reliably set by the time after_workflow runs.
- Duration is calculated when WaxellContext closes. For observe-path agents, the SDK measures elapsed time between __aenter__ and __aexit__. This includes all LLM calls, tool calls, and any processing time.
- Default timeout is 300 seconds (5 minutes). If you configure an operations policy without specifying timeout_seconds, it defaults to 300s.
- Simulated duration in dry-run requires manual setting. Demo agents set context._duration_override to simulate elapsed time. In production, actual wall-clock time is used automatically.
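The timing behavior described in these gotchas can be illustrated with a small async context manager. This is a sketch of the general enter-to-exit timing pattern, not the SDK's implementation: `TimedContext` is a hypothetical stand-in for WaxellContext, the way `_duration_override` is applied here is an assumption, and `time.monotonic()` is this example's choice of clock.

```python
import asyncio
import time
from typing import Optional


class TimedContext:
    """Hypothetical stand-in illustrating enter-to-exit timing (not WaxellContext)."""

    def __init__(self, duration_override: Optional[float] = None):
        self.duration: Optional[float] = None  # stays None until the context closes
        self._duration_override = duration_override
        self._start = 0.0

    async def __aenter__(self) -> "TimedContext":
        self._start = time.monotonic()
        return self

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        elapsed = time.monotonic() - self._start
        # A simulated duration, if set, replaces the measured one (assumed behavior).
        if self._duration_override is not None:
            self.duration = self._duration_override
        else:
            self.duration = elapsed
        return False  # never suppress exceptions from the agent


async def demo() -> float:
    async with TimedContext() as ctx:
        assert ctx.duration is None  # mid-run: duration not populated yet
        await asyncio.sleep(0.05)    # stands in for LLM and tool calls
    return ctx.duration
```

Running `asyncio.run(demo())` returns the measured elapsed time, while `TimedContext(duration_override=120.0)` reports 120.0 regardless of how long the body actually ran.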
Next Steps
- Policy & Governance -- How policy enforcement works
- LLM Policy -- Model allowlists and token limits
- Quality Policy -- Output quality validation
- Policy Categories & Templates -- All 26 categories