
Bias Trend Policy

The bias-trend policy category provides continuous fairness measurement: it tracks the rate of bias flags over a rolling time window and fires when that rate exceeds a threshold. Per-run bias detection (handled by the reasoning policy) catches isolated incidents; this handler catches drift across runs, which regulators require to be measured on an ongoing basis.

Aligned to NIST AI RMF MS-3.1 and EU AI Act Art-10 (fairness obligations).

Rules

| Rule | Type | Default | Description |
| --- | --- | --- | --- |
| tracking_window_hours | integer | 168 (7 days) | Rolling window size, in hours, for rate calculation |
| max_bias_rate | number | 0.10 | Fraction of recent runs flagged that triggers the policy (0.10 = 10%) |
| min_sample_size | integer | 50 | Require at least N runs in the window before evaluating |
| action_on_exceed | string | "warn" | Action when threshold exceeded: "warn" or "block" |

How It Works

The bias-trend handler runs at mid_execution and after_workflow. At each phase it records whether the current run had bias flags, then evaluates the rolling rate across all runs for the same agent.

| Phase | Behavior |
| --- | --- |
| before_workflow | No-op (ALLOW) |
| mid_execution | Record + evaluate (the reasoning handler typically populates bias_flags at this point) |
| after_workflow | Record + evaluate (catches runs where bias is detected only at completion) |
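The evaluate step can be sketched as a pure function over the counts in the window. This is a minimal illustration, not the handler's actual code: the names BiasTrendRules and evaluate_bias_trend are hypothetical, the defaults are copied from the Rules table, and the strict "greater than" comparison is an assumption about how "exceeded" is interpreted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BiasTrendRules:
    # Defaults mirror the Rules table; the class itself is hypothetical.
    tracking_window_hours: int = 168
    max_bias_rate: float = 0.10
    min_sample_size: int = 50
    action_on_exceed: str = "warn"


def evaluate_bias_trend(flagged_count: int, total_count: int, rules: BiasTrendRules) -> str:
    """Return "allow", "warn", or "block" for the counts inside the window."""
    # Too little history (or none at all): stay silent rather than act on a noisy rate.
    if total_count == 0 or total_count < rules.min_sample_size:
        return "allow"
    bias_rate = flagged_count / total_count
    # Assumption: the policy fires only when the rate is strictly above the threshold.
    if bias_rate > rules.max_bias_rate:
        return rules.action_on_exceed
    return "allow"


if __name__ == "__main__":
    print(evaluate_bias_trend(31, 250, BiasTrendRules()))  # 12.4% against a 10% threshold
    print(evaluate_bias_trend(10, 40, BiasTrendRules()))   # below min_sample_size
```

Keeping the decision separate from storage means the same function works whether the counts come from the in-memory buffer or from an external history source.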

Storage

Records are stored in a per-process in-memory ring buffer (bounded to 2,000 entries per agent, lock-protected for thread safety). Tenants needing cross-process aggregation install a _history_fn callable on the handler class that returns (flagged_count, total_count) from their telemetry pipeline.
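To make the storage model concrete, here is a rough sketch of a bounded, lock-protected, per-agent buffer. BiasRunBuffer and its methods are hypothetical stand-ins written for illustration; only the 2,000-entry bound, the lock, and the (flagged_count, total_count) shape come from the description above.

```python
import threading
import time
from collections import defaultdict, deque
from typing import Optional

MAX_ENTRIES_PER_AGENT = 2000  # bound stated in the Storage description


class BiasRunBuffer:
    """Hypothetical stand-in for the handler's per-process store."""

    def __init__(self, max_entries: int = MAX_ENTRIES_PER_AGENT) -> None:
        self._lock = threading.Lock()
        # One bounded deque per agent; appending to a full deque evicts the oldest entry.
        self._runs = defaultdict(lambda: deque(maxlen=max_entries))

    def record(self, agent_name: str, flagged: bool, timestamp: Optional[float] = None) -> None:
        ts = time.time() if timestamp is None else timestamp
        with self._lock:
            self._runs[agent_name].append((ts, flagged))

    def counts(self, agent_name: str, window_hours: int, now: Optional[float] = None) -> tuple:
        """Return (flagged_count, total_count) for runs inside the rolling window."""
        current = time.time() if now is None else now
        cutoff = current - window_hours * 3600
        with self._lock:
            # .get() avoids creating an empty deque for agents never seen before.
            recent = [flagged for ts, flagged in self._runs.get(agent_name, ()) if ts >= cutoff]
        return sum(recent), len(recent)
```

Because each process owns its own instance, counts from this kind of buffer reflect only the runs that process handled.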

Context Attributes Read

| Attribute | Phase | Purpose |
| --- | --- | --- |
| context.bias_flags | mid_execution, after_workflow | Current run's bias flags (list[str]); empty = unflagged |
| context.agent_name | mid_execution, after_workflow | Scopes tracking per-agent |

Example Policy

{
  "tracking_window_hours": 168,
  "max_bias_rate": 0.05,
  "min_sample_size": 100,
  "action_on_exceed": "warn"
}

This warns when more than 5% of the runs in the last 168 hours were bias-flagged, provided the window contains at least 100 runs.
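A policy like this can be sanity-checked before it is assigned. The validator below is a hypothetical helper, not part of any SDK; the requirement that the integer rules be positive is an assumption, while the fraction range and the allowed actions follow the Rules table.

```python
ALLOWED_ACTIONS = {"warn", "block"}


def validate_bias_trend_policy(policy: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issues were found."""
    problems = []
    rate = policy.get("max_bias_rate", 0.10)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(rate, bool) or not isinstance(rate, (int, float)) or not 0 <= rate <= 1:
        problems.append("max_bias_rate must be a fraction between 0 and 1 (0.05 = 5%)")
    # Assumption: both integer rules must be positive.
    for key, default in (("tracking_window_hours", 168), ("min_sample_size", 50)):
        value = policy.get(key, default)
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            problems.append(f"{key} must be a positive integer")
    if policy.get("action_on_exceed", "warn") not in ALLOWED_ACTIONS:
        problems.append('action_on_exceed must be "warn" or "block"')
    return problems


if __name__ == "__main__":
    example = {
        "tracking_window_hours": 168,
        "max_bias_rate": 0.05,
        "min_sample_size": 100,
        "action_on_exceed": "warn",
    }
    print(validate_bias_trend_policy(example))
    print(validate_bias_trend_policy({"max_bias_rate": 5}))  # percentage instead of fraction
```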

SDK Integration

import waxell_observe as waxell
waxell.init()

@waxell.observe(agent_name="hiring-screener", enforce_policy=True)
async def screen_candidate(resume: str) -> dict:
    # rank() is the application's own scoring coroutine (not shown here).
    return await rank(resume)

The reasoning policy populates context.bias_flags on each run; the bias-trend policy aggregates across runs. Assign both for full coverage.

Observability

| Field | Example |
| --- | --- |
| Category | bias-trend |
| Action | warn |
| Reason | "Bias rate for 'hiring-screener' = 12.4% over last 168h (threshold 10.0%); 31/250 runs flagged." |
| Metadata | {"signal": "bias_rate_exceeded", "bias_rate": 0.124, "threshold": 0.10, "flagged_count": 31, "total_count": 250, "window_hours": 168, "nist_ai_rmf": "MS-3.1", "eu_ai_act": "Art-10"} |
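The fields in this example are related by simple arithmetic: 31 flagged runs out of 250 is a rate of 0.124, or 12.4%. The sketch below rebuilds the reason string and the numeric metadata from the raw counts; build_bias_trend_event is a hypothetical helper, and the exact formatting used by the real handler may differ.

```python
def build_bias_trend_event(agent_name, flagged_count, total_count, threshold, window_hours):
    """Derive a reason string and metadata dict from raw counts (illustrative only)."""
    bias_rate = flagged_count / total_count
    reason = (
        f"Bias rate for '{agent_name}' = {bias_rate:.1%} over last {window_hours}h "
        f"(threshold {threshold:.1%}); {flagged_count}/{total_count} runs flagged."
    )
    metadata = {
        "signal": "bias_rate_exceeded",
        "bias_rate": round(bias_rate, 3),
        "threshold": threshold,
        "flagged_count": flagged_count,
        "total_count": total_count,
        "window_hours": window_hours,
    }
    return reason, metadata


if __name__ == "__main__":
    reason, metadata = build_bias_trend_event("hiring-screener", 31, 250, 0.10, 168)
    print(reason)
    print(metadata)
```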

Common Gotchas

  1. Per-process buffer. The default in-memory ring buffer is not shared across workers. A multi-process deployment with 4 workers will track 4 independent buffers. Install a _history_fn that reads from your telemetry pipeline (OpenSearch, Postgres, Loki) for accurate cross-worker rates.

  2. min_sample_size silences early runs. Until the window has ≥ N samples, the handler always returns ALLOW. New agents will not fire this policy until they accumulate enough history.

  3. max_bias_rate is a fraction, not a percentage. Use 0.10 for 10%, not 10.

  4. Bounded buffer = 2,000 entries. Tenants with extremely high run volumes will lose the oldest samples in the buffer, so the rate may be computed over less than the full window. Use the _history_fn injection seam for accurate rates at scale.

  5. Bias flags come from upstream. This handler reads context.bias_flags but does not populate it. You need the reasoning policy (or equivalent custom handler) to flag bias per-run.

  6. short_circuit_on_block = False. Like audit, this handler runs even when prior handlers have blocked -- so the flagged run is still recorded.
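For the cross-worker case in gotchas 1 and 4, a history callable might look roughly like the sketch below. Only the (flagged_count, total_count) return shape comes from this page; the argument list (agent_name, window_hours), the runs table, and the use of SQLite as a stand-in for a shared store such as Postgres are all assumptions made for illustration. How the callable is attached to the handler class is not shown, because that depends on the class and its exact _history_fn contract.

```python
import sqlite3
import time


def make_history_fn(conn: sqlite3.Connection):
    """Build a callable returning (flagged_count, total_count) from a shared store."""

    def history_fn(agent_name: str, window_hours: int, now=None):
        # Hypothetical signature: the real _history_fn arguments may differ.
        cutoff = (time.time() if now is None else now) - window_hours * 3600
        row = conn.execute(
            "SELECT COALESCE(SUM(flagged), 0), COUNT(*) FROM runs "
            "WHERE agent_name = ? AND ts >= ?",
            (agent_name, cutoff),
        ).fetchone()
        return int(row[0]), int(row[1])

    return history_fn


# Demo: an in-memory database stands in for the real telemetry store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (agent_name TEXT, ts REAL, flagged INTEGER)")
now = time.time()
conn.executemany(
    "INSERT INTO runs VALUES (?, ?, ?)",
    [
        ("hiring-screener", now - 3600, 1),
        ("hiring-screener", now - 7200, 0),
        ("hiring-screener", now - 200 * 3600, 1),  # outside a 168-hour window
    ],
)
history_fn = make_history_fn(conn)
print(history_fn("hiring-screener", 168, now))
```

Because every worker queries the same store, the rate no longer depends on which process handled a given run.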

Next Steps