Data Access Policy

The data-access policy category controls which data sources an agent may access. Use it to prevent agents from touching sensitive databases, enforce read-only access to production data, or cap the number of records an agent can pull per query.

Rules

| Rule | Type | Default | Description |
|---|---|---|---|
| allowed_data_sources | string[] | [] | If non-empty, agents may only access sources in this list. Acts as an allowlist. |
| blocked_data_sources | string[] | [] | Sources that agents are never allowed to access, regardless of the allowlist. |
| read_only_sources | string[] | [] | Sources that agents may read from but not write to. |
| max_records_per_query | integer | 1000 | Maximum number of records an agent may retrieve in a single query. Violations produce a WARN (never a block). |
| action_on_violation | string | "block" | "block" raises a PolicyViolationError; "warn" logs the violation and lets the agent continue. Applies to source-level violations only — record-limit violations always WARN. |

How It Works

The data-access handler runs at three phases:

before_workflow

Runs before the agent does any work. Checks context.data_sources_configured — if the agent is pre-configured to use a blocked source, it is stopped before any LLM call or tool use occurs.

mid_execution

Triggered every time the agent calls ctx.record_data_access(...). This is where source-level and write violations are caught:

  1. For each source in context.data_sources_accessed:
    • If the source appears in blocked_data_sources → violation
    • If allowed_data_sources is non-empty and the source is not in it → violation
  2. For each source in context.data_sources_written:
    • If the source appears in read_only_sources → violation
  3. If context.records_queried exceeds max_records_per_query → WARN (agent continues regardless of action_on_violation)
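As a sketch, the mid_execution pass above could be written like this. This is a hypothetical re-implementation for illustration only; the real handler lives inside the SDK and its internals may differ.

```python
# Illustrative re-implementation of the mid_execution checks (not SDK code).
def evaluate_access(rules, accessed, written, records_queried):
    """Return (violations, warnings) for one mid_execution pass."""
    violations, warnings = [], []
    allowed = rules.get("allowed_data_sources", [])
    blocked = rules.get("blocked_data_sources", [])
    read_only = rules.get("read_only_sources", [])
    limit = rules.get("max_records_per_query", 1000)

    for source in accessed:
        if source in blocked:  # blocklist wins; checked first
            violations.append(f"Access to blocked data source '{source}'")
        elif allowed and source not in allowed:  # allowlist applies only when non-empty
            violations.append(f"Data source '{source}' is not in allowed list")

    for source in written:
        if source in read_only:
            violations.append(f"Write to read-only data source '{source}'")

    if records_queried > limit:  # always a warning, never a block
        warnings.append(f"Records queried ({records_queried}) exceeds limit ({limit})")

    return violations, warnings
```

Note how the blocklist check precedes the allowlist check, and how the record-limit result goes into a separate warnings list that is never escalated to a block.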

after_workflow

Runs after the agent completes. Produces a final audit summary listing sources accessed and written. Warnings are emitted if blocked or read-only sources were accessed during execution (belt-and-suspenders check after mid_execution).

Rule Evaluation Order

| Check | When triggered | Configurable action |
|---|---|---|
| Blocked source | mid_execution, per record_data_access call | action_on_violation |
| Not in allowlist | mid_execution, per record_data_access call | action_on_violation |
| Write to read-only source | mid_execution, per record_data_access call | action_on_violation |
| Record limit | mid_execution, per record_data_access call | Always WARN |

Blocked sources are checked before allowlist membership. A source that appears in both allowed_data_sources and blocked_data_sources is always blocked.

Example Policies

Customer Data Policy (strict)

Allow reads from approved sources only; block HR and payroll entirely; make the production database read-only:

```json
{
  "allowed_data_sources": ["postgres", "redis", "product_catalog"],
  "blocked_data_sources": ["hr_records", "payroll"],
  "read_only_sources": ["postgres"],
  "max_records_per_query": 1000,
  "action_on_violation": "block"
}
```

Analytics Agent (high volume, warn on violations)

Allow large record pulls but log violations rather than blocking:

```json
{
  "allowed_data_sources": ["analytics_db", "data_warehouse"],
  "blocked_data_sources": ["pii_store"],
  "read_only_sources": [],
  "max_records_per_query": 50000,
  "action_on_violation": "warn"
}
```

Internal-Only Agent (blocklist only)

Block specific sensitive sources without restricting everything else:

```json
{
  "allowed_data_sources": [],
  "blocked_data_sources": ["hr_records", "payroll", "executive_compensation"],
  "read_only_sources": [],
  "max_records_per_query": 5000,
  "action_on_violation": "block"
}
```
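Before uploading a policy, it can help to sanity-check the rules dict locally. The following validator is not part of the SDK; it is a sketch that mirrors the field names and types from the Rules table above, and flags the allowlist/blocklist overlap case (where the blocklist wins).

```python
# Hypothetical local validator for a data-access rules dict (not SDK code).
def validate_rules(rules: dict) -> list:
    problems = []
    list_fields = ["allowed_data_sources", "blocked_data_sources", "read_only_sources"]
    for field in list_fields:
        value = rules.get(field, [])
        if not isinstance(value, list) or not all(isinstance(s, str) for s in value):
            problems.append(f"{field} must be a list of strings")
    if not isinstance(rules.get("max_records_per_query", 1000), int):
        problems.append("max_records_per_query must be an integer")
    if rules.get("action_on_violation", "block") not in ("block", "warn"):
        problems.append('action_on_violation must be "block" or "warn"')
    # Not an error, but worth surfacing: blocked always beats allowed.
    overlap = set(rules.get("allowed_data_sources", [])) & set(rules.get("blocked_data_sources", []))
    if overlap:
        problems.append(f"sources in both allow and block lists (blocklist wins): {sorted(overlap)}")
    return problems
```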

SDK Integration

Recording Data Access Events

Call ctx.record_data_access() after each data operation. The handler evaluates the access immediately at mid_execution:

```python
import waxell_observe as waxell
from waxell_observe.errors import PolicyViolationError

waxell.init()

try:
    async with waxell.WaxellContext(
        agent_name="data-agent",
        enforce_policy=True,
    ) as ctx:

        # Read from a data source — triggers mid_execution governance
        rows = db.query("SELECT * FROM customers LIMIT 500")
        ctx.record_data_access(
            source="postgres",
            operation="read",
            records=len(rows),
        )

        # Write to a data source
        db.execute("UPDATE customers SET status = 'active' WHERE id = ?", customer_id)
        ctx.record_data_access(
            source="postgres",
            operation="write",
            records=1,
        )

        ctx.set_result({"rows": rows})

except PolicyViolationError as e:
    print(f"Data access blocked: {e}")
    # e.g. "Write to read-only data source 'postgres'"
    # e.g. "Access to blocked data source 'hr_records'"
    # e.g. "Data source 'staging_db' is not in allowed list"
```

Method Signature

```python
ctx.record_data_access(
    source: str,       # Data source name — must match your policy config exactly
    operation: str,    # "read" or "write"
    records: int = 0,  # Number of records accessed/modified
) -> None
```

The source name is compared exactly (case-sensitive) against allowed_data_sources, blocked_data_sources, and read_only_sources. Use consistent naming conventions across your codebase.
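One way to keep names consistent is to define each source string once and reference it everywhere, so the value passed to ctx.record_data_access() always matches the policy configuration exactly. The module below is a hypothetical convention, not part of the SDK; the source names are taken from the examples above.

```python
# Hypothetical constants module: define each policy source name exactly once.
class DataSources:
    POSTGRES = "postgres"
    REDIS = "redis"
    PRODUCT_CATALOG = "product_catalog"

# Usage (sketch):
# ctx.record_data_access(source=DataSources.POSTGRES, operation="read", records=10)
```

This avoids the case-sensitivity gotcha: a stray "Postgres" can no longer slip into one call site while the policy says "postgres".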

Using the Decorator

```python
@waxell.observe(
    agent_name="data-agent",
    enforce_policy=True,
)
async def run_query(ctx, query: str):
    rows = db.query(query)
    ctx.record_data_access(source="postgres", operation="read", records=len(rows))
    return rows
```

Enforcement Flow

```
Agent starts (WaxellContext.__aenter__)
│
└── before_workflow governance
    └── Check data_sources_configured vs blocked_data_sources
        └── Pre-configured blocked source? → BLOCK (always, regardless of action_on_violation)

Agent calls ctx.record_data_access(source="hr_records", operation="read", records=200)
│
└── mid_execution governance
    ├── source in blocked_data_sources? → action_on_violation (BLOCK or WARN)
    ├── allowed_data_sources non-empty AND source not in it? → action_on_violation
    ├── source in read_only_sources AND operation == "write"? → action_on_violation
    └── records_queried > max_records_per_query? → always WARN

Agent completes
│
└── after_workflow governance
    └── Audit summary — warns if blocked or read-only sources were accessed
```

Creating via Dashboard

  1. Navigate to Governance > Policies
  2. Click New Policy
  3. Select category Data Access
  4. Configure source lists, record limit, and violation action
  5. Set scope to target specific agents (e.g., data-access-agent)
  6. Enable

Creating via API

```bash
curl -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://acme.waxell.dev/waxell/v1/policies/ \
  -d '{
    "name": "Customer Data Policy",
    "category": "data-access",
    "rules": {
      "allowed_data_sources": ["postgres", "redis", "product_catalog"],
      "blocked_data_sources": ["hr_records", "payroll"],
      "read_only_sources": ["postgres"],
      "max_records_per_query": 1000,
      "action_on_violation": "block"
    },
    "scope": {
      "agents": ["data-access-agent"]
    },
    "enabled": true
  }'
```
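The same payload can be built in Python and posted with any HTTP client. The snippet below only constructs and serializes the request body; the endpoint URL and bearer token are the placeholders from the curl example above.

```python
# Build the policy payload in Python (sketch; no network call made here).
import json

policy = {
    "name": "Customer Data Policy",
    "category": "data-access",
    "rules": {
        "allowed_data_sources": ["postgres", "redis", "product_catalog"],
        "blocked_data_sources": ["hr_records", "payroll"],
        "read_only_sources": ["postgres"],
        "max_records_per_query": 1000,
        "action_on_violation": "block",
    },
    "scope": {"agents": ["data-access-agent"]},
    "enabled": True,
}

body = json.dumps(policy)
# Then, with a client such as requests:
# requests.post("https://acme.waxell.dev/waxell/v1/policies/",
#               headers={"Authorization": f"Bearer {token}",
#                        "Content-Type": "application/json"},
#               data=body)
```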

Observability

Governance Tab

Data access evaluations appear with:

| Field | Example |
|---|---|
| Policy name | Customer Data Policy |
| Action | allow, warn, or block |
| Category | data-access |
| Reason | "Access to blocked data source 'hr_records'" |
| Metadata | {"blocked_source": "hr_records"} |

For allow cases:

| Field | Example |
|---|---|
| Reason | "Data access within policy (2 source(s) accessed)" |
| Metadata | {"sources_accessed": ["postgres", "redis"], "sources_written": []} |

Record Limit Warnings

When records_queried exceeds max_records_per_query, the governance tab shows:

| Field | Example |
|---|---|
| Action | warn |
| Reason | "Records queried (15000) exceeds limit (1000)" |
| Metadata | {"records_queried": 15000, "limit": 1000} |

Record limit warnings never stop the agent — the action_on_violation setting does not apply to them.

Common Gotchas

  1. allowed_data_sources is an allowlist when non-empty. An empty list means "no restriction." As soon as you add one entry, all other sources are blocked (unless action_on_violation is "warn").

  2. blocked_data_sources is checked before allowed_data_sources. A source in both lists is always blocked. This makes blocklists safe to use alongside allowlists without interaction surprises.

  3. max_records_per_query always WARNS, never blocks. The handler hardcodes WARN for record limit violations. Setting action_on_violation: "block" does not change this behavior. Use source-level controls (allowlists and blocklists) for hard enforcement.

  4. Source names are case-sensitive and exact-matched. "Postgres" and "postgres" are different sources. Use consistent lowercase naming in your ctx.record_data_access() calls and policy configuration.

  5. write in operation populates data_sources_written, not data_sources_accessed. Read-only enforcement only fires when the operation is "write". If you accidentally pass operation="read" for a write operation, the read-only check is bypassed.

  6. Each record_data_access() call triggers mid_execution immediately. The handler evaluates the entire accumulated access buffer on every call. If a second access is the violating one, the first access is still recorded in the trace.

  7. before_workflow only checks data_sources_configured. This field is rarely populated in practice — it requires the agent framework to pre-declare which sources it uses. Most enforcement happens at mid_execution.
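Gotcha 5 is easiest to see with a minimal mock of the context's bookkeeping. This is a hypothetical sketch of the internals, for illustration only: the point is that only operation="write" lands in data_sources_written, so a write mislabeled as a read never reaches the read-only check.

```python
# Minimal mock of the context's access bookkeeping (hypothetical, not SDK code).
class MockContext:
    def __init__(self):
        self.data_sources_accessed = set()
        self.data_sources_written = set()

    def record_data_access(self, source, operation, records=0):
        self.data_sources_accessed.add(source)
        if operation == "write":  # only writes feed the read-only check
            self.data_sources_written.add(source)

ctx = MockContext()
# A write mislabeled as a read:
ctx.record_data_access("postgres", operation="read")
assert "postgres" not in ctx.data_sources_written  # read-only check never fires
```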

Combining with Other Policies

The data-access policy works well alongside:

  • Audit policy — logs every data access with timestamp and user for compliance records
  • Compliance policy — HIPAA and PCI-DSS compliance profiles often require a data-access policy in required_categories
  • Scope policy — combine with data-access to limit both which sources and how many records can be modified in a single run

Next Steps