Data Access Policy
The data-access policy category controls which data sources an agent may access. Use it to prevent agents from touching sensitive databases, enforce read-only access to production data, or cap the number of records an agent can pull per query.
Rules
| Rule | Type | Default | Description |
|---|---|---|---|
| `allowed_data_sources` | string[] | `[]` | If non-empty, agents may only access sources in this list. Acts as an allowlist. |
| `blocked_data_sources` | string[] | `[]` | Sources that agents are never allowed to access, regardless of the allowlist. |
| `read_only_sources` | string[] | `[]` | Sources that agents may read from but not write to. |
| `max_records_per_query` | integer | `1000` | Maximum number of records an agent may retrieve in a single query. Violations produce a WARN (never a block). |
| `action_on_violation` | string | `"block"` | `"block"` raises a `PolicyViolationError`; `"warn"` logs the violation and lets the agent continue. Applies to source-level violations only; record-limit violations always WARN. |
How It Works
The data-access handler runs at three phases:
before_workflow
Runs before the agent does any work. Checks `context.data_sources_configured`; if the agent is pre-configured to use a blocked source, it is stopped before any LLM call or tool use occurs.
mid_execution
Triggered every time the agent calls `ctx.record_data_access(...)`. This is where source-level and write violations are caught:
- For each source in `context.data_sources_accessed`:
  - If the source appears in `blocked_data_sources` → violation
  - If `allowed_data_sources` is non-empty and the source is not in it → violation
- For each source in `context.data_sources_written`:
  - If the source appears in `read_only_sources` → violation
- If `context.records_queried` exceeds `max_records_per_query` → WARN (the agent continues regardless of `action_on_violation`)
after_workflow
Runs after the agent completes. Produces a final audit summary listing sources accessed and written. Warnings are emitted if blocked or read-only sources were accessed during execution (belt-and-suspenders check after mid_execution).
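The mid_execution checks can be modeled in a few lines of plain Python. This is a simplified sketch of the logic described above, not the SDK's actual handler; `DataAccessRules` and `mid_execution_check` are illustrative names:

```python
from dataclasses import dataclass, field

@dataclass
class DataAccessRules:
    """Illustrative model of the rules table; not the SDK's handler class."""
    allowed_data_sources: list = field(default_factory=list)
    blocked_data_sources: list = field(default_factory=list)
    read_only_sources: list = field(default_factory=list)
    max_records_per_query: int = 1000

def mid_execution_check(rules, accessed, written, records_queried):
    """One evaluation pass; returns (violations, warnings)."""
    violations, warnings = [], []
    for source in accessed:
        if source in rules.blocked_data_sources:
            # The blocklist is checked first, so it wins over the allowlist.
            violations.append(f"Access to blocked data source '{source}'")
        elif rules.allowed_data_sources and source not in rules.allowed_data_sources:
            violations.append(f"Data source '{source}' is not in allowed list")
    for source in written:
        if source in rules.read_only_sources:
            violations.append(f"Write to read-only data source '{source}'")
    if records_queried > rules.max_records_per_query:
        # Record-limit breaches always WARN; action_on_violation never applies here.
        warnings.append(
            f"Records queried ({records_queried}) exceeds limit ({rules.max_records_per_query})"
        )
    return violations, warnings
```

For example, with `allowed_data_sources=["postgres"]`, `blocked_data_sources=["hr_records"]`, and `read_only_sources=["postgres"]`, an evaluation that reads `hr_records` and writes `postgres` yields two violations, plus a warning if the record count exceeds the limit.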
Rule Evaluation Order
| Check | When triggered | Configurable action |
|---|---|---|
| Blocked source | mid_execution, per `record_data_access` call | `action_on_violation` |
| Not in allowlist | mid_execution, per `record_data_access` call | `action_on_violation` |
| Write to read-only source | mid_execution, per `record_data_access` call | `action_on_violation` |
| Record limit | mid_execution, per `record_data_access` call | Always WARN |
Blocked sources are checked before allowlist membership. A source that appears in both `allowed_data_sources` and `blocked_data_sources` is always blocked.
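A minimal sketch of that precedence (illustrative logic, not SDK code):

```python
def source_verdict(source, allowed, blocked):
    """Classify a source; the blocklist is evaluated before the allowlist."""
    if source in blocked:
        return "blocked"
    if allowed and source not in allowed:
        return "not_in_allowlist"
    return "allowed"

# A source in both lists is always blocked.
print(source_verdict("postgres", ["postgres"], ["postgres"]))  # blocked
```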
Example Policies
Customer Data Policy (strict)
Allow reads from approved sources only; block HR and payroll entirely; make the production database read-only:
```json
{
  "allowed_data_sources": ["postgres", "redis", "product_catalog"],
  "blocked_data_sources": ["hr_records", "payroll"],
  "read_only_sources": ["postgres"],
  "max_records_per_query": 1000,
  "action_on_violation": "block"
}
```
Analytics Agent (high volume, warn on violations)
Allow large record pulls but log violations rather than blocking:
```json
{
  "allowed_data_sources": ["analytics_db", "data_warehouse"],
  "blocked_data_sources": ["pii_store"],
  "read_only_sources": [],
  "max_records_per_query": 50000,
  "action_on_violation": "warn"
}
```
Internal-Only Agent (blocklist only)
Block specific sensitive sources without restricting everything else:
```json
{
  "allowed_data_sources": [],
  "blocked_data_sources": ["hr_records", "payroll", "executive_compensation"],
  "read_only_sources": [],
  "max_records_per_query": 5000,
  "action_on_violation": "block"
}
```
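Before uploading a policy, it can be worth linting the rules for the interactions described in this section. The helper below is hypothetical, not part of the SDK; it flags a source listed in both the allowlist and the blocklist (always blocked by precedence) and a read-only source that falls outside a non-empty allowlist (its reads are blocked too, so the entry has no effect):

```python
def lint_rules(rules: dict) -> list:
    """Flag data-access rule combinations that behave surprisingly (illustrative)."""
    notes = []
    allowed = set(rules.get("allowed_data_sources", []))
    blocked = set(rules.get("blocked_data_sources", []))
    for source in sorted(allowed & blocked):
        notes.append(f"'{source}' is both allowed and blocked; the blocklist wins")
    if allowed:
        for source in rules.get("read_only_sources", []):
            if source not in allowed and source not in blocked:
                notes.append(
                    f"read-only source '{source}' is outside the allowlist; reads are blocked too"
                )
    return notes
```

Running it over the strict policy above returns no notes; a policy that lists `postgres` in both `allowed_data_sources` and `blocked_data_sources` gets flagged.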
SDK Integration
Recording Data Access Events
Call `ctx.record_data_access()` after each data operation. The handler evaluates the access immediately at mid_execution:

```python
import waxell_observe as waxell
from waxell_observe.errors import PolicyViolationError

waxell.init()

try:
    async with waxell.WaxellContext(
        agent_name="data-agent",
        enforce_policy=True,
    ) as ctx:
        # Read from a data source; triggers mid_execution governance
        rows = db.query("SELECT * FROM customers LIMIT 500")
        ctx.record_data_access(
            source="postgres",
            operation="read",
            records=len(rows),
        )

        # Write to a data source
        db.execute("UPDATE customers SET status = 'active' WHERE id = ?", customer_id)
        ctx.record_data_access(
            source="postgres",
            operation="write",
            records=1,
        )

        ctx.set_result({"rows": rows})
except PolicyViolationError as e:
    print(f"Data access blocked: {e}")
    # e.g. "Write to read-only data source 'postgres'"
    # e.g. "Access to blocked data source 'hr_records'"
    # e.g. "Data source 'staging_db' is not in allowed list"
```
Method Signature
```python
ctx.record_data_access(
    source: str,       # Data source name; must match your policy config exactly
    operation: str,    # "read" or "write"
    records: int = 0,  # Number of records accessed/modified
) -> None
```
The `source` name is compared exactly (case-sensitively) against `allowed_data_sources`, `blocked_data_sources`, and `read_only_sources`. Use consistent naming conventions across your codebase.
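Because matching is exact, it can help to funnel every call through a small wrapper that normalizes names before recording them. This is a hypothetical helper, not part of the SDK; the alias table is an assumption for illustration:

```python
# Map ad-hoc spellings used around the codebase onto canonical policy names.
SOURCE_ALIASES = {
    "PostgreSQL": "postgres",
    "PG": "postgres",
}

def normalize_source(name: str) -> str:
    """Return the canonical lowercase source name used in the policy config."""
    cleaned = name.strip()
    return SOURCE_ALIASES.get(cleaned, cleaned.lower())
```

Then record accesses as `ctx.record_data_access(source=normalize_source(raw_name), ...)` so `"PostgreSQL"` and `"postgres"` never diverge.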
Using the Decorator
```python
@waxell.observe(
    agent_name="data-agent",
    enforce_policy=True,
)
async def run_query(ctx, query: str):
    rows = db.query(query)
    ctx.record_data_access(source="postgres", operation="read", records=len(rows))
    return rows
```
Enforcement Flow
```
Agent starts (WaxellContext.__aenter__)
│
└── before_workflow governance
    └── Check data_sources_configured vs blocked_data_sources
        └── Pre-configured blocked source? → BLOCK (always, regardless of action_on_violation)

Agent calls ctx.record_data_access(source="hr_records", operation="read", records=200)
│
└── mid_execution governance
    ├── source in blocked_data_sources? → action_on_violation (BLOCK or WARN)
    ├── allowed_data_sources non-empty AND source not in it? → action_on_violation
    ├── source in read_only_sources AND operation == "write"? → action_on_violation
    └── records_queried > max_records_per_query? → always WARN

Agent completes
│
└── after_workflow governance
    └── Audit summary — warns if blocked or read-only sources were accessed
```
Creating via Dashboard
- Navigate to Governance > Policies
- Click New Policy
- Select category Data Access
- Configure source lists, record limit, and violation action
- Set scope to target specific agents (e.g., `data-access-agent`)
- Enable the policy
Creating via API
```shell
curl -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://acme.waxell.dev/waxell/v1/policies/ \
  -d '{
    "name": "Customer Data Policy",
    "category": "data-access",
    "rules": {
      "allowed_data_sources": ["postgres", "redis", "product_catalog"],
      "blocked_data_sources": ["hr_records", "payroll"],
      "read_only_sources": ["postgres"],
      "max_records_per_query": 1000,
      "action_on_violation": "block"
    },
    "scope": {
      "agents": ["data-access-agent"]
    },
    "enabled": true
  }'
```
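If you prefer to create the policy from Python, the same request can be built with the standard library alone. A sketch assuming the same endpoint and payload as the curl example; replace `YOUR_TOKEN` with a real token before sending:

```python
import json
import urllib.request

def build_policy_request(base_url: str, token: str, policy: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for the policies endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/waxell/v1/policies/",
        data=json.dumps(policy).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

policy = {
    "name": "Customer Data Policy",
    "category": "data-access",
    "rules": {"blocked_data_sources": ["hr_records", "payroll"]},
    "scope": {"agents": ["data-access-agent"]},
    "enabled": True,
}
req = build_policy_request("https://acme.waxell.dev", "YOUR_TOKEN", policy)
# urllib.request.urlopen(req) would perform the call.
```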
Observability
Governance Tab
Data access evaluations appear with:
| Field | Example |
|---|---|
| Policy name | Customer Data Policy |
| Action | allow, warn, or block |
| Category | data-access |
| Reason | "Access to blocked data source 'hr_records'" |
| Metadata | {"blocked_source": "hr_records"} |
For allow cases:
| Field | Example |
|---|---|
| Reason | "Data access within policy (2 source(s) accessed)" |
| Metadata | {"sources_accessed": ["postgres", "redis"], "sources_written": []} |
Record Limit Warnings
When `records_queried` exceeds `max_records_per_query`, the governance tab shows:
| Field | Example |
|---|---|
| Action | warn |
| Reason | "Records queried (15000) exceeds limit (1000)" |
| Metadata | {"records_queried": 15000, "limit": 1000} |
Record limit warnings never stop the agent; the `action_on_violation` setting does not apply to them.
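Because the limit only warns, staying under it is up to the caller. One approach is to window large pulls so that each query, and each `record_data_access()` call, stays below `max_records_per_query`. The rule is described as per-query, so this sketch assumes per-call accounting:

```python
def paginate(total_rows: int, page_size: int):
    """Yield (offset, limit) windows so no single query exceeds page_size."""
    offset = 0
    while offset < total_rows:
        yield offset, min(page_size, total_rows - offset)
        offset += page_size

# 2500 rows with a 1000-row policy limit → three windows, each within the limit.
windows = list(paginate(2500, 1000))  # [(0, 1000), (1000, 1000), (2000, 500)]
```

Each window would then be queried with `LIMIT`/`OFFSET` and recorded with its own `ctx.record_data_access(records=limit)` call.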
Common Gotchas
- `allowed_data_sources` is an allowlist when non-empty. An empty list means "no restriction." As soon as you add one entry, all other sources are blocked (unless `action_on_violation` is `"warn"`).
- `blocked_data_sources` is checked before `allowed_data_sources`. A source in both lists is always blocked. This makes blocklists safe to use alongside allowlists without interaction surprises.
- `max_records_per_query` always WARNS, never blocks. The handler hardcodes WARN for record limit violations. Setting `action_on_violation: "block"` does not change this behavior. Use source-level controls (allowlists and blocklists) for hard enforcement.
- Source names are case-sensitive and exact-matched. `"Postgres"` and `"postgres"` are different sources. Use consistent lowercase naming in your `ctx.record_data_access()` calls and policy configuration.
- `operation="write"` populates `data_sources_written`, not `data_sources_accessed`. Read-only enforcement only fires when the operation is `"write"`. If you accidentally pass `operation="read"` for a write operation, the read-only check is bypassed.
- Each `record_data_access()` call triggers mid_execution immediately. The handler evaluates the entire accumulated access buffer on every call. If a second access is the violating one, the first access is still recorded in the trace.
- `before_workflow` only checks `data_sources_configured`. This field is rarely populated in practice; it requires the agent framework to pre-declare which sources it uses. Most enforcement happens at mid_execution.
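The write-vs-read gotcha is easy to hit when one code path executes arbitrary SQL. A deliberately naive first-keyword classifier (illustrative, not part of the SDK) can keep `operation` honest; anything unrecognized is treated as `"write"`, which is the safe direction for read-only enforcement:

```python
READ_KEYWORDS = {"SELECT", "SHOW", "EXPLAIN", "DESCRIBE"}

def classify_operation(sql: str) -> str:
    """Naive first-keyword check; everything not clearly a read counts as a write."""
    parts = sql.strip().split(None, 1)
    first = parts[0].upper() if parts else ""
    return "read" if first in READ_KEYWORDS else "write"
```

Then call `ctx.record_data_access(source="postgres", operation=classify_operation(query), ...)`. Note the sketch misclassifies `WITH ...` read-only CTEs as writes, which over-triggers the read-only check rather than bypassing it.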
Combining with Other Policies
The data-access policy works well alongside:
- Audit policy — logs every data access with timestamp and user for compliance records
- Compliance policy — HIPAA and PCI-DSS compliance profiles often require a `data-access` policy in `required_categories`
- Scope policy — combine with data-access to limit both which sources and how many records can be modified in a single run
Next Steps
- Policy & Governance — How policy enforcement works
- Compliance Policy — Enforce regulatory frameworks that require data-access controls
- Network Policy — Govern outbound HTTP requests alongside data source access
- Scope Policy — Limit blast radius for write operations
- Policy Categories & Templates — All 26 categories