Retrieval Policy
The retrieval policy category governs the quality of documents retrieved in RAG pipelines. It checks relevance scores, source freshness, allowed collections, and source diversity — blocking or warning before stale or off-topic context contaminates your agent's answers.
Use it when you need to enforce quality standards on vector search, document retrieval, or any step where your agent fetches external context before generating a response.
Rules
| Rule | Type | Default | Description |
|---|---|---|---|
| min_relevance_score | number | 0.7 | Minimum cosine/dot-product similarity score required (0.0–1.0) |
| max_source_age_days | integer | 90 | Maximum age of a retrieved source document in days |
| min_chunks | integer | 1 | Minimum number of results required |
| max_chunks | integer | 10 | Maximum number of results allowed |
| allowed_collections | string[] | [] | If non-empty, only these vector collections may be queried |
| blocked_sources | string[] | [] | Source identifiers that are always blocked regardless of score |
| require_source_diversity | boolean | false | Enforce that no single source dominates the result set |
| max_single_source_ratio | number | 0.6 | Maximum fraction of results from a single source (when diversity is required) |
| action_on_low_relevance | string | "warn" | What to do when relevance is below threshold: "warn" or "block" |
| action_on_stale_source | string | "block" | What to do when a source exceeds max age: "warn" or "block" |
| action_on_chunk_violation | string | "warn" | What to do when the chunk count falls below min_chunks or exceeds max_chunks: "warn" or "block" |
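To show how a partial policy combines with the defaults in the table above, here is a minimal sketch. The `resolve_rules` helper and its validation checks are hypothetical illustrations, not part of the Waxell SDK; only the rule names and default values come from the table.

```python
# Defaults copied from the Rules table; the helper below is a hypothetical sketch.
DEFAULT_RETRIEVAL_RULES = {
    "min_relevance_score": 0.7,
    "max_source_age_days": 90,
    "min_chunks": 1,
    "max_chunks": 10,
    "allowed_collections": [],
    "blocked_sources": [],
    "require_source_diversity": False,
    "max_single_source_ratio": 0.6,
    "action_on_low_relevance": "warn",
    "action_on_stale_source": "block",
    "action_on_chunk_violation": "warn",
}

VALID_ACTIONS = {"warn", "block"}
ACTION_KEYS = (
    "action_on_low_relevance",
    "action_on_stale_source",
    "action_on_chunk_violation",
)


def resolve_rules(overrides: dict) -> dict:
    """Merge a partial policy over the documented defaults and sanity-check it."""
    unknown = set(overrides) - set(DEFAULT_RETRIEVAL_RULES)
    if unknown:
        raise ValueError(f"Unknown retrieval rules: {sorted(unknown)}")

    rules = {**DEFAULT_RETRIEVAL_RULES, **overrides}

    for key in ACTION_KEYS:
        if rules[key] not in VALID_ACTIONS:
            raise ValueError(f"{key} must be 'warn' or 'block', got {rules[key]!r}")
    if not 0.0 <= rules["min_relevance_score"] <= 1.0:
        raise ValueError("min_relevance_score must be within 0.0-1.0")
    if rules["min_chunks"] > rules["max_chunks"]:
        raise ValueError("min_chunks cannot exceed max_chunks")
    return rules


# A policy that only sets two rules inherits the rest from the defaults.
lenient = resolve_rules({"min_relevance_score": 0.5, "max_source_age_days": 365})
```

In this sketch, a policy that omits `action_on_stale_source` keeps the table default of `"block"`, which is worth remembering when writing a monitoring-only policy.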
How It Works
The retrieval handler runs in two phases:
- mid_execution: checks each retrieved document as it is recorded. Fires after every ctx.record_retrieval_result() call.
- after_workflow: checks the full result set for source diversity violations after the agent finishes.
Evaluation Order (mid_execution)
- Check chunk count: more than max_chunks or fewer than min_chunks → action_on_chunk_violation
- For each result in retrieval_results:
  - Relevance score below min_relevance_score → action_on_low_relevance
  - Source in blocked_sources → always BLOCK
  - Collection not in allowed_collections (if list is non-empty) → always BLOCK
  - Source age exceeds max_source_age_days → action_on_stale_source
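The mid_execution order above can be sketched in plain Python. This is an illustrative approximation, not the handler's actual implementation: `RetrievalResult`, `check_chunk_count`, and `check_document` are hypothetical names, and the "above maximum" reason string is invented for the sketch. The function returns every finding in the documented order so the sequence is visible; the real handler may stop at the first violation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Finding = Tuple[str, str]  # (action, reason)


@dataclass
class RetrievalResult:
    # Hypothetical container mirroring the record_retrieval_result() arguments.
    relevance_score: float
    source: str
    collection: str
    age_days: int


def check_chunk_count(count: int, rules: dict) -> Optional[Finding]:
    """Step 1: chunk count must lie within [min_chunks, max_chunks]."""
    action = rules["action_on_chunk_violation"]
    if count < rules["min_chunks"]:
        return action, f"Retrieved chunks ({count}) below minimum ({rules['min_chunks']})"
    if count > rules["max_chunks"]:
        # Reason wording here is an assumption made for this sketch.
        return action, f"Retrieved chunks ({count}) above maximum ({rules['max_chunks']})"
    return None


def check_document(doc: RetrievalResult, rules: dict) -> List[Finding]:
    """Step 2: per-document checks, in the documented order."""
    findings: List[Finding] = []

    if doc.relevance_score < rules["min_relevance_score"]:
        findings.append((
            rules["action_on_low_relevance"],
            f"Retrieval relevance ({doc.relevance_score:.2f}) below "
            f"threshold ({rules['min_relevance_score']:.2f})",
        ))

    if doc.source in rules["blocked_sources"]:  # exact string match
        findings.append(("block", f"Retrieved from blocked source '{doc.source}'"))

    allowed = rules["allowed_collections"]
    if allowed and doc.collection not in allowed:  # empty list means allow all
        findings.append(("block", f"Collection '{doc.collection}' not in allowed list"))

    if doc.age_days > rules["max_source_age_days"]:
        findings.append((
            rules["action_on_stale_source"],
            f"Source age ({doc.age_days} days) exceeds max "
            f"({rules['max_source_age_days']} days)",
        ))

    return findings
```

Note that the blocked-source and collection checks ignore the configurable actions: in this sketch, as in the list above, they always produce a block.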
Evaluation Order (after_workflow)
- If require_source_diversity is true, count how many chunks come from each source
- If any single source accounts for more than max_single_source_ratio of results → WARN
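The diversity check described above amounts to a per-source count divided by the total number of chunks. A minimal sketch, assuming sources are compared as plain strings; `dominant_sources` is a hypothetical helper, not an SDK function:

```python
from collections import Counter
from typing import Dict, List


def dominant_sources(sources: List[str], max_single_source_ratio: float = 0.6) -> Dict[str, float]:
    """Return each source whose share of results is strictly above the ratio."""
    if not sources:
        return {}
    total = len(sources)
    counts = Counter(sources)
    return {
        source: count / total
        for source, count in counts.items()
        if count / total > max_single_source_ratio
    }


# 3 of 4 chunks from one source is a 0.75 share, which is above 0.6.
offenders = dominant_sources(["doc.pdf", "doc.pdf", "doc.pdf", "faq.pdf"])
```

The comparison is strict ("more than"), so a source that sits exactly at the ratio does not trigger a warning in this sketch.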
Rule Matching Reference
| Check | Condition | Result |
|---|---|---|
| Relevance 0.85, threshold 0.70 | 0.85 ≥ 0.70 | Pass |
| Relevance 0.60, threshold 0.70 | 0.60 < 0.70 | action_on_low_relevance |
| Source age 45d, max 90d | 45 ≤ 90 | Pass |
| Source age 200d, max 90d | 200 > 90 | action_on_stale_source |
| Collection "knowledge_base", allowed ["knowledge_base"] | In list | Pass |
| Collection "internal-hr", allowed ["knowledge_base"] | Not in list | BLOCK |
| Source "deprecated-kb.pdf", blocked list includes it | Exact match | BLOCK |
| 3 of 4 chunks from same source, max ratio 0.6 | 0.75 > 0.6 | WARN (after_workflow) |
By default, min_chunks and max_chunks violations produce WARN. Set action_on_chunk_violation: "block" to hard-stop agents that retrieve too many or too few documents. This is useful in regulated environments where retrieval pipeline health must be enforced.
If allowed_collections is ["knowledge_base"] and you retrieve from both "knowledge_base" and "internal-hr", the first result from "knowledge_base" passes, but the result from "internal-hr" triggers a BLOCK. The check fires per document, not per query.
Example Policies
Standard RAG Quality Gate
Enforce minimum relevance and freshness for a customer-facing knowledge base:
{
"min_relevance_score": 0.75,
"max_source_age_days": 90,
"min_chunks": 1,
"max_chunks": 8,
"allowed_collections": ["knowledge_base", "product-docs"],
"action_on_low_relevance": "warn",
"action_on_stale_source": "block"
}
Strict Compliance Retrieval
Block on any quality violation — suitable for regulated environments where stale or low-quality sources cannot be used:
{
"min_relevance_score": 0.80,
"max_source_age_days": 30,
"min_chunks": 2,
"max_chunks": 5,
"allowed_collections": ["compliance-docs"],
"blocked_sources": ["deprecated-policy-archive"],
"require_source_diversity": true,
"max_single_source_ratio": 0.5,
"action_on_low_relevance": "block",
"action_on_stale_source": "block",
"action_on_chunk_violation": "block"
}
Lenient Monitoring
Warn on all quality issues without blocking — useful for observing retrieval quality before enforcing stricter rules:
{
"min_relevance_score": 0.5,
"max_source_age_days": 365,
"action_on_low_relevance": "warn",
"action_on_stale_source": "warn"
}
SDK Integration
Recording Retrieval Results
Call ctx.record_retrieval_result() once per retrieved document. Each call automatically sends data to the controlplane and triggers mid_execution governance immediately.
import waxell_observe as waxell
from waxell_observe.errors import PolicyViolationError
async with waxell.WaxellContext(
    agent_name="rag-agent",
    enforce_policy=True,
) as ctx:
    # Record each retrieved document
    ctx.record_retrieval_result(
        relevance_score=0.92,
        source="product-manual.pdf",
        collection="knowledge_base",
        age_days=14,
    )
    ctx.record_retrieval_result(
        relevance_score=0.85,
        source="faq.pdf",
        collection="knowledge_base",
        age_days=30,
    )
    # mid_execution governance fired on each call above; a blocking
    # violation (e.g. a stale source) raises PolicyViolationError
    answer = synthesize_answer(context_docs, query)
    ctx.set_result({"answer": answer})
Catching Policy Violations
try:
    async with waxell.WaxellContext(
        agent_name="rag-agent",
        enforce_policy=True,
    ) as ctx:
        for doc in retrieved_docs:
            ctx.record_retrieval_result(
                relevance_score=doc.score,
                source=doc.id,
                collection=doc.collection,
                age_days=doc.age_days,
            )
        answer = synthesize(retrieved_docs, query)
        ctx.set_result({"answer": answer})
except PolicyViolationError as e:
    # e.g. "Retrieved from blocked source 'deprecated-kb'"
    # or "Source age (400 days) exceeds max (90 days)"
    # or "Collection 'internal-hr' not in allowed list"
    return fallback_response(query)
Using the Decorator
@waxell.observe(
    agent_name="rag-agent",
    enforce_policy=True,
)
async def run_rag(query: str, docs: list[dict]):
    ctx = waxell.get_current_context()
    for doc in docs:
        ctx.record_retrieval_result(
            relevance_score=doc["score"],
            source=doc["source"],
            collection=doc["collection"],
            age_days=doc["age_days"],
        )
    return synthesize(docs, query)
Enforcement Flow
Agent runs vector search, retrieves N documents
│
├── before_workflow governance runs
│ └── Retrieval rules stored in context._retrieval_rules
│
├── For each document retrieved:
│ └── ctx.record_retrieval_result(relevance_score, source, collection, age_days)
│ │
│ └── mid_execution governance fires
│ ├── chunk count within [min_chunks, max_chunks]?
│ │ └── No: action_on_chunk_violation (warn or block)
│ ├── relevance_score >= min_relevance_score?
│ │ └── No: action_on_low_relevance (warn or block)
│ ├── source in blocked_sources?
│ │ └── Yes: BLOCK
│ ├── collection in allowed_collections?
│ │ └── No: BLOCK
│ └── age_days <= max_source_age_days?
│ └── No: action_on_stale_source (warn or block)
│
├── Agent synthesizes answer from retrieved context
│
└── after_workflow governance fires
└── require_source_diversity?
└── Yes: check max_single_source_ratio across all sources
└── Any source dominates? → WARN
Creating via Dashboard
- Navigate to Governance > Policies
- Click New Policy
- Select category Retrieval
- Configure relevance threshold, max age, and collection allowlist
- Set scope to target specific agents (e.g., rag-agent)
- Enable
Creating via API
curl -X POST \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://acme.waxell.dev/waxell/v1/policies/ \
-d '{
"name": "RAG Quality Guard",
"category": "retrieval",
"rules": {
"min_relevance_score": 0.75,
"max_source_age_days": 90,
"min_chunks": 1,
"max_chunks": 8,
"allowed_collections": ["knowledge_base"],
"action_on_low_relevance": "warn",
"action_on_stale_source": "block"
},
"scope": {
"agents": ["rag-agent"]
},
"enabled": true
}'
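If you prefer to create the policy from Python, the curl call above can be approximated with the standard library. This is a sketch under assumptions: it mirrors the URL, headers, and body of the curl example, reads the token from a `TOKEN` environment variable, and `build_policy_request` is a hypothetical helper. It only builds the request; sending it is left as a comment.

```python
import json
import os
import urllib.request

POLICY_URL = "https://acme.waxell.dev/waxell/v1/policies/"  # from the curl example


def build_policy_request(token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {
        "name": "RAG Quality Guard",
        "category": "retrieval",
        "rules": {
            "min_relevance_score": 0.75,
            "max_source_age_days": 90,
            "min_chunks": 1,
            "max_chunks": 8,
            "allowed_collections": ["knowledge_base"],
            "action_on_low_relevance": "warn",
            "action_on_stale_source": "block",
        },
        "scope": {"agents": ["rag-agent"]},
        "enabled": True,
    }
    return urllib.request.Request(
        POLICY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_policy_request(os.environ.get("TOKEN", ""))
# To send it:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```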
Observability
Governance Tab
Retrieval evaluations appear with:
| Field | Example |
|---|---|
| Policy name | RAG Quality Guard |
| Phase | mid_execution or after_workflow |
| Action | allow, warn, or block |
| Category | retrieval |
| Reason | "Retrieval quality within policy (3 chunks)" |
| Metadata | {"chunk_count": 3} |
For violations:
| Violation | Reason | Metadata |
|---|---|---|
| Low relevance | "Retrieval relevance (0.41) below threshold (0.70)" | {"relevance_score": 0.41, "threshold": 0.70} |
| Stale source | "Source age (400 days) exceeds max (90 days)" | {"age_days": 400, "max_age": 90} |
| Blocked collection | "Collection 'internal-hr' not in allowed list" | {"collection": "internal-hr", "allowed": ["knowledge_base"]} |
| Blocked source | "Retrieved from blocked source 'deprecated-kb'" | {"blocked_source": "deprecated-kb"} |
| Low chunk count | "Retrieved chunks (0) below minimum (1)" | {"chunk_count": 0, "limit": 1} |
| Source domination | "Source 'doc.pdf' dominates at 75% (max 60%)" | {"warnings": ["..."]} |
Trace Tab
Each ctx.record_retrieval_result() call produces a span under the parent vector_search tool span. You can inspect per-document metadata including relevance score, source, collection, and age.
Combining with Other Policies
The retrieval policy is commonly paired with:
- Content policy — scan the synthesized answer for PII or harmful content after retrieval
- Grounding policy — verify that the final answer stays grounded in the retrieved documents
- Quality policy — score answer quality (coherence, factuality) after synthesis
- Compliance policy — require all three of the above for regulated use cases
Common Gotchas
- Empty allowed_collections means allow all. A non-configured allowlist is not an implicit block-all. Set at least one collection name to restrict access.
- blocked_sources uses exact match. The source string must exactly match the value passed to record_retrieval_result(). Use consistent naming conventions across your retrieval pipeline.
- mid_execution fires per document, stops at first violation. If your first retrieved document fails the collection check, the second document is never evaluated. You will only see one governance event per run.
- Source diversity is after_workflow only. The require_source_diversity check runs after the agent completes, not during retrieval. It will not stop the agent mid-execution; it produces a warning in the governance audit.
- age_days must be computed by your application. The SDK does not calculate document age automatically. You must compute (today - document_created_date).days and pass it to record_retrieval_result().
- min_chunks and max_chunks default to WARN but can BLOCK. Set action_on_chunk_violation: "block" to enforce chunk count limits as hard stops. The default "warn" behavior records a governance event without stopping the agent.
- action_on_low_relevance: "warn" still produces a governance event. The agent continues running but the warning is recorded in the trace and governance tab. Use this for monitoring before enforcing stricter rules.
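For the age_days gotcha above, a small helper can keep the computation consistent across your pipeline. This is a sketch, assuming your documents carry a creation timestamp; `compute_age_days` is a hypothetical name, and treating naive timestamps as UTC is an assumption you may need to change.

```python
from datetime import datetime, timezone
from typing import Optional


def compute_age_days(created_at: datetime, now: Optional[datetime] = None) -> int:
    """Whole days elapsed since created_at, clamped at zero for future dates."""
    if created_at.tzinfo is None:
        # Assumption: naive timestamps are stored in UTC.
        created_at = created_at.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return max(0, (now - created_at).days)


# Fixed "now" so the example is deterministic.
age = compute_age_days(
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    now=datetime(2024, 1, 15, tzinfo=timezone.utc),
)
```

The result can then be passed as `age_days` to `ctx.record_retrieval_result()`.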
Next Steps
- Policy & Governance — How policy enforcement works
- Grounding Policy — Verify answers stay grounded in retrieved context
- Content Policy — Scan synthesized answers for harmful content
- Quality Policy — Score answer quality after synthesis
- Policy Categories & Templates — All 26 categories