Retrieval Policy

The retrieval policy category governs the quality of documents retrieved in RAG pipelines. It checks relevance scores, source freshness, allowed collections, and source diversity — blocking or warning before stale or off-topic context contaminates your agent's answers.

Use it when you need to enforce quality standards on vector search, document retrieval, or any step where your agent fetches external context before generating a response.

Rules

| Rule | Type | Default | Description |
| --- | --- | --- | --- |
| min_relevance_score | number | 0.7 | Minimum cosine/dot-product similarity score required (0.0–1.0) |
| max_source_age_days | integer | 90 | Maximum age of a retrieved source document in days |
| min_chunks | integer | 1 | Minimum number of results required |
| max_chunks | integer | 10 | Maximum number of results allowed |
| allowed_collections | string[] | [] | If non-empty, only these vector collections may be queried |
| blocked_sources | string[] | [] | Source identifiers that are always blocked regardless of score |
| require_source_diversity | boolean | false | Enforce that no single source dominates the result set |
| max_single_source_ratio | number | 0.6 | Maximum fraction of results from a single source (when diversity is required) |
| action_on_low_relevance | string | "warn" | What to do when relevance is below threshold: "warn" or "block" |
| action_on_stale_source | string | "block" | What to do when a source exceeds max age: "warn" or "block" |
| action_on_chunk_violation | string | "warn" | What to do when the chunk count is below min_chunks or above max_chunks: "warn" or "block" |

How It Works

The retrieval handler runs at two phases:

  • mid_execution — checks each retrieved document as it is recorded. Fires after every ctx.record_retrieval_result() call.
  • after_workflow — checks the full result set for source diversity violations after the agent finishes.

Evaluation Order (mid_execution)

  1. Check chunk count: more than max_chunks or fewer than min_chunks → action_on_chunk_violation
  2. For each result in retrieval_results:
    • Relevance score below min_relevance_score → action_on_low_relevance
    • Source in blocked_sources → always BLOCK
    • Collection not in allowed_collections (if list is non-empty) → always BLOCK
    • Source age exceeds max_source_age_days → action_on_stale_source
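
The per-document checks in step 2 can be sketched in plain Python. This is an illustration of the documented order, not the handler's actual implementation: RetrievalRules and evaluate_document are hypothetical names, and returning at the first violation is an assumption based on the "stops at first violation" note under Common Gotchas. The chunk count check from step 1 is left out.

```python
from dataclasses import dataclass, field


@dataclass
class RetrievalRules:
    """Hypothetical container for the per-document rules (defaults from the Rules table)."""
    min_relevance_score: float = 0.7
    max_source_age_days: int = 90
    allowed_collections: list[str] = field(default_factory=list)
    blocked_sources: list[str] = field(default_factory=list)
    action_on_low_relevance: str = "warn"
    action_on_stale_source: str = "block"


def evaluate_document(rules, relevance_score, source, collection, age_days):
    """Return (action, reason) for one document, checking in the documented order."""
    # Assumption: the first violation found decides the outcome.
    if relevance_score < rules.min_relevance_score:
        return rules.action_on_low_relevance, "relevance below threshold"
    if source in rules.blocked_sources:  # exact string match
        return "block", "blocked source"
    # An empty allowlist means every collection is allowed.
    if rules.allowed_collections and collection not in rules.allowed_collections:
        return "block", "collection not in allowed list"
    if age_days > rules.max_source_age_days:
        return rules.action_on_stale_source, "source too old"
    return "allow", "within policy"


rules = RetrievalRules(allowed_collections=["knowledge_base"])
print(evaluate_document(rules, 0.60, "faq.pdf", "knowledge_base", 45))
# ('warn', 'relevance below threshold')
```

A score exactly equal to the threshold passes, and an age exactly equal to max_source_age_days passes, matching the ≥ and ≤ conditions in the Rule Matching Reference.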

Evaluation Order (after_workflow)

  1. If require_source_diversity is true, count how many chunks come from each source
  2. If any single source accounts for more than max_single_source_ratio of results → WARN
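
The ratio check in step 2 amounts to counting chunks per source and comparing each share against max_single_source_ratio. A minimal sketch, assuming the sources are available as a list of strings; dominant_sources is a hypothetical helper, not part of the SDK:

```python
from collections import Counter


def dominant_sources(sources, max_single_source_ratio=0.6):
    """Return {source: share} for every source whose share exceeds the limit."""
    if not sources:
        return {}
    total = len(sources)
    counts = Counter(sources)
    return {
        source: count / total
        for source, count in counts.items()
        if count / total > max_single_source_ratio
    }


# 3 of 4 chunks come from doc.pdf: 0.75 > 0.6, so it would trigger a WARN
print(dominant_sources(["doc.pdf", "doc.pdf", "doc.pdf", "faq.pdf"]))
# {'doc.pdf': 0.75}
```

The comparison is strict, so a source sitting exactly at the limit does not count as dominating.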

Rule Matching Reference

| Check | Condition | Result |
| --- | --- | --- |
| Relevance 0.85, threshold 0.70 | 0.85 ≥ 0.70 | Pass |
| Relevance 0.60, threshold 0.70 | 0.60 < 0.70 | action_on_low_relevance |
| Source age 45d, max 90d | 45 ≤ 90 | Pass |
| Source age 200d, max 90d | 200 > 90 | action_on_stale_source |
| Collection "knowledge_base", allowed ["knowledge_base"] | In list | Pass |
| Collection "internal-hr", allowed ["knowledge_base"] | Not in list | BLOCK |
| Source "deprecated-kb.pdf", blocked list includes it | Exact match | BLOCK |
| 3 of 4 chunks from same source, max ratio 0.6 | 0.75 > 0.6 | WARN (after_workflow) |

Chunk Count Action Is Configurable

By default, min_chunks and max_chunks violations produce WARN. Set action_on_chunk_violation: "block" to hard-stop agents that retrieve too many or too few documents. This is useful in regulated environments where retrieval pipeline health must be enforced.

allowed_collections Is Checked Per Document

If allowed_collections is ["knowledge_base"] and you retrieve from both "knowledge_base" and "internal-hr", the first result from "knowledge_base" passes, but the result from "internal-hr" triggers a BLOCK. The check fires per document, not per query.

Example Policies

Standard RAG Quality Gate

Enforce minimum relevance and freshness for a customer-facing knowledge base:

{
  "min_relevance_score": 0.75,
  "max_source_age_days": 90,
  "min_chunks": 1,
  "max_chunks": 8,
  "allowed_collections": ["knowledge_base", "product-docs"],
  "action_on_low_relevance": "warn",
  "action_on_stale_source": "block"
}

Strict Compliance Retrieval

Block on any quality violation — suitable for regulated environments where stale or low-quality sources cannot be used:

{
  "min_relevance_score": 0.80,
  "max_source_age_days": 30,
  "min_chunks": 2,
  "max_chunks": 5,
  "allowed_collections": ["compliance-docs"],
  "blocked_sources": ["deprecated-policy-archive"],
  "require_source_diversity": true,
  "max_single_source_ratio": 0.5,
  "action_on_low_relevance": "block",
  "action_on_stale_source": "block",
  "action_on_chunk_violation": "block"
}

Lenient Monitoring

Warn on all quality issues without blocking — useful for observing retrieval quality before enforcing stricter rules:

{
  "min_relevance_score": 0.5,
  "max_source_age_days": 365,
  "action_on_low_relevance": "warn",
  "action_on_stale_source": "warn"
}
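
The lenient policy sets only four rules, so the remaining ones presumably fall back to the defaults in the Rules table. The sketch below shows what the effective rule set would look like under that assumption; DEFAULT_RULES and effective_rules are hypothetical names, and the way the platform actually merges defaults is not documented here.

```python
# Defaults copied from the Rules table
DEFAULT_RULES = {
    "min_relevance_score": 0.7,
    "max_source_age_days": 90,
    "min_chunks": 1,
    "max_chunks": 10,
    "allowed_collections": [],
    "blocked_sources": [],
    "require_source_diversity": False,
    "max_single_source_ratio": 0.6,
    "action_on_low_relevance": "warn",
    "action_on_stale_source": "block",
    "action_on_chunk_violation": "warn",
}


def effective_rules(policy):
    """Overlay a partial policy on the defaults, rejecting unknown rule names."""
    unknown = set(policy) - set(DEFAULT_RULES)
    if unknown:
        raise ValueError(f"unknown rules: {sorted(unknown)}")
    return {**DEFAULT_RULES, **policy}


lenient = {
    "min_relevance_score": 0.5,
    "max_source_age_days": 365,
    "action_on_low_relevance": "warn",
    "action_on_stale_source": "warn",
}
rules = effective_rules(lenient)
print(rules["action_on_stale_source"], rules["max_chunks"])
# warn 10
```

Note that action_on_stale_source has to be set explicitly in a warn-only policy, because its default is "block".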

SDK Integration

Recording Retrieval Results

Call ctx.record_retrieval_result() once per retrieved document. Each call automatically sends data to the control plane and triggers mid_execution governance immediately.

import waxell_observe as waxell
from waxell_observe.errors import PolicyViolationError

# Run this inside an async function; context_docs and query come from your retrieval step
async with waxell.WaxellContext(
    agent_name="rag-agent",
    enforce_policy=True,
) as ctx:
    # Record each retrieved document. Governance fires on every call:
    # low relevance or stale sources raise PolicyViolationError when the action is "block"
    ctx.record_retrieval_result(
        relevance_score=0.92,
        source="product-manual.pdf",
        collection="knowledge_base",
        age_days=14,
    )
    ctx.record_retrieval_result(
        relevance_score=0.85,
        source="faq.pdf",
        collection="knowledge_base",
        age_days=30,
    )

    answer = synthesize_answer(context_docs, query)
    ctx.set_result({"answer": answer})

Catching Policy Violations

# answer_query is an illustrative wrapper so the return statements are valid
async def answer_query(query, retrieved_docs):
    try:
        async with waxell.WaxellContext(
            agent_name="rag-agent",
            enforce_policy=True,
        ) as ctx:
            for doc in retrieved_docs:
                ctx.record_retrieval_result(
                    relevance_score=doc.score,
                    source=doc.id,
                    collection=doc.collection,
                    age_days=doc.age_days,
                )
            answer = synthesize(retrieved_docs, query)
            ctx.set_result({"answer": answer})
            return answer

    except PolicyViolationError as e:
        # e.g. "Retrieved from blocked source 'deprecated-kb'"
        # or "Source age (400 days) exceeds max (90 days)"
        # or "Collection 'internal-hr' not in allowed list"
        return fallback_response(query)

Using the Decorator

@waxell.observe(
    agent_name="rag-agent",
    enforce_policy=True,
)
async def run_rag(query: str, docs: list[dict]):
    ctx = waxell.get_current_context()
    for doc in docs:
        ctx.record_retrieval_result(
            relevance_score=doc["score"],
            source=doc["source"],
            collection=doc["collection"],
            age_days=doc["age_days"],
        )
    return synthesize(docs, query)

Enforcement Flow

Agent runs vector search, retrieves N documents
│
├── before_workflow governance runs
│   └── Retrieval rules stored in context._retrieval_rules
│
├── For each document retrieved:
│   └── ctx.record_retrieval_result(relevance_score, source, collection, age_days)
│       │
│       └── mid_execution governance fires
│           ├── chunk count within [min_chunks, max_chunks]?
│           │   └── No: action_on_chunk_violation (warn or block)
│           ├── relevance_score >= min_relevance_score?
│           │   └── No: action_on_low_relevance (warn or block)
│           ├── source in blocked_sources?
│           │   └── Yes: BLOCK
│           ├── collection in allowed_collections?
│           │   └── No: BLOCK
│           └── age_days <= max_source_age_days?
│               └── No: action_on_stale_source (warn or block)
│
├── Agent synthesizes answer from retrieved context
│
└── after_workflow governance fires
    └── require_source_diversity?
        └── Yes: check max_single_source_ratio across all sources
            └── Any source dominates? → WARN

Creating via Dashboard

  1. Navigate to Governance > Policies
  2. Click New Policy
  3. Select category Retrieval
  4. Configure relevance threshold, max age, and collection allowlist
  5. Set scope to target specific agents (e.g., rag-agent)
  6. Enable

Creating via API

curl -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://acme.waxell.dev/waxell/v1/policies/ \
  -d '{
    "name": "RAG Quality Guard",
    "category": "retrieval",
    "rules": {
      "min_relevance_score": 0.75,
      "max_source_age_days": 90,
      "min_chunks": 1,
      "max_chunks": 8,
      "allowed_collections": ["knowledge_base"],
      "action_on_low_relevance": "warn",
      "action_on_stale_source": "block"
    },
    "scope": {
      "agents": ["rag-agent"]
    },
    "enabled": true
  }'
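
If you would rather create the policy from Python, the same request can be built with the standard library. This is a sketch that mirrors the curl call above; build_create_policy_request is a hypothetical helper, the response format is not covered here, and the request is only constructed, not sent.

```python
import json
import os
import urllib.request


def build_create_policy_request(token, base_url="https://acme.waxell.dev"):
    """Build (but do not send) the POST request from the curl example."""
    payload = {
        "name": "RAG Quality Guard",
        "category": "retrieval",
        "rules": {
            "min_relevance_score": 0.75,
            "max_source_age_days": 90,
            "min_chunks": 1,
            "max_chunks": 8,
            "allowed_collections": ["knowledge_base"],
            "action_on_low_relevance": "warn",
            "action_on_stale_source": "block",
        },
        "scope": {"agents": ["rag-agent"]},
        "enabled": True,
    }
    return urllib.request.Request(
        url=f"{base_url}/waxell/v1/policies/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# "example-token" is a placeholder used when TOKEN is not set
request = build_create_policy_request(os.environ.get("TOKEN", "example-token"))
print(request.get_method(), request.full_url)
```

To send it, pass the request to urllib.request.urlopen(request) and handle urllib.error.HTTPError for non-2xx responses.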

Observability

Governance Tab

Retrieval evaluations appear with:

| Field | Example |
| --- | --- |
| Policy name | RAG Quality Guard |
| Phase | mid_execution or after_workflow |
| Action | allow, warn, or block |
| Category | retrieval |
| Reason | "Retrieval quality within policy (3 chunks)" |
| Metadata | {"chunk_count": 3} |

For violations:

| Violation | Reason | Metadata |
| --- | --- | --- |
| Low relevance | "Retrieval relevance (0.41) below threshold (0.70)" | {"relevance_score": 0.41, "threshold": 0.70} |
| Stale source | "Source age (400 days) exceeds max (90 days)" | {"age_days": 400, "max_age": 90} |
| Blocked collection | "Collection 'internal-hr' not in allowed list" | {"collection": "internal-hr", "allowed": ["knowledge_base"]} |
| Blocked source | "Retrieved from blocked source 'deprecated-kb'" | {"blocked_source": "deprecated-kb"} |
| Low chunk count | "Retrieved chunks (0) below minimum (1)" | {"chunk_count": 0, "limit": 1} |
| Source domination | "Source 'doc.pdf' dominates at 75% (max 60%)" | {"warnings": ["..."]} |

Trace Tab

Each ctx.record_retrieval_result() call produces a span under the parent vector_search tool span. You can inspect per-document metadata including relevance score, source, collection, and age.

Combining with Other Policies

The retrieval policy is commonly paired with:

  • Content policy — scan the synthesized answer for PII or harmful content after retrieval
  • Grounding policy — verify that the final answer stays grounded in the retrieved documents
  • Quality policy — score answer quality (coherence, factuality) after synthesis
  • Compliance policy — require all three of the above for regulated use cases

Common Gotchas

  1. Empty allowed_collections means allow all. A non-configured allowlist is not an implicit block-all. Set at least one collection name to restrict access.

  2. blocked_sources uses exact match. The source string must exactly match the value passed to record_retrieval_result(). Use consistent naming conventions across your retrieval pipeline.

  3. mid_execution fires per document, stops at first violation. If your first retrieved document fails the collection check (always a BLOCK), the second document is never evaluated, so you will see only one governance event for that run.

  4. Source diversity is after_workflow only. The require_source_diversity check runs after the agent completes, not during retrieval. It will not stop the agent mid-execution — it produces a warning in the governance audit.

  5. age_days must be computed by your application. The SDK does not calculate document age automatically. You must compute (today - document_created_date).days and pass it to record_retrieval_result().

  6. min_chunks and max_chunks default to WARN but can BLOCK. Set action_on_chunk_violation: "block" to enforce chunk count limits as hard stops. The default "warn" behavior records a governance event without stopping the agent.

  7. action_on_low_relevance: "warn" still produces a governance event. The agent continues running but the warning is recorded in the trace and governance tab. Use this for monitoring before enforcing stricter rules.
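
For gotcha 5, the age calculation is a small helper on your side. A minimal sketch, assuming each document carries a creation date or timestamp; age_in_days is a hypothetical name, and clamping future dates to 0 is a design choice of this example rather than SDK behavior:

```python
from datetime import date, datetime, timezone


def age_in_days(created, today=None):
    """Whole days between a document's creation date and today (never negative)."""
    # datetime is a subclass of date, so reduce datetimes to a plain date first
    if isinstance(created, datetime):
        if created.tzinfo is not None:
            created = created.astimezone(timezone.utc)
        created = created.date()
    if today is None:
        today = datetime.now(timezone.utc).date()
    return max((today - created).days, 0)


print(age_in_days(date(2024, 1, 1), today=date(2024, 1, 15)))
# 14
```

Pass the result as age_days when calling record_retrieval_result(). Using UTC on both sides avoids off-by-one results around midnight when documents and servers sit in different time zones.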

Next Steps