FAISS RAG Agent
The gold-standard multi-agent RAG pipeline. A parent orchestrator coordinates 5 child agents across 3 LLM providers (OpenAI, Anthropic, Groq), demonstrating every SDK primitive -- decorators, manual convenience functions, and auto-instrumented LLM calls -- in a single trace. The pipeline preprocesses a query, classifies it, retrieves documents from a FAISS vector index, verifies the answer with Anthropic tool_use, and synthesizes a final response with Groq.
This example requires OPENAI_API_KEY, ANTHROPIC_API_KEY, GROQ_API_KEY, WAXELL_API_KEY, and WAXELL_API_URL. Use --dry-run to run without any API keys.
Architecture
Key Code
Orchestrator with @observe and manual decide()
The parent agent wraps the full pipeline. Each nested @observe call creates a child run with automatic parent-child lineage.
@waxell.observe(agent_name="rag-orchestrator", workflow_name="rag-pipeline")
async def run_pipeline(query: str, dry_run: bool = False, waxell_ctx=None):
    waxell.tag("demo", "faiss-gold-standard")
    waxell.metadata("providers", ["openai", "anthropic", "groq"])

    # @step decorator -- preprocess the query
    preprocessed = await preprocess_query(query)

    # @decision decorator -- classify via OpenAI
    classification = await classify_query(query=query, openai_client=openai_client)

    # Manual decide() -- routing decision
    waxell.decide(
        "retrieval_strategy",
        chosen=strategy,
        options=["semantic_search", "keyword_search", "hybrid_search"],
        reasoning=f"Query classified as '{classification}' -- {strategy} optimal",
        confidence=0.88,
    )

    # Child agents auto-link to this parent via WaxellContext
    retrieval_result = await run_retrieval(query=query, corpus=MOCK_DOCUMENTS)
    tool_result = await run_tool_calling(query=query, documents=...)
    verification = await run_anthropic_verification(query=query, ...)
    synthesis = await run_synthesis(query=query, documents=..., verified=...)
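Conceptually, the automatic lineage can be sketched with `contextvars`: each decorated call opens a run, stashes it as the "current" run, and any run opened while it is active records it as the parent. This is a simplified illustration only, not the SDK's actual implementation; the `Run`, `observe`, and `_current_run` names below are hypothetical.

```python
import contextvars
import itertools

# Hypothetical sketch of parent-child run lineage -- NOT the real Waxell SDK.
_current_run = contextvars.ContextVar("current_run", default=None)
_ids = itertools.count(1)

class Run:
    def __init__(self, name):
        self.id = next(_ids)
        self.name = name
        parent = _current_run.get()
        # A run opened while another run is active links to it automatically.
        self.parent_id = parent.id if parent else None

def observe(name):
    """Decorator that opens a run and makes it the current parent."""
    def wrap(fn):
        def inner(*args, **kwargs):
            run = Run(name)
            token = _current_run.set(run)
            try:
                return fn(*args, **kwargs), run
            finally:
                _current_run.reset(token)
        return inner
    return wrap

@observe("retriever")
def child():
    return "docs"

@observe("orchestrator")
def parent():
    (_, child_run) = child()
    return child_run

child_run, parent_run = parent()
print(parent_run.parent_id)                   # None -- top-level run
print(child_run.parent_id == parent_run.id)   # True -- auto-linked
```

Because the parent is carried in a `ContextVar` rather than passed explicitly, nested child agents link up without any extra wiring in their signatures.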
FAISS retriever with @tool and @retrieval
The retriever child agent uses @tool to record FAISS operations and @retrieval to record the search-and-rank step.
import numpy as np

@waxell.tool(tool_type="vector_db")
def create_index(dim: int):
    """Create a FAISS flat L2 index."""
    import faiss
    return faiss.IndexFlatL2(dim)

@waxell.tool(tool_type="vector_db")
def search_index(index, query_vec, k: int = 5):
    """Search the FAISS index for nearest neighbors."""
    distances, indices = index.search(query_vec.reshape(1, -1).astype(np.float32), k)
    return {"distances": distances[0].tolist(), "indices": indices[0].tolist()}

@waxell.retrieval(source="faiss")
def search_and_rank(query: str, corpus: list, indices: list, distances: list):
    """Rank and return matched documents from FAISS search results."""
    matched = []
    for idx, dist in zip(indices, distances):
        if 0 <= idx < len(corpus):
            doc = dict(corpus[idx])
            doc["score"] = round(1.0 / (1.0 + dist), 4)  # inverse-distance score
            matched.append(doc)
    return matched
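The ranking step is plain Python, so it can be exercised without FAISS installed, which is essentially what the `--dry-run` path relies on. A minimal sketch with mocked search output (the corpus, indices, and distance values below are made up for illustration):

```python
# Mocked FAISS output: corpus indices and their L2 distances.
corpus = [
    {"title": "Model safety overview"},
    {"title": "RAG evaluation methods"},
    {"title": "Vector index tuning"},
]
indices = [2, 0, 99]        # 99 is out of range and should be skipped
distances = [0.0, 3.0, 1.0]

matched = []
for idx, dist in zip(indices, distances):
    if 0 <= idx < len(corpus):
        doc = dict(corpus[idx])
        doc["score"] = round(1.0 / (1.0 + dist), 4)  # distance 0 -> score 1.0
        matched.append(doc)

print([d["score"] for d in matched])  # [1.0, 0.25]
```

The inverse-distance transform maps an exact match (distance 0) to a score of 1.0 and decays smoothly for farther neighbors, while the bounds check drops any padding indices FAISS returns when `k` exceeds the corpus size.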
Answer synthesis with @reasoning, @decision, @retry, and score()
The synthesizer child agent demonstrates the remaining SDK primitives in one place.
@waxell.retry_dec(max_attempts=2, strategy="retry")
async def _call_groq(groq_client, messages, model="llama-3.3-70b-versatile"):
    """Groq LLM call with automatic retry recording on failure."""
    return await groq_client.chat.completions.create(model=model, messages=messages)

@waxell.reasoning_dec(step="quality_assessment")
async def assess_quality(answer: str, documents: list) -> dict:
    """Chain-of-thought quality assessment auto-recorded as a reasoning span."""
    doc_titles = [doc["title"] for doc in documents]
    coverage = sum(1 for title in doc_titles if title.lower() in answer.lower())
    return {
        "thought": f"Generated answer references {coverage}/{len(documents)} docs.",
        "evidence": [f"Source: {t}" for t in doc_titles],
        "conclusion": "Answer adequately covers source material",
    }

@waxell.decision(name="output_format", options=["brief", "detailed", "bullet_points"])
def choose_output_format(num_docs: int, verified: bool) -> dict:
    return {"chosen": "detailed", "reasoning": "...", "confidence": 0.85}

# Scores
waxell.score("answer_quality", 0.92, comment="auto-scored based on doc coverage")
waxell.score("factual_grounding", verified, data_type="boolean")
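The retry behavior that `@waxell.retry_dec` wraps around the Groq call can be illustrated with a bare-bones decorator. This is a simplified sync sketch under the assumption that failed attempts are simply re-invoked up to `max_attempts`; the real decorator additionally records each attempt as a retry span on the trace.

```python
import functools

def retry(max_attempts: int = 2):
    """Simplified retry decorator -- re-invokes the function on failure."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            last_exc = None
            for _attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception as exc:
                    last_exc = exc  # the real SDK would record this attempt here
            raise last_exc
        return inner
    return wrap

calls = {"n": 0}

@retry(max_attempts=2)
def flaky():
    """Fails once, then succeeds -- mimics a transient LLM API error."""
    calls["n"] += 1
    if calls["n"] < 2:
        raise RuntimeError("transient failure")
    return "ok"

result = flaky()
print(result, calls["n"])  # ok 2 -- succeeded on the second attempt
```

With `max_attempts=2`, a single transient failure is absorbed; a second consecutive failure would re-raise to the caller, which is where the orchestrator's own error handling takes over.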
What this demonstrates
- `@waxell.observe` -- parent-child agent hierarchy with automatic lineage via `WaxellContext`
- `@waxell.tool` -- FAISS vector database operations recorded with `tool_type="vector_db"`
- `@waxell.retrieval` -- search-and-rank recorded with `source="faiss"`
- `@waxell.decision` -- query classification and output format selection
- `waxell.decide()` -- manual routing decision with options, reasoning, and confidence
- `@waxell.reasoning_dec` -- chain-of-thought quality assessment
- `@waxell.retry_dec` -- LLM call with automatic retry recording
- `@waxell.step_dec` -- query preprocessing recorded as an execution step
- `waxell.score()` -- quality and factual grounding scores attached to the trace
- `waxell.tag()` / `waxell.metadata()` -- agent role, provider, and pipeline metadata
- Auto-instrumented LLM calls -- OpenAI, Anthropic, and Groq calls captured without extra code
- 3 decision capture methods -- decorator (`@decision`), manual (`decide()`), and auto-detected (from `tool_calls`/`tool_use`)
Run it
# Dry-run (no API keys needed)
cd dev/waxell-dev
python -m app.demos.faiss_agent --dry-run
# Live (real OpenAI + Anthropic + Groq)
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GROQ_API_KEY="gsk_..."
python -m app.demos.faiss_agent
# Custom query
python -m app.demos.faiss_agent --query "Find documents about model safety"