
FAISS RAG Agent

The gold-standard multi-agent RAG pipeline. A parent orchestrator coordinates 5 child agents across 3 LLM providers (OpenAI, Anthropic, Groq), demonstrating every SDK primitive -- decorators, manual convenience functions, and auto-instrumented LLM calls -- in a single trace. The pipeline preprocesses a query, classifies it, retrieves documents from a FAISS vector index, verifies the answer with Anthropic tool_use, and synthesizes a final response with Groq.

Environment variables

This example requires OPENAI_API_KEY, ANTHROPIC_API_KEY, GROQ_API_KEY, WAXELL_API_KEY, and WAXELL_API_URL. Use --dry-run to run without any API keys.

Architecture

Key Code

Orchestrator with @observe and manual decide()

The parent agent wraps the full pipeline. Each nested @observe call creates a child run with automatic parent-child lineage.

@waxell.observe(agent_name="rag-orchestrator", workflow_name="rag-pipeline")
async def run_pipeline(query: str, dry_run: bool = False, waxell_ctx=None):
    waxell.tag("demo", "faiss-gold-standard")
    waxell.metadata("providers", ["openai", "anthropic", "groq"])

    # @waxell.step_dec -- preprocess the query
    preprocessed = await preprocess_query(query)

    # @waxell.decision -- classify via OpenAI
    classification = await classify_query(query=query, openai_client=openai_client)

    # Manual decide() -- routing decision
    waxell.decide(
        "retrieval_strategy",
        chosen=strategy,
        options=["semantic_search", "keyword_search", "hybrid_search"],
        reasoning=f"Query classified as '{classification}' -- {strategy} optimal",
        confidence=0.88,
    )

    # Child agents auto-link to this parent via WaxellContext
    retrieval_result = await run_retrieval(query=query, corpus=MOCK_DOCUMENTS)
    tool_result = await run_tool_calling(query=query, documents=...)
    verification = await run_anthropic_verification(query=query, ...)
    synthesis = await run_synthesis(query=query, documents=..., verified=...)
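
The automatic parent-child lineage that `WaxellContext` provides is the kind of propagation Python's `contextvars` module makes easy. A minimal sketch of the general pattern -- illustrative only, not the Waxell SDK's actual internals; `observe` and `LINEAGE` are made up here:

```python
import contextvars
import functools
import uuid

# Each decorated call records (agent_name, run_id, parent_run_id) here.
LINEAGE = []

# Holds the id of the currently active run; nested calls read it as their parent.
_current_run = contextvars.ContextVar("current_run", default=None)

def observe(agent_name):
    """Toy @observe-style decorator: starts a run and links it to its parent."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            run_id = uuid.uuid4().hex[:8]
            LINEAGE.append((agent_name, run_id, _current_run.get()))
            token = _current_run.set(run_id)  # children now see this run as parent
            try:
                return fn(*args, **kwargs)
            finally:
                _current_run.reset(token)  # restore the previous parent on exit
        return wrapper
    return decorator

@observe("retriever")
def retrieve():
    return "docs"

@observe("orchestrator")
def pipeline():
    return retrieve()

pipeline()
# LINEAGE[1] is the retriever run; its parent id is the orchestrator's run id.
```

Because the parent id travels through a context variable rather than function arguments, child agents link to the orchestrator without any explicit wiring.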

FAISS retriever with @tool and @retrieval

The retriever child agent uses @tool to record FAISS operations and @retrieval to record the search-and-rank step.

import numpy as np

@waxell.tool(tool_type="vector_db")
def create_index(dim: int):
    """Create a FAISS flat L2 index."""
    import faiss
    return faiss.IndexFlatL2(dim)

@waxell.tool(tool_type="vector_db")
def search_index(index, query_vec, k: int = 5):
    """Search the FAISS index for the k nearest neighbors."""
    distances, indices = index.search(query_vec.reshape(1, -1).astype(np.float32), k)
    return {"distances": distances[0].tolist(), "indices": indices[0].tolist()}

@waxell.retrieval(source="faiss")
def search_and_rank(query: str, corpus: list, indices: list, distances: list):
    """Rank and return matched documents from FAISS search results."""
    matched = []
    for idx, dist in zip(indices, distances):
        if 0 <= idx < len(corpus):
            doc = dict(corpus[idx])
            doc["score"] = round(1.0 / (1.0 + dist), 4)  # map L2 distance to a (0, 1] similarity score
            matched.append(doc)
    return matched

Answer synthesis with @reasoning, @decision, @retry, and score()

The synthesizer child agent demonstrates the remaining SDK primitives in one place.

@waxell.retry_dec(max_attempts=2, strategy="retry")
async def _call_groq(groq_client, messages, model="llama-3.3-70b-versatile"):
    """Groq LLM call with automatic retry recording on failure."""
    return await groq_client.chat.completions.create(model=model, messages=messages)

@waxell.reasoning_dec(step="quality_assessment")
async def assess_quality(answer: str, documents: list) -> dict:
    """Chain-of-thought quality assessment auto-recorded as a reasoning span."""
    doc_titles = [doc["title"] for doc in documents]
    coverage = sum(1 for title in doc_titles if title in answer)  # docs the answer cites
    return {
        "thought": f"Generated answer references {coverage}/{len(documents)} docs.",
        "evidence": [f"Source: {t}" for t in doc_titles],
        "conclusion": "Answer adequately covers source material",
    }

@waxell.decision(name="output_format", options=["brief", "detailed", "bullet_points"])
def choose_output_format(num_docs: int, verified: bool) -> dict:
    return {"chosen": "detailed", "reasoning": "...", "confidence": 0.85}

# Scores
waxell.score("answer_quality", 0.92, comment="auto-scored based on doc coverage")
waxell.score("factual_grounding", verified, data_type="boolean")
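
The behavior behind `@waxell.retry_dec` follows a standard async retry pattern. A hedged sketch of that pattern -- `retry_dec` here is a stand-in, not the SDK's implementation, and it omits the span recording the real decorator adds:

```python
import asyncio
import functools

def retry_dec(max_attempts=2, delay=0.0):
    """Toy retry decorator: re-invoke the coroutine on failure, up to a cap."""
    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            last_exc = None
            for attempt in range(1, max_attempts + 1):
                try:
                    return await fn(*args, **kwargs)
                except Exception as exc:
                    last_exc = exc  # the real SDK would record this attempt as a span
                    if attempt < max_attempts:
                        await asyncio.sleep(delay)
            raise last_exc
        return wrapper
    return decorator

calls = {"n": 0}

@retry_dec(max_attempts=2)
async def flaky():
    """Fails once, then succeeds -- simulates a transient provider error."""
    calls["n"] += 1
    if calls["n"] < 2:
        raise RuntimeError("transient failure")
    return "ok"

result = asyncio.run(flaky())  # succeeds on the second attempt
```

Keeping the retry loop in a decorator means the call site (`_call_groq` above) stays a plain one-line LLM call.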

What this demonstrates

  • @waxell.observe -- parent-child agent hierarchy with automatic lineage via WaxellContext
  • @waxell.tool -- FAISS vector database operations recorded with tool_type="vector_db"
  • @waxell.retrieval -- search-and-rank recorded with source="faiss"
  • @waxell.decision -- query classification and output format selection
  • waxell.decide() -- manual routing decision with options, reasoning, and confidence
  • @waxell.reasoning_dec -- chain-of-thought quality assessment
  • @waxell.retry_dec -- LLM call with automatic retry recording
  • @waxell.step_dec -- query preprocessing recorded as execution step
  • waxell.score() -- quality and factual grounding scores attached to the trace
  • waxell.tag() / waxell.metadata() -- agent role, provider, and pipeline metadata
  • Auto-instrumented LLM calls -- OpenAI, Anthropic, and Groq calls captured without extra code
  • 3 decision capture methods -- decorator (@decision), manual (decide()), and auto-detected (from tool_calls / tool_use)
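
The third capture method -- auto-detecting decisions from `tool_calls` / `tool_use` -- amounts to parsing the provider's tool-call payload. A sketch against an OpenAI-style message dict; `extract_decisions` and the payload contents are hypothetical, and Anthropic's `tool_use` blocks carry `name`/`input` rather than `function`/`arguments`:

```python
import json

# Dict form of an OpenAI chat-completions tool call (payload values made up).
message = {
    "tool_calls": [
        {
            "function": {
                "name": "select_strategy",
                "arguments": json.dumps({"strategy": "hybrid_search"}),
            }
        }
    ]
}

def extract_decisions(message):
    """Hypothetical helper: turn each tool call into a decision record."""
    decisions = []
    for call in message.get("tool_calls", []):
        fn = call["function"]
        decisions.append({
            "name": fn["name"],
            "chosen": json.loads(fn["arguments"]),  # arguments arrive as a JSON string
        })
    return decisions

records = extract_decisions(message)
```

Because the model's chosen tool and arguments are already structured, no extra instrumentation is needed at the call site.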

Run it

# Dry-run (no API keys needed)
cd dev/waxell-dev
python -m app.demos.faiss_agent --dry-run

# Live (real OpenAI + Anthropic + Groq)
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GROQ_API_KEY="gsk_..."
python -m app.demos.faiss_agent

# Custom query
python -m app.demos.faiss_agent --query "Find documents about model safety"

Source

dev/waxell-dev/app/demos/faiss_agent.py