
FAISS RAG Agent

The gold-standard multi-agent RAG pipeline. A parent orchestrator coordinates 5 child agents across 3 LLM providers (OpenAI, Anthropic, Groq), demonstrating every SDK primitive -- decorators, manual convenience functions, and auto-instrumented LLM calls -- in a single trace. The pipeline preprocesses a query, classifies it, retrieves documents from a FAISS vector index, verifies the answer with Anthropic tool_use, and synthesizes a final response with Groq.

Environment variables

This example requires OPENAI_API_KEY, ANTHROPIC_API_KEY, GROQ_API_KEY, WAXELL_API_KEY, and WAXELL_API_URL. Use --dry-run to run without any API keys.

Architecture

Key Code

Orchestrator with @observe and manual decide()

The parent agent wraps the full pipeline. Each nested @observe call creates a child run with automatic parent-child lineage.

@waxell.observe(agent_name="rag-orchestrator", workflow_name="rag-pipeline")
async def run_pipeline(query: str, dry_run: bool = False, waxell_ctx=None):
    waxell.tag("demo", "faiss-gold-standard")
    waxell.metadata("providers", ["openai", "anthropic", "groq"])

    # @waxell.step_dec -- preprocess the query
    preprocessed = await preprocess_query(query)

    # @waxell.decision -- classify via OpenAI
    classification = await classify_query(query=query, openai_client=openai_client)

    # Manual decide() -- routing decision
    waxell.decide(
        "retrieval_strategy",
        chosen=strategy,
        options=["semantic_search", "keyword_search", "hybrid_search"],
        reasoning=f"Query classified as '{classification}' -- {strategy} optimal",
        confidence=0.88,
    )

    # Child agents auto-link to this parent via WaxellContext
    retrieval_result = await run_retrieval(query=query, corpus=MOCK_DOCUMENTS)
    tool_result = await run_tool_calling(query=query, documents=...)
    verification = await run_anthropic_verification(query=query, ...)
    synthesis = await run_synthesis(query=query, documents=..., verified=...)
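
The automatic parent-child lineage that `WaxellContext` provides is the kind of propagation Python's `contextvars` module makes easy. A minimal sketch of the general pattern -- illustrative only, not the Waxell SDK's actual internals; `observe` and `LINEAGE` are made up here:

```python
import contextvars
import functools
import uuid

# Each decorated call records (agent_name, run_id, parent_run_id) here.
LINEAGE = []

# Holds the id of the currently active run; nested calls read it as their parent.
_current_run = contextvars.ContextVar("current_run", default=None)

def observe(agent_name):
    """Toy @observe-style decorator: starts a run and links it to its parent."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            run_id = uuid.uuid4().hex[:8]
            LINEAGE.append((agent_name, run_id, _current_run.get()))
            token = _current_run.set(run_id)  # children now see this run as parent
            try:
                return fn(*args, **kwargs)
            finally:
                _current_run.reset(token)  # restore the previous parent on exit
        return wrapper
    return decorator

@observe("retriever")
def retrieve():
    return "docs"

@observe("orchestrator")
def pipeline():
    return retrieve()

pipeline()
# LINEAGE[1] is the retriever run; its parent id is the orchestrator's run id.
```

Because the parent id travels through a context variable rather than function arguments, child agents link to the orchestrator without any explicit wiring.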

FAISS retriever with @tool and @retrieval

The retriever child agent uses @tool to record FAISS operations and @retrieval to record the search-and-rank step.

import numpy as np

@waxell.tool(tool_type="vector_db")
def create_index(dim: int):
    """Create a FAISS flat L2 index."""
    import faiss
    return faiss.IndexFlatL2(dim)

@waxell.tool(tool_type="vector_db")
def search_index(index, query_vec, k: int = 5):
    """Search the FAISS index for the k nearest neighbors."""
    distances, indices = index.search(query_vec.reshape(1, -1).astype(np.float32), k)
    return {"distances": distances[0].tolist(), "indices": indices[0].tolist()}

@waxell.retrieval(source="faiss")
def search_and_rank(query: str, corpus: list, indices: list, distances: list):
    """Rank and return matched documents from FAISS search results."""
    matched = []
    for idx, dist in zip(indices, distances):
        if 0 <= idx < len(corpus):
            doc = dict(corpus[idx])
            doc["score"] = round(1.0 / (1.0 + dist), 4)  # map L2 distance to a (0, 1] similarity score
            matched.append(doc)
    return matched

Answer synthesis with @reasoning, @decision, @retry, and score()

The synthesizer child agent demonstrates the remaining SDK primitives in one place.

@waxell.retry_dec(max_attempts=2, strategy="retry")
async def _call_groq(groq_client, messages, model="llama-3.3-70b-versatile"):
    """Groq LLM call with automatic retry recording on failure."""
    return await groq_client.chat.completions.create(model=model, messages=messages)

@waxell.reasoning_dec(step="quality_assessment")
async def assess_quality(answer: str, documents: list) -> dict:
    """Chain-of-thought quality assessment auto-recorded as a reasoning span."""
    doc_titles = [doc["title"] for doc in documents]
    coverage = sum(1 for title in doc_titles if title in answer)  # docs the answer cites
    return {
        "thought": f"Generated answer references {coverage}/{len(documents)} docs.",
        "evidence": [f"Source: {t}" for t in doc_titles],
        "conclusion": "Answer adequately covers source material",
    }

@waxell.decision(name="output_format", options=["brief", "detailed", "bullet_points"])
def choose_output_format(num_docs: int, verified: bool) -> dict:
    return {"chosen": "detailed", "reasoning": "...", "confidence": 0.85}

# Scores
waxell.score("answer_quality", 0.92, comment="auto-scored based on doc coverage")
waxell.score("factual_grounding", verified, data_type="boolean")
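
The behavior behind `@waxell.retry_dec` follows a standard async retry pattern. A hedged sketch of that pattern -- `retry_dec` here is a stand-in, not the SDK's implementation, and it omits the span recording the real decorator adds:

```python
import asyncio
import functools

def retry_dec(max_attempts=2, delay=0.0):
    """Toy retry decorator: re-invoke the coroutine on failure, up to a cap."""
    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            last_exc = None
            for attempt in range(1, max_attempts + 1):
                try:
                    return await fn(*args, **kwargs)
                except Exception as exc:
                    last_exc = exc  # the real SDK would record this attempt as a span
                    if attempt < max_attempts:
                        await asyncio.sleep(delay)
            raise last_exc
        return wrapper
    return decorator

calls = {"n": 0}

@retry_dec(max_attempts=2)
async def flaky():
    """Fails once, then succeeds -- simulates a transient provider error."""
    calls["n"] += 1
    if calls["n"] < 2:
        raise RuntimeError("transient failure")
    return "ok"

result = asyncio.run(flaky())  # succeeds on the second attempt
```

Keeping the retry loop in a decorator means the call site (`_call_groq` above) stays a plain one-line LLM call.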

What this demonstrates

  • @waxell.observe -- parent-child agent hierarchy with automatic lineage via WaxellContext
  • @waxell.tool -- FAISS vector database operations recorded with tool_type="vector_db"
  • @waxell.retrieval -- search-and-rank recorded with source="faiss"
  • @waxell.decision -- query classification and output format selection
  • waxell.decide() -- manual routing decision with options, reasoning, and confidence
  • @waxell.reasoning_dec -- chain-of-thought quality assessment
  • @waxell.retry_dec -- LLM call with automatic retry recording
  • @waxell.step_dec -- query preprocessing recorded as execution step
  • waxell.score() -- quality and factual grounding scores attached to the trace
  • waxell.tag() / waxell.metadata() -- agent role, provider, and pipeline metadata
  • Auto-instrumented LLM calls -- OpenAI, Anthropic, and Groq calls captured without extra code
  • 3 decision capture methods -- decorator (@decision), manual (decide()), and auto-detected (from tool_calls / tool_use)
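
The third capture method -- auto-detecting decisions from `tool_calls` / `tool_use` -- amounts to parsing the provider's tool-call payload. A sketch against an OpenAI-style message dict; `extract_decisions` and the payload contents are hypothetical, and Anthropic's `tool_use` blocks carry `name`/`input` rather than `function`/`arguments`:

```python
import json

# Dict form of an OpenAI chat-completions tool call (payload values made up).
message = {
    "tool_calls": [
        {
            "function": {
                "name": "select_strategy",
                "arguments": json.dumps({"strategy": "hybrid_search"}),
            }
        }
    ]
}

def extract_decisions(message):
    """Hypothetical helper: turn each tool call into a decision record."""
    decisions = []
    for call in message.get("tool_calls", []):
        fn = call["function"]
        decisions.append({
            "name": fn["name"],
            "chosen": json.loads(fn["arguments"]),  # arguments arrive as a JSON string
        })
    return decisions

records = extract_decisions(message)
```

Because the model's chosen tool and arguments are already structured, no extra instrumentation is needed at the call site.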

Run it

# Dry-run (no API keys needed)
cd dev/waxell-dev
python -m app.demos.faiss_agent --dry-run

# Live (real OpenAI + Anthropic + Groq)
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GROQ_API_KEY="gsk_..."
python -m app.demos.faiss_agent

# Custom query
python -m app.demos.faiss_agent --query "Find documents about model safety"

Source

dev/waxell-dev/app/demos/faiss_agent.py