Cohere Rerank Agent
A multi-agent RAG pipeline with Cohere embedding and reranking. A parent orchestrator coordinates three child agents: a retriever that embeds the query and documents with Cohere and ranks the initial results by cosine similarity; a reranker that rescores those results with Cohere's rerank API and selects the top-k; and a synthesizer that generates answers with a quality assessment. The example demonstrates embedding and reranking tool spans with token tracking.
Environment variables
This example requires OPENAI_API_KEY, COHERE_API_KEY, WAXELL_API_KEY, and WAXELL_API_URL. Use --dry-run to skip real API calls.
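As a rough illustration of the requirement above, the check below reports which of the four variables are missing and skips the check entirely in dry-run mode. Whether the demo performs such a check is an assumption; `REQUIRED_VARS` and `missing_env_vars` are hypothetical names, not part of the example's code.

```python
import os

# Variable names taken from the requirements listed above.
REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "COHERE_API_KEY",
    "WAXELL_API_KEY",
    "WAXELL_API_URL",
)


def missing_env_vars(env=None, dry_run=False):
    """Return the required variables that are unset or empty.

    Hypothetical helper: in dry-run mode no real API calls are made,
    so nothing is reported as missing.
    """
    if dry_run:
        return []
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_env_vars()` before a live run gives a list that can be printed as a single, readable error instead of failing on the first API call.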
Architecture
Key Code
Cohere Embedding with @tool(tool_type="embedding")
The retriever embeds both the query and candidate documents using Cohere's embedding API.
@waxell.tool(tool_type="embedding")
def embed_query(query: str, model: str = "embed-english-v3.0") -> dict:
    """Embed query text using Cohere embedding model."""
    # ... Cohere embedding call that produces query_embedding and token_count ...
    return {"embedding": query_embedding, "model": model, "tokens": token_count}


@waxell.tool(tool_type="embedding")
def embed_documents(documents: list, model: str = "embed-english-v3.0") -> dict:
    """Embed candidate documents using Cohere embedding model."""
    # ... Cohere embedding call that produces doc_embeddings and total_tokens ...
    return {"embeddings": doc_embeddings, "count": len(documents), "tokens": total_tokens}
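The retriever's initial ranking by cosine similarity is not shown in the excerpt. A minimal sketch of that step, assuming embeddings are plain lists of floats, might look like the following. The function names `cosine_similarity` and `rank_by_cosine` are hypothetical and the demo's actual implementation may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_cosine(query_embedding, doc_embeddings, documents):
    """Score each document against the query and sort best-first."""
    scored = [
        {"text": doc, "score": cosine_similarity(query_embedding, emb)}
        for doc, emb in zip(documents, doc_embeddings)
    ]
    return sorted(scored, key=lambda d: d["score"], reverse=True)
```

The sorted list serves as the candidate set that the reranker later rescores.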
Cohere Reranking with @tool(tool_type="reranking")
The reranker rescores initial results using Cohere's cross-encoder rerank model.
@waxell.tool(tool_type="reranking")
def cohere_rerank(query: str, documents: list,
                  model: str = "rerank-english-v3.0", top_n: int = 3) -> dict:
    """Rerank documents using Cohere rerank API."""
    # ... Cohere rerank call that produces reranked_docs and token_count ...
    return {
        "reranked": reranked_docs,
        "top_score": reranked_docs[0]["relevance_score"],
        "model": model,
        "tokens": token_count,
    }


@waxell.retrieval(source="cohere")
def collect_reranked_results(reranked_docs: list) -> list[dict]:
    """Collect reranked results with scores."""
    return [{"text": d["text"], "score": d["relevance_score"], "rank": i}
            for i, d in enumerate(reranked_docs)]
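The excerpt does not show how `reranked_docs` is assembled. One plausible sketch, assuming each rerank result carries an `index` into the input document list and a `relevance_score`, is given below. `build_reranked_docs` is a hypothetical helper, and the results are modeled as plain dicts rather than SDK response objects.

```python
def build_reranked_docs(documents, results, top_n=3):
    """Map rerank results back to their texts and keep the top_n.

    Assumes each result is a dict with an "index" into `documents`
    and a "relevance_score"; this shape is an assumption.
    """
    ordered = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
    return [
        {"text": documents[r["index"]], "relevance_score": r["relevance_score"]}
        for r in ordered[:top_n]
    ]
```

Each returned item has the `text` and `relevance_score` keys that `collect_reranked_results` reads.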
What this demonstrates
- @waxell.observe -- parent-child agent hierarchy (orchestrator + 3 child agents) with automatic lineage
- @waxell.tool(tool_type="embedding") -- Cohere embed-english-v3.0 calls for query and document embedding
- @waxell.tool(tool_type="reranking") -- Cohere rerank-english-v3.0 call with relevance rescoring
- @waxell.retrieval(source="cohere") -- initial ranking and reranked result collection
- @waxell.decision -- embedding model selection and top-k threshold selection
- @waxell.reasoning_dec -- answer quality assessment
- @waxell.step_dec -- query preprocessing
- Token tracking -- embedding and reranking token counts recorded in tool outputs
- Auto-instrumented LLM calls -- OpenAI synthesis captured without extra code
- Embed + Rerank pipeline -- Cohere embedding followed by cross-encoder reranking
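Because each tool output in the excerpts includes a `tokens` field, a pipeline-wide total can be computed by summing over those outputs. The helper below is a hypothetical sketch, not part of the demo. It treats outputs without a `tokens` key as contributing zero.

```python
def total_tokens(tool_outputs):
    """Sum the "tokens" field across tool output dicts.

    Hypothetical helper; outputs lacking a "tokens" key count as 0.
    """
    return sum(int(out.get("tokens", 0)) for out in tool_outputs)
```

For example, passing the dicts returned by `embed_query`, `embed_documents`, and `cohere_rerank` would give the combined embedding and reranking token count for one query.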
Run it
# Dry-run mode (no API keys needed)
cd dev/waxell-dev
python -m app.demos.cohere_rerank_agent --dry-run
# Live mode
export COHERE_API_KEY="..."
export OPENAI_API_KEY="sk-..."
export WAXELL_API_KEY="..."
export WAXELL_API_URL="..."
python -m app.demos.cohere_rerank_agent
# Custom query
python -m app.demos.cohere_rerank_agent --dry-run --query "Best practices for RAG optimization"