Multi-Agent Coordination Agent
A 15-step cross-framework stress test that simulates 3 agent frameworks (CrewAI, AutoGen, Agno) coordinating on a research task with shared Zep memory, Pinecone vector retrieval, and DeepEval evaluation. The CrewAI phase runs researcher + writer agents, AutoGen runs a 2-round fact-checker + critic group chat captured with `@waxell.reasoning_dec`, and Agno synthesizes the final output using Groq for fast inference. The pipeline produces 3 evaluation scores and an accept/revise verdict.
This example runs in dry-run mode by default (no API key needed). For live mode, set OPENAI_API_KEY, WAXELL_API_KEY, and WAXELL_API_URL.
Architecture
Key Code
Cross-framework tool calls with shared Zep memory
Each framework stores its results in shared Zep memory for downstream consumption.
```python
@waxell.tool(tool_type="memory")
def zep_memory_add(zep_client, session_id, messages, phase="") -> dict:
    return zep_client.memory.add(session_id=session_id, messages=messages)

@waxell.tool(tool_type="agent_framework")
def crewai_agent_execute(role: str, task: str) -> dict:
    # ... crew construction and kickoff elided; `response` is the agent output
    return {"output_length": len(response), "status": "completed"}

@waxell.tool(tool_type="agent_framework")
def agno_agent_run(agent, prompt, context) -> dict:
    result = agent.run(prompt=prompt, context=context)
    # SYNTHESIS_RESPONSE is a module-level constant (elided here)
    return {"output_length": len(SYNTHESIS_RESPONSE), "status": "completed"}
```
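The coordination pattern behind these tools can be sketched framework-free: each phase writes its result to a shared session store that downstream phases read. Below is a minimal sketch with a plain dict standing in for Zep session memory; the helper names and message contents are illustrative, not the demo's actual code.

```python
# Minimal sketch: three "framework" phases coordinating through shared
# session memory. A plain dict stands in for Zep; phase names mirror the demo.

shared_memory: dict[str, list[dict]] = {}

def memory_add(session_id: str, messages: list[dict], phase: str = "") -> dict:
    """Append one phase's output to the shared session (like zep_memory_add)."""
    shared_memory.setdefault(session_id, []).extend(
        {**m, "phase": phase} for m in messages
    )
    return {"added": len(messages), "phase": phase}

def memory_search(session_id: str, phase: str) -> list[dict]:
    """Retrieve everything an earlier phase stored in this session."""
    return [m for m in shared_memory.get(session_id, []) if m["phase"] == phase]

session = "multi-agent-demo"

# Phase 1: CrewAI researcher + writer store research notes.
memory_add(session, [{"role": "assistant", "content": "research notes"}], phase="crewai")
# Phase 2: AutoGen fact-checker/critic read them, then store the critique.
notes = memory_search(session, phase="crewai")
memory_add(session, [{"role": "assistant", "content": f"critique of {len(notes)} notes"}], phase="autogen")
# Phase 3: Agno synthesizes from everything accumulated so far.
context = shared_memory[session]
print(len(context))  # 2 entries feed the final synthesis
```

The point of the pattern is that no framework calls another directly; the shared memory session is the only coupling between phases.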
AutoGen multi-round reasoning and accept/revise verdict
Four `@waxell.reasoning_dec` calls capture the fact-checker and critic debate (two rounds each), followed by a deployment decision.
```python
@waxell.reasoning_dec(step="fact_check_round1")
def reasoning_fact_check_r1() -> dict:
    return {
        "thought": "CrewAI claims verified. Agno characterization oversimplified. Governance omitted.",
        "evidence": ["CrewAI docs", "AutoGen docs", "Agno README"],
        "conclusion": "Research factually sound but has completeness gaps.",
    }
```
```python
@waxell.decision(name="accept_or_revise", options=["accept", "revise"])
def decide_accept_or_revise(eval_scores, avg_score, all_passed) -> dict:
    verdict = "accept" if all_passed and avg_score >= 0.80 else "revise"
    reasoning = (
        f"All metrics passed, avg {avg_score:.2f}"
        if verdict == "accept"
        else f"Metrics below threshold, avg {avg_score:.2f}"
    )
    return {"chosen": verdict, "reasoning": reasoning, "confidence": 0.93}
```
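The verdict logic above can be traced with concrete numbers. In this worked example the scores are hypothetical, and the 0.7 per-metric pass bar is an assumption for illustration; only the 0.80 average threshold comes from the snippet.

```python
# Worked example of the accept/revise logic with hypothetical scores.
eval_scores = {"faithfulness": 0.91, "coherence": 0.88, "completeness": 0.76}

PER_METRIC_THRESHOLD = 0.7  # assumed per-metric pass bar (illustrative)
avg_score = sum(eval_scores.values()) / len(eval_scores)
all_passed = all(s >= PER_METRIC_THRESHOLD for s in eval_scores.values())

verdict = "accept" if all_passed and avg_score >= 0.80 else "revise"
print(verdict, round(avg_score, 2))  # accept 0.85
```

Note that both conditions must hold: a single failing metric forces "revise" even when the average clears 0.80.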
What this demonstrates
- 3 agent frameworks -- CrewAI (supervisor-worker), AutoGen (group chat), Agno (reactive pipeline) in a single trace.
- `@waxell.tool(tool_type="memory")` -- 4 Zep memory operations (3 adds, 1 search) for shared state.
- `@waxell.tool(tool_type="agent_framework")` -- CrewAI and Agno agent execution recorded.
- `@waxell.retrieval` -- Pinecone vector retrieval and Zep memory retrieval.
- `@waxell.reasoning_dec` -- 4 reasoning chains (2 fact-checker rounds, 2 critic rounds).
- `@waxell.decision` -- accept/revise verdict based on evaluation scores.
- `waxell.score()` -- faithfulness, coherence, completeness, and pipeline_quality scores.
- `@waxell.step_dec` -- 15 step recordings across all phases.
- 15-step pipeline -- maximum-depth cross-framework coordination stress test.
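The step-recording behavior can be imitated with a tiny decorator. This is a toy stand-in sketch, not waxell's implementation; the step names here are illustrative.

```python
import functools

trace: list[str] = []  # ordered step log, analogous to the demo's 15 recordings

def step_dec(step: str):
    """Toy stand-in for a step decorator: log the step name on each call."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            trace.append(step)
            return fn(*args, **kwargs)
        return inner
    return wrap

@step_dec(step="crewai_research")
def crewai_research():
    return "research notes"

@step_dec(step="autogen_fact_check")
def autogen_fact_check(notes):
    return f"checked: {notes}"

autogen_fact_check(crewai_research())
print(trace)  # ['crewai_research', 'autogen_fact_check']
```

Because each wrapped function appends its name at call time, the trace records the actual execution order across phases, which is what makes a single cross-framework trace possible.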
Run it
```bash
# Dry-run (no API key needed)
python -m app.demos.multi_agent_coordination_agent --dry-run

# Live mode
OPENAI_API_KEY=sk-... python -m app.demos.multi_agent_coordination_agent
```