OpenAI Agents SDK
A multi-agent pipeline simulating OpenAI Agents SDK patterns with a Runner, triage agent, and specialist handoff. The orchestrator prepares a Runner configuration and classifies the request type via an LLM-powered decision. It then dispatches a runner child agent, which performs triage classification and specialist handoff, followed by an evaluator child agent, which assesses analysis quality and generates a structured security report.
This example requires OPENAI_API_KEY, WAXELL_API_KEY, and WAXELL_API_URL. Use --dry-run to run without any API keys.
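The dry-run stub itself is not shown in this README. A minimal sketch of how a `--dry-run` client might work, assuming it only needs to mimic the `chat.completions.create` shape of the OpenAI async client (the class and field names below are illustrative, not the demo's actual code):

```python
# Hypothetical --dry-run stub: a fake client that mimics the OpenAI
# async chat interface without making any network calls or needing keys.
import asyncio
from types import SimpleNamespace

class DryRunChatClient:
    """Returns canned completions so the pipeline runs with no API keys."""
    def __init__(self, canned_text="[dry-run] stubbed completion"):
        self._text = canned_text
        # Mirror client.chat.completions.create(...)
        self.chat = SimpleNamespace(completions=SimpleNamespace(create=self._create))

    async def _create(self, model, messages, **kwargs):
        # Shape the response like a ChatCompletion: .choices[0].message.content
        message = SimpleNamespace(role="assistant", content=self._text)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)], model=model)

async def main():
    client = DryRunChatClient()
    resp = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "ping"}],
    )
    print(resp.choices[0].message.content)  # prints the canned stub text

asyncio.run(main())
```

Because the stub has the same attribute shape as the real client, the runner and evaluator code below can be passed either one unchanged.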
Architecture
Key Code
Runner with triage and handoff
The runner child agent simulates the OpenAI Agents SDK Runner.run pattern: triage classification, waxell.decide() for handoff, and specialist agent execution.
```python
@waxell.observe(agent_name="openai-agents-runner", workflow_name="openai-agents-execution")
async def run_agent_execution(query: str, openai_client, runner_config: dict, waxell_ctx=None):
    waxell.tag("agent_role", "runner")
    waxell.tag("framework", "openai_agents")

    # Triage agent classifies via @tool
    triage_result = triage_classify(query=query, available_agents=AVAILABLE_AGENTS)
    target_agent = triage_result["target"]

    # Triage LLM call
    response1 = await openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a triage agent."},
            {"role": "user", "content": query},
        ],
    )

    # Handoff decision via waxell.decide()
    waxell.decide(
        "agent_handoff",
        chosen=target_agent,
        options=AVAILABLE_AGENTS,
        reasoning=f"Triage classified with {triage_result.get('confidence', 0.9):.0%} confidence",
        confidence=triage_result.get("confidence", 0.9),
    )

    # Specialist agent execution
    response2 = await openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content": f"You are a {target_agent.replace('_', ' ')}."}, ...],
    )
    return {
        "specialist_output": response2.choices[0].message.content,
        "agents_executed": ["triage_agent", target_agent],
        "handoffs": 1,  # one triage -> specialist handoff; consumed by the evaluator
    }
```
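The body of `triage_classify` is not shown above. A plausible sketch of its core logic as plain keyword routing (in the demo it would additionally carry the `@waxell.tool` decorator; the keywords, agent names, and confidence formula here are all assumptions):

```python
# Illustrative triage logic: route a query to a specialist by keyword match.
# The routing table and confidence heuristic are hypothetical, not the demo's.
AVAILABLE_AGENTS = ["security_analyst", "code_reviewer", "general_assistant"]

KEYWORDS = {
    "security_analyst": ("vulnerability", "exploit", "cve", "injection"),
    "code_reviewer": ("refactor", "review", "bug", "diff"),
}

def triage_classify(query: str, available_agents: list[str]) -> dict:
    q = query.lower()
    for agent, words in KEYWORDS.items():
        hits = sum(w in q for w in words)
        if agent in available_agents and hits:
            # Confidence grows with keyword hits, capped at 0.95
            return {"target": agent, "confidence": round(min(0.6 + 0.1 * hits, 0.95), 2)}
    return {"target": "general_assistant", "confidence": 0.5}

result = triage_classify("Check this SQL injection vulnerability", AVAILABLE_AGENTS)
# Two keyword hits -> {"target": "security_analyst", "confidence": 0.8}
```

A keyword fallback like this is also what makes the `--dry-run` path deterministic: the handoff target does not depend on a live LLM call.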
Evaluator with reasoning and report formatting
The evaluator child agent assesses quality with @reasoning and formats a structured report with @tool.
```python
@waxell.observe(agent_name="openai-agents-evaluator", workflow_name="openai-agents-evaluation")
async def run_agent_evaluation(query: str, runner_result: dict, openai_client, waxell_ctx=None):
    waxell.tag("agent_role", "evaluator")

    # Quality assessment via @reasoning
    quality = await evaluate_analysis_quality(
        analysis=runner_result["specialist_output"],
        agents_used=runner_result["agents_executed"],
        handoff_count=runner_result["handoffs"],
    )

    # Report formatting via @tool
    report = format_security_report(
        analysis=runner_result["specialist_output"],
        agents_used=runner_result["agents_executed"],
        handoff_count=runner_result["handoffs"],
    )

    waxell.score("analysis_depth", 0.90)
    waxell.score("handoff_efficiency", True, data_type="boolean")
    return {"quality": quality, "report": report}
```
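The `format_security_report` tool is not shown either. A minimal sketch of what it might produce, assuming a simple dict-shaped report (in the demo it would carry the `@waxell.tool` decorator; the report schema below is an assumption, not the demo's actual output):

```python
# Illustrative report formatter: the field names and section list are
# hypothetical, not the demo's actual schema.
def format_security_report(analysis: str, agents_used: list[str], handoff_count: int) -> dict:
    return {
        "title": "Security Analysis Report",
        "summary": analysis[:200],           # truncate long specialist output
        "agents": " -> ".join(agents_used),  # e.g. "triage_agent -> security_analyst"
        "handoffs": handoff_count,
        "sections": ["Findings", "Risk Assessment", "Recommendations"],
    }

report = format_security_report(
    analysis="No critical vulnerabilities found.",
    agents_used=["triage_agent", "security_analyst"],
    handoff_count=1,
)
```

Keeping the formatter a pure function of the runner's result dict means it needs no LLM call and works identically under `--dry-run`.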
What this demonstrates
- `@waxell.observe` -- parent orchestrator with 2 child agents
- `@waxell.step_dec` -- runner config preparation
- `@waxell.decision` -- LLM-powered request type classification
- `waxell.decide()` -- manual inline decision for agent handoff
- `@waxell.tool` -- triage classification and report formatting
- `@waxell.reasoning_dec` -- analysis quality assessment
- `waxell.score()` -- depth scores and boolean efficiency markers
- OpenAI Agents SDK patterns -- Runner, triage, handoff, and specialist agents
Run it
```shell
cd dev/waxell-dev
python -m app.demos.openai_agents_agent --dry-run
```