OpenAI Agents SDK

A multi-agent pipeline simulating OpenAI Agents SDK patterns with a Runner, a triage agent, and specialist handoff. The orchestrator prepares a Runner configuration and classifies the request type via an LLM-powered decision. It then dispatches a runner child agent, which performs triage classification and hands off to a specialist, followed by an evaluator child agent, which assesses analysis quality and generates a structured security report.

Environment variables

This example requires OPENAI_API_KEY, WAXELL_API_KEY, and WAXELL_API_URL. Use --dry-run to run without any API keys.
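For a live (non-dry-run) invocation, the keys can be supplied via the environment; the values below are placeholders, not real credentials:

```shell
export OPENAI_API_KEY="<your-openai-key>"
export WAXELL_API_KEY="<your-waxell-key>"
export WAXELL_API_URL="<your-waxell-endpoint>"
```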

Architecture

Key Code

Runner with triage and handoff

The runner child agent simulates the OpenAI Agents SDK Runner.run pattern: triage classification, waxell.decide() for handoff, and specialist agent execution.

@waxell.observe(agent_name="openai-agents-runner", workflow_name="openai-agents-execution")
async def run_agent_execution(query: str, openai_client, runner_config: dict, waxell_ctx=None):
    waxell.tag("agent_role", "runner")
    waxell.tag("framework", "openai_agents")

    # Triage agent classifies via @tool
    triage_result = triage_classify(query=query, available_agents=AVAILABLE_AGENTS)
    target_agent = triage_result["target"]

    # Triage LLM call
    response1 = await openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a triage agent."},
            {"role": "user", "content": query},
        ],
    )

    # Handoff decision via waxell.decide()
    waxell.decide(
        "agent_handoff", chosen=target_agent, options=AVAILABLE_AGENTS,
        reasoning=f"Triage classified with {triage_result.get('confidence', 0.9):.0%} confidence",
        confidence=triage_result.get("confidence", 0.9),
    )

    # Specialist agent execution
    response2 = await openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content": f"You are a {target_agent.replace('_', ' ')}."}, ...],
    )
    return {
        "specialist_output": response2.choices[0].message.content,
        "agents_executed": ["triage_agent", target_agent],
        "handoffs": 1,  # one triage -> specialist handoff; the evaluator reads this key
    }
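The `triage_classify` tool itself is not shown in this excerpt. A minimal keyword-based sketch of what such a classifier might do, with illustrative routing rules and confidence values, and with the demo's `@waxell.tool` decorator omitted so the function runs standalone:

```python
AVAILABLE_AGENTS = ["security_analyst", "code_reviewer", "general_assistant"]

# Illustrative keyword routing; the real demo's rules may differ.
ROUTING_KEYWORDS = {
    "security_analyst": ["vulnerability", "exploit", "cve", "security"],
    "code_reviewer": ["refactor", "review", "bug", "function"],
}

def triage_classify(query: str, available_agents: list) -> dict:
    """Pick a target agent by keyword match; fall back to a general agent."""
    lowered = query.lower()
    for agent, keywords in ROUTING_KEYWORDS.items():
        if agent in available_agents and any(k in lowered for k in keywords):
            return {"target": agent, "confidence": 0.9}
    return {"target": "general_assistant", "confidence": 0.6}
```

The returned dict feeds both the `waxell.decide()` handoff record and the specialist's system prompt.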

Evaluator with reasoning and report formatting

The evaluator child agent assesses quality with @reasoning and formats a structured report with @tool.

@waxell.observe(agent_name="openai-agents-evaluator", workflow_name="openai-agents-evaluation")
async def run_agent_evaluation(query: str, runner_result: dict, openai_client, waxell_ctx=None):
    waxell.tag("agent_role", "evaluator")

    # Quality assessment via @reasoning
    quality = await evaluate_analysis_quality(
        analysis=runner_result["specialist_output"],
        agents_used=runner_result["agents_executed"],
        handoff_count=runner_result["handoffs"],
    )

    # Report formatting via @tool
    report = format_security_report(
        analysis=runner_result["specialist_output"],
        agents_used=runner_result["agents_executed"],
        handoff_count=runner_result["handoffs"],
    )

    waxell.score("analysis_depth", 0.90)
    waxell.score("handoff_efficiency", True, data_type="boolean")
    return {"quality": quality, "report": report}
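`format_security_report` is likewise only referenced above. A plain-Python sketch of a structured report builder, with hypothetical field names and the `@waxell.tool` decorator omitted so it runs standalone:

```python
def format_security_report(analysis: str, agents_used: list, handoff_count: int) -> dict:
    """Bundle the specialist output into a structured report dict."""
    return {
        "title": "Security Analysis Report",
        "summary": analysis[:200],  # truncate long analyses for the header
        "agents_involved": agents_used,
        "handoffs": handoff_count,
        "sections": {"full_analysis": analysis},
    }
```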

What this demonstrates

  • @waxell.observe -- parent orchestrator with 2 child agents
  • @waxell.step_dec -- runner config preparation
  • @waxell.decision -- LLM-powered request type classification
  • waxell.decide() -- manual inline decision for agent handoff
  • @waxell.tool -- triage classification and report formatting
  • @waxell.reasoning_dec -- analysis quality assessment
  • waxell.score() -- depth scores and boolean efficiency markers
  • OpenAI Agents SDK patterns -- Runner, triage, handoff, and specialist agents
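The observability calls listed above can be traced end to end with a stub standing in for the real waxell client. This self-contained sketch (the stub is not the real SDK; it just records calls) shows the decide/score sequence the two child agents emit:

```python
from types import SimpleNamespace

# Stand-in for the real waxell client: records each call so the
# observability flow can be inspected without any API keys.
events = []
waxell = SimpleNamespace(
    tag=lambda key, value: events.append(("tag", key, value)),
    decide=lambda name, **kw: events.append(("decide", name, kw["chosen"])),
    score=lambda name, value, **kw: events.append(("score", name, value)),
)

def orchestrate(target_agent: str) -> list:
    """Emit the same sequence of observability calls as the runner/evaluator pair."""
    waxell.tag("agent_role", "runner")                            # runner child starts
    waxell.decide("agent_handoff", chosen=target_agent, options=[])
    waxell.tag("agent_role", "evaluator")                         # evaluator child starts
    waxell.score("analysis_depth", 0.90)
    waxell.score("handoff_efficiency", True, data_type="boolean")
    return events
```

Running `orchestrate("security_analyst")` yields the event log in dispatch order, which is the property the dry-run mode exercises without network calls.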

Run it

cd dev/waxell-dev
python -m app.demos.openai_agents_agent --dry-run

Source

dev/waxell-dev/app/demos/openai_agents_agent.py