Voice AI Agent

A multi-agent pipeline that runs two voice AI frameworks -- LiveKit Agents and Pipecat -- through their full STT/LLM/TTS pipelines, then compares architectures with reasoning and LLM analysis.

Environment variables

This example requires OPENAI_API_KEY, WAXELL_API_KEY, and WAXELL_API_URL. Use --dry-run to skip real API calls.
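Before a live run, the demo presumably validates this configuration up front; here is a minimal sketch of such a guard (the `check_environment` helper is hypothetical, not part of the demo):

```python
import os

# Variables the demo needs for live runs; dry-run mode makes no real API calls.
REQUIRED_VARS = ("OPENAI_API_KEY", "WAXELL_API_KEY", "WAXELL_API_URL")

def check_environment(dry_run: bool) -> list:
    """Return the names of required environment variables that are missing."""
    if dry_run:
        return []  # nothing required when no real API calls are made
    return [name for name in REQUIRED_VARS if not os.environ.get(name)]

print(check_environment(dry_run=True))  # → []
```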

Architecture

Key Code

LiveKit Agents Pipeline

Four @tool-decorated functions exercise the full LiveKit Agents API: VoicePipelineAgent.start, STT.recognize, LLM.chat, and TTS.synthesize.

@waxell.tool(tool_type="voice_pipeline")
def livekit_pipeline_start(llm_model: str, stt_model: str, tts_model: str, room: str) -> dict:
    """Start a LiveKit VoicePipelineAgent."""
    lk_llm = MockLiveKitLLM(model=llm_model)
    lk_stt = MockLiveKitSTT(model=stt_model)
    lk_tts = MockLiveKitTTS(model=tts_model)
    pipeline = MockLiveKitVoicePipelineAgent(llm=lk_llm, stt=lk_stt, tts=lk_tts)
    result = pipeline.start(room=room)
    return {"pipeline": pipeline.name, "room": room, "status": result.status}

@waxell.tool(tool_type="voice_stt")
def livekit_stt_recognize(model: str, audio_data: bytes) -> dict:
    """Run LiveKit STT.recognize."""
    stt = MockLiveKitSTT(model=model)
    result = stt.recognize(audio_data=audio_data)
    return {"transcript": result.text, "duration_s": result.duration}

@waxell.tool(tool_type="voice_llm")
def livekit_llm_chat(model: str, messages: list) -> dict:
    """Run LiveKit LLM.chat."""
    llm = MockLiveKitLLM(model=model)
    result = llm.chat(chat_ctx=MockLiveKitChatContext(messages=messages))
    return {"content": result.content, "tokens_in": result.usage.prompt_tokens}

@waxell.tool(tool_type="voice_tts")
def livekit_tts_synthesize(model: str, text: str) -> dict:
    """Run LiveKit TTS.synthesize."""
    tts = MockLiveKitTTS(model=model)
    result = tts.synthesize(text=text)
    return {"duration_s": result.duration, "sample_rate": result.sample_rate}

Pipecat Frame Pipeline

Pipecat uses a frame-based architecture. Four @tool-decorated functions exercise Pipeline.run, PipelineRunner.run, STTService.run, and TTSService.run; three of them are shown below.

@waxell.tool(tool_type="voice_pipeline")
async def pipecat_pipeline_run(processors: list) -> dict:
    """Run a Pipecat Pipeline with FrameProcessor.process_frame."""
    # Builds a fixed STT -> LLM -> TTS chain of mock processors.
    pc_stt = MockPipecatSTTService(model="deepgram")
    pc_tts = MockPipecatTTSService(model="elevenlabs")
    pc_processor = MockPipecatFrameProcessor(name="LLMProcessor")
    pipeline = MockPipecatPipeline(processors=[pc_stt, pc_processor, pc_tts])
    await pipeline.run()
    return {"processor_count": len(pipeline.processors), "processors": [p.name for p in pipeline.processors]}

@waxell.tool(tool_type="voice_stt")
async def pipecat_stt_run(model: str, audio_data: bytes) -> dict:
    """Run Pipecat STTService.run."""
    stt = MockPipecatSTTService(model=model)
    transcript = await stt.run(audio_data=audio_data)
    return {"transcript": transcript, "sample_rate": stt.sample_rate}

@waxell.tool(tool_type="voice_tts")
async def pipecat_tts_run(model: str, text: str) -> dict:
    """Run Pipecat TTSService.run."""
    tts = MockPipecatTTSService(model=model)
    audio = await tts.run(text=text)
    return {"audio_bytes": len(audio), "sample_rate": tts.sample_rate}

Pipeline Assessment

The evaluator agent uses the @waxell.reasoning_dec decorator to compare architectural tradeoffs between the two frameworks, followed by an auto-instrumented LLM call for detailed analysis.

@waxell.reasoning_dec(step="pipeline_assessment")
async def assess_pipelines(livekit_data: dict, pipecat_data: dict) -> dict:
    """Assess voice AI pipeline architectures."""
    return {
        "thought": "LiveKit uses component-level APIs (STT/LLM/TTS) with VoicePipelineAgent. "
                   "Pipecat uses frame-based processing with a Pipeline/PipelineRunner pattern.",
        "evidence": [
            f"LiveKit: STT duration={livekit_data['stt_duration_s']}s",
            f"Pipecat: {pipecat_data['processor_count']} processors",
        ],
        "conclusion": "Both frameworks provide production-ready voice AI pipelines",
    }

# Scores
waxell.score("frameworks_tested", 2, comment="LiveKit Agents + Pipecat")
waxell.score("evaluation_quality", 0.88, comment="Multi-framework architecture comparison")

What this demonstrates

  • Multi-agent voice pipeline -- an orchestrator coordinates voice-processor (runs both frameworks) and voice-evaluator (compares and analyzes) with automatic parent-child traces.
  • Eight @tool calls across two frameworks -- LiveKit Agents (pipeline_start, stt_recognize, llm_chat, tts_synthesize) and Pipecat (pipeline_run, runner_run, stt_run, tts_run) each with distinct tool_type values (voice_pipeline, voice_stt, voice_llm, voice_tts).
  • LiveKit component-level tracing -- individual STT, LLM, and TTS operations captured as separate spans matching the instrumentor's wrapped methods.
  • Pipecat frame-based tracing -- Pipeline.run and PipelineRunner.run captured alongside individual service operations (STTService.run, TTSService.run).
  • @waxell.reasoning_dec decorator -- documents the architectural comparison between component-level (LiveKit) and frame-based (Pipecat) approaches.
  • Auto-instrumented LLM comparison -- the evaluator's OpenAI call is captured automatically for framework analysis.
  • waxell.tag() and waxell.metadata() -- framework names, agent roles, and component counts recorded as structured enrichment.
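The tag/metadata enrichment calls in the last bullet are not shown in the snippets above. A hypothetical sketch of the pattern, with waxell stubbed inline so the example is self-contained (the waxell.tag and waxell.metadata signatures are assumptions inferred from the waxell.score calls shown earlier):

```python
class _StubWaxell:
    """Inline stand-in for the waxell SDK's enrichment surface (assumed API)."""

    def __init__(self):
        self.tags, self.meta = {}, {}

    def tag(self, key: str, value: str) -> None:
        self.tags[key] = value

    def metadata(self, mapping: dict) -> None:
        self.meta.update(mapping)

waxell = _StubWaxell()

# Framework names, agent roles, and component counts as structured enrichment.
waxell.tag("framework", "livekit-agents")
waxell.tag("agent_role", "voice-processor")
waxell.metadata({"component_count": 4, "frameworks_tested": 2})

print(waxell.tags["framework"])  # → livekit-agents
```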

Run it

# Dry-run mode (no API key needed)
cd dev/waxell-dev
python -m app.demos.voice_ai_agent --dry-run

# Live mode
export OPENAI_API_KEY="sk-..."
export WAXELL_API_KEY="your-waxell-api-key"
export WAXELL_API_URL="https://api.waxell.ai"
python -m app.demos.voice_ai_agent

Source

dev/waxell-dev/app/demos/voice_ai_agent.py