Voice AI Agent

A multi-agent pipeline that runs two voice AI frameworks -- LiveKit Agents and Pipecat -- through their full STT/LLM/TTS pipelines, then compares architectures with reasoning and LLM analysis.

Environment variables

This example requires OPENAI_API_KEY, WAXELL_API_KEY, and WAXELL_API_URL. Use --dry-run to skip real API calls.
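Before a live run, the demo presumably validates this configuration up front; here is a minimal sketch of such a guard (the `check_environment` helper is hypothetical, not part of the demo):

```python
import os

# Variables the demo needs for live runs; dry-run mode makes no real API calls.
REQUIRED_VARS = ("OPENAI_API_KEY", "WAXELL_API_KEY", "WAXELL_API_URL")

def check_environment(dry_run: bool) -> list:
    """Return the names of required environment variables that are missing."""
    if dry_run:
        return []  # nothing required when no real API calls are made
    return [name for name in REQUIRED_VARS if not os.environ.get(name)]

print(check_environment(dry_run=True))  # → []
```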

Architecture

Key Code

LiveKit Agents Pipeline

Four @tool-decorated functions exercise the full LiveKit Agents API: VoicePipelineAgent.start, STT.recognize, LLM.chat, and TTS.synthesize.

@waxell.tool(tool_type="voice_pipeline")
def livekit_pipeline_start(llm_model: str, stt_model: str, tts_model: str, room: str) -> dict:
    """Start a LiveKit VoicePipelineAgent."""
    lk_llm = MockLiveKitLLM(model=llm_model)
    lk_stt = MockLiveKitSTT(model=stt_model)
    lk_tts = MockLiveKitTTS(model=tts_model)
    pipeline = MockLiveKitVoicePipelineAgent(llm=lk_llm, stt=lk_stt, tts=lk_tts)
    result = pipeline.start(room=room)
    return {"pipeline": pipeline.name, "room": room, "status": result.status}

@waxell.tool(tool_type="voice_stt")
def livekit_stt_recognize(model: str, audio_data: bytes) -> dict:
    """Run LiveKit STT.recognize."""
    stt = MockLiveKitSTT(model=model)
    result = stt.recognize(audio_data=audio_data)
    return {"transcript": result.text, "duration_s": result.duration}

@waxell.tool(tool_type="voice_llm")
def livekit_llm_chat(model: str, messages: list) -> dict:
    """Run LiveKit LLM.chat."""
    llm = MockLiveKitLLM(model=model)
    result = llm.chat(chat_ctx=MockLiveKitChatContext(messages=messages))
    return {"content": result.content, "tokens_in": result.usage.prompt_tokens}

@waxell.tool(tool_type="voice_tts")
def livekit_tts_synthesize(model: str, text: str) -> dict:
    """Run LiveKit TTS.synthesize."""
    tts = MockLiveKitTTS(model=model)
    result = tts.synthesize(text=text)
    return {"duration_s": result.duration, "sample_rate": result.sample_rate}

Pipecat Frame Pipeline

Pipecat uses a frame-based architecture. Four @tool-decorated functions exercise Pipeline.run, PipelineRunner.run, STTService.run, and TTSService.run; three of them are shown below.

@waxell.tool(tool_type="voice_pipeline")
async def pipecat_pipeline_run(processors: list) -> dict:
    """Run a Pipecat Pipeline with FrameProcessor.process_frame."""
    # Builds a fixed STT -> LLM -> TTS chain of mock processors.
    pc_stt = MockPipecatSTTService(model="deepgram")
    pc_tts = MockPipecatTTSService(model="elevenlabs")
    pc_processor = MockPipecatFrameProcessor(name="LLMProcessor")
    pipeline = MockPipecatPipeline(processors=[pc_stt, pc_processor, pc_tts])
    await pipeline.run()
    return {"processor_count": len(pipeline.processors), "processors": [p.name for p in pipeline.processors]}

@waxell.tool(tool_type="voice_stt")
async def pipecat_stt_run(model: str, audio_data: bytes) -> dict:
    """Run Pipecat STTService.run."""
    stt = MockPipecatSTTService(model=model)
    transcript = await stt.run(audio_data=audio_data)
    return {"transcript": transcript, "sample_rate": stt.sample_rate}

@waxell.tool(tool_type="voice_tts")
async def pipecat_tts_run(model: str, text: str) -> dict:
    """Run Pipecat TTSService.run."""
    tts = MockPipecatTTSService(model=model)
    audio = await tts.run(text=text)
    return {"audio_bytes": len(audio), "sample_rate": tts.sample_rate}

Pipeline Assessment

The evaluator agent uses the @waxell.reasoning_dec decorator to compare architectural tradeoffs between the two frameworks, followed by an auto-instrumented LLM call for detailed analysis.

@waxell.reasoning_dec(step="pipeline_assessment")
async def assess_pipelines(livekit_data: dict, pipecat_data: dict) -> dict:
    """Assess voice AI pipeline architectures."""
    return {
        "thought": "LiveKit uses component-level APIs (STT/LLM/TTS) with VoicePipelineAgent. "
                   "Pipecat uses frame-based processing with a Pipeline/PipelineRunner pattern.",
        "evidence": [
            f"LiveKit: STT duration={livekit_data['stt_duration_s']}s",
            f"Pipecat: {pipecat_data['processor_count']} processors",
        ],
        "conclusion": "Both frameworks provide production-ready voice AI pipelines",
    }

# Scores
waxell.score("frameworks_tested", 2, comment="LiveKit Agents + Pipecat")
waxell.score("evaluation_quality", 0.88, comment="Multi-framework architecture comparison")

What this demonstrates

  • Multi-agent voice pipeline -- an orchestrator coordinates voice-processor (runs both frameworks) and voice-evaluator (compares and analyzes) with automatic parent-child traces.
  • Eight @tool calls across two frameworks -- LiveKit Agents (pipeline_start, stt_recognize, llm_chat, tts_synthesize) and Pipecat (pipeline_run, runner_run, stt_run, tts_run) each with distinct tool_type values (voice_pipeline, voice_stt, voice_llm, voice_tts).
  • LiveKit component-level tracing -- individual STT, LLM, and TTS operations captured as separate spans matching the instrumentor's wrapped methods.
  • Pipecat frame-based tracing -- Pipeline.run and PipelineRunner.run captured alongside individual service operations (STTService.run, TTSService.run).
  • @waxell.reasoning_dec decorator -- documents the architectural comparison between component-level (LiveKit) and frame-based (Pipecat) approaches.
  • Auto-instrumented LLM comparison -- the evaluator's OpenAI call is captured automatically for framework analysis.
  • waxell.tag() and waxell.metadata() -- framework names, agent roles, and component counts recorded as structured enrichment.
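The tag/metadata enrichment calls in the last bullet are not shown in the snippets above. A hypothetical sketch of the pattern, with waxell stubbed inline so the example is self-contained (the waxell.tag and waxell.metadata signatures are assumptions inferred from the waxell.score calls shown earlier):

```python
class _StubWaxell:
    """Inline stand-in for the waxell SDK's enrichment surface (assumed API)."""

    def __init__(self):
        self.tags, self.meta = {}, {}

    def tag(self, key: str, value: str) -> None:
        self.tags[key] = value

    def metadata(self, mapping: dict) -> None:
        self.meta.update(mapping)

waxell = _StubWaxell()

# Framework names, agent roles, and component counts as structured enrichment.
waxell.tag("framework", "livekit-agents")
waxell.tag("agent_role", "voice-processor")
waxell.metadata({"component_count": 4, "frameworks_tested": 2})

print(waxell.tags["framework"])  # → livekit-agents
```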

Run it

# Dry-run mode (no API key needed)
cd dev/waxell-dev
python -m app.demos.voice_ai_agent --dry-run

# Live mode
export OPENAI_API_KEY="sk-..."
export WAXELL_API_KEY="your-waxell-api-key"
export WAXELL_API_URL="https://api.waxell.ai"
python -m app.demos.voice_ai_agent

Source

dev/waxell-dev/app/demos/voice_ai_agent.py