Together AI

A multi-model inference pipeline using Together AI's API to run open-source models. The orchestrator pairs a fast analyzer backed by Llama 3.3 70B with a deep synthesizer backed by Mixtral 8x22B, demonstrating how to trace multi-model inference through a single provider with model-specific cost and quality tracking.

Environment variables

This example requires TOGETHER_API_KEY, WAXELL_API_KEY, and WAXELL_API_URL. Use --dry-run to run without any API keys.
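The required-key check can be sketched as follows; `resolve_config` is a hypothetical helper for illustration, not part of the demo's actual code:

```python
import os

def resolve_config(env: dict, dry_run: bool = False) -> dict:
    """Collect the required keys, or skip validation entirely in dry-run mode."""
    required = ["TOGETHER_API_KEY", "WAXELL_API_KEY", "WAXELL_API_URL"]
    if dry_run:
        # No keys needed: dry-run substitutes canned responses for API calls.
        return {name: env.get(name, "dry-run-placeholder") for name in required}
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in required}

config = resolve_config(dict(os.environ), dry_run=True)
```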

Architecture

A parent orchestrator fans out to two child agents through a single Together AI client: a fast analyzer backed by Llama 3.3 70B and a deep synthesizer backed by Mixtral 8x22B.

Key Code

Multi-model pair selection

The decision decorator chooses which model pair to use based on query characteristics like comparison keywords and length.

@waxell.decision(name="choose_model_pair", options=["llama_mixtral", "llama_only", "mixtral_only"])
def choose_model_pair(query_info: dict) -> dict:
    if query_info.get("is_comparison"):
        chosen = "llama_mixtral"
        reasoning = "Comparison query -- use fast Llama for analysis, thorough Mixtral for synthesis"
    elif query_info.get("word_count", 0) > 20:
        chosen = "llama_mixtral"
        reasoning = "Complex query -- both models for depth"
    else:
        chosen = "llama_mixtral"
        reasoning = "Standard query -- demonstrate multi-model pipeline"
    return {"chosen": chosen, "reasoning": reasoning, "confidence": 0.88}

Child agents with Together AI's OpenAI-compatible API

Together AI uses an OpenAI-compatible client. Each child agent targets a different open-source model via the model parameter.

@waxell.observe(agent_name="fast-analyzer", workflow_name="together-analysis", capture_io=True)
async def run_fast_analyzer(query: str, client, *, dry_run=False, waxell_ctx=None) -> dict:
    waxell.tag("task", "fast_analysis")
    waxell.tag("model", "llama-3.3-70b")

    response = await client.chat.completions.create(
        model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
        messages=[
            {"role": "system", "content": "Analyze the query and identify the key technical concepts."},
            {"role": "user", "content": query},
        ],
    )
    analysis = response.choices[0].message.content
    waxell.score("analysis_quality", 0.82)
    return {"analysis": analysis, "model": response.model}

What this demonstrates

  • @waxell.observe -- parent orchestrator with 2 child agents
  • @waxell.step_dec -- query preprocessing with comparison detection
  • @waxell.decision -- model pair selection
  • @waxell.reasoning_dec -- synthesis depth evaluation
  • waxell.tag() -- per-model and per-task tagging
  • waxell.score() -- analysis and synthesis quality scores
  • waxell.metadata() -- SDK and model metadata
  • OpenAI-compatible API -- Together AI uses the same chat.completions.create interface

Run it

cd dev/waxell-dev
python -m app.demos.together_agent --dry-run

Source

dev/waxell-dev/app/demos/together_agent.py