Together AI

A multi-model inference pipeline using Together AI's API to run open-source models. The orchestrator pairs a fast analyzer backed by Llama 3.3 70B with a deep synthesizer backed by Mixtral 8x22B, demonstrating how to trace multi-model inference through a single provider with model-specific cost and quality tracking.

Environment variables

This example requires TOGETHER_API_KEY, WAXELL_API_KEY, and WAXELL_API_URL. Use --dry-run to run without any API keys.
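The required-key check can be sketched as follows; `resolve_config` is a hypothetical helper for illustration, not part of the demo's actual code:

```python
import os

def resolve_config(env: dict, dry_run: bool = False) -> dict:
    """Collect the required keys, or skip validation entirely in dry-run mode."""
    required = ["TOGETHER_API_KEY", "WAXELL_API_KEY", "WAXELL_API_URL"]
    if dry_run:
        # No keys needed: dry-run substitutes canned responses for API calls.
        return {name: env.get(name, "dry-run-placeholder") for name in required}
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in required}

config = resolve_config(dict(os.environ), dry_run=True)
```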

Architecture

A parent orchestrator fans out to two child agents through a single Together AI client: a fast analyzer backed by Llama 3.3 70B and a deep synthesizer backed by Mixtral 8x22B.

Key Code

Multi-model pair selection

The decision decorator chooses which model pair to use based on query characteristics like comparison keywords and length.

@waxell.decision(name="choose_model_pair", options=["llama_mixtral", "llama_only", "mixtral_only"])
def choose_model_pair(query_info: dict) -> dict:
    if query_info.get("is_comparison"):
        chosen = "llama_mixtral"
        reasoning = "Comparison query -- use fast Llama for analysis, thorough Mixtral for synthesis"
    elif query_info.get("word_count", 0) > 20:
        chosen = "llama_mixtral"
        reasoning = "Complex query -- both models for depth"
    else:
        chosen = "llama_mixtral"
        reasoning = "Standard query -- demonstrate multi-model pipeline"
    return {"chosen": chosen, "reasoning": reasoning, "confidence": 0.88}

Child agents with Together AI's OpenAI-compatible API

Together AI uses an OpenAI-compatible client. Each child agent targets a different open-source model via the model parameter.

@waxell.observe(agent_name="fast-analyzer", workflow_name="together-analysis", capture_io=True)
async def run_fast_analyzer(query: str, client, *, dry_run=False, waxell_ctx=None) -> dict:
    waxell.tag("task", "fast_analysis")
    waxell.tag("model", "llama-3.3-70b")

    response = await client.chat.completions.create(
        model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
        messages=[
            {"role": "system", "content": "Analyze the query and identify the key technical concepts."},
            {"role": "user", "content": query},
        ],
    )
    analysis = response.choices[0].message.content
    waxell.score("analysis_quality", 0.82)
    return {"analysis": analysis, "model": response.model}

What this demonstrates

  • @waxell.observe -- parent orchestrator with 2 child agents
  • @waxell.step_dec -- query preprocessing with comparison detection
  • @waxell.decision -- model pair selection
  • @waxell.reasoning_dec -- synthesis depth evaluation
  • waxell.tag() -- per-model and per-task tagging
  • waxell.score() -- analysis and synthesis quality scores
  • waxell.metadata() -- SDK and model metadata
  • OpenAI-compatible API -- Together AI uses the same chat.completions.create interface

Run it

cd dev/waxell-dev
python -m app.demos.together_agent --dry-run

Source

dev/waxell-dev/app/demos/together_agent.py