
Cloud LLM Providers

An enterprise cloud LLM provider comparison pipeline exercising the DashScope, WatsonX, and Azure AI Inference instrumentors. Three child agents -- each backed by a different cloud provider (DashScope/Qwen, WatsonX/Granite, Azure AI/GPT-4o) -- generate responses to the same query. The orchestrator compares token usage, cost estimates, and response quality with @waxell.reasoning_dec, then synthesizes the best answer via OpenAI.

Environment variables

This example requires OPENAI_API_KEY, WAXELL_API_KEY, and WAXELL_API_URL. Use --dry-run to run without any API keys. In production, each provider needs its own credentials (DashScope API key, WatsonX credentials, Azure AI endpoint).
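A minimal sketch of the startup check (the function name and error wording are illustrative, not taken from the demo):

```python
import os

def resolve_run_mode(argv: list[str]) -> str:
    """Pick a run mode: --dry-run skips all provider calls,
    otherwise the shared credentials must be present."""
    if "--dry-run" in argv:
        return "dry-run"
    required = ("OPENAI_API_KEY", "WAXELL_API_KEY", "WAXELL_API_URL")
    missing = [name for name in required if not os.environ.get(name)]
    if missing:
        raise SystemExit(f"missing environment variables: {', '.join(missing)}")
    return "live"
```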

Architecture

Key Code

Provider-specific tool decorators

Each cloud provider API is wrapped in a @tool(llm) decorator, normalizing the response format for comparison while preserving provider-specific calling conventions.

@waxell.tool(name="dashscope_generation_call", tool_type="llm")
def dashscope_generate(client, model: str, prompt: str) -> dict:
    response = client.Generation.call(model=model, prompt=prompt)
    return {
        "text": response.output.text,
        "model": response.model,
        "tokens_in": response.usage.input_tokens,
        "tokens_out": response.usage.output_tokens,
        "tokens_total": response.usage.total_tokens,
    }


@waxell.tool(name="watsonx_model_generate", tool_type="llm")
def watsonx_generate(model_inference, prompt: str) -> dict:
    response = model_inference.generate(prompt=prompt)
    result = response["results"][0]
    return {
        "text": result["generated_text"],
        "model": response["model_id"],
        "tokens_in": result["input_token_count"],
        "tokens_out": result["generated_token_count"],
        # WatsonX does not report a total; derive it so the
        # comparison step can read tokens_total uniformly.
        "tokens_total": result["input_token_count"] + result["generated_token_count"],
    }

Cross-provider comparison with cost analysis

The reasoning decorator compares all providers on token usage and cost estimates.

@waxell.reasoning_dec(step="compare_cloud_providers")
async def compare_cloud_providers(results: dict) -> dict:
    providers = list(results.keys())
    token_counts = {p: r["tokens_total"] for p, r in results.items()}
    costs = {p: r["cost_estimate"] for p, r in results.items()}

    cheapest = min(costs, key=costs.get)
    most_tokens = max(token_counts, key=token_counts.get)

    return {
        "thought": f"Compared {len(providers)} cloud providers. Cost range: ${min(costs.values()):.4f} to ${max(costs.values()):.4f}.",
        "evidence": [f"{p}: {token_counts[p]} tokens, ~${costs[p]:.4f}" for p in providers],
        "conclusion": f"{cheapest} is most cost-effective. Enterprise choice depends on data residency and compliance.",
    }

What this demonstrates

  • @waxell.observe -- parent orchestrator with 3 child agents (one per cloud provider)
  • @waxell.step_dec -- query preprocessing
  • @waxell.decision -- cloud strategy selection (compare_all vs best_fit)
  • @waxell.tool(llm) -- provider-specific LLM wrappers with normalized output
  • @waxell.reasoning_dec -- cross-provider cost and quality comparison
  • waxell.tag() -- provider-specific tagging
  • waxell.score() -- comparison completeness and cost efficiency scores
  • Three cloud instrumentors -- DashScope, WatsonX, and Azure AI Inference
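The compare_all vs best_fit choice can be sketched as a plain function (the heuristic and parameter names here are illustrative; the demo wraps its version in @waxell.decision):

```python
def choose_cloud_strategy(latency_sensitive: bool, compliance_critical: bool) -> str:
    """Pick a cloud strategy: fan out to every provider when quality or
    compliance matters more than latency, otherwise call one best-fit provider."""
    if compliance_critical or not latency_sensitive:
        return "compare_all"
    return "best_fit"
```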

Run it

cd dev/waxell-dev
python -m app.demos.cloud_llm_providers_agent --dry-run

Source

dev/waxell-dev/app/demos/cloud_llm_providers_agent.py