Cloud LLM Providers
An enterprise cloud LLM provider comparison pipeline exercising the DashScope, WatsonX, and Azure AI Inference instrumentors. Three child agents -- each backed by a different cloud provider (DashScope/Qwen, WatsonX/Granite, Azure AI/GPT-4o) -- generate responses to the same query. The orchestrator compares token usage, cost estimates, and response quality with @waxell.reasoning_dec, then synthesizes the best answer via OpenAI.
This example requires OPENAI_API_KEY, WAXELL_API_KEY, and WAXELL_API_URL. Use --dry-run to run without any API keys. In production, each provider needs its own credentials (DashScope API key, WatsonX credentials, Azure AI endpoint).
Architecture
Key Code
Provider-specific tool decorators
Each cloud provider API is wrapped in a @tool(llm) decorator, normalizing the response format for comparison while preserving provider-specific calling conventions.
@waxell.tool(name="dashscope_generation_call", tool_type="llm")
def dashscope_generate(client, model: str, prompt: str) -> dict:
    response = client.Generation.call(model=model, prompt=prompt)
    return {
        "text": response.output.text,
        "model": response.model,
        "tokens_in": response.usage.input_tokens,
        "tokens_out": response.usage.output_tokens,
        "tokens_total": response.usage.total_tokens,
    }
@waxell.tool(name="watsonx_model_generate", tool_type="llm")
def watsonx_generate(model_inference, prompt: str) -> dict:
    response = model_inference.generate(prompt=prompt)
    result = response["results"][0]
    return {
        "text": result["generated_text"],
        "model": response["model_id"],
        "tokens_in": result["input_token_count"],
        "tokens_out": result["generated_token_count"],
    }
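The third provider, Azure AI Inference, follows the same pattern but is not shown above. A minimal sketch of just its normalization step, written as a plain helper for clarity (in the demo it would sit inside a @waxell.tool wrapper like the other two); the attribute names assume the azure-ai-inference chat completions response shape, so treat the details as illustrative:

```python
# Sketch only: map an Azure AI Inference chat completion onto the same
# normalized dict shape the DashScope and WatsonX wrappers return.
# Assumes the azure-ai-inference response layout: choices[0].message.content
# plus usage.prompt_tokens / completion_tokens / total_tokens.
def normalize_azure_response(response) -> dict:
    return {
        "text": response.choices[0].message.content,
        "model": response.model,
        "tokens_in": response.usage.prompt_tokens,
        "tokens_out": response.usage.completion_tokens,
        "tokens_total": response.usage.total_tokens,
    }
```

Keeping the normalization separate from the client call makes the mapping testable without network access.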
Cross-provider comparison with cost analysis
The reasoning decorator compares all providers on token usage and cost estimates.
@waxell.reasoning_dec(step="compare_cloud_providers")
async def compare_cloud_providers(results: dict) -> dict:
    providers = list(results.keys())
    token_counts = {p: r["tokens_total"] for p, r in results.items()}
    costs = {p: r["cost_estimate"] for p, r in results.items()}
    cheapest = min(costs, key=costs.get)
    most_tokens = max(token_counts, key=token_counts.get)
    return {
        "thought": f"Compared {len(providers)} cloud providers. Cost range: ${min(costs.values()):.4f} to ${max(costs.values()):.4f}.",
        "evidence": [f"{p}: {token_counts[p]} tokens, ~${costs[p]:.4f}" for p in providers],
        "conclusion": f"{cheapest} is most cost-effective. Enterprise choice depends on data residency and compliance.",
    }
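The comparison step reads cost_estimate and tokens_total from each result, but neither is produced by the wrappers themselves (and the WatsonX wrapper reports no total at all). A minimal sketch of how the orchestrator might fill them in, assuming made-up per-1M-token rates -- real pricing varies by provider, model, and region and belongs in configuration:

```python
# Illustrative rates only (USD per 1M tokens) -- NOT real provider pricing.
RATES = {
    "dashscope": {"in": 0.40, "out": 1.20},
    "watsonx": {"in": 0.60, "out": 0.60},
    "azure_ai": {"in": 2.50, "out": 10.00},
}

def add_cost_estimate(provider: str, result: dict) -> dict:
    """Fill tokens_total (WatsonX omits it) and attach a cost estimate."""
    out = dict(result)
    out.setdefault("tokens_total", out["tokens_in"] + out["tokens_out"])
    rate = RATES[provider]
    out["cost_estimate"] = (
        out["tokens_in"] * rate["in"] + out["tokens_out"] * rate["out"]
    ) / 1_000_000
    return out
```

With this applied to each wrapper's output, every entry in results carries the two keys compare_cloud_providers expects.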
What this demonstrates
- @waxell.observe -- parent orchestrator with 3 child agents (one per cloud provider)
- @waxell.step_dec -- query preprocessing
- @waxell.decision -- cloud strategy selection (compare_all vs best_fit)
- @waxell.tool(llm) -- provider-specific LLM wrappers with normalized output
- @waxell.reasoning_dec -- cross-provider cost and quality comparison
- waxell.tag() -- provider-specific tagging
- waxell.score() -- comparison completeness and cost efficiency scores
- Three cloud instrumentors -- DashScope, WatsonX, and Azure AI Inference
Run it
cd dev/waxell-dev
python -m app.demos.cloud_llm_providers_agent --dry-run