
Cloud LLM Providers

An enterprise cloud LLM provider comparison pipeline exercising the DashScope, WatsonX, and Azure AI Inference instrumentors. Three child agents -- each backed by a different cloud provider (DashScope/Qwen, WatsonX/Granite, Azure AI/GPT-4o) -- generate responses to the same query. The orchestrator compares token usage, cost estimates, and response quality with @waxell.reasoning_dec, then synthesizes the best answer via OpenAI.

Environment variables

This example requires OPENAI_API_KEY, WAXELL_API_KEY, and WAXELL_API_URL. Use --dry-run to run without any API keys. In production, each provider needs its own credentials (DashScope API key, WatsonX credentials, Azure AI endpoint).
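A minimal sketch of the startup check (the function name and error wording are illustrative, not taken from the demo):

```python
import os

def resolve_run_mode(argv: list[str]) -> str:
    """Pick a run mode: --dry-run skips all provider calls,
    otherwise the shared credentials must be present."""
    if "--dry-run" in argv:
        return "dry-run"
    required = ("OPENAI_API_KEY", "WAXELL_API_KEY", "WAXELL_API_URL")
    missing = [name for name in required if not os.environ.get(name)]
    if missing:
        raise SystemExit(f"missing environment variables: {', '.join(missing)}")
    return "live"
```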

Architecture

Key Code

Provider-specific tool decorators

Each cloud provider API is wrapped in a @tool(llm) decorator, normalizing the response format for comparison while preserving provider-specific calling conventions.

@waxell.tool(name="dashscope_generation_call", tool_type="llm")
def dashscope_generate(client, model: str, prompt: str) -> dict:
    response = client.Generation.call(model=model, prompt=prompt)
    return {
        "text": response.output.text,
        "model": response.model,
        "tokens_in": response.usage.input_tokens,
        "tokens_out": response.usage.output_tokens,
        "tokens_total": response.usage.total_tokens,
    }


@waxell.tool(name="watsonx_model_generate", tool_type="llm")
def watsonx_generate(model_inference, prompt: str) -> dict:
    response = model_inference.generate(prompt=prompt)
    result = response["results"][0]
    return {
        "text": result["generated_text"],
        "model": response["model_id"],
        "tokens_in": result["input_token_count"],
        "tokens_out": result["generated_token_count"],
        # WatsonX does not report a total; derive it so the
        # comparison step can read tokens_total uniformly.
        "tokens_total": result["input_token_count"] + result["generated_token_count"],
    }

Cross-provider comparison with cost analysis

The reasoning decorator compares all providers on token usage and cost estimates.

@waxell.reasoning_dec(step="compare_cloud_providers")
async def compare_cloud_providers(results: dict) -> dict:
    providers = list(results.keys())
    token_counts = {p: r["tokens_total"] for p, r in results.items()}
    costs = {p: r["cost_estimate"] for p, r in results.items()}

    cheapest = min(costs, key=costs.get)
    most_tokens = max(token_counts, key=token_counts.get)

    return {
        "thought": f"Compared {len(providers)} cloud providers. Cost range: ${min(costs.values()):.4f} to ${max(costs.values()):.4f}.",
        "evidence": [f"{p}: {token_counts[p]} tokens, ~${costs[p]:.4f}" for p in providers],
        "conclusion": f"{cheapest} is most cost-effective. Enterprise choice depends on data residency and compliance.",
    }

What this demonstrates

  • @waxell.observe -- parent orchestrator with 3 child agents (one per cloud provider)
  • @waxell.step_dec -- query preprocessing
  • @waxell.decision -- cloud strategy selection (compare_all vs best_fit)
  • @waxell.tool(llm) -- provider-specific LLM wrappers with normalized output
  • @waxell.reasoning_dec -- cross-provider cost and quality comparison
  • waxell.tag() -- provider-specific tagging
  • waxell.score() -- comparison completeness and cost efficiency scores
  • Three cloud instrumentors -- DashScope, WatsonX, and Azure AI Inference
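The compare_all vs best_fit choice can be sketched as a plain function (the heuristic and parameter names here are illustrative; the demo wraps its version in @waxell.decision):

```python
def choose_cloud_strategy(latency_sensitive: bool, compliance_critical: bool) -> str:
    """Pick a cloud strategy: fan out to every provider when quality or
    compliance matters more than latency, otherwise call one best-fit provider."""
    if compliance_critical or not latency_sensitive:
        return "compare_all"
    return "best_fit"
```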

Run it

cd dev/waxell-dev
python -m app.demos.cloud_llm_providers_agent --dry-run

Source

dev/waxell-dev/app/demos/cloud_llm_providers_agent.py