IBM watsonx.ai

5-minute deploy · BYO Python code · runs anywhere (laptop, Code Engine, ECS, Lambda)

If your agent calls watsonx Foundation Models (Granite, Llama, Mistral, granite-guardian, etc.) through the official ibm-watsonx-ai Python SDK, Waxell auto-instruments every call — no code changes, no wrappers. You get full spans, real token counts, and cost sourced from IBM's published per-million-token rates.

What gets captured

Waxell patches the watsonx SDK in place. After waxell.init(), every call to these methods produces a child span on the parent run:

| watsonx SDK method | Captured as |
| --- | --- |
| ModelInference.generate(...) | LLM span with prompt, completion, token counts |
| ModelInference.generate_text(...) | LLM span (text-only return shape) |
| ModelInference.chat(...) | LLM span (chat-shaped messages) |
| ModelInference.chat_stream(...) | LLM span with streamed completion accumulated |
| Embeddings.embed_documents(...) | Embedding span with input count + vector dim |

Cost is computed from a built-in table covering Granite 4, Granite 3.x, Granite Guardian, and the common third-party models hosted on watsonx — refreshed against IBM's published pricing.
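As a rough illustration of how per-million-token pricing turns token counts into a dollar figure, here is a minimal sketch. The Rate class, the EXAMPLE_RATES table, and the estimate_cost function are hypothetical names, and the rates are placeholder numbers, not IBM's published prices or Waxell's actual table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    """USD per one million tokens (placeholder values for illustration only)."""
    input_per_million: float
    output_per_million: float


# Hypothetical table: the model ID appears in this guide, the prices are made up.
EXAMPLE_RATES = {
    "ibm/granite-3-8b-instruct": Rate(input_per_million=0.20, output_per_million=0.20),
}


def estimate_cost(model_id: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one call."""
    rate = EXAMPLE_RATES[model_id]
    total = (
        input_tokens * rate.input_per_million
        + output_tokens * rate.output_per_million
    )
    return total / 1_000_000


if __name__ == "__main__":
    print(estimate_cost("ibm/granite-3-8b-instruct", 1_200, 350))
```

The token counts fed into a calculation like this come from the watsonx response itself, which is why the reported cost tracks real usage rather than a client-side estimate.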

Prerequisites

  1. An IBM Cloud account with watsonx.ai enabled.
  2. A project ID and a watsonx API key with at least Editor access on the project.
  3. Python 3.10+ (the ibm-watsonx-ai SDK supports 3.10–3.13).

Install

pip install waxell-observe ibm-watsonx-ai

Wire it up

Add two lines at the top of your entrypoint. waxell.init() must run before ibm_watsonx_ai is imported so that the instrumentor can patch the SDK methods at import time:

import os
import waxell_observe as waxell

waxell.init() # must run before importing ibm_watsonx_ai

from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference


@waxell.observe(agent_name="my-watsonx-agent")
def chat_turn(user_message: str) -> str:
    model = ModelInference(
        model_id="ibm/granite-3-8b-instruct",
        credentials=Credentials(
            api_key=os.environ["WATSONX_APIKEY"],
            url=os.environ.get("WATSONX_URL", "https://us-south.ml.cloud.ibm.com"),
        ),
        project_id=os.environ["WATSONX_PROJECT_ID"],
    )

    response = model.chat(messages=[
        {"role": "user", "content": user_message},
    ])
    return response["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(chat_turn("what is the deepest known cave?"))

Environment variables

# Waxell
export WAXELL_API_KEY=<your-waxell-key>
export WAXELL_API_URL=https://api.waxell.dev # default; omit for production

# IBM watsonx
export WATSONX_APIKEY=<your-ibm-api-key>
export WATSONX_PROJECT_ID=<your-project-id>
export WATSONX_URL=https://us-south.ml.cloud.ibm.com # or your region's endpoint
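A missing variable otherwise surfaces as a KeyError in the middle of the first run, so a small preflight check can help. The sketch below is one possible approach; missing_env_vars and REQUIRED_VARS are hypothetical names, and treating exactly these three variables as required is an assumption based on the listing above (the two URL variables have defaults).

```python
import os

# Assumed to be required; WAXELL_API_URL and WATSONX_URL fall back to defaults.
REQUIRED_VARS = ("WAXELL_API_KEY", "WATSONX_APIKEY", "WATSONX_PROJECT_ID")


def missing_env_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("Environment looks complete.")
```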

Run

python my_agent.py

Then check Waxell:

wax runs list --limit 5

You'll see your agent run with a chat ibm/granite-3-8b-instruct child LLM span, real input/output token counts pulled from the watsonx response, and cost computed from IBM's published rate.

Deploy targets

Anywhere Python runs:

  • Locally: python my_agent.py, with no config beyond the env vars above.
  • IBM Cloud Code Engine — push a container with the SDK + your agent; set the env vars in the Code Engine app config. NAT egress to api.waxell.dev is enabled by default.
  • AWS Lambda / ECS, GCP Cloud Run, Azure Container Apps — identical pattern. watsonx-side is just an HTTPS API call from your host, so any runtime with outbound HTTPS works.

Policy enforcement

The whole @waxell.observe + @waxell.tool + ctx.check_policy(...) stack works the same as on any other runtime. A policy denial raised before model.chat(...) aborts the call, so no watsonx tokens are charged. This is useful for PII redaction, prompt-injection defense, and tenant cost caps on watsonx workloads.
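The "deny before the call" ordering can be sketched without any SDK at all. Everything in the example below (PolicyDenied, deny_if_ssn, guarded_call, fake_model) is a hypothetical stand-in written for illustration; it does not reproduce the signature of ctx.check_policy(...) or any other Waxell API.

```python
class PolicyDenied(Exception):
    """Hypothetical stand-in for the error a real policy check would raise."""


def deny_if_ssn(prompt: str) -> None:
    """Toy policy: reject prompts that contain the marker 'SSN'."""
    if "SSN" in prompt:
        raise PolicyDenied("prompt appears to contain an SSN")


def guarded_call(check, call_model, prompt: str) -> str:
    """Run the policy check first; call the model only if it passes."""
    check(prompt)              # raises PolicyDenied before any model traffic
    return call_model(prompt)  # only reached when the policy allows the prompt


calls = []


def fake_model(prompt: str) -> str:
    """Stand-in for model.chat(...); records each prompt it receives."""
    calls.append(prompt)
    return "ok"
```

Because the check raises before call_model runs, a denied prompt never reaches the model, which is the property that keeps denied requests from consuming tokens.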

A note on watsonx Agent Lab

IBM Agent Lab is a no-code agent builder where agents are configured in the IBM Cloud UI and execute on IBM's managed runtime — you don't ship your own Python into it, so Waxell can't install inside an Agent Lab agent the way it installs inside an AgentCore microVM.

For Agent Lab workloads, the two governed patterns are:

  1. BYO wrapper. Build your agent loop in Python with ibm-watsonx-ai, wrap with @waxell.observe, and run it as shown above. Skip Agent Lab.
  2. Agent Lab + Waxell-instrumented client. Use Agent Lab to author/version your prompts and tool catalog, then call the Agent Lab deployment endpoint from your Python entrypoint (also wrapped with @waxell.observe). You get Waxell spans on the outer call but not on the agent's internal steps.

Pattern 1 is what we recommend for any workload that needs full-fidelity governance.

Troubleshooting

Spans don't appear in Waxell. Check that waxell.init() is called before from ibm_watsonx_ai import .... The instrumentor patches ModelInference at import time; if watsonx loads first, the patch never lands. The fix is a one-line reorder.
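One way to catch an ordering mistake early is to look in sys.modules just before calling waxell.init(). This is a minimal sketch using only the standard library; already_imported is a hypothetical helper, not part of Waxell, and it only detects that the package was loaded, not whether the patch succeeded.

```python
import sys


def already_imported(package: str = "ibm_watsonx_ai") -> bool:
    """Return True if the package or any of its submodules is already loaded."""
    prefix = package + "."
    return any(name == package or name.startswith(prefix) for name in sys.modules)


if __name__ == "__main__":
    # Intended to run immediately before waxell.init() in the entrypoint.
    if already_imported():
        print("warning: ibm_watsonx_ai was imported before waxell.init()")
```

An indirect import is the usual culprit: a helper module imported at the top of the entrypoint may pull in ibm_watsonx_ai before the init call runs.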

Cost shows as $0.00. A model ID with no entry in Waxell's cost table falls back to the cheapest current Granite rate (so runs are never over-counted). If you're running a custom or recently launched model, open an issue with the model ID and we'll add it to the table.

Streaming spans look truncated. chat_stream accumulates deltas client-side. If your code breaks out of the generator early, the span finalizes with whatever was received up to that point. That's intentional, but it means partial completions look short in the UI.
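The effect of an early break on client-side accumulation can be reproduced with a plain generator. The sketch below is illustrative only: fake_stream, accumulate, and consume are hypothetical names, and the chunk shape (choices[0].delta.content) is assumed rather than taken from Waxell's instrumentor.

```python
def fake_stream():
    """Stand-in for a streaming response; yields three delta chunks."""
    for piece in ("Deep", "est ", "cave"):
        yield {"choices": [{"delta": {"content": piece}}]}


def accumulate(stream, sink):
    """Pass chunks through unchanged while collecting their text into sink."""
    for chunk in stream:
        if chunk["choices"]:
            sink.append(chunk["choices"][0]["delta"].get("content") or "")
        yield chunk


def consume(max_chunks=None):
    """Read the stream, optionally stopping early, and return what was captured."""
    received = []
    for index, _chunk in enumerate(accumulate(fake_stream(), received), start=1):
        if max_chunks is not None and index >= max_chunks:
            break  # the wrapper never sees the remaining chunks
    return "".join(received)


if __name__ == "__main__":
    print(repr(consume()))   # full completion
    print(repr(consume(2)))  # partial completion after an early break
```

The wrapper only records chunks the caller actually pulls, so stopping after two chunks leaves two chunks of text. Draining the generator to the end is what produces a complete span.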