IBM watsonx.ai
5-minute deploy · BYO Python code · runs anywhere (laptop, Code Engine, ECS, Lambda)
If your agent calls watsonx Foundation Models (Granite, Llama,
Mistral, granite-guardian, etc.) through the official
ibm-watsonx-ai Python SDK, Waxell auto-instruments every call —
no code changes, no wrappers. You get full spans, real token
counts, and cost sourced from IBM's published per-million-token
rates.
What gets captured
Waxell patches the watsonx SDK in-place. After waxell.init(),
every call to these methods produces a child LLM span on the
parent run:
| watsonx SDK method | Captured as |
|---|---|
| ModelInference.generate(...) | LLM span with prompt, completion, token counts |
| ModelInference.generate_text(...) | LLM span (text-only return shape) |
| ModelInference.chat(...) | LLM span (chat-shaped messages) |
| ModelInference.chat_stream(...) | LLM span with streamed completion accumulated |
| Embeddings.embed_documents(...) | Embedding span with input count + vector dim |
Cost is computed from a built-in table covering Granite 4, Granite 3.x, Granite Guardian, and the common third-party models hosted on watsonx — refreshed against IBM's published pricing.
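The arithmetic behind a per-million-token rate can be sketched as follows. This is a minimal illustration, not Waxell's actual implementation: the `RATES` table, the model id `example/model-a`, and the function name `cost_usd` are all hypothetical, and the prices are placeholders rather than IBM's published rates.

```python
from decimal import Decimal

# Hypothetical rates in USD per million tokens. These are placeholder
# numbers for illustration only, NOT IBM's published prices.
RATES = {
    "example/model-a": {"input": Decimal("0.20"), "output": Decimal("0.60")},
}

TOKENS_PER_MILLION = Decimal(1_000_000)


def cost_usd(model_id: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Price a single call from its token counts and a per-million-token rate."""
    rate = RATES[model_id]
    total = rate["input"] * input_tokens + rate["output"] * output_tokens
    return total / TOKENS_PER_MILLION


# 1,200 prompt tokens and 300 completion tokens at the placeholder rates.
example_cost = cost_usd("example/model-a", 1_200, 300)
```

`Decimal` is used here so that very small per-call amounts are not distorted by binary floating-point rounding when they are summed across many runs.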
Prerequisites
- An IBM Cloud account with watsonx.ai enabled.
- A project ID and a watsonx API key with at least Editor access on the project.
- Python 3.10+ (the ibm-watsonx-ai SDK supports 3.10–3.13).
Install
pip install waxell-observe ibm-watsonx-ai
Wire it up
Two lines at the top of your entrypoint: waxell.init() must run
before ibm_watsonx_ai is imported so the instrumentor can patch
the methods at import time:
import os

import waxell_observe as waxell

waxell.init()  # must run before importing ibm_watsonx_ai

from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference


@waxell.observe(agent_name="my-watsonx-agent")
def chat_turn(user_message: str) -> str:
    model = ModelInference(
        model_id="ibm/granite-3-8b-instruct",
        credentials=Credentials(
            api_key=os.environ["WATSONX_APIKEY"],
            url=os.environ.get("WATSONX_URL", "https://us-south.ml.cloud.ibm.com"),
        ),
        project_id=os.environ["WATSONX_PROJECT_ID"],
    )
    response = model.chat(messages=[
        {"role": "user", "content": user_message},
    ])
    return response["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(chat_turn("what is the deepest known cave?"))
Environment variables
# Waxell
export WAXELL_API_KEY=<your-waxell-key>
export WAXELL_API_URL=https://api.waxell.dev # default; omit for production
# IBM watsonx
export WATSONX_APIKEY=<your-ibm-api-key>
export WATSONX_PROJECT_ID=<your-project-id>
export WATSONX_URL=https://us-south.ml.cloud.ibm.com # or your region's endpoint
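A missing variable tends to surface as a `KeyError` deep inside the first request, so a small preflight check at startup can help. The sketch below assumes the three variables without defaults in the snippets above are the required ones; `REQUIRED_VARS` and `missing_env_vars` are hypothetical names, not part of Waxell or the watsonx SDK.

```python
import os
from typing import Mapping

# Assumed to be required: the variables that have no default in the
# examples above. WAXELL_API_URL and WATSONX_URL both have defaults.
REQUIRED_VARS = ("WAXELL_API_KEY", "WATSONX_APIKEY", "WATSONX_PROJECT_ID")


def missing_env_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
```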
Run
python my_agent.py
Then check Waxell:
wax runs list --limit 5
You'll see your agent run with a chat ibm/granite-3-8b-instruct
child LLM span, real input/output token counts pulled from the
watsonx response, and cost computed from IBM's published rate.
Deploy targets
Anywhere Python runs:
- Locally: python my_agent.py, no config beyond the env vars above.
- IBM Cloud Code Engine: push a container with the SDK + your
agent; set the env vars in the Code Engine app config. NAT
egress to api.waxell.dev is enabled by default.
- AWS Lambda / ECS, GCP Cloud Run, Azure Container Apps: identical pattern. The watsonx side is just an HTTPS API call from your host, so any runtime with outbound HTTPS works.
Policy enforcement
The whole @waxell.observe + @waxell.tool + ctx.check_policy(...)
stack works the same as any other runtime. A policy denial raised
before model.chat(...) aborts the call so the watsonx tokens
are never charged — useful for PII redaction, prompt-injection
defense, and tenant cost caps on watsonx workloads.
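The ordering is what matters: the check runs first, and a denial raises before the model is ever invoked. The sketch below illustrates only that control flow with stand-ins. `PolicyDenied`, `check_policy`, `CountingModel`, and `guarded_chat` are hypothetical names written for this example; they are not Waxell's API, and the real `ctx.check_policy(...)` signature may differ.

```python
class PolicyDenied(Exception):
    """Hypothetical stand-in for the error a denied policy check raises."""


def check_policy(prompt: str, blocked_terms: list[str]) -> None:
    """Toy policy: deny any prompt containing a blocked term."""
    lowered = prompt.lower()
    for term in blocked_terms:
        if term.lower() in lowered:
            raise PolicyDenied(f"prompt contains blocked term: {term!r}")


class CountingModel:
    """Stub with the same chat() return shape as the example above."""

    def __init__(self) -> None:
        self.calls = 0

    def chat(self, messages: list[dict]) -> dict:
        self.calls += 1
        return {"choices": [{"message": {"content": "stub reply"}}]}


def guarded_chat(model: CountingModel, prompt: str, blocked_terms: list[str]) -> str:
    # The policy check comes first; if it raises, model.chat() never runs,
    # so no request is sent and no tokens are billed.
    check_policy(prompt, blocked_terms)
    response = model.chat(messages=[{"role": "user", "content": prompt}])
    return response["choices"][0]["message"]["content"]
```

With the stub, a denied prompt leaves `model.calls` unchanged, which is the property the paragraph above relies on.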
A note on watsonx Agent Lab
IBM Agent Lab is a no-code agent builder where agents are configured in the IBM Cloud UI and execute on IBM's managed runtime — you don't ship your own Python into it, so Waxell can't install inside an Agent Lab agent the way it installs inside an AgentCore microVM.
For Agent Lab workloads, the two governed patterns are:
1. BYO wrapper. Build your agent loop in Python with
ibm-watsonx-ai, wrap with @waxell.observe, and run it as shown above. Skip Agent Lab.
2. Agent Lab + Waxell-instrumented client. Use Agent Lab to
author/version your prompts and tool catalog, then call the
Agent Lab deployment endpoint from your Python entrypoint
(also wrapped with @waxell.observe). You get Waxell spans on the outer call but not on the agent's internal steps.
Pattern 1 is what we recommend for any workload that needs full-fidelity governance.
Troubleshooting
Spans don't appear in Waxell. Check that waxell.init() is
called before from ibm_watsonx_ai import .... The
instrumentor patches ModelInference at import time; if watsonx
loads first, the patch never lands. The fix is a one-line reorder.
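One way to confirm the ordering is to look in `sys.modules`, which lists every module the interpreter has already imported. The helper below is a hypothetical diagnostic (the name `imported_too_early` is not part of Waxell); call it immediately before waxell.init().

```python
import sys
from typing import Mapping, Optional


def imported_too_early(
    module_name: str = "ibm_watsonx_ai",
    modules: Optional[Mapping[str, object]] = None,
) -> bool:
    """Return True if module_name has already been imported.

    `modules` defaults to sys.modules; it is a parameter only so the
    check can be exercised without touching real import state.
    """
    loaded = sys.modules if modules is None else modules
    return module_name in loaded
```

If it returns True at that point, some earlier import (often a shared helper module) pulled in the SDK first, and that import is the one to move.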
Cost shows as $0.00. A model id with no entry in Waxell's
cost table falls back to the cheapest current Granite rate (so
runs are never over-counted). If you're running a custom or
recently-launched model, open an issue
with the model id and we'll add it to the table.
Streaming spans look truncated. chat_stream accumulates
deltas client-side. If your code breaks out of the generator
early, the span finalizes with whatever was received up to
that point. That's intentional, but it means partial completions
look short in the UI.
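The behavior can be illustrated with a plain generator wrapper. This is a sketch of the general accumulate-and-finalize pattern, not Waxell's instrumentor: `accumulate_stream` and its `on_finish` callback are hypothetical names, and the deltas are plain strings rather than watsonx response chunks.

```python
from typing import Callable, Iterable, Iterator


def accumulate_stream(
    deltas: Iterable[str], on_finish: Callable[[str], None]
) -> Iterator[str]:
    """Yield each delta unchanged while keeping a running copy.

    The finally block runs when the stream is exhausted and also when the
    generator is closed early, so on_finish always receives exactly the
    text that was handed to the consumer.
    """
    received: list[str] = []
    try:
        for delta in deltas:
            received.append(delta)
            yield delta
    finally:
        on_finish("".join(received))


recorded: list[str] = []
stream = accumulate_stream(["The ", "deepest ", "cave"], recorded.append)
for index, delta in enumerate(stream):
    if index == 1:
        break  # the consumer stops after the second delta
stream.close()  # closing the generator triggers the finally block
```

After the early `break`, `recorded` holds only the first two deltas joined together, which is why a partially consumed stream shows a short completion.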