Watsonx Deploy — Deal Intelligence ReAct Agent
A LangGraph ReAct agent powered by IBM Granite on watsonx.ai, deployed to a watsonx deployment space, with Waxell observability and governance baked in. The agent runs entirely inside the watsonx runtime — but because waxell-observe is installed as an SDK dependency, every LLM call, tool invocation, governance check, and memory write flows back to your Waxell controlplane over HTTPS.
This is the canonical pattern for instrumenting an agent that lives on a third-party platform (watsonx.ai, SageMaker, Vertex, Bedrock Agents) where you control the agent code but not the host runtime.
Requires WATSONX_API_KEY, WATSONX_PROJECT_ID, WAXELL_API_KEY, WAXELL_API_URL, plus per-tool credentials (PINECONE_API_KEY, OPENAI_API_KEY).
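A preflight check can surface a missing credential before you deploy. The following is a minimal sketch under stated assumptions: missing_env is a hypothetical helper (it is not part of the watsonx or Waxell SDKs), and the variable names are the ones listed above.

```python
import os

# Variable names taken from the requirements listed above.
REQUIRED_ENV = [
    "WATSONX_API_KEY",
    "WATSONX_PROJECT_ID",
    "WAXELL_API_KEY",
    "WAXELL_API_URL",
    "PINECONE_API_KEY",
    "OPENAI_API_KEY",
]


def missing_env(required, env=None):
    """Return the names in `required` that are unset or empty in `env`.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env(REQUIRED_ENV)
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Passing a plain dict as env makes the check easy to exercise without touching the real environment.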
Architecture
The integration pattern
The agent uses three layers of observability that together produce a complete trace and populate the Memory tab:
| Layer | Mechanism | What it captures |
|---|---|---|
| Auto-instrumentation | waxell.init() | All LangChain/watsonx LLM calls, token usage, latency |
| Trace-side memory | ctx.record_memory_write() + ctx.record_reasoning() | Reasoning + memory spans visible in the trace |
| Controlplane Memory tab | POST /memory/episodic/ and POST /memory/semantic/ | Persisted episodic data + semantic facts with embeddings |
The third layer is the key insight: agents on third-party platforms can populate the Waxell Memory tab via HTTPS, even without direct database access.
Initialize observe before importing the agent
def deployable_ai_service(context, ...):
    import os

    # Wire env vars passed from the deployment config
    os.environ["WAXELL_API_KEY"] = waxell_api_key
    os.environ["WAXELL_API_URL"] = waxell_api_url

    # Init BEFORE importing LangChain so the LangChain instrumentor is in place first
    import waxell_observe
    waxell_observe.init()

    # Now import the agent
    from langchain_ibm import ChatWatsonx
    from langgraph.prebuilt import create_react_agent
    from agent.tools import ALL_TOOLS

    llm = ChatWatsonx(model_id="ibm/granite-4-h-small", watsonx_client=client, ...)
    graph = create_react_agent(llm, ALL_TOOLS, prompt=system_prompt)
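One common reason import order matters for patch-based instrumentation is that a name bound before the patch keeps pointing at the unpatched function. The sketch below illustrates that general mechanism with a stand-in module. The source does not say how waxell-observe patches LangChain, so treat this as an illustration of the principle, not a description of its internals; make_fake_library and instrument are hypothetical names.

```python
import types


def make_fake_library():
    """Build a stand-in module with one function, playing the role of an LLM client library."""
    lib = types.ModuleType("fake_llm_lib")
    lib.invoke = lambda prompt: f"reply to {prompt}"
    return lib


def instrument(lib, calls):
    """Replace lib.invoke with a wrapper that records each prompt in `calls`."""
    original = lib.invoke

    def traced(prompt):
        calls.append(prompt)
        return original(prompt)

    lib.invoke = traced


calls = []
lib = make_fake_library()

early_invoke = lib.invoke   # like `from lib import invoke` BEFORE instrumentation
instrument(lib, calls)
late_invoke = lib.invoke    # like `from lib import invoke` AFTER instrumentation

early_invoke("first")       # bypasses the wrapper, so nothing is recorded
late_invoke("second")       # goes through the wrapper
print(calls)                # ['second']
```

Only the call made through the name bound after instrumentation is recorded, which is why the snippet above initializes observe before importing the agent modules.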
Wrap the run with WaxellContext
The watsonx-deployed agent uses WaxellContext instead of @observe because the watsonx runtime contract gives you a generate() handler — you instrument inside the handler body, not by decorating it. This is the advanced pattern the context manager docs describe: lifecycle wrapping for an entry point you don't own.
from waxell_observe import WaxellContext

with WaxellContext(
    agent_name="watsonx-react-agent",
    workflow_name="watsonx-deal-analysis",
    enforce_policy=True,
    mid_execution_governance=True,
) as ctx:
    result = graph.invoke({"messages": messages})

    # Post-process the trace
    _walk_and_instrument(ctx, result.get("messages", []), run_id=ctx.run_id)

    # Set the final deliverable
    final = _extract_final_message(result["messages"])
    ctx.set_result({"response": final, "model": model, "agent": "watsonx-react-agent"})
mid_execution_governance=True makes every record_memory_write flush to the controlplane and check governance immediately — so policies that gate memory operations (PII checks, forbidden-content scans) fire while the agent is still running.
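The handler above calls _extract_final_message, whose body is not shown. One plausible shape for it, offered as an assumption and not as the demo's actual code, is to walk the message list backwards and return the last AI message that has text and requests no tools. FakeMessage is a hypothetical stand-in that carries only the attributes used here (type, content, tool_calls), and the sketch assumes content is a plain string.

```python
from dataclasses import dataclass, field


@dataclass
class FakeMessage:
    """Hypothetical stand-in for a LangChain message; only the attributes used below."""
    type: str = "ai"
    content: str = ""
    tool_calls: list = field(default_factory=list)


def extract_final_message(messages):
    """Return the text of the last AI message that has content and no tool calls.

    Returns an empty string when no such message exists.
    """
    for msg in reversed(messages):
        if getattr(msg, "type", None) != "ai":
            continue  # skip human and tool messages
        if getattr(msg, "tool_calls", None):
            continue  # still asking for tools, so not the final answer
        content = getattr(msg, "content", "")
        if content:
            return str(content)
    return ""


# Made-up conversation for illustration only.
history = [
    FakeMessage(type="human", content="Analyze TechCorp"),
    FakeMessage(type="ai", tool_calls=[{"name": "get_company_profile", "args": {}, "id": "1"}]),
    FakeMessage(type="tool", content='{"name": "TechCorp"}'),
    FakeMessage(type="ai", content="TechCorp looks promising."),
]
print(extract_final_message(history))  # TechCorp looks promising.
```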
Walk the trace and emit reasoning + memory writes
LangGraph hands you back a messages list when the graph completes. Walking it lets you emit structured reasoning + memory spans for each tool call:
import json


def _walk_and_instrument(ctx, messages: list, run_id: str) -> None:
    """Emit a reasoning span + memory write per tool call."""
    scope_key = f"watsonx-react-agent:{run_id}"
    for i, msg in enumerate(messages):
        tool_calls = getattr(msg, "tool_calls", None) or []
        if not tool_calls:
            continue
        for tc in tool_calls:
            tool_name = tc["name"]
            tool_args = tc.get("args", {})

            # Find the ToolMessage that followed
            tool_result = ""
            for nxt in messages[i + 1:]:
                if getattr(nxt, "name", "") == tool_name:
                    tool_result = str(getattr(nxt, "content", ""))
                    break

            # 1. Reasoning span: visible in trace view
            ctx.record_reasoning(
                step=f"tool_call.{tool_name}",
                thought=f"Calling {tool_name}({json.dumps(tool_args)[:200]}) to gather "
                        f"information for the investment brief.",
                conclusion=tool_result[:500] if tool_result else "(no result)",
            )

            # 2. Memory write: episodic span (trace-side)
            ctx.record_memory_write(
                memory_type="episodic",
                content=f"{tool_name}({json.dumps(tool_args)[:120]}) -> {tool_result[:400]}",
            )

            # 3. Populate the Memory tab: episodic slot (HTTPS POST)
            _post_episodic(run_id, slot_name=f"tool.{tool_name}", scope_key=scope_key, data={
                "tool": tool_name,
                "args": tool_args,
                "result_preview": tool_result[:400],
            })
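The walker above pairs each tool call with the next message carrying the same tool name. If one AI message calls the same tool twice, both calls would resolve to the first result. A possible hardening, sketched here as an assumption and not as the demo's code, is to match on the call id: LangChain tool calls carry an id and ToolMessage carries a matching tool_call_id. FakeMessage and pair_tool_calls are hypothetical names, and the sample data is made up.

```python
from dataclasses import dataclass, field


@dataclass
class FakeMessage:
    """Hypothetical stand-in for a LangChain message; only the attributes used below."""
    content: str = ""
    name: str = ""
    tool_call_id: str = ""
    tool_calls: list = field(default_factory=list)


def pair_tool_calls(messages):
    """Return a list of (tool_name, args, result) tuples, matched by tool call id."""
    # Index every tool result by the id of the call it answers.
    results_by_id = {
        m.tool_call_id: str(m.content)
        for m in messages
        if getattr(m, "tool_call_id", "")
    }
    pairs = []
    for msg in messages:
        for tc in getattr(msg, "tool_calls", None) or []:
            result = results_by_id.get(tc.get("id"), "")  # "" when no result arrived
            pairs.append((tc["name"], tc.get("args", {}), result))
    return pairs


# Made-up history: one AI message issues two calls to the same tool.
history = [
    FakeMessage(tool_calls=[
        {"name": "search", "args": {"q": "revenue"}, "id": "call_1"},
        {"name": "search", "args": {"q": "churn"}, "id": "call_2"},
    ]),
    FakeMessage(name="search", tool_call_id="call_1", content="revenue result"),
    FakeMessage(name="search", tool_call_id="call_2", content="churn result"),
]
for name, args, result in pair_tool_calls(history):
    print(name, args, "->", result)
```

Each call keeps its own result even though both share the tool name search.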
Populate the controlplane Memory tab via HTTPS
Two REST endpoints let an external runtime write to the Memory tab:
Episodic slot — bulk dict/list/value with optional TTL:
import httpx


def _post_episodic(run_id: str, slot_name: str, scope_key: str, data: dict) -> None:
    httpx.post(
        f"{WAXELL_API_URL}/api/v1/observability/runs/{run_id}/memory/episodic/",
        headers={"X-Wax-Key": WAXELL_API_KEY, "Content-Type": "application/json"},
        json={
            "slot_name": slot_name,
            "scope_key": scope_key,
            "memory_type": "dict",
            "data": data,
            "ttl_seconds": 24 * 3600,
        },
        timeout=10.0,
    )
Semantic fact — content + 1536-dim embedding + importance + tags:
def _post_semantic(run_id: str, slot_name: str, scope_key: str, content: str,
                   importance: float = 0.7, tags: list[str] | None = None,
                   fact_key: str | None = None, source_tool: str = "") -> None:
    # Compute the embedding (same model as your ingestion)
    from openai import OpenAI
    oai = OpenAI()
    emb = oai.embeddings.create(
        model="text-embedding-3-small",
        input=[content[:8000]],
    ).data[0].embedding

    httpx.post(
        f"{WAXELL_API_URL}/api/v1/observability/runs/{run_id}/memory/semantic/",
        headers={"X-Wax-Key": WAXELL_API_KEY, "Content-Type": "application/json"},
        json={
            "slot_name": slot_name,
            "scope_key": scope_key,
            "content": content[:8000],
            "content_embedding": emb,
            "importance": importance,
            "tags": tags or [],
            "fact_key": fact_key,
            "source_tool": source_tool,
        },
        timeout=15.0,
    )
For the full endpoint reference (auth, request schema, validation), see REST API endpoints.
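Because the agent owns the embedding, it can be worth checking the vector's length before sending it. The sketch below separates payload construction from the HTTP call so that check can run locally. build_semantic_payload is a hypothetical helper, and rejecting a wrong-sized vector client-side is an assumption about what you might want, not documented controlplane behavior; the field names, the 1536 dimension, and the 8000-character truncation come from the snippet above.

```python
EMBEDDING_DIM = 1536       # dimension stated above for semantic facts
MAX_CONTENT_CHARS = 8000   # truncation limit used in _post_semantic above


def build_semantic_payload(slot_name, scope_key, content, embedding,
                           importance=0.7, tags=None, fact_key=None,
                           source_tool=""):
    """Return the JSON body for a semantic-fact POST, or raise ValueError.

    Hypothetical helper: validates the embedding length locally before any
    network call is made.
    """
    if len(embedding) != EMBEDDING_DIM:
        raise ValueError(
            f"expected a {EMBEDDING_DIM}-dim embedding, got {len(embedding)}"
        )
    return {
        "slot_name": slot_name,
        "scope_key": scope_key,
        "content": content[:MAX_CONTENT_CHARS],
        "content_embedding": list(embedding),
        "importance": importance,
        "tags": tags or [],
        "fact_key": fact_key,
        "source_tool": source_tool,
    }


# Placeholder vector for illustration; a real one would come from the embeddings API.
payload = build_semantic_payload(
    "company_facts", "watsonx-react-agent:run-123",
    content="x" * 9000, embedding=[0.0] * EMBEDDING_DIM,
)
print(len(payload["content"]), len(payload["content_embedding"]))  # 8000 1536
```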
Promote important findings to semantic facts
After walking the trace, lift high-signal tool results into semantic memory with hand-curated facts so the Memory tab's semantic tier has real content (not just raw tool dumps):
if tool_name == "get_company_profile" and tool_result:
    profile = json.loads(tool_result)
    facts = [
        (
            f"{profile['name']} reported {profile['latest_quarter']['total_revenue_usd']} "
            f"in revenue (+{profile['latest_quarter']['yoy_revenue_growth']} YoY) for "
            f"{profile['latest_quarter']['fiscal_period']}.",
            "latest_financials",          # fact_key for dedup
            0.95,                         # importance
            ["financial", "earnings"],    # tags
        ),
        # ... more curated facts ...
    ]
    for content, fact_key, importance, tags in facts:
        _post_semantic(
            run_id, slot_name="company_facts", scope_key=scope_key,
            content=content, importance=importance, tags=tags,
            fact_key=fact_key, source_tool="get_company_profile",
        )

# Also persist the final deliverable as a semantic fact
if final_brief:
    _post_semantic(
        run_id, slot_name="deliverable", scope_key=f"watsonx-react-agent:{run_id}",
        content=final_brief[:8000], importance=1.0,
        tags=["deliverable", "investment_brief"],
        fact_key="final_brief", source_tool="(final-brief)",
    )
The fact_key acts as a dedup key — re-posting with the same fact_key updates the existing fact instead of creating a duplicate.
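The update-instead-of-duplicate behavior can be pictured with a small in-memory model. This is a toy sketch only: FactStore is hypothetical, and keying on (slot_name, fact_key) is an assumption made for the illustration. The source does not say which fields the controlplane combines with fact_key, nor how it treats a missing fact_key.

```python
class FactStore:
    """Toy in-memory model of upsert-by-fact_key; not the controlplane's implementation."""

    def __init__(self):
        self._facts = {}

    def post(self, slot_name, fact_key, content):
        """Store `content`, replacing any earlier fact with the same key."""
        key = (slot_name, fact_key)  # assumed dedup key, for illustration only
        created = key not in self._facts
        self._facts[key] = content
        return "created" if created else "updated"

    def get(self, slot_name, fact_key):
        return self._facts[(slot_name, fact_key)]

    def __len__(self):
        return len(self._facts)


store = FactStore()
first = store.post("company_facts", "latest_financials", "draft wording")
second = store.post("company_facts", "latest_financials", "revised wording")
print(first, second, len(store))  # created updated 1
```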
What this demonstrates
- WaxellContext as a deployment-handler wrapper: the right pattern when you don't control the entry function signature (watsonx hands you generate(context))
- mid_execution_governance=True: memory writes trigger governance immediately, not at run completion
- ctx.record_reasoning() + ctx.record_memory_write(): trace-side capture for visibility
- REST memory write endpoints: POST /memory/episodic/ and POST /memory/semantic/ populate the Memory tab from any platform that can make HTTPS calls
- Embedding ownership: the agent computes the embedding before POST (the controlplane stores it as-is for pgvector search)
- Graceful degradation: every HTTP call is wrapped in try/except so a memory-tab outage doesn't break the deliverable
- Auto-instrumentation across LangChain + watsonx: waxell.init() patches both, so the trace tree includes Granite LLM calls and LangChain tool/chain spans without manual recording
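The simplified _post_episodic and _post_semantic snippets on this page omit the try/except that the graceful-degradation point refers to. One way to add it without touching each function is a small wrapper, sketched here as an assumption. best_effort is a hypothetical helper, and the failing function below is a stub standing in for a memory-tab outage.

```python
import logging

logger = logging.getLogger("memory_tab")


def best_effort(action, *args, **kwargs):
    """Run `action`; on any exception, log a warning and return None instead of raising."""
    try:
        return action(*args, **kwargs)
    except Exception as exc:  # deliberately broad: memory writes are non-critical
        logger.warning("memory write skipped: %s", exc)
        return None


def failing_post():
    """Stub that simulates the controlplane being unreachable."""
    raise ConnectionError("controlplane unreachable")


outcome = best_effort(failing_post)   # logs a warning, does not raise
print(outcome)                        # None
```

In the walker this would read best_effort(_post_episodic, run_id, slot_name=..., scope_key=..., data=...). Note that httpx.post raises only on transport failures such as timeouts or connection errors; to treat HTTP error statuses as failures as well, call raise_for_status() on the response inside the wrapped function.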
Deploy it
cd dev/waxell-dev/app/demos/watsonx_deploy
# Edit config.toml with your watsonx + Waxell credentials, then:
watsonx-ai service new
watsonx-ai service invoke "Analyze TechCorp for Series B investment"