Watsonx Deploy — Deal Intelligence ReAct Agent

A LangGraph ReAct agent powered by IBM Granite on watsonx.ai, deployed to a watsonx deployment space, with Waxell observability and governance baked in. The agent runs entirely inside the watsonx runtime — but because waxell-observe is installed as an SDK dependency, every LLM call, tool invocation, governance check, and memory write flows back to your Waxell controlplane over HTTPS.

This is the canonical pattern for instrumenting an agent that lives on a third-party platform (watsonx.ai, SageMaker, Vertex, Bedrock Agents) where you control the agent code but not the host runtime.

Environment variables

Requires WATSONX_API_KEY, WATSONX_PROJECT_ID, WAXELL_API_KEY, WAXELL_API_URL, plus per-tool credentials (PINECONE_API_KEY, OPENAI_API_KEY).
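
A fail-fast check at startup can surface a missing credential before the first request reaches the agent. The following is a minimal sketch: the helper name `missing_env_vars` is hypothetical and is not part of waxell-observe or the watsonx SDK; only the variable names come from the list above.

```python
import os

# Variable names taken from the requirements listed above.
REQUIRED_ENV_VARS = (
    "WATSONX_API_KEY",
    "WATSONX_PROJECT_ID",
    "WAXELL_API_KEY",
    "WAXELL_API_URL",
    "PINECONE_API_KEY",
    "OPENAI_API_KEY",
)


def missing_env_vars(env=None) -> list[str]:
    """Return the required variables that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]
```

Calling `missing_env_vars()` with no argument inspects `os.environ`; a non-empty return value is a reasonable trigger for aborting the deployment with a clear error message.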

Architecture

The integration pattern

The agent combines three layers of observability to produce a complete trace and a populated Memory tab:

Layer	Mechanism	What it captures
Auto-instrumentation	`waxell.init()`	All LangChain/watsonx LLM calls, token usage, latency
Trace-side memory	`ctx.record_memory_write()` + `ctx.record_reasoning()`	Reasoning + memory spans visible in the trace
Controlplane Memory tab	`POST /memory/episodic/` and `POST /memory/semantic/`	Persisted episodic data + semantic facts with embeddings

The third layer is the key insight: agents on third-party platforms can populate the Waxell Memory tab via HTTPS, even without direct database access.

Initialize observe before importing the agent

def deployable_ai_service(context, ...):
    import os

    # Wire env vars passed from the deployment config
    os.environ["WAXELL_API_KEY"] = waxell_api_key
    os.environ["WAXELL_API_URL"] = waxell_api_url

    # Init BEFORE importing LangChain so the LangChain instrumentor can apply its patches
    import waxell_observe
    waxell_observe.init()

    # Now import the agent
    from langchain_ibm import ChatWatsonx
    from langgraph.prebuilt import create_react_agent
    from agent.tools import ALL_TOOLS

    llm = ChatWatsonx(model_id="ibm/granite-4-h-small", watsonx_client=client, ...)
    graph = create_react_agent(llm, ALL_TOOLS, prompt=system_prompt)

Wrap the run with `WaxellContext`

The watsonx-deployed agent uses WaxellContext instead of @observe because the watsonx runtime contract gives you a generate() handler — you instrument inside the handler body, not by decorating it. This is the advanced pattern the context manager docs describe: lifecycle wrapping for an entry point you don't own.

from waxell_observe import WaxellContext

with WaxellContext(
    agent_name="watsonx-react-agent",
    workflow_name="watsonx-deal-analysis",
    enforce_policy=True,
    mid_execution_governance=True,
) as ctx:
    result = graph.invoke({"messages": messages})

    # Post-process the trace
    _walk_and_instrument(ctx, result.get("messages", []), run_id=ctx.run_id)

    # Set the final deliverable
    final = _extract_final_message(result["messages"])
    ctx.set_result({"response": final, "model": model, "agent": "watsonx-react-agent"})

mid_execution_governance=True makes every record_memory_write flush to the controlplane and check governance immediately — so policies that gate memory operations (PII checks, forbidden-content scans) fire while the agent is still running.

Walk the trace and emit reasoning + memory writes

LangGraph hands you back a messages list when the graph completes. Walking it lets you emit structured reasoning + memory spans for each tool call:

import json


def _walk_and_instrument(ctx, messages: list, run_id: str) -> None:
    """Emit a reasoning span + memory write per tool call."""
    scope_key = f"watsonx-react-agent:{run_id}"

    for i, msg in enumerate(messages):
        tool_calls = getattr(msg, "tool_calls", None) or []
        if not tool_calls:
            continue

        for tc in tool_calls:
            tool_name = tc["name"]
            tool_args = tc.get("args", {})

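            # Find the ToolMessage for this call: match on tool_call_id when the
            # call has an id (parallel calls may share a name), else on tool name
            call_id = tc.get("id")
            key, want = ("tool_call_id", call_id) if call_id else ("name", tool_name)
            tool_result = ""
            for nxt in messages[i + 1:]:
                if getattr(nxt, key, None) == want:
                    tool_result = str(getattr(nxt, "content", ""))
                    break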

            # 1. Reasoning span — visible in trace view
            ctx.record_reasoning(
                step=f"tool_call.{tool_name}",
                thought=f"Calling {tool_name}({json.dumps(tool_args)[:200]}) to gather "
                        f"information for the investment brief.",
                conclusion=tool_result[:500] if tool_result else "(no result)",
            )

            # 2. Memory write — episodic span (trace-side)
            ctx.record_memory_write(
                memory_type="episodic",
                content=f"{tool_name}({json.dumps(tool_args)[:120]}) -> {tool_result[:400]}",
            )

            # 3. Populate the Memory tab — episodic slot (HTTPS POST)
            _post_episodic(run_id, slot_name=f"tool.{tool_name}", scope_key=scope_key, data={
                "tool": tool_name,
                "args": tool_args,
                "result_preview": tool_result[:400],
            })

Populate the controlplane Memory tab via HTTPS

Two REST endpoints let an external runtime write to the Memory tab:

Episodic slot — bulk dict/list/value with optional TTL:

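import httpx

# WAXELL_API_URL and WAXELL_API_KEY are module-level constants, assumed to hold
# the same values wired into os.environ at startup.
def _post_episodic(run_id: str, slot_name: str, scope_key: str, data: dict) -> None: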
    httpx.post(
        f"{WAXELL_API_URL}/api/v1/observability/runs/{run_id}/memory/episodic/",
        headers={"X-Wax-Key": WAXELL_API_KEY, "Content-Type": "application/json"},
        json={
            "slot_name": slot_name,
            "scope_key": scope_key,
            "memory_type": "dict",
            "data": data,
            "ttl_seconds": 24 * 3600,
        },
        timeout=10.0,
    )

Semantic fact — content + 1536-dim embedding + importance + tags:

def _post_semantic(run_id: str, slot_name: str, scope_key: str, content: str,
                   importance: float = 0.7, tags: list[str] | None = None,
                   fact_key: str | None = None, source_tool: str = "") -> None:
    # Compute the embedding (same model as your ingestion)
    from openai import OpenAI
    oai = OpenAI()
    emb = oai.embeddings.create(
        model="text-embedding-3-small",
        input=[content[:8000]],
    ).data[0].embedding

    httpx.post(
        f"{WAXELL_API_URL}/api/v1/observability/runs/{run_id}/memory/semantic/",
        headers={"X-Wax-Key": WAXELL_API_KEY, "Content-Type": "application/json"},
        json={
            "slot_name": slot_name,
            "scope_key": scope_key,
            "content": content[:8000],
            "content_embedding": emb,
            "importance": importance,
            "tags": tags or [],
            "fact_key": fact_key,
            "source_tool": source_tool,
        },
        timeout=15.0,
    )

For the full endpoint reference (auth, request schema, validation), see REST API endpoints.
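
Since the semantic endpoint takes a 1536-dimension embedding, a cheap client-side check before the POST can catch a mismatched embedding model early. The sketch below assumes a hypothetical helper, `build_semantic_payload`. Its field names mirror the request body shown above, but the validation rules (dimension check, importance clamped to the 0..1 range) are assumptions rather than documented server behavior.

```python
EMBEDDING_DIM = 1536       # dimension stated above; the default for text-embedding-3-small
MAX_CONTENT_CHARS = 8000   # same truncation that _post_semantic applies


def build_semantic_payload(slot_name, scope_key, content, embedding,
                           importance=0.7, tags=None, fact_key=None,
                           source_tool=""):
    """Build the JSON body for the semantic endpoint, validating it first."""
    if len(embedding) != EMBEDDING_DIM:
        raise ValueError(
            f"expected {EMBEDDING_DIM}-dim embedding, got {len(embedding)}"
        )
    return {
        "slot_name": slot_name,
        "scope_key": scope_key,
        "content": content[:MAX_CONTENT_CHARS],
        "content_embedding": list(embedding),
        # Assumption: importance is a 0..1 score, so out-of-range values are clamped.
        "importance": max(0.0, min(1.0, float(importance))),
        "tags": list(tags or []),
        "fact_key": fact_key,
        "source_tool": source_tool,
    }
```

The returned dict can be passed directly as the `json=` argument of the `httpx.post` call in `_post_semantic`.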

Promote important findings to semantic facts

After walking the trace, lift high-signal tool results into semantic memory with hand-curated facts so the Memory tab's semantic tier has real content (not just raw tool dumps):

if tool_name == "get_company_profile" and tool_result:
    profile = json.loads(tool_result)
    facts = [
        (
            f"{profile['name']} reported {profile['latest_quarter']['total_revenue_usd']} "
            f"in revenue (+{profile['latest_quarter']['yoy_revenue_growth']} YoY) for "
            f"{profile['latest_quarter']['fiscal_period']}.",
            "latest_financials",  # fact_key for dedup
            0.95,                  # importance
            ["financial", "earnings"],  # tags
        ),
        # ... more curated facts ...
    ]
    for content, fact_key, importance, tags in facts:
        _post_semantic(
            run_id, slot_name="company_facts", scope_key=scope_key,
            content=content, importance=importance, tags=tags,
            fact_key=fact_key, source_tool="get_company_profile",
        )

# Also persist the final deliverable as a semantic fact
if final_brief:
    _post_semantic(
        run_id, slot_name="deliverable", scope_key=f"watsonx-react-agent:{run_id}",
        content=final_brief[:8000], importance=1.0,
        tags=["deliverable", "investment_brief"],
        fact_key="final_brief", source_tool="(final-brief)",
    )

The fact_key acts as a dedup key — re-posting with the same fact_key updates the existing fact instead of creating a duplicate.
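
The update-instead-of-duplicate behavior can be pictured with a small in-memory model. This illustrates the semantics described above and is not the controlplane's implementation: the `SlotFacts` class is hypothetical, and always appending facts posted without a `fact_key` is an assumption.

```python
class SlotFacts:
    """In-memory stand-in for one semantic slot; facts are keyed by fact_key."""

    def __init__(self):
        self._by_key = {}    # fact_key -> fact dict
        self._unkeyed = []   # assumption: facts without a fact_key are always appended

    def post(self, content, fact_key=None, importance=0.7):
        fact = {"content": content, "fact_key": fact_key, "importance": importance}
        if fact_key is None:
            self._unkeyed.append(fact)
        else:
            # Same key: overwrite the stored fact rather than adding a second one.
            self._by_key[fact_key] = fact
        return fact

    def facts(self):
        return list(self._by_key.values()) + self._unkeyed


slot = SlotFacts()
slot.post("first version of the fact", fact_key="latest_financials")
slot.post("revised version of the fact", fact_key="latest_financials")
print(len(slot.facts()), slot.facts()[0]["content"])
```

A stable `fact_key` such as `latest_financials` therefore suits facts that are refreshed on every run, while one-off observations can omit the key.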

What this demonstrates

WaxellContext as a deployment-handler wrapper — the right pattern when you don't control the entry function signature (watsonx hands you generate(context))
mid_execution_governance=True — memory writes trigger governance immediately, not at run completion
ctx.record_reasoning() + ctx.record_memory_write() — trace-side capture for visibility
REST memory write endpoints — POST /memory/episodic/ and POST /memory/semantic/ populate the Memory tab from any platform that can make HTTPS calls
Embedding ownership — the agent computes the embedding before POST (the controlplane stores it as-is for pgvector search)
Graceful degradation — every HTTP call is wrapped in try/except so a memory-tab outage doesn't break the deliverable
Auto-instrumentation across LangChain + watsonx — waxell.init() patches both, so the trace tree includes Granite LLM calls and LangChain tool/chain spans without manual recording
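
The graceful-degradation point can be sketched as a small decorator applied to the posting helpers. The name `best_effort` is hypothetical, and catching every `Exception` is a deliberate simplification; the demo itself may catch narrower exception types.

```python
import functools
import logging

logger = logging.getLogger("memory_tab")


def best_effort(fn):
    """Run fn, but log and swallow any exception instead of propagating it."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception as exc:  # broad on purpose: memory writes are non-critical
            logger.warning("%s failed, continuing: %s", fn.__name__, exc)
            return None
    return wrapper


@best_effort
def post_to_memory_tab(payload: dict) -> str:
    # Stand-in for an httpx.post(...) call; raises to simulate an outage.
    raise ConnectionError("controlplane unreachable")


# The simulated outage is logged, and the caller simply receives None.
result = post_to_memory_tab({"slot_name": "tool.demo"})
```

Decorating `_post_episodic` and `_post_semantic` this way would keep a Memory tab outage from interrupting `graph.invoke` post-processing, at the cost of silently missing memory entries for that run.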

Deploy it

cd dev/waxell-dev/app/demos/watsonx_deploy

# Edit config.toml with your watsonx + Waxell credentials, then:
watsonx-ai service new
watsonx-ai service invoke "Analyze TechCorp for Series B investment"

Source

dev/waxell-dev/app/demos/watsonx_deploy/
