Working Memory

The Waxell runtime ships with a tiered memory primitive that addresses context-window growth in agentic loops. Tool results are auto-captured into a fast scratchpad, the LLM sees compact summaries with reference handles, and the runtime resolves those references to the full data, so the LLM never carries the bytes.

The problem it solves

A typical agentic loop has the LLM call a tool, see the full result, then decide the next action. After 10 turns of fetching prices, dividends, research notes, and benchmarks, the conversation history is 60KB+ of JSON that the LLM has to re-read every turn. You end up doing one or more of the following:

Run out of context window
Burn tokens re-reading data you already have
Have the LLM hallucinate data it already fetched but can't see anymore
Truncate history and lose information

Working memory inverts this. Tool results go to a scratchpad keyed by tool_name.turn. The LLM sees a one-line summary per result and references the full data by name ($ref:market_data.fetch_prices.1) when it needs to pass it to the next tool. The runtime resolves the reference before dispatching, so the actual data flows tool-to-tool without ever entering the LLM's conversation.
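The capture step can be sketched in a few lines. This is an illustration only, not the Waxell implementation: the `Scratchpad` class and its `capture` and `summary_block` methods are hypothetical names, and the summary text is supplied by the caller rather than generated.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Scratchpad:
    """Hypothetical in-process stand-in for the working-memory scratchpad."""

    entries: dict = field(default_factory=dict)    # key -> full tool result
    summaries: dict = field(default_factory=dict)  # key -> one-line summary

    def capture(self, tool_name: str, turn: int, result, summary: str) -> str:
        """Store the full result and return the reference handle for it."""
        key = f"{tool_name}.{turn}"
        self.entries[key] = result
        size_kb = len(json.dumps(result)) / 1024
        self.summaries[key] = f"{summary} ({size_kb:.1f}KB)"
        return f"$ref:{key}"

    def summary_block(self) -> str:
        """Render the compact text the LLM would see instead of the raw data."""
        lines = ["Available scratchpad refs:"]
        lines += [f"- $ref:{key}: {text}" for key, text in self.summaries.items()]
        return "\n".join(lines)


pad = Scratchpad()
prices = {"AAPL": [{"day": d, "close": 100.0 + d} for d in range(90)]}
handle = pad.capture("market_data.fetch_prices", 1, prices, "prices for [AAPL]")
print(handle)
print(pad.summary_block())
```

The full `prices` payload stays in `pad.entries`; only the handle and the short summary block would be placed in the prompt.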

Measured impact (finance portfolio_analyst, 6-phase pipeline):

Metric	Without scratchpad	With scratchpad
Tokens per turn	6K–28K (growing)	~7K (flat)
Turns to complete	Runs out of context	16 turns
Tool result re-reads	Every turn	Zero

Four tiers at a glance

Tier	Backend	Lifecycle	Purpose
`working`	Redis	Per-run, cleared on completion	Scratchpad for in-flight tool results, auto-captured
`session`	Postgres	24h default	Conversation/plan state within a user session
`episodic`	Postgres	TTL-based	Cross-run state: conversation history, cached results
`semantic`	pgvector	Indefinite	Long-term facts with vector search

All four tiers share the same MemorySpec API and the same tenant isolation guarantees. The tier= field selects the backend and lifecycle.

Python: declaring memory on an agent

Use the convenience constructors from waxell_sdk.core.specs.memory_spec:

from waxell_sdk import agent
from waxell_sdk.core.specs.memory_spec import (
    scratchpad,
    conversation_memory,
    semantic_memory,
)

@agent(
    name="portfolio_analyst",
    memory={
        # Working tier: auto-captures every domain tool call during a run
        "scratchpad": scratchpad(ttl="1h"),

        # Episodic tier: per-user conversation across runs
        "conversation": conversation_memory(scope="user_id", ttl="30d"),

        # Semantic tier: per-client facts the agent has learned
        "client_context": semantic_memory(
            scope=["user_id", "portfolio_id"],
            description="Facts about this advisor's relationship with this client",
        ),
    },
    # ... other agent options elided
)
class PortfolioAnalyst:
    ...

Or use MemorySpec directly when you need control:

from waxell_sdk.core.specs.memory_spec import MemorySpec

memory = {
    "scratchpad": MemorySpec(
        tier="working",
        scope=["run_id"],
        auto_capture=True,
        ttl="1h",
        max_size_mb=20,
    ),
    "cached_analysis": MemorySpec(
        tier="episodic",
        scope=["user_id", "portfolio_id"],
        type="dict",
        ttl="24h",
    ),
}

YAML: declaring memory in waxell.yaml

memory:
  scratchpad:
    tier: working
    scope: [run_id]
    auto_capture: true
    ttl: 1h

  conversation:
    tier: episodic
    scope: user_id
    type: list
    ttl: 30d
    max_items: 100

  client_knowledge:
    tier: semantic
    scope: [user_id, portfolio_id]
    searchable: true
    description: Facts about this client relationship

For all available fields and validation rules, see the waxell.yaml memory reference.
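The ttl values in the examples above are short duration strings such as 1h, 24h, and 30d. How the runtime parses them is not shown here, so the following is only a sketch under the assumption that a TTL is an integer followed by a single unit letter. The `parse_ttl` helper and the set of accepted units (s, m, h, d) are hypothetical.

```python
import re
from datetime import timedelta

# Assumed unit letters; the source only shows "h" and "d" in its examples.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_ttl(ttl: str) -> timedelta:
    """Convert a string like '1h' or '30d' into a timedelta."""
    match = re.fullmatch(r"(\d+)([smhd])", ttl.strip())
    if match is None:
        raise ValueError(f"unrecognised ttl: {ttl!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


for raw in ("1h", "24h", "30d"):
    print(raw, "->", parse_ttl(raw))
```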

How working memory captures tool results

When tier="working" and auto_capture=true, the runtime intercepts every successful domain call and stores the result:

Agent calls market_data.fetch_prices(tickers=["AAPL"], days=90)
The HTTP call returns 15KB of OHLC data
Runtime stores the value in the scratchpad as market_data.fetch_prices.1 (turn 1)

The LLM's next system prompt rebuild includes a summary block:

Available scratchpad refs:
- $ref:market_data.fetch_prices.1 — OHLC for [AAPL] (90 days, 15KB)

The LLM sees the summary, not the 15KB blob

When the LLM calls the next tool, it can pass the reference:

calculate_performance(prices="$ref:market_data.fetch_prices.1", ...)

Runtime resolves $ref:market_data.fetch_prices.1 to the full data before dispatch
Next tool gets the real data; the LLM stays compact

The $ref prefix is configurable via reference_syntax on the MemorySpec. Input signals are also seeded into the scratchpad automatically (as input_<name>.1) so the LLM can pass them around the same way.
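The dispatch step described above can be sketched as follows. This is a simplified illustration, not the runtime's code: `seed_inputs` and `dispatch` are hypothetical helpers, the scratchpad is a plain dict, and only top-level arguments are resolved to keep the example short.

```python
from typing import Any, Callable


def seed_inputs(scratchpad: dict, inputs: dict) -> None:
    """Store each input signal under input_<name>.1, as the text describes."""
    for name, value in inputs.items():
        scratchpad[f"input_{name}.1"] = value


def dispatch(tool: Callable[..., Any], kwargs: dict, scratchpad: dict,
             prefix: str = "$ref:") -> Any:
    """Swap reference strings for stored values, then call the tool."""
    resolved = {}
    for name, value in kwargs.items():
        if isinstance(value, str) and value.startswith(prefix):
            key = value[len(prefix):]
            if key not in scratchpad:
                raise KeyError(f"unknown scratchpad reference: {value}")
            value = scratchpad[key]
        resolved[name] = value
    return tool(**resolved)


def calculate_performance(prices: list, benchmark: str) -> float:
    """Toy tool: percentage change from the first to the last price."""
    return round((prices[-1] - prices[0]) / prices[0] * 100, 2)


scratchpad = {"market_data.fetch_prices.1": [100.0, 104.0, 110.0]}
seed_inputs(scratchpad, {"benchmark": "SPY"})

result = dispatch(
    calculate_performance,
    {"prices": "$ref:market_data.fetch_prices.1",
     "benchmark": "$ref:input_benchmark.1"},
    scratchpad,
)
print(result)
```

The `prefix` parameter stands in for the configurable reference_syntax: passing a different prefix changes which strings are treated as references without touching the tool itself.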

Tenant isolation

Isolation is enforced at two layers, so a bug in one is caught by the other:

Physical layer. The Redis backend prefixes every key with tenant:{tenant_id}:. The _tenant_context() context manager extracts tenant_id from the scope key and sets the ContextVar around each Redis operation -- so isolation holds even when the Celery worker has no upstream tenant context set.

Logical layer. The scope key itself always starts with tenant_id, enforced by MemoryScopeResolver regardless of what dimensions the developer declares in scope=[...].

Result: a memory write under tenant A cannot be read by tenant B even if scope keys collide, even if a worker has stale context, even if the dev forgot to include tenant_id in the scope list. tenant_id is always prepended automatically.

Scope dimensions

Multi-dimensional scoping composes on top of tenant_id:

Dimension	Source	Use case
`tenant_id`	Always (automatic)	Outermost boundary
`agent`	Agent name	Per-agent within tenant
`agent_version`	Agent version	Per-version state
`user_id`	Sub-user identity	Per end-user
`user_group`	Sub-user identity	Per user group
`session_id`	Conversation session	Per conversation
`workflow`	Workflow name	Per workflow within agent
`run_id`	Execution run	Per execution (ephemeral)
`channel_id`	Slack/chat	Per chat channel
`thread_ts`	Slack	Per thread

Combine them as needed:

# Per-(advisor, client) cached analysis
MemorySpec(
    tier="episodic",
    scope=["user_id", "portfolio_id"],
    type="dict",
    ttl="24h",
)
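To make the composition concrete, here is a sketch of how a scope key might be built from a declared dimension list. The `build_scope_key` helper and the `tenant_id:dim=value` key format are assumptions for illustration; the source only states that MemoryScopeResolver always puts tenant_id first.

```python
def build_scope_key(tenant_id: str, scope: list, context: dict) -> str:
    """Compose a scope key that always starts with tenant_id."""
    # Drop an explicit tenant_id so listing it is redundant, not duplicated.
    dimensions = [d for d in scope if d != "tenant_id"]
    missing = [d for d in dimensions if d not in context]
    if missing:
        raise KeyError(f"missing scope dimensions: {missing}")
    parts = [tenant_id] + [f"{d}={context[d]}" for d in dimensions]
    return ":".join(parts)


context = {"user_id": "advisor_7", "portfolio_id": "pf_123", "run_id": "r1"}
key = build_scope_key("acme", ["user_id", "portfolio_id"], context)
same = build_scope_key("acme", ["tenant_id", "user_id", "portfolio_id"], context)
print(key)
print(key == same)
```

Dimensions that are present in the run context but not declared in `scope` (here `run_id`) do not appear in the key, so the cached value is shared across runs for the same advisor and portfolio.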

When to use which tier

Need	Tier	Why
Pass tool results between LLM turns without burning tokens	`working`	Auto-captures, refs resolve before dispatch
Remember the last 100 messages per user	`episodic` (`conversation_memory()`)	Postgres-backed, TTL-eviction
Cache an expensive calculation per (advisor, client) for 24h	`episodic`	Multi-dimensional scope, TTL
Conversation/plan state within a session that spans multiple workflow runs	`session`	First-class tier with 24h default TTL
Learn facts about a client over time and recall by similarity	`semantic` (`semantic_memory()`)	Vector search, indefinite retention

Typed memory schemas

Memory slots can declare a Pydantic schema for read/write validation. The runtime stamps a schema_version on every stored value so migrations can pick the right deserializer when the schema evolves:

from pydantic import BaseModel

class ClientNote(BaseModel):
    summary: str
    confidence: float
    last_updated: str

memory = {
    "notes": MemorySpec(
        tier="semantic",
        scope=["user_id", "portfolio_id"],
        schema=ClientNote,
        schema_version=2,
    ),
}

On write, the runtime validates against the schema and serializes the model. On read, it rehydrates back to a ClientNote instance. A validation failure on read raises ValueError so the agent can catch and migrate.

Schemas are optional -- omit schema= for pre-3.4 dict-like behavior.
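The write and read path described above can be sketched without the runtime. To keep the example dependency-free, a standard-library dataclass stands in for the Pydantic model, and validation is done by hand. The `write_value` and `read_value` helpers and the envelope layout are hypothetical.

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class ClientNote:
    summary: str
    confidence: float
    last_updated: str


SCHEMA_VERSION = 2


def write_value(store: dict, key: str, note: ClientNote) -> None:
    """Serialize the note and stamp the schema version alongside it."""
    store[key] = {"schema_version": SCHEMA_VERSION, "data": asdict(note)}


def read_value(store: dict, key: str) -> ClientNote:
    """Rehydrate a ClientNote, raising ValueError if the stored shape is wrong."""
    data = store[key]["data"]
    for f in fields(ClientNote):
        if f.name not in data:
            raise ValueError(f"missing field: {f.name}")
    return ClientNote(
        summary=str(data["summary"]),
        confidence=float(data["confidence"]),
        last_updated=str(data["last_updated"]),
    )


store = {}
write_value(store, "notes", ClientNote("Prefers dividends", 0.9, "2024-01-01"))
note = read_value(store, "notes")

# A value written under an older schema fails validation on read.
store["old"] = {"schema_version": 1, "data": {"summary": "Legacy note"}}
try:
    read_value(store, "old")
    needs_migration = False
except ValueError:
    # An agent could inspect store["old"]["schema_version"] and migrate here.
    needs_migration = True
print(note, needs_migration)
```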

Common gotchas

tenant_id is implicit. Don't include it in your scope=[...] list -- the runtime always prepends it. Listing it explicitly works but is redundant.
Working tier ignores type. Scratchpad entries are always ScratchpadEntry dataclass values; type="dict" and type="list" apply only to the episodic and session tiers.
auto_capture is per-spec, not global. If you declare two tier="working" slots, you can opt one in and one out.
$ref resolution is recursive. Pass $ref:foo.1 as a value, a nested dict field, or inside a list -- the runtime walks the structure and resolves all refs.
Local dev uses in-memory backends. InMemoryWorkingMemory mimics Redis for tests; the Redis backend takes over in production via the infra setup.py ready hook.
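The recursive $ref resolution mentioned in the gotchas can be illustrated with a small walker. This is a sketch under assumptions: `resolve_refs` is a hypothetical function, the scratchpad is a plain dict, and values fetched from the scratchpad are returned as stored rather than scanned again for further references.

```python
from typing import Any

REF_PREFIX = "$ref:"


def resolve_refs(value: Any, scratchpad: dict) -> Any:
    """Return a copy of value with every reference string replaced."""
    if isinstance(value, str) and value.startswith(REF_PREFIX):
        return scratchpad[value[len(REF_PREFIX):]]
    if isinstance(value, dict):
        return {k: resolve_refs(v, scratchpad) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve_refs(v, scratchpad) for v in value]
    return value


scratchpad = {"foo.1": [1, 2, 3], "bar.2": {"rate": 0.05}}
args = {
    "series": "$ref:foo.1",                      # plain value
    "options": {"assumptions": "$ref:bar.2",     # nested dict field
                "label": "base case"},
    "extras": ["$ref:foo.1", 42],                # inside a list
}
resolved = resolve_refs(args, scratchpad)
print(resolved)
```

The walker builds new containers instead of mutating `args`, so the compact reference form the LLM produced remains available for logging.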

Reference

waxell.yaml memory field reference -- field-by-field YAML schema
Execution Context -- how ctx.scratchpad is plumbed through agent runs
Workflow Envelope -- run lifecycle that drives scratchpad TTL
