Working Memory

The Waxell runtime ships with a four-tier memory primitive that solves the context-window explosion problem in agentic loops. Tool results auto-capture into a fast scratchpad, the LLM sees compact summaries with reference handles, and references resolve to full data on the runtime side -- so the LLM never carries the bytes.

The problem it solves

A typical agentic loop has the LLM call a tool, see the full result, then decide the next action. After 10 turns of fetching prices, dividends, research notes, and benchmarks, the conversation history is 60KB+ of JSON that the LLM has to re-read every turn. At that point you do at least one of the following:

  • Run out of context window
  • Burn tokens re-reading data you already have
  • Have the LLM hallucinate data it already fetched but can't see anymore
  • Truncate history and lose information

Working memory inverts this. Tool results go to a scratchpad keyed by tool_name.turn. The LLM sees a one-line summary per result and references the full data by name (for example, $ref:market_data.fetch_prices.1) when it needs to pass it to the next tool. The runtime resolves the reference before dispatching, so the actual data flows tool-to-tool without ever entering the LLM's conversation.
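To make the difference concrete, here is a toy calculation rather than Waxell code. The result sizes and the 120-byte summary length are made-up numbers, chosen only to show how the history grows when full results stay in the conversation versus when only summaries do.

```python
# Toy model: bytes of tool output the LLM must re-read after each turn.
# All sizes below are illustrative assumptions, not measurements.
RESULT_SIZES = [15_000, 4_000, 9_000, 6_000]  # hypothetical tool result sizes (bytes)
SUMMARY_SIZE = 120                            # hypothetical one-line summary size (bytes)


def history_bytes_per_turn(result_sizes, carried_size=None):
    """Return the cumulative history size after each turn.

    carried_size=None means every full result stays in the conversation;
    otherwise each result contributes only `carried_size` bytes (its summary).
    """
    total, per_turn = 0, []
    for size in result_sizes:
        total += size if carried_size is None else carried_size
        per_turn.append(total)
    return per_turn


full_history = history_bytes_per_turn(RESULT_SIZES)
summarized = history_bytes_per_turn(RESULT_SIZES, carried_size=SUMMARY_SIZE)
print(full_history)  # [15000, 19000, 28000, 34000]
print(summarized)    # [120, 240, 360, 480]
```

In this model the summarized history still grows, but by a small constant per tool call instead of by the size of each payload.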

Measured impact (finance portfolio_analyst, 6-phase pipeline):

| Metric | Without scratchpad | With scratchpad |
|---|---|---|
| Tokens per turn | 6K–28K (growing) | ~7K (flat) |
| Turns to complete | Runs out of context | 16 turns |
| Tool result re-reads | Every turn | Zero |

Four tiers at a glance

| Tier | Backend | Lifecycle | Purpose |
|---|---|---|---|
| working | Redis | Per-run, cleared on completion | Scratchpad for in-flight tool results, auto-captured |
| session | Postgres | 24h default | Conversation/plan state within a user session |
| episodic | Postgres | TTL-based | Cross-run state: conversation history, cached results |
| semantic | pgvector | Indefinite | Long-term facts with vector search |

All four tiers share the same MemorySpec API and the same tenant isolation guarantees. The tier= field selects the backend and lifecycle.

Python: declaring memory on an agent

Use the convenience constructors from waxell_sdk.core.specs.memory_spec:

    from waxell_sdk import agent
    from waxell_sdk.core.specs.memory_spec import (
        scratchpad,
        conversation_memory,
        semantic_memory,
    )

    @agent(
        name="portfolio_analyst",
        memory={
            # Working tier: auto-captures every domain tool call during a run
            "scratchpad": scratchpad(ttl="1h"),

            # Episodic tier: per-user conversation across runs
            "conversation": conversation_memory(scope="user_id", ttl="30d"),

            # Semantic tier: per-client facts the agent has learned
            "client_context": semantic_memory(
                scope=["user_id", "portfolio_id"],
                description="Facts about this advisor's relationship with this client",
            ),
        },
        ...
    )
    class PortfolioAnalyst:
        ...

Or use MemorySpec directly when you need control:

    from waxell_sdk.core.specs.memory_spec import MemorySpec

    memory = {
        "scratchpad": MemorySpec(
            tier="working",
            scope=["run_id"],
            auto_capture=True,
            ttl="1h",
            max_size_mb=20,
        ),
        "cached_analysis": MemorySpec(
            tier="episodic",
            scope=["user_id", "portfolio_id"],
            type="dict",
            ttl="24h",
        ),
    }

YAML: declaring memory in waxell.yaml

    memory:
      scratchpad:
        tier: working
        scope: [run_id]
        auto_capture: true
        ttl: 1h

      conversation:
        tier: episodic
        scope: user_id
        type: list
        ttl: 30d
        max_items: 100

      client_knowledge:
        tier: semantic
        scope: [user_id, portfolio_id]
        searchable: true
        description: Facts about this client relationship

For all available fields and validation rules, see the waxell.yaml memory reference.
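The ttl values used on this page (1h, 24h, 30d) look like a number followed by a unit letter. As a rough illustration of how such strings can be turned into durations, the sketch below parses them with the standard library. The helper name parse_ttl and the extra s and m units are assumptions for the example; the accepted grammar is whatever the waxell.yaml memory reference specifies.

```python
import re
from datetime import timedelta

# Unit letters are an assumption; only "h" and "d" appear in this document.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_ttl(ttl: str) -> timedelta:
    """Parse strings such as '1h' or '30d' into a timedelta (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)([smhd])", ttl.strip())
    if match is None:
        raise ValueError(f"unrecognized TTL: {ttl!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


print(parse_ttl("1h"))   # 1:00:00
print(parse_ttl("30d"))  # 30 days, 0:00:00
```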

How working memory captures tool results

When tier="working" and auto_capture=true, the runtime intercepts every successful domain call and stores the result:

  1. Agent calls market_data.fetch_prices(tickers=["AAPL"], days=90)
  2. The HTTP call returns 15KB of OHLC data
  3. Runtime stores the value in the scratchpad as market_data.fetch_prices.1 (tool name plus turn 1)
  4. The LLM's next system prompt rebuild includes a summary block:
    Available scratchpad refs:
    - $ref:market_data.fetch_prices.1 — OHLC for [AAPL] (90 days, 15KB)
  5. The LLM sees the summary, not the 15KB blob
  6. When the LLM calls the next tool, it can pass the reference:
    calculate_performance(prices="$ref:market_data.fetch_prices.1", ...)
  7. Runtime resolves $ref:market_data.fetch_prices.1 to the full data before dispatch
  8. Next tool gets the real data; the LLM stays compact
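The eight steps above can be sketched end to end with a small in-memory stand-in. This is an illustration under stated assumptions, not the runtime's implementation: ToyScratchpad, its method names, and the summary wording are hypothetical, and the price data is a tiny placeholder for the 15KB payload. Only top-level arguments are resolved here.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ToyScratchpad:
    """Hypothetical in-memory stand-in for the working tier (not the Waxell class)."""
    entries: dict = field(default_factory=dict)

    def capture(self, tool_name: str, turn: int, value) -> str:
        """Store a tool result under '<tool_name>.<turn>' and return the key."""
        key = f"{tool_name}.{turn}"
        self.entries[key] = value
        return key

    def summary_block(self) -> str:
        """Build the compact text the LLM sees instead of the raw payloads."""
        lines = ["Available scratchpad refs:"]
        for key, value in self.entries.items():
            lines.append(f"- $ref:{key} ({len(json.dumps(value))} bytes of JSON)")
        return "\n".join(lines)

    def resolve(self, arg):
        """Swap a top-level '$ref:<key>' string for the stored value."""
        if isinstance(arg, str) and arg.startswith("$ref:"):
            return self.entries[arg[len("$ref:"):]]
        return arg


def calculate_performance(prices: dict) -> dict:
    """Toy downstream tool: return from first to last close, per ticker."""
    return {t: round(closes[-1] / closes[0] - 1, 4) for t, closes in prices.items()}


pad = ToyScratchpad()
# Steps 1-3: the tool result is captured instead of being appended to the chat.
pad.capture("market_data.fetch_prices", turn=1, value={"AAPL": [100.0, 104.0, 110.0]})
# Steps 4-5: the LLM only ever sees this summary text.
summary = pad.summary_block()
# Steps 6-8: the LLM passes a reference; it is resolved just before dispatch.
llm_args = {"prices": "$ref:market_data.fetch_prices.1"}
resolved_args = {name: pad.resolve(arg) for name, arg in llm_args.items()}
performance = calculate_performance(**resolved_args)
print(summary)
print(performance)
```

The design point is that the reference string is the only thing that crosses the LLM boundary; the payload moves from one tool to the next inside the runtime.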

The $ref prefix is configurable via reference_syntax on the MemorySpec. Input signals are also seeded into the scratchpad automatically (as input_<name>.1) so the LLM can pass them around the same way.
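A small sketch of both ideas follows. It assumes, without confirmation from the source, that changing reference_syntax swaps the prefix while the colon separator stays the same. The helper names seed_inputs, format_ref, and parse_ref and the @mem prefix are hypothetical.

```python
def seed_inputs(inputs: dict) -> dict:
    """Return scratchpad entries for input signals, keyed as input_<name>.1."""
    return {f"input_{name}.1": value for name, value in inputs.items()}


def format_ref(key: str, prefix: str = "$ref") -> str:
    """Build the reference string the LLM would pass as a tool argument."""
    return f"{prefix}:{key}"


def parse_ref(text: str, prefix: str = "$ref"):
    """Return the scratchpad key if `text` is a reference, else None."""
    marker = f"{prefix}:"
    return text[len(marker):] if text.startswith(marker) else None


entries = seed_inputs({"tickers": ["AAPL"], "days": 90})
print(sorted(entries))                            # ['input_days.1', 'input_tickers.1']
print(format_ref("input_tickers.1"))              # $ref:input_tickers.1
print(parse_ref("@mem:input_days.1", prefix="@mem"))  # input_days.1
```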

Tenant isolation

Two layers of isolation, either of which catches a bug in the other:

Physical layer. The Redis backend prefixes every key with tenant:{tenant_id}:. The _tenant_context() context manager extracts tenant_id from the scope key and sets the ContextVar around each Redis operation -- so isolation holds even when the Celery worker has no upstream tenant context set.

Logical layer. The scope key itself always starts with tenant_id, enforced by MemoryScopeResolver regardless of what dimensions the developer declares in scope=[...].

Result: a memory write under tenant A cannot be read by tenant B even if scope keys collide, even if a worker has stale context, even if the dev forgot to include tenant_id in the scope list. tenant_id is always prepended automatically.

Scope dimensions

Multi-dimensional scoping composes on top of tenant_id:

| Dimension | Source | Use case |
|---|---|---|
| tenant_id | Always (automatic) | Outermost boundary |
| agent | Agent name | Per-agent within tenant |
| agent_version | Agent version | Per-version state |
| user_id | Sub-user identity | Per end-user |
| user_group | Sub-user identity | Per user group |
| session_id | Conversation session | Per conversation |
| workflow | Workflow name | Per workflow within agent |
| run_id | Execution run | Per execution (ephemeral) |
| channel_id | Slack/chat | Per chat channel |
| thread_ts | Slack | Per thread |

Combine them as needed:

    # Per-(advisor, client) cached analysis
    MemorySpec(
        tier="episodic",
        scope=["user_id", "portfolio_id"],
        type="dict",
        ttl="24h",
    )
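To show how declared dimensions might compose into a single key, here is a minimal sketch. It is not MemoryScopeResolver itself: the function name resolve_scope_key, the "dimension=value" segment format, and the colon separator are assumptions. What it does take from this page is that tenant_id always comes first and that listing it explicitly is redundant.

```python
def resolve_scope_key(tenant_id: str, declared_scope, context: dict) -> str:
    """Compose a scope key that always starts with tenant_id (hypothetical format)."""
    # Accept either a single dimension name or a list of them.
    dims = [declared_scope] if isinstance(declared_scope, str) else list(declared_scope)
    # tenant_id is prepended automatically, so drop any explicit mention of it.
    dims = [d for d in dims if d != "tenant_id"]
    missing = [d for d in dims if d not in context]
    if missing:
        raise KeyError(f"missing scope dimensions: {missing}")
    return ":".join([tenant_id] + [f"{d}={context[d]}" for d in dims])


ctx = {"user_id": "u1", "portfolio_id": "p9"}
print(resolve_scope_key("acme", ["user_id", "portfolio_id"], ctx))
# acme:user_id=u1:portfolio_id=p9
print(resolve_scope_key("acme", ["tenant_id", "user_id"], ctx))
# acme:user_id=u1
```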

When to use which tier

| Need | Tier | Why |
|---|---|---|
| Pass tool results between LLM turns without burning tokens | working | Auto-captures, refs resolve before dispatch |
| Remember the last 100 messages per user | episodic (conversation_memory()) | Postgres-backed, TTL-eviction |
| Cache an expensive calculation per (advisor, client) for 24h | episodic | Multi-dimensional scope, TTL |
| Conversation/plan state within a session that spans multiple workflow runs | session | First-class tier with 24h default TTL |
| Learn facts about a client over time and recall by similarity | semantic (semantic_memory()) | Vector search, indefinite retention |

Typed memory schemas

Memory slots can declare a Pydantic schema for read/write validation. The runtime stamps a schema_version on every stored value so migrations can pick the right deserializer when the schema evolves:

    from pydantic import BaseModel
    from waxell_sdk.core.specs.memory_spec import MemorySpec

    class ClientNote(BaseModel):
        summary: str
        confidence: float
        last_updated: str

    memory = {
        "notes": MemorySpec(
            tier="semantic",
            scope=["user_id", "portfolio_id"],
            schema=ClientNote,
            schema_version=2,
        ),
    }

On write, the runtime validates against the schema and serializes the model. On read, it rehydrates back to a ClientNote instance. A validation failure on read raises ValueError so the agent can catch and migrate.
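The catch-and-migrate pattern can be sketched without the SDK. To stay dependency-free, the example uses a standard-library dataclass as a stand-in for the Pydantic model, and the envelope layout (schema_version next to data) and the helper names are assumptions. The migration rule for version 1 values is also invented for the illustration.

```python
import json
from dataclasses import asdict, dataclass, fields


@dataclass
class ClientNote:  # dataclass stand-in for the Pydantic model above
    summary: str
    confidence: float
    last_updated: str


SCHEMA_VERSION = 2


def dump_note(note: ClientNote) -> str:
    """Serialize a note and stamp it with the current schema version."""
    return json.dumps({"schema_version": SCHEMA_VERSION, "data": asdict(note)})


def load_note(raw: str) -> ClientNote:
    """Rehydrate a note; raise ValueError if version or fields do not match."""
    envelope = json.loads(raw)
    data = envelope.get("data", {})
    expected = {f.name for f in fields(ClientNote)}
    if envelope.get("schema_version") != SCHEMA_VERSION or set(data) != expected:
        raise ValueError("stored value does not match ClientNote schema v2")
    return ClientNote(**data)


def load_with_migration(raw: str) -> ClientNote:
    """Catch the validation failure and upgrade an older value in place."""
    try:
        return load_note(raw)
    except ValueError:
        old = json.loads(raw)["data"]
        # Hypothetical v1 -> v2 rule: v1 values had no last_updated field.
        return ClientNote(old["summary"], old["confidence"], old.get("last_updated", "unknown"))


current = dump_note(ClientNote("Prefers dividends", 0.8, "2024-01-05"))
legacy = json.dumps({"schema_version": 1, "data": {"summary": "Risk averse", "confidence": 0.6}})
print(load_with_migration(current))
print(load_with_migration(legacy))
```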

Schemas are optional -- omit schema= for pre-3.4 dict-like behavior.

Common gotchas

  • tenant_id is implicit. Don't include it in your scope=[...] list -- the runtime always prepends it. Listing it explicitly works but is redundant.
  • Working tier ignores type. Scratchpad entries are always ScratchpadEntry dataclass values; type="dict" / type="list" only applies to episodic/session tiers.
  • auto_capture is per-spec, not global. If you declare two tier="working" slots, you can opt one in and one out.
  • $ref resolution is recursive. Pass $ref:foo.1 as a value, a nested dict field, or inside a list -- the runtime walks the structure and resolves all refs.
  • Local dev uses in-memory backends. InMemoryWorkingMemory mimics Redis for tests; the Redis backend takes over in production via the infra setup.py ready hook.
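The recursive resolution mentioned in the list above can be pictured as a walk over nested dicts and lists. The following is a minimal sketch, not the runtime's resolver: the function name resolve_refs, the plain dict used as the scratchpad, and the choice to raise KeyError on an unknown reference are all assumptions.

```python
def resolve_refs(value, scratchpad: dict, prefix: str = "$ref:"):
    """Return a copy of `value` with every '$ref:<key>' string replaced by stored data."""
    if isinstance(value, str) and value.startswith(prefix):
        key = value[len(prefix):]
        if key not in scratchpad:
            raise KeyError(f"unknown scratchpad ref: {value}")
        return scratchpad[key]
    if isinstance(value, dict):
        return {k: resolve_refs(v, scratchpad, prefix) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve_refs(v, scratchpad, prefix) for v in value]
    return value  # numbers, None, and ordinary strings pass through unchanged


scratchpad = {"foo.1": [1, 2, 3], "bar.2": {"rate": 0.05}}
tool_args = {
    "series": "$ref:foo.1",                        # top-level value
    "options": {"benchmark": "$ref:bar.2"},        # nested dict field
    "inputs": ["$ref:foo.1", "literal", 7],        # inside a list
}
print(resolve_refs(tool_args, scratchpad))
```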

Reference