Letta (MemGPT)

A Letta-style stateful conversation pipeline with long-term memory management, spanning three agents. The runner loads agent state, reviews conversation history, searches archival memory, and sends messages. The evaluator synthesizes the response, assesses its quality against archival sources, and updates agent memory.
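The runner-to-evaluator flow can be sketched as plain Python. Every helper and value below is an illustrative stand-in, not the demo's actual API:

```python
# Illustrative sketch of the runner -> evaluator flow.
# All data and helper logic here are hypothetical stand-ins.
def run_pipeline(agent_id: str, query: str) -> dict:
    agent = {"agent_id": agent_id, "history": ["hi", "hello"]}     # load agent state
    history_count = len(agent["history"])                          # review history
    archival = [{"text": "deployment patterns doc", "score": 0.9}] # search archival memory
    answer = f"Based on {len(archival)} archival entries: ..."     # synthesize response
    grounded = any(r["score"] > 0.5 for r in archival)             # assess quality
    return {"answer": answer, "grounded": grounded,
            "history_count": history_count}

result = run_pipeline("agent-001", "Tell me about deployment patterns")
```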

Environment variables

This example requires OPENAI_API_KEY, WAXELL_API_KEY, and WAXELL_API_URL. Use --dry-run to run without any API keys.

Architecture

Key Code

Runner with Letta API operations as @tool and archival @retrieval

Each Letta API call is recorded as a tool call, with archival memory search as retrieval.

@waxell.tool(tool_type="agent_framework")
def letta_get_agent(agent_id: str) -> dict:
    """Load a Letta agent by ID."""
    return {"agent_id": agent_id, "name": MOCK_AGENT_NAME,
            "memory_state": MOCK_MEMORY_STATE}

@waxell.tool(tool_type="agent_framework")
def letta_send_message(agent_id: str, message: str) -> dict:
    """Send a message to a Letta agent and get its response."""
    response_messages = MOCK_SEND_RESPONSE["messages"]
    return {"response_messages_count": len(response_messages),
            "function_calls": len([m for m in response_messages
                                   if m["message_type"] == "function_call"])}

@waxell.retrieval(source="letta_archival")
def search_archival_memory(query: str, agent_id: str) -> list[dict]:
    """Search a Letta agent's archival memory for relevant entries."""
    return [{"text": entry["text"], "score": entry["score"],
             "source": "archival_memory"} for entry in MOCK_ARCHIVAL_RESULTS]

Evaluator with @reasoning, memory update, and scores

The evaluator synthesizes the response, assesses quality, and updates archival memory.

@waxell.reasoning_dec(step="response_quality_assessment")
async def evaluate_response_quality(answer: str, archival_results: list, history_count: int) -> dict:
    coverage = len([r for r in archival_results
                    if any(word in answer.lower() for word in r["text"].lower().split()[:3])])
    return {
        "thought": f"Evaluating against {len(archival_results)} archival sources.",
        "evidence": [f"Archival: {r['text'][:60]}... (score: {r['score']})"
                     for r in archival_results],
        "conclusion": "Response well-grounded in archival memory"
                      if coverage > 0 else "Response may lack archival grounding",
    }

# Update agent memory
letta_archival_memory_insert(agent_id=agent_id, text=f"User asked about: {query[:100]}")

waxell.score("response_quality", 0.88, comment="against archival memory")
waxell.score("memory_coherence", True, data_type="boolean", comment="consistent with stored state")
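The grounding heuristic used above — counting archival entries whose leading words appear in the answer — can be run standalone. The sample answer and entries below are invented for illustration:

```python
def archival_coverage(answer: str, archival_results: list[dict]) -> int:
    """Count entries whose first three words overlap with the answer text."""
    answer_lc = answer.lower()
    return len([r for r in archival_results
                if any(w in answer_lc for w in r["text"].lower().split()[:3])])

# Hypothetical inputs: one grounded entry, one unrelated entry.
cov = archival_coverage(
    "Deployment uses blue-green rollouts in production.",
    [{"text": "Deployment uses blue-green rollouts.", "score": 0.92},
     {"text": "Unrelated entry about billing.", "score": 0.30}],
)
```

Note this is a coarse substring heuristic: a single shared leading word is enough to count an entry as covered.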

What this demonstrates

  • @waxell.observe -- parent-child agent hierarchy with automatic lineage
  • @waxell.step_dec -- pipeline initialization recorded as step
  • @waxell.tool -- Letta API operations with tool_type="agent_framework"
  • @waxell.retrieval -- archival memory search with source="letta_archival"
  • @waxell.decision -- memory strategy selection (archival, recall, or hybrid)
  • waxell.decide() -- manual agent routing decision
  • @waxell.reasoning_dec -- response quality assessment against archival sources
  • waxell.score() -- response quality and memory coherence scores
  • Auto-instrumented LLM calls -- synthesis call captured automatically
  • Letta pattern -- stateful agent with archival memory, recall buffer, and function calls
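The memory strategy selection (@waxell.decision) is listed above but not shown in the code excerpts. A hypothetical sketch of choosing between archival, recall, and hybrid lookups — the keywords and rules here are invented, not the demo's actual logic:

```python
def choose_memory_strategy(query: str, recall_buffer_size: int) -> str:
    """Pick where to look for context: recent turns, long-term store, or both.
    The trigger phrases and thresholds are illustrative assumptions."""
    asks_history = any(w in query.lower() for w in ("earlier", "previous", "last time"))
    if asks_history and recall_buffer_size > 0:
        return "recall"      # recent conversation buffer is enough
    if asks_history:
        return "hybrid"      # buffer empty: consult both buffer and archive
    return "archival"        # factual lookup goes to long-term memory

strategy = choose_memory_strategy("What did we discuss last time?", 5)
```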

Run it

# Dry-run (no API keys needed)
cd dev/waxell-dev
python -m app.demos.letta_agent --dry-run

# Live (real OpenAI)
export OPENAI_API_KEY="sk-..."
python -m app.demos.letta_agent

# Custom query
python -m app.demos.letta_agent --query "Tell me about deployment patterns"

Source

dev/waxell-dev/app/demos/letta_agent.py