Prompt Management

Waxell Observe includes a full prompt management system. Version your prompts, assign deployment labels like "production" and "staging", retrieve them at runtime via the SDK, and test variants in the playground -- all with content hashing that links prompts to their LLM call traces.

Core Concepts

Versions

Each prompt has a sequential version history (v1, v2, v3, ...). When you update a prompt's content, a new version is created. Old versions are preserved for auditing and rollback.

Labels

Labels are named pointers to specific versions. Common labels:

| Label | Purpose |
| --- | --- |
| production | The version currently served to users |
| staging | The version being tested before promotion |
| latest | The most recently created version |

Labels can be moved between versions at any time. For example, promoting staging to production is a single label update.
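As an illustration only (this dict is not a platform API), labels behave like named pointers from a label name to a version number, so a promotion is a single reassignment rather than a copy:

```python
# Illustrative model of labels as named pointers into the version history.
labels = {"staging": 4, "production": 3, "latest": 4}

# Promote: point "production" at whatever "staging" points to.
labels["production"] = labels["staging"]

print(labels["production"])  # 4
```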

Content Types

| Type | Content Format | Use Case |
| --- | --- | --- |
| text | Plain string with {{variable}} placeholders | System prompts, simple templates |
| chat | Array of {role, content} messages | Multi-message conversation templates |
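Concretely, the two content formats look like this (the values are hypothetical examples):

```python
# "text": a plain string with {{variable}} placeholders
text_content = "You are a helpful assistant for {{user_name}}."

# "chat": a list of {role, content} messages; placeholders can appear
# in any message's content
chat_content = [
    {"role": "system", "content": "You are a helpful assistant for {{user_name}}."},
    {"role": "user", "content": "{{user_query}}"},
]
```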

Content Hashing

Every prompt version gets a SHA-256 hash of its content. When the SDK retrieves a prompt and uses it in an LLM call, the prompt_hash field on the LlmCallRecord links back to the exact version used. This gives you full traceability from production call to prompt version.
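A content hash of this kind can be sketched as follows. Note that the exact canonicalization Waxell Observe applies before hashing (e.g. key ordering for chat content) is not specified here; this version simply serializes deterministically with sorted keys:

```python
import hashlib
import json

def content_hash(content) -> str:
    """Sketch of a SHA-256 content hash; the platform's actual
    canonicalization may differ. Works for both text (str) and
    chat (list) content."""
    canonical = json.dumps(content, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

h = content_hash("You are a helpful assistant for {{user_name}}.")
print(len(h))  # 64 hex characters
```

The key property is determinism: the same content always yields the same hash, which is what lets an LlmCallRecord's prompt_hash be matched back to a stored version.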

SDK Usage

Retrieving Prompts

Use the client to fetch a prompt by name, optionally specifying a label or version:

```python
from waxell_observe import WaxellObserveClient

client = WaxellObserveClient()

# Fetch the "production" label (recommended for production code)
prompt = await client.get_prompt("welcome-message", label="production")

# Fetch a specific version
prompt = await client.get_prompt("welcome-message", version=3)

# Fetch the latest version (the default when no label or version is specified)
prompt = await client.get_prompt("welcome-message")
```

Synchronous version:

```python
prompt = client.get_prompt_sync(name="welcome-message", label="production")
```

Compiling Templates

The returned PromptInfo object has a compile() method that substitutes {{variable}} placeholders:

```python
prompt = await client.get_prompt("welcome-message", label="production")

# Text prompt: returns a string
rendered = prompt.compile(user_name="Alice", company="Acme Corp")
# "Hello Alice! Welcome to Acme Corp."

# Chat prompt: returns a list of messages
rendered = prompt.compile(user_name="Alice")
# [{"role": "system", "content": "You are a helpful assistant for Alice."},
#  {"role": "user", "content": "Hello!"}]
```
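For intuition, the substitution that compile() performs can be approximated in a few lines. This compile_template helper is a sketch, not the SDK's actual implementation:

```python
import re

def compile_template(content, **variables):
    """Minimal sketch of {{variable}} substitution for both text (str)
    and chat (list of {role, content}) content. Unknown placeholders
    are left untouched."""
    def render(s: str) -> str:
        return re.sub(
            r"\{\{\s*(\w+)\s*\}\}",
            lambda m: str(variables.get(m.group(1), m.group(0))),
            s,
        )

    if isinstance(content, str):
        return render(content)
    return [{**msg, "content": render(msg["content"])} for msg in content]

print(compile_template("Hello {{user_name}}!", user_name="Alice"))
# Hello Alice!
```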

Full Example

```python
from waxell_observe import WaxellContext, WaxellObserveClient

client = WaxellObserveClient()

async with WaxellContext(agent_name="chat-agent", client=client) as ctx:
    # Fetch the production prompt
    prompt = await client.get_prompt("chat-system-prompt", label="production")

    # Compile with variables
    system_message = prompt.compile(
        user_name=user.display_name,
        context=relevant_docs,
    )

    # Use in LLM call
    response = await openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_message},
            {"role": "user", "content": user_query},
        ],
    )

    ctx.record_llm_call(
        model="gpt-4o",
        tokens_in=response.usage.prompt_tokens,
        tokens_out=response.usage.completion_tokens,
    )
```

PromptInfo Object

The get_prompt methods return a PromptInfo dataclass:

| Field | Type | Description |
| --- | --- | --- |
| name | str | Prompt name |
| version | int | Version number |
| prompt_type | str | "text" or "chat" |
| content | str \| list | Raw content (string for text, message list for chat) |
| config | dict | Associated configuration (model, temperature, etc.) |
| labels | list[str] | Labels pointing to this version |
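The table corresponds roughly to a dataclass shaped like the sketch below. This is an approximation for reference, not the SDK's actual class definition (which also provides the compile() method):

```python
from dataclasses import dataclass, field

@dataclass
class PromptInfo:
    """Approximate shape of the object returned by get_prompt."""
    name: str
    version: int
    prompt_type: str              # "text" or "chat"
    content: "str | list"         # string for text, message list for chat
    config: dict = field(default_factory=dict)
    labels: "list[str]" = field(default_factory=list)

info = PromptInfo(
    name="welcome-message",
    version=3,
    prompt_type="text",
    content="Hello {{user_name}}!",
    labels=["production"],
)
```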

REST API

All prompt management endpoints require session authentication, i.e. a logged-in UI session.

Prompts CRUD

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/v1/prompts/ | GET | List all prompts with latest version info and labels |
| /api/v1/prompts/ | POST | Create a prompt with initial version |
| /api/v1/prompts/{id}/ | GET | Prompt detail with all versions and labels |
| /api/v1/prompts/{id}/ | PUT | Update prompt metadata (name, description, tags) |
| /api/v1/prompts/{id}/ | DELETE | Delete prompt and all versions/labels |

Create a Prompt

```bash
curl -X POST "https://acme.waxell.dev/api/v1/prompts/" \
  -H "Cookie: sessionid=..." \
  -H "Content-Type: application/json" \
  -d '{
    "name": "chat-system-prompt",
    "description": "System prompt for the chat agent",
    "prompt_type": "text",
    "content": "You are a helpful assistant for {{user_name}}. Answer questions about {{topic}}.",
    "config": {"model": "gpt-4o", "temperature": 0.7},
    "tags": ["chat", "production"],
    "commit_message": "Initial version"
  }'
```

Versions

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/v1/prompts/{id}/versions/ | GET | List all versions |
| /api/v1/prompts/{id}/versions/ | POST | Create a new version |
| /api/v1/prompts/{id}/versions/{num}/ | GET | Get specific version with full content |

Create a new version:

```bash
curl -X POST "https://acme.waxell.dev/api/v1/prompts/{prompt_id}/versions/" \
  -H "Cookie: sessionid=..." \
  -H "Content-Type: application/json" \
  -d '{
    "content": "You are a helpful assistant for {{user_name}}. Answer questions about {{topic}} concisely.",
    "config": {"model": "gpt-4o", "temperature": 0.5},
    "commit_message": "Added conciseness instruction, lowered temperature"
  }'
```

Labels

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/v1/prompts/{id}/labels/{label}/ | PUT | Set or move a label to a version |
| /api/v1/prompts/{id}/labels/{label}/ | DELETE | Remove a label |

Set the "production" label to version 3:

```bash
curl -X PUT "https://acme.waxell.dev/api/v1/prompts/{prompt_id}/labels/production/" \
  -H "Cookie: sessionid=..." \
  -H "Content-Type: application/json" \
  -d '{"version": 3}'
```

Promote staging to production (two calls):

```bash
# Get the version that "staging" points to
STAGING_VERSION=$(curl -s ".../api/v1/prompts/{id}/" -H "Cookie: sessionid=..." \
  | jq '.labels[] | select(.label=="staging") | .version')

# Move "production" to that version
curl -X PUT ".../api/v1/prompts/{id}/labels/production/" \
  -H "Cookie: sessionid=..." \
  -H "Content-Type: application/json" \
  -d "{\"version\": $STAGING_VERSION}"
```

Playground

Test prompts with variable substitution and compare variants side by side.

Execute a single prompt:

POST /api/v1/prompts/playground/

```bash
curl -X POST "https://acme.waxell.dev/api/v1/prompts/playground/" \
  -H "Cookie: sessionid=..." \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Summarize the following in one sentence: {{text}}",
    "config": {"model": "gpt-4o-mini", "temperature": 0.3, "max_tokens": 256},
    "variables": {"text": "Waxell is an observability platform for AI agents..."}
  }'
```

Response:

```json
{
  "output": "Waxell provides observability and governance for AI agents.",
  "model": "gpt-4o-mini",
  "tokens_in": 42,
  "tokens_out": 12,
  "cost": 0.0001,
  "latency_ms": 340
}
```

Compare multiple variants:

POST /api/v1/prompts/playground/compare/

```bash
curl -X POST "https://acme.waxell.dev/api/v1/prompts/playground/compare/" \
  -H "Cookie: sessionid=..." \
  -H "Content-Type: application/json" \
  -d '{
    "variants": [
      {
        "content": "Summarize: {{text}}",
        "config": {"model": "gpt-4o-mini", "temperature": 0.3}
      },
      {
        "content": "Give a one-sentence summary of: {{text}}",
        "config": {"model": "gpt-4o-mini", "temperature": 0.7}
      },
      {
        "content": "Summarize: {{text}}",
        "config": {"model": "gpt-4o", "temperature": 0.3}
      }
    ]
  }'
```

Up to 10 variants can be compared in a single request. Each result includes output, token counts, cost, and latency for direct comparison.

Prompt Metrics

GET /api/v1/prompts/{id}/metrics/

Shows usage metrics per version, linked via content hash to LlmCallRecord:

```json
{
  "prompt_name": "chat-system-prompt",
  "totals": {
    "call_count": 1520,
    "total_tokens": 456000,
    "total_cost": 2.345678
  },
  "versions": [
    {
      "version": 3,
      "content_hash": "a1b2c3...",
      "call_count": 1200,
      "total_tokens": 360000,
      "total_cost": 1.845678
    },
    {
      "version": 2,
      "content_hash": "d4e5f6...",
      "call_count": 320,
      "total_tokens": 96000,
      "total_cost": 0.500000
    }
  ]
}
```
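The per-version rows make it easy to derive comparative figures. For example, share of traffic and average cost per call can be computed from a payload shaped like the response above:

```python
# Metrics payload mirroring the example response (abbreviated to the
# fields used here).
metrics = {
    "totals": {"call_count": 1520, "total_cost": 2.345678},
    "versions": [
        {"version": 3, "call_count": 1200, "total_cost": 1.845678},
        {"version": 2, "call_count": 320, "total_cost": 0.500000},
    ],
}

for v in metrics["versions"]:
    share = v["call_count"] / metrics["totals"]["call_count"]
    cost_per_call = v["total_cost"] / v["call_count"]
    print(f'v{v["version"]}: {share:.0%} of calls, ${cost_per_call:.6f}/call')
```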

Workflow: Prompt Lifecycle

  1. Create a prompt with an initial version
  2. Test in the playground with different variables and configurations
  3. Label the tested version as staging
  4. Deploy by pointing production to the staging version
  5. Monitor via prompt metrics to compare version performance
  6. Iterate by creating new versions and repeating the cycle

Next Steps

  • LLM Call Tracking -- See how prompts map to production LLM calls via content hashing
  • Scoring -- Attach quality scores to runs that use versioned prompts