Skip to main content

Prompt Management

Waxell Observe includes a full prompt management system. Version your prompts, assign deployment labels like "production" and "staging", retrieve them at runtime via the SDK, and test variants in the playground -- all with content hashing that links prompts to their LLM call traces.

Core Concepts

Versions

Each prompt has a sequential version history (v1, v2, v3, ...). When you update a prompt's content, a new version is created. Old versions are preserved for auditing and rollback.

Labels

Labels are named pointers to specific versions. Common labels:

LabelPurpose
productionThe version currently served to users
stagingThe version being tested before promotion
latestThe most recently created version

Labels can be moved between versions at any time. For example, promoting staging to production is a single label update.

Content Types

TypeContent FormatUse Case
textPlain string with {{variable}} placeholdersSystem prompts, simple templates
chatArray of {role, content} messagesMulti-message conversation templates

Content Hashing

Every prompt version gets a SHA-256 hash of its content. When the SDK retrieves a prompt and uses it in an LLM call, the prompt_hash field on the LlmCallRecord links back to the exact version used. This gives you full traceability from production call to prompt version.

SDK Usage

Retrieving Prompts

Use the client to fetch a prompt by name, optionally specifying a label or version:

from waxell_observe import WaxellObserveClient

client = WaxellObserveClient()

# Fetch the "production" label (recommended for production code)
prompt = await client.get_prompt("welcome-message", label="production")

# Fetch a specific version
prompt = await client.get_prompt("welcome-message", version=3)

# Fetch the latest version (default when no label or version specified)
prompt = await client.get_prompt("welcome-message")

Synchronous version:

prompt = client.get_prompt_sync(name="welcome-message", label="production")

Compiling Templates

The returned PromptInfo object has a compile() method that substitutes {{variable}} placeholders:

prompt = await client.get_prompt("welcome-message", label="production")

# Text prompt: returns a string
rendered = prompt.compile(user_name="Alice", company="Acme Corp")
# "Hello Alice! Welcome to Acme Corp."

# Chat prompt: returns a list of messages
rendered = prompt.compile(user_name="Alice")
# [{"role": "system", "content": "You are a helpful assistant for Alice."},
# {"role": "user", "content": "Hello!"}]

Full Example

Use @observe for the run and let init()'s auto-instrumentation capture the LLM call. Fetch the prompt inside the function and use it normally -- the SDK links the LLM call to the prompt version via content hashing.

import waxell_observe as waxell
from waxell_observe import WaxellObserveClient

waxell.init() # BEFORE importing the LLM SDK -- enables auto-capture

import openai
openai_client = openai.OpenAI()
prompt_client = WaxellObserveClient()

@waxell.observe(agent_name="chat-agent")
async def chat(user_query: str, user_display_name: str, relevant_docs: str) -> str:
# Fetch the production prompt
prompt = await prompt_client.get_prompt("chat-system-prompt", label="production")

# Compile with variables
system_message = prompt.compile(
user_name=user_display_name,
context=relevant_docs,
)

# LLM call -- auto-captured by init(), linked to prompt version via content hash
response = openai_client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": system_message},
{"role": "user", "content": user_query},
],
)
return response.choices[0].message.content

PromptInfo Object

The get_prompt methods return a PromptInfo dataclass:

FieldTypeDescription
namestrPrompt name
versionintVersion number
prompt_typestr"text" or "chat"
contentstr | listRaw content (string for text, message list for chat)
configdictAssociated configuration (model, temperature, etc.)
labelslist[str]Labels pointing to this version

REST API

All prompt management endpoints require session authentication (UI).

Prompts CRUD

EndpointMethodDescription
/api/v1/prompts/GETList all prompts with latest version info and labels
/api/v1/prompts/POSTCreate a prompt with initial version
/api/v1/prompts/{id}/GETPrompt detail with all versions and labels
/api/v1/prompts/{id}/PUTUpdate prompt metadata (name, description, tags)
/api/v1/prompts/{id}/DELETEDelete prompt and all versions/labels

Create a Prompt

curl -X POST "https://acme.waxell.dev/api/v1/prompts/" \
-H "Cookie: sessionid=..." \
-H "Content-Type: application/json" \
-d '{
"name": "chat-system-prompt",
"description": "System prompt for the chat agent",
"prompt_type": "text",
"content": "You are a helpful assistant for {{user_name}}. Answer questions about {{topic}}.",
"config": {"model": "gpt-4o", "temperature": 0.7},
"tags": ["chat", "production"],
"commit_message": "Initial version"
}'

Versions

EndpointMethodDescription
/api/v1/prompts/{id}/versions/GETList all versions
/api/v1/prompts/{id}/versions/POSTCreate a new version
/api/v1/prompts/{id}/versions/{num}/GETGet specific version with full content

Create a new version:

curl -X POST "https://acme.waxell.dev/api/v1/prompts/{prompt_id}/versions/" \
-H "Cookie: sessionid=..." \
-H "Content-Type: application/json" \
-d '{
"content": "You are a helpful assistant for {{user_name}}. Answer questions about {{topic}} concisely.",
"config": {"model": "gpt-4o", "temperature": 0.5},
"commit_message": "Added conciseness instruction, lowered temperature"
}'

Labels

EndpointMethodDescription
/api/v1/prompts/{id}/labels/{label}/PUTSet or move a label to a version
/api/v1/prompts/{id}/labels/{label}/DELETERemove a label

Set the "production" label to version 3:

curl -X PUT "https://acme.waxell.dev/api/v1/prompts/{prompt_id}/labels/production/" \
-H "Cookie: sessionid=..." \
-H "Content-Type: application/json" \
-d '{"version": 3}'

Promote staging to production (two calls):

# Get the version that "staging" points to
STAGING_VERSION=$(curl -s ".../api/v1/prompts/{id}/" -H "Cookie: sessionid=..." \
| jq '.labels[] | select(.label=="staging") | .version')

# Move "production" to that version
curl -X PUT ".../api/v1/prompts/{id}/labels/production/" \
-H "Cookie: sessionid=..." \
-H "Content-Type: application/json" \
-d "{\"version\": $STAGING_VERSION}"

Playground

Test prompts with variable substitution and compare variants side by side.

Execute a single prompt:

POST /api/v1/prompts/playground/
curl -X POST "https://acme.waxell.dev/api/v1/prompts/playground/" \
-H "Cookie: sessionid=..." \
-H "Content-Type: application/json" \
-d '{
"content": "Summarize the following in one sentence: {{text}}",
"config": {"model": "gpt-4o-mini", "temperature": 0.3, "max_tokens": 256},
"variables": {"text": "Waxell is an observability platform for AI agents..."}
}'

Response:

{
"output": "Waxell provides observability and governance for AI agents.",
"model": "gpt-4o-mini",
"tokens_in": 42,
"tokens_out": 12,
"cost": 0.0001,
"latency_ms": 340
}

Compare multiple variants:

POST /api/v1/prompts/playground/compare/
curl -X POST "https://acme.waxell.dev/api/v1/prompts/playground/compare/" \
-H "Cookie: sessionid=..." \
-H "Content-Type: application/json" \
-d '{
"variants": [
{
"content": "Summarize: {{text}}",
"config": {"model": "gpt-4o-mini", "temperature": 0.3}
},
{
"content": "Give a one-sentence summary of: {{text}}",
"config": {"model": "gpt-4o-mini", "temperature": 0.7}
},
{
"content": "Summarize: {{text}}",
"config": {"model": "gpt-4o", "temperature": 0.3}
}
]
}'

Up to 10 variants can be compared in a single request. Each result includes output, token counts, cost, and latency for direct comparison.

Prompt Metrics

GET /api/v1/prompts/{id}/metrics/

Shows usage metrics per version, linked via content hash to LlmCallRecord:

{
"prompt_name": "chat-system-prompt",
"totals": {
"call_count": 1520,
"total_tokens": 456000,
"total_cost": 2.345678
},
"versions": [
{
"version": 3,
"content_hash": "a1b2c3...",
"call_count": 1200,
"total_tokens": 360000,
"total_cost": 1.845678
},
{
"version": 2,
"content_hash": "d4e5f6...",
"call_count": 320,
"total_tokens": 96000,
"total_cost": 0.500000
}
]
}

Label Cascade (Fallback to Latest)

By default, requesting a label that does not exist raises an error. You can opt into fallback behavior where a missing label resolves to the latest version instead:

from waxell_observe import WaxellObserveClient

client = WaxellObserveClient()

# Strict mode (default) -- raises if "staging" label doesn't exist
prompt = await client.get_prompt("welcome-message", label="staging")

# Cascade mode -- falls back to latest if "staging" is missing
prompt = await client.get_prompt(
"welcome-message",
label="staging",
fallback_to_latest=True,
)

Cascade rules:

  • Triggers only when fallback_to_latest=True, a label is specified, and the label does not exist on this prompt
  • Pinned-version lookups (version=3) never cascade -- they fail loudly so you notice the version is gone
  • A warning is logged when the fallback fires: label 'staging' missing for 'welcome-message' -- falling back to latest
  • The result is cached under the original label key for the standard 30-second TTL
caution

Do not enable fallback_to_latest=True as a global default. It silently weakens label guarantees. Use it only in environments where "any version is better than none" (e.g., local development, demo instances).

Rollback

Labels have an audit trail that records every move (e.g., production moved from v2 to v3). The rollback endpoint lets you revert a label to the version it was on before a specific move, with a reason for the audit log:

# Rollback the most recent label move
curl -X POST "https://acme.waxell.dev/api/v1/prompts/{prompt_id}/label-history/{event_id}/rollback/" \
-H "Cookie: sessionid=..." \
-H "Content-Type: application/json" \
-d '{"reason": "incident-12: prod regression"}'

Response:

{
"label": "production",
"version": 2,
"rolled_back_from_event_id": "...",
"prior_version": 3,
"created": false
}

The rollback creates a new history entry so the full chain of moves remains auditable. Rollback is refused in these cases:

StatusReason
400The event is a CREATED event (no prior version to revert to)
403The label is protected -- use the normal label PUT flow which routes through approval
410The target version was deleted

Prompt Discovery

Discovery surfaces recurring unregistered prompts -- LLM calls that aren't linked to any prompt in the registry. Waxell clusters them by a SHA-256 fingerprint of the prompt content.

Viewing Discovered Prompts

# List discovered prompt clusters
curl "https://acme.waxell.dev/api/v1/prompts/discover/?days=7" \
-H "Cookie: sessionid=..."

Each cluster shows the fingerprint, a content preview, the agents that used it, and a call count.

Registering Discovered Prompts

Register clusters one at a time or in batch. Batch registration auto-names prompts from the agent name and handles collisions by appending _2, _3, etc.:

# Register all discovered clusters in one call
curl -X POST "https://acme.waxell.dev/api/v1/prompts/discover/register-batch/" \
-H "Cookie: sessionid=..." \
-H "Content-Type: application/json" \
-d '{
"items": [
{"fingerprint": "a1b2c3d4e5f6a7b8"},
{"fingerprint": "f8e7d6c5b4a39281", "name": "custom-name"}
],
"default_label": "production",
"default_tags": ["discovered"]
}'

Response:

{
"registered": [
{"fingerprint": "a1b2c3d4e5f6a7b8", "name": "support-bot_system", "prompt_id": "...", "version": 1},
{"fingerprint": "f8e7d6c5b4a39281", "name": "custom-name", "prompt_id": "...", "version": 1}
],
"skipped": []
}

Ignoring Clusters

Dismiss noise (test data, one-off scripts) so it stops appearing in the discover view:

# Ignore a cluster for 30 days
curl -X POST "https://acme.waxell.dev/api/v1/prompts/discover/ignored/" \
-H "Cookie: sessionid=..." \
-H "Content-Type: application/json" \
-d '{"fingerprint": "a1b2c3d4e5f6a7b8", "days": 30, "note": "PII test data"}'

Ignored clusters automatically resurface when the ignore window expires. You can also remove an ignore early with DELETE /api/v1/prompts/discover/ignored/{fingerprint}/.

CLI

The wax CLI provides the same discover and register workflow:

# View unregistered prompt clusters
wax prompt discover

# Register all clusters (with dry-run first)
wax prompt register-all-clusters --dry-run
wax prompt register-all-clusters --label production --tags discovered

Workflow: Prompt Lifecycle

  1. Create a prompt with an initial version
  2. Test in the playground with different variables and configurations
  3. Label the tested version as staging
  4. Deploy by pointing production to the staging version
  5. Monitor via prompt metrics to compare version performance
  6. Rollback if a regression is detected -- revert the label with an audit trail
  7. Discover unregistered prompts in production and bring them into the registry
  8. Iterate by creating new versions and repeating the cycle

Next Steps

  • LLM Call Tracking -- See how prompts map to production LLM calls via content hashing
  • Scoring -- Attach quality scores to runs that use versioned prompts
  • Evaluators -- Attach automated quality checks to prompt versions
  • Datasets & Experiments -- Compare prompt versions systematically