Prompt Management

Waxell Observe includes a full prompt management system. Version your prompts, assign deployment labels like "production" and "staging", retrieve them at runtime via the SDK, and test variants in the playground -- all with content hashing that links prompts to their LLM call traces.

Core Concepts

Versions

Each prompt has a sequential version history (v1, v2, v3, ...). When you update a prompt's content, a new version is created. Old versions are preserved for auditing and rollback.

Labels

Labels are named pointers to specific versions. Common labels:

| Label | Purpose |
| --- | --- |
| production | The version currently served to users |
| staging | The version being tested before promotion |
| latest | The most recently created version |

Labels can be moved between versions at any time. For example, promoting staging to production is a single label update.
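As an illustration only (this dict is not a platform API), labels behave like named pointers from a label name to a version number, so a promotion is a single reassignment rather than a copy:

```python
# Illustrative model of labels as named pointers into the version history.
labels = {"staging": 4, "production": 3, "latest": 4}

# Promote: point "production" at whatever "staging" points to.
labels["production"] = labels["staging"]

print(labels["production"])  # 4
```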

Content Types

| Type | Content Format | Use Case |
| --- | --- | --- |
| text | Plain string with {{variable}} placeholders | System prompts, simple templates |
| chat | Array of {role, content} messages | Multi-message conversation templates |
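Concretely, the two content formats look like this (the values are hypothetical examples):

```python
# "text": a plain string with {{variable}} placeholders
text_content = "You are a helpful assistant for {{user_name}}."

# "chat": a list of {role, content} messages; placeholders can appear
# in any message's content
chat_content = [
    {"role": "system", "content": "You are a helpful assistant for {{user_name}}."},
    {"role": "user", "content": "{{user_query}}"},
]
```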

Content Hashing

Every prompt version gets a SHA-256 hash of its content. When the SDK retrieves a prompt and uses it in an LLM call, the prompt_hash field on the LlmCallRecord links back to the exact version used. This gives you full traceability from production call to prompt version.
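A content hash of this kind can be sketched as follows. Note that the exact canonicalization Waxell Observe applies before hashing (e.g. key ordering for chat content) is not specified here; this version simply serializes deterministically with sorted keys:

```python
import hashlib
import json

def content_hash(content) -> str:
    """Sketch of a SHA-256 content hash; the platform's actual
    canonicalization may differ. Works for both text (str) and
    chat (list) content."""
    canonical = json.dumps(content, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

h = content_hash("You are a helpful assistant for {{user_name}}.")
print(len(h))  # 64 hex characters
```

The key property is determinism: the same content always yields the same hash, which is what lets an LlmCallRecord's prompt_hash be matched back to a stored version.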

SDK Usage

Retrieving Prompts

Use the client to fetch a prompt by name, optionally specifying a label or version:

```python
from waxell_observe import WaxellObserveClient

client = WaxellObserveClient()

# Fetch the "production" label (recommended for production code)
prompt = await client.get_prompt("welcome-message", label="production")

# Fetch a specific version
prompt = await client.get_prompt("welcome-message", version=3)

# Fetch the latest version (the default when no label or version is specified)
prompt = await client.get_prompt("welcome-message")
```

Synchronous version:

```python
prompt = client.get_prompt_sync(name="welcome-message", label="production")
```

Compiling Templates

The returned PromptInfo object has a compile() method that substitutes {{variable}} placeholders:

```python
prompt = await client.get_prompt("welcome-message", label="production")

# Text prompt: returns a string
rendered = prompt.compile(user_name="Alice", company="Acme Corp")
# "Hello Alice! Welcome to Acme Corp."

# Chat prompt: returns a list of messages
rendered = prompt.compile(user_name="Alice")
# [{"role": "system", "content": "You are a helpful assistant for Alice."},
#  {"role": "user", "content": "Hello!"}]
```
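For intuition, the substitution that compile() performs can be approximated in a few lines. This compile_template helper is a sketch, not the SDK's actual implementation:

```python
import re

def compile_template(content, **variables):
    """Minimal sketch of {{variable}} substitution for both text (str)
    and chat (list of {role, content}) content. Unknown placeholders
    are left untouched."""
    def render(s: str) -> str:
        return re.sub(
            r"\{\{\s*(\w+)\s*\}\}",
            lambda m: str(variables.get(m.group(1), m.group(0))),
            s,
        )

    if isinstance(content, str):
        return render(content)
    return [{**msg, "content": render(msg["content"])} for msg in content]

print(compile_template("Hello {{user_name}}!", user_name="Alice"))
# Hello Alice!
```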

Full Example

```python
from waxell_observe import WaxellContext, WaxellObserveClient

client = WaxellObserveClient()

async with WaxellContext(agent_name="chat-agent", client=client) as ctx:
    # Fetch the production prompt
    prompt = await client.get_prompt("chat-system-prompt", label="production")

    # Compile with variables
    system_message = prompt.compile(
        user_name=user.display_name,
        context=relevant_docs,
    )

    # Use in LLM call
    response = await openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_message},
            {"role": "user", "content": user_query},
        ],
    )

    ctx.record_llm_call(
        model="gpt-4o",
        tokens_in=response.usage.prompt_tokens,
        tokens_out=response.usage.completion_tokens,
    )
```

PromptInfo Object

The get_prompt methods return a PromptInfo dataclass:

| Field | Type | Description |
| --- | --- | --- |
| name | str | Prompt name |
| version | int | Version number |
| prompt_type | str | "text" or "chat" |
| content | str \| list | Raw content (string for text, message list for chat) |
| config | dict | Associated configuration (model, temperature, etc.) |
| labels | list[str] | Labels pointing to this version |
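The table corresponds roughly to a dataclass shaped like the sketch below. This is an approximation for reference, not the SDK's actual class definition (which also provides the compile() method):

```python
from dataclasses import dataclass, field

@dataclass
class PromptInfo:
    """Approximate shape of the object returned by get_prompt."""
    name: str
    version: int
    prompt_type: str              # "text" or "chat"
    content: "str | list"         # string for text, message list for chat
    config: dict = field(default_factory=dict)
    labels: "list[str]" = field(default_factory=list)

info = PromptInfo(
    name="welcome-message",
    version=3,
    prompt_type="text",
    content="Hello {{user_name}}!",
    labels=["production"],
)
```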

REST API

All prompt management endpoints require session authentication, i.e. a logged-in UI session.

Prompts CRUD

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/v1/prompts/ | GET | List all prompts with latest version info and labels |
| /api/v1/prompts/ | POST | Create a prompt with initial version |
| /api/v1/prompts/{id}/ | GET | Prompt detail with all versions and labels |
| /api/v1/prompts/{id}/ | PUT | Update prompt metadata (name, description, tags) |
| /api/v1/prompts/{id}/ | DELETE | Delete prompt and all versions/labels |

Create a Prompt

```bash
curl -X POST "https://acme.waxell.dev/api/v1/prompts/" \
  -H "Cookie: sessionid=..." \
  -H "Content-Type: application/json" \
  -d '{
    "name": "chat-system-prompt",
    "description": "System prompt for the chat agent",
    "prompt_type": "text",
    "content": "You are a helpful assistant for {{user_name}}. Answer questions about {{topic}}.",
    "config": {"model": "gpt-4o", "temperature": 0.7},
    "tags": ["chat", "production"],
    "commit_message": "Initial version"
  }'
```

Versions

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/v1/prompts/{id}/versions/ | GET | List all versions |
| /api/v1/prompts/{id}/versions/ | POST | Create a new version |
| /api/v1/prompts/{id}/versions/{num}/ | GET | Get specific version with full content |

Create a new version:

```bash
curl -X POST "https://acme.waxell.dev/api/v1/prompts/{prompt_id}/versions/" \
  -H "Cookie: sessionid=..." \
  -H "Content-Type: application/json" \
  -d '{
    "content": "You are a helpful assistant for {{user_name}}. Answer questions about {{topic}} concisely.",
    "config": {"model": "gpt-4o", "temperature": 0.5},
    "commit_message": "Added conciseness instruction, lowered temperature"
  }'
```

Labels

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/v1/prompts/{id}/labels/{label}/ | PUT | Set or move a label to a version |
| /api/v1/prompts/{id}/labels/{label}/ | DELETE | Remove a label |

Set the "production" label to version 3:

```bash
curl -X PUT "https://acme.waxell.dev/api/v1/prompts/{prompt_id}/labels/production/" \
  -H "Cookie: sessionid=..." \
  -H "Content-Type: application/json" \
  -d '{"version": 3}'
```

Promote staging to production (two calls):

```bash
# Get the version that "staging" points to
STAGING_VERSION=$(curl -s ".../api/v1/prompts/{id}/" -H "Cookie: sessionid=..." \
  | jq '.labels[] | select(.label=="staging") | .version')

# Move "production" to that version
curl -X PUT ".../api/v1/prompts/{id}/labels/production/" \
  -H "Cookie: sessionid=..." \
  -H "Content-Type: application/json" \
  -d "{\"version\": $STAGING_VERSION}"
```

Playground

Test prompts with variable substitution and compare variants side by side.

Execute a single prompt:

POST /api/v1/prompts/playground/

```bash
curl -X POST "https://acme.waxell.dev/api/v1/prompts/playground/" \
  -H "Cookie: sessionid=..." \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Summarize the following in one sentence: {{text}}",
    "config": {"model": "gpt-4o-mini", "temperature": 0.3, "max_tokens": 256},
    "variables": {"text": "Waxell is an observability platform for AI agents..."}
  }'
```

Response:

```json
{
  "output": "Waxell provides observability and governance for AI agents.",
  "model": "gpt-4o-mini",
  "tokens_in": 42,
  "tokens_out": 12,
  "cost": 0.0001,
  "latency_ms": 340
}
```

Compare multiple variants:

POST /api/v1/prompts/playground/compare/

```bash
curl -X POST "https://acme.waxell.dev/api/v1/prompts/playground/compare/" \
  -H "Cookie: sessionid=..." \
  -H "Content-Type: application/json" \
  -d '{
    "variants": [
      {
        "content": "Summarize: {{text}}",
        "config": {"model": "gpt-4o-mini", "temperature": 0.3}
      },
      {
        "content": "Give a one-sentence summary of: {{text}}",
        "config": {"model": "gpt-4o-mini", "temperature": 0.7}
      },
      {
        "content": "Summarize: {{text}}",
        "config": {"model": "gpt-4o", "temperature": 0.3}
      }
    ]
  }'
```

Up to 10 variants can be compared in a single request. Each result includes output, token counts, cost, and latency for direct comparison.

Prompt Metrics

GET /api/v1/prompts/{id}/metrics/

Shows usage metrics per version, linked via content hash to LlmCallRecord:

```json
{
  "prompt_name": "chat-system-prompt",
  "totals": {
    "call_count": 1520,
    "total_tokens": 456000,
    "total_cost": 2.345678
  },
  "versions": [
    {
      "version": 3,
      "content_hash": "a1b2c3...",
      "call_count": 1200,
      "total_tokens": 360000,
      "total_cost": 1.845678
    },
    {
      "version": 2,
      "content_hash": "d4e5f6...",
      "call_count": 320,
      "total_tokens": 96000,
      "total_cost": 0.500000
    }
  ]
}
```
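The per-version rows make it easy to derive comparative figures. For example, share of traffic and average cost per call can be computed from a payload shaped like the response above:

```python
# Metrics payload mirroring the example response (abbreviated to the
# fields used here).
metrics = {
    "totals": {"call_count": 1520, "total_cost": 2.345678},
    "versions": [
        {"version": 3, "call_count": 1200, "total_cost": 1.845678},
        {"version": 2, "call_count": 320, "total_cost": 0.500000},
    ],
}

for v in metrics["versions"]:
    share = v["call_count"] / metrics["totals"]["call_count"]
    cost_per_call = v["total_cost"] / v["call_count"]
    print(f'v{v["version"]}: {share:.0%} of calls, ${cost_per_call:.6f}/call')
```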

Workflow: Prompt Lifecycle

  1. Create a prompt with an initial version
  2. Test in the playground with different variables and configurations
  3. Label the tested version as staging
  4. Deploy by pointing production to the staging version
  5. Monitor via prompt metrics to compare version performance
  6. Iterate by creating new versions and repeating the cycle

Next Steps

  • LLM Call Tracking -- See how prompts map to production LLM calls via content hashing
  • Scoring -- Attach quality scores to runs that use versioned prompts