Add User Feedback

Capture user feedback -- thumbs-up/down, star ratings, categorical labels -- and use it to find issues and improve your agents over time.

Prerequisites

  • Python 3.10+
  • waxell-observe installed and configured with an API key
  • A running application that creates observed runs

What You'll Learn

  • Record numeric, boolean, and categorical feedback scores
  • Capture feedback both inline (during a run) and after the fact (by run ID)
  • Build a feedback API endpoint for your application
  • Analyze feedback distributions in the dashboard

Step 1: Understand the Feedback Pattern

The feedback loop follows three stages:

  1. User interacts -- Your agent processes a request and produces a response
  2. User provides feedback -- A thumbs-up, a rating, a label
  3. You analyze -- Filter by low scores to find failing patterns

Waxell Observe supports three score data types:

  Type         Example                        Values
  numeric      Star rating, relevance score   Any float (commonly 0.0--1.0 or 1--5)
  boolean      Thumbs up/down                 True or False
  categorical  Quality label                  Any string (e.g. "helpful", "incorrect", "off-topic")
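The three data types map onto the score-record fields used later in this guide (numeric_value, string_value). As a mental model, here is an illustrative helper (plain Python, not part of the library) that shapes a raw value into a record by data type:

```python
def validate_score(name: str, value, data_type: str) -> dict:
    """Shape a raw feedback value into a score record by data type.

    Illustrative only -- the real client may build payloads differently.
    """
    if data_type == "numeric":
        # Numeric scores carry a float value
        return {"name": name, "data_type": "numeric", "numeric_value": float(value)}
    if data_type == "boolean":
        # Booleans are stored both as 1.0/0.0 and as "true"/"false"
        return {
            "name": name,
            "data_type": "boolean",
            "numeric_value": 1.0 if value else 0.0,
            "string_value": str(bool(value)).lower(),
        }
    if data_type == "categorical":
        # Categorical scores carry a string label
        return {"name": name, "data_type": "categorical", "string_value": str(value)}
    raise ValueError(f"unknown data_type: {data_type}")
```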

Step 2: Record Feedback Inline

The simplest approach records feedback with ctx.record_score() inside the same context that produced the response (Steps 3--5 show each data type). First, make sure the run_id reaches your frontend:

from waxell_observe import WaxellContext

async def chat(query: str, user_id: str) -> dict:
    async with WaxellContext(
        agent_name="support-bot",
        user_id=user_id,
    ) as ctx:
        response = await generate_response(query)
        ctx.set_result({"response": response})

        # Return the run_id so the frontend can submit feedback later
        return {
            "response": response,
            "run_id": ctx.run_id,
        }

When the user later clicks thumbs-up, record the score in a separate context or via the client directly (see Step 6).

Step 3: Record Numeric Feedback

Numeric scores work well for star ratings or confidence values:

# 5-star rating (normalized to 0-1)
ctx.record_score(
    name="user_rating",
    value=4 / 5,  # 0.8
    data_type="numeric",
    comment="User gave 4 out of 5 stars",
)

# Relevance score from 0 to 1
ctx.record_score(
    name="relevance",
    value=0.92,
    data_type="numeric",
)
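The 4 / 5 normalization above generalizes to any star scale. A tiny helper (illustrative, not part of the library) keeps the conversion in one place:

```python
def normalize_rating(stars: int, max_stars: int = 5) -> float:
    """Map an N-star rating onto the 0.0-1.0 range expected by numeric scores."""
    if not 0 <= stars <= max_stars:
        raise ValueError(f"rating must be between 0 and {max_stars}")
    return stars / max_stars
```

With this, a 4-star review on a 5-star scale becomes 0.8, and a 7/10 rating becomes 0.7, so scores from different UIs stay comparable.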

Step 4: Record Boolean Feedback

Boolean scores are ideal for thumbs-up/down:

# Thumbs up
ctx.record_score(
    name="thumbs_up",
    value=True,
    data_type="boolean",
)

# Was the answer correct?
ctx.record_score(
    name="correct",
    value=False,
    data_type="boolean",
    comment="User reported the answer was wrong",
)
Info: Boolean scores are stored internally as numeric_value=1.0 (True) or numeric_value=0.0 (False), so you can aggregate them as averages to get approval rates.
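Because booleans map to 1.0/0.0, an approval rate is simply the mean of those values. A minimal sketch in plain Python (no Waxell calls):

```python
def approval_rate(thumbs: list[bool]) -> float:
    """Mean of thumbs-up values, treating True as 1.0 and False as 0.0."""
    if not thumbs:
        return 0.0  # avoid division by zero when there is no feedback yet
    return sum(1.0 if t else 0.0 for t in thumbs) / len(thumbs)

print(approval_rate([True, True, False, True]))  # 0.75
```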

Step 5: Record Categorical Feedback

Categorical scores capture labeled feedback:

# Quality category
ctx.record_score(
    name="response_quality",
    value="helpful",
    data_type="categorical",
)

# Issue type (when the user reports a problem)
ctx.record_score(
    name="issue_type",
    value="off-topic",
    data_type="categorical",
    comment="Response did not address the question",
)

Step 6: Record Feedback After the Fact

Often, feedback arrives after the run has completed. Use the client's record_scores method with the run_id you saved earlier:

from waxell_observe import WaxellObserveClient

client = WaxellObserveClient()

async def submit_feedback(run_id: str, thumbs_up: bool, comment: str = ""):
    await client.record_scores(
        run_id=run_id,
        scores=[
            {
                "name": "thumbs_up",
                "data_type": "boolean",
                "numeric_value": 1.0 if thumbs_up else 0.0,
                "string_value": str(thumbs_up).lower(),
                "comment": comment,
            }
        ],
    )
Tip: Always return the run_id to your frontend so users can submit feedback against the correct run.

Step 7: Build a Feedback API Endpoint

Here is a complete FastAPI endpoint that accepts feedback from your frontend:

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from waxell_observe import WaxellObserveClient

app = FastAPI()
client = WaxellObserveClient()


class FeedbackRequest(BaseModel):
    run_id: str
    thumbs_up: bool | None = None
    rating: float | None = None
    category: str | None = None
    comment: str = ""


@app.post("/api/feedback")
async def submit_feedback(feedback: FeedbackRequest):
    scores = []

    if feedback.thumbs_up is not None:
        scores.append({
            "name": "thumbs_up",
            "data_type": "boolean",
            "numeric_value": 1.0 if feedback.thumbs_up else 0.0,
            "string_value": str(feedback.thumbs_up).lower(),
            "comment": feedback.comment,
        })

    if feedback.rating is not None:
        scores.append({
            "name": "user_rating",
            "data_type": "numeric",
            "numeric_value": feedback.rating,
            "comment": feedback.comment,
        })

    if feedback.category is not None:
        scores.append({
            "name": "response_quality",
            "data_type": "categorical",
            "string_value": feedback.category,
            "comment": feedback.comment,
        })

    if not scores:
        raise HTTPException(status_code=400, detail="No feedback provided")

    await client.record_scores(run_id=feedback.run_id, scores=scores)
    return {"status": "recorded"}

Your frontend can call this endpoint:

// After the user clicks thumbs-up
await fetch("/api/feedback", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    run_id: "run_abc123",
    thumbs_up: true,
    comment: "Great answer!",
  }),
});

Step 8: Analyze Feedback in the Dashboard

Open your Waxell dashboard and navigate to Observability > Evaluations:

  1. Score distributions -- See the breakdown of thumbs-up vs thumbs-down, average ratings, and category distributions across all runs.
  2. Filter by low scores -- Click into runs with low ratings to inspect the inputs, outputs, and LLM calls that produced poor results.
  3. Track over time -- Monitor how feedback scores trend as you improve prompts and agent logic.
  4. Compare agents -- If you have multiple agents, compare their feedback distributions side by side.
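The dashboard computes these breakdowns for you, but if you pull score records out for your own analysis, the category distribution in step 1 is straightforward to reproduce locally. An illustrative sketch over a plain list of labels:

```python
from collections import Counter

def category_distribution(labels: list[str]) -> dict[str, float]:
    """Fraction of runs per categorical label, e.g. from response_quality scores."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

# Example: two "helpful", one "off-topic", one "incorrect"
print(category_distribution(["helpful", "helpful", "off-topic", "incorrect"]))
```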
Tip: A quick way to find your worst-performing runs: filter by thumbs_up = false and sort by most recent. These are the runs your users flagged as unhelpful.

Next Steps