
Identity Policy

The identity policy category governs how agents represent themselves. It enforces two distinct rules:

  1. AI disclosure -- every agent response must include a configurable disclosure text identifying it as AI-generated
  2. Impersonation prevention -- agent outputs must not match patterns that claim human identity or professional roles

Use this policy when agents interact with end-users and transparency is required by your product standards, terms of service, or applicable regulations.

Rules

| Rule | Type | Default | Description |
| --- | --- | --- | --- |
| require_ai_disclosure | boolean | true | Require the disclosure text to appear in the agent's output |
| disclosure_text | string | "This response was generated by an AI assistant." | Exact text that must appear in the output (substring match, case-insensitive) |
| disclosure_position | string | "prepend" | Hint for placement: "prepend", "append", or "footer". Used in warning metadata only -- the handler does not inject text |
| prevent_impersonation | boolean | true | Block outputs that match configured impersonation patterns |
| impersonation_patterns | string[] | see defaults | Substring patterns that indicate the agent is claiming to be human |
| agent_identity_header | boolean | true | Tag governance metadata with an identity header marker for trace inspection |
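The rule set above can be pictured as a small dataclass. This is only a sketch to make the defaults concrete -- the IdentityRules class is hypothetical, not part of the SDK:

```python
from dataclasses import dataclass, field

# Hypothetical container mirroring the rules table; defaults match the documented defaults.
@dataclass
class IdentityRules:
    require_ai_disclosure: bool = True
    disclosure_text: str = "This response was generated by an AI assistant."
    disclosure_position: str = "prepend"  # "prepend" | "append" | "footer" (metadata hint only)
    prevent_impersonation: bool = True
    impersonation_patterns: list[str] = field(default_factory=lambda: [
        "I am a human",
        "I am not an AI",
        "I am not a bot",
        "I am a real person",
    ])
    agent_identity_header: bool = True
```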

Default Impersonation Patterns

"I am a human"
"I am not an AI"
"I am not a bot"
"I am a real person"

Any string in impersonation_patterns is checked as a case-insensitive substring against the output text.

How It Works

The identity handler runs at all three enforcement phases, with different checks at each:

| Phase | What It Checks | Violation Action |
| --- | --- | --- |
| before_workflow | Stores identity rules into context | Always ALLOW (setup only) |
| mid_execution | Scans intermediate_outputs for impersonation patterns | BLOCK on match |
| after_workflow | Checks output_text for impersonation and AI disclosure | BLOCK on impersonation; WARN on missing disclosure |

Enforcement Detail

mid_execution: Each time ctx.record_intermediate_output(output=...) is called, the handler scans the accumulated list of intermediate outputs. If any output matches an impersonation pattern, the agent is blocked immediately.

after_workflow: Runs two checks in sequence:

  1. Impersonation check on output_text -- BLOCK if matched (same patterns, highest priority)
  2. Disclosure check on output_text -- WARN if disclosure_text is not found as a substring

Note: disclosure violations produce WARN, not BLOCK. The agent completes but a governance incident is recorded. Impersonation violations always produce BLOCK.

There is no configurable action_on_violation field -- impersonation is always blocked, and disclosure is always warned.
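The after_workflow ordering can be sketched as follows. The helper names are hypothetical and the real handler's internals are not shown; this only illustrates that BLOCK outranks WARN:

```python
def evaluate_after_workflow(output: str, rules: dict) -> tuple[str, str]:
    """Return (action, reason) for the final output. Impersonation BLOCK outranks disclosure WARN."""
    lowered = output.lower()

    # Check 1: impersonation -- always a BLOCK if any pattern matches.
    if rules.get("prevent_impersonation", True):
        for pattern in rules.get("impersonation_patterns", []):
            if pattern.lower() in lowered:
                return ("BLOCK", f"Impersonation detected: matched pattern '{pattern}'")

    # Check 2: disclosure -- only a WARN; the agent still completes.
    if rules.get("require_ai_disclosure", True):
        if rules.get("disclosure_text", "").lower() not in lowered:
            return ("WARN", "AI disclosure text not found in output")

    return ("ALLOW", "Identity audit passed")
```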

Pattern Matching

| Pattern in Config | Output Text | Result |
| --- | --- | --- |
| "I am a human" | "Hello! I am a human customer rep with 5 years experience." | BLOCK -- substring match (case-insensitive) |
| "I am not an AI" | "I am not an ai, I promise." | BLOCK -- case-insensitive match |
| "I am a human" | "I am an AI assistant here to help." | ALLOW -- pattern not found |
| "I am a doctor" | "As your doctor, I recommend..." | ALLOW -- "I am a doctor" does not appear; add "As your doctor" as a pattern to catch this phrasing |

Disclosure matching works the same way:

| Configured disclosure_text | Output | Result |
| --- | --- | --- |
| "This response was generated by an AI assistant." | "...answer text...\n\nThis response was generated by an AI assistant." | ALLOW |
| "This response was generated by an AI assistant." | "...answer text only..." | WARN |
| "AI-generated content" | "Note: AI-generated content below." | ALLOW |

Disclosure is Agent-Enforced, Not Auto-Injected

The identity handler checks for disclosure text in the output -- it does not add it automatically. Your agent code (or your LLM prompt) must include the configured disclosure_text in every response. Use ctx.set_output_text(text=...) to tell the handler what text to check.
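One way to guarantee the disclosure is present is a small wrapper in your agent code. This is a sketch -- ensure_disclosure is a hypothetical helper, not part of the SDK:

```python
def ensure_disclosure(response: str, disclosure_text: str) -> str:
    """Append the disclosure unless the response already contains it (case-insensitive)."""
    if disclosure_text.lower() in response.lower():
        return response
    return response + "\n\n" + disclosure_text
```

Call it on every response before ctx.set_output_text so the after_workflow disclosure check never fires a WARN.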

SDK Integration

Using the Context Manager

```python
import waxell_observe as waxell
from waxell_observe.errors import PolicyViolationError

waxell.init()

async def handle(query: str) -> None:
    try:
        async with waxell.WaxellContext(
            agent_name="support-agent",
            enforce_policy=True,
        ) as ctx:
            # Generate intermediate reasoning
            reasoning = await think_about(query)

            # Record it -- mid_execution checks this for impersonation
            ctx.record_intermediate_output(output=reasoning)

            # Generate final response (must include disclosure text)
            response = await generate_response(query)
            response_with_disclosure = (
                response + "\n\nThis response was generated by an AI assistant."
            )

            # Set output text -- after_workflow checks this
            ctx.set_output_text(text=response_with_disclosure)

            ctx.set_result({"response": response_with_disclosure})

    except PolicyViolationError as e:
        print(f"Identity block: {e}")
        # e.g. "Impersonation detected in intermediate output: matched pattern 'I am a human'"
```

What Gets Checked

| SDK Call | What the Handler Reads | Phase |
| --- | --- | --- |
| ctx.record_intermediate_output(output=...) | context.intermediate_outputs | mid_execution |
| ctx.set_output_text(text=...) | context.output_text | after_workflow |

If set_output_text is never called, the handler falls back to str(result) from ctx.set_result().

Example Policies

Strict AI Disclosure with Human Impersonation Block

```json
{
  "require_ai_disclosure": true,
  "disclosure_text": "This response was generated by an AI assistant.",
  "disclosure_position": "append",
  "prevent_impersonation": true,
  "impersonation_patterns": [
    "I am a human",
    "I am not an AI",
    "I am not a bot",
    "I am a real person"
  ],
  "agent_identity_header": true
}
```

Professional Role Impersonation Block

```json
{
  "require_ai_disclosure": true,
  "disclosure_text": "This information is AI-generated and not professional advice.",
  "disclosure_position": "footer",
  "prevent_impersonation": true,
  "impersonation_patterns": [
    "I am a doctor",
    "I am a lawyer",
    "I am a licensed",
    "As your physician",
    "As your attorney",
    "I am a human",
    "I am not an AI"
  ],
  "agent_identity_header": true
}
```

Disclosure Only (No Impersonation Check)

```json
{
  "require_ai_disclosure": true,
  "disclosure_text": "Generated by AI.",
  "disclosure_position": "footer",
  "prevent_impersonation": false,
  "impersonation_patterns": [],
  "agent_identity_header": false
}
```

Enforcement Flow

before_workflow

└── Store rules into context._identity_rules → ALLOW

mid_execution (runs each time record_intermediate_output() is called)

├── prevent_impersonation disabled? → ALLOW
├── impersonation_patterns empty? → ALLOW
└── Scan each intermediate output for each pattern
    ├── Pattern found? → BLOCK
    └── No match → ALLOW

after_workflow

├── Check output_text for impersonation (prevent_impersonation=true)
│   └── Pattern found? → BLOCK (highest priority)
│
├── Check output_text for disclosure text (require_ai_disclosure=true)
│   ├── disclosure_text not found in output? → WARN
│   └── Found → continue
│
├── Tag metadata with agent_identity_header (if enabled)
│
├── Any warnings? → WARN result
└── No warnings → ALLOW result

Creating via Dashboard

  1. Navigate to Governance > Policies
  2. Click New Policy
  3. Select category Identity
  4. Configure disclosure_text to match what your prompts produce
  5. Add impersonation patterns appropriate for your use case
  6. Set scope to target specific agents (e.g., support-agent)
  7. Enable

Creating via API

```bash
curl -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://acme.waxell.dev/waxell/v1/policies/ \
  -d '{
    "name": "AI Disclosure Policy",
    "category": "identity",
    "rules": {
      "require_ai_disclosure": true,
      "disclosure_text": "This response was generated by an AI assistant.",
      "disclosure_position": "append",
      "prevent_impersonation": true,
      "impersonation_patterns": [
        "I am a human",
        "I am not an AI",
        "I am not a bot",
        "I am a real person"
      ],
      "agent_identity_header": true
    },
    "scope": {
      "agents": ["support-agent"]
    },
    "enabled": true
  }'
```
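The same policy can be assembled from Python. The sketch below only builds the request body; actually sending it (shown as a comment) is left to your HTTP client, since the endpoint and token come from your deployment:

```python
import json

def build_identity_policy(agent: str) -> dict:
    """Assemble the same policy body the curl example sends."""
    return {
        "name": "AI Disclosure Policy",
        "category": "identity",
        "rules": {
            "require_ai_disclosure": True,
            "disclosure_text": "This response was generated by an AI assistant.",
            "disclosure_position": "append",
            "prevent_impersonation": True,
            "impersonation_patterns": [
                "I am a human",
                "I am not an AI",
                "I am not a bot",
                "I am a real person",
            ],
            "agent_identity_header": True,
        },
        "scope": {"agents": [agent]},
        "enabled": True,
    }

payload = json.dumps(build_identity_policy("support-agent"))
# e.g. requests.post("https://acme.waxell.dev/waxell/v1/policies/",
#                    data=payload, headers={"Authorization": f"Bearer {token}"})
```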

Observability

Governance Tab

Identity evaluations appear with:

| Field | Example |
| --- | --- |
| Phase | mid_execution or after_workflow |
| Action | allow, warn, or block |
| Category | identity |
| Reason | "Identity audit passed" |

For impersonation blocks:

| Field | Example |
| --- | --- |
| Reason | "Impersonation detected in final output: matched pattern 'I am a human'" |
| Metadata | {"matched_pattern": "I am a human", "output_preview": "...first 200 chars..."} |

For disclosure warnings:

| Field | Example |
| --- | --- |
| Reason | "AI disclosure text not found in output" |
| Metadata | {"expected_disclosure": "This response was generated by an AI assistant.", "disclosure_position": "append"} |

Common Gotchas

  1. The handler does not inject disclosure text. It only checks whether the text is present. Your LLM prompts must instruct the model to include the configured disclosure_text verbatim.

  2. Disclosure violations produce WARN, not BLOCK. The agent is not stopped. Use impersonation prevention to block harmful outputs; use disclosure to audit compliance.

  3. disclosure_position is metadata only. Setting it to "append" does not move text around -- it appears in the governance incident metadata as a hint for your team.

  4. set_output_text takes precedence over set_result. If you call ctx.set_output_text(text=...), the handler uses that. If you only call ctx.set_result(...), the handler falls back to str(result). Always call set_output_text explicitly.

  5. Pattern matching is case-insensitive substring. "I am a human" will match "i am a human assistant", "Hello, I am a human!", and any variation of case.

  6. Empty impersonation_patterns disables impersonation checks entirely. Setting prevent_impersonation: true with no patterns produces no blocks.

  7. mid_execution fires on every intermediate output call. If your agent produces many intermediate outputs, the first one that matches a pattern will block immediately. Subsequent outputs are not checked.

Next Steps