Test Personas¶

Automated chatbot testing with predefined personas. Define a persona once, run it as often as you like, and get a pass/fail report each time.

Two modes¶

| | Dry run (free) | Live test (costs money) |
| --- | --- | --- |
| Where | Claude Code + MCP tools | Portal API |
| Who plays the chatbot | Claude Code's own model | Agent's configured LLM |
| Cost | Free | One LLM call per turn |
| Best for | Iterating on flow design | Final validation before publish |
| Speed | Interactive (you see each turn) | Automatic (all turns at once) |

Both modes use the same persona definition and report pass/fail in the same way; the live test additionally returns a full turn log. Start with dry runs, then do a live test when the flow looks good.

Defining a persona¶

A test persona has:

- Name: who they are (e.g. "Maria Santos")
- Description: brief background and what they want
- Opening message: the first thing they say
- Follow-up messages: subsequent messages, in order
- Conversation name (optional): which conversation to start with
- Expected conversations (optional): the conversations the persona should visit, for pass/fail validation
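As a minimal sketch, the fields above can be modelled as a small Python data structure. The class name `Persona`, the attribute names, and the `validate` checks are illustrative assumptions, not part of the product's API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Persona:
    """Illustrative container for the persona fields listed above."""
    name: str
    description: str
    opening_message: str
    follow_up_messages: List[str] = field(default_factory=list)
    conversation_name: Optional[str] = None  # optional starting conversation
    expected_conversations: List[str] = field(default_factory=list)  # optional

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the persona looks usable."""
        problems = []
        if not self.name.strip():
            problems.append("name is empty")
        if not self.opening_message.strip():
            problems.append("opening message is empty")
        if any(not m.strip() for m in self.follow_up_messages):
            problems.append("a follow-up message is empty")
        return problems


maria = Persona(
    name="Maria Santos",
    description="Freelance graphic designer in Austin, 4 years solo.",
    opening_message="Hey, I'm Maria Santos from Austin.",
    follow_up_messages=[
        "I'm a freelance graphic designer. 4 years in, still solo.",
        "My goal is to double my revenue. Challenge: can't scale.",
    ],
    expected_conversations=["onboarding"],
)
```

Checking a persona locally before handing it to a tool catches empty messages early, which would otherwise only show up as a confusing turn in the report.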

In Claude Code¶

> Use h-chat-define-persona for agent [agent-id].
> Name: Maria Santos
> Description: Freelance graphic designer in Austin, 4 years solo,
>   wants to double her revenue but can't scale without help.
> Opening message: "Hey, I'm Maria Santos from Austin."
> Follow-up messages:
>   - "I'm a freelance graphic designer. 4 years in, still solo."
>   - "My goal is to double my revenue. Challenge: can't scale
>     without help but hiring feels risky."
> Expected conversations: ["onboarding"]

Claude Code calls h-chat-define-persona and saves the persona in the agent's system memory under test-personas:maria-santos.
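The storage key in the example looks like a lowercased, hyphenated form of the persona name. The sketch below reproduces that one example; the exact slug rule the tool applies is an assumption, and `persona_key` is a hypothetical helper.

```python
import re


def persona_key(name: str) -> str:
    """Build a memory key such as 'test-personas:maria-santos'.

    Assumed rule: lowercase the name and collapse every run of
    non-alphanumeric characters into a single hyphen.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    return f"test-personas:{slug}"


key = persona_key("Maria Santos")
```

If two personas differ only in punctuation or case, a rule like this would map them to the same key, so distinct names are safer.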

List existing personas¶

> List test personas for agent [agent-id].

Uses h-chat-list-personas.

Running a dry-run test (free)¶

In Claude Code:

> Run the Maria Santos persona test for agent [agent-id].

Claude Code will:

1. Call h-chat-run-persona with step=0 → starts the chat, sends Maria's opening message, gets the compiled prompt back.
2. Generate a chatbot response based on the prompt (Claude Code's own model plays the chatbot).
3. Call h-chat-process with the response.
4. Call h-chat-run-persona with step=1 → sends the first follow-up.
5. Repeat until all follow-ups are done.
6. The final call returns a test report.
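The loop above can be sketched in Python with stand-in functions. In practice Claude Code invokes the MCP tools itself; the callables here, the `prompt` and `report` result keys, and the point at which the fake returns its report are all assumptions made for illustration.

```python
from typing import Callable, Dict, List


def dry_run(
    run_persona: Callable[[int], Dict],    # stand-in for h-chat-run-persona
    generate_reply: Callable[[str], str],  # stand-in for the model playing the chatbot
    process: Callable[[str], None],        # stand-in for h-chat-process
    max_steps: int = 50,
) -> Dict:
    """Alternate persona messages and chatbot replies until a report comes back."""
    for step in range(max_steps):
        result = run_persona(step)
        if "report" in result:  # assumed shape of the final call's result
            return result["report"]
        process(generate_reply(result["prompt"]))
    raise RuntimeError("no report returned within max_steps")


# Fake tools for a persona with an opening message and two follow-ups.
calls: List[str] = []
messages = ["opening", "follow-up 1", "follow-up 2"]


def fake_run_persona(step: int) -> Dict:
    calls.append(f"run-persona step={step}")
    if step < len(messages):
        return {"prompt": f"compiled prompt after '{messages[step]}'"}
    return {"report": {"status": "COMPLETE", "pass": True}}


def fake_process(reply: str) -> None:
    calls.append("process")


report = dry_run(fake_run_persona, lambda prompt: "chatbot reply", fake_process)
```

The loop stops on whichever call returns a report instead of counting follow-ups itself, so it does not depend on exactly which step the real tool treats as final.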

The test report¶

{
  "chatId": "chats:20260419-abc12345-onboarding",
  "status": "COMPLETE",
  "pass": true,
  "issues": [],
  "visitedConversations": ["onboarding"],
  "totalTurns": 8,
  "routeHistory": [ ... ],
  "goalStack": [ ... ]
}

pass: true if all expectedConversations were visited.

issues: list of problems found (missing conversations, off-track turns, dead ends).

visitedConversations: which conversations the persona actually went through.

routeHistory: full chronological log of stage enters/exits and edge traversals.

goalStack: final state of the goal stack (completed, active, or abandoned goals).
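The pass rule described above (every expected conversation was visited) can be re-checked against a saved report. This is a sketch of that one rule only; whether the real report also factors `issues` into `pass` is not stated here, and both function names are hypothetical.

```python
from typing import Dict, List


def missing_conversations(expected: List[str], visited: List[str]) -> List[str]:
    """Expected conversations that never appear in the visited list, in order."""
    seen = set(visited)
    return [name for name in expected if name not in seen]


def check_report(report: Dict, expected: List[str]) -> Dict:
    """Recompute pass/fail from a dry-run report's visitedConversations."""
    missing = missing_conversations(expected, report.get("visitedConversations", []))
    return {"pass": not missing, "missing": missing}


report = {
    "chatId": "chats:20260419-abc12345-onboarding",
    "status": "COMPLETE",
    "pass": True,
    "issues": [],
    "visitedConversations": ["onboarding"],
    "totalTurns": 8,
}

ok = check_report(report, ["onboarding"])
not_ok = check_report(report, ["onboarding", "strategy"])
```

Here `ok["pass"]` is true, while `not_ok` reports `strategy` as missing because the persona never left onboarding.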

Running a live test (costs money)¶

Cost¶

Live tests use the agent's configured LLM provider. Hadron does not bill per turn — you pay the provider directly through whatever API key the agent's AI configuration uses.

A persona run costs (turns × cost-per-turn). Cost-per-turn varies by provider, model, prompt size, and message-history length. As a rough order of magnitude for a typical chatbot turn (1–3K input tokens, 200–500 output tokens):

| Provider / model | Per-turn cost (rough) | A 6-turn persona |
| --- | --- | --- |
| OpenAI `gpt-4o-mini` | ~$0.001 – $0.003 | ~$0.01 |
| OpenAI `gpt-4o` | ~$0.01 – $0.03 | ~$0.10 |
| Anthropic Haiku | ~$0.001 – $0.005 | ~$0.02 |
| Anthropic Sonnet | ~$0.01 – $0.03 | ~$0.10 |
| GLM (Z.AI) | typically lower than OpenAI | ~$0.01 |

These are ballpark figures for current pricing; check your provider's pricing page for live rates. Per-turn cost rises roughly linearly with conversation length because the prompt grows as message history is appended, so later turns cost more than early ones.
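To make the turns × cost-per-turn arithmetic concrete, here is a rough estimator. The token counts and per-million-token prices are made-up placeholders, not any provider's real rates, and the model of history growth (a fixed number of tokens added per turn) is a simplifying assumption.

```python
def estimate_run_cost(
    turns: int,
    base_input_tokens: int,        # prompt size on the first turn
    history_tokens_per_turn: int,  # assumed growth of the prompt per turn
    output_tokens_per_turn: int,
    input_price_per_mtok: float,   # placeholder price per million input tokens
    output_price_per_mtok: float,  # placeholder price per million output tokens
) -> float:
    """Sum the cost of each turn, with the input prompt growing every turn."""
    total = 0.0
    for turn in range(turns):
        input_tokens = base_input_tokens + turn * history_tokens_per_turn
        total += input_tokens * input_price_per_mtok / 1_000_000
        total += output_tokens_per_turn * output_price_per_mtok / 1_000_000
    return total


# Placeholder numbers: 1,000-token starting prompt, +400 tokens of history per
# turn, 300 output tokens per turn, $1.00 / $4.00 per million tokens.
cost_6 = estimate_run_cost(6, 1000, 400, 300, 1.00, 4.00)
cost_12 = estimate_run_cost(12, 1000, 400, 300, 1.00, 4.00)
```

With these placeholder numbers the 6-turn run comes to about $0.019 and the 12-turn run to about $0.053, roughly 2.75 times as much, because every later turn resends a longer history.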

To track real spend, view your provider's billing dashboard for the API key your agent is configured with:

- OpenAI: platform.openai.com/usage
- Anthropic: console.anthropic.com → Usage
- GLM (Z.AI): z.ai → Account → Billing
- AWS Bedrock: AWS Cost Explorer, filter by service "Bedrock"

A practical rule of thumb: dry-run while you iterate (free), live-test each persona once before publishing a revision, and keep the persona set small (3–5 personas) so a full validation pass stays under a dollar at small-model pricing.

Calling the live-test endpoint¶

Call the portal API:

POST /api/agent-chat/test-persona
Content-Type: application/json
Authorization: Bearer <your-jwt>

{
  "agentId": "019d99f2...",
  "personaName": "Maria Santos",
  "openingMessage": "Hey, I'm Maria Santos from Austin.",
  "followUpMessages": [
    "I'm a freelance graphic designer. 4 years in, still solo.",
    "My goal is to double my revenue. Challenge: can't scale."
  ],
  "expectedConversations": ["onboarding"]
}
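The request above could be assembled with Python's standard library along these lines. The base URL `https://portal.example.com` is a hypothetical placeholder, the agent ID is left truncated exactly as in the example, and the helper name is an assumption. The sketch only builds the request; the send is left commented out because it would incur LLM cost.

```python
import json
import urllib.request


def build_test_persona_request(base_url: str, jwt: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for a live persona test."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/agent-chat/test-persona",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {jwt}",
        },
        method="POST",
    )


payload = {
    "agentId": "019d99f2...",  # truncated in the example; use your full agent ID
    "personaName": "Maria Santos",
    "openingMessage": "Hey, I'm Maria Santos from Austin.",
    "followUpMessages": [
        "I'm a freelance graphic designer. 4 years in, still solo.",
        "My goal is to double my revenue. Challenge: can't scale.",
    ],
    "expectedConversations": ["onboarding"],
}

req = build_test_persona_request("https://portal.example.com/", "<your-jwt>", payload)

# Sending is left out on purpose, since a live test costs money:
# with urllib.request.urlopen(req) as resp:
#     report = json.load(resp)
```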

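The server runs all turns automatically using the agent's configured LLM (the same model real users will get). It returns the same pass/fail fields as the dry run, plus a full turn log and a list of stage transitions. Note that the visited-conversations field appears as conversationsVisited in this example, whereas the dry-run example uses visitedConversations. The response looks like this: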

{
  "chatId": "...",
  "personaName": "Maria Santos",
  "pass": true,
  "issues": [],
  "turns": [
    { "step": 0, "role": "assistant", "message": "Welcome! I'm Sage..." },
    { "step": 1, "role": "user", "message": "Hey, I'm Maria Santos..." },
    { "step": 1, "role": "assistant", "message": "Nice to meet you, Maria!...",
      "stageTransitioned": true, "newStageName": "background" },
    ...
  ],
  "stageTransitions": [
    { "step": 1, "newStage": "background" },
    { "step": 2, "newStage": "goals" }
  ],
  "conversationsVisited": ["onboarding"],
  "totalTurns": 8
}
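A turn log in this shape is easy to post-process. The sketch below derives stage transitions from the per-turn flags and prints a compact transcript; it assumes the field names shown in the example response, and both helpers are hypothetical.

```python
from typing import Dict, List


def stage_transitions_from_turns(turns: List[Dict]) -> List[Dict]:
    """Collect the turns flagged with stageTransitioned into a transition list."""
    return [
        {"step": turn["step"], "newStage": turn["newStageName"]}
        for turn in turns
        if turn.get("stageTransitioned")
    ]


def format_transcript(turns: List[Dict]) -> str:
    """Render one line per turn, e.g. '[1] user: Hey, ...'."""
    return "\n".join(f"[{t['step']}] {t['role']}: {t['message']}" for t in turns)


# The three turns visible in the example response (messages are truncated there).
turns = [
    {"step": 0, "role": "assistant", "message": "Welcome! I'm Sage..."},
    {"step": 1, "role": "user", "message": "Hey, I'm Maria Santos..."},
    {"step": 1, "role": "assistant", "message": "Nice to meet you, Maria!...",
     "stageTransitioned": True, "newStageName": "background"},
]

transitions = stage_transitions_from_turns(turns)
transcript = format_transcript(turns)
```

Comparing the derived list with the response's own `stageTransitions` is a cheap consistency check when debugging a flow.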

Writing good personas¶

Cover the happy path¶

Start with a persona that follows the expected flow perfectly — gives their name, answers questions, provides all the data the extraction spec expects. This validates that the basic flow works.

Test partial data¶

Create a persona that withholds information, for example one who refuses to give their location or dodges the team-size question. This validates that the chatbot handles missing data gracefully and that conditional edges fire correctly.

Test off-topic messages¶

Create a persona whose follow-up messages go off-topic: "Actually, can I ask about billing instead?" This tests onTrack detection and re-routing.

Test conversation transitions¶

Create a persona with follow-ups that naturally span two conversations (e.g., onboarding → strategy). Set expectedConversations to both. This validates that edge-based routing works end-to-end.

Name personas clearly¶

Use real-sounding names with a specific scenario. "Maria Santos — freelance designer, partial data" is much more useful than "test-user-3".

Recommended test workflow¶

1. Define 3–5 personas covering happy path, partial data, off-topic, and cross-conversation flows.
2. Dry-run each persona in Claude Code. Fix any issues in the conversation design (prompts, stages, edges, goals).
3. Iterate: edit the conversation, re-run the persona, check the report. No cost, fast feedback.
4. Live-test each persona once the dry runs pass. This validates the real model.
5. Save a revision (createRevision) after all personas pass.
6. Publish the revision (publishRevision).
7. Re-run personas after any change to catch regressions.
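The "all personas pass" gate in this workflow can be expressed as a small check over collected reports. This is a sketch: `ready_to_publish` is a hypothetical helper, and the second persona name and both report contents are invented for illustration.

```python
from typing import Dict, List, Tuple


def ready_to_publish(reports: Dict[str, Dict]) -> Tuple[bool, List[str]]:
    """Return (all_passed, names of failing personas) for a set of test reports."""
    failing = [name for name, report in reports.items() if not report.get("pass")]
    return (not failing, failing)


# Illustrative reports keyed by persona name.
reports = {
    "Maria Santos": {"pass": True, "issues": []},
    "Off-topic persona": {"pass": False, "issues": ["missing conversation: billing"]},
}

all_passed, failing = ready_to_publish(reports)
```

Treating a report with no `pass` field as a failure keeps a malformed or interrupted run from slipping through the gate.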

MCP tools reference¶

| Tool | Description |
| --- | --- |
| `h-chat-define-persona` | Create or update a test persona |
| `h-chat-list-personas` | List all personas for an agent |
| `h-chat-run-persona` | Run a dry-run test step (iterative) |
| `h-chat-get-route-history` | Read route history after a test |

conversation-routing.md — how routing, edges, goals, and the goal stack work
chatbot-end-to-end-test.md — manual end-to-end test with both Path A (portal Chat tab) and Path B (Claude Code + MCP)