Test Personas
Automated chatbot testing with predefined personas. Define a persona once, run it as many times as you want, get a pass/fail report.
Two modes
| | Dry run (free) | Live test (costs money) |
|---|---|---|
| Where | Claude Code + MCP tools | Portal API |
| Who plays the chatbot | Claude Code's own model | Agent's configured LLM |
| Cost | Free | One LLM call per turn |
| Best for | Iterating on flow design | Final validation before publish |
| Speed | Interactive (you see each turn) | Automatic (all turns at once) |
Both modes use the same persona definition and produce the same report format. Start with dry runs, then do a live test when the flow looks good.
Defining a persona
A test persona has:
- Name: who they are (e.g. "Maria Santos")
- Description: brief background and what they want
- Opening message: the first thing they say
- Follow-up messages: subsequent messages, in order
- Conversation name (optional): which conversation to start with
- Expected conversations (optional): the conversations the persona should visit, for pass/fail validation
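As a rough illustration, the fields above map onto a small record type. This is a sketch only, not Hadron's schema: the class name `Persona` and its attribute names are assumptions chosen to mirror the field list.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Persona:
    """Hypothetical container mirroring the persona fields listed above."""
    name: str
    description: str
    opening_message: str
    follow_up_messages: List[str] = field(default_factory=list)
    conversation_name: Optional[str] = None  # optional: where to start
    expected_conversations: List[str] = field(default_factory=list)  # optional

    def total_user_messages(self) -> int:
        # One opening message plus every follow-up, sent in order.
        return 1 + len(self.follow_up_messages)


maria = Persona(
    name="Maria Santos",
    description="Freelance graphic designer in Austin, 4 years solo, "
                "wants to double her revenue but can't scale without help.",
    opening_message="Hey, I'm Maria Santos from Austin.",
    follow_up_messages=[
        "I'm a freelance graphic designer. 4 years in, still solo.",
        "My goal is to double my revenue. Challenge: can't scale "
        "without help but hiring feels risky.",
    ],
    expected_conversations=["onboarding"],
)
print(maria.total_user_messages())
```

Leaving the two optional fields at their defaults matches the "(optional)" markers in the list: a persona without expected conversations can still run, it just has nothing to validate against.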
In Claude Code
> Use h-chat-define-persona for agent [agent-id].
> Name: Maria Santos
> Description: Freelance graphic designer in Austin, 4 years solo,
> wants to double her revenue but can't scale without help.
> Opening message: "Hey, I'm Maria Santos from Austin."
> Follow-up messages:
> - "I'm a freelance graphic designer. 4 years in, still solo."
> - "My goal is to double my revenue. Challenge: can't scale
> without help but hiring feels risky."
> Expected conversations: ["onboarding"]
Claude Code calls h-chat-define-persona and saves the persona in the
agent's system memory under test-personas:maria-santos.
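The only key shown here is `test-personas:maria-santos` for the name "Maria Santos". The sketch below guesses at a naming rule that would produce it (lowercase, runs of non-alphanumeric characters collapsed to hyphens). The rule and the function name `persona_memory_key` are assumptions, so check the actual key with h-chat-list-personas rather than relying on this.

```python
import re


def persona_memory_key(name: str) -> str:
    """Assumed slug rule: lowercase, non-alphanumerics collapsed to hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    return f"test-personas:{slug}"


print(persona_memory_key("Maria Santos"))  # test-personas:maria-santos
```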
List existing personas
Ask Claude Code to list the personas for an agent. It uses h-chat-list-personas.
Running a dry-run test (free)
In Claude Code, ask it to run the persona as a dry run. Claude Code will:
- Call h-chat-run-persona with step=0 → starts the chat, sends Maria's opening message, gets the compiled prompt back.
- Generate a chatbot response based on the prompt (Claude Code's own model plays the chatbot).
- Call h-chat-process with the response.
- Call h-chat-run-persona with step=1 → sends the first follow-up.
- Repeat until all follow-ups are done.
- The final call returns a test report.
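The loop above can be sketched in a few lines. This is an illustration of the control flow only: the three callables are hypothetical stand-ins for h-chat-run-persona, the model that plays the chatbot, and h-chat-process. The result keys `prompt` and `report`, and the idea that one extra run-persona call after the last follow-up yields the report, are assumptions.

```python
from typing import Callable, Dict, List


def dry_run(
    run_persona: Callable[[int], Dict],    # stand-in for h-chat-run-persona
    generate_reply: Callable[[str], str],  # the model playing the chatbot
    process_reply: Callable[[str], None],  # stand-in for h-chat-process
    max_steps: int = 50,
) -> Dict:
    """Step through a persona until a report comes back."""
    for step in range(max_steps + 1):
        result = run_persona(step)
        if "report" in result:             # assumed shape of the final call
            return result["report"]
        process_reply(generate_reply(result["prompt"]))
    raise RuntimeError("persona run did not finish within max_steps")


# In-memory fakes so the sketch runs without any MCP server.
follow_ups: List[str] = [
    "I'm a freelance graphic designer. 4 years in, still solo.",
    "My goal is to double my revenue.",
]
processed: List[str] = []


def fake_run_persona(step: int) -> Dict:
    if step > len(follow_ups):
        return {"report": {"pass": True, "issues": []}}
    return {"prompt": f"compiled prompt for step {step}"}


report = dry_run(fake_run_persona, lambda p: f"reply to: {p}", processed.append)
print(report, len(processed))
```

Passing the tools in as callables keeps the loop testable with fakes, which is the same property that makes the dry run itself free.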
The test report
{
"chatId": "chats:20260419-abc12345-onboarding",
"status": "COMPLETE",
"pass": true,
"issues": [],
"visitedConversations": ["onboarding"],
"totalTurns": 8,
"routeHistory": [ ... ],
"goalStack": [ ... ]
}
- pass: true if all expectedConversations were visited.
- issues: list of problems found (missing conversations, off-track turns, dead ends).
- visitedConversations: which conversations the persona actually went through.
- routeHistory: full chronological log of stage enters/exits and edge traversals.
- goalStack: final state of the goal stack (completed, active, or abandoned goals).
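The pass rule (every expected conversation was visited) is simple enough to re-check by hand. The sketch below does so against the sample report. The helper name `missing_conversations` is hypothetical, and it covers only this one rule, not the off-track or dead-end checks that also feed issues.

```python
from typing import Dict, List


def missing_conversations(report: Dict, expected: List[str]) -> List[str]:
    """Return expected conversations that the persona never visited."""
    visited = set(report.get("visitedConversations", []))
    return [name for name in expected if name not in visited]


# Sample report fields copied from above (elided arrays omitted).
sample_report = {
    "chatId": "chats:20260419-abc12345-onboarding",
    "status": "COMPLETE",
    "pass": True,
    "issues": [],
    "visitedConversations": ["onboarding"],
    "totalTurns": 8,
}

print(missing_conversations(sample_report, ["onboarding"]))              # []
print(missing_conversations(sample_report, ["onboarding", "strategy"]))  # ['strategy']
```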
Running a live test (costs money)
Cost
Live tests use the agent's configured LLM provider. Hadron does not bill per turn — you pay the provider directly through whatever API key the agent's AI configuration uses.
A persona run costs (turns × cost-per-turn). Cost-per-turn varies by provider, model, prompt size, and message-history length. As a rough order of magnitude for a typical chatbot turn (1–3K input tokens, 200–500 output tokens):
| Provider / model | Per-turn cost (rough) | A 6-turn persona |
|---|---|---|
| OpenAI gpt-4o-mini | ~$0.001 – $0.003 | ~$0.01 |
| OpenAI gpt-4o | ~$0.01 – $0.03 | ~$0.10 |
| Anthropic Haiku | ~$0.001 – $0.005 | ~$0.02 |
| Anthropic Sonnet | ~$0.01 – $0.03 | ~$0.10 |
| GLM (Z.AI) | typically lower than OpenAI | ~$0.01 |
These are ballpark figures based on pricing at the time of writing; check your provider's pricing page for live rates. Per-turn cost grows roughly linearly with conversation length because the prompt grows as message history is appended, so later turns cost more than earlier ones.
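The turns × cost-per-turn estimate, and the effect of a growing prompt, can be sketched as follows. Both function names are hypothetical. The token prices in the second call are placeholders rather than real provider rates.

```python
from typing import Tuple


def estimate_run_cost(turns: int, per_turn_low: float,
                      per_turn_high: float) -> Tuple[float, float]:
    """Flat estimate: a (low, high) dollar range for one persona run."""
    return (turns * per_turn_low, turns * per_turn_high)


def estimate_with_history(turns: int, base_input_tokens: int,
                          added_tokens_per_turn: int, output_tokens: int,
                          input_price_per_mtok: float,
                          output_price_per_mtok: float) -> float:
    """Estimate where the prompt grows by a fixed amount every turn."""
    total = 0.0
    for turn in range(turns):
        input_tokens = base_input_tokens + turn * added_tokens_per_turn
        total += input_tokens * input_price_per_mtok / 1_000_000
        total += output_tokens * output_price_per_mtok / 1_000_000
    return total


# Flat estimate using the small-model per-turn range from the table.
low, high = estimate_run_cost(6, 0.001, 0.003)
print(f"${low:.3f} to ${high:.3f}")

# Growing-prompt estimate with placeholder prices (NOT real rates).
print(f"${estimate_with_history(6, 1000, 500, 300, 1.0, 2.0):.4f}")
```

The second function shows why long personas cost more than the flat formula suggests: each turn pays again for the history that came before it.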
To track real spend, view your provider's billing dashboard for the API key your agent is configured with:
- OpenAI: platform.openai.com/usage
- Anthropic: console.anthropic.com → Usage
- GLM (Z.AI): z.ai → Account → Billing
- AWS Bedrock: AWS Cost Explorer, filter by service "Bedrock"
A practical rule of thumb: dry-run while you iterate (free), live-test each persona once before publishing a revision, and keep the persona set small (3–5 personas) so a full validation pass stays under a dollar at small-model pricing.
Calling the live-test endpoint
Call the portal API:
POST /api/agent-chat/test-persona
Content-Type: application/json
Authorization: Bearer <your-jwt>
{
"agentId": "019d99f2...",
"personaName": "Maria Santos",
"openingMessage": "Hey, I'm Maria Santos from Austin.",
"followUpMessages": [
"I'm a freelance graphic designer. 4 years in, still solo.",
"My goal is to double my revenue. Challenge: can't scale."
],
"expectedConversations": ["onboarding"]
}
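For scripted use, the same request can be assembled with Python's standard library. This is a minimal sketch: the base URL `https://portal.example.com` is a placeholder, the function name is hypothetical, and the truncated agentId is kept exactly as shown above. The send step is left commented out because it triggers a paid run.

```python
import json
import urllib.request


def build_test_persona_request(base_url: str, jwt: str,
                               payload: dict) -> urllib.request.Request:
    """Assemble the POST request without sending it."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/agent-chat/test-persona",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {jwt}",
        },
        method="POST",
    )


payload = {
    "agentId": "019d99f2...",  # truncated in the docs; use your real agent ID
    "personaName": "Maria Santos",
    "openingMessage": "Hey, I'm Maria Santos from Austin.",
    "followUpMessages": [
        "I'm a freelance graphic designer. 4 years in, still solo.",
        "My goal is to double my revenue. Challenge: can't scale.",
    ],
    "expectedConversations": ["onboarding"],
}

request = build_test_persona_request(
    "https://portal.example.com", "<your-jwt>", payload)

# Sending costs money (one LLM call per turn), so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     report = json.load(response)
```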
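The server runs all turns automatically using the agent's configured LLM (the same model real users will get). It returns the same kind of pass/fail report as the dry run, plus a full turn log. Note that in this sample the visited conversations appear under conversationsVisited rather than visitedConversations: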
{
"chatId": "...",
"personaName": "Maria Santos",
"pass": true,
"issues": [],
"turns": [
{ "step": 0, "role": "assistant", "message": "Welcome! I'm Sage..." },
{ "step": 1, "role": "user", "message": "Hey, I'm Maria Santos..." },
{ "step": 1, "role": "assistant", "message": "Nice to meet you, Maria!...",
"stageTransitioned": true, "newStageName": "background" },
...
],
"stageTransitions": [
{ "step": 1, "newStage": "background" },
{ "step": 2, "newStage": "goals" }
],
"conversationsVisited": ["onboarding"],
"totalTurns": 8
}
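One way to read a live report is to pull out the stage path and cross-check it against the turn log. The sketch below uses only the fields visible in the sample response; both helper names are hypothetical, and the elided turns are simply left out of the test data.

```python
from typing import Dict, List


def stage_path(report: Dict) -> List[str]:
    """Stage names in the order they were entered."""
    ordered = sorted(report.get("stageTransitions", []),
                     key=lambda t: t["step"])
    return [t["newStage"] for t in ordered]


def transitioned_steps(report: Dict) -> List[int]:
    """Steps where an assistant turn reported a stage transition."""
    return [
        turn["step"]
        for turn in report.get("turns", [])
        if turn.get("role") == "assistant" and turn.get("stageTransitioned")
    ]


live_report = {
    "personaName": "Maria Santos",
    "pass": True,
    "issues": [],
    "turns": [
        {"step": 0, "role": "assistant", "message": "Welcome! I'm Sage..."},
        {"step": 1, "role": "user", "message": "Hey, I'm Maria Santos..."},
        {"step": 1, "role": "assistant",
         "message": "Nice to meet you, Maria!...",
         "stageTransitioned": True, "newStageName": "background"},
    ],
    "stageTransitions": [
        {"step": 1, "newStage": "background"},
        {"step": 2, "newStage": "goals"},
    ],
    "conversationsVisited": ["onboarding"],
}

print(stage_path(live_report))          # ['background', 'goals']
print(transitioned_steps(live_report))  # [1]
```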
Writing good personas
Cover the happy path
Start with a persona that follows the expected flow perfectly — gives their name, answers questions, provides all the data the extraction spec expects. This validates that the basic flow works.
Test partial data
Create a persona that withholds information: refuses to give their location, dodges the team size question. This validates that the chatbot handles missing data gracefully and that conditional edges fire correctly.
Test off-topic messages
Create a persona whose follow-up messages go off-topic: "Actually,
can I ask about billing instead?" This tests onTrack detection and
re-routing.
Test conversation transitions
Create a persona with follow-ups that naturally span two conversations
(e.g., onboarding → strategy). Set expectedConversations to both.
This validates that edge-based routing works end-to-end.
Name personas clearly
Use real-sounding names with a specific scenario. "Maria Santos — freelance designer, partial data" is much more useful than "test-user-3".
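To keep a persona set honest about what it covers, one option is to tag each persona with the scenario it exercises and check the tags against the four cases described in this section. The `scenario` tag and the helper below are purely illustrative; they are not part of the persona definition that h-chat-define-persona accepts.

```python
from typing import Dict, List

# The four scenario types described in this section.
REQUIRED_SCENARIOS = {
    "happy-path", "partial-data", "off-topic", "cross-conversation",
}


def uncovered_scenarios(personas: List[Dict[str, str]]) -> List[str]:
    """Return the scenario types that no persona in the set exercises."""
    covered = {p["scenario"] for p in personas}
    return sorted(REQUIRED_SCENARIOS - covered)


persona_set = [
    {"name": "Maria Santos - freelance designer, happy path",
     "scenario": "happy-path"},
    {"name": "Maria Santos - freelance designer, partial data",
     "scenario": "partial-data"},
]

print(uncovered_scenarios(persona_set))  # ['cross-conversation', 'off-topic']
```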
Recommended test workflow
- Define 3–5 personas covering happy path, partial data, off-topic, and cross-conversation flows.
- Dry-run each persona in Claude Code. Fix any issues in the conversation design (prompts, stages, edges, goals).
- Iterate: edit the conversation, re-run the persona, check the report. No cost, fast feedback.
- Live-test each persona once the dry runs pass. This validates the real model.
- Save a revision (createRevision) after all personas pass.
- Publish the revision (publishRevision).
- Re-run personas after any change to catch regressions.
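The "save a revision only after all personas pass" step amounts to a simple gate over the collected reports. The sketch below shows that gate in isolation. The function name and the shape of the `reports` mapping are assumptions, and it deliberately does not call createRevision or publishRevision, since their signatures are not documented here.

```python
from typing import Dict, List, Tuple


def ready_to_publish(reports: Dict[str, Dict]) -> Tuple[bool, List[str]]:
    """Return (all_passed, names of failing personas)."""
    failing = sorted(
        name for name, report in reports.items() if not report.get("pass")
    )
    return (len(failing) == 0, failing)


# Reports keyed by persona name; only the "pass" and "issues" fields are used.
reports = {
    "Maria Santos": {"pass": True, "issues": []},
    "test-user-3": {"pass": False, "issues": ["missing conversation"]},
}

ok, failing = ready_to_publish(reports)
print(ok, failing)  # False ['test-user-3']
```

Running the same gate after every change to the conversation design gives the regression check described in the last step.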
MCP tools reference
| Tool | Description |
|---|---|
| h-chat-define-persona | Create or update a test persona |
| h-chat-list-personas | List all personas for an agent |
| h-chat-run-persona | Run a dry-run test step (iterative) |
| h-chat-get-route-history | Read route history after a test |
Related docs
- conversation-routing.md — how routing, edges, goals, and the goal stack work
- chatbot-end-to-end-test.md — manual end-to-end test with both Path A (portal Chat tab) and Path B (Claude Code + MCP)