MCP Server

PetalTrace exposes an MCP (Model Context Protocol) server that lets AI agents query trace data, inspect prompts, analyze costs, compare runs, and trigger replays. This supports self-reflective agent patterns in which agents diagnose their own execution history.

Start the server:
petaltrace mcp

The MCP server uses stdio transport, reading JSON-RPC messages from stdin and writing responses to stdout. Logs are written to stderr.
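A minimal sketch of what a client writes to the server's stdin, assuming newline-delimited JSON framing for the stdio transport (the helper name is illustrative):

```python
import json

def make_request(req_id, method, params=None):
    """Build one newline-delimited JSON-RPC 2.0 message for the stdio transport."""
    msg = {"jsonrpc": "2.0", "id": req_id, "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg) + "\n"

# Build a tools/call request for petaltrace.trace.list
line = make_request(1, "tools/call", {
    "name": "petaltrace.trace.list",
    "arguments": {"status": "failed", "limit": 10},
})
```

A line like this could be piped to a `petaltrace mcp` subprocess's stdin, with responses read back line by line from its stdout.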

Add PetalTrace to your Claude Code MCP configuration:

{
  "mcpServers": {
    "petaltrace": {
      "command": "petaltrace",
      "args": ["mcp"],
      "env": {}
    }
  }
}

After configuration, Claude Code can use PetalTrace tools to inspect trace data.

petaltrace.trace.list: List recent runs with optional filtering.

Input Schema:

{
  "workflow": "string (optional) - Filter by workflow name",
  "status": "string (optional) - running|completed|failed|cancelled",
  "since": "string (optional) - Duration like '24h' or '7d'",
  "limit": "integer (optional) - Max results, default 50",
  "cursor": "string (optional) - Pagination cursor"
}

Example:

{
  "name": "petaltrace.trace.list",
  "arguments": {
    "status": "failed",
    "since": "24h",
    "limit": 10
  }
}

Response:

{
  "runs": [
    {
      "id": "run-01JK3ABC",
      "workflow_name": "research-pipeline",
      "status": "failed",
      "duration_ms": 3400,
      "total_tokens": 2100,
      "estimated_cost": 0.0089,
      "started_at": "2026-03-17T10:14:15Z",
      "error_message": "Tool 'web_search' timed out"
    }
  ],
  "count": 1,
  "has_more": false
}
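Because results are capped by `limit`, a client can page with `cursor` until `has_more` is false. A sketch with a stand-in `call_tool` function in place of a real MCP client, assuming the response carries a `next_cursor` field when `has_more` is true (that field is not shown in the example above):

```python
def list_failed_runs(call_tool, since="24h"):
    """Page through petaltrace.trace.list results until has_more is false."""
    runs, cursor = [], None
    while True:
        args = {"status": "failed", "since": since, "limit": 50}
        if cursor is not None:
            args["cursor"] = cursor
        page = call_tool("petaltrace.trace.list", args)
        runs.extend(page["runs"])
        if not page.get("has_more"):
            return runs
        cursor = page["next_cursor"]  # assumed field name, not shown above
```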

petaltrace.trace.get: Get detailed information about a specific run.

Input Schema:

{
  "run_id": "string (required) - Run identifier",
  "include_spans": "boolean (optional) - Include span tree, default true"
}

Example:

{
  "name": "petaltrace.trace.get",
  "arguments": {
    "run_id": "run-01JK3ABC",
    "include_spans": true
  }
}

Response:

{
  "run": {
    "id": "run-01JK3ABC",
    "workflow_name": "research-pipeline",
    "status": "failed",
    "started_at": "2026-03-17T10:14:15Z",
    "completed_at": "2026-03-17T10:14:18.4Z",
    "duration_ms": 3400,
    "total_tokens": {...},
    "estimated_cost": {...}
  },
  "spans": [
    {
      "id": "span-123",
      "kind": "node",
      "name": "researcher_agent",
      "status": "ok",
      "duration_ms": 2000
    },
    {
      "id": "span-456",
      "kind": "tool",
      "name": "web_search",
      "status": "error",
      "error_message": "Request timeout"
    }
  ]
}
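A common first step when diagnosing a failed run is to scan the span list for error spans. A small sketch over the response shape above:

```python
def failing_spans(trace):
    """Collect (name, error_message) for every span that ended in error."""
    return [
        (s["name"], s.get("error_message", ""))
        for s in trace.get("spans", [])
        if s["status"] == "error"
    ]
```

Applied to the response above, this yields `web_search` with its timeout message.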

petaltrace.trace.search: Search runs by content in prompts and completions.

Input Schema:

{
  "query": "string (required) - Search query",
  "workflow": "string (optional) - Filter by workflow",
  "limit": "integer (optional) - Max results, default 20"
}

Example:

{
  "name": "petaltrace.trace.search",
  "arguments": {
    "query": "API authentication error",
    "limit": 5
  }
}

petaltrace.prompt.get: Get the full prompt and completion for an LLM node.

Input Schema:

{
  "run_id": "string (required) - Run identifier",
  "node_id": "string (required) - Node identifier",
  "include_completion": "boolean (optional) - Include response, default true"
}

Example:

{
  "name": "petaltrace.prompt.get",
  "arguments": {
    "run_id": "run-01JK3ABC",
    "node_id": "researcher_agent"
  }
}

Response:

{
  "span_id": "span-789",
  "node_id": "researcher_agent",
  "provider": "anthropic",
  "model": "claude-sonnet-4-20250514",
  "prompt": {
    "system_prompt": "You are a helpful research assistant.",
    "messages": [
      {"role": "user", "content": "Research the topic of AI observability"}
    ],
    "tool_definitions": [...]
  },
  "completion": {
    "text_content": "Based on my research...",
    "stop_reason": "end_turn"
  },
  "tokens": {
    "input_tokens": 500,
    "output_tokens": 1200,
    "cost_estimate": 0.0089
  },
  "timing": {
    "duration_ms": 1500,
    "time_to_first_token_ms": 150
  }
}
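The `tokens` and `timing` fields support quick throughput math. For example, a rough decode rate (a sketch, assuming time-to-first-token marks the start of decoding):

```python
def decode_rate(resp):
    """Rough output tokens/sec once the first token has arrived."""
    t = resp["timing"]
    decode_s = (t["duration_ms"] - t["time_to_first_token_ms"]) / 1000
    return resp["tokens"]["output_tokens"] / decode_s if decode_s > 0 else 0.0
```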

petaltrace.cost.summary: Get aggregate cost metrics.

Input Schema:

{
  "since": "string (optional) - Time window, default '7d'",
  "workflow": "string (optional) - Filter by workflow",
  "group_by": "string (optional) - workflow|provider|model"
}

Example:

{
  "name": "petaltrace.cost.summary",
  "arguments": {
    "since": "30d",
    "group_by": "workflow"
  }
}

Response:

{
  "since": "2026-02-17T00:00:00Z",
  "until": "2026-03-17T23:59:59Z",
  "total_runs": 450,
  "total_tokens": 5234567,
  "total_cost": 52.34,
  "by_workflow": {
    "research-pipeline": {"runs": 200, "cost": 25.00},
    "email-processor": {"runs": 250, "cost": 27.34}
  }
}
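A grouped summary like this makes it easy to rank cost drivers client-side. A sketch over the `by_workflow` map:

```python
def top_cost_workflows(summary, n=3):
    """Rank workflows in a cost summary by total cost, descending."""
    items = summary.get("by_workflow", {}).items()
    return sorted(items, key=lambda kv: kv[1]["cost"], reverse=True)[:n]
```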

petaltrace.cost.run: Get a per-run cost breakdown.

Input Schema:

{
  "run_id": "string (required) - Run identifier"
}

petaltrace.diff.compare: Compare two runs.

Input Schema:

{
  "base_run_id": "string (required) - Base run for comparison",
  "compare_run_id": "string (required) - Run to compare against base",
  "include_content": "boolean (optional) - Include text diffs",
  "include_similarity": "boolean (optional) - Include similarity scores"
}

Example:

{
  "name": "petaltrace.diff.compare",
  "arguments": {
    "base_run_id": "run-01JK3ABC",
    "compare_run_id": "run-01JK3XYZ",
    "include_content": true
  }
}

Response:

{
  "summary": {
    "status_match": true,
    "path_divergence": false,
    "duration_delta_ms": 500,
    "token_delta": 150,
    "cost_delta": 0.0012,
    "node_diff_count": 2
  },
  "node_diffs": [
    {
      "node_id": "researcher_agent",
      "status": "content_diff",
      "output_diff": {
        "similarity": 0.85
      },
      "token_diff": {
        "base_total": 1700,
        "compare_total": 1850
      }
    }
  ],
  "cost_diff": {
    "base_cost": 0.0089,
    "compare_cost": 0.0101,
    "delta": 0.0012
  }
}
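A diff response like this can back a simple regression gate, for instance in CI. A sketch with illustrative thresholds (the threshold values are assumptions, not part of the API):

```python
def diff_flags_regression(diff, min_similarity=0.8, max_cost_delta=0.01):
    """Apply illustrative thresholds to a diff.compare response."""
    s = diff["summary"]
    if not s["status_match"] or s["path_divergence"]:
        return True
    if s["cost_delta"] > max_cost_delta:
        return True
    return any(
        nd.get("output_diff", {}).get("similarity", 1.0) < min_similarity
        for nd in diff.get("node_diffs", [])
    )
```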

petaltrace.run.replay: Trigger a replay of a prior run.

Input Schema:

{
  "run_id": "string (required) - Run to replay",
  "mode": "string (optional) - live|mocked|hybrid, default 'live'",
  "model": "string (optional) - Override model",
  "temperature": "number (optional) - Override temperature",
  "auto_diff": "boolean (optional) - Compute diff after completion",
  "tags": "object (optional) - Tags for the new run"
}

Example:

{
  "name": "petaltrace.run.replay",
  "arguments": {
    "run_id": "run-01JK3ABC",
    "mode": "live",
    "model": "claude-3-opus-20240229",
    "auto_diff": true,
    "tags": {"experiment": "opus-test"}
  }
}

Response:

{
  "replay_id": "replay-456",
  "source_run_id": "run-01JK3ABC",
  "new_run_id": "run-01JK3NEW",
  "diff_id": "diff-789",
  "mode": "live",
  "status": "completed"
}
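Setting `auto_diff` computes the comparison for you; the explicit two-step variant can be sketched with a stand-in `call_tool` function (hypothetical; any MCP client would do):

```python
def replay_and_compare(call_tool, run_id, model):
    """Replay a run with a model override, then diff the new run vs. the source."""
    replay = call_tool("petaltrace.run.replay",
                       {"run_id": run_id, "mode": "live", "model": model})
    return call_tool("petaltrace.diff.compare", {
        "base_run_id": replay["source_run_id"],
        "compare_run_id": replay["new_run_id"],
        "include_content": True,
    })
```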

An agent can analyze its own failed runs:

User: Analyze why my research workflow has been failing recently.
Agent: I'll use PetalTrace to investigate recent failures.
[Calls petaltrace.trace.list with status="failed", workflow="research-pipeline", since="24h"]
I found 3 failed runs in the last 24 hours. Let me examine the most recent one.
[Calls petaltrace.trace.get with run_id="run-01JK3ABC", include_spans=true]
The failure occurred in the 'web_search' tool with a timeout error. Let me check the prompt to understand what was being searched.
[Calls petaltrace.prompt.get with run_id="run-01JK3ABC", node_id="researcher_agent"]
I see the issue. The search query was too broad: "comprehensive analysis of all machine learning frameworks from 2020-2026". This likely caused the web search to timeout.
Recommendation: Modify the prompt to generate more focused, specific search queries.
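The same investigation loop can be scripted. A sketch built on a generic `call_tool` function (hypothetical; it stands in for whatever MCP client is in use):

```python
def diagnose_latest_failure(call_tool, workflow):
    """Fetch the newest failed run, its error spans, and the prompts behind it."""
    runs = call_tool("petaltrace.trace.list",
                     {"workflow": workflow, "status": "failed", "limit": 1})["runs"]
    if not runs:
        return None
    run_id = runs[0]["id"]
    trace = call_tool("petaltrace.trace.get",
                      {"run_id": run_id, "include_spans": True})
    errors = [s for s in trace["spans"] if s["status"] == "error"]
    # Pull the prompt for each LLM node so the caller can inspect what led here
    nodes = [s["name"] for s in trace["spans"] if s["kind"] == "node"]
    prompts = {n: call_tool("petaltrace.prompt.get",
                            {"run_id": run_id, "node_id": n}) for n in nodes}
    return {"run_id": run_id, "error_spans": errors, "prompts": prompts}
```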

An agent can identify expensive workflows:

User: Help me reduce my AI costs.
Agent: I'll analyze your recent usage patterns.
[Calls petaltrace.cost.summary with since="30d", group_by="workflow"]
Your top cost driver is 'content-writer' at $45.23/month. Let me examine a typical run.
[Calls petaltrace.cost.run with run_id="run-expensive-123"]
The 'draft_generation' node uses 80% of tokens. It's using claude-3-opus. Let me compare costs if we used a different model.
[Calls petaltrace.run.replay with run_id="run-expensive-123", mode="live", model="claude-sonnet-4-20250514", auto_diff=true]
The replay completed with similar quality but 40% lower cost. I recommend switching the draft_generation node to Sonnet.

An agent can compare behavior across versions:

User: I updated my research prompt. Can you verify it still works?
Agent: I'll compare the new behavior against a previous successful run.
[Calls petaltrace.trace.list with workflow="research-pipeline", status="completed", limit=1]
Found a successful run from before your changes. Let me trigger a new run and compare.
[After new run completes, calls petaltrace.diff.compare with base_run_id="run-before", compare_run_id="run-after", include_content=true]
The diff shows:
- Token usage increased by 15% (more detailed responses)
- Output similarity is 0.78 (some structural changes)
- Cost increased by $0.002 per run
The changes look intentional - the new prompt produces more detailed research. No regressions detected.

Using PetalTrace tools in a PetalFlow workflow:

diagnostic-workflow.yaml
agents:
  diagnostic_agent:
    role: "Workflow Diagnostician"
    goal: "Analyze execution failures and recommend improvements"
    tools:
      - petaltrace.trace.list
      - petaltrace.trace.get
      - petaltrace.prompt.get
      - petaltrace.diff.compare
      - petaltrace.cost.summary

tasks:
  diagnose_failures:
    description: |
      Review the last 10 failed runs across all workflows.
      For each failure:
      1. Identify the failing span
      2. Examine the prompt that led to the failure
      3. Categorize the failure type (timeout, rate limit, prompt issue, etc.)
      Produce a report with:
      - Failure categories and counts
      - Most problematic workflows
      - Specific recommendations for each category
    agent: diagnostic_agent
    expected_output: "Structured failure analysis report with recommendations"

  cost_analysis:
    description: |
      Analyze cost trends over the past 30 days.
      Identify:
      - Top 3 most expensive workflows
      - Any cost anomalies (runs >2x average)
      - Opportunities for model downgrades
      For any workflow spending >$10/day, examine specific runs
      to understand the cost drivers.
    agent: diagnostic_agent
    expected_output: "Cost analysis report with optimization recommendations"

The PetalTrace MCP server implements JSON-RPC 2.0 over stdio.

Request:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "petaltrace.trace.list",
    "arguments": {
      "status": "failed",
      "limit": 10
    }
  }
}

Response:

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "content": [
      {
        "type": "text",
        "text": "{\"runs\": [...], \"count\": 5}"
      }
    ]
  }
}
Method        Description
initialize    Server capability negotiation
tools/list    List available tools
tools/call    Invoke a tool

The PetalTrace MCP server implements MCP protocol version 2024-11-05.
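Under that spec revision, a session begins with an `initialize` request before any `tools/*` call. A sketch of the handshake message (the clientInfo values are placeholders):

```python
import json

# Message shape per MCP spec revision 2024-11-05; clientInfo is illustrative
init_request = {
    "jsonrpc": "2.0",
    "id": 0,
    "method": "initialize",
    "params": {
        "protocolVersion": "2024-11-05",
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.1.0"},
    },
}
wire = json.dumps(init_request) + "\n"  # newline-delimited on the stdio transport
```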