Concepts
This document explains the core concepts, architecture, and data model of PetalTrace.
Overview
PetalTrace is an observability platform designed specifically for AI agent workflows. It captures and stores detailed execution data, enabling developers to:
- See what agents are thinking (full prompt/completion capture)
- Track token usage and costs across runs
- Replay workflows with different configurations
- Compare runs to identify what changed
- Enable agents to inspect their own execution history
Architecture
Section titled “Architecture”┌─────────────────────────────────────────────────────────────────────────────────┐│ PetalFlow / Any OTel App ││ ││ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌────────────────────┐ ││ │ Execution │ │ OTel Spans │ │ Event │ │ SSE Stream │ ││ │ Engine │──│ │──│ System │──│ │ ││ └──────────────┘ └──────┬───────┘ └──────────────┘ └────────────────────┘ ││ │ │└───────────────────────────┼──────────────────────────────────────────────────────┘ │ OTLP/gRPC or OTLP/HTTP ▼┌──────────────────────────────────────────────────────────────────────────────────┐│ PetalTrace ││ ││ ┌────────────────┐ ┌────────────────┐ ┌─────────────────┐ ┌──────────────┐ ││ │ Collector │ │ Trace Store │ │ Replay Engine │ │ Diff Engine │ ││ │ (OTLP ingest) │ │ (SQLite) │ │ │ │ │ ││ └───────┬────────┘ └───────┬────────┘ └────────┬────────┘ └──────┬───────┘ ││ │ │ │ │ ││ ┌───────┴───────────────────┴────────────────────┴──────────────────┴───────┐ ││ │ HTTP API + SSE │ ││ └───────────────────────────┬───────────────────────────────────────────────┘ ││ │ ││ ┌───────────────────────────┴───────────────────────────────────────────────┐ ││ │ MCP Server │ ││ │ petaltrace.trace.* · petaltrace.prompt.* · petaltrace.diff.* │ ││ └───────────────────────────────────────────────────────────────────────────┘ ││ │└──────────────────────────────────────────────────────────────────────────────────┘Components
Section titled “Components”| Component | Responsibility |
|---|---|
| Collector | Receives OTLP spans (gRPC + HTTP), classifies by kind, enriches with costs, writes to store |
| Trace Store | Persists runs, spans, LLM interactions, and metadata in SQLite |
| Replay Engine | Re-executes captured runs with live, mocked, or hybrid modes |
| Diff Engine | Compares two runs structurally, by content, and by cost |
| HTTP API | REST + SSE endpoints for querying traces and triggering operations |
| MCP Server | Exposes trace capabilities to agents via MCP protocol |
| CLI | Command-line interface for all operations |
Data Model
A Run represents a single execution of a workflow. It contains:
- Unique identifier (ULID format)
- Workflow name and version
- Status: `running`, `completed`, `failed`, or `cancelled`
- Timestamps for start and completion
- Snapshots of graph definition, inputs, and config
- Aggregated token counts and cost estimates
- User-defined tags
```go
type Run struct {
	ID            string
	WorkflowID    string
	WorkflowName  string
	Status        RunStatus
	StartedAt     time.Time
	CompletedAt   *time.Time
	GraphSnapshot json.RawMessage
	InputSnapshot json.RawMessage
	TotalTokens   TokenSummary
	EstimatedCost CostEstimate
	Tags          map[string]string
	ParentRunID   *string // For replay lineage
}
```

A Span represents a single unit of work within a run. Spans form a tree (parent-child relationships) mirroring the OpenTelemetry span model.
```go
type Span struct {
	ID          string
	RunID       string
	ParentID    *string
	TraceID     string
	Kind        SpanKind
	Name        string
	Status      SpanStatus
	StartedAt   time.Time
	CompletedAt *time.Time
	DurationMs  int64
	Node        *NodeSpanData
	LLM         *LLMSpanData
	Tool        *ToolSpanData
	Edge        *EdgeSpanData
	Error       *SpanError
}
```

Span Kinds

PetalTrace classifies spans into five kinds:
| Kind | Description | Captured Data |
|---|---|---|
| `node` | Graph node execution | Node ID, type, inputs, outputs, config, retry count |
| `llm` | LLM provider API call | Provider, model, full prompt, completion, tokens, latency, cache usage |
| `tool` | Tool invocation | Tool name, action, origin, inputs, outputs |
| `edge` | Data transfer between nodes | Source/target node and port, data size |
| `custom` | Any other span | Attributes from OTel |
LLM Span Data
The most important span type for debugging. Captures the complete LLM interaction:
```go
type LLMSpanData struct {
	Provider         string           // "anthropic", "openai", "google"
	Model            string           // "claude-sonnet-4-20250514"
	SystemPrompt     string           // Full system prompt
	Messages         []LLMMessage     // Complete message history
	ToolDefinitions  []ToolDefinition // Tools presented to the model
	Completion       LLMCompletion    // Full response
	Tokens           TokenDetail      // Input, output, cache tokens + cost
	TimeToFirstToken *int64           // Streaming TTFT
	TotalLatency     int64
	CacheRead        *int // Prompt cache hits
	CacheCreation    *int // Prompt cache writes
}
```

Token and Cost Model

PetalTrace tracks token usage and computes costs automatically:
```go
type TokenSummary struct {
	InputTokens      int
	OutputTokens     int
	CacheReadTokens  int
	CacheWriteTokens int
	TotalTokens      int
}
```
```go
type CostEstimate struct {
	Currency   string // "USD"
	Total      float64
	ByProvider map[string]float64 // Provider → cost
	ByModel    map[string]float64 // Model → cost
	ByNode     map[string]float64 // Node ID → cost
}
```

Costs are computed using a built-in pricing table that includes current rates for:
- Anthropic (Claude 3, 3.5, 4, Haiku, Sonnet, Opus)
- OpenAI (GPT-4o, o1, o3-mini)
- Google (Gemini 1.5, 2.0)
- DeepSeek
- Mistral
Custom pricing can be configured via overrides.
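To illustrate how a pricing table turns token counts into a cost estimate, here is a minimal sketch. The rate values, `ModelRate` type, and `estimateCost` helper are illustrative assumptions, not PetalTrace's actual API or its built-in table:

```go
package main

import "fmt"

// ModelRate holds per-million-token prices in USD.
// These numbers are placeholders for illustration only.
type ModelRate struct {
	InputPerMTok  float64
	OutputPerMTok float64
}

var pricing = map[string]ModelRate{
	"claude-sonnet-4-20250514": {InputPerMTok: 3.00, OutputPerMTok: 15.00},
	"gpt-4o":                   {InputPerMTok: 2.50, OutputPerMTok: 10.00},
}

// estimateCost converts token counts into a USD estimate for one model.
// An unknown model returns false so the caller can fall back to overrides.
func estimateCost(model string, inputTokens, outputTokens int) (float64, bool) {
	rate, ok := pricing[model]
	if !ok {
		return 0, false
	}
	cost := float64(inputTokens)/1e6*rate.InputPerMTok +
		float64(outputTokens)/1e6*rate.OutputPerMTok
	return cost, true
}

func main() {
	cost, _ := estimateCost("claude-sonnet-4-20250514", 200000, 50000)
	fmt.Printf("%.4f\n", cost) // ≈ $1.35 for 200k input / 50k output tokens
}
```

A per-run `CostEstimate` would then be built by summing these per-call values into the `ByProvider`, `ByModel`, and `ByNode` maps.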
Trace Flow
Ingestion Pipeline

```
OTel Span → Receiver → Classifier → Correlator → Enricher → Writer → Store
```

- Receiver: Accepts OTLP spans via gRPC (4317) or HTTP (4318)
- Classifier: Determines span kind from attributes (`gen_ai.*`, `petalflow.*`, etc.)
- Correlator: Groups spans into runs, identifies root spans, extracts snapshots
- Enricher: Computes costs, normalizes provider names, extracts text for search
- Writer: Batches writes to SQLite with FTS indexing
Classification Rules
PetalTrace classifies spans using these attribute checks:
- `gen_ai.system` or `gen_ai.request.model` → LLM span
- `petalflow.node.id` or `petalflow.node.type` → Node span
- `tool.name` or `petalflow.tool.name` → Tool span
- `petalflow.edge.source` or `petalflow.edge.target` → Edge span
- Span name contains “llm”, “chat”, “completion” → LLM span
- Default → Custom span
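Applied in order, the checks above amount to a priority-ordered attribute match. A simplified sketch (the attribute keys come from the rules above; the function shape is assumed, not PetalTrace's internal code):

```go
package main

import (
	"fmt"
	"strings"
)

// classifySpan applies the attribute checks in priority order and
// falls back to "custom" when nothing matches.
func classifySpan(name string, attrs map[string]string) string {
	has := func(keys ...string) bool {
		for _, k := range keys {
			if _, ok := attrs[k]; ok {
				return true
			}
		}
		return false
	}
	switch {
	case has("gen_ai.system", "gen_ai.request.model"):
		return "llm"
	case has("petalflow.node.id", "petalflow.node.type"):
		return "node"
	case has("tool.name", "petalflow.tool.name"):
		return "tool"
	case has("petalflow.edge.source", "petalflow.edge.target"):
		return "edge"
	}
	// Name-based hint comes last so explicit attributes always win.
	lower := strings.ToLower(name)
	for _, hint := range []string{"llm", "chat", "completion"} {
		if strings.Contains(lower, hint) {
			return "llm"
		}
	}
	return "custom"
}

func main() {
	fmt.Println(classifySpan("chat turn", nil))                                        // llm (name hint)
	fmt.Println(classifySpan("run step", map[string]string{"petalflow.node.id": "a"})) // node
}
```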
Run Correlation
Spans are grouped into runs using:
- `petalflow.run.id` span attribute (preferred)
- `petalflow.run.id` resource attribute
- Fallback: `trace-{traceID}`
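That precedence can be sketched in a few lines (the `spanRecord` struct and its field names are assumptions made for illustration):

```go
package main

import "fmt"

// spanRecord carries the two attribute scopes a span arrives with.
type spanRecord struct {
	TraceID       string
	SpanAttrs     map[string]string
	ResourceAttrs map[string]string
}

// runID resolves the run a span belongs to, checking span attributes
// first, then resource attributes, then falling back to the trace ID.
func runID(s spanRecord) string {
	if id, ok := s.SpanAttrs["petalflow.run.id"]; ok {
		return id
	}
	if id, ok := s.ResourceAttrs["petalflow.run.id"]; ok {
		return id
	}
	return "trace-" + s.TraceID // fallback: one run per trace
}

func main() {
	fmt.Println(runID(spanRecord{TraceID: "abc123"})) // trace-abc123
}
```

The fallback means spans from any OTel app are still grouped sensibly even when no `petalflow.run.id` is emitted.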
Capture Modes
When using PetalFlow, you can configure how much data is captured:
| Mode | What’s Captured | Storage Impact | Use Case |
|---|---|---|---|
| `minimal` | Latency, status, token counts, errors | ~1 KB/span | Production monitoring |
| `standard` | + Full prompts, completions, tool I/O | ~10-100 KB/span | Development, debugging |
| `full` | + All edge data, graph/config snapshots | ~100 KB-1 MB/span | Replay-capable runs |
Replay requires `standard` (live mode) or `full` (mocked/hybrid mode) capture.
Replay Modes
The replay engine supports three modes:
| Mode | LLM Calls | Tool Calls | Use Case |
|---|---|---|---|
| `live` | Real providers | Real tools | Re-execute with different config |
| `mocked` | Captured responses | Captured results | Deterministic testing, cost-free |
| `hybrid` | Real providers | Captured results | Test prompt changes |
Replayed runs maintain lineage via `parent_run_id`.
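The mode table reads as two independent switches: one for LLM calls, one for tool calls. A sketch of that dispatch (the type and function names are illustrative, not PetalTrace's API):

```go
package main

import "fmt"

type ReplayMode string

const (
	Live   ReplayMode = "live"
	Mocked ReplayMode = "mocked"
	Hybrid ReplayMode = "hybrid"
)

// useCapturedLLM reports whether LLM responses come from the recorded run
// instead of a real provider call.
func useCapturedLLM(m ReplayMode) bool { return m == Mocked }

// useCapturedTools reports whether tool results come from the recorded run
// instead of re-invoking the tool.
func useCapturedTools(m ReplayMode) bool { return m == Mocked || m == Hybrid }

func main() {
	for _, m := range []ReplayMode{Live, Mocked, Hybrid} {
		fmt.Printf("%s: captured llm=%v, captured tools=%v\n",
			m, useCapturedLLM(m), useCapturedTools(m))
	}
}
```

Hybrid's asymmetry is the point: prompts hit real providers while tool results stay pinned to the recording, isolating the effect of a prompt change.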
Diff Strategies
The diff engine compares runs across multiple dimensions:
- Structural diff: Are the same nodes executed in the same order?
- Content diff: Compare LLM outputs with unified diff and similarity scores
- Cost diff: Token and cost comparison per node, per provider, aggregate
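Similarity scores can be computed many ways; as one simple illustration (not necessarily the metric PetalTrace uses), here is a word-level Jaccard similarity between two completions:

```go
package main

import (
	"fmt"
	"strings"
)

// jaccard returns |A ∩ B| / |A ∪ B| over the lowercase word sets of two
// strings: 1.0 for identical vocabularies, 0.0 for disjoint ones.
func jaccard(a, b string) float64 {
	setA := map[string]bool{}
	for _, w := range strings.Fields(strings.ToLower(a)) {
		setA[w] = true
	}
	inter := 0
	setB := map[string]bool{}
	for _, w := range strings.Fields(strings.ToLower(b)) {
		if setB[w] {
			continue
		}
		setB[w] = true
		if setA[w] {
			inter++
		}
	}
	union := len(setA) + len(setB) - inter
	if union == 0 {
		return 1.0 // both empty: treat as identical
	}
	return float64(inter) / float64(union)
}

func main() {
	// 3 shared words out of 5 distinct words across both outputs.
	fmt.Printf("%.2f\n", jaccard("the answer is 42", "the answer is 41"))
}
```

A production content diff would pair a score like this with a unified diff so you can see both how much changed and exactly what changed.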
MCP Integration
PetalTrace exposes an MCP server that agents can use for self-inspection:
```
petaltrace.trace.list     List recent runs
petaltrace.trace.get      Get run detail with spans
petaltrace.trace.search   Search by content
petaltrace.prompt.get     Get full prompt/completion
petaltrace.cost.summary   Aggregate cost metrics
petaltrace.cost.run       Per-run cost breakdown
petaltrace.diff.compare   Compare two runs
petaltrace.run.replay     Trigger replay
```

This enables self-reflective agent patterns where an agent can diagnose its own failures by inspecting prior executions.