# PetalFlow Integration

PetalTrace provides deep integration with PetalFlow for rich observability of AI agent workflows. While PetalTrace works with any OpenTelemetry-instrumented application, PetalFlow integration captures additional context including graph topology, node-level inputs/outputs, and replay-capable snapshots.

First, start the PetalTrace server:

```sh
petaltrace serve
```

Add the PetalTrace configuration to your petalflow.yaml:

```yaml
observability:
  petaltrace:
    enabled: true
    endpoint: "http://localhost:4318"
    capture_mode: standard
    tags:
      environment: development
      team: platform
```

PetalFlow automatically sends traces to PetalTrace when the configuration is enabled.

The full configuration reference:

```yaml
observability:
  petaltrace:
    # Enable/disable PetalTrace integration
    enabled: true
    # PetalTrace collector endpoint (OTLP/HTTP)
    endpoint: "http://localhost:4318"
    # Capture mode determines what data is captured
    # - minimal: Latency, status, token counts, errors (~1 KB/span)
    # - standard: + Full prompts, completions, tool I/O (~10-100 KB/span)
    # - full: + All edge data, graph/config snapshots (~100 KB-1 MB/span)
    capture_mode: standard
    # Custom tags added to all runs
    tags:
      environment: production
      team: platform
      version: "1.0.0"
    # Sampling rate (1.0 = capture everything)
    sample_rate: 1.0
    # Always capture failed runs regardless of sample rate
    always_capture_errors: true
    # Sample rate overrides by tag
    sample_overrides:
      environment:staging: 1.0    # Always capture staging
      environment:production: 0.1 # Sample 10% of production
```
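How these sampling settings interact is not spelled out above, so here is one plausible reading as a sketch: a failed run is always kept when `always_capture_errors` is set, and a matching tag override replaces the base `sample_rate`. The `should_capture` helper is illustrative, not a PetalFlow API.

```python
import random

def should_capture(tags, failed, sample_rate, always_capture_errors, sample_overrides):
    """Decide whether to capture a run, per one plausible reading of the config.

    `sample_overrides` maps "tag_key:tag_value" strings to rates, mirroring the
    `environment:staging: 1.0` syntax in the YAML above.
    """
    if failed and always_capture_errors:
        return True
    rate = sample_rate
    for key, value in tags.items():
        override = sample_overrides.get(f"{key}:{value}")
        if override is not None:
            rate = override
            break
    return random.random() < rate

overrides = {"environment:staging": 1.0, "environment:production": 0.1}
# A failed production run is always kept; staging runs are always kept.
assert should_capture({"environment": "production"}, True, 0.1, True, overrides)
assert should_capture({"environment": "staging"}, False, 0.1, True, overrides)
```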
The same settings can also be supplied via environment variables:

```sh
PETALTRACE_ENDPOINT=http://localhost:4318
PETALTRACE_CAPTURE_MODE=standard
PETALTRACE_SAMPLE_RATE=1.0
```

## Capture Modes

### Minimal

Best for production monitoring where storage is a concern.

Captures:

  • Run and span metadata (IDs, timestamps, status)
  • Token counts and cost estimates
  • Error messages and stack traces
  • Latency metrics

Does NOT capture:

  • Full prompt text
  • LLM completions
  • Tool inputs/outputs
  • Edge data payloads

```yaml
observability:
  petaltrace:
    capture_mode: minimal
```

### Standard

Recommended for development and debugging.

Captures everything in minimal, plus:

  • Full system prompts
  • Complete message history
  • LLM completions
  • Tool definitions
  • Tool inputs and outputs
  • Cache usage metrics

```yaml
observability:
  petaltrace:
    capture_mode: standard
```

### Full

Required for replay functionality.

Captures everything in standard, plus:

  • Graph definition snapshot
  • Workflow inputs snapshot
  • Configuration snapshot (secrets masked)
  • All edge data payloads

```yaml
observability:
  petaltrace:
    capture_mode: full
```

To replay a PetalFlow run, it must be captured at the appropriate level:

| Replay Mode | Required Capture Level |
| --- | --- |
| `live` | `standard` or higher |
| `mocked` | `full` |
| `hybrid` | `full` |

Runs captured at minimal cannot be replayed.
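The replay requirements reduce to a small ordering check. A sketch, assuming capture modes are strictly ordered minimal < standard < full (the helper names are illustrative, not part of the PetalTrace SDK):

```python
# Capture modes in increasing order of detail.
CAPTURE_LEVELS = {"minimal": 0, "standard": 1, "full": 2}

# Minimum capture level each replay mode needs, per the table above.
REPLAY_REQUIREMENTS = {"live": "standard", "mocked": "full", "hybrid": "full"}

def can_replay(replay_mode, captured_at):
    """Return True if a run captured at `captured_at` supports `replay_mode`."""
    required = REPLAY_REQUIREMENTS[replay_mode]
    return CAPTURE_LEVELS[captured_at] >= CAPTURE_LEVELS[required]

assert can_replay("live", "standard")
assert not can_replay("mocked", "standard")
assert not can_replay("live", "minimal")  # minimal runs can never be replayed
```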

PetalFlow enriches OpenTelemetry spans with these attributes:

Run-level attributes:

| Attribute | Description |
| --- | --- |
| `petalflow.run.id` | Unique run identifier |
| `petalflow.workflow.id` | Workflow identifier |
| `petalflow.workflow.name` | Human-readable workflow name |
| `petalflow.workflow.version` | Workflow version |
| `petalflow.source_kind` | `agent_workflow`, `graph`, or `sdk` |
| `petalflow.trigger_source` | `cli`, `api`, `ui`, or `schedule` |
| `petalflow.run.root` | `true` for root span |
| `petalflow.graph` | Graph definition JSON (full mode) |
| `petalflow.input` | Workflow inputs JSON (full mode) |
| `petalflow.config` | Configuration JSON (full mode) |

Node-level attributes:

| Attribute | Description |
| --- | --- |
| `petalflow.node.id` | Graph node identifier |
| `petalflow.node.type` | Node type (e.g., `llm_prompt`) |
| `petalflow.node.retry_count` | Number of retries |

PetalFlow uses OTel GenAI semantic conventions plus extensions:

| Attribute | Description |
| --- | --- |
| `gen_ai.system` | Provider name |
| `gen_ai.request.model` | Model identifier |
| `gen_ai.request.temperature` | Sampling temperature |
| `gen_ai.request.max_tokens` | Max tokens |
| `gen_ai.usage.input_tokens` | Input token count |
| `gen_ai.usage.output_tokens` | Output token count |
| `gen_ai.response.finish_reason` | Stop reason |
| `petalflow.llm.system_prompt` | Full system prompt |
| `petalflow.llm.messages` | Message array JSON |
| `petalflow.llm.completion` | Completion JSON |
| `petalflow.llm.tool_definitions` | Tool definitions |
| `petalflow.llm.ttft_ms` | Time to first token |
| `petalflow.llm.cache_read_tokens` | Prompt cache reads |
| `petalflow.llm.cache_creation_tokens` | Prompt cache writes |
Tool span attributes:

| Attribute | Description |
| --- | --- |
| `petalflow.tool.name` | Tool registry name |
| `petalflow.tool.action` | Action invoked |
| `petalflow.tool.origin` | `native`, `mcp`, `http`, or `stdio` |
| `petalflow.tool.invoked_by` | LLM span ID (for function calling) |
| `tool.use.id` | Provider's tool_use block ID |
Edge span attributes:

| Attribute | Description |
| --- | --- |
| `petalflow.edge.source_node` | Source node ID |
| `petalflow.edge.source_port` | Source port name |
| `petalflow.edge.target_node` | Target node ID |
| `petalflow.edge.target_port` | Target port name |
| `petalflow.edge.data_size_bytes` | Payload size |
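To show how a consumer might use these attributes, here is a dependency-free sketch that summarizes an LLM span from its attribute map. The `summarize_llm_span` function, the model name, and the per-token rates are all invented for illustration; real spans would come from the OpenTelemetry SDK rather than a plain dict.

```python
def summarize_llm_span(attributes, input_cost_per_1k, output_cost_per_1k):
    """Summarize an LLM span using the GenAI attributes listed above.

    `attributes` is a span attribute map; the cost-per-1k-token rates are
    caller-supplied, since PetalTrace's own pricing table is not shown here.
    """
    inp = attributes["gen_ai.usage.input_tokens"]
    out = attributes["gen_ai.usage.output_tokens"]
    cached = attributes.get("petalflow.llm.cache_read_tokens", 0)
    return {
        "model": attributes["gen_ai.request.model"],
        "total_tokens": inp + out,
        "cache_hit_ratio": cached / inp if inp else 0.0,
        "estimated_cost": inp / 1000 * input_cost_per_1k
                          + out / 1000 * output_cost_per_1k,
    }

span_attrs = {
    "gen_ai.system": "anthropic",
    "gen_ai.request.model": "example-model",  # hypothetical model name
    "gen_ai.usage.input_tokens": 2000,
    "gen_ai.usage.output_tokens": 500,
    "petalflow.llm.cache_read_tokens": 1000,
}
summary = summarize_llm_span(span_attrs, input_cost_per_1k=0.003,
                             output_cost_per_1k=0.015)
# 2500 total tokens, half the input served from cache
```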

PetalTrace provides an MCP overlay for PetalFlow integration, allowing workflows to query their own execution history.

Copy the overlay to your PetalFlow installation:

```sh
cp petaltrace/mcp/overlay.yaml $PETALFLOW_HOME/tools/overlays/petaltrace.overlay.yaml
```
Then reference the PetalTrace tools in your workflow:

```yaml
# workflow.yaml
agents:
  diagnostic_agent:
    role: "Workflow Diagnostician"
    goal: "Analyze execution failures and recommend improvements"
    tools:
      - petaltrace.trace.list
      - petaltrace.trace.get
      - petaltrace.prompt.get
      - petaltrace.diff.compare

tasks:
  diagnose:
    description: |
      Review the last 5 failed runs of the 'research_pipeline' workflow.
      Identify common failure patterns and suggest prompt or configuration changes.
    agent: diagnostic_agent
    expected_output: "A structured report with failure patterns and recommendations."
```

Agents can inspect their own prior executions:

```yaml
agents:
  research_agent:
    role: "Research Assistant"
    goal: "Research topics and learn from past performance"
    tools:
      - web_search
      - petaltrace.trace.list
      - petaltrace.prompt.get

tasks:
  research_with_learning:
    description: |
      Research the topic: {topic}
      Before researching, check your recent runs for similar topics
      using petaltrace.trace.list. If you find relevant prior research,
      use petaltrace.prompt.get to review what worked well.
    agent: research_agent
```
If traces are not appearing in PetalTrace:

1. Verify PetalTrace is running:

   ```sh
   curl http://localhost:8090/api/health
   ```

2. Check the endpoint configuration:

   ```yaml
   observability:
     petaltrace:
       endpoint: "http://localhost:4318" # OTLP/HTTP port
   ```

3. Verify PetalFlow logging: look for trace export logs in the PetalFlow output.

If prompts or completions are missing, ensure the capture mode is at least `standard`:

```yaml
observability:
  petaltrace:
    capture_mode: standard
```

Ensure the run was captured at `full` mode for mocked/hybrid replay:

```yaml
observability:
  petaltrace:
    capture_mode: full
```

For production, consider:

1. Reducing the capture mode:

   ```yaml
   capture_mode: minimal
   ```

2. Enabling sampling:

   ```yaml
   sample_rate: 0.1 # 10% sampling
   always_capture_errors: true
   ```

3. Reducing retention in `petaltrace.yaml`:

   ```yaml
   retention:
     default: "7d"
   ```
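To see what these knobs buy you, a rough back-of-envelope sketch of daily trace volume. The per-span sizes are rough midpoints of the capture-mode estimates given earlier; the run counts and error rate below are hypothetical.

```python
# Approximate per-span payload sizes (KB), rough midpoints of the
# capture-mode estimates: minimal ~1 KB, standard ~10-100 KB, full ~100 KB-1 MB.
SPAN_KB = {"minimal": 1, "standard": 50, "full": 500}

def daily_storage_mb(runs_per_day, spans_per_run, mode, sample_rate,
                     error_rate=0.0, always_capture_errors=True):
    """Rough daily storage estimate in MB for a given capture configuration."""
    captured = runs_per_day * sample_rate
    if always_capture_errors:
        # Failed runs are captured even when not sampled.
        captured += runs_per_day * error_rate * (1 - sample_rate)
    return captured * spans_per_run * SPAN_KB[mode] / 1024

# 10,000 runs/day, 20 spans each, standard mode at 10% sampling, 2% errors:
mb = daily_storage_mb(10_000, 20, "standard", 0.1, error_rate=0.02)
# roughly 1,150 MB/day at these numbers
```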
A complete production example (`petalflow.yaml`):

```yaml
observability:
  petaltrace:
    enabled: true
    endpoint: "http://petaltrace.internal:4318"
    capture_mode: standard
    sample_rate: 0.1
    always_capture_errors: true
    sample_overrides:
      environment:staging: 1.0
    tags:
      environment: production
      service: research-api
      version: "2.1.0"
```

Query recent failures:

```sh
petaltrace runs list --status failed --since 24h --limit 20
```

Analyze costs by workflow:

```sh
petaltrace cost summary --since 7d --group-by workflow
```

Compare a failed run to a successful one:

```sh
petaltrace diff run-failed-123 run-success-456 --include-content
```