Observability & Pipeline Tracing

Every customer message that reaches your Sales Assistant triggers a deterministic pipeline with five phases: order extraction, routing rules, AI processing, callbacks, and post-run effects. QuotyAI captures detailed telemetry for each phase so you can see exactly what happened, when it happened, and why.

What Gets Recorded

Each pipeline run (one customer message → one assistant response) produces a run record with five phases:

Customer sends a message
        ↓
┌───────────────────────────────┐
│  1. Order Extraction          │
│  What the assistant extracted │
│  from the message             │
└───────────────────────────────┘
        ↓
┌───────────────────────────────┐
│  2. Deterministic Router      │
│  Which instructions fired,    │
│  their generated code,        │
│  actions produced, duration   │
└───────────────────────────────┘
        ↓
┌───────────────────────────────┐
│  3. AI Agent (LLM)            │
│  Model used, token usage,     │
│  response preview, linked     │
│  to full LLM trace waterfall  │
└───────────────────────────────┘
        ↓
┌───────────────────────────────┐
│  4. Deterministic Callback    │
│  Which callbacks ran, their   │
│  generated code, actions      │
└───────────────────────────────┘
        ↓
┌───────────────────────────────┐
│  5. Post-Run Effects          │
│  Attachments sent, state      │
│  changes, handover events     │
└───────────────────────────────┘

Every phase captures start time, end time, duration, and status (success / failed / skipped). Errors are recorded at the phase level without breaking the pipeline.

The Pipeline Overview

From any AI message in the conversation view, click the observability button to open the Sales Assistant Observability modal. The default view is the Pipeline Overview — a vertical timeline of all five phases for that message.

Phase Cards

Each phase is a collapsible card showing:

Status dot (green = success, red = failed, gray = skipped)
Phase name and duration
Expand for details

Order Extraction

Shows the input conversation state and the structured order data the assistant extracted. This is the data that the router instructions and the AI agent work with.

Router (Pre-AI Automation)

Lists every deterministic router instruction that executed, along with:

Instruction content (the original business rule)
Generated TypeScript code (preview with syntax highlighting)
Actions returned (send_message, update_state, short_circuit, etc.)
Per-instruction duration and success/failure
Whether the router short-circuited the AI

Each router instruction runs as an independent generated function. If one fails, the rest continue. The card shows each instruction as a sub-card that expands to reveal the generated code and actions.
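The isolation between instructions can be pictured with a small sketch. It is an illustration under assumptions, not QuotyAI's actual implementation: RouterInstruction, its run function, and runRouterInstructions are hypothetical names, and only the result fields mirror what the router card displays.

```typescript
// Hypothetical shapes for illustration; not the actual QuotyAI types.
type RouterAction = { type: string; [key: string]: unknown };

interface RouterInstruction {
  instructionId: string;
  // Stand-in for the generated function of one instruction.
  run: (message: string) => RouterAction[];
}

interface InstructionResult {
  instructionId: string;
  actions: RouterAction[];
  success: boolean;
  error?: string;
}

// Runs every instruction in isolation: a throw in one instruction is
// recorded on its own result and does not stop the remaining ones.
function runRouterInstructions(
  instructions: RouterInstruction[],
  message: string
): InstructionResult[] {
  return instructions.map((instruction) => {
    try {
      const actions = instruction.run(message);
      return { instructionId: instruction.instructionId, actions, success: true };
    } catch (err) {
      return {
        instructionId: instruction.instructionId,
        actions: [],
        success: false,
        error: err instanceof Error ? err.message : String(err),
      };
    }
  });
}
```

Because each call sits in its own try/catch, the second instruction still produces its actions even when the first one throws.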

AI Agent (LLM)

Shows:

Model used (e.g., gpt-5)
Token usage (prompt, completion, total)
Response preview
Whether a handover to a human agent occurred
Link to the full LLM trace waterfall — the complete LangChain trace with every tool call, chain step, and LLM invocation

The “View Full LLM Trace Waterfall” button switches to the trace tab and filters to the relevant trace. The linked LLM trace ID is stored in the run record so you can always cross-reference.

Callbacks (Post-AI Automation)

This card has the same structure as the router phase: each callback instruction is listed with its generated code, actions, duration, and status. Callbacks run after the AI response, so they cannot change what was already sent to the customer.

Post-Run Effects

Shows:

Attachments sent — which instruction triggered each attachment and whether it succeeded
State changes — a before/after diff of every conversation state key that was modified
Handover events — if the run ended with a handover to a human agent, including the reason
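A before/after diff like the one shown for state changes could be computed roughly as follows. This is a sketch, not the product's code: diffState and ConversationState are hypothetical names, and the JSON-based comparison is a simplifying assumption. The output shape follows the { key, from, to } entries of the run record.

```typescript
// Hypothetical type: conversation state as a flat key/value map.
type ConversationState = Record<string, unknown>;

interface StateChange {
  key: string;
  from: unknown;
  to: unknown;
}

// Compares two snapshots and lists every key whose value differs.
// Added or removed keys appear with `undefined` on the missing side.
function diffState(before: ConversationState, after: ConversationState): StateChange[] {
  const addedKeys = Object.keys(after).filter((k) => !(k in before));
  const keys = Object.keys(before).concat(addedKeys);
  const changes: StateChange[] = [];
  for (const key of keys) {
    // Simplification: JSON comparison is sensitive to key order in nested objects.
    if (JSON.stringify(before[key]) !== JSON.stringify(after[key])) {
      changes.push({ key, from: before[key], to: after[key] });
    }
  }
  return changes;
}
```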

The LLM Trace Waterfall

The second tab in the observability modal shows the LLM Trace Waterfall — a Gantt-chart visualization of everything that happened inside the AI agent:

Each LLM call is a span showing model, prompt tokens, completion tokens, and duration
Each tool call is a span showing the tool name, input arguments, and return value
Each chain step is a span showing the execution order
Spans are color-coded by type (LLM, Tool, Chain, Retriever, Other)
Expand any span to see its inputs, outputs, tags, and error details

The waterfall is the standard LLM observability trace view, now linked directly from the pipeline overview through the llmRunIds array stored in the AI agent phase. Because llmRunIds is an array, a single pipeline run can reference multiple LLM calls.

Search & Filter

Search — filter spans by name, type, or tags
Type filter — show only LLM calls, tool calls, chains, etc.
Sort — by start time, duration, name, or type
Statistics panel — aggregate metrics: total spans, avg/max/min duration, success rate, type breakdown
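If you export spans and want to reproduce the statistics panel's figures yourself, the aggregation might look like this. The Span shape and computeSpanStats are assumptions made for the sketch; the exact fields of an exported span are not specified here.

```typescript
type SpanType = "LLM" | "Tool" | "Chain" | "Retriever" | "Other";

// Hypothetical minimal span shape.
interface Span {
  name: string;
  type: SpanType;
  durationMs: number;
  success: boolean;
}

interface SpanStats {
  totalSpans: number;
  avgDurationMs: number;
  maxDurationMs: number;
  minDurationMs: number;
  successRate: number; // fraction between 0 and 1
  typeBreakdown: Partial<Record<SpanType, number>>;
}

function computeSpanStats(spans: Span[]): SpanStats {
  if (spans.length === 0) {
    // Avoid dividing by zero and calling Math.max on an empty list.
    return {
      totalSpans: 0,
      avgDurationMs: 0,
      maxDurationMs: 0,
      minDurationMs: 0,
      successRate: 0,
      typeBreakdown: {},
    };
  }
  const durations = spans.map((s) => s.durationMs);
  const totalDuration = durations.reduce((sum, d) => sum + d, 0);
  const typeBreakdown: Partial<Record<SpanType, number>> = {};
  for (const span of spans) {
    typeBreakdown[span.type] = (typeBreakdown[span.type] ?? 0) + 1;
  }
  return {
    totalSpans: spans.length,
    avgDurationMs: totalDuration / spans.length,
    maxDurationMs: Math.max(...durations),
    minDurationMs: Math.min(...durations),
    successRate: spans.filter((s) => s.success).length / spans.length,
    typeBreakdown,
  };
}
```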

Export

Export the full trace as JSON or CSV, or copy all spans to the clipboard, for debugging or external analysis.
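For a sense of what a CSV export involves, here is a minimal sketch that serializes spans with RFC 4180 style quoting. The column set and the names ExportSpan, csvField, and spansToCsv are assumptions; the actual export layout may differ.

```typescript
// Hypothetical span shape; the real export may contain more columns.
interface ExportSpan {
  name: string;
  type: string;
  durationMs: number;
  success: boolean;
}

// Quotes a field when it contains a comma, a double quote, or a line break,
// doubling any embedded double quotes.
function csvField(value: string | number | boolean): string {
  const text = String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

function spansToCsv(spans: ExportSpan[]): string {
  const header = ["name", "type", "durationMs", "success"].join(",");
  const rows = spans.map((s) =>
    [s.name, s.type, s.durationMs, s.success].map((v) => csvField(v)).join(",")
  );
  return [header].concat(rows).join("\n");
}
```

Quoting matters because span names and tool arguments often contain commas or quotes, which would otherwise shift columns when the file is opened in a spreadsheet.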

When Things Go Wrong

Failed Phase

If a phase fails (status “failed”, shown with a red dot), the error message is captured and displayed in the phase card. The pipeline continues: one failed router instruction doesn’t block the others. The overall run status (“completed” or “failed”) reflects whether the run as a whole succeeded.

Short-Circuited AI

When a router instruction returns short_circuit, the AI agent phase is skipped entirely. The pipeline overview shows a warning badge on the router phase and the AI agent phase shows status “skipped” with no data. This is expected behavior — the router intentionally bypassed the AI for that message.
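To find out which instruction bypassed the AI, you can scan the router's instruction results for a short_circuit action. The sketch below assumes records shaped like the instructionResults entries of the run record; the function name and the trimmed-down types are hypothetical.

```typescript
// Trimmed-down, hypothetical views of the recorded router data.
interface ActionRecord {
  type: string;
}

interface InstructionRecord {
  instructionId: string;
  actions: ActionRecord[];
}

// Returns the IDs of all instructions whose actions include short_circuit.
function shortCircuitingInstructions(instructionResults: InstructionRecord[]): string[] {
  return instructionResults
    .filter((result) => result.actions.some((action) => action.type === "short_circuit"))
    .map((result) => result.instructionId);
}
```

An empty result combined with a skipped AI agent phase would suggest the skip had a different cause and is worth investigating.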

Missing Run Record

If the observability modal shows “No pipeline data” but a trace waterfall exists, the run may have started before observability was enabled, or the message was processed by an older assistant version. Pipeline recording was introduced alongside the Sales Assistant Observability feature, so messages processed before then have no run records.

Technical Architecture

Data Model

Run records are stored in a separate sales-assistant-runs MongoDB collection, independent of the LLM trace records. Each record contains:

{
  runId: string,           // UUID — correlates with LLM trace
  status: string,          // "completed" | "failed"
  totalDurationMs: number,
  customerMessage: string,
  channelType: string,
  phases: {
    orderExtraction?: { ... },
    deterministicRouter?: {
      status: string,
      startTime: number,
      endTime: number,
      durationMs: number,
      instructionResults: [{
        instructionId: string,
        instructionContent: string,
        category: string,
        generatedCode: string,
        actions: [{ type, ... }],
        durationMs: number,
        success: boolean,
        error?: string
      }],
      shortCircuited: boolean,
      totalActionsProduced: number
    },
    aiAgent?: {
      status: string,
      modelUsed: string,
      tokenUsage: { prompt, completion, total },
      responsePreview: string,
      llmRunIds: string[],  // cross-reference to LLM trace(s)
      handoverOccurred: boolean
    },
    deterministicCallback?: { ... },
    afterRun?: {
      attachmentsSent: [{ instructionId, attachmentId, success }],
      stateChanges: [{ key, from, to }],
      handoverOccurred: boolean,
      handoverReason?: string
    }
  },
  createdAt: string,
  completedAt?: string
}
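As an example of reading this structure, the sketch below pulls the failed router instructions out of a run record. It types only the fields it touches; the RunRecord interface and failedRouterInstructions are illustrative names rather than part of any published QuotyAI API.

```typescript
// Partial, hypothetical typing of the run record: only the fields used here.
interface RouterInstructionRecord {
  instructionId: string;
  success: boolean;
  error?: string;
}

interface RunRecord {
  runId: string;
  phases: {
    deterministicRouter?: { instructionResults: RouterInstructionRecord[] };
  };
}

// Lists the router instructions that failed, with their error messages.
// A run without a router phase yields an empty list.
function failedRouterInstructions(
  run: RunRecord
): { instructionId: string; error: string }[] {
  const results = run.phases.deterministicRouter?.instructionResults ?? [];
  return results
    .filter((result) => !result.success)
    .map((result) => ({
      instructionId: result.instructionId,
      error: result.error ?? "unknown error",
    }));
}
```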

Recording Pipeline

The recorder service wraps each phase with timing, status capture, and error handling:

Phase start — records start time and phase metadata
Phase execution — runs the actual logic (router functions, AI invoke, callbacks)
Phase end — records end time, duration, status, and result data
Error handling — if a phase throws, the error is captured in the phase record and the run continues

The recorder never throws — errors are always captured in the phase status so downstream phases can still execute.
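The wrap-and-capture pattern described above can be sketched in a few lines. This is a synchronous simplification under assumptions: recordPhase and PhaseRecord are hypothetical names, and a real recorder would presumably also await asynchronous phases such as the AI invocation.

```typescript
type PhaseStatus = "success" | "failed" | "skipped";

// Hypothetical phase record, following the fields every phase captures.
interface PhaseRecord<T> {
  status: PhaseStatus;
  startTime: number;
  endTime: number;
  durationMs: number;
  result?: T;
  error?: string;
}

// Wraps one phase: times it, captures its result or its error,
// and never rethrows, so the caller can move on to the next phase.
function recordPhase<T>(execute: () => T): PhaseRecord<T> {
  const startTime = Date.now();
  try {
    const result = execute();
    const endTime = Date.now();
    return { status: "success", startTime, endTime, durationMs: endTime - startTime, result };
  } catch (err) {
    const endTime = Date.now();
    return {
      status: "failed",
      startTime,
      endTime,
      durationMs: endTime - startTime,
      error: err instanceof Error ? err.message : String(err),
    };
  }
}
```

Returning the error as data rather than throwing is what lets downstream phases run after a failure.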

Correlation

The pipeline run _id is the correlation key across the entire system:

Sales Assistant Run record (pipeline phases) —— _id
        ↓
LLM Observability run —— assistantPipelineRunId
        ↓
LangChain run tree —— child_runs (recursive)

The llmRunIds array in the AI agent phase links directly to the LLM observability runs, so you can jump from “the AI said this” to “here’s every LLM call, tool invocation, and chain step that produced it.”
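Following the chain from a pipeline run down to individual spans amounts to a filter plus a recursive walk. The sketch below assumes a minimal LlmRun shape with the assistantPipelineRunId and child_runs fields named above; the id and name fields and both function names are hypothetical.

```typescript
// Hypothetical minimal shape of an LLM observability run.
interface LlmRun {
  id: string;
  name: string;
  assistantPipelineRunId?: string;
  child_runs?: LlmRun[];
}

// Depth-first flatten of one run and all of its descendants.
function flattenRunTree(run: LlmRun): LlmRun[] {
  const children = run.child_runs ?? [];
  return [run].concat(...children.map((child) => flattenRunTree(child)));
}

// Collects every run reachable from the LLM runs linked to one pipeline run.
function runsForPipelineRun(pipelineRunId: string, llmRuns: LlmRun[]): LlmRun[] {
  const roots = llmRuns.filter((run) => run.assistantPipelineRunId === pipelineRunId);
  const collected: LlmRun[] = [];
  return collected.concat(...roots.map((root) => flattenRunTree(root)));
}
```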

Best Practices

Debugging Router Instructions

When a router instruction doesn’t behave as expected, open the pipeline overview:

Check the Router phase: did the instruction run (green status) or fail (red status)?
Expand the instruction to see the generated code — does it match your intent?
Check the Actions Returned — what did the function actually produce?
If the generated code is wrong, rewrite the instruction in clearer language and rebuild the assistant

Auditing AI Behavior

When an AI response seems off:

Open the Pipeline Overview — check which conditional prompts were active
Switch to the LLM Trace Waterfall — expand the LLM spans to see exactly what was sent to the model
Check tool call inputs and outputs — did the AI call the right tool with the right arguments?

Monitoring Performance

The statistics panel in the trace waterfall shows aggregate metrics. Use it to:

Identify slow LLM calls or tool invocations
Monitor token usage trends
Track success/failure rates over time
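If you work from an exported trace, a helper like the following can surface the slowest LLM and tool spans. The TimedSpan shape and slowestSpans are assumptions for illustration; adapt the field names to whatever your export actually contains.

```typescript
// Hypothetical shape of an exported span.
interface TimedSpan {
  name: string;
  type: string;
  durationMs: number;
}

// Returns up to `limit` spans of the given types, slowest first.
// filter() creates a new array, so the input is not reordered by sort().
function slowestSpans(spans: TimedSpan[], types: string[], limit: number): TimedSpan[] {
  return spans
    .filter((span) => types.indexOf(span.type) !== -1)
    .sort((a, b) => b.durationMs - a.durationMs)
    .slice(0, limit);
}
```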

Cross-Referencing

The run record includes both the customer message and the AI response preview. Combined with the state changes in the post-run effects, you can reconstruct the full conversation context for any automated interaction — useful for compliance audits and customer dispute resolution.
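For audits, the relevant fields can be folded into a short text summary. The sketch below reads only fields that appear in the data model (runId, customerMessage, createdAt, responsePreview, stateChanges); the AuditSource type, the auditSummary function, and the output layout are hypothetical.

```typescript
// Partial, hypothetical view of a run record: only the fields used here.
interface AuditSource {
  runId: string;
  customerMessage: string;
  createdAt: string;
  phases: {
    aiAgent?: { responsePreview: string };
    afterRun?: { stateChanges: { key: string; from: unknown; to: unknown }[] };
  };
}

// Builds a plain-text summary: message, response preview, then one line
// per state change.
function auditSummary(run: AuditSource): string {
  const lines = [
    `Run ${run.runId} at ${run.createdAt}`,
    `Customer: ${run.customerMessage}`,
    `Assistant: ${run.phases.aiAgent?.responsePreview ?? "(no AI response recorded)"}`,
  ];
  for (const change of run.phases.afterRun?.stateChanges ?? []) {
    lines.push(
      `State ${change.key}: ${JSON.stringify(change.from)} -> ${JSON.stringify(change.to)}`
    );
  }
  return lines.join("\n");
}
```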