3. Dual Observability: OpenTelemetry + LangSmith

Date: 2025-10-11

Status

Accepted

Category

Core Architecture

Context

Production systems require comprehensive observability to:
  • Debug issues quickly
  • Monitor performance
  • Track user behavior
  • Optimize LLM usage
  • Meet SLA requirements
For an LLM-powered MCP server, we have two distinct observability needs:
  1. Infrastructure Observability: Traditional application metrics
    • Request latency, error rates, throughput
    • System resources (CPU, memory, disk)
    • Distributed tracing across services
    • Database query performance
  2. LLM-Specific Observability: AI/ML-specific insights
    • Prompt quality and effectiveness
    • LLM response quality
    • Token usage and costs
    • Model performance comparison
    • Chain/agent execution flow
No single tool excels at both. Using only one would sacrifice critical visibility in the other domain.

Decision

We will implement a dual observability strategy:
  1. OpenTelemetry for infrastructure observability
    • Distributed tracing with Jaeger
    • Metrics with Prometheus
    • Structured logging with trace correlation
    • Standard OTLP exporters
  2. LangSmith for LLM-specific observability
    • Prompt engineering and debugging
    • LLM call tracing and analysis
    • Token usage tracking
    • Model comparison and evaluation
    • Dataset management
Users can enable one or both based on their needs:
  • OBSERVABILITY_BACKEND=opentelemetry (default)
  • OBSERVABILITY_BACKEND=langsmith
  • OBSERVABILITY_BACKEND=both (production recommended)
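For illustration, startup code can gate each subsystem on this setting; the helper names below are hypothetical stand-ins for the server's actual setup functions:

# Hypothetical gating sketch; init_opentelemetry/init_langsmith are placeholders
import os

def init_opentelemetry() -> None:
    ...  # configure OTLP tracer/meter providers (see Implementation Details)

def init_langsmith() -> None:
    ...  # enable LangSmith run tracing (see Implementation Details)

def init_observability() -> None:
    backend = os.getenv("OBSERVABILITY_BACKEND", "opentelemetry").lower()
    if backend in ("opentelemetry", "both"):
        init_opentelemetry()
    if backend in ("langsmith", "both"):
        init_langsmith()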

Consequences

Positive Consequences

  • Best of Both Worlds: Infrastructure + AI observability
  • Complete Visibility: No blind spots in production
  • Tool Specialization: Each tool does what it does best
  • Flexibility: Can use one or both based on needs
  • Industry Standard: OpenTelemetry is CNCF standard
  • LLM Optimization: LangSmith enables prompt engineering
  • Cost Tracking: Detailed token usage visibility

Negative Consequences

  • Increased Complexity: Two systems to configure and maintain
  • Higher Infrastructure Cost: Running both Jaeger and LangSmith
  • Learning Curve: Team must learn both systems
  • Data Duplication: Some overlap in traced data
  • Configuration Overhead: Separate config for each system

Neutral Consequences

  • Performance: Minimal overhead (~1-2% with both enabled)
  • Storage: Increased log/trace storage requirements
  • Vendor Risk: LangSmith is commercial (OpenTelemetry is free)

Alternatives Considered

1. OpenTelemetry Only

Description: Use only OpenTelemetry for all observability
Pros:
  • Single system to maintain
  • Open-source, no vendor lock-in
  • Industry standard
  • Great for infrastructure
Cons:
  • Poor LLM-specific insights
  • No prompt debugging tools
  • Manual token tracking
  • No model comparison features
Why Rejected: Insufficient for LLM observability needs

2. LangSmith Only

Description: Use only LangSmith for all observability
Pros:
  • Excellent LLM tracing
  • Great prompt debugging
  • Built-in evaluations
  • Cost tracking
Cons:
  • Vendor lock-in (LangChain product)
  • Poor infrastructure metrics
  • No distributed tracing
  • Less flexible than OpenTelemetry
Why Rejected: Insufficient for infrastructure observability

3. Datadog or New Relic

Description: Use a commercial APM solution
Pros:
  • All-in-one solution
  • Good infrastructure observability
  • Some LLM features
Cons:
  • Expensive at scale
  • Vendor lock-in
  • LLM features not as mature
  • Less flexible than open standards
Why Rejected: High cost, less specialized for LLMs

4. Prometheus + Grafana Only

Description: Use a metrics-focused stack
Pros:
  • Excellent metrics
  • Great visualization
  • Open-source
Cons:
  • No distributed tracing
  • No LLM-specific features
  • Manual instrumentation
Why Rejected: Missing tracing and LLM insights

5. Custom Logging Solution

Description: Build a custom observability solution
Pros:
  • Full control
  • Exactly what we need
Cons:
  • Massive development effort
  • Reinventing the wheel
  • Hard to maintain
  • No standard tools
Why Rejected: Not feasible to replicate existing tools

Implementation Details

OpenTelemetry Stack

# observability.py
- Tracer: Distributed tracing
- Meter: Metrics collection
- Logger: Structured logging with trace context

# Exporters:
- OTLP → Jaeger (traces)
- OTLP → Prometheus (metrics)
- Console (development)
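
A minimal sketch of this setup with the standard OpenTelemetry Python SDK and OTLP gRPC exporters; the collector endpoint and service name are illustrative, not the project's exact values:

# Minimal OpenTelemetry setup sketch (illustrative endpoint and service name)
from opentelemetry import trace, metrics
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter

resource = Resource.create({"service.name": "mcp-server-langgraph"})

# Traces: OTLP -> collector -> Jaeger
trace.set_tracer_provider(TracerProvider(resource=resource))
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://otel-collector:4317"))
)
tracer = trace.get_tracer(__name__)

# Metrics: OTLP -> collector -> Prometheus
reader = PeriodicExportingMetricReader(OTLPMetricExporter(endpoint="http://otel-collector:4317"))
metrics.set_meter_provider(MeterProvider(resource=resource, metric_readers=[reader]))
meter = metrics.get_meter(__name__)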

LangSmith Integration

# src/mcp_server_langgraph/core/config.py
- Conditional initialization
- Automatic trace context
- Token usage tracking
- Run metadata tagging
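
Conditional initialization can be as small as exporting the standard LangSmith environment variables when tracing is enabled; a sketch assuming env-var based activation (the project name is illustrative):

import os

def init_langsmith(api_key: str | None, tracing: bool) -> None:
    # LangChain/LangGraph pick these variables up automatically; no explicit client is required.
    if not (tracing and api_key):
        return
    os.environ["LANGCHAIN_TRACING_V2"] = "true"
    os.environ["LANGCHAIN_API_KEY"] = api_key
    os.environ["LANGCHAIN_PROJECT"] = "mcp-server-langgraph"  # illustrative project name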

Configuration

# src/mcp_server_langgraph/core/config.py
observability_backend: str = "both"  # opentelemetry, langsmith, both
langsmith_api_key: Optional[str] = None
langsmith_tracing: bool = False
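
The fields above read like a pydantic settings class; under that assumption they would map to environment variables roughly as follows (a sketch, not the project's verbatim code):

from typing import Optional
from pydantic_settings import BaseSettings  # assumption: pydantic-settings is used

class Settings(BaseSettings):
    # Populated from OBSERVABILITY_BACKEND, LANGSMITH_API_KEY, LANGSMITH_TRACING
    observability_backend: str = "both"  # opentelemetry, langsmith, both
    langsmith_api_key: Optional[str] = None
    langsmith_tracing: bool = False

settings = Settings()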

Docker Compose Stack

services:
  otel-collector:  # Receives traces/metrics
  jaeger:          # Trace visualization
  prometheus:      # Metrics storage
  grafana:         # Unified dashboard

Usage

# Both systems instrumented automatically
with tracer.start_as_current_span("agent.chat") as span:
    response = await llm.ainvoke(messages)
    # → Traced in both Jaeger and LangSmith
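
The same call site can enrich both views: span attributes for OpenTelemetry, run metadata and tags for LangSmith. Attribute names and metadata keys below are illustrative:

# Illustrative enrichment of the shared instrumentation point
with tracer.start_as_current_span("agent.chat") as span:
    response = await llm.ainvoke(
        messages,
        config={"metadata": {"user_id": "example-user"}, "tags": ["agent.chat"]},  # LangSmith run metadata/tags
    )
    usage = getattr(response, "usage_metadata", None)  # set by most LangChain chat models
    if usage:
        span.set_attribute("llm.input_tokens", usage["input_tokens"])
        span.set_attribute("llm.output_tokens", usage["output_tokens"])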

Metrics Tracked

OpenTelemetry Metrics

  • agent.tool.calls - Tool invocation count
  • agent.calls.successful - Success rate
  • agent.calls.failed - Error rate
  • agent.response.duration - Latency histogram
  • auth.failures - Auth errors
  • authz.failures - Authorization denials
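
These instruments would typically be created once at startup with the meter from observability.py; a sketch with illustrative descriptions and units:

# Instrument creation sketch; descriptions/units are illustrative
from opentelemetry import metrics

meter = metrics.get_meter(__name__)

tool_calls = meter.create_counter("agent.tool.calls", description="Tool invocation count")
successful_calls = meter.create_counter("agent.calls.successful", description="Successful agent calls")
failed_calls = meter.create_counter("agent.calls.failed", description="Failed agent calls")
response_duration = meter.create_histogram("agent.response.duration", unit="s", description="Response latency")
auth_failures = meter.create_counter("auth.failures", description="Authentication errors")
authz_failures = meter.create_counter("authz.failures", description="Authorization denials")

# Counters take an increment plus optional attributes; histograms record raw values
tool_calls.add(1, {"tool": "search"})
response_duration.record(0.42)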

LangSmith Metrics

  • Token usage per model
  • Cost per request
  • Prompt templates used
  • Model performance comparison
  • Chain execution paths
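
For code paths outside LangChain runnables, the LangSmith SDK's @traceable decorator can attach the metadata behind these views; the run type, metadata keys, and template name below are illustrative:

from langsmith import traceable

@traceable(run_type="chain", metadata={"prompt_template": "agent_chat_v2"})  # illustrative metadata
async def answer(question: str) -> str:
    # Wrapped LLM calls report tokens and model automatically; the metadata
    # above is what enables prompt-template and model comparisons in LangSmith.
    ...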

References