Overview

Comprehensive observability with dual backends: OpenTelemetry for distributed tracing and metrics, plus LangSmith for LLM-specific insights. Track every request from ingress to LLM response with full context correlation.
Dual observability provides both infrastructure monitoring (OpenTelemetry) and AI-specific insights (LangSmith) in a unified platform.

Architecture

OpenTelemetry Instrumentation Flow

The following diagram illustrates how OpenTelemetry auto-instrumentation captures telemetry data from your application and routes it through the collector to various backend systems:
Auto-instrumentation captures telemetry without code changes. The collector batches, samples, and filters data before exporting to multiple backends for analysis and visualization.
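
For reference, here is a minimal sketch of the SDK-side configuration such a flow relies on; the project's own telemetry module (and auto-instrumentation) handles the equivalent, and the endpoint and names below are illustrative:
# Illustrative OTLP tracing setup; the project's telemetry module may wire this differently
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

resource = Resource.create({"service.name": "mcp-server-langgraph"})
provider = TracerProvider(resource=resource)

# BatchSpanProcessor buffers spans and exports them to the collector at OTLP_ENDPOINT
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("mcp-server-langgraph")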

Quick Start

Step 1: Deploy Observability Stack

# Start all observability services
docker compose up -d jaeger prometheus grafana

# Verify services
curl http://localhost:16686  # Jaeger UI
curl http://localhost:9090   # Prometheus
curl http://localhost:3000   # Grafana

Step 2: Configure Application

# .env
ENABLE_TRACING=true
ENABLE_METRICS=true
OTLP_ENDPOINT=http://localhost:4317

# Optional: LangSmith
LANGSMITH_TRACING=true
LANGSMITH_API_KEY=your-key-here
LANGSMITH_PROJECT=mcp-server-langgraph

Step 3: Generate Traces

# Make a request
curl -X POST http://localhost:8000/message \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query": "Hello!"}'

# Response includes trace_id
{
  "content": "Hello! How can I help you?",
  "trace_id": "abc123def456..."
}
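
The trace_id in the response is presumably the ID of the active OpenTelemetry span. A minimal sketch of how a handler could expose it (the helper name is hypothetical):
from opentelemetry import trace

def current_trace_id() -> str:
    # Read the active span's context and format the trace ID as 32 hex characters
    ctx = trace.get_current_span().get_span_context()
    return format(ctx.trace_id, "032x")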

Step 4: View in Jaeger

  1. Open http://localhost:16686
  2. Select service: mcp-server-langgraph
  3. Click “Find Traces”
  4. Click on trace to see details

OpenTelemetry Tracing

Trace Structure

Every request creates a trace with multiple spans:
Trace: POST /message
├─ Span: http_request
│  ├─ Span: authenticate_user
│  │  ├─ Span: redis_get_session
│  │  └─ Span: keycloak_verify_token
│  ├─ Span: authorize_user
│  │  └─ Span: openfga_check
│  ├─ Span: execute_agent
│  │  ├─ Span: llm_generate
│  │  │  └─ Span: litellm_completion
│  │  └─ Span: tool_execution
│  └─ Span: refresh_session

Trace Attributes

Each span includes rich metadata. Spans fall into HTTP, LLM, and auth categories; an HTTP span, for example, carries attributes like:
{
  "http.method": "POST",
  "http.url": "/message",
  "http.status_code": 200,
  "http.user_agent": "curl/7.68.0",
  "user.id": "alice",
  "trace_id": "abc123...",
  "span_id": "def456..."
}

Custom Instrumentation

Add custom spans to your code:
from mcp_server_langgraph.observability.telemetry import tracer

@tracer.start_as_current_span("custom_operation")
def my_function():
    # Your code here
    with tracer.start_as_current_span("sub_operation") as span:
        span.set_attribute("custom.attribute", "value")
        result = do_work()
        span.set_attribute("result.count", len(result))
        return result

Metrics

Available Metrics

Request Metrics

HTTP request metrics:
  • http_requests_total - Total requests by method, status
  • http_request_duration_seconds - Request latency histogram
  • http_requests_in_progress - Active requests gauge
Query:
# Request rate
rate(http_requests_total[5m])

# p95 latency
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))

# Error rate
rate(http_requests_total{status=~"5.."}[5m])

Authentication Metrics

Auth metrics (30+ metrics):
  • auth_attempts_total - Auth attempts by result
  • auth_session_created_total - Sessions created
  • auth_session_active - Active sessions gauge
  • auth_token_validation_duration_seconds - Token validation latency
Query:
# Auth success rate
rate(auth_attempts_total{result="success"}[5m]) /
  rate(auth_attempts_total[5m])

# Active sessions
auth_session_active

# Failed logins
increase(auth_attempts_total{result="failure"}[1h])

Authorization Metrics

OpenFGA metrics:
  • authz_check_total - Permission checks by result
  • authz_check_duration_seconds - Authorization latency
  • authz_cache_hits_total - Cache hit rate
Query:
# Authorization success rate
rate(authz_check_total{allowed="true"}[5m])

# Authorization latency p99
histogram_quantile(0.99, rate(authz_check_duration_seconds_bucket[5m]))

# Cache hit rate
rate(authz_cache_hits_total[5m]) /
  rate(authz_check_total[5m])

LLM Metrics

LLM usage metrics:
  • llm_requests_total - LLM requests by provider, model
  • llm_tokens_total - Token usage by type (prompt, completion)
  • llm_latency_seconds - LLM response time
  • llm_errors_total - LLM errors by type
Query:
# Token usage per minute
rate(llm_tokens_total[1m])

# Average LLM latency
rate(llm_latency_seconds_sum[5m]) /
  rate(llm_latency_seconds_count[5m])

# Cost estimation (approximate; these factors assume $3/M prompt and $15/M completion tokens)
rate(llm_tokens_total{type="prompt"}[1h]) * 0.000003 +
rate(llm_tokens_total{type="completion"}[1h]) * 0.000015
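
As a sketch of where these numbers come from, token counts can be recorded right after each completion. The counter name matches the list above; the helper function and its parameters are illustrative:
from mcp_server_langgraph.observability.telemetry import meter

llm_tokens = meter.create_counter(
    "llm_tokens_total",
    description="Token usage by type",
    unit="1"
)

def record_token_usage(provider: str, model: str, prompt_tokens: int, completion_tokens: int) -> None:
    # Label by provider/model and token type, matching the PromQL queries above
    attrs = {"provider": provider, "model": model}
    llm_tokens.add(prompt_tokens, {**attrs, "type": "prompt"})
    llm_tokens.add(completion_tokens, {**attrs, "type": "completion"})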

Custom Metrics

Create custom metrics:
from mcp_server_langgraph.observability.telemetry import meter

# Counter
request_counter = meter.create_counter(
    "custom_requests_total",
    description="Total custom requests",
    unit="1"
)
request_counter.add(1, {"type": "custom"})

# Histogram
latency_histogram = meter.create_histogram(
    "custom_duration_seconds",
    description="Custom operation duration",
    unit="s"
)
latency_histogram.record(0.123, {"operation": "custom"})

# Gauge (UpDownCounter)
active_gauge = meter.create_up_down_counter(
    "custom_active",
    description="Active custom operations",
    unit="1"
)
active_gauge.add(1)
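
A common follow-up pattern is to time an operation and record the measurement in one place, reusing the histogram defined above (do_work is a placeholder):
import time

def timed_operation():
    start = time.perf_counter()
    try:
        return do_work()  # placeholder workload
    finally:
        # Record elapsed seconds against the histogram defined above
        latency_histogram.record(time.perf_counter() - start, {"operation": "custom"})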

Prometheus Configuration

Scraping

Add scraping configuration:
# prometheus.yml
scrape_configs:
  - job_name: 'mcp-server-langgraph'
    scrape_interval: 15s
    static_configs:
      - targets: ['mcp-server-langgraph:8000']
    metrics_path: '/metrics/prometheus'
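
The scrape target above assumes the application serves metrics at /metrics/prometheus. One way to expose that endpoint, assuming a FastAPI app and the prometheus_client library (the project's actual wiring may differ):
from fastapi import FastAPI
from prometheus_client import make_asgi_app

app = FastAPI()

# Serve Prometheus metrics at the path referenced in prometheus.yml
app.mount("/metrics/prometheus", make_asgi_app())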

Alerting Rules

# alerts.yml
groups:
  - name: langgraph_agent
    interval: 30s
    rules:
      # High error rate
      - alert: HighErrorRate
        expr: |
          rate(http_requests_total{status=~"5.."}[5m]) > 0.05
        for: 2m
        annotations:
          summary: "High error rate detected"

      # High LLM latency
      - alert: HighLLMLatency
        expr: |
          histogram_quantile(0.95,
            rate(llm_latency_seconds_bucket[5m])) > 5
        for: 5m
        annotations:
          summary: "LLM p95 latency > 5s"

      # Auth failures
      - alert: AuthFailureSpike
        expr: |
          rate(auth_attempts_total{result="failure"}[5m]) > 10
        for: 1m
        annotations:
          summary: "Authentication failure spike"

Grafana Dashboards

Import Dashboards

# Import pre-built dashboards
kubectl create configmap grafana-dashboards \
  --from-file=dashboards/ \
  --namespace=observability

# Label for auto-discovery
kubectl label configmap grafana-dashboards \
  grafana_dashboard=1 \
  --namespace=observability

Key Panels

Dashboards are organized into Overview, Authentication, Authorization, and LLM views. Key panels include:
  • Request rate (RED metrics)
  • Error rate
  • p50/p95/p99 latency
  • Active sessions
  • LLM token usage

LangSmith Integration

Setup

Step 1: Create Account

Sign up for LangSmith at https://smith.langchain.com.

Step 2: Get API Key

Generate an API key from the LangSmith settings page.

Step 3: Configure Application

# .env
LANGSMITH_TRACING=true
LANGSMITH_API_KEY=ls_api_key_...
LANGSMITH_PROJECT=mcp-server-langgraph
LANGSMITH_ENDPOINT=https://api.smith.langchain.com

Step 4: Verify

Make a request and confirm the trace appears in the LangSmith UI.

Features

View full prompts and responses:
  • Input messages
  • System prompts
  • LLM responses
  • Token counts
  • Latency breakdown
See execution flow:
  • Agent state transitions
  • Tool invocations
  • LLM calls
  • Conditional routing
Test and evaluate:
  • Accuracy metrics
  • Cost analysis
  • Latency benchmarks
  • A/B testing
Debug issues:
  • Error traces
  • Failed requests
  • Slow queries
  • Token usage spikes

Custom Annotations

from langsmith import traceable

@traceable(name="custom_step")  # project is taken from LANGSMITH_PROJECT
def custom_function(input_data):
    # Automatically traced to LangSmith
    result = process(input_data)
    return result

Logging

Structured Logging

All logs are JSON-formatted with trace context:
{
  "timestamp": "2025-10-12T10:30:00.123Z",
  "level": "INFO",
  "service": "mcp-server-langgraph",
  "trace_id": "abc123def456...",
  "span_id": "789ghi...",
  "user_id": "alice",
  "event": "llm_request",
  "provider": "anthropic",
  "model": "claude-sonnet-4-5-20250929",
  "tokens": 212,
  "latency_ms": 1245,
  "status": "success"
}
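
One way to produce logs in this shape is a structlog processor that injects the active trace context. This is a sketch under the assumption that the service logs through structlog; the project's actual logging setup may differ:
import structlog
from opentelemetry import trace

def add_trace_context(logger, method_name, event_dict):
    # Inject the active trace/span IDs so logs correlate with traces
    ctx = trace.get_current_span().get_span_context()
    if ctx.is_valid:
        event_dict["trace_id"] = format(ctx.trace_id, "032x")
        event_dict["span_id"] = format(ctx.span_id, "016x")
    return event_dict

structlog.configure(
    processors=[
        structlog.processors.TimeStamper(fmt="iso", key="timestamp"),
        structlog.processors.add_log_level,
        add_trace_context,
        structlog.processors.JSONRenderer(),
    ]
)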

Log Aggregation

Logs can be shipped to Loki, the ELK Stack, or a cloud logging service. For example, with Promtail pushing to Loki:
# promtail-config.yml
clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: mcp-server-langgraph
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names: [mcp-server-langgraph]

Production Setup

Kubernetes

# Sidecar for OpenTelemetry Collector
- name: otel-collector
  image: otel/opentelemetry-collector:latest
  args:
    - --config=/conf/otel-collector-config.yaml
  volumeMounts:
    - name: otel-config
      mountPath: /conf

Sampling

Configure trace sampling for high-traffic:
# config.py
TRACE_SAMPLE_RATE = 0.1  # Sample 10% of traces

# telemetry.py
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import TraceIdRatioBased

sampler = TraceIdRatioBased(TRACE_SAMPLE_RATE)
provider = TracerProvider(sampler=sampler)  # pass the sampler when building the provider

Data Retention

# Jaeger retention
--span-storage.type=elasticsearch
--es.index-prefix=jaeger
--es.tags-as-fields.all=true
--es.num-shards=5
--es.num-replicas=1

# Prometheus retention
--storage.tsdb.retention.time=30d
--storage.tsdb.retention.size=50GB

Troubleshooting

Traces not appearing in Jaeger:
# Check OTLP endpoint
curl -v http://localhost:4317

# Check collector logs
docker compose logs otel-collector

# Verify app configuration
echo $ENABLE_TRACING $OTLP_ENDPOINT

# Test trace export
python scripts/test_tracing.py
High metric cardinality: limit label values to bounded sets:
# Bad: user_id in labels (high cardinality)
counter.add(1, {"user_id": user_id})

# Good: user_type in labels
counter.add(1, {"user_type": "premium"})
Trace storage growing too large:
  • Add indexes on trace_id and span_id
  • Reduce the retention period
  • Enable sampling
  • Archive old traces

Full Visibility: Comprehensive observability with OpenTelemetry and LangSmith for complete system insights!