Overview
Comprehensive observability with dual backends: OpenTelemetry for distributed tracing and metrics, plus LangSmith for LLM-specific insights. Track every request from ingress to LLM response with full context correlation.
Dual observability provides both infrastructure monitoring (OpenTelemetry) and AI-specific insights (LangSmith) in a unified platform.
Architecture
This guide focuses on advanced observability configurations and operational best practices.
Quick Start
Deploy Observability Stack
# Start all observability services
docker compose up -d jaeger prometheus grafana
# Verify services
curl http://localhost:16686 # Jaeger UI
curl http://localhost:9090 # Prometheus
curl http://localhost:3000 # Grafana
Configure Application
# .env
ENABLE_TRACING=true
ENABLE_METRICS=true
OTLP_ENDPOINT=http://localhost:4317

# Optional: LangSmith
LANGSMITH_TRACING=true
LANGSMITH_API_KEY=your-key-here
LANGSMITH_PROJECT=mcp-server-langgraph
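If you want to see roughly how these variables drive tracer setup, here is a minimal sketch using the OpenTelemetry SDK. The project wires this up inside mcp_server_langgraph.observability.telemetry, so the actual implementation may differ; the os.getenv names simply mirror the .env keys above.

# Minimal sketch: initialize OTLP tracing from the environment variables above.
# The project's telemetry module may differ.
import os

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

if os.getenv("ENABLE_TRACING", "false").lower() == "true":
    provider = TracerProvider(
        resource=Resource.create({"service.name": "mcp-server-langgraph"})
    )
    exporter = OTLPSpanExporter(endpoint=os.getenv("OTLP_ENDPOINT", "http://localhost:4317"))
    provider.add_span_processor(BatchSpanProcessor(exporter))
    trace.set_tracer_provider(provider)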
Generate Traces
# Make a request
curl -X POST http://localhost:8000/message \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query": "Hello!"}'
# Response includes trace_id
{
  "content": "Hello! How can I help you?",
  "trace_id": "abc123def456..."
}
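The trace_id in the response is the OpenTelemetry trace ID of the request, rendered as 32 hex characters, which is the same identifier you can search for in Jaeger. A sketch of how it can be derived (not necessarily the project's exact code):

# Sketch: derive the current request's trace ID for inclusion in a response.
from opentelemetry import trace

def current_trace_id() -> str:
    # 32-char lowercase hex, matching the trace ID shown in Jaeger
    ctx = trace.get_current_span().get_span_context()
    return format(ctx.trace_id, "032x")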
View in Jaeger
Open http://localhost:16686
Select service: mcp-server-langgraph
Click “Find Traces”
Click on trace to see details
OpenTelemetry Tracing
Trace Structure
Every request creates a trace with multiple spans:
Trace: POST /message
├─ Span: http_request
│ ├─ Span: authenticate_user
│ │ ├─ Span: redis_get_session
│ │ └─ Span: keycloak_verify_token
│ ├─ Span: authorize_user
│ │ └─ Span: openfga_check
│ ├─ Span: execute_agent
│ │ ├─ Span: llm_generate
│ │ │ └─ Span: litellm_completion
│ │ └─ Span: tool_execution
│ └─ Span: refresh_session
Trace Attributes
Each span carries rich metadata specific to its category (HTTP, LLM, auth). Example attributes for an HTTP span:
{
  "http.method": "POST",
  "http.url": "/message",
  "http.status_code": 200,
  "http.user_agent": "curl/7.68.0",
  "user.id": "alice",
  "trace_id": "abc123...",
  "span_id": "def456..."
}
Custom Instrumentation
Add custom spans to your code:
from mcp_server_langgraph.observability.telemetry import tracer

@tracer.start_as_current_span("custom_operation")
def my_function():
    # Your code here
    with tracer.start_as_current_span("sub_operation") as span:
        span.set_attribute("custom.attribute", "value")
        result = do_work()
        span.set_attribute("result.count", len(result))
        return result
Metrics
Available Metrics
HTTP request metrics:
http_requests_total - Total requests by method, status
http_request_duration_seconds - Request latency histogram
http_requests_in_progress - Active requests gauge
Query:
# Request rate
rate(http_requests_total[5m])
# p95 latency
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
# Error rate
rate(http_requests_total{status=~"5.."}[5m])
Auth metrics (30+ metrics):
auth_attempts_total - Auth attempts by result
auth_session_created_total - Sessions created
auth_session_active - Active sessions gauge
auth_token_validation_duration_seconds - Token validation latency
Query:
# Auth success rate
rate(auth_attempts_total{result="success"}[5m]) /
rate(auth_attempts_total[5m])
# Active sessions
auth_session_active
# Failed logins
increase(auth_attempts_total{result="failure"}[1h])
OpenFGA metrics:
authz_check_total - Permission checks by result
authz_check_duration_seconds - Authorization latency
authz_cache_hits_total - Cache hit rate
Query:
# Authorization success rate
rate(authz_check_total{allowed="true"}[5m])
# Authorization latency p99
histogram_quantile(0.99, rate(authz_check_duration_seconds_bucket[5m]))
# Cache hit rate
rate(authz_cache_hits_total[5m]) /
rate(authz_check_total[5m])
LLM usage metrics:
llm_requests_total - LLM requests by provider, model
llm_tokens_total - Token usage by type (prompt, completion)
llm_latency_seconds - LLM response time
llm_errors_total - LLM errors by type
Query:
# Token usage per minute
rate(llm_tokens_total[1m])
# Average LLM latency
rate(llm_latency_seconds_sum[5m]) /
rate(llm_latency_seconds_count[5m])
# Cost estimation (approximate)
rate(llm_tokens_total{type="prompt"}[1h]) * 0.000003 +
rate(llm_tokens_total{type="completion"}[1h]) * 0.000015
Custom Metrics
Create custom metrics:
from mcp_server_langgraph.observability.telemetry import meter

# Counter
request_counter = meter.create_counter(
    "custom_requests_total",
    description="Total custom requests",
    unit="1",
)
request_counter.add(1, {"type": "custom"})

# Histogram
latency_histogram = meter.create_histogram(
    "custom_duration_seconds",
    description="Custom operation duration",
    unit="s",
)
latency_histogram.record(0.123, {"operation": "custom"})

# Gauge (modeled as an up/down counter)
active_gauge = meter.create_up_down_counter(
    "custom_active",
    description="Active custom operations",
    unit="1",
)
active_gauge.add(1)
Prometheus Configuration
Scraping
Add scraping configuration:
# prometheus.yml
scrape_configs:
  - job_name: 'mcp-server-langgraph'
    scrape_interval: 15s
    static_configs:
      - targets: ['mcp-server-langgraph:8000']
    metrics_path: '/metrics/prometheus'
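The scrape target assumes the application serves Prometheus-format metrics at /metrics/prometheus. If you need to wire that up yourself, here is a sketch using the OpenTelemetry Prometheus reader behind a FastAPI route; the FastAPI app and route path are assumptions, not necessarily how the project exposes its endpoint.

# Sketch: expose OpenTelemetry metrics in Prometheus format at /metrics/prometheus.
# Assumes a FastAPI app; the project's actual endpoint wiring may differ.
from fastapi import FastAPI, Response
from opentelemetry import metrics
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from opentelemetry.sdk.metrics import MeterProvider
from prometheus_client import CONTENT_TYPE_LATEST, generate_latest

# The reader registers a collector with prometheus_client's default registry.
reader = PrometheusMetricReader()
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))

app = FastAPI()

@app.get("/metrics/prometheus")
def prometheus_metrics() -> Response:
    return Response(content=generate_latest(), media_type=CONTENT_TYPE_LATEST)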
Alerting Rules
# alerts.yml
groups:
  - name: langgraph_agent
    interval: 30s
    rules:
      # High error rate
      - alert: HighErrorRate
        expr: |
          rate(http_requests_total{status=~"5.."}[5m]) > 0.05
        for: 2m
        annotations:
          summary: "High error rate detected"

      # High LLM latency
      - alert: HighLLMLatency
        expr: |
          histogram_quantile(0.95,
            rate(llm_latency_seconds_bucket[5m])) > 5
        for: 5m
        annotations:
          summary: "LLM p95 latency > 5s"

      # Auth failures
      - alert: AuthFailureSpike
        expr: |
          rate(auth_attempts_total{result="failure"}[5m]) > 10
        for: 1m
        annotations:
          summary: "Authentication failure spike"
Grafana Dashboards
Import Dashboards
# Import pre-built dashboards
kubectl create configmap grafana-dashboards \
  --from-file=dashboards/ \
  --namespace=observability

# Label for auto-discovery
kubectl label configmap grafana-dashboards \
  grafana_dashboard=1 \
  --namespace=observability
Key Panels
Dashboards cover Overview, Authentication, Authorization, and LLM views. Key Overview panels:
Request rate (RED metrics)
Error rate
p50/p95/p99 latency
Active sessions
LLM token usage
LangSmith Integration
Setup
Get API Key
Sign in to LangSmith (https://smith.langchain.com) and generate an API key from Settings.
Configure Application
# .env
LANGSMITH_TRACING=true
LANGSMITH_API_KEY=ls_api_key_...
LANGSMITH_PROJECT=mcp-server-langgraph
LANGSMITH_ENDPOINT=https://api.smith.langchain.com
Verify
Make a request and view in LangSmith UI
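Besides checking the UI, you can verify programmatically that runs are arriving. Below is a sketch using the langsmith SDK's Client; it assumes the LANGSMITH_* variables above are exported in your environment.

# Sketch: confirm traces are reaching LangSmith by listing recent runs.
# Assumes LANGSMITH_API_KEY / LANGSMITH_ENDPOINT are set in the environment.
from langsmith import Client

client = Client()
for run in client.list_runs(project_name="mcp-server-langgraph", limit=5):
    print(run.name, run.run_type, run.start_time)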
Features
View full prompts and responses:
Input messages
System prompts
LLM responses
Token counts
Latency breakdown
See execution flow:
Agent state transitions
Tool invocations
LLM calls
Conditional routing
Test and evaluate:
Accuracy metrics
Cost analysis
Latency benchmarks
A/B testing
Debug issues:
Error traces
Failed requests
Slow queries
Token usage spikes
Custom Annotations
from langsmith import traceable

@traceable(name="custom_step", project_name="mcp-server-langgraph")
def custom_function(input_data):
    # Automatically traced to LangSmith
    result = process(input_data)
    return result
Logging
Structured Logging
All logs are JSON-formatted with trace context:
{
  "timestamp": "2025-10-12T10:30:00.123Z",
  "level": "INFO",
  "service": "mcp-server-langgraph",
  "trace_id": "abc123def456...",
  "span_id": "789ghi...",
  "user_id": "alice",
  "event": "llm_request",
  "provider": "anthropic",
  "model": "claude-sonnet-4-5-20250929",
  "tokens": 212,
  "latency_ms": 1245,
  "status": "success"
}
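The trace_id and span_id fields come from correlating each log event with the active span. If you need to reproduce that correlation yourself, here is a sketch using structlog and the OpenTelemetry API; structlog is an assumption about the logging stack, and the project's processor chain may differ.

# Sketch: enrich structlog events with the active OpenTelemetry trace context.
# Assumes structlog is the logging library; the project's setup may differ.
import structlog
from opentelemetry import trace

def add_trace_context(logger, method_name, event_dict):
    # Copy the current span's IDs into the log record for correlation with traces
    ctx = trace.get_current_span().get_span_context()
    if ctx.is_valid:
        event_dict["trace_id"] = format(ctx.trace_id, "032x")
        event_dict["span_id"] = format(ctx.span_id, "016x")
    return event_dict

structlog.configure(
    processors=[
        structlog.processors.TimeStamper(fmt="iso", key="timestamp"),
        structlog.processors.add_log_level,
        add_trace_context,
        structlog.processors.JSONRenderer(),
    ]
)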
Log Aggregation
Logs can be shipped to Loki, the ELK Stack, or a cloud logging service. Example Promtail configuration for Loki:
# promtail-config.yml
clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: mcp-server-langgraph
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names: [mcp-server-langgraph]
Production Setup
Kubernetes
# Sidecar for OpenTelemetry Collector
- name: otel-collector
  image: otel/opentelemetry-collector:latest
  args:
    - --config=/conf/otel-collector-config.yaml
  volumeMounts:
    - name: otel-config
      mountPath: /conf
Sampling
Configure trace sampling for high-traffic deployments:

# config.py
TRACE_SAMPLE_RATE = 0.1  # Sample 10% of traces

# telemetry.py
from opentelemetry.sdk.trace.sampling import TraceIdRatioBased

sampler = TraceIdRatioBased(TRACE_SAMPLE_RATE)
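The sampler only takes effect once it is attached to the tracer provider. A sketch of that wiring is below; wrapping the ratio sampler in ParentBased (so child spans follow the parent's sampling decision) is a common choice, not necessarily what the project does.

# Sketch: attach the sampler when constructing the tracer provider.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased

# ParentBased keeps child spans consistent with the parent's sampling decision.
provider = TracerProvider(sampler=ParentBased(sampler))
trace.set_tracer_provider(provider)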
Data Retention
# Jaeger retention
--span-storage.type=elasticsearch
--es.index-prefix=jaeger
--es.tags-as-fields.all=true
--es.num-shards=5
--es.num-replicas=1

# Prometheus retention
--storage.tsdb.retention.time=30d
--storage.tsdb.retention.size=50GB
Troubleshooting
# Check the OTLP endpoint is reachable (4317 is the gRPC port; expect a connection, not a normal HTTP response)
curl -v http://localhost:4317
# Check collector logs
docker compose logs otel-collector
# Verify app configuration
echo $ENABLE_TRACING $OTLP_ENDPOINT
# Test trace export
python scripts/test_tracing.py
Limit label values to keep metric cardinality low:

# Bad: user_id in labels (high cardinality)
counter.add(1, {"user_id": user_id})

# Good: user_type in labels
counter.add(1, {"user_type": "premium"})
To manage trace storage and query performance:
Add indexes on trace_id, span_id
Reduce retention period
Enable sampling
Archive old traces
Next Steps
Full Visibility: Comprehensive observability with OpenTelemetry and LangSmith for complete system insights!