Overview
Comprehensive observability with dual backends: OpenTelemetry for distributed tracing and metrics, plus LangSmith for LLM-specific insights. Track every request from ingress to LLM response with full context correlation.
Dual observability provides both infrastructure monitoring (OpenTelemetry) and AI-specific insights (LangSmith) in a unified platform.
Architecture
This guide focuses on advanced observability configurations and operational best practices.
Quick Start
Deploy Observability Stack
# Start all observability services
docker compose up -d jaeger prometheus grafana
# Verify services
curl http://localhost:16686 # Jaeger UI
curl http://localhost:9090 # Prometheus
curl http://localhost:3000 # Grafana
Configure Application
# .env
ENABLE_TRACING=true
ENABLE_METRICS=true
OTLP_ENDPOINT=http://localhost:4317

# Optional: LangSmith
LANGSMITH_TRACING=true
LANGSMITH_API_KEY=your-key-here
LANGSMITH_PROJECT=mcp-server-langgraph
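If you want to see roughly how these variables drive tracer setup, here is a minimal sketch using the OpenTelemetry SDK. The project wires this up inside mcp_server_langgraph.observability.telemetry, so the actual implementation may differ; the os.getenv names simply mirror the .env keys above.

# Minimal sketch: initialize OTLP tracing from the environment variables above.
# The project's telemetry module may differ.
import os

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

if os.getenv("ENABLE_TRACING", "false").lower() == "true":
    provider = TracerProvider(
        resource=Resource.create({"service.name": "mcp-server-langgraph"})
    )
    exporter = OTLPSpanExporter(endpoint=os.getenv("OTLP_ENDPOINT", "http://localhost:4317"))
    provider.add_span_processor(BatchSpanProcessor(exporter))
    trace.set_tracer_provider(provider)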
Generate Traces
# Make a request
curl -X POST http://localhost:8000/message \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query": "Hello!"}'
# Response includes trace_id
{
  "content": "Hello! How can I help you?",
  "trace_id": "abc123def456..."
}
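The trace_id in the response is the OpenTelemetry trace ID of the request, rendered as 32 hex characters, which is the same identifier you can search for in Jaeger. A sketch of how it can be derived (not necessarily the project's exact code):

# Sketch: derive the current request's trace ID for inclusion in a response.
from opentelemetry import trace

def current_trace_id() -> str:
    # 32-char lowercase hex, matching the trace ID shown in Jaeger
    ctx = trace.get_current_span().get_span_context()
    return format(ctx.trace_id, "032x")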
View in Jaeger
Open http://localhost:16686
Select service: mcp-server-langgraph
Click “Find Traces”
Click on trace to see details
OpenTelemetry Tracing
Trace Structure
Every request creates a trace with multiple spans:
Trace: POST /message
├─ Span: http_request
│ ├─ Span: authenticate_user
│ │ ├─ Span: redis_get_session
│ │ └─ Span: keycloak_verify_token
│ ├─ Span: authorize_user
│ │ └─ Span: openfga_check
│ ├─ Span: execute_agent
│ │ ├─ Span: llm_generate
│ │ │ └─ Span: litellm_completion
│ │ └─ Span: tool_execution
│ └─ Span: refresh_session
Trace Attributes
Each span carries rich metadata specific to its category (HTTP, LLM, auth). Example attributes for an HTTP span:
{
  "http.method": "POST",
  "http.url": "/message",
  "http.status_code": 200,
  "http.user_agent": "curl/7.68.0",
  "user.id": "alice",
  "trace_id": "abc123...",
  "span_id": "def456..."
}
Custom Instrumentation
Add custom spans to your code:
from mcp_server_langgraph.observability.telemetry import tracer

@tracer.start_as_current_span("custom_operation")
def my_function():
    # Your code here
    with tracer.start_as_current_span("sub_operation") as span:
        span.set_attribute("custom.attribute", "value")
        result = do_work()
        span.set_attribute("result.count", len(result))
        return result
Metrics
Available Metrics
HTTP request metrics:
http_requests_total - Total requests by method, status
http_request_duration_seconds - Request latency histogram
http_requests_in_progress - Active requests gauge
Query:
# Request rate
rate(http_requests_total[5m])
# p95 latency
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
# Error rate
rate(http_requests_total{status=~"5.."}[5m])
Auth metrics (30+ metrics):
auth_attempts_total - Auth attempts by result
auth_session_created_total - Sessions created
auth_session_active - Active sessions gauge
auth_token_validation_duration_seconds - Token validation latency
Query:
# Auth success rate
rate(auth_attempts_total{result="success"}[5m]) /
rate(auth_attempts_total[5m])
# Active sessions
auth_session_active
# Failed logins
increase(auth_attempts_total{result="failure"}[1h])
OpenFGA metrics:
authz_check_total - Permission checks by result
authz_check_duration_seconds - Authorization latency
authz_cache_hits_total - Cache hit rate
Query:
# Authorization success rate
rate(authz_check_total{allowed="true"}[5m])
# Authorization latency p99
histogram_quantile(0.99, rate(authz_check_duration_seconds_bucket[5m]))
# Cache hit rate
rate(authz_cache_hits_total[5m]) /
rate(authz_check_total[5m])
LLM usage metrics:
llm_requests_total - LLM requests by provider, model
llm_tokens_total - Token usage by type (prompt, completion)
llm_latency_seconds - LLM response time
llm_errors_total - LLM errors by type
Query:
# Token usage per minute
rate(llm_tokens_total[1m])
# Average LLM latency
rate(llm_latency_seconds_sum[5m]) /
rate(llm_latency_seconds_count[5m])
# Cost estimation (approximate)
rate(llm_tokens_total{type="prompt"}[1h]) * 0.000003 +
rate(llm_tokens_total{type="completion"}[1h]) * 0.000015
Custom Metrics
Create custom metrics:
from mcp_server_langgraph.observability.telemetry import meter

# Counter
request_counter = meter.create_counter(
    "custom_requests_total",
    description="Total custom requests",
    unit="1",
)
request_counter.add(1, {"type": "custom"})

# Histogram
latency_histogram = meter.create_histogram(
    "custom_duration_seconds",
    description="Custom operation duration",
    unit="s",
)
latency_histogram.record(0.123, {"operation": "custom"})

# Gauge (modeled as an up/down counter)
active_gauge = meter.create_up_down_counter(
    "custom_active",
    description="Active custom operations",
    unit="1",
)
active_gauge.add(1)
Prometheus Configuration
Scraping
Add scraping configuration:
# prometheus.yml
scrape_configs:
  - job_name: 'mcp-server-langgraph'
    scrape_interval: 15s
    static_configs:
      - targets: ['mcp-server-langgraph:8000']
    metrics_path: '/metrics/prometheus'
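The scrape target assumes the application serves Prometheus-format metrics at /metrics/prometheus. If you need to wire that up yourself, here is a sketch using the OpenTelemetry Prometheus reader behind a FastAPI route; the FastAPI app and route path are assumptions, not necessarily how the project exposes its endpoint.

# Sketch: expose OpenTelemetry metrics in Prometheus format at /metrics/prometheus.
# Assumes a FastAPI app; the project's actual endpoint wiring may differ.
from fastapi import FastAPI, Response
from opentelemetry import metrics
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from opentelemetry.sdk.metrics import MeterProvider
from prometheus_client import CONTENT_TYPE_LATEST, generate_latest

# The reader registers a collector with prometheus_client's default registry.
reader = PrometheusMetricReader()
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))

app = FastAPI()

@app.get("/metrics/prometheus")
def prometheus_metrics() -> Response:
    return Response(content=generate_latest(), media_type=CONTENT_TYPE_LATEST)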
Alerting Rules
# alerts.yml
groups:
  - name: langgraph_agent
    interval: 30s
    rules:
      # High error rate
      - alert: HighErrorRate
        expr: |
          rate(http_requests_total{status=~"5.."}[5m]) > 0.05
        for: 2m
        annotations:
          summary: "High error rate detected"

      # High LLM latency
      - alert: HighLLMLatency
        expr: |
          histogram_quantile(0.95,
            rate(llm_latency_seconds_bucket[5m])) > 5
        for: 5m
        annotations:
          summary: "LLM p95 latency > 5s"

      # Auth failures
      - alert: AuthFailureSpike
        expr: |
          rate(auth_attempts_total{result="failure"}[5m]) > 10
        for: 1m
        annotations:
          summary: "Authentication failure spike"
Grafana Dashboards
Import Dashboards
# Import pre-built dashboards
kubectl create configmap grafana-dashboards \
  --from-file=dashboards/ \
  --namespace=observability

# Label for auto-discovery
kubectl label configmap grafana-dashboards \
  grafana_dashboard=1 \
  --namespace=observability
Key Panels
Dashboards cover Overview, Authentication, Authorization, and LLM views. Key Overview panels:
Request rate (RED metrics)
Error rate
p50/p95/p99 latency
Active sessions
LLM token usage
LangSmith Integration
Setup
Get API Key
Sign in to LangSmith (https://smith.langchain.com) and generate an API key from Settings.
Configure Application
# .env
LANGSMITH_TRACING=true
LANGSMITH_API_KEY=ls_api_key_...
LANGSMITH_PROJECT=mcp-server-langgraph
LANGSMITH_ENDPOINT=https://api.smith.langchain.com
Verify
Make a request and view in LangSmith UI
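Besides checking the UI, you can verify programmatically that runs are arriving. Below is a sketch using the langsmith SDK's Client; it assumes the LANGSMITH_* variables above are exported in your environment.

# Sketch: confirm traces are reaching LangSmith by listing recent runs.
# Assumes LANGSMITH_API_KEY / LANGSMITH_ENDPOINT are set in the environment.
from langsmith import Client

client = Client()
for run in client.list_runs(project_name="mcp-server-langgraph", limit=5):
    print(run.name, run.run_type, run.start_time)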
Features
View full prompts and responses:
Input messages
System prompts
LLM responses
Token counts
Latency breakdown
See execution flow:
Agent state transitions
Tool invocations
LLM calls
Conditional routing
Test and evaluate:
Accuracy metrics
Cost analysis
Latency benchmarks
A/B testing
Debug issues:
Error traces
Failed requests
Slow queries
Token usage spikes
Custom Annotations
from langsmith import traceable

@traceable(name="custom_step", project_name="mcp-server-langgraph")
def custom_function(input_data):
    # Automatically traced to LangSmith
    result = process(input_data)
    return result
Logging
Structured Logging
All logs are JSON-formatted with trace context:
{
  "timestamp": "2025-10-12T10:30:00.123Z",
  "level": "INFO",
  "service": "mcp-server-langgraph",
  "trace_id": "abc123def456...",
  "span_id": "789ghi...",
  "user_id": "alice",
  "event": "llm_request",
  "provider": "anthropic",
  "model": "claude-sonnet-4-5-20250929",
  "tokens": 212,
  "latency_ms": 1245,
  "status": "success"
}
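The trace_id and span_id fields come from correlating each log event with the active span. If you need to reproduce that correlation yourself, here is a sketch using structlog and the OpenTelemetry API; structlog is an assumption about the logging stack, and the project's processor chain may differ.

# Sketch: enrich structlog events with the active OpenTelemetry trace context.
# Assumes structlog is the logging library; the project's setup may differ.
import structlog
from opentelemetry import trace

def add_trace_context(logger, method_name, event_dict):
    # Copy the current span's IDs into the log record for correlation with traces
    ctx = trace.get_current_span().get_span_context()
    if ctx.is_valid:
        event_dict["trace_id"] = format(ctx.trace_id, "032x")
        event_dict["span_id"] = format(ctx.span_id, "016x")
    return event_dict

structlog.configure(
    processors=[
        structlog.processors.TimeStamper(fmt="iso", key="timestamp"),
        structlog.processors.add_log_level,
        add_trace_context,
        structlog.processors.JSONRenderer(),
    ]
)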
Log Aggregation
Logs can be shipped to Loki, the ELK Stack, or a cloud logging service. Example Promtail configuration for Loki:
# promtail-config.yml
clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: mcp-server-langgraph
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names: [mcp-server-langgraph]
Production Setup
Kubernetes
# Sidecar for OpenTelemetry Collector
- name: otel-collector
  image: otel/opentelemetry-collector:latest
  args:
    - --config=/conf/otel-collector-config.yaml
  volumeMounts:
    - name: otel-config
      mountPath: /conf
Sampling
Configure trace sampling for high-traffic deployments:

# config.py
TRACE_SAMPLE_RATE = 0.1  # Sample 10% of traces

# telemetry.py
from opentelemetry.sdk.trace.sampling import TraceIdRatioBased

sampler = TraceIdRatioBased(TRACE_SAMPLE_RATE)
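The sampler only takes effect once it is attached to the tracer provider. A sketch of that wiring is below; wrapping the ratio sampler in ParentBased (so child spans follow the parent's sampling decision) is a common choice, not necessarily what the project does.

# Sketch: attach the sampler when constructing the tracer provider.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased

# ParentBased keeps child spans consistent with the parent's sampling decision.
provider = TracerProvider(sampler=ParentBased(sampler))
trace.set_tracer_provider(provider)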
Data Retention
# Jaeger retention
--span-storage.type=elasticsearch
--es.index-prefix=jaeger
--es.tags-as-fields.all=true
--es.num-shards=5
--es.num-replicas=1

# Prometheus retention
--storage.tsdb.retention.time=30d
--storage.tsdb.retention.size=50GB
Troubleshooting
# Check the OTLP endpoint is reachable (4317 is the gRPC port; expect a connection, not a normal HTTP response)
curl -v http://localhost:4317
# Check collector logs
docker compose logs otel-collector
# Verify app configuration
echo $ENABLE_TRACING $OTLP_ENDPOINT
# Test trace export
python scripts/test_tracing.py
Limit label values to keep metric cardinality low:

# Bad: user_id in labels (high cardinality)
counter.add(1, {"user_id": user_id})

# Good: user_type in labels
counter.add(1, {"user_type": "premium"})
To manage trace storage and query performance:
Add indexes on trace_id, span_id
Reduce retention period
Enable sampling
Archive old traces
Next Steps
Full Visibility: Comprehensive observability with OpenTelemetry and LangSmith for complete system insights!