Overview

LangSmith provides specialized observability for LLM applications, complementing OpenTelemetry’s infrastructure monitoring with LLM-specific insights.
Dual Observability: This project supports both OpenTelemetry (infrastructure) and LangSmith (LLM-specific) tracing simultaneously.

Why LangSmith?

LangSmith gives you superpowers for debugging and optimizing LLM applications:

  • Trace Every LLM Call: See exact prompts, completions, and intermediate steps
  • Prompt Engineering: Iterate on prompts using real production data
  • Create Datasets: Build test datasets from production traces
  • Compare Models: Run evaluations to compare model performance
  • Track Costs: Monitor LLM API costs per user and session
  • Collect Feedback: Gather user ratings and analyze trends

Quick Start

1. Create LangSmith Account

  1. Sign Up: Visit smith.langchain.com and create a free account
  2. Create Project: Create a new project (e.g., “mcp-server-langgraph”)
  3. Get API Key: Go to Settings → API Keys → Create API Key

2. Configure Environment

Add to your .env file:
# LangSmith Configuration
LANGSMITH_API_KEY=lsv2_pt_your_key_here
LANGSMITH_TRACING=true
LANGSMITH_PROJECT=mcp-server-langgraph

# Enable dual observability
OBSERVABILITY_BACKEND=both  # OpenTelemetry + LangSmith

3. Start Tracing

That’s it! All agent invocations are now automatically traced:
from agent import agent_graph

# This is automatically traced in LangSmith
result = agent_graph.invoke({
    "messages": [{"role": "user", "content": "Hello!"}],
    "user_id": "alice",
    "request_id": "req123"
})

4. View Traces

Visit smith.langchain.com to see your traces in real-time!

What’s Captured

Every trace includes:
  • LLM details: full prompts sent to the LLM, complete model responses, token counts (input/output), and model parameters (temperature, max_tokens)
  • Agent execution: latency breakdown, routing decisions, tool invocations, intermediate states, and conditional logic flows
  • Context: user ID and session ID, request source, environment (dev/staging/prod), and custom tags
  • Errors: full Python stack traces, the input that caused the error, and error context and timing

Adding Custom Metadata

Enrich traces with business context:
from langsmith_config import langsmith_config
from langchain_core.runnables import RunnableConfig

# Create rich run configuration
config = langsmith_config.create_run_config(
    run_name="premium-user-analysis",
    user_id="alice@company.com",
    request_id="req789",
    tags=["premium", "priority-high", "sales-dept"],
    metadata={
        "session_id": "sess_abc123",
        "request_source": "slack",
        "cost_center": "sales",
        "user_tier": "premium"
    }
)

# Use in invocation
result = agent_graph.invoke(inputs, config=config)
Now you can filter traces by:
  • User tier: Find all premium user interactions
  • Department: Analyze usage by cost center
  • Priority: Debug high-priority requests first
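
The same filters are available programmatically through the LangSmith client. A minimal sketch using list_runs with LangSmith’s filter query syntax (the project and tag names match the example above):
from langsmith import Client

client = Client()

# Fetch runs tagged "premium" from this project
runs = client.list_runs(
    project_name="mcp-server-langgraph",
    filter='has(tags, "premium")',
)
for run in runs:
    print(run.id, run.name)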

Datasets and Evaluation

Create Dataset from Traces

  1. Filter Traces: In the LangSmith UI, filter for successful traces from the last week
  2. Select Examples: Click “Add to Dataset” and select representative examples
  3. Name Dataset: Save as “prod-examples-2025-01”
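
Datasets can also be built programmatically. A minimal sketch using the LangSmith client (the example input/output pair is illustrative):
from langsmith import Client

client = Client()

# Create the dataset, then add examples shaped like the agent's invoke payload
dataset = client.create_dataset(dataset_name="prod-examples-2025-01")
client.create_example(
    inputs={"messages": [{"role": "user", "content": "Hello!"}]},
    outputs={"answer": "Hi! How can I help?"},
    dataset_id=dataset.id,
)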

Run Evaluations

Compare model performance:
from langsmith import Client

client = Client()

# Run evaluation on dataset
results = client.run_on_dataset(
    dataset_name="prod-examples-2025-01",
    llm_or_chain_factory=lambda: agent_graph,
    project_name="eval-gpt4-vs-claude"
)
View results in LangSmith UI to see:
  • Latency comparison: Which model is faster?
  • Cost analysis: Which is more cost-effective?
  • Quality metrics: Custom evaluators for accuracy (see the evaluator sketch below)
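
Custom evaluators plug into recent langsmith SDK versions via the top-level evaluate helper. A sketch, assuming the dataset above exists and that examples store a reference answer under an "answer" key (the scoring logic is illustrative, not this project’s evaluator):
from langsmith import evaluate

from agent import agent_graph

def has_answer(run, example):
    # Illustrative check: does the reference answer appear in the output?
    reference = (example.outputs or {}).get("answer", "")
    output_text = str(run.outputs or {})
    return {"key": "has_answer", "score": float(reference in output_text)}

results = evaluate(
    lambda inputs: agent_graph.invoke(inputs),
    data="prod-examples-2025-01",
    evaluators=[has_answer],
    experiment_prefix="eval-gpt4-vs-claude",
)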

Collecting User Feedback

Capture user ratings on responses:
from langsmith import Client

client = Client()

# After agent response
client.create_feedback(
    run_id=run_id,  # From trace
    key="user_rating",
    score=1.0,  # 1.0 = thumbs up, 0.0 = thumbs down
    comment="Very helpful response!",
    source_info={"user_id": "alice"}
)
Integrate feedback collection into your UI to gather real user sentiment on responses.
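
To get the run_id at invocation time, one common pattern is LangChain’s collect_runs context manager, which captures the traced run during the call (a sketch reusing the earlier invoke payload):
from langchain_core.tracers.context import collect_runs

from agent import agent_graph

# Capture the run ID while invoking, then attach feedback to it
with collect_runs() as cb:
    result = agent_graph.invoke(
        {"messages": [{"role": "user", "content": "Hello!"}]}
    )
    run_id = cb.traced_runs[0].id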

Debugging with LangSmith

Find Failing Traces

  1. Filter by Status: In the LangSmith UI, filter: status:error
  2. Sort by Time: Sort by timestamp descending to see recent failures
  3. Analyze: Click on a trace to see the full error context:
  • Exact input that caused failure
  • Full Python stack trace
  • All intermediate steps before error
  • Timing information

Optimize Slow Traces

Find performance bottlenecks:
  1. Filter: latency > 5s
  2. Sort: By latency descending
  3. Expand trace: See timing breakdown
  4. Identify bottleneck:
    • Slow LLM calls → Try faster model
    • Slow tool calls → Add caching (see the caching sketch below)
    • Redundant calls → Optimize logic
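
For the caching case, even simple in-process memoization of a deterministic tool removes redundant calls. A sketch with a hypothetical lookup_weather tool (not part of this project; the sleep stands in for a slow API):
import time
from functools import lru_cache

@lru_cache(maxsize=1024)
def lookup_weather(city: str) -> str:
    time.sleep(2)  # stands in for a slow external API call
    return f"Sunny in {city}"

lookup_weather("Paris")  # first call: ~2s
lookup_weather("Paris")  # repeat call: served from cache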

Compare Traces

Compare successful vs. failed traces side-by-side:
  1. Select Two Traces: Shift+click to select two traces
  2. Click Compare: See a side-by-side diff of inputs, steps, and outputs
  3. Identify Differences: Find what caused the different outcomes

Best Practices

Create separate projects for each environment:
  • my-agent-dev - Development
  • my-agent-staging - Staging
  • my-agent-prod - Production
# Set project per environment
LANGSMITH_PROJECT=my-agent-prod
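
If the project name should follow the deployment environment rather than being hard-coded, it can be derived at startup. A sketch assuming an ENVIRONMENT variable (the variable name is an assumption, not part of this project):
import os

# Derive the LangSmith project from the deployment environment
env = os.getenv("ENVIRONMENT", "dev")  # e.g. "dev", "staging", "prod"
os.environ["LANGSMITH_PROJECT"] = f"my-agent-{env}"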
Use consistent, searchable tags:
  • Environment: production, staging, development
  • User tier: free, pro, enterprise
  • Feature: chat, analysis, search
  • Priority: high, medium, low
Track costs in LangSmith:
  1. Go to project → Analytics
  2. View “Cost Over Time” chart
  3. Set budget alerts
  4. Identify high-cost users/features
For very high traffic, sample traces:
import random

from langchain_core.runnables import RunnableConfig

# Tag ~10% of requests so sampled traces are easy to filter
config = RunnableConfig(tags=["sampled"]) if random.random() < 0.1 else RunnableConfig()
result = agent_graph.invoke(inputs, config=config)
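
Tagging only marks sampled runs; to avoid recording the other 90% at all, leave LANGSMITH_TRACING unset and enable tracing per request. A minimal sketch using langchain_core’s tracing_v2_enabled context manager (same illustrative 10% rate):
import random

from langchain_core.tracers.context import tracing_v2_enabled

# With LANGSMITH_TRACING unset globally, record only sampled requests
if random.random() < 0.1:
    with tracing_v2_enabled(project_name="mcp-server-langgraph"):
        result = agent_graph.invoke(inputs)
else:
    result = agent_graph.invoke(inputs)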
Redact sensitive data before logging:
import re

def redact_pii(text: str) -> str:
    # Mask email addresses and US-style phone numbers before tracing
    text = re.sub(r'\b[\w.-]+@[\w.-]+\.\w+\b', '[EMAIL]', text)
    text = re.sub(r'\b\d{3}-\d{3}-\d{4}\b', '[PHONE]', text)
    return text
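
A usage sketch, applying the redaction to user content before it enters the traced invocation (the sample text is illustrative):
raw_user_text = "Reach me at alice@example.com or 555-123-4567"

# Scrub user content before the traced graph ever sees it
inputs = {"messages": [{"role": "user", "content": redact_pii(raw_user_text)}]}
result = agent_graph.invoke(inputs)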

Integration with OpenTelemetry

This project supports both observability systems simultaneously:
Feature                | OpenTelemetry           | LangSmith
Infrastructure Metrics | ✅ CPU, memory, network | ❌
Distributed Tracing    | ✅ Service-to-service   | ⚠️ Limited
LLM Call Tracing       | ⚠️ Basic                | ✅ Full details
Prompt Engineering     | ❌                      | ✅
Model Evaluation       | ❌                      | ✅
Cost Tracking          | ❌                      | ✅
User Feedback          | ❌                      | ✅
Best of both worlds: Use OpenTelemetry for infrastructure monitoring and LangSmith for LLM-specific insights.

Example Code

See complete examples in the repository:
# Basic tracing
from agent import agent_graph

result = agent_graph.invoke({
    "messages": [{"role": "user", "content": "Hello"}],
    "user_id": "alice"
})

Ready to trace? Just set LANGSMITH_TRACING=true in your .env and all agent invocations will be automatically traced!