
LangSmith Integration Guide

Complete guide for integrating LangSmith observability into the MCP Server with LangGraph.

Overview

LangSmith is LangChain’s observability and debugging platform. It provides:
  • Automatic Tracing: Capture all LLM calls and agent steps
  • Prompt Engineering: Iterate on prompts with production data
  • Dataset Creation: Build test datasets from traces
  • Evaluation: Compare model performance
  • Debugging: Root cause analysis for failures
  • User Feedback: Collect and analyze user ratings
  • Cost Tracking: Monitor LLM API costs

Why LangSmith?

  • For Developers:
    • See exactly what your agent is doing
    • Debug failures with full context
    • Optimize prompts based on real usage
    • Track costs per user/session
  • For Teams:
    • Share traces for collaboration
    • Create regression test suites
    • Monitor production performance
    • Analyze user feedback

Quick Start

1. Create LangSmith Account

  1. Visit: https://smith.langchain.com/
  2. Sign up for free account
  3. Create a new project (e.g., “mcp-server-langgraph”)

2. Get API Key

  1. Go to: https://smith.langchain.com/settings
  2. Click “Create API Key”
  3. Copy the key (starts with lsv2_pt_...)

3. Enable LangSmith Tracing

Option A: Environment Variables
# Add to your .env file
LANGSMITH_API_KEY=lsv2_pt_your_key_here
LANGSMITH_TRACING=true
LANGSMITH_PROJECT=mcp-server-langgraph
Option B: Programmatic Configuration
from mcp_server_langgraph.core.config import settings

# LangSmith is automatically configured if API key is set
# Tracing starts immediately when you run your agent

4. Run Your Agent

# Start the agent
python -m mcp_server_langgraph.mcp.server_stdio

# Or run with explicit tracing
LANGSMITH_TRACING=true python -m mcp_server_langgraph.mcp.server_stdio

5. View Traces

  1. Go to: https://smith.langchain.com/
  2. Select your project
  3. See all traces appear in real-time!

Configuration

Environment Variables

Required:
LANGSMITH_API_KEY=lsv2_pt_your_key_here     # Your API key
LANGSMITH_TRACING=true                       # Enable tracing
Optional:
LANGSMITH_PROJECT=mcp-server-langgraph        # Project name
LANGSMITH_ENDPOINT=https://api.smith.langchain.com  # API endpoint
LANGSMITH_TRACING_V2=true                    # Use v2 tracing (recommended)

Programmatic Configuration

The agent is pre-configured to use LangSmith. Configuration is in src/mcp_server_langgraph/core/config.py:
class Settings(BaseSettings):
    # LangSmith Observability
    langsmith_api_key: Optional[str] = None
    langsmith_project: str = "mcp-server-langgraph"
    langsmith_endpoint: str = "https://api.smith.langchain.com"
    langsmith_tracing: bool = False
    langsmith_tracing_v2: bool = True

    # Observability Backend Selection
    observability_backend: str = "both"  # opentelemetry, langsmith, both

Dual Observability (OpenTelemetry + LangSmith)

This project supports both OpenTelemetry and LangSmith:
  • OpenTelemetry: Infrastructure metrics, distributed tracing, custom metrics
  • LangSmith: LLM-specific tracing, prompt engineering, evaluations
Enable both:
# .env
OBSERVABILITY_BACKEND=both
LANGSMITH_TRACING=true
Use only LangSmith:
OBSERVABILITY_BACKEND=langsmith
LANGSMITH_TRACING=true
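At runtime you can sanity-check which backends are active by reading the settings shown in config.py above; a minimal sketch:
from mcp_server_langgraph.core.config import settings

# Which backends are selected (field names from the Settings class above)
use_langsmith = settings.observability_backend in ("langsmith", "both")
use_otel = settings.observability_backend in ("opentelemetry", "both")

if use_langsmith and not (settings.langsmith_api_key and settings.langsmith_tracing):
    print("LangSmith selected but not configured: set LANGSMITH_API_KEY and LANGSMITH_TRACING=true")

print(f"OpenTelemetry: {use_otel}, LangSmith: {use_langsmith}")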

Tracing

Automatic Tracing

When LangSmith is enabled, all agent invocations are automatically traced:
from langchain_core.messages import HumanMessage

from mcp_server_langgraph.core.agent import agent_graph

# This invocation is automatically traced
result = agent_graph.invoke({
    "messages": [HumanMessage(content="Hello")],
    "user_id": "user123",
    "request_id": "req456"
})
What’s captured:
  • All LLM calls (prompts, completions, tokens)
  • Agent routing decisions
  • Tool invocations
  • Intermediate states
  • Timing information
  • Error stack traces

Manual Tracing

Add custom metadata to traces:
from mcp_server_langgraph.observability.langsmith import langsmith_config

# Create run configuration
run_config = langsmith_config.create_run_config(
    run_name="user-query",
    user_id="user123",
    request_id="req456",
    tags=["production", "premium-user"],
    metadata={"query_type": "analysis"}
)

# Use in invocation
result = agent_graph.invoke(inputs, config=run_config)

Nested Tracing

LangSmith automatically creates hierarchical traces:
┌─ Agent Invocation (2.3s)
│  ├─ Router Node (0.1s)
│  ├─ Tool Call: search (1.5s)
│  │  └─ LLM Call (0.8s)
│  └─ Response Generation (0.7s)
│     └─ LLM Call (0.6s)

Tracing with Context

Add business context to traces:
from langchain_core.runnables import RunnableConfig

config = RunnableConfig(
    tags=["user:alice", "department:sales", "priority:high"],
    metadata={
        "user_id": "alice@company.com",
        "session_id": "sess_123",
        "request_source": "slack",
        "cost_center": "sales-dept"
    }
)

result = agent_graph.invoke(inputs, config=config)

Datasets and Evaluation

Create Dataset from Traces

  1. In LangSmith UI:
    • Go to your project
    • Filter traces (e.g., “success only”, “last 7 days”)
    • Click “Add to Dataset”
    • Name your dataset (e.g., “prod-examples-2025-01”)
  2. Programmatically:
from langsmith import Client

client = Client()

# Create dataset
dataset = client.create_dataset("my-test-set")

# Add examples
client.create_examples(
    inputs=[
        {"messages": [{"role": "user", "content": "What is LangGraph?"}]},
        {"messages": [{"role": "user", "content": "How do I deploy?"}]}
    ],
    outputs=[
        {"response": "LangGraph is a framework for building..."},
        {"response": "To deploy, you can use..."}
    ],
    dataset_id=dataset.id
)
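You can also build a dataset directly from existing traces with the client; a minimal sketch that copies inputs/outputs from recent successful runs (project and dataset names are placeholders):
from langsmith import Client

client = Client()

dataset = client.create_dataset("prod-examples-2025-01")

# Copy inputs/outputs from recent successful runs into the dataset
for run in client.list_runs(
    project_name="mcp-server-langgraph",
    error=False,
    limit=20
):
    client.create_example(
        inputs=run.inputs,
        outputs=run.outputs,
        dataset_id=dataset.id
    )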

Run Evaluations

Compare model performance on datasets:
from langsmith import Client
from mcp_server_langgraph.core.agent import agent_graph

client = Client()

# Run evaluation
results = client.run_on_dataset(
    dataset_name="my-test-set",
    llm_or_chain_factory=lambda: agent_graph,
    project_name="eval-2025-01-10"
)

# View results in LangSmith UI

Custom Evaluators

Create custom evaluation metrics:
from langsmith.evaluation import evaluate

from mcp_server_langgraph.core.agent import agent_graph

def accuracy_evaluator(run, example):
    """Check if the response contains the expected keywords."""
    response = run.outputs.get("response", "")
    expected = example.outputs.get("expected_keywords", [])

    if not expected:
        return {"score": 0.0, "key": "accuracy"}

    score = sum(1 for kw in expected if kw.lower() in response.lower()) / len(expected)

    return {"score": score, "key": "accuracy"}

# Run evaluation with custom evaluator
evaluate(
    lambda inputs: agent_graph.invoke(inputs),
    data="my-test-set",
    evaluators=[accuracy_evaluator],
    experiment_prefix="custom-eval"
)

Feedback Collection

Programmatic Feedback

Collect user feedback on responses:
from langsmith import Client

client = Client()

# After agent response, collect feedback
run_id = "run-uuid-from-trace"  # Get from trace

client.create_feedback(
    run_id=run_id,
    key="user_rating",
    score=1.0,  # 0.0 to 1.0
    comment="Great response!",
    source_info={"user_id": "user123"}
)
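Getting the run ID out of the trace UI is manual; a common alternative is to pre-generate the run ID, pass it in the invocation config, and attach feedback to it afterwards. A minimal sketch, reusing agent_graph and inputs from the tracing examples above:
import uuid

from langchain_core.runnables import RunnableConfig
from langsmith import Client

client = Client()

# Pre-generate the run ID so feedback can be attached right after the call
run_id = uuid.uuid4()

result = agent_graph.invoke(inputs, config=RunnableConfig(run_id=run_id))

client.create_feedback(
    run_id=run_id,
    key="user_rating",
    score=1.0,
    comment="Great response!"
)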

Feedback Schema

Built-in feedback types:
  • Thumbs up/down: Binary rating
    client.create_feedback(run_id=run_id, key="thumbs", score=1.0)  # thumbs up
    
  • Star rating: 1-5 stars
    client.create_feedback(run_id=run_id, key="stars", score=0.8)  # 4 stars
    
  • Correctness: Factual accuracy
    client.create_feedback(run_id=run_id, key="correctness", score=1.0)
    

Feedback Analysis

View feedback in LangSmith:
  1. Go to project
  2. Click “Feedback” tab
  3. Filter by feedback type
  4. Analyze trends over time
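Feedback can also be pulled via the API for offline analysis; a minimal sketch over recent runs in the project:
from langsmith import Client

client = Client()

# Fetch feedback attached to recent runs in the project
run_ids = [run.id for run in client.list_runs(project_name="mcp-server-langgraph", limit=50)]

for fb in client.list_feedback(run_ids=run_ids):
    print(fb.key, fb.score, fb.comment)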

Debugging

Find Failing Traces

In LangSmith UI:
  1. Go to your project
  2. Filter: status:error
  3. Sort by: timestamp desc
  4. Click on trace to see details
Programmatically:
from langsmith import Client

client = Client()

# Get recent error traces
runs = client.list_runs(
    project_name="mcp-server-langgraph",
    error=True,
    limit=10
)

for run in runs:
    print(f"Run: {run.id}")
    print(f"Error: {run.error}")  # the error field includes the captured traceback

Analyze Slow Traces

Find performance bottlenecks in the LangSmith UI:
  1. Filter: latency > 5s
  2. Sort by: latency desc
  3. Expand trace to see timing breakdown
Optimize based on findings:
  • Identify slow LLM calls → use faster model
  • Identify slow tool calls → add caching
  • Identify redundant calls → optimize logic
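The slowest recent runs can also be surfaced programmatically from the run timestamps; a minimal sketch:
from langsmith import Client

client = Client()

runs = client.list_runs(project_name="mcp-server-langgraph", limit=100)

# Duration per completed run, slowest first
durations = [
    (run.name, (run.end_time - run.start_time).total_seconds())
    for run in runs
    if run.end_time is not None
]

for name, seconds in sorted(durations, key=lambda d: d[1], reverse=True)[:10]:
    print(f"{name}: {seconds:.2f}s")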

Compare Traces

Compare successful vs failed traces:
  1. Select two traces (shift+click)
  2. Click “Compare”
  3. See side-by-side diff of:
    • Inputs
    • Intermediate steps
    • Outputs
    • Timing

Root Cause Analysis

For any trace, you can see:
  • Full input/output: Exact data sent and received
  • Intermediate steps: All agent decisions
  • LLM calls: Prompts and completions
  • Error stack traces: Full Python traceback
  • Timing breakdown: Where time was spent
  • Token usage: Tokens per LLM call
  • Cost: Estimated cost per call

Best Practices

1. Project Organization

Create separate projects for:
  • Development: my-agent-dev
  • Staging: my-agent-staging
  • Production: my-agent-prod
# Set project per environment
LANGSMITH_PROJECT=my-agent-prod python -m mcp_server_langgraph.mcp.server_stdio

2. Tagging Strategy

Use consistent tags:
  • Environment: production, staging, development
  • User tier: free, pro, enterprise
  • Feature: chat, analysis, search
  • Priority: high, medium, low
config = RunnableConfig(
    tags=["production", "pro-user", "chat", "high-priority"]
)

3. Metadata Best Practices

Include actionable metadata:
metadata = {
    "user_id": "user123",           # Track per-user
    "session_id": "sess_456",       # Group conversations
    "request_source": "web",        # API vs web vs mobile
    "model_version": "v1.2.0",      # Track model changes
    "deployment": "us-west-2",      # Geographic tracking
    "cost_center": "sales",         # Business unit
}

4. Sampling for High Volume

For very high-traffic applications:
import random

from langchain_core.runnables import RunnableConfig

# Sample 10% of requests
should_trace = random.random() < 0.1

# Note: tags only label the run; they do not stop the trace from being sent.
# Use them to separate sampled traffic in the UI, and skip tracing for the
# rest (see the sketch below).
if should_trace:
    config = RunnableConfig(tags=["sampled"])
else:
    config = RunnableConfig(tags=["not-sampled"])
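To actually skip sending traces for unsampled requests (rather than only tagging them), one option is the tracing_context helper from the langsmith SDK; a minimal sketch, reusing agent_graph and inputs from earlier examples:
import random

from langsmith import tracing_context

should_trace = random.random() < 0.1

# Only send a trace for the sampled fraction of requests
with tracing_context(enabled=should_trace):
    result = agent_graph.invoke(inputs)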

5. Cost Monitoring

Track costs in LangSmith:
  1. Go to project
  2. Click “Analytics”
  3. View “Cost Over Time”
  4. Set budget alerts
Optimize costs:
  • Use cheaper models for simple tasks
  • Implement caching
  • Set max token limits
  • Monitor high-cost users
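Token usage can also be aggregated from recent runs for rough per-project reporting; a minimal sketch, assuming the Run objects expose a total_tokens field (it may be None for runs without recorded usage):
from langsmith import Client

client = Client()

# Sum recorded token usage across recent runs
runs = client.list_runs(project_name="mcp-server-langgraph", limit=200)
total_tokens = sum(run.total_tokens or 0 for run in runs)
print(f"Total tokens in recent runs: {total_tokens}")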

6. Privacy and Compliance

Redact sensitive data:
import re

def redact_pii(text):
    """Redact emails, phone numbers, etc. before they reach traces."""
    text = re.sub(r'\b[\w.-]+@[\w.-]+\.\w+\b', '[EMAIL]', text)
    text = re.sub(r'\b\d{3}-\d{3}-\d{4}\b', '[PHONE]', text)
    return text

# Redact before the input is sent (and traced)
inputs = {"messages": [{"role": "user", "content": redact_pii(user_input)}]}
GDPR compliance:
  • Delete user traces on request
  • Set data retention policies
  • Use metadata for user identification

7. Performance Monitoring

Set up monitoring for:
  • Latency (P95): Alert if >5 seconds
  • Error rate: Alert if >5%
  • Token usage: Alert on anomalies
  • Cost per user: Track trends
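These thresholds can be checked from the API as well; a minimal sketch that computes the error rate and P95 latency over recent runs:
from langsmith import Client

client = Client()

runs = list(client.list_runs(project_name="mcp-server-langgraph", limit=500))

# Error rate over the sampled window
error_rate = sum(1 for r in runs if r.error) / max(len(runs), 1)

# P95 latency from run timestamps
latencies = sorted(
    (r.end_time - r.start_time).total_seconds()
    for r in runs
    if r.end_time is not None
)
p95 = latencies[int(0.95 * (len(latencies) - 1))] if latencies else 0.0

if error_rate > 0.05 or p95 > 5.0:
    print(f"ALERT: error_rate={error_rate:.1%}, p95={p95:.2f}s")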

Advanced Features

Prompt Hub

Save and version prompts:
from langchain import hub

# Pull prompt from hub
prompt = hub.pull("my-org/agent-prompt:v3")

# Compose the prompt with your model (LCEL); the prompt's input
# variables depend on how it was saved
chain = prompt | model

Online Evaluation

Run evaluations on production traces:
from langsmith.evaluation import evaluate_existing

# Evaluate runs from an existing experiment/project, reusing the
# custom evaluators defined above
evaluate_existing(
    "mcp-server-langgraph",
    evaluators=[accuracy_evaluator]
)

A/B Testing

Compare model versions:
# Version A
config_a = RunnableConfig(tags=["model:gpt-4", "version:a"])

# Version B
config_b = RunnableConfig(tags=["model:claude-3.5", "version:b"])

# Analyze in LangSmith: filter by tag, compare metrics
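A simple way to split traffic between the two configurations (the 50/50 split and tags are illustrative), reusing agent_graph and inputs from earlier examples:
import random

from langchain_core.runnables import RunnableConfig

# Randomly assign each request to version A or B and tag it for later analysis
if random.random() < 0.5:
    config = RunnableConfig(tags=["model:gpt-4", "version:a"])
else:
    config = RunnableConfig(tags=["model:claude-3.5", "version:b"])

result = agent_graph.invoke(inputs, config=config)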

Troubleshooting

Traces Not Appearing

Check:
  1. API key is set: echo $LANGSMITH_API_KEY
  2. Tracing is enabled: echo $LANGSMITH_TRACING
  3. Project name is correct: echo $LANGSMITH_PROJECT
  4. Network connectivity to api.smith.langchain.com
Test connection:
from langsmith import Client

client = Client()
for project in client.list_projects():  # should list your projects
    print(project.name)

Slow Tracing Performance

LangSmith tracing is async and shouldn’t slow down requests. If you experience slowness:
  1. Check network latency to api.smith.langchain.com
  2. Reduce trace size: Avoid logging huge payloads
  3. Use sampling: Don’t trace every request

Missing LLM Calls

If LLM calls aren’t traced:
# Ensure using LangChain models
from langchain_anthropic import ChatAnthropic
from langchain_openai import ChatOpenAI

# These are automatically traced
model = ChatAnthropic(model="claude-sonnet-4-5-20250929")
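If part of your pipeline calls a model (or any other function) outside of LangChain, you can still surface it in traces with the langsmith traceable decorator; a minimal sketch:
from langsmith import traceable

@traceable(run_type="llm", name="custom-llm-call")
def call_custom_llm(prompt: str) -> str:
    # Call a non-LangChain SDK here; inputs and outputs are captured as a run
    return "placeholder response"

@traceable(name="summarize")
def summarize(text: str) -> str:
    return call_custom_llm(f"Summarize: {text}")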

Last Updated: 2025-10-10