Overview

LangSmith provides specialized observability for LLM applications, complementing OpenTelemetry’s infrastructure monitoring with LLM-specific insights.
Dual Observability: This project supports both OpenTelemetry (infrastructure) and LangSmith (LLM-specific) tracing simultaneously.

Why LangSmith?

LangSmith gives you superpowers for debugging and optimizing LLM applications:

  • Trace Every LLM Call: See exact prompts, completions, and intermediate steps
  • Prompt Engineering: Iterate on prompts using real production data
  • Create Datasets: Build test datasets from production traces
  • Compare Models: Run evaluations to compare model performance
  • Track Costs: Monitor LLM API costs per user and session
  • Collect Feedback: Gather user ratings and analyze trends

Quick Start

1. Create LangSmith Account

  1. Sign Up: Visit smith.langchain.com and create a free account
  2. Create Project: Create a new project (e.g., “mcp-server-langgraph”)
  3. Get API Key: Go to Settings → API Keys → Create API Key

2. Configure Environment

Add to your .env file:
# LangSmith Configuration
LANGSMITH_API_KEY=lsv2_pt_your_key_here
LANGSMITH_TRACING=true
LANGSMITH_PROJECT=mcp-server-langgraph

# Enable dual observability
OBSERVABILITY_BACKEND=both  # OpenTelemetry + LangSmith

3. Start Tracing

That’s it! All agent invocations are now automatically traced:
from agent import agent_graph

# This is automatically traced in LangSmith
result = agent_graph.invoke({
    "messages": [{"role": "user", "content": "Hello!"}],
    "user_id": "alice",
    "request_id": "req123"
})

4. View Traces

Visit smith.langchain.com to see your traces in real-time!

What’s Captured

Every trace includes:
  • LLM details: full prompts sent to the LLM, complete model responses, token counts (input/output), and model parameters (temperature, max_tokens)
  • Agent execution: latency breakdown, routing decisions, tool invocations, intermediate states, and conditional logic flows
  • Context: user ID and session ID, request source, environment (dev/staging/prod), and custom tags
  • Errors: full Python stack traces, the input that caused the error, and error context and timing

Adding Custom Metadata

Enrich traces with business context:
from langsmith_config import langsmith_config
from langchain_core.runnables import RunnableConfig

# Create rich run configuration
config = langsmith_config.create_run_config(
    run_name="premium-user-analysis",
    user_id="alice@company.com",
    request_id="req789",
    tags=["premium", "priority-high", "sales-dept"],
    metadata={
        "session_id": "sess_abc123",
        "request_source": "slack",
        "cost_center": "sales",
        "user_tier": "premium"
    }
)

# Use in invocation
result = agent_graph.invoke(inputs, config=config)
Now you can filter traces by:
  • User tier: Find all premium user interactions
  • Department: Analyze usage by cost center
  • Priority: Debug high-priority requests first
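
The same filters are available programmatically through the LangSmith client. A minimal sketch using list_runs with LangSmith’s filter query syntax (the project and tag names match the example above):
from langsmith import Client

client = Client()

# Fetch runs tagged "premium" from this project
runs = client.list_runs(
    project_name="mcp-server-langgraph",
    filter='has(tags, "premium")',
)
for run in runs:
    print(run.id, run.name)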

Datasets and Evaluation

Create Dataset from Traces

  1. Filter Traces: In the LangSmith UI, filter for successful traces from the last week
  2. Select Examples: Click “Add to Dataset” and select representative examples
  3. Name Dataset: Save as “prod-examples-2025-01”
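
Datasets can also be built programmatically. A minimal sketch using the LangSmith client (the example input/output pair is illustrative):
from langsmith import Client

client = Client()

# Create the dataset, then add examples shaped like the agent's invoke payload
dataset = client.create_dataset(dataset_name="prod-examples-2025-01")
client.create_example(
    inputs={"messages": [{"role": "user", "content": "Hello!"}]},
    outputs={"answer": "Hi! How can I help?"},
    dataset_id=dataset.id,
)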

Run Evaluations

Compare model performance:
from langsmith import Client

client = Client()

# Run evaluation on dataset
results = client.run_on_dataset(
    dataset_name="prod-examples-2025-01",
    llm_or_chain_factory=lambda: agent_graph,
    project_name="eval-gpt4-vs-claude"
)
View results in LangSmith UI to see:
  • Latency comparison: Which model is faster?
  • Cost analysis: Which is more cost-effective?
  • Quality metrics: Custom evaluators for accuracy (see the evaluator sketch below)
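
Custom evaluators plug into recent langsmith SDK versions via the top-level evaluate helper. A sketch, assuming the dataset above exists and that examples store a reference answer under an "answer" key (the scoring logic is illustrative, not this project’s evaluator):
from langsmith import evaluate

from agent import agent_graph

def has_answer(run, example):
    # Illustrative check: does the reference answer appear in the output?
    reference = (example.outputs or {}).get("answer", "")
    output_text = str(run.outputs or {})
    return {"key": "has_answer", "score": float(reference in output_text)}

results = evaluate(
    lambda inputs: agent_graph.invoke(inputs),
    data="prod-examples-2025-01",
    evaluators=[has_answer],
    experiment_prefix="eval-gpt4-vs-claude",
)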

Collecting User Feedback

Capture user ratings on responses:
from langsmith import Client

client = Client()

# After agent response
client.create_feedback(
    run_id=run_id,  # From trace
    key="user_rating",
    score=1.0,  # 1.0 = thumbs up, 0.0 = thumbs down
    comment="Very helpful response!",
    source_info={"user_id": "alice"}
)
Integrate feedback collection into your UI to gather real user sentiment on responses.
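
To get the run_id at invocation time, one common pattern is LangChain’s collect_runs context manager, which captures the traced run during the call (a sketch reusing the earlier invoke payload):
from langchain_core.tracers.context import collect_runs

from agent import agent_graph

# Capture the run ID while invoking, then attach feedback to it
with collect_runs() as cb:
    result = agent_graph.invoke(
        {"messages": [{"role": "user", "content": "Hello!"}]}
    )
    run_id = cb.traced_runs[0].id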

Debugging with LangSmith

Find Failing Traces

  1. Filter by Status: In the LangSmith UI, filter: status:error
  2. Sort by Time: Sort by timestamp descending to see recent failures
  3. Analyze: Click on a trace to see the full error context:
  • Exact input that caused failure
  • Full Python stack trace
  • All intermediate steps before error
  • Timing information

Optimize Slow Traces

Find performance bottlenecks:
  1. Filter: latency > 5s
  2. Sort: By latency descending
  3. Expand trace: See timing breakdown
  4. Identify bottleneck:
    • Slow LLM calls → Try faster model
    • Slow tool calls → Add caching (see the caching sketch below)
    • Redundant calls → Optimize logic
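
For the caching case, even simple in-process memoization of a deterministic tool removes redundant calls. A sketch with a hypothetical lookup_weather tool (not part of this project; the sleep stands in for a slow API):
import time
from functools import lru_cache

@lru_cache(maxsize=1024)
def lookup_weather(city: str) -> str:
    time.sleep(2)  # stands in for a slow external API call
    return f"Sunny in {city}"

lookup_weather("Paris")  # first call: ~2s
lookup_weather("Paris")  # repeat call: served from cache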

Compare Traces

Compare successful vs. failed traces side-by-side:
  1. Select Two Traces: Shift+click to select two traces
  2. Click Compare: See a side-by-side diff of inputs, steps, and outputs
  3. Identify Differences: Find what caused the different outcomes

Best Practices

Create separate projects for each environment:
  • my-agent-dev - Development
  • my-agent-staging - Staging
  • my-agent-prod - Production
# Set project per environment
LANGSMITH_PROJECT=my-agent-prod
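
If the project name should follow the deployment environment rather than being hard-coded, it can be derived at startup. A sketch assuming an ENVIRONMENT variable (the variable name is an assumption, not part of this project):
import os

# Derive the LangSmith project from the deployment environment
env = os.getenv("ENVIRONMENT", "dev")  # e.g. "dev", "staging", "prod"
os.environ["LANGSMITH_PROJECT"] = f"my-agent-{env}"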
Use consistent, searchable tags:
  • Environment: production, staging, development
  • User tier: free, pro, enterprise
  • Feature: chat, analysis, search
  • Priority: high, medium, low
Track costs in LangSmith:
  1. Go to project → Analytics
  2. View “Cost Over Time” chart
  3. Set budget alerts
  4. Identify high-cost users/features
For very high traffic, sample traces:
import random

from langchain_core.runnables import RunnableConfig

# Tag ~10% of requests so sampled traces are easy to filter
config = RunnableConfig(tags=["sampled"]) if random.random() < 0.1 else RunnableConfig()
result = agent_graph.invoke(inputs, config=config)
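
Tagging only marks sampled runs; to avoid recording the other 90% at all, leave LANGSMITH_TRACING unset and enable tracing per request. A minimal sketch using langchain_core’s tracing_v2_enabled context manager (same illustrative 10% rate):
import random

from langchain_core.tracers.context import tracing_v2_enabled

# With LANGSMITH_TRACING unset globally, record only sampled requests
if random.random() < 0.1:
    with tracing_v2_enabled(project_name="mcp-server-langgraph"):
        result = agent_graph.invoke(inputs)
else:
    result = agent_graph.invoke(inputs)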
Redact sensitive data before logging:
import re

def redact_pii(text: str) -> str:
    # Mask email addresses and US-style phone numbers before tracing
    text = re.sub(r'\b[\w.-]+@[\w.-]+\.\w+\b', '[EMAIL]', text)
    text = re.sub(r'\b\d{3}-\d{3}-\d{4}\b', '[PHONE]', text)
    return text
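
A usage sketch, applying the redaction to user content before it enters the traced invocation (the sample text is illustrative):
raw_user_text = "Reach me at alice@example.com or 555-123-4567"

# Scrub user content before the traced graph ever sees it
inputs = {"messages": [{"role": "user", "content": redact_pii(raw_user_text)}]}
result = agent_graph.invoke(inputs)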

Integration with OpenTelemetry

This project supports both observability systems simultaneously:
Feature                | OpenTelemetry           | LangSmith
Infrastructure Metrics | ✅ CPU, memory, network | ❌
Distributed Tracing    | ✅ Service-to-service   | ⚠️ Limited
LLM Call Tracing       | ⚠️ Basic                | ✅ Full details
Prompt Engineering     | ❌                      | ✅
Model Evaluation       | ❌                      | ✅
Cost Tracking          | ❌                      | ✅
User Feedback          | ❌                      | ✅
Best of both worlds: Use OpenTelemetry for infrastructure monitoring and LangSmith for LLM-specific insights.

Example Code

See complete examples in the repository:
# Basic tracing
from agent import agent_graph

result = agent_graph.invoke({
    "messages": [{"role": "user", "content": "Hello"}],
    "user_id": "alice"
})

Ready to trace? Just set LANGSMITH_TRACING=true in your .env and all agent invocations will be automatically traced!