
Performance Problems

This guide covers common performance issues and optimization strategies.

Slow LLM Responses

Symptom: Requests take 30+ seconds to complete

Diagnosis:
# Enable detailed logging
import logging
logging.getLogger("mcp_server_langgraph").setLevel(logging.DEBUG)

# Check LangSmith trace
# Look for slow steps in execution
Common Causes:

1. Large Context Window

# Solution: Reduce context size
from langchain_core.messages import trim_messages

messages = trim_messages(
    messages,
    max_tokens=4000,    # Reduce from 8000+
    strategy="last",    # Keep most recent
    token_counter=llm,  # Required by some langchain-core versions; counts with the model's tokenizer
)

2. Too Many Tool Calls

# Limit tool execution iterations
agent = create_langgraph_agent(
    max_iterations=5,  # Add limit
)

3. Slow External Tool Calls

# Add timeouts to tool calls
import asyncio

async def tool_with_timeout():
    try:
        return await asyncio.wait_for(
            slow_tool(),
            timeout=10.0  # 10 second timeout
        )
    except asyncio.TimeoutError:
        return "Tool execution timed out"

Memory Leaks

Symptom: Memory usage grows over time, eventually OOMKilled

Diagnosis:
# Add memory profiling
import tracemalloc

tracemalloc.start()

# ... run your code ...

current, peak = tracemalloc.get_traced_memory()
print(f"Current memory usage: {current / 10**6}MB")
print(f"Peak memory usage: {peak / 10**6}MB")
tracemalloc.stop()
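
To find where the growth comes from, compare tracemalloc snapshots taken before and after the suspected leaky code path. A minimal sketch:

# Compare snapshots to find the allocation sites that grew the most
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# ... run the suspected leaky code path ...

after = tracemalloc.take_snapshot()
for stat in after.compare_to(before, "lineno")[:10]:
    print(stat)  # top 10 lines by memory growth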
Common Causes:

1. Session Store Not Clearing Old Sessions

# Add session cleanup
import asyncio
from datetime import datetime, timedelta

async def cleanup_old_sessions():
    cutoff = datetime.utcnow() - timedelta(days=7)
    await session_store.delete_expired(cutoff)

async def periodic_cleanup():
    while True:
        await cleanup_old_sessions()
        await asyncio.sleep(3600)  # Run hourly

# Run periodically (e.g. at application startup)
asyncio.create_task(periodic_cleanup())

2. Large Conversation History

# Limit checkpointed messages: MemorySaver stores whatever is in the graph state,
# so trim the message list in a graph node before it is checkpointed
from langgraph.checkpoint.memory import MemorySaver
from langchain_core.messages import trim_messages

checkpointer = MemorySaver()

# In a node: keep only the last 20 messages
state["messages"] = trim_messages(
    state["messages"], max_tokens=20, token_counter=len, strategy="last"
)

3. Unclosed AsyncMock in Tests

# See: tests/MEMORY_SAFETY_GUIDELINES.md
class TestMyFeature:
    def teardown_method(self):
        # Force garbage collection
        import gc
        gc.collect()

Redis Connection Timeouts

Symptom: redis.exceptions.TimeoutError: Timeout reading from socket

Solutions:

1. Connection Pool Exhausted

# Increase connection pool size
REDIS_SESSION_POOL_SIZE=50  # Increase from default 10
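
For reference, a larger pool with redis-py's asyncio client looks roughly like this. The URL and client construction are assumptions; the project wires this up from REDIS_SESSION_POOL_SIZE:

# Sketch: sizing a redis.asyncio connection pool
import redis.asyncio as redis

pool = redis.ConnectionPool.from_url(
    "redis://redis:6379/0",
    max_connections=50,  # match REDIS_SESSION_POOL_SIZE
)
client = redis.Redis(connection_pool=pool)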

2. Redis Under Load

# Check Redis stats
redis-cli INFO stats

# Look for:
# - instantaneous_ops_per_sec
# - used_memory_human
# - evicted_keys

# Scale Redis if needed
kubectl scale statefulset redis --replicas=3 -n mcp-server

3. Network Latency

# Test Redis latency
redis-cli --latency -h redis -p 6379

# If high latency, consider:
# - Using Redis Cluster
# - Enabling Redis pipelining
# - Collocating Redis with app

High CPU Usage

Symptom: CPU usage consistently at 80%+

Diagnosis:
# Profile in production
kubectl top pods -n mcp-server

# Get detailed CPU usage
kubectl exec -it <pod-name> -n mcp-server -- top -b -n 1
Solutions:

1. Inefficient Serialization

# Use orjson instead of json
import orjson

data = orjson.dumps(obj)  # Faster than json.dumps
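
Note that orjson.dumps returns bytes rather than str, so decode when a string is required. A minimal sketch:

import orjson

payload = orjson.dumps({"status": "ok"})  # bytes, not str
text = payload.decode()                   # decode when a str is needed
obj = orjson.loads(payload)               # parsing is also faster than json.loads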

2. Synchronous Operations Blocking Event Loop

# Move CPU-intensive work to thread pool
import asyncio
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)

async def cpu_intensive_task():
    loop = asyncio.get_running_loop()  # use the running loop inside a coroutine
    return await loop.run_in_executor(executor, blocking_function)
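
On Python 3.9+, asyncio.to_thread is a simpler way to offload the same work; blocking_function is the same placeholder as above. For truly CPU-bound code, a ProcessPoolExecutor avoids the GIL entirely.

# Simpler offload to the default thread pool (Python 3.9+)
import asyncio

async def cpu_intensive_task():
    return await asyncio.to_thread(blocking_function)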

3. Missing Connection Pooling

# Use connection pooling for the database
from sqlalchemy.ext.asyncio import create_async_engine

# Async engines use an async-compatible queue pool by default; just size it
engine = create_async_engine(
    DATABASE_URL,
    pool_size=10,
    max_overflow=20,
)

Slow Startup Time

Symptom: Application takes 60+ seconds to start

Causes & Solutions:

1. Large Model Loading

# Lazy load models
_model = None

def get_model():
    global _model
    if _model is None:
        _model = load_model()
    return _model
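
An equivalent pattern uses functools.lru_cache, which removes the global and the None check; load_model is the same placeholder as above:

from functools import lru_cache

@lru_cache(maxsize=1)
def get_model():
    # Loaded once on first call, then cached for the process lifetime
    return load_model()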

2. Synchronous Initialization

# Use lazy initialization pattern
from mcp_server_langgraph.observability.lazy_init import LazyTelemetryProvider

# Telemetry initializes on first use, not at import

3. Health Check Too Aggressive

# Increase initial delay
livenessProbe:
  initialDelaySeconds: 60  # Increase from 30
  periodSeconds: 15

Request Queue Buildup

Symptom: Requests queuing, response times increasing

Diagnosis:
# Check active connections
from prometheus_client import Gauge

active_requests = Gauge('active_requests', 'Number of active requests')
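
The gauge becomes useful once it tracks in-flight requests around the handler. A minimal sketch, where process is a hypothetical request handler:

from prometheus_client import Gauge

active_requests = Gauge("active_requests", "Number of active requests")

def handle_request(request):
    with active_requests.track_inprogress():  # increments on entry, decrements on exit
        return process(request)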
Solutions:

1. Scale Horizontally

# Increase replicas
kubectl scale deployment mcp-server --replicas=5 -n mcp-server

2. Add Rate Limiting

# Configure rate limits
RATE_LIMIT_REQUESTS_PER_MINUTE=100
RATE_LIMIT_TOKENS_PER_MINUTE=10000
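
For intuition, a per-minute request limit amounts to a sliding-window counter like the hypothetical helper below; the project's actual enforcement lives in its middleware and is driven by these settings:

# Sketch: sliding-window rate limiter (hypothetical helper)
import time
from collections import deque

class RateLimiter:
    def __init__(self, max_per_minute: int):
        self.max_per_minute = max_per_minute
        self.calls = deque()

    def allow(self) -> bool:
        now = time.monotonic()
        # Drop timestamps older than 60 seconds
        while self.calls and now - self.calls[0] > 60:
            self.calls.popleft()
        if len(self.calls) >= self.max_per_minute:
            return False
        self.calls.append(now)
        return True

limiter = RateLimiter(max_per_minute=100)  # matches RATE_LIMIT_REQUESTS_PER_MINUTE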

3. Enable Request Timeouts

# Add global timeout
import httpx

client = httpx.AsyncClient(timeout=30.0)  # 30 second timeout
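
If a single global value is too coarse, httpx.Timeout allows per-phase limits:

# Per-phase timeouts: fail fast on connect, allow longer reads
import httpx

timeout = httpx.Timeout(30.0, connect=5.0, read=25.0)
client = httpx.AsyncClient(timeout=timeout)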

Still Having Issues?

For advanced performance optimization:
  1. Enable Metrics: Set up Prometheus metrics collection
  2. Use LangSmith: Enable tracing for LLM call analysis
  3. Profile Locally: Use cProfile for detailed profiling
  4. Review Architecture: See performance best practices