Performance Problems
This guide covers common performance issues and optimization strategies.
Slow LLM Responses
Symptom: Requests take 30+ seconds to complete
Diagnosis:
# Enable detailed logging
import logging
logging.getLogger("mcp_server_langgraph").setLevel(logging.DEBUG)
# Check LangSmith trace
# Look for slow steps in execution
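To capture that trace, LangSmith tracing is enabled through environment variables; a minimal sketch (the project name here is an assumption):
# Enable LangSmith tracing before starting the server
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=<your-api-key>
export LANGCHAIN_PROJECT=mcp-server-langgraph  # Assumed project name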
Common Causes:
1. Large Context Window
# Solution: Reduce the context passed to the model
from langchain_core.messages import trim_messages

messages = trim_messages(
    messages,
    max_tokens=4000,  # Reduce from 8000+
    strategy="last",  # Keep the most recent messages
    # Note: depending on your langchain-core version, token_counter=
    # (a model instance or a counting callable) may also be required
)
2. Too Many Tool Iterations
# Limit tool execution iterations
agent = create_langgraph_agent(
    max_iterations=5,  # Add an upper bound on the agent loop
)
3. Slow Tool Calls
# Add timeouts to tool calls
import asyncio
async def tool_with_timeout():
    try:
        return await asyncio.wait_for(
            slow_tool(),
            timeout=10.0,  # 10 second timeout
        )
    except asyncio.TimeoutError:
        return "Tool execution timed out"
Memory Leaks
Symptom: Memory usage grows over time, eventually OOMKilled
Diagnosis:
# Add memory profiling
import tracemalloc
tracemalloc.start()
# ... run your code ...
current, peak = tracemalloc.get_traced_memory()
print(f"Current memory usage: {current / 10**6}MB")
print(f"Peak memory usage: {peak / 10**6}MB")
tracemalloc.stop()
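To find where the growth is coming from, compare two tracemalloc snapshots taken around the suspect workload; a minimal sketch:
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()
# ... run the suspected leaky workload ...
after = tracemalloc.take_snapshot()

# Top 10 allocation sites ranked by memory growth
for stat in after.compare_to(before, "lineno")[:10]:
    print(stat)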
Common Causes:
1. Session Store Not Clearing Old Sessions
# Add session cleanup
from datetime import datetime, timedelta
import asyncio

async def cleanup_old_sessions():
    cutoff = datetime.utcnow() - timedelta(days=7)
    await session_store.delete_expired(cutoff)

# Run periodically
async def periodic_cleanup():
    while True:
        await cleanup_old_sessions()
        await asyncio.sleep(3600)  # Once an hour

asyncio.create_task(periodic_cleanup())
2. Large Conversation History
# Limit checkpointed messages: MemorySaver stores whatever state it is
# given (it has no max_history parameter), so trim in the graph node instead
from langchain_core.messages import trim_messages
from langgraph.checkpoint.memory import MemorySaver

checkpointer = MemorySaver()
# Keep only the last 20 messages before each checkpoint write;
# token_counter=len counts each message as one unit
state["messages"] = trim_messages(
    state["messages"], max_tokens=20, strategy="last", token_counter=len
)
3. Unclosed AsyncMock in Tests
# See: tests/MEMORY_SAFETY_GUIDELINES.md
class TestMyFeature:
    def teardown_method(self):
        # Force garbage collection so lingering AsyncMock objects are reclaimed
        import gc
        gc.collect()
Redis Connection Timeouts
Symptom: redis.exceptions.TimeoutError: Timeout reading from socket
Solutions:
1. Connection Pool Exhausted
# Increase connection pool size
REDIS_SESSION_POOL_SIZE=50 # Increase from default 10
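For reference, here is roughly how that setting maps onto redis-py's async client; the environment-variable wiring and URL are assumptions of this sketch:
import os
import redis.asyncio as redis

pool = redis.ConnectionPool.from_url(
    "redis://redis:6379/0",  # Assumed Redis URL
    max_connections=int(os.getenv("REDIS_SESSION_POOL_SIZE", "10")),
)
client = redis.Redis(connection_pool=pool)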
2. Redis Under Load
# Check Redis stats
redis-cli INFO stats
# Look for:
# - instantaneous_ops_per_sec
# - used_memory_human
# - evicted_keys
# Scale Redis if needed
kubectl scale statefulset redis --replicas=3 -n mcp-server
3. Network Latency
# Test Redis latency
redis-cli --latency -h redis -p 6379
# If high latency, consider:
# - Using Redis Cluster
# - Enabling Redis pipelining
# - Co-locating Redis with the app (see the pipelining sketch below)
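Pipelining batches several commands into one round trip, which directly offsets network latency; a minimal sketch with redis-py's async client (key names are illustrative):
import redis.asyncio as redis

async def fetch_many(client: redis.Redis, keys: list[str]) -> list:
    # Queue all GETs locally, then send them in a single round trip
    async with client.pipeline(transaction=False) as pipe:
        for key in keys:
            pipe.get(key)
        return await pipe.execute()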
High CPU Usage
Symptom: CPU usage consistently at 80%+
Diagnosis:
# Profile in production
kubectl top pods -n mcp-server
# Get detailed CPU usage
kubectl exec -it <pod-name> -n mcp-server -- top -b -n 1
Solutions:
1. Inefficient Serialization
# Use orjson instead of json on hot serialization paths
import orjson
data = orjson.dumps(obj)  # Faster than json.dumps; note it returns bytes, not str
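A quick way to confirm the win on your own payloads (the sample object is illustrative):
import json
import timeit

import orjson

obj = {"messages": [{"role": "user", "content": "hello"}] * 1000}

json_s = timeit.timeit(lambda: json.dumps(obj), number=100)
orjson_s = timeit.timeit(lambda: orjson.dumps(obj), number=100)
print(f"json: {json_s:.3f}s  orjson: {orjson_s:.3f}s")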
2. Synchronous Operations Blocking Event Loop
# Move CPU-intensive work to a thread pool so the event loop stays responsive
import asyncio
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)

async def cpu_intensive_task():
    loop = asyncio.get_running_loop()  # Preferred over get_event_loop() in async code
    return await loop.run_in_executor(executor, blocking_function)
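On Python 3.9+, asyncio.to_thread is a shorter route to the default thread pool when a dedicated executor is not needed:
async def cpu_intensive_task_simple():
    # Runs blocking_function (illustrative) in the default thread pool
    return await asyncio.to_thread(blocking_function)
Note that threads only help when the blocking work releases the GIL (I/O, NumPy, native extensions); for pure-Python CPU-bound work, a ProcessPoolExecutor is usually the better fit.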
3. Missing Connection Pooling
# Use connection pooling for the database
from sqlalchemy.ext.asyncio import create_async_engine

engine = create_async_engine(
    DATABASE_URL,
    # Async engines use an async-compatible queue pool by default;
    # sqlalchemy.pool.QueuePool is not compatible with create_async_engine
    pool_size=10,
    max_overflow=20,
)
Slow Startup Time
Symptom: Application takes 60+ seconds to start
Causes & Solutions:
1. Large Model Loading
# Lazy load models
_model = None

def get_model():
    global _model
    if _model is None:
        _model = load_model()
    return _model
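The same pattern is more compact with functools, which also caches the result after the first call:
from functools import lru_cache

@lru_cache(maxsize=1)
def get_model():
    # load_model() runs once, on first use; later calls return the cached model
    return load_model()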
2. Synchronous Initialization
# Use lazy initialization pattern
from mcp_server_langgraph.observability.lazy_init import LazyTelemetryProvider
# Telemetry initializes on first use, not at import
3. Health Check Too Aggressive
# Increase initial delay
livenessProbe:
  initialDelaySeconds: 60  # Increase from 30
  periodSeconds: 15
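For apps whose startup time varies widely, a Kubernetes startupProbe defers liveness checking until the app has actually come up; a sketch with assumed endpoint and port:
startupProbe:
  httpGet:
    path: /health  # Assumed health endpoint
    port: 8000     # Assumed container port
  periodSeconds: 5
  failureThreshold: 30  # Up to 30 * 5s = 150s to start before restart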
Request Queue Buildup
Symptom: Requests queuing, response times increasing
Diagnosis:
# Check active connections
from prometheus_client import Gauge
active_requests = Gauge('active_requests', 'Number of active requests')
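The gauge only helps once it is wired into the request path; prometheus_client's track_inprogress handles the increment/decrement for you (the handler and process function are illustrative):
from prometheus_client import Gauge

active_requests = Gauge("active_requests", "Number of active requests")

async def handle_request(request):
    # Increments on entry, decrements on exit, even if an exception is raised
    with active_requests.track_inprogress():
        return await process(request)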
Solutions:
1. Scale Horizontally
# Increase replicas
kubectl scale deployment mcp-server --replicas=5 -n mcp-server
2. Add Rate Limiting
# Configure rate limits
RATE_LIMIT_REQUESTS_PER_MINUTE=100
RATE_LIMIT_TOKENS_PER_MINUTE=10000
3. Enable Request Timeouts
# Add global timeout
import httpx
client = httpx.AsyncClient(timeout=30.0) # 30 second timeout
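httpx also accepts per-phase timeouts, which is often more useful than a single global number:
import httpx

# 5s to establish a connection; 30s default for read, write, and pool waits
timeout = httpx.Timeout(30.0, connect=5.0)
client = httpx.AsyncClient(timeout=timeout)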
Still Having Issues?
For advanced performance optimization:
- Enable Metrics: Set up Prometheus metrics collection
- Use LangSmith: Enable tracing for LLM call analysis
- Profile Locally: Use cProfile for detailed profiling
- Review Architecture: See performance best practices