Resilience Patterns
MCP Server LangGraph includes production-grade resilience patterns to handle failures gracefully. See ADR-0030: Resilience Patterns for design rationale.
Overview
The resilience module provides five patterns:
| Pattern | Purpose | When to Use |
|---|---|---|
| Circuit Breaker | Prevent cascade failures | External service calls |
| Retry with Backoff | Handle transient failures | Network errors, rate limits |
| Timeout | Prevent hanging requests | Slow external APIs |
| Bulkhead | Isolate resource pools | Limit concurrent requests |
| Fallback | Graceful degradation | When all else fails |
Circuit Breaker
The circuit breaker prevents cascade failures by failing fast when a service is unhealthy.
States
- CLOSED: Normal operation, requests pass through
- OPEN: Service unhealthy, requests fail immediately
- HALF_OPEN: Testing if service recovered
Usage
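A minimal, self-contained sketch of the three-state machine listed above. It is not the module's actual API: `CircuitBreaker`, `CircuitOpenError`, `failure_threshold`, and `recovery_timeout` are hypothetical names chosen for illustration, and the defaults of 5 failures and 30 seconds are assumed from the documented environment defaults.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    # Hypothetical illustration; names and defaults are assumptions.
    def __init__(self, failure_threshold=5, recovery_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None
        self.state = State.CLOSED

    def call(self, func, *args, **kwargs):
        if self.state is State.OPEN:
            if self._clock() - self._opened_at < self.recovery_timeout:
                raise CircuitOpenError("circuit is open; failing fast")
            # Recovery timeout elapsed: let one trial request through.
            self.state = State.HALF_OPEN
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._record_failure()
            raise
        self._record_success()
        return result

    def _record_failure(self):
        self._failures += 1
        # A failed trial request in HALF_OPEN reopens the circuit immediately.
        if self.state is State.HALF_OPEN or self._failures >= self.failure_threshold:
            self.state = State.OPEN
            self._opened_at = self._clock()

    def _record_success(self):
        self._failures = 0
        self.state = State.CLOSED
```

A caller would wrap each external request as `breaker.call(fetch, url)` and treat `CircuitOpenError` as a signal to degrade rather than wait. The injectable `clock` is only there to make the state transitions easy to test.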
Configuration
Retry with Backoff
Automatically retry failed operations with exponential backoff and jitter.
Usage
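As a rough sketch of the idea, the decorator below retries a function on selected exceptions and sleeps for a randomized, exponentially growing delay between attempts. The name `retry_with_backoff` and all of its parameters are hypothetical and are not taken from the module; `max_attempts=3` is assumed from the documented default for max retry attempts.

```python
import functools
import random
import time


def retry_with_backoff(max_attempts=3, base_delay=0.5, max_delay=30.0,
                       retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Retry the wrapped function on selected exceptions (hypothetical helper)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except retry_on:
                    if attempt == max_attempts:
                        raise  # Out of attempts: surface the last error.
                    # Exponentially growing cap, with full jitter below it.
                    cap = min(max_delay, base_delay * 2 ** (attempt - 1))
                    sleep(random.uniform(0, cap))
        return wrapper
    return decorator
```

Only exceptions listed in `retry_on` are retried, so programming errors such as `ValueError` still fail immediately.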
Retry Strategies
Backoff Calculation
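One common way to compute the delays, shown here as an assumption rather than the module's exact formula, is to double a base delay on every retry, cap it at a maximum, and then draw the actual wait uniformly between zero and that cap ("full jitter"). The function names and parameters below are hypothetical.

```python
import random


def backoff_caps(base_delay, max_delay, retries):
    """Upper bound on the wait before each retry, before jitter is applied."""
    return [min(max_delay, base_delay * 2 ** n) for n in range(retries)]


def jittered_delays(base_delay, max_delay, retries, rng=random):
    """Actual waits: one uniform draw in [0, cap] per retry (full jitter)."""
    return [rng.uniform(0, cap) for cap in backoff_caps(base_delay, max_delay, retries)]


print(backoff_caps(1.0, 10.0, 5))  # [1.0, 2.0, 4.0, 8.0, 10.0]
```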
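With exponential backoff and jitter, the delay before each retry grows exponentially with the attempt number and is randomized so that clients do not all retry at the same moment.
Timeout Enforcement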
Prevent hanging requests with configurable timeouts.
Usage
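A minimal sketch of timeout enforcement for coroutines, built on the standard library's `asyncio.wait_for`. The decorator name `with_timeout` and its `seconds` parameter are hypothetical; the 30-second default is assumed from the documented default operation timeout.

```python
import asyncio
import functools


def with_timeout(seconds=30.0):
    """Cancel the wrapped coroutine if it runs longer than `seconds` (hypothetical helper)."""
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            # wait_for cancels the inner task and raises asyncio.TimeoutError on expiry.
            return await asyncio.wait_for(func(*args, **kwargs), timeout=seconds)
        return wrapper
    return decorator
```

Decorating a coroutine with `@with_timeout(5)` bounds each call to five seconds without changing its signature.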
Handling Timeouts
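When a deadline expires, the caller usually wants to catch the timeout and degrade instead of letting the error propagate. The sketch below assumes plain `asyncio` and a hypothetical helper name, `fetch_or_default`; the module may raise its own exception type instead.

```python
import asyncio


async def fetch_or_default(coro_factory, timeout, default):
    """Await coro_factory() with a deadline; return `default` if it expires."""
    try:
        return await asyncio.wait_for(coro_factory(), timeout=timeout)
    except asyncio.TimeoutError:
        # The inner task has been cancelled at this point; degrade instead of propagating.
        return default
```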
Bulkhead Isolation
Limit concurrent requests to prevent resource exhaustion.
Usage
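The core of a bulkhead can be sketched with an `asyncio.Semaphore`: at most `max_concurrent` calls run at once and the rest wait for a free slot. The `Bulkhead` class and its `run` method are hypothetical names used for illustration, not the module's API.

```python
import asyncio


class Bulkhead:
    """Cap the number of coroutines running concurrently (hypothetical helper)."""

    def __init__(self, max_concurrent):
        self._semaphore = asyncio.Semaphore(max_concurrent)

    async def run(self, coro_factory):
        # Callers beyond the limit wait here until a slot frees up.
        async with self._semaphore:
            return await coro_factory()
```

This variant queues excess callers. A stricter bulkhead could instead reject them immediately once the limit is reached, which trades throughput for predictable latency.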
Configuration
Bulkhead per Service
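A per-service arrangement might look like the sketch below, where each service name maps to its own semaphore. `BulkheadRegistry` and `call_service` are hypothetical names; the limits of 10 and 50 are assumed from the documented defaults for `RESILIENCE_BULKHEAD_LLM` and `RESILIENCE_BULKHEAD_DB`.

```python
import asyncio

# Limits mirror the documented defaults for the LLM and database bulkheads.
LIMITS = {"llm": 10, "db": 50}


class BulkheadRegistry:
    """Hand out one semaphore per service name (hypothetical helper)."""

    def __init__(self, limits):
        self._limits = dict(limits)
        self._semaphores = {}

    def get(self, service):
        # Create lazily so each semaphore is built inside the running event loop.
        if service not in self._semaphores:
            self._semaphores[service] = asyncio.Semaphore(self._limits[service])
        return self._semaphores[service]


async def call_service(registry, service, coro_factory):
    # Only callers of the same service compete for these slots.
    async with registry.get(service):
        return await coro_factory()
```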
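Create separate bulkheads for different services, so that saturating one pool (for example, LLM calls) does not starve another (for example, database queries).
Fallback Strategies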
Define what happens when operations fail.
Available Strategies
Custom Fallback Functions
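A custom fallback is simply a function that receives the failure and returns a degraded but well-formed result. The sketch below assumes a hypothetical `with_fallback` decorator that passes the exception and the original arguments to the fallback; the module's real signature may differ.

```python
import functools


def with_fallback(fallback, catch=(Exception,)):
    """Call fallback(exc, *args, **kwargs) when the wrapped function raises (hypothetical helper)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except catch as exc:
                return fallback(exc, *args, **kwargs)
        return wrapper
    return decorator


def canned_answer(exc, question):
    # Custom fallback: a degraded response that callers can still parse.
    return {"answer": "Service temporarily unavailable.",
            "degraded": True,
            "reason": type(exc).__name__}


@with_fallback(canned_answer, catch=(ConnectionError,))
def ask_llm(question):
    raise ConnectionError("upstream unreachable")  # Stand-in for a real call.
```

Restricting `catch` to expected failure types keeps genuine bugs visible instead of hiding them behind the fallback.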
Combining Patterns
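As a rough illustration of layering, the sketch below wraps a single call in a bulkhead, a retry loop with jittered backoff, and a per-attempt timeout, then falls back if every attempt fails. The function `resilient_call`, its parameters, and the nesting order are all assumptions made for this example, not the module's API or its recommended order; a circuit breaker is left out to keep the example short.

```python
import asyncio
import random


async def resilient_call(coro_factory, *, semaphore, timeout=30.0, max_attempts=3,
                         base_delay=0.5, fallback=None):
    """Bulkhead, then retry, then timeout around one call; optional fallback callable."""
    last_exc = None
    async with semaphore:  # Bulkhead: bound concurrency for this dependency.
        for attempt in range(1, max_attempts + 1):
            try:
                # The timeout applies to each attempt separately.
                return await asyncio.wait_for(coro_factory(), timeout=timeout)
            except (asyncio.TimeoutError, ConnectionError) as exc:
                last_exc = exc
                if attempt < max_attempts:
                    # Backoff with full jitter before the next attempt.
                    await asyncio.sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))
    if fallback is not None:
        return fallback()  # Graceful degradation once retries are exhausted.
    raise last_exc
```

Note that this sketch holds the bulkhead slot while it backs off, which keeps total pressure on the dependency bounded at the cost of occupying a slot during the wait.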
For production use, combine multiple patterns.
Recommended Pattern Order
Configuration via Environment
Configure resilience patterns globally using the environment variables below.
Environment Variables
| Variable | Default | Description |
|---|---|---|
| RESILIENCE_CIRCUIT_THRESHOLD | 5 | Circuit breaker failure threshold |
| RESILIENCE_RECOVERY_TIMEOUT | 30 | Circuit recovery timeout (seconds) |
| RESILIENCE_DEFAULT_TIMEOUT | 30 | Default operation timeout (seconds) |
| RESILIENCE_MAX_RETRIES | 3 | Default max retry attempts |
| RESILIENCE_BULKHEAD_LLM | 10 | LLM bulkhead max concurrent |
| RESILIENCE_BULKHEAD_DB | 50 | Database bulkhead max concurrent |
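The variables above could be loaded into a typed settings object along these lines. The variable names and defaults come from the table; the `ResilienceSettings` class, its field names, and the choice of `float` for the timeouts are assumptions, and the module may load its configuration differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ResilienceSettings:
    """Hypothetical container for the documented variables and their defaults."""
    circuit_threshold: int = 5
    recovery_timeout: float = 30.0
    default_timeout: float = 30.0
    max_retries: int = 3
    bulkhead_llm: int = 10
    bulkhead_db: int = 50

    @classmethod
    def from_env(cls, env=None):
        # Accept any mapping so the loader can be exercised without touching os.environ.
        env = os.environ if env is None else env
        return cls(
            circuit_threshold=int(env.get("RESILIENCE_CIRCUIT_THRESHOLD", 5)),
            recovery_timeout=float(env.get("RESILIENCE_RECOVERY_TIMEOUT", 30)),
            default_timeout=float(env.get("RESILIENCE_DEFAULT_TIMEOUT", 30)),
            max_retries=int(env.get("RESILIENCE_MAX_RETRIES", 3)),
            bulkhead_llm=int(env.get("RESILIENCE_BULKHEAD_LLM", 10)),
            bulkhead_db=int(env.get("RESILIENCE_BULKHEAD_DB", 50)),
        )
```

Calling `ResilienceSettings.from_env()` with no arguments reads the process environment; unset variables fall back to the defaults in the table.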