Overview
Health check endpoints report real-time service health to monitoring systems, load balancers, and orchestrators. They are designed for Kubernetes, Cloud Run, and other cloud platforms.
Endpoints
GET /health
Primary health check - Returns overall service health.
Use this endpoint for:
Load balancer health checks
Monitoring systems
General health status
Request Example :
curl https://api.yourdomain.com/health
Response (Healthy) :
{
"status" : "healthy" ,
"service" : "mcp-server-langgraph" ,
"version" : "2.8.0" ,
"timestamp" : "2025-10-12T10:30:00Z" ,
"checks" : {
"llm_provider" : "healthy" ,
"openfga" : "healthy" ,
"keycloak" : "healthy" ,
"redis" : "healthy"
}
}
Response (Degraded) :
{
"status" : "degraded" ,
"service" : "mcp-server-langgraph" ,
"version" : "2.8.0" ,
"timestamp" : "2025-10-12T10:30:00Z" ,
"checks" : {
"llm_provider" : "healthy" ,
"openfga" : "healthy" ,
"keycloak" : "unhealthy" ,
"redis" : "healthy"
},
"warnings" : [
"Keycloak connection timeout - authentication may be slow"
]
}
Response (Unhealthy) :
{
"status" : "unhealthy" ,
"service" : "mcp-server-langgraph" ,
"version" : "2.8.0" ,
"timestamp" : "2025-10-12T10:30:00Z" ,
"checks" : {
"llm_provider" : "unhealthy" ,
"openfga" : "healthy" ,
"keycloak" : "healthy" ,
"redis" : "healthy"
},
"errors" : [
"LLM provider API key invalid or quota exceeded"
]
}
Status Codes :
Service is healthy (all checks passed)
Service is degraded or unhealthy (one or more checks failed)
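The three sample responses suggest how component results roll up into the top-level status: a failed Keycloak check yields "degraded", while a failed LLM provider check yields "unhealthy". This page does not state the exact rule, so the following is only a minimal sketch. The `CRITICAL` set and the `overall_status` function are hypothetical names chosen to reproduce the samples above.

```python
from typing import Dict, Set

# Assumption: which components count as critical is not stated in this
# document; this set is chosen only to reproduce the sample responses.
CRITICAL: Set[str] = {"llm_provider"}


def overall_status(checks: Dict[str, str], critical: Set[str] = CRITICAL) -> str:
    """Roll individual component results up into one status string."""
    failed = {name for name, state in checks.items() if state != "healthy"}
    if not failed:
        return "healthy"
    # Any failed critical component makes the whole service unhealthy;
    # otherwise the service keeps running in a degraded state.
    return "unhealthy" if failed & critical else "degraded"
```

With this rule, a Keycloak outage alone maps to "degraded" and an LLM provider outage maps to "unhealthy", matching the example payloads.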
GET /health/ready
Readiness probe - Indicates if service is ready to accept traffic.
Use this endpoint for:
Kubernetes readiness probes
Load balancer registration
Traffic routing decisions
Returns 200 only when all critical dependencies are available. The service may be running but not yet ready to serve traffic.
Request Example :
curl https://api.yourdomain.com/health/ready
Response (Ready) :
{
"ready" : true ,
"service" : "mcp-server-langgraph" ,
"timestamp" : "2025-10-12T10:30:00Z" ,
"dependencies" : {
"llm_provider" : "ready" ,
"openfga" : "ready" ,
"keycloak" : "ready" ,
"redis" : "ready"
}
}
Response (Not Ready) :
{
"ready" : false ,
"service" : "mcp-server-langgraph" ,
"timestamp" : "2025-10-12T10:30:00Z" ,
"dependencies" : {
"llm_provider" : "ready" ,
"openfga" : "not_ready" ,
"keycloak" : "ready" ,
"redis" : "ready"
},
"blocking" : [
"OpenFGA authorization service not responding"
]
}
Status Codes :
Service is ready to accept traffic
Service is not ready (dependencies unavailable)
Kubernetes Configuration :
readinessProbe:
  httpGet:
    path: /health/ready
    port: 8000
  initialDelaySeconds: 20
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 3
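The readiness decision described above can be sketched as a pure function that maps dependency states to a status code and a response body. This is an illustration, not the service's actual handler: the `readiness` function is hypothetical, the `blocking` list holds dependency names rather than the human-readable messages shown in the sample, and the 503 code for "not ready" is an assumption (this page only confirms 200 for "ready").

```python
from typing import Dict, List, Tuple


def readiness(dependencies: Dict[str, str]) -> Tuple[int, dict]:
    """Return (HTTP status code, JSON-serializable body) for a readiness check."""
    blocking: List[str] = [
        name for name, state in dependencies.items() if state != "ready"
    ]
    body: dict = {"ready": not blocking, "dependencies": dependencies}
    if blocking:
        body["blocking"] = blocking
        # Assumption: 503 signals "not ready". Kubernetes treats any status
        # outside 200-399 as a failed HTTP probe.
        return 503, body
    return 200, body
```

Keeping the decision separate from the web framework makes it easy to unit test without starting a server.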
GET /health/live
Liveness probe - Indicates if service is alive and not deadlocked.
Use this endpoint for:
Kubernetes liveness probes
Auto-restart decisions
Deadlock detection
Failed liveness checks cause Kubernetes to restart the container. Use this probe only to detect unrecoverable failures.
Request Example :
curl https://api.yourdomain.com/health/live
Response (Alive) :
{
"alive" : true ,
"service" : "mcp-server-langgraph" ,
"timestamp" : "2025-10-12T10:30:00Z" ,
"uptime_seconds" : 86400
}
Response (Not Alive) :
{
"alive" : false ,
"service" : "mcp-server-langgraph" ,
"timestamp" : "2025-10-12T10:30:00Z" ,
"reason" : "Event loop blocked for 30+ seconds"
}
Status Codes :
Service is alive and responsive
Service is deadlocked or unresponsive
Kubernetes Configuration :
livenessProbe:
  httpGet:
    path: /health/live
    port: 8000
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
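One way to detect the "event loop blocked" condition from the sample response is a heartbeat: the main loop records a timestamp on every iteration, and the liveness check fails when the last timestamp is too old. The sketch below assumes this approach. The `Heartbeat` class is hypothetical, and the 30-second limit is taken from the sample `reason` string rather than from any documented setting.

```python
import time
from typing import Callable


class Heartbeat:
    """Tracks the last time the main loop reported progress."""

    def __init__(
        self,
        max_silence_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._clock = clock
        self._max_silence = max_silence_seconds
        self._last_beat = clock()

    def beat(self) -> None:
        """Call from the main loop on every iteration."""
        self._last_beat = self._clock()

    def alive(self) -> bool:
        """True while the last beat is more recent than the allowed silence."""
        return (self._clock() - self._last_beat) < self._max_silence
```

For the check to report a stall, `alive()` would need to be evaluated from a thread other than the one being monitored. If the handler runs on the blocked loop itself, the probe simply times out, which Kubernetes also counts as a failure.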
GET /health/startup
Startup probe - Indicates if service has completed initialization.
Use this endpoint for:
Kubernetes startup probes
Slow-starting applications
Initial dependency checks
Kubernetes holds off liveness and readiness checks until the startup probe succeeds, which prevents premature restarts during startup. This is useful for services with long initialization.
Request Example :
curl https://api.yourdomain.com/health/startup
Response (Started) :
{
"started" : true ,
"service" : "mcp-server-langgraph" ,
"timestamp" : "2025-10-12T10:30:00Z" ,
"startup_duration_seconds" : 12.5 ,
"initialization" : {
"llm_connection" : "completed" ,
"openfga_model" : "loaded" ,
"keycloak_config" : "completed" ,
"redis_connection" : "completed"
}
}
Response (Starting) :
{
"started" : false ,
"service" : "mcp-server-langgraph" ,
"timestamp" : "2025-10-12T10:30:00Z" ,
"initialization" : {
"llm_connection" : "in_progress" ,
"openfga_model" : "pending" ,
"keycloak_config" : "pending" ,
"redis_connection" : "pending"
},
"progress" : "30%"
}
Status Codes :
Service has completed startup
Service is still starting up
Kubernetes Configuration :
startupProbe:
  httpGet:
    path: /health/startup
    port: 8000
  initialDelaySeconds: 10
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 30  # Allow up to 150s startup time
GET /health/dependencies
Detailed dependency status - Shows health of all external dependencies.
Use this endpoint for:
Debugging connectivity issues
Monitoring dashboards
Operational visibility
Request Example :
curl https://api.yourdomain.com/health/dependencies
Response :
{
"timestamp" : "2025-10-12T10:30:00Z" ,
"dependencies" : {
"llm_provider" : {
"status" : "healthy" ,
"provider" : "anthropic" ,
"model" : "claude-sonnet-4-5-20250929" ,
"response_time_ms" : 45 ,
"last_check" : "2025-10-12T10:29:55Z"
},
"openfga" : {
"status" : "healthy" ,
"url" : "http://openfga:8080" ,
"store_id" : "01HXXXXXXXXX" ,
"response_time_ms" : 12 ,
"last_check" : "2025-10-12T10:29:58Z"
},
"keycloak" : {
"status" : "healthy" ,
"url" : "https://sso.yourdomain.com" ,
"realm" : "mcp-server-langgraph" ,
"response_time_ms" : 34 ,
"last_check" : "2025-10-12T10:29:57Z"
},
"redis" : {
"status" : "healthy" ,
"url" : "redis://redis-session:6379/0" ,
"connected_clients" : 15 ,
"used_memory_mb" : 128 ,
"response_time_ms" : 3 ,
"last_check" : "2025-10-12T10:30:00Z"
},
"postgresql" : {
"status" : "healthy" ,
"host" : "postgres:5432" ,
"database" : "keycloak" ,
"active_connections" : 12 ,
"response_time_ms" : 8 ,
"last_check" : "2025-10-12T10:29:59Z"
}
},
"overall_status" : "healthy"
}
Status Codes :
Dependency check completed (may include unhealthy dependencies)
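Because this endpoint reports success even when individual dependencies are unhealthy, a client has to inspect the body. The sketch below shows one way to do that for a payload shaped like the sample response. The `summarize` function is a hypothetical helper, not part of the service.

```python
from typing import Dict, List, Optional, Tuple


def summarize(payload: dict) -> Tuple[List[str], Optional[str]]:
    """Return (names of non-healthy dependencies, name of the slowest one)."""
    deps: Dict[str, dict] = payload.get("dependencies", {})
    failing = sorted(
        name for name, info in deps.items() if info.get("status") != "healthy"
    )
    # Only dependencies that report a response time take part in the ranking.
    timed = {
        name: info["response_time_ms"]
        for name, info in deps.items()
        if "response_time_ms" in info
    }
    slowest = max(timed, key=timed.get) if timed else None
    return failing, slowest
```

Applied to the sample response above, this would return no failing dependencies and `llm_provider` as the slowest (45 ms).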
Health Check Responses
Status Values
Component is functioning normally
Component is operational but with reduced performance
Component is not functioning
Component status cannot be determined
Component Checks
LLM Provider
OpenFGA
Keycloak
Redis
LLM Provider checks :
API key validity
Model availability
Response time < 5s
Quota availability
LLM Provider failure scenarios :
Invalid API key
Quota exceeded
Connection timeout
Model not found
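The response-time limit and the failure scenarios listed above can be combined in a small wrapper that times a probe and classifies the result. This is a sketch under assumptions: `run_check` and its `probe` argument are hypothetical, and a real probe would call the provider's API and raise on errors such as an invalid key or exceeded quota.

```python
import time
from typing import Callable


def run_check(
    probe: Callable[[], None],
    max_seconds: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Run one component probe and classify it by outcome and duration."""
    start = clock()
    try:
        probe()
    except Exception as exc:
        # Any exception (invalid key, quota, timeout, ...) counts as a failure.
        return {"status": "unhealthy", "error": str(exc)}
    elapsed = clock() - start
    # A probe that succeeds but exceeds the limit is also reported as unhealthy.
    status = "healthy" if elapsed < max_seconds else "unhealthy"
    return {"status": status, "response_time_ms": round(elapsed * 1000)}
```

The injectable `clock` parameter exists only to make the timing logic testable without waiting in real time.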
Monitoring Integration
Prometheus
Expose health check metrics:
## Service health status (1 = healthy, 0 = unhealthy)
health_status{service="mcp-server-langgraph"} 1
## Dependency health
dependency_status{dependency="llm_provider"} 1
dependency_status{dependency="openfga"} 1
dependency_status{dependency="keycloak"} 1
dependency_status{dependency="redis"} 1
## Health check response time
health_check_duration_seconds{endpoint="/health"} 0.015
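As an illustration of how the samples above could be produced, the sketch below renders check results as Prometheus text-format lines using plain string formatting. The `render_metrics` function is hypothetical, and it assumes that any non-healthy component sets `health_status` to 0. A real service would more likely use a Prometheus client library.

```python
from typing import Dict, List


def render_metrics(service: str, checks: Dict[str, str]) -> str:
    """Render check results as Prometheus samples (1 = healthy, 0 = not)."""
    # Assumption: the service-level gauge is 1 only when every check passes.
    all_ok = all(state == "healthy" for state in checks.values())
    lines: List[str] = [f'health_status{{service="{service}"}} {int(all_ok)}']
    for name, state in checks.items():
        value = int(state == "healthy")
        lines.append(f'dependency_status{{dependency="{name}"}} {value}')
    return "\n".join(lines) + "\n"
```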
Example Alerts :
# Alert on unhealthy service
- alert: ServiceUnhealthy
  expr: health_status == 0
  for: 2m
  annotations:
    summary: "Service {{ $labels.service }} is unhealthy"
# Alert on dependency failure
- alert: DependencyDown
  expr: dependency_status == 0
  for: 1m
  annotations:
    summary: "Dependency {{ $labels.dependency }} is down"
Kubernetes
Complete Probe Configuration :
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-server-langgraph
spec:
  template:
    spec:
      containers:
        - name: mcp-server-langgraph
          ports:
            - containerPort: 8000
              name: http
          # Startup probe - initial check
          startupProbe:
            httpGet:
              path: /health/startup
              port: http
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 30  # 150s max startup
          # Liveness probe - restart if unhealthy
          livenessProbe:
            httpGet:
              path: /health/live
              port: http
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3  # Restart after 30s
          # Readiness probe - remove from service if not ready
          readinessProbe:
            httpGet:
              path: /health/ready
              port: http
            initialDelaySeconds: 20
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 3  # Remove after 15s
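The time budgets in the comments above follow from multiplying `periodSeconds` by `failureThreshold`. The short sketch below reproduces that arithmetic; `failure_window_seconds` is a hypothetical helper name. The result is approximate, since `initialDelaySeconds` and per-probe timeouts add to the wall-clock time before Kubernetes acts.

```python
def failure_window_seconds(period_seconds: int, failure_threshold: int) -> int:
    """Approximate span of consecutive failures before Kubernetes acts."""
    return period_seconds * failure_threshold


# Values taken from the probe configuration above.
startup_window = failure_window_seconds(5, 30)    # 5 * 30 = 150
liveness_window = failure_window_seconds(10, 3)   # 10 * 3 = 30
readiness_window = failure_window_seconds(5, 3)   # 5 * 3 = 15
```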
Cloud Run
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: mcp-server-langgraph
spec:
  template:
    spec:
      containers:
        - image: gcr.io/project/mcp-server-langgraph:latest
          ports:
            - containerPort: 8000
          # Health check for Cloud Run
          livenessProbe:
            httpGet:
              path: /health
            initialDelaySeconds: 30
            periodSeconds: 10
          startupProbe:
            httpGet:
              path: /health/startup
            initialDelaySeconds: 10
            periodSeconds: 5
            failureThreshold: 30
Debugging Health Issues
# Check detailed health
curl https://api.yourdomain.com/health/dependencies | jq
# Check specific dependency
curl https://api.yourdomain.com/health/dependencies | \
jq '.dependencies.openfga'
# Check application logs
kubectl logs -l app=mcp-server-langgraph --tail=100 | grep -i error
# Check events
kubectl get events -n mcp-server-langgraph --sort-by='.lastTimestamp'
# Check readiness probe status
kubectl describe pod <pod-name> -n mcp-server-langgraph | \
  grep -A 10 "Readiness:"
# Test readiness manually
kubectl exec -it <pod-name> -n mcp-server-langgraph -- \
  curl http://localhost:8000/health/ready
# Check dependency connectivity
kubectl exec -it <pod-name> -n mcp-server-langgraph -- \
  nc -zv openfga 8080
# Check liveness probe failures
kubectl describe pod <pod-name> -n mcp-server-langgraph | \
  grep -A 10 "Liveness:"
# Check restart count
kubectl get pods -n mcp-server-langgraph -o wide
# Check last termination reason
kubectl get pod <pod-name> -n mcp-server-langgraph \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'
# Time health check
time curl https://api.yourdomain.com/health
# Check dependency response times
curl https://api.yourdomain.com/health/dependencies | \
jq '.dependencies | to_entries[] | {name: .key, response_time: .value.response_time_ms}'
# Adjust probe timeouts if needed
kubectl patch deployment mcp-server-langgraph -n mcp-server-langgraph \
  --type json -p='[{
    "op": "replace",
    "path": "/spec/template/spec/containers/0/readinessProbe/timeoutSeconds",
    "value": 5
  }]'
Best Practices
Startup Probe :
Use for slow-starting services (>30s initialization)
Set failureThreshold to allow sufficient startup time
Disable liveness/readiness until startup succeeds
Liveness Probe :
Check only critical functionality
Avoid checking external dependencies (may cause cascade failures)
Set generous timeouts to avoid false positives
Use longer periodSeconds (10-30s) to reduce load
Readiness Probe :
Check all critical dependencies
Use short periodSeconds (5-10s) for fast traffic routing
Allow temporary failures (set appropriate failureThreshold)
Monitoring :
Monitor health check response times
Alert on sustained unhealthy status
Track dependency availability
Set up dashboards for health metrics
Use different alert severities (critical vs warning)
Load Balancers :
Use /health/ready for load balancer health checks
Set appropriate check intervals (5-30s)
Configure healthy/unhealthy thresholds
Enable connection draining on unhealthy instances
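Healthy and unhealthy thresholds, like `failureThreshold` on a probe, keep a single failed check from flipping an instance's state. The sketch below shows the general idea with consecutive-result counting. The `ThresholdTracker` class and its default thresholds are hypothetical and do not reflect any particular load balancer's implementation.

```python
class ThresholdTracker:
    """Flip state only after N consecutive results that contradict it."""

    def __init__(self, healthy_threshold: int = 2, unhealthy_threshold: int = 3) -> None:
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = True
        self._streak = 0  # consecutive results that contradict the current state

    def record(self, check_passed: bool) -> bool:
        """Record one check result and return the resulting state."""
        if check_passed == self.healthy:
            # The result agrees with the current state, so the streak resets.
            self._streak = 0
        else:
            self._streak += 1
            needed = self.unhealthy_threshold if self.healthy else self.healthy_threshold
            if self._streak >= needed:
                self.healthy = check_passed
                self._streak = 0
        return self.healthy
```

With the defaults, an instance is marked unhealthy after three consecutive failures and restored after two consecutive successes, so an isolated timeout does not remove it from rotation.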
Always Available : Comprehensive health checks ensure your service is monitored and reliable!