Health Check API

Overview

Health check endpoints report real-time service health to monitoring systems, load balancers, and orchestration systems. They are designed for Kubernetes, Cloud Run, and other cloud platforms.

Endpoints

GET /health

Primary health check - Returns overall service health. Use this endpoint for:

Load balancer health checks
Monitoring systems
General health status

Request Example:

curl https://api.yourdomain.com/health

Response (Healthy):

{
  "status": "healthy",
  "service": "mcp-server-langgraph",
  "version": "2.8.0",
  "timestamp": "2025-10-12T10:30:00Z",
  "checks": {
    "llm_provider": "healthy",
    "openfga": "healthy",
    "keycloak": "healthy",
    "redis": "healthy"
  }
}

Response (Degraded):

{
  "status": "degraded",
  "service": "mcp-server-langgraph",
  "version": "2.8.0",
  "timestamp": "2025-10-12T10:30:00Z",
  "checks": {
    "llm_provider": "healthy",
    "openfga": "healthy",
    "keycloak": "unhealthy",
    "redis": "healthy"
  },
  "warnings": [
    "Keycloak connection timeout - authentication may be slow"
  ]
}

Response (Unhealthy):

{
  "status": "unhealthy",
  "service": "mcp-server-langgraph",
  "version": "2.8.0",
  "timestamp": "2025-10-12T10:30:00Z",
  "checks": {
    "llm_provider": "unhealthy",
    "openfga": "healthy",
    "keycloak": "healthy",
    "redis": "healthy"
  },
  "errors": [
    "LLM provider API key invalid or quota exceeded"
  ]
}

Status Codes:

200 OK: Service is healthy (all checks passed)

503 Service Unavailable: Service is degraded or unhealthy (one or more checks failed)
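
A monitoring script can combine the HTTP status code with the JSON body shown in the responses above. The following is a minimal sketch rather than a client shipped with the service: `interpret_health` is a hypothetical helper name, and it assumes the body carries the `status`, `checks`, `warnings`, and `errors` fields from the examples.

```python
import json


def interpret_health(status_code: int, body: str) -> dict:
    """Summarize a /health response (hypothetical helper, not part of the service)."""
    payload = json.loads(body)
    checks = payload.get("checks", {})
    # Any check that is not "healthy" is reported as failing.
    failing = sorted(name for name, state in checks.items() if state != "healthy")
    return {
        # Per the status codes above, only 200 means every check passed.
        "ok": status_code == 200,
        "status": payload.get("status", "unknown"),
        "failing": failing,
        # Degraded responses carry "warnings", unhealthy ones carry "errors".
        "messages": payload.get("warnings", []) + payload.get("errors", []),
    }


if __name__ == "__main__":
    degraded = json.dumps({
        "status": "degraded",
        "checks": {"llm_provider": "healthy", "keycloak": "unhealthy"},
        "warnings": ["Keycloak connection timeout - authentication may be slow"],
    })
    print(interpret_health(503, degraded))
```

Keeping the parsing separate from the HTTP call makes the helper easy to test against saved responses.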

GET /health/ready

Readiness probe - Indicates if service is ready to accept traffic. Use this endpoint for:

Kubernetes readiness probes
Load balancer registration
Traffic routing decisions

Returns 200 only when all critical dependencies are available. A service can be running but not yet ready to accept traffic.

Request Example:

curl https://api.yourdomain.com/health/ready

Response (Ready):

{
  "ready": true,
  "service": "mcp-server-langgraph",
  "timestamp": "2025-10-12T10:30:00Z",
  "dependencies": {
    "llm_provider": "ready",
    "openfga": "ready",
    "keycloak": "ready",
    "redis": "ready"
  }
}

Response (Not Ready):

{
  "ready": false,
  "service": "mcp-server-langgraph",
  "timestamp": "2025-10-12T10:30:00Z",
  "dependencies": {
    "llm_provider": "ready",
    "openfga": "not_ready",
    "keycloak": "ready",
    "redis": "ready"
  },
  "blocking": [
    "OpenFGA authorization service not responding"
  ]
}

Status Codes:

200 OK: Service is ready to accept traffic

503 Service Unavailable: Service is not ready (dependencies unavailable)

Kubernetes Configuration:

readinessProbe:
  httpGet:
    path: /health/ready
    port: 8000
  initialDelaySeconds: 20
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 3
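
The readiness responses above suggest a simple rule: the service is ready only when every dependency reports "ready", and anything else is listed under "blocking". The sketch below illustrates that rule under stated assumptions. `build_readiness` is a hypothetical name, the wording of the blocking messages is invented for illustration, and the real handler may decide readiness differently.

```python
from datetime import datetime, timezone


def build_readiness(dependencies: dict[str, str]) -> tuple[int, dict]:
    """Return (status_code, body) for a readiness response.

    `dependencies` maps a dependency name to "ready" or "not_ready".
    """
    blocking = [name for name, state in dependencies.items() if state != "ready"]
    body = {
        "ready": not blocking,
        "service": "mcp-server-langgraph",
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "dependencies": dependencies,
    }
    if blocking:
        # Message wording here is illustrative only.
        body["blocking"] = [f"{name} is not ready" for name in blocking]
    return (200 if not blocking else 503), body


if __name__ == "__main__":
    print(build_readiness({"llm_provider": "ready", "openfga": "not_ready"}))
```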

GET /health/live

Liveness probe - Indicates if service is alive and not deadlocked. Use this endpoint for:

Kubernetes liveness probes
Auto-restart decisions
Deadlock detection

Failed liveness checks cause Kubernetes to restart the container, so use this endpoint only to detect unrecoverable failures.

Request Example:

curl https://api.yourdomain.com/health/live

Response (Alive):

{
  "alive": true,
  "service": "mcp-server-langgraph",
  "timestamp": "2025-10-12T10:30:00Z",
  "uptime_seconds": 86400
}

Response (Not Alive):

{
  "alive": false,
  "service": "mcp-server-langgraph",
  "timestamp": "2025-10-12T10:30:00Z",
  "reason": "Event loop blocked for 30+ seconds"
}

Status Codes:

200 OK: Service is alive and responsive

503 Service Unavailable: Service is deadlocked or unresponsive

Kubernetes Configuration:

livenessProbe:
  httpGet:
    path: /health/live
    port: 8000
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
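
The "Event loop blocked for 30+ seconds" reason in the Not Alive response implies some form of stall detection. One common way to do this in an asyncio service is a heartbeat task that records when the loop last ran. The sketch below is an assumption about how such a check could work, not the service's actual implementation. `LoopHeartbeat` and its parameters are hypothetical names.

```python
import asyncio
import time
from typing import Optional


class LoopHeartbeat:
    """Tracks when the asyncio event loop last ran a heartbeat task."""

    def __init__(self, max_stall_seconds: float = 30.0) -> None:
        self.max_stall_seconds = max_stall_seconds
        self.last_beat = time.monotonic()

    async def run(self, interval: float = 1.0) -> None:
        # Background task: it stops updating last_beat while the loop is blocked.
        while True:
            self.last_beat = time.monotonic()
            await asyncio.sleep(interval)

    def is_alive(self, now: Optional[float] = None) -> bool:
        # `now` can be injected for testing; defaults to the monotonic clock.
        current = time.monotonic() if now is None else now
        return (current - self.last_beat) < self.max_stall_seconds
```

For `is_alive` to report a stall, it has to be called from somewhere that still runs while the loop is blocked, such as a separate thread. If the probe handler runs on the same loop, a stall shows up as a probe timeout instead.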

GET /health/startup

Startup probe - Indicates if service has completed initialization. Use this endpoint for:

Kubernetes startup probes
Slow-starting applications
Initial dependency checks

Kubernetes does not run liveness or readiness checks until the startup probe succeeds, which prevents premature restarts during startup. This is useful for services with long initialization.

Request Example:

curl https://api.yourdomain.com/health/startup

Response (Started):

{
  "started": true,
  "service": "mcp-server-langgraph",
  "timestamp": "2025-10-12T10:30:00Z",
  "startup_duration_seconds": 12.5,
  "initialization": {
    "llm_connection": "completed",
    "openfga_model": "loaded",
    "keycloak_config": "completed",
    "redis_connection": "completed"
  }
}

Response (Starting):

{
  "started": false,
  "service": "mcp-server-langgraph",
  "timestamp": "2025-10-12T10:30:00Z",
  "initialization": {
    "llm_connection": "in_progress",
    "openfga_model": "pending",
    "keycloak_config": "pending",
    "redis_connection": "pending"
  },
  "progress": "30%"
}

Status Codes:

200 OK: Service has completed startup

503 Service Unavailable: Service is still starting up

Kubernetes Configuration:

startupProbe:
  httpGet:
    path: /health/startup
    port: 8000
  initialDelaySeconds: 10
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 30  # Allow up to 150s startup time
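
The 150s figure in the comment above is failureThreshold multiplied by periodSeconds. As a rough worked example, the helper below also adds initialDelaySeconds, since the first probe only runs after that delay. `probe_budget_seconds` is a hypothetical name, and the result is an approximation that ignores probe timeouts and scheduling jitter.

```python
def probe_budget_seconds(initial_delay: int, period: int, failure_threshold: int) -> int:
    """Approximate time before Kubernetes gives up on a probe that never passes.

    Uses the rule of thumb failureThreshold * periodSeconds, plus the initial
    delay before the first probe. Probe timeouts and jitter are ignored.
    """
    return initial_delay + period * failure_threshold


# Values taken from the startup probe configuration in this document.
startup = probe_budget_seconds(initial_delay=10, period=5, failure_threshold=30)
print(startup)  # 160: 150s of probing after a 10s initial delay
```

The same arithmetic applies when sizing failureThreshold for a service whose initialization time is known.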

GET /health/dependencies

Detailed dependency status - Shows health of all external dependencies. Use this endpoint for:

Debugging connectivity issues
Monitoring dashboards
Operational visibility

Request Example:

curl https://api.yourdomain.com/health/dependencies

Response:

{
  "timestamp": "2025-10-12T10:30:00Z",
  "dependencies": {
    "llm_provider": {
      "status": "healthy",
      "provider": "anthropic",
      "model": "claude-sonnet-4-5-20250929",
      "response_time_ms": 45,
      "last_check": "2025-10-12T10:29:55Z"
    },
    "openfga": {
      "status": "healthy",
      "url": "http://openfga:8080",
      "store_id": "01HXXXXXXXXX",
      "response_time_ms": 12,
      "last_check": "2025-10-12T10:29:58Z"
    },
    "keycloak": {
      "status": "healthy",
      "url": "https://sso.yourdomain.com",
      "realm": "mcp-server-langgraph",
      "response_time_ms": 34,
      "last_check": "2025-10-12T10:29:57Z"
    },
    "redis": {
      "status": "healthy",
      "url": "redis://redis-session:6379/0",
      "connected_clients": 15,
      "used_memory_mb": 128,
      "response_time_ms": 3,
      "last_check": "2025-10-12T10:30:00Z"
    },
    "postgresql": {
      "status": "healthy",
      "host": "postgres:5432",
      "database": "keycloak",
      "active_connections": 12,
      "response_time_ms": 8,
      "last_check": "2025-10-12T10:29:59Z"
    }
  },
  "overall_status": "healthy"
}

Status Codes:

200 OK: Dependency check completed (the response may still include unhealthy dependencies)
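
Because this endpoint returns 200 even when a dependency is unhealthy, callers need to inspect the body. The sketch below shows one way to pull out unhealthy dependencies and rank them by latency. It assumes the `status` and `response_time_ms` fields from the example response, and `summarize_dependencies` is a hypothetical helper name.

```python
def summarize_dependencies(payload: dict) -> dict:
    """Pick out unhealthy and slowest dependencies from a /health/dependencies body."""
    deps = payload.get("dependencies", {})
    unhealthy = sorted(
        name for name, info in deps.items() if info.get("status") != "healthy"
    )
    # Slowest first; dependencies without a response time sort last.
    by_latency = sorted(
        deps, key=lambda name: deps[name].get("response_time_ms", -1), reverse=True
    )
    return {"unhealthy": unhealthy, "slowest_first": by_latency}


if __name__ == "__main__":
    sample = {
        "dependencies": {
            "llm_provider": {"status": "healthy", "response_time_ms": 45},
            "openfga": {"status": "healthy", "response_time_ms": 12},
            "redis": {"status": "healthy", "response_time_ms": 3},
        }
    }
    print(summarize_dependencies(sample))
```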

Health Check Responses

Status Values

healthy (string): Component is functioning normally

degraded (string): Component is operational but with reduced performance

unhealthy (string): Component is not functioning

unknown (string): Component status cannot be determined
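
The /health examples show an unhealthy Keycloak producing an overall "degraded" status, while an unhealthy LLM provider produces "unhealthy". One rule consistent with those examples is to treat some components as critical. The sketch below encodes that rule as an assumption. The document does not state how the service rolls up statuses, and `overall_status` and the `critical` set are hypothetical.

```python
from enum import Enum


class Status(str, Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    UNHEALTHY = "unhealthy"
    UNKNOWN = "unknown"


def overall_status(checks: dict[str, str], critical: set[str]) -> Status:
    """Roll component statuses up into one value (assumed rule, see lead-in)."""
    parsed = {}
    for name, value in checks.items():
        try:
            parsed[name] = Status(value)
        except ValueError:
            # Unrecognized strings are treated as "unknown".
            parsed[name] = Status.UNKNOWN
    # An unhealthy critical component makes the whole service unhealthy.
    if any(parsed.get(name) == Status.UNHEALTHY for name in critical):
        return Status.UNHEALTHY
    # Any other non-healthy component only degrades the service.
    if any(status != Status.HEALTHY for status in parsed.values()):
        return Status.DEGRADED
    return Status.HEALTHY


if __name__ == "__main__":
    checks = {"llm_provider": "healthy", "keycloak": "unhealthy"}
    print(overall_status(checks, critical={"llm_provider"}).value)
```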

Component Checks

Component checks cover the LLM Provider, OpenFGA, Keycloak, and Redis. For the LLM Provider:

Checks:

API key validity
Model availability
Response time < 5s
Quota availability

Failure Scenarios:

Invalid API key
Quota exceeded
Connection timeout
Model not found
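
The response-time limit and failure scenarios listed above can be pictured as a probe wrapped in a timeout. The following is a minimal sketch under stated assumptions. `timed_check` is a hypothetical name, and `probe` stands in for whatever lightweight request the service makes to the provider, which this document does not specify.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def timed_check(
    probe: Callable[[], Awaitable[None]], timeout_seconds: float = 5.0
) -> dict:
    """Run a probe coroutine and classify the outcome.

    `probe` is a hypothetical callable that raises on failure.
    """
    started = time.monotonic()
    try:
        await asyncio.wait_for(probe(), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        return {"status": "unhealthy", "error": "Connection timeout"}
    except Exception as exc:  # e.g. invalid API key, quota exceeded, model not found
        return {"status": "unhealthy", "error": str(exc)}
    elapsed_ms = round((time.monotonic() - started) * 1000)
    return {"status": "healthy", "response_time_ms": elapsed_ms}


async def _example_probe() -> None:
    # Stand-in for a real provider call.
    await asyncio.sleep(0.01)


if __name__ == "__main__":
    print(asyncio.run(timed_check(_example_probe)))
```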

Monitoring Integration

Prometheus

Expose health check metrics:

# Service health status (1 = healthy, 0 = unhealthy)
health_status{service="mcp-server-langgraph"} 1

# Dependency health
dependency_status{dependency="llm_provider"} 1
dependency_status{dependency="openfga"} 1
dependency_status{dependency="keycloak"} 1
dependency_status{dependency="redis"} 1

# Health check response time
health_check_duration_seconds{endpoint="/health"} 0.015
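
As an illustration of how a /health payload maps onto these metric lines, the sketch below renders the text format by hand using only the standard library. `render_metrics` is a hypothetical helper, and mapping "degraded" to 0 is an assumption because the metric comment above only defines 1 and 0. A production service would more likely use a Prometheus client library.

```python
def render_metrics(health: dict, duration_seconds: float) -> str:
    """Render a /health payload as Prometheus text exposition lines."""
    lines = [
        "# Service health status (1 = healthy, 0 = unhealthy)",
        'health_status{service="%s"} %d'
        % (health["service"], 1 if health["status"] == "healthy" else 0),
        "# Dependency health",
    ]
    for name, state in health.get("checks", {}).items():
        # Anything other than "healthy" is reported as 0 (assumption).
        value = 1 if state == "healthy" else 0
        lines.append('dependency_status{dependency="%s"} %d' % (name, value))
    lines.append("# Health check response time")
    lines.append(
        'health_check_duration_seconds{endpoint="/health"} %s' % duration_seconds
    )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = {
        "service": "mcp-server-langgraph",
        "status": "healthy",
        "checks": {"llm_provider": "healthy", "redis": "healthy"},
    }
    print(render_metrics(sample, 0.015), end="")
```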

Example Alerts:

# Alert on unhealthy service
- alert: ServiceUnhealthy
  expr: health_status == 0
  for: 2m
  annotations:
    summary: "Service {{ $labels.service }} is unhealthy"

# Alert on dependency failure
- alert: DependencyDown
  expr: dependency_status == 0
  for: 1m
  annotations:
    summary: "Dependency {{ $labels.dependency }} is down"

Kubernetes

Complete Probe Configuration:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-server-langgraph
spec:
  template:
    spec:
      containers:
      - name: mcp-server-langgraph
        ports:
        - containerPort: 8000
          name: http

        # Startup probe - initial check
        startupProbe:
          httpGet:
            path: /health/startup
            port: http
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 30  # 150s max startup

        # Liveness probe - restart if unhealthy
        livenessProbe:
          httpGet:
            path: /health/live
            port: http
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3  # Restart after 30s

        # Readiness probe - remove from service if not ready
        readinessProbe:
          httpGet:
            path: /health/ready
            port: http
          initialDelaySeconds: 20
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 3  # Remove after 15s

Cloud Run

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: mcp-server-langgraph
spec:
  template:
    spec:
      containers:
      - image: gcr.io/project/mcp-server-langgraph:latest
        ports:
        - containerPort: 8000

        # Liveness probe for Cloud Run (uses /health/live so a dependency outage does not trigger restarts)
        livenessProbe:
          httpGet:
            path: /health/live
          initialDelaySeconds: 30
          periodSeconds: 10

        startupProbe:
          httpGet:
            path: /health/startup
          initialDelaySeconds: 10
          periodSeconds: 5
          failureThreshold: 30

Debugging Health Issues

Service Shows Unhealthy

# Check detailed health
curl https://api.yourdomain.com/health/dependencies | jq

# Check specific dependency
curl https://api.yourdomain.com/health/dependencies | \
  jq '.dependencies.openfga'

# Check application logs
kubectl logs -l app=mcp-server-langgraph -n mcp-server-langgraph --tail=100 | grep -i error

# Check events
kubectl get events -n mcp-server-langgraph --sort-by='.lastTimestamp'

Pods Not Ready

# Check readiness probe status
kubectl describe pod <pod-name> -n mcp-server-langgraph | \
  grep -A 10 "Readiness:"

# Test readiness manually
kubectl exec -it <pod-name> -n mcp-server-langgraph -- \
  curl http://localhost:8000/health/ready

# Check dependency connectivity
kubectl exec -it <pod-name> -n mcp-server-langgraph -- \
  nc -zv openfga 8080

Pods Restarting

# Check liveness probe failures
kubectl describe pod <pod-name> -n mcp-server-langgraph | \
  grep -A 10 "Liveness:"

# Check restart count
kubectl get pods -n mcp-server-langgraph -o wide

# Check last termination reason
kubectl get pod <pod-name> -n mcp-server-langgraph \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'

Slow Health Checks

# Time health check
time curl https://api.yourdomain.com/health

# Check dependency response times
curl https://api.yourdomain.com/health/dependencies | \
  jq '.dependencies | to_entries[] | {name: .key, response_time: .value.response_time_ms}'

# Adjust probe timeouts if needed
kubectl patch deployment mcp-server-langgraph -n mcp-server-langgraph \
  --type json -p='[{
    "op": "replace",
    "path": "/spec/template/spec/containers/0/readinessProbe/timeoutSeconds",
    "value": 5
  }]'

Best Practices

Probe Configuration

Startup Probe:

Use for slow-starting services (>30s initialization)
Set failureThreshold to allow sufficient startup time
Rely on Kubernetes to hold liveness and readiness probes until the startup probe succeeds

Liveness Probe:

Check only critical functionality
Avoid checking external dependencies (may cause cascade failures)
Set generous timeouts to avoid false positives
Use longer periodSeconds (10-30s) to reduce load

Readiness Probe:

Check all critical dependencies
Use short periodSeconds (5-10s) for fast traffic routing
Allow temporary failures (set appropriate failureThreshold)

Monitoring

Monitor health check response times
Alert on sustained unhealthy status
Track dependency availability
Set up dashboards for health metrics
Use different alert severities (critical vs warning)

Load Balancers

Use /health/ready for load balancer health checks
Set appropriate check intervals (5-30s)
Configure healthy/unhealthy thresholds
Enable connection draining on unhealthy instances
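
Healthy and unhealthy thresholds mean an instance changes state only after several consecutive results, which avoids flapping on a single failed check. The sketch below illustrates the idea in isolation. `ThresholdTracker` and its default values are hypothetical and do not correspond to any particular load balancer's settings.

```python
class ThresholdTracker:
    """Flip between healthy and unhealthy only after N consecutive results."""

    def __init__(self, healthy_threshold: int = 2, unhealthy_threshold: int = 3) -> None:
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = True
        self._streak = 0  # consecutive results that disagree with the current state

    def record(self, check_passed: bool) -> bool:
        """Record one check result and return the current state."""
        if check_passed == self.healthy:
            # Result agrees with the current state, so any streak is broken.
            self._streak = 0
            return self.healthy
        self._streak += 1
        needed = self.healthy_threshold if check_passed else self.unhealthy_threshold
        if self._streak >= needed:
            self.healthy = check_passed
            self._streak = 0
        return self.healthy


if __name__ == "__main__":
    tracker = ThresholdTracker()
    for result in [False, False, False, True, True]:
        print(result, tracker.record(result))
```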

Related Documentation

Kubernetes Deployment: Configure health probes

Monitoring Guide: Set up observability

Troubleshooting: Debug health issues

Production Checklist: Health check requirements

Always Available: Comprehensive health checks ensure your service is monitored and reliable!
