Docker Compose Health Check Patterns - Portable Best Practices

Overview

This guide documents portable health check patterns for Docker Compose services. Following these patterns keeps health checks working across different base images, container runtimes, and environments without relying on utilities that may not be present.


Why Portable Health Checks Matter

The Problem

Many Docker images use minimal base images (Alpine, distroless, scratch) that:
  • ❌ Lack common utilities (curl, wget, nc, telnet)
  • ❌ Don’t include health check tools (grpc_health_probe, httpie)
  • ❌ May not have shell interpreters (/bin/bash, /bin/sh)
  • ❌ Minimize attack surface by removing non-essential binaries

The Solution

Use service-native health check commands that ship with the service image:
  • ✅ Database clients (pg_isready, redis-cli, mongosh)
  • ✅ Service management scripts (kc.sh show-config)
  • ✅ Built-in health check binaries shipped with the service
  • ✅ TCP port checks as fallback (portable to any image that includes Bash)

Benefits

  1. Reliability: Health checks don’t fail due to missing utilities
  2. Portability: Works across different image variants (alpine, distroless, etc.)
  3. Security: Doesn’t require installing additional packages
  4. Performance: Uses lightweight native commands
  5. Maintainability: Survives base image updates

Common Pitfalls

❌ Anti-Pattern 1: Assuming curl Exists

# WRONG: curl may not exist in minimal images
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
Why it fails:
  • Alpine-based images: curl not installed by default
  • Distroless images: No package manager to install utilities
  • Security-hardened images: Utilities removed to reduce attack surface
Example failures:
  • qdrant:v1.15.1 - lacks curl, wget, grpc_health_probe
  • quay.io/keycloak/keycloak:latest - minimal image without curl

❌ Anti-Pattern 2: Using wget Without Verification

# WRONG: wget also commonly missing
healthcheck:
  test: ["CMD", "wget", "--spider", "http://localhost:8080/health"]

❌ Anti-Pattern 3: Hardcoded Shell Paths

# WRONG: /bin/bash may not exist (Alpine uses /bin/sh)
healthcheck:
  test: ["CMD-SHELL", "/bin/bash -c 'redis-cli ping'"]

❌ Anti-Pattern 4: External Dependencies

# WRONG unless verified: most images do not bundle grpc_health_probe
healthcheck:
  test: ["CMD", "/usr/local/bin/grpc_health_probe", "-addr=:8081"]
  # What if grpc_health_probe isn't installed?

Portable Patterns by Service Type

PostgreSQL

✅ Recommended: Use pg_isready
healthcheck:
  test: ["CMD-SHELL", "pg_isready -U postgres"]
  interval: 5s
  timeout: 3s
  retries: 10
  start_period: 10s
Why it works:
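  • pg_isready is included in the official PostgreSQL images (Debian and Alpine variants)
  • Checks whether the server accepts connections, not just whether the process exists
  • Returns meaningful exit codes: 0 = accepting connections, 1 = rejecting connections (e.g., still starting up), 2 = no response, 3 = no attempt made (e.g., invalid parameters)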
Alternative (if pg_isready unavailable):
healthcheck:
  test: ["CMD-SHELL", "psql -U postgres -c 'SELECT 1' || exit 1"]
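One startup subtlety is worth guarding against. During first-time initialization, the official image's entrypoint starts a temporary server with TCP listening disabled while it runs init scripts, then restarts it. A pg_isready call that goes through the default Unix socket can therefore report ready before initialization has finished. The sketch below forces a TCP connection instead. It assumes the default port 5432 and the doc's `postgres` user; adjust both if your service overrides them.

```yaml
healthcheck:
  # -h 127.0.0.1 forces TCP. The temporary init-time server is not listening
  # on TCP, so this only passes once the final server is accepting connections.
  test: ["CMD", "pg_isready", "-h", "127.0.0.1", "-p", "5432", "-U", "postgres"]
  interval: 5s
  timeout: 3s
  retries: 10
  start_period: 10s
```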

Redis

✅ Recommended: Use redis-cli ping
healthcheck:
  test: ["CMD", "redis-cli", "ping"]
  interval: 3s
  timeout: 2s
  retries: 10
  start_period: 5s
Why it works:
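  • redis-cli is bundled with the official Redis images
  • ping command is lightweight and fast
  • Prints PONG and exits 0 on success; depending on the redis-cli version, error replies such as NOAUTH or LOADING may also exit 0, so stricter checks match the PONG reply explicitly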
For Redis with AUTH:
healthcheck:
  test: ["CMD", "redis-cli", "-a", "${REDIS_PASSWORD}", "ping"]
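  # Note: Compose substitutes the password into the check command, so it is visible in docker inspect; for production, consider a dedicated redis.conf ACL user that may only run PING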
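The following sketch keeps the password off the redis-cli command line and treats any reply other than PONG as a failure. It rests on three assumptions:
- The image provides sh and grep. This holds for redis:7-alpine through BusyBox.
- redis-cli reads its password from the REDISCLI_AUTH environment variable.
- The variable is still visible in docker inspect, so this reduces exposure rather than hiding the secret.

```yaml
redis:
  image: redis:7-alpine
  command: redis-server --requirepass ${REDIS_PASSWORD}
  environment:
    # redis-cli picks up the password from REDISCLI_AUTH, so no -a flag is needed
    REDISCLI_AUTH: ${REDIS_PASSWORD}
  healthcheck:
    # grep makes any reply other than PONG (e.g. NOAUTH, LOADING) fail the check;
    # if redis-cli cannot connect, stdout is empty and grep fails as well
    test: ["CMD-SHELL", "redis-cli ping | grep -q PONG"]
    interval: 3s
    timeout: 2s
    retries: 10
    start_period: 5s
```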

Keycloak

✅ Recommended: Use built-in kc.sh command
healthcheck:
  # Keycloak minimal image doesn't include curl
  # Use kc.sh show-config to verify Keycloak is initialized
  test: ["CMD-SHELL", "/opt/keycloak/bin/kc.sh show-config | grep -q 'kc.db' || exit 1"]
  interval: 5s
  timeout: 5s
  retries: 40
  start_period: 45s  # Keycloak slow to start
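Why it works:
  • kc.sh is the native Keycloak management script
  • show-config prints the resolved configuration
  • grep -q 'kc.db' confirms a database is configured
  • No HTTP client required
  • Limitation: show-config starts its own JVM and does not query the running server, so it confirms configuration rather than readiness, and each run can take several seconds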
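Alternative (if an HTTP client is available):
healthcheck:
  # Only if curl is installed (e.g., a custom image); requires KC_HEALTH_ENABLED=true.
  # Keycloak 25+ serves health endpoints on the management port 9000, not 8080.
  test: ["CMD", "curl", "-f", "http://localhost:9000/health/ready"]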
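When the image has Bash but no HTTP client, you can still query the readiness endpoint by writing a raw HTTP request over Bash's /dev/tcp redirection. The sketch below assumes:
- Keycloak 25 or later, which serves health endpoints on management port 9000 once KC_HEALTH_ENABLED=true.
- `bash` is present in the image.

Verify both against the version you run. The check requires a 200 status line because /health/ready answers 503 while the server is not ready.

```yaml
keycloak:
  environment:
    KC_HEALTH_ENABLED: "true"
  healthcheck:
    test:
      - CMD
      - bash
      - -c
      # Open fd 3 to the management port, send a minimal HTTP/1.1 request,
      # read the status line, and require " 200 " in it.
      # $$ stops Compose from interpolating the shell variable.
      - >-
        exec 3<>/dev/tcp/127.0.0.1/9000 &&
        printf 'GET /health/ready HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n' >&3 &&
        read -r status_line <&3 &&
        [[ "$$status_line" == *" 200 "* ]]
    interval: 5s
    timeout: 5s
    retries: 40
    start_period: 45s
```

Every command in the check is a Bash builtin, so it needs neither curl nor grep.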

MongoDB

✅ Recommended: Use mongosh or mongo client
healthcheck:
  test: ["CMD", "mongosh", "--eval", "db.adminCommand('ping')"]
  interval: 5s
  timeout: 3s
  retries: 10
  start_period: 10s
For older MongoDB versions (<5.0):
healthcheck:
  test: ["CMD", "mongo", "--eval", "db.adminCommand('ping')"]

OpenFGA (gRPC Services)

✅ Recommended Option 1: Use bundled grpc_health_probe
healthcheck:
  test: ["CMD", "/usr/local/bin/grpc_health_probe", "-addr=:8081"]
  interval: 3s
  timeout: 3s
  retries: 15
  start_period: 5s
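Requirements:
  • Verify grpc_health_probe is in the image
  • Check with: docker run --rm --entrypoint ls <image> -l /usr/local/bin/grpc_health_probe
    (--entrypoint bypasses the image's ENTRYPOINT; this needs ls in the image, so for shell-less images copy the file out with docker create and docker cp instead)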
✅ Recommended Option 2: TCP port check (most portable)
healthcheck:
  test: ["CMD-SHELL", "timeout 1 bash -c '</dev/tcp/localhost/8081' || exit 1"]
  interval: 5s
  timeout: 3s
  retries: 10
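Why it works:
  • No HTTP or gRPC client needed (uses Bash's built-in /dev/tcp redirection)
  • Works on any image that ships Bash and the timeout utility
  • Checks only that the port accepts TCP connections, not that the service is healthy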
For Alpine/sh-only environments:
healthcheck:
  test: ["CMD-SHELL", "nc -z localhost 8081 || exit 1"]
  # Note: Requires nc (netcat) - usually present in Alpine

Qdrant (Vector Database)

✅ Recommended: TCP port check
healthcheck:
  # qdrant:v1.15.1 image lacks wget, curl, and grpc_health_probe
  # Use TCP port check as most portable option
  test: ["CMD-SHELL", "timeout 1 bash -c '</dev/tcp/localhost/6333' || exit 1"]
  interval: 5s
  timeout: 3s
  retries: 10
  start_period: 10s
Why TCP check:
  • Qdrant minimal image has no HTTP clients
  • Installing utilities defeats minimal image purpose
  • A listening port is a sufficient readiness signal for most use cases, though it does not confirm that collections have finished loading
Alternative (if Python available):
healthcheck:
  # Exec form: python3 exits non-zero if the connection fails or takes longer than 1s
  test: ["CMD", "python3", "-c", "import socket; socket.create_connection(('localhost', 6333), timeout=1)"]
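Recent Qdrant releases expose a /readyz HTTP endpoint; confirm that your version has it. If it does, the Bash /dev/tcp technique can check readiness rather than mere port availability. The sketch assumes `bash` exists in the image, which the TCP check above already requires.

```yaml
qdrant:
  healthcheck:
    test:
      - CMD
      - bash
      - -c
      # Send a raw HTTP request to the REST port and require a 200 status line.
      # $$ stops Compose from interpolating the shell variable.
      - >-
        exec 3<>/dev/tcp/127.0.0.1/6333 &&
        printf 'GET /readyz HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n' >&3 &&
        read -r status_line <&3 &&
        [[ "$$status_line" == *" 200 "* ]]
    interval: 5s
    timeout: 3s
    retries: 10
    start_period: 10s
```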

Elasticsearch

✅ Recommended: Use curl if available, fallback to TCP
healthcheck:
  test: ["CMD-SHELL", "curl -f http://localhost:9200/_cluster/health || exit 1"]
  interval: 10s
  timeout: 5s
  retries: 12
  start_period: 30s
For minimal images:
healthcheck:
  test: ["CMD-SHELL", "timeout 1 bash -c '</dev/tcp/localhost/9200' || exit 1"]
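Two details affect the curl check above:
- _cluster/health answers 200 even when the cluster is red. Adding wait_for_status makes Elasticsearch return 408 if the target status is not reached within the given timeout, and curl -f then fails.
- Elasticsearch 8.x enables security and TLS on port 9200 by default, so a plain http:// URL fails against a default container.

The sketch below assumes a single-node development stack with security deliberately disabled. Do not use it outside development; with security on, switch to https and credentials.

```yaml
elasticsearch:
  environment:
    discovery.type: single-node
    # Development only: disables authentication and TLS so http://localhost:9200 answers
    xpack.security.enabled: "false"
  healthcheck:
    # Fails with HTTP 408 if the cluster is not at least yellow within 3s
    # (kept below the 5s healthcheck timeout)
    test: ["CMD-SHELL", "curl -fsS 'http://localhost:9200/_cluster/health?wait_for_status=yellow&timeout=3s' || exit 1"]
    interval: 10s
    timeout: 5s
    retries: 12
    start_period: 30s
```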

RabbitMQ

✅ Recommended: Use rabbitmqctl
healthcheck:
  test: ["CMD", "rabbitmqctl", "status"]
  interval: 10s
  timeout: 5s
  retries: 5
  start_period: 30s
Alternative (lighter check that the node is running and responds to CLI tools):
healthcheck:
  test: ["CMD", "rabbitmq-diagnostics", "-q", "ping"]
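RabbitMQ's monitoring documentation describes progressively stricter rabbitmq-diagnostics checks. The sketch below combines two of them:
- check_running: the RabbitMQ application is running on the node.
- check_local_alarms: no memory or disk alarms are in effect on the node.

Each invocation starts a short-lived CLI runtime, so the timings here are deliberately generous. They are assumptions to tune, not measured values.

```yaml
rabbitmq:
  healthcheck:
    # Each rabbitmq-diagnostics call boots its own CLI process,
    # so running both can take a few seconds
    test: ["CMD-SHELL", "rabbitmq-diagnostics -q check_running && rabbitmq-diagnostics -q check_local_alarms"]
    interval: 30s
    timeout: 20s
    retries: 5
    start_period: 30s
```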

Health Check Configuration Guidelines

Timing Parameters Best Practices

healthcheck:
  interval: 5s      # How often to check (balance between responsiveness and load)
  timeout: 3s       # MUST be <= interval (Codex validation requirement; Docker itself does not enforce it)
  retries: 10       # Number of consecutive failures before unhealthy
  start_period: 10s # Grace period during startup (no failures count)
Rules:
  1. interval >= timeout (project validation rule; Docker and Docker Compose accept either order)
  2. start_period should cover typical startup time
  3. retries should account for temporary failures (network blips)
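Docker Engine 25.0 and later also supports start_interval, the time between checks during start_period, and recent Docker Compose v2 releases accept the key. The sketch below assumes both versions are available. It lets a service with a long start_period be probed quickly while starting without keeping a tight interval forever.

```yaml
healthcheck:
  test: ["CMD-SHELL", "pg_isready -U postgres"]
  interval: 30s        # cadence once the start period is over
  timeout: 3s
  retries: 3
  start_period: 60s    # failed checks here do not count toward retries
  start_interval: 2s   # cadence during start_period (Docker Engine 25.0+)
```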
Service-Specific Recommendations:

| Service | Interval | Timeout | Retries | Start Period | Rationale |
|---|---|---|---|---|---|
| Redis | 3s | 2s | 10 | 5s | Fast startup, lightweight check |
| PostgreSQL | 5s | 3s | 10 | 10s | Moderate startup, database initialization |
| Keycloak | 5s | 5s | 40 | 45s | Slow startup, complex initialization |
| OpenFGA | 3s | 3s | 15 | 5s | Fast startup, gRPC ready quickly |
| Qdrant | 5s | 3s | 10 | 10s | Moderate startup, index loading |

Exit Codes

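Health check commands must return proper exit codes:
  • 0: Healthy (container ready)
  • 1: Unhealthy (container not ready or failed)
  • 2: Reserved by Docker; do not use it
Commands such as pg_isready can exit with 2 or 3, so appending || exit 1 maps every failure to 1.
Example with explicit exit codes:
healthcheck:
  test: ["CMD-SHELL", "pg_isready -U postgres || exit 1"]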

Shell vs. Exec Form

CMD-SHELL Form (requires shell):
test: ["CMD-SHELL", "pg_isready -U postgres"]
  • Runs the command through the container's default shell (/bin/sh -c on Linux)
  • Required for: pipes, redirections, && / || chaining, and shell variable expansion
  • Risk: Fails if /bin/sh doesn’t exist (distroless images)
CMD Form (direct exec):
test: ["CMD", "redis-cli", "ping"]
  • Executes command directly (no shell)
  • Preferred when possible (more portable)
  • Works in distroless/minimal images
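One consequence of the two forms is easy to miss. Compose interpolates ${VAR} and $VAR in both forms when it parses the file, using the host environment or .env, before the container runs. To let the container's own shell expand a variable at check time, escape the dollar sign as $$ and use CMD-SHELL. CMD form has no shell, so it would pass $VAR through literally. In the sketch below, appuser is a hypothetical example value.

```yaml
postgres:
  image: postgres:17-alpine
  environment:
    POSTGRES_USER: appuser                    # hypothetical example user
    POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}   # interpolated by Compose from the host/.env
  healthcheck:
    # $$ escapes Compose interpolation; the container's /bin/sh expands
    # $POSTGRES_USER each time the check runs
    test: ["CMD-SHELL", "pg_isready -U \"$$POSTGRES_USER\""]
    interval: 5s
    timeout: 3s
    retries: 10
    start_period: 10s
```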

Testing Health Checks

Verify Health Check Works

1. Start service and monitor health status:
docker-compose up -d postgres
watch -n 1 'docker-compose ps postgres'
2. Check health check logs (Compose generates the container name unless container_name is set, so look up the ID):
docker inspect --format='{{json .State.Health}}' $(docker-compose ps -q postgres) | jq
3. Manually run health check command:
docker-compose exec postgres pg_isready -U postgres
echo $?  # Should output 0 if healthy

Test in Different Image Variants

# Test with Alpine variant
docker run --rm postgres:17-alpine pg_isready --version

# Test with Debian variant
docker run --rm postgres:17 pg_isready --version

# Test with distroless (if applicable)
docker run --rm gcr.io/distroless/base ls /bin/sh
# Should fail with "executable file not found": distroless images ship neither ls nor a shell

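Validate Timing Parameters

Neither Docker nor Docker Compose checks that timeout <= interval, so the project rule has to be enforced by review or a CI check. A deliberately misconfigured fragment like this one is accepted without error:

# Add this to your docker-compose file temporarily
healthcheck:
  # Intentionally violates the project rule (timeout > interval)
  interval: 2s
  timeout: 3s
Run validation:
docker-compose config
# Prints the resolved configuration; it reports syntax and schema errors, not this timing rule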

Troubleshooting

Health Check Never Becomes Healthy

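Symptom: Container stays in starting or unhealthy state.
Debug steps:
  1. Check if command exists:
    docker exec <container> which pg_isready
    docker exec <container> which redis-cli
    # which is often missing in minimal images; try: docker exec <container> sh -c 'command -v pg_isready'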
    
  2. Run health check manually:
    docker exec <container> pg_isready -U postgres
    # Note the exit code and output
    
  3. Check service is actually running:
    docker exec <container> ps aux
    docker logs <container>
    
  4. Verify ports are listening:
    docker exec <container> netstat -tlnp
    # or
    docker exec <container> ss -tlnp
    
  5. Check start_period is sufficient:
    healthcheck:
      start_period: 60s  # Increase if service slow to start
    

Health Check Command Not Found

Symptom: executable file not found or command not found.
Solutions:
  1. Verify command path:
    docker exec <container> which pg_isready
    docker exec <container> find / -name pg_isready 2>/dev/null
    
  2. Check shell availability:
    docker exec <container> ls -l /bin/sh
    # If missing, use CMD form instead of CMD-SHELL
    
  3. Use absolute paths:
    healthcheck:
      test: ["CMD", "/usr/bin/redis-cli", "ping"]
    

Health Check Times Out

Symptom: Health checks fail with timeout.
Solutions:
  1. Increase timeout:
    healthcheck:
      timeout: 10s  # Increase for slow services
    
  2. Use faster health check:
    # Instead of full HTTP request
    healthcheck:
      test: ["CMD-SHELL", "timeout 1 bash -c '</dev/tcp/localhost/8080'"]
    
  3. Measure how long the check takes:
    time docker exec <container> redis-cli ping
    # Run time on the host; the total includes docker exec overhead, so compare it with the configured timeout
    

Permission Denied Errors

Symptom: Health check fails with permission errors.
Solutions:
  1. Run as correct user:
    healthcheck:
      test: ["CMD-SHELL", "su - postgres -c 'pg_isready'"]
    
  2. Check file permissions:
    docker exec <container> ls -l /opt/keycloak/bin/kc.sh
    
  3. Use sudo if available:
    healthcheck:
      test: ["CMD-SHELL", "sudo -u postgres pg_isready"]
    

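Decision Tree: Choosing the Right Health Check

  1. Does the image ship a service-native check (pg_isready, redis-cli, mongosh, rabbitmq-diagnostics, kc.sh)? Use it, in CMD form where possible.
  2. Does the image bundle a dedicated probe (e.g., grpc_health_probe)? Use it.
  3. Does the image include curl or wget? Call the service's HTTP health endpoint.
  4. Does the image include Bash? Fall back to a TCP port check via /dev/tcp.
  5. None of the above? Use CMD form with a binary the image is verified to contain.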

Complete Examples

Example 1: PostgreSQL with Optimal Settings

postgres:
  image: postgres:17-alpine
  environment:
    POSTGRES_USER: testuser
    POSTGRES_PASSWORD: testpass
    POSTGRES_DB: testdb
  healthcheck:
    test: ["CMD-SHELL", "pg_isready -U testuser -d testdb"]
    interval: 5s
    timeout: 3s
    retries: 10
    start_period: 10s
  ports:
    - "5432:5432"
  networks:
    - app-network

Example 2: Redis with Auth

redis:
  image: redis:7-alpine
  command: redis-server --requirepass ${REDIS_PASSWORD}
  healthcheck:
    test: ["CMD", "redis-cli", "-a", "${REDIS_PASSWORD}", "ping"]
    interval: 3s
    timeout: 2s
    retries: 10
    start_period: 5s
  environment:
    - REDIS_PASSWORD=${REDIS_PASSWORD}
  ports:
    - "6379:6379"

Example 3: Keycloak with Slow Startup

keycloak:
  image: quay.io/keycloak/keycloak:latest
  command: start-dev
  environment:
    KC_DB: postgres
    KC_DB_URL: jdbc:postgresql://postgres:5432/keycloak
    KEYCLOAK_ADMIN: admin
    KEYCLOAK_ADMIN_PASSWORD: admin
  healthcheck:
    # Keycloak minimal image lacks curl - use kc.sh
    test: ["CMD-SHELL", "/opt/keycloak/bin/kc.sh show-config | grep -q 'kc.db' || exit 1"]
    interval: 5s
    timeout: 5s
    retries: 40
    start_period: 45s  # Keycloak needs time to initialize
  ports:
    - "8080:8080"
  depends_on:
    postgres:
      condition: service_healthy

Example 4: Multi-Service with Dependencies

version: '3.8'  # Optional: Compose v2 ignores this key and warns that it is obsolete

services:
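  postgres:
    image: postgres:17-alpine
    environment:
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}  # Required: the image refuses to initialize without it (unless POSTGRES_HOST_AUTH_METHOD=trust)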
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 3s
      retries: 10
      start_period: 10s

  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 3s
      timeout: 2s
      retries: 10
      start_period: 5s

  app:
    build: .
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    healthcheck:
      # Requires bash and timeout in the app image
      test: ["CMD-SHELL", "timeout 1 bash -c '</dev/tcp/localhost/8000' || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s
    ports:
      - "8000:8000"


Summary: Quick Reference

  • PostgreSQL: pg_isready -U postgres (native, always available)
  • Redis: redis-cli ping (bundled client, fast)
  • Keycloak: kc.sh show-config | grep -q 'kc.db' (native command, no curl needed)
  • OpenFGA: grpc_health_probe -addr=:8081 (gRPC standard)
  • Qdrant: timeout 1 bash -c '</dev/tcp/localhost/6333' (no HTTP clients in image)
  • MongoDB: mongosh --eval "db.adminCommand('ping')" (native client)
  • RabbitMQ: rabbitmqctl status (service management tool)
  • Generic TCP port: timeout 1 bash -c '</dev/tcp/localhost/PORT' (most portable wherever Bash exists)
Golden Rules:
  1. Use service-native commands when possible
  2. Fallback to TCP checks for minimal images
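  3. interval >= timeout (project rule; Docker does not enforce it)
  4. Test in actual container before deploying
  5. Don’t assume curl/wget exist
  6. Don’t hardcode shell paths; if a check relies on Bash (for example /dev/tcp), confirm the image ships it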

Last Updated: 2024-11-17
Maintained By: Infrastructure Team
Related: docker-compose.test.yml, docker-compose.dev.yml