Overview

This guide provides step-by-step instructions for deploying MCP Server LangGraph on Google Kubernetes Engine (GKE) Autopilot with enterprise-grade security, monitoring, and cost optimization.

Deployment Time

2-3 hours for complete production setup

Cost Savings

40-60% vs. traditional GKE Standard (with rightsizing and committed use discounts)

Infrastructure Maturity

94/100 production-readiness score

Best Practices

100% GCP best practices compliance

What You’ll Deploy

  • GKE Autopilot Cluster: Fully managed Kubernetes (regional, multi-zone)
  • Cloud SQL PostgreSQL: High-availability database with automated backups
    • 3 databases: Keycloak (identity), OpenFGA (authorization), GDPR (compliance per ADR-0041)
  • Memorystore Redis: High-availability cache with persistence
  • Workload Identity: Secure pod-to-GCP service authentication
  • Private Networking: VPC-native with Cloud NAT
  • Observability: Cloud Monitoring, Logging, Trace, Profiler
  • Security: Binary Authorization, Network Policies, Encryption

Key Benefits

  • Lower cost: GKE Autopilot uses pay-per-pod pricing with no idle node costs. The production environment costs $880-1,275/month vs. $1,290-1,970/month with traditional GKE.
  • Fully managed: Google manages all node infrastructure, upgrades, and scaling automatically, so you can focus on your application rather than on Kubernetes operations.
  • High availability: Regional deployment across 3 zones, with automated failover for the database and cache.
  • Security: Workload Identity, Binary Authorization, Shielded Nodes, Network Policies, and encryption at rest and in transit are built in.

Prerequisites

1

GCP Account & Project

Create a GCP project with billing enabled:
# Create project
gcloud projects create PROJECT_ID --name="MCP LangGraph Production"

# Link billing
gcloud billing projects link PROJECT_ID --billing-account=BILLING_ACCOUNT_ID

# Set active project
gcloud config set project PROJECT_ID
2

Install Required Tools

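  • gcloud CLI (latest): curl https://sdk.cloud.google.com | bash
  • gke-gcloud-auth-plugin (latest, required by kubectl to authenticate to GKE): gcloud components install gke-gcloud-auth-plugin
  • Terraform (≥ 1.5.0): terraform.io/downloads
  • kubectl (≥ 1.28): gcloud components install kubectl
  • kustomize (≥ 5.0): brew install kustomize
  • Helm (3.x, used in Phase 3): brew install helm
Verify installations:
gcloud version
terraform version
kubectl version --client
kustomize version
helm version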
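To check installed versions against the minimums above in one pass, a small helper can compare dotted version strings. This is a sketch that assumes sort -V is available (GNU coreutils; recent BSD/macOS sort also supports it); version_ge and require_version are illustrative names, not part of the repository, and extracting each tool's version string is left to you.

```shell
# Sketch (hypothetical helpers): compare version strings against the minimums
# listed above. Requires `sort -V`.

# version_ge A B: succeed if version A >= version B
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# require_version NAME ACTUAL MINIMUM: print a status line, fail if too old
require_version() {
  local actual="${2#v}"   # tolerate a leading "v", as in kubectl's v1.29.1
  if version_ge "$actual" "$3"; then
    echo "OK: $1 $actual (>= $3)"
  else
    echo "FAIL: $1 $actual (need >= $3)"
    return 1
  fi
}

# Usage (example: pull Terraform's version from its JSON output):
#   require_version terraform \
#     "$(terraform version -json | sed -n 's/.*"terraform_version": *"\([^"]*\)".*/\1/p')" 1.5.0
```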
3

Enable Required APIs

Enable the required GCP APIs:
gcloud services enable \
    container.googleapis.com \
    compute.googleapis.com \
    sqladmin.googleapis.com \
    redis.googleapis.com \
    servicenetworking.googleapis.com \
    cloudresourcemanager.googleapis.com \
    iam.googleapis.com \
    binaryauthorization.googleapis.com \
    monitoring.googleapis.com \
    logging.googleapis.com \
    cloudtrace.googleapis.com \
    cloudprofiler.googleapis.com \
    secretmanager.googleapis.com \
    artifactregistry.googleapis.com \
    --project=PROJECT_ID
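Rerunning the enable command is harmless, but it can help to see which APIs are still missing in an existing project. The sketch below compares the list above with the project's enabled services; missing_apis is a hypothetical helper that reads service names from stdin, and the gcloud services list invocation in the usage comment is one way to produce that list.

```shell
# Sketch (hypothetical helper): report which required APIs from the list
# above are not enabled yet. Reads enabled service names, one per line, on stdin.
REQUIRED_APIS="
container.googleapis.com
compute.googleapis.com
sqladmin.googleapis.com
redis.googleapis.com
servicenetworking.googleapis.com
cloudresourcemanager.googleapis.com
iam.googleapis.com
binaryauthorization.googleapis.com
monitoring.googleapis.com
logging.googleapis.com
cloudtrace.googleapis.com
cloudprofiler.googleapis.com
secretmanager.googleapis.com
artifactregistry.googleapis.com
"

missing_apis() {
  local enabled api
  enabled="$(cat)"
  for api in $REQUIRED_APIS; do   # unquoted on purpose: split on whitespace
    printf '%s\n' "$enabled" | grep -qxF "$api" || echo "$api"
  done
}

# Usage:
#   gcloud services list --enabled --project=PROJECT_ID \
#     --format="value(config.name)" | missing_apis
```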
4

Configure IAM Permissions

The account that runs Terraform needs these roles (or roles/owner):
  • roles/compute.networkAdmin
  • roles/container.admin
  • roles/cloudsql.admin
  • roles/redis.admin
  • roles/iam.securityAdmin
  • roles/resourcemanager.projectIamAdmin
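If a project owner grants these roles to a separate deployer account, the bindings can be scripted. A minimal sketch, assuming a principal of the form user:EMAIL; grant_deployer_roles and the DRY_RUN switch are illustrative, and by default the gcloud commands are only printed so they can be reviewed before running with DRY_RUN=0.

```shell
# Sketch (hypothetical helper): grant the deployer roles listed above.
# DRY_RUN defaults to 1, which prints the commands instead of running them.
DEPLOYER_ROLES="
roles/compute.networkAdmin
roles/container.admin
roles/cloudsql.admin
roles/redis.admin
roles/iam.securityAdmin
roles/resourcemanager.projectIamAdmin
"

grant_deployer_roles() {
  local project="$1" member="$2" role
  for role in $DEPLOYER_ROLES; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "gcloud projects add-iam-policy-binding $project --member=$member --role=$role"
    else
      gcloud projects add-iam-policy-binding "$project" \
        --member="$member" --role="$role"
    fi
  done
}

# Usage:
#   grant_deployer_roles PROJECT_ID user:alice@example.com            # preview
#   DRY_RUN=0 grant_deployer_roles PROJECT_ID user:alice@example.com  # apply
```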

Architecture

The production deployment creates a fully managed, highly available infrastructure built from the components listed under What You’ll Deploy.

Phase 1: Infrastructure Setup (30 minutes)

Step 1: Create Terraform State Backend

cd terraform/backend-setup-gcp

# Create terraform.tfvars
cat > terraform.tfvars <<EOF
project_id    = "YOUR_PROJECT_ID"
region        = "us-central1"
bucket_prefix = "mcp-langgraph"
EOF

# Deploy
terraform init
terraform apply -auto-approve

# Save bucket name
export TF_STATE_BUCKET=$(terraform output -raw terraform_state_bucket)
Expected: GCS bucket created with versioning and logging enabled

Step 2: Configure Production Environment

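cd ../environments/gcp-prod   # from terraform/backend-setup-gcp (Step 1)

## Update main.tf with the state bucket (TF_STATE_BUCKET is exported in Step 1;
## -i.bak keeps the command portable across GNU and BSD/macOS sed)
sed -i.bak "s/bucket = \"mcp-langgraph-terraform-state-us-central1-XXXXX\"/bucket = \"$TF_STATE_BUCKET\"/g" main.tf && rm -f main.tf.bak

## Create terraform.tfvars
cp terraform.tfvars.example terraform.tfvars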
Edit terraform.tfvars with your configuration:
project_id  = "YOUR_PROJECT_ID"
region      = "us-central1"
team        = "platform"
cost_center = "engineering"

## Security (Binary Authorization is enabled in Phase 4)
enable_binary_authorization = false
enable_private_endpoint     = false  # Set true for maximum security

## Optional: Restrict control plane access
master_authorized_networks_cidrs = [
  {
    cidr_block   = "YOUR_IP/32"
    display_name = "My IP"
  }
]
Important: Replace YOUR_PROJECT_ID with your actual GCP project ID throughout the configuration.
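A placeholder such as YOUR_IP/32 left in master_authorized_networks_cidrs only surfaces later, during plan or apply. A quick pre-check, offered as a sketch: is_ipv4_cidr is a hypothetical helper that accepts a dotted IPv4 address with a /0 to /32 prefix and rejects anything else.

```shell
# Sketch (hypothetical helper): validate an IPv4 CIDR such as "203.0.113.7/32"
# before it goes into master_authorized_networks_cidrs.
is_ipv4_cidr() {
  local ip prefix octet
  # Shape check: four dotted groups of 1-3 digits, then /1-2 digits
  printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}/[0-9]{1,2}$' || return 1
  ip="${1%/*}"
  prefix="${1#*/}"
  [ "$prefix" -le 32 ] || return 1
  # Range check: every octet must be 0-255
  for octet in $(printf '%s\n' "$ip" | tr '.' ' '); do
    [ "$octet" -le 255 ] || return 1
  done
}

# Usage:
#   is_ipv4_cidr "203.0.113.7/32" && echo "valid"
```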

Phase 2: Deploy Infrastructure (25 minutes)

Step 1: Initialize and Plan

cd terraform/environments/gcp-prod
terraform init
terraform plan -out=tfplan
Review the plan. It should create:
  • 1 VPC network with 3 subnets
  • 1 GKE Autopilot cluster (regional)
  • 1 Cloud SQL instance (PostgreSQL 15, HA)
  • 1 Memorystore instance (Redis 7.0, HA)
  • 2 NAT IPs
  • Multiple service accounts
  • IAM bindings
  • Firewall rules
  • Monitoring alerts
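On a first deployment everything is an add, but when re-planning an existing environment the destroy count deserves the closest look. The sketch below reads the human-readable plan (terraform show -no-color tfplan) and fails if anything would be destroyed; check_plan_no_destroy is an illustrative name, and the parsing assumes Terraform's usual "Plan: N to add, N to change, N to destroy." summary line.

```shell
# Sketch (hypothetical helper): fail if a saved plan would destroy resources.
# Reads `terraform show -no-color tfplan` output on stdin.
check_plan_no_destroy() {
  local plan summary destroy
  plan="$(cat)"
  if printf '%s\n' "$plan" | grep -q '^No changes\.'; then
    echo "No changes."
    return 0
  fi
  summary="$(printf '%s\n' "$plan" | grep -E '^Plan: .*[0-9]+ to destroy' | head -n 1)"
  if [ -z "$summary" ]; then
    echo "No plan summary line found" >&2
    return 1
  fi
  echo "$summary"
  # Pull the number in front of "to destroy"
  destroy="$(printf '%s\n' "$summary" | sed -E 's/.*[^0-9]([0-9]+) to destroy.*/\1/')"
  [ "$destroy" -eq 0 ]
}

# Usage:
#   terraform show -no-color tfplan | check_plan_no_destroy && terraform apply tfplan
```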

Step 2: Deploy

terraform apply tfplan
Duration: 20-25 minutes. Cloud SQL takes 10-12 minutes, GKE takes 8-10 minutes.

Step 3: Configure kubectl

## Get credentials (quoted so the generated command is evaluated as one string)
eval "$(terraform output -raw kubectl_config_command)"

## Verify access
kubectl cluster-info
kubectl get nodes
kubectl get namespaces
Expected output (abbreviated):
Kubernetes control plane is running at https://X.X.X.X
NAME              STATUS   AGE
default           Active   5m
kube-system       Active   5m

Phase 3: Application Deployment (20 minutes)

Step 1: Create Secrets in Secret Manager

## Run from terraform/environments/gcp-prod so the terraform output commands resolve
PROJECT_ID="YOUR_PROJECT_ID"

## Create secret
gcloud secrets create mcp-production-secrets \
  --replication-policy="automatic" \
  --project="$PROJECT_ID"

## Prepare secret data in a private temp file (mktemp creates it with mode 0600)
SECRETS_FILE=$(mktemp)
cat > "$SECRETS_FILE" <<EOF
{
  "anthropic_api_key": "sk-ant-...",
  "google_api_key": "AIza...",
  "jwt_secret": "$(openssl rand -base64 32)",
  "postgres_password": "$(terraform output -raw cloudsql_user_password)",
  "cloudsql_connection_name": "$(terraform output -raw cloudsql_connection_name)",
  "redis_host": "$(terraform output -raw redis_host)",
  "redis_port": "$(terraform output -raw redis_port)",
  "redis_password": "$(terraform output -raw redis_auth_string)"
}
EOF

## Upload secrets
gcloud secrets versions add mcp-production-secrets \
  --data-file="$SECRETS_FILE" \
  --project="$PROJECT_ID"

## Cleanup
rm -f "$SECRETS_FILE"

Step 2: Install External Secrets Operator

helm repo add external-secrets https://charts.external-secrets.io
helm repo update

helm install external-secrets \
  external-secrets/external-secrets \
  --namespace external-secrets-system \
  --create-namespace \
  --set installCRDs=true

kubectl wait --for=condition=ready pod \
  -l app.kubernetes.io/name=external-secrets \
  -n external-secrets-system \
  --timeout=120s
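The operator does nothing until a store and an ExternalSecret point it at Secret Manager. The production-gke overlay may already ship equivalent resources; if it does not, the sketch below renders them for the mcp-production-secrets secret from Step 1. Assumptions: the store authenticates with Workload Identity through the production-mcp-server-langgraph Kubernetes service account, whose Google service account can read the secret (roles/secretmanager.secretAccessor); the store name gcp-secret-manager and CLUSTER_NAME are placeholders; and the API version defaults to external-secrets.io/v1, which older ESO releases do not serve (set ESO_API_VERSION=external-secrets.io/v1beta1 for those). With dataFrom.extract, each top-level JSON field becomes one key in the resulting Kubernetes Secret.

```shell
# Sketch (hypothetical helper and resource names): render a SecretStore and an
# ExternalSecret for the Secret Manager secret created in Step 1.
render_eso_manifests() {
  local project="$1" location="$2" cluster="$3"
  # external-secrets.io/v1 on recent ESO releases; override for older ones
  local api="${ESO_API_VERSION:-external-secrets.io/v1}"
  cat <<EOF
apiVersion: ${api}
kind: SecretStore
metadata:
  name: gcp-secret-manager
  namespace: mcp-production
spec:
  provider:
    gcpsm:
      projectID: ${project}
      auth:
        workloadIdentity:
          clusterLocation: ${location}
          clusterName: ${cluster}
          serviceAccountRef:
            name: production-mcp-server-langgraph
---
apiVersion: ${api}
kind: ExternalSecret
metadata:
  name: mcp-production-secrets
  namespace: mcp-production
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: gcp-secret-manager
    kind: SecretStore
  target:
    name: mcp-production-secrets
  dataFrom:
    - extract:
        key: mcp-production-secrets
EOF
}

# Usage (the mcp-production namespace must already exist):
#   render_eso_manifests "$PROJECT_ID" us-central1 CLUSTER_NAME | kubectl apply -f -
```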

Step 3: Deploy Application

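cd deployments/overlays/production-gke   # path relative to the repository root

## Update with your project ID (PROJECT_ID is set in Step 1; -i.bak works with GNU and BSD/macOS sed)
: "${PROJECT_ID:?set PROJECT_ID first}"
sed -i.bak "s/PROJECT_ID/$PROJECT_ID/g" *.yaml && rm -f *.yaml.bak

## Deploy
kubectl apply -k .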

## Watch rollout
kubectl rollout status deployment/production-mcp-server-langgraph \
  -n mcp-production \
  --timeout=10m
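The sed substitution only touches *.yaml files in the overlay directory, so a literal PROJECT_ID can survive in files it does not cover (the overlay's bases, for instance) and only show up later as image pull or IAM errors. Valid GCP project IDs are lowercase, so the literal token should not remain after a correct substitution. A sketch that checks the rendered kustomize output before kubectl apply -k; check_no_placeholders is an illustrative name, and the pattern needs adjusting if your manifests use PROJECT_ID inside an environment variable name.

```shell
# Sketch (hypothetical helper): fail if rendered manifests (on stdin) still
# contain the placeholder token.
check_no_placeholders() {
  local placeholder="${1:-PROJECT_ID}" hits
  hits="$(grep -nF "$placeholder" || true)"
  if [ -n "$hits" ]; then
    echo "Unreplaced $placeholder in rendered manifests:" >&2
    printf '%s\n' "$hits" >&2
    return 1
  fi
}

# Usage (from deployments/overlays/production-gke):
#   kubectl kustomize . | check_no_placeholders && kubectl apply -k .
```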
Verify deployment:
kubectl get pods -n mcp-production
kubectl logs -n mcp-production -l app=mcp-server-langgraph --tail=50

Phase 4: Security Hardening (30 minutes)

Binary Authorization

Enable image signing to ensure only trusted container images run in your cluster:
1

Run Setup Script

./deployments/security/binary-authorization/setup-binary-auth.sh \
  PROJECT_ID \
  production
This creates:
  • KMS key for signing
  • Binary Authorization attestor
  • Enforcement policy
2

Sign Images

IMAGE="us-central1-docker.pkg.dev/PROJECT_ID/mcp-production/mcp-server-langgraph:2.8.0"

./deployments/security/binary-authorization/sign-image.sh \
  PROJECT_ID \
  production \
  "$IMAGE"
3

Enable in Cluster

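Sign every image the cluster will run before turning on enforcement, because unsigned images are blocked once the policy is active. Then edit terraform.tfvars (in terraform/environments/gcp-prod):
enable_binary_authorization = true
Review and apply:
terraform plan -out=tfplan
terraform apply tfplan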
Learn more about Binary Authorization for complete setup details.
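Binary Authorization attestations are bound to an image digest rather than a tag, so a tag such as :2.8.0 in the signing step has to be resolved to a digest at some point (sign-image.sh may already do this internally). The helper below is a sketch that rewrites a tag reference into a digest reference; to_digest_ref is an illustrative name, and the image_summary.digest field in the usage comment reflects current gcloud artifacts docker images describe output, so confirm it against your gcloud version.

```shell
# Sketch (hypothetical helper): turn "REPO:TAG" (or "REPO@OLD_DIGEST") into
# "REPO@DIGEST" for attestation and policy checks.
to_digest_ref() {
  local image="$1" digest="$2" repo
  repo="${image%@*}"               # drop any existing digest
  case "${repo##*/}" in
    *:*) repo="${repo%:*}" ;;      # drop the tag, only from the last path segment
  esac
  printf '%s@%s\n' "$repo" "$digest"
}

# Usage:
#   DIGEST=$(gcloud artifacts docker images describe "$IMAGE" \
#     --format='value(image_summary.digest)')
#   to_digest_ref "$IMAGE" "$DIGEST"
```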

Phase 5: Observability (10 minutes)

Setup Monitoring

## Create dashboards and alerts
./monitoring/gcp/setup-monitoring.sh PROJECT_ID
This configures:
  • Custom Cloud Monitoring dashboard
  • Alert policies (CPU, memory, errors, latency)
  • Uptime checks
  • SLO definitions

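Access Dashboards

Open the dashboards in the Google Cloud console under Monitoring > Dashboards, or list them from the CLI:
gcloud monitoring dashboards list --project=PROJECT_ID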

Verification & Testing

Health Checks

Test all health endpoints:
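kubectl port-forward -n mcp-production \
  svc/production-mcp-server-langgraph 8000:8000 &
PF_PID=$!
sleep 3   # give the port-forward a moment to start listening

curl http://localhost:8000/health/live
curl http://localhost:8000/health/ready
curl http://localhost:8000/health/startup

## Stop the port-forward when done
kill "$PF_PID"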
All health checks should return HTTP 200
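Plain curl calls print the response bodies but not the status codes. A sketch that checks each endpoint's HTTP status with curl's -w '%{http_code}' option while the port-forward is listening on localhost:8000; check_health_endpoints is an illustrative name, and the endpoint paths are the three listed above.

```shell
# Sketch (hypothetical helper): report the HTTP status of each health endpoint
# and fail if any of them is not 200.
check_health_endpoints() {
  local base="${1:-http://localhost:8000}" path code rc=0
  for path in /health/live /health/ready /health/startup; do
    # curl prints 000 as the status when it cannot connect
    code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "${base}${path}" || true)"
    if [ "$code" = "200" ]; then
      echo "OK: ${path}"
    else
      echo "FAIL: ${path} (HTTP ${code})"
      rc=1
    fi
  done
  return "$rc"
}

# Usage (with the port-forward running):
#   check_health_endpoints http://localhost:8000
```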

Database Connectivity

kubectl exec -it -n mcp-production \
  $(kubectl get pod -n mcp-production -l app=mcp-server-langgraph -o jsonpath='{.items[0].metadata.name}') \
  -c cloud-sql-proxy \
  -- wget -qO- http://localhost:9801/readiness
Should return: ok

Workload Identity

Verify service account annotation:
kubectl get sa production-mcp-server-langgraph \
  -n mcp-production \
  -o jsonpath='{.metadata.annotations.iam\.gke\.io/gcp-service-account}'
Should show: mcp-prod-app-sa@PROJECT_ID.iam.gserviceaccount.com

Troubleshooting

Pods Not Starting

Diagnosis:
kubectl describe pod POD_NAME -n mcp-production
Common causes:
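  1. Resource requests outside Autopilot's supported ranges (Autopilot provisions capacity automatically, but only within its per-pod limits)
  2. Image pull errors (check that the cluster's node service account can read from Artifact Registry, for example with roles/artifactregistry.reader)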
  3. Binary Authorization blocking unsigned images
Solution:
  • Reduce CPU/memory requests in deployment
  • Verify image exists in Artifact Registry
  • Sign the image if Binary Auth is enabled
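When several pods are involved, filtering the pod list narrows down which ones to describe. A sketch, with unhealthy_pods as a hypothetical helper, that reads kubectl get pods --no-headers output (default columns NAME, READY, STATUS, RESTARTS, AGE) and prints pods that are not Completed and are either not Running or not fully ready.

```shell
# Sketch (hypothetical helper): list pods that need attention, reading
# `kubectl get pods --no-headers` output on stdin.
unhealthy_pods() {
  awk '{ split($2, ready, "/") }
       $3 != "Completed" && ($3 != "Running" || ready[1] != ready[2]) { print $1, $2, $3 }'
}

# Usage:
#   kubectl get pods -n mcp-production --no-headers | unhealthy_pods
#   then: kubectl describe pod POD_NAME -n mcp-production
```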

Cloud SQL Connection Errors

Diagnosis:
kubectl logs -n mcp-production POD_NAME -c cloud-sql-proxy
Solution:
  1. Verify private service connection exists
  2. Check Cloud SQL instance is running
  3. Verify Cloud SQL Proxy sidecar configuration
  4. Check Workload Identity IAM bindings

Workload Identity Errors

Diagnosis:
kubectl get sa -n mcp-production production-mcp-server-langgraph -o yaml

gcloud iam service-accounts get-iam-policy \
  mcp-prod-app-sa@PROJECT_ID.iam.gserviceaccount.com
Solution:
  1. Verify annotation: iam.gke.io/gcp-service-account
  2. Check IAM binding exists
  3. Wait 1-2 minutes for propagation
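The IAM binding in step 2 grants roles/iam.workloadIdentityUser on the Google service account to a principal built from the project ID, the namespace, and the Kubernetes service account name. A sketch that constructs that principal so it can be searched for in the policy output; wi_member is an illustrative name, and the namespace and service account names are the ones used throughout this guide.

```shell
# Sketch (hypothetical helper): build the Workload Identity principal
#   serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME]
wi_member() {
  local project="$1" namespace="$2" ksa="$3"
  printf 'serviceAccount:%s.svc.id.goog[%s/%s]\n' "$project" "$namespace" "$ksa"
}

# Usage:
#   MEMBER=$(wi_member PROJECT_ID mcp-production production-mcp-server-langgraph)
#   gcloud iam service-accounts get-iam-policy \
#     mcp-prod-app-sa@PROJECT_ID.iam.gserviceaccount.com \
#     --format=json | grep -F "$MEMBER"
```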
For complete troubleshooting, see GKE Operational Runbooks.

Cost Optimization

Expected Monthly Costs

Production configuration:
  • GKE Autopilot (~25 pods, 500m CPU / 1GB RAM average): $360/month
  • Cloud SQL (4 vCPU, 15GB RAM, HA + replica): $280/month
  • Memorystore (5GB Redis, HA): $220/month
  • Networking (NAT, egress): $60/month
  • Monitoring (standard retention): $50/month
Total: $970/month
vs. traditional GKE: $1,290/month (save $320/month, 25%)
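The savings figure follows directly from the component costs. A worked check using shell integer arithmetic (the percentage is rounded to the nearest whole percent); it prints Total: $970/month and Savings: $320/month (25%), matching the figures above.

```shell
# Worked check: sum the component costs and compare with the quoted
# $1,290/month traditional GKE figure.
total=0
for cost in 360 280 220 60 50; do   # GKE Autopilot, Cloud SQL, Memorystore, Networking, Monitoring
  total=$((total + cost))
done
baseline=1290
savings=$((baseline - total))
pct=$(( (savings * 100 + baseline / 2) / baseline ))   # round to nearest percent
echo "Total: \$${total}/month"
echo "Savings: \$${savings}/month (${pct}%)"
```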

Cost Optimization Guide

Learn how to achieve 40-60% cost savings with rightsizing, committed use discounts, and automation.


Support Resources

Complete Technical Documentation

For detailed technical documentation, see:
  • GKE Deployment Guide (800+ lines, root directory)
  • Terraform Module READMEs (5,000+ lines technical docs)
  • GCP Best Practices Summary (root directory)
Need help? Check our operational runbooks.