Overview

This guide provides step-by-step instructions for deploying MCP Server LangGraph on Google Kubernetes Engine (GKE) Autopilot with enterprise-grade security, monitoring, and cost optimization.

Deployment Time

2-3 hours for complete production setup

Cost Savings

40-60% savings vs. GKE Standard

Infrastructure Maturity

94/100 production-readiness score

Best Practices

100% GCP best practices compliance

What You’ll Deploy

  • GKE Autopilot Cluster: Fully managed Kubernetes (regional, multi-zone)
  • Cloud SQL PostgreSQL: High-availability database with automated backups
    • 3 databases: Keycloak (identity), OpenFGA (authorization), GDPR (compliance per ADR-0041)
  • Memorystore Redis: High-availability cache with persistence
  • Workload Identity: Secure pod-to-GCP service authentication
  • Private Networking: VPC-native with Cloud NAT
  • Observability: Cloud Monitoring, Logging, Trace, Profiler
  • Security: Binary Authorization, Network Policies, Encryption

Key Benefits

GKE Autopilot uses pay-per-pod pricing with no idle node costs. The production environment runs at $880-1,275/month vs. $1,290-1,970/month with traditional GKE Standard.
Google manages all node infrastructure, upgrades, and scaling automatically. Focus on your application, not Kubernetes operations.
Regional deployment across 3 zones with automated failover for databases and cache provides enterprise-grade reliability.
Built-in Workload Identity, Binary Authorization, Shielded Nodes, Network Policies, and encryption at rest/transit.

Prerequisites

Step 1: GCP Account & Project

Create a GCP project with billing enabled:
# Create project
gcloud projects create PROJECT_ID --name="MCP LangGraph Production"

# Link billing
gcloud billing projects link PROJECT_ID --billing-account=BILLING_ACCOUNT_ID

# Set active project
gcloud config set project PROJECT_ID

Step 2: Install Required Tools

Tool         Version    Installation
gcloud CLI   Latest     curl https://sdk.cloud.google.com | bash
Terraform    ≥ 1.5.0    terraform.io/downloads
kubectl      ≥ 1.28     gcloud components install kubectl
kustomize    ≥ 5.0      brew install kustomize
Verify installations:
gcloud version
terraform version
kubectl version --client
kustomize version
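
The minimum versions above can be checked with a small comparison helper. A minimal sketch, assuming GNU `sort -V` is available; `version_ge` is a hypothetical helper name, not a tool from the repo:

```shell
# Hypothetical helper: succeeds when version $1 >= $2.
# Relies on GNU sort's -V (version sort) flag.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example: compare an installed Terraform version against the >= 1.5.0 minimum.
version_ge "1.6.2" "1.5.0" && echo "terraform version OK"
```

In a script, feed it the actual reported versions (e.g. from `terraform version -json`) instead of literals.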

Step 3: Enable Required APIs

Enable the required GCP APIs:
gcloud services enable \
    container.googleapis.com \
    compute.googleapis.com \
    sqladmin.googleapis.com \
    redis.googleapis.com \
    servicenetworking.googleapis.com \
    cloudresourcemanager.googleapis.com \
    iam.googleapis.com \
    binaryauthorization.googleapis.com \
    monitoring.googleapis.com \
    logging.googleapis.com \
    cloudtrace.googleapis.com \
    cloudprofiler.googleapis.com \
    secretmanager.googleapis.com \
    artifactregistry.googleapis.com \
    --project=PROJECT_ID
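
To confirm nothing was missed, you can diff the required list against what the project reports as enabled. A sketch with illustrative stand-in lists; in practice, populate `ENABLED` from `gcloud services list --enabled --format='value(config.name)'`:

```shell
# Illustrative stand-in lists; replace with the full required set and the
# real output of `gcloud services list --enabled`.
REQUIRED="container.googleapis.com
sqladmin.googleapis.com
redis.googleapis.com"

ENABLED="container.googleapis.com
redis.googleapis.com"

# Report any required API not present in the enabled list.
for api in $REQUIRED; do
  printf '%s\n' "$ENABLED" | grep -qxF "$api" || echo "missing: $api"
done
# prints: missing: sqladmin.googleapis.com
```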

Step 4: Configure IAM Permissions

Required roles (or roles/owner):
  • roles/compute.networkAdmin
  • roles/container.admin
  • roles/cloudsql.admin
  • roles/redis.admin
  • roles/iam.securityAdmin
  • roles/resourcemanager.projectIamAdmin

Architecture

The production deployment creates a fully managed, highly available infrastructure: a regional GKE Autopilot cluster, Cloud SQL PostgreSQL, Memorystore Redis, and private VPC networking with Cloud NAT.

Phase 1: Infrastructure Setup (30 minutes)

Step 1: Create Terraform State Backend

cd terraform/backend-setup-gcp

# Create terraform.tfvars
cat > terraform.tfvars <<EOF
project_id    = "YOUR_PROJECT_ID"
region        = "us-central1"
bucket_prefix = "mcp-langgraph"
EOF

# Deploy
terraform init
terraform apply -auto-approve

# Save bucket name
export TF_STATE_BUCKET=$(terraform output -raw terraform_state_bucket)
Expected: GCS bucket created with versioning and logging enabled

Step 2: Configure Production Environment

cd terraform/environments/gcp-prod

# Update main.tf with the state bucket name
sed -i "s/bucket = \"mcp-langgraph-terraform-state-us-central1-XXXXX\"/bucket = \"$TF_STATE_BUCKET\"/g" main.tf

# Create terraform.tfvars
cp terraform.tfvars.example terraform.tfvars
Edit terraform.tfvars with your configuration:
project_id  = "YOUR_PROJECT_ID"
region      = "us-central1"
team        = "platform"
cost_center = "engineering"

# Security (enable after Phase 4)
enable_binary_authorization = false
enable_private_endpoint     = false  # Set true for maximum security

# Optional: Restrict control plane access
master_authorized_networks_cidrs = [
  {
    cidr_block   = "YOUR_IP/32"
    display_name = "My IP"
  }
]
Important: Replace YOUR_PROJECT_ID with your actual GCP project ID throughout the configuration.

Phase 2: Deploy Infrastructure (25 minutes)

Step 1: Initialize and Plan

cd terraform/environments/gcp-prod
terraform init
terraform plan -out=tfplan
Review the plan. It should create:
  • 1 VPC network with 3 subnets
  • 1 GKE Autopilot cluster (regional)
  • 1 Cloud SQL instance (PostgreSQL 15, HA)
  • 1 Memorystore instance (Redis 7.0, HA)
  • 2 NAT IPs
  • Multiple service accounts
  • IAM bindings
  • Firewall rules
  • Monitoring alerts

Step 2: Deploy

terraform apply tfplan
Duration: 20-25 minutes. Cloud SQL takes 10-12 minutes, GKE takes 8-10 minutes.

Step 3: Configure kubectl

# Get credentials
eval $(terraform output -raw kubectl_config_command)

# Verify access
kubectl cluster-info
kubectl get nodes
kubectl get namespaces
Expected output:
Kubernetes control plane is running at https://X.X.X.X
NAME              STATUS   AGE
default           Active   5m
kube-system       Active   5m

Phase 3: Application Deployment (20 minutes)

Step 1: Create Secrets in Secret Manager

PROJECT_ID="YOUR_PROJECT_ID"

# Create secret
gcloud secrets create mcp-production-secrets \
  --replication-policy="automatic" \
  --project="$PROJECT_ID"

# Prepare secret data
cat > /tmp/secrets.json <<EOF
{
  "anthropic_api_key": "sk-ant-...",
  "google_api_key": "AIza...",
  "jwt_secret": "$(openssl rand -base64 32)",
  "postgres_password": "$(terraform output -raw cloudsql_user_password)",
  "cloudsql_connection_name": "$(terraform output -raw cloudsql_connection_name)",
  "redis_host": "$(terraform output -raw redis_host)",
  "redis_port": "$(terraform output -raw redis_port)",
  "redis_password": "$(terraform output -raw redis_auth_string)"
}
EOF

# Upload secrets
gcloud secrets versions add mcp-production-secrets \
  --data-file=/tmp/secrets.json \
  --project="$PROJECT_ID"

# Cleanup
rm -f /tmp/secrets.json
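
Because a malformed or incomplete payload only surfaces later as pod startup failures, it can be worth validating the file before uploading it. A sketch assuming `python3` is on PATH; the key list and placeholder values here are illustrative, not the full set used above:

```shell
# Illustrative payload; in the real flow /tmp/secrets.json already exists.
cat > /tmp/secrets.json <<'EOF'
{"anthropic_api_key": "sk-ant-placeholder", "jwt_secret": "placeholder"}
EOF

# Parse the JSON and confirm the expected keys are present before
# running `gcloud secrets versions add`.
if python3 - <<'PY'
import json, sys
data = json.load(open("/tmp/secrets.json"))
missing = {"anthropic_api_key", "jwt_secret"} - data.keys()
sys.exit("missing keys: %s" % sorted(missing) if missing else 0)
PY
then
  echo "secrets.json OK"   # printed only if the file parses and has both keys
fi
rm -f /tmp/secrets.json
```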

Step 2: Install External Secrets Operator

helm repo add external-secrets https://charts.external-secrets.io
helm repo update

helm install external-secrets \
  external-secrets/external-secrets \
  --namespace external-secrets-system \
  --create-namespace \
  --set installCRDs=true

kubectl wait --for=condition=ready pod \
  -l app.kubernetes.io/name=external-secrets \
  -n external-secrets-system \
  --timeout=120s

Step 3: Deploy Application

cd deployments/overlays/production-gke

# Update with your project ID
sed -i "s/PROJECT_ID/$PROJECT_ID/g" *.yaml

# Deploy
kubectl apply -k .

# Watch rollout
kubectl rollout status deployment/production-mcp-server-langgraph \
  -n mcp-production \
  --timeout=10m
Verify deployment:
kubectl get pods -n mcp-production
kubectl logs -n mcp-production -l app=mcp-server-langgraph --tail=50

Phase 4: Security Hardening (30 minutes)

Binary Authorization

Enable image signing to ensure only trusted container images run in your cluster:

Step 1: Run Setup Script

./deployments/security/binary-authorization/setup-binary-auth.sh \
  PROJECT_ID \
  production
This creates:
  • KMS key for signing
  • Binary Authorization attestor
  • Enforcement policy

Step 2: Sign Images

IMAGE="us-central1-docker.pkg.dev/PROJECT_ID/mcp-production/mcp-server-langgraph:2.8.0"

./deployments/security/binary-authorization/sign-image.sh \
  PROJECT_ID \
  production \
  "$IMAGE"

Step 3: Enable in Cluster

Edit terraform.tfvars:
enable_binary_authorization = true
Apply:
terraform apply -auto-approve
See the Binary Authorization documentation for complete setup details.

Phase 5: Observability (10 minutes)

Setup Monitoring

# Create dashboards and alerts
./monitoring/gcp/setup-monitoring.sh PROJECT_ID
This configures:
  • Custom Cloud Monitoring dashboard
  • Alert policies (CPU, memory, errors, latency)
  • Uptime checks
  • SLO definitions

Access Dashboards

GKE Workloads

View pod status, deployments, services

Cloud Monitoring

Custom dashboards, metrics, alerts

Cloud Logging

Centralized log aggregation

Cloud Trace

Distributed tracing

Verification & Testing

Health Checks

Test all health endpoints:
kubectl port-forward -n mcp-production \
  svc/production-mcp-server-langgraph 8000:8000 &

curl http://localhost:8000/health/live
curl http://localhost:8000/health/ready
curl http://localhost:8000/health/startup
All health checks should return HTTP 200
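
The port-forward can take a few seconds to become responsive, so a small retry wrapper helps when scripting these checks. A generic sketch; `wait_until` is a hypothetical helper, not part of the repo:

```shell
# Retry helper: run a command until it succeeds, up to $1 attempts with
# $2 seconds between tries. Example use against the port-forward above:
#   wait_until 30 2 curl -fsS http://localhost:8000/health/ready
wait_until() {
  attempts=$1; delay=$2; shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    "$@" && return 0      # command succeeded: stop retrying
    i=$((i + 1))
    sleep "$delay"
  done
  return 1                # attempt budget exhausted
}
```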

Database Connectivity

kubectl exec -it -n mcp-production \
  $(kubectl get pod -n mcp-production -l app=mcp-server-langgraph -o jsonpath='{.items[0].metadata.name}') \
  -c cloud-sql-proxy \
  -- wget -qO- http://localhost:9801/readiness
Should return: ok

Workload Identity

Verify service account annotation:
kubectl get sa production-mcp-server-langgraph \
  -n mcp-production \
  -o jsonpath='{.metadata.annotations.iam\.gke\.io/gcp-service-account}'
Should show: mcp-prod-app-sa@PROJECT_ID.iam.gserviceaccount.com
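
Before debugging IAM bindings, a quick format check on the annotation value can rule out simple typos. A sketch; `check_gsa_email` is a hypothetical helper:

```shell
# Succeeds when the value looks like a GCP service-account email
# (the format Workload Identity expects in the annotation).
check_gsa_email() {
  case "$1" in
    *@*.iam.gserviceaccount.com) return 0 ;;
    *) return 1 ;;
  esac
}

check_gsa_email "mcp-prod-app-sa@my-project.iam.gserviceaccount.com" \
  && echo "annotation format OK"
```

Pipe the `kubectl get sa ... -o jsonpath` output into it instead of the literal shown here.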

Troubleshooting

Pods stuck in Pending or failing to start

Diagnosis:
kubectl describe pod POD_NAME -n mcp-production
Common causes:
  1. Resource requests too high (Autopilot provisions automatically but has limits)
  2. Image pull errors (check Workload Identity permissions)
  3. Binary Authorization blocking unsigned images
Solution:
  • Reduce CPU/memory requests in deployment
  • Verify image exists in Artifact Registry
  • Sign the image if Binary Auth is enabled

Cloud SQL connection failures

Diagnosis:
kubectl logs -n mcp-production POD_NAME -c cloud-sql-proxy
Solution:
  1. Verify private service connection exists
  2. Check Cloud SQL instance is running
  3. Verify Cloud SQL Proxy sidecar configuration
  4. Check Workload Identity IAM bindings

Workload Identity authentication failures

Diagnosis:
kubectl get sa -n mcp-production production-mcp-server-langgraph -o yaml

gcloud iam service-accounts get-iam-policy \
  mcp-prod-app-sa@PROJECT_ID.iam.gserviceaccount.com
Solution:
  1. Verify annotation: iam.gke.io/gcp-service-account
  2. Check IAM binding exists
  3. Wait 1-2 minutes for propagation
For complete troubleshooting, see GKE Operational Runbooks.

Cost Optimization

Expected Monthly Costs

Component       Configuration                       Cost/Month
GKE Autopilot   ~25 pods (500m CPU, 1GB RAM avg)    $360
Cloud SQL       4 vCPU, 15GB RAM, HA + replica      $280
Memorystore     5GB Redis HA                        $220
Networking      NAT, egress                         $60
Monitoring      Standard retention                  $50
Total                                               $970/month
vs. Traditional GKE: $1,290/month (save $320/month, ~25%)
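
The line items can be sanity-checked with simple arithmetic (values copied from the table above):

```shell
# Sum the per-component costs and compute the savings vs. the
# traditional-GKE baseline quoted in this guide.
GKE=360; SQL=280; REDIS=220; NET=60; MON=50
TOTAL=$((GKE + SQL + REDIS + NET + MON))
echo "Autopilot total: \$${TOTAL}/month"              # Autopilot total: $970/month
echo "Savings vs. \$1290: \$$((1290 - TOTAL))/month"  # Savings vs. $1290: $320/month
```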

Cost Optimization Guide

Learn how to achieve 40-60% cost savings with rightsizing, committed use discounts, and automation.

Next Steps

Operational Runbooks

Day-2 operations, incident response, maintenance procedures

Security Hardening

Enable VPC Service Controls, configure Cloud Armor, implement policies

CI/CD Pipeline

Setup automated deployments with ArgoCD and GitHub Actions

Monitoring & SLOs

Configure custom dashboards, define SLIs/SLOs, set up alerts

Infrastructure as Code

Terraform modules for VPC, GKE, Cloud SQL, Redis

Multi-Environment Setup

Dev, staging, production configurations

Disaster Recovery

Multi-region failover and backup automation

Service Mesh

Anthos Service Mesh for advanced traffic management

Binary Authorization

Image signing and policy enforcement

GKE Preview

Preview environment setup

Support Resources

Complete Technical Documentation

For detailed technical documentation, see:
  • GKE Deployment Guide (800+ lines, root directory)
  • Terraform Module READMEs (5,000+ lines technical docs)
  • GCP Best Practices Summary (root directory)
Need help? Check our operational runbooks.