Overview

This guide covers deploying the MCP Server to a production-grade staging environment on Google Kubernetes Engine (GKE) with:
  • 🔒 Security Hardening: Private nodes, Binary Authorization, Network Policies
  • 🔑 Keyless Authentication: Workload Identity Federation for GitHub Actions
  • 🌐 Network Isolation: Separate VPC from production
  • 📊 Full Observability: Cloud Logging, Monitoring, and Trace
  • 🤖 Automated Deployments: GitHub Actions with approval gates
This is a production-ready staging environment suitable for pre-production testing and validation.

Architecture

Prerequisites

  • GCP Project: vishnu-sandbox-20250310 (or your project ID)
  • gcloud CLI: Installed and authenticated
  • kubectl: Installed
  • GitHub Repository: Access to repository settings

Step 1: Infrastructure Setup

Run the automated infrastructure setup script:
## Set your GCP project ID
export GCP_PROJECT_ID=vishnu-sandbox-20250310

## Run the setup script
./scripts/gcp/setup-staging-infrastructure.sh
This script will create:
  • ✅ Staging VPC network (10.1.0.0/20)
  • ✅ GKE Autopilot cluster with security hardening
  • ✅ Cloud SQL PostgreSQL instance
  • ✅ Memorystore Redis instance
  • ✅ Workload Identity for pods
  • ✅ GitHub Actions Workload Identity Federation
  • ✅ Artifact Registry repository
  • ✅ Secret Manager secrets
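To spot-check what the script created, a few read-only commands (using the resource names from this guide) confirm the key pieces exist:

```shell
## Spot-check the created resources (read-only)
gcloud compute networks describe staging-vpc --format="value(name)"
gcloud container clusters list --filter="name:staging-mcp-server-langgraph-gke"
gcloud sql instances list --filter="name~staging"
gcloud redis instances list --region=us-central1
gcloud secrets list --filter="name:staging"
```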
If you prefer manual setup, follow these steps:

1.1 Create VPC Network

## Create VPC
gcloud compute networks create staging-vpc \
  --subnet-mode=custom \
  --project=vishnu-sandbox-20250310

## Create subnet
gcloud compute networks subnets create staging-gke-subnet \
  --network=staging-vpc \
  --range=10.1.0.0/20 \
  --region=us-central1 \
  --secondary-range pods=10.2.0.0/16,services=10.3.0.0/16 \
  --enable-flow-logs \
  --enable-private-ip-google-access

1.2 Create GKE Cluster

gcloud container clusters create-auto staging-mcp-server-langgraph-gke \
  --region=us-central1 \
  --network=staging-vpc \
  --subnetwork=staging-gke-subnet \
  --enable-private-nodes \
  --workload-pool=vishnu-sandbox-20250310.svc.id.goog \
  --enable-shielded-nodes \
  --shielded-secure-boot \
  --binauthz-evaluation-mode=PROJECT_SINGLETON_POLICY_ENFORCE

1.3 Create Managed Services

See the full setup script for complete Cloud SQL and Redis setup.
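For orientation, the managed services can be created along these lines. The instance names and private-network flags below are illustrative assumptions (the tier and size match the cost table later in this guide); the setup script remains authoritative:

```shell
## Cloud SQL PostgreSQL on the staging VPC (private IP only) - illustrative names
gcloud sql instances create staging-mcp-postgres \
  --database-version=POSTGRES_15 \
  --tier=db-custom-1-4096 \
  --region=us-central1 \
  --network=projects/vishnu-sandbox-20250310/global/networks/staging-vpc \
  --no-assign-ip

## Memorystore Redis (2GB, standard tier) on the same VPC - illustrative name
gcloud redis instances create staging-mcp-redis \
  --size=2 \
  --region=us-central1 \
  --tier=standard \
  --network=projects/vishnu-sandbox-20250310/global/networks/staging-vpc
```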

Setup Output

After running the setup script, you’ll receive:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Staging Infrastructure Setup Complete!
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

GitHub Actions Configuration:
  workload_identity_provider: 'projects/PROJECT_NUMBER/locations/global/...'
  service_account: 'mcp-staging-sa@vishnu-sandbox-20250310.iam.gserviceaccount.com'

Security Features Enabled:
  ✓ Private GKE nodes
  ✓ Shielded nodes with secure boot
  ✓ Binary authorization
  ✓ Workload Identity
  ✓ Network isolation (separate VPC)
⚠️ Important: Save the Workload Identity Provider value - you’ll need it for GitHub Actions.

Step 2: Update API Keys

Update the placeholder secrets with real API keys:
## Update Anthropic API key
echo -n "sk-ant-YOUR_REAL_KEY" | gcloud secrets versions add staging-anthropic-api-key --data-file=-

## Update Google API key
echo -n "YOUR_REAL_GOOGLE_KEY" | gcloud secrets versions add staging-google-api-key --data-file=-

Step 3: Install External Secrets Operator

External Secrets Operator syncs secrets from GCP Secret Manager to Kubernetes:
## Install External Secrets Operator
helm repo add external-secrets https://charts.external-secrets.io
helm repo update

helm install external-secrets \
  external-secrets/external-secrets \
  --namespace external-secrets-system \
  --create-namespace \
  --set installCRDs=true
Verify installation:
kubectl get pods -n external-secrets-system
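The operator also needs a store pointing at GCP Secret Manager before any ExternalSecret can sync. A minimal sketch, assuming a store named gcp-secret-manager and the pod service account from the Workload Identity setup:

```shell
## Create a SecretStore backed by GCP Secret Manager (store name is an assumption)
kubectl apply -n staging-mcp-server-langgraph -f - <<'EOF'
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: gcp-secret-manager
spec:
  provider:
    gcpsm:
      projectID: vishnu-sandbox-20250310
      auth:
        workloadIdentity:
          clusterLocation: us-central1
          clusterName: staging-mcp-server-langgraph-gke
          serviceAccountRef:
            name: mcp-server-langgraph
EOF
```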

Step 4: Configure GitHub Environment

4.1 Create GitHub Environment

  1. Go to your repository on GitHub
  2. Navigate to Settings → Environments
  3. Click New environment
  4. Name it staging
  5. Configure protection rules:
Protection Rules:
  • Required reviewers: Add 1-2 reviewers
  • Wait timer: 5 minutes (allows review before deployment)
  • Deployment branches: main, release/*

4.2 Update GitHub Workflow

Edit .github/workflows/deploy-staging-gke.yaml and replace PROJECT_NUMBER with your actual project number:
## Get your project number:
gcloud projects describe vishnu-sandbox-20250310 --format="value(projectNumber)"

## Update these lines in the workflow:
workload_identity_provider: 'projects/YOUR_PROJECT_NUMBER/locations/global/workloadIdentityPools/github-actions-pool/providers/github-provider'

Step 5: Deploy Application

First-Time Deployment (Manual)

For the first deployment, deploy manually to verify everything works:
## Get GKE credentials
gcloud container clusters get-credentials staging-mcp-server-langgraph-gke \
  --region=us-central1

## Verify connection
kubectl cluster-info

## Deploy application
kubectl apply -k deployments/overlays/staging-gke

## Watch deployment
kubectl rollout status deployment/staging-mcp-server-langgraph -n staging-mcp-server-langgraph

## Verify pods
kubectl get pods -n staging-mcp-server-langgraph

Automated Deployments (GitHub Actions)

Once manual deployment succeeds, GitHub Actions handles all future deployments.
Triggered by:
  • ✅ Push to main branch
  • ✅ Pre-release creation
  • ✅ Manual workflow dispatch
Deployment Flow:
  1. Build Docker image
  2. Push to Artifact Registry
  3. Deploy to GKE staging
  4. Run smoke tests
  5. Validate deployment
  6. Auto-rollback on failure
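The failure path in steps 4-6 amounts to running the smoke tests and undoing the rollout when they fail. A minimal sketch of that step, built from the same commands used elsewhere in this guide:

```shell
## Run smoke tests; roll back the staging deployment if they fail
if ! ./scripts/gcp/staging-smoke-tests.sh; then
  echo "Smoke tests failed - rolling back"
  kubectl rollout undo deployment/staging-mcp-server-langgraph \
    -n staging-mcp-server-langgraph
  exit 1
fi
```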

Step 6: Verify Deployment

Run Smoke Tests

./scripts/gcp/staging-smoke-tests.sh
Expected output:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Smoke Test Summary
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Total Tests:  11
Passed:       11
Failed:       0

✓ All smoke tests passed!

Check Health Endpoints

## Port-forward to service
kubectl port-forward -n staging-mcp-server-langgraph svc/staging-mcp-server-langgraph 8080:80

## Test endpoints
curl http://localhost:8080/health
curl http://localhost:8080/health/ready
curl http://localhost:8080/health/live

View Logs

Cloud Logging (recommended):
gcloud logging read \
  "resource.type=k8s_container
   resource.labels.cluster_name=staging-mcp-server-langgraph-gke
   resource.labels.namespace_name=staging-mcp-server-langgraph" \
  --limit=50 \
  --format=json
Kubectl logs:
kubectl logs -n staging-mcp-server-langgraph -l app=mcp-server-langgraph --tail=100

Security Features

Network Isolation

  • Separate VPC: Staging has its own VPC (10.1.0.0/20)
  • Network Policies: Restrict pod-to-pod and egress traffic
  • Private GKE nodes: Nodes have no public IP addresses
Verify network policies:
kubectl get networkpolicies -n staging-mcp-server-langgraph
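As an illustration of the kind of policy you should see there, a default-deny egress policy with a DNS exception might look like this (the policy name and rules are illustrative assumptions, not the actual manifests in this repo):

```shell
## Example: deny all egress except DNS (illustrative)
kubectl apply -n staging-mcp-server-langgraph -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - ports:
        - protocol: UDP
          port: 53
EOF
```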

Workload Identity

Pods authenticate as GCP service accounts without keys:
## Verify Workload Identity annotation
kubectl describe sa mcp-server-langgraph -n staging-mcp-server-langgraph | grep iam.gke.io

## Test from a pod (recent kubectl removed --serviceaccount; set it via --overrides)
kubectl run -it --rm test \
  --image=google/cloud-sdk:slim \
  --overrides='{"spec":{"serviceAccountName":"mcp-server-langgraph"}}' \
  --namespace=staging-mcp-server-langgraph \
  -- gcloud auth list

Binary Authorization

Only signed/approved images can be deployed:
## Check Binary Authorization status
gcloud container clusters describe staging-mcp-server-langgraph-gke \
  --region=us-central1 \
  --format='value(binaryAuthorization)'

Secret Management

Secrets are stored in GCP Secret Manager and synced via External Secrets:
## List secrets in Secret Manager
gcloud secrets list --filter="name:staging-*"

## View External Secrets sync status
kubectl get externalsecrets -n staging-mcp-server-langgraph

Monitoring & Observability

Cloud Console Dashboards

Access the monitoring dashboards for the cluster and its managed services in the Cloud Console.

Key Metrics

Monitor these metrics in Cloud Monitoring:
  • kubernetes.io/container/cpu/core_usage_time - CPU usage
  • kubernetes.io/container/memory/used_bytes - Memory usage
  • kubernetes.io/pod/network/received_bytes_count - Network traffic
  • Custom metrics: custom.googleapis.com/mcp-staging/*

Set Up Alerts

Create alert policies for:
## CPU usage > 80%
## Memory usage > 80%
## Pod restart count > 5
## Deployment replica mismatch
## Cloud SQL connection failures
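Alert policies can be created from the CLI as well as the Console. A hedged sketch for the first item, assuming the gcloud alpha monitoring surface and an illustrative policy file:

```shell
## Illustrative: alert when container CPU limit utilization exceeds 80%
cat > cpu-alert-policy.json <<'EOF'
{
  "displayName": "staging: CPU > 80%",
  "combiner": "OR",
  "conditions": [{
    "displayName": "CPU limit utilization above 80%",
    "conditionThreshold": {
      "filter": "resource.type = \"k8s_container\" AND metric.type = \"kubernetes.io/container/cpu/limit_utilization\"",
      "comparison": "COMPARISON_GT",
      "thresholdValue": 0.8,
      "duration": "300s"
    }
  }]
}
EOF
gcloud alpha monitoring policies create --policy-from-file=cpu-alert-policy.json
```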

Troubleshooting

Check pod status:
kubectl describe pod <pod-name> -n staging-mcp-server-langgraph
Common issues:
  • Image pull errors: Check Artifact Registry permissions
  • Cloud SQL proxy fails: Verify service account has cloudsql.client role
  • Secrets not found: Check External Secrets sync status
Check ExternalSecret status:
kubectl describe externalsecret mcp-staging-secrets -n staging-mcp-server-langgraph
Common fixes:
  • Verify Workload Identity binding
  • Check secret exists in Secret Manager
  • Ensure service account has secretAccessor role
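To verify the Workload Identity binding mentioned above, check that the Kubernetes service account is allowed to impersonate the GCP service account from the setup output (the member format below follows the standard Workload Identity pattern):

```shell
## Inspect the IAM policy on the staging GCP service account
gcloud iam service-accounts get-iam-policy \
  mcp-staging-sa@vishnu-sandbox-20250310.iam.gserviceaccount.com

## Expect a roles/iam.workloadIdentityUser binding for a member like:
##   serviceAccount:vishnu-sandbox-20250310.svc.id.goog[staging-mcp-server-langgraph/mcp-server-langgraph]
```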
Check Cloud SQL proxy logs:
kubectl logs -n staging-mcp-server-langgraph <pod-name> -c cloud-sql-proxy
Verify connection:
# From inside pod
kubectl exec -it <pod-name> -n staging-mcp-server-langgraph -- \
  nc -zv 127.0.0.1 5432
List network policies:
kubectl get networkpolicies -n staging-mcp-server-langgraph
Temporarily disable for testing:
kubectl delete networkpolicy <policy-name> -n staging-mcp-server-langgraph
# Re-apply when done testing

Rollback Procedures

Automatic Rollback

GitHub Actions automatically rolls back on deployment failure.

Manual Rollback

## Rollback to previous version
kubectl rollout undo deployment/staging-mcp-server-langgraph -n staging-mcp-server-langgraph

## Rollback to specific revision
kubectl rollout history deployment/staging-mcp-server-langgraph -n staging-mcp-server-langgraph
kubectl rollout undo deployment/staging-mcp-server-langgraph -n staging-mcp-server-langgraph --to-revision=3

## Verify rollback
kubectl rollout status deployment/staging-mcp-server-langgraph -n staging-mcp-server-langgraph

Cost Optimization

Current Costs (Estimated)

Resource             Tier                Monthly Cost
GKE Autopilot        2-3 pods avg        ~$100
Cloud SQL            db-custom-1-4096    ~$40
Memorystore Redis    2GB standard        ~$50
Networking           VPC, egress         ~$20
Total                                    ~$210/month

Optimization Tips

Use Autopilot

GKE Autopilot optimizes resource usage automatically. You only pay for pod resources, not node overhead.

Rightsize Resources

Review resource requests/limits:
kubectl top pods -n staging-mcp-server-langgraph
Adjust in deployment-patch.yaml if needed.

Use Preemptible Nodes

Not recommended for staging, but possible for dev environments. Note that on Autopilot you request Spot capacity per workload (via the cloud.google.com/gke-spot nodeSelector) rather than configuring preemptible node pools.

Monitor Egress

Excessive egress to LLM APIs can increase costs. Consider:
  • Caching responses
  • Request batching
  • Rate limiting

Next Steps


Staging Deployment Complete! Your production-grade staging environment is ready for testing.