
Overview

MCP Server LangGraph uses Terraform as the Infrastructure as Code (IaC) tool to provision and manage cloud infrastructure across GCP and AWS. This modular approach enables consistent, repeatable, and version-controlled infrastructure deployments.

Architecture Principles

1. Modularity

Reusable modules with clear interfaces
  • Each cloud resource is a self-contained module
  • Modules can be composed for different environments
  • No hard-coded values
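As a hypothetical sketch of this composition pattern, an environment configuration might call a module with environment-specific inputs instead of hard-coded values (the module path and variable names below are illustrative, not the repository's actual interface):

```hcl
# Hypothetical call from an environment's main.tf.
module "vpc" {
  source = "../../modules/gcp-vpc"

  project_id   = var.project_id
  region       = var.region
  network_name = "${var.environment}-vpc"

  # Secondary ranges for GKE pods and services
  pods_cidr     = var.pods_cidr
  services_cidr = var.services_cidr
}
```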
2. Environment Parity

Consistent configs across dev/staging/prod
  • Same module versions
  • Parameter-driven differences (size, HA)
  • Promote confidence through consistency
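In practice, parity can be expressed as identical module calls whose only differences live in each environment's tfvars file. The variable names and values below are illustrative:

```hcl
# gcp-dev/terraform.tfvars (illustrative values)
cluster_type = "zonal"
enable_ha    = false
redis_tier   = "BASIC"

# gcp-prod/terraform.tfvars (illustrative values)
# cluster_type = "regional"
# enable_ha    = true
# redis_tier   = "STANDARD_HA"
```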
3. State Isolation

Separate state per environment
  • Development: dev-tfstate bucket/prefix
  • Staging: staging-tfstate bucket/prefix
  • Production: prod-tfstate bucket/prefix
4. Security by Default

Zero-trust networking and IAM
  • Private clusters (no public IPs)
  • Workload Identity (GCP) / IRSA (AWS)
  • Encryption at rest and in transit
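A sketch of how these defaults map to provider settings, using the google provider's `google_container_cluster` fields; the resource layout is an assumption, not the repository's actual module code:

```hcl
# Illustrative private Autopilot cluster with Workload Identity.
resource "google_container_cluster" "main" {
  name     = var.cluster_name
  location = var.region

  enable_autopilot = true

  private_cluster_config {
    enable_private_nodes    = true   # no public IPs on nodes
    enable_private_endpoint = false  # API endpoint restricted via authorized networks
    master_ipv4_cidr_block  = "172.16.0.0/28"
  }

  workload_identity_config {
    workload_pool = "${var.project_id}.svc.id.goog"
  }
}
```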

Module Catalog

GCP Infrastructure

gcp-vpc

Purpose: VPC-native networking for GKE with Cloud NAT
Resources:
  • VPC with custom subnets
  • Secondary IP ranges for pods and services
  • Cloud NAT with static IPs
  • Firewall rules (IAP, health checks)
  • Private Service Connection for Cloud SQL
Outputs: Network ID, subnet names, NAT IPs
Best for: All GKE deployments

gke-autopilot

Purpose: Fully managed Kubernetes cluster
Features:
  • Pay-per-pod pricing (40-60% savings)
  • Auto-scaling, auto-repair, auto-upgrade
  • Workload Identity enabled
  • Binary Authorization ready
  • Security Posture Dashboard
  • Dataplane V2 (eBPF-based)
Outputs: Cluster name, endpoint, kubectl commands
Best for: Production workloads

cloudsql

Purpose: Managed PostgreSQL with HA
Features:
  • Regional HA (99.95% SLA)
  • Point-in-time recovery
  • Read replicas
  • Query Insights
  • Automatic backups
Outputs: Connection name, private IP, credentials
Best for: Stateful applications

memorystore

Purpose: Managed Redis with HA
Features:
  • STANDARD_HA tier (99.9% SLA)
  • Automatic failover
  • Persistence (RDB + AOF)
  • Cross-region replicas
Outputs: Host, port, auth string
Best for: Session storage, caching

gke-workload-identity

Purpose: IAM for Kubernetes pods
Features:
  • No service account keys
  • GCP service account per workload
  • Granular IAM bindings
  • Automatic credential injection
Outputs: Service account emails, K8s manifests
Best for: All GKE workloads
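The core binding such a module creates can be sketched as follows; the resource names and account IDs here are assumptions for illustration, not the module's actual interface:

```hcl
# Allow a Kubernetes service account to impersonate a GCP
# service account via Workload Identity (no keys involved).
resource "google_service_account" "app" {
  account_id   = "my-app"
  display_name = "my-app workload identity"
}

resource "google_service_account_iam_member" "wi_binding" {
  service_account_id = google_service_account.app.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "serviceAccount:${var.project_id}.svc.id.goog[${var.namespace}/my-app]"
}
```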
backend-setup-gcp

Purpose: GCS buckets for Terraform state
Features:
  • Versioning enabled
  • Object lifecycle policies
  • Access logging
  • Encryption at rest
Outputs: Bucket names, backend config
Best for: Initial setup (run once)

AWS Infrastructure

AWS infrastructure modules for EKS are production-ready (96/100 maturity score); see the existing documentation for details.
Available modules:
  • VPC with public/private subnets
  • EKS cluster with managed node groups
  • RDS PostgreSQL with Multi-AZ
  • ElastiCache Redis cluster
  • IRSA for pod IAM

Directory Structure

terraform/
├── backend-setup-gcp/          # State bucket bootstrap
│   ├── main.tf                 # GCS buckets with versioning
│   ├── variables.tf            # Project ID, region, bucket prefix
│   └── README.md               # Setup instructions

├── modules/                    # Reusable modules
│   ├── gcp-vpc/                # VPC networking
│   ├── gke-autopilot/          # GKE cluster
│   ├── cloudsql/               # PostgreSQL
│   ├── memorystore/            # Redis
│   └── gke-workload-identity/  # IAM bindings

└── environments/               # Environment configs
    ├── gcp-dev/                # Development
    │   ├── main.tf             # Module composition
    │   ├── variables.tf        # Input variables
    │   ├── terraform.tfvars    # Dev-specific values
    │   └── backend.tf          # State backend config

    ├── gcp-staging/            # Staging (production-like)
    └── gcp-prod/               # Production (HA, multi-region)

State Management

Backend Configuration

GCP (GCS):
terraform {
  backend "gcs" {
    bucket  = "PROJECT-terraform-state"
    prefix  = "environments/production"
  }
}
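The AWS (S3) backend follows the same shape; the bucket, key, region, and lock-table names below are placeholders:

```hcl
terraform {
  backend "s3" {
    bucket         = "PROJECT-terraform-state"
    key            = "environments/production/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"  # provides state locking
  }
}
```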
Features:
  • Automatic state locking
  • Versioning enabled
  • Encrypted at rest
  • Access logs
Setup: See Backend Setup Guide

Environment Strategy

Development

Purpose: Testing and iteration
Configuration:
  • Zonal cluster (1 zone)
  • Smaller instances
  • No read replicas
  • BASIC Redis tier
Cost: ~$100/month

Staging

Purpose: Pre-production validation
Configuration:
  • Regional cluster (3 zones)
  • Production-like sizing
  • HA databases
  • Full monitoring
Cost: ~$310/month

Production

Purpose: Live workloads
Configuration:
  • Regional cluster (3 zones)
  • HA for all components
  • Read replicas
  • Full observability
  • Disaster recovery
Cost: ~$970/month (Autopilot)

Deployment Workflow

1. Initialize Backend

cd terraform/backend-setup-gcp
terraform init
terraform apply
Creates GCS buckets for state storage.
2. Configure Environment

cd terraform/environments/gcp-prod

# Edit terraform.tfvars
vim terraform.tfvars
Set project ID, region, cluster name, etc.
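A minimal tfvars file for this step might look like the following; every value is an illustrative placeholder:

```hcl
# terraform.tfvars — illustrative values only
project_id   = "my-gcp-project"
region       = "us-central1"
cluster_name = "mcp-prod"
environment  = "prod"
```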
3. Plan Infrastructure

terraform init
terraform plan -out=tfplan
Review planned changes.
4. Apply Changes

terraform apply tfplan
Provision infrastructure (~15-30 minutes for full stack).
5. Verify Deployment

# Get kubectl credentials
gcloud container clusters get-credentials CLUSTER_NAME \
  --region REGION \
  --project PROJECT_ID

# Verify nodes
kubectl get nodes

Best Practices

Module Design

DO: Use semantic versioning for module releases
DO: Validate all inputs with Terraform validation blocks
DO: Provide comprehensive outputs for module consumers
DO: Document all variables with descriptions and examples
DON’T: Hard-code values inside modules
DON’T: Mix resource types in a single module
DON’T: Use count/for_each for critical resources (hard to change)
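A validation block of the kind recommended above can be written like this; the variable name and allowed values are illustrative:

```hcl
variable "environment" {
  type        = string
  description = "Deployment environment (e.g. dev, staging, prod)"

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "environment must be one of: dev, staging, prod."
  }
}
```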

State Management

DO: Use separate state per environment
DO: Enable state locking (GCS automatic, S3 with DynamoDB)
DO: Enable versioning on state buckets
DO: Restrict state bucket access (principle of least privilege)
DON’T: Commit state files to Git
DON’T: Share state across unrelated infrastructure

Security

DO: Use Workload Identity (GCP) or IRSA (AWS)
DO: Enable encryption at rest for all data stores
DO: Use private clusters (no public IPs on nodes)
DO: Rotate credentials regularly via Secret Manager/Secrets Manager
DON’T: Use service account keys or IAM access keys
DON’T: Store secrets in Terraform state (use external secret stores)

Cost Optimization

Strategy: Pay-per-pod vs. paying for idle nodes
Implementation:
  • Use GKE Autopilot instead of Standard
  • Right-size pod requests (use VPA)
  • Enable autoscaling
Savings: $200-400/month per environment
Strategy: Commit to 1-year or 3-year usage
Implementation:
# Apply CUD to all compute resources
# Purchase via GCP Console: Billing → Commitments
Savings: $242/month (25%) with 1-year, $466/month (52%) with 3-year
Strategy: Match resources to actual usage
Implementation:
  • Monitor actual CPU/memory usage
  • Downgrade oversized Cloud SQL/Redis instances
  • Remove unused read replicas
Savings: $50-150/month
Strategy: Turn off dev environments after hours
Implementation:
# Cloud Scheduler: Scale to 0 at 6 PM, scale up at 6 AM
gcloud scheduler jobs create http scale-down-dev \
  --schedule="0 18 * * 1-5" \
  --uri="..." \
  --message-body='{"desiredNodeCount":0}'
Savings: $50-70/month per dev environment

Migration Paths

From Manual Infrastructure

1. Import Existing Resources

terraform import google_container_cluster.main projects/PROJECT/locations/REGION/clusters/CLUSTER_NAME
2. Generate Terraform Config

Write matching configuration, then run terraform plan and iterate until it shows no diff against the imported state
3. Gradually Adopt IaC

Start with new resources, migrate existing over time
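On Terraform 1.5 or later, an import block can also generate starting configuration for you; the address and ID below mirror the import example above:

```hcl
import {
  to = google_container_cluster.main
  id = "projects/PROJECT/locations/REGION/clusters/CLUSTER_NAME"
}
```

Then run `terraform plan -generate-config-out=generated.tf` to emit HCL for the imported resource, and refine it from there.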

From Other IaC Tools

From Pulumi/CDK:
  • Export resource definitions
  • Convert to HCL syntax
  • Import state
From CloudFormation (AWS):
  • Use former2 to generate Terraform
  • Review and customize
  • Import stacks

Troubleshooting

Symptom: Error acquiring the state lock
Cause: A previous Terraform run didn’t release the lock (crash, Ctrl+C)
Solution:
# GCS (automatic after 1 minute)
# Wait or manually delete lock in GCS console

# S3 + DynamoDB
aws dynamodb delete-item \
  --table-name terraform-locks \
  --key '{"LockID": {"S": "BUCKET/PATH/terraform.tfstate"}}'
Symptom: Backend configuration changed
Solution:
terraform init -reconfigure
Symptom: Module ... does not match
Solution:
terraform init -upgrade
Symptom: Error 403: The caller does not have permission
Solution: Grant required IAM roles
# For GCP
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="user:YOUR_EMAIL" \
  --role="roles/editor"


Next Steps

1. Set Up State Backend
2. Review Module Documentation
3. Choose Environment Strategy
4. Deploy to Production