Table of Contents
- Executive Summary
- Component Resource Requirements
- Deployment Scenarios
- VM Sizing Recommendations
- Storage Requirements
- Network Requirements
- VMware-Specific Optimizations
- Monitoring and Operations
- Scaling Considerations
- Cost Optimization
Executive Summary
This document provides detailed resource estimates for deploying the MCP Server LangGraph application stack on VMware infrastructure. The application is a production-ready MCP server with LangGraph, featuring comprehensive authentication (JWT/Keycloak), fine-grained authorization (OpenFGA), secrets management (Infisical), and OpenTelemetry-based observability.
Quick Reference
| Deployment Scenario | VMs | Total vCPUs | Total RAM | Total Storage |
|---|---|---|---|---|
| Development/Testing | 2-3 | 12-16 | 16-24 GB | 100-150 GB |
| Production (Standard) | 4-6 | 32-48 | 48-72 GB | 250-400 GB |
| Production (High Availability) | 6-10 | 64-96 | 96-144 GB | 500-800 GB |
Component Resource Requirements
Based on the Kubernetes manifests and Docker configurations, here are the resource requirements for each component:
1. MCP Server LangGraph (Main Application)
Purpose: AI agent server with LangGraph, multi-LLM support, and the MCP protocol
Container Resources (per pod):
- Requests: 500m CPU, 512 Mi RAM
- Limits: 2000m CPU, 2 Gi RAM
- Replicas: 3 (default), 3-10 (autoscaling)
Total (3 default replicas):
- CPU: 1.5 cores (requests) to 6 cores (limits)
- RAM: 1.5 GB (requests) to 6 GB (limits)
Key characteristics:
- Python 3.12-based FastAPI application
- Multi-LLM support (Anthropic, OpenAI, Google, Azure, AWS Bedrock, Ollama)
- Stateful agent with checkpointing
- Health checks with startup/liveness/readiness probes
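These figures correspond to the container's resources stanza. The snippet below is a minimal sketch of that stanza and the probes; the /health path is an assumption, and port 8000 matches the firewall rules in the Network Requirements section:

```yaml
# Sketch of the per-pod resources and probes described above.
# The probe path is an assumption, not taken from the actual manifests.
containers:
  - name: mcp-server-langgraph
    resources:
      requests:
        cpu: 500m
        memory: 512Mi
      limits:
        cpu: "2"
        memory: 2Gi
    startupProbe:
      httpGet: { path: /health, port: 8000 }
      failureThreshold: 30   # allow slow cold starts
      periodSeconds: 5
    livenessProbe:
      httpGet: { path: /health, port: 8000 }
      periodSeconds: 10
    readinessProbe:
      httpGet: { path: /health, port: 8000 }
      periodSeconds: 5
```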
2. PostgreSQL (Shared Database)
Purpose: Shared database for OpenFGA and Keycloak
Container Resources:
- Requests: 250m CPU, 512 Mi RAM
- Limits: 2000m CPU, 2 Gi RAM
- Replicas: 1 (StatefulSet)
- Persistent Volume: 10 Gi (default)
- Recommended: 20-50 Gi for production
Total:
- CPU: 0.25 to 2 cores
- RAM: 512 MB to 2 GB
- Storage: 10-50 GB persistent
3. OpenFGA (Authorization)
Purpose: Fine-grained authorization (Zanzibar-style)
Container Resources (per pod):
- Requests: 250m CPU, 256 Mi RAM
- Limits: 1000m CPU, 1 Gi RAM
- Replicas: 2 (HA deployment)
Total (2 replicas):
- CPU: 0.5 to 2 cores
- RAM: 512 MB to 2 GB
4. Keycloak (Identity Management)
Purpose: SSO, user management, OAuth2/OIDC
Container Resources (per pod):
- Requests: 500m CPU, 1 Gi RAM
- Limits: 2000m CPU, 2 Gi RAM
- Replicas: 2 (HA deployment)
Total (2 replicas):
- CPU: 1 to 4 cores
- RAM: 2 GB to 4 GB
- Keycloak is Java-based and benefits from higher RAM allocation
- Startup time: 60-120 seconds
5. Redis (Session Management + Checkpoints)
Purpose: Session storage and conversation checkpoints
Container Resources:
- Requests: 100m CPU, 256 Mi RAM
- Limits: 500m CPU, 1 Gi RAM
- Replicas: 1 (can scale to Redis Sentinel/Cluster for HA)
- Persistent Volume: 5 Gi (default)
- Configuration: AOF + RDB persistence
Total:
- CPU: 0.1 to 0.5 cores
- RAM: 256 MB to 1 GB
- Storage: 5-10 GB persistent
6. OpenTelemetry Collector
Purpose: Observability data collection and export
Container Resources (per pod):
- Requests: 200m CPU, 256 Mi RAM
- Limits: 1000m CPU, 512 Mi RAM
- Replicas: 2 (default), 2-10 (autoscaling)
Total (2 default replicas):
- CPU: 0.4 to 2 cores
- RAM: 512 MB to 1 GB
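For sizing, the collector is assumed to run a simple OTLP-in/OTLP-out pipeline. The sketch below is illustrative, not the project's actual configuration; the Jaeger endpoint is an assumption, and the memory limiter is deliberately kept below the 512 Mi container limit above:

```yaml
# Illustrative OpenTelemetry Collector config (not the project's actual file).
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # matches the OTLP firewall rules below
      http:
        endpoint: 0.0.0.0:4318
processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 400               # stay under the 512 Mi container limit
  batch: {}
exporters:
  otlphttp:
    endpoint: http://jaeger-collector:4318   # assumed backend endpoint
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlphttp]
```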
7. Monitoring Stack (Optional but Recommended)
Prometheus
- CPU: 500m to 2 cores
- RAM: 2 GB to 4 GB
- Storage: 50-100 GB (time-series data)
Grafana
- CPU: 200m to 1 core
- RAM: 512 MB to 1 GB
- Storage: 5-10 GB
Jaeger
- CPU: 500m to 2 cores
- RAM: 1 GB to 2 GB
- Storage: 20-50 GB (trace data)
Deployment Scenarios
Scenario 1: Development/Testing
Use Case: Local development, testing, CI/CD pipelines
Components:
- 1x MCP Server LangGraph (single replica)
- 1x PostgreSQL
- 1x OpenFGA (single replica)
- 1x Redis
- Optional: Lightweight monitoring (Prometheus + Grafana)
Option A: Consolidated (2 VMs)
- Application VM (all services except DB)
  - vCPUs: 6-8
  - RAM: 8-12 GB
  - Storage: 50 GB
- Database VM (PostgreSQL + Redis)
  - vCPUs: 4
  - RAM: 6-8 GB
  - Storage: 50-100 GB
Option B: Minimal (1 VM)
- vCPUs: 8-12
- RAM: 16-20 GB
- Storage: 100 GB
- No Keycloak (use in-memory auth)
- No autoscaling
- Minimal monitoring
- Suitable for developers and staging environments
Scenario 2: Production (Standard)
Use Case: Production deployment with moderate traffic (< 1000 req/min)
Components:
- 3x MCP Server LangGraph
- 1x PostgreSQL (with backups)
- 2x OpenFGA
- 2x Keycloak
- 1x Redis (with persistence)
- 2x OpenTelemetry Collector
- Full monitoring stack
Recommended VM layout:
- Control Plane VM (Kubernetes control plane, if using K8s)
  - vCPUs: 4
  - RAM: 8 GB
  - Storage: 100 GB
- Application VM 1 (Node 1)
  - vCPUs: 8
  - RAM: 16 GB
  - Storage: 100 GB
  - Workload: 1x MCP Server, 1x OpenFGA, 1x Keycloak, 1x OTel Collector
- Application VM 2 (Node 2)
  - vCPUs: 8
  - RAM: 16 GB
  - Storage: 100 GB
  - Workload: 1x MCP Server, 1x OpenFGA, 1x Keycloak, 1x OTel Collector
- Application VM 3 (Node 3)
  - vCPUs: 8
  - RAM: 16 GB
  - Storage: 100 GB
  - Workload: 1x MCP Server
- Database VM
  - vCPUs: 4-8
  - RAM: 8-16 GB
  - Storage: 200 GB (100 GB OS + data, 100 GB backups)
- Monitoring VM (Optional)
  - vCPUs: 4
  - RAM: 8 GB
  - Storage: 150 GB
  - Workload: Prometheus, Grafana, Jaeger
Total resources:
- vCPUs: 32-48
- RAM: 48-72 GB
- Storage: 250-400 GB (excluding long-term backup storage)
Key configuration:
- Pod anti-affinity across VMs (see the example manifest below)
- 2 replicas of critical services (OpenFGA, Keycloak)
- Pod Disruption Budgets (minimum 2 available for MCP Server)
- PostgreSQL with daily backups
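The anti-affinity and disruption-budget items above can be sketched as Kubernetes manifests. This is a minimal illustration, assuming an `app: mcp-server-langgraph` pod label (the real charts may use different selectors):

```yaml
# Sketch: keep >= 2 MCP Server pods available during voluntary disruptions.
# The app label is an assumption for illustration.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: mcp-server-langgraph
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: mcp-server-langgraph
---
# Pod template fragment: prefer spreading replicas across nodes (VMs).
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          topologyKey: kubernetes.io/hostname
          labelSelector:
            matchLabels:
              app: mcp-server-langgraph
```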
Scenario 3: Production (High Availability)
Use Case: Production with high traffic (> 1000 req/min), mission-critical workloads
Components:
- 3-10x MCP Server LangGraph (autoscaling)
- 3x PostgreSQL (HA cluster with replication)
- 2-3x OpenFGA
- 2-3x Keycloak
- 3x Redis (Sentinel/Cluster mode)
- 2-10x OpenTelemetry Collector (autoscaling)
- Full monitoring stack with high availability
Recommended VM layout:
- Control Plane VMs (3 VMs for an HA Kubernetes control plane)
  - Each: 4 vCPUs, 8 GB RAM, 100 GB storage
- Application VMs (3-5 worker nodes)
  - Each: 12-16 vCPUs, 24-32 GB RAM, 100 GB storage
- Database VMs (3 VMs for PostgreSQL HA with replication)
  - Primary: 8 vCPUs, 16 GB RAM, 200 GB storage
  - Replica 1: 8 vCPUs, 16 GB RAM, 200 GB storage
  - Replica 2: 8 vCPUs, 16 GB RAM, 200 GB storage
- Monitoring VMs (2 VMs for redundant monitoring)
  - Each: 4-6 vCPUs, 8-12 GB RAM, 150-200 GB storage
Total resources:
- vCPUs: 64-96
- RAM: 96-144 GB
- Storage: 500-800 GB (excluding long-term storage)
Key configuration:
- PostgreSQL HA (primary plus two replicas) with synchronous replication
- Redis Sentinel for automatic failover
- Geographic distribution across availability zones/clusters
- Autoscaling based on CPU (70%) and memory (80%) utilization
- Service mesh (Istio/Linkerd) optional
- External secrets management (HashiCorp Vault, AWS Secrets Manager)
Storage Requirements
Persistent Volume Breakdown
| Component | Type | Size (Dev) | Size (Prod Standard) | Size (Prod HA) | IOPS Requirements |
|---|---|---|---|---|---|
| PostgreSQL Data | RWO | 10 GB | 20-50 GB | 100-200 GB | 1000-3000 |
| PostgreSQL Backups | RWO | N/A | 50-100 GB | 200-500 GB | 500-1000 |
| Redis Data | RWO | 5 GB | 5-10 GB | 10-20 GB | 500-1000 |
| Prometheus TSDB | RWO | 20 GB | 50-100 GB | 100-200 GB | 500-1000 |
| Jaeger Traces | RWO | 10 GB | 20-50 GB | 50-100 GB | 500-1000 |
| Grafana Data | RWO | 5 GB | 5-10 GB | 10 GB | 100-500 |
| Application Logs | RWX | Optional | 10-20 GB | 20-50 GB | 500 |
VMware Storage Recommendations
- Storage Classes:
  - High Performance (SSD/NVMe): PostgreSQL, Redis
  - Standard Performance (SAS/SATA): Application containers, logs
  - Archive (SATA): Backups, long-term trace storage
- Storage Protocols:
  - VMFS Datastores: Best for RWO (ReadWriteOnce) volumes
  - NFS/vSAN: Required for RWX (ReadWriteMany) if using shared logs
  - vSphere CSI Driver: Recommended for dynamic provisioning (see the example StorageClass after this list)
- Backup Strategy:
  - PostgreSQL: Daily full backup + WAL archiving
  - Redis: RDB snapshots + AOF logs
  - Application State: Container images in registry
  - Retention: 7 days (daily), 4 weeks (weekly), 12 months (monthly)
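These tiers map naturally onto vSphere CSI storage classes. The sketch below is illustrative: the class and SPBM policy names are assumptions, not values from this project:

```yaml
# Illustrative high-performance StorageClass for PostgreSQL/Redis volumes.
# The storage policy name is an assumption; match it to your SPBM policy.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: high-performance-ssd
provisioner: csi.vsphere.vmware.com
parameters:
  storagepolicyname: "gold-ssd"
allowVolumeExpansion: true   # lets PVCs grow online, per the scaling notes below
reclaimPolicy: Retain        # keep database volumes even if a claim is deleted
```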
Network Requirements
Bandwidth Estimates
| Traffic Type | Dev | Prod Standard | Prod HA |
|---|---|---|---|
| External API Traffic | 10-50 Mbps | 100-500 Mbps | 500 Mbps - 2 Gbps |
| LLM API Calls | 10-100 Mbps | 100-500 Mbps | 500 Mbps - 1 Gbps |
| Internal Service Mesh | 50-100 Mbps | 200-500 Mbps | 500 Mbps - 2 Gbps |
| Observability Data | 10-50 Mbps | 50-200 Mbps | 200-500 Mbps |
| Database Replication | N/A | 10-50 Mbps | 50-200 Mbps |
Network Configuration
- VM Network Adapters:
  - Type: VMXNET3 (paravirtualized, best performance)
  - NICs per VM:
    - Application VMs: 1 NIC (1-10 Gbps)
    - Database VMs: 2 NICs (dedicated replication network)
    - Control Plane: 1 NIC (1-10 Gbps)
- Network Segmentation:
  - Public Network: External API traffic (load balancer)
  - Private Network: Inter-service communication (overlay network)
  - Management Network: SSH, monitoring, backups
  - Storage Network: iSCSI, NFS (if using network storage)
- Firewall Rules (a sample Kubernetes NetworkPolicy for the inter-VM ports follows this list):
  - Inbound: 443/TCP (HTTPS), 80/TCP (HTTP redirect)
  - Inter-VM:
    - 8000/TCP (MCP Server)
    - 5432/TCP (PostgreSQL)
    - 6379/TCP (Redis)
    - 8080/TCP (OpenFGA, Keycloak)
    - 4317/TCP, 4318/TCP (OTLP)
  - Outbound:
    - 443/TCP (LLM APIs, secrets management)
    - 53/UDP (DNS)
- Load Balancing:
  - External LB: VMware NSX-T, HAProxy, or cloud load balancer
  - Internal LB: Kubernetes Service (ClusterIP with kube-proxy)
  - Session Affinity: Not required (stateless with Redis sessions)
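Within the cluster, the inter-VM port rules can be mirrored by NetworkPolicies. A minimal sketch for PostgreSQL, assuming illustrative namespace and pod labels (not taken from the actual manifests):

```yaml
# Sketch: only Keycloak and OpenFGA pods may reach PostgreSQL on 5432/TCP.
# Namespace and labels are assumptions for illustration.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: postgres-ingress
  namespace: mcp-server
spec:
  podSelector:
    matchLabels:
      app: postgresql
  policyTypes: [Ingress]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: keycloak
        - podSelector:
            matchLabels:
              app: openfga
      ports:
        - protocol: TCP
          port: 5432
```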
VMware-Specific Optimizations
1. VM Configuration Best Practices
CPU Configuration
- vCPU Allocation: 1:2 to 1:4 overcommitment ratio
- CPU Reservation: Reserve 50% for production VMs
- CPU Shares: High priority for database and application VMs
- NUMA Optimization: Enable vNUMA for VMs with >8 vCPUs
- CPU Hot-Add: Enable for application VMs (not recommended for DB; note that enabling CPU hot-add disables vNUMA)
Memory Configuration
- Memory Reservation: 100% for database VMs, 50% for application VMs
- Memory Shares: High priority for database and Keycloak VMs
- Balloon Driver: Enabled (VMware Tools required)
- Memory Hot-Add: Enable for application VMs
- Large Memory Pages: Enable for database VMs
Storage Configuration
- Disk Type: Thick Provisioned Eager Zeroed for production
- SCSI Controller: VMware Paravirtual (PVSCSI) for database VMs
- Multi-Writer: Disable (not needed for this workload)
- Disk Shares: High for database VMs, Normal for applications
2. VMware vSphere Features
High Availability (HA)
- VM Restart Priority:
- High: PostgreSQL, Redis
- Medium: MCP Server, Keycloak, OpenFGA
- Low: Monitoring stack
- Host Isolation Response: Leave Powered On
- Datastore Heartbeats: Enable for network isolation detection
Distributed Resource Scheduler (DRS)
- Automation Level: Fully Automated
- VM-VM Anti-Affinity Rules:
- Keep PostgreSQL replicas on different hosts
- Keep MCP Server replicas on different hosts
- VM-Host Affinity Rules: Pin database VMs to high-performance hosts
Storage DRS
- SDRS Mode: Fully Automated
- I/O Metric: Enable SDRS for load balancing
- Affinity Rules: Keep DB and logs on separate datastores
3. Resource Pools
Create resource pools for workload isolation. One possible layout, following the CPU/memory reservation and shares guidance above (pool names are illustrative, not prescribed by the project):
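```
Cluster
├── rp-database      CPU shares: High,   Memory reservation: 100%
├── rp-application   CPU shares: High,   Memory reservation: 50%
└── rp-monitoring    CPU shares: Normal, no reservation
```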
4. Monitoring and Alerts
vSphere Metrics to Monitor
- CPU Ready Time: < 5% (indicates CPU contention)
- CPU Co-Stop: < 3% (indicates vCPU scheduling issues)
- Memory Ballooning: < 1% (indicates memory pressure)
- Storage Latency: < 20ms for DB, < 50ms for apps
- Network Dropped Packets: 0 (indicates network saturation)
Integration with Application Monitoring
- Use VMware vRealize Operations for VM-level metrics
- Correlate with Prometheus/Grafana for application metrics
- Set up alerts for resource exhaustion before it impacts apps
Monitoring and Operations
Resource Utilization Monitoring
Key Metrics (Prometheus)
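The query list itself is not reproduced here; as an illustrative sketch, the recording rules below capture the utilization signals this guide relies on, assuming standard cAdvisor and kube-state-metrics metric names and an `mcp-server` namespace (both assumptions):

```yaml
# Illustrative Prometheus recording rules; metric names assume cAdvisor
# and kube-state-metrics, and the namespace is an assumption.
groups:
  - name: mcp-server-utilization
    rules:
      - record: namespace:container_cpu_usage:rate5m
        expr: sum(rate(container_cpu_usage_seconds_total{namespace="mcp-server"}[5m])) by (pod)
      - record: namespace:container_memory_working_set:bytes
        expr: sum(container_memory_working_set_bytes{namespace="mcp-server"}) by (pod)
      - record: namespace:pod_cpu_limit_utilization:ratio
        expr: |
          sum(rate(container_cpu_usage_seconds_total{namespace="mcp-server"}[5m])) by (pod)
          / sum(kube_pod_container_resource_limits{namespace="mcp-server", resource="cpu"}) by (pod)
```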
Grafana Dashboards
- Infrastructure Dashboard: VM resources, storage, network
- Application Dashboard: MCP Server, LLM calls, agent performance
- Database Dashboard: PostgreSQL queries, connections, replication lag
- Observability Dashboard: OpenTelemetry pipeline health
Capacity Planning Alerts
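The thresholds cited elsewhere in this guide (for example, sustained CPU above 80% in the Vertical Scaling section) can be encoded as Prometheus alerting rules. An illustrative sketch, reusing the assumptions above:

```yaml
# Illustrative capacity alerts; thresholds mirror the scaling guidance below.
groups:
  - name: capacity-planning
    rules:
      - alert: PodCpuNearLimit
        expr: |
          sum(rate(container_cpu_usage_seconds_total{namespace="mcp-server"}[5m])) by (pod)
          / sum(kube_pod_container_resource_limits{namespace="mcp-server", resource="cpu"}) by (pod) > 0.8
        for: 1h   # matches "CPU > 80% for 1+ hour" in Vertical Scaling
        labels: { severity: warning }
      - alert: PvcAlmostFull
        expr: kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes > 0.85
        for: 15m
        labels: { severity: warning }
```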
Scaling Considerations
Horizontal Pod Autoscaling (HPA)
Based on the Kubernetes configurations (an HPA sketch follows the list below):
MCP Server LangGraph
- CPU > 70% for 30 seconds → scale up (max 4 pods or 100% increase)
- CPU < 40% for 5 minutes → scale down (max 2 pods or 50% decrease)
- Reserve capacity for 2x current load (6-8 pods)
- Add 1 VM for every 3-4 additional pods
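A minimal sketch of this policy as an autoscaling/v2 manifest, assuming the Deployment is named mcp-server-langgraph. Note that an HPA scales down toward the 70% target rather than on an explicit 40% threshold; the 5-minute rule is approximated here with the scale-down stabilization window:

```yaml
# Sketch of an autoscaling/v2 HPA matching the behavior described above.
# The Deployment name is an assumption.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mcp-server-langgraph
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mcp-server-langgraph
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      selectPolicy: Max
      policies:
        - { type: Pods, value: 4, periodSeconds: 30 }      # at most 4 pods per step
        - { type: Percent, value: 100, periodSeconds: 30 } # or a 100% increase
    scaleDown:
      stabilizationWindowSeconds: 300   # ~"below target for 5 minutes"
      policies:
        - { type: Pods, value: 2, periodSeconds: 60 }
        - { type: Percent, value: 50, periodSeconds: 60 }
```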
OpenTelemetry Collector
- Scales with observability data volume
- 1 collector can handle ~1000 req/sec of trace data
Vertical Scaling
When to Scale Up VM Resources
- Consistent CPU > 80% across all pods for 1+ hour
- Memory pressure causing OOM kills or pod evictions
- Disk I/O wait > 10% during normal operations
- Network saturation on VM NICs
Recommended Scaling Steps
| Metric | Current | Step 1 | Step 2 | Step 3 |
|---|---|---|---|---|
| vCPUs | 8 | 12 | 16 | 24 |
| RAM | 16 GB | 24 GB | 32 GB | 48 GB |
| Storage | 100 GB | 200 GB | 400 GB | 800 GB |
- CPU/Memory: Requires VM reboot (plan maintenance window)
- Storage: Can be expanded online (no downtime)
Load Testing Recommendations
Before deploying to production, conduct load testing against the following targets:
- Throughput: 500-1000 req/sec
- Latency (p95): < 5s for agent responses
- Latency (p99): < 10s
- Concurrent Users: 1000-5000
Cost Optimization
Resource Right-Sizing
Development Environment
- Reduce replicas: 1 instance of all services
- Disable monitoring: Use lightweight Prometheus only
- Use in-memory auth: Skip Keycloak
- Smaller VMs: 4 vCPU, 8 GB RAM per VM
Production Environment
- Auto-shutdown: Dev/test environments during off-hours
- Spot Instances: For non-critical monitoring workloads (if using cloud)
- Tiered Storage: Move old logs/traces to cheaper storage after 30 days
VMware License Optimization
| Feature | Required License | Alternative |
|---|---|---|
| vSphere HA | Standard+ | Manual failover |
| DRS | Standard+ | Manual load balancing |
| Storage DRS | Enterprise+ | Manual storage management |
| vRealize Operations | Additional license | Open-source monitoring |
Multi-Tenancy
If running multiple environments (dev, staging, prod):
- Option 1: Separate clusters (better isolation, higher cost)
- Option 2: Resource pools on same cluster (lower cost, shared resources)
- Option 3: Namespaces in Kubernetes on same VMs (lowest cost, minimal isolation)
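For Option 3, per-namespace ResourceQuotas keep environments from starving each other. An illustrative quota, loosely sized against the Standard scenario (all values are examples):

```yaml
# Illustrative per-environment quota for Option 3; values are examples only.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: env-quota
  namespace: staging
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
    persistentvolumeclaims: "10"
```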
Summary and Recommendations
Recommended Starting Configuration
For a typical production deployment:
Infrastructure:
- VMware Cluster: 4-6 ESXi hosts (for DRS and HA)
- Total VMs: 6 (1 control plane, 3 workers, 1 database, 1 monitoring)
- Total Resources: 40 vCPUs, 64 GB RAM, 400 GB storage
Approach:
- Start with the Production (Standard) scenario
- Monitor for 30 days
- Scale based on actual utilization patterns
- Plan for 2x capacity headroom for traffic spikes
Key Takeaways
- Start Small: Development scenario can run on 2 VMs (16 vCPUs, 24 GB RAM)
- Plan for HA: Production needs minimum 3 VMs for application redundancy
- Database is Critical: Allocate best storage and CPU to PostgreSQL
- Monitor First: Use Prometheus/Grafana to identify actual bottlenecks
- Autoscaling Works: HPA can handle 3x traffic spikes without manual intervention
- Storage Grows: Plan for 50-100 GB/month growth in observability data
Next Steps
1. Provision Infrastructure:
   - Set up VMware resource pools
   - Create VM templates (Kubernetes node OS)
   - Configure storage classes (SSD, SATA, backup)
2. Deploy Kubernetes:
   - Install Kubernetes 1.28+ (kubeadm, Rancher, or Tanzu)
   - Configure the vSphere CSI driver
   - Set up a network overlay (Calico, Cilium)
3. Deploy Application:
   - Use Helm charts: `helm install langgraph-agent ./deployments/helm/mcp-server-langgraph`
   - Or Kustomize: `kubectl apply -k deployments/kubernetes/overlays/production`
4. Validate and Monitor:
   - Run load tests
   - Tune resource requests/limits based on actual usage
   - Set up alerting and runbooks
Support
For deployment assistance:
- Documentation: Kubernetes Deployment Guide
- Issues: https://github.com/vishnu2kmohan/mcp-server-langgraph/issues
- Discussions: https://github.com/vishnu2kmohan/mcp-server-langgraph/discussions
Document Version: 1.0
Last Updated: 2025-10-16
Maintained By: MCP Server LangGraph Contributors