Deployment Issues Prevention Guide
Last Updated: 2025-11-02
Applies To: All deployment types (Docker, GKE, EKS, AKS, Rancher, on-premises Kubernetes)
Overview
This document catalogues the deployment issues encountered so far and provides universal prevention strategies so that they do not recur, regardless of deployment platform.
Issue Catalog
Issue #1: Docker Editable Install Incompatibility
Commit: Introduced in a0ba7a1 (Oct 31), fixed in 3833ae6 (Nov 2)
Platform: Universal (all Docker-based deployments)
Severity: Critical (prevents container startup)
Problem:
An editable install (-e) creates .pth pointer files in site-packages that reference /app/src/. When only site-packages is copied into the final stage (not the full venv or the source tree), the pointers break.
Prevention:
- Never use editable install in multi-stage Docker builds
- Use regular install: uv pip install --no-deps .
- Add validation check:
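One possible validation check is sketched below. It scans a site-packages directory for path-style .pth entries whose target no longer exists, which is the symptom described above. The function name find_broken_pth_pointers is hypothetical, and the check is an assumption about what such a validation could look like: it does not cover editable installs that register an import hook instead of a plain path, so importing the package in the final image remains the simplest end-to-end test.

```python
from pathlib import Path


def find_broken_pth_pointers(site_packages: Path) -> list[tuple[Path, str]]:
    """Return (pth_file, entry) pairs whose target path does not exist."""
    broken = []
    for pth_file in sorted(site_packages.glob("*.pth")):
        for line in pth_file.read_text().splitlines():
            entry = line.strip()
            # Skip blank lines, comments, and executable "import ..." lines.
            if not entry or entry.startswith("#"):
                continue
            if entry.startswith(("import ", "import\t")):
                continue
            target = Path(entry)
            # Relative entries are resolved against the .pth file's directory.
            if not target.is_absolute():
                target = site_packages / target
            if not target.exists():
                broken.append((pth_file, entry))
    return broken
```

Run inside the final image against its site-packages directory, a non-empty result suggests that an editable install leaked into the build.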
Issue #2: Cloud SQL shared_buffers Configuration
Commit: Fixed in 9a7b84f (Nov 2)
Platform: GCP Cloud SQL, AWS RDS, Azure Database for PostgreSQL
Severity: Critical (prevents database creation)
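Problem:
Prevention:
- Calculate from RAM: shared_buffers = RAM * 0.25 / 8KB (the value is expressed in 8KB pages)
- Example: 4GB RAM → 4GB * 0.25 = 1GB → 1GB / 8KB = 131072 pages
- For comparison, 256MB / 8KB = 32768 pages → TOO LOW (below the 52428-page Cloud SQL minimum for a 4GB instance)
- Recommendation: Use 512MB for 4GB RAM → 65536 pages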
- Provider-specific validation:
| Provider | Instance Size | Recommended shared_buffers | Min | Max |
|---|---|---|---|---|
| GCP Cloud SQL | 4GB RAM | 65536 (512MB) | 52428 | 314572 |
| AWS RDS | db.t3.medium (4GB) | 131072 (1GB) | Variable | Variable |
| Azure Database | B1ms (4GB) | 65536 (512MB) | Variable | Variable |
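The page arithmetic and the limit check can be sketched as follows. The helper names are hypothetical, the 8KB page size is PostgreSQL's default block size, and the limits used in the usage note are the Cloud SQL values from the table above.

```python
PAGE_SIZE_KB = 8  # PostgreSQL's default block size


def shared_buffers_pages(ram_gb: float, fraction: float = 0.25) -> int:
    """Convert a fraction of instance RAM into a count of 8KB pages."""
    ram_kb = ram_gb * 1024 * 1024
    return int(ram_kb * fraction) // PAGE_SIZE_KB


def mb_to_pages(size_mb: int) -> int:
    """Convert a size in MB into a count of 8KB pages."""
    return size_mb * 1024 // PAGE_SIZE_KB


def within_limits(pages: int, minimum: int, maximum: int) -> bool:
    """Check a page count against a provider's allowed range (inclusive)."""
    return minimum <= pages <= maximum
```

For a 4GB instance, shared_buffers_pages(4) gives 131072 and mb_to_pages(512) gives 65536; both pass within_limits(..., 52428, 314572), while mb_to_pages(256), which is 32768, does not.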
Issue #3: VPC Peering for Private Services
Commit: Fixed in 616f81b (Nov 2)
Platform: GCP, AWS, Azure private cloud services
Severity: Critical (prevents private resource creation)
Problem:
Prevention:
- Always configure VPC peering BEFORE creating managed services
Issue #4: Pod Security Context Missing
Commit: Fixed in previous session
Platform: Universal (all Kubernetes platforms with Pod Security Standards)
Severity: High (pods fail admission)
Problem:
Prevention:
- Always define security context for ALL containers:
- Pod-level security context:
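A manifest check for both rules might look like the sketch below. It takes a pod spec that has already been parsed into a dict (for example, spec.template.spec of a Deployment) and reports what is missing. The function name and the message format are hypothetical, and the sketch only checks that the fields are present, not that their values satisfy a particular Pod Security Standards level.

```python
def missing_security_contexts(pod_spec: dict) -> list[str]:
    """List pod-level and container-level securityContext gaps in a pod spec."""
    problems = []
    pod_ctx = pod_spec.get("securityContext") or {}
    if not pod_ctx:
        problems.append("pod: no securityContext")
    elif "seccompProfile" not in pod_ctx:
        problems.append("pod: no seccompProfile")
    # Sidecars live under "containers", so two fields cover main, init, and sidecar.
    for field in ("initContainers", "containers"):
        for container in pod_spec.get(field) or []:
            if not container.get("securityContext"):
                name = container.get("name", "?")
                problems.append(f"{field}/{name}: no securityContext")
    return problems
```

An empty list means every container and the pod itself declare a security context; any other result can fail the CI job before the manifest reaches admission control.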
Issue #5: RBAC Resources Not Needed
Commit: Fixed in b97567c (Nov 2)
Platform: Universal (all Kubernetes platforms)
Severity: Medium (requires elevated permissions, violates least privilege)
Problem:
Prevention:
- Audit RBAC necessity:
- Keep ONLY ServiceAccount with workload identity:
- Use External Secrets Operator instead of in-cluster RBAC for secret access
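As a starting point for the audit, the sketch below lists every RBAC object in a set of parsed manifests so that each one can be justified or removed. The function name is hypothetical, and the sketch only inventories resources; deciding whether the application actually calls the Kubernetes API still requires a code search.

```python
RBAC_KINDS = {"Role", "RoleBinding", "ClusterRole", "ClusterRoleBinding"}


def rbac_resources(manifests: list[dict]) -> list[str]:
    """Return "Kind/name" for every RBAC object; ServiceAccounts are ignored."""
    found = []
    for manifest in manifests:
        if manifest.get("kind") in RBAC_KINDS:
            name = (manifest.get("metadata") or {}).get("name", "?")
            found.append(f"{manifest['kind']}/{name}")
    return found
```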
Issue #6: kubectl Client-Side vs Server-Side Validation
Commit: Documented in EKS/AKS guide
Platform: Universal (all Kubernetes platforms)
Severity: High (false validation success)
Problem:
Prevention:
- Always use server-side validation:
- Verify CRDs exist first:
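The two steps can be chained as in the sketch below, which checks each required CRD with kubectl get crd and only then runs kubectl apply --dry-run=server. The function names and the way the CRD list is supplied are assumptions, and the script expects kubectl to be on PATH with access to the target cluster.

```python
import subprocess


def crd_check_cmd(crd_name: str) -> list[str]:
    """Command that exits non-zero when the CRD is not installed."""
    return ["kubectl", "get", "crd", crd_name]


def server_dry_run_cmd(manifest_path: str) -> list[str]:
    """Command that validates a manifest against the API server without applying it."""
    return ["kubectl", "apply", "--dry-run=server", "-f", manifest_path]


def validate(manifest_path: str, required_crds: list[str]) -> None:
    """Fail fast on a missing CRD, then run server-side validation."""
    for crd in required_crds:
        subprocess.run(crd_check_cmd(crd), check=True, stdout=subprocess.DEVNULL)
    subprocess.run(server_dry_run_cmd(manifest_path), check=True)
```

For example, a manifest set that uses External Secrets Operator would pass externalsecrets.external-secrets.io in required_crds.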
Issue #7: Namespace Creation Timing
Commit: Documented in EKS/AKS guide
Platform: Universal (all Kubernetes platforms)
Severity: Medium (prevents rollback, complicates validation)
Problem:
Prevention:
- Create namespace BEFORE validation:
- Make namespace creation idempotent (render with kubectl create namespace <name> --dry-run=client -o yaml and pipe the output to kubectl apply -f -)
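The idempotent pattern renders the Namespace manifest client-side and pipes it to kubectl apply, so the step succeeds whether or not the namespace already exists. A minimal sketch, with hypothetical function names and assuming kubectl is on PATH:

```python
import subprocess

# Reads the manifest from stdin, mirroring "... | kubectl apply -f -".
APPLY_STDIN_CMD = ["kubectl", "apply", "-f", "-"]


def render_namespace_cmd(name: str) -> list[str]:
    """Command that prints a Namespace manifest without contacting the cluster."""
    return ["kubectl", "create", "namespace", name, "--dry-run=client", "-o", "yaml"]


def ensure_namespace(name: str) -> None:
    """Create the namespace if absent; leave it unchanged if it already exists."""
    rendered = subprocess.run(
        render_namespace_cmd(name), check=True, capture_output=True, text=True
    ).stdout
    subprocess.run(APPLY_STDIN_CMD, input=rendered, check=True, text=True)
```

Calling ensure_namespace before the server-side dry run keeps validation from failing only because the target namespace is missing.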
Issue #8: Environment Variable value/valueFrom Conflict
Commit: Documented in EKS/AKS guide
Platform: Universal (all Kustomize deployments)
Severity: High (invalid Kubernetes manifest)
Problem:
A patched environment variable ends up with both value and valueFrom set (invalid).
Prevention:
- Patch ConfigMap data, not Deployment env:
- Use strategic merge patch for env array replacement:
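A check for the conflict itself is small enough to run on rendered Kustomize output. The sketch below assumes the pod spec has already been parsed into a dict; the function name and the "container:variable" result format are hypothetical.

```python
def env_conflicts(pod_spec: dict) -> list[str]:
    """Return "container:variable" for env entries that set both value and valueFrom."""
    conflicts = []
    for field in ("initContainers", "containers"):
        for container in pod_spec.get(field) or []:
            for env in container.get("env") or []:
                if "value" in env and "valueFrom" in env:
                    conflicts.append(
                        f"{container.get('name', '?')}:{env.get('name', '?')}"
                    )
    return conflicts
```

Any non-empty result means the API server would reject the manifest, so the check can run before the server-side dry run to give a clearer error message.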
Universal Prevention Checklist
Use this checklist for ALL new deployment configurations:
✅ Docker Configuration
- Use regular install, NOT editable (-e) in multi-stage builds
- Validate package import after build
- Use distroless or minimal base images
- Test image locally before CI/CD
✅ Managed Database Configuration
- Calculate shared_buffers from instance RAM (25% of RAM)
- Validate configuration against provider limits
- Configure VPC peering/subnet groups BEFORE creation
- Test configuration with smallest instance size first
✅ Kubernetes Manifests
- Security contexts on ALL containers (main + init + sidecar)
- Pod-level security context with seccomp profile
- Explicit resource requests/limits (min 500m CPU for safety)
- Remove unused RBAC resources (audit with code search)
- Use ConfigMap patches for env var overrides
- Strategic merge patches for namespace resources
✅ CI/CD Workflows
- Server-side kubectl validation (--dry-run=server)
- Create namespace before validation
- Verify CRDs exist before validation
- Validate security contexts in manifests
- Check for env value/valueFrom conflicts
- Validate RBAC necessity
✅ Infrastructure Setup
- VPC peering configured before managed services
- Database configuration validated before creation
- Secrets exist before deployment
- Network connectivity tested
Platform-Specific Quick Reference
GCP / GKE
Minimum CPU: 500m with pod anti-affinity, 250m without
Workload Identity: iam.gke.io/gcp-service-account
Managed Database: Cloud SQL (requires VPC peering)
Validation: Server-side dry-run
AWS / EKS
Minimum CPU: 250m (Fargate), flexible (EC2)
Workload Identity: eks.amazonaws.com/role-arn (IRSA)
Managed Database: RDS (requires DB subnet group)
Validation: Server-side dry-run
Azure / AKS
Minimum CPU: Flexible (check node pool quotas)
Workload Identity: azure.workload.identity/client-id
Managed Database: Azure Database (requires service endpoint)
Validation: Server-side dry-run
Rancher / On-Premises
Minimum CPU: Depends on cluster configuration
Workload Identity: Platform-specific or service account tokens
Managed Database: Self-hosted PostgreSQL
Validation: Server-side dry-run
Automated Validation Script
Create .github/workflows/validate-deployment.yaml:
Testing Methodology
Pre-Deployment Testing
Post-Deployment Validation
Conclusion
By following this prevention guide and implementing the automated validation checks, all 8 deployment issues catalogued above can be prevented across:
- ✅ Docker builds (all platforms)
- ✅ GCP (GKE, Cloud SQL, VPC)
- ✅ AWS (EKS, RDS, VPC)
- ✅ Azure (AKS, Azure Database, VNet)
- ✅ Rancher / On-Premises Kubernetes
- ✅ Any Kubernetes platform with Pod Security Standards
Key Takeaways:
- Always use server-side kubectl validation
- Never use editable install in multi-stage Docker builds
- Configure VPC peering before creating managed databases
- Security contexts required on ALL containers
- Remove unused RBAC resources
- Calculate database shared_buffers from instance RAM
- Create namespaces before validation
- Patch ConfigMaps, not Deployment env arrays
Next Steps:
- Implement automated validation workflow
- Add validation to existing CI/CD pipelines
- Test on fresh environments to verify prevention
- Update deployment documentation with checklist