Overview
This guide covers setting up Vertex AI with Workload Identity Federation on GKE for secure, keyless authentication. This approach eliminates the need for service account keys and provides automatic credential rotation.

Workload Identity is the recommended way to access Google Cloud services from GKE. It provides better security than service account keys and simplifies credential management.
Architecture
Workload Identity Integration Flow
This diagram shows the complete authentication flow from a GKE pod to GCP services using Workload Identity Federation, eliminating the need for credential files.

Key benefits of Workload Identity:
- No credential files: Kubernetes tokens are automatically exchanged for GCP credentials
- Automatic rotation: GCP handles credential rotation without manual intervention
- Fine-grained access: Each Kubernetes service account maps to a specific GCP service account with minimal permissions
- Audit trail: All GCP API calls are attributed to the specific service account identity
Prerequisites
Before starting, ensure you have:
- ✅ GKE cluster with Workload Identity enabled
- ✅ Staging infrastructure deployed (run setup-staging-infrastructure.sh first)
- ✅ gcloud CLI installed and authenticated
- ✅ kubectl installed and configured
- ✅ Billing enabled on your GCP project
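To confirm the first prerequisite, you can check whether the cluster's workload pool is set (CLUSTER_NAME and REGION are placeholders to substitute):

```bash
# Prints PROJECT_ID.svc.id.goog when Workload Identity is enabled; empty otherwise
gcloud container clusters describe CLUSTER_NAME \
  --region REGION \
  --format="value(workloadIdentityConfig.workloadPool)"
```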
Quick Setup
Automated Setup (Recommended)
The easiest way to set up Vertex AI with Workload Identity is the automated setup script, which performs the following steps:
- ✅ Enable Vertex AI API
- ✅ Create vertex-ai-staging service account
- ✅ Grant necessary IAM permissions
- ✅ Bind Kubernetes SA to GCP SA
- ✅ Annotate Kubernetes service account
- ✅ Verify configuration
Manual Setup Steps (Advanced)
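The manual path mirrors what the setup script does. A sketch of the equivalent gcloud/kubectl commands, with PROJECT_ID, NAMESPACE, and the Kubernetes service account name (KSA_NAME) as placeholders you must substitute:

```bash
# 1. Enable the Vertex AI API
gcloud services enable aiplatform.googleapis.com --project=PROJECT_ID

# 2. Create the GCP service account
gcloud iam service-accounts create vertex-ai-staging --project=PROJECT_ID

# 3. Grant the required IAM roles (repeat for each role)
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:vertex-ai-staging@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"

# 4. Allow the Kubernetes SA to impersonate the GCP SA
gcloud iam service-accounts add-iam-policy-binding \
  vertex-ai-staging@PROJECT_ID.iam.gserviceaccount.com \
  --role="roles/iam.workloadIdentityUser" \
  --member="serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME]"

# 5. Annotate the Kubernetes service account
kubectl annotate serviceaccount KSA_NAME -n NAMESPACE \
  iam.gke.io/gcp-service-account=vertex-ai-staging@PROJECT_ID.iam.gserviceaccount.com
```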
Configuration
Environment Variables
The deployment is configured with the following environment variables in deployments/overlays/staging-gke/deployment-patch.yaml:
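The exact variable names live in that file; a representative fragment might look like the following (container and variable names here are illustrative, not taken from the repo):

```yaml
spec:
  template:
    spec:
      containers:
        - name: app                      # illustrative container name
          env:
            - name: GOOGLE_CLOUD_PROJECT # illustrative variable name
              value: "your-project-id"
            - name: VERTEXAI_LOCATION    # illustrative variable name
              value: "us-central1"
```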
Service Account Annotation
The Kubernetes service account is annotated to use Workload Identity:

Deployment
Deploy to Staging
After running the setup script, deploy the updated configuration:

Verification
1. Verify Workload Identity Binding
2. Verify Kubernetes Annotation
3. Test Authentication from Pod
4. Test Vertex AI Access
5. Test with LiteLLM
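The five verification steps above could be carried out roughly as follows. Service account, namespace, and pod names are placeholders, and the LiteLLM test assumes the proxy is reachable at localhost:4000:

```bash
# 1. Verify the Workload Identity binding on the GCP service account
gcloud iam service-accounts get-iam-policy \
  vertex-ai-staging@PROJECT_ID.iam.gserviceaccount.com

# 2. Verify the Kubernetes annotation
kubectl get serviceaccount KSA_NAME -n NAMESPACE \
  -o jsonpath='{.metadata.annotations.iam\.gke\.io/gcp-service-account}'

# 3. Test authentication from inside a pod (should print the GCP SA email)
kubectl exec -it POD_NAME -n NAMESPACE -- \
  curl -s -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/email"

# 4. Test Vertex AI access directly (generateContent on a publisher model)
curl -s -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/gemini-2.5-flash:generateContent" \
  -d '{"contents":[{"role":"user","parts":[{"text":"ping"}]}]}'

# 5. Test through the LiteLLM proxy
curl -s http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"vertex_ai/gemini-2.5-flash","messages":[{"role":"user","content":"ping"}]}'
```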
Troubleshooting
Pod cannot authenticate
Error:
google.auth.exceptions.DefaultCredentialsError: Could not automatically determine credentials

Solutions:
- Verify Workload Identity is enabled on cluster:
- Check service account annotation:
- Verify IAM binding:
- Restart pods to pick up new annotation:
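Concretely, those four checks might look like this (cluster, namespace, and deployment names are placeholders):

```bash
# Workload Identity enabled on the cluster?
gcloud container clusters describe CLUSTER_NAME --region REGION \
  --format="value(workloadIdentityConfig.workloadPool)"

# Annotation present on the Kubernetes service account?
kubectl describe serviceaccount KSA_NAME -n NAMESPACE | grep iam.gke.io

# IAM binding in place on the GCP service account?
gcloud iam service-accounts get-iam-policy \
  vertex-ai-staging@PROJECT_ID.iam.gserviceaccount.com

# Restart pods so they pick up the new annotation
kubectl rollout restart deployment/DEPLOYMENT_NAME -n NAMESPACE
```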
Permission denied errors
Error:
403 Permission denied on resource project

Solutions:
- Verify service account has required roles:
- Grant missing permissions:
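A sketch of both checks with gcloud (PROJECT_ID is a placeholder):

```bash
# List roles currently granted to the service account
gcloud projects get-iam-policy PROJECT_ID \
  --flatten="bindings[].members" \
  --filter="bindings.members:vertex-ai-staging@PROJECT_ID.iam.gserviceaccount.com" \
  --format="value(bindings.role)"

# Grant a missing role, e.g. Vertex AI user
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:vertex-ai-staging@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"
```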
Model not found errors
Error:
404 The model requested does not exist

Solutions:
- Verify model name format for Vertex AI:
- Check available models:
- Use supported model names:
  - gemini-2.5-flash
  - gemini-2.5-pro
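Vertex AI addresses Gemini as a publisher-model resource, so a 404 often means the name or region is wrong. One way to probe availability directly (project and region are placeholders):

```bash
# Vertex AI resource name format:
#   projects/PROJECT_ID/locations/REGION/publishers/google/models/gemini-2.5-flash
# A 404 from this call means the model name/region combination is not available.
curl -s -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/gemini-2.5-flash:generateContent" \
  -d '{"contents":[{"role":"user","parts":[{"text":"ping"}]}]}'
```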
Workload Identity not working
Error: Pod uses default compute service account instead of vertex-ai-staging

Solutions:
- Ensure pod spec uses correct service account:
- Check if annotation was applied before pod creation:
- Verify namespace has Workload Identity enabled:
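A pod spec fragment corresponding to the first point above (the service account name is a placeholder):

```yaml
# deployment-patch.yaml fragment: the pod must run as the annotated KSA,
# otherwise it falls back to the default compute service account.
spec:
  template:
    spec:
      serviceAccountName: KSA_NAME  # must match the annotated Kubernetes SA
```

Note that the annotation is only honored for pods created after it was applied, so restart or recreate pods after changing either the annotation or the spec.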
Security Considerations
Benefits of Workload Identity
No Key Management
No service account keys to create, rotate, or secure. Authentication is handled automatically by GKE.
Automatic Rotation
Credentials are automatically rotated by Google Cloud. No manual intervention required.
Least Privilege
Each pod gets only the permissions it needs via IAM bindings. No shared credentials.
Audit Trail
All API calls are logged with the service account identity. Easy to audit and monitor.
IAM Permissions
The vertex-ai-staging service account has been granted these roles:

| Role | Purpose | Permissions |
|---|---|---|
| roles/aiplatform.user | Vertex AI API access | Call Vertex AI APIs, use models |
| roles/aiplatform.developer | Model management | Deploy models, manage endpoints |
| roles/logging.logWriter | Cloud Logging | Write logs to Cloud Logging |
| roles/monitoring.metricWriter | Monitoring | Write custom metrics |
Best Practices
- ✅ Do:
- Use Workload Identity for all GCP service access
- Grant minimum required permissions
- Monitor API usage and costs
- Set up quota alerts
- ❌ Don’t:
- Create service account keys
- Grant overly broad permissions
- Share service accounts across environments
- Ignore quota warnings
Cost Management
Vertex AI Pricing
Vertex AI charges by character count (roughly equivalent to tokens):

| Model | Input (per 1M chars) | Output (per 1M chars) | Status |
|---|---|---|---|
| gemini-2.5-flash | $0.075 | $0.30 | ✅ Production-ready |
| gemini-2.5-pro | $0.625 | $5.00 | ✅ Production-ready |
gemini-2.5-flash and gemini-2.5-pro are production-grade models recommended for enterprise deployments. They offer stable performance, SLA guarantees, and are suitable for production workloads. Other Gemini 2.5 variants (if any) may be experimental or preview releases.
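As a back-of-the-envelope check using the table above: a request with 200,000 input characters and 50,000 output characters on gemini-2.5-flash would cost roughly:

```bash
# (200k / 1M) * $0.075 + (50k / 1M) * $0.30 = $0.015 + $0.015
awk 'BEGIN { printf "$%.4f\n", 200000/1e6*0.075 + 50000/1e6*0.30 }'
# → $0.0300
```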
Setting Quotas
Monitoring Costs
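Quotas are typically adjusted in the Cloud Console (IAM & Admin → Quotas). For cost monitoring, one guardrail is a billing budget with alert thresholds; a sketch, with the billing account ID as a placeholder:

```bash
# Create a $100/month budget with default threshold alerts
gcloud billing budgets create \
  --billing-account=BILLING_ACCOUNT_ID \
  --display-name="vertex-ai-staging-budget" \
  --budget-amount=100USD
```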
Migration from Google AI Studio
If you’re currently using Google AI Studio API keys, here’s how to migrate:

1. Update Environment Variables
Change from API key (Google AI Studio) to Vertex AI configuration:
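A sketch of the change, assuming LiteLLM-style variable names (the exact names in your deployment may differ):

```bash
# Before (Google AI Studio): a static API key
export GEMINI_API_KEY="your-api-key"

# After (Vertex AI): no key; project and region, with credentials
# supplied automatically via Workload Identity
export VERTEXAI_PROJECT="your-project-id"
export VERTEXAI_LOCATION="us-central1"
```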
2. Run Setup Script
3. Deploy Updated Configuration
4. Verify Migration
Next Steps
Google Gemini Guide
Learn about Gemini model features and capabilities
GKE Staging Deployment
Complete GKE staging deployment guide
Observability
Monitor Vertex AI usage and performance
Production Checklist
Prepare for production deployment
Vertex AI with Workload Identity Configured! Your staging environment now uses keyless authentication for secure, scalable AI access.