Google Vertex AI Setup
This guide covers how to use both Anthropic Claude and Google Gemini models via Google Cloud’s Vertex AI platform.

Overview
Vertex AI provides enterprise-grade access to multiple LLM providers through a unified API, offering:
- Unified Billing: Single GCP invoice for all model usage
- Workload Identity: Keyless authentication on GKE (most secure)
- Enterprise Features: VPC-SC, audit logging, IAM integration
- Multi-Provider: Access both Anthropic Claude AND Google Gemini models
Supported Models
Anthropic Claude (via Vertex AI)
Latest models (November 2025) include Claude Sonnet 4.5 (`claude-sonnet-4-5@20250929`).

Google Gemini (via Vertex AI)
Latest models (November 2025) include Gemini 3.0 Pro (`gemini-3-pro-preview`).

Prerequisites
- GCP Project with Vertex AI API enabled
- Service Account with the Vertex AI User role (for local development)
- Workload Identity configured (for GKE deployments)
Setup Options
Option 1: Workload Identity on GKE (Recommended)
Most secure: no API keys, automatic credential rotation, and alignment with Google Cloud best practices.

Step 1: Enable Workload Identity on Your GKE Cluster
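A minimal sketch with `gcloud`; the cluster name, region, and project ID are placeholders:

```bash
# Enable Workload Identity on an existing GKE cluster
gcloud container clusters update my-cluster \
  --region=us-central1 \
  --workload-pool=my-gcp-project.svc.id.goog
```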
Step 2: Create GCP Service Account
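For example (the service account name `vertex-ai-client` and the project ID are placeholders):

```bash
# Create a dedicated service account for Vertex AI access
gcloud iam service-accounts create vertex-ai-client \
  --display-name="Vertex AI client"

# Grant it the Vertex AI User role
gcloud projects add-iam-policy-binding my-gcp-project \
  --member="serviceAccount:vertex-ai-client@my-gcp-project.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"
```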
Step 3: Bind Kubernetes Service Account
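A sketch assuming the service account from Step 2 and a placeholder namespace/service-account pair `my-namespace/my-ksa`:

```bash
# Allow the Kubernetes service account to impersonate the GCP service account
gcloud iam service-accounts add-iam-policy-binding \
  vertex-ai-client@my-gcp-project.iam.gserviceaccount.com \
  --role="roles/iam.workloadIdentityUser" \
  --member="serviceAccount:my-gcp-project.svc.id.goog[my-namespace/my-ksa]"
```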
Step 4: Annotate Kubernetes Service Account
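For example, with the same placeholder names:

```bash
# Point the Kubernetes service account at the GCP service account
kubectl annotate serviceaccount my-ksa \
  --namespace my-namespace \
  iam.gke.io/gcp-service-account=vertex-ai-client@my-gcp-project.iam.gserviceaccount.com
```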
Step 5: Configure Environment Variables
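A sketch using the variables documented in the Configuration Reference below (values are placeholders):

```bash
# Workload Identity supplies credentials automatically; no key file is required
export LLM_PROVIDER=vertex_ai
export MODEL_NAME=vertex_ai/claude-sonnet-4-5@20250929
export VERTEX_PROJECT=my-gcp-project
export VERTEX_LOCATION=us-central1
```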
Option 2: Service Account Key (Local Development)
For local development or non-GKE environments.

Step 1: Create Service Account
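Same pattern as in Option 1; the account name `vertex-ai-local` and the project ID are placeholders:

```bash
gcloud iam service-accounts create vertex-ai-local \
  --display-name="Vertex AI local development"

gcloud projects add-iam-policy-binding my-gcp-project \
  --member="serviceAccount:vertex-ai-local@my-gcp-project.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"
```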
Step 2: Download Service Account Key
- ⚠️ Security Warning: Service account keys are long-lived credentials. Protect them like passwords!
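For example (the path and account name are placeholders):

```bash
# Create and download a key for the service account (keep this file out of git)
gcloud iam service-accounts keys create ~/vertex-ai-key.json \
  --iam-account=vertex-ai-local@my-gcp-project.iam.gserviceaccount.com
```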
Step 3: Configure Environment Variables
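A sketch with placeholder values; `GOOGLE_APPLICATION_CREDENTIALS` points at the key downloaded in Step 2:

```bash
export LLM_PROVIDER=vertex_ai
export MODEL_NAME=vertex_ai/gemini-3-pro-preview
export VERTEX_PROJECT=my-gcp-project
export VERTEX_LOCATION=us-central1
export GOOGLE_APPLICATION_CREDENTIALS=~/vertex-ai-key.json
```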
Usage Examples
Example 1: Claude Sonnet 4.5 via Vertex AI
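A configuration sketch using the environment variables from the Configuration Reference; project and region are placeholders:

```bash
export LLM_PROVIDER=vertex_ai
export MODEL_NAME=vertex_ai/claude-sonnet-4-5@20250929   # Claude Sonnet 4.5 on Vertex AI
export VERTEX_PROJECT=my-gcp-project
export VERTEX_LOCATION=us-central1
```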
Example 2: Gemini 3.0 Pro via Vertex AI
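Only the model identifier changes relative to Example 1:

```bash
# Same project and location as Example 1; swap the model identifier
export MODEL_NAME=vertex_ai/gemini-3-pro-preview   # Gemini 3.0 Pro on Vertex AI
```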
Example 3: Mixed Providers with Fallback
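A sketch with fallback enabled. `ENABLE_FALLBACK` appears in the Troubleshooting section; the `FALLBACK_MODELS` variable is hypothetical and is shown only to illustrate the idea:

```bash
export LLM_PROVIDER=vertex_ai
export MODEL_NAME=vertex_ai/claude-sonnet-4-5@20250929
export ENABLE_FALLBACK=true
# Hypothetical variable: comma-separated list of fallback models
export FALLBACK_MODELS=vertex_ai/gemini-3-pro-preview
```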
Configuration Reference
Environment Variables
| Variable | Required | Description | Example |
|---|---|---|---|
| `LLM_PROVIDER` | Yes | Set to `vertex_ai` | `vertex_ai` |
| `MODEL_NAME` | Yes | Vertex AI model identifier | `vertex_ai/gemini-3-pro-preview` |
| `VERTEX_PROJECT` | Yes | GCP project ID | `my-gcp-project` |
| `VERTEX_LOCATION` | Yes | Vertex AI region | `us-central1` |
| `GOOGLE_APPLICATION_CREDENTIALS` | No* | Path to service account key | `/path/to/key.json` |

*Required only when authenticating with a service account key (Option 2); not needed with Workload Identity.
Available Regions
Common Vertex AI regions:
- `us-central1` (Iowa, USA)
- `us-east4` (Northern Virginia, USA)
- `europe-west1` (Belgium)
- `asia-southeast1` (Singapore)
Cost Optimization
1. Use Appropriate Model Sizes

Match the model to the task: reserve the largest models for complex reasoning and use smaller, cheaper models for routine work.
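For example (the model shown is only an illustration of a smaller option; verify current availability in your region):

```bash
# Smaller, cheaper model for routine tasks (illustrative)
export MODEL_NAME=vertex_ai/gemini-2.5-flash
```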
2. Enable Prompt Caching (Claude Models)
Claude models on Vertex AI support prompt caching, offering up to 90% cost savings on repeated prompts.

3. Use Dedicated Models

Configure cheaper models for specific tasks, as in the sketch below.
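The variable below is hypothetical and only illustrates routing a specific task to a cheaper model; check your deployment's configuration for the actual setting names:

```bash
# Hypothetical per-task override: use a smaller model for lightweight tasks
export SUMMARIZATION_MODEL=vertex_ai/gemini-2.5-flash    # hypothetical variable; illustrative model
# Keep the primary model for complex tasks
export MODEL_NAME=vertex_ai/claude-sonnet-4-5@20250929
```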
Troubleshooting
Error: “Permission denied”
Problem: The service account lacks Vertex AI permissions.

Solution: Grant the service account the Vertex AI User role (`roles/aiplatform.user`), as shown in the setup steps above.

Error: “Model not found”
Problem: The model is not available in your region, or the model name is incorrect.

Solution:
- Verify the model name format: `vertex_ai/claude-sonnet-4-5@20250929`
- Check model availability in your region
- Try a different region: `VERTEX_LOCATION=us-east4`
Error: “Workload Identity not working”
Problem: The Kubernetes service account is not properly linked to the GCP service account.

Solution: Verify the Workload Identity binding (Step 3) and the service account annotation (Step 4); the checks below can help.
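Two quick checks (placeholder names):

```bash
# Confirm the Kubernetes service account carries the Workload Identity annotation
kubectl get serviceaccount my-ksa --namespace my-namespace -o yaml

# Confirm the GCP service account grants roles/iam.workloadIdentityUser to the KSA
gcloud iam service-accounts get-iam-policy \
  vertex-ai-client@my-gcp-project.iam.gserviceaccount.com
```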
Error: “Quota exceeded”

Problem: Vertex AI quota limits were exceeded.

Solution:
- Check quotas: GCP Console > IAM & Admin > Quotas
- Request a quota increase for Vertex AI
- Use fallback models: `ENABLE_FALLBACK=true`
Security Best Practices
1. Use Workload Identity (GKE)
- ✅ Best: Workload Identity (keyless authentication)
- ⚠️ Acceptable: Service Account Key (local dev only)
- ❌ Avoid: Committing keys to git
2. Principle of Least Privilege
Grant only the minimum required permissions: typically the Vertex AI User role (`roles/aiplatform.user`) rather than broad roles such as Editor or Owner.

3. Audit Logging
Enable Cloud Audit Logs for Vertex AI, for example as sketched below.
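One approach is to add an audit configuration for `aiplatform.googleapis.com` to the project's IAM policy (a sketch; choose log types to suit your needs):

```bash
# Export the current IAM policy
gcloud projects get-iam-policy my-gcp-project --format=json > policy.json

# Add an auditConfigs entry for Vertex AI, e.g.:
#   "auditConfigs": [{
#     "service": "aiplatform.googleapis.com",
#     "auditLogConfigs": [{"logType": "DATA_READ"}, {"logType": "DATA_WRITE"}]
#   }]

# Apply the updated policy
gcloud projects set-iam-policy my-gcp-project policy.json
```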
4. Rotate Keys Regularly

For service account keys (local dev), rotate keys on a regular schedule, for example:
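A rotation sketch (account name, paths, and key ID are placeholders):

```bash
# List existing keys for the service account
gcloud iam service-accounts keys list \
  --iam-account=vertex-ai-local@my-gcp-project.iam.gserviceaccount.com

# Create a new key, switch GOOGLE_APPLICATION_CREDENTIALS to it, then delete the old one
gcloud iam service-accounts keys create ~/vertex-ai-key-new.json \
  --iam-account=vertex-ai-local@my-gcp-project.iam.gserviceaccount.com
gcloud iam service-accounts keys delete OLD_KEY_ID \
  --iam-account=vertex-ai-local@my-gcp-project.iam.gserviceaccount.com
```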
Monitoring & Observability

View Vertex AI Metrics
Enable LangSmith Tracing
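Assuming the standard LangSmith environment variables (the project name is a placeholder):

```bash
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=<your-langsmith-api-key>
export LANGCHAIN_PROJECT=vertex-ai-setup   # placeholder project name
```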
Next Steps
- Multi-LLM Setup - Configure fallbacks across providers
- Cost Optimization - Advanced cost reduction strategies
- Observability - Monitor LLM performance and costs