Google Vertex AI Setup
This guide covers how to use both Anthropic Claude and Google Gemini models via Google Cloud’s Vertex AI platform.
Overview
Vertex AI provides enterprise-grade access to multiple LLM providers through a unified API, offering:
- Unified Billing: Single GCP invoice for all model usage
- Workload Identity: Keyless authentication on GKE (most secure)
- Enterprise Features: VPC-SC, audit logging, IAM integration
- Multi-Provider: Access both Anthropic Claude AND Google Gemini models
Supported Models
Anthropic Claude (via Vertex AI)
Latest models (November 2025):
```bash
# Claude Sonnet 4.5 (Balanced performance)
MODEL_NAME=vertex_ai/claude-sonnet-4-5@20250929
# Pricing: $3/1M input tokens, $15/1M output tokens

# Claude Haiku 4.5 (Fast, cost-effective)
MODEL_NAME=vertex_ai/claude-haiku-4-5@20251001
# Pricing: $1/1M input tokens, $5/1M output tokens

# Claude Opus 4.1 (Most powerful)
MODEL_NAME=vertex_ai/claude-opus-4-1@20250805
# Pricing: $15/1M input tokens, $75/1M output tokens
```
Google Gemini (via Vertex AI)
Latest models (November 2025):
```bash
# Gemini 3.0 Pro (Latest, 1M context window)
MODEL_NAME=vertex_ai/gemini-3-pro-preview
# Pricing: $2/1M input tokens, $12/1M output tokens

# Gemini 2.5 Flash (Fast, cost-effective)
MODEL_NAME=vertex_ai/gemini-2.5-flash
# Pricing: $0.15/1M input tokens, $0.60/1M output tokens

# Gemini 2.5 Pro (Stable production)
MODEL_NAME=vertex_ai/gemini-2.5-pro
# Pricing: $1.25/1M input tokens, $10/1M output tokens (≤200K context)
```
Prerequisites
- GCP Project with Vertex AI API enabled
- Service Account with the Vertex AI User role (for local development)
- Workload Identity configured (for GKE deployments)
Setup Options
Option 1: Workload Identity on GKE (Recommended)
Most secure option: no API keys, automatic credential rotation, and alignment with Google Cloud best practices.
Step 1: Enable Workload Identity on Your GKE Cluster
```bash
# If creating a new cluster
gcloud container clusters create my-cluster \
  --workload-pool=PROJECT_ID.svc.id.goog \
  --region=us-central1

# If updating an existing cluster
gcloud container clusters update my-cluster \
  --workload-pool=PROJECT_ID.svc.id.goog \
  --region=us-central1
```
Step 2: Create GCP Service Account
```bash
# Create service account
gcloud iam service-accounts create vertex-ai-user \
  --display-name="Vertex AI User for MCP Server"

# Grant Vertex AI User role
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:vertex-ai-user@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"
```
Step 3: Bind Kubernetes Service Account
```bash
# Allow the Kubernetes SA to impersonate the GCP SA
gcloud iam service-accounts add-iam-policy-binding \
  vertex-ai-user@PROJECT_ID.iam.gserviceaccount.com \
  --role=roles/iam.workloadIdentityUser \
  --member="serviceAccount:PROJECT_ID.svc.id.goog[default/mcp-server]"
```
Step 4: Annotate Kubernetes Service Account
```yaml
# kubernetes/serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: mcp-server
  annotations:
    iam.gke.io/gcp-service-account: vertex-ai-user@PROJECT_ID.iam.gserviceaccount.com
```
Step 5: Configure Environment Variables
```bash
# .env or Kubernetes ConfigMap
LLM_PROVIDER=vertex_ai
MODEL_NAME=vertex_ai/claude-sonnet-4-5@20250929  # Or any Vertex AI model
VERTEX_PROJECT=your-gcp-project-id
VERTEX_LOCATION=us-central1

# No GOOGLE_APPLICATION_CREDENTIALS needed - Workload Identity handles auth!
```
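Once deployed, it can help to confirm which credential source Application Default Credentials (ADC) will pick up. The helper below is an illustrative sketch, not part of this repo: ADC prefers `GOOGLE_APPLICATION_CREDENTIALS` when set, and otherwise client libraries fall back to the metadata server, which is exactly what Workload Identity serves on GKE.

```python
import os

def detect_auth_mode() -> str:
    """Report which Application Default Credentials source will likely be used.

    ADC checks GOOGLE_APPLICATION_CREDENTIALS first; without it, Google client
    libraries fall back to the metadata server (Workload Identity on GKE).
    """
    key_path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if key_path:
        if not os.path.exists(key_path):
            return f"ERROR: key file not found: {key_path}"
        return f"service-account key: {key_path}"
    return "metadata server (Workload Identity on GKE, or GCE default)"
```

On a correctly configured Workload Identity pod this should report the metadata server; a key-file path appearing there means Option 2 configuration is still active.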
Option 2: Service Account Key (Local Development)
For local development or non-GKE environments.
Step 1: Create Service Account
```bash
gcloud iam service-accounts create vertex-ai-dev \
  --display-name="Vertex AI Development"

gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:vertex-ai-dev@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"
```
Step 2: Download Service Account Key
```bash
gcloud iam service-accounts keys create ~/vertex-ai-key.json \
  --iam-account=vertex-ai-dev@PROJECT_ID.iam.gserviceaccount.com
```
- ⚠️ Security Warning: Service account keys are long-lived credentials. Protect them like passwords!
Step 3: Configure Environment Variables
```bash
# .env
LLM_PROVIDER=vertex_ai
MODEL_NAME=vertex_ai/gemini-3-pro-preview  # Or any Vertex AI model
VERTEX_PROJECT=your-gcp-project-id
VERTEX_LOCATION=us-central1
GOOGLE_APPLICATION_CREDENTIALS=/path/to/vertex-ai-key.json
```
Usage Examples
Example 1: Claude Sonnet 4.5 via Vertex AI
```bash
# .env
LLM_PROVIDER=vertex_ai
MODEL_NAME=vertex_ai/claude-sonnet-4-5@20250929
VERTEX_PROJECT=my-gcp-project
VERTEX_LOCATION=us-central1
```

```python
# Python code (automatic via LLMFactory)
from mcp_server_langgraph.llm.factory import LLMFactory

llm = LLMFactory(
    provider="vertex_ai",
    model_name="vertex_ai/claude-sonnet-4-5@20250929",
    vertex_project="my-gcp-project",
    vertex_location="us-central1",
)

response = await llm.ainvoke([{"role": "user", "content": "Hello!"}])
print(response.content)
```
Example 2: Gemini 3.0 Pro via Vertex AI
```bash
# .env
LLM_PROVIDER=vertex_ai
MODEL_NAME=vertex_ai/gemini-3-pro-preview
VERTEX_PROJECT=my-gcp-project
VERTEX_LOCATION=us-central1
```
Example 3: Mixed Providers with Fallback
```bash
# .env
LLM_PROVIDER=vertex_ai
MODEL_NAME=vertex_ai/claude-sonnet-4-5@20250929
ENABLE_FALLBACK=true
FALLBACK_MODELS=["vertex_ai/claude-haiku-4-5@20251001","vertex_ai/gemini-2.5-flash","gpt-5.1"]

# Configure both Vertex AI and fallback providers
VERTEX_PROJECT=my-gcp-project
VERTEX_LOCATION=us-central1
OPENAI_API_KEY=sk-...  # For GPT-5.1 fallback
```
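Conceptually, fallback walks the configured chain in order: the primary `MODEL_NAME` first, then each entry of `FALLBACK_MODELS` (a JSON array). The sketch below illustrates that logic only; the real `LLMFactory` internals may differ, and `call_model` is a hypothetical stand-in for the actual provider call.

```python
import json
import os

def resolve_fallback_chain() -> list:
    """Build the ordered model chain: primary first, then FALLBACK_MODELS."""
    chain = [os.environ["MODEL_NAME"]]
    if os.environ.get("ENABLE_FALLBACK", "false").lower() == "true":
        chain += json.loads(os.environ.get("FALLBACK_MODELS", "[]"))
    return chain

def invoke_with_fallback(prompt: str, call_model) -> str:
    """Try each model in the chain until one succeeds.

    `call_model(model, prompt)` is a hypothetical stand-in for the real call.
    """
    errors = []
    for model in resolve_fallback_chain():
        try:
            return call_model(model, prompt)
        except Exception as exc:  # real code should catch provider-specific errors
            errors.append((model, exc))
    raise RuntimeError(f"all models failed: {errors}")
```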
Configuration Reference
Environment Variables
| Variable | Required | Description | Example |
|---|---|---|---|
| `LLM_PROVIDER` | Yes | Set to `vertex_ai` | `vertex_ai` |
| `MODEL_NAME` | Yes | Vertex AI model identifier | `vertex_ai/gemini-3-pro-preview` |
| `VERTEX_PROJECT` | Yes | GCP project ID | `my-gcp-project` |
| `VERTEX_LOCATION` | Yes | Vertex AI region | `us-central1` |
| `GOOGLE_APPLICATION_CREDENTIALS` | No* | Path to service account key | `/path/to/key.json` |

*Not required on GKE with Workload Identity
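Since four of these variables are required, failing fast on a bad configuration saves a confusing provider error at runtime. A hypothetical validation helper (not part of the project) might look like:

```python
REQUIRED_VARS = ["LLM_PROVIDER", "MODEL_NAME", "VERTEX_PROJECT", "VERTEX_LOCATION"]

def validate_vertex_config(env: dict) -> list:
    """Return a list of problems; an empty list means the config looks usable."""
    problems = [f"missing {v}" for v in REQUIRED_VARS if not env.get(v)]
    if env.get("LLM_PROVIDER") not in (None, "vertex_ai"):
        problems.append(f"LLM_PROVIDER should be 'vertex_ai', got {env['LLM_PROVIDER']!r}")
    model = env.get("MODEL_NAME", "")
    if model and not model.startswith("vertex_ai/"):
        problems.append(f"MODEL_NAME should start with 'vertex_ai/', got {model!r}")
    return problems
```

Running this against `os.environ` at startup surfaces all configuration mistakes at once instead of one per restart.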
Available Regions
Common Vertex AI regions:
- `us-central1` (Iowa, USA)
- `us-east4` (Northern Virginia, USA)
- `europe-west1` (Belgium)
- `asia-southeast1` (Singapore)

Check the Vertex AI locations documentation for the full list.
Cost Optimization
1. Use Appropriate Model Sizes
```bash
# Development/Testing
MODEL_NAME=vertex_ai/gemini-2.5-flash  # $0.15/$0.60 per 1M tokens

# Production (balanced)
MODEL_NAME=vertex_ai/claude-haiku-4-5@20251001  # $1/$5 per 1M tokens

# Complex tasks only
MODEL_NAME=vertex_ai/claude-sonnet-4-5@20250929  # $3/$15 per 1M tokens
```
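Using the per-token prices quoted above, a rough per-request cost is `(input_tokens × input_rate + output_tokens × output_rate) / 1,000,000`. A small sketch (prices copied from this page; actual billing may differ, e.g. with prompt caching or long-context tiers):

```python
# (input, output) USD per 1M tokens, taken from the pricing notes above
PRICING = {
    "vertex_ai/gemini-2.5-flash": (0.15, 0.60),
    "vertex_ai/claude-haiku-4-5@20251001": (1.00, 5.00),
    "vertex_ai/claude-sonnet-4-5@20250929": (3.00, 15.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Rough USD cost for a single request (no caching discounts applied)."""
    in_rate, out_rate = PRICING[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
```

For example, a 2,000-token prompt with a 500-token reply on Claude Sonnet 4.5 comes to $0.0135, versus about $0.0006 on Gemini 2.5 Flash.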
2. Enable Prompt Caching (Claude Models)
Claude models on Vertex AI support prompt caching for up to 90% cost savings on repeated prompts.
3. Use Dedicated Models
Configure cheaper models for specific tasks:
```bash
# Main model (for chat)
MODEL_NAME=vertex_ai/claude-sonnet-4-5@20250929

# Summarization (lighter/cheaper)
USE_DEDICATED_SUMMARIZATION_MODEL=true
SUMMARIZATION_MODEL_NAME=vertex_ai/gemini-2.5-flash
SUMMARIZATION_MODEL_PROVIDER=vertex_ai
```
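The routing these settings imply is simple: summarization calls go to the dedicated model when the switch is on, everything else uses the main model. A hypothetical sketch of that selection logic (the project's actual routing may differ):

```python
import os

def model_for_task(task: str) -> str:
    """Pick a model per task, honoring the dedicated-summarization switch."""
    if (task == "summarization"
            and os.environ.get("USE_DEDICATED_SUMMARIZATION_MODEL", "false").lower() == "true"):
        return os.environ["SUMMARIZATION_MODEL_NAME"]
    return os.environ["MODEL_NAME"]
```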
Troubleshooting
Error: “Permission denied”
Problem: Service account lacks Vertex AI permissions
Solution:
```bash
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:YOUR-SA@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"
```
Error: “Model not found”
Problem: Model not available in your region or incorrect model name
Solution:
- Verify the model name format: `vertex_ai/claude-sonnet-4-5@20250929`
- Check model availability in your region
- Try a different region: `VERTEX_LOCATION=us-east4`
Error: “Workload Identity not working”
Problem: Kubernetes SA not properly linked to GCP SA
Solution:
```bash
# Verify annotation
kubectl get serviceaccount mcp-server -o yaml

# Check IAM binding
gcloud iam service-accounts get-iam-policy \
  vertex-ai-user@PROJECT_ID.iam.gserviceaccount.com

# Test from a pod (newer kubectl removed --serviceaccount; use --overrides)
kubectl run -it test --image=google/cloud-sdk:slim \
  --overrides='{"spec": {"serviceAccountName": "mcp-server"}}' \
  --rm -- gcloud auth list
```
Error: “Quota exceeded”
Problem: Exceeded Vertex AI quota limits
Solution:
- Check quotas: GCP Console > IAM & Admin > Quotas
- Request quota increase for Vertex AI
- Use fallback models: `ENABLE_FALLBACK=true`
Security Best Practices
1. Use Workload Identity (GKE)
- ✅ Best: Workload Identity (keyless authentication)
- ⚠️ Acceptable: Service Account Key (local dev only)
- ❌ Avoid: Committing keys to git
2. Principle of Least Privilege
Grant minimum required permissions:
```bash
# Good: Specific Vertex AI role
--role="roles/aiplatform.user"

# Bad: Overly broad permissions
--role="roles/owner"
```
3. Audit Logging
Enable Cloud Audit Logs for Vertex AI (GCP Console > IAM & Admin > Audit Logs), then review which roles the service account holds:
```bash
gcloud projects get-iam-policy PROJECT_ID \
  --flatten="bindings[].members" \
  --format="table(bindings.role)" \
  --filter="bindings.members:serviceAccount:vertex-ai-user@*"
```
4. Rotate Keys Regularly
For service account keys (local dev):
```bash
# List keys
gcloud iam service-accounts keys list \
  --iam-account=vertex-ai-dev@PROJECT_ID.iam.gserviceaccount.com

# Delete old keys (older than 90 days)
gcloud iam service-accounts keys delete KEY_ID \
  --iam-account=vertex-ai-dev@PROJECT_ID.iam.gserviceaccount.com
```
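The 90-day cutoff can be computed from the `validAfterTime` (RFC 3339 creation timestamp) that `gcloud iam service-accounts keys list --format=json` returns for each key. A small illustrative helper for filtering that output (`stale_keys` is hypothetical, not a gcloud feature):

```python
from datetime import datetime, timedelta, timezone

def stale_keys(keys, max_age_days=90, now=None):
    """Return names of keys created more than `max_age_days` ago.

    `keys` mirrors the JSON from `gcloud iam service-accounts keys list`,
    where each entry carries a `validAfterTime` RFC 3339 timestamp.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for key in keys:
        created = datetime.fromisoformat(key["validAfterTime"].replace("Z", "+00:00"))
        if created < cutoff:
            stale.append(key["name"])
    return stale
```

Feed the returned key names to `gcloud iam service-accounts keys delete` after confirming nothing still depends on them.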
Monitoring & Observability
View Vertex AI Metrics
```bash
# View API requests
gcloud logging read "resource.type=aiplatform.googleapis.com/Endpoint" \
  --limit=10 \
  --format=json

# View billing account details (use the Billing console for cost breakdowns)
gcloud billing accounts describe ACCOUNT_ID
```
Enable LangSmith Tracing
```bash
LANGSMITH_TRACING=true
LANGSMITH_API_KEY=your-api-key
LANGSMITH_PROJECT=mcp-server-langgraph
```
Next Steps