Google Vertex AI Setup

This guide covers how to use both Anthropic Claude and Google Gemini models via Google Cloud’s Vertex AI platform.

Overview

Vertex AI provides enterprise-grade access to multiple LLM providers through a unified API, offering:
  • Unified Billing: Single GCP invoice for all model usage
  • Workload Identity: Keyless authentication on GKE (most secure)
  • Enterprise Features: VPC-SC, audit logging, IAM integration
  • Multi-Provider: Access both Anthropic Claude AND Google Gemini models

Supported Models

Anthropic Claude (via Vertex AI)

Latest models (November 2025):
# Claude Sonnet 4.5 (Balanced performance)
MODEL_NAME=vertex_ai/claude-sonnet-4-5@20250929
# Pricing: $3/1M input tokens, $15/1M output tokens

# Claude Haiku 4.5 (Fast, cost-effective)
MODEL_NAME=vertex_ai/claude-haiku-4-5@20251001
# Pricing: $1/1M input tokens, $5/1M output tokens

# Claude Opus 4.1 (Most powerful)
MODEL_NAME=vertex_ai/claude-opus-4-1@20250805
# Pricing: $15/1M input tokens, $75/1M output tokens

Google Gemini (via Vertex AI)

Latest models (November 2025):
# Gemini 3.0 Pro (Latest, 1M context window)
MODEL_NAME=vertex_ai/gemini-3-pro-preview
# Pricing: $2/1M input tokens, $12/1M output tokens

# Gemini 2.5 Flash (Fast, cost-effective)
MODEL_NAME=vertex_ai/gemini-2.5-flash
# Pricing: $0.15/1M input tokens, $0.60/1M output tokens

# Gemini 2.5 Pro (Stable production)
MODEL_NAME=vertex_ai/gemini-2.5-pro
# Pricing: $1.25/1M input tokens, $10/1M output tokens (≤200K context)
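As a sanity check on the prices above, here is a small per-request cost calculator. The prices are copied from this guide and may drift; verify against current Vertex AI pricing before relying on them.

```python
# Rough per-request cost estimator using the per-1M-token prices listed above.
# Prices are illustrative (from this guide); check current Vertex AI pricing.
PRICING = {  # model -> (input $/1M tokens, output $/1M tokens)
    "vertex_ai/claude-sonnet-4-5@20250929": (3.00, 15.00),
    "vertex_ai/claude-haiku-4-5@20251001": (1.00, 5.00),
    "vertex_ai/claude-opus-4-1@20250805": (15.00, 75.00),
    "vertex_ai/gemini-3-pro-preview": (2.00, 12.00),
    "vertex_ai/gemini-2.5-flash": (0.15, 0.60),
    "vertex_ai/gemini-2.5-pro": (1.25, 10.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for one request."""
    in_price, out_price = PRICING[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Example: a 10K-input / 2K-output request on Gemini 2.5 Flash
print(estimate_cost("vertex_ai/gemini-2.5-flash", 10_000, 2_000))
```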

Prerequisites

  1. GCP Project with Vertex AI API enabled
  2. Service Account with Vertex AI User role (for local development)
  3. Workload Identity configured (for GKE deployments)

Setup Options

Option 1: Workload Identity (GKE - Recommended)

Most secure: no API keys, automatic credential rotation, and alignment with Google Cloud best practices.

Step 1: Enable Workload Identity on Your GKE Cluster

# If creating a new cluster
gcloud container clusters create my-cluster \
  --workload-pool=PROJECT_ID.svc.id.goog \
  --region=us-central1

# If updating existing cluster
gcloud container clusters update my-cluster \
  --workload-pool=PROJECT_ID.svc.id.goog \
  --region=us-central1

Step 2: Create GCP Service Account

# Create service account
gcloud iam service-accounts create vertex-ai-user \
  --display-name="Vertex AI User for MCP Server"

# Grant Vertex AI User role
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:vertex-ai-user@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"

Step 3: Bind Kubernetes Service Account

# Allow Kubernetes SA to impersonate GCP SA
gcloud iam service-accounts add-iam-policy-binding \
  vertex-ai-user@PROJECT_ID.iam.gserviceaccount.com \
  --role=roles/iam.workloadIdentityUser \
  --member="serviceAccount:PROJECT_ID.svc.id.goog[default/mcp-server]"

Step 4: Annotate Kubernetes Service Account

# kubernetes/serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: mcp-server
  annotations:
    iam.gke.io/gcp-service-account: vertex-ai-user@PROJECT_ID.iam.gserviceaccount.com

Step 5: Configure Environment Variables

# .env or Kubernetes ConfigMap
LLM_PROVIDER=vertex_ai
MODEL_NAME=vertex_ai/claude-sonnet-4-5@20250929  # Or any Vertex AI model
VERTEX_PROJECT=your-gcp-project-id
VERTEX_LOCATION=us-central1

# No GOOGLE_APPLICATION_CREDENTIALS needed - Workload Identity handles auth!
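A minimal startup check can catch a missing setting early. This is a hypothetical helper (not part of the server) that only inspects the variables listed above; on GKE with Workload Identity, `GOOGLE_APPLICATION_CREDENTIALS` is intentionally absent.

```python
# Hypothetical startup check for the Vertex AI settings above.
REQUIRED = ("LLM_PROVIDER", "MODEL_NAME", "VERTEX_PROJECT", "VERTEX_LOCATION")

def validate_vertex_config(env: dict) -> list[str]:
    """Return a list of problems; an empty list means the config looks sane."""
    problems = [f"missing {k}" for k in REQUIRED if not env.get(k)]
    if env.get("LLM_PROVIDER") == "vertex_ai" and not env.get("MODEL_NAME", "").startswith("vertex_ai/"):
        problems.append("MODEL_NAME should start with 'vertex_ai/'")
    return problems

print(validate_vertex_config({
    "LLM_PROVIDER": "vertex_ai",
    "MODEL_NAME": "vertex_ai/claude-sonnet-4-5@20250929",
    "VERTEX_PROJECT": "my-gcp-project",
    "VERTEX_LOCATION": "us-central1",
}))  # → []
```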

Option 2: Service Account Key (Local Development)

For local development or non-GKE environments.

Step 1: Create Service Account

gcloud iam service-accounts create vertex-ai-dev \
  --display-name="Vertex AI Development"

gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:vertex-ai-dev@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"

Step 2: Download Service Account Key

gcloud iam service-accounts keys create ~/vertex-ai-key.json \
  --iam-account=vertex-ai-dev@PROJECT_ID.iam.gserviceaccount.com

⚠️ Security Warning: Service account keys are long-lived credentials. Protect them like passwords!

Step 3: Configure Environment Variables

# .env
LLM_PROVIDER=vertex_ai
MODEL_NAME=vertex_ai/gemini-3-pro-preview  # Or any Vertex AI model
VERTEX_PROJECT=your-gcp-project-id
VERTEX_LOCATION=us-central1
GOOGLE_APPLICATION_CREDENTIALS=/path/to/vertex-ai-key.json

Usage Examples

Example 1: Claude Sonnet 4.5 via Vertex AI

# .env
LLM_PROVIDER=vertex_ai
MODEL_NAME=vertex_ai/claude-sonnet-4-5@20250929
VERTEX_PROJECT=my-gcp-project
VERTEX_LOCATION=us-central1

# Python code (automatic via LLMFactory)
from mcp_server_langgraph.llm.factory import LLMFactory

llm = LLMFactory(
    provider="vertex_ai",
    model_name="vertex_ai/claude-sonnet-4-5@20250929",
    vertex_project="my-gcp-project",
    vertex_location="us-central1",
)

response = await llm.ainvoke([{"role": "user", "content": "Hello!"}])
print(response.content)

Example 2: Gemini 3.0 Pro via Vertex AI

# .env
LLM_PROVIDER=vertex_ai
MODEL_NAME=vertex_ai/gemini-3-pro-preview
VERTEX_PROJECT=my-gcp-project
VERTEX_LOCATION=us-central1

Example 3: Mixed Providers with Fallback

# .env
LLM_PROVIDER=vertex_ai
MODEL_NAME=vertex_ai/claude-sonnet-4-5@20250929
ENABLE_FALLBACK=true
FALLBACK_MODELS=["vertex_ai/claude-haiku-4-5@20251001","vertex_ai/gemini-2.5-flash","gpt-5.1"]

# Configure both Vertex AI and fallback providers
VERTEX_PROJECT=my-gcp-project
VERTEX_LOCATION=us-central1
OPENAI_API_KEY=sk-...  # For the GPT-5.1 fallback
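The fallback chain above can be sketched as a simple loop: try the primary model, then each `FALLBACK_MODELS` entry in order. `invoke` below is a stand-in for the real client call, and the broad exception handling is illustrative only.

```python
# Minimal sketch of fallback behaviour: try each model in order until one succeeds.
def invoke_with_fallback(models, invoke):
    last_error = None
    for model in models:
        try:
            return model, invoke(model)
        except Exception as exc:  # in practice, catch provider-specific errors
            last_error = exc
    raise RuntimeError("all models failed") from last_error

models = [
    "vertex_ai/claude-sonnet-4-5@20250929",   # primary
    "vertex_ai/claude-haiku-4-5@20251001",    # fallbacks, in order
    "vertex_ai/gemini-2.5-flash",
    "gpt-5.1",
]

# Simulate the primary model being over quota:
def fake_invoke(model):
    if model == models[0]:
        raise TimeoutError("quota exceeded")
    return "ok"

used, result = invoke_with_fallback(models, fake_invoke)
print(used)  # → vertex_ai/claude-haiku-4-5@20251001
```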

Configuration Reference

Environment Variables

| Variable | Required | Description | Example |
|---|---|---|---|
| LLM_PROVIDER | Yes | Set to vertex_ai | vertex_ai |
| MODEL_NAME | Yes | Vertex AI model identifier | vertex_ai/gemini-3-pro-preview |
| VERTEX_PROJECT | Yes | GCP project ID | my-gcp-project |
| VERTEX_LOCATION | Yes | Vertex AI region | us-central1 |
| GOOGLE_APPLICATION_CREDENTIALS | No* | Path to service account key | /path/to/key.json |

*Not required on GKE with Workload Identity

Available Regions

Common Vertex AI regions:
  • us-central1 (Iowa, USA)
  • us-east4 (Northern Virginia, USA)
  • europe-west1 (Belgium)
  • asia-southeast1 (Singapore)
See the Vertex AI locations documentation for the full list.

Cost Optimization

1. Use Appropriate Model Sizes

# Development/Testing
MODEL_NAME=vertex_ai/gemini-2.5-flash  # $0.15/$0.60 per 1M tokens

# Production (balanced)
MODEL_NAME=vertex_ai/claude-haiku-4-5@20251001  # $1/$5 per 1M tokens

# Complex tasks only
MODEL_NAME=vertex_ai/claude-sonnet-4-5@20250929  # $3/$15 per 1M tokens

2. Enable Prompt Caching (Claude Models)

Claude models on Vertex AI support prompt caching for up to 90% cost savings on repeated prompts.
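Assuming cached input tokens are billed at roughly 10% of the normal input rate (the source of the "up to 90%" figure above; verify against current pricing), the savings on a shared system prompt can be estimated:

```python
# Back-of-the-envelope prompt-caching savings. Assumption: cached input tokens
# are billed at ~10% of the normal input rate.
def monthly_input_cost(requests, prompt_tokens, input_price_per_m, cache_hit_rate=0.0):
    cached = prompt_tokens * cache_hit_rate * 0.10       # cached reads at 10%
    uncached = prompt_tokens * (1 - cache_hit_rate)      # full price
    return requests * (cached + uncached) * input_price_per_m / 1_000_000

# 100K requests/month with a 5K-token shared prompt, Claude Sonnet 4.5 input price
no_cache = monthly_input_cost(100_000, 5_000, 3.00)
with_cache = monthly_input_cost(100_000, 5_000, 3.00, cache_hit_rate=0.95)
print(no_cache, with_cache)  # caching cuts the input bill dramatically
```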

3. Use Dedicated Models

Configure cheaper models for specific tasks:
# Main model (for chat)
MODEL_NAME=vertex_ai/claude-sonnet-4-5@20250929

# Summarization (lighter/cheaper)
USE_DEDICATED_SUMMARIZATION_MODEL=true
SUMMARIZATION_MODEL_NAME=vertex_ai/gemini-2.5-flash
SUMMARIZATION_MODEL_PROVIDER=vertex_ai
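The configuration above amounts to a per-task lookup table. This sketch (hypothetical helper names, not the server's actual routing code) shows the idea: cheap tasks go to Gemini Flash, everything else to the main chat model.

```python
# Hypothetical task-to-model routing matching the config above.
MODEL_BY_TASK = {
    "chat": "vertex_ai/claude-sonnet-4-5@20250929",
    "summarization": "vertex_ai/gemini-2.5-flash",
}

def model_for(task: str) -> str:
    """Route known tasks to their model; default to the main chat model."""
    return MODEL_BY_TASK.get(task, MODEL_BY_TASK["chat"])

print(model_for("summarization"))  # → vertex_ai/gemini-2.5-flash
print(model_for("code-review"))    # unknown task falls back to the chat model
```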

Troubleshooting

Error: “Permission denied”

Problem: The service account lacks Vertex AI permissions.

Solution:
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:YOUR-SA@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"

Error: “Model not found”

Problem: The model is not available in your region, or the model name is incorrect.

Solution:
  1. Verify model name format: vertex_ai/claude-sonnet-4-5@20250929
  2. Check model availability in your region
  3. Try different region: VERTEX_LOCATION=us-east4
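A quick format check can catch malformed names before a request is made. The patterns below reflect the naming used in this guide (Claude models carry an @YYYYMMDD version suffix, Gemini models do not); they are a heuristic, not an authoritative validator.

```python
import re

# Heuristic check for the model-name formats used in this guide.
CLAUDE_RE = re.compile(r"^vertex_ai/claude-[a-z0-9-]+@\d{8}$")
GEMINI_RE = re.compile(r"^vertex_ai/gemini-[a-z0-9.-]+$")

def looks_valid(name: str) -> bool:
    return bool(CLAUDE_RE.match(name) or GEMINI_RE.match(name))

print(looks_valid("vertex_ai/claude-sonnet-4-5@20250929"))  # → True
print(looks_valid("claude-sonnet-4-5"))                     # → False (missing prefix)
```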

Error: “Workload Identity not working”

Problem: The Kubernetes SA is not properly linked to the GCP SA.

Solution:
# Verify annotation
kubectl get serviceaccount mcp-server -o yaml

# Check IAM binding
gcloud iam service-accounts get-iam-policy \
  vertex-ai-user@PROJECT_ID.iam.gserviceaccount.com

# Test from pod
kubectl run -it test --image=google/cloud-sdk:slim \
  --serviceaccount=mcp-server \
  --rm -- gcloud auth list

Error: “Quota exceeded”

Problem: Exceeded Vertex AI quota limits.

Solution:
  1. Check quotas: GCP Console > IAM & Admin > Quotas
  2. Request quota increase for Vertex AI
  3. Use fallback models: ENABLE_FALLBACK=true

Security Best Practices

1. Use Workload Identity (GKE)

  • Best: Workload Identity (keyless authentication)
  • ⚠️ Acceptable: Service Account Key (local dev only)
  • Avoid: Committing keys to git

2. Principle of Least Privilege

Grant minimum required permissions:
# Good: Specific Vertex AI role
--role="roles/aiplatform.user"

# Bad: Overly broad permissions
--role="roles/owner"

3. Audit Logging

Enable Cloud Audit Logs for Vertex AI in the GCP Console, and periodically review which roles are bound to your service account:
gcloud projects get-iam-policy PROJECT_ID \
  --flatten="bindings[].members" \
  --format="table(bindings.role)" \
  --filter="bindings.members:serviceAccount:vertex-ai-user@*"

4. Rotate Keys Regularly

For service account keys (local dev):
# List keys
gcloud iam service-accounts keys list \
  --iam-account=vertex-ai-dev@PROJECT_ID.iam.gserviceaccount.com

# Delete old keys (older than 90 days)
gcloud iam service-accounts keys delete KEY_ID \
  --iam-account=vertex-ai-dev@PROJECT_ID.iam.gserviceaccount.com
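The 90-day rule can be automated. This sketch assumes ISO 8601 creation timestamps such as the validAfterTime field returned by `gcloud iam service-accounts keys list --format=json`; treat the field name as an assumption and check your gcloud output.

```python
from datetime import datetime, timezone

MAX_AGE_DAYS = 90  # rotation policy from this guide

def stale_keys(keys, now):
    """keys: list of (key_id, iso_timestamp); return key_ids older than 90 days."""
    out = []
    for key_id, created in keys:
        age = now - datetime.fromisoformat(created.replace("Z", "+00:00"))
        if age.days > MAX_AGE_DAYS:
            out.append(key_id)
    return out

now = datetime(2025, 11, 1, tzinfo=timezone.utc)
keys = [("old-key", "2025-06-01T00:00:00Z"), ("new-key", "2025-10-15T00:00:00Z")]
print(stale_keys(keys, now))  # → ['old-key']
```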

Monitoring & Observability

View Vertex AI Metrics

# View API requests
gcloud logging read "resource.type=aiplatform.googleapis.com/Endpoint" \
  --limit=10 \
  --format=json

# View billing account details (for itemized costs, use the Billing console or a BigQuery billing export)
gcloud billing accounts describe ACCOUNT_ID

Enable LangSmith Tracing

LANGSMITH_TRACING=true
LANGSMITH_API_KEY=your-api-key
LANGSMITH_PROJECT=mcp-server-langgraph

Next Steps