Overview
This guide covers setting up Vertex AI with Workload Identity Federation on GKE for secure, keyless authentication. This approach eliminates the need for service account keys and provides automatic credential rotation.

Workload Identity is the recommended way to access Google Cloud services from GKE. It provides better security than service account keys and simplifies credential management.
Architecture
Workload Identity Integration Flow
This diagram shows the complete authentication flow from a GKE pod to GCP services using Workload Identity Federation, eliminating the need for credential files.

Key benefits of Workload Identity:
- No credential files: Kubernetes tokens are automatically exchanged for GCP credentials
- Automatic rotation: GCP handles credential rotation without manual intervention
- Fine-grained access: Each Kubernetes service account maps to a specific GCP service account with minimal permissions
- Audit trail: All GCP API calls are attributed to the specific service account identity
Prerequisites
Before starting, ensure you have:
- ✅ GKE cluster with Workload Identity enabled
- ✅ Staging infrastructure deployed (run setup-staging-infrastructure.sh first)
- ✅ gcloud CLI installed and authenticated
- ✅ kubectl installed and configured
- ✅ Billing enabled on your GCP project
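To confirm the first prerequisite, you can check whether the cluster's workload pool is set (CLUSTER_NAME and REGION are placeholders to substitute):

```bash
# Prints PROJECT_ID.svc.id.goog when Workload Identity is enabled; empty otherwise
gcloud container clusters describe CLUSTER_NAME \
  --region REGION \
  --format="value(workloadIdentityConfig.workloadPool)"
```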
Quick Setup
Automated Setup (Recommended)
The easiest way to set up Vertex AI with Workload Identity is the automated setup script, which performs the following steps:
- ✅ Enable Vertex AI API
- ✅ Create vertex-ai-staging service account
- ✅ Grant necessary IAM permissions
- ✅ Bind Kubernetes SA to GCP SA
- ✅ Annotate Kubernetes service account
- ✅ Verify configuration
Manual Setup Steps (Advanced)
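The manual path mirrors what the setup script does. A sketch of the equivalent gcloud/kubectl commands, with PROJECT_ID, NAMESPACE, and the Kubernetes service account name (KSA_NAME) as placeholders you must substitute:

```bash
# 1. Enable the Vertex AI API
gcloud services enable aiplatform.googleapis.com --project=PROJECT_ID

# 2. Create the GCP service account
gcloud iam service-accounts create vertex-ai-staging --project=PROJECT_ID

# 3. Grant the required IAM roles (repeat for each role)
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:vertex-ai-staging@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"

# 4. Allow the Kubernetes SA to impersonate the GCP SA
gcloud iam service-accounts add-iam-policy-binding \
  vertex-ai-staging@PROJECT_ID.iam.gserviceaccount.com \
  --role="roles/iam.workloadIdentityUser" \
  --member="serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME]"

# 5. Annotate the Kubernetes service account
kubectl annotate serviceaccount KSA_NAME -n NAMESPACE \
  iam.gke.io/gcp-service-account=vertex-ai-staging@PROJECT_ID.iam.gserviceaccount.com
```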
Configuration
Environment Variables
The deployment is configured with the following environment variables in deployments/overlays/staging-gke/deployment-patch.yaml:
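The exact variable names live in that file; a representative fragment might look like the following (container and variable names here are illustrative, not taken from the repo):

```yaml
spec:
  template:
    spec:
      containers:
        - name: app                      # illustrative container name
          env:
            - name: GOOGLE_CLOUD_PROJECT # illustrative variable name
              value: "your-project-id"
            - name: VERTEXAI_LOCATION    # illustrative variable name
              value: "us-central1"
```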
Service Account Annotation
The Kubernetes service account is annotated to use Workload Identity:

Deployment
Deploy to Staging
After running the setup script, deploy the updated configuration:

Verification
1. Verify Workload Identity Binding
2. Verify Kubernetes Annotation
3. Test Authentication from Pod
4. Test Vertex AI Access
5. Test with LiteLLM
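The five verification steps above could be carried out roughly as follows. Service account, namespace, and pod names are placeholders, and the LiteLLM test assumes the proxy is reachable at localhost:4000:

```bash
# 1. Verify the Workload Identity binding on the GCP service account
gcloud iam service-accounts get-iam-policy \
  vertex-ai-staging@PROJECT_ID.iam.gserviceaccount.com

# 2. Verify the Kubernetes annotation
kubectl get serviceaccount KSA_NAME -n NAMESPACE \
  -o jsonpath='{.metadata.annotations.iam\.gke\.io/gcp-service-account}'

# 3. Test authentication from inside a pod (should print the GCP SA email)
kubectl exec -it POD_NAME -n NAMESPACE -- \
  curl -s -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/email"

# 4. Test Vertex AI access directly (generateContent on a publisher model)
curl -s -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/gemini-2.5-flash:generateContent" \
  -d '{"contents":[{"role":"user","parts":[{"text":"ping"}]}]}'

# 5. Test through the LiteLLM proxy
curl -s http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"vertex_ai/gemini-2.5-flash","messages":[{"role":"user","content":"ping"}]}'
```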
Troubleshooting
Pod cannot authenticate
Error:
google.auth.exceptions.DefaultCredentialsError: Could not automatically determine credentials

Solutions:
- Verify Workload Identity is enabled on cluster:
- Check service account annotation:
- Verify IAM binding:
- Restart pods to pick up new annotation:
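Concretely, those four checks might look like this (cluster, namespace, and deployment names are placeholders):

```bash
# Workload Identity enabled on the cluster?
gcloud container clusters describe CLUSTER_NAME --region REGION \
  --format="value(workloadIdentityConfig.workloadPool)"

# Annotation present on the Kubernetes service account?
kubectl describe serviceaccount KSA_NAME -n NAMESPACE | grep iam.gke.io

# IAM binding in place on the GCP service account?
gcloud iam service-accounts get-iam-policy \
  vertex-ai-staging@PROJECT_ID.iam.gserviceaccount.com

# Restart pods so they pick up the new annotation
kubectl rollout restart deployment/DEPLOYMENT_NAME -n NAMESPACE
```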
Permission denied errors
Error:
403 Permission denied on resource project

Solutions:
- Verify service account has required roles:
- Grant missing permissions:
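A sketch of both checks with gcloud (PROJECT_ID is a placeholder):

```bash
# List roles currently granted to the service account
gcloud projects get-iam-policy PROJECT_ID \
  --flatten="bindings[].members" \
  --filter="bindings.members:vertex-ai-staging@PROJECT_ID.iam.gserviceaccount.com" \
  --format="value(bindings.role)"

# Grant a missing role, e.g. Vertex AI user
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:vertex-ai-staging@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"
```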
Model not found errors
Error:
404 The model requested does not exist

Solutions:
- Verify model name format for Vertex AI:
- Check available models:
- Use supported model names:
  - gemini-2.5-flash
  - gemini-2.5-pro
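Vertex AI addresses Gemini as a publisher-model resource, so a 404 often means the name or region is wrong. One way to probe availability directly (project and region are placeholders):

```bash
# Vertex AI resource name format:
#   projects/PROJECT_ID/locations/REGION/publishers/google/models/gemini-2.5-flash
# A 404 from this call means the model name/region combination is not available.
curl -s -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/gemini-2.5-flash:generateContent" \
  -d '{"contents":[{"role":"user","parts":[{"text":"ping"}]}]}'
```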
Workload Identity not working
Error: Pod uses default compute service account instead of vertex-ai-staging

Solutions:
- Ensure pod spec uses correct service account:
- Check if annotation was applied before pod creation:
- Verify namespace has Workload Identity enabled:
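A pod spec fragment corresponding to the first point above (the service account name is a placeholder):

```yaml
# deployment-patch.yaml fragment: the pod must run as the annotated KSA,
# otherwise it falls back to the default compute service account.
spec:
  template:
    spec:
      serviceAccountName: KSA_NAME  # must match the annotated Kubernetes SA
```

Note that the annotation is only honored for pods created after it was applied, so restart or recreate pods after changing either the annotation or the spec.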
Security Considerations
Benefits of Workload Identity
No Key Management
No service account keys to create, rotate, or secure. Authentication is handled automatically by GKE.
Automatic Rotation
Credentials are automatically rotated by Google Cloud. No manual intervention required.
Least Privilege
Each pod gets only the permissions it needs via IAM bindings. No shared credentials.
Audit Trail
All API calls are logged with the service account identity. Easy to audit and monitor.
IAM Permissions
The vertex-ai-staging service account has been granted these roles:

| Role | Purpose | Permissions |
|---|---|---|
| roles/aiplatform.user | Vertex AI API access | Call Vertex AI APIs, use models |
| roles/aiplatform.developer | Model management | Deploy models, manage endpoints |
| roles/logging.logWriter | Cloud Logging | Write logs to Cloud Logging |
| roles/monitoring.metricWriter | Monitoring | Write custom metrics |
Best Practices
- ✅ Do:
- Use Workload Identity for all GCP service access
- Grant minimum required permissions
- Monitor API usage and costs
- Set up quota alerts
- ❌ Don’t:
- Create service account keys
- Grant overly broad permissions
- Share service accounts across environments
- Ignore quota warnings
Cost Management
Vertex AI Pricing
Vertex AI charges by character count (roughly equivalent to tokens):

| Model | Input (per 1M chars) | Output (per 1M chars) | Status |
|---|---|---|---|
| gemini-2.5-flash | $0.075 | $0.30 | ✅ Production-ready |
| gemini-2.5-pro | $0.625 | $5.00 | ✅ Production-ready |
gemini-2.5-flash and gemini-2.5-pro are production-grade models recommended for enterprise deployments. They offer stable performance, SLA guarantees, and are suitable for production workloads. Other Gemini 2.5 variants (if any) may be experimental or preview releases.
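As a back-of-the-envelope check using the table above: a request with 200,000 input characters and 50,000 output characters on gemini-2.5-flash would cost roughly:

```bash
# (200k / 1M) * $0.075 + (50k / 1M) * $0.30 = $0.015 + $0.015
awk 'BEGIN { printf "$%.4f\n", 200000/1e6*0.075 + 50000/1e6*0.30 }'
# → $0.0300
```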
Setting Quotas
Monitoring Costs
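Quotas are typically adjusted in the Cloud Console (IAM & Admin → Quotas). For cost monitoring, one guardrail is a billing budget with alert thresholds; a sketch, with the billing account ID as a placeholder:

```bash
# Create a $100/month budget with default threshold alerts
gcloud billing budgets create \
  --billing-account=BILLING_ACCOUNT_ID \
  --display-name="vertex-ai-staging-budget" \
  --budget-amount=100USD
```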
Migration from Google AI Studio
If you’re currently using Google AI Studio API keys, here’s how to migrate:

1. Update Environment Variables
Change from API key (Google AI Studio) to Vertex AI configuration:
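A sketch of the change, assuming LiteLLM-style variable names (the exact names in your deployment may differ):

```bash
# Before (Google AI Studio): a static API key
export GEMINI_API_KEY="your-api-key"

# After (Vertex AI): no key; project and region, with credentials
# supplied automatically via Workload Identity
export VERTEXAI_PROJECT="your-project-id"
export VERTEXAI_LOCATION="us-central1"
```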
2. Run Setup Script
3. Deploy Updated Configuration
4. Verify Migration
Next Steps
Google Gemini Guide
Learn about Gemini model features and capabilities
GKE Staging Deployment
Complete GKE staging deployment guide
Observability
Monitor Vertex AI usage and performance
Production Checklist
Prepare for production deployment
Vertex AI with Workload Identity Configured! Your staging environment now uses keyless authentication for secure, scalable AI access.