Google Gemini 2.5 Setup Guide

Quick setup guide for using Google Gemini 2.5 models (default configuration).

Why Gemini 2.5?

The MCP Server with LangGraph defaults to Gemini 2.5 Flash for several reasons:
  • Latest Technology - Google’s newest model family (2025)
  • Fastest Performance - Sub-second response times
  • Cost-Efficient - More affordable than GPT-4o or Claude 3.5 Sonnet
  • Multimodal - Native support for text, images, video, audio
  • Large Context - 1M+ token context window
  • High Quality - Competitive with GPT-4o and Claude 3.5 Sonnet

Quick Start

1. Get API Key

  1. Visit: https://aistudio.google.com/apikey
  2. Sign in with your Google account
  3. Click “Get API key”
  4. Create a new API key or use existing one
  5. Copy the key (starts with AIza...)
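
To sanity-check the key before configuring the server, here is a minimal sketch assuming the langchain-google-genai package is installed (the server's stack is LangChain-based, but this exact snippet is illustrative, not project code):

# Minimal key check (assumes `pip install langchain-google-genai`)
import os
from langchain_google_genai import ChatGoogleGenerativeAI

os.environ["GOOGLE_API_KEY"] = "AIza...your-key-here"  # or export it in your shell

llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash")
print(llm.invoke("Reply with the single word: ok").content)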

2. Configure Environment

# Copy example environment file
cp .env.example .env

# Edit .env file
nano .env  # or vim .env, or use your favorite editor
Update these values:
# LLM Provider (already set to google by default)
LLM_PROVIDER=google

# Google API Key (paste your key here)
GOOGLE_API_KEY=AIza...your-key-here

# Model (already set to latest Gemini 2.5 Flash)
MODEL_NAME=gemini-2.5-flash

3. Test Connection

# Test Gemini connection
python examples/test_llm.py

# Expected output:
# Provider: google
# Model: gemini-2.5-flash
# ✓ LLM initialized successfully
# ✓ Response: 4

4. Run MCP Server

# Run StreamableHTTP server
python -m mcp_server_langgraph.mcp.server_streamable

# Or run stdio server
python -m mcp_server_langgraph.mcp.server_stdio
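
To exercise the StreamableHTTP server end to end, here is a short client sketch using the official MCP Python SDK; the endpoint URL and port are assumptions, so match them to your server's startup logs:

# Hypothetical client check (assumes `pip install mcp`; adjust the URL to your server)
import asyncio
from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

async def main():
    async with streamablehttp_client("http://localhost:8000/mcp") as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

asyncio.run(main())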

Gemini 2.5 Models

gemini-2.5-flash (Default)

  • Speed: Fastest Gemini model (sub-second responses)
  • Cost: Most cost-effective
  • Context: 1M+ tokens
  • Use Cases: Production applications, chatbots, real-time apps
  • Best For: 95% of use cases
MODEL_NAME=gemini-2.5-flash
MODEL_MAX_TOKENS=8192

gemini-2.5-pro (Most Capable)

  • Speed: Slower but more capable
  • Cost: Higher cost, premium quality
  • Context: 1M+ tokens
  • Use Cases: Complex reasoning, research, analysis
  • Best For: High-complexity tasks requiring deep reasoning
MODEL_NAME=gemini-2.5-pro
MODEL_MAX_TOKENS=8192

Model Comparison

Model              Speed  Cost   Quality  Context  Best For
Gemini 2.5 Flash   ⚡⚡⚡    💰     ⭐⭐⭐⭐     1M+      Production (Default)
Gemini 2.5 Pro     ⚡⚡     💰💰💰   ⭐⭐⭐⭐⭐    1M+      Complex reasoning
Claude 3.5 Sonnet  ⚡⚡     💰💰    ⭐⭐⭐⭐⭐    200K     Coding, analysis
GPT-4o             ⚡⚡     💰💰    ⭐⭐⭐⭐     128K     General purpose

Pricing (Approximate)

Gemini 2.5 Flash:
  • Input: $0.075 per 1M tokens
  • Output: $0.30 per 1M tokens
  • ~4x cheaper than GPT-4o
  • ~3x cheaper than Claude 3.5
Gemini 2.5 Pro:
  • Input: $1.25 per 1M tokens
  • Output: $5.00 per 1M tokens
  • Comparable to GPT-4o pricing
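
As a worked example with the approximate Flash prices above (the same estimates quoted here, not live prices), a quick cost calculator:

# Rough cost estimate in USD, using the approximate gemini-2.5-flash prices above
INPUT_PRICE, OUTPUT_PRICE = 0.075, 0.30  # per 1M tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000

# 10,000 requests at ~2,000 input / 500 output tokens each:
print(f"${estimate_cost(10_000 * 2_000, 10_000 * 500):.2f}")  # ≈ $3.00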

Fallback Configuration

The default configuration includes automatic fallback:
# Primary: Gemini 2.5 Flash
MODEL_NAME=gemini-2.5-flash

# Fallbacks (in order):
ENABLE_FALLBACK=true
FALLBACK_MODELS=["gemini-2.5-pro","claude-sonnet-4-5","gpt-5.1"]
Fallback triggers:
  • Rate limits
  • API errors
  • Timeouts
  • Model unavailability
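
The server wires this up internally, but the behavior can be approximated with LangChain's generic with_fallbacks(); a sketch assuming langchain-google-genai and langchain-anthropic are installed (not the project's actual fallback code):

# Fallback sketch using LangChain's with_fallbacks(); any exception from the
# primary model (rate limit, API error, timeout) triggers the next model in order.
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_anthropic import ChatAnthropic

primary = ChatGoogleGenerativeAI(model="gemini-2.5-flash")
llm = primary.with_fallbacks([
    ChatGoogleGenerativeAI(model="gemini-2.5-pro"),
    ChatAnthropic(model="claude-sonnet-4-5"),
])
response = llm.invoke("Hello")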

Advanced Configuration

Increase Context Window

# Use full 1M context (if needed)
MODEL_MAX_TOKENS=1000000

# Note: Very large context increases latency and cost

Adjust Temperature

# More creative (0.8-1.0)
MODEL_TEMPERATURE=0.9

# More deterministic (0.0-0.3)
MODEL_TEMPERATURE=0.2

# Default balanced (0.7)
MODEL_TEMPERATURE=0.7

Timeout Settings

# Default timeout (60 seconds)
MODEL_TIMEOUT=60

# Longer timeout for complex queries
MODEL_TIMEOUT=120
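
These env settings map onto ordinary model parameters. For reference, a sketch of the equivalent direct construction with langchain-google-genai; the parameter names are that library's, and the mapping to this project's env vars is an assumption:

# Direct equivalents of the env settings above (the server reads them from .env)
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",
    temperature=0.7,         # MODEL_TEMPERATURE
    max_output_tokens=8192,  # MODEL_MAX_TOKENS
    timeout=60,              # MODEL_TIMEOUT (seconds)
)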

Multimodal Capabilities

Gemini 2.5 natively supports:
  • Text - Natural language
  • Images - Image understanding
  • Video - Video analysis
  • Audio - Speech and audio processing
  • Code - Programming languages

Example: Image Analysis

from langchain_core.messages import HumanMessage

# `llm` is the chat model initialized by the server (see examples/test_llm.py)
messages = [
    HumanMessage(content=[
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
    ])
]

# Call from within an async function (or use llm.invoke for sync code)
response = await llm.ainvoke(messages)
print(response.content)

Rate Limits

Free Tier:
  • 15 requests per minute
  • 1 million tokens per day
  • 1,500 requests per day
Paid Tier:
  • 360 requests per minute
  • 4 million tokens per minute
  • No daily limits
Tip: Enable fallback models to handle rate limits gracefully.
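
Besides fallback, transient 429s can be retried with exponential backoff; a minimal sketch using the tenacity library (an assumption here; the server may already retry internally):

# Exponential backoff for transient rate-limit errors (assumes `pip install tenacity`)
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(5), wait=wait_exponential(multiplier=1, min=2, max=30))
def ask(llm, prompt: str) -> str:
    return llm.invoke(prompt).content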

Troubleshooting

API Key Not Working

# Verify key is set
echo $GOOGLE_API_KEY

# Test directly
curl -X POST "https://generativelanguage.googleapis.com/v1/models/gemini-2.5-flash:generateContent?key=$GOOGLE_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"contents":[{"parts":[{"text":"Hello"}]}]}'

Rate Limit Errors

# Enable fallback to handle rate limits
ENABLE_FALLBACK=true

# Or upgrade to paid tier
# Visit: https://console.cloud.google.com/billing

Model Not Found

# Verify model name is correct
MODEL_NAME=gemini-2.5-flash  # ✅ Correct

# Common mistakes:
# MODEL_NAME=gemini-flash-2.5     # ❌ Wrong order
# MODEL_NAME=gemini-flash         # ❌ Missing version

Slow Responses

# Use Flash model (default)
MODEL_NAME=gemini-2.5-flash

# Reduce max tokens
MODEL_MAX_TOKENS=4096

# Increase timeout
MODEL_TIMEOUT=90

Monitoring

Gemini usage is automatically tracked:
# View traces in Jaeger
open http://localhost:16686

# View metrics in Prometheus
open http://localhost:9090

# Search for:
# - llm.invoke (successful calls)
# - llm.failed (failed calls)
# - llm.fallback (fallback usage)

Switching to Other Providers

Switch to Anthropic

export LLM_PROVIDER=anthropic
export ANTHROPIC_API_KEY=sk-ant-...
export MODEL_NAME=claude-sonnet-4-5

Switch to OpenAI

export LLM_PROVIDER=openai
export OPENAI_API_KEY=sk-...
export MODEL_NAME=gpt-5.1

Switch to Local (Ollama)

# Install Ollama first
curl -fsSL https://ollama.com/install.sh | sh

# Pull model
ollama pull llama3.1:8b

# Configure
export LLM_PROVIDER=ollama
export MODEL_NAME=ollama/llama3.1:8b

Resources

  • Gemini API Docs: https://ai.google.dev/gemini-api/docs
  • Google AI Studio: https://aistudio.google.com

Support

  • Google AI Forum: https://discuss.ai.google.dev
  • GitHub Issues: Report issues with the agent
  • Documentation: See integrations/litellm.md for all providers

Default Configuration Summary:
  • Provider: Google
  • Model: gemini-2.5-flash
  • Fallback: gemini-2.5-pro, claude-sonnet-4-5, gpt-5.1
  • Cost: ~75% cheaper than alternatives
  • Speed: Fastest available
You’re ready to use Gemini 2.5! 🚀