LiteLLM Integration Guide
Complete guide for using multiple LLM providers with the MCP Server with LangGraph.
Table of Contents
- Overview
- Supported Providers
- Configuration
- Provider Setup
- Model Examples
- Fallback Strategy
- Best Practices
- Testing Different Providers
- Monitoring
- Troubleshooting
Overview
The MCP Server with LangGraph uses LiteLLM to support 100+ LLM providers with a unified interface. This allows you to:
- ✅ Switch between providers without code changes
- ✅ Use open-source models (Llama, Qwen, Mistral, etc.)
- ✅ Implement automatic fallback between models
- ✅ Optimize costs by provider/model selection
- ✅ Test locally with Ollama before deploying
Supported Providers
Cloud Providers
| Provider | Models | Configuration Required |
|---|---|---|
| Anthropic | Claude Sonnet 4.5, Claude Opus 4.1, Claude Haiku 4.5 | ANTHROPIC_API_KEY |
| OpenAI | GPT-5, GPT-5 Pro, GPT-5 Mini, GPT-5 Nano | OPENAI_API_KEY |
| Google | Gemini 2.5 Flash, Gemini 2.5 Pro, Gemini 2.0 Pro | GOOGLE_API_KEY |
| Azure OpenAI | GPT-4, GPT-3.5 | AZURE_API_KEY, AZURE_API_BASE |
| AWS Bedrock | Claude, Llama, Titan | AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY |
Open-Source (Ollama)
| Model Family | Models | Local Setup |
|---|---|---|
| Llama | Llama 3.1, Llama 2 (7B-70B) | Install Ollama |
| Qwen | Qwen 2.5 (0.5B-72B) | Install Ollama |
| Mistral | Mistral 7B, Mixtral 8x7B | Install Ollama |
| DeepSeek | DeepSeek Coder, DeepSeek LLM | Install Ollama |
| Others | Phi-3, Gemma, Yi, etc. | Install Ollama |
Configuration
Environment Variables
Create or update .env:
# Choose your primary provider (default: google)
LLM_PROVIDER=google # google, anthropic, openai, azure, bedrock, ollama
# Model name (provider-specific format)
# Default: Gemini 2.5 Flash (latest, fastest)
MODEL_NAME=gemini-2.5-flash
# Model parameters
MODEL_TEMPERATURE=0.7
MODEL_MAX_TOKENS=8192
MODEL_TIMEOUT=60
# Fallback configuration
ENABLE_FALLBACK=true
FALLBACK_MODELS=["gemini-2.5-flash", "claude-sonnet-4-5", "gpt-5.1"]
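If these settings are loaded in Python, note that FALLBACK_MODELS is a JSON array and needs json.loads rather than a comma split. A minimal sketch, assuming a helper of your own (the name load_llm_settings and the fallback defaults shown are illustrative, not the server's actual API):

```python
import json
import os

def load_llm_settings(env=os.environ):
    """Read LLM configuration from environment variables,
    using the documented defaults (google / gemini-2.5-flash)."""
    return {
        "provider": env.get("LLM_PROVIDER", "google"),
        "model": env.get("MODEL_NAME", "gemini-2.5-flash"),
        "temperature": float(env.get("MODEL_TEMPERATURE", "0.7")),
        "max_tokens": int(env.get("MODEL_MAX_TOKENS", "8192")),
        "timeout": int(env.get("MODEL_TIMEOUT", "60")),
        "enable_fallback": env.get("ENABLE_FALLBACK", "true").lower() == "true",
        # FALLBACK_MODELS is a JSON array, so parse it instead of splitting on commas
        "fallbacks": json.loads(env.get("FALLBACK_MODELS", "[]")),
    }
```

Passing a plain dict as `env` makes the helper easy to unit-test without touching the real environment.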
API Keys
# Google Gemini (Primary - Get from: https://aistudio.google.com/apikey)
GOOGLE_API_KEY=...
# Anthropic (Fallback)
ANTHROPIC_API_KEY=sk-ant-...
# OpenAI (Fallback)
OPENAI_API_KEY=sk-...
# Azure OpenAI
AZURE_API_KEY=...
AZURE_API_BASE=https://your-resource.openai.azure.com
AZURE_API_VERSION=2024-02-15-preview
AZURE_DEPLOYMENT_NAME=gpt-4
# AWS Bedrock
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
AWS_REGION=us-east-1
# Ollama (local)
OLLAMA_BASE_URL=http://localhost:11434
Provider Setup
1. Google Gemini (Default - Recommended)
# Get API key from https://aistudio.google.com/apikey
# Configure
export LLM_PROVIDER=google
export GOOGLE_API_KEY=...
export MODEL_NAME=gemini-2.5-flash
# Production-grade Gemini models (officially supported):
# - gemini-2.5-flash (fast, efficient, production-ready - RECOMMENDED)
# - gemini-2.5-pro (most capable for complex reasoning, production-ready)
#
# Note: Only these two models are production-grade. Other Gemini variants
# may be experimental or preview releases not suitable for production use.
2. Anthropic (Claude)
# Get API key from https://console.anthropic.com/
# Configure
export LLM_PROVIDER=anthropic
export ANTHROPIC_API_KEY=sk-ant-...
export MODEL_NAME=claude-sonnet-4-5
# Available models:
# - claude-sonnet-4-5 (excellent all-around)
# - claude-opus-4-1 (most capable, extended reasoning)
# - claude-haiku-4-5 (fastest, cost-effective)
3. OpenAI
# Get API key from https://platform.openai.com/
# Configure
export LLM_PROVIDER=openai
export OPENAI_API_KEY=sk-...
export MODEL_NAME=gpt-5.1
# Available models:
# - gpt-5.1 (flagship)
# - gpt-5.1-pro (most capable)
# - gpt-5-mini (fast, cost-effective)
# - gpt-5.1-nano (smallest, fastest)
4. Azure OpenAI
# Deploy model in Azure Portal
# Configure
export LLM_PROVIDER=azure
export AZURE_API_KEY=...
export AZURE_API_BASE=https://your-resource.openai.azure.com
export AZURE_DEPLOYMENT_NAME=gpt-4
export MODEL_NAME=azure/gpt-4
# Model format: azure/<deployment-name>
5. AWS Bedrock
# Configure AWS credentials
# Configure
export LLM_PROVIDER=bedrock
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export AWS_REGION=us-east-1
export MODEL_NAME=anthropic.claude-sonnet-4-5-20250929-v2:0
# Available models:
# - anthropic.claude-sonnet-4-5-20250929-v2:0
# - anthropic.claude-opus-4-5-20251101-v1:0
# - meta.llama3-1-70b-instruct-v1:0
# - amazon.titan-text-premier-v1:0
6. Ollama (Local/Open-Source)
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull models
ollama pull llama3.1:8b
ollama pull qwen2.5:7b
ollama pull mistral:7b
ollama pull deepseek-coder:6.7b
# Configure
export LLM_PROVIDER=ollama
export OLLAMA_BASE_URL=http://localhost:11434
export MODEL_NAME=ollama/llama3.1:8b
# Model format: ollama/<model-name>:<tag>
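LiteLLM routes requests based on the model string itself, which is why the azure/ and ollama/ prefixes matter. A hypothetical helper applying the prefix conventions shown in this section (illustrative only, not part of the server's code):

```python
# Prefix conventions for LiteLLM model strings, by provider.
# Anthropic, OpenAI, and Google model names pass through unprefixed here.
PREFIXES = {"azure": "azure/", "ollama": "ollama/"}

def litellm_model_string(provider: str, model: str) -> str:
    """Return the model string in the form LiteLLM expects for `provider`."""
    prefix = PREFIXES.get(provider, "")
    if prefix and not model.startswith(prefix):
        return prefix + model
    return model
```

The idempotence check (`startswith`) means already-prefixed names from .env are left untouched.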
Model Examples
Anthropic Models
# Claude Sonnet 4.5 (Best overall, 200K context)
MODEL_NAME=claude-sonnet-4-5
# Claude Opus 4.1 (Most capable, 200K context with extended reasoning)
MODEL_NAME=claude-opus-4-1
# Claude Haiku 4.5 (Fastest, 200K context, cost-effective)
MODEL_NAME=claude-haiku-4-5
OpenAI Models
# GPT-5.1 (Flagship, 128K context)
MODEL_NAME=gpt-5.1
# GPT-5.1 Pro (Most capable, 128K context)
MODEL_NAME=gpt-5.1-pro
# GPT-5 Mini (Fast and cost-effective, 128K context)
MODEL_NAME=gpt-5-mini
# GPT-5.1 Nano (Smallest, fastest, 128K context)
MODEL_NAME=gpt-5.1-nano
Google Gemini Models (Default/Recommended)
# Gemini 2.5 Flash (Production-grade: fast, efficient - RECOMMENDED)
MODEL_NAME=gemini-2.5-flash
# Gemini 2.5 Pro (Production-grade: most capable for complex tasks)
MODEL_NAME=gemini-2.5-pro
Ollama (Open-Source)
# Llama 3.1 (Meta's latest)
MODEL_NAME=ollama/llama3.1:8b # 8B parameters
MODEL_NAME=ollama/llama3.1:70b # 70B parameters
# Qwen 2.5 (Alibaba, multilingual)
MODEL_NAME=ollama/qwen2.5:7b # 7B parameters
MODEL_NAME=ollama/qwen2.5:32b # 32B parameters
# Mistral (Open, efficient)
MODEL_NAME=ollama/mistral:7b # 7B base
MODEL_NAME=ollama/mixtral:8x7b # 8x7B MoE
# DeepSeek Coder (Code specialist)
MODEL_NAME=ollama/deepseek-coder:6.7b # Code generation
# Phi-3 (Microsoft, small but capable)
MODEL_NAME=ollama/phi3:mini # 3.8B parameters
MODEL_NAME=ollama/phi3:medium # 14B parameters
Fallback Strategy
The agent automatically falls back to alternative models if the primary fails:
# Configure fallback models
ENABLE_FALLBACK=true
FALLBACK_MODELS=["gpt-5.1", "gemini-2.5-flash", "claude-sonnet-4-5"]
Fallback Order Example
# Primary: Claude Sonnet 4.5
LLM_PROVIDER=anthropic
MODEL_NAME=claude-sonnet-4-5
# Fallbacks (in order):
FALLBACK_MODELS=[
"gpt-5.1", # Try OpenAI GPT-5
"gemini-2.5-pro", # Try Google Gemini
"ollama/llama3.1:8b" # Try local Llama
]
Fallback Behavior
- Primary model fails → Try first fallback
- First fallback fails → Try second fallback
- All fallbacks fail → Return error
Fallback triggers on:
- API rate limits
- Model unavailability
- Network errors
- Timeout errors
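The behavior above amounts to a try-in-order loop. A minimal sketch with a stubbed `invoke` callable (the real server delegates to LiteLLM and would catch its provider-specific exception types rather than bare Exception):

```python
def invoke_with_fallback(invoke, primary, fallbacks):
    """Try the primary model, then each fallback in order.

    `invoke` is any callable(model) that raises on rate limits,
    model unavailability, network errors, or timeouts.
    Returns (model_used, result); raises if every model fails.
    """
    errors = {}
    for model in [primary, *fallbacks]:
        try:
            return model, invoke(model)
        except Exception as exc:  # narrow this to LiteLLM's error types in practice
            errors[model] = exc
    raise RuntimeError(f"All models failed: {list(errors)}")
```

Recording per-model errors (rather than only the last one) makes the "all fallbacks fail" case much easier to debug.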
Best Practices
1. Cost Optimization
# Development: Use cheaper/local models
LLM_PROVIDER=ollama
MODEL_NAME=ollama/llama3.1:8b
# Staging: Use fast, cost-effective models
LLM_PROVIDER=openai
MODEL_NAME=gpt-5.1-nano
# Production: Use best models with fallback
LLM_PROVIDER=anthropic
MODEL_NAME=claude-sonnet-4-5
FALLBACK_MODELS=["gpt-5.1", "gemini-2.5-flash"]
2. Latency Optimization
Fastest models:
# Cloud (sub-second)
- claude-haiku-4-5
- gpt-5.1-nano
- gpt-5-mini
- gemini-2.5-flash
# Local (depends on hardware)
- ollama/phi3:mini
- ollama/llama3.1:8b
- ollama/mistral:7b
3. Context Length
Large context needs:
# 1M+ tokens
- gemini-2.5-pro (2M)
- gemini-2.5-flash (1M)
# 200K tokens
- claude-sonnet-4-5 (200K)
- claude-opus-4-1 (200K)
# 128K tokens
- gpt-5.1 (128K)
- gpt-5.1-pro (128K)
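Model choice by context need can be automated from these figures. A small illustrative helper (the CONTEXT_WINDOWS table simply restates the approximate windows listed above; it is not data from LiteLLM):

```python
# Approximate context windows (tokens), as listed in this section.
CONTEXT_WINDOWS = {
    "gemini-2.5-pro": 2_000_000,
    "gemini-2.5-flash": 1_000_000,
    "claude-sonnet-4-5": 200_000,
    "claude-opus-4-1": 200_000,
    "gpt-5.1": 128_000,
    "gpt-5.1-pro": 128_000,
}

def models_for_context(required_tokens: int):
    """Return models whose window fits the prompt, smallest window first,
    so the tightest (typically cheapest) fit comes before the 1M+ models."""
    fits = [(window, model) for model, window in CONTEXT_WINDOWS.items()
            if window >= required_tokens]
    return [model for _, model in sorted(fits)]
```

Sorting smallest-window-first keeps the large-context Gemini models as a last resort rather than the default.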
4. Multilingual Support
Best for non-English:
- qwen2.5:7b (70+ languages)
- gemini-2.5-pro (100+ languages)
- claude-sonnet-4-5 (excellent multilingual)
5. Code Generation
Best for coding:
- deepseek-coder:6.7b (specialized)
- claude-sonnet-4-5 (excellent)
- gpt-5.1 (very good)
Testing Different Providers
Quick Test Script
# Test Anthropic
export LLM_PROVIDER=anthropic MODEL_NAME=claude-sonnet-4-5
python examples/test_llm.py
# Test OpenAI
export LLM_PROVIDER=openai MODEL_NAME=gpt-5.1
python examples/test_llm.py
# Test Google
export LLM_PROVIDER=google MODEL_NAME=gemini-2.5-pro
python examples/test_llm.py
# Test Ollama
export LLM_PROVIDER=ollama MODEL_NAME=ollama/llama3.1:8b
python examples/test_llm.py
Test with MCP Server
# Update .env with desired provider
vim .env
# Run MCP server
python -m mcp_server_langgraph.mcp.server_streamable
# Test with example client
python examples/streamable_http_client.py
Monitoring
LiteLLM usage is automatically tracked with OpenTelemetry:
# Metrics collected:
- llm.invoke (successful calls by model)
- llm.fallback (fallback usage by model)
- llm.failed (failed calls by model)
# Traces include:
- Provider name
- Model name
- Token usage
- Latency
- Error details
View in Jaeger: http://localhost:16686
Troubleshooting
API Key Not Working
# Verify key is set
echo $ANTHROPIC_API_KEY
# Test key directly with a minimal request (model and prompt are illustrative)
curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model": "claude-sonnet-4-5", "max_tokens": 16, "messages": [{"role": "user", "content": "ping"}]}'
Ollama Connection Failed
# Check Ollama is running
ollama serve
# Test connection
curl http://localhost:11434/api/tags
# Verify model is pulled
ollama list
Model Not Found
# LiteLLM uses specific formats:
- ✅ claude-sonnet-4-5 / ❌ claude-3.5-sonnet
- ✅ ollama/llama3.1:8b / ❌ llama3.1
- ✅ azure/gpt-4 / ❌ gpt-4 (when using Azure)
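A pre-flight check along these lines can catch the common mis-formats before a request is sent (illustrative only; LiteLLM performs its own validation and is authoritative):

```python
def check_model_format(provider: str, model: str):
    """Return a list of problems with a model string for the given provider."""
    problems = []
    if provider == "azure" and not model.startswith("azure/"):
        problems.append("Azure models must use the azure/<deployment-name> form")
    if provider == "ollama":
        if not model.startswith("ollama/"):
            problems.append("Ollama models must use the ollama/<model>:<tag> form")
        elif ":" not in model:
            problems.append("Ollama models should include a tag, e.g. ollama/llama3.1:8b")
    return problems  # empty list means no known issues
```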
Resources
Support
For LiteLLM-specific issues, consult the LiteLLM documentation and its GitHub issue tracker.
Last Updated: 2025-01-10