
LiteLLM Integration Guide

A complete guide to using multiple LLM providers with the MCP Server with LangGraph.


Overview

The MCP Server with LangGraph uses LiteLLM to support 100+ LLM providers through a unified interface (see the sketch after this list). This allows you to:
  • ✅ Switch between providers without code changes
  • ✅ Use open-source models (Llama, Qwen, Mistral, etc.)
  • ✅ Implement automatic fallback between models
  • ✅ Optimize costs by provider/model selection
  • ✅ Test locally with Ollama before deploying
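
Because every provider sits behind the same litellm.completion() call, switching is just a different model string. A minimal sketch (it assumes the relevant API keys are exported and uses this guide's model names):

import litellm

# Same call, different providers - only the model string changes.
for model in ("gemini/gemini-2.5-flash", "anthropic/claude-sonnet-4-5"):
    response = litellm.completion(
        model=model,
        messages=[{"role": "user", "content": "Say hello in one word."}],
    )
    print(model, "->", response.choices[0].message.content)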

Supported Providers

Cloud Providers

| Provider     | Models                                                | Configuration Required                    |
|--------------|-------------------------------------------------------|-------------------------------------------|
| Anthropic    | Claude Sonnet 4.5, Claude Opus 4.1, Claude Haiku 4.5  | ANTHROPIC_API_KEY                          |
| OpenAI       | GPT-5, GPT-5 Pro, GPT-5 Mini, GPT-5 Nano              | OPENAI_API_KEY                             |
| Google       | Gemini 2.5 Flash, Gemini 2.5 Pro, Gemini 2.0 Pro      | GOOGLE_API_KEY                             |
| Azure OpenAI | GPT-4, GPT-3.5                                        | AZURE_API_KEY, AZURE_API_BASE              |
| AWS Bedrock  | Claude, Llama, Titan                                  | AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY   |

Open-Source (Ollama)

| Model Family | Models                        | Local Setup    |
|--------------|-------------------------------|----------------|
| Llama        | Llama 3.1, Llama 2 (7B-70B)   | Install Ollama |
| Qwen         | Qwen 2.5 (0.5B-72B)           | Install Ollama |
| Mistral      | Mistral 7B, Mixtral 8x7B      | Install Ollama |
| DeepSeek     | DeepSeek Coder, DeepSeek LLM  | Install Ollama |
| Others       | Phi-3, Gemma, Yi, etc.        | Install Ollama |

Configuration

Environment Variables

Create or update .env:
# Choose your primary provider (default: google)
LLM_PROVIDER=google  # google, anthropic, openai, azure, bedrock, ollama

# Model name (provider-specific format)
# Default: Gemini 2.5 Flash (latest, fastest)
MODEL_NAME=gemini-2.5-flash

# Model parameters
MODEL_TEMPERATURE=0.7
MODEL_MAX_TOKENS=8192
MODEL_TIMEOUT=60

# Fallback configuration
ENABLE_FALLBACK=true
FALLBACK_MODELS=["gemini-2.5-flash", "claude-sonnet-4-5", "gpt-5.1"]
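
How the server consumes these values is up to its config loader; a minimal sketch of reading them by hand (note that FALLBACK_MODELS is a JSON array):

import json
import os

provider = os.getenv("LLM_PROVIDER", "google")
model_name = os.getenv("MODEL_NAME", "gemini-2.5-flash")
temperature = float(os.getenv("MODEL_TEMPERATURE", "0.7"))
max_tokens = int(os.getenv("MODEL_MAX_TOKENS", "8192"))
fallback_models = json.loads(os.getenv("FALLBACK_MODELS", "[]"))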

API Keys

# Google Gemini (Primary - Get from: https://aistudio.google.com/apikey)
GOOGLE_API_KEY=...

# Anthropic (Fallback)
ANTHROPIC_API_KEY=sk-ant-...

# OpenAI (Fallback)
OPENAI_API_KEY=sk-...

# Azure OpenAI
AZURE_API_KEY=...
AZURE_API_BASE=https://your-resource.openai.azure.com
AZURE_API_VERSION=2024-02-15-preview
AZURE_DEPLOYMENT_NAME=gpt-4

# AWS Bedrock
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
AWS_REGION=us-east-1

# Ollama (local)
OLLAMA_BASE_URL=http://localhost:11434
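
Only the keys for providers you actually use need to be set. A quick sanity check (a hypothetical helper, not part of the repo) that reports which keys are present:

import os

# Report which provider credentials are configured in this shell.
for key in ("GOOGLE_API_KEY", "ANTHROPIC_API_KEY", "OPENAI_API_KEY", "AZURE_API_KEY"):
    print(f"{key}: {'set' if os.getenv(key) else 'missing'}")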

Provider Setup

1. Anthropic (Claude)

# Get API key from https://console.anthropic.com/

# Configure
export LLM_PROVIDER=anthropic
export ANTHROPIC_API_KEY=sk-ant-...
export MODEL_NAME=claude-sonnet-4-5

# Available models:
# - claude-sonnet-4-5 (excellent all-around)
# - claude-opus-4-1 (most capable, extended reasoning)
# - claude-haiku-4-5 (fastest, cost-effective)
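
With the key exported, a one-off call through LiteLLM looks like the sketch below (the explicit anthropic/ prefix is LiteLLM's unambiguous provider format; the model ID follows this guide's naming):

import litellm

response = litellm.completion(
    model="anthropic/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=64,
)
print(response.choices[0].message.content)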

2. OpenAI

# Get API key from https://platform.openai.com/

# Configure
export LLM_PROVIDER=openai
export OPENAI_API_KEY=sk-...
export MODEL_NAME=gpt-5.1

# Available models:
# - gpt-5.1 (flagship)
# - gpt-5.1-pro (most capable)
# - gpt-5-mini (fast, cost-effective)
# - gpt-5.1-nano (smallest, fastest)

3. Google Gemini (Default)

# Get API key from https://aistudio.google.com/apikey

# Configure
export LLM_PROVIDER=google
export GOOGLE_API_KEY=...
export MODEL_NAME=gemini-2.5-flash

# Production-grade Gemini models (officially supported):
# - gemini-2.5-flash (Fast, efficient, production-ready - RECOMMENDED)
# - gemini-2.5-pro (Most capable for complex reasoning, production-ready)
#
# Note: Only these two models are production-grade. Other Gemini variants
# may be experimental or preview releases not suitable for production use.

4. Azure OpenAI

# Deploy model in Azure Portal

# Configure
export LLM_PROVIDER=azure
export AZURE_API_KEY=...
export AZURE_API_BASE=https://your-resource.openai.azure.com
export AZURE_DEPLOYMENT_NAME=gpt-4
export MODEL_NAME=azure/gpt-4

# Model format: azure/<deployment-name>
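
Azure routes by deployment name rather than raw model name. A minimal sketch (endpoint details can also be picked up from the AZURE_API_* environment variables instead of being passed explicitly):

import os

import litellm

response = litellm.completion(
    model="azure/gpt-4",  # azure/<deployment-name>
    api_base=os.environ["AZURE_API_BASE"],
    api_version=os.environ["AZURE_API_VERSION"],
    api_key=os.environ["AZURE_API_KEY"],
    messages=[{"role": "user", "content": "Hello!"}],
)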

5. AWS Bedrock

# Configure AWS credentials

# Configure
export LLM_PROVIDER=bedrock
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export AWS_REGION=us-east-1
export MODEL_NAME=anthropic.claude-sonnet-4-5-20250929-v2:0

# Available models:
# - anthropic.claude-sonnet-4-5-20250929-v2:0
# - anthropic.claude-opus-4-5-20251101-v1:0
# - meta.llama3-1-70b-instruct-v1:0
# - amazon.titan-text-premier-v1:0
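
Bedrock model IDs take the bedrock/ prefix in LiteLLM, with credentials read from the AWS_* environment variables above. A minimal sketch using one of the IDs listed in this guide:

import litellm

response = litellm.completion(
    model="bedrock/anthropic.claude-sonnet-4-5-20250929-v2:0",
    messages=[{"role": "user", "content": "Hello!"}],
)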

6. Ollama (Local/Open-Source)

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull models
ollama pull llama3.1:8b
ollama pull qwen2.5:7b
ollama pull mistral:7b
ollama pull deepseek-coder:6.7b

# Configure
export LLM_PROVIDER=ollama
export OLLAMA_BASE_URL=http://localhost:11434
export MODEL_NAME=ollama/llama3.1:8b

# Model format: ollama/<model-name>:<tag>
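
No API key is needed locally; LiteLLM talks straight to the Ollama server. A minimal sketch:

import litellm

response = litellm.completion(
    model="ollama/llama3.1:8b",
    api_base="http://localhost:11434",  # matches OLLAMA_BASE_URL
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)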

Model Examples

Anthropic Models

# Claude Sonnet 4.5 (Best overall, 200K context)
MODEL_NAME=claude-sonnet-4-5

# Claude Opus 4.1 (Most capable, 200K context with extended reasoning)
MODEL_NAME=claude-opus-4-1

# Claude Haiku 4.5 (Fastest, 200K context, cost-effective)
MODEL_NAME=claude-haiku-4-5

OpenAI Models

# GPT-5.1 (Flagship, 128K context)
MODEL_NAME=gpt-5.1

# GPT-5.1 Pro (Most capable, 128K context)
MODEL_NAME=gpt-5.1-pro

# GPT-5 Mini (Fast and cost-effective, 128K context)
MODEL_NAME=gpt-5-mini

# GPT-5.1 Nano (Smallest, fastest, 128K context)
MODEL_NAME=gpt-5.1-nano

Google Gemini Models (Default/Recommended)

# Gemini 2.5 Flash (Production-grade: fast, efficient - RECOMMENDED)
MODEL_NAME=gemini-2.5-flash

# Gemini 2.5 Pro (Production-grade: most capable for complex tasks)
MODEL_NAME=gemini-2.5-pro

Ollama (Open-Source)

# Llama 3.1 (Meta's latest)
MODEL_NAME=ollama/llama3.1:8b          # 8B parameters
MODEL_NAME=ollama/llama3.1:70b         # 70B parameters

# Qwen 2.5 (Alibaba, multilingual)
MODEL_NAME=ollama/qwen2.5:7b           # 7B parameters
MODEL_NAME=ollama/qwen2.5:32b          # 32B parameters

# Mistral (Open, efficient)
MODEL_NAME=ollama/mistral:7b           # 7B base
MODEL_NAME=ollama/mixtral:8x7b         # 8x7B MoE

# DeepSeek Coder (Code specialist)
MODEL_NAME=ollama/deepseek-coder:6.7b  # Code generation

# Phi-3 (Microsoft, small but capable)
MODEL_NAME=ollama/phi3:mini            # 3.8B parameters
MODEL_NAME=ollama/phi3:medium          # 14B parameters

Fallback Strategy

The agent automatically falls back to alternative models if the primary fails:
# Configure fallback models
ENABLE_FALLBACK=true
FALLBACK_MODELS=["gpt-5.1", "gemini-2.5-flash", "claude-sonnet-4-5"]

Fallback Order Example

# Primary: Claude Sonnet 4.5
LLM_PROVIDER=anthropic
MODEL_NAME=claude-sonnet-4-5

# Fallbacks (in order):
FALLBACK_MODELS=[
    "gpt-5.1",              # Try OpenAI GPT-5.1
    "gemini-2.5-pro",       # Try Google Gemini
    "ollama/llama3.1:8b"    # Try local Llama
]

Fallback Behavior

  1. Primary model fails → Try first fallback
  2. First fallback fails → Try second fallback
  3. All fallbacks fail → Return error

Fallback triggers on (see the sketch after this list):
  • API rate limits
  • Model unavailability
  • Network errors
  • Timeout errors
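
A hand-rolled sketch of this loop (the server's actual implementation may differ, e.g. by using a LiteLLM Router):

import litellm

def complete_with_fallback(messages, primary, fallbacks):
    # Walk the primary model plus fallbacks in order; first success wins.
    for model in [primary, *fallbacks]:
        try:
            return litellm.completion(model=model, messages=messages, timeout=60)
        except Exception as exc:  # rate limit, outage, network, timeout
            print(f"{model} failed ({exc!r}); trying next model")
    raise RuntimeError("All configured models failed")

reply = complete_with_fallback(
    [{"role": "user", "content": "Hello!"}],
    primary="claude-sonnet-4-5",
    fallbacks=["gpt-5.1", "gemini-2.5-flash"],
)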

Best Practices

1. Cost Optimization

# Development: Use cheaper/local models
LLM_PROVIDER=ollama
MODEL_NAME=ollama/llama3.1:8b

# Staging: Use fast, cost-effective models
LLM_PROVIDER=openai
MODEL_NAME=gpt-5.1-nano

# Production: Use best models with fallback
LLM_PROVIDER=anthropic
MODEL_NAME=claude-sonnet-4-5
FALLBACK_MODELS=["gpt-5.1", "gemini-2.5-flash"]

2. Latency Optimization

Fastest models:
# Cloud (sub-second)
- claude-haiku-4-5
- gpt-5.1-nano
- gpt-5-mini
- gemini-2.5-flash

# Local (depends on hardware)
- ollama/phi3:mini
- ollama/llama3.1:8b
- ollama/mistral:7b

3. Context Length

Large context needs:
# 1M+ tokens
- gemini-2.5-pro (2M)
- gemini-2.5-flash (1M)

# 200K tokens
- claude-sonnet-4-5 (200K)
- claude-opus-4-1 (200K)

# 128K tokens
- gpt-5.1 (128K)
- gpt-5.1-pro (128K)

4. Multilingual Support

Best for non-English:
- qwen2.5:7b (70+ languages)
- gemini-2.5-pro (100+ languages)
- claude-sonnet-4-5 (excellent multilingual)

5. Code Generation

Best for coding:
- deepseek-coder:6.7b (specialized)
- claude-sonnet-4-5 (excellent)
- gpt-5.1 (very good)

Testing Different Providers

Quick Test Script

# Test Anthropic
export LLM_PROVIDER=anthropic MODEL_NAME=claude-sonnet-4-5
python examples/test_llm.py

# Test OpenAI
export LLM_PROVIDER=openai MODEL_NAME=gpt-5.1
python examples/test_llm.py

# Test Google
export LLM_PROVIDER=google MODEL_NAME=gemini-2.5-pro
python examples/test_llm.py

# Test Ollama
export LLM_PROVIDER=ollama MODEL_NAME=ollama/llama3.1:8b
python examples/test_llm.py
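
examples/test_llm.py ships with the repository and is not reproduced here; a minimal stand-in that honors the same environment variables might look like:

import os

import litellm

# Hypothetical stand-in for examples/test_llm.py.
model = os.getenv("MODEL_NAME", "gemini-2.5-flash")
response = litellm.completion(
    model=model,
    messages=[{"role": "user", "content": "Reply with one word: ok"}],
)
print(f"{model}: {response.choices[0].message.content}")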

Test with MCP Server

# Update .env with desired provider
vim .env

# Run MCP server
python -m mcp_server_langgraph.mcp.server_streamable

# Test with example client
python examples/streamable_http_client.py

Monitoring

LiteLLM usage is automatically tracked with OpenTelemetry:
# Metrics collected:
- llm.invoke (successful calls by model)
- llm.fallback (fallback usage by model)
- llm.failed (failed calls by model)

# Traces include:
- Provider name
- Model name
- Token usage
- Latency
- Error details
View in Jaeger: http://localhost:16686
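
For illustration only (the server wires this up automatically; the meter name below is illustrative), emitting such a counter with the OpenTelemetry metrics API looks like:

from opentelemetry import metrics

meter = metrics.get_meter("mcp_server_langgraph.llm")
invoke_counter = meter.create_counter("llm.invoke")

# After a successful completion:
invoke_counter.add(1, {"model": "gemini-2.5-flash", "provider": "google"})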

Troubleshooting

API Key Not Working

# Verify key is set
echo $ANTHROPIC_API_KEY

# Test the key directly with a minimal request
curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model": "claude-sonnet-4-5", "max_tokens": 16, "messages": [{"role": "user", "content": "ping"}]}'

Ollama Connection Failed

# Check Ollama is running
ollama serve

# Test connection
curl http://localhost:11434/api/tags

# Verify model is pulled
ollama list

Model Not Found

# LiteLLM expects specific model-name formats:
✅ claude-sonnet-4-5      ❌ claude-3.5-sonnet
✅ ollama/llama3.1:8b     ❌ llama3.1
✅ azure/gpt-4            ❌ gpt-4 (when using Azure)

Resources

  • LiteLLM documentation: https://docs.litellm.ai
  • LiteLLM GitHub: https://github.com/BerriAI/litellm

Support

For LiteLLM issues, open an issue on the LiteLLM repository: https://github.com/BerriAI/litellm/issues

Last Updated: 2025-01-10