Overview
The MCP Server with LangGraph exposes a RESTful API following the Model Context Protocol (MCP) specification with additional custom endpoints for health checks and metrics.
Base URL

Local Development:

```
http://localhost:8000
```

Production:

```
https://your-domain.com
```
Authentication
All API requests require JWT authentication.
Getting a Token
```python
from mcp_server_langgraph.auth.middleware import AuthMiddleware

auth = AuthMiddleware(secret_key="your-secret-key")
token = auth.create_token("username", expires_in=3600)
```
Using the Token
Include the token in the Authorization header:
```
Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
```
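The sample token above is a standard JWT: three base64url-encoded segments separated by dots, where the first segment is a JSON header. A quick sketch for inspecting it with the standard library (the payload and signature segments here are placeholders):

```python
import base64
import json

# Truncated sample token; only the header segment is real.
token = "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.payload.signature"

header_b64 = token.split(".")[0]
header_b64 += "=" * (-len(header_b64) % 4)  # restore base64 padding
header = json.loads(base64.urlsafe_b64decode(header_b64))
print(header)  # {'alg': 'HS256', 'typ': 'JWT'}
```

Note this only decodes the header; verifying the signature requires the server's secret key.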
Rate Limiting
Rate limits are enforced via Kong API Gateway when enabled.
| Tier     | Requests/Minute | Burst |
|----------|-----------------|-------|
| Free     | 60              | 10    |
| Standard | 300             | 50    |
| Premium  | 1000            | 100   |
When rate limited, you’ll receive a 429 Too Many Requests response:
```json
{
  "error": "rate_limit_exceeded",
  "message": "Too many requests",
  "retry_after": 60
}
```
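Clients should honor the `retry_after` field before retrying. A minimal sketch (not part of the API) of a retry loop, where `send_fn` stands in for whatever call you make:

```python
import time

def send_with_retry(send_fn, max_attempts=3):
    """Call send_fn() -> (status, body); sleep and retry on 429."""
    for _ in range(max_attempts):
        status, body = send_fn()
        if status != 429:
            return status, body
        time.sleep(body.get("retry_after", 1))  # server-suggested wait, seconds
    return status, body
```

For production use, prefer adding jitter and a cap on total wait time.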
Response Format

All successful responses follow this structure:
```json
{
  "content": "Response content",
  "role": "assistant",
  "model": "gemini-2.5-flash-002",
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 150,
    "total_tokens": 175
  },
  "trace_id": "abc123-def456-ghi789",
  "authorized": true
}
```
Error Handling
Error responses include detailed information:
```json
{
  "error": "authentication_failed",
  "message": "Invalid or expired token",
  "trace_id": "abc123-def456",
  "timestamp": "2025-10-10T12:34:56Z"
}
```
HTTP Status Codes
| Code | Meaning               | Description                     |
|------|-----------------------|---------------------------------|
| 200  | OK                    | Request successful              |
| 400  | Bad Request           | Invalid request parameters      |
| 401  | Unauthorized          | Missing or invalid token        |
| 403  | Forbidden             | Insufficient permissions        |
| 404  | Not Found             | Resource not found              |
| 429  | Too Many Requests     | Rate limit exceeded             |
| 500  | Internal Server Error | Server error                    |
| 503  | Service Unavailable   | Service temporarily unavailable |
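One common way for client code to handle these status codes is to map them to exceptions. A hedged sketch; the class names here are illustrative, not part of the API:

```python
class APIError(Exception):
    """Base class for API errors (status >= 400)."""

class AuthError(APIError):
    """401 Unauthorized / 403 Forbidden."""

class RateLimitError(APIError):
    """429 Too Many Requests."""

_ERROR_CLASSES = {401: AuthError, 403: AuthError, 429: RateLimitError}

def raise_for_status(status: int, message: str = "") -> None:
    """Raise a typed exception for error status codes; no-op on success."""
    if status >= 400:
        raise _ERROR_CLASSES.get(status, APIError)(f"{status}: {message}")
```

Typed exceptions let callers catch `RateLimitError` for retries while treating other `APIError`s as fatal.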
API Endpoints
SDK Libraries

The Python SDK is coming soon; use the HTTP API directly for now. The example below shows the planned client interface:

```python
from langgraph_mcp import MCPClient

client = MCPClient(
    base_url="http://localhost:8000",
    api_key="your-token"
)

response = await client.send_message(
    "Hello, how can you help me?"
)
```
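Until the SDK ships, the HTTP API can be called with the standard library alone. A minimal sketch, assuming a POST `/v1/message` endpoint that accepts a JSON body with the message content (the request body shape is an assumption, not confirmed by the spec):

```python
import json
import urllib.request

def build_message_request(base_url: str, token: str, content: str) -> urllib.request.Request:
    """Build an authenticated POST request for the /v1/message endpoint."""
    body = json.dumps({"content": content}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/message",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_message_request("http://localhost:8000", "your-token", "Hello")
# response = urllib.request.urlopen(req)  # uncomment against a running server
```

Check the OpenAPI spec at `/docs` for the authoritative request and response schemas.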
OpenAPI Specification

Access the OpenAPI spec interactively:

Swagger UI:

```
http://localhost:8000/docs
```

ReDoc:

```
http://localhost:8000/redoc
```
Versioning
The API uses URL versioning:
```
/v1/message
/v1/tools
/v1/health
```
Current version: v1
Version 1 is stable and production-ready. Breaking changes will increment the version number.
Common Patterns
Streaming Responses
```python
async with client.stream_message("Tell me a long story") as stream:
    async for chunk in stream:
        print(chunk.content, end="", flush=True)
```
Batch Requests
```python
messages = [
    "What is AI?",
    "Explain machine learning",
    "What is deep learning?",
]

responses = await client.batch_send(messages)
```
Context Management
```python
# Maintain conversation context across messages
context = {}

response1 = await client.send_message(
    "My name is Alice",
    context=context
)
context.update(response1.context)

response2 = await client.send_message(
    "What's my name?",
    context=context
)
# Response: "Your name is Alice"
```
Next Steps