36. Hybrid Session Model for Long-Running Tasks
Date: 2025-01-28
Status
Accepted
Category
Authentication & Authorization
Context
JWT-based authentication (ADR-0032) uses short-lived access tokens (15 minutes) for security, but this conflicts with long-running tasks:
- Batch ETL jobs running 6-12 hours
- Streaming WebSocket connections
- Background processors with persistent queues
- Scheduled reports generated over hours
- Real-time data pipelines
These workloads expose several problems with the default model:
- 15-minute tokens are inadequate for multi-hour tasks
- Client-side refresh is complex for background processes
- Purely stateless JWTs can't be revoked mid-session
- Service principals need 30-day authentication
Decision
Implement a Hybrid Session Model where standard users use stateless JWTs with client-side refresh, while service principals optionally use server-side sessions with automatic token refresh for long-running tasks.
Architecture
Distributed Session Consistency Flow
Mode 1: Stateless JWT (Default for Users):
- 15-minute access tokens
- 30-minute refresh tokens
- Client-side refresh logic
- No server-side storage
Mode 2: Server-Side Sessions (Optional for Service Principals):
- 15-minute access tokens (still short)
- 30-day refresh tokens stored in Redis
- Server-side automatic refresh before expiration
- Session TTL: 30 days with sliding window
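The "automatic refresh before expiration" step in Mode 2 comes down to a timing check. The following is a minimal sketch, not the project's implementation: the 15-minute lifetime comes from this ADR, while the `needs_refresh` helper and the 2-minute safety margin are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

ACCESS_TOKEN_TTL = timedelta(minutes=15)  # lifetime stated in this ADR
REFRESH_MARGIN = timedelta(minutes=2)     # assumed safety margin, not from the ADR


def needs_refresh(expires_at: datetime, now: datetime,
                  margin: timedelta = REFRESH_MARGIN) -> bool:
    """Return True once the access token is within `margin` of expiring."""
    return now >= expires_at - margin


issued_at = datetime(2025, 1, 28, 12, 0, tzinfo=timezone.utc)
expires_at = issued_at + ACCESS_TOKEN_TTL  # 12:15 UTC

print(needs_refresh(expires_at, issued_at + timedelta(minutes=5)))   # False
print(needs_refresh(expires_at, issued_at + timedelta(minutes=14)))  # True
```

Refreshing slightly early, rather than at the exact expiry, leaves room for clock skew and for requests that are already in flight.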
Core Principles
- JWT First: Always use JWTs for authentication (not opaque session IDs)
- Selective Sessions: Only service principals use server-side sessions
- Automatic Refresh: Server refreshes tokens before expiration
- Sliding Window: Session TTL extends on activity
- Instant Revocation: Server-side sessions can be revoked immediately
- Redis Storage: Distributed session store for multi-replica support
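The Sliding Window and Instant Revocation principles can be illustrated with a small in-memory stand-in for the session store. This is a sketch under stated assumptions: the class and method names are hypothetical, and a plain dictionary replaces Redis so the example runs on its own. With Redis, the same behaviour would typically rely on resetting a key's expiry on activity and deleting the key on revocation.

```python
import time
from typing import Callable, Dict

SESSION_TTL_SECONDS = 30 * 24 * 60 * 60  # 30 days, as stated in this ADR


class InMemorySessionStore:
    """Hypothetical stand-in for the Redis-backed store, for illustration only."""

    def __init__(self, ttl: float = SESSION_TTL_SECONDS,
                 clock: Callable[[], float] = time.time) -> None:
        self._ttl = ttl
        self._clock = clock
        self._expires_at: Dict[str, float] = {}  # session_id -> expiry timestamp

    def create(self, session_id: str) -> None:
        self._expires_at[session_id] = self._clock() + self._ttl

    def touch(self, session_id: str) -> bool:
        """Sliding window: extend the TTL on activity.

        Returns False if the session is unknown, expired, or revoked.
        """
        expiry = self._expires_at.get(session_id)
        if expiry is None or expiry <= self._clock():
            self._expires_at.pop(session_id, None)
            return False
        self._expires_at[session_id] = self._clock() + self._ttl
        return True

    def revoke(self, session_id: str) -> None:
        """Instant revocation: the very next touch() is rejected."""
        self._expires_at.pop(session_id, None)


# Usage with a fake clock so the behaviour is deterministic.
now = [0.0]
store = InMemorySessionStore(ttl=100, clock=lambda: now[0])
store.create("svc-etl")

now[0] = 90.0
print(store.touch("svc-etl"))  # True: active, expiry slides to 190

now[0] = 180.0
print(store.touch("svc-etl"))  # True: past the original expiry, but the window slid

store.revoke("svc-etl")
print(store.touch("svc-etl"))  # False: revoked immediately
```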
Configuration
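The concrete configuration keys are not reproduced here. As a rough sketch only, the durations named in this ADR could be grouped into a settings object like the one below. Every field name, the `refresh_token_ttl` helper, and the Redis URL are hypothetical placeholders; only the durations come from the text above.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class SessionSettings:
    # Hypothetical field names; the durations are the ones stated in this ADR.
    access_token_ttl: timedelta = timedelta(minutes=15)
    user_refresh_token_ttl: timedelta = timedelta(minutes=30)
    service_refresh_token_ttl: timedelta = timedelta(days=30)
    service_session_ttl: timedelta = timedelta(days=30)
    sliding_window: bool = True
    redis_url: str = "redis://localhost:6379/0"  # placeholder, not a real deployment value


def refresh_token_ttl(settings: SessionSettings, is_service_principal: bool) -> timedelta:
    """Pick the refresh-token lifetime for the given kind of principal."""
    if is_service_principal:
        return settings.service_refresh_token_ttl
    return settings.user_refresh_token_ttl


settings = SessionSettings()
print(refresh_token_ttl(settings, is_service_principal=False))
print(refresh_token_ttl(settings, is_service_principal=True))
```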
Consequences
Positive Consequences
- Long-running task support (30-day sessions)
- Automatic refresh (no client complexity)
- Instant revocation capability
- Security maintained (still use short-lived access tokens)
- Audit trail (session activity logged)
Negative Consequences
- Redis dependency (infrastructure overhead)
- Complexity (two modes to maintain)
- Storage cost (session data in Redis)
- Potential session leakage if not cleaned up
Mitigation Strategies
- Redis HA setup (cluster mode), automatic TTL expiration
- Clear documentation on when to use each mode
- Monitoring: session count, memory usage, refresh failures
- Cleanup job for orphaned sessions
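The ADR does not define what counts as an orphaned session. One plausible reading, assumed here, is a per-principal index that still lists session IDs whose records have already expired. Under that assumption, a cleanup pass could look like the sketch below; `prune_orphaned` and both data structures are hypothetical.

```python
from typing import Dict, Set


def prune_orphaned(index: Dict[str, Set[str]], live_sessions: Set[str]) -> int:
    """Drop index entries that point at sessions which no longer exist.

    `index` maps a principal ID to the session IDs recorded for it (assumed layout).
    Returns the number of stale entries removed.
    """
    removed = 0
    for principal in list(index):  # copy the keys so we can delete while looping
        stale = index[principal] - live_sessions
        removed += len(stale)
        index[principal] -= stale
        if not index[principal]:
            del index[principal]
    return removed


index = {"svc-etl": {"s1", "s2"}, "svc-report": {"s3"}}
live = {"s1"}  # only s1 still has a session record

print(prune_orphaned(index, live))  # 2
print(index)                        # {'svc-etl': {'s1'}}
```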
Alternatives Considered
- Long-Lived JWTs: Rejected - cannot revoke, security risk
- Sessions for All: Rejected - stateless architecture preferred for users
- Stateless Only: Rejected - inadequate for long-running tasks
- Token Exchange: Rejected - complex, requires admin credentials
Implementation
Session Data Structure:
Session Store (src/mcp_server_langgraph/auth/session.py):
Middleware (src/mcp_server_langgraph/auth/middleware.py):
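The session data structure itself is not reproduced here. As an illustration of the kind of record a server-side session might hold, the sketch below serializes to JSON so it could be stored as a single value in Redis. All field names are assumptions and are not taken from `session.py`.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class SessionRecord:
    # Hypothetical fields chosen to match the behaviour described in this ADR.
    session_id: str
    principal_id: str
    refresh_token: str
    access_token_expires_at: float  # Unix timestamp of the current access token's expiry
    created_at: float               # Unix timestamp
    last_activity_at: float         # updated on each request to drive the sliding window

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "SessionRecord":
        return cls(**json.loads(raw))


record = SessionRecord(
    session_id="sess-123",
    principal_id="svc-etl",
    refresh_token="example-refresh-token",
    access_token_expires_at=1738066500.0,
    created_at=1738065600.0,
    last_activity_at=1738065600.0,
)
restored = SessionRecord.from_json(record.to_json())
print(restored == record)  # True
```

Keeping `last_activity_at` in the record supports the audit trail listed under Positive Consequences, independent of how the store tracks expiry.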
References
- Session Store: src/mcp_server_langgraph/auth/session.py (existing, to be enhanced)
- Middleware: src/mcp_server_langgraph/auth/middleware.py (to be updated)
- Related ADRs: ADR-0006, ADR-0032, ADR-0033
- External: Redis TTL, OAuth2 Refresh