19. Async-First Architecture
Date: 2025-10-13
Status: Accepted
Category: Core Architecture
Context
Modern AI agents perform numerous I/O-bound operations:
- LLM API calls: 1-30 seconds per request (network latency)
- Database queries: Redis sessions, OpenFGA authorization checks
- External APIs: Keycloak authentication, Infisical secrets retrieval
- Concurrent requests: multiple users, parallel tool executions
With a traditional synchronous architecture, these I/O waits lead to:
- Low throughput: each worker handles only 1 request at a time
- Poor scalability: roughly 100 workers would be needed to handle 20 concurrent users
- Resource waste: workers sit idle during I/O waits
- Timeout risk: long chains of I/O operations easily exceed request timeouts
Decision
We will adopt an async-first architecture using Python's asyncio throughout the codebase.
Core Principle
All I/O operations MUST be async. Pure CPU work MAY be sync.
Async Everywhere
All layers of the application use async, from API handlers through the agent graph down to the LLM, session, and authorization clients (see the sketch after the library table below).
Async Libraries
We use async-compatible libraries:
| Component | Async Library | Sync Alternative (Rejected) |
|---|---|---|
| HTTP Client | httpx.AsyncClient | requests |
| Redis | redis.asyncio | redis |
| LLM Providers | litellm.acompletion | litellm.completion |
| Web Framework | FastAPI | Flask |
| OpenFGA | openfga_sdk (async) | N/A |
| Keycloak | python-keycloak (async methods) | N/A |
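As a condensed sketch of the async-everywhere principle, the snippet below wires two of the libraries from the table into a single awaited call chain. The class and method names (AgentService, chat), the model id, and the connection URL are illustrative, not the project's actual layout.

```python
# Illustrative sketch: every layer awaits the layer below it, so no worker
# blocks while Redis or the LLM provider is in flight.
import redis.asyncio as redis
from litellm import acompletion


class AgentService:
    def __init__(self) -> None:
        # Async Redis client for session state (connection URL is illustrative)
        self.sessions = redis.from_url("redis://localhost:6379/0")

    async def chat(self, session_id: str, prompt: str) -> str:
        # Async session lookup
        history = await self.sessions.get(f"session:{session_id}") or b""
        # Async LLM call via litellm's acompletion
        response = await acompletion(
            model="gpt-4o-mini",  # illustrative model id
            messages=[{"role": "user", "content": history.decode() + prompt}],
        )
        return response.choices[0].message.content
```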
Async/Await Patterns
Pattern 1: Concurrent Execution
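A minimal sketch of concurrent execution with asyncio.gather(); the two coroutines are stand-ins for real Keycloak and OpenFGA calls.

```python
import asyncio


async def fetch_user(user_id: str) -> dict:
    await asyncio.sleep(0.1)  # stand-in for a Keycloak lookup
    return {"id": user_id}


async def check_permission(user_id: str, doc_id: str) -> bool:
    await asyncio.sleep(0.1)  # stand-in for an OpenFGA check
    return True


async def handle_request(user_id: str, doc_id: str) -> None:
    # Both I/O calls run concurrently: total latency is max(t1, t2), not t1 + t2
    user, allowed = await asyncio.gather(
        fetch_user(user_id),
        check_permission(user_id, doc_id),
    )
    print(user, allowed)


asyncio.run(handle_request("user:anne", "document:1"))
```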
Pattern 2: Async Iteration
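A minimal sketch of async iteration; the generator stands in for a streaming LLM response.

```python
import asyncio
from typing import AsyncIterator


async def stream_tokens(prompt: str) -> AsyncIterator[str]:
    # Stand-in for a streaming LLM response: yields tokens as they arrive
    for token in ["Hello", ", ", "world"]:
        await asyncio.sleep(0.05)
        yield token


async def main() -> None:
    # `async for` consumes the stream without blocking the event loop between tokens
    async for token in stream_tokens("hi"):
        print(token, end="", flush=True)


asyncio.run(main())
```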
Pattern 3: Timeout Management
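A minimal sketch of timeout management with asyncio.wait_for(); the 30-second limit and the slow coroutine are illustrative.

```python
import asyncio


async def call_llm(prompt: str) -> str:
    await asyncio.sleep(2)  # stand-in for a slow LLM call
    return "response"


async def main() -> None:
    try:
        # Cap the LLM call at 30 seconds; raises TimeoutError when exceeded
        result = await asyncio.wait_for(call_llm("hi"), timeout=30.0)
    except asyncio.TimeoutError:
        result = "LLM request timed out"
    print(result)


asyncio.run(main())
```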
Consequences
Positive Consequences
- High Throughput: Single worker handles 100+ concurrent requests
  - Example: 100 LLM calls in progress, not blocking each other
  - Throughput: 10-50x improvement over sync
- Resource Efficiency: Far fewer workers needed
  - Sync: 100 workers for 100 concurrent requests
  - Async: 4-8 workers for 100 concurrent requests
- Better User Experience: Lower latency for concurrent operations
  - Parallel API calls complete in max(t1, t2, …), not sum(t1, t2, …)
- Scalability: Handle 1000+ concurrent connections per instance
- Cost Savings: Fewer servers/pods required for the same load
Negative Consequences
- Complexity: Async code is harder to write and debug
  - Must understand event loops, coroutines, and async context
  - Stack traces can be confusing
- Library Constraints: Must use async-compatible libraries
  - Some libraries only have sync versions (workaround: run in an executor)
  - Mixing sync/async requires careful handling
- Testing Challenges: Async tests require pytest-asyncio
- Blocking Pitfalls: Accidentally using sync I/O blocks the event loop
Neutral Consequences
- Learning Curve: Team must learn async patterns
- Migration Effort: Existing sync code requires refactoring
- Debugging Tools: Need async-aware profilers (e.g., aiomonitor)
Implementation Details
Async Codebase Statistics
- LLM Factory (llm/factory.py): async def acall()
- Session Management (auth/session.py): All methods async
- User Provider (auth/user_provider.py): async def authenticate()
- OpenFGA Client (auth/openfga.py): async def check()
- MCP Server (mcp/server_streamable.py, mcp/server_stdio.py): Async handlers
- Agent Graph (core/agent.py): async def ainvoke()
FastAPI Integration
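A sketch of an async FastAPI endpoint; the route, request model, and run_agent stub are illustrative rather than the project's actual API surface.

```python
import asyncio

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class ChatRequest(BaseModel):
    session_id: str
    message: str


async def run_agent(message: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for the agent's async graph invocation
    return f"echo: {message}"


@app.post("/chat")
async def chat(request: ChatRequest) -> dict:
    # The endpoint coroutine runs on the event loop; awaiting I/O here frees
    # the worker to serve other requests while the agent call is in flight.
    reply = await run_agent(request.message)
    return {"reply": reply}
```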
Async Context Managers
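A sketch using an async context manager to manage an httpx connection pool; the http_client helper is illustrative (httpx.AsyncClient can also be used directly with async with).

```python
from contextlib import asynccontextmanager

import httpx


@asynccontextmanager
async def http_client():
    # Open the connection pool, and close it asynchronously even on errors
    client = httpx.AsyncClient(timeout=10.0)
    try:
        yield client
    finally:
        await client.aclose()


async def fetch_status(url: str) -> int:
    async with http_client() as client:
        response = await client.get(url)
        return response.status_code
```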
Running Sync Code in Async Context
When unavoidable sync code must run, it is executed in a thread executor so the event loop stays responsive.
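A minimal sketch, assuming a hypothetical sync-only helper fetch_report_sync; asyncio.to_thread() hands the call to a worker thread:

```python
import asyncio
import time


def fetch_report_sync(report_id: str) -> str:
    time.sleep(1)  # stand-in for a blocking, sync-only SDK call
    return f"report-{report_id}"


async def fetch_report(report_id: str) -> str:
    # Run the sync function on a worker thread instead of blocking the event loop
    return await asyncio.to_thread(fetch_report_sync, report_id)
```

Async Testing
Async tests run under pytest-asyncio; a minimal sketch reusing the fetch_report coroutine from the previous example:

```python
import pytest


@pytest.mark.asyncio
async def test_fetch_report() -> None:
    result = await fetch_report("42")
    assert result == "report-42"
```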
Alternatives Considered
1. Synchronous Architecture (Traditional)
Description: Use synchronous Python with threaded workers (e.g., Gunicorn with threads)
Pros:
- Simpler code (no async/await)
- Easier debugging (linear stack traces)
- More library compatibility
Cons:
- Low throughput (~1-5 req/s per worker)
- High memory usage (each thread = ~8 MB stack)
- GIL contention (threads compete for the Global Interpreter Lock)
- Poor scalability (need 100+ workers for moderate load)
2. Sync with Celery (Task Queue)
Description: Sync API offloads long tasks to Celery workers
Pros:
- Sync API code (simpler)
- Background processing
- Retry logic built in
Cons:
- Additional infrastructure (Redis/RabbitMQ for the queue)
- Complexity (task serialization, result backends)
- Latency (queueing overhead)
- Not suitable for request-response (the user waits for the task)
3. Threading (threading.Thread)
Description: Use Python threads for concurrency
Pros:
- Familiar threading model
- Standard library support
Cons:
- GIL bottleneck (only one thread runs Python code at a time)
- No benefit for CPU-bound tasks
- High memory (one stack per thread)
- Complex synchronization (locks, deadlocks, race conditions)
4. Multiprocessing
Description: Use multiprocessing to fork worker processes
Pros:
- True parallelism (no GIL)
- Good for CPU-bound tasks
Cons:
- High memory (full process copy per worker)
- Slow startup (process-forking overhead)
- IPC complexity (sharing data between processes)
- Not suitable for I/O-bound work (overkill)
5. Hybrid (Sync API + Async I/O)
Description: Expose a sync API but use async I/O internally
Pros:
- Sync API (easier for users)
- Async benefits internally
Cons:
- Complexity (mixing paradigms)
- Event loop management (who runs the loop?)
- Testing confusion (sync tests calling async code)
Performance Benchmarks
Throughput Comparison
Scenario: 100 concurrent LLM API calls (each takes 2 seconds)
| Architecture | Total Time | Throughput | Workers Needed |
|---|---|---|---|
| Sync (threads) | 200 seconds | 0.5 req/s | 100 |
| Async | 2 seconds | 50 req/s | 1 |
Memory Usage
| Architecture | Memory per Worker | 100 Concurrent Requests |
|---|---|---|
| Sync (threads) | 8 MB × 100 threads | 800 MB |
| Async | 50 MB (single process) | 50 MB |
Real-World Metrics
From production deployments:
Integration Points
Uvicorn ASGI Server
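A sketch of launching the ASGI app under Uvicorn programmatically; the import path, port, and worker count are illustrative.

```python
import uvicorn

if __name__ == "__main__":
    uvicorn.run(
        "mcp_server_langgraph.api:app",  # illustrative import path for the FastAPI app
        host="0.0.0.0",
        port=8000,
        workers=4,      # a handful of async workers covers many concurrent connections
        loop="uvloop",  # optional faster event loop, if uvloop is installed
    )
```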
LangGraph Async Support
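A minimal sketch of an async LangGraph graph; the state schema and single node are illustrative, while the real agent graph lives in core/agent.py.

```python
import asyncio
from typing import TypedDict

from langgraph.graph import StateGraph, START, END


class State(TypedDict):
    messages: list[str]


async def call_model(state: State) -> State:
    # Async node: a real implementation would await the LLM call here
    return {"messages": state["messages"] + ["(model reply)"]}


builder = StateGraph(State)
builder.add_node("call_model", call_model)
builder.add_edge(START, "call_model")
builder.add_edge("call_model", END)
graph = builder.compile()


async def main() -> None:
    # ainvoke() runs the compiled graph asynchronously end to end
    result = await graph.ainvoke({"messages": ["hello"]})
    print(result["messages"])


asyncio.run(main())
```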
Redis Async Client
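A sketch of the async Redis client used for session storage; the key, TTL, and connection URL are illustrative (aclose() assumes redis-py 5.x).

```python
import asyncio

import redis.asyncio as redis


async def main() -> None:
    client = redis.from_url("redis://localhost:6379/0", decode_responses=True)
    # Async write with a 1-hour TTL, then an async read
    await client.set("session:abc123", '{"user": "anne"}', ex=3600)
    session = await client.get("session:abc123")
    print(session)
    await client.aclose()


asyncio.run(main())
```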
OpenFGA Async Client
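A hedged sketch of an async authorization check based on the OpenFGA Python SDK's documented client; import paths and configuration fields may vary by SDK version, and the endpoint, store id, and tuple values are placeholders.

```python
from openfga_sdk import ClientConfiguration
from openfga_sdk.client import OpenFgaClient
from openfga_sdk.client.models import ClientCheckRequest


async def check_access(user: str, relation: str, obj: str) -> bool:
    config = ClientConfiguration(
        api_url="http://localhost:8080",             # placeholder OpenFGA endpoint
        store_id="01HXXXXXXXXXXXXXXXXXXXXXXX",       # placeholder store id
    )
    # The client is an async context manager; check() is awaited like any other I/O
    async with OpenFgaClient(config) as fga:
        response = await fga.check(
            ClientCheckRequest(user=user, relation=relation, object=obj)
        )
        return bool(response.allowed)
```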
Best Practices
1. Always Await Async Functions
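A short sketch of the pitfall this rule prevents; save_session is a stand-in for an async Redis write.

```python
import asyncio


async def save_session(session_id: str) -> None:
    await asyncio.sleep(0.1)  # stand-in for an async Redis write


async def handler() -> None:
    save_session("abc123")        # BUG: creates a coroutine that is never awaited, so nothing runs
    await save_session("abc123")  # Correct: the write actually executes


asyncio.run(handler())
```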
2. Use asyncio.gather() for Concurrent I/O
3. Set Timeouts for I/O Operations
4. Use Async Context Managers
Future Enhancements
- Async Streaming: Stream LLM responses token-by-token
- Async Background Tasks: Scheduled jobs with apscheduler async support
- Async Batch Processing: Process large datasets with async workers
- Structured Concurrency: Explore anyio for cleaner async patterns
References
- Python asyncio Documentation: https://docs.python.org/3/library/asyncio.html
- FastAPI Async Support: https://fastapi.tiangolo.com/async/
- Uvicorn ASGI Server: https://www.uvicorn.org/
- Redis Async Client: https://redis-py.readthedocs.io/en/stable/
- Async Code Locations:
  - LLM Factory: src/mcp_server_langgraph/llm/factory.py
  - Session Store: src/mcp_server_langgraph/auth/session.py
  - User Provider: src/mcp_server_langgraph/auth/user_provider.py
  - OpenFGA: src/mcp_server_langgraph/auth/openfga.py
  - Agent: src/mcp_server_langgraph/core/agent.py