42. Dependency Injection Configuration Fixes
Date: 2025-01-28
Status: Accepted
Category: Core Architecture
Context
OpenAI Codex identified three critical runtime failures and two high-priority configuration issues in our dependency injection system that would cause production outages:
Critical Risks Identified
- Keycloak Admin Credentials Not Wired (dependencies.py:31-40)
  - Admin username/password from settings were not passed to KeycloakConfig
  - Caused all admin API operations to fail with 400/401 errors
  - Affected: API key CRUD, service principal creation, user management
- OpenFGA Client Always Instantiated Despite Missing Config (dependencies.py:45-59)
  - Client created even when store_id=None and model_id=None
  - OpenFGA SDK raises errors on first check_permission() call
  - Caused confusing 500 errors, broke graceful degradation
- Service Principal Manager Crashes When OpenFGA Disabled (service_principal.py:197-221)
  - _sync_to_openfga() assumed self.openfga is always usable
  - Caused AttributeError: 'NoneType' object has no attribute 'write_tuples'
  - Broke service principal workflows in environments without OpenFGA
High Priority Issues
- L2 Cache Ignores Secure Redis Settings (cache.py:94-120)
  - Used redis.Redis(host=..., port=...) instead of redis.from_url()
  - Ignored settings.redis_url, settings.redis_password, settings.redis_ssl
  - Caused silent fallback to L1-only in production, degrading performance
- Missing Startup/Integration Test Coverage
  - No tests validating dependency factory wiring
  - Bugs only discovered at runtime in production
  - No smoke tests for FastAPI/MCP server startup with default settings
Decision
We implement defensive configuration with fail-fast validation and graceful degradation for all dependency factories.
1. Fix Keycloak Admin Credentials
Changed: src/mcp_server_langgraph/core/dependencies.py:34-40
- ✅ Admin API operations now authenticate correctly
- ✅ API key manager can create/delete keys
- ✅ Service principal creation works
- ✅ User management operations succeed
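The credential wiring can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `Settings` and `KeycloakConfig` classes here are simplified stand-ins, and the settings field names (`keycloak_server_url`, `keycloak_realm`, `keycloak_admin_username`, `keycloak_admin_password`) are assumptions. Only `admin_username` and `admin_password` on the config object are named in this ADR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Settings:
    # Hypothetical settings fields, used for illustration only.
    keycloak_server_url: str = "http://localhost:8080"
    keycloak_realm: str = "master"
    keycloak_admin_username: Optional[str] = None
    keycloak_admin_password: Optional[str] = None


@dataclass
class KeycloakConfig:
    # Simplified stand-in for the project's KeycloakConfig.
    server_url: str
    realm: str
    admin_username: Optional[str] = None
    admin_password: Optional[str] = None


def build_keycloak_config(settings: Settings) -> KeycloakConfig:
    """Build the config, passing the admin credentials through explicitly."""
    return KeycloakConfig(
        server_url=settings.keycloak_server_url,
        realm=settings.keycloak_realm,
        # These two arguments are the ones the original factory omitted.
        admin_username=settings.keycloak_admin_username,
        admin_password=settings.keycloak_admin_password,
    )
```

Keeping the mapping in one small factory function makes it straightforward to assert in a unit test that every credential field reaches the config object.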
2. Add OpenFGA Configuration Validation
Changed: src/mcp_server_langgraph/core/dependencies.py:47-76
- ✅ Returns None when config incomplete instead of broken client
- ✅ Logs clear warning about missing configuration
- ✅ Enables graceful degradation in non-production environments
- ✅ Prevents confusing 500 errors from OpenFGA SDK
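A factory with this validation might look like the sketch below. The `OpenFGAClient` dataclass is a stand-in for the project's client wrapper, and the function signature (plain arguments rather than a settings object), the logger name, and the warning text are assumptions made for illustration.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("dependencies")


@dataclass
class OpenFGAClient:
    # Simplified stand-in for the project's OpenFGA client wrapper.
    api_url: str
    store_id: str
    model_id: str


def get_openfga_client(
    api_url: str, store_id: Optional[str], model_id: Optional[str]
) -> Optional[OpenFGAClient]:
    """Return a client only when both identifiers are configured."""
    if not store_id or not model_id:
        # Incomplete config: warn once here instead of failing later in the SDK.
        logger.warning(
            "OpenFGA configuration incomplete (store_id=%s, model_id=%s); "
            "authorization checks are disabled.",
            store_id,
            model_id,
        )
        return None
    return OpenFGAClient(api_url=api_url, store_id=store_id, model_id=model_id)
```

Returning `None` moves the failure from the first permission check to the factory call, where the missing setting is easy to identify from the log line.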
3. Add OpenFGA Guards in Service Principal Manager
Changed: src/mcp_server_langgraph/auth/service_principal.py
3a. Update Constructor Type Hint
3b. Guard _sync_to_openfga Method
3c. Guard associate_with_user Method
3d. Guard delete_service_principal Method
- ✅ No more AttributeError crashes when OpenFGA disabled
- ✅ Service principal operations work in fallback mode
- ✅ Keycloak operations succeed independently of OpenFGA
- ✅ Clear separation of concerns (identity vs. authorization)
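The guard pattern used in steps 3a to 3d can be sketched as below. This is a simplified stand-in for the real manager: the constructor arguments, the boolean return value, and the tuple layout passed to `write_tuples` are assumptions. Only the `self.openfga` attribute, `_sync_to_openfga`, and `write_tuples` are named in this ADR.

```python
import asyncio
from typing import Any, List, Optional


class ServicePrincipalManager:
    """Simplified stand-in; only the OpenFGA guard logic is shown."""

    def __init__(self, keycloak_client: Any, openfga_client: Optional[Any] = None) -> None:
        self.keycloak = keycloak_client
        # None means OpenFGA is disabled and authorization sync is skipped.
        self.openfga = openfga_client

    async def _sync_to_openfga(self, service_id: str, owner_user_id: str) -> bool:
        """Write the ownership tuple; return False when OpenFGA is disabled."""
        if self.openfga is None:
            # Guard clause: avoids AttributeError on a None client.
            return False
        await self.openfga.write_tuples(
            [
                {
                    "user": f"user:{owner_user_id}",
                    "relation": "owner",  # illustrative relation name
                    "object": f"service_principal:{service_id}",
                }
            ]
        )
        return True


class RecordingOpenFGA:
    """Test double that records tuples instead of calling a real server."""

    def __init__(self) -> None:
        self.written: List[dict] = []

    async def write_tuples(self, tuples: List[dict]) -> None:
        self.written.extend(tuples)


if __name__ == "__main__":
    disabled = ServicePrincipalManager(keycloak_client=None, openfga_client=None)
    # With OpenFGA disabled the sync is skipped rather than crashing.
    print(asyncio.run(disabled._sync_to_openfga("batch-job", "alice")))
```

The same early-return check would apply to the user association and deletion paths, so the Keycloak side of each operation completes whether or not OpenFGA is available.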
4. Fix L2 Cache Redis Configuration
Changed: src/mcp_server_langgraph/core/cache.py:73-138
get_cache() now passes all Redis settings (URL, password, SSL) through to the cache service.
- ✅ L2 cache now works with secure Redis deployments
- ✅ Honors REDIS_URL, REDIS_PASSWORD, REDIS_SSL settings
- ✅ Consistent pattern with API key manager
- ✅ Production performance restored (L1+L2 instead of L1-only)
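One way to build a `redis.from_url()` connection from the three settings is sketched below. The helper names `build_redis_url` and `create_l2_client` are hypothetical, and how the real code combines the URL, password, and SSL flag is an assumption. The password is percent-encoded so that special characters do not break URL parsing.

```python
import logging
from typing import Optional
from urllib.parse import quote, urlsplit, urlunsplit

logger = logging.getLogger("cache")


def build_redis_url(redis_url: str, password: Optional[str] = None, ssl: bool = False) -> str:
    """Combine the URL, password, and SSL flag into one connection URL."""
    parts = urlsplit(redis_url)
    # The rediss:// scheme tells redis-py to use TLS.
    scheme = "rediss" if ssl else parts.scheme
    netloc = parts.netloc
    if password and "@" not in netloc:
        # Percent-encode so characters like '@' or '/' survive URL parsing.
        netloc = f":{quote(password, safe='')}@{netloc}"
    return urlunsplit((scheme, netloc, parts.path, parts.query, parts.fragment))


def create_l2_client(redis_url: str, password: Optional[str] = None, ssl: bool = False):
    """Return a connected Redis client, or None so the caller can use L1 only."""
    try:
        import redis  # imported lazily so the URL helper works without redis-py

        client = redis.from_url(build_redis_url(redis_url, password, ssl))
        client.ping()  # verify connectivity before handing the client out
        return client
    except Exception as exc:
        # Log the reason so an L1-only fallback is visible rather than silent.
        logger.warning("L2 cache disabled, falling back to L1 only: %s", exc)
        return None
```

Logging the failure reason before returning `None` keeps the graceful fallback while making an L1-only deployment visible in the logs.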
5. Add Comprehensive Test Coverage
Created: tests/unit/test_dependencies_wiring.py
Tests added (following TDD):
- Keycloak admin credential wiring
  - Validates admin_username and admin_password are passed
  - Documents failure mode without credentials
- OpenFGA config validation
  - Tests None returned when store_id/model_id missing
  - Tests client created when config complete
  - Tests warning logged for incomplete config
- Service principal OpenFGA guards
  - Tests creation succeeds with None OpenFGA client
  - Tests deletion succeeds with None OpenFGA client
  - Tests user association succeeds with None OpenFGA client
- Integration smoke tests
  - Tests Keycloak client factory with real settings
  - Tests OpenFGA client factory with incomplete config
  - Tests service principal manager with disabled OpenFGA
Created: tests/unit/test_cache_redis_config.py
Tests added:
- Cache Redis configuration
  - Tests redis.from_url() pattern is used
  - Tests password and SSL settings honored
  - Compares with correct API key manager pattern
- Graceful degradation
  - Tests fallback to L1 when Redis unavailable
  - Tests production Redis URL scenarios
Consequences
Positive
- ✅ Production Stability: All critical runtime failures fixed
- ✅ Graceful Degradation: System works in partial-config scenarios
- ✅ Clear Error Messages: Warnings explain missing configuration
- ✅ Test Coverage: Comprehensive tests prevent regressions
- ✅ Consistent Patterns: All Redis clients use the from_url() pattern
- ✅ Security: Secure Redis settings (password, SSL) now honored
Negative
- ⚠️ API Breaking Change: get_openfga_client() now returns Optional[OpenFGAClient]
  - Callers must handle None case
  - Mitigated by: Service principal manager already handles this
- ⚠️ Increased Verbosity: More parameters to CacheService.__init__
  - Mitigated by: Parameters have sensible defaults
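For callers outside the service principal manager, handling the `None` case could look like this sketch. The helper `is_allowed`, its `allow_when_disabled` flag, and the keyword arguments passed to `check_permission()` are assumptions made for illustration. Whether a disabled OpenFGA should deny or allow by default is a policy choice this ADR does not specify, so the sketch defaults to deny.

```python
import asyncio
from typing import Any, Optional


async def is_allowed(
    openfga: Optional[Any],
    user: str,
    relation: str,
    obj: str,
    *,
    allow_when_disabled: bool = False,
) -> bool:
    """Check a permission, tolerating a disabled (None) OpenFGA client."""
    if openfga is None:
        # No client: apply the explicit fallback policy instead of raising.
        return allow_when_disabled
    # Assumed signature; the project's client wrapper may differ.
    return bool(await openfga.check_permission(user=user, relation=relation, object=obj))


if __name__ == "__main__":
    # With OpenFGA disabled and the default policy, access is denied.
    print(asyncio.run(is_allowed(None, "user:alice", "viewer", "document:1")))
```

Making the fallback an explicit keyword argument forces each call site to state its policy when it opts in to allowing access without authorization data.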
Neutral
- 🔄 Configuration Required: OpenFGA now requires explicit configuration
  - Production: Must set OPENFGA_STORE_ID and OPENFGA_MODEL_ID
  - Development: Falls back gracefully with warning
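The production requirement above could be enforced at startup with a check along these lines. This is a sketch under assumptions: the function names and the `environment` argument are hypothetical, and raising in production is one possible reading of "must set", not something this ADR confirms.

```python
from typing import List, Mapping

# Environment variable names taken from this ADR.
REQUIRED_OPENFGA_VARS = ("OPENFGA_STORE_ID", "OPENFGA_MODEL_ID")


def missing_openfga_settings(env: Mapping[str, str]) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_OPENFGA_VARS if not env.get(name)]


def validate_openfga_settings(env: Mapping[str, str], environment: str) -> bool:
    """Return True when fully configured; raise in production if not."""
    missing = missing_openfga_settings(env)
    if missing and environment == "production":
        # Fail fast: refuse to start rather than run without authorization.
        raise RuntimeError(f"Missing required OpenFGA settings: {', '.join(missing)}")
    # Outside production an incomplete config only disables OpenFGA.
    return not missing
```

At startup this could be called as `validate_openfga_settings(os.environ, environment="production")`, with the boolean result deciding whether the OpenFGA client is created.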
Implementation Notes
TDD Process Followed
All fixes followed strict TDD:
- ✅ RED: Wrote failing tests first
- ✅ GREEN: Implemented minimal fix to pass tests
- ✅ REFACTOR: Improved code quality while keeping tests green
Migration Guide
For Keycloak Admin Operations
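Ensure the environment variables for the Keycloak admin username and password are set.
For OpenFGA Authorization
Either configure OpenFGA fully by setting OPENFGA_STORE_ID and OPENFGA_MODEL_ID, or accept degraded mode, in which a warning is logged and no OpenFGA client is created.
For Secure Redis Caching
Configure Redis with credentials through REDIS_URL, REDIS_PASSWORD, and REDIS_SSL.
Rollout Plan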
- Phase 1: Deploy to development environment
  - Verify warnings for incomplete OpenFGA config
  - Verify Keycloak admin operations work
  - Verify Redis cache with credentials
- Phase 2: Deploy to staging environment
  - Run full test suite
  - Verify service principal workflows
  - Verify L2 cache performance metrics
- Phase 3: Deploy to production
  - Monitor error rates (should drop to zero)
  - Monitor cache hit rates (should increase with L2)
  - Monitor OpenFGA operation success rate
References
- OpenAI Codex Security Review (2025-01-28)
- ADR-0034: API Key JWT Exchange
- ADR-0033: Service Principal Design
- Production Incident: Revision 758b8f744 (Redis password encoding)
Verification
Pre-Deployment Checklist
- All tests pass (pytest tests/unit/test_dependencies_wiring.py tests/unit/test_cache_redis_config.py)
- Keycloak admin credentials wired in dependencies.py
- OpenFGA client validates config and returns None when incomplete
- Service principal manager guards all OpenFGA operations
- Cache service uses redis.from_url() with password/SSL
- ADR document created and reviewed
Post-Deployment Validation
- No 400/401 errors from Keycloak admin operations
- No AttributeError crashes from service principal manager
- No 500 errors from OpenFGA SDK when disabled
- L2 cache hit rate > 0% in production (was 0% before fix)
- Redis connection uses TLS in production metrics
Conclusion
These fixes address the three critical runtime failures and two high-priority configuration issues identified by OpenAI Codex. All fixes follow defensive programming principles:
- Fail-fast validation (OpenFGA config check)
- Graceful degradation (OpenFGA returns None)
- Guard clauses (Service principal OpenFGA guards)
- Secure defaults (Redis password/SSL support)
- Comprehensive testing (100% coverage of bug scenarios)