60. Database Architecture and Naming Convention
Date: 2025-11-18Status
ProposedCategory
Data & StorageContext
The project currently uses PostgreSQL for multiple purposes across different environments:- GDPR compliance storage (user profiles, consents, audit logs, conversations, preferences)
- OpenFGA authorization (relationship tuples, authorization model)
- Keycloak authentication (user accounts, realms, clients, sessions)
Current Problems
1. Inconsistent Database Naming Across Environments- Test environment correctly separates databases (
gdpr_test,openfga_test,keycloak_test) - Dev environment uses single
gdprdatabase (no separation!) - Main environment uses multi-database script but inconsistent naming
- Production uses external managed databases with undefined structure
- Test environment: Uses
migrations/000_init_databases.sh(creates 3 databases) - Main environment: Uses
scripts/postgres/init-multiple-databases.sh(different script!) - Dev environment: No migration support (single database)
- No validation that databases were created correctly
- No ADR defining database architecture principles
- No naming convention standard
- No guidance for adding new databases
- No validation/testing of database initialization
Why This Matters
Security: Mixing GDPR-regulated data with authorization data in single database violates data separation principles. Compliance: GDPR Article 25 (Data protection by design) requires appropriate technical measures including data segregation. Scalability: Single database limits ability to:- Scale different services independently
- Implement different backup strategies (GDPR data: 7 years, sessions: ephemeral)
- Use different PostgreSQL extensions per service
Decision
We will implement a consistent multi-database architecture with a clear naming convention and automated validation.Architecture Principles
- One Database Per Service - Each logical service gets its own PostgreSQL database
- Environment Suffixes - Clear suffix naming:
{service}(dev/prod),{service}_test(test) - Consistent Initialization - Single migration script used across all environments
- Automated Validation - Health checks verify correct database structure
- Documentation - ADR + diagrams + configuration examples
Database Naming Convention
| Service | Development | Test | Staging | Production |
|---|---|---|---|---|
| GDPR Compliance | gdpr | gdpr_test | gdpr | gdpr |
| OpenFGA Authorization | openfga | openfga_test | openfga | openfga |
| Keycloak Authentication | keycloak | keycloak_test | keycloak | keycloak |
{service}_test suffix. All other environments use {service}.
Rationale:
- Test suffix prevents accidental connection to dev/prod databases
- No suffix in prod/staging keeps database names simple
- Consistent across environments reduces cognitive load
Database Responsibilities
gdpr / gdpr_test
Purpose: GDPR/HIPAA/SOC2 compliance data storage
Schema:
user_profiles- User account data (GDPR Article 15: Right to access)user_preferences- User settings and preferencesconsent_records- Consent tracking (GDPR Article 7, 7-year retention)conversations- Chat history (90-day retention, GDPR Article 5(1)(e))audit_logs- Compliance audit trail (HIPAA §164.312(b), 7-year retention)
migrations/001_gdpr_schema.sql
Data Classification: PII, Regulated (GDPR/HIPAA)
Retention:
- Consent records: 7 years (compliance requirement)
- Audit logs: 7 years (compliance requirement)
- Conversations: 90 days (operational)
- User profiles: Until deletion request (GDPR Article 17)
openfga / openfga_test
Purpose: Fine-grained authorization (relationship tuples)
Schema: Managed by OpenFGA migrations (openfga migrate)
tuple- Authorization relationships (user:X can view:resource:Y)authorization_model- Permission model versionsstore- Multi-tenancy stores
openfga-migrate)
Data Classification: Authorization metadata (non-PII)
Retention: Operational (no compliance requirements)
keycloak / keycloak_test
Purpose: SSO, authentication, user identity
Schema: Managed by Keycloak
user_entity- User accounts, credentialsrealm- Multi-tenancy realmsclient- OAuth2/OIDC client applicationsuser_session- Active authentication sessions
PostgreSQL Instance Strategy
Local Development / Testing
Strategy: Single PostgreSQL instance with multiple databases Rationale:- Simple setup (one container)
- Faster development (no multi-instance coordination)
- Resource efficient (single PostgreSQL process)
- Test isolation via database separation (sufficient for unit/integration tests)
Staging / Production
Strategy: External managed PostgreSQL (Cloud SQL, RDS, etc.) with multiple databases Rationale:- Managed service benefits (backups, HA, monitoring)
- Production-grade performance
- Database-level separation for security
- Different retention policies per database
- Independent scaling (future: separate instances if needed)
Migration Script Architecture
Single Source of Truth:migrations/000_init_databases.sh
Used by:
docker-compose.test.yml- Test environment (createsgdpr_test,openfga_test,keycloak_test)docker-compose.yml- Development environment (createsgdpr,openfga,keycloak)docker/docker-compose.dev.yml- Alternative dev setup
- Single script reduces maintenance burden
- Environment detection (
_testsuffix) ensures correct naming - Idempotent (
WHERE NOT EXISTS) - safe to re-run - Error handling (
set -e,ON_ERROR_STOP=1) - Clear logging for debugging
Validation Strategy
Health Check Queries:Environment Variable Configuration
Standard Environment Variables:Consequences
Positive Consequences
✅ Consistency: Same database structure across dev/test/staging/prod ✅ Clarity: Clear naming convention removes ambiguity ✅ Security: Data separation aligns with GDPR data protection by design ✅ Testability: Test databases clearly isolated from development ✅ Maintainability: Single migration script, single source of truth ✅ Validation: Automated checks catch configuration errors early ✅ Documentation: ADR provides clear guidance for future changes ✅ Compliance: Supports GDPR Article 25 (data protection by design)Negative Consequences
❌ Migration Effort: Need to update existing docker-compose files, helm values, docs ❌ Backward Compatibility: Existing dev environments may need database rename ❌ Complexity: Multiple databases adds operational complexity (backups, monitoring) ❌ Resource Usage: Multiple databases consume more memory than single databaseNeutral Consequences
→ Learning Curve: Developers need to understand multi-database architecture → Testing Changes: Test fixtures need updated database connection stringsAlternatives Considered
1. Single Database for All Services
Description: Use one PostgreSQL database with table prefixes (e.g.,gdpr_user_profiles, openfga_tuple)
Pros:
- Simplest setup (one database)
- Easier backups (single database dump)
- No multi-database coordination
- Violates data separation principle (GDPR compliance risk)
- Cannot implement different retention policies per service
- Risk of table name collisions
- Cannot scale services independently
- Mixing regulated (GDPR) and non-regulated data (authorization)
2. Separate PostgreSQL Instances Per Service
Description: Run separate PostgreSQL containers/instances for each service Pros:- Maximum isolation (network-level)
- Independent scaling
- Different PostgreSQL versions per service
- Easier to migrate to managed services
- High resource usage (3x PostgreSQL processes)
- Complex local development setup
- More infrastructure to manage
- Network coordination overhead
3. Schema-Based Separation
Description: Use PostgreSQL schemas (gdpr.user_profiles, openfga.tuple)
Pros:
- Single database connection
- Logical separation
- Namespacing prevents collisions
- OpenFGA and Keycloak don’t support schema configuration (expect full database access)
- Still mixing data in single database (compliance concern)
- Complex permission management
- Cannot implement different backup strategies
Implementation Plan
Phase 1: Update Migration Scripts (Week 1)- ✅ Already completed:
migrations/000_init_databases.shcreatesgdpr_test,openfga_test,keycloak_test - Remove deprecated
scripts/postgres/init-multiple-databases.sh - Update main
docker-compose.ymlto usemigrations/000_init_databases.sh - Update
docker-compose.dev.ymlto use consistent multi-database pattern
- Update test fixtures to use
os.getenv("POSTGRES_DB", "gdpr_test")consistently - Add
get_database_name()helper tocore/config.py - Update application code to use helper function
- Update docker-compose environment variables
- Create
health/database_checks.pyvalidation module - Add pre-commit hook to validate database names in config files
- Add CI/CD integration test for database structure
- Write unit tests for
get_database_name()helper
- Update Helm values for production/staging with database configuration
- Document Cloud SQL setup requirements
- Update production deployment playbooks
- Add Terraform/Pulumi scripts for managed PostgreSQL setup
- Update README.md with database architecture section
- Create deployment guide with database setup instructions
- Update contributing guide with local database setup
- Create architecture diagram showing database relationships
Migration Guide
For Existing Local Development Environments
Option 1: Recreate Databases (Recommended)For Test Environment
No action required - Test environment already uses correct naming (gdpr_test, openfga_test, keycloak_test)
For Production / Staging
Cloud SQL Setup:- Create Cloud SQL PostgreSQL instance (if not exists)
- Create databases:
gdpr,openfga,keycloak - Run
migrations/001_gdpr_schema.sqlongdprdatabase - Let OpenFGA and Keycloak handle their schema migrations
- Update Helm values with database configuration
- Deploy via Helm
Performance Characteristics
| Metric | Single Database | Multi-Database (Same Instance) | Multi-Instance |
|---|---|---|---|
| Connection Pool Overhead | 1x | 3x | 3x |
| Memory Usage | 1x | 1.2x | 3x |
| Backup Complexity | Simple | Medium | High |
| Query Performance | Same | Same | Same |
| Operational Complexity | Low | Medium | High |
Security Considerations
Database-Level Isolation:- GDPR data in separate database prevents accidental leakage to authorization logs
- Each service can have different PostgreSQL users with limited permissions
- Audit logs cannot be tampered with by authorization service
Monitoring & Alerting
Health Checks (Kubernetes liveness/readiness):postgres_database_size_bytes{database="gdpr"}- Database size per servicepostgres_connections_active{database="gdpr"}- Active connections per databasegdpr_schema_version- Schema migration version
- Missing required tables (schema migration failure)
- Database size exceeding retention policy limits
- Connection pool exhaustion per database
References
- Implementation:
migrations/000_init_databases.sh - GDPR Schema:
migrations/001_gdpr_schema.sql - Configuration:
src/mcp_server_langgraph/core/config.py - Test Fixtures:
tests/integration/compliance/test_postgres_*.py - Docker Compose:
docker-compose.yml,docker-compose.test.yml - Helm Values:
deployments/helm/values-*.yaml