41. Pure PostgreSQL for GDPR/HIPAA/SOC2 Compliance Storage
Date: 2025-11-02Status
AcceptedCategory
Security & ComplianceContext
We need to implement production-ready persistent storage for GDPR/HIPAA/SOC2 compliance data, including:- User profiles
- User preferences
- Consent records (7-year retention)
- Conversations (90-day retention)
- Audit logs (7-year retention)
- Not persistent (data lost on restart)
- Not ACID-compliant (critical for GDPR Article 17 - Right to Erasure)
- Not suitable for compliance audits
- Blocked production deployment of GDPR endpoints
Storage Requirements
Compliance-Driven Retention Policies:| Data Type | Retention | Compliance Basis |
|---|---|---|
| Audit Logs | 7 years | HIPAA §164.316(b)(2)(i), SOC2 CC6.6, GDPR Article 5(2) |
| Consent Records | 7 years | GDPR Article 7 (legal requirement) |
| Conversations | 90 days | GDPR Article 5(1)(e) (storage limitation) |
| User Profiles | Until deletion | GDPR Article 17 (right to erasure) |
| Preferences | Until deletion | GDPR Article 17 (right to erasure) |
Options Considered
Option 1: Redis Only
Pros:- Fast (< 1ms latency)
- Already in stack (sessions, checkpoints)
- Simple to implement (4-6 hours)
- In-memory storage (expensive for 7-year data: 50/month for PostgreSQL)
- Limited querying (no time-series queries for audit logs)
- Not designed for compliance data (cache, not system of record)
- No ACID guarantees across multiple keys
- Complex backup/restore for compliance audits
Option 2: PostgreSQL Only ✅ CHOSEN
Pros:- ACID compliance (critical for GDPR Article 17 atomic deletions)
- Time-series queries for audit logs (HIPAA/SOC2 requirement)
- Cost-effective (14x cheaper than Redis for 7-year storage)
- Already in stack (Keycloak, OpenFGA use PostgreSQL)
- Single source of truth (simple architecture)
- Proven database for compliance data
- Excellent backup/restore capabilities
- Slower than Redis (5-10ms vs
<1ms) - Requires schema migrations
- New dependency (asyncpg)
- GDPR endpoints are user-initiated, infrequent operations (not performance-critical)
- Data export: Maybe once per user per lifetime
- Data deletion: Once per user
- Consent updates: Occasionally
- 5-10ms latency is acceptable for these operations
Option 3: Hybrid (Redis + PostgreSQL)
Pros:- Best performance (Redis) for hot data
- Best cost (PostgreSQL) for cold data
- Architectural complexity (two systems to maintain)
- No ACID across systems (cannot atomically delete from Redis + PostgreSQL)
- Backup coordination (must ensure consistent snapshots)
- Auditor complexity (data scattered across two systems)
- GDPR Article 17 risk: Right to Erasure requires atomic deletion across all systems
Decision
Use Pure PostgreSQL for all GDPR/HIPAA/SOC2 compliance data storage.Rationale
-
ACID Guarantees Critical for GDPR
- Article 17 (Right to Erasure) requires atomic deletion across all data
- Single transaction ensures consistency
-
Compliance Audit Simplicity
- Auditors need to verify data retention and deletion
- Single system = simple to audit
- Complete audit trail in one place
-
Cost-Effectiveness
- 7 years of audit logs in PostgreSQL: ~$50/month (disk storage)
- 7 years of audit logs in Redis: ~$720/month (memory storage)
- 14x cost reduction
-
Already in Stack
- Keycloak uses PostgreSQL
- OpenFGA uses PostgreSQL
- Team has PostgreSQL operational expertise
- No new infrastructure needed
-
Performance is Acceptable
- 5-10ms for GDPR operations (user-initiated, infrequent)
- Not a bottleneck for compliance endpoints
- Auth sessions still use Redis (fast path preserved)
-
Future-Proof
- Complex audit queries:
WHERE timestamp BETWEEN ... AND action = 'phi.access' - Full-text search on conversations (if needed)
- Compliance reporting with SQL
- Machine learning on historical data
- Complex audit queries:
Implementation
Database Schema:- 5 tables:
user_profiles,user_preferences,consent_records,conversations,audit_logs - Optimized indexes for GDPR query patterns
- Triggers for automatic timestamp updates
- Views for retention policy enforcement
- CASCADE deletes for referential integrity
- In-memory → PostgreSQL (zero downtime, new data goes to PostgreSQL)
- No hybrid complexity
- No data migration needed (clean start)
Consequences
Positive
-
✅ Production-ready GDPR endpoints
- Can deploy to production immediately
- Full compliance with GDPR Articles 5, 15, 17
- ACID guarantees for atomic operations
-
✅ HIPAA compliance
- 7-year audit log retention (§164.316(b)(2)(i))
- Tamper-proof audit trail (§164.312(b))
- Encryption at rest (§164.312(a)(2)(iv))
-
✅ SOC2 compliance
- Automated evidence collection (CC6.6)
- Data retention policies (PI1.4)
- Audit log queries for compliance reports
-
✅ Simple architecture
- One database for all compliance data
- Single backup/restore process
- Clear audit trail
-
✅ Cost-effective
- 14x cheaper than Redis for long-term storage
- Disk-based storage for 7-year data
-
✅ Type-safe implementation
- Strict mypy enforcement (Phase 4)
- Comprehensive test coverage (50+ tests)
- Production-grade error handling
Negative
-
⚠️ Slightly slower than Redis
- 5-10ms vs
<1msfor Redis - Acceptable for infrequent GDPR operations
- Not a performance bottleneck
- 5-10ms vs
-
⚠️ Schema migrations required
- Need Alembic or manual migrations for schema changes
- Mitigated by: Well-designed schema with JSONB flexibility
-
⚠️ New dependency
- Added
asyncpgto core dependencies - Mitigated by: asyncpg is battle-tested, widely used
- Added
Trade-Offs Accepted
- Performance for simplicity: 5-10ms latency acceptable for compliance operations
- Migration overhead for correctness: Schema migrations worth it for data integrity
- Dependency for compliance: asyncpg dependency justified for production-ready compliance
Compliance Mapping
GDPR
| Article | Requirement | PostgreSQL Implementation |
|---|---|---|
| Article 5(1)(e) | Storage limitation | 90-day retention view for conversations |
| Article 7 | Consent conditions | 7-year consent record retention |
| Article 15 | Right to access | SQL queries for complete data export |
| Article 17 | Right to erasure | ACID transaction for atomic deletion |
| Article 20 | Data portability | JSON export from single database |
HIPAA
| Requirement | PostgreSQL Implementation | |
|---|---|---|
| §164.312(b) | Audit controls | Comprehensive audit log table |
| §164.316(b)(2)(i) | 7-year retention | Index-optimized for 7-year queries |
| §164.312(a)(2)(iv) | Encryption at rest | PostgreSQL TDE + application-level encryption |
SOC2
| Control | PostgreSQL Implementation | |
|---|---|---|
| CC6.6 | Audit logging | Tamper-proof append-only audit logs |
| PI1.4 | Data retention | Automated retention views and cleanup |
| A1.2 | System availability | 99.9% SLA with PostgreSQL clustering |
Implementation Notes
Database Configuration
Development:Operational Procedures
Backup:Security Considerations
- Encryption at Rest: Enable PostgreSQL Transparent Data Encryption (TDE)
- Encryption in Transit: Require TLS 1.3 for all connections
- Access Control: Dedicated GDPR database user with minimal permissions
- Audit Logging: PostgreSQL audit extension (
pgaudit) for database-level audit trail - Network Isolation: Deploy in private subnet with network policies
References
- GDPR Text
- HIPAA Security Rule
- SOC 2 Trust Service Criteria
- PostgreSQL Documentation
- asyncpg Documentation
Related ADRs
- ADR-0002: Fine-Grained Authorization (OpenFGA) - Uses PostgreSQL
- ADR-0003: Dual Observability Strategy - Audit logs feed both systems
- ADR-0006: Session Storage Architecture - Redis for sessions, PostgreSQL for compliance
- ADR-0030: Resilience Patterns - Applies to database connections
- ADR-0032: JWT Standardization - Authentication separate from compliance data
- ADR-0037: Identity Federation - Users in Keycloak PostgreSQL, compliance data in separate DB
Review and Approval
Author: Claude Code Assistant Date: 2025-11-02 Status: Accepted Reviewers: [To be filled by team]Next Review Date: 2026-11-02 (Annual review) Change History:
- 2025-11-02: Initial ADR created (v1.0)