29. Custom Exception Hierarchy
Date: 2025-10-20
Status: Accepted
Category: Core Architecture
Context
The MCP server currently uses generic Python exceptions (Exception, ValueError, RuntimeError) throughout the codebase, which causes the following problems.
Current Problems:
- Poor Error Handling: Cannot distinguish between different error types
- Vague Error Messages: Generic exceptions don’t provide context
- Difficult Debugging: Hard to trace error source in logs
- Lost Context: Stack traces don’t show business logic errors clearly
- Poor Observability: Cannot filter errors by type in metrics
- Inconsistent HTTP Status Codes: Same exception → different status codes
- Error handling is reactive, not proactive
- Difficult to implement proper retry logic (which errors to retry?)
- Metrics are generic (all errors lumped together)
- User-facing errors lack clarity
Decision
Implement a comprehensive custom exception hierarchy that:
- Provides clear error semantics
- Includes rich error context (metadata, trace IDs)
- Maps to HTTP status codes automatically
- Enables fine-grained error handling
- Improves observability and debugging
Exception Hierarchy
Architecture
New Module: src/mcp_server_langgraph/core/exceptions.py
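A minimal sketch of what this module could contain, assuming each exception carries its HTTP status code, a machine-readable error code, a category, a retry policy, free-form metadata, and a trace ID. All class names, enum values, error-code strings, and status mappings below are hypothetical illustrations, not the project's actual API.

```python
from __future__ import annotations

import uuid
from enum import Enum
from typing import Any

# Hypothetical names throughout; not the project's actual API.


class ErrorCategory(str, Enum):
    """Coarse grouping used for metric labels and client/server decisions."""
    CLIENT_ERROR = "client_error"
    SERVER_ERROR = "server_error"
    EXTERNAL_ERROR = "external_error"


class RetryPolicy(str, Enum):
    """Hint telling callers whether a failed operation may be retried."""
    NEVER = "never"
    SAFE = "safe"
    AFTER_DELAY = "after_delay"


class MCPServerException(Exception):
    """Hypothetical root class; subclasses override the class-level defaults."""
    status_code: int = 500
    error_code: str = "server.internal_error"
    category: ErrorCategory = ErrorCategory.SERVER_ERROR
    retry_policy: RetryPolicy = RetryPolicy.NEVER

    def __init__(self, message: str, *, metadata: dict[str, Any] | None = None,
                 trace_id: str | None = None) -> None:
        super().__init__(message)
        self.message = message
        self.metadata = dict(metadata or {})
        # Fall back to a random ID when no distributed-trace ID is supplied.
        self.trace_id = trace_id or uuid.uuid4().hex

    def to_dict(self) -> dict[str, Any]:
        """Serialize to a JSON-friendly error body."""
        return {
            "error": self.error_code,
            "message": self.message,
            "category": self.category.value,
            "trace_id": self.trace_id,
            "metadata": self.metadata,
        }


class AuthenticationError(MCPServerException):
    status_code = 401
    error_code = "auth.unauthenticated"
    category = ErrorCategory.CLIENT_ERROR


class TokenExpiredError(AuthenticationError):
    error_code = "auth.token_expired"


class AuthorizationError(MCPServerException):
    status_code = 403
    error_code = "auth.forbidden"
    category = ErrorCategory.CLIENT_ERROR


class RateLimitError(MCPServerException):
    status_code = 429
    error_code = "rate_limit.exceeded"
    category = ErrorCategory.CLIENT_ERROR
    retry_policy = RetryPolicy.AFTER_DELAY

    def __init__(self, message: str, *, retry_after: int, **kwargs: Any) -> None:
        super().__init__(message, **kwargs)
        self.retry_after = retry_after  # seconds; surfaced as a Retry-After header


class ExternalServiceError(MCPServerException):
    status_code = 502
    error_code = "external.service_error"
    category = ErrorCategory.EXTERNAL_ERROR
    retry_policy = RetryPolicy.AFTER_DELAY


class ComplianceError(MCPServerException):
    """Base for GDPR/HIPAA/SOC2 policy violations (e.g. missing consent)."""
    status_code = 403
    error_code = "compliance.violation"
    category = ErrorCategory.CLIENT_ERROR
```

Because the status code and error code live on the classes, a caller can catch AuthenticationError to handle every credential failure, or catch TokenExpiredError to target only the expired-token case.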
FastAPI Integration
Exception Handler
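A minimal sketch of the translation step, assuming custom exceptions expose status_code and to_dict(), and optionally trace_id and retry_after (all assumed attribute names). The function reads them with getattr, so it runs on its own; the FastAPI registration is shown only as a comment because it needs the fastapi package and a base class, here called MCPServerException (a hypothetical name). The X-Trace-ID header and the logger name are also assumptions.

```python
from __future__ import annotations

import logging
from typing import Any

logger = logging.getLogger("mcp_server.errors")  # hypothetical logger name


def to_http_response(exc: BaseException) -> tuple[int, dict[str, str], dict[str, Any]]:
    """Translate an exception into (status code, headers, JSON body)."""
    status = getattr(exc, "status_code", None)
    to_dict = getattr(exc, "to_dict", None)
    if not isinstance(status, int) or not callable(to_dict):
        # Generic exception: keep the traceback in the logs, but never
        # leak internal details to the client.
        logger.error("Unhandled exception", exc_info=exc)
        return 500, {}, {"error": "server.internal_error", "message": "Internal server error"}

    headers: dict[str, str] = {}
    trace_id = getattr(exc, "trace_id", None)
    if trace_id:
        headers["X-Trace-ID"] = str(trace_id)  # hypothetical header name
    retry_after = getattr(exc, "retry_after", None)
    if retry_after is not None:
        headers["Retry-After"] = str(int(retry_after))  # delay in seconds

    # 5xx responses are server faults worth alerting on; 4xx are logged lower.
    level = logging.ERROR if status >= 500 else logging.WARNING
    logger.log(level, "%s: %s", type(exc).__name__, exc)
    return status, headers, to_dict()


# Wiring into FastAPI (not executed here; requires fastapi to be installed):
#
#   from fastapi import FastAPI, Request
#   from fastapi.responses import JSONResponse
#
#   app = FastAPI()
#
#   @app.exception_handler(MCPServerException)
#   async def mcp_exception_handler(request: Request, exc: MCPServerException) -> JSONResponse:
#       status, headers, body = to_http_response(exc)
#       return JSONResponse(status_code=status, content=body, headers=headers)
```

Keeping the mapping in a plain function makes it testable without starting a server, and the same function can back handlers for other transports.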
Migration Guide
Before (Generic Exceptions)
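A minimal sketch of the pattern being replaced, using a hypothetical verify_token helper that works on already-decoded token claims (the function and claim names are illustrative):

```python
from __future__ import annotations

from typing import Any


def verify_token(claims: dict[str, Any], now: float) -> str:
    """Return the subject of a decoded token, using only built-in exceptions."""
    if "sub" not in claims or "exp" not in claims:
        raise ValueError("Invalid token")
    if claims["exp"] < now:
        raise Exception("Token expired")
    return claims["sub"]


def handle_request(claims: dict[str, Any], now: float) -> tuple[int, str]:
    try:
        return 200, verify_token(claims, now)
    except Exception as exc:
        # The caller cannot tell "expired" from "malformed" without parsing
        # the message text, so every failure collapses into the same 500.
        return 500, str(exc)
```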
After (Custom Exceptions)
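The same hypothetical helper rewritten against typed exceptions. The small classes at the top are stand-ins so the example runs on its own; in the codebase they would presumably be imported from core/exceptions.py. InvalidTokenError and the "refresh_token" hint are illustrative assumptions.

```python
from __future__ import annotations

from typing import Any


# Minimal stand-ins for the hierarchy; names and codes are hypothetical.
class MCPServerException(Exception):
    status_code = 500
    error_code = "server.internal_error"


class AuthenticationError(MCPServerException):
    status_code = 401
    error_code = "auth.unauthenticated"


class InvalidTokenError(AuthenticationError):
    error_code = "auth.invalid_token"


class TokenExpiredError(AuthenticationError):
    error_code = "auth.token_expired"


def verify_token(claims: dict[str, Any], now: float) -> str:
    """Return the subject of a decoded token, raising typed errors on failure."""
    if "sub" not in claims or "exp" not in claims:
        raise InvalidTokenError("Token is missing required claims")
    if claims["exp"] < now:
        raise TokenExpiredError("Token expired")
    return claims["sub"]


def handle_request(claims: dict[str, Any], now: float) -> tuple[int, Any]:
    try:
        return 200, verify_token(claims, now)
    except TokenExpiredError as exc:
        # Specific handling: the client can refresh instead of re-authenticating.
        return exc.status_code, {"error": exc.error_code, "hint": "refresh_token"}
    except MCPServerException as exc:
        # Any other domain error still carries its own status and code.
        return exc.status_code, {"error": exc.error_code}
```

Both failure modes now map to 401 rather than a blanket 500, and the caller distinguishes them by type instead of by message text.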
Consequences
Positive
- Better Error Handling
  - Can catch specific exceptions, such as TokenExpiredError
  - Implement custom retry logic per exception type
  - Different handling for client vs server errors
- Improved Observability
  - Error metrics by category/code
  - Trace IDs for debugging
  - Rich metadata in logs
- Better User Experience
  - Clear, actionable error messages
  - Appropriate HTTP status codes
  - Retry hints (Retry-After header)
- Easier Debugging
  - Stack traces show business logic errors
  - Metadata provides context
  - Trace IDs link to distributed traces
- Compliance
  - Specific exceptions for GDPR/HIPAA/SOC2 violations
  - Audit trail of compliance errors
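Error metrics by category and code only need two labels per exception. A minimal sketch, assuming exceptions expose category and error_code attributes (assumed names); collections.Counter stands in for whatever metrics backend the server actually uses.

```python
from __future__ import annotations

from collections import Counter

# Stand-in for a real labelled counter (e.g. Prometheus or OpenTelemetry);
# keys are (category, error_code) label pairs.
error_counter: Counter[tuple[str, str]] = Counter()


def record_error(exc: BaseException) -> None:
    """Count an error under its category and code labels."""
    # category/error_code are assumed attribute names; plain built-in
    # exceptions fall back to "uncategorized" and their class name.
    category = getattr(exc, "category", "uncategorized")
    category = getattr(category, "value", category)  # unwrap Enum members
    code = getattr(exc, "error_code", type(exc).__name__)
    error_counter[(str(category), str(code))] += 1
```

Generic exceptions still get counted, but only under their class name, which is exactly the "all errors lumped together" problem the hierarchy is meant to remove.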
Negative
- More Code
  - 300+ lines for exception hierarchy
  - Need to update all exception raises
- Learning Curve
  - Developers need to learn exception hierarchy
  - Need to choose correct exception type
- Migration Effort
  - Update ~100+ exception raises across codebase
  - Test all error paths
Implementation Plan
Week 1: Foundation
- Create ADR-0029 (this document)
- Create core/exceptions.py with hierarchy
- Add FastAPI exception handlers
- Write 50+ unit tests for exceptions
- Update developer documentation
Week 2: Migration - Auth & Core
- Migrate auth/ module to custom exceptions
- Migrate core/ module to custom exceptions
- Update error handling tests
- Verify metrics are collected correctly
Week 3: Migration - LLM & External Services
- Migrate llm/ module to custom exceptions
- Add retry logic based on retry_policy
- Wrap external service errors properly
- Test circuit breaker integration
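Retry logic driven by retry_policy could look roughly like this. Only the retry_policy attribute name comes from this plan; the RetryPolicy values, the stand-in classes, call_with_retry, and the backoff parameters are hypothetical.

```python
from __future__ import annotations

import time
from enum import Enum
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryPolicy(str, Enum):
    NEVER = "never"              # e.g. validation or authorization failures
    SAFE = "safe"                # transient and idempotent: retry immediately
    AFTER_DELAY = "after_delay"  # transient but needs backoff (rate limits, outages)


# Minimal stand-ins; a real hierarchy would set retry_policy on each class.
class MCPServerException(Exception):
    retry_policy = RetryPolicy.NEVER


class ExternalServiceError(MCPServerException):
    retry_policy = RetryPolicy.AFTER_DELAY


def call_with_retry(
    fn: Callable[[], T],
    *,
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying only when the raised exception's retry_policy allows it.

    Exceptions outside the hierarchy are not caught and propagate immediately.
    """
    attempt = 1
    while True:
        try:
            return fn()
        except MCPServerException as exc:
            if exc.retry_policy is RetryPolicy.NEVER or attempt >= max_attempts:
                raise
            if exc.retry_policy is RetryPolicy.AFTER_DELAY:
                # Prefer an exception-provided retry_after (seconds) if present;
                # otherwise back off exponentially: base, 2*base, 4*base, ...
                delay = getattr(exc, "retry_after", None) or base_delay * 2 ** (attempt - 1)
                sleep(delay)
            attempt += 1
```

Injecting sleep keeps the function testable without real waits.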
Week 4: Migration - Remaining Modules
- Migrate api/, mcp/, and compliance/ modules
- Update all integration tests
- Verify HTTP status codes are correct
- Performance test (exception overhead)
Week 5: Documentation & Rollout
- Update API documentation with error codes
- Create error code reference guide
- Deploy to staging, test error flows
- Deploy to production
References
- ADR-0017: Error Handling Strategy: ./adr-0017-error-handling-strategy.md
- ADR-0026: Resilience Patterns: ./adr-0026-resilience-patterns.md
- Python Exception Best Practices: https://docs.python.org/3/tutorial/errors.html
- HTTP Status Codes: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status
- REST API Error Handling: https://www.baeldung.com/rest-api-error-handling-best-practices
Last Updated: 2025-10-20
Next Review: 2025-11-20