Testing Guide
This document describes the testing strategy, conventions, and best practices for the MCP Server LangGraph project.
Table of Contents
- Test Organization
- Test Categories
- Fixture Standards
- Identity & Authentication
- Running Tests
- TDD Best Practices
- Deployment Testing
Test Organization
Test Categories
Unit Tests (@pytest.mark.unit)
- Purpose: Test individual components in isolation
- Dependencies: Mocked
- Speed: Fast (<1s per test)
- Examples: Model validation, business logic, utility functions
API Tests (@pytest.mark.api)
- Purpose: Test REST API endpoints with mocked dependencies
- Dependencies: FastAPI TestClient with dependency injection overrides
- Speed: Fast (<1s per test)
- Examples: API key endpoints, service principal endpoints, GDPR endpoints
Integration Tests (@pytest.mark.integration)
- Purpose: Test multiple components working together
- Dependencies: Some real, some mocked
- Speed: Medium (1-5s per test)
- Examples: Database + business logic, external API integrations
E2E Tests (@pytest.mark.e2e)
- Purpose: Test complete user journeys
- Infrastructure: Requires `docker-compose.test.yml` services
- Speed: Slow (5-30s per test)
- Current Status: Transitioning from mocks to real infrastructure
E2E Test Strategy Decision (2025-01-05)
Decision: Migrate E2E tests from mocks to real infrastructure
Rationale:
- Infrastructure validation already exists (`test_infrastructure_check`)
- Mock-based E2E tests provide false confidence
- Real E2E tests catch integration issues that unit tests miss
- ✅ Phase 1: Document current state (this file)
- 🔄 Phase 2: Remove mock dependencies from E2E tests (future)
- 🔄 Phase 3: Implement real Keycloak/OpenFGA integration (future)
- 🔄 Phase 4: Remove `@pytest.mark.skip` from journey tests (future)
- Tests marked `@pytest.mark.e2e` but still using mocks (lines 116-131 in `test_full_user_journey.py`)
- Infrastructure check validates real services (PostgreSQL, Redis, OpenFGA, Keycloak, Qdrant)
- Comment on line 115: “Use HTTP mock until real Keycloak is implemented”
Deployment Configuration Tests (@pytest.mark.deployment)
- Purpose: Validate Helm charts, Kustomize overlays, and deployment manifests
- Dependencies: File system, helm/kustomize CLI tools (optional)
- Speed: Fast (<1s per test)
- Examples: Secret key alignment, CORS security, version consistency
- Location: `tests/deployment/`
- ✅ Helm secret template validation (missing keys detection)
- ✅ CORS security (prevents wildcard + credentials vulnerability)
- ✅ Hard-coded credential detection
- ✅ Placeholder validation (YOUR_PROJECT_ID, REPLACE_ME, example.com)
- ✅ ExternalSecrets key alignment
- ✅ Namespace consistency across overlays
- ✅ Version consistency across deployment methods
- ✅ Resource limits and security contexts
- ✅ Pod security standards compliance
- `validate-deployment-secrets` - Secret key alignment
- `validate-cors-security` - CORS configuration safety
- `check-hardcoded-credentials` - Credential exposure prevention
- `validate-redis-password-required` - Redis authentication enforcement
- `check-dangerous-placeholders` - Placeholder leak detection
A CI workflow (`.github/workflows/validate-deployments.yml`) runs on every PR:
- Helm chart linting and template rendering
- Kustomize build validation across 5 environments (matrix)
- YAML syntax validation
- Security scanning (gitleaks, CORS, placeholders)
- Version consistency checks
See `adr/adr-0046-deployment-configuration-tdd-infrastructure.md` for full details.
Performance & Benchmark Tests (@pytest.mark.benchmark)
- Purpose: Validate system performance and detect regressions
- Dependencies: pytest-benchmark plugin
- Speed: Slow (100 iterations, ~30s per benchmark suite)
- Examples: JWT encoding/decoding, OpenFGA authorization, LLM request handling
- Location: `tests/performance/`
- Problem: Timeout tests used real asyncio.sleep(5-10s), burning ~15s per run
- Solution: Reduced to 0.05-0.3s sleeps (100x faster, same behavior)
- Impact: Tests now complete in ~6s instead of 15s (60% speedup)
- Validation: Meta-tests prevent regression (test_performance_regression.py)
- Problem: Benchmarks ran by default, slowing everyday test runs
- Solution: Benchmarks now skip unless explicitly requested
- Impact: 90% faster test runs (benchmarks skipped in 0.57s vs 30+ seconds)
- Usage: See “Run Performance Benchmarks” section below
Benchmarks report median and percentile latencies rather than the mean:
- More stable than the mean (resistant to outliers)
- Better reflects user experience (p95/p99 SLA targets)
- Industry standard for performance testing
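As an illustration, a benchmark test built on the pytest-benchmark `benchmark` fixture might look like the sketch below; the `jwt_service` fixture and its `encode` method are placeholders, not the project's actual API.

```python
import pytest


@pytest.mark.benchmark
def test_jwt_encode_latency(benchmark, jwt_service):
    """Measure JWT encoding latency; pytest-benchmark reports median/p95/p99."""
    # benchmark() invokes the callable repeatedly and records timing statistics
    token = benchmark(jwt_service.encode, {"sub": "user:alice", "scope": "read"})
    assert token  # sanity check that encoding produced a value
```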
Fixture Standards
Shared Fixtures (tests/conftest.py)
mock_current_user (Function-scoped)
Standard authenticated user fixture for API tests.
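A minimal sketch of such a fixture is shown below; the attribute names and values are illustrative assumptions, not the project's exact user schema.

```python
import pytest
from unittest.mock import MagicMock


@pytest.fixture
def mock_current_user():
    """Authenticated-user stand-in for API tests (illustrative field names)."""
    user = MagicMock()
    user.user_id = "user:alice"                                # OpenFGA-format identifier
    user.keycloak_id = "3f2b8c1e-0a5d-4e7b-9c6d-2f1a0b3c4d5e"  # Keycloak UUID
    user.email = "alice@example.com"
    user.roles = ["user"]
    return user
```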
test_container (Session-scoped)
Dependency injection container for test environment.
- No-op telemetry (no output)
- No-op auth (accepts any token)
- In-memory storage
- No global side effects
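A rough sketch of the idea, assuming a simple dataclass container rather than the project's real dependency-injection wiring:

```python
import pytest
from dataclasses import dataclass, field


class NoOpTelemetry:
    """Swallows all telemetry calls so tests produce no output."""

    def record(self, *args, **kwargs) -> None:
        pass


class NoOpAuth:
    """Accepts any token; only suitable for tests."""

    def verify(self, token: str) -> dict:
        return {"sub": "user:test"}


@dataclass
class FakeContainer:
    telemetry: NoOpTelemetry = field(default_factory=NoOpTelemetry)
    auth: NoOpAuth = field(default_factory=NoOpAuth)
    storage: dict = field(default_factory=dict)  # in-memory storage, no globals


@pytest.fixture(scope="session")
def test_container():
    return FakeContainer()
```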
container (Function-scoped)
Per-test container for isolated testing.
Identity & Authentication
User Identity Formats
The system uses dual identity formats for compatibility:
1. OpenFGA Format (Authorization)
- Used for: OpenFGA tuples, API responses, authorization checks
- Best Practice: Always use this format for `user_id` fields in API responses
2. Keycloak UUID Format (Authentication)
- Used for: Keycloak Admin API calls, internal database keys
- Best Practice: Use this format for `keycloak_id` when interacting with Keycloak
Examples
✅ Correct Usage
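A sketch of the correct pattern; the endpoint path and the `client`, `auth_headers`, `keycloak_admin`, and `current_user` fixtures are assumptions, not the project's actual names.

```python
def test_me_endpoint_returns_openfga_format_user_id(client, auth_headers):
    # user_id in API responses uses the OpenFGA-style identifier
    response = client.get("/api/v1/users/me", headers=auth_headers)
    assert response.json()["user_id"] == "user:alice"


def test_keycloak_lookup_uses_uuid(keycloak_admin, current_user):
    # Keycloak Admin API calls take the raw Keycloak UUID
    keycloak_admin.get_user(user_id=current_user.keycloak_id)
```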
❌ Incorrect Usage
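And the mirror-image anti-patterns, using the same hypothetical names:

```python
def test_wrong_identity_formats(client, auth_headers, keycloak_admin, current_user):
    # Anti-pattern: expecting a Keycloak UUID in the user_id field of an API response
    response = client.get("/api/v1/users/me", headers=auth_headers)
    assert response.json()["user_id"] == current_user.keycloak_id  # mixes formats

    # Anti-pattern: passing the OpenFGA-format string where Keycloak expects a UUID
    keycloak_admin.get_user(user_id="user:alice")
```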
Test Fixture Patterns
API Endpoint Tests
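A sketch of the typical pattern, assuming hypothetical module paths (`app.main`, `app.auth`) and the shared `mock_current_user` fixture:

```python
from fastapi.testclient import TestClient

from app.main import app               # hypothetical module path
from app.auth import get_current_user  # hypothetical dependency callable


def test_list_api_keys_returns_200(mock_current_user):
    # Override the auth dependency so no real Keycloak token is needed
    app.dependency_overrides[get_current_user] = lambda: mock_current_user
    client = TestClient(app)
    try:
        response = client.get("/api/v1/api-keys")
        assert response.status_code == 200
    finally:
        app.dependency_overrides.clear()
```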
Admin Permission Tests
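A companion sketch for permission checks; the endpoint and the expected status code are assumptions:

```python
def test_create_service_principal_requires_admin(client, mock_current_user):
    mock_current_user.roles = ["user"]  # deliberately missing the admin role

    response = client.post("/api/v1/service-principals", json={"name": "ci-bot"})

    # A non-admin caller should be rejected
    assert response.status_code == 403
```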
Running Tests
Run All Tests
Run by Category
Run Performance Benchmarks (CODEX Finding #4)
- Benchmarks run 100 iterations for statistical accuracy (~30s per suite)
- Normal development doesn’t need benchmark validation
- CI explicitly runs benchmarks in dedicated job
- 90% faster test runs for everyday development
Run Tests Requiring CLI Tools (CODEX Finding #1)
Run by File
Run Specific Test
Useful Flags
E2E Infrastructure Setup
Run Meta-Validation (CODEX Finding #5)
Git Hooks and Validation
Updated: 2025-11-13 - Reorganized for developer productivity and CI parity
This project uses a two-stage validation strategy to balance speed and comprehensiveness:
Pre-commit Hooks (Fast - < 30 seconds)
Runs on changed files only at commit time. Optimized for rapid iteration.
What runs:
- Auto-fixers: black, isort, trailing-whitespace, etc.
- Fast linters: flake8, bandit, shellcheck
- Critical safety: memory safety, fixture organization, async mock usage
- File-specific validators: workflow syntax, MDX frontmatter
Pre-push Hooks (Comprehensive - 8-12 minutes)
Runs on all files before push. Matches CI validation exactly to eliminate surprises.
4-Phase Validation:
Phase 1: Fast Checks (< 30s)
- Lockfile validation (`uv lock --check`)
- Workflow validation tests
- MyPy type checking (non-blocking)
- Unit tests: `pytest tests/ -m unit -x --tb=short`
- Smoke tests: `pytest tests/smoke/ -v --tb=short`
- Integration tests (last failed): `pytest tests/integration/ -x --tb=short --lf`
- Property tests: `HYPOTHESIS_PROFILE=ci pytest -m property -x --tb=short`
- All comprehensive validators (documentation, deployment, etc.)
- Runs with the `--hook-stage pre-push` flag
Performance Monitoring
Expected Performance
| Stage | Target | Files | Description |
|---|---|---|---|
| Pre-commit | < 30s | Changed only | Fast feedback for commits |
| Pre-push | 8-12 min | All files | Comprehensive CI-equivalent validation |
Benefits
- Fast commits: 80-90% faster than before (2-5 min → 15-30s)
- Zero surprises: Pre-push matches CI exactly
- Early detection: Catches issues before push, not in CI
- CI reliability: Expected 80%+ reduction in CI failures
Troubleshooting
If pre-commit hooks fail:
Documentation
- Categorization: `docs-internal/HOOK_CATEGORIZATION.md`
- Migration guide: `docs-internal/PRE_COMMIT_PRE_PUSH_REORGANIZATION.md`
- Performance monitoring: `scripts/measure_hook_performance.py`
TDD Best Practices
1. Red-Green-Refactor Cycle
2. Test One Thing at a Time
3. Use Exact Mock Assertions
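For example, with a self-contained `MagicMock` standing in for an OpenFGA client:

```python
from unittest.mock import MagicMock

mock_openfga = MagicMock()
mock_openfga.check(user="user:alice", relation="can_read", object="document:readme")

# Exact assertion: fails if the arguments ever drift
mock_openfga.check.assert_called_once_with(
    user="user:alice", relation="can_read", object="document:readme"
)

# Too loose: passes even when the call used the wrong arguments
assert mock_openfga.check.called
```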
4. Arrange-Act-Assert Pattern
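For instance, using hypothetical `ApiKeyService` / `InMemoryApiKeyRepository` names:

```python
def test_create_api_key_persists_owner():
    # Arrange: build the unit under test with in-memory collaborators
    repo = InMemoryApiKeyRepository()
    service = ApiKeyService(repo)

    # Act: exercise exactly one behaviour
    key = service.create(owner="user:alice", name="ci")

    # Assert: verify the observable outcome
    assert repo.get(key.id).owner == "user:alice"
```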
5. Test Error Cases
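Error paths deserve the same treatment; `pytest.raises` makes the expected failure explicit (the exception name below is illustrative):

```python
import pytest


def test_create_api_key_rejects_unknown_owner():
    repo = InMemoryApiKeyRepository()  # same hypothetical names as above
    service = ApiKeyService(repo)

    with pytest.raises(OwnerNotFoundError):
        service.create(owner="user:does-not-exist", name="ci")
```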
6. Use Descriptive Test Names
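Names should state the scenario and the expected outcome, for example:

```python
# Good: scenario and expectation are readable from the name alone
def test_expired_api_key_is_rejected_with_401(): ...


# Weak: says nothing about behaviour or expectation
def test_api_key_2(): ...
```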
Regression Test Patterns
Overview
Regression tests prevent fixed bugs from recurring by encoding the fix as a permanent test case. The project maintains a comprehensive suite of regression tests in `tests/regression/` that encode fixes for real Codex findings and production issues.
Key Regression Test Categories
| Category | Location | Purpose |
|---|---|---|
| pytest-xdist Isolation | tests/regression/test_pytest_xdist_*.py | Prevent parallel test pollution |
| Service Principal | tests/regression/test_service_principal_*.py | Auth isolation between tests |
| LangGraph Types | tests/regression/test_langgraph_return_types.py | Type safety for agent responses |
| FastAPI Auth | tests/regression/test_fastapi_auth_override_*.py | Dependency injection correctness |
| GDPR Isolation | tests/regression/test_gdpr_singleton_*.py | Privacy compliance isolation |
Running Regression Tests
See the individual tests in `tests/regression/`, which document the original issue and fix in their docstrings.
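A skeleton of that shape (illustrative only; the real regression tests in `tests/regression/` carry the actual issue details):

```python
def test_previously_fixed_bug_stays_fixed():
    """Regression: <link or ID of the Codex finding / production issue>.

    Original issue: <what broke and under which conditions>.
    Fix: <what change resolved it>.
    """
    # Reproduce the exact conditions of the original bug, then assert the
    # corrected behaviour so any reintroduction fails this test.
    ...
```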
Last Updated: 2025-11-10
Status: ✅ Complete and current
Recent Updates:
- 2025-11-10: Added comprehensive regression test patterns section documenting fixes for Codex findings
- 2025-01-05: Initial comprehensive testing guide