# Testing Guide
This document describes the testing strategy, conventions, and best practices for the MCP Server LangGraph project.

## Table of Contents
- Test Organization
- Test Categories
- Fixture Standards
- Identity & Authentication
- Running Tests
- TDD Best Practices
- Deployment Testing
## Test Organization
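Directories referenced throughout this guide (a partial sketch; only paths named in this document are shown):

```text
tests/
├── conftest.py        # shared fixtures
├── deployment/        # Helm/Kustomize configuration tests
├── integration/       # mixed real/mock dependencies
├── performance/       # pytest-benchmark suites
├── regression/        # encoded fixes for past bugs
└── smoke/             # quick end-to-end sanity checks
```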
## Test Categories
### Unit Tests (`@pytest.mark.unit`)
- Purpose: Test individual components in isolation
- Dependencies: Mocked
- Speed: Fast (<1s per test)
- Examples: Model validation, business logic, utility functions
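A minimal sketch of a test at this level; the `ApiKeyRequest` model is hypothetical, invented for illustration:

```python
import pytest
from pydantic import BaseModel, ValidationError


class ApiKeyRequest(BaseModel):  # hypothetical model, for illustration only
    name: str


@pytest.mark.unit
def test_api_key_request_rejects_empty_payload():
    # Pure model validation: no I/O, no external services, runs in milliseconds.
    with pytest.raises(ValidationError):
        ApiKeyRequest()
```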
### API Tests (`@pytest.mark.api`)
- Purpose: Test REST API endpoints with mocked dependencies
- Dependencies: FastAPI TestClient with dependency injection overrides
- Speed: Fast (<1s per test)
- Examples: API key endpoints, service principal endpoints, GDPR endpoints
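A sketch of the dependency-override mechanism, using a toy app and dependency names invented for illustration:

```python
import pytest
from fastapi import Depends, FastAPI
from fastapi.testclient import TestClient

# Hypothetical app and dependency, standing in for the real ones.
app = FastAPI()


def get_current_user():  # stand-in for the real auth dependency
    raise NotImplementedError


@app.get("/me")
def read_me(user: dict = Depends(get_current_user)):
    return user


@pytest.mark.api
def test_me_returns_overridden_user():
    # Replace the real auth dependency with a canned user -- no Keycloak needed.
    app.dependency_overrides[get_current_user] = lambda: {"user_id": "user:alice"}
    client = TestClient(app)
    assert client.get("/me").json() == {"user_id": "user:alice"}
    app.dependency_overrides.clear()
```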
### Integration Tests (`@pytest.mark.integration`)
- Purpose: Test multiple components working together
- Dependencies: Some real, some mocked
- Speed: Medium (1-5s per test)
- Examples: Database + business logic, external API integrations
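For example, an integration test can exercise real SQL against a throwaway database while the surrounding services stay mocked (a sketch using the stdlib `sqlite3` module; the real suite targets PostgreSQL):

```python
import sqlite3

import pytest


@pytest.mark.integration
def test_store_and_load_roundtrip(tmp_path):
    # Real database file, real SQL -- only external services are mocked.
    db = sqlite3.connect(tmp_path / "test.db")
    db.execute("CREATE TABLE keys (name TEXT)")
    db.execute("INSERT INTO keys VALUES (?)", ("ci-key",))
    assert db.execute("SELECT name FROM keys").fetchone() == ("ci-key",)
```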
### E2E Tests (`@pytest.mark.e2e`)
- Purpose: Test complete user journeys
- Infrastructure: Requires `docker-compose.test.yml` services
- Speed: Slow (5-30s per test)
- Current Status: Transitioning from mocks to real infrastructure
#### E2E Test Strategy Decision (2025-01-05)
**Decision**: Migrate E2E tests from mocks to real infrastructure.

**Rationale**:
- Infrastructure validation already exists (`test_infrastructure_check`)
- Mock-based E2E tests provide false confidence
- Real E2E tests catch integration issues that unit tests miss
**Migration plan**:
- ✅ Phase 1: Document current state (this file)
- 🔄 Phase 2: Remove mock dependencies from E2E tests (future)
- 🔄 Phase 3: Implement real Keycloak/OpenFGA integration (future)
- 🔄 Phase 4: Remove `@pytest.mark.skip` from journey tests (future)
**Current state**:
- Tests marked as `@pytest.mark.e2e` but using mocks (lines 116-131 in `test_full_user_journey.py`)
- Infrastructure check validates real services (PostgreSQL, Redis, OpenFGA, Keycloak, Qdrant)
- Comment on line 115: "Use HTTP mock until real Keycloak is implemented"
### Deployment Configuration Tests (`@pytest.mark.deployment`)
- Purpose: Validate Helm charts, Kustomize overlays, and deployment manifests
- Dependencies: File system, helm/kustomize CLI tools (optional)
- Speed: Fast (<1s per test)
- Examples: Secret key alignment, CORS security, version consistency
- Location: `tests/deployment/`

These tests validate:
- ✅ Helm secret template validation (missing keys detection)
- ✅ CORS security (prevents wildcard + credentials vulnerability)
- ✅ Hard-coded credential detection
- ✅ Placeholder validation (YOUR_PROJECT_ID, REPLACE_ME, example.com)
- ✅ ExternalSecrets key alignment
- ✅ Namespace consistency across overlays
- ✅ Version consistency across deployment methods
- ✅ Resource limits and security contexts
- ✅ Pod security standards compliance
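A sketch of one such check, the placeholder scan; the `deploy/` path is an assumption, so adjust it to the repository layout:

```python
from pathlib import Path

import pytest

PLACEHOLDERS = ("YOUR_PROJECT_ID", "REPLACE_ME", "example.com")


@pytest.mark.deployment
def test_no_dangerous_placeholders():
    # Scan every deployment manifest for placeholder values that must not ship.
    for manifest in Path("deploy").rglob("*.yaml"):
        text = manifest.read_text()
        for placeholder in PLACEHOLDERS:
            assert placeholder not in text, f"{manifest}: contains {placeholder}"
```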
Associated pre-commit hooks:
- `validate-deployment-secrets` - Secret key alignment
- `validate-cors-security` - CORS configuration safety
- `check-hardcoded-credentials` - Credential exposure prevention
- `validate-redis-password-required` - Redis authentication enforcement
- `check-dangerous-placeholders` - Placeholder leak detection
A CI workflow (`.github/workflows/validate-deployments.yml`) runs on every PR:
- Helm chart linting and template rendering
- Kustomize build validation across 5 environments (matrix)
- YAML syntax validation
- Security scanning (gitleaks, CORS, placeholders)
- Version consistency checks
See `adr/adr-0046-deployment-configuration-tdd-infrastructure.md` for full details.
### Performance & Benchmark Tests (`@pytest.mark.benchmark`)
- Purpose: Validate system performance and detect regressions
- Dependencies: pytest-benchmark plugin
- Speed: Slow (100 iterations, ~30s per benchmark suite)
- Examples: JWT encoding/decoding, OpenFGA authorization, LLM request handling
- Location: `tests/performance/`
**Timeout test speedup**:
- Problem: Timeout tests used real `asyncio.sleep()` calls of 5-10s, burning ~15s per run
- Solution: Reduced to 0.05-0.3s sleeps (100x faster, same behavior)
- Impact: Tests now complete in ~6s instead of 15s (60% speedup)
- Validation: Meta-tests prevent regression (`test_performance_regression.py`)

**Benchmark opt-in**:
- Problem: Benchmarks ran by default, slowing everyday test runs
- Solution: Benchmarks now skip unless explicitly requested
- Impact: 90% faster test runs (benchmarks skipped in 0.57s vs 30+ seconds)
- Usage: See the "Run Performance Benchmarks" section below
Benchmark statistics emphasize the median and tail percentiles rather than the mean:
- More stable than the mean (resistant to outliers)
- Better reflects user experience (p95/p99 SLA targets)
- Industry standard for performance testing
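A sketch of a benchmark using the pytest-benchmark fixture; PyJWT and the payload shape are assumptions, not the project's actual suite:

```python
import jwt  # PyJWT; the real suite may use a different library
import pytest

SECRET = "test-secret"


@pytest.mark.benchmark
def test_jwt_encode_speed(benchmark):
    # pytest-benchmark calls the function repeatedly and reports statistics
    # (median, percentiles) across iterations.
    token = benchmark(jwt.encode, {"sub": "user:alice"}, SECRET, algorithm="HS256")
    assert jwt.decode(token, SECRET, algorithms=["HS256"])["sub"] == "user:alice"
```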
## Fixture Standards
### Shared Fixtures (`tests/conftest.py`)
#### `mock_current_user` (Function-scoped)
Standard authenticated user fixture for API tests.
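A plausible shape for this fixture; field names are illustrative, and the real fixture may return a typed model rather than a dict:

```python
import pytest


@pytest.fixture
def mock_current_user():
    # Function-scoped: each test gets a fresh, independent user.
    return {"user_id": "user:alice", "email": "alice@test.local", "roles": ["user"]}
```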
#### `test_container` (Session-scoped)
Dependency injection container for test environment.
- No-op telemetry (no output)
- No-op auth (accepts any token)
- In-memory storage
- No global side effects
#### `container` (Function-scoped)
Per-test container for isolated testing.
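A sketch of the relationship between the two fixtures; `create_child` and `reset` are hypothetical method names standing in for whatever the DI library actually provides:

```python
import pytest


@pytest.fixture
def container(test_container):
    # Function-scoped child of the session-scoped container: tests can
    # override bindings without leaking state into other tests.
    child = test_container.create_child()  # hypothetical API
    yield child
    child.reset()  # hypothetical API
```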
## Identity & Authentication
### User Identity Formats
The system uses dual identity formats for compatibility:

#### 1. OpenFGA Format (Authorization)
- Used for: OpenFGA tuples, API responses, authorization checks
- Best Practice: Always use this format for `user_id` fields in API responses
#### 2. Keycloak UUID Format (Authentication)
- Used for: Keycloak Admin API calls, internal database keys
- Best Practice: Use this format for `keycloak_id` when interacting with Keycloak
### Examples
#### ✅ Correct Usage
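A sketch assuming the usual OpenFGA `type:id` convention (`user:<uuid>`) and bare UUIDs for Keycloak; the UUID itself is made up:

```python
uuid = "3f2c9a1e-8d4b-4f6a-9c7e-2b5d8e1a4c6f"

api_response = {"user_id": f"user:{uuid}"}  # OpenFGA format in API responses
keycloak_call = {"keycloak_id": uuid}       # bare UUID for the Keycloak Admin API
```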
#### ❌ Incorrect Usage
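The same identifiers with the formats swapped:

```python
uuid = "3f2c9a1e-8d4b-4f6a-9c7e-2b5d8e1a4c6f"

# Formats swapped: authorization checks and Keycloak Admin API calls both break.
api_response = {"user_id": uuid}                 # bare UUID where OpenFGA format is required
keycloak_call = {"keycloak_id": f"user:{uuid}"}  # prefixed id where Keycloak expects a UUID
```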
### Test Fixture Patterns
#### API Endpoint Tests
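A sketch of the pattern; the import paths and endpoint route are hypothetical:

```python
import pytest
from fastapi.testclient import TestClient

from src.main import app                # hypothetical import path
from src.auth import get_current_user   # hypothetical dependency


@pytest.mark.api
def test_list_api_keys_succeeds(mock_current_user):
    # Standard pattern: override the auth dependency with the shared fixture,
    # call the endpoint, then clean up so other tests are unaffected.
    app.dependency_overrides[get_current_user] = lambda: mock_current_user
    try:
        response = TestClient(app).get("/api/v1/api-keys")  # hypothetical route
        assert response.status_code == 200
    finally:
        app.dependency_overrides.clear()
```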
#### Admin Permission Tests
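A sketch using a hypothetical `require_admin` guard; the real check goes through OpenFGA:

```python
import pytest
from fastapi import HTTPException


def require_admin(user: dict) -> None:
    # Hypothetical guard, for illustration only.
    if "admin" not in user.get("roles", []):
        raise HTTPException(status_code=403, detail="admin required")


@pytest.mark.api
def test_non_admin_is_rejected(mock_current_user):
    # The shared fixture's user has only the "user" role.
    with pytest.raises(HTTPException) as exc:
        require_admin(mock_current_user)
    assert exc.value.status_code == 403
```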
## Running Tests
### Run All Tests
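From the repository root:

```bash
pytest
```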
### Run by Category
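Markers map directly to the categories above:

```bash
pytest -m unit          # fast, isolated tests
pytest -m api           # endpoint tests with overridden dependencies
pytest -m integration   # mixed real/mock dependencies
pytest -m e2e           # requires docker-compose.test.yml services
pytest -m deployment    # Helm/Kustomize configuration tests
```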
### Run Performance Benchmarks (CODEX Finding #4)
- Benchmarks run 100 iterations for statistical accuracy (~30s per suite)
- Normal development doesn’t need benchmark validation
- CI explicitly runs benchmarks in dedicated job
- 90% faster test runs for everyday development
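One plausible invocation, assuming pytest-benchmark's standard opt-in flag; the project may use a different mechanism:

```bash
pytest tests/performance/ -m benchmark --benchmark-enable
```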
### Run Tests Requiring CLI Tools (CODEX Finding #1)
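Deployment tests shell out to `helm` and `kustomize` when available, so run them with those tools on `PATH` (marker from the section above):

```bash
pytest -m deployment
```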
### Run by File
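For example (the file name is hypothetical):

```bash
pytest tests/api/test_api_keys.py
```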
### Run Specific Test
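Use the `file::test` node id (names hypothetical):

```bash
pytest tests/api/test_api_keys.py::test_list_api_keys
```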
### Useful Flags
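```bash
pytest -x             # stop at the first failure
pytest --lf           # re-run only the tests that failed last time
pytest -v --tb=short  # verbose test names, short tracebacks
pytest -n auto        # parallel run via pytest-xdist
```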
### E2E Infrastructure Setup
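Bring up the real services first, using the compose file named earlier in this guide:

```bash
docker compose -f docker-compose.test.yml up -d
pytest -m e2e
```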
### Run Meta-Validation (CODEX Finding #5)
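Assuming the meta-tests live where the performance section above says:

```bash
pytest tests/performance/test_performance_regression.py -v
```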
## Git Hooks and Validation
*Updated: 2025-11-13 - Reorganized for developer productivity and CI parity*

This project uses a two-stage validation strategy to balance speed and comprehensiveness.

### Pre-commit Hooks (Fast - < 30 seconds)
Runs on changed files only at commit time; optimized for rapid iteration.

What runs:
- Auto-fixers: black, isort, trailing-whitespace, etc.
- Fast linters: flake8, bandit, shellcheck
- Critical safety: memory safety, fixture organization, async mock usage
- File-specific validators: workflow syntax, MDX frontmatter
### Pre-push Hooks (Comprehensive - 8-12 minutes)
Runs on all files before push. Matches CI validation exactly to eliminate surprises.

**4-Phase Validation**:

Phase 1: Fast Checks (< 30s)
- Lockfile validation (`uv lock --check`)
- Workflow validation tests
- MyPy type checking (non-blocking)

Later phases:
- Unit tests: `pytest tests/ -m unit -x --tb=short`
- Smoke tests: `pytest tests/smoke/ -v --tb=short`
- Integration tests (last failed): `pytest tests/integration/ -x --tb=short --lf`
- Property tests: `HYPOTHESIS_PROFILE=ci pytest -m property -x --tb=short`
- All comprehensive validators (documentation, deployment, etc.), run with the `--hook-stage pre-push` flag
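To run the full pre-push suite manually without pushing:

```bash
pre-commit run --all-files --hook-stage pre-push
```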
### Performance Monitoring
#### Expected Performance
| Stage | Target | Files | Description |
|---|---|---|---|
| Pre-commit | < 30s | Changed only | Fast feedback for commits |
| Pre-push | 8-12 min | All files | Comprehensive CI-equivalent validation |
#### Benefits
- Fast commits: 80-90% faster than before (2-5 min → 15-30s)
- Zero surprises: Pre-push matches CI exactly
- Early detection: Catches issues before push, not in CI
- CI reliability: Expected 80%+ reduction in CI failures
### Troubleshooting
If pre-commit hooks fail, fix the reported issues and re-run the failing hook to confirm (sketch below).
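For example:

```bash
# <hook-id> comes from the failure output, e.g. black or flake8
pre-commit run <hook-id> --all-files
```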
### Documentation

- Categorization: `docs-internal/HOOK_CATEGORIZATION.md`
- Migration guide: `docs-internal/PRE_COMMIT_PRE_PUSH_REORGANIZATION.md`
- Performance monitoring: `scripts/measure_hook_performance.py`
## TDD Best Practices
### 1. Red-Green-Refactor Cycle
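Write a failing test first (red), implement just enough code to make it pass (green), then clean up the implementation with the test as a safety net (refactor).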
### 2. Test One Thing at a Time
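When a test asserts a single behavior, a failure points directly at the broken behavior instead of forcing you to bisect a grab-bag of assertions.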
### 3. Use Exact Mock Assertions
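Prefer `assert_called_once_with` over loose truthiness checks, as in this sketch:

```python
from unittest.mock import Mock


def test_notifier_receives_exact_payload():
    notifier = Mock()

    notifier.send("user:alice", subject="welcome")

    # Exact assertion: wrong arguments or duplicate calls fail loudly,
    # unlike the much weaker `assert notifier.send.called`.
    notifier.send.assert_called_once_with("user:alice", subject="welcome")
```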
### 4. Arrange-Act-Assert Pattern
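A sketch of the structure; the computation is a stand-in:

```python
import pytest


def test_discount_applies_to_total():
    # Arrange: build the inputs.
    prices = [10.0, 20.0]
    discount = 0.10

    # Act: run exactly the behavior under test.
    total = sum(prices) * (1 - discount)

    # Assert: one clear expectation.
    assert total == pytest.approx(27.0)
```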
### 5. Test Error Cases
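Error paths deserve their own tests; `pytest.raises` makes the expectation explicit (sketch with a hypothetical parser):

```python
import pytest


def parse_limit(raw: str) -> int:
    # Hypothetical helper: parses a positive integer limit.
    value = int(raw)  # raises ValueError on non-numeric input
    if value <= 0:
        raise ValueError("limit must be positive")
    return value


def test_parse_limit_rejects_non_numeric():
    with pytest.raises(ValueError):
        parse_limit("ten")


def test_parse_limit_rejects_zero():
    with pytest.raises(ValueError, match="positive"):
        parse_limit("0")
```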
### 6. Use Descriptive Test Names
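A name like `test_expired_api_key_returns_401` documents both the scenario and the expected outcome; a name like `test_key_2` documents nothing.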
## Regression Test Patterns
### Overview
Regression tests prevent fixed bugs from recurring by encoding the fix as a permanent test case. The project maintains a comprehensive suite of regression tests in `tests/regression/` that encode fixes for real Codex findings and production issues.
### Key Regression Test Categories
| Category | Location | Purpose |
|---|---|---|
| pytest-xdist Isolation | `tests/regression/test_pytest_xdist_*.py` | Prevent parallel test pollution |
| Service Principal | `tests/regression/test_service_principal_*.py` | Auth isolation between tests |
| LangGraph Types | `tests/regression/test_langgraph_return_types.py` | Type safety for agent responses |
| FastAPI Auth | `tests/regression/test_fastapi_auth_override_*.py` | Dependency injection correctness |
| GDPR Isolation | `tests/regression/test_gdpr_singleton_*.py` | Privacy compliance isolation |
### Running Regression Tests
All regression tests live in `tests/regression/` and document the original issue and fix in their docstrings.
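To run the whole suite:

```bash
pytest tests/regression/ -v
```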
**Last Updated**: 2025-11-10

**Status**: ✅ Complete and current

**Recent Updates**:
- 2025-11-10: Added comprehensive regression test patterns section documenting fixes for Codex findings
- 2025-01-05: Initial comprehensive testing guide