Testing Guide
This document describes the testing strategy, conventions, and best practices for the MCP Server LangGraph project.
Table of Contents
- Test Organization
- Test Categories
- Fixture Standards
- Identity & Authentication
- Running Tests
- TDD Best Practices
- Deployment Testing
Test Organization
Test Categories
Unit Tests (@pytest.mark.unit)
- Purpose: Test individual components in isolation
- Dependencies: Mocked
- Speed: Fast (<1s per test)
- Examples: Model validation, business logic, utility functions
API Tests (@pytest.mark.api)
- Purpose: Test REST API endpoints with mocked dependencies
- Dependencies: FastAPI TestClient with dependency injection overrides
- Speed: Fast (<1s per test)
- Examples: API key endpoints, service principal endpoints, GDPR endpoints
Integration Tests (@pytest.mark.integration)
- Purpose: Test multiple components working together
- Dependencies: Some real, some mocked
- Speed: Medium (1-5s per test)
- Examples: Database + business logic, external API integrations
E2E Tests (@pytest.mark.e2e)
- Purpose: Test complete user journeys
- Infrastructure: Requires `docker-compose.test.yml` services
- Speed: Slow (5-30s per test)
- Current Status: Transitioning from mocks to real infrastructure
E2E Test Strategy Decision (2025-01-05)
Decision: Migrate E2E tests from mocks to real infrastructure
Rationale:
- Infrastructure validation already exists (`test_infrastructure_check`)
- Mock-based E2E tests provide false confidence
- Real E2E tests catch integration issues that unit tests miss
- ✅ Phase 1: Document current state (this file)
- 🔄 Phase 2: Remove mock dependencies from E2E tests (future)
- 🔄 Phase 3: Implement real Keycloak/OpenFGA integration (future)
- 🔄 Phase 4: Remove `@pytest.mark.skip` from journey tests (future)
- Tests marked `@pytest.mark.e2e` but still using mocks (lines 116-131 in `test_full_user_journey.py`)
- Infrastructure check validates real services (PostgreSQL, Redis, OpenFGA, Keycloak, Qdrant)
- Comment on line 115: “Use HTTP mock until real Keycloak is implemented”
Deployment Configuration Tests (@pytest.mark.deployment)
- Purpose: Validate Helm charts, Kustomize overlays, and deployment manifests
- Dependencies: File system, helm/kustomize CLI tools (optional)
- Speed: Fast (<1s per test)
- Examples: Secret key alignment, CORS security, version consistency
- Location: `tests/deployment/`
- ✅ Helm secret template validation (missing keys detection)
- ✅ CORS security (prevents wildcard + credentials vulnerability)
- ✅ Hard-coded credential detection
- ✅ Placeholder validation (YOUR_PROJECT_ID, REPLACE_ME, example.com)
- ✅ ExternalSecrets key alignment
- ✅ Namespace consistency across overlays
- ✅ Version consistency across deployment methods
- ✅ Resource limits and security contexts
- ✅ Pod security standards compliance
- `validate-deployment-secrets` - Secret key alignment
- `validate-cors-security` - CORS configuration safety
- `check-hardcoded-credentials` - Credential exposure prevention
- `validate-redis-password-required` - Redis authentication enforcement
- `check-dangerous-placeholders` - Placeholder leak detection
A CI workflow (`.github/workflows/validate-deployments.yml`) runs on every PR:
- Helm chart linting and template rendering
- Kustomize build validation across 5 environments (matrix)
- YAML syntax validation
- Security scanning (gitleaks, CORS, placeholders)
- Version consistency checks
See `adr/adr-0046-deployment-configuration-tdd-infrastructure.md` for full details.
Performance & Benchmark Tests (@pytest.mark.benchmark)
- Purpose: Validate system performance and detect regressions
- Dependencies: pytest-benchmark plugin
- Speed: Slow (100 iterations, ~30s per benchmark suite)
- Examples: JWT encoding/decoding, OpenFGA authorization, LLM request handling
- Location: `tests/performance/`
- Problem: Timeout tests used real asyncio.sleep(5-10s), burning ~15s per run
- Solution: Reduced to 0.05-0.3s sleeps (100x faster, same behavior)
- Impact: Tests now complete in ~6s instead of 15s (60% speedup)
- Validation: Meta-tests prevent regression (test_performance_regression.py)
- Problem: Benchmarks ran by default, slowing everyday test runs
- Solution: Benchmarks now skip unless explicitly requested
- Impact: 90% faster test runs (benchmarks skipped in 0.57s vs 30+ seconds)
- Usage: See “Run Performance Benchmarks” section below
Benchmarks report median and percentile latencies rather than the mean:
- More stable than the mean (resistant to outliers)
- Better reflects user experience (p95/p99 SLA targets)
- Industry standard for performance testing
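As an illustration, a benchmark test built on the pytest-benchmark `benchmark` fixture might look like the sketch below; the `jwt_service` fixture and its `encode` method are placeholders, not the project's actual API.

```python
import pytest


@pytest.mark.benchmark
def test_jwt_encode_latency(benchmark, jwt_service):
    """Measure JWT encoding latency; pytest-benchmark reports median/p95/p99."""
    # benchmark() invokes the callable repeatedly and records timing statistics
    token = benchmark(jwt_service.encode, {"sub": "user:alice", "scope": "read"})
    assert token  # sanity check that encoding produced a value
```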
Fixture Standards
Shared Fixtures (tests/conftest.py)
mock_current_user (Function-scoped)
Standard authenticated user fixture for API tests.
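A minimal sketch of such a fixture is shown below; the attribute names and values are illustrative assumptions, not the project's exact user schema.

```python
import pytest
from unittest.mock import MagicMock


@pytest.fixture
def mock_current_user():
    """Authenticated-user stand-in for API tests (illustrative field names)."""
    user = MagicMock()
    user.user_id = "user:alice"                                # OpenFGA-format identifier
    user.keycloak_id = "3f2b8c1e-0a5d-4e7b-9c6d-2f1a0b3c4d5e"  # Keycloak UUID
    user.email = "alice@example.com"
    user.roles = ["user"]
    return user
```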
test_container (Session-scoped)
Dependency injection container for test environment.
- No-op telemetry (no output)
- No-op auth (accepts any token)
- In-memory storage
- No global side effects
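A rough sketch of the idea, assuming a simple dataclass container rather than the project's real dependency-injection wiring:

```python
import pytest
from dataclasses import dataclass, field


class NoOpTelemetry:
    """Swallows all telemetry calls so tests produce no output."""

    def record(self, *args, **kwargs) -> None:
        pass


class NoOpAuth:
    """Accepts any token; only suitable for tests."""

    def verify(self, token: str) -> dict:
        return {"sub": "user:test"}


@dataclass
class FakeContainer:
    telemetry: NoOpTelemetry = field(default_factory=NoOpTelemetry)
    auth: NoOpAuth = field(default_factory=NoOpAuth)
    storage: dict = field(default_factory=dict)  # in-memory storage, no globals


@pytest.fixture(scope="session")
def test_container():
    return FakeContainer()
```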
container (Function-scoped)
Per-test container for isolated testing.
Identity & Authentication
User Identity Formats
The system uses dual identity formats for compatibility:
1. OpenFGA Format (Authorization)
- Used for: OpenFGA tuples, API responses, authorization checks
- Best Practice: Always use this format for `user_id` fields in API responses
2. Keycloak UUID Format (Authentication)
- Used for: Keycloak Admin API calls, internal database keys
- Best Practice: Use this format for `keycloak_id` when interacting with Keycloak
Examples
✅ Correct Usage
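A sketch of the correct pattern; the endpoint path and the `client`, `auth_headers`, `keycloak_admin`, and `current_user` fixtures are assumptions, not the project's actual names.

```python
def test_me_endpoint_returns_openfga_format_user_id(client, auth_headers):
    # user_id in API responses uses the OpenFGA-style identifier
    response = client.get("/api/v1/users/me", headers=auth_headers)
    assert response.json()["user_id"] == "user:alice"


def test_keycloak_lookup_uses_uuid(keycloak_admin, current_user):
    # Keycloak Admin API calls take the raw Keycloak UUID
    keycloak_admin.get_user(user_id=current_user.keycloak_id)
```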
❌ Incorrect Usage
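And the mirror-image anti-patterns, using the same hypothetical names:

```python
def test_wrong_identity_formats(client, auth_headers, keycloak_admin, current_user):
    # Anti-pattern: expecting a Keycloak UUID in the user_id field of an API response
    response = client.get("/api/v1/users/me", headers=auth_headers)
    assert response.json()["user_id"] == current_user.keycloak_id  # mixes formats

    # Anti-pattern: passing the OpenFGA-format string where Keycloak expects a UUID
    keycloak_admin.get_user(user_id="user:alice")
```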
Test Fixture Patterns
API Endpoint Tests
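A sketch of the typical pattern, assuming hypothetical module paths (`app.main`, `app.auth`) and the shared `mock_current_user` fixture:

```python
from fastapi.testclient import TestClient

from app.main import app               # hypothetical module path
from app.auth import get_current_user  # hypothetical dependency callable


def test_list_api_keys_returns_200(mock_current_user):
    # Override the auth dependency so no real Keycloak token is needed
    app.dependency_overrides[get_current_user] = lambda: mock_current_user
    client = TestClient(app)
    try:
        response = client.get("/api/v1/api-keys")
        assert response.status_code == 200
    finally:
        app.dependency_overrides.clear()
```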
Admin Permission Tests
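A companion sketch for permission checks; the endpoint and the expected status code are assumptions:

```python
def test_create_service_principal_requires_admin(client, mock_current_user):
    mock_current_user.roles = ["user"]  # deliberately missing the admin role

    response = client.post("/api/v1/service-principals", json={"name": "ci-bot"})

    # A non-admin caller should be rejected
    assert response.status_code == 403
```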
Running Tests
Run All Tests
Run by Category
Run Performance Benchmarks (CODEX Finding #4)
- Benchmarks run 100 iterations for statistical accuracy (~30s per suite)
- Normal development doesn’t need benchmark validation
- CI explicitly runs benchmarks in dedicated job
- 90% faster test runs for everyday development
Run Tests Requiring CLI Tools (CODEX Finding #1)
Run by File
Run Specific Test
Useful Flags
E2E Infrastructure Setup
Run Meta-Validation (CODEX Finding #5)
Git Hooks and Validation
Updated: 2025-11-13 - Reorganized for developer productivity and CI parity
This project uses a two-stage validation strategy to balance speed and comprehensiveness:
Pre-commit Hooks (Fast - < 30 seconds)
Runs on changed files only at commit time. Optimized for rapid iteration.
What runs:
- Auto-fixers: black, isort, trailing-whitespace, etc.
- Fast linters: flake8, bandit, shellcheck
- Critical safety: memory safety, fixture organization, async mock usage
- File-specific validators: workflow syntax, MDX frontmatter
Pre-push Hooks (Comprehensive - 8-12 minutes)
Runs on all files before push. Matches CI validation exactly to eliminate surprises.
4-Phase Validation:
Phase 1: Fast Checks (< 30s)
- Lockfile validation (`uv lock --check`)
- Workflow validation tests
- MyPy type checking (non-blocking)
- Unit tests: `pytest tests/ -m unit -x --tb=short`
- Smoke tests: `pytest tests/smoke/ -v --tb=short`
- Integration tests (last failed): `pytest tests/integration/ -x --tb=short --lf`
- Property tests: `HYPOTHESIS_PROFILE=ci pytest -m property -x --tb=short`
- All comprehensive validators (documentation, deployment, etc.)
- Runs with the `--hook-stage pre-push` flag
Performance Monitoring
Expected Performance
| Stage | Target | Files | Description |
|---|---|---|---|
| Pre-commit | < 30s | Changed only | Fast feedback for commits |
| Pre-push | 8-12 min | All files | Comprehensive CI-equivalent validation |
Benefits
- Fast commits: 80-90% faster than before (2-5 min → 15-30s)
- Zero surprises: Pre-push matches CI exactly
- Early detection: Catches issues before push, not in CI
- CI reliability: Expected 80%+ reduction in CI failures
Troubleshooting
If pre-commit hooks fail:
Documentation
- Categorization: `docs-internal/HOOK_CATEGORIZATION.md`
- Migration guide: `docs-internal/PRE_COMMIT_PRE_PUSH_REORGANIZATION.md`
- Performance monitoring: `scripts/measure_hook_performance.py`
TDD Best Practices
1. Red-Green-Refactor Cycle
2. Test One Thing at a Time
3. Use Exact Mock Assertions
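For example, with a self-contained `MagicMock` standing in for an OpenFGA client:

```python
from unittest.mock import MagicMock

mock_openfga = MagicMock()
mock_openfga.check(user="user:alice", relation="can_read", object="document:readme")

# Exact assertion: fails if the arguments ever drift
mock_openfga.check.assert_called_once_with(
    user="user:alice", relation="can_read", object="document:readme"
)

# Too loose: passes even when the call used the wrong arguments
assert mock_openfga.check.called
```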
4. Arrange-Act-Assert Pattern
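For instance, using hypothetical `ApiKeyService` / `InMemoryApiKeyRepository` names:

```python
def test_create_api_key_persists_owner():
    # Arrange: build the unit under test with in-memory collaborators
    repo = InMemoryApiKeyRepository()
    service = ApiKeyService(repo)

    # Act: exercise exactly one behaviour
    key = service.create(owner="user:alice", name="ci")

    # Assert: verify the observable outcome
    assert repo.get(key.id).owner == "user:alice"
```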
5. Test Error Cases
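Error paths deserve the same treatment; `pytest.raises` makes the expected failure explicit (the exception name below is illustrative):

```python
import pytest


def test_create_api_key_rejects_unknown_owner():
    repo = InMemoryApiKeyRepository()  # same hypothetical names as above
    service = ApiKeyService(repo)

    with pytest.raises(OwnerNotFoundError):
        service.create(owner="user:does-not-exist", name="ci")
```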
6. Use Descriptive Test Names
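Names should state the scenario and the expected outcome, for example:

```python
# Good: scenario and expectation are readable from the name alone
def test_expired_api_key_is_rejected_with_401(): ...


# Weak: says nothing about behaviour or expectation
def test_api_key_2(): ...
```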
Regression Test Patterns
Overview
Regression tests prevent fixed bugs from recurring by encoding the fix as a permanent test case. The project maintains a comprehensive suite of regression tests in `tests/regression/` that encode fixes for real Codex findings and production issues.
Key Regression Test Categories
| Category | Location | Purpose |
|---|---|---|
| pytest-xdist Isolation | tests/regression/test_pytest_xdist_*.py | Prevent parallel test pollution |
| Service Principal | tests/regression/test_service_principal_*.py | Auth isolation between tests |
| LangGraph Types | tests/regression/test_langgraph_return_types.py | Type safety for agent responses |
| FastAPI Auth | tests/regression/test_fastapi_auth_override_*.py | Dependency injection correctness |
| GDPR Isolation | tests/regression/test_gdpr_singleton_*.py | Privacy compliance isolation |
Running Regression Tests
See the individual tests in `tests/regression/`, which document the original issue and fix in their docstrings.
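A skeleton of that shape (illustrative only; the real regression tests in `tests/regression/` carry the actual issue details):

```python
def test_previously_fixed_bug_stays_fixed():
    """Regression: <link or ID of the Codex finding / production issue>.

    Original issue: <what broke and under which conditions>.
    Fix: <what change resolved it>.
    """
    # Reproduce the exact conditions of the original bug, then assert the
    # corrected behaviour so any reintroduction fails this test.
    ...
```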
Last Updated: 2025-11-10
Status: ✅ Complete and current
Recent Updates:
- 2025-11-10: Added comprehensive regression test patterns section documenting fixes for Codex findings
- 2025-01-05: Initial comprehensive testing guide