
Testing Guide

This document describes the testing strategy, conventions, and best practices for the MCP Server LangGraph project.

Test Organization

tests/
├── api/                        # API endpoint tests
│   ├── test_api_keys_endpoints.py
│   ├── test_service_principals_endpoints.py
│   └── test_openapi_compliance.py
├── e2e/                        # End-to-end journey tests
│   ├── test_full_user_journey.py
│   ├── test_scim_provisioning.py
│   └── helpers.py              # E2E test helpers
├── conftest.py                 # Shared fixtures
├── test_gdpr.py               # GDPR compliance tests
└── test_auth.py               # Authentication tests

Test Categories

Unit Tests (@pytest.mark.unit)

  • Purpose: Test individual components in isolation
  • Dependencies: Mocked
  • Speed: Fast (<1s per test)
  • Examples: Model validation, business logic, utility functions
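A minimal sketch of a test in this category; validate_email is a hypothetical pure helper standing in for any isolated business logic:
import pytest

def validate_email(value: str) -> bool:
    """Hypothetical pure helper standing in for real business logic."""
    return "@" in value and "." in value.split("@")[-1]

@pytest.mark.unit
def test_validate_email_accepts_well_formed_address():
    # No I/O and no external fixtures: the whole test runs in-process
    assert validate_email("alice@example.com") is True

@pytest.mark.unit
def test_validate_email_rejects_missing_domain():
    assert validate_email("alice@") is False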

API Tests (@pytest.mark.api)

  • Purpose: Test REST API endpoints with mocked dependencies
  • Dependencies: FastAPI TestClient with dependency injection overrides
  • Speed: Fast (<1s per test)
  • Examples: API key endpoints, service principal endpoints, GDPR endpoints

Integration Tests (@pytest.mark.integration)

  • Purpose: Test multiple components working together
  • Dependencies: Some real, some mocked
  • Speed: Medium (1-5s per test)
  • Examples: Database + business logic, external API integrations
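A minimal sketch of the mixed real/mocked pattern, using an in-memory SQLite database as the real dependency and a mocked external directory service (both the schema and the service are illustrative, not part of this project):
import sqlite3
from unittest.mock import Mock

import pytest

@pytest.mark.integration
def test_user_record_persisted_after_external_lookup():
    # Real dependency: an in-memory SQLite database
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (username TEXT, email TEXT)")

    # Mocked dependency: the external directory service
    directory = Mock()
    directory.lookup.return_value = {"username": "alice", "email": "alice@example.com"}

    # Exercise both components together
    profile = directory.lookup("alice")
    db.execute("INSERT INTO users VALUES (?, ?)", (profile["username"], profile["email"]))

    row = db.execute("SELECT email FROM users WHERE username = ?", ("alice",)).fetchone()
    assert row == ("alice@example.com",)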

E2E Tests (@pytest.mark.e2e)

  • Purpose: Test complete user journeys
  • Infrastructure: Requires docker-compose.test.yml services
  • Speed: Slow (5-30s per test)
  • Current Status: Transitioning from mocks to real infrastructure

E2E Test Strategy Decision (2025-01-05)

Decision: Migrate E2E tests from mocks to real infrastructure
Rationale:
  1. Infrastructure validation already exists (test_infrastructure_check)
  2. Mock-based E2E tests provide false confidence
  3. Real E2E tests catch integration issues that unit tests miss
Implementation Plan:
  • ✅ Phase 1: Document current state (this file)
  • 🔄 Phase 2: Remove mock dependencies from E2E tests (future)
  • 🔄 Phase 3: Implement real Keycloak/OpenFGA integration (future)
  • 🔄 Phase 4: Remove @pytest.mark.skip from journey tests (future)
Current State:
  • Tests marked as @pytest.mark.e2e but use mocks (lines 116-131 in test_full_user_journey.py)
  • Infrastructure check validates real services (PostgreSQL, Redis, OpenFGA, Keycloak, Qdrant)
  • Comment on line 115: “Use HTTP mock until real Keycloak is implemented”
Recommendation: Keep mocks for now, migrate incrementally as infrastructure matures

Deployment Configuration Tests (@pytest.mark.deployment)

  • Purpose: Validate Helm charts, Kustomize overlays, and deployment manifests
  • Dependencies: File system, helm/kustomize CLI tools (optional)
  • Speed: Fast (<1s per test)
  • Examples: Secret key alignment, CORS security, version consistency
  • Location: tests/deployment/
Test Coverage (11 tests, 91% coverage):
  • ✅ Helm secret template validation (missing keys detection)
  • ✅ CORS security (prevents wildcard + credentials vulnerability)
  • ✅ Hard-coded credential detection
  • ✅ Placeholder validation (YOUR_PROJECT_ID, REPLACE_ME, example.com)
  • ✅ ExternalSecrets key alignment
  • ✅ Namespace consistency across overlays
  • ✅ Version consistency across deployment methods
  • ✅ Resource limits and security contexts
  • ✅ Pod security standards compliance
Running Deployment Tests:
# Run all deployment configuration tests
pytest tests/deployment/ -v

# Run unit tests (no helm/kustomize required)
pytest tests/deployment/test_helm_configuration.py -v

# Run E2E tests (requires helm/kustomize installed)
pytest tests/deployment/test_deployment_e2e.py -v

# Validate all deployment configs (comprehensive script)
./scripts/validate-deployments.sh

# Check deployed cluster health (requires kubectl)
./scripts/check-deployment-health.sh production-mcp-server-langgraph mcp
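For a flavour of what these checks look like, here is a minimal sketch of a placeholder-detection test; the deploy/helm path and the placeholder list are assumptions for illustration, not the project's actual layout:
from pathlib import Path

import pytest

# Path and placeholder list are assumptions for illustration only
HELM_VALUES = Path("deploy/helm")  # hypothetical chart location
DANGEROUS_PLACEHOLDERS = ("YOUR_PROJECT_ID", "REPLACE_ME", "example.com")

@pytest.mark.deployment
def test_no_dangerous_placeholders_in_helm_values():
    values_files = list(HELM_VALUES.rglob("values*.yaml"))
    if not values_files:
        pytest.skip("No Helm values files found at the assumed path")

    offenders = [
        (path, placeholder)
        for path in values_files
        for placeholder in DANGEROUS_PLACEHOLDERS
        if placeholder in path.read_text()
    ]
    assert not offenders, f"Placeholders leaked into deployment config: {offenders}"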
Pre-commit Validation: Deployment tests run automatically on commit via pre-commit hooks:
  • validate-deployment-secrets - Secret key alignment
  • validate-cors-security - CORS configuration safety
  • check-hardcoded-credentials - Credential exposure prevention
  • validate-redis-password-required - Redis authentication enforcement
  • check-dangerous-placeholders - Placeholder leak detection
CI/CD Integration: GitHub Actions workflow (.github/workflows/validate-deployments.yml) runs on every PR:
  • Helm chart linting and template rendering
  • Kustomize build validation across 5 environments (matrix)
  • YAML syntax validation
  • Security scanning (gitleaks, CORS, placeholders)
  • Version consistency checks
See adr/adr-0046-deployment-configuration-tdd-infrastructure.md for full details.

Performance & Benchmark Tests (@pytest.mark.benchmark)

  • Purpose: Validate system performance and detect regressions
  • Dependencies: pytest-benchmark plugin
  • Speed: Slow (100 iterations, ~30s per benchmark suite)
  • Examples: JWT encoding/decoding, OpenFGA authorization, LLM request handling
  • Location: tests/performance/
CODEX FINDINGS #2 & #4: Performance Optimizations
Finding #2: Timeout Test Performance
  • Problem: Timeout tests used real asyncio.sleep(5-10s), burning ~15s per run
  • Solution: Reduced to 0.05-0.3s sleeps (100x faster, same behavior)
  • Impact: Tests now complete in ~6s instead of 15s (60% speedup)
  • Validation: Meta-tests prevent regression (test_performance_regression.py)
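A minimal sketch of the reduced-sleep pattern, assuming pytest-asyncio and an asyncio.wait_for-based timeout (the operation and limits are illustrative, not the project's real code):
import asyncio

import pytest

@pytest.mark.asyncio
async def test_slow_operation_times_out_quickly():
    async def slow_operation():
        # 0.3s stands in for a multi-second hang; the timeout semantics are identical
        await asyncio.sleep(0.3)

    with pytest.raises(asyncio.TimeoutError):
        await asyncio.wait_for(slow_operation(), timeout=0.05)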
Finding #4: Benchmark Opt-In Model
  • Problem: Benchmarks ran by default, slowing everyday test runs
  • Solution: Benchmarks now skip unless explicitly requested
  • Impact: 90% faster test runs (benchmarks skipped in 0.57s vs 30+ seconds)
  • Usage: See “Run Performance Benchmarks” section below
Running Performance Tests:
# Run benchmarks (opt-in)
pytest --run-benchmarks

# Run only benchmarks (CI pattern)
pytest -m benchmark --benchmark-only

# Exclude benchmarks (default behavior)
pytest  # Benchmarks automatically skipped

# Compare benchmark results over time
pytest-benchmark compare 0001 0002

# View benchmark history
pytest-benchmark list
Benchmark Test Structure:
from tests.performance.conftest import PercentileBenchmark

@pytest.mark.benchmark
class TestMyBenchmarks:
    def test_operation_performance(self, percentile_benchmark):
        """
        Benchmark operation with percentile-based assertions.

        Uses 100 iterations for statistical accuracy.
        """
        result = percentile_benchmark(my_operation, arg1, arg2)

        # Assert p95 latency < 10ms (more stable than mean)
        percentile_benchmark.assert_percentile(95, 0.010)

        # Assert p99 latency < 15ms
        percentile_benchmark.assert_percentile(99, 0.015)
Why percentile-based assertions?
  • More stable than mean (resistant to outliers)
  • Better reflects user experience (p95/p99 SLA targets)
  • Industry standard for performance testing

Fixture Standards

Shared Fixtures (tests/conftest.py)

mock_current_user (Function-scoped)

Standard authenticated user fixture for API tests.
{
    "user_id": "user:alice",              # OpenFGA format
    "keycloak_id": "8c7b4e5d-...",       # Keycloak UUID
    "username": "alice",
    "email": "alice@example.com"
}
Usage:
def test_my_endpoint(test_client, mock_current_user):
    # Test client already has auth override
    response = test_client.get("/api/v1/resource")
    assert response.status_code == 200
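A minimal sketch of how such a fixture could be defined in tests/conftest.py; the values mirror the example above, but the repository's actual implementation may differ:
import pytest

@pytest.fixture
def mock_current_user():
    """Standard authenticated user for API tests (values are illustrative)."""
    return {
        "user_id": "user:alice",                                # OpenFGA format
        "keycloak_id": "8c7b4e5d-1234-5678-abcd-ef1234567890",  # Keycloak UUID
        "username": "alice",
        "email": "alice@example.com",
    }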

test_container (Session-scoped)

Dependency injection container for test environment.
@pytest.fixture(scope="session")
def test_container():
    from mcp_server_langgraph.core.container import create_test_container
    return create_test_container()
Features:
  • No-op telemetry (no output)
  • No-op auth (accepts any token)
  • In-memory storage
  • No global side effects

container (Function-scoped)

Per-test container for isolated testing.
@pytest.fixture
def container(test_container):
    from mcp_server_langgraph.core.container import create_test_container
    # A fresh container per test keeps state from leaking between tests
    return create_test_container()

Identity & Authentication

User Identity Formats

The system uses dual identity formats for compatibility:

1. OpenFGA Format (Authorization)

"user:alice"  # Format: user:{username}
  • Used for: OpenFGA tuples, API responses, authorization checks
  • Best Practice: Always use this format for user_id fields in API responses

2. Keycloak UUID Format (Authentication)

"8c7b4e5d-1234-5678-abcd-ef1234567890"
  • Used for: Keycloak Admin API calls, internal database keys
  • Best Practice: Use this format for keycloak_id when interacting with Keycloak
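A small, hypothetical helper pair illustrates how to keep the two formats from leaking into the wrong layer; nothing like this is guaranteed to exist in the codebase:
def to_openfga_user(username: str) -> str:
    """Build the OpenFGA identifier used in tuples and API responses."""
    return username if username.startswith("user:") else f"user:{username}"

def require_keycloak_uuid(value: str) -> str:
    """Reject OpenFGA-style ids where a Keycloak UUID is expected."""
    if value.startswith("user:"):
        raise ValueError("Expected a Keycloak UUID, got an OpenFGA identifier")
    return value

assert to_openfga_user("alice") == "user:alice"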

Examples

✅ Correct Usage

# API endpoint handler
await api_key_manager.create_api_key(
    user_id=current_user.get("keycloak_id"),  # UUID for database
    name=request.name,
)

# API response
return {
    "user_id": "user:alice",  # OpenFGA format for client
    "username": "alice"
}

❌ Incorrect Usage

# Don't use plain usernames without prefix
user_id = "alice"  # Wrong!

# Don't use wrong format for Keycloak
await keycloak.get_user(user_id="user:alice")  # Should use UUID

# Don't use UUID for OpenFGA
await openfga.check(user="8c7b4e5d-...")  # Should use user:alice format

Test Fixture Patterns

API Endpoint Tests

@pytest.fixture
def test_client(mock_current_user):
    from fastapi import FastAPI
    from fastapi.testclient import TestClient

    app = FastAPI()
    app.include_router(router)

    # Override auth dependency
    app.dependency_overrides[get_current_user] = lambda: mock_current_user

    return TestClient(app)

def test_endpoint(test_client, mock_current_user):
    response = test_client.get("/api/v1/resource")
    assert response.json()["user_id"] == mock_current_user["user_id"]

Admin Permission Tests

@pytest.fixture
def mock_admin_user(mock_current_user):
    """User with elevated permissions"""
    admin_user = mock_current_user.copy()
    admin_user["roles"] = ["admin"]
    return admin_user

@pytest.fixture
def admin_test_client(mock_sp_manager, mock_admin_user):
    # ... setup with mock_admin_user
    return TestClient(app)
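A sketch of what the elided setup might look like, reusing router and get_current_user from the API endpoint example above; the mock_sp_manager wiring is an assumption about how the real app exposes it:
@pytest.fixture
def admin_test_client(mock_sp_manager, mock_admin_user):
    from fastapi import FastAPI
    from fastapi.testclient import TestClient

    app = FastAPI()
    app.include_router(router)  # same router as the test_client example above

    # Same override pattern as test_client, but with the elevated user
    app.dependency_overrides[get_current_user] = lambda: mock_admin_user
    # mock_sp_manager would be wired in here (e.g. another dependency override);
    # the exact hook depends on how the real app exposes it
    return TestClient(app)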

Running Tests

Run All Tests

pytest

Run by Category

pytest -m unit           # Unit tests only
pytest -m api            # API tests only
pytest -m integration    # Integration tests only
pytest -m e2e            # E2E tests only (requires infrastructure)

Run Performance Benchmarks (CODEX Finding #4)

# Benchmarks are SKIPPED by default for faster iteration

# Option 1: Run with custom flag
pytest --run-benchmarks

# Option 2: Run only benchmarks (CI pattern)
pytest -m benchmark --benchmark-only

# Option 3: Exclude benchmarks explicitly
pytest -m "not benchmark"

# Compare benchmark results
pytest-benchmark compare 0001 0002
Why benchmarks are opt-in:
  • Benchmarks run 100 iterations for statistical accuracy (~30s per suite)
  • Normal development doesn’t need benchmark validation
  • CI explicitly runs benchmarks in dedicated job
  • 90% faster test runs for everyday development
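A minimal sketch of how an opt-in flag like --run-benchmarks can be wired up in conftest.py; the project's actual hook may differ:
# conftest.py (illustrative)
import pytest

def pytest_addoption(parser):
    parser.addoption(
        "--run-benchmarks",
        action="store_true",
        default=False,
        help="Run tests marked with @pytest.mark.benchmark",
    )

def pytest_collection_modifyitems(config, items):
    if config.getoption("--run-benchmarks"):
        return
    skip_benchmark = pytest.mark.skip(reason="Benchmarks skipped; pass --run-benchmarks to run")
    for item in items:
        if "benchmark" in item.keywords:
            item.add_marker(skip_benchmark)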

Run Tests Requiring CLI Tools (CODEX Finding #1)

# Tests requiring external CLI tools skip gracefully if tools not installed

# Kustomize tests (deployment validation)
pytest -m requires_kustomize
# Skips with message if kustomize not installed

# Kubectl tests (K8s integration)
pytest -m requires_kubectl
# Skips with message if kubectl not installed

# Helm tests (chart validation)
pytest -m requires_helm
# Skips with message if helm not installed

# Exclude CLI-dependent tests
pytest -m "not requires_kustomize and not requires_helm"
Installation instructions:
# Kustomize
curl -s https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh | bash

# Helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

# Kubectl
# See: https://kubernetes.io/docs/tasks/tools/

Run by File

pytest tests/api/test_api_keys_endpoints.py
pytest tests/test_gdpr.py::TestGDPREndpoints

Run Specific Test

pytest tests/api/test_api_keys_endpoints.py::TestCreateAPIKey::test_create_api_key_success

Useful Flags

pytest -v              # Verbose output
pytest -x              # Stop on first failure
pytest --tb=short      # Short traceback format
pytest -k "api_key"    # Run tests matching pattern
pytest --lf            # Run last failed tests
pytest --co            # Show collected tests without running

E2E Infrastructure Setup

# CODEX FINDING #3: E2E tests now auto-run when docker is available
# No need to set TESTING=true anymore!

# Start test infrastructure
docker compose -f docker-compose.test.yml up -d

# Wait for services to be healthy (30-60s)
docker compose -f docker-compose.test.yml ps

# Run E2E tests (infrastructure check is automatic)
pytest -m e2e

# Cleanup
docker compose -f docker-compose.test.yml down -v
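A minimal sketch of what an infrastructure check along the lines of test_infrastructure_check could do, probing services over TCP; the hosts and ports are assumptions about the local compose stack:
import socket

import pytest

# Assumed local ports for the docker-compose.test.yml services
REQUIRED_SERVICES = {
    "postgres": ("localhost", 5432),
    "redis": ("localhost", 6379),
    "openfga": ("localhost", 8080),
    "keycloak": ("localhost", 8081),
    "qdrant": ("localhost", 6333),
}

@pytest.fixture(scope="session")
def e2e_infrastructure():
    """Skip E2E tests cleanly when the compose stack is not running."""
    for name, (host, port) in REQUIRED_SERVICES.items():
        try:
            with socket.create_connection((host, port), timeout=1):
                pass
        except OSError:
            pytest.skip(f"E2E infrastructure unavailable: {name} not reachable on {host}:{port}")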

Run Meta-Validation (CODEX Finding #5)

# Comprehensive test suite validation
python scripts/validate_test_suite.py

# Strict mode (warnings treated as errors)
python scripts/validate_test_suite.py --strict

# Checks performed:
# - Marker consistency (no conflicting unit+integration markers)
# - Test naming conventions
# - Import guards for optional dependencies
# - CLI tool availability guards

Git Hooks and Validation

Updated: 2025-11-13 - Reorganized for developer productivity and CI parity
This project uses a two-stage validation strategy to balance speed and comprehensiveness:

Pre-commit Hooks (Fast - < 30 seconds)

Runs on changed files only at commit time. Optimized for rapid iteration.
What runs:
  • Auto-fixers: black, isort, trailing-whitespace, etc.
  • Fast linters: flake8, bandit, shellcheck
  • Critical safety: memory safety, fixture organization, async mock usage
  • File-specific validators: workflow syntax, MDX frontmatter
# Test pre-commit performance
echo "# test" >> README.md
git add README.md
git commit -m "test: verify pre-commit speed"
# Target: < 30 seconds

Pre-push Hooks (Comprehensive - 8-12 minutes)

Runs on all files before push. Matches CI validation exactly to eliminate surprises.
4-Phase Validation:
Phase 1: Fast Checks (< 30s)
  • Lockfile validation (uv lock --check)
  • Workflow validation tests
Phase 2: Type Checking (1-2 min, warning only)
  • MyPy type checking (non-blocking)
Phase 3: Test Suite (3-5 min)
  • Unit tests: pytest tests/ -m unit -x --tb=short
  • Smoke tests: pytest tests/smoke/ -v --tb=short
  • Integration tests (last failed): pytest tests/integration/ -x --tb=short --lf
  • Property tests: HYPOTHESIS_PROFILE=ci pytest -m property -x --tb=short
Phase 4: Pre-commit Hooks (5-8 min)
  • All comprehensive validators (documentation, deployment, etc.)
  • Runs with --hook-stage pre-push flag
# Install hooks
make git-hooks
# Or: pre-commit install --hook-type pre-commit --hook-type pre-push

# Verify configuration
python scripts/validate_pre_push_hook.py

# Test push validation (full 4-phase suite)
git push
# Target: 8-12 minutes, matches CI exactly

Performance Monitoring

# Measure pre-commit performance
python scripts/measure_hook_performance.py --stage commit

# Measure pre-push performance
python scripts/measure_hook_performance.py --stage push

# Measure both stages
python scripts/measure_hook_performance.py --stage all

Expected Performance

Stage        Target     Files           Description
Pre-commit   < 30s      Changed only    Fast feedback for commits
Pre-push     8-12 min   All files       Comprehensive CI-equivalent validation

Benefits

  • Fast commits: 80-90% faster than before (2-5 min → 15-30s)
  • Zero surprises: Pre-push matches CI exactly
  • Early detection: Catches issues before push, not in CI
  • CI reliability: Expected 80%+ reduction in CI failures

Troubleshooting

If pre-commit hooks fail:
# See specific failures
pre-commit run --all-files

# Run specific hook
pre-commit run black --all-files

# Skip hooks (emergency only)
git commit --no-verify
If pre-push hooks fail:
# Run specific phase
uv run pytest tests/ -m unit  # Phase 3: Unit tests
uv run pytest tests/smoke/    # Phase 3: Smoke tests
pre-commit run --all-files --hook-stage pre-push  # Phase 4

# Skip pre-push (emergency only - will likely fail in CI!)
git push --no-verify

Documentation

  • Categorization: docs-internal/HOOK_CATEGORIZATION.md
  • Migration guide: docs-internal/PRE_COMMIT_PRE_PUSH_REORGANIZATION.md
  • Performance monitoring: scripts/measure_hook_performance.py

TDD Best Practices

1. Red-Green-Refactor Cycle

# RED: Write failing test first
def test_new_feature():
    result = new_function()
    assert result == expected_value  # Fails - function doesn't exist

# GREEN: Implement minimal code to pass
def new_function():
    return expected_value  # Passes

# REFACTOR: Improve implementation
def new_function():
    # Clean, efficient implementation
    return calculate_expected_value()

2. Test One Thing at a Time

# ✅ Good - Single assertion
def test_create_api_key_returns_id():
    result = create_api_key(...)
    assert "key_id" in result

def test_create_api_key_returns_secret():
    result = create_api_key(...)
    assert "api_key" in result

# ❌ Bad - Multiple unrelated assertions
def test_create_api_key():
    result = create_api_key(...)
    assert "key_id" in result
    assert "api_key" in result
    assert result["name"] == "Test"
    assert len(result["api_key"]) > 20

3. Use Exact Mock Assertions

# ✅ Good - Validates exact call
mock_manager.create_api_key.assert_called_once_with(
    user_id="8c7b4e5d-1234-5678-abcd-ef1234567890",
    name="Test Key",
    expires_days=365,
)

# ❌ Bad - Doesn't validate parameters
mock_manager.create_api_key.assert_called()

4. Arrange-Act-Assert Pattern

def test_endpoint():
    # Arrange - Set up test data
    user_data = {"name": "Alice", "email": "alice@example.com"}

    # Act - Perform the action
    response = client.post("/users", json=user_data)

    # Assert - Verify the outcome
    assert response.status_code == 201
    assert response.json()["name"] == "Alice"

5. Test Error Cases

def test_create_api_key_max_keys_exceeded(test_client, mock_api_key_manager):
    """Test API key creation when user has reached the limit"""
    mock_api_key_manager.create_api_key.side_effect = ValueError(
        "Maximum of 5 API keys allowed per user"
    )

    response = test_client.post("/api/v1/api-keys/", json={...})

    assert response.status_code == 400
    assert "Maximum of 5 API keys" in response.json()["detail"]

6. Use Descriptive Test Names

# ✅ Good - Clear what's being tested
def test_delete_user_account_without_confirmation():
    """Test deletion requires explicit confirmation"""

# ❌ Bad - Vague test name
def test_delete():
    """Test delete"""

Regression Test Patterns

Overview

Regression tests prevent fixed bugs from reoccurring by encoding the fix as a permanent test case. The project maintains a comprehensive suite of regression tests in tests/regression/ that encode fixes for real Codex findings and production issues.

Key Regression Test Categories

Category                 Location                                            Purpose
pytest-xdist Isolation   tests/regression/test_pytest_xdist_*.py             Prevent parallel test pollution
Service Principal        tests/regression/test_service_principal_*.py        Auth isolation between tests
LangGraph Types          tests/regression/test_langgraph_return_types.py     Type safety for agent responses
FastAPI Auth             tests/regression/test_fastapi_auth_override_*.py    Dependency injection correctness
GDPR Isolation           tests/regression/test_gdpr_singleton_*.py           Privacy compliance isolation

Running Regression Tests

# Run all regression tests
pytest tests/regression/ -v

# Run specific regression category
pytest tests/regression/test_pytest_xdist*.py -v

# Include in CI pipeline
pytest -m regression
For detailed examples, see the test files in tests/regression/ which document the original issue and fix in their docstrings.
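As a flavour of that convention, here is an illustrative (not actual) regression test whose docstring records the original issue and the fix:
import pytest

@pytest.mark.regression
def test_auth_override_does_not_leak_between_tests(test_client):
    """
    Regression test (illustrative).

    Original issue: a dependency override installed by one test stayed
    registered on a shared FastAPI app, so later tests ran as the wrong user.
    Fix: build the app and its overrides inside a function-scoped fixture.
    This test encodes the expectation so the bug cannot silently return.
    """
    response = test_client.get("/api/v1/resource")
    assert response.status_code != 401  # the per-test auth override is in effect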
Last Updated: 2025-11-10
Status: ✅ Complete and current
Recent Updates:
  • 2025-11-10: Added comprehensive regression test patterns section documenting fixes for Codex findings
  • 2025-01-05: Initial comprehensive testing guide