Testing Guide

This document describes the testing strategy, conventions, and best practices for the MCP Server LangGraph project.

Test Organization

tests/
├── api/                        # API endpoint tests
│   ├── test_api_keys_endpoints.py
│   ├── test_service_principals_endpoints.py
│   └── test_openapi_compliance.py
├── e2e/                        # End-to-end journey tests
│   ├── test_full_user_journey.py
│   ├── test_scim_provisioning.py
│   └── helpers.py              # E2E test helpers
├── conftest.py                 # Shared fixtures
├── test_gdpr.py               # GDPR compliance tests
└── test_auth.py               # Authentication tests

Test Categories

Unit Tests (@pytest.mark.unit)

  • Purpose: Test individual components in isolation
  • Dependencies: Mocked
  • Speed: Fast (<1s per test)
  • Examples: Model validation, business logic, utility functions
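For illustration only, a minimal unit-style test might look like the following; the slugify helper is hypothetical and defined inline so the example is self-contained:
import pytest

# Hypothetical pure helper, defined inline for the example
def slugify(name: str) -> str:
    return name.strip().lower().replace(" ", "-")

@pytest.mark.unit
def test_slugify_normalizes_whitespace_and_case():
    # No I/O and no mocks needed; runs in well under a second
    assert slugify("  My API Key ") == "my-api-key"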

API Tests (@pytest.mark.api)

  • Purpose: Test REST API endpoints with mocked dependencies
  • Dependencies: FastAPI TestClient with dependency injection overrides
  • Speed: Fast (<1s per test)
  • Examples: API key endpoints, service principal endpoints, GDPR endpoints

Integration Tests (@pytest.mark.integration)

  • Purpose: Test multiple components working together
  • Dependencies: Some real, some mocked
  • Speed: Medium (1-5s per test)
  • Examples: Database + business logic, external API integrations
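A hedged sketch of the "some real, some mocked" pattern, using an in-memory SQLite database as the real component and a mock standing in for an external service (all names are illustrative, not project code):
import sqlite3
from unittest.mock import Mock

import pytest

@pytest.mark.integration
def test_saved_api_key_can_be_read_back():
    # Real component: in-memory database; mocked component: external notifier
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE api_keys (key_id TEXT, name TEXT)")
    conn.execute("INSERT INTO api_keys VALUES (?, ?)", ("k1", "Test Key"))

    notifier = Mock()  # stands in for an external API client

    row = conn.execute("SELECT name FROM api_keys WHERE key_id = ?", ("k1",)).fetchone()
    assert row == ("Test Key",)
    notifier.send.assert_not_called()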

E2E Tests (@pytest.mark.e2e)

  • Purpose: Test complete user journeys
  • Infrastructure: Requires docker-compose.test.yml services
  • Speed: Slow (5-30s per test)
  • Current Status: Transitioning from mocks to real infrastructure

E2E Test Strategy Decision (2025-01-05)

Decision: Migrate E2E tests from mocks to real infrastructure.
Rationale:
  1. Infrastructure validation already exists (test_infrastructure_check)
  2. Mock-based E2E tests provide false confidence
  3. Real E2E tests catch integration issues that unit tests miss
Implementation Plan:
  • ✅ Phase 1: Document current state (this file)
  • 🔄 Phase 2: Remove mock dependencies from E2E tests (future)
  • 🔄 Phase 3: Implement real Keycloak/OpenFGA integration (future)
  • 🔄 Phase 4: Remove @pytest.mark.skip from journey tests (future)
Current State:
  • Tests marked as @pytest.mark.e2e but use mocks (lines 116-131 in test_full_user_journey.py)
  • Infrastructure check validates real services (PostgreSQL, Redis, OpenFGA, Keycloak, Qdrant)
  • Comment on line 115: “Use HTTP mock until real Keycloak is implemented”
Recommendation: Keep mocks for now, migrate incrementally as infrastructure matures
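The infrastructure check referenced above can be pictured as a session fixture that probes each backing service and skips when one is unreachable. This is a sketch under assumed host/port values, not the project's actual test_infrastructure_check implementation:
import socket

import pytest

SERVICES = {  # assumed ports for docker-compose.test.yml; adjust to the real compose file
    "postgres": ("localhost", 5432),
    "redis": ("localhost", 6379),
    "openfga": ("localhost", 8080),
    "keycloak": ("localhost", 8082),
    "qdrant": ("localhost", 6333),
}

@pytest.fixture(scope="session")
def infrastructure_check():
    # Skip E2E tests early when a backing service is not reachable
    for name, (host, port) in SERVICES.items():
        try:
            socket.create_connection((host, port), timeout=1).close()
        except OSError:
            pytest.skip(f"{name} not reachable at {host}:{port}; start docker-compose.test.yml")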

Deployment Configuration Tests (@pytest.mark.deployment)

  • Purpose: Validate Helm charts, Kustomize overlays, and deployment manifests
  • Dependencies: File system, helm/kustomize CLI tools (optional)
  • Speed: Fast (<1s per test)
  • Examples: Secret key alignment, CORS security, version consistency
  • Location: tests/deployment/
Test Coverage (11 tests, 91% coverage):
  • ✅ Helm secret template validation (missing keys detection)
  • ✅ CORS security (prevents wildcard + credentials vulnerability)
  • ✅ Hard-coded credential detection
  • ✅ Placeholder validation (YOUR_PROJECT_ID, REPLACE_ME, example.com)
  • ✅ ExternalSecrets key alignment
  • ✅ Namespace consistency across overlays
  • ✅ Version consistency across deployment methods
  • ✅ Resource limits and security contexts
  • ✅ Pod security standards compliance
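As a sketch of the style of check involved, the CORS validation could look roughly like this; the values file path and key names are assumptions, not the project's actual layout, and PyYAML is assumed to be available:
from pathlib import Path

import pytest
import yaml  # PyYAML, assumed available in the test environment

HELM_VALUES = Path("helm/values.yaml")  # assumed location

@pytest.mark.deployment
def test_cors_never_combines_wildcard_with_credentials():
    # Wildcard origins plus allow-credentials is the vulnerability the suite guards against
    values = yaml.safe_load(HELM_VALUES.read_text())
    cors = values.get("cors", {})  # key names are illustrative
    if cors.get("allowCredentials"):
        assert "*" not in cors.get("allowedOrigins", [])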
Running Deployment Tests:
# Run all deployment configuration tests
pytest tests/deployment/ -v

# Run unit tests (no helm/kustomize required)
pytest tests/deployment/test_helm_configuration.py -v

# Run E2E tests (requires helm/kustomize installed)
pytest tests/deployment/test_deployment_e2e.py -v

# Validate all deployment configs (comprehensive script)
./scripts/validate-deployments.sh

# Check deployed cluster health (requires kubectl)
./scripts/check-deployment-health.sh production-mcp-server-langgraph mcp
Pre-commit Validation: Deployment tests run automatically on commit via pre-commit hooks:
  • validate-deployment-secrets - Secret key alignment
  • validate-cors-security - CORS configuration safety
  • check-hardcoded-credentials - Credential exposure prevention
  • validate-redis-password-required - Redis authentication enforcement
  • check-dangerous-placeholders - Placeholder leak detection
CI/CD Integration: GitHub Actions workflow (.github/workflows/validate-deployments.yml) runs on every PR:
  • Helm chart linting and template rendering
  • Kustomize build validation across 5 environments (matrix)
  • YAML syntax validation
  • Security scanning (gitleaks, CORS, placeholders)
  • Version consistency checks
See adr/adr-0046-deployment-configuration-tdd-infrastructure.md for full details.

Performance & Benchmark Tests (@pytest.mark.benchmark)

  • Purpose: Validate system performance and detect regressions
  • Dependencies: pytest-benchmark plugin
  • Speed: Slow (100 iterations, ~30s per benchmark suite)
  • Examples: JWT encoding/decoding, OpenFGA authorization, LLM request handling
  • Location: tests/performance/
CODEX Findings #2 & #4: Performance Optimizations
Finding #2: Timeout Test Performance
  • Problem: Timeout tests used real asyncio.sleep(5-10s), burning ~15s per run
  • Solution: Reduced to 0.05-0.3s sleeps (100x faster, same behavior)
  • Impact: Tests now complete in ~6s instead of 15s (60% speedup)
  • Validation: Meta-tests prevent regression (test_performance_regression.py)
Finding #4: Benchmark Opt-In Model
  • Problem: Benchmarks ran by default, slowing everyday test runs
  • Solution: Benchmarks now skip unless explicitly requested
  • Impact: 90% faster test runs (benchmarks skipped in 0.57s vs 30+ seconds)
  • Usage: See “Run Performance Benchmarks” section below
Running Performance Tests:
# Run benchmarks (opt-in)
pytest --run-benchmarks

# Run only benchmarks (CI pattern)
pytest -m benchmark --benchmark-only

# Exclude benchmarks (default behavior)
pytest  # Benchmarks automatically skipped

# Compare benchmark results over time
pytest-benchmark compare 0001 0002

# View benchmark history
pytest-benchmark list
Benchmark Test Structure:
import pytest

from tests.performance.conftest import PercentileBenchmark

@pytest.mark.benchmark
class TestMyBenchmarks:
    def test_operation_performance(self, percentile_benchmark):
        """
        Benchmark operation with percentile-based assertions.

        Uses 100 iterations for statistical accuracy.
        """
        result = percentile_benchmark(my_operation, arg1, arg2)

        # Assert p95 latency < 10ms (more stable than mean)
        percentile_benchmark.assert_percentile(95, 0.010)

        # Assert p99 latency < 15ms
        percentile_benchmark.assert_percentile(99, 0.015)
Why percentile-based assertions?
  • More stable than mean (resistant to outliers)
  • Better reflects user experience (p95/p99 SLA targets)
  • Industry standard for performance testing
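To make the arithmetic concrete, percentiles over collected latency samples can be computed with the standard library alone; this sketch is independent of the project's PercentileBenchmark fixture and only illustrates the idea:
import statistics
import time

def collect_latencies(fn, iterations=100):
    # One timing sample per call, so percentiles can be asserted afterwards
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples

def percentile(samples, p):
    # statistics.quantiles with n=100 returns the 1st..99th percentile cut points
    return statistics.quantiles(samples, n=100)[p - 1]

latencies = collect_latencies(lambda: sum(range(10_000)))
assert percentile(latencies, 95) < 0.010  # p95 under 10 ms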

Fixture Standards

Shared Fixtures (tests/conftest.py)

mock_current_user (Function-scoped)

Standard authenticated user fixture for API tests.
{
    "user_id": "user:alice",              # OpenFGA format
    "keycloak_id": "8c7b4e5d-...",       # Keycloak UUID
    "username": "alice",
    "email": "alice@example.com"
}
Usage:
def test_my_endpoint(test_client, mock_current_user):
    # Test client already has auth override
    response = test_client.get("/api/v1/resource")
    assert response.status_code == 200

test_container (Session-scoped)

Dependency injection container for test environment.
@pytest.fixture(scope="session")
def test_container():
    from mcp_server_langgraph.core.container import create_test_container
    return create_test_container()
Features:
  • No-op telemetry (no output)
  • No-op auth (accepts any token)
  • In-memory storage
  • No global side effects

container (Function-scoped)

Per-test container for isolated testing.
@pytest.fixture
def container(test_container):
    from mcp_server_langgraph.core.container import create_test_container
    return create_test_container()

Identity & Authentication

User Identity Formats

The system uses dual identity formats for compatibility:

1. OpenFGA Format (Authorization)

"user:alice"  # Format: user:{username}
  • Used for: OpenFGA tuples, API responses, authorization checks
  • Best Practice: Always use this format for user_id fields in API responses

2. Keycloak UUID Format (Authentication)

"8c7b4e5d-1234-5678-abcd-ef1234567890"
  • Used for: Keycloak Admin API calls, internal database keys
  • Best Practice: Use this format for keycloak_id when interacting with Keycloak

Examples

✅ Correct Usage

# API endpoint handler
await api_key_manager.create_api_key(
    user_id=current_user.get("keycloak_id"),  # UUID for database
    name=request.name,
)

# API response
return {
    "user_id": "user:alice",  # OpenFGA format for client
    "username": "alice"
}

❌ Incorrect Usage

# Don't use plain usernames without prefix
user_id = "alice"  # Wrong!

# Don't use wrong format for Keycloak
await keycloak.get_user(user_id="user:alice")  # Should use UUID

# Don't use UUID for OpenFGA
await openfga.check(user="8c7b4e5d-...")  # Should use user:alice format
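Where a handler needs both formats, small helpers keep the conversion explicit. A sketch only; the function names are illustrative, not part of the codebase:
import uuid

def to_openfga_user(username: str) -> str:
    # "alice" -> "user:alice" for tuples, authorization checks, and API responses
    return f"user:{username}"

def ensure_keycloak_uuid(value: str) -> str:
    # Fail fast if an OpenFGA-style id slips into a Keycloak Admin API call
    return str(uuid.UUID(value))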

Test Fixture Patterns

API Endpoint Tests

@pytest.fixture
def test_client(mock_current_user):
    from fastapi import FastAPI
    from fastapi.testclient import TestClient

    app = FastAPI()
    app.include_router(router)  # router under test, imported from the application module

    # Override auth dependency
    app.dependency_overrides[get_current_user] = lambda: mock_current_user

    return TestClient(app)

def test_endpoint(test_client, mock_current_user):
    response = test_client.get("/api/v1/resource")
    assert response.json()["user_id"] == mock_current_user["user_id"]

Admin Permission Tests

@pytest.fixture
def mock_admin_user(mock_current_user):
    """User with elevated permissions"""
    admin_user = mock_current_user.copy()
    admin_user["roles"] = ["admin"]
    return admin_user

@pytest.fixture
def admin_test_client(mock_sp_manager, mock_admin_user):
    # ... setup with mock_admin_user
    return TestClient(app)

Running Tests

Run All Tests

pytest

Run by Category

pytest -m unit           # Unit tests only
pytest -m api            # API tests only
pytest -m integration    # Integration tests only
pytest -m e2e            # E2E tests only (requires infrastructure)

Run Performance Benchmarks (CODEX Finding #4)

# Benchmarks are SKIPPED by default for faster iteration

# Option 1: Run with custom flag
pytest --run-benchmarks

# Option 2: Run only benchmarks (CI pattern)
pytest -m benchmark --benchmark-only

# Option 3: Exclude benchmarks explicitly
pytest -m "not benchmark"

# Compare benchmark results
pytest-benchmark compare 0001 0002
Why benchmarks are opt-in:
  • Benchmarks run 100 iterations for statistical accuracy (~30s per suite)
  • Normal development doesn’t need benchmark validation
  • CI explicitly runs benchmarks in dedicated job
  • 90% faster test runs for everyday development
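One common way to wire up such an opt-in flag is through pytest hooks in conftest.py. The project's actual wiring may differ, so treat this as a sketch of the mechanism rather than the real implementation:
import pytest

def pytest_addoption(parser):
    parser.addoption("--run-benchmarks", action="store_true", default=False,
                     help="run tests marked @pytest.mark.benchmark")

def pytest_collection_modifyitems(config, items):
    if config.getoption("--run-benchmarks"):
        return
    skip_bench = pytest.mark.skip(reason="benchmark skipped; pass --run-benchmarks to run")
    for item in items:
        if "benchmark" in item.keywords:
            item.add_marker(skip_bench)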

Run Tests Requiring CLI Tools (CODEX Finding #1)

# Tests requiring external CLI tools skip gracefully if tools not installed

# Kustomize tests (deployment validation)
pytest -m requires_kustomize
# Skips with message if kustomize not installed

# Kubectl tests (K8s integration)
pytest -m requires_kubectl
# Skips with message if kubectl not installed

# Helm tests (chart validation)
pytest -m requires_helm
# Skips with message if helm not installed

# Exclude CLI-dependent tests
pytest -m "not requires_kustomize and not requires_helm"
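These graceful skips are typically a skipif on CLI availability combined with the category marker. A sketch under that assumption; the project's real guard may live in conftest.py instead:
import shutil

import pytest

HELM_MISSING = shutil.which("helm") is None

# The requires_helm marker keeps the test selectable with -m requires_helm,
# while skipif makes it skip cleanly when the CLI is absent
@pytest.mark.requires_helm
@pytest.mark.skipif(HELM_MISSING, reason="helm CLI not installed")
def test_helm_template_renders():
    ...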
Installation instructions:
# Kustomize
curl -s https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh | bash

# Helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

# Kubectl
# See: https://kubernetes.io/docs/tasks/tools/

Run by File

pytest tests/api/test_api_keys_endpoints.py
pytest tests/test_gdpr.py::TestGDPREndpoints

Run Specific Test

pytest tests/api/test_api_keys_endpoints.py::TestCreateAPIKey::test_create_api_key_success

Useful Flags

pytest -v              # Verbose output
pytest -x              # Stop on first failure
pytest --tb=short      # Short traceback format
pytest -k "api_key"    # Run tests matching pattern
pytest --lf            # Run last failed tests
pytest --co            # Show collected tests without running

E2E Infrastructure Setup

# CODEX FINDING #3: E2E tests now auto-run when docker is available
# No need to set TESTING=true anymore!

# Start test infrastructure
docker compose -f docker-compose.test.yml up -d

# Wait for services to be healthy (30-60s)
docker compose -f docker-compose.test.yml ps

# Run E2E tests (infrastructure check is automatic)
pytest -m e2e

# Cleanup
docker compose -f docker-compose.test.yml down -v

Run Meta-Validation (CODEX Finding #5)

# Comprehensive test suite validation
python scripts/validate_test_suite.py

# Strict mode (warnings treated as errors)
python scripts/validate_test_suite.py --strict

# Checks performed:
# - Marker consistency (no conflicting unit+integration markers)
# - Test naming conventions
# - Import guards for optional dependencies
# - CLI tool availability guards
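For a flavour of what a marker-consistency check involves, here is a simplified per-file sketch; the real scripts/validate_test_suite.py applies more nuanced rules:
# Simplified sketch of a marker-consistency scan; per-file rather than per-test
import re
import sys
from pathlib import Path

def files_with_conflicting_markers(root="tests"):
    conflicts = []
    for path in Path(root).rglob("test_*.py"):
        markers = set(re.findall(r"@pytest\.mark\.(\w+)", path.read_text()))
        if {"unit", "integration"} <= markers:
            conflicts.append(str(path))
    return conflicts

if __name__ == "__main__":
    bad = files_with_conflicting_markers()
    for path in bad:
        print(f"conflicting unit+integration markers: {path}")
    sys.exit(1 if bad else 0)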

Git Hooks and Validation

Updated: 2025-11-13 - Reorganized for developer productivity and CI parity
This project uses a two-stage validation strategy to balance speed and comprehensiveness:

Pre-commit Hooks (Fast - < 30 seconds)

Runs on changed files only at commit time. Optimized for rapid iteration.
What runs:
  • Auto-fixers: black, isort, trailing-whitespace, etc.
  • Fast linters: flake8, bandit, shellcheck
  • Critical safety: memory safety, fixture organization, async mock usage
  • File-specific validators: workflow syntax, MDX frontmatter
# Test pre-commit performance
echo "# test" >> README.md
git add README.md
git commit -m "test: verify pre-commit speed"
# Target: < 30 seconds

Pre-push Hooks (Comprehensive - 8-12 minutes)

Runs on all files before push. Matches CI validation exactly to eliminate surprises.
4-Phase Validation:
Phase 1: Fast Checks (< 30s)
  • Lockfile validation (uv lock --check)
  • Workflow validation tests
Phase 2: Type Checking (1-2 min, warning only)
  • MyPy type checking (non-blocking)
Phase 3: Test Suite (3-5 min)
  • Unit tests: pytest tests/ -m unit -x --tb=short
  • Smoke tests: pytest tests/smoke/ -v --tb=short
  • Integration tests (last failed): pytest tests/integration/ -x --tb=short --lf
  • Property tests: HYPOTHESIS_PROFILE=ci pytest -m property -x --tb=short
Phase 4: Pre-commit Hooks (5-8 min)
  • All comprehensive validators (documentation, deployment, etc.)
  • Runs with --hook-stage pre-push flag
# Install hooks
make git-hooks
# Or: pre-commit install --hook-type pre-commit --hook-type pre-push

# Verify configuration
python scripts/validate_pre_push_hook.py

# Test push validation (full 4-phase suite)
git push
# Target: 8-12 minutes, matches CI exactly

Performance Monitoring

# Measure pre-commit performance
python scripts/measure_hook_performance.py --stage commit

# Measure pre-push performance
python scripts/measure_hook_performance.py --stage push

# Measure both stages
python scripts/measure_hook_performance.py --stage all

Expected Performance

Stage        Target     Files          Description
Pre-commit   < 30s      Changed only   Fast feedback for commits
Pre-push     8-12 min   All files      Comprehensive CI-equivalent validation

Benefits

  • Fast commits: 80-90% faster than before (2-5 min → 15-30s)
  • Zero surprises: Pre-push matches CI exactly
  • Early detection: Catches issues before push, not in CI
  • CI reliability: Expected 80%+ reduction in CI failures

Troubleshooting

If pre-commit hooks fail:
# See specific failures
pre-commit run --all-files

# Run specific hook
pre-commit run black --all-files

# Skip hooks (emergency only)
git commit --no-verify
If pre-push hooks fail:
# Run specific phase
uv run pytest tests/ -m unit  # Phase 3: Unit tests
uv run pytest tests/smoke/    # Phase 3: Smoke tests
pre-commit run --all-files --hook-stage pre-push  # Phase 4

# Skip pre-push (emergency only - will likely fail in CI!)
git push --no-verify

Documentation

  • Categorization: docs-internal/HOOK_CATEGORIZATION.md
  • Migration guide: docs-internal/PRE_COMMIT_PRE_PUSH_REORGANIZATION.md
  • Performance monitoring: scripts/measure_hook_performance.py

TDD Best Practices

1. Red-Green-Refactor Cycle

# RED: Write failing test first
def test_new_feature():
    result = new_function()
    assert result == expected_value  # Fails - function doesn't exist

# GREEN: Implement minimal code to pass
def new_function():
    return expected_value  # Passes

# REFACTOR: Improve implementation
def new_function():
    # Clean, efficient implementation
    return calculate_expected_value()

2. Test One Thing at a Time

# ✅ Good - Single assertion
def test_create_api_key_returns_id():
    result = create_api_key(...)
    assert "key_id" in result

def test_create_api_key_returns_secret():
    result = create_api_key(...)
    assert "api_key" in result

# ❌ Bad - Multiple unrelated assertions
def test_create_api_key():
    result = create_api_key(...)
    assert "key_id" in result
    assert "api_key" in result
    assert result["name"] == "Test"
    assert len(result["api_key"]) > 20

3. Use Exact Mock Assertions

# ✅ Good - Validates exact call
mock_manager.create_api_key.assert_called_once_with(
    user_id="8c7b4e5d-1234-5678-abcd-ef1234567890",
    name="Test Key",
    expires_days=365,
)

# ❌ Bad - Doesn't validate parameters
mock_manager.create_api_key.assert_called()

4. Arrange-Act-Assert Pattern

def test_endpoint():
    # Arrange - Set up test data
    user_data = {"name": "Alice", "email": "alice@example.com"}

    # Act - Perform the action
    response = client.post("/users", json=user_data)

    # Assert - Verify the outcome
    assert response.status_code == 201
    assert response.json()["name"] == "Alice"

5. Test Error Cases

def test_create_api_key_max_keys_exceeded(test_client, mock_api_key_manager):
    """Test API key creation when user has reached the limit"""
    mock_api_key_manager.create_api_key.side_effect = ValueError(
        "Maximum of 5 API keys allowed per user"
    )

    response = test_client.post("/api/v1/api-keys/", json={...})

    assert response.status_code == 400
    assert "Maximum of 5 API keys" in response.json()["detail"]

6. Use Descriptive Test Names

# ✅ Good - Clear what's being tested
def test_delete_user_account_without_confirmation():
    """Test deletion requires explicit confirmation"""

# ❌ Bad - Vague test name
def test_delete():
    """Test delete"""

Regression Test Patterns

Overview

Regression tests prevent fixed bugs from reoccurring by encoding the fix as a permanent test case. The project maintains a comprehensive suite of regression tests in tests/regression/ that encode fixes for real Codex findings and production issues.
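The common shape is a test whose docstring names the original defect and whose assertion pins the fixed behaviour. A purely hypothetical example of that shape (not taken from the project's suite):
import pytest

def paginate(items, page, size):
    # Fixed implementation: exclusive slice end, so the final item is never dropped
    start = page * size
    return items[start:start + size]

@pytest.mark.regression
def test_last_page_keeps_final_item():
    """Hypothetical regression test: an off-by-one bug used to drop the last item."""
    assert paginate(list(range(10)), page=3, size=3) == [9]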

Key Regression Test Categories

Category                  Location                                            Purpose
pytest-xdist Isolation    tests/regression/test_pytest_xdist_*.py            Prevent parallel test pollution
Service Principal         tests/regression/test_service_principal_*.py       Auth isolation between tests
LangGraph Types           tests/regression/test_langgraph_return_types.py    Type safety for agent responses
FastAPI Auth              tests/regression/test_fastapi_auth_override_*.py   Dependency injection correctness
GDPR Isolation            tests/regression/test_gdpr_singleton_*.py          Privacy compliance isolation

Running Regression Tests

# Run all regression tests
pytest tests/regression/ -v

# Run specific regression category
pytest tests/regression/test_pytest_xdist*.py -v

# Include in CI pipeline
pytest -m regression
For detailed examples, see the test files in tests/regression/ which document the original issue and fix in their docstrings.
Last Updated: 2025-11-10
Status: ✅ Complete and current
Recent Updates:
  • 2025-11-10: Added comprehensive regression test patterns section documenting fixes for Codex findings
  • 2025-01-05: Initial comprehensive testing guide