Overview
Comprehensive testing ensures reliability, maintainability, and confidence in deployments. This guide covers unit testing, integration testing, property-based testing, load testing, and security testing.
We aim for >80% code coverage with a multi-layered testing approach including unit, integration, property-based, and end-to-end tests.
Testing Pyramid
- Unit Tests: fast (< 1s), isolated, mocked dependencies; 70-80% of tests
- Integration Tests: real services (database, Redis, APIs), slower (seconds); 15-20% of tests
- E2E Tests: full system, user workflows, slowest (minutes); 5-10% of tests
Test Organization
Directory Structure
```text
tests/
├── unit/                      # Unit tests
│   ├── test_auth.py
│   ├── test_agent.py
│   ├── test_llm.py
│   ├── test_openfga_client.py
│   ├── test_session.py
│   └── test_user_provider.py
│
├── integration/               # Integration tests
│   ├── test_api.py
│   ├── test_database.py
│   ├── test_redis.py
│   ├── test_keycloak.py
│   └── test_openfga.py
│
├── property/                  # Property-based tests
│   ├── test_auth_properties.py
│   └── test_llm_properties.py
│
├── e2e/                       # End-to-end tests
│   ├── test_user_journey.py
│   └── test_conversation_flow.py
│
├── load/                      # Load tests
│   ├── locustfile.py
│   └── k6_script.js
│
├── security/                  # Security tests
│   ├── test_auth_security.py
│   └── test_injection.py
│
└── conftest.py                # Shared fixtures
```
Test Markers
```python
# conftest.py
import pytest

def pytest_configure(config):
    """Register custom markers"""
    config.addinivalue_line("markers", "unit: Unit tests (fast, isolated)")
    config.addinivalue_line("markers", "integration: Integration tests (requires services)")
    config.addinivalue_line("markers", "e2e: End-to-end tests (full system)")
    config.addinivalue_line("markers", "slow: Slow running tests")
    config.addinivalue_line("markers", "security: Security tests")
```
Usage:

```bash
# Run unit tests only
pytest -m unit

# Run integration tests
pytest -m integration

# Run all except slow tests
pytest -m "not slow"

# Run security tests
pytest -m security
```
Unit Testing
Writing Unit Tests
```python
# tests/unit/test_auth.py
import pytest
from unittest.mock import Mock, AsyncMock, patch
from src.auth.middleware import AuthMiddleware
from src.auth.providers.inmemory import InMemoryAuthProvider

@pytest.fixture
def auth_middleware():
    """Create auth middleware with in-memory provider"""
    return AuthMiddleware(provider="inmemory")

@pytest.mark.unit
async def test_create_token(auth_middleware):
    """Test JWT token creation"""
    # Act
    token = auth_middleware.create_token("alice", expires_in=3600)

    # Assert
    assert token is not None
    assert isinstance(token, str)
    assert len(token) > 50  # JWTs are long

    # Verify token
    user_id = await auth_middleware.verify_token(token)
    assert user_id == "alice"

@pytest.mark.unit
async def test_invalid_token(auth_middleware):
    """Test that invalid tokens are rejected"""
    with pytest.raises(Exception, match="Invalid token"):
        await auth_middleware.verify_token("invalid.token.here")

@pytest.mark.unit
async def test_expired_token(auth_middleware):
    """Test that expired tokens are rejected"""
    import asyncio

    # Create token with 0 expiration
    token = auth_middleware.create_token("alice", expires_in=0)

    # Wait for expiration
    await asyncio.sleep(1)

    # Should raise
    with pytest.raises(Exception, match="expired"):
        await auth_middleware.verify_token(token)

@pytest.mark.unit
@pytest.mark.parametrize("user_id,expires_in", [
    ("alice", 3600),
    ("bob", 1800),
    ("charlie", 7200),
])
async def test_multiple_users(auth_middleware, user_id, expires_in):
    """Test token creation for multiple users"""
    token = auth_middleware.create_token(user_id, expires_in)
    verified = await auth_middleware.verify_token(token)
    assert verified == user_id
```
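For context, the contract these tests exercise can be sketched as follows. This is a minimal, illustrative stand-in using PyJWT; the real `AuthMiddleware` in `src/auth/middleware.py` may differ in naming and error types.

```python
# Minimal sketch of the middleware contract under test (assumes PyJWT).
import time
import jwt  # PyJWT

class SketchAuthMiddleware:
    """Illustrative stand-in for src.auth.middleware.AuthMiddleware."""

    def __init__(self, secret: str = "test-secret"):
        self._secret = secret

    def create_token(self, user_id: str, expires_in: int = 3600) -> str:
        # Sign the subject and expiry into an HS256 JWT
        payload = {"sub": user_id, "exp": int(time.time()) + expires_in}
        return jwt.encode(payload, self._secret, algorithm="HS256")

    async def verify_token(self, token: str) -> str:
        # PyJWT validates the exp claim during decode
        try:
            payload = jwt.decode(token, self._secret, algorithms=["HS256"])
        except jwt.ExpiredSignatureError:
            raise Exception("Token expired")
        except jwt.InvalidTokenError:
            raise Exception("Invalid token")
        return payload["sub"]
```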
Mocking Dependencies
```python
# tests/unit/test_agent.py
import pytest
from unittest.mock import AsyncMock, patch
from src.agent.langgraph_agent import LangGraphAgent

@pytest.fixture
def mock_llm():
    """Mock LLM for testing"""
    llm = AsyncMock()
    llm.ainvoke.return_value = AsyncMock(
        content="Mocked response",
        usage=AsyncMock(
            prompt_tokens=10,
            completion_tokens=20,
            total_tokens=30
        )
    )
    return llm

@pytest.mark.unit
async def test_agent_invoke(mock_llm):
    """Test agent invocation with mocked LLM"""
    # Arrange
    agent = LangGraphAgent(llm=mock_llm)

    # Act
    response = await agent.ainvoke("Test query")

    # Assert
    assert response == "Mocked response"
    mock_llm.ainvoke.assert_called_once_with("Test query")

@pytest.mark.unit
@patch('src.agent.langgraph_agent.OpenFGAClient')
async def test_agent_with_authorization(mock_openfga_class, mock_llm):
    """Test agent with mocked OpenFGA"""
    # Setup mock
    mock_openfga = AsyncMock()
    mock_openfga.check_permission.return_value = True
    mock_openfga_class.return_value = mock_openfga

    # Create agent
    agent = LangGraphAgent(llm=mock_llm, openfga_enabled=True)

    # Invoke
    response = await agent.ainvoke("Test query", user_id="alice")

    # Verify authorization checked
    mock_openfga.check_permission.assert_called_once()
```
Testing Async Code
```python
# tests/unit/test_session.py
import pytest
from src.session.redis_session import RedisSessionStore
from unittest.mock import AsyncMock

@pytest.fixture
def mock_redis():
    """Mock Redis client"""
    redis = AsyncMock()
    redis.get.return_value = None
    redis.set.return_value = True
    redis.delete.return_value = 1
    return redis

@pytest.mark.unit
async def test_session_create(mock_redis):
    """Test session creation"""
    store = RedisSessionStore(redis_client=mock_redis)
    session_id = await store.create_session(
        user_id="alice",
        data={"key": "value"}
    )
    assert session_id is not None
    mock_redis.set.assert_called_once()

@pytest.mark.unit
async def test_session_get(mock_redis):
    """Test session retrieval"""
    # Setup
    mock_redis.get.return_value = '{"user_id": "alice", "key": "value"}'
    store = RedisSessionStore(redis_client=mock_redis)

    # Act
    session = await store.get_session("session_123")

    # Assert
    assert session["user_id"] == "alice"
    assert session["key"] == "value"
    mock_redis.get.assert_called_once_with("session:session_123")

@pytest.mark.unit
async def test_session_delete(mock_redis):
    """Test session deletion"""
    store = RedisSessionStore(redis_client=mock_redis)
    result = await store.delete_session("session_123")
    assert result is True
    mock_redis.delete.assert_called_once_with("session:session_123")
```
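The mocks above pin down a small contract: session keys use a `session:` prefix, values are stored as JSON, and delete returns the number of removed keys. A minimal sketch of a store satisfying that contract (illustrative only; the real class is `src.session.redis_session.RedisSessionStore`):

```python
# Sketch of the session-store contract the mocked tests assume.
import json
import uuid

class SketchRedisSessionStore:
    def __init__(self, redis_client, ttl: int = 3600):
        self._redis = redis_client
        self._ttl = ttl

    async def create_session(self, user_id: str, data: dict) -> str:
        session_id = uuid.uuid4().hex
        payload = json.dumps({"user_id": user_id, **data})
        await self._redis.set(f"session:{session_id}", payload, ex=self._ttl)
        return session_id

    async def get_session(self, session_id: str) -> dict | None:
        raw = await self._redis.get(f"session:{session_id}")
        return json.loads(raw) if raw else None

    async def delete_session(self, session_id: str) -> bool:
        # Redis DEL returns the number of keys removed
        return await self._redis.delete(f"session:{session_id}") == 1
```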
Integration Testing
Setup Test Environment
```python
# tests/conftest.py
import pytest
import asyncpg
from redis.asyncio import Redis
from testcontainers.postgres import PostgresContainer
from testcontainers.redis import RedisContainer

@pytest.fixture(scope="session")
def postgres_container():
    """Start PostgreSQL container for tests"""
    with PostgresContainer("postgres:15") as postgres:
        yield postgres

@pytest.fixture(scope="session")
def redis_container():
    """Start Redis container for tests"""
    with RedisContainer("redis:7-alpine") as redis:
        yield redis

@pytest.fixture
async def db_connection(postgres_container):
    """Create database connection"""
    # testcontainers returns a SQLAlchemy-style URL; strip the driver
    # suffix so asyncpg accepts it
    dsn = postgres_container.get_connection_url().replace("+psycopg2", "")
    conn = await asyncpg.connect(dsn)
    try:
        yield conn
    finally:
        await conn.close()

@pytest.fixture
async def redis_client(redis_container):
    """Create Redis client"""
    client = Redis.from_url(redis_container.get_connection_url())
    try:
        yield client
    finally:
        await client.close()
```
Integration Tests
```python
# tests/integration/test_database.py
import pytest

@pytest.mark.integration
async def test_user_creation(db_connection):
    """Test creating user in database"""
    # Insert user
    await db_connection.execute(
        "INSERT INTO users (user_id, email, created_at) VALUES ($1, $2, NOW())",
        "alice", "alice@example.com"
    )

    # Verify
    row = await db_connection.fetchrow(
        "SELECT * FROM users WHERE user_id = $1",
        "alice"
    )
    assert row["user_id"] == "alice"
    assert row["email"] == "alice@example.com"

    # Cleanup
    await db_connection.execute("DELETE FROM users WHERE user_id = $1", "alice")

@pytest.mark.integration
async def test_redis_operations(redis_client):
    """Test Redis operations"""
    # Set value
    await redis_client.set("test_key", "test_value", ex=60)

    # Get value
    value = await redis_client.get("test_key")
    assert value == b"test_value"

    # Delete
    await redis_client.delete("test_key")

    # Verify deleted
    value = await redis_client.get("test_key")
    assert value is None
```
API Integration Tests
```python
# tests/integration/test_api.py
import pytest
from httpx import AsyncClient
from src.main import app

@pytest.fixture
async def client():
    """Create async HTTP client"""
    async with AsyncClient(app=app, base_url="http://test") as ac:
        yield ac

@pytest.mark.integration
async def test_health_endpoint(client):
    """Test health check endpoint"""
    response = await client.get("/health/live")
    assert response.status_code == 200
    assert response.json() == {"status": "alive"}

@pytest.mark.integration
async def test_chat_endpoint(client):
    """Test chat endpoint"""
    # Login first
    login_response = await client.post(
        "/auth/login",
        json={"username": "test_user", "password": "test_pass"}
    )
    token = login_response.json()["access_token"]

    # Chat request
    response = await client.post(
        "/chat",
        json={"query": "Hello"},
        headers={"Authorization": f"Bearer {token}"}
    )
    assert response.status_code == 200
    data = response.json()
    assert "response" in data
    assert isinstance(data["response"], str)
```
Property-Based Testing
Using Hypothesis
```python
# tests/property/test_auth_properties.py
import pytest
from hypothesis import given, strategies as st
from src.auth.middleware import AuthMiddleware

auth = AuthMiddleware(provider="inmemory")

@pytest.mark.unit
@given(user_id=st.text(min_size=1, max_size=100))
async def test_token_roundtrip(user_id):
    """Property: Creating and verifying token should return original user_id"""
    # Create token
    token = auth.create_token(user_id, expires_in=3600)

    # Verify token
    verified_user_id = await auth.verify_token(token)

    # Property: Should get back same user_id
    assert verified_user_id == user_id

@pytest.mark.unit
@given(
    user_id=st.text(min_size=1, max_size=100),
    expires_in=st.integers(min_value=1, max_value=86400)
)
async def test_token_with_various_expirations(user_id, expires_in):
    """Property: Tokens with valid expiration should verify"""
    token = auth.create_token(user_id, expires_in=expires_in)
    verified = await auth.verify_token(token)
    assert verified == user_id
```
```python
# tests/property/test_llm_properties.py
import pytest
from hypothesis import given, strategies as st
from src.llm.factory import LLMFactory

@pytest.mark.unit
@given(
    provider=st.sampled_from(["anthropic", "openai", "google"]),
    temperature=st.floats(min_value=0.0, max_value=2.0),
    max_tokens=st.integers(min_value=1, max_value=8192)
)
def test_llm_factory_properties(provider, temperature, max_tokens):
    """Property: LLM factory should create valid instances"""
    llm = LLMFactory(
        provider=provider,
        temperature=temperature,
        max_tokens=max_tokens
    )
    assert llm is not None
    assert hasattr(llm, 'ainvoke')
```
Load Testing
Using Locust
```python
# tests/load/locustfile.py
from locust import HttpUser, task, between
import random

class MCPServerUser(HttpUser):
    wait_time = between(1, 3)
    token = None

    def on_start(self):
        """Login and get token"""
        response = self.client.post("/auth/login", json={
            "username": "load_test_user",
            "password": "password"
        })
        self.token = response.json()["access_token"]

    @task(3)
    def chat(self):
        """Send chat messages"""
        queries = [
            "What is AI?",
            "Explain machine learning",
            "How does deep learning work?",
            "What are transformers?",
        ]
        self.client.post(
            "/chat",
            json={"query": random.choice(queries)},
            headers={"Authorization": f"Bearer {self.token}"}
        )

    @task(1)
    def list_conversations(self):
        """List user conversations"""
        self.client.get(
            "/conversations",
            headers={"Authorization": f"Bearer {self.token}"}
        )

    @task(1)
    def health_check(self):
        """Check health"""
        self.client.get("/health/ready")
```
Run load test:

```bash
# Install Locust
uv tool install locust

# Run test
locust -f tests/load/locustfile.py \
  --host=http://localhost:8000 \
  --users=100 \
  --spawn-rate=10 \
  --run-time=5m

# Open web UI
open http://localhost:8089
```
Using k6
```javascript
// tests/load/k6_script.js
import http from 'k6/http';
import { check, sleep } from 'k6';

export let options = {
  stages: [
    { duration: '1m', target: 50 },   // Ramp up
    { duration: '3m', target: 50 },   // Stay at 50 users
    { duration: '1m', target: 100 },  // Spike to 100
    { duration: '2m', target: 100 },  // Stay at 100
    { duration: '1m', target: 0 },    // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'],  // 95% of requests < 500ms
    http_req_failed: ['rate<0.01'],    // Error rate < 1%
  },
};

export default function () {
  // Login
  let loginRes = http.post('http://localhost:8000/auth/login', JSON.stringify({
    username: 'test_user',
    password: 'password'
  }), {
    headers: { 'Content-Type': 'application/json' },
  });
  check(loginRes, {
    'login successful': (r) => r.status === 200,
  });
  let token = loginRes.json('access_token');

  // Chat request
  let chatRes = http.post('http://localhost:8000/chat', JSON.stringify({
    query: 'What is AI?'
  }), {
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${token}`,
    },
  });
  check(chatRes, {
    'chat successful': (r) => r.status === 200,
    'response time OK': (r) => r.timings.duration < 2000,
  });

  sleep(1);
}
```
Run k6 test:

```bash
# Install k6
brew install k6

# Run test
k6 run tests/load/k6_script.js

# Run with cloud output
k6 run --out cloud tests/load/k6_script.js
```
Security Testing
SQL Injection Tests
```python
# tests/security/test_injection.py
import pytest
from httpx import AsyncClient

@pytest.mark.security
@pytest.mark.parametrize("malicious_input", [
    "'; DROP TABLE users;--",
    "' OR '1'='1",
    "admin'--",
    "1' UNION SELECT NULL,NULL,NULL--",
])
async def test_sql_injection_protection(client, malicious_input):
    """Test SQL injection protection"""
    response = await client.post(
        "/auth/login",
        json={"username": malicious_input, "password": "password"}
    )

    # Should not succeed or cause a server error
    assert response.status_code in [401, 400]
    assert "DROP" not in str(response.content).upper()

@pytest.mark.security
@pytest.mark.parametrize("xss_input", [
    "<script>alert('XSS')</script>",
    "<img src=x onerror=alert('XSS')>",
    "javascript:alert('XSS')",
])
async def test_xss_protection(client, token, xss_input):
    """Test XSS protection"""
    response = await client.post(
        "/chat",
        json={"query": xss_input},
        headers={"Authorization": f"Bearer {token}"}
    )

    # Response should not contain unescaped script tags
    assert "<script>" not in response.text
    assert "onerror=" not in response.text
```
Authentication Tests
```python
# tests/security/test_auth_security.py
import pytest

@pytest.mark.security
async def test_requires_authentication(client):
    """Test that protected endpoints require auth"""
    response = await client.get("/conversations")
    assert response.status_code == 401

@pytest.mark.security
async def test_invalid_token_rejected(client):
    """Test that invalid tokens are rejected"""
    response = await client.get(
        "/conversations",
        headers={"Authorization": "Bearer invalid.token.here"}
    )
    assert response.status_code == 401

@pytest.mark.security
async def test_expired_token_rejected(client):
    """Test that expired tokens are rejected"""
    # Create token with immediate expiration
    expired_token = create_token("alice", expires_in=-1)
    response = await client.get(
        "/conversations",
        headers={"Authorization": f"Bearer {expired_token}"}
    )
    assert response.status_code == 401
```
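The `create_token` helper above is assumed to come from the suite's auth test utilities. One hedged way to mint an already-expired token with PyJWT, assuming the app verifies HS256 tokens signed with the test secret:

```python
# Hypothetical helper for minting expired test tokens (assumes PyJWT).
import time
import jwt

TEST_SECRET = "test-secret"  # assumption: matches the app's test signing key

def create_token(user_id: str, expires_in: int) -> str:
    # A negative expires_in puts the exp claim in the past, so
    # verification fails with an expired-signature error
    payload = {"sub": user_id, "exp": int(time.time()) + expires_in}
    return jwt.encode(payload, TEST_SECRET, algorithm="HS256")
```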
Continuous Integration
GitHub Actions Workflow
```yaml
# .github/workflows/test.yml
name: Tests

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main, develop]

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.12"]
    steps:
      - uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install uv
        run: curl -LsSf https://astral.sh/uv/install.sh | sh

      - name: Install dependencies
        run: |
          uv sync

      - name: Run unit tests
        run: |
          uv run pytest -m unit \
            --cov=src \
            --cov-report=xml \
            --cov-report=term

      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          file: ./coverage.xml
          fail_ci_if_error: true

  integration-tests:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_PASSWORD: postgres
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
      redis:
        image: redis:7-alpine
        ports:
          - 6379:6379
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    steps:
      - uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.12"

      - name: Install uv
        run: curl -LsSf https://astral.sh/uv/install.sh | sh

      - name: Install dependencies
        run: |
          uv sync

      - name: Run integration tests
        env:
          DATABASE_URL: postgresql://postgres:postgres@localhost/postgres
          REDIS_URL: redis://localhost:6379
        run: |
          uv run pytest -m integration -v

  security-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Install uv
        run: curl -LsSf https://astral.sh/uv/install.sh | sh

      - name: Run Bandit
        run: |
          uv tool install bandit
          bandit -r src/ -f json -o bandit-report.json

      - name: Run Safety
        run: |
          uv tool install safety
          safety check --json

      - name: Run Trivy
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: 'fs'
          scan-ref: '.'
          format: 'sarif'
          output: 'trivy-results.sarif'
```
Test Coverage
Measuring Coverage
```bash
# Run with coverage
pytest --cov=src --cov-report=html --cov-report=term

# View HTML report
open htmlcov/index.html

# Coverage by file
pytest --cov=src --cov-report=term-missing

# Fail if coverage < 80%
pytest --cov=src --cov-fail-under=80
```
Coverage Configuration
```ini
# .coveragerc
[run]
source = src
omit =
    */tests/*
    */venv/*
    */migrations/*

[report]
exclude_lines =
    pragma: no cover
    def __repr__
    raise AssertionError
    raise NotImplementedError
    if __name__ == .__main__.:
    if TYPE_CHECKING:
    @abstractmethod
precision = 2
show_missing = True
```
Best Practices
Use descriptive test names:

```python
# Good
def test_user_creation_with_valid_data(): ...
def test_authentication_fails_with_invalid_token(): ...
def test_llm_returns_response_for_valid_query(): ...

# Bad
def test1(): ...
def test_user(): ...
def test_stuff(): ...
```

Follow the Arrange-Act-Assert pattern:

```python
async def test_example():
    # Arrange - Setup test data
    user_id = "alice"
    auth = AuthMiddleware()

    # Act - Execute the code being tested
    token = auth.create_token(user_id)

    # Assert - Verify the results
    assert token is not None
    verified = await auth.verify_token(token)
    assert verified == user_id
```

Prefer fixtures over per-test setup:

```python
# Good - Use fixtures
@pytest.fixture
def user():
    return {"id": "alice", "email": "alice@example.com"}

def test_user(user):
    assert user["id"] == "alice"

# Bad - Setup in test
def test_user():
    user = {"id": "alice", "email": "alice@example.com"}
    assert user["id"] == "alice"
```

Keep tests focused:

```python
# Good - Focused test
def test_user_has_correct_id(user):
    assert user["id"] == "alice"

def test_user_has_correct_email(user):
    assert user["email"] == "alice@example.com"

# Acceptable - Related assertions
def test_user_data(user):
    assert user["id"] == "alice"
    assert user["email"] == "alice@example.com"
```
Test Dependency Management
Critical: All packages imported in tests must be available in the project dependencies. Missing test dependencies cause CI failures even though tests pass locally.
The Problem
On 2025-11-12, 10 CI jobs failed with `ModuleNotFoundError: docker.errors` despite tests passing locally:

- Root cause: the `docker` and `kubernetes` packages were in the `code-execution` extras, not the `dev` extras
- Impact: CI workflows install the `dev` extras, but tests import `docker` modules
- Result: tests pass locally (all extras installed) but fail in CI (only `dev` installed)
Prevention Strategy
Three layers of protection prevent this issue:
Regression Test
File: `tests/regression/test_dev_dependencies.py`

Validates all test imports against project dependencies:

```python
@pytest.mark.regression
def test_test_imports_have_dev_dependencies():
    """Ensure every test import has a corresponding dependency"""
    # Checks all imports in tests/ against:
    # - Main dependencies
    # - All optional dependencies
    # - Handles package vs import name mismatches
    # - Excludes stdlib and project modules
```

Runs: on every commit in CI, catching missing dependencies immediately.
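A minimal sketch of how such a check can work, assuming Python 3.11+ `tomllib` and a regex scan of top-level imports; the real test additionally applies the import-name mapping described under Dependency Rules below:

```python
# Sketch: every top-level import in tests/ must map to a declared dependency.
import re
import sys
import tomllib
from pathlib import Path

IMPORT_RE = re.compile(r"^\s*(?:from|import)\s+([A-Za-z_][A-Za-z0-9_]*)", re.M)

def declared_packages() -> set[str]:
    with open("pyproject.toml", "rb") as f:
        project = tomllib.load(f)["project"]
    deps = list(project.get("dependencies", []))
    for extra in project.get("optional-dependencies", {}).values():
        deps.extend(extra)
    # Normalize "docker>=7.1.0" -> "docker"
    return {re.split(r"[<>=\[ ]", dep)[0].lower() for dep in deps}

def test_test_imports_have_dev_dependencies():
    allowed = declared_packages() | set(sys.stdlib_module_names) | {"src", "tests"}
    missing = set()
    for path in Path("tests").rglob("test_*.py"):
        for module in IMPORT_RE.findall(path.read_text()):
            if module.lower() not in allowed:
                missing.add(module)
    assert not missing, f"Test imports without a declared dependency: {missing}"
```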
Pre-commit Hook
Hook ID: `validate-test-dependencies`

Runs dependency validation before each commit:

```bash
# Triggers on test file or pyproject.toml changes
pytest tests/regression/test_dev_dependencies.py::test_test_imports_have_dev_dependencies -v
```

Fast: only runs the affected test, typically < 5 seconds.
Workflow Validation
Script: `scripts/validation/validate_workflow_test_deps.py`

Validates GitHub Actions workflows:

```bash
# Checks that workflows using setup-python-deps + pytest
# install the required 'dev' extras
python scripts/validation/validate_workflow_test_deps.py
```

Hook ID: `validate-workflow-test-deps` (runs on workflow file changes)
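A hedged sketch of what that validation can look like, assuming PyYAML and the workflow layout shown in the Quick Reference below; the actual script may differ:

```python
# Sketch: flag workflow jobs that run pytest without requesting 'dev' extras
# from the setup-python-deps composite action.
import sys
from pathlib import Path
import yaml  # PyYAML

def workflow_is_valid(path: Path) -> bool:
    workflow = yaml.safe_load(path.read_text())
    ok = True
    for name, job in (workflow.get("jobs") or {}).items():
        steps = job.get("steps") or []
        runs_pytest = any("pytest" in (step.get("run") or "") for step in steps)
        installs_dev = any(
            "setup-python-deps" in (step.get("uses") or "")
            and "dev" in (step.get("with") or {}).get("extras", "")
            for step in steps
        )
        if runs_pytest and not installs_dev:
            print(f"{path}: job '{name}' runs pytest without 'dev' extras")
            ok = False
    return ok

if __name__ == "__main__":
    results = [workflow_is_valid(p) for p in Path(".github/workflows").glob("*.y*ml")]
    sys.exit(0 if all(results) else 1)
```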
Dependency Rules
Rule 1: Test Imports → Dev Extras

Any package imported by test code must be declared in the `dev` optional dependencies.
Rule 2: Handle Package Name Mismatches
Some packages have different import names:

| Package Name | Import Name | Example |
|---|---|---|
| PyJWT | jwt | `import jwt` |
| pyyaml | yaml | `import yaml` |
| python-dotenv | dotenv | `from dotenv import load_dotenv` |
| infisical-python | infisical_client | `from infisical_client import ...` |
The regression test handles these automatically via get_import_name_mapping().
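A sketch of that mapping, using only the names from the table above (the real function may cover more packages):

```python
# Sketch: distribution name -> import name mapping used by the regression test.
def get_import_name_mapping() -> dict[str, str]:
    return {
        "pyjwt": "jwt",
        "pyyaml": "yaml",
        "python-dotenv": "dotenv",
        "infisical-python": "infisical_client",
    }
```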
Rule 3: Use Stdlib tomllib

Python 3.11+ includes tomllib in the stdlib. Since the project minimum is Python 3.11, use tomllib directly:

```python
# Python 3.11+ (project minimum)
import tomllib
```

The tomli backport is no longer needed or allowed.
Rule 4: Exclude Third-Party Helpers
Third-party test tools (e.g., bats-core) have their own dependencies:

```python
# tests/test_helper/bats-core/docs/source/conf.py
import recommonmark  # ✅ Excluded from validation
```

The regression test skips `test_helper/` directories automatically.
Quick Reference
Local Testing

```bash
# Run dependency validation
pytest tests/regression/test_dev_dependencies.py -v

# Test pre-commit hook
pre-commit run validate-test-dependencies --all-files

# Test workflow validation
python scripts/validation/validate_workflow_test_deps.py
```

Workflow Configuration

```yaml
# .github/workflows/quality-tests.yaml
- name: Setup Python and dependencies
  uses: ./.github/actions/setup-python-deps
  with:
    python-version: '3.12'
    extras: 'dev builder'  # ✅ Must include 'dev' for tests
```
Troubleshooting
Test fails in CI but passes locally
Symptom: `ModuleNotFoundError` in CI, test passes on your machine.

Cause: package installed locally but not in the `dev` extras.

Solution:

1. Run dependency validation: `pytest tests/regression/test_dev_dependencies.py -v`
2. Add the missing package to `[project.optional-dependencies.dev]`
3. Run `uv sync --extra dev` to update the lockfile
4. Re-run tests locally to verify

Example:

```text
# Test output shows missing package
Test files import packages not in any dependency group:
  - docker
```

```toml
# Fix in pyproject.toml
dev = [
    "docker>=7.1.0",  # Add missing package
]
```
Pre-commit hook fails

Symptom: the `validate-test-dependencies` hook fails on commit.

Cause: new test import without a corresponding dependency.

Solution:

1. Check the hook output for missing packages
2. Add packages to the appropriate extras in pyproject.toml:
   - Test dependencies → `dev`
   - Code execution → `code-execution`
   - CLI tools → `cli`
3. Run `uv sync --extra dev`
4. Re-run pre-commit: `pre-commit run validate-test-dependencies --all-files`
Workflow validation fails
Symptom: the `validate-workflow-test-deps` hook fails.

Cause: a workflow runs pytest but doesn't install the `dev` extras.

Solution: update the workflow to include `dev` in extras:

```yaml
- uses: ./.github/actions/setup-python-deps
  with:
    extras: 'dev'  # or 'dev builder' for multiple
```
Historical Context: This protection system was implemented after commit 7b51437 (2025-11-12) to prevent a recurrence of the 10 simultaneous CI failures. The regression test, pre-commit hooks, and workflow validation catch this class of error before it reaches CI.
Next Steps
Testing Complete: Comprehensive test coverage for reliable deployments!