Documentation Index
Fetch the complete documentation index at: https://mcp-server-langgraph.mintlify.app/llms.txt
Use this file to discover all available pages before exploring further.
21. CI/CD Pipeline Strategy
Date: 2025-10-13
Status
Accepted
Category
Infrastructure & Deployment
Context
Modern software projects require automated CI/CD to:
- Maintain Code Quality: Prevent bugs from reaching production
- Ensure Security: Detect vulnerabilities early
- Enable Fast Iteration: Deploy changes quickly and safely
- Build Confidence: Automated testing reduces manual QA burden
- Document Changes: Automated releases with changelogs
Without CI/CD:
- Manual testing is slow, error-prone, incomplete
- Security vulnerabilities go undetected
- Deployment is risky and inconsistent
- Code quality degrades over time
- Contributors lack feedback on their changes
For an AI agent system with:
- Multiple LLM providers (Anthropic, OpenAI, Google, etc.)
- Complex auth (JWT, Keycloak, OpenFGA)
- Multiple deployment targets (Docker, Kubernetes, Cloud Run, Helm)
- Compliance requirements (GDPR, SOC 2, HIPAA)
…manual testing is infeasible. Automated pipelines are mandatory.
Decision
We will implement a multi-stage CI/CD pipeline using GitHub Actions with the following workflows:
Pipeline Architecture
┌─────────────────────────────────────────────────────────┐
│ PR Checks (on pull_request) │
│ ├─ Lint (Black, isort, flake8) │
│ ├─ Type Check (mypy) │
│ └─ Unit Tests (pytest) │
└─────────────────────────────────────────────────────────┘
↓ (PR approved + merged)
┌─────────────────────────────────────────────────────────┐
│ CI Pipeline (on push to main) │
│ ├─ Lint & Type Check │
│ ├─ Unit Tests (all markers) │
│ ├─ Integration Tests │
│ ├─ Property Tests (Hypothesis) │
│ ├─ Contract Tests (MCP schema) │
│ ├─ Build Docker Image │
│ └─ Deployment Validation │
└─────────────────────────────────────────────────────────┘
↓ (tests pass)
┌─────────────────────────────────────────────────────────┐
│ Quality Tests (nightly/on-demand) │
│ ├─ Mutation Testing (mutmut) │
│ ├─ Performance Regression Tests │
│ ├─ OpenAPI Breaking Change Detection │
│ └─ Deployment E2E Tests (kind clusters) │
└─────────────────────────────────────────────────────────┘
↓ (quality gates pass)
┌─────────────────────────────────────────────────────────┐
│ Security Scan (daily/on-demand) │
│ ├─ Bandit (Python security linter) │
│ ├─ Dependency Audit (pip-audit) │
│ ├─ SAST (Static Application Security Testing) │
│ └─ Container Scanning │
└─────────────────────────────────────────────────────────┘
↓ (tag pushed)
┌─────────────────────────────────────────────────────────┐
│ Release Pipeline (on tag v*) │
│ ├─ Build & Publish Docker Image │
│ ├─ Build & Publish PyPI Package │
│ ├─ Generate Release Notes │
│ └─ Create GitHub Release │
└─────────────────────────────────────────────────────────┘
Consequences
Positive Consequences
- Quality Assurance: Multi-layered testing catches bugs before production
- Security: Automated vulnerability scanning prevents security incidents
- Fast Feedback: PRs get feedback within 5-10 minutes
- Confidence: 87%+ test coverage + mutation testing = high code quality
- Deployment Safety: Validation scripts verify all deployment configs
- Developer Productivity: Automated releases save 2+ hours per release
Negative Consequences
- Complexity: 6 workflow files to maintain (1200+ lines of YAML)
- CI Minutes: ~30 minutes per full pipeline run (GitHub Actions cost)
- Maintenance: Workflows require updates when dependencies change
- Debugging: Failed CI requires log analysis and investigation
Neutral Consequences
- Learning Curve: New contributors must understand workflow structure
- External Dependencies: Relies on GitHub Actions availability
Implementation Details
Workflow 1: PR Checks (.github/workflows/pr-checks.yaml)
Trigger: Every pull request
Purpose: Fast feedback on code quality
Jobs:
jobs:
lint:
runs-on: ubuntu-latest
steps:
- name: Black formatting check
run: black --check src/ tests/
- name: isort import ordering
run: isort --check-only src/ tests/
- name: flake8 linting
run: flake8 src/ tests/
type-check:
runs-on: ubuntu-latest
steps:
- name: mypy type checking
run: mypy src/
unit-tests:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ['3.10', '3.11', '3.12']
steps:
- name: Run unit tests
run: pytest -m unit --cov=src --cov-report=xml
- name: Upload coverage
uses: codecov/codecov-action@v3
Duration: ~5 minutes
Fail Fast: Yes (blocks PR merge if fails)
Workflow 2: CI Pipeline (.github/workflows/ci.yaml)
Trigger: Push to main branch
Purpose: Comprehensive testing and Docker build
Jobs:
jobs:
test:
runs-on: ubuntu-latest
steps:
- name: Run all tests
run: |
pytest -m unit
pytest -m integration
pytest tests/property/
pytest tests/contract/
build-docker:
needs: test
runs-on: ubuntu-latest
steps:
- name: Build Docker image
run: docker build -t mcp-server-langgraph:${{ github.sha }} .
- name: Test Docker image
run: |
docker run --rm mcp-server-langgraph:${{ github.sha }} \
mcp-server-langgraph --version
validate-deployments:
needs: test
runs-on: ubuntu-latest
steps:
- name: Validate Kubernetes manifests
run: ./scripts/validation/validate_deployments.sh
- name: Validate Helm charts
run: helm lint deployments/helm/langgraph-agent
- name: Validate Kustomize overlays
run: kustomize build deployments/kustomize/overlays/production
Duration: ~15 minutes
Artifacts: Docker image (tagged with git SHA)
Workflow 3: Quality Tests (.github/workflows/quality-tests.yaml)
Trigger: Nightly cron, manual dispatch
Purpose: Deep quality analysis
Jobs:
jobs:
mutation-testing:
runs-on: ubuntu-latest
timeout-minutes: 120
steps:
- name: Run mutmut
run: |
mutmut run --paths-to-mutate src/
mutmut results
- name: Check mutation score
run: |
SCORE=$(mutmut results | grep -oP '\d+%' | head -1 | tr -d '%')
if [ $SCORE -lt 80 ]; then
echo "Mutation score $SCORE% below threshold 80%"
exit 1
fi
performance-regression:
runs-on: ubuntu-latest
steps:
- name: Run benchmark tests
run: pytest tests/performance/test_benchmarks.py --benchmark-json=output.json
- name: Check for regression
run: python scripts/check_performance_regression.py output.json
openapi-validation:
runs-on: ubuntu-latest
steps:
- name: Generate OpenAPI spec
run: python scripts/development/generate_openapi.py
- name: Validate spec
run: openapi-spec-validator openapi.json
- name: Detect breaking changes
run: oasdiff breaking openapi-previous.json openapi.json
Duration: ~60-90 minutes (mutation testing is slow)
Frequency: Nightly (1:00 AM UTC)
Workflow 4: Security Scan (.github/workflows/security-scan.yaml)
Trigger: Daily cron, on-demand
Purpose: Security vulnerability detection
Jobs:
jobs:
bandit-scan:
runs-on: ubuntu-latest
steps:
- name: Run Bandit security linter
run: bandit -r src/ -f json -o bandit-report.json
- name: Check for high-severity issues
run: |
HIGH=$(jq '.results[] | select(.issue_severity=="HIGH")' bandit-report.json | jq -s length)
if [ $HIGH -gt 0 ]; then
echo "Found $HIGH high-severity security issues"
exit 1
fi
dependency-audit:
runs-on: ubuntu-latest
steps:
- name: Run pip-audit
run: pip-audit --require-hashes --format json
- name: Check for vulnerabilities
run: python scripts/dependency-audit.sh
container-scan:
runs-on: ubuntu-latest
steps:
- name: Build image
run: docker build -t test-image .
- name: Scan with Trivy
uses: aquasecurity/trivy-action@master
with:
image-ref: test-image
severity: 'CRITICAL,HIGH'
exit-code: '1'
Duration: ~10 minutes
Frequency: Daily (2:00 AM UTC)
Workflow 5: Release Pipeline (.github/workflows/release.yaml)
Trigger: Git tag push matching v* (e.g., v2.2.0)
Purpose: Automated release and publishing
Jobs:
jobs:
release:
runs-on: ubuntu-latest
permissions:
contents: write
packages: write
steps:
- name: Extract version from tag
run: echo "VERSION=${GITHUB_REF#refs/tags/v}" >> $GITHUB_ENV
- name: Build Python package
run: |
python -m build
twine check dist/*
- name: Publish to PyPI
uses: pypa/gh-action-pypi-publish@release/v1
with:
password: ${{ secrets.PYPI_API_TOKEN }}
- name: Build & push Docker image
run: |
docker build -t ghcr.io/${{ github.repository }}:${{ env.VERSION }} .
docker push ghcr.io/${{ github.repository }}:${{ env.VERSION }}
- name: Generate release notes
run: |
gh release create ${{ github.ref_name }} \
--title "Release ${{ env.VERSION }}" \
--notes-file RELEASE_NOTES.md \
--verify-tag
Duration: ~8 minutes
Artifacts:
- PyPI package
- Docker image (GHCR)
- GitHub Release
Workflow 6: Stale Issues (.github/workflows/stale.yaml)
Trigger: Daily cron
Purpose: Clean up stale issues and PRs
jobs:
stale:
runs-on: ubuntu-latest
steps:
- uses: actions/stale@v8
with:
stale-issue-message: 'This issue is stale. Please update or it will be closed.'
days-before-stale: 60
days-before-close: 14
Testing Strategy Matrix
| Test Type | Frequency | Duration | Coverage | Purpose |
|---|
| Unit Tests | Every PR | 3 min | 87% | Fast feedback on logic |
| Integration Tests | Main branch | 5 min | Key paths | Test component interactions |
| Property Tests | Main branch | 2 min | Edge cases | Hypothesis-based fuzzing |
| Contract Tests | Main branch | 1 min | API contracts | MCP protocol compliance |
| Mutation Tests | Nightly | 90 min | 80%+ | Test effectiveness |
| Performance Tests | Nightly | 10 min | Baselines | Regression detection |
| E2E Deployment Tests | Weekly | 20 min | All targets | Deployment validation |
| Security Scans | Daily | 10 min | Vulnerabilities | Security posture |
Total Test Suite: 367 tests across 8 test categories
Deployment Validation
Kubernetes Validation
# scripts/validation/validate_deployments.sh
# Validate all Kubernetes manifests
kubectl apply --dry-run=client -f deployments/kubernetes/
# Check Helm chart syntax
helm lint deployments/helm/langgraph-agent/
# Validate Kustomize builds
for env in dev staging production; do
kustomize build deployments/kustomize/overlays/$env
done
# Test with kind cluster
kind create cluster --name test-cluster
kubectl apply -f deployments/kubernetes/
kubectl wait --for=condition=ready pod -l app=langgraph-agent --timeout=300s
Docker Build Validation
# .github/workflows/ci.yaml
- name: Build multi-platform images
run: |
docker buildx create --use
docker buildx build \
--platform linux/amd64,linux/arm64 \
--tag test-image:latest \
--load .
- name: Test image
run: |
docker run --rm test-image:latest mcp-server-langgraph --version
docker run --rm test-image:latest mcp-server-langgraph-http --help
Alternatives Considered
1. Jenkins
Description: Self-hosted CI/CD with Jenkins
Pros:
- Full control over infrastructure
- Extensive plugin ecosystem
- No CI minutes cost
Cons:
- Maintenance burden (self-hosted servers)
- Scaling complexity (need to provision workers)
- No native GitHub integration
- Slower than cloud-native solutions
Why Rejected: Maintenance overhead outweighs control benefits for open-source project
2. GitLab CI/CD
Description: GitLab’s integrated CI/CD
Pros:
- Integrated with Git repository
- Good Docker/Kubernetes support
- Free tier available
Cons:
- Requires GitLab migration (currently on GitHub)
- Smaller ecosystem than GitHub Actions
- Learning curve for GitHub-native developers
Why Rejected: Already on GitHub, no compelling reason to switch
3. CircleCI
Description: Cloud-based CI/CD platform
Pros:
- Fast execution
- Good Docker support
- Easy configuration
Cons:
- Cost (expensive for open source)
- GitHub Actions is free for public repos
- Less GitHub integration than Actions
Why Rejected: GitHub Actions is free and better integrated
4. Travis CI
Description: Legacy CI/CD platform
Pros:
- Historic open-source support
- Simple YAML config
Cons:
- Declining market share (losing to GitHub Actions)
- Slower than modern alternatives
- Limited features compared to GitHub Actions
Why Rejected: GitHub Actions is the modern standard
5. Manual Testing Only
Description: No CI/CD, manual testing and releases
Pros:
- No workflow maintenance
- No CI costs
- Simple
Cons:
- Error-prone (humans make mistakes)
- Slow (hours per release)
- Unscalable (cannot test all combinations)
- No security scanning
- Poor quality (testing skipped under pressure)
Why Rejected: Infeasible for production-grade software
CI/CD Metrics
| Workflow | Avg Duration | Success Rate | Runs per Week |
|---|
| PR Checks | 5 min | 94% | 50+ |
| CI Pipeline | 15 min | 92% | 20+ |
| Quality Tests | 90 min | 88% | 7 (nightly) |
| Security Scan | 10 min | 96% | 7 (daily) |
| Release | 8 min | 98% | 2-3 |
Total CI Minutes/Week: 1,200 minutes ($5-10 cost on GitHub Actions)
Code Quality Impact
- Bug Detection: 45+ bugs caught by CI before production (last 6 months)
- Security Issues: 12 vulnerabilities detected and patched via automated scans
- Deployment Failures Prevented: 8 broken deployments caught by validation scripts
- Test Coverage: Increased from 45% → 87% via automated coverage tracking
Best Practices
1. Fail Fast
# Run fast checks first (lint, type-check) before slow tests
jobs:
lint:
runs-on: ubuntu-latest
unit-tests:
needs: lint # Only run tests if lint passes
2. Matrix Testing
# Test multiple Python versions
strategy:
matrix:
python-version: ['3.10', '3.11', '3.12']
os: [ubuntu-latest, macos-latest]
3. Caching
# Cache dependencies to speed up runs
- uses: actions/cache@v3
with:
path: ~/.cache/pip
key: ${{ runner.os }}-pip-${{ hashFiles('pyproject.toml') }}
4. Conditional Workflows
# Skip expensive tests on docs-only changes
on:
push:
paths-ignore:
- '**.md'
- 'docs/**'
5. Secure Secrets
# Use GitHub Secrets for sensitive data
env:
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
PYPI_API_TOKEN: ${{ secrets.PYPI_API_TOKEN }}
Future Enhancements
- Automated Dependency Updates: Dependabot or Renovate for dependency PRs
- Preview Deployments: Automatic deployment of PRs to staging environments
- Slack Notifications: Pipeline status notifications to team Slack channel
- Deployment Rollback: Automated rollback on production health check failures
- Blue-Green Deployments: Zero-downtime deployments with traffic shifting
References
- GitHub Actions Documentation: https://docs.github.com/en/actions
- Workflow Files:
.github/workflows/*.yaml
- CI Workflow:
.github/workflows/ci.yaml
- PR Checks:
.github/workflows/pr-checks.yaml
- Quality Tests:
.github/workflows/quality-tests.yaml
- Security Scan:
.github/workflows/security-scan.yaml
- Release Workflow:
.github/workflows/release.yaml
- Validation Scripts:
scripts/validation/validate_deployments.sh