21. CI/CD Pipeline Strategy

Date: 2025-10-13

Status

Accepted

Category

Infrastructure & Deployment

Context

Modern software projects require automated CI/CD to:
  • Maintain Code Quality: Prevent bugs from reaching production
  • Ensure Security: Detect vulnerabilities early
  • Enable Fast Iteration: Deploy changes quickly and safely
  • Build Confidence: Automated testing reduces manual QA burden
  • Document Changes: Automated releases with changelogs
Without CI/CD:
  • Manual testing is slow, error-prone, incomplete
  • Security vulnerabilities go undetected
  • Deployment is risky and inconsistent
  • Code quality degrades over time
  • Contributors lack feedback on their changes
For an AI agent system with:
  • Multiple LLM providers (Anthropic, OpenAI, Google, etc.)
  • Complex auth (JWT, Keycloak, OpenFGA)
  • Multiple deployment targets (Docker, Kubernetes, Cloud Run, Helm)
  • Compliance requirements (GDPR, SOC 2, HIPAA)
…manual testing is infeasible. Automated pipelines are mandatory.

Decision

We will implement a multi-stage CI/CD pipeline using GitHub Actions with the following workflows:

Pipeline Architecture

┌─────────────────────────────────────────────────────────┐
│  PR Checks (on pull_request)                            │
│  ├─ Lint (Black, isort, flake8)                         │
│  ├─ Type Check (mypy)                                   │
│  └─ Unit Tests (pytest)                                 │
└─────────────────────────────────────────────────────────┘
                          ↓ (PR approved + merged)
┌─────────────────────────────────────────────────────────┐
│  CI Pipeline (on push to main)                          │
│  ├─ Lint & Type Check                                   │
│  ├─ Unit Tests (all markers)                            │
│  ├─ Integration Tests                                   │
│  ├─ Property Tests (Hypothesis)                         │
│  ├─ Contract Tests (MCP schema)                         │
│  ├─ Build Docker Image                                  │
│  └─ Deployment Validation                               │
└─────────────────────────────────────────────────────────┘
                          ↓ (tests pass)
┌─────────────────────────────────────────────────────────┐
│  Quality Tests (nightly/on-demand)                      │
│  ├─ Mutation Testing (mutmut)                           │
│  ├─ Performance Regression Tests                        │
│  ├─ OpenAPI Breaking Change Detection                   │
│  └─ Deployment E2E Tests (kind clusters)                │
└─────────────────────────────────────────────────────────┘
                          ↓ (quality gates pass)
┌─────────────────────────────────────────────────────────┐
│  Security Scan (daily/on-demand)                        │
│  ├─ Bandit (Python security linter)                     │
│  ├─ Dependency Audit (pip-audit)                        │
│  ├─ SAST (Static Application Security Testing)          │
│  └─ Container Scanning                                  │
└─────────────────────────────────────────────────────────┘
                          ↓ (tag pushed)
┌─────────────────────────────────────────────────────────┐
│  Release Pipeline (on tag v*)                           │
│  ├─ Build & Publish Docker Image                        │
│  ├─ Build & Publish PyPI Package                        │
│  ├─ Generate Release Notes                              │
│  └─ Create GitHub Release                               │
└─────────────────────────────────────────────────────────┘

Consequences

Positive Consequences

  • Quality Assurance: Multi-layered testing catches bugs before production
  • Security: Automated vulnerability scanning prevents security incidents
  • Fast Feedback: PRs get feedback within 5-10 minutes
  • Confidence: 87%+ test coverage + mutation testing = high code quality
  • Deployment Safety: Validation scripts verify all deployment configs
  • Developer Productivity: Automated releases save 2+ hours per release

Negative Consequences

  • Complexity: 6 workflow files to maintain (1200+ lines of YAML)
  • CI Minutes: ~30 minutes per full pipeline run (GitHub Actions cost)
  • Maintenance: Workflows require updates when dependencies change
  • Debugging: Failed CI requires log analysis and investigation

Neutral Consequences

  • Learning Curve: New contributors must understand workflow structure
  • External Dependencies: Relies on GitHub Actions availability

Implementation Details

Workflow 1: PR Checks (.github/workflows/pr-checks.yaml)

Trigger: Every pull request
Purpose: Fast feedback on code quality
Jobs:
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - name: Black formatting check
        run: black --check src/ tests/

      - name: isort import ordering
        run: isort --check-only src/ tests/

      - name: flake8 linting
        run: flake8 src/ tests/

  type-check:
    runs-on: ubuntu-latest
    steps:
      - name: mypy type checking
        run: mypy src/

  unit-tests:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.10', '3.11', '3.12']
    steps:
      - name: Run unit tests
        run: pytest -m unit --cov=src --cov-report=xml

      - name: Upload coverage
        uses: codecov/codecov-action@v3
Duration: ~5 minutes
Fail Fast: Yes (blocks PR merge if fails)
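
The merge block itself comes from branch protection: the PR-check jobs are marked as required status checks on main. A minimal sketch using the gh CLI (OWNER/REPO and the check contexts are placeholders, not the repository's actual configuration):

# Hypothetical example — adjust contexts to the actual job names
gh api -X PUT repos/OWNER/REPO/branches/main/protection --input - <<'EOF'
{
  "required_status_checks": { "strict": true, "contexts": ["lint", "type-check", "unit-tests"] },
  "enforce_admins": false,
  "required_pull_request_reviews": { "required_approving_review_count": 1 },
  "restrictions": null
}
EOF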

Workflow 2: CI Pipeline (.github/workflows/ci.yaml)

Trigger: Push to main branch
Purpose: Comprehensive testing and Docker build
Jobs:
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Run all tests
        run: |
          pytest -m unit
          pytest -m integration
          pytest tests/property/
          pytest tests/contract/

  build-docker:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - name: Build Docker image
        run: docker build -t mcp-server-langgraph:${{ github.sha }} .

      - name: Test Docker image
        run: |
          docker run --rm mcp-server-langgraph:${{ github.sha }} \
            mcp-server-langgraph --version

  validate-deployments:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - name: Validate Kubernetes manifests
        run: ./scripts/validation/validate_deployments.sh

      - name: Validate Helm charts
        run: helm lint deployments/helm/langgraph-agent

      - name: Validate Kustomize overlays
        run: kustomize build deployments/kustomize/overlays/production
Duration: ~15 minutes
Artifacts: Docker image (tagged with git SHA)
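
To make the SHA-tagged image usable beyond the runner, it can be pushed to GHCR, mirroring the registry path the release workflow already uses; a sketch of such a push step (not shown in the excerpt above):

# Sketch only — assumes the default GITHUB_TOKEN has packages: write
- name: Push image to GHCR
  run: |
    echo "${{ secrets.GITHUB_TOKEN }}" | docker login ghcr.io -u ${{ github.actor }} --password-stdin
    docker tag mcp-server-langgraph:${{ github.sha }} ghcr.io/${{ github.repository }}:${{ github.sha }}
    docker push ghcr.io/${{ github.repository }}:${{ github.sha }}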

Workflow 3: Quality Tests (.github/workflows/quality-tests.yaml)

Trigger: Nightly cron, manual dispatch
Purpose: Deep quality analysis
Jobs:
jobs:
  mutation-testing:
    runs-on: ubuntu-latest
    timeout-minutes: 120
    steps:
      - name: Run mutmut
        run: |
          mutmut run --paths-to-mutate src/
          mutmut results

      - name: Check mutation score
        run: |
          SCORE=$(mutmut results | grep -oP '\d+%' | head -1 | tr -d '%')
          if [ "$SCORE" -lt 80 ]; then
            echo "Mutation score $SCORE% below threshold 80%"
            exit 1
          fi

  performance-regression:
    runs-on: ubuntu-latest
    steps:
      - name: Run benchmark tests
        run: pytest tests/performance/test_benchmarks.py --benchmark-json=output.json

      - name: Check for regression
        run: python scripts/check_performance_regression.py output.json

  openapi-validation:
    runs-on: ubuntu-latest
    steps:
      - name: Generate OpenAPI spec
        run: python scripts/development/generate_openapi.py

      - name: Validate spec
        run: openapi-spec-validator openapi.json

      - name: Detect breaking changes
        run: oasdiff breaking openapi-previous.json openapi.json
Duration: ~60-90 minutes (mutation testing is slow)
Frequency: Nightly (1:00 AM UTC)
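
If the custom comparison script is ever retired, pytest-benchmark can enforce the same gate itself; a sketch assuming a baseline run was saved earlier (e.g., via --benchmark-autosave, so run 0001 exists in .benchmarks/):

- name: Compare against saved baseline
  run: |
    # Fails if mean runtime regresses more than 10% vs. saved run 0001
    pytest tests/performance/test_benchmarks.py \
      --benchmark-compare=0001 \
      --benchmark-compare-fail=mean:10%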

Workflow 4: Security Scan (.github/workflows/security-scan.yaml)

Trigger: Daily cron, on-demand
Purpose: Security vulnerability detection
Jobs:
jobs:
  bandit-scan:
    runs-on: ubuntu-latest
    steps:
      - name: Run Bandit security linter
        run: bandit -r src/ -f json -o bandit-report.json

      - name: Check for high-severity issues
        run: |
          HIGH=$(jq '[.results[] | select(.issue_severity=="HIGH")] | length' bandit-report.json)
          if [ "$HIGH" -gt 0 ]; then
            echo "Found $HIGH high-severity security issues"
            exit 1
          fi

  dependency-audit:
    runs-on: ubuntu-latest
    steps:
      - name: Run pip-audit
        run: pip-audit --require-hashes --format json

      - name: Check for vulnerabilities
        run: bash scripts/dependency-audit.sh

  container-scan:
    runs-on: ubuntu-latest
    steps:
      - name: Build image
        run: docker build -t test-image .

      - name: Scan with Trivy
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: test-image
          severity: 'CRITICAL,HIGH'
          exit-code: '1'
Duration: ~10 minutes
Frequency: Daily (2:00 AM UTC)
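
The Trivy results can also feed GitHub code scanning as SARIF, so findings surface in the Security tab; a hedged sketch (output path and action versions are assumptions, not the current workflow):

- name: Scan with Trivy (SARIF output)
  uses: aquasecurity/trivy-action@master
  with:
    image-ref: test-image
    format: 'sarif'
    output: 'trivy-results.sarif'

- name: Upload to GitHub code scanning
  uses: github/codeql-action/upload-sarif@v2
  with:
    sarif_file: 'trivy-results.sarif'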

Workflow 5: Release Pipeline (.github/workflows/release.yaml)

Trigger: Git tag push matching v* (e.g., v2.2.0)
Purpose: Automated release and publishing
Jobs:
jobs:
  release:
    runs-on: ubuntu-latest
    permissions:
      contents: write
      packages: write
    steps:
      - name: Extract version from tag
        run: echo "VERSION=${GITHUB_REF#refs/tags/v}" >> $GITHUB_ENV

      - name: Build Python package
        run: |
          python -m build
          twine check dist/*

      - name: Publish to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          password: ${{ secrets.PYPI_API_TOKEN }}

      - name: Build & push Docker image
        run: |
          docker build -t ghcr.io/${{ github.repository }}:${{ env.VERSION }} .
          docker push ghcr.io/${{ github.repository }}:${{ env.VERSION }}

      - name: Generate release notes
        run: |
          gh release create ${{ github.ref_name }} \
            --title "Release ${{ env.VERSION }}" \
            --notes-file RELEASE_NOTES.md \
            --verify-tag
Duration: ~8 minutes
Artifacts:
  • PyPI package
  • Docker image (GHCR)
  • GitHub Release
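
Cutting a release is then a single tag push, which triggers the entire pipeline:

git tag -a v2.2.0 -m "Release 2.2.0"
git push origin v2.2.0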

Workflow 6: Stale Issues (.github/workflows/stale.yaml)

Trigger: Daily cron
Purpose: Clean up stale issues and PRs
Jobs:
jobs:
  stale:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/stale@v8
        with:
          stale-issue-message: 'This issue is stale. Please update or it will be closed.'
          days-before-stale: 60
          days-before-close: 14

Testing Strategy Matrix

| Test Type            | Frequency   | Duration | Coverage        | Purpose                      |
|----------------------|-------------|----------|-----------------|------------------------------|
| Unit Tests           | Every PR    | 3 min    | 87%             | Fast feedback on logic       |
| Integration Tests    | Main branch | 5 min    | Key paths       | Test component interactions  |
| Property Tests       | Main branch | 2 min    | Edge cases      | Hypothesis-based fuzzing     |
| Contract Tests       | Main branch | 1 min    | API contracts   | MCP protocol compliance      |
| Mutation Tests       | Nightly     | 90 min   | 80%+            | Test effectiveness           |
| Performance Tests    | Nightly     | 10 min   | Baselines       | Regression detection         |
| E2E Deployment Tests | Weekly      | 20 min   | All targets     | Deployment validation        |
| Security Scans       | Daily       | 10 min   | Vulnerabilities | Security posture             |
Total Test Suite: 367 tests across 8 test categories
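
The marker-based split (pytest -m unit, pytest -m integration) requires the markers to be registered; a minimal sketch, assuming pytest is configured via pyproject.toml (marker descriptions are illustrative):

[tool.pytest.ini_options]
markers = [
    "unit: fast, isolated unit tests",
    "integration: tests that exercise component interactions",
]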

Deployment Validation

Kubernetes Validation

# scripts/validation/validate_deployments.sh

# Validate all Kubernetes manifests
kubectl apply --dry-run=client -f deployments/kubernetes/

# Check Helm chart syntax
helm lint deployments/helm/langgraph-agent/

# Validate Kustomize builds
for env in dev staging production; do
  kustomize build deployments/kustomize/overlays/$env
done

# Test with kind cluster
kind create cluster --name test-cluster
kubectl apply -f deployments/kubernetes/
kubectl wait --for=condition=ready pod -l app=langgraph-agent --timeout=300s

Docker Build Validation

# .github/workflows/ci.yaml
- name: Build multi-platform images
  run: |
    docker buildx create --use
    # Validate that both platforms build; --load cannot export a
    # multi-platform manifest, so load only the amd64 image for testing
    docker buildx build \
      --platform linux/amd64,linux/arm64 \
      --tag test-image:latest .
    docker buildx build \
      --platform linux/amd64 \
      --tag test-image:latest \
      --load .

- name: Test image
  run: |
    docker run --rm test-image:latest mcp-server-langgraph --version
    docker run --rm test-image:latest mcp-server-langgraph-http --help

Alternatives Considered

1. Jenkins

Description: Self-hosted CI/CD with Jenkins Pros:
  • Full control over infrastructure
  • Extensive plugin ecosystem
  • No CI minutes cost
Cons:
  • Maintenance burden (self-hosted servers)
  • Scaling complexity (need to provision workers)
  • No native GitHub integration
  • Slower than cloud-native solutions
Why Rejected: Maintenance overhead outweighs control benefits for open-source project

2. GitLab CI/CD

Description: GitLab’s integrated CI/CD Pros:
  • Integrated with Git repository
  • Good Docker/Kubernetes support
  • Free tier available
Cons:
  • Requires GitLab migration (currently on GitHub)
  • Smaller ecosystem than GitHub Actions
  • Learning curve for GitHub-native developers
Why Rejected: Already on GitHub, no compelling reason to switch

3. CircleCI

Description: Cloud-based CI/CD platform Pros:
  • Fast execution
  • Good Docker support
  • Easy configuration
Cons:
  • Cost (expensive for open source)
  • GitHub Actions is free for public repos
  • Less GitHub integration than Actions
Why Rejected: GitHub Actions is free and better integrated

4. Travis CI

Description: Legacy CI/CD platform Pros:
  • Historic open-source support
  • Simple YAML config
Cons:
  • Declining market share (losing to GitHub Actions)
  • Slower than modern alternatives
  • Limited features compared to GitHub Actions
Why Rejected: GitHub Actions is the modern standard

5. Manual Testing Only

Description: No CI/CD, manual testing and releases Pros:
  • No workflow maintenance
  • No CI costs
  • Simple
Cons:
  • Error-prone (humans make mistakes)
  • Slow (hours per release)
  • Unscalable (cannot test all combinations)
  • No security scanning
  • Poor quality (testing skipped under pressure)
Why Rejected: Infeasible for production-grade software

CI/CD Metrics

Pipeline Performance

| Workflow      | Avg Duration | Success Rate | Runs per Week |
|---------------|--------------|--------------|---------------|
| PR Checks     | 5 min        | 94%          | 50+           |
| CI Pipeline   | 15 min       | 92%          | 20+           |
| Quality Tests | 90 min       | 88%          | 7 (nightly)   |
| Security Scan | 10 min       | 96%          | 7 (daily)     |
| Release       | 8 min        | 98%          | 2-3           |
Total CI Minutes/Week: 1,200 minutes ($5-10 cost on GitHub Actions)

Code Quality Impact

  • Bug Detection: 45+ bugs caught by CI before production (last 6 months)
  • Security Issues: 12 vulnerabilities detected and patched via automated scans
  • Deployment Failures Prevented: 8 broken deployments caught by validation scripts
  • Test Coverage: Increased from 45% → 87% via automated coverage tracking

Best Practices

1. Fail Fast

# Run fast checks first (lint, type-check) before slow tests
jobs:
  lint:
    runs-on: ubuntu-latest
  unit-tests:
    needs: lint  # Only run tests if lint passes

2. Matrix Testing

# Test multiple Python versions
strategy:
  matrix:
    python-version: ['3.10', '3.11', '3.12']
    os: [ubuntu-latest, macos-latest]

3. Caching

# Cache dependencies to speed up runs
- uses: actions/cache@v3
  with:
    path: ~/.cache/pip
    key: ${{ runner.os }}-pip-${{ hashFiles('pyproject.toml') }}
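
Alternatively, actions/setup-python provides the same pip cache with less configuration:

# Built-in pip caching keyed on the dependency files
- uses: actions/setup-python@v4
  with:
    python-version: '3.12'
    cache: 'pip'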

4. Conditional Workflows

# Skip expensive tests on docs-only changes
on:
  push:
    paths-ignore:
      - '**.md'
      - 'docs/**'

5. Secure Secrets

# Use GitHub Secrets for sensitive data
env:
  ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
  PYPI_API_TOKEN: ${{ secrets.PYPI_API_TOKEN }}

Future Enhancements

  • Automated Dependency Updates: Dependabot or Renovate for dependency PRs (a possible config is sketched after this list)
  • Preview Deployments: Automatic deployment of PRs to staging environments
  • Slack Notifications: Pipeline status notifications to team Slack channel
  • Deployment Rollback: Automated rollback on production health check failures
  • Blue-Green Deployments: Zero-downtime deployments with traffic shifting
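
A hypothetical .github/dependabot.yml for the first item above; the ecosystems and cadence are assumptions, not an existing config:

# Hypothetical sketch — not yet part of the repository
version: 2
updates:
  - package-ecosystem: "pip"
    directory: "/"
    schedule:
      interval: "weekly"
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "weekly"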

References

  • GitHub Actions Documentation: https://docs.github.com/en/actions
  • Workflow Files: .github/workflows/*.yaml
  • CI Workflow: .github/workflows/ci.yaml
  • PR Checks: .github/workflows/pr-checks.yaml
  • Quality Tests: .github/workflows/quality-tests.yaml
  • Security Scan: .github/workflows/security-scan.yaml
  • Release Workflow: .github/workflows/release.yaml
  • Validation Scripts: scripts/validation/validate_deployments.sh