Documentation Index
Fetch the complete documentation index at: https://mcp-server-langgraph.mintlify.app/llms.txt
Use this file to discover all available pages before exploring further.
Cache Management Strategy
Last Updated: 2025-11-17
Status: Active Policy
Related: CI/CD Strategy ADR, CI/CD Failure Prevention, Coverage Threshold Philosophy
Overview
GitHub Actions caching can significantly speed up CI/CD pipelines but can also cause subtle issues when caches become stale or corrupted. This document outlines strategies to prevent and resolve cache-related problems.
Problem Statement
What Happened (2025-11-17)
Symptom: uv pip check failing in CI with dependency conflicts, but passing locally:
# Local (✅ passes):
$ uv pip check
Checked 297 packages in 4ms
All installed packages are compatible
# CI (❌ fails):
$ uv pip check
error: Dependency conflicts detected
Root Cause: GitHub Actions cache corruption
- Cache key based on
pyproject.toml and uv.lock hashes
- Cache restored from previous run with different dependency state
uv sync --frozen installed from lockfile but cache had old packages
- Result: Frankensteined environment with mixed old/new dependencies
Solution: Cache version bump (v1 → v2)
# Before (v1):
key: ${{ runner.os }}-uv-${{ inputs.cache-key-prefix }}-${{ hashFiles('pyproject.toml', 'uv.lock') }}
# After (v2):
key: ${{ runner.os }}-uv-v2-${{ inputs.cache-key-prefix }}-${{ hashFiles('pyproject.toml', 'uv.lock') }}
Prevention Strategies
1. Cache Versioning (Primary Defense)
Implementation: Use version prefix in cache keys
# .github/actions/setup-python-deps/action.yml
cache:
key: ${{ runner.os }}-uv-v2-${{ inputs.cache-key-prefix }}-${{ hashFiles('pyproject.toml', 'uv.lock') }}
restore-keys: |
${{ runner.os }}-uv-v2-${{ inputs.cache-key-prefix }}-
${{ runner.os }}-uv-v2-
Best Practice: Include version in cache key with comment explaining when/why:
# Cache version v2: Bust stale caches causing dependency conflicts (2025-11-17)
# Increment version number to force fresh cache if conflicts persist
When to Increment:
- ✅ After major dependency updates (e.g., Python version, uv version)
- ✅ When dependency conflicts persist across multiple runs
- ✅ After significant changes to dependency resolution logic
- ✅ When cache behavior seems inconsistent between local and CI
When NOT to Increment:
- ❌ For every PR (defeats purpose of caching)
- ❌ Before investigating root cause
- ❌ As first resort for CI failures
2. Dependency Consistency Validation
Implementation: Validate dependencies after installation
- name: Validate dependency consistency
run: |
set -euo pipefail
if ! CHECK_OUTPUT=$(uv pip check 2>&1); then
echo "::warning::Dependency conflicts detected in CI (may be false positive):"
echo "$CHECK_OUTPUT"
echo ""
echo "Note: This check passes locally but may fail in CI due to environment differences."
echo "If tests pass despite this warning, the conflicts are likely non-breaking."
# Don't fail the job - let tests determine if there are real issues
else
echo "✓ All dependencies are consistent (no conflicts detected)"
fi
shell: bash
Philosophy:
- Fail loudly but don’t block: Show conflicts as warnings, let tests determine impact
- Capture output: Display actual conflicts for debugging
- Context matters: CI environment differences may cause false positives
3. Lockfile Validation
Implementation: Verify lockfile is current before using cache
- name: Validate lockfile is up-to-date
run: |
uv lock --check || {
echo "::error::uv.lock is out of date with pyproject.toml"
echo "Run 'uv lock' locally and commit the updated lockfile"
exit 1
}
echo "✓ Lockfile is current and valid"
Why This Helps:
- Prevents using cache with outdated lockfile
- Catches developer errors (forgot to run
uv lock)
- Ensures reproducible builds
4. Cache Scope Isolation
Implementation: Use different cache keys for different job types
# Unit tests
cache-key-prefix: 'unit-tests'
# Integration tests
cache-key-prefix: 'integration-tests'
# Quality tests
cache-key-prefix: 'quality-tests'
Why This Helps:
- Prevents cache poisoning between job types
- Allows different dependency extras per job type
- Enables targeted cache invalidation
5. Frozen Installation
Implementation: Always use --frozen flag with uv sync
- name: Install dependencies
run: |
uv venv --python ${{ inputs.python-version }}
uv sync --frozen --extra dev
Why --frozen:
- ✅ Fails if lockfile is out of sync (prevents drift)
- ✅ Never resolves dependencies (faster, reproducible)
- ✅ Guarantees exact versions from lockfile
Never use uv sync without --frozen in CI!
Detection Strategies
1. Early Failure Signals
Indicators of cache corruption:
# Signal 1: uv pip check fails in CI but passes locally
$ uv pip check # CI
error: Dependency conflicts detected
$ uv pip check # Local
All installed packages are compatible
# Signal 2: Inconsistent test failures across runs
# Same code, different results between CI runs
# Signal 3: Import errors for packages in lockfile
ModuleNotFoundError: No module named 'some_package'
# (but package is in uv.lock)
Monitoring:
- name: Cache diagnosis
if: failure()
run: |
echo "=== Cache Debug Info ==="
echo "Cache key: ${{ runner.os }}-uv-v2-..."
echo "Python version: $(python --version)"
echo "uv version: $(uv --version)"
echo ""
echo "=== Installed packages ==="
uv pip list
echo ""
echo "=== Expected packages (from lockfile) ==="
uv pip freeze
2. Automated Cache Health Checks
Implementation: Add periodic cache verification
# Run weekly to detect cache drift
- cron: '0 2 * * 0' # Sundays at 2 AM
jobs:
cache-health-check:
name: Verify Cache Health
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v5
- name: Setup with cache
uses: ./.github/actions/setup-python-deps
with:
python-version: '3.12'
extras: 'dev'
- name: Verify environment
run: |
# Check for conflicts
uv pip check
# Verify lockfile matches installed
uv sync --frozen --dry-run
# Run smoke tests
pytest tests/smoke/ -v
Resolution Strategies
When to use: Cache corruption is confirmed
Steps:
-
Increment cache version:
# Change v2 → v3
key: ${{ runner.os }}-uv-v3-${{ inputs.cache-key-prefix }}-...
-
Document the change:
# Cache version v3: Resolved dependency conflicts from PR #123 (2025-11-17)
# Previous issue: Package X version mismatch between cache and lockfile
-
Commit and push:
git commit -m "fix(ci): bump cache version to v3 (resolve dependency conflicts)"
git push
Impact: Next CI run will build fresh cache (slower first run, then fast again)
2. Manual Cache Clearing (Nuclear Option)
When to use: Cache versioning doesn’t work or cache is severely corrupted
Steps via GitHub UI:
- Go to repository → Actions → Caches
- Search for affected cache keys
- Delete problematic caches
- Re-run failed workflows
Steps via gh CLI:
# List all caches
gh cache list
# Delete specific cache
gh cache delete <cache-id>
# Delete all caches matching pattern
gh cache delete --pattern "*uv-v2-*"
Caution: ⚠️ Deletes cache for ALL branches - use sparingly!
3. Downgrade to Warning (Temporary Workaround)
When to use: Suspected false positive conflicts
Implementation:
# Before (blocking):
if ! uv pip check; then
echo "::error::Dependency conflicts detected"
exit 1
fi
# After (non-blocking):
if ! CHECK_OUTPUT=$(uv pip check 2>&1); then
echo "::warning::Dependency conflicts detected (may be false positive):"
echo "$CHECK_OUTPUT"
# Continue - let tests determine if real issue
else
echo "✓ All dependencies consistent"
fi
When to revert: After confirming tests pass consistently
Best Practices
Cache Key Design
✅ Good cache key structure:
# Format: $OS-$TOOL-$VERSION-$SCOPE-$HASH
key: ${{ runner.os }}-uv-v2-unit-tests-${{ hashFiles('pyproject.toml', 'uv.lock') }}
Components:
$OS: Platform (linux, macos, windows)
$TOOL: Package manager (uv, pip, poetry)
$VERSION: Cache schema version (v1, v2, v3…)
$SCOPE: Job type (unit-tests, integration-tests…)
$HASH: Dependency files hash
❌ Bad cache key examples:
# Too broad - pollutes across different dependency sets
key: ${{ runner.os }}-python-${{ hashFiles('*.toml') }}
# No version - can't invalidate easily
key: deps-${{ hashFiles('uv.lock') }}
# No scope - unit tests and integration tests share cache
key: uv-${{ hashFiles('pyproject.toml') }}
Restore Keys Strategy
Purpose: Fallback cache lookup when exact match not found
Recommended pattern:
cache-from: |
type=gha,scope=${{ matrix.variant }}-${{ platform }} # Exact match
type=gha,scope=${{ matrix.variant }} # Same variant, any platform
type=gha,scope=base # Base layer fallback
Considerations:
- ✅ More fallbacks = faster cache hits
- ❌ More fallbacks = higher risk of stale cache
- Balance: 2-3 restore keys maximum
Cache Retention
Default: GitHub Actions caches expire after 7 days of no access
Custom retention (not supported by GitHub Actions directly):
# Workaround: Force cache refresh by incrementing version periodically
# Add to weekly cron job:
- name: Refresh cache weekly
if: github.event_name == 'schedule'
run: |
# This will create new cache, old one expires in 7 days
uv sync --frozen
Monitoring & Alerting
Metrics to Track
1. Cache Hit Rate:
- name: Report cache metrics
run: |
if [ "$CACHE_HIT" = "true" ]; then
echo "✅ Cache hit - restored from cache"
else
echo "❌ Cache miss - building from scratch"
fi
echo "cache_hit=$CACHE_HIT" >> $GITHUB_OUTPUT
2. Dependency Check Success Rate:
- name: Track dependency check
id: dep_check
run: |
if uv pip check; then
echo "status=pass" >> $GITHUB_OUTPUT
else
echo "status=fail" >> $GITHUB_OUTPUT
fi
3. Cache Age:
# Get cache creation time
gh cache list --json key,createdAt | jq '.[] | select(.key | contains("uv-v2")) | {key, age: (now - (.createdAt | fromdateiso8601))}'
Automated Alerts
Slack notification on repeated cache issues:
- name: Alert on cache corruption
if: failure() && steps.dep_check.outputs.status == 'fail'
uses: slackapi/slack-github-action@v1
with:
webhook-url: ${{ secrets.SLACK_WEBHOOK }}
payload: |
{
"text": "⚠️ CI cache corruption detected in ${{ github.repository }}",
"blocks": [
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": "*Workflow*: ${{ github.workflow }}\n*Run*: <${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|#${{ github.run_number }}>\n*Status*: Dependency check failed\n*Action*: Consider bumping cache version"
}
}
]
}
Troubleshooting Guide
Issue: “Dependency conflicts detected”
Diagnosis:
# 1. Check local environment
uv pip check # Should pass
# 2. Check lockfile is current
uv lock --check
# 3. Inspect CI cache
gh cache list | grep uv-v2
Solution Path:
- Try cache version bump (quickest)
- If persists, check for actual dependency conflicts
- If still failing, investigate lockfile generation differences
Issue: “ModuleNotFoundError in CI”
Diagnosis:
# 1. Verify package in lockfile
grep "package_name" uv.lock
# 2. Check if extras are correct
# CI workflow should match local extras:
uv sync --frozen --extra dev
Solution Path:
- Verify extras match between local and CI
- Check cache version compatibility
- Try fresh cache build (version bump)
Issue: “Tests pass locally but fail in CI”
Diagnosis:
# 1. Check Python version match
# Local:
python --version
# CI (from workflow):
python-version: '3.12'
# 2. Check uv version match
uv --version
Solution Path:
- Ensure Python versions match exactly
- Pin uv version in CI workflow
- Check for environment variable differences
Maintenance Schedule
Weekly
- ✅ Review cache hit rates
- ✅ Check for dependency check failures
- ✅ Monitor cache size growth
Monthly
- ✅ Audit cache keys for efficiency
- ✅ Review cache versioning strategy
- ✅ Clean up unused cache scopes
Quarterly
- ✅ Test cache invalidation procedure
- ✅ Review and update this documentation
- ✅ Evaluate new caching strategies
Changelog
2025-11-17: Cache v2 Migration
- Issue: Dependency conflicts in CI (passing locally)
- Root Cause: GitHub Actions cache corruption
- Solution: Bumped cache version v1 → v2
- Result: All dependency checks passing
Future Improvements
Questions? Open an issue or ask in #engineering Slack channel.