43. Cost Monitoring Dashboard
Date: 2025-11-02

Status: Proposed

Category: Infrastructure & Deployment

Context

As the MCP server handles multiple LLM providers (Anthropic, OpenAI, Google Gemini, etc.) with varying pricing models, there is a critical need for:

- Real-time cost tracking: Monitor token usage and costs across all LLM providers
- Budget monitoring: Track spending against budgets and alert on overages
- Cost attribution: Break down costs by user, session, model, and feature
- Trend analysis: Identify cost patterns and optimization opportunities
- Financial accountability: Provide stakeholders with transparent cost visibility
Without cost monitoring, the project faces:

- Unexpected LLM API bills
- Inability to attribute costs to specific users or projects
- Lack of visibility into cost optimization opportunities
- Difficulty forecasting future spending
Decision
We will implement a Cost Monitoring Dashboard with the following architecture:

Backend: Cost Tracking API
Location: src/mcp_server_langgraph/monitoring/cost_tracker.py
Components:
- CostMetricsCollector: Captures token usage and calculates costs
- CostAggregator: Aggregates costs by dimensions (user, model, session, feature)
- BudgetMonitor: Tracks spending against budgets and triggers alerts
- CostAPI: FastAPI endpoints for retrieving cost data
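A minimal sketch of how the first of these components might look, assuming field and method names of our own choosing; the real module in src/mcp_server_langgraph/monitoring/cost_tracker.py may differ:

```python
# Hypothetical sketch of the cost-tracking data model and collector interface;
# names are illustrative, not the final API.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass
class CostRecord:
    """One LLM call's token usage, cost, and attribution dimensions."""
    provider: str                 # e.g. "anthropic", "openai", "google"
    model: str
    input_tokens: int
    output_tokens: int
    cost_usd: float
    user_id: str | None = None
    session_id: str | None = None
    feature: str | None = None
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class CostMetricsCollector:
    """Captures per-call usage and hands CostRecords to the aggregator and storage layers."""

    def __init__(self, cost_fn: Callable[[str, int, int], float]):
        # cost_fn maps (model, input_tokens, output_tokens) -> USD
        self.cost_fn = cost_fn
        self.records: list[CostRecord] = []

    def record_usage(self, provider: str, model: str,
                     input_tokens: int, output_tokens: int,
                     user_id: str | None = None,
                     session_id: str | None = None,
                     feature: str | None = None) -> CostRecord:
        record = CostRecord(
            provider=provider, model=model,
            input_tokens=input_tokens, output_tokens=output_tokens,
            cost_usd=self.cost_fn(model, input_tokens, output_tokens),
            user_id=user_id, session_id=session_id, feature=feature,
        )
        self.records.append(record)
        return record
```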
Storage: Time-Series Database
Options Evaluated:

1. Prometheus + PostgreSQL (Selected)
   - Prometheus for real-time metrics
   - PostgreSQL for detailed cost history
   - Leverages existing infrastructure
2. ClickHouse (Alternative)
   - Excellent for time-series analytics
   - Requires new infrastructure
3. TimescaleDB (Alternative)
   - PostgreSQL extension for time-series
   - Good middle ground
Division of responsibilities in the selected option:

- Prometheus: Counter metrics for tokens/costs (real-time)
- PostgreSQL: Full cost records (audit trail, detailed queries)
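A sketch of the Prometheus half of this split using prometheus_client; the metric and label names here are illustrative, not the project's actual names:

```python
# Hypothetical real-time counters. Per-user and per-session attribution is kept
# in PostgreSQL rather than as Prometheus labels, to avoid label-cardinality blowup.
from prometheus_client import Counter

LLM_TOKENS_TOTAL = Counter(
    "llm_tokens_total",
    "Total LLM tokens consumed",
    labelnames=["provider", "model", "direction"],  # direction: input | output
)

LLM_COST_USD_TOTAL = Counter(
    "llm_cost_usd_total",
    "Total LLM spend in USD",
    labelnames=["provider", "model", "feature"],
)


def export_to_prometheus(provider: str, model: str, input_tokens: int,
                         output_tokens: int, cost_usd: float,
                         feature: str = "unknown") -> None:
    """Increment real-time counters; detailed cost records go to PostgreSQL separately."""
    LLM_TOKENS_TOTAL.labels(provider, model, "input").inc(input_tokens)
    LLM_TOKENS_TOTAL.labels(provider, model, "output").inc(output_tokens)
    LLM_COST_USD_TOTAL.labels(provider, model, feature).inc(cost_usd)
```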
Frontend: Grafana Dashboard
Primary Option: Grafana dashboard (leverages the existing observability stack)

Dashboard Panels:

1. Cost Overview
   - Total spend (current month)
   - Daily burn rate
   - Budget utilization (%)
   - Cost trend (7/30/90 days)
2. Usage Metrics
   - Token usage by model
   - Requests per model
   - Average cost per request
   - Peak usage times
3. Attribution
   - Top users by cost
   - Cost by feature/endpoint
   - Session-level costs
   - Department/team breakdown
4. Budget Monitoring (projection arithmetic sketched after this list)
   - Budget vs. actual
   - Remaining budget
   - Projected end-of-month cost
   - Alert thresholds
5. Model Comparison
   - Cost per model
   - Token efficiency
   - Response time vs. cost
   - Quality metrics vs. cost
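The Budget Monitoring panel's burn rate and end-of-month projection can be driven by a simple linear extrapolation of month-to-date spend; a sketch of that arithmetic (function name and return shape are illustrative):

```python
# Hypothetical projection behind the Budget Monitoring panels.
from calendar import monthrange
from datetime import date


def budget_projection(spend_to_date: float, budget: float, today: date) -> dict[str, float]:
    days_in_month = monthrange(today.year, today.month)[1]
    daily_burn_rate = spend_to_date / today.day
    projected_eom = daily_burn_rate * days_in_month
    return {
        "daily_burn_rate": daily_burn_rate,
        "budget_utilization_pct": 100.0 * spend_to_date / budget,
        "projected_end_of_month": projected_eom,
        "projected_overrun": max(0.0, projected_eom - budget),
    }


# Example: $1,240 spent by Nov 10 against a $3,000 monthly budget.
print(budget_projection(1240.0, 3000.0, date(2025, 11, 10)))
# daily burn = $124/day, projected end-of-month = $3,720 -> triggers an over-budget alert
```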
Dashboard JSON location: deployments/helm/mcp-server-langgraph/dashboards/cost-monitoring.json
Alternative: React Dashboard (Optional enhancement)
For organizations wanting embedded dashboards:

Location: src/mcp_server_langgraph/monitoring/dashboard/

Tech Stack:
- React + TypeScript
- Recharts for visualizations
- Tailwind CSS for styling
- Axios for API calls

Pros:
- Embedded in application
- Custom branding
- Interactive drill-downs
- Export capabilities

Cons:
- Additional maintenance
- Duplicates Grafana functionality
Cost Calculation Logic
Pricing Strategy: maintain a per-model pricing table (USD per input and output token) and apply it to the token counts reported by each provider; the table requires periodic updates (see Negative consequences and Mitigations below).
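A minimal sketch of that strategy, assuming a hand-maintained table keyed by model name; the model keys and prices below are placeholders, not current list prices:

```python
# Hypothetical pricing table (USD per 1M tokens). Values are placeholders;
# refresh them from the provider pricing pages listed under References.
PRICING_PER_1M_TOKENS: dict[str, tuple[float, float]] = {
    # model: (input_usd, output_usd)
    "claude-sonnet-example": (3.00, 15.00),
    "gpt-example": (2.50, 10.00),
    "gemini-example": (1.25, 5.00),
}


def calculate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one call; raises KeyError for unknown models."""
    input_price, output_price = PRICING_PER_1M_TOKENS[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000
```

Keeping the table in code keeps lookups cheap, but it is the reason the Negative consequences list a monthly maintenance burden and the Mitigations section suggests automating pricing updates.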
Integration Points

1. LLM Factory Instrumentation: modify src/mcp_server_langgraph/llm/llm_factory.py so that every LLM call records its token usage and computed cost.
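The factory's real interface is not reproduced in this ADR; the sketch below only illustrates the instrumentation idea as a wrapper around whatever call method the factory exposes (all names and the response.usage shape are assumptions):

```python
# Hypothetical wrapper; the real llm_factory.py API may differ, and each
# provider reports token usage in its own response shape.
from typing import Any, Callable


def instrument_llm_call(call: Callable[..., Any],
                        collector: "CostMetricsCollector",
                        provider: str, model: str) -> Callable[..., Any]:
    """Wrap an LLM call so token usage and cost are recorded on every invocation."""

    def wrapped(*args: Any, **kwargs: Any) -> Any:
        response = call(*args, **kwargs)
        usage = getattr(response, "usage", None)  # assumed attribute; adapt per provider
        if usage is not None:
            collector.record_usage(
                provider=provider,
                model=model,
                input_tokens=usage.input_tokens,
                output_tokens=usage.output_tokens,
                user_id=kwargs.get("user_id"),
                session_id=kwargs.get("session_id"),
            )
        return response

    return wrapped
```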
Consequences
Positive
- Financial Visibility: Real-time insight into LLM spending
- Cost Control: Budget alerts prevent bill shock
- Optimization Opportunities: Identify expensive operations
- Accountability: Attribute costs to users/teams
- Compliance: Audit trail for cost tracking (SOC 2)
- Forecasting: Data for capacity planning
Negative
- Storage Overhead: Cost data adds to database size
- Performance Impact: Minimal (async cost tracking)
- Maintenance: Pricing table requires monthly updates
- Complexity: Additional monitoring infrastructure
Mitigations
- Async Tracking: Cost recording happens asynchronously
- Batch Writes: Aggregate cost data before writing to the database (see the sketch after this list)
- Data Retention: Archive cost data older than 13 months
- Pricing Automation: Consider API-based pricing updates
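A sketch of how the first two mitigations could be combined; write_batch stands in for the real persistence call and all names are illustrative:

```python
# Hypothetical async, batched cost writer: the request path only enqueues records,
# and a background task flushes them to PostgreSQL in batches.
import asyncio


class BatchedCostWriter:
    def __init__(self, write_batch, batch_size: int = 100, flush_interval_s: float = 5.0):
        self.write_batch = write_batch          # async callable persisting a list of records
        self.batch_size = batch_size
        self.flush_interval_s = flush_interval_s
        self.queue: asyncio.Queue = asyncio.Queue()

    async def enqueue(self, record) -> None:
        # Called from the LLM call path; never blocks on the database.
        await self.queue.put(record)

    async def run(self) -> None:
        """Background task: flush when the batch is full or the interval elapses."""
        batch = []
        while True:
            try:
                record = await asyncio.wait_for(self.queue.get(), timeout=self.flush_interval_s)
                batch.append(record)
                if len(batch) < self.batch_size:
                    continue
            except asyncio.TimeoutError:
                pass  # interval elapsed; flush whatever has accumulated
            if batch:
                await self.write_batch(batch)
                batch = []
```

Because the request path only awaits an in-memory queue put, the performance impact stays minimal, as claimed under Negative consequences.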
Implementation Plan
Phase 1: Backend (TDD)
- Write tests for CostMetricsCollector (example test sketched after this list)
- Implement CostMetricsCollector
- Write tests for CostAPI endpoints
- Implement CostAPI
- Write tests for BudgetMonitor
- Implement BudgetMonitor
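A possible first test for the TDD loop, assuming the collector interface sketched earlier; the import path and assertions are illustrative:

```python
# Hypothetical starting test for CostMetricsCollector.
import pytest

from mcp_server_langgraph.monitoring.cost_tracker import CostMetricsCollector  # assumed path


def test_record_usage_computes_cost_and_attribution():
    collector = CostMetricsCollector(cost_fn=lambda model, tin, tout: 0.001 * (tin + tout))
    record = collector.record_usage(
        provider="anthropic", model="claude-sonnet-example",
        input_tokens=1000, output_tokens=500, user_id="user-42",
    )
    assert record.cost_usd == pytest.approx(1.5)
    assert record.user_id == "user-42"
    assert collector.records == [record]
```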
Phase 2: Storage
- Create PostgreSQL schema for cost_records (sketched after this list)
- Set up Prometheus metrics
- Configure data retention policies
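One way the cost_records schema could look, expressed with SQLAlchemy; the column names mirror the earlier CostRecord sketch and are not final:

```python
# Hypothetical cost_records table definition.
from sqlalchemy import Column, DateTime, Float, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class CostRecordRow(Base):
    __tablename__ = "cost_records"

    id = Column(Integer, primary_key=True, autoincrement=True)
    timestamp = Column(DateTime(timezone=True), nullable=False, index=True)
    provider = Column(String(32), nullable=False)
    model = Column(String(64), nullable=False, index=True)
    input_tokens = Column(Integer, nullable=False)
    output_tokens = Column(Integer, nullable=False)
    cost_usd = Column(Float, nullable=False)  # Numeric could be used for exact money math
    user_id = Column(String(64), index=True)
    session_id = Column(String(64), index=True)
    feature = Column(String(64))
```

The 13-month retention policy from the Mitigations section would be enforced separately, for example by a scheduled archive/delete job.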
Phase 3: Grafana Dashboard
- Create cost-monitoring.json dashboard
- Configure panels for all metrics
- Set up alert rules
Phase 4: Integration
- Instrument LLMFactory
- Add cost tracking to all LLM calls
- Deploy to staging
- Validate accuracy
Phase 5: Alerts & Automation
- Configure budget alerts
- Set up cost anomaly detection
- Create automated reports
Alternatives Considered
Alternative 1: Third-Party Cost Tracking (e.g., LangSmith, Helicone)
Pros:
- No custom development
- Advanced analytics out of the box
- Regular pricing updates

Cons:
- Additional vendor dependency
- Data privacy concerns
- Monthly SaaS costs
- Limited customization
Alternative 2: Simple Logging (No Dashboard)
Pros:
- Minimal implementation
- Low overhead

Cons:
- No visualization
- Manual analysis required
- No real-time alerts
References
- LiteLLM pricing: https://docs.litellm.ai/docs/pricing
- Anthropic pricing: https://www.anthropic.com/pricing
- OpenAI pricing: https://openai.com/pricing
- Google Gemini pricing: https://ai.google.dev/pricing
- Grafana dashboards: https://grafana.com/docs/grafana/latest/dashboards/
- Prometheus best practices: https://prometheus.io/docs/practices/
Related ADRs
- ADR-0003: Dual Observability (Prometheus + OpenTelemetry)
- ADR-0001: Multi-Provider LLM Support
- ADR-0027: Rate Limiting Strategy (complements cost control)