43. Cost Monitoring Dashboard

Date: 2025-11-02

Status

Proposed

Category

Infrastructure & Deployment

Context

As the MCP server handles multiple LLM providers (Anthropic, OpenAI, Google Gemini, etc.) with varying pricing models, there’s a critical need for:
  1. Real-time cost tracking: Monitor token usage and costs across all LLM providers
  2. Budget monitoring: Track spending against budgets and alert on overages
  3. Cost attribution: Break down costs by user, session, model, and feature
  4. Trend analysis: Identify cost patterns and optimization opportunities
  5. Financial accountability: Provide stakeholders with transparent cost visibility
Without centralized cost monitoring, organizations risk:
  • Unexpected LLM API bills
  • Inability to attribute costs to specific users or projects
  • Lack of visibility into cost optimization opportunities
  • Difficulty forecasting future spending

Decision

We will implement a Cost Monitoring Dashboard with the following architecture:

Backend: Cost Tracking API

Location: src/mcp_server_langgraph/monitoring/cost_tracker.py

Components:
  1. CostMetricsCollector: Captures token usage and calculates costs
  2. CostAggregator: Aggregates costs by dimensions (user, model, session, feature)
  3. BudgetMonitor: Tracks spending against budgets and triggers alerts
  4. CostAPI: FastAPI endpoints for retrieving cost data
Data Model:
from datetime import datetime
from decimal import Decimal
from typing import Any, Dict, List, Optional, Tuple

from pydantic import BaseModel

class TokenUsage(BaseModel):
    """Token usage for a single LLM call."""
    timestamp: datetime
    user_id: str
    session_id: str
    model: str
    provider: str  # anthropic, openai, google
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    estimated_cost_usd: Decimal
    feature: Optional[str] = None  # e.g., "chat", "tool_execution"
    metadata: Dict[str, Any]

class CostSummary(BaseModel):
    """Aggregated cost summary."""
    period_start: datetime
    period_end: datetime
    total_cost_usd: Decimal
    total_tokens: int
    request_count: int
    by_model: Dict[str, Decimal]
    by_user: Dict[str, Decimal]
    by_feature: Dict[str, Decimal]
    top_cost_sessions: List[Tuple[str, Decimal]]
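
To make the collector concrete, here is a minimal sketch of how CostMetricsCollector could consume these models. The queue-based design is an assumption, and the record_usage signature simply mirrors the call shown under Integration Points below:

import asyncio
from datetime import datetime, timezone

class CostMetricsCollector:
    """Captures token usage per LLM call and queues records for storage."""

    def __init__(self) -> None:
        # In-memory queue, drained to PostgreSQL in batches by a background task.
        self._queue: "asyncio.Queue[TokenUsage]" = asyncio.Queue()

    async def record_usage(
        self,
        *,
        user_id: str,
        session_id: str,
        model: str,
        provider: str,
        prompt_tokens: int,
        completion_tokens: int,
        feature: Optional[str] = None,
        **metadata: Any,
    ) -> None:
        """Non-blocking: build a TokenUsage record and enqueue it."""
        usage = TokenUsage(
            timestamp=datetime.now(timezone.utc),
            user_id=user_id,
            session_id=session_id,
            model=model,
            provider=provider,
            prompt_tokens=prompt_tokens,
            completion_tokens=completion_tokens,
            total_tokens=prompt_tokens + completion_tokens,
            # calculate_cost() is defined under "Cost Calculation Logic" below
            estimated_cost_usd=calculate_cost(
                model, provider, prompt_tokens, completion_tokens
            ),
            feature=feature,
            metadata=metadata,
        )
        await self._queue.put(usage)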
API Endpoints:
GET  /api/cost/summary?period={period}&group_by={dimension}
GET  /api/cost/usage?start={start}&end={end}&user_id={user_id}
GET  /api/cost/budget?budget_id={id}
POST /api/cost/budget
GET  /api/cost/trends?metric={metric}&period={period}
GET  /api/cost/export?format={csv|json}
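
As an illustration, the summary endpoint might be wired up as follows. Only the route shapes above are decided; the CostAggregator instance and its summarize() method are assumptions:

from fastapi import APIRouter, Query

router = APIRouter(prefix="/api/cost")
aggregator = CostAggregator()  # hypothetical module-level instance; see Components above

@router.get("/summary", response_model=CostSummary)
async def get_cost_summary(
    period: str = Query("30d", description="Lookback window, e.g. 7d, 30d, 90d"),
    group_by: str = Query("model", description="user | model | session | feature"),
) -> CostSummary:
    """Return an aggregated CostSummary for the requested period."""
    return await aggregator.summarize(period=period, group_by=group_by)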

Storage: Time-Series Database

Options Evaluated:
  1. Prometheus + PostgreSQL (Selected)
    • Prometheus for real-time metrics
    • PostgreSQL for detailed cost history
    • Leverages existing infrastructure
  2. ClickHouse (Alternative)
    • Excellent for time-series analytics
    • Requires new infrastructure
  3. TimescaleDB (Alternative)
    • PostgreSQL extension for time-series
    • Good middle ground
Decision: Use Prometheus for metrics + PostgreSQL for detailed records
  • Prometheus: Counter metrics for tokens/costs (real-time)
  • PostgreSQL: Full cost records (audit trail, detailed queries)
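
The detailed-records table would mirror the TokenUsage model. A hypothetical DDL sketch follows; names and types are assumptions, to be finalized in Phase 2:

# Hypothetical schema for the PostgreSQL audit trail, kept as a constant so it
# can be applied by the existing migration tooling.
COST_RECORDS_DDL = """
CREATE TABLE IF NOT EXISTS cost_records (
    id                 BIGSERIAL PRIMARY KEY,
    ts                 TIMESTAMPTZ NOT NULL,
    user_id            TEXT NOT NULL,
    session_id         TEXT NOT NULL,
    provider           TEXT NOT NULL,
    model              TEXT NOT NULL,
    prompt_tokens      INTEGER NOT NULL,
    completion_tokens  INTEGER NOT NULL,
    estimated_cost_usd NUMERIC(12, 6) NOT NULL,
    feature            TEXT,
    metadata           JSONB NOT NULL DEFAULT '{}'::jsonb
);
CREATE INDEX IF NOT EXISTS idx_cost_records_ts ON cost_records (ts);
CREATE INDEX IF NOT EXISTS idx_cost_records_user_ts ON cost_records (user_id, ts);
"""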

Frontend: Grafana Dashboard

Primary Option: Grafana dashboard (leverages existing observability stack)

Dashboard Panels:
  1. Cost Overview
    • Total spend (current month)
    • Daily burn rate
    • Budget utilization (%)
    • Cost trend (7/30/90 days)
  2. Usage Metrics
    • Token usage by model
    • Requests per model
    • Average cost per request
    • Peak usage times
  3. Attribution
    • Top users by cost
    • Cost by feature/endpoint
    • Session-level costs
    • Department/team breakdown
  4. Budget Monitoring
    • Budget vs. actual
    • Remaining budget
    • Projected end-of-month cost
    • Alert thresholds
  5. Model Comparison
    • Cost per model
    • Token efficiency
    • Response time vs. cost
    • Quality metrics vs. cost
Grafana Configuration: deployments/helm/mcp-server-langgraph/dashboards/cost-monitoring.json

Alternative: React Dashboard (Optional enhancement)

For organizations wanting embedded dashboards:

Location: src/mcp_server_langgraph/monitoring/dashboard/

Tech Stack:
  • React + TypeScript
  • Recharts for visualizations
  • Tailwind CSS for styling
  • Axios for API calls
Advantages:
  • Embedded in application
  • Custom branding
  • Interactive drill-downs
  • Export capabilities
Disadvantages:
  • Additional maintenance
  • Duplicates Grafana functionality

Cost Calculation Logic

Pricing Strategy:
# src/mcp_server_langgraph/monitoring/pricing.py
from decimal import Decimal

PRICING_TABLE = {
    "anthropic": {
        "claude-sonnet-4-5-20250929": {
            "input": Decimal("0.003"),   # $ per 1K tokens
            "output": Decimal("0.015"),
        },
        "claude-haiku-4-5-20251001": {
            "input": Decimal("0.0008"),
            "output": Decimal("0.004"),
        },
    },
    "openai": {
        "gpt-5.1": {
            "input": Decimal("0.00125"),
            "output": Decimal("0.01"),
        },
        "gpt-5-mini": {
            "input": Decimal("0.00015"),
            "output": Decimal("0.0006"),
        },
    },
    "google": {
        "gemini-2.5-flash": {
            "input": Decimal("0.0003"),  # $0.30 per 1M tokens
            "output": Decimal("0.0025"),  # $2.50 per 1M tokens
        },
    },
}

def calculate_cost(
    model: str,
    provider: str,
    prompt_tokens: int,
    completion_tokens: int
) -> Decimal:
    """Calculate the estimated cost of a single LLM call in USD."""
    try:
        pricing = PRICING_TABLE[provider][model]
    except KeyError:
        raise ValueError(f"No pricing entry for {provider}/{model}")
    input_cost = (Decimal(prompt_tokens) / 1000) * pricing["input"]
    output_cost = (Decimal(completion_tokens) / 1000) * pricing["output"]
    return input_cost + output_cost
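
For example, a call to claude-sonnet-4-5 with 1,000 prompt tokens and 500 completion tokens costs (1000/1000) x $0.003 + (500/1000) x $0.015 = $0.0105:

>>> calculate_cost("claude-sonnet-4-5-20250929", "anthropic", 1000, 500)
Decimal('0.0105')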
Update Frequency: Pricing table updated monthly via configuration (one approach sketched below)
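
A minimal sketch of externalizing the table, assuming pricing lives in a YAML file next to the module; the file name and loader are hypothetical:

# pricing.yaml layout: {provider: {model: {input: rate, output: rate}}}
from decimal import Decimal
from pathlib import Path

import yaml  # PyYAML

def load_pricing_table(path: Path = Path(__file__).with_name("pricing.yaml")) -> dict:
    """Parse per-1K-token rates into Decimals so cost arithmetic stays exact."""
    raw = yaml.safe_load(path.read_text())
    return {
        provider: {
            model: {kind: Decimal(str(rate)) for kind, rate in rates.items()}
            for model, rates in models.items()
        }
        for provider, models in raw.items()
    }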

Integration Points

1. LLM Factory Instrumentation

Modify src/mcp_server_langgraph/llm/llm_factory.py:
from ..monitoring.cost_tracker import CostMetricsCollector

class LLMFactory:
    def __init__(self):
        self.cost_tracker = CostMetricsCollector()

    async def invoke(self, prompt, **kwargs):
        # Existing provider dispatch; _invoke_provider is a placeholder for the
        # factory's per-provider client call (LLMFactory has no base class to super() into)
        response = await self._invoke_provider(prompt, **kwargs)

        # Track cost
        await self.cost_tracker.record_usage(
            user_id=kwargs.get("user_id"),
            session_id=kwargs.get("session_id"),
            model=self.model,
            provider=self.provider,
            prompt_tokens=response.usage.prompt_tokens,
            completion_tokens=response.usage.completion_tokens,
        )

        return response
2. Prometheus Metrics
from prometheus_client import Counter, Histogram

llm_token_usage = Counter(
    "llm_token_usage_total",
    "Total tokens used by LLM calls",
    ["provider", "model", "token_type"]  # token_type: input/output
)

llm_cost = Counter(
    "llm_cost_usd_total",
    "Total estimated cost in USD",
    ["provider", "model"]
)

llm_request_cost = Histogram(
    "llm_request_cost_usd",
    "Cost per request in USD",
    ["provider", "model"],
    # Default buckets are tuned for latencies; per-request costs are usually sub-cent.
    buckets=(0.0001, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0),
)
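
On each call, all three series are updated together. A sketch of the helper (the function name and the float conversion convention are assumptions):

def update_metrics(
    provider: str,
    model: str,
    prompt_tokens: int,
    completion_tokens: int,
    cost_usd: float,  # convert Decimal with float(); prometheus_client expects floats
) -> None:
    """Increment the Prometheus series for one LLM call."""
    llm_token_usage.labels(provider, model, "input").inc(prompt_tokens)
    llm_token_usage.labels(provider, model, "output").inc(completion_tokens)
    llm_cost.labels(provider, model).inc(cost_usd)
    llm_request_cost.labels(provider, model).observe(cost_usd)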
3. Budget Alerts
# src/mcp_server_langgraph/monitoring/budget_alerts.py

class BudgetMonitor:
    async def check_budget(self, budget_id: str):
        """Check if budget exceeded and send alerts."""
        budget = await self.get_budget(budget_id)
        current_spend = await self.get_period_spend(budget_id)

        utilization = current_spend / budget.limit

        if utilization >= 0.9:
            await self.send_alert(
                level="critical",
                message=f"Budget {budget.name} at {utilization*100:.1f}%"
            )
        elif utilization >= 0.75:
            await self.send_alert(
                level="warning",
                message=f"Budget {budget.name} at {utilization*100:.1f}%"
            )
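
The dashboard's "projected end-of-month cost" panel can be fed by a simple linear burn-rate extrapolation; a sketch with a hypothetical helper name:

import calendar
from datetime import date
from decimal import Decimal

def project_month_end_cost(month_to_date_spend: Decimal, today: date) -> Decimal:
    """Linear extrapolation: (spend so far / days elapsed) * days in month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    daily_burn_rate = month_to_date_spend / today.day
    return daily_burn_rate * days_in_month

For example, $120 spent by the 10th of a 30-day month projects to $360.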

Consequences

Positive

  1. Financial Visibility: Real-time insight into LLM spending
  2. Cost Control: Budget alerts prevent bill shock
  3. Optimization Opportunities: Identify expensive operations
  4. Accountability: Attribute costs to users/teams
  5. Compliance: Audit trail for cost tracking (SOC 2)
  6. Forecasting: Data for capacity planning

Negative

  1. Storage Overhead: Cost data adds to database size
  2. Performance Impact: Minimal (async cost tracking)
  3. Maintenance: Pricing table requires monthly updates
  4. Complexity: Additional monitoring infrastructure

Mitigations

  1. Async Tracking: Cost recording happens asynchronously
  2. Batch Writes: Aggregate cost data before writing to DB (see the sketch after this list)
  3. Data Retention: Archive cost data older than 13 months
  4. Pricing Automation: Consider API-based pricing updates
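
A minimal sketch of the batched-write loop behind mitigations 1 and 2, assuming the asyncio queue from the CostMetricsCollector sketch above and an asyncpg connection pool (both assumptions):

import asyncio

BATCH_SIZE = 100
FLUSH_INTERVAL_S = 5.0

async def flush_loop(queue: asyncio.Queue, pool) -> None:
    """Drain queued TokenUsage records and insert them into PostgreSQL in batches."""
    while True:
        batch = [await queue.get()]  # block until at least one record arrives
        deadline = asyncio.get_running_loop().time() + FLUSH_INTERVAL_S
        # Collect up to BATCH_SIZE records or until the flush interval lapses.
        while len(batch) < BATCH_SIZE:
            timeout = deadline - asyncio.get_running_loop().time()
            if timeout <= 0:
                break
            try:
                batch.append(await asyncio.wait_for(queue.get(), timeout))
            except asyncio.TimeoutError:
                break
        async with pool.acquire() as conn:  # asyncpg-style pool (assumed)
            await conn.executemany(
                "INSERT INTO cost_records (ts, user_id, session_id, provider, model,"
                " prompt_tokens, completion_tokens, estimated_cost_usd)"
                " VALUES ($1, $2, $3, $4, $5, $6, $7, $8)",
                [
                    (u.timestamp, u.user_id, u.session_id, u.provider, u.model,
                     u.prompt_tokens, u.completion_tokens, u.estimated_cost_usd)
                    for u in batch
                ],
            )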

Implementation Plan

Phase 1: Backend (TDD)

  1. Write tests for CostMetricsCollector
  2. Implement CostMetricsCollector
  3. Write tests for CostAPI endpoints
  4. Implement CostAPI
  5. Write tests for BudgetMonitor
  6. Implement BudgetMonitor

Phase 2: Storage

  1. Create PostgreSQL schema for cost_records
  2. Set up Prometheus metrics
  3. Configure data retention policies

Phase 3: Grafana Dashboard

  1. Create cost-monitoring.json dashboard
  2. Configure panels for all metrics
  3. Set up alert rules

Phase 4: Integration

  1. Instrument LLMFactory
  2. Add cost tracking to all LLM calls
  3. Deploy to staging
  4. Validate accuracy

Phase 5: Alerts & Automation

  1. Configure budget alerts
  2. Set up cost anomaly detection
  3. Create automated reports

Alternatives Considered

Alternative 1: Third-Party Cost Tracking (e.g., LangSmith, Helicone)

Pros:
  • No custom development
  • Advanced analytics out-of-the-box
  • Regular pricing updates
Cons:
  • Additional vendor dependency
  • Data privacy concerns
  • Ongoing SaaS subscription fees
  • Limited customization
Decision: Build in-house for control and privacy

Alternative 2: Simple Logging (No Dashboard)

Pros:
  • Minimal implementation
  • Low overhead
Cons:
  • No visualization
  • Manual analysis required
  • No real-time alerts
Decision: Rejected - insufficient visibility

References

  • ADR-0003: Dual Observability (Prometheus + OpenTelemetry)
  • ADR-0001: Multi-Provider LLM Support
  • ADR-0027: Rate Limiting Strategy (complements cost control)