Grafana Dashboards
Install Grafana
Application Dashboard
Import this JSON dashboard:
LLM Observability Dashboard
Production-Ready Dashboards (v2.1.0)
NEW in v2.1.0 - 7 production-ready Grafana dashboards covering authentication, authorization, LLM performance, and infrastructure metrics.
The dashboard JSON files are located in monitoring/grafana/dashboards/.
Authentication
authentication.json
- Login activity rate (attempts, success, failures)
- Login failure rate gauge with thresholds
- Response time percentiles (p50, p95, p99)
- Active sessions count
- Token operations (create, verify, refresh)
- JWKS cache performance
OpenFGA Authorization
openfga.json
- Authorization check rate (total, allowed, denied)
- Denial rate gauge
- Total relationship tuples
- Check latency percentiles
- Tuple write operations
- Role sync operations and latency
LLM Performance
llm-performance.json
- Agent call rate (successful/failed)
- Error rate gauge
- Response time percentiles
- Tool calls rate
- LLM invocations by model
- Fallback model usage
Keycloak SSO
keycloak.json
- Service status gauge
- Response time (p50, p95, p99)
- Login request rate
- Error rates (login, token refresh)
- Active sessions and users
- Resource utilization (CPU, memory)
Redis Sessions
redis-sessions.json
- Service status and memory usage
- Active sessions (key count)
- Operations rate (commands/sec)
- Connection pool utilization
- Session evictions
- Memory fragmentation ratio
Security
security.json
- Auth/AuthZ failures per second
- JWT validation errors
- Security status gauge
- Failures by reason and resource
- Failed attempts by user/IP
- Top 10 violators table
Overview
mcp-server-langgraph.json
- Service status uptime gauge
- Request rate by tool
- Error rate percentage
- Response time percentiles
- Memory and CPU usage per pod
- Request success/failure count
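To check which panels each file actually contains, you can read the JSON directly. The sketch below is a minimal, hypothetical helper (the function names are illustrative, not part of the project). It relies only on Grafana's dashboard JSON model: a top-level "title" and a "panels" array, where collapsed rows keep their children in a nested "panels" list.

```python
import json
from pathlib import Path


def iter_panels(panels):
    """Yield every panel, descending into collapsed rows (which nest their own 'panels')."""
    for panel in panels:
        yield panel
        yield from iter_panels(panel.get("panels", []))


def summarize_dashboard(dashboard: dict) -> dict:
    """Return the dashboard title and a (type, title) pair for every panel."""
    panels = [
        (p.get("type", "?"), p.get("title", ""))
        for p in iter_panels(dashboard.get("panels", []))
    ]
    return {"title": dashboard.get("title", "<untitled>"), "panels": panels}


def summarize_directory(directory: str = "monitoring/grafana/dashboards") -> list:
    """Summarize every *.json file in the directory; returns [] if the directory is missing."""
    root = Path(directory)
    if not root.is_dir():
        return []
    summaries = []
    for path in sorted(root.glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        # Assumption: some exports wrap the model as {"dashboard": {...}}; unwrap if so.
        dashboard = data.get("dashboard", data)
        summaries.append({"file": path.name, **summarize_dashboard(dashboard)})
    return summaries


if __name__ == "__main__":
    results = summarize_directory()
    if not results:
        print("No dashboard JSON files found")
    for s in results:
        print(f"{s['file']}: {s['title']} ({len(s['panels'])} panels)")
        for panel_type, title in s["panels"]:
            print(f"  - [{panel_type}] {title}")
```

Run it from the repository root to print one line per dashboard, followed by its panels.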
Import Dashboards
Option 1: Grafana UI (Manual)
- Open Grafana at http://localhost:3000
- Navigate to Dashboards → Import
- Click Upload JSON file
- Select a dashboard file from monitoring/grafana/dashboards/
- Select the Prometheus datasource
- Click Import
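Dashboards can also be uploaded in a script through Grafana's HTTP API (POST /api/dashboards/db). The sketch below is a minimal version built on that assumption. GRAFANA_TOKEN is a hypothetical environment variable holding a service-account token, and the helper names are illustrative. The sketch also assumes each file references the Prometheus datasource directly. If a file uses "__inputs" placeholders such as ${DS_...}, this endpoint does not resolve them; the UI step "Select the Prometheus datasource" exists to handle that mapping.

```python
import json
import os
import urllib.request
from pathlib import Path


def build_import_request(dashboard: dict, grafana_url: str, token: str, folder_uid=None):
    """Build (but do not send) a POST /api/dashboards/db request for one dashboard."""
    dashboard = dict(dashboard)  # copy so the caller's dict is not mutated
    dashboard["id"] = None  # let Grafana assign the numeric id; the uid is kept
    payload = {"dashboard": dashboard, "overwrite": True}
    if folder_uid:
        payload["folderUid"] = folder_uid
    return urllib.request.Request(
        url=grafana_url.rstrip("/") + "/api/dashboards/db",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
    )


def import_directory(directory="monitoring/grafana/dashboards", grafana_url="http://localhost:3000"):
    """Upload every *.json dashboard in the directory (hypothetical helper)."""
    token = os.environ["GRAFANA_TOKEN"]  # hypothetical variable name
    for path in sorted(Path(directory).glob("*.json")):
        dashboard = json.loads(path.read_text(encoding="utf-8"))
        with urllib.request.urlopen(build_import_request(dashboard, grafana_url, token)) as resp:
            print(path.name, resp.status, json.loads(resp.read()).get("status"))
```

Setting "overwrite" to true makes re-running the script idempotent: a dashboard with the same uid is replaced rather than rejected.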
values.yaml:
Dashboard Features
All production dashboards include:
- Auto-refresh - 10-second refresh rate for real-time monitoring
- Time range presets - Last 5m, 15m, 1h, 6h, 24h, 7d
- Thresholds - Color-coded gauges (green/yellow/red)
- Cross-links - Navigate between related dashboards
- Legend tables - Current, max, and mean values
- Panel descriptions - Hover tooltips explaining metrics
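Several of these features map onto fields in Grafana's dashboard JSON model, so they can be checked before a file is committed. The sketch below is a hypothetical linter, not a project tool. It checks the "refresh" field, the dashboard-level "links", the gauge threshold steps under fieldConfig.defaults.thresholds, the time-series legend display mode, and each panel's "description".

```python
def audit_dashboard(dashboard: dict, expected_refresh: str = "10s") -> list:
    """Return human-readable problems; an empty list means every check passed.

    Only top-level panels are inspected; panels nested in collapsed rows are skipped.
    """
    problems = []
    if dashboard.get("refresh") != expected_refresh:
        problems.append(f"refresh is {dashboard.get('refresh')!r}, expected {expected_refresh!r}")
    if not dashboard.get("links"):
        problems.append("no dashboard links (cross-links) defined")
    for panel in dashboard.get("panels", []):
        title = panel.get("title", "<untitled>")
        kind = panel.get("type")
        if kind == "gauge":
            # Gauge colours are ordered threshold steps; green/yellow/red needs at least 3.
            steps = (
                panel.get("fieldConfig", {})
                .get("defaults", {})
                .get("thresholds", {})
                .get("steps", [])
            )
            if len(steps) < 3:
                problems.append(f"gauge {title!r} has {len(steps)} threshold steps, expected 3+")
        if kind == "timeseries":
            legend = panel.get("options", {}).get("legend", {})
            if legend.get("displayMode") != "table":
                problems.append(f"time series {title!r} legend is not a table")
        if kind not in ("row", "text") and not panel.get("description"):
            problems.append(f"panel {title!r} has no description (hover tooltip)")
    return problems
```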
Required Metrics
Ensure these metrics are exposed by the application:
Authentication (authentication.json):
Service Level Objectives (SLOs)
NEW in v2.1.0 - Pre-computed SLO metrics via Prometheus recording rules for efficient monitoring and alerting.
SLO Recording Rules
The monitoring/prometheus/rules/slo-recording-rules.yaml file contains 40+ recording rules that pre-compute Service Level Indicators (SLIs), so Grafana can query them quickly.
Load recording rules:
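However the file is loaded, Prometheus reports every rule it evaluates at its /api/v1/rules endpoint. The minimal sketch below assumes Prometheus listens on http://localhost:9090 and uses illustrative helper names. It lists the recording rules and flags any whose health is not "ok", which is a quick way to confirm that the 40+ rules were picked up.

```python
import json
import urllib.request


def fetch_rules(prometheus_url: str = "http://localhost:9090") -> dict:
    """GET /api/v1/rules and return the decoded JSON body."""
    with urllib.request.urlopen(prometheus_url.rstrip("/") + "/api/v1/rules") as resp:
        return json.load(resp)


def recording_rules(rules_response: dict) -> list:
    """Return (group, rule name, health) for every recording rule in the response."""
    out = []
    for group in rules_response.get("data", {}).get("groups", []):
        for rule in group.get("rules", []):
            if rule.get("type") == "recording":  # alerting rules are skipped
                out.append((group.get("name"), rule.get("name"), rule.get("health")))
    return out


def unhealthy(rules: list) -> list:
    """Rules whose last evaluation was not healthy ("err" or "unknown")."""
    return [r for r in rules if r[2] != "ok"]
```

For example, `rules = recording_rules(fetch_rules())` followed by `print(len(rules), unhealthy(rules))` shows the rule count and any failing expressions.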
Available SLO Metrics
- Availability
- Latency
- Error Rate
- Saturation
- Error Budget
- Compliance
Target: 99.9% uptime
Usage in Grafana:
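Outside Grafana, the same pre-computed series can be read from Prometheus's instant-query API (GET /api/v1/query). This is useful in scripts or reports that compare availability against the 99.9% target. The sketch below assumes Prometheus is reachable at http://localhost:9090. The metric name in the usage line is a placeholder; substitute one of the recording rules defined in slo-recording-rules.yaml.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(prometheus_url: str, promql: str) -> str:
    """URL for an instant query; the PromQL expression is URL-encoded."""
    return prometheus_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_instant_vector(response: dict) -> dict:
    """Map each series' sorted label set to its float value."""
    if response.get("status") != "success":
        raise RuntimeError(response.get("error", "query failed"))
    data = response["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each sample is [unix_timestamp, "value-as-string"].
    return {tuple(sorted(s["metric"].items())): float(s["value"][1]) for s in data["result"]}


def query(prometheus_url: str, promql: str) -> dict:
    with urllib.request.urlopen(build_query_url(prometheus_url, promql)) as resp:
        return parse_instant_vector(json.load(resp))
```

For example, `query("http://localhost:9090", "<availability recording rule>")` returns one value per series, which you can compare with 0.999.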
SLO Dashboard Example
Create an SLO summary dashboard:
Benefits of SLO Recording Rules
- Performance - Dashboards read pre-computed series instead of re-evaluating raw-metric expressions on every refresh, so queries run 10-100x faster
- Consistency - Same calculation across all dashboards
- Alerting - Alert on SLO violations, not raw metrics
- Reporting - Historical SLO compliance tracking
- Error Budgets - Multi-window burn rate detection
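The error-budget arithmetic behind a 99.9% target is simple. The allowed failure ratio is 1 - 0.999 = 0.001, which is 43.2 minutes over a 30-day window. Burn rate is the observed error ratio divided by that allowed ratio. The sketch below illustrates multi-window burn-rate detection. The 14.4 threshold and the pairing of a long (1h) window with a short (5m) window follow Google's SRE Workbook (burning 2% of a 30-day budget in one hour). They are assumptions here, not values read from this project's rules.

```python
def error_budget_minutes(target: float, window_days: float = 30) -> float:
    """Minutes of total failure allowed in the window for an availability target."""
    return (1 - target) * window_days * 24 * 60


def burn_rate(error_ratio: float, target: float) -> float:
    """Budget consumption speed; 1.0 spends exactly the whole budget over the window."""
    return error_ratio / (1 - target)


def should_page(short_window_error_ratio: float, long_window_error_ratio: float,
                target: float, threshold: float = 14.4) -> bool:
    """Fire only if both windows burn at or above the threshold.

    The long window (e.g. 1h) shows the burn is significant; the short window
    (e.g. 5m) shows it is still happening, so the alert resets quickly after recovery.
    """
    return (burn_rate(long_window_error_ratio, target) >= threshold
            and burn_rate(short_window_error_ratio, target) >= threshold)
```

With a 2% error ratio in both windows, the burn rate is 20, so `should_page(0.02, 0.02, 0.999)` is true. If the short window has already recovered to 0.1% errors, the page is suppressed.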