Mirror of https://github.com/samber/awesome-prometheus-alerts.git (synced 2026-06-20 16:46:37 +08:00)
🚨 Collection of Prometheus alerting rules
alert, alerting, alerting-rules, alertmanager, awesome, collection, exporter, grafana, monitoring, prometheus, prometheus-alerting-rules, promql, query, rule, supervision
LiteLLM (https://github.com/BerriAI/litellm) is a popular LLM gateway/proxy that exposes Prometheus metrics via its built-in callback. There were no existing alerting rules for LiteLLM in this repo, despite its growing adoption as an OpenAI/Anthropic-compatible proxy.

Added 3 alerts covering the most common operational concerns:

1. **LiteLLM provider spend over budget**: soft warning on cumulative 24h spend per model-name regex. Useful when LiteLLM's native `provider_budget_config` hard cap is unavailable, disabled, or buggy (e.g. BerriAI/litellm#26701).
2. **LiteLLM proxy failed requests rate high**: error-rate ratio alert for downstream LLM provider availability/auth issues.
3. **LiteLLM request latency p95 high**: histogram-quantile alert for downstream provider response-time degradation.

All 3 rules were tested via `promtool check rules` (SUCCESS) and validated on a real LiteLLM v1.83.7 production deployment.

Reference: https://docs.litellm.ai/docs/proxy/prometheus
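
As an illustration of the three alerts described above, here is a minimal sketch of what such a rule group might look like. It is not the rule set that was added to the collection. The metric names (`litellm_spend_metric_total`, `litellm_proxy_failed_requests_metric_total`, `litellm_proxy_total_requests_metric_total`, `litellm_request_total_latency_metric_bucket`) and the `model` label are assumptions based on the LiteLLM Prometheus documentation and may differ between LiteLLM versions. The group name, the `gpt-4.*` regex, and every threshold are hypothetical placeholders.

```yaml
groups:
  - name: litellm-sketch  # hypothetical group name
    rules:
      # 1. Soft warning on cumulative 24h spend for models matching a regex.
      #    Assumes spend is a counter in USD with a "model" label.
      - alert: LitellmProviderSpendOverBudget
        expr: sum by (model) (increase(litellm_spend_metric_total{model=~"gpt-4.*"}[24h])) > 50
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: "LiteLLM provider spend over budget (model {{ $labels.model }})"
          description: "24h spend is {{ $value }} USD, above the assumed 50 USD soft budget."

      # 2. Ratio of failed to total proxy requests over 5 minutes.
      - alert: LitellmProxyFailedRequestsRateHigh
        expr: |
          sum(rate(litellm_proxy_failed_requests_metric_total[5m]))
            /
          sum(rate(litellm_proxy_total_requests_metric_total[5m])) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "LiteLLM proxy failed requests rate high"
          description: "More than 5% of proxy requests failed over the last 5 minutes."

      # 3. p95 request latency computed from histogram buckets (seconds assumed).
      - alert: LitellmRequestLatencyP95High
        expr: |
          histogram_quantile(0.95,
            sum by (le) (rate(litellm_request_total_latency_metric_bucket[5m]))
          ) > 30
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "LiteLLM request latency p95 high"
          description: "p95 request latency is {{ $value }}s over the last 5 minutes."
```

A file like this can be validated with `promtool check rules <file>` before loading it. Check the metric names against the `/metrics` output of your own LiteLLM deployment first, and tune the thresholds to your own budget and latency expectations.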
👋 Awesome Prometheus Alerts 
940+ production-ready Prometheus alerting rules for 90+ services — copy-paste YAML for Kubernetes, MySQL, Redis, Kafka, and more.
Collection available here: https://samber.github.io/awesome-prometheus-alerts
Sponsored by:
Cut Kubernetes & AI costs, boost application stability.
Better Stack lets you centralize, search, and visualize your logs.
✨ Contents
🚨 Rules
Basic resource monitoring
- Prometheus self-monitoring
- Host/Hardware
- SMART
- IPMI
- Docker Containers
- Blackbox
- Windows
- VMware
- Proxmox VE
- Netdata
- eBPF
- Process Exporter
- Systemd
Databases
- MySQL
- PostgreSQL
- SQL Server
- Oracle Database
- Patroni
- PgBouncer
- Redis
- Memcached
- MongoDB
- Elasticsearch
- OpenSearch
- Meilisearch
- Cassandra
- ClickHouse
- CouchDB
- Solr
Message brokers
Proxies, load balancers and service meshes
Runtimes
Data engineering
Orchestrators
CI/CD
Network and security
- SpeedTest
- SSL/TLS
- cert-manager
- Juniper
- CoreDNS
- FreeSwitch
- HashiCorp Vault
- Keycloak
- Cloudflare
- SNMP
- Cilium
- WireGuard
Storage
Cloud providers
Observability
- Thanos
- Loki
- Promtail
- Cortex
- Grafana Tempo
- Grafana Mimir
- Grafana Alloy
- OpenTelemetry Collector
- Jaeger
Other
🤝 Contributing
Contributions from the community (you!) are most welcome!
There are many ways to contribute: writing code, alerting rules, documentation, reporting issues, discussing better error tracking...
💫 Show your support
Give a ⭐️ if this project helped you!
📝 License
- Alert rules and content: Creative Commons CC BY 4.0
- Site source code: MIT
See LICENSE for details.
