Mirror of https://github.com/samber/awesome-prometheus-alerts.git (synced 2026-06-25 02:46:59 +08:00)
🚨 Collection of Prometheus alerting rules
alert, alerting, alerting-rules, alertmanager, awesome, collection, exporter, grafana, monitoring, prometheus, prometheus-alerting-rules, promql, query, rule, supervision
- Replace rate()/increase() with deriv()/delta() on gauge metrics:
node_vmstat_pgmajfault, cassandra_stats (criteo exporter),
gitlab_ci_pipeline_failure_reasons, flink_taskmanager_job_task_numRecordsIn
- Fix histogram_quantile on non-_bucket metric: cilium_policy_implementation_delay
- Fix Thanos bucket replicate latency: use _count instead of _bucket for guard clause
- Fix Thanos query latency: use _count instead of _bucket for guard clause
- Restore job filter in Thanos objstore guard clauses (compact + store)
- Remove redundant job= filters from unique metrics: ~30 Thanos rules,
kube_persistentvolume_status_phase, otelcol_process_runtime_*
- Fix high-cardinality Istio latency grouping (drop source labels from by())
- Add division-by-zero guard to host context switch ratio
- Raise noisy ClickHouse thresholds: RejectedInserts > 2, DelayedInserts > 10
- Remove redundant for: 1m from HAProxy check failure rules
- Add job rename comments to up{job=...} rules (Hadoop, OpenStack, SNMP, OTel)
- Remove external mixin references from comments
- Fix Tempo dropped spans metric name: add missing _total suffix
- Fix Thanos bucket replicate run latency: add missing le label in by()
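Three of the patterns named in the list above (deriv() on gauges, guard clauses against division by zero, and histogram_quantile over `_bucket` series with `le` kept in by()) can be sketched in a rule file as follows. This is a minimal illustration rather than the rules changed in the commit: every alert name, metric name (`example_*`), and threshold below is an assumption chosen for the example.

```yaml
groups:
  - name: example-promql-patterns        # hypothetical group name
    rules:
      # Gauge metric: use deriv() (or delta()) rather than rate()/increase(),
      # which assume a monotonically increasing counter.
      - alert: ExampleQueueDepthGrowing   # hypothetical alert
        expr: deriv(example_queue_depth[10m]) > 0.5
        for: 5m
        labels:
          severity: warning

      # Ratio with a guard clause: the `and` branch drops series whose
      # denominator is zero, so the ratio never evaluates to +Inf or NaN.
      - alert: ExampleHighErrorRatio      # hypothetical alert
        expr: |
          (
            sum by (instance) (rate(example_errors_total[5m]))
            /
            sum by (instance) (rate(example_requests_total[5m]))
          ) > 0.05
          and
          sum by (instance) (rate(example_requests_total[5m])) > 0
        for: 5m
        labels:
          severity: warning

      # histogram_quantile() needs the _bucket series and the `le` label
      # in by(); the guard clause uses the cheaper _count series instead.
      - alert: ExampleHighLatencyP99      # hypothetical alert
        expr: |
          histogram_quantile(
            0.99,
            sum by (job, le) (rate(example_request_duration_seconds_bucket[5m]))
          ) > 1
          and
          sum by (job) (rate(example_request_duration_seconds_count[5m])) > 0
        for: 5m
        labels:
          severity: warning
```

In each guarded rule, both sides of `and` are aggregated to the same label set, so the vector match needs no extra `on()` or `ignoring()` modifier.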
| Name | | |
|---|---|---|
| .github | ||
| _data | ||
| _layouts | ||
| assets | ||
| dist | ||
| .gitignore | ||
| .travis.yml | ||
| _config.yml | ||
| alertmanager.md | ||
| blackbox-exporter.md | ||
| CLAUDE.md | ||
| CONTRIBUTING.md | ||
| docker-compose.yml | ||
| Gemfile | ||
| Gemfile.lock | ||
| index.md | ||
| LICENSE | ||
| package.json | ||
| README.md | ||
| rules.md | ||
| sleep-peacefully.md | ||
👋 Awesome Prometheus Alerts 
Most alerting rules are common to every Prometheus setup. We need a place to find them all. 🤘 🚨 📊
Collection available here: https://samber.github.io/awesome-prometheus-alerts
Sponsored by:
Cut Kubernetes & AI costs, boost application stability.
Better Stack lets you centralize, search, and visualize your logs.
✨ Contents
🚨 Rules
Basic resource monitoring
- Prometheus self-monitoring
- Host/Hardware
- SMART
- IPMI
- Docker Containers
- Blackbox
- Windows
- VMware
- Proxmox VE
- Netdata
- eBPF
- Process Exporter
- Systemd
Databases
- MySQL
- PostgreSQL
- SQL Server
- Oracle Database
- Patroni
- PgBouncer
- Redis
- Memcached
- MongoDB
- Elasticsearch
- Meilisearch
- Cassandra
- ClickHouse
- CouchDB
- Solr
Message brokers
Proxies, load balancers and service meshes
Runtimes
Data engineering
Orchestrators
CI/CD
Network and security
- SpeedTest
- SSL/TLS
- cert-manager
- Juniper
- CoreDNS
- FreeSWITCH
- HashiCorp Vault
- Keycloak
- Cloudflare
- SNMP
- Cilium
- WireGuard
Storage
Cloud providers
Observability
- Thanos
- Loki
- Promtail
- Cortex
- Grafana Tempo
- Grafana Mimir
- Grafana Alloy
- OpenTelemetry Collector
- Jaeger
Other
🤝 Contributing
Contributions from the community (you!) are most welcome!
There are many ways to contribute: writing code, alerting rules, documentation, reporting issues, discussing better error tracking...
🏋️ Improvements
- Create an alert rule builder in Jekyll for custom alerts (severity, thresholds, instances...)
- Add resolution suggestions to rule descriptions, for faster incident resolution (#85).
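To make the second improvement concrete, here is one way a resolution hint might sit alongside a rule description. This is only a sketch of the idea, not a format the project has adopted: the `resolution` annotation key, the alert, and the `example_*` metric are all assumptions made for illustration.

```yaml
groups:
  - name: example-resolution-hints        # hypothetical group name
    rules:
      - alert: ExampleDiskAlmostFull       # hypothetical alert
        expr: example_filesystem_avail_ratio < 0.10   # hypothetical metric
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Disk almost full (instance {{ $labels.instance }})"
          description: "Less than 10% of disk space is left. VALUE = {{ $value }}"
          # Assumed annotation key; Prometheus accepts arbitrary annotation names.
          resolution: "Remove old logs or temporary files, or grow the volume."
```

Because annotations are free-form key/value pairs, an extra key like this would flow through to Alertmanager templates without changing how the rule is evaluated.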
💫 Show your support
Give a ⭐️ if this project helped you!
📝 License
Licensed under the Creative Commons 4.0 License; see the LICENSE file for more details.
