awesome-prometheus-alerts/README.md
Samuel Berthe e6cdcdb9e5 feat: add Apache Flink and Apache Spark alerting rules
Add 20 new alerting rules under the Runtimes category:
- Apache Flink (12 rules): job status, TaskManager registration, slot
  availability, restarts, checkpoints, backpressure, heap memory, GC,
  and record processing
- Apache Spark (8 rules): worker health, waiting apps, memory/cores
  exhaustion, executor GC, task failures, and disk spill
2026-03-16 04:46:00 +01:00


👋 Awesome Prometheus Alerts

Most alerting rules are common to every Prometheus setup. We need a place to find them all. 🤘 🚨 📊

Collection available here: https://samber.github.io/awesome-prometheus-alerts
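
For readers new to the format, here is a minimal sketch of what one such rule can look like in a Prometheus rule file. It is illustrative only and not copied from the collection: the group name, alert name, `for` duration, severity label, and annotation text are assumptions. Only the `up` metric, which Prometheus generates for every scrape target, is standard.

```yaml
groups:
  - name: example-basic-rules          # hypothetical group name
    rules:
      - alert: TargetDown              # hypothetical alert name
        expr: up == 0                  # `up` is 0 when the last scrape of a target failed
        for: 5m                        # assumed duration; tune to your scrape interval
        labels:
          severity: critical           # assumed severity convention
        annotations:
          summary: "Target down (instance {{ $labels.instance }})"
          description: "Prometheus could not scrape {{ $labels.job }} for 5 minutes."
```

A file like this is typically referenced from the `rule_files` section of `prometheus.yml` and can be validated with `promtool check rules <file>` before loading.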

Contents

🚨 Rules

Basic resource monitoring

Databases and brokers

Reverse proxies and load balancers

Runtimes

Orchestrators

Network, security and storage

Other

🤝 Contributing

Contributions from the community (you!) are most welcome!

There are many ways to contribute: writing code, alerting rules, documentation, reporting issues, discussing better error tracking...

Instructions here

🏋️ Improvements

  • Create an alert rule builder in Jekyll for custom alerts (severity, thresholds, instances...).
  • Add resolution suggestions to rule descriptions, for faster incident resolution (#85).

💫 Show your support

Give a ⭐️ if this project helped you!


📝 License

Licensed under the Creative Commons 4.0 License; see the LICENSE file for more details.