* feat: add systemd_exporter alerting rules (7 rules)
Add a new Systemd service under "Basic resource monitoring" with rules for:
- Unit failed/inactive state detection
- Service crash loop detection
- Task limit exhaustion
- Socket refused/high connections
- Timer missed trigger
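A minimal Python sketch of the task limit exhaustion check. It assumes the rule compares systemd_exporter's systemd_unit_tasks_current against systemd_unit_tasks_max. The 80% threshold and the helper names are hypothetical, and the shipped rule's expression may differ.

```python
import math
from typing import Optional

# Hypothetical threshold: alert once a unit uses more than 80% of its TasksMax.
TASK_USAGE_THRESHOLD = 0.8


def task_usage_ratio(tasks_current: float, tasks_max: float) -> Optional[float]:
    """Return tasks_current / tasks_max, or None if the unit has no finite limit."""
    if tasks_max <= 0 or math.isinf(tasks_max):
        return None
    return tasks_current / tasks_max


def task_limit_exhausted(tasks_current: float, tasks_max: float) -> bool:
    """True when the unit is close to exhausting its task limit."""
    ratio = task_usage_ratio(tasks_current, tasks_max)
    return ratio is not None and ratio > TASK_USAGE_THRESHOLD
```

Guarding against a missing or infinite limit keeps units without TasksMax from producing a meaningless ratio.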
* fix: narrow systemd unit inactive query to reduce noise
Add type="service" and name filters to the inactive unit alert
to avoid false positives from units that are legitimately inactive.
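A Python sketch of why narrowing the query cuts noise. The unit list and the WATCHED name pattern are hypothetical, and the real rule's name filter is not reproduced here. `re.fullmatch` is used because PromQL regex matchers are fully anchored.

```python
import re
from typing import Iterable, List, NamedTuple


class Unit(NamedTuple):
    name: str   # e.g. "nginx.service"
    type: str   # e.g. "service", "timer", "socket"
    state: str  # e.g. "active", "inactive", "failed"


# Hypothetical allow-list of services that are expected to stay up.
WATCHED = re.compile(r"(nginx|sshd)\.service")


def inactive_alerts(units: Iterable[Unit], narrow: bool = True) -> List[str]:
    """Return the names of units that would fire an 'inactive' alert."""
    hits = []
    for u in units:
        if u.state != "inactive":
            continue
        # The narrowed query skips timers, sockets and unwatched services.
        if narrow and (u.type != "service" or not WATCHED.fullmatch(u.name)):
            continue
        hits.append(u.name)
    return hits
```

Without the filters, idle timers and one-shot services show up alongside the services that actually matter.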
* feat: add Cloud providers alerting rules (33 rules across 4 exporters)
New "Cloud providers" category with rules for:
- AWS CloudWatch (13 rules): exporter health + EC2, RDS, SQS, ALB, Lambda
- Google Cloud / Stackdriver (5 rules): scrape health, API quotas, staleness
- DigitalOcean (10 rules): droplets, databases, k8s, load balancers, incidents
- Azure (5 rules): API errors, rate limits, collection performance
* fix: address PR review - move Cloud providers before Other, fix service name
- Move "Cloud providers" group before "Other" in rules.yml for consistent ordering
- Rename "Google Cloud / Stackdriver" to "Google Cloud Stackdriver" to avoid
awkward /-/ in generated anchors and dist/rules/ paths
- Fix README anchor link to match the new service name
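A Python sketch of how a "/" in a service name leaks into generated paths. The slug rule (lowercase, spaces to hyphens) and the dist/rules/<slug> layout are assumptions for illustration. The repository's actual generator may work differently.

```python
import posixpath


def naive_slug(name: str) -> str:
    """Hypothetical slug rule: lowercase and replace spaces with hyphens."""
    return name.lower().replace(" ", "-")


def rules_dir(service: str) -> str:
    """Hypothetical layout: dist/rules/<slug>."""
    return posixpath.join("dist/rules", naive_slug(service))


# "Google Cloud / Stackdriver" -> "google-cloud-/-stackdriver": the "/" survives
# and splits the path into an extra, oddly named directory level.
```

Dropping the slash from the display name avoids this without special-casing the generator.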
* feat: add OpenStack alerting rules (openstack-exporter)
Add 20 alerting rules for openstack-exporter/openstack-exporter covering
Nova, Neutron, Cinder, Octavia, and Placement services.
* docs: add OpenStack to README services list
* fix: align OpenStack load balancer alert name with operating_status semantics
The operating_status label uses ONLINE/OFFLINE/DEGRADED/ERROR values,
not ACTIVE. Rename alert to "not online" and use the label in the
description for clarity.
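A small Python sketch of the operating_status semantics. The load balancer names are made up, and not_active is a hypothetical comparison shown only to illustrate why "ACTIVE" is the wrong word for this label.

```python
from typing import Dict, List

# operating_status values named in the fix; ACTIVE is not among them.
OPERATING_STATUSES = ("ONLINE", "OFFLINE", "DEGRADED", "ERROR")


def not_online(lbs: Dict[str, str]) -> List[str]:
    """Load balancers that a 'not online' alert would report."""
    return sorted(name for name, status in lbs.items() if status != "ONLINE")


def not_active(lbs: Dict[str, str]) -> List[str]:
    """Hypothetical check against ACTIVE: that value never appears in
    operating_status, so every load balancer would match."""
    return sorted(name for name, status in lbs.items() if status != "ACTIVE")
```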
* feat: add process-exporter alerting rules (ncabatoff/process-exporter)
* docs: add Process to README services list
* fix: address PR review feedback for process-exporter rules
- Rename service from "Process" to "Process Exporter" for clarity
- Fix grammar: "file descriptors usage" → "file descriptor usage"
- Clarify CPU alert description as core-equivalent percentage
- Rename "high disk IO" to "high disk write IO" for accuracy
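A Python sketch of what "core-equivalent percentage" means, assuming the CPU alert is derived from a per-second rate of a CPU-seconds counter such as process-exporter's namedprocess_namegroup_cpu_seconds_total. The function name is hypothetical.

```python
def cpu_core_percent(cpu_seconds_t0: float, cpu_seconds_t1: float,
                     t0: float, t1: float) -> float:
    """CPU usage between two samples as a percentage of one core.

    100% means one fully busy core, so a multi-threaded process group
    can report values above 100%.
    """
    if t1 <= t0:
        raise ValueError("t1 must be after t0")
    delta = cpu_seconds_t1 - cpu_seconds_t0
    if delta < 0:
        # Counter reset (e.g. the group restarted): count only the new value.
        delta = cpu_seconds_t1
    return delta / (t1 - t0) * 100
```

For example, 120 CPU-seconds consumed over 60 seconds is 200%, i.e. two cores' worth of work.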
* feat: add IPMI exporter alerting rules
Add 17 alerting rules for prometheus-community/ipmi_exporter covering
temperature, fan, voltage, current, power sensors, chassis status,
and system event log monitoring.
* docs: add IPMI to README services list
* Apply suggestions from code review
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Add 7 alerting rules for prometheus/snmp_exporter covering device
availability, interface status, error rates, bandwidth utilization,
and device restarts. Rules use standard IF-MIB and SNMPv2-MIB metrics.
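A Python sketch of bandwidth utilization from IF-MIB counters. It assumes the rule divides the bit rate from ifHCInOctets/ifHCOutOctets by ifHighSpeed, which IF-MIB reports in units of 1,000,000 bits per second. The function name is hypothetical. Counter drops are treated as resets, the same assumption Prometheus' rate() makes.

```python
def interface_utilization_percent(octets_t0: int, octets_t1: int,
                                  seconds: float, if_high_speed_mbps: int) -> float:
    """Utilization of one direction of an interface between two counter samples."""
    if if_high_speed_mbps <= 0:
        raise ValueError("ifHighSpeed is 0: link speed unknown")
    delta = octets_t1 - octets_t0
    if delta < 0:
        # Counter reset (e.g. device restart): count only the new value.
        delta = octets_t1
    bits_per_second = delta * 8 / seconds
    return bits_per_second / (if_high_speed_mbps * 1_000_000) * 100
```

On a 1 Gbit/s link (ifHighSpeed = 1000), 3,750,000,000 octets in 60 seconds is 500 Mbit/s, or 50% utilization.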
* data: add Python, Ruby and Go rules
* fix: address review feedback on runtime alerts
- JVM non-heap: guard against unbounded metaspace (max_bytes = -1)
- JVM old gen GC: note regex only matches CMS/G1/Parallel collectors
- JVM/Python file descriptors: note process_* metrics are generic
- Go memory usage: fix description (sys_bytes is runtime memory, not host)
- Go goroutine spike: use deriv() instead of rate() on a gauge
- Go GC CPU fraction: note deprecation since Go 1.20
- Go GC duration: clarify quantile="1" is max, not p99
- Python uncollectable: use increase() on the counter instead of a raw threshold
- Add threshold comments for workload-dependent defaults
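A Python sketch of three of the fixes above. It shows a least-squares slope (what PromQL deriv() computes for a gauge), a simplified counter increase (reset handling but no extrapolation, unlike PromQL increase()), and a used/max ratio that skips max_bytes = -1. All helper names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp_seconds, value)


def deriv(samples: Sequence[Sample]) -> float:
    """Per-second slope via simple linear regression, as PromQL deriv() does."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    return cov / var


def simple_increase(samples: Sequence[Sample]) -> float:
    """Counter increase; any drop is treated as a reset (no extrapolation)."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total


def usage_ratio(used: float, max_bytes: float) -> Optional[float]:
    """used / max, or None when the pool is unbounded (max_bytes = -1)."""
    if max_bytes <= 0:
        return None
    return used / max_bytes
```

Feeding the goroutine gauge 100, 200, 150, 250 (15 s apart) into simple_increase yields 350 because the dip to 150 is mistaken for a counter reset. This is why rate()/increase() mislead on gauges. deriv() returns the actual trend of about 2.67 goroutines per second.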
Add rules for Prometheus, Kubernetes, Redis, Docker and PostgreSQL.
Arrange exporters into categories.
Show the number of rules.
Thanks to GitLab for open-sourcing its alerting rules!