awesome-prometheus-alerts/dist/rules/nats/nats-exporter.yml
Somrat Dutta 8c0bdc2b24
feat: Add NATS and JetStream Prometheus alert rules (#430)
* feat: Add comprehensive NATS and JetStream Prometheus alert rules

- Added multiple Prometheus alert rules for monitoring NATS server and JetStream metrics.
- Included alerts for:
  - High connection count
  - High pending bytes
  - High subscriptions count
  - High routes count
  - High memory usage
  - Slow consumers
  - NATS server downtime
  - High CPU usage
  - High number of active connections
  - High JetStream store and memory usage
  - Subscription limits exceeded
  - High pending messages
  - Authentication timeouts
  - Errors in NATS (JetStream API errors)
  - JetStream consumers limit exceeded
  - Exceeding max payload size
  - Leaf node connection issues
  - Ping operations limit exceeded
  - Write deadline exceeded
- Ensured consistency between `exporter.yml` and `rules.yml` files.
- Improved overall NATS and JetStream monitoring to prevent performance degradation and ensure system reliability.

This commit enhances the visibility of NATS and JetStream operations by providing key metrics to alert on potential issues and optimize system performance.

* Update rules.yml

* - minor changes, rollback rules.yml
- address comment changes
- revert to old rules.yml as they are generated

* - minor changes, rollback rules.yml
- address comment changes
- revert to old rules.yml as they are generated

* fix indentation

---------

Co-authored-by: somratdutta <duttasomratand.com>
Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>
Co-authored-by: somrat.dutta <somrat.dutta@nutanix.com>
2024-08-20 20:37:03 +02:00

41 lines
No EOL
1.5 KiB
YAML

groups:
- name: NatsExporter
rules:
- alert: NatsHighConnectionCount
expr: 'gnatsd_varz_connections > 100'
for: 3m
labels:
severity: warning
annotations:
summary: Nats high connection count (instance {{ $labels.instance }})
description: "High number of NATS connections ({{ $value }}) for {{ $labels.instance }}\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
- alert: NatsHighPendingBytes
expr: 'gnatsd_connz_pending_bytes > 100000'
for: 3m
labels:
severity: warning
annotations:
summary: Nats high pending bytes (instance {{ $labels.instance }})
description: "High number of NATS pending bytes ({{ $value }}) for {{ $labels.instance }}\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
- alert: NatsHighSubscriptionsCount
expr: 'gnatsd_connz_subscriptions > 50'
for: 3m
labels:
severity: warning
annotations:
summary: Nats high subscriptions count (instance {{ $labels.instance }})
description: "High number of NATS subscriptions ({{ $value }}) for {{ $labels.instance }}\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
- alert: NatsHighRoutesCount
expr: 'gnatsd_routez_num_routes > 10'
for: 3m
labels:
severity: warning
annotations:
summary: Nats high routes count (instance {{ $labels.instance }})
description: "High number of NATS routes ({{ $value }}) for {{ $labels.instance }}\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"