⚠️ Disclaimer ⚠️

Alert thresholds depend on the nature of your applications.
Some queries may use an arbitrary tolerance threshold.

Building an efficient and battle-tested monitoring platform takes time. 😉

0. Prometheus global configuration

{% highlight yaml %}
# prometheus.yml

global:
  scrape_interval: 15s
  ...

rule_files:
  - 'alerts/*.yml'

scrape_configs:
  ...
{% endhighlight %}

{% highlight yaml %}
# alerts/example-redis.yml

groups:

- name: ExampleRedisGroup
  rules:
  - alert: ExampleRedisDown
    expr: redis_up{} == 0
    for: 2m
    labels:
      severity: error
    annotations:
      summary: "Redis instance ({% raw %}{{ $labels.instance }}{% endraw %}) down"
      description: "Whatever"
{% endhighlight %}
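
Because thresholds and `for` durations are easy to get wrong, it can help to unit-test a rule file before loading it. The following is a minimal sketch of a `promtool` rule test for the example above, not a definitive setup. The file name `tests/example-redis.test.yml`, the `redis:9121` instance label, and the sample values are assumptions made for illustration. The test file is kept outside `alerts/` so that the `alerts/*.yml` glob does not try to load it as a rule file.

{% highlight yaml %}
# tests/example-redis.test.yml (hypothetical file name and location)

rule_files:
  - ../alerts/example-redis.yml

evaluation_interval: 1m

tests:
  - interval: 1m
    # Assumed input: redis_up stays at 0 for six samples (0m through 5m).
    input_series:
      - series: 'redis_up{instance="redis:9121"}'
        values: '0x5'

    alert_rule_test:
      # At 1m the alert is still pending ("for: 2m" has not elapsed),
      # so no firing alert is expected: exp_alerts is left empty.
      - eval_time: 1m
        alertname: ExampleRedisDown

    promql_expr_test:
      # The rule expression itself should match the down instance.
      - expr: redis_up == 0
        eval_time: 3m
        exp_samples:
          - labels: 'redis_up{instance="redis:9121"}'
            value: 0
{% endhighlight %}

Assuming `promtool` is installed, `promtool check rules alerts/example-redis.yml` validates the rule syntax and `promtool test rules tests/example-redis.test.yml` runs the test.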

{{ serviceIndex }}. {{ service.name }} {% if exporter.name %} : {% if exporter.doc_url %} {{ exporter.name }} {% else %} {{ exporter.name }} {% endif %} {% endif %} {% if nbrRules > 0 %} [copy all] {% endif %}
{% if nbrRules == 0 %} {% highlight javascript %} // @TODO: Please contribute => https://github.com/samber/awesome-prometheus-alerts 👋 {% endhighlight %} {% endif %}
- {{ serviceIndex }}.{{ ruleIndex }}. {{ rule.name }}
  
  {{ rule.description }} [copy]
  
  {% assign ruleName = rule.name | split: ' ' %}
  {% capture ruleNameCamelcase %}{% for word in ruleName %}{{ word | capitalize }} {% endfor %}{% endcapture %}
  {% highlight yaml %}
  {% for comment in comments %}# {{ comment | strip }}
  {% endfor %}
  - alert: {{ ruleNameCamelcase | remove: ' ' }}
    expr: {{ rule.query }}
    for: 5m
    labels:
      severity: {{ rule.severity }}
    annotations:
      summary: "{{ rule.name }} (instance {% raw %}{{ $labels.instance }}{% endraw %})"
      description: "{{ rule.description }}\n VALUE = {% raw %}{{ $value }}{% endraw %}\n LABELS: {% raw %}{{ $labels }}{% endraw %}"
  {% endhighlight %}
