⚠️ Disclaimer ⚠️

Alert thresholds depend on the nature of your application.
Some queries use arbitrary tolerance thresholds that you will need to tune.

Building an efficient and battle-tested monitoring platform takes time. 😉

0. Prometheus global configuration

{% highlight yaml %}
# prometheus.yml
global:
  scrape_interval: 15s
  # ...

rule_files:
  - 'alerts/*.yml'

scrape_configs:
  # ...
{% endhighlight %}

{% highlight yaml %}{% raw %}
# alerts/example-redis.yml
groups:
  - name: ExampleRedisGroup
    rules:
      - alert: ExampleRedisDown
        expr: redis_up == 0
        for: 2m
        labels:
          severity: error
        annotations:
          # Prometheus exposes series labels to templates via $labels
          summary: "Redis instance ({{ $labels.instance }}) down"
          description: "Redis instance {{ $labels.instance }} has been down for at least 2 minutes."
{% endraw %}{% endhighlight %}
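The `for: 2m` clause is the tolerance window: the expression must stay true for 2 minutes before the alert fires. One way to check that behaviour before deploying is a rule unit test run with `promtool test rules`. The sketch below assumes a hypothetical test file `alerts/example-redis_test.yml` sitting next to the rule file (paths in `rule_files` are resolved relative to the test file), a made-up `instance`/`job` label pair and a 1-minute evaluation interval. The exporter reports Redis down at 0m and 1m and up again from 2m, so the alert stays pending and never fires.

{% highlight yaml %}
# alerts/example-redis_test.yml (hypothetical file name)
# Run with: promtool test rules alerts/example-redis_test.yml
rule_files:
  - example-redis.yml

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Hypothetical target: down at 0m and 1m, back up from 2m onwards.
      - series: 'redis_up{instance="redis-1:9121", job="redis"}'
        values: '0 0 1 1 1'
    alert_rule_test:
      # At 1m the expression has only been true for 1m: pending, not firing.
      - eval_time: 1m
        alertname: ExampleRedisDown
        exp_alerts: []
      # Redis recovered before the 2m window elapsed, so nothing ever fired.
      - eval_time: 3m
        alertname: ExampleRedisDown
        exp_alerts: []
{% endhighlight %}

A case where `redis_up` stays at 0 for longer than 2 minutes would show the alert firing; such a case has to list both the expected labels and the rendered annotations under `exp_alerts`, since promtool compares both.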