mirror of https://github.com/samber/awesome-prometheus-alerts.git, synced 2026-06-24 18:36:59 +08:00
refactor: remove previous website
parent cc6835cdf0
commit 9b995315d5
14 changed files with 29 additions and 1082 deletions
6
.gitignore
vendored
|
|
@ -1,9 +1,3 @@
|
|||
# Jekyll (legacy)
|
||||
_site/
|
||||
.sass-cache/
|
||||
.jekyll-cache/
|
||||
.jekyll-metadata
|
||||
|
||||
# Generated data
|
||||
_data/rules.json
|
||||
test/rules/
|
||||
|
|
|
|||
37
CLAUDE.md
|
|
@ -6,17 +6,21 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
|
|||
|
||||
A curated collection of ~940 Prometheus alerting rules covering 90+ services across 100+ exporters, organized in categories: basic resource monitoring (Prometheus, host/hardware, SMART, Docker, Blackbox, Windows, VMware, Netdata), databases (MySQL, PostgreSQL, Redis, MongoDB, Elasticsearch, Cassandra, Clickhouse, CouchDB, etc.), message brokers (RabbitMQ, Kafka, Pulsar, Nats, Zookeeper), proxies/load balancers/service meshes (Nginx, Apache, HaProxy, Traefik, Caddy, Linkerd, Istio), runtimes (PHP-FPM, JVM, Sidekiq), data engineering (Apache Flink, Apache Spark, Hadoop), orchestrators (Kubernetes, Nomad, Consul, Etcd, OpenStack), CI/CD (Jenkins, ArgoCD, FluxCD, GitLab CI, Spinnaker), network and security (SSL/TLS, CoreDNS, Vault, Cloudflare, Cilium, eBPF), storage (Ceph, ZFS, OpenEBS, Minio), cloud providers (AWS, Azure, DigitalOcean), observability (Thanos, Loki, Cortex, OpenTelemetry Collector, Grafana Tempo/Mimir/Alloy, Jaeger), and other (APC UPS, Graph Node).
|
||||
|
||||
All rules are stored in a single YAML data file (`_data/rules.yml`) and rendered as a Jekyll-based GitHub Pages site at https://samber.github.io/awesome-prometheus-alerts. The site provides copy-pasteable Prometheus alert snippets and downloadable rule files per exporter.
|
||||
All rules are stored in a single YAML data file (`_data/rules.yml`) and rendered as a static site built with Astro + TypeScript (located in `site/`). The site provides copy-pasteable Prometheus alert snippets and downloadable rule files per exporter.
|
||||
|
||||
The project is community-driven. Most contributions are PRs adding or updating rules in `_data/rules.yml`. Files in `dist/rules/` are auto-generated on merge — never edit them manually.
|
||||
|
||||
## Architecture
|
||||
|
||||
- **`_data/rules.yml`** — The single source of truth for all alerting rules. This is the main file contributors edit. It is NOT a valid Prometheus config; the site renders each rule into copy-pasteable Prometheus alert format.
|
||||
- **`rules.md`** — Jekyll template that iterates over `_data/rules.yml` and renders the rules page with copy buttons and formatted YAML blocks.
|
||||
- **`alertmanager.md`** — Static page with Prometheus/AlertManager configuration examples.
|
||||
- **`_layouts/default.html`** — Site layout (Jekyll theme: cayman).
|
||||
- **`_config.yml`** — Jekyll configuration.
|
||||
- **`site/`** — Astro + TypeScript static site. Run `npm run dev` inside this directory to develop locally.
|
||||
- **`site/src/data/rules.ts`** — Typed wrappers and helper functions over `_data/rules.yml`.
|
||||
- **`site/src/data/site.ts`** — Shared site metadata constants (URLs, author, schema objects).
|
||||
- **`site/src/pages/`** — Astro page routes: `index.astro` (homepage), `rules/[group]/[service].astro` (per-service rule pages), `alertmanager.astro`, `blackbox-exporter.astro`, `sleep-peacefully.astro` (guides).
|
||||
- **`site/src/layouts/BaseLayout.astro`** — Root HTML layout (SEO, GA, dark mode).
|
||||
- **`site/src/layouts/GuideLayout.astro`** — Layout for guide pages (TOC, hero, related guides).
|
||||
- **`site/src/components/`** — Shared Astro components (Header, Footer, Sidebar, RuleCard, ExporterSection, etc.).
|
||||
- **`site/astro.config.mjs`** — Astro configuration (sitemap, Vite YAML plugin, base URL).
|
||||
- **`dist/rules/`** — Pre-built downloadable rule files organized by service/exporter (referenced in the site for `wget` commands).
|
||||
|
||||
## Rules YAML Structure
|
||||
|
|
@ -50,19 +54,20 @@ Services are grouped in category. If you are not sure about the classification,
|
|||
## Running Locally
|
||||
|
||||
```bash
|
||||
# With Ruby/Bundler
|
||||
gem install bundler
|
||||
bundle install
|
||||
jekyll serve
|
||||
|
||||
# With Docker Compose
|
||||
docker compose up -d
|
||||
|
||||
# With Docker directly
|
||||
docker run --rm -it -p 4000:4000 -v $(pwd):/srv/jekyll jekyll/jekyll jekyll serve
|
||||
cd site
|
||||
npm install
|
||||
npm run dev
|
||||
```
|
||||
|
||||
Site serves at http://localhost:4000/awesome-prometheus-alerts.
|
||||
Site serves at http://localhost:4321/awesome-prometheus-alerts.
|
||||
|
||||
To build for production:
|
||||
|
||||
```bash
|
||||
cd site
|
||||
npm run build
|
||||
npm run preview
|
||||
```
|
||||
|
||||
## Contributing Rules
|
||||
|
||||
|
|
|
|||
|
|
@ -16,24 +16,16 @@ Please ensure your pull request adheres to the following guidelines:
|
|||
- Description must be factual (the "what?") and should provide root cause suggestions (the "why?"), for faster resolution.
|
||||
- Queries must be tested on latest exporter version.
|
||||
|
||||
## Improving Github page
|
||||
## Improving the website
|
||||
|
||||
The site is built with Astro + TypeScript, located in `site/`.
|
||||
|
||||
### Run locally
|
||||
|
||||
```
|
||||
gem install bundler
|
||||
bundle install
|
||||
jekyll serve
|
||||
cd site
|
||||
npm install
|
||||
npm run dev
|
||||
```
|
||||
|
||||
Or with Docker:
|
||||
|
||||
```
|
||||
docker run --rm -it -p 4000:4000 -v $(pwd):/srv/jekyll jekyll/jekyll jekyll serve
|
||||
```
|
||||
|
||||
Or with Docker Compose:
|
||||
|
||||
```
|
||||
docker compose up -d
|
||||
```
|
||||
Site serves at http://localhost:4321/awesome-prometheus-alerts.
|
||||
|
|
|
|||
3
Gemfile
|
|
@ -1,3 +0,0 @@
|
|||
source 'https://rubygems.org'
|
||||
gem 'github-pages', '>= 232', group: :jekyll_plugins
|
||||
gem 'webrick', '~> 1.8'
|
||||
293
Gemfile.lock
|
|
@ -1,293 +0,0 @@
|
|||
GEM
|
||||
remote: https://rubygems.org/
|
||||
specs:
|
||||
activesupport (7.2.3.1)
|
||||
base64
|
||||
benchmark (>= 0.3)
|
||||
bigdecimal
|
||||
concurrent-ruby (~> 1.0, >= 1.3.1)
|
||||
connection_pool (>= 2.2.5)
|
||||
drb
|
||||
i18n (>= 1.6, < 2)
|
||||
logger (>= 1.4.2)
|
||||
minitest (>= 5.1, < 6)
|
||||
securerandom (>= 0.3)
|
||||
tzinfo (~> 2.0, >= 2.0.5)
|
||||
addressable (2.8.9)
|
||||
public_suffix (>= 2.0.2, < 8.0)
|
||||
base64 (0.3.0)
|
||||
benchmark (0.5.0)
|
||||
bigdecimal (4.0.1)
|
||||
coffee-script (2.4.1)
|
||||
coffee-script-source
|
||||
execjs
|
||||
coffee-script-source (1.12.2)
|
||||
colorator (1.1.0)
|
||||
commonmarker (0.23.12)
|
||||
concurrent-ruby (1.3.6)
|
||||
connection_pool (3.0.2)
|
||||
csv (3.3.5)
|
||||
dnsruby (1.73.1)
|
||||
base64 (>= 0.2)
|
||||
logger (~> 1.6)
|
||||
simpleidn (~> 0.2.1)
|
||||
drb (2.2.3)
|
||||
em-websocket (0.5.3)
|
||||
eventmachine (>= 0.12.9)
|
||||
http_parser.rb (~> 0)
|
||||
ethon (0.18.0)
|
||||
ffi (>= 1.15.0)
|
||||
logger
|
||||
eventmachine (1.2.7)
|
||||
execjs (2.10.0)
|
||||
faraday (2.14.1)
|
||||
faraday-net_http (>= 2.0, < 3.5)
|
||||
json
|
||||
logger
|
||||
faraday-net_http (3.4.2)
|
||||
net-http (~> 0.5)
|
||||
ffi (1.17.3)
|
||||
ffi (1.17.3-x86_64-linux-gnu)
|
||||
ffi (1.17.3-x86_64-linux-musl)
|
||||
forwardable-extended (2.6.0)
|
||||
gemoji (4.1.0)
|
||||
github-pages (232)
|
||||
github-pages-health-check (= 1.18.2)
|
||||
jekyll (= 3.10.0)
|
||||
jekyll-avatar (= 0.8.0)
|
||||
jekyll-coffeescript (= 1.2.2)
|
||||
jekyll-commonmark-ghpages (= 0.5.1)
|
||||
jekyll-default-layout (= 0.1.5)
|
||||
jekyll-feed (= 0.17.0)
|
||||
jekyll-gist (= 1.5.0)
|
||||
jekyll-github-metadata (= 2.16.1)
|
||||
jekyll-include-cache (= 0.2.1)
|
||||
jekyll-mentions (= 1.6.0)
|
||||
jekyll-optional-front-matter (= 0.3.2)
|
||||
jekyll-paginate (= 1.1.0)
|
||||
jekyll-readme-index (= 0.3.0)
|
||||
jekyll-redirect-from (= 0.16.0)
|
||||
jekyll-relative-links (= 0.6.1)
|
||||
jekyll-remote-theme (= 0.4.3)
|
||||
jekyll-sass-converter (= 1.5.2)
|
||||
jekyll-seo-tag (= 2.8.0)
|
||||
jekyll-sitemap (= 1.4.0)
|
||||
jekyll-swiss (= 1.0.0)
|
||||
jekyll-theme-architect (= 0.2.0)
|
||||
jekyll-theme-cayman (= 0.2.0)
|
||||
jekyll-theme-dinky (= 0.2.0)
|
||||
jekyll-theme-hacker (= 0.2.0)
|
||||
jekyll-theme-leap-day (= 0.2.0)
|
||||
jekyll-theme-merlot (= 0.2.0)
|
||||
jekyll-theme-midnight (= 0.2.0)
|
||||
jekyll-theme-minimal (= 0.2.0)
|
||||
jekyll-theme-modernist (= 0.2.0)
|
||||
jekyll-theme-primer (= 0.6.0)
|
||||
jekyll-theme-slate (= 0.2.0)
|
||||
jekyll-theme-tactile (= 0.2.0)
|
||||
jekyll-theme-time-machine (= 0.2.0)
|
||||
jekyll-titles-from-headings (= 0.5.3)
|
||||
jemoji (= 0.13.0)
|
||||
kramdown (= 2.4.0)
|
||||
kramdown-parser-gfm (= 1.1.0)
|
||||
liquid (= 4.0.4)
|
||||
mercenary (~> 0.3)
|
||||
minima (= 2.5.1)
|
||||
nokogiri (>= 1.16.2, < 2.0)
|
||||
rouge (= 3.30.0)
|
||||
terminal-table (~> 1.4)
|
||||
webrick (~> 1.8)
|
||||
github-pages-health-check (1.18.2)
|
||||
addressable (~> 2.3)
|
||||
dnsruby (~> 1.60)
|
||||
octokit (>= 4, < 8)
|
||||
public_suffix (>= 3.0, < 6.0)
|
||||
typhoeus (~> 1.3)
|
||||
html-pipeline (2.14.3)
|
||||
activesupport (>= 2)
|
||||
nokogiri (>= 1.4)
|
||||
http_parser.rb (0.8.1)
|
||||
i18n (1.14.8)
|
||||
concurrent-ruby (~> 1.0)
|
||||
jekyll (3.10.0)
|
||||
addressable (~> 2.4)
|
||||
colorator (~> 1.0)
|
||||
csv (~> 3.0)
|
||||
em-websocket (~> 0.5)
|
||||
i18n (>= 0.7, < 2)
|
||||
jekyll-sass-converter (~> 1.0)
|
||||
jekyll-watch (~> 2.0)
|
||||
kramdown (>= 1.17, < 3)
|
||||
liquid (~> 4.0)
|
||||
mercenary (~> 0.3.3)
|
||||
pathutil (~> 0.9)
|
||||
rouge (>= 1.7, < 4)
|
||||
safe_yaml (~> 1.0)
|
||||
webrick (>= 1.0)
|
||||
jekyll-avatar (0.8.0)
|
||||
jekyll (>= 3.0, < 5.0)
|
||||
jekyll-coffeescript (1.2.2)
|
||||
coffee-script (~> 2.2)
|
||||
coffee-script-source (~> 1.12)
|
||||
jekyll-commonmark (1.4.0)
|
||||
commonmarker (~> 0.22)
|
||||
jekyll-commonmark-ghpages (0.5.1)
|
||||
commonmarker (>= 0.23.7, < 1.1.0)
|
||||
jekyll (>= 3.9, < 4.0)
|
||||
jekyll-commonmark (~> 1.4.0)
|
||||
rouge (>= 2.0, < 5.0)
|
||||
jekyll-default-layout (0.1.5)
|
||||
jekyll (>= 3.0, < 5.0)
|
||||
jekyll-feed (0.17.0)
|
||||
jekyll (>= 3.7, < 5.0)
|
||||
jekyll-gist (1.5.0)
|
||||
octokit (~> 4.2)
|
||||
jekyll-github-metadata (2.16.1)
|
||||
jekyll (>= 3.4, < 5.0)
|
||||
octokit (>= 4, < 7, != 4.4.0)
|
||||
jekyll-include-cache (0.2.1)
|
||||
jekyll (>= 3.7, < 5.0)
|
||||
jekyll-mentions (1.6.0)
|
||||
html-pipeline (~> 2.3)
|
||||
jekyll (>= 3.7, < 5.0)
|
||||
jekyll-optional-front-matter (0.3.2)
|
||||
jekyll (>= 3.0, < 5.0)
|
||||
jekyll-paginate (1.1.0)
|
||||
jekyll-readme-index (0.3.0)
|
||||
jekyll (>= 3.0, < 5.0)
|
||||
jekyll-redirect-from (0.16.0)
|
||||
jekyll (>= 3.3, < 5.0)
|
||||
jekyll-relative-links (0.6.1)
|
||||
jekyll (>= 3.3, < 5.0)
|
||||
jekyll-remote-theme (0.4.3)
|
||||
addressable (~> 2.0)
|
||||
jekyll (>= 3.5, < 5.0)
|
||||
jekyll-sass-converter (>= 1.0, <= 3.0.0, != 2.0.0)
|
||||
rubyzip (>= 1.3.0, < 3.0)
|
||||
jekyll-sass-converter (1.5.2)
|
||||
sass (~> 3.4)
|
||||
jekyll-seo-tag (2.8.0)
|
||||
jekyll (>= 3.8, < 5.0)
|
||||
jekyll-sitemap (1.4.0)
|
||||
jekyll (>= 3.7, < 5.0)
|
||||
jekyll-swiss (1.0.0)
|
||||
jekyll-theme-architect (0.2.0)
|
||||
jekyll (> 3.5, < 5.0)
|
||||
jekyll-seo-tag (~> 2.0)
|
||||
jekyll-theme-cayman (0.2.0)
|
||||
jekyll (> 3.5, < 5.0)
|
||||
jekyll-seo-tag (~> 2.0)
|
||||
jekyll-theme-dinky (0.2.0)
|
||||
jekyll (> 3.5, < 5.0)
|
||||
jekyll-seo-tag (~> 2.0)
|
||||
jekyll-theme-hacker (0.2.0)
|
||||
jekyll (> 3.5, < 5.0)
|
||||
jekyll-seo-tag (~> 2.0)
|
||||
jekyll-theme-leap-day (0.2.0)
|
||||
jekyll (> 3.5, < 5.0)
|
||||
jekyll-seo-tag (~> 2.0)
|
||||
jekyll-theme-merlot (0.2.0)
|
||||
jekyll (> 3.5, < 5.0)
|
||||
jekyll-seo-tag (~> 2.0)
|
||||
jekyll-theme-midnight (0.2.0)
|
||||
jekyll (> 3.5, < 5.0)
|
||||
jekyll-seo-tag (~> 2.0)
|
||||
jekyll-theme-minimal (0.2.0)
|
||||
jekyll (> 3.5, < 5.0)
|
||||
jekyll-seo-tag (~> 2.0)
|
||||
jekyll-theme-modernist (0.2.0)
|
||||
jekyll (> 3.5, < 5.0)
|
||||
jekyll-seo-tag (~> 2.0)
|
||||
jekyll-theme-primer (0.6.0)
|
||||
jekyll (> 3.5, < 5.0)
|
||||
jekyll-github-metadata (~> 2.9)
|
||||
jekyll-seo-tag (~> 2.0)
|
||||
jekyll-theme-slate (0.2.0)
|
||||
jekyll (> 3.5, < 5.0)
|
||||
jekyll-seo-tag (~> 2.0)
|
||||
jekyll-theme-tactile (0.2.0)
|
||||
jekyll (> 3.5, < 5.0)
|
||||
jekyll-seo-tag (~> 2.0)
|
||||
jekyll-theme-time-machine (0.2.0)
|
||||
jekyll (> 3.5, < 5.0)
|
||||
jekyll-seo-tag (~> 2.0)
|
||||
jekyll-titles-from-headings (0.5.3)
|
||||
jekyll (>= 3.3, < 5.0)
|
||||
jekyll-watch (2.2.1)
|
||||
listen (~> 3.0)
|
||||
jemoji (0.13.0)
|
||||
gemoji (>= 3, < 5)
|
||||
html-pipeline (~> 2.2)
|
||||
jekyll (>= 3.0, < 5.0)
|
||||
json (2.19.2)
|
||||
kramdown (2.4.0)
|
||||
rexml
|
||||
kramdown-parser-gfm (1.1.0)
|
||||
kramdown (~> 2.0)
|
||||
liquid (4.0.4)
|
||||
listen (3.10.0)
|
||||
logger
|
||||
rb-fsevent (~> 0.10, >= 0.10.3)
|
||||
rb-inotify (~> 0.9, >= 0.9.10)
|
||||
logger (1.7.0)
|
||||
mercenary (0.3.6)
|
||||
mini_portile2 (2.8.9)
|
||||
minima (2.5.1)
|
||||
jekyll (>= 3.5, < 5.0)
|
||||
jekyll-feed (~> 0.9)
|
||||
jekyll-seo-tag (~> 2.1)
|
||||
minitest (5.27.0)
|
||||
net-http (0.9.1)
|
||||
uri (>= 0.11.1)
|
||||
nokogiri (1.19.1)
|
||||
mini_portile2 (~> 2.8.2)
|
||||
racc (~> 1.4)
|
||||
nokogiri (1.19.1-x86_64-linux-gnu)
|
||||
racc (~> 1.4)
|
||||
nokogiri (1.19.1-x86_64-linux-musl)
|
||||
racc (~> 1.4)
|
||||
octokit (4.25.1)
|
||||
faraday (>= 1, < 3)
|
||||
sawyer (~> 0.9)
|
||||
pathutil (0.16.2)
|
||||
forwardable-extended (~> 2.6)
|
||||
public_suffix (5.1.1)
|
||||
racc (1.8.1)
|
||||
rb-fsevent (0.11.2)
|
||||
rb-inotify (0.11.1)
|
||||
ffi (~> 1.0)
|
||||
rexml (3.4.4)
|
||||
rouge (3.30.0)
|
||||
rubyzip (2.4.1)
|
||||
safe_yaml (1.0.5)
|
||||
sass (3.7.4)
|
||||
sass-listen (~> 4.0.0)
|
||||
sass-listen (4.0.0)
|
||||
rb-fsevent (~> 0.9, >= 0.9.4)
|
||||
rb-inotify (~> 0.9, >= 0.9.7)
|
||||
sawyer (0.9.3)
|
||||
addressable (>= 2.3.5)
|
||||
faraday (>= 0.17.3, < 3)
|
||||
securerandom (0.4.1)
|
||||
simpleidn (0.2.3)
|
||||
terminal-table (1.8.0)
|
||||
unicode-display_width (~> 1.1, >= 1.1.1)
|
||||
typhoeus (1.4.1)
|
||||
ethon (>= 0.9.0)
|
||||
tzinfo (2.0.6)
|
||||
concurrent-ruby (~> 1.0)
|
||||
unicode-display_width (1.8.0)
|
||||
uri (1.1.1)
|
||||
webrick (1.9.2)
|
||||
|
||||
PLATFORMS
|
||||
ruby
|
||||
x86_64-linux
|
||||
x86_64-linux-musl
|
||||
|
||||
DEPENDENCIES
|
||||
github-pages (>= 232)
|
||||
webrick (~> 1.8)
|
||||
|
||||
BUNDLED WITH
|
||||
2.3.25
|
||||
|
|
@ -179,7 +179,7 @@ There are many ways to contribute: writing code, alerting rules, documentation,
|
|||
|
||||
## 🏋️ Improvements
|
||||
|
||||
- Create an alert rule builder in Jekyll for custom alerts (severity, thresholds, instances...)
|
||||
- Create an alert rule builder for custom alerts (severity, thresholds, instances...)
|
||||
- Add resolution suggestions to rule descriptions, for faster incident resolution ([#85](https://github.com/samber/awesome-prometheus-alerts/issues/85)).
|
||||
|
||||
## 💫 Show your support
|
||||
|
|
|
|||
|
|
@ -1,8 +0,0 @@
|
|||
theme: jekyll-theme-cayman
|
||||
|
||||
title: Awesome Prometheus alerts
|
||||
description: Collection of alerting rules
|
||||
|
||||
repository: samber/awesome-prometheus-alerts
|
||||
|
||||
baseurl: /awesome-prometheus-alerts
|
||||
|
|
@ -1,162 +0,0 @@
|
|||
<!DOCTYPE html>
|
||||
<html lang="{{ site.lang | default: "en-US" }}">
|
||||
|
||||
<head>
|
||||
<meta charset="UTF-8">
|
||||
{% seo %}
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1">
|
||||
<meta name="theme-color" content="#157878">
|
||||
<meta name="apple-mobile-web-app-status-bar-style" content="black-translucent">
|
||||
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css">
|
||||
<link rel="stylesheet" href="{{ '/assets/css/style.css?v=' | append: site.github.build_revision | relative_url }}">
|
||||
<link rel="stylesheet" href="{{ '/assets/css/app.css?v=' | append: site.github.build_revision | relative_url }}">
|
||||
<link rel="icon" type="image/x-icon" href="{{ '/assets/favicon.ico' | relative_url }}">
|
||||
|
||||
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.4.1/jquery.min.js"></script>
|
||||
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.4.1/js/bootstrap.min.js"></script>
|
||||
<script src="https://cdnjs.cloudflare.com/ajax/libs/clipboard.js/2.0.4/clipboard.min.js"></script>
|
||||
<script src="{{ '/assets/js/app.js?v=' | append: site.github.build_revision | relative_url }}"></script>
|
||||
|
||||
<!-- Global site tag (gtag.js) - Google Analytics -->
|
||||
<script async src="https://www.googletagmanager.com/gtag/js?id=UA-118604063-2"></script>
|
||||
<script>
|
||||
window.dataLayer = window.dataLayer || [];
|
||||
|
||||
function gtag() {
|
||||
dataLayer.push(arguments);
|
||||
}
|
||||
gtag('js', new Date());
|
||||
|
||||
gtag('config', 'UA-118604063-2');
|
||||
</script>
|
||||
|
||||
</head>
|
||||
|
||||
<body>
|
||||
<style>
|
||||
#skip-to-content {
|
||||
height: 1px;
|
||||
width: 1px;
|
||||
position: absolute;
|
||||
overflow: hidden;
|
||||
top: -10px;
|
||||
|
||||
&:focus {
|
||||
position: fixed;
|
||||
top: 10px;
|
||||
left: 10px;
|
||||
height: auto;
|
||||
width: auto;
|
||||
background: invert($body-link-color);
|
||||
outline: thick solid invert($body-link-color);
|
||||
}
|
||||
}
|
||||
|
||||
ul.github-buttons-cta li {
|
||||
display: inline-block;
|
||||
height: 20px;
|
||||
padding: 0px 15px;
|
||||
}
|
||||
|
||||
ul.github-buttons-cta li a {
|
||||
/* width: 100px; */
|
||||
text-decoration: none;
|
||||
}
|
||||
|
||||
.fa {
|
||||
/* padding: 14px;
|
||||
width: 50px;
|
||||
height: 50px; */
|
||||
font-size: 25px;
|
||||
text-align: center;
|
||||
text-decoration: none;
|
||||
border-radius: 50%;
|
||||
}
|
||||
|
||||
.fa:hover {
|
||||
opacity: 0.8;
|
||||
}
|
||||
|
||||
.fa-twitter,
|
||||
.fa-linkedin {
|
||||
/* background: #55ACEE; */
|
||||
color: white;
|
||||
}
|
||||
</style>
|
||||
<a id="skip-to-content" href="#content">Skip to the content.</a>
|
||||
|
||||
<header class="page-header" role="banner">
|
||||
<h1 class="project-name">
|
||||
<a href="{{ '/' | relative_url }}" style="color: white">
|
||||
{{ site.title | default: site.github.repository_name }}
|
||||
</a>
|
||||
</h1>
|
||||
<h2 class="project-tagline">{{ site.description | default: site.github.project_tagline }}</h2>
|
||||
<a href="{{ '/alertmanager' | relative_url }}" class="btn">Global configuration</a>
|
||||
<a href="{{ '/rules' | relative_url }}" class="btn">Rules</a>
|
||||
<a href="{{ '/sleep-peacefully' | relative_url }}" class="btn">Sleep peacefully</a>
|
||||
<a href="{{ '/blackbox-exporter' | relative_url }}" class="btn">Blackbox</a>
|
||||
<a href="https://github.com/samber/awesome-prometheus-alerts/blob/master/CONTRIBUTING.md" class="btn">
|
||||
Contribute on GitHub
|
||||
</a>
|
||||
|
||||
<ul class="github-buttons-cta">
|
||||
<li>
|
||||
<a href="https://github.com/samber/awesome-prometheus-alerts">
|
||||
<img alt="GitHub Repo Watchers" src="https://img.shields.io/github/watchers/samber/awesome-prometheus-alerts?style=social">
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a href="https://github.com/samber/awesome-prometheus-alerts">
|
||||
<img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/samber/awesome-prometheus-alerts?style=social">
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a href="https://github.com/samber/awesome-prometheus-alerts">
|
||||
<img alt="GitHub Repo forks" src="https://img.shields.io/github/forks/samber/awesome-prometheus-alerts?style=social">
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a href="https://twitter.com/share?via=samuelberthe&related=samuelberthe&text=🚨 📊 Here is a collection of Awesome Prometheus Alerts&url=https://samber.github.io/awesome-prometheus-alerts"
|
||||
class="fa fa-twitter" target="_blank"></a>
|
||||
</li>
|
||||
<li>
|
||||
<a href="http://www.linkedin.com/shareArticle?mini=true&url=https://samber.github.io/awesome-prometheus-alerts/"
|
||||
class="fa fa-linkedin" target="_blank"></a>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
|
||||
<ul id="sponsoring">
|
||||
<li>
|
||||
Kindly supported by 👉
|
||||
</li>
|
||||
<li>
|
||||
<a href="https://cast.ai/samuel">
|
||||
<img width="" src="assets/sponsor-cast-ai.png" />
|
||||
</a>
|
||||
</li>
|
||||
<li>
|
||||
<a href="https://betterstack.com/">
|
||||
<img width="" src="assets/sponsor-betterstack.png" />
|
||||
</a>
|
||||
</li>
|
||||
</ul>
|
||||
</header>
|
||||
|
||||
<main id="content" class="main-content" role="main">
|
||||
{{ content }}
|
||||
|
||||
<footer class="site-footer">
|
||||
{% if site.github.is_project_page %}
|
||||
<span class="site-footer-owner">
|
||||
<a href="{{ site.github.repository_url }}">{{ site.title }}</a> is maintained by
|
||||
<a href="{{ site.github.owner_url }}">{{ site.github.owner_name }}</a>.
|
||||
</span>
|
||||
{% endif %}
|
||||
</footer>
|
||||
</main>
|
||||
|
||||
</body>
|
||||
|
||||
</html>
|
||||
141
alertmanager.md
|
|
@ -1,141 +0,0 @@
|
|||
<h1 style="text-align: center;">
|
||||
Global configuration
|
||||
</h1>
|
||||
|
||||
If you notice a delay between an event and the first notification, read the following blog post => [https://pracucci.com/prometheus-understanding-the-delays-on-alerting.html](https://pracucci.com/prometheus-understanding-the-delays-on-alerting.html).
|
||||
|
||||
## Prometheus configuration
|
||||
|
||||
{% highlight yaml %}
|
||||
# prometheus.yml
|
||||
|
||||
global:
|
||||
scrape_interval: 20s
|
||||
|
||||
# A short evaluation_interval will check alerting rules very often.
|
||||
# It can be costly if you run Prometheus with 100+ alerts.
|
||||
evaluation_interval: 20s
|
||||
...
|
||||
|
||||
rule_files:
|
||||
- 'alerts/*.yml'
|
||||
|
||||
scrape_configs:
|
||||
...
|
||||
|
||||
{% endhighlight %}
|
||||
|
||||
{% highlight yaml %}
|
||||
# alerts/example-redis.yml
|
||||
|
||||
groups:
|
||||
|
||||
- name: ExampleRedisGroup
|
||||
rules:
|
||||
- alert: ExampleRedisDown
|
||||
expr: redis_up{} == 0
|
||||
for: 2m
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
summary: "Redis instance down"
|
||||
description: "Whatever"
|
||||
|
||||
{% endhighlight %}
|
||||
|
||||
## AlertManager configuration
|
||||
|
||||
{% highlight yaml %}
|
||||
{% raw %}
|
||||
# alertmanager.yml
|
||||
|
||||
route:
|
||||
# When a new group of alerts is created by an incoming alert, wait at
|
||||
# least 'group_wait' to send the initial notification.
|
||||
# This way ensures that you get multiple alerts for the same group that start
|
||||
# firing shortly after another are batched together on the first
|
||||
# notification.
|
||||
group_wait: 10s
|
||||
|
||||
# When the first notification was sent, wait 'group_interval' to send a batch
|
||||
# of new alerts that started firing for that group.
|
||||
group_interval: 30s
|
||||
|
||||
# If an alert has successfully been sent, wait 'repeat_interval' to
|
||||
# resend them.
|
||||
repeat_interval: 30m
|
||||
|
||||
# A default receiver
|
||||
receiver: "slack"
|
||||
|
||||
# All the above attributes are inherited by all child routes and can
|
||||
# be overwritten on each.
|
||||
routes:
|
||||
- receiver: "slack"
|
||||
group_wait: 10s
|
||||
match_re:
|
||||
severity: critical|warning
|
||||
continue: true
|
||||
|
||||
- receiver: "pager"
|
||||
group_wait: 10s
|
||||
match_re:
|
||||
severity: critical
|
||||
continue: true
|
||||
|
||||
receivers:
|
||||
- name: "slack"
|
||||
slack_configs:
|
||||
- api_url: 'https://hooks.slack.com/services/XXXXXXXXX/XXXXXXXXX/xxxxxxxxxxxxxxxxxxxxxxxxxxx'
|
||||
send_resolved: true
|
||||
channel: 'monitoring'
|
||||
text: "{{ range .Alerts }}<!channel> {{ .Annotations.summary }}\n{{ .Annotations.description }}\n{{ end }}"
|
||||
|
||||
- name: "pager"
|
||||
webhook_configs:
|
||||
- url: http://a.b.c.d:8080/send/sms
|
||||
send_resolved: true
|
||||
|
||||
{% endraw %}
|
||||
{% endhighlight %}
|
||||
|
||||
## Reduce Prometheus server load
|
||||
|
||||
For expensive or frequent PromQL queries, Prometheus lets you precompute results with recording rules.
|
||||
|
||||
{% highlight yaml %}
|
||||
{% raw %}
|
||||
groups:
|
||||
|
||||
# first define the recorded rule
|
||||
- name: ExampleRecordedGroup
|
||||
rules:
|
||||
- record: job:rabbitmq_queue_messages_delivered_total:rate:5m
|
||||
expr: rate(rabbitmq_queue_messages_delivered_total[5m])
|
||||
|
||||
# then use it in alerts
|
||||
- name: ExampleAlertingGroup
|
||||
rules:
|
||||
- alert: ExampleRabbitmqLowMessageDelivery
|
||||
expr: sum(job:rabbitmq_queue_messages_delivered_total:rate:5m) < 10
|
||||
for: 2m
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
summary: "Low delivery rate in Rabbitmq queues"
|
||||
{% endraw %}
|
||||
{% endhighlight %}
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
If a notification takes too long to fire, check the following delays:
|
||||
- `scrape_interval = 20s` (prometheus.yml)
|
||||
- `evaluation_interval = 20s` (prometheus.yml)
|
||||
- `increase(mysql_global_status_slow_queries[1m]) > 0` (alerts/example-mysql.yml)
|
||||
- `for: 5m` (alerts/example-mysql.yml)
|
||||
- `group_wait = 10s` (alertmanager.yml)
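
The delays in the list above add up. As a rough sketch (an approximation, not Prometheus's exact scheduling), the worst case between an event and the first notification can be estimated by summing them. The function name `worst_case_delay_seconds` is hypothetical, and the estimate deliberately ignores the `[1m]` range window of the query and the notification delivery time:

```python
def worst_case_delay_seconds(
    scrape_interval: int,
    evaluation_interval: int,
    for_duration: int,
    group_wait: int,
) -> int:
    """Coarse upper bound on the time between an event and its first notification.

    Assumption: the event happens right after a scrape, and the scrape result
    lands right after a rule evaluation, so each interval is paid in full.
    """
    return scrape_interval + evaluation_interval + for_duration + group_wait


# Values taken from the list above: 20s scrape, 20s evaluation, for: 5m, group_wait: 10s.
delay = worst_case_delay_seconds(
    scrape_interval=20,
    evaluation_interval=20,
    for_duration=5 * 60,
    group_wait=10,
)
print(f"worst-case delay: about {delay} seconds")
```

With these settings the `for: 5m` clause dominates, so shortening the scrape or evaluation interval would change the total only marginally.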
|
||||
|
||||
Also read:
|
||||
- [https://pracucci.com/prometheus-understanding-the-delays-on-alerting.html](https://pracucci.com/prometheus-understanding-the-delays-on-alerting.html).
|
||||
- [https://hodovi.cc/blog/creating-awesome-alertmanager-templates-for-slack/](https://hodovi.cc/blog/creating-awesome-alertmanager-templates-for-slack/)
|
||||
- [https://grafana.com/blog/2024/10/03/how-to-use-prometheus-to-efficiently-detect-anomalies-at-scale/](https://grafana.com/blog/2024/10/03/how-to-use-prometheus-to-efficiently-detect-anomalies-at-scale/)
|
||||
|
|
@ -1,125 +0,0 @@
|
|||
|
||||
<h1 style="text-align: center;">
|
||||
Blackbox exporter
|
||||
</h1>
|
||||
|
||||
## Worldwide probes
|
||||
|
||||
<a href="https://github.com/prometheus/blackbox_exporter" target="_blank">Blackbox Exporter</a> gives you the ability to probe endpoints over HTTP, HTTPS, DNS, TCP and ICMP.
|
||||
|
||||
Deploy Blackbox exporters in multiple points of presence around the globe to monitor latency from each region. Feel free to use the following endpoints for your own projects:
|
||||
|
||||
- https://probe-<b>montreal</b>.cleverapps.io
|
||||
- https://probe-<b>paris</b>.cleverapps.io
|
||||
- https://probe-<b>jeddah</b>.cleverapps.io
|
||||
- https://probe-<b>singapore</b>.cleverapps.io
|
||||
- https://probe-<b>sydney</b>.cleverapps.io
|
||||
- https://probe-<b>warsaw</b>.cleverapps.io
|
||||
|
||||
☝️ Logs have been disabled. More probes from the community would be appreciated, please contribute <a href="https://github.com/samber/awesome-prometheus-alerts/" target="_blank">here</a>! These blackbox exporters use the following <a href="https://github.com/samber/blackbox_exporter/blob/master/samber.yml" target="_blank">configuration</a>.
|
||||
|
||||
## Prometheus Configuration
|
||||
|
||||
Blackbox exporters and endpoints must be declared in Prometheus. Here is a simple configuration, inspired by [Hayk Davtyan's Medium post](https://medium.com/geekculture/single-prometheus-job-for-dozens-of-blackbox-exporters-2a7ba492d6c8):
|
||||
|
||||
```yml
|
||||
# sd/blackbox.yml
|
||||
|
||||
- targets:
|
||||
#
|
||||
# Montreal
|
||||
#
|
||||
# http
|
||||
- probe-montreal.cleverapps.io:_:http_2xx:_:Montreal:_:f229cy:_:https://api.screeb.app
|
||||
- probe-montreal.cleverapps.io:_:http_2xx:_:Montreal:_:f229cy:_:https://t.screeb.app/tag.js
|
||||
# icmp
|
||||
- probe-montreal.cleverapps.io:_:icmp_ipv4:_:Montreal:_:f229cy:_:api.screeb.app
|
||||
- probe-montreal.cleverapps.io:_:icmp_ipv4:_:Montreal:_:f229cy:_:t.screeb.app
|
||||
|
||||
|
||||
#
|
||||
# Paris
|
||||
#
|
||||
# http
|
||||
- probe-paris.cleverapps.io:_:http_2xx:_:Paris:_:u09tgy:_:https://api.screeb.app
|
||||
- probe-paris.cleverapps.io:_:http_2xx:_:Paris:_:u09tgy:_:https://t.screeb.app/tag.js
|
||||
# icmp
|
||||
- probe-paris.cleverapps.io:_:icmp_ipv4:_:Paris:_:u09tgy:_:api.screeb.app
|
||||
- probe-paris.cleverapps.io:_:icmp_ipv4:_:Paris:_:u09tgy:_:t.screeb.app
|
||||
|
||||
|
||||
#
|
||||
# Sydney
|
||||
#
|
||||
# http
|
||||
- probe-sydney.cleverapps.io:_:http_2xx:_:Sydney:_:r3gpkn:_:https://api.screeb.app
|
||||
- probe-sydney.cleverapps.io:_:http_2xx:_:Sydney:_:r3gpkn:_:https://t.screeb.app/tag.js
|
||||
# icmp
|
||||
- probe-sydney.cleverapps.io:_:icmp_ipv4:_:Sydney:_:r3gpkn:_:api.screeb.app
|
||||
- probe-sydney.cleverapps.io:_:icmp_ipv4:_:Sydney:_:r3gpkn:_:t.screeb.app
|
||||
|
||||
# ...
|
||||
```
|
||||
|
||||
```yml
|
||||
# prometheus.yml
|
||||
|
||||
global:
|
||||
# ...
|
||||
|
||||
scrape_configs:
|
||||
|
||||
- job_name: 'blackbox'
|
||||
metrics_path: /probe
|
||||
scrape_interval: 30s
|
||||
scheme: https
|
||||
file_sd_configs:
|
||||
- files:
|
||||
- /etc/prometheus/sd/blackbox.yml
|
||||
relabel_configs:
|
||||
# adds "module" label in the final labelset
|
||||
- source_labels: [__address__]
|
||||
regex: '.*:_:(.*):_:.*:_:.*:_:.*'
|
||||
target_label: module
|
||||
# adds "geohash" label in the final labelset
|
||||
- source_labels: [__address__]
|
||||
regex: '.*:_:.*:_:.*:_:(.*):_:.*'
|
||||
target_label: geohash
|
||||
# rewrites "instance" label with corresponding URL
|
||||
- source_labels: [__address__]
|
||||
regex: '.*:_:.*:_:.*:_:.*:_:(.*)'
|
||||
target_label: instance
|
||||
# rewrites "pop" label with corresponding location name
|
||||
- source_labels: [__address__]
|
||||
regex: '.*:_:.*:_:(.*):_:.*:_:.*'
|
||||
target_label: pop
|
||||
# passes "module" parameter to Blackbox exporter
|
||||
- source_labels: [module]
|
||||
target_label: __param_module
|
||||
# passes "target" parameter to Blackbox exporter
|
||||
- source_labels: [instance]
|
||||
target_label: __param_target
|
||||
# the Blackbox exporter's real hostname:port
|
||||
- source_labels: [__address__]
|
||||
regex: '(.*):_:.*:_:.*:_:.*:_:.*'
|
||||
target_label: __address__
|
||||
|
||||
# ...
|
||||
|
||||
```
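
To see what the `relabel_configs` above extract from a single target string, here is a small Python sketch that applies the same regular expressions. It is an illustration only, not how Prometheus is implemented: the function name `relabel` is hypothetical, and `re.fullmatch` is used to mimic the fact that Prometheus anchors relabeling regexes at both ends.

```python
import re

# (target_label, regex) pairs copied from the relabel_configs above.
EXTRACT_RULES = [
    ("module", r".*:_:(.*):_:.*:_:.*:_:.*"),
    ("geohash", r".*:_:.*:_:.*:_:(.*):_:.*"),
    ("instance", r".*:_:.*:_:.*:_:.*:_:(.*)"),
    ("pop", r".*:_:.*:_:(.*):_:.*:_:.*"),
]
# The last rule keeps only the exporter's real hostname.
ADDRESS_RULE = r"(.*):_:.*:_:.*:_:.*:_:.*"


def relabel(address: str) -> dict[str, str]:
    """Return the labels the relabeling rules would derive from one target."""
    labels = {}
    for target_label, pattern in EXTRACT_RULES:
        match = re.fullmatch(pattern, address)
        if match:
            labels[target_label] = match.group(1)
    # "module" and "instance" are forwarded as URL parameters to the exporter.
    if "module" in labels:
        labels["__param_module"] = labels["module"]
    if "instance" in labels:
        labels["__param_target"] = labels["instance"]
    match = re.fullmatch(ADDRESS_RULE, address)
    if match:
        labels["__address__"] = match.group(1)
    return labels


target = "probe-paris.cleverapps.io:_:http_2xx:_:Paris:_:u09tgy:_:https://api.screeb.app"
for name, value in relabel(target).items():
    print(f"{name} = {value}")
```

The `:_:` separator works because it is unlikely to appear inside a hostname or URL, so each of the five fields can be captured unambiguously.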
|
||||
|
||||
## Geohash
|
||||
|
||||

|
||||
|
||||
To display maps in Grafana, each Blackbox exporter needs a location label. The Grafana map panel uses the "geohash" format:
|
||||
|
||||
- go to Google Maps
- extract the lat/long from the URL
|
||||
- convert lat/long to geohash here: http://geohash.co
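
The conversion in the last step can also be done locally. Below is a minimal sketch of the standard geohash encoding, which interleaves longitude and latitude bits and emits one base-32 character per 5 bits. The function name `geohash_encode` and the sample coordinates are illustrative assumptions and do not come from the probe list above.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # geohash starts with a longitude bit, then alternates
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1  # upper half of the interval
            rng[0] = mid
        else:
            bits = bits << 1  # lower half of the interval
            rng[1] = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


# Illustrative coordinates only.
print(geohash_encode(57.64911, 10.40744))
```

Six characters match the length of the geohashes used in `sd/blackbox.yml` and identify a cell of roughly a kilometer, which is typically enough for a map of probe locations.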
|
||||
|
||||
## Grafana
|
||||
|
||||
Some great dashboards have been created by the community: https://grafana.com/grafana/dashboards/?search=blackbox
|
||||
|
||||
Since Grafana v5.0.0, a map panel is available: https://grafana.com/docs/grafana/latest/panels-visualizations/visualizations/geomap/
|
||||
|
|
@ -1,11 +0,0 @@
|
|||
version: '3'
|
||||
|
||||
services:
|
||||
|
||||
jekyll:
|
||||
image: jekyll/jekyll:latest
|
||||
command: jekyll serve
|
||||
volumes:
|
||||
- ./:/srv/jekyll
|
||||
ports:
|
||||
- 4000:4000
|
||||
54
index.md
|
|
@ -1,54 +0,0 @@
|
|||
|
||||
<style>
|
||||
.center-image
|
||||
{
|
||||
margin: 0 auto;
|
||||
display: block;
|
||||
}
|
||||
</style>
|
||||
|
||||
|
||||
{: .center-image }
|
||||
|
||||
|
||||
<h2>
|
||||
Hello world
|
||||
</h2>
|
||||
|
||||
<a href="/awesome-prometheus-alerts/alertmanager">
|
||||
AlertManager configuration
|
||||
</a>
|
||||
|
||||
<a href="/awesome-prometheus-alerts/sleep-peacefully">
|
||||
Alerting time window
|
||||
</a>
|
||||
|
||||
<h2>
|
||||
Out of the box prometheus alerting rules
|
||||
</h2>
|
||||
|
||||
<ul>
|
||||
{% for group in site.data.rules.groups %}
|
||||
<li style="margin-top: 30px;">
|
||||
{% assign nbrRules = 0 %}
|
||||
{% for service in group.services %}
|
||||
{% for exporter in service.exporters %}
|
||||
{% for rule in exporter.rules %}
|
||||
{% assign nbrRules = nbrRules | plus: 1 %}
|
||||
{% endfor %}
|
||||
{% endfor %}
|
||||
{% endfor %}
|
||||
|
||||
<h3>{{ group.name }} <small style="margin-left: 20px;">({{ nbrRules }} rules)</small></h3>
|
||||
<ul>
|
||||
{% for service in group.services %}
|
||||
<li>
|
||||
<a href="/awesome-prometheus-alerts/rules#{{ service.name | replace: " ", "-" | downcase }}">
|
||||
{{ service.name }}
|
||||
</a>
|
||||
</li>
|
||||
{% endfor %}
|
||||
</ul>
|
||||
</li>
|
||||
{% endfor %}
|
||||
</ul>
|
||||
141
rules.md
|
|
@ -1,141 +0,0 @@
|
|||
<style>
|
||||
ul {
|
||||
list-style: none;
|
||||
}
|
||||
</style>
|
||||
|
||||
<!-- CAUTIONS -->
|
||||
<div style="padding: 20px 20px 10px 20px; border: solid grey 1px; border-radius: 10px;">
|
||||
<h2 style="text-align:center;">⚠️ Caution ⚠️</h2>
|
||||
|
||||
<p style="text-align:center;">
|
||||
Alert thresholds depend on the nature of your applications.
|
||||
<br>
|
||||
Some queries on this page may use arbitrary tolerance thresholds.
|
||||
<br><br>
|
||||
Building an efficient and battle-tested monitoring platform takes time. 😉
|
||||
</p>
|
||||
</div>
|
||||
|
||||
<br>
|
||||
<br>
|
||||
|
||||
<h1></h1>
|
||||
|
||||
<!-- RULES -->
|
||||
<ul>
|
||||
{% for group in site.data.rules.groups %}
|
||||
{% assign groupIndex = forloop.index %}
|
||||
{% for service in group.services %}
|
||||
{% assign serviceIndex = forloop.index %}
|
||||
{% assign nbrExporters = service.exporters | size %}
|
||||
{% for exporter in service.exporters %}
|
||||
{% assign exporterIndex = forloop.index %}
|
||||
{% assign nbrRules = exporter.rules | size %}
|
||||
<li>
|
||||
{% assign serviceId = service.name | replace: " ", "-" | downcase %}
|
||||
<h2 id="{{ serviceId }}">
|
||||
<span id="{{ serviceId }}-{{ exporterIndex }}"></span>
|
||||
<a class="anchor" href="#{{ serviceId }}-{{ exporterIndex }}">#</a>
|
||||
{{ groupIndex }}.{{ serviceIndex }}.{% if nbrExporters > 1 %}{{ exporterIndex }}.{% endif %}
|
||||
{{ service.name }}
|
||||
{% if exporter.name %}:
|
||||
{% if exporter.doc_url %}
|
||||
<a href="{{ exporter.doc_url }}">
|
||||
{{ exporter.name }}
|
||||
</a>
|
||||
{% else %}
|
||||
{{ exporter.name }}
|
||||
{% endif %}
|
||||
{% endif %}
|
||||
|
||||
{% if nbrRules > 0 %}
|
||||
<small style="font-size: 60%; vertical-align: middle; margin-left: 10px;">
|
||||
({{ nbrRules }} rules)
|
||||
</small>
|
||||
<span class="clipboard-multiple" data-clipboard-target-id="group-{{ groupIndex }}-service-{{ serviceIndex }}-exporter-{{ exporterIndex }}">[copy section]</span>
|
||||
{% endif %}
|
||||
</h2>
|
||||
|
||||
{% if nbrRules == 0 %}
|
||||
{% highlight javascript %}
|
||||
// @TODO: Please contribute => https://github.com/samber/awesome-prometheus-alerts 👋
|
||||
{% endhighlight %}
|
||||
{% else %}
|
||||
{{ exporter.comments | strip | newline_to_br }}
|
||||
{% highlight bash %}
|
||||
$ wget https://raw.githubusercontent.com/samber/awesome-prometheus-alerts/refs/heads/master/dist/rules/{{ service.name | replace: " ", "-" | downcase }}/{{ exporter.slug }}.yml
|
||||
{% endhighlight %}
|
||||
{% endif %}
|
||||
|
||||
<ul>
|
||||
{% for rule in exporter.rules %}
|
||||
{% assign ruleIndex = forloop.index %}
|
||||
{% assign comments = rule.comments | strip | newline_to_br | split: '<br />' %}
|
||||
<li>
|
||||
<h4 id="rule-{{ serviceId }}-{{ exporterIndex }}-{{ ruleIndex }}">
|
||||
<span id="rule-{{ serviceId }}-{{ ruleIndex }}"></span><!-- @deprecated -->
|
||||
<a class="anchor" href="#rule-{{ serviceId }}-{{ exporterIndex }}-{{ ruleIndex }}">#</a>
|
||||
{{ groupIndex}}.{{ serviceIndex }}.{% if nbrExporters > 1 %}{{ exporterIndex }}.{% endif %}{{ ruleIndex }}.
|
||||
{{ rule.name }}
|
||||
</h4>
|
||||
<summary>
|
||||
{{ rule.description }}
|
||||
<span class="clipboard-single" data-clipboard-target-id="group-{{ groupIndex }}-service-{{ serviceIndex }}-exporter-{{ exporterIndex }}-rule-{{ ruleIndex }}" onclick="event.preventDefault();">[copy]</span>
|
||||
</summary>
|
||||
<div id="group-{{ groupIndex }}-service-{{ serviceIndex }}-exporter-{{ exporterIndex }}-rule-{{ ruleIndex }}">
|
||||
{% assign ruleName = rule.name | split: ' ' %}
|
||||
{% capture ruleNameCamelcase %}{% for word in ruleName %}{{ word | capitalize }} {% endfor %}{% endcapture %}
|
||||
|
||||
{% highlight yaml %}
|
||||
{% for comment in comments %}# {{ comment | strip }}
|
||||
{% endfor %}- alert: {{ ruleNameCamelcase | remove: ' ' }}
|
||||
expr: {{ rule.query }}
|
||||
for: {% if rule.for %}{{ rule.for }}{% else %}0m{% endif %}
|
||||
labels:
|
||||
severity: {{ rule.severity }}
|
||||
annotations:
|
||||
summary: {{ rule.name }} (instance {% raw %}{{ $labels.instance }}{% endraw %})
|
||||
description: "{{ rule.description | replace: '"', '\"' }}\n VALUE = {% raw %}{{ $value }}{% endraw %}\n LABELS = {% raw %}{{ $labels }}{% endraw %}"
|
||||
|
||||
{% endhighlight %}
|
||||
|
||||
</div>
|
||||
<br/>
|
||||
</li>
|
||||
{% endfor %}
|
||||
</ul>
|
||||
|
||||
<hr/>
|
||||
</li>
|
||||
{% endfor %}
|
||||
{% endfor %}
|
||||
{% endfor %}
|
||||
</ul>
|
||||
|
||||
|
||||
|
||||
<!-- NAVBAR -->
|
||||
<div id="rules-navbar" class="affix">
|
||||
<h3>Menu</h3>
|
||||
<ul>
|
||||
{% for group in site.data.rules.groups %}
|
||||
<li>
|
||||
<h4>{{ group.name }}</h4>
|
||||
<ul>
|
||||
{% for service in group.services %}
|
||||
<li>
|
||||
<a href="#{{ service.name | replace: " ", "-" | downcase }}">
|
||||
👉 {{ service.name }}
|
||||
</a>
|
||||
</li>
|
||||
{% endfor %}
|
||||
</ul>
|
||||
</li>
|
||||
{% endfor %}
|
||||
</ul>
|
||||
|
||||
<script>
|
||||
$('#rules-navbar').affix({offset: {top: 750} }).css('display', 'block');
|
||||
</script>
|
||||
</div>
|
||||
|
|
@ -1,106 +0,0 @@
|
|||
<h1 style="text-align: center;">
|
||||
Sleep Peacefully
|
||||
</h1>
|
||||
|
||||
## Alerting time window
|
||||
|
||||
In some applications, load and activity can vary over the day/week/year.
|
||||
|
||||
To prevent alarm fatigue and a noisy pager, alerts can be disabled during certain periods (such as nights or weekends).
|
||||
|
||||
Examples:
|
||||
|
||||
- Weekday: `node_load5 > 10 and ON() (0 < day_of_week() < 6)`
|
||||
- Day time: `node_load5 > 10 and ON() (8 < hour() < 18)`
|
||||
- Exclude December: `node_load5 > 10 and ON() (month() != 12)`
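
To make the boundaries of the expressions above explicit, here is a small sketch that mirrors them in Python. It assumes UTC timestamps (as PromQL's `day_of_week()` and `hour()` do) and Sunday counted as `0`; the function names are hypothetical and only serve the illustration.

```python
from datetime import datetime, timezone


def prom_day_of_week(ts: datetime) -> int:
    """Day of week in PromQL convention: 0 = Sunday ... 6 = Saturday."""
    return ts.isoweekday() % 7


def is_weekday(ts: datetime) -> bool:
    # Mirrors: 0 < day_of_week() < 6  (Monday through Friday)
    return 0 < prom_day_of_week(ts) < 6


def is_day_time(ts: datetime) -> bool:
    # Mirrors: 8 < hour() < 18  (hours 9 through 17, both bounds exclusive)
    return 8 < ts.hour < 18


def is_not_december(ts: datetime) -> bool:
    # Mirrors: month() != 12
    return ts.month != 12


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(is_weekday(now), is_day_time(now), is_not_december(now))
```

Note that the strict comparisons exclude the bounds: with `8 < hour() < 18`, an alert evaluated at 08:30 UTC or 18:05 UTC is suppressed.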
|
||||
|
||||
## Advanced time windows and timezones
|
||||
|
||||
```yml
|
||||
# rules.yml
|
||||
|
||||
groups:
|
||||
- name: timezones
|
||||
rules:
|
||||
- record: european_summer_time_offset
|
||||
expr: |
|
||||
(vector(1) and (month() > 3 and month() < 10))
|
||||
or
|
||||
(vector(1) and (month() == 3 and (day_of_month() - day_of_week()) >= 25) and absent((day_of_month() >= 25) and (day_of_week() == 0)))
|
||||
or
|
||||
(vector(1) and (month() == 10 and (day_of_month() - day_of_week()) < 25) and absent((day_of_month() >= 25) and (day_of_week() == 0)))
|
||||
or
|
||||
(vector(1) and ((month() == 10 and hour() < 1) or (month() == 3 and hour() > 0)) and ((day_of_month() >= 25) and (day_of_week() == 0)))
|
||||
or
|
||||
vector(0)
|
||||
|
||||
- record: europe_london_time
|
||||
expr: time() + 3600 * european_summer_time_offset
|
||||
- record: europe_paris_time
|
||||
expr: time() + 3600 * (1 + european_summer_time_offset)
|
||||
|
||||
- record: europe_london_hour
|
||||
expr: hour(europe_london_time)
|
||||
- record: europe_paris_hour
|
||||
expr: hour(europe_paris_time)
|
||||
|
||||
- record: europe_london_weekday
|
||||
expr: 0 < day_of_week(europe_london_time) < 6
|
||||
- record: europe_paris_weekday
|
||||
expr: 0 < day_of_week(europe_paris_time) < 6
|
||||
# opposite
|
||||
- record: not_europe_london_weekday
|
||||
expr: absent(europe_london_weekday)
|
||||
- record: not_europe_paris_weekday
|
||||
expr: absent(europe_paris_weekday)
|
||||
|
||||
- record: europe_london_business_hours
|
||||
expr: 9 <= europe_london_hour < 18
|
||||
- record: europe_paris_business_hours
|
||||
expr: 9 <= europe_paris_hour < 18
|
||||
# opposite
|
||||
- record: not_europe_london_business_hours
|
||||
expr: absent(europe_london_business_hours)
|
||||
- record: not_europe_paris_business_hours
|
||||
expr: absent(europe_paris_business_hours)
|
||||
|
||||
# new year's day / xmas / labor day / all saints' day / ...
|
||||
- record: europe_french_public_holidays
|
||||
expr: |
|
||||
(vector(1) and month(europe_paris_time) == 1 and day_of_month(europe_paris_time) == 1)
|
||||
or
|
||||
(vector(1) and month(europe_paris_time) == 12 and day_of_month(europe_paris_time) == 25)
|
||||
or
|
||||
(vector(1) and month(europe_paris_time) == 5 and day_of_month(europe_paris_time) == 1)
|
||||
or
|
||||
(vector(1) and month(europe_paris_time) == 11 and day_of_month(europe_paris_time) == 1)
|
||||
or
|
||||
vector(0)
|
||||
# opposite
|
||||
- record: not_europe_french_public_holidays
|
||||
expr: absent(europe_french_public_holidays)
|
||||
```
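
The recording rules above are hard to verify by eye. As a cross-check, the sketch below computes the same kind of offset outside Prometheus, assuming the current EU rule (summer time runs from 01:00 UTC on the last Sunday of March to 01:00 UTC on the last Sunday of October). It is an independent illustration, not a translation of the PromQL expression, and the helper names are hypothetical.

```python
import calendar
from datetime import datetime, timedelta, timezone


def last_sunday_0100(year: int, month: int) -> datetime:
    """01:00 UTC on the last Sunday of the given month."""
    last_day = calendar.monthrange(year, month)[1]
    d = datetime(year, month, last_day, 1, 0, tzinfo=timezone.utc)
    # weekday(): Monday = 0 ... Sunday = 6; step back to the previous Sunday.
    return d - timedelta(days=(d.weekday() - 6) % 7)


def european_summer_time_offset(ts: datetime) -> int:
    """1 while European summer time is in effect, else 0 (ts must be UTC)."""
    start = last_sunday_0100(ts.year, 3)
    end = last_sunday_0100(ts.year, 10)
    return 1 if start <= ts < end else 0


def europe_paris_hour(ts: datetime) -> int:
    """Local hour in Paris: UTC+1 in winter, UTC+2 in summer."""
    offset_hours = 1 + european_summer_time_offset(ts)
    return (ts + timedelta(hours=offset_hours)).hour


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(european_summer_time_offset(now), europe_paris_hour(now))
```

Comparing the output of such a script with the recorded `european_summer_time_offset` series around the March and October transitions is one way to gain confidence in the rule before relying on it for paging.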
|
||||
|
||||
```yml
|
||||
# alerts.yml
|
||||
|
||||
groups:
|
||||
- name: CPU Load
|
||||
rules:
|
||||
- alert: HighLoadQuietDuringWeekendAndNight
|
||||
expr: node_load5 > 10 and ON() (europe_london_weekday and europe_paris_weekday)
|
||||
|
||||
- alert: HighLoadQuietDuringBackup
|
||||
expr: node_load5 > 10 and ON() absent(hour() == 2)
|
||||
|
||||
- alert: HighLoad
|
||||
expr: |
|
||||
node_load5 > 20 and ON() (europe_london_weekday and europe_paris_weekday)
|
||||
or
|
||||
node_load5 > 10
|
||||
```
|
||||
|
||||
## Sources
|
||||
|
||||
- [https://medium.com/@tom.fawcett/time-of-day-based-notifications-with-prometheus-and-alertmanager-1bf7a23b7695](https://medium.com/@tom.fawcett/time-of-day-based-notifications-with-prometheus-and-alertmanager-1bf7a23b7695)
|
||||
- [https://promcon.io/2019-munich/slides/improved-alerting-with-prometheus-and-alertmanager.pdf](https://promcon.io/2019-munich/slides/improved-alerting-with-prometheus-and-alertmanager.pdf)
|
||||