From 9b995315d58ecc864866c917c506b878c8737761 Mon Sep 17 00:00:00 2001
From: Samuel Berthe
Date: Tue, 7 Apr 2026 17:14:38 +0200
Subject: [PATCH] refactor: remove previous website

---
 .gitignore            |   6 -
 CLAUDE.md             |  37 +++---
 CONTRIBUTING.md       |  22 +---
 Gemfile               |   3 -
 Gemfile.lock          | 293 ------------------------------------------
 README.md             |   2 +-
 _config.yml           |   8 --
 _layouts/default.html | 162 -----------------------
 alertmanager.md       | 141 --------------------
 blackbox-exporter.md  | 125 ------------------
 docker-compose.yml    |  11 --
 index.md              |  54 --------
 rules.md              | 141 --------------------
 sleep-peacefully.md   | 106 ---------------
 14 files changed, 29 insertions(+), 1082 deletions(-)
 delete mode 100644 Gemfile
 delete mode 100644 Gemfile.lock
 delete mode 100644 _config.yml
 delete mode 100644 _layouts/default.html
 delete mode 100644 alertmanager.md
 delete mode 100644 blackbox-exporter.md
 delete mode 100644 docker-compose.yml
 delete mode 100644 index.md
 delete mode 100644 rules.md
 delete mode 100644 sleep-peacefully.md

diff --git a/.gitignore b/.gitignore
index 0b68d8b..b70799c 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,9 +1,3 @@
-# Jekyll (legacy)
-_site/
-.sass-cache/
-.jekyll-cache/
-.jekyll-metadata
-
 # Generated data
 _data/rules.json
 test/rules/
diff --git a/CLAUDE.md b/CLAUDE.md
index fd45a8f..0e67759 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -6,17 +6,21 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 A curated collection of ~940 Prometheus alerting rules covering 90+ services across 100+ exporters, organized in categories: basic resource monitoring (Prometheus, host/hardware, SMART, Docker, Blackbox, Windows, VMware, Netdata), databases (MySQL, PostgreSQL, Redis, MongoDB, Elasticsearch, Cassandra, Clickhouse, CouchDB, etc.), message brokers (RabbitMQ, Kafka, Pulsar, Nats, Zookeeper), proxies/load balancers/service meshes (Nginx, Apache, HaProxy, Traefik, Caddy, Linkerd, Istio), runtimes (PHP-FPM, JVM, Sidekiq), data engineering (Apache Flink, Apache Spark, Hadoop), orchestrators (Kubernetes, Nomad, Consul, Etcd, OpenStack), CI/CD (Jenkins, ArgoCD, FluxCD, GitLab CI, Spinnaker), network and security (SSL/TLS, CoreDNS, Vault, Cloudflare, Cilium, eBPF), storage (Ceph, ZFS, OpenEBS, Minio), cloud providers (AWS, Azure, DigitalOcean), observability (Thanos, Loki, Cortex, OpenTelemetry Collector, Grafana Tempo/Mimir/Alloy, Jaeger), and other (APC UPS, Graph Node).
 
-All rules are stored in a single YAML data file (`_data/rules.yml`) and rendered as a Jekyll-based GitHub Pages site at https://samber.github.io/awesome-prometheus-alerts. The site provides copy-pasteable Prometheus alert snippets and downloadable rule files per exporter.
+All rules are stored in a single YAML data file (`_data/rules.yml`) and rendered as a static site built with Astro + TypeScript (located in `site/`). The site provides copy-pasteable Prometheus alert snippets and downloadable rule files per exporter.
 
 The project is community-driven. Most contributions are PRs adding or updating rules in `_data/rules.yml`. Files in `dist/rules/` are auto-generated on merge — never edit them manually.
 
 ## Architecture
 
 - **`_data/rules.yml`** — The single source of truth for all alerting rules. This is the main file contributors edit. It is NOT a valid Prometheus config; the site renders each rule into copy-pasteable Prometheus alert format.
-- **`rules.md`** — Jekyll template that iterates over `_data/rules.yml` and renders the rules page with copy buttons and formatted YAML blocks.
-- **`alertmanager.md`** — Static page with Prometheus/AlertManager configuration examples.
-- **`_layouts/default.html`** — Site layout (Jekyll theme: cayman).
-- **`_config.yml`** — Jekyll configuration.
+- **`site/`** — Astro + TypeScript static site. Run `npm run dev` inside this directory to develop locally.
+- **`site/src/data/rules.ts`** — Typed wrappers and helper functions over `_data/rules.yml`.
+- **`site/src/data/site.ts`** — Shared site metadata constants (URLs, author, schema objects).
+- **`site/src/pages/`** — Astro page routes: `index.astro` (homepage), `rules/[group]/[service].astro` (per-service rule pages), `alertmanager.astro`, `blackbox-exporter.astro`, `sleep-peacefully.astro` (guides).
+- **`site/src/layouts/BaseLayout.astro`** — Root HTML layout (SEO, GA, dark mode).
+- **`site/src/layouts/GuideLayout.astro`** — Layout for guide pages (TOC, hero, related guides).
+- **`site/src/components/`** — Shared Astro components (Header, Footer, Sidebar, RuleCard, ExporterSection, etc.).
+- **`site/astro.config.mjs`** — Astro configuration (sitemap, Vite YAML plugin, base URL).
 - **`dist/rules/`** — Pre-built downloadable rule files organized by service/exporter (referenced in the site for `wget` commands).
 
 ## Rules YAML Structure
@@ -50,19 +54,20 @@ Services are grouped in category. If you are not sure about the classification,
 ## Running Locally
 
 ```bash
-# With Ruby/Bundler
-gem install bundler
-bundle install
-jekyll serve
-
-# With Docker Compose
-docker compose up -d
-
-# With Docker directly
-docker run --rm -it -p 4000:4000 -v $(pwd):/srv/jekyll jekyll/jekyll jekyll serve
+cd site
+npm install
+npm run dev
 ```
 
-Site serves at http://localhost:4000/awesome-prometheus-alerts.
+Site serves at http://localhost:4321/awesome-prometheus-alerts.
+
+To build for production:
+
+```bash
+cd site
+npm run build
+npm run preview
+```
 
 ## Contributing Rules
 
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 816a59f..7974f64 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -16,24 +16,16 @@ Please ensure your pull request adheres to the following guidelines:
 - Description must be factual (the "what?") and should provide root cause suggestions (the "why?"), for faster resolution.
 - Queries must be tested on latest exporter version.
 
-## Improving Github page
+## Improving the website
+
+The site is built with Astro + TypeScript, located in `site/`.
 
 ### Run locally
 
 ```
-gem install bundler
-bundle install
-jekyll serve
+cd site
+npm install
+npm run dev
 ```
 
-Or with Docker:
-
-```
-docker run --rm -it -p 4000:4000 -v $(pwd):/srv/jekyll jekyll/jekyll jekyll serve
-```
-
-Or with Docker Compose:
-
-```
-docker compose up -d
-```
+Site serves at http://localhost:4321/awesome-prometheus-alerts.
diff --git a/Gemfile b/Gemfile
deleted file mode 100644
index cddfa60..0000000
--- a/Gemfile
+++ /dev/null
@@ -1,3 +0,0 @@
-source 'https://rubygems.org'
-gem 'github-pages', '>= 232', group: :jekyll_plugins
-gem 'webrick', '~> 1.8'
\ No newline at end of file
diff --git a/Gemfile.lock b/Gemfile.lock
deleted file mode 100644
index 044b68a..0000000
--- a/Gemfile.lock
+++ /dev/null
@@ -1,293 +0,0 @@
-GEM
-  remote: https://rubygems.org/
-  specs:
-    activesupport (7.2.3.1)
-      base64
-      benchmark (>= 0.3)
-      bigdecimal
-      concurrent-ruby (~> 1.0, >= 1.3.1)
-      connection_pool (>= 2.2.5)
-      drb
-      i18n (>= 1.6, < 2)
-      logger (>= 1.4.2)
-      minitest (>= 5.1, < 6)
-      securerandom (>= 0.3)
-      tzinfo (~> 2.0, >= 2.0.5)
-    addressable (2.8.9)
-      public_suffix (>= 2.0.2, < 8.0)
-    base64 (0.3.0)
-    benchmark (0.5.0)
-    bigdecimal (4.0.1)
-    coffee-script (2.4.1)
-      coffee-script-source
-      execjs
-    coffee-script-source (1.12.2)
-    colorator (1.1.0)
-    commonmarker (0.23.12)
-    concurrent-ruby (1.3.6)
-    connection_pool (3.0.2)
-    csv (3.3.5)
-    dnsruby (1.73.1)
-      base64 (>= 0.2)
-      logger (~> 1.6)
-      simpleidn (~> 0.2.1)
-    drb (2.2.3)
-    em-websocket (0.5.3)
-      eventmachine (>= 0.12.9)
-      http_parser.rb (~> 0)
-    ethon (0.18.0)
-      ffi (>= 1.15.0)
-      logger
-    eventmachine (1.2.7)
-    execjs (2.10.0)
-    faraday (2.14.1)
-      faraday-net_http (>= 2.0, < 3.5)
-      json
-      logger
-    faraday-net_http (3.4.2)
-      net-http (~> 0.5)
-    ffi (1.17.3)
-    ffi (1.17.3-x86_64-linux-gnu)
-    ffi (1.17.3-x86_64-linux-musl)
-    forwardable-extended (2.6.0)
-    gemoji (4.1.0)
-    github-pages (232)
-      github-pages-health-check (= 1.18.2)
-      jekyll (= 3.10.0)
-      jekyll-avatar (= 0.8.0)
-      jekyll-coffeescript (= 1.2.2)
-      jekyll-commonmark-ghpages (= 0.5.1)
-      jekyll-default-layout (= 0.1.5)
-      jekyll-feed (= 0.17.0)
-      jekyll-gist (= 1.5.0)
-      jekyll-github-metadata (= 2.16.1)
-      jekyll-include-cache (= 0.2.1)
-      jekyll-mentions (= 1.6.0)
-      jekyll-optional-front-matter (= 0.3.2)
-      jekyll-paginate (= 1.1.0)
-      jekyll-readme-index (= 0.3.0)
-      jekyll-redirect-from (= 0.16.0)
-      jekyll-relative-links (= 0.6.1)
-      jekyll-remote-theme (= 0.4.3)
-      jekyll-sass-converter (= 1.5.2)
-      jekyll-seo-tag (= 2.8.0)
-      jekyll-sitemap (= 1.4.0)
-      jekyll-swiss (= 1.0.0)
-      jekyll-theme-architect (= 0.2.0)
-      jekyll-theme-cayman (= 0.2.0)
-      jekyll-theme-dinky (= 0.2.0)
-      jekyll-theme-hacker (= 0.2.0)
-      jekyll-theme-leap-day (= 0.2.0)
-      jekyll-theme-merlot (= 0.2.0)
-      jekyll-theme-midnight (= 0.2.0)
-      jekyll-theme-minimal (= 0.2.0)
-      jekyll-theme-modernist (= 0.2.0)
-      jekyll-theme-primer (= 0.6.0)
-      jekyll-theme-slate (= 0.2.0)
-      jekyll-theme-tactile (= 0.2.0)
-      jekyll-theme-time-machine (= 0.2.0)
-      jekyll-titles-from-headings (= 0.5.3)
-      jemoji (= 0.13.0)
-      kramdown (= 2.4.0)
-      kramdown-parser-gfm (= 1.1.0)
-      liquid (= 4.0.4)
-      mercenary (~> 0.3)
-      minima (= 2.5.1)
-      nokogiri (>= 1.16.2, < 2.0)
-      rouge (= 3.30.0)
-      terminal-table (~> 1.4)
-      webrick (~> 1.8)
-    github-pages-health-check (1.18.2)
-      addressable (~> 2.3)
-      dnsruby (~> 1.60)
-      octokit (>= 4, < 8)
-      public_suffix (>= 3.0, < 6.0)
-      typhoeus (~> 1.3)
-    html-pipeline (2.14.3)
-      activesupport (>= 2)
-      nokogiri (>= 1.4)
-    http_parser.rb (0.8.1)
-    i18n (1.14.8)
-      concurrent-ruby (~> 1.0)
-    jekyll (3.10.0)
-      addressable (~> 2.4)
-      colorator (~> 1.0)
-      csv (~> 3.0)
-      em-websocket (~> 0.5)
-      i18n (>= 0.7, < 2)
-      jekyll-sass-converter (~> 1.0)
-      jekyll-watch (~> 2.0)
-      kramdown (>= 1.17, < 3)
-      liquid (~> 4.0)
-      mercenary (~> 0.3.3)
-      pathutil (~> 0.9)
-      rouge (>= 1.7, < 4)
-      safe_yaml (~> 1.0)
-      webrick (>= 1.0)
-    jekyll-avatar (0.8.0)
-      jekyll (>= 3.0, < 5.0)
-    jekyll-coffeescript (1.2.2)
-      coffee-script (~> 2.2)
-      coffee-script-source (~> 1.12)
-    jekyll-commonmark (1.4.0)
-      commonmarker (~> 0.22)
-    jekyll-commonmark-ghpages (0.5.1)
-      commonmarker (>= 0.23.7, < 1.1.0)
-      jekyll (>= 3.9, < 4.0)
-      jekyll-commonmark (~> 1.4.0)
-      rouge (>= 2.0, < 5.0)
-    jekyll-default-layout (0.1.5)
-      jekyll (>= 3.0, < 5.0)
-    jekyll-feed (0.17.0)
-      jekyll (>= 3.7, < 5.0)
-    jekyll-gist (1.5.0)
-      octokit (~> 4.2)
-    jekyll-github-metadata (2.16.1)
-      jekyll (>= 3.4, < 5.0)
-      octokit (>= 4, < 7, != 4.4.0)
-    jekyll-include-cache (0.2.1)
-      jekyll (>= 3.7, < 5.0)
-    jekyll-mentions (1.6.0)
-      html-pipeline (~> 2.3)
-      jekyll (>= 3.7, < 5.0)
-    jekyll-optional-front-matter (0.3.2)
-      jekyll (>= 3.0, < 5.0)
-    jekyll-paginate (1.1.0)
-    jekyll-readme-index (0.3.0)
-      jekyll (>= 3.0, < 5.0)
-    jekyll-redirect-from (0.16.0)
-      jekyll (>= 3.3, < 5.0)
-    jekyll-relative-links (0.6.1)
-      jekyll (>= 3.3, < 5.0)
-    jekyll-remote-theme (0.4.3)
-      addressable (~> 2.0)
-      jekyll (>= 3.5, < 5.0)
-      jekyll-sass-converter (>= 1.0, <= 3.0.0, != 2.0.0)
-      rubyzip (>= 1.3.0, < 3.0)
-    jekyll-sass-converter (1.5.2)
-      sass (~> 3.4)
-    jekyll-seo-tag (2.8.0)
-      jekyll (>= 3.8, < 5.0)
-    jekyll-sitemap (1.4.0)
-      jekyll (>= 3.7, < 5.0)
-    jekyll-swiss (1.0.0)
-    jekyll-theme-architect (0.2.0)
-      jekyll (> 3.5, < 5.0)
-      jekyll-seo-tag (~> 2.0)
-    jekyll-theme-cayman (0.2.0)
-      jekyll (> 3.5, < 5.0)
-      jekyll-seo-tag (~> 2.0)
-    jekyll-theme-dinky (0.2.0)
-      jekyll (> 3.5, < 5.0)
-      jekyll-seo-tag (~> 2.0)
-    jekyll-theme-hacker (0.2.0)
-      jekyll (> 3.5, < 5.0)
-      jekyll-seo-tag (~> 2.0)
-    jekyll-theme-leap-day (0.2.0)
-      jekyll (> 3.5, < 5.0)
-      jekyll-seo-tag (~> 2.0)
-    jekyll-theme-merlot (0.2.0)
-      jekyll (> 3.5, < 5.0)
-      jekyll-seo-tag (~> 2.0)
-    jekyll-theme-midnight (0.2.0)
-      jekyll (> 3.5, < 5.0)
-      jekyll-seo-tag (~> 2.0)
-    jekyll-theme-minimal (0.2.0)
-      jekyll (> 3.5, < 5.0)
-      jekyll-seo-tag (~> 2.0)
-    jekyll-theme-modernist (0.2.0)
-      jekyll (> 3.5, < 5.0)
-      jekyll-seo-tag (~> 2.0)
-    jekyll-theme-primer (0.6.0)
-      jekyll (> 3.5, < 5.0)
-      jekyll-github-metadata (~> 2.9)
-      jekyll-seo-tag (~> 2.0)
-    jekyll-theme-slate (0.2.0)
-      jekyll (> 3.5, < 5.0)
-      jekyll-seo-tag (~> 2.0)
-    jekyll-theme-tactile (0.2.0)
-      jekyll (> 3.5, < 5.0)
-      jekyll-seo-tag (~> 2.0)
-    jekyll-theme-time-machine (0.2.0)
-      jekyll (> 3.5, < 5.0)
-      jekyll-seo-tag (~> 2.0)
-    jekyll-titles-from-headings (0.5.3)
-      jekyll (>= 3.3, < 5.0)
-    jekyll-watch (2.2.1)
-      listen (~> 3.0)
-    jemoji (0.13.0)
-      gemoji (>= 3, < 5)
-      html-pipeline (~> 2.2)
-      jekyll (>= 3.0, < 5.0)
-    json (2.19.2)
-    kramdown (2.4.0)
-      rexml
-    kramdown-parser-gfm (1.1.0)
-      kramdown (~> 2.0)
-    liquid (4.0.4)
-    listen (3.10.0)
-      logger
-      rb-fsevent (~> 0.10, >= 0.10.3)
-      rb-inotify (~> 0.9, >= 0.9.10)
-    logger (1.7.0)
-    mercenary (0.3.6)
-    mini_portile2 (2.8.9)
-    minima (2.5.1)
-      jekyll (>= 3.5, < 5.0)
-      jekyll-feed (~> 0.9)
-      jekyll-seo-tag (~> 2.1)
-    minitest (5.27.0)
-    net-http (0.9.1)
-      uri (>= 0.11.1)
-    nokogiri (1.19.1)
-      mini_portile2 (~> 2.8.2)
-      racc (~> 1.4)
-    nokogiri (1.19.1-x86_64-linux-gnu)
-      racc (~> 1.4)
-    nokogiri (1.19.1-x86_64-linux-musl)
-      racc (~> 1.4)
-    octokit (4.25.1)
-      faraday (>= 1, < 3)
-      sawyer (~> 0.9)
-    pathutil (0.16.2)
-      forwardable-extended (~> 2.6)
-    public_suffix (5.1.1)
-    racc (1.8.1)
-    rb-fsevent (0.11.2)
-    rb-inotify (0.11.1)
-      ffi (~> 1.0)
-    rexml (3.4.4)
-    rouge (3.30.0)
-    rubyzip (2.4.1)
-    safe_yaml (1.0.5)
-    sass (3.7.4)
-      sass-listen (~> 4.0.0)
-    sass-listen (4.0.0)
-      rb-fsevent (~> 0.9, >= 0.9.4)
-      rb-inotify (~> 0.9, >= 0.9.7)
-    sawyer (0.9.3)
-      addressable (>= 2.3.5)
-      faraday (>= 0.17.3, < 3)
-    securerandom (0.4.1)
-    simpleidn (0.2.3)
-    terminal-table (1.8.0)
-      unicode-display_width (~> 1.1, >= 1.1.1)
-    typhoeus (1.4.1)
-      ethon (>= 0.9.0)
-    tzinfo (2.0.6)
-      concurrent-ruby (~> 1.0)
-    unicode-display_width (1.8.0)
-    uri (1.1.1)
-    webrick (1.9.2)
-
-PLATFORMS
-  ruby
-  x86_64-linux
-  x86_64-linux-musl
-
-DEPENDENCIES
-  github-pages (>= 232)
-  webrick (~> 1.8)
-
-BUNDLED WITH
-   2.3.25
diff --git a/README.md b/README.md
index d1c0528..c2fbcc4 100644
--- a/README.md
+++ b/README.md
@@ -179,7 +179,7 @@ There are many ways to contribute: writing code, alerting rules, documentation,
 
 ## 🏋️ Improvements
 
-- Create an alert rule builder in Jekyll for custom alerts (severity, thresholds, instances...)
+- Create an alert rule builder for custom alerts (severity, thresholds, instances...)
 - Add resolution suggestions to rule descriptions, for faster incident resolution ([#85](https://github.com/samber/awesome-prometheus-alerts/issues/85)).
 
 ## 💫 Show your support
diff --git a/_config.yml b/_config.yml
deleted file mode 100644
index fb8b54f..0000000
--- a/_config.yml
+++ /dev/null
@@ -1,8 +0,0 @@
-theme: jekyll-theme-cayman
-
-title: Awesome Prometheus alerts
-description: Collection of alerting rules
-
-repository: samber/awesome-prometheus-alerts
-
-baseurl: /awesome-prometheus-alerts
diff --git a/_layouts/default.html b/_layouts/default.html
deleted file mode 100644
index 2383368..0000000
--- a/_layouts/default.html
+++ /dev/null
@@ -1,162 +0,0 @@
- {% seo %}
- Skip to the content.
- {{ content }}
diff --git a/alertmanager.md b/alertmanager.md
deleted file mode 100644
index d350945..0000000
--- a/alertmanager.md
+++ /dev/null
@@ -1,141 +0,0 @@
- Global configuration
-
-If you notice a delay between an event and the first notification, read the following blog post => [https://pracucci.com/prometheus-understanding-the-delays-on-alerting.html](https://pracucci.com/prometheus-understanding-the-delays-on-alerting.html).
-
-## Prometheus configuration
-
-{% highlight yaml %}
-# prometheus.yml
-
-global:
-  scrape_interval: 20s
-
-  # A short evaluation_interval will check alerting rules very often.
-  # It can be costly if you run Prometheus with 100+ alerts.
-  evaluation_interval: 20s
-  ...
-
-rule_files:
-  - 'alerts/*.yml'
-
-scrape_configs:
-  ...
-
-{% endhighlight %}
-
-{% highlight yaml %}
-# alerts/example-redis.yml
-
-groups:
-
-- name: ExampleRedisGroup
-  rules:
-  - alert: ExampleRedisDown
-    expr: redis_up{} == 0
-    for: 2m
-    labels:
-      severity: critical
-    annotations:
-      summary: "Redis instance down"
-      description: "Whatever"
-
-{% endhighlight %}
-
-## AlertManager configuration
-
-{% highlight yaml %}
-{% raw %}
-# alertmanager.yml
-
-route:
-  # When a new group of alerts is created by an incoming alert, wait at
-  # least 'group_wait' to send the initial notification.
-  # This way ensures that you get multiple alerts for the same group that start
-  # firing shortly after another are batched together on the first
-  # notification.
-  group_wait: 10s
-
-  # When the first notification was sent, wait 'group_interval' to send a batch
-  # of new alerts that started firing for that group.
-  group_interval: 30s
-
-  # If an alert has successfully been sent, wait 'repeat_interval' to
-  # resend them.
-  repeat_interval: 30m
-
-  # A default receiver
-  receiver: "slack"
-
-  # All the above attributes are inherited by all child routes and can
-  # overwritten on each.
-  routes:
-    - receiver: "slack"
-      group_wait: 10s
-      match_re:
-        severity: critical|warning
-      continue: true
-
-    - receiver: "pager"
-      group_wait: 10s
-      match_re:
-        severity: critical
-      continue: true
-
-receivers:
-  - name: "slack"
-    slack_configs:
-      - api_url: 'https://hooks.slack.com/services/XXXXXXXXX/XXXXXXXXX/xxxxxxxxxxxxxxxxxxxxxxxxxxx'
-        send_resolved: true
-        channel: 'monitoring'
-        text: "{{ range .Alerts }} {{ .Annotations.summary }}\n{{ .Annotations.description }}\n{{ end }}"
-
-  - name: "pager"
-    webhook_configs:
-      - url: http://a.b.c.d:8080/send/sms
-        send_resolved: true
-
-{% endraw %}
-{% endhighlight %}
-
-## Reduce Prometheus server load
-
-For expansive or frequent PromQL queries, Prometheus allows to precompute rules.
-
-{% highlight yaml %}
-{% raw %}
-groups:
-
-  # first define the recorded rule
-  - name: ExampleRecordedGroup
-    rules:
-    - record: job:rabbitmq_queue_messages_delivered_total:rate:5m
-      expr: rate(rabbitmq_queue_messages_delivered_total[5m])
-
-  # then use it in alerts
-  - name: ExampleAlertingGroup
-    rules:
-    - alert: ExampleRabbitmqLowMessageDelivery
-      expr: sum(job:rabbitmq_queue_messages_delivered_total:rate:5m) < 10
-      for: 2m
-      labels:
-        severity: critical
-      annotations:
-        summary: "Low delivery rate in Rabbitmq queues"
-{% endraw %}
-{% endhighlight %}
-
-## Troubleshooting
-
-If the notification takes too much time to be triggered, check the following delays:
-- `scrape_interval = 20s` (prometheus.yml)
-- `evaluation_interval = 20s` (prometheus.yml)
-- `increase(mysql_global_status_slow_queries[1m]) > 0` (alerts/example-mysql.yml)
-- `for: 5m` (alerts/example-mysql.yml)
-- `group_wait = 10s` (alertmanager.yml)
-
-Also read:
-- [https://pracucci.com/prometheus-understanding-the-delays-on-alerting.html](https://pracucci.com/prometheus-understanding-the-delays-on-alerting.html).
-- [https://hodovi.cc/blog/creating-awesome-alertmanager-templates-for-slack/](https://hodovi.cc/blog/creating-awesome-alertmanager-templates-for-slack/)
-- [https://grafana.com/blog/2024/10/03/how-to-use-prometheus-to-efficiently-detect-anomalies-at-scale/](https://grafana.com/blog/2024/10/03/how-to-use-prometheus-to-efficiently-detect-anomalies-at-scale/)
diff --git a/blackbox-exporter.md b/blackbox-exporter.md
deleted file mode 100644
index 4782d36..0000000
--- a/blackbox-exporter.md
+++ /dev/null
@@ -1,125 +0,0 @@
- Blackbox exporter
-
-## Wordwide probes
-
-Blackbox Exporter gives you the ability to probe endpoints over HTTP, HTTPS, DNS, TCP and ICMP.
-
-You should deploy blackbox exporters in multiple Point of Presence around the globe, to monitor latency. Feel free to use the following endpoints for your own projects:
-
-- https://probe-montreal.cleverapps.io
-- https://probe-paris.cleverapps.io
-- https://probe-jeddah.cleverapps.io
-- https://probe-singapore.cleverapps.io
-- https://probe-sydney.cleverapps.io
-- https://probe-warsaw.cleverapps.io
-
-☝️ Logs have been disabled. More probes from the community would be appreciated, please contribute here! These blackbox exporters use the following configuration.
-
-## Prometheus Configuration
-
-Blackbox exporters and endpoints must be declared in Prometheus. Here is a simple configuration, inspired by [Hayk Davtyan medium post](https://medium.com/geekculture/single-prometheus-job-for-dozens-of-blackbox-exporters-2a7ba492d6c8):
-
-```yml
-# sd/blackbox.yml
-
-- targets:
-  #
-  # Montreal
-  #
-  # http
-  - probe-montreal.cleverapps.io:_:http_2xx:_:Montreal:_:f229cy:_:https://api.screeb.app
-  - probe-montreal.cleverapps.io:_:http_2xx:_:Montreal:_:f229cy:_:https://t.screeb.app/tag.js
-  # icmp
-  - probe-montreal.cleverapps.io:_:icmp_ipv4:_:Montreal:_:f229cy:_:api.screeb.app
-  - probe-montreal.cleverapps.io:_:icmp_ipv4:_:Montreal:_:f229cy:_:t.screeb.app
-
-
-  #
-  # Paris
-  #
-  # http
-  - probe-paris.cleverapps.io:_:http_2xx:_:Paris:_:u09tgy:_:https://api.screeb.app
-  - probe-paris.cleverapps.io:_:http_2xx:_:Paris:_:u09tgy:_:https://t.screeb.app/tag.js
-  # icmp
-  - probe-paris.cleverapps.io:_:icmp_ipv4:_:Paris:_:u09tgy:_:api.screeb.app
-  - probe-paris.cleverapps.io:_:icmp_ipv4:_:Paris:_:u09tgy:_:t.screeb.app
-
-
-  #
-  # Sydney
-  #
-  # http
-  - probe-sydney.cleverapps.io:_:http_2xx:_:Sydney:_:r3gpkn:_:https://api.screeb.app
-  - probe-sydney.cleverapps.io:_:http_2xx:_:Sydney:_:r3gpkn:_:https://t.screeb.app/tag.js
-  # icmp
-  - probe-sydney.cleverapps.io:_:icmp_ipv4:_:Sydney:_:r3gpkn:_:api.screeb.app
-  - probe-sydney.cleverapps.io:_:icmp_ipv4:_:Sydney:_:r3gpkn:_:t.screeb.app
-
-  # ...
-```
-
-```yml
-# prometheus.yml
-
-global:
-  # ...
-
-scrape_configs:
-
-  - job_name: 'blackbox'
-    metrics_path: /probe
-    scrape_interval: 30s
-    scheme: https
-    file_sd_configs:
-      - files:
-          - /etc/prometheus/sd/blackbox.yml
-    relabel_configs:
-      # adds "module" label in the final labelset
-      - source_labels: [__address__]
-        regex: '.*:_:(.*):_:.*:_:.*:_:.*'
-        target_label: module
-      # adds "geohash" label in the final labelset
-      - source_labels: [__address__]
-        regex: '.*:_:.*:_:.*:_:(.*):_:.*'
-        target_label: geohash
-      # rewrites "instance" label with corresponding URL
-      - source_labels: [__address__]
-        regex: '.*:_:.*:_:.*:_:.*:_:(.*)'
-        target_label: instance
-      # rewrites "pop" label with corresponding location name
-      - source_labels: [__address__]
-        regex: '.*:_:.*:_:(.*):_:.*:_:.*'
-        target_label: pop
-      # passes "module" parameter to Blackbox exporter
-      - source_labels: [module]
-        target_label: __param_module
-      # passes "target" parameter to Blackbox exporter
-      - source_labels: [instance]
-        target_label: __param_target
-      # the Blackbox exporter's real hostname:port
-      - source_labels: [__address__]
-        regex: '(.*):_:.*:_:.*:_:.*:_:.*'
-        target_label: __address__
-
-  # ...
-
-```
-
-## Geohash
-
-![](assets/grafana-map-panel.png)
-
-To display nice maps in Grafana, you need to instruct blackbox exporters about the location. Grafana map panel speaks the "geohash" format:
-
-- go to google map
-- extract the lat/long from the url
-- convert lat/long to geohash here: http://geohash.co
-
-## Grafana
-
-Some great dashboard have been created by the community: https://grafana.com/grafana/dashboards/?search=blackbox
-
-Since Grafana v5.0.0, a map panel is available: https://grafana.com/docs/grafana/latest/panels-visualizations/visualizations/geomap/
diff --git a/docker-compose.yml b/docker-compose.yml
deleted file mode 100644
index 4e55dc2..0000000
--- a/docker-compose.yml
+++ /dev/null
@@ -1,11 +0,0 @@
-version: '3'
-
-services:
-
-  jekyll:
-    image: jekyll/jekyll:latest
-    command: jekyll serve
-    volumes:
-      - ./:/srv/jekyll
-    ports:
-      - 4000:4000
diff --git a/index.md b/index.md
deleted file mode 100644
index 715f9d1..0000000
--- a/index.md
+++ /dev/null
@@ -1,54 +0,0 @@
-![Prometheus logo](/assets/prometheus-logo.png){: .center-image }
- Hello world
- AlertManager configuration
- Alerting time window
- Out of the box prometheus alerting rules
diff --git a/rules.md b/rules.md
deleted file mode 100644
index c4978e7..0000000
--- a/rules.md
+++ /dev/null
@@ -1,141 +0,0 @@
- ⚠️ Caution ⚠️
- Alert thresholds depend on nature of applications.
- Some queries in this page may have arbitrary tolerance threshold.
- Building an efficient and battle-tested monitoring platform takes time. 😉
- Menu
diff --git a/sleep-peacefully.md b/sleep-peacefully.md
deleted file mode 100644
index 6ae0d9a..0000000
--- a/sleep-peacefully.md
+++ /dev/null
@@ -1,106 +0,0 @@
- Sleep Peacefully
-
-## Alerting time window
-
-In some applications, load and activity can vary over the day/week/year.
-
-In order to prevent alarm fatigue and busy pager, alerts can be disabled during a period of time (such as night or weekend).
-
-Example:
-
-- Weekday: `node_load5 > 10 and ON() (0 < day_of_week() < 6)`
-- Day time: `node_load5 > 10 and ON() (8 < hour() < 18)`
-- Exclude December: `node_load5 > 10 and ON() (month() != 12)`
-
-## Advanced time windows and timezones
-
-```yml
-# rules.yml
-
-groups:
-  - name: timezones
-    rules:
-      - record: european_summer_time_offset
-        expr: |
-          (vector(1) and (month() > 3 and month() < 10))
-          or
-          (vector(1) and (month() == 3 and (day_of_month() - day_of_week()) >= 25) and absent((day_of_month() >= 25) and (day_of_week() == 0)))
-          or
-          (vector(1) and (month() == 10 and (day_of_month() - day_of_week()) < 25) and absent((day_of_month() >= 25) and (day_of_week() == 0)))
-          or
-          (vector(1) and ((month() == 10 and hour() < 1) or (month() == 3 and hour() > 0)) and ((day_of_month() >= 25) and (day_of_week() == 0)))
-          or
-          vector(0)
-
-      - record: europe_london_time
-        expr: time() + 3600 * european_summer_time_offset
-      - record: europe_paris_time
-        expr: time() + 3600 * (1 + european_summer_time_offset)
-
-      - record: europe_london_hour
-        expr: hour(europe_london_time)
-      - record: europe_paris_hour
-        expr: hour(europe_paris_time)
-
-      - record: europe_london_weekday
-        expr: 0 < day_of_week(europe_london_time) < 6
-      - record: europe_paris_weekday
-        expr: 0 < day_of_week(europe_paris_time) < 6
-      # opposite
-      - record: not_europe_london_weekday
-        expr: absent(europe_london_weekday)
-      - record: not_europe_paris_weekday
-        expr: absent(europe_paris_weekday)
-
-      - record: europe_london_business_hours
-        expr: 9 <= europe_london_hour < 18
-      - record: europe_paris_business_hours
-        expr: 9 <= europe_paris_hour < 18
-      # opposite
-      - record: not_europe_london_business_hours
-        expr: absent(europe_london_business_hours)
-      - record: not_europe_paris_business_hours
-        expr: absent(europe_paris_business_hours)
-
-      # new year's day / xmas / labor day / all saints' day / ...
-      - record: europe_french_public_holidays
-        expr: |
-          (vector(1) and month(europe_paris_time) == 1 and day_of_month(europe_paris_time) == 1)
-          or
-          (vector(1) and month(europe_paris_time) == 12 and day_of_month(europe_paris_time) == 25)
-          or
-          (vector(1) and month(europe_paris_time) == 5 and day_of_month(europe_paris_time) == 1)
-          or
-          (vector(1) and month(europe_paris_time) == 11 and day_of_month(europe_paris_time) == 1)
-          or
-          vector(0)
-      # opposite
-      - record: not_europe_french_public_holidays
-        expr: absent(europe_french_public_holidays)
-```
-
-```yml
-# alerts.yml
-
-groups:
-  - name: CPU Load
-    rules:
-      - alert: HighLoadQuietDuringWeekendAndNight
-        expr: node_load5 > 10 and ON() (europe_london_weekday and europe_paris_weekday)
-
-      - alert: HighLoadQuietDuringBackup
-        expr: node_load5 > 10 and ON() absent(hour() == 2)
-
-      - alert: HighLoad
-        expr: |
-          node_load5 > 20 and ON() (europe_london_weekday and europe_paris_weekday)
-          or
-          node_load5 > 10
-```
-
-## Sources
-
-- [https://medium.com/@tom.fawcett/time-of-day-based-notifications-with-prometheus-and-alertmanager-1bf7a23b7695](https://medium.com/@tom.fawcett/time-of-day-based-notifications-with-prometheus-and-alertmanager-1bf7a23b7695)
-- [https://promcon.io/2019-munich/slides/improved-alerting-with-prometheus-and-alertmanager.pdf](https://promcon.io/2019-munich/slides/improved-alerting-with-prometheus-and-alertmanager.pdf)
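The deleted `blackbox-exporter.md` page packs five fields into each `file_sd` target, separated by `:_:`: exporter address, module, point of presence, geohash, and probed URL. Its `relabel_configs` then split them back out with five regexes. The sketch below applies those same regexes to a single target string and shows which labels each one yields. It assumes Prometheus relabel regexes are anchored at both ends, which is emulated here with `^(?:...)$`. `relabelBlackboxTarget` and `BlackboxTarget` are hypothetical names introduced for illustration; they are not part of the new site or of Prometheus.

```typescript
// Hypothetical helper: replays the relabel regexes from the deleted
// blackbox-exporter.md prometheus.yml against one encoded target.
interface BlackboxTarget {
  address: string;  // exporter host, becomes __address__
  module: string;   // copied to __param_module
  pop: string;      // point-of-presence name
  geohash: string;  // location used by the Grafana map panel
  instance: string; // probed URL/host, copied to __param_target
}

// Prometheus anchors relabel regexes on both ends; emulate that here.
const anchored = (re: string): RegExp => new RegExp(`^(?:${re})$`);

// Same patterns as the deleted config, in the same order.
const RULES: ReadonlyArray<readonly [keyof BlackboxTarget, RegExp]> = [
  ["module", anchored(".*:_:(.*):_:.*:_:.*:_:.*")],
  ["geohash", anchored(".*:_:.*:_:.*:_:(.*):_:.*")],
  ["instance", anchored(".*:_:.*:_:.*:_:.*:_:(.*)")],
  ["pop", anchored(".*:_:.*:_:(.*):_:.*:_:.*")],
  ["address", anchored("(.*):_:.*:_:.*:_:.*:_:.*")],
];

function relabelBlackboxTarget(raw: string): BlackboxTarget | null {
  const out: Partial<BlackboxTarget> = {};
  for (const [label, re] of RULES) {
    const m = re.exec(raw);
    // Prometheus would leave the label untouched on a non-match;
    // this sketch simply rejects targets that are not fully encoded.
    if (m === null) return null;
    out[label] = m[1] ?? "";
  }
  return out as BlackboxTarget;
}
```

Each target carries exactly four `:_:` separators, so the greedy `.*` groups can only line up one way. That is why the original config can reuse the same `__address__` source label for every field. Note that the `__address__` rewrite has to come last, because the earlier rules still read the unsplit value.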
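The `european_summer_time_offset` recording rule in the deleted `sleep-peacefully.md` is meant to encode the EU daylight-saving rule using PromQL date functions. Under that rule, summer time starts at 01:00 UTC on the last Sunday of March and ends at 01:00 UTC on the last Sunday of October. The following is a minimal TypeScript sketch of the same rule, useful for spot-checking the recorded values at a given timestamp. `lastSundayOfMonth`, `europeanSummerTimeOffset` and `parisHour` are hypothetical names, and the sketch assumes the current EU rule applies to every year.

```typescript
// Day of month (UTC) of the last Sunday in a month (0-based month index).
function lastSundayOfMonth(year: number, monthIndex: number): number {
  // Day 0 of the following month is the last day of this month.
  const lastDay = new Date(Date.UTC(year, monthIndex + 1, 0));
  // getUTCDay() is 0 for Sunday, so step back that many days.
  return lastDay.getUTCDate() - lastDay.getUTCDay();
}

// 1 during EU summer time, 0 otherwise (mirrors european_summer_time_offset).
function europeanSummerTimeOffset(at: Date): 0 | 1 {
  const year = at.getUTCFullYear();
  const start = Date.UTC(year, 2, lastSundayOfMonth(year, 2), 1); // last Sunday of March, 01:00 UTC
  const end = Date.UTC(year, 9, lastSundayOfMonth(year, 9), 1); // last Sunday of October, 01:00 UTC
  const t = at.getTime();
  return t >= start && t < end ? 1 : 0;
}

// Local hour in Paris (CET = UTC+1, CEST = UTC+2), like europe_paris_hour.
function parisHour(at: Date): number {
  const offsetHours = 1 + europeanSummerTimeOffset(at);
  return new Date(at.getTime() + offsetHours * 3600 * 1000).getUTCHours();
}
```

The most useful comparisons with the recorded `europe_paris_hour` series are timestamps just before and just after 01:00 UTC on the two switch-over Sundays. That is where the `day_of_month() - day_of_week()` arithmetic in the PromQL version is most likely to be off by a day.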