Samuel Berthe
bb055773b4
feat: add GitHub star nudges across the site
...
- Prepend attribution comment to "Copy all" exporter clipboard
- Show inline ⭐ Star nudge on individual rule copy (3s, dismisses automatically)
- Change StatsBar stars label to "engineers starred" for social proof
- Add milestone progress bar toward 10k stars in StatsBar
- Fix header/StatsBar showing "0" when SSR GitHub API fetch fails (use "—" placeholder)
2026-04-14 21:52:27 +02:00
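The milestone bar and the failed-fetch placeholder described in this commit could be computed along the following lines. This is a minimal sketch, not the site's actual code: `STAR_MILESTONE`, `milestonePercent`, and `starLabel` are hypothetical names, and a failed fetch is assumed to be represented as `null`.

```typescript
// Hypothetical names throughout; only the 10k target and the "—" placeholder
// come from the commit message.
const STAR_MILESTONE = 10000;

// Percentage of the milestone reached, clamped to 0..100.
// An unknown count (null) is treated as 0 progress.
function milestonePercent(stars: number | null): number {
  if (stars === null) return 0;
  return Math.min(100, Math.max(0, (stars * 100) / STAR_MILESTONE));
}

// Text for the stars stat: show the placeholder rather than a misleading "0"
// when the count is unknown.
function starLabel(stars: number | null): string {
  return stars === null ? "—" : String(stars);
}
```

Keeping "unknown" distinct from zero is what prevents the "0" display that this commit fixes.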
Samuel Berthe
d38511d7cb
chore: generate pagefind index at build time, not committed to git
...
- Add pagefind run step to build script in site/package.json
- Add site/public/pagefind/ to .gitignore (generated at deploy time)
2026-04-14 20:33:29 +02:00
Samuel Berthe
a56d8cf2a4
feat: refine star toast — brand orange, idle trigger, 15s auto-hide
...
- Style: brand orange background with white text (visible on any bg)
- Trigger: every 5 copies OR after 10 minutes of inactivity on page
- Auto-hide: 15s (reset if toast re-triggers before expiry)
- Idle timer resets on each copy
2026-04-14 20:30:08 +02:00
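The trigger and auto-hide rules above can be modelled as a small state machine. This is a sketch under stated assumptions, not the StarToast implementation: all names are hypothetical, time is passed in explicitly so the logic can be tested without real timers, and the idle timer is assumed to restart after an idle-triggered toast.

```typescript
const NUDGE_EVERY_COPIES = 5;
const IDLE_TRIGGER_MS = 10 * 60 * 1000;
const AUTO_HIDE_MS = 15 * 1000;

interface ToastState {
  copies: number;        // copies seen so far
  idleSince: number;     // ms timestamp from which inactivity is measured
  hideAt: number | null; // ms timestamp when the toast hides; null while hidden
}

function initialToastState(now: number): ToastState {
  return { copies: 0, idleSince: now, hideAt: null };
}

// A copy resets the idle timer; every 5th copy shows the toast and
// restarts the 15s auto-hide window.
function onCopy(state: ToastState, now: number): ToastState {
  const copies = state.copies + 1;
  const triggered = copies % NUDGE_EVERY_COPIES === 0;
  return {
    copies,
    idleSince: now,
    hideAt: triggered ? now + AUTO_HIDE_MS : state.hideAt,
  };
}

// Periodic check: hide an expired toast, then show it if the page has been
// idle for 10 minutes (assumption: the idle timer restarts afterwards).
function onTick(state: ToastState, now: number): ToastState {
  let hideAt = state.hideAt !== null && now >= state.hideAt ? null : state.hideAt;
  let idleSince = state.idleSince;
  if (now - idleSince >= IDLE_TRIGGER_MS) {
    hideAt = now + AUTO_HIDE_MS;
    idleSince = now;
  }
  return { ...state, idleSince, hideAt };
}
```

In a browser, `onTick` would typically be driven by `setInterval` or replaced by a pair of `setTimeout` handles; the pure form here only illustrates the rules.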
Samuel Berthe
25418c5db2
feat: add star nudge toast after every 5 rule copies
...
Show a dismissible toast (bottom-right, 20s auto-hide) nudging users
to star the GitHub repo. Fires every 5 copies via a sessionStorage
counter. CopyButton dispatches a copy-success custom event; StarToast
listens for it and manages display logic.
2026-04-14 20:09:30 +02:00
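The every-5-copies counter could look roughly like this. It is a hedged sketch rather than the code from this commit: the key name `copy-count`, the `KeyValueStore` interface, and `recordCopy` are assumptions, and an in-memory store stands in for `sessionStorage` so the example runs outside a browser.

```typescript
// Minimal storage shape; in a browser, window.sessionStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const COPY_COUNT_KEY = "copy-count"; // hypothetical key name
const COPIES_PER_NUDGE = 5;

// Increments the stored counter and reports whether this copy should
// trigger the toast (true on the 5th, 10th, 15th... copy).
function recordCopy(store: KeyValueStore): boolean {
  const previous = Number.parseInt(store.getItem(COPY_COUNT_KEY) ?? "0", 10);
  const count = (Number.isNaN(previous) ? 0 : previous) + 1;
  store.setItem(COPY_COUNT_KEY, String(count));
  return count % COPIES_PER_NUDGE === 0;
}

// In-memory stand-in for sessionStorage.
class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>();
  getItem(key: string): string | null {
    return this.data.get(key) ?? null;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
}
```

In the page itself, a listener for the copy-success event would call `recordCopy(sessionStorage)` and show the toast when it returns `true`.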
Samuel Berthe
5366d4b9ae
fix: replace invalid top-level return with isFresh flag in star scripts
...
Top-level return is a syntax error in ES modules. Replace the early
return pattern with an isFresh boolean guard. Also revert the hero
"Star on GitHub" button change.
2026-04-14 19:59:36 +02:00
Samuel Berthe
1f8bcca779
feat: add GitHub stars to StatsBar and fix cache early-return
...
Add a 4th stat (⭐ GitHub stars) to StatsBar with build-time fallback
and live client-side fetch. Both Header and StatsBar share the same
sessionStorage cache key and skip the API call when the cache is fresh
(1h TTL), reducing fetches to at most one per session.
2026-04-14 19:51:12 +02:00
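A freshness check for the shared cache entry might be sketched as follows. The stored shape (`count` plus `fetchedAt` as JSON) and all function names are assumptions for illustration; the commit message only states a shared sessionStorage key and a 1h TTL.

```typescript
const STAR_CACHE_TTL_MS = 60 * 60 * 1000; // 1 hour

interface StarCacheEntry {
  count: number;
  fetchedAt: number; // ms timestamp of the last successful fetch
}

// Parses the raw sessionStorage value; returns null for missing or corrupt data.
function parseStarCache(raw: string | null): StarCacheEntry | null {
  if (raw === null) return null;
  try {
    const value = JSON.parse(raw) as Partial<StarCacheEntry>;
    if (typeof value.count === "number" && typeof value.fetchedAt === "number") {
      return { count: value.count, fetchedAt: value.fetchedAt };
    }
  } catch {
    // Fall through: unreadable data is treated as a cache miss.
  }
  return null;
}

// True when the entry exists and is younger than the TTL, in which case
// the caller can skip the GitHub API request.
function isStarCacheFresh(entry: StarCacheEntry | null, now: number): boolean {
  return entry !== null && now - entry.fetchedAt < STAR_CACHE_TTL_MS;
}
```

Both components would read the same key, so whichever runs first populates the entry and the other finds it fresh.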
Samuel Berthe
954999dfa9
feat: replace GitHub icon with Star button and live star count
...
Replace the plain GitHub icon+count in the header with a proper two-zone
star button (★ Star | 8.4k). The count is seeded at build time from the
GitHub API and refreshed client-side on page load with a 1-hour
sessionStorage cache.
2026-04-14 19:47:49 +02:00
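The compact count shown in the button ("8.4k") could be produced by a helper like the one below. `formatStarCount` is a hypothetical name and the rounding rule is an assumption; the commit message only shows the rendered form.

```typescript
// Formats a star count compactly: values under 1000 are shown as-is,
// larger values with one decimal and a "k" suffix (a trailing ".0" is dropped).
function formatStarCount(count: number): string {
  if (count < 1000) return String(count);
  const thousands = (count / 1000).toFixed(1);
  return `${thousands.replace(/\.0$/, "")}k`;
}
```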
Samuel Berthe
297fd9864c
fix: use https in CC BY URL and trigger site build on _data changes
2026-04-14 16:27:01 +02:00
Samuel Berthe
5c166e8403
docs: update tagline and clean up README
2026-04-10 21:45:27 +02:00
Samuel Berthe
ab87fdcf30
feat/dual license (#550)
...
* ci: remove node version pin in site build workflow
* docs: clarify dual license (CC BY 4.0 for content, MIT for site code)
Alert rules and content (_data/rules.yml, dist/) are licensed under
Creative Commons CC BY 4.0. The site source code (site/) is licensed
under MIT. Both are now documented in LICENSE, site/LICENSE, the footer,
and the FAQ.
2026-04-10 21:36:57 +02:00
Samuel Berthe
aa7d93ce95
chore: migrate assets/ to site/public/images/ (#549)
...
Remove legacy assets/ directory (pre-Astro era). Images were already
duplicated under site/public/images/; update README sponsor URLs to
point to the new location.
2026-04-10 21:28:38 +02:00
Samuel Berthe
a4d0b1370c
ci: add site build workflow (#548)
2026-04-10 21:21:04 +02:00
dependabot[bot]
d31b3f9ba0
build(deps): bump @iconify-json/lucide from 1.2.101 to 1.2.102 in /site (#545)
...
Bumps [@iconify-json/lucide](https://github.com/iconify/icon-sets) from 1.2.101 to 1.2.102.
- [Commits](https://github.com/iconify/icon-sets/commits)
---
updated-dependencies:
- dependency-name: "@iconify-json/lucide"
dependency-version: 1.2.102
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-10 21:19:49 +02:00
Samuel Berthe
89d8423d93
build(deps): migrate from @astrojs/tailwind to @tailwindcss/vite for Tailwind v4 (#547)
...
@astrojs/tailwind v6 still requires tailwindcss@^3; replace it with the
official @tailwindcss/vite Vite plugin. Update global.css to v4 syntax
(@import "tailwindcss", @custom-variant dark, @theme tokens) and drop
the now-unused tailwind.config.mjs.
2026-04-10 21:18:13 +02:00
dependabot[bot]
814dd5d3fb
build(deps): bump @astrojs/tailwind from 5.1.5 to 6.0.2 in /site (#543)
...
Bumps [@astrojs/tailwind](https://github.com/withastro/astro/tree/HEAD/packages/integrations/tailwind) from 5.1.5 to 6.0.2.
- [Release notes](https://github.com/withastro/astro/releases)
- [Changelog](https://github.com/withastro/astro/blob/main/packages/integrations/tailwind/CHANGELOG.md)
- [Commits](https://github.com/withastro/astro/commits/@astrojs/tailwind@6.0.2/packages/integrations/tailwind)
---
updated-dependencies:
- dependency-name: "@astrojs/tailwind"
dependency-version: 6.0.2
dependency-type: direct:production
update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-10 21:11:05 +02:00
dependabot[bot]
e6ea45aec1
build(deps): bump tailwindcss from 3.4.19 to 4.2.2 in /site (#544)
...
Bumps [tailwindcss](https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/tailwindcss) from 3.4.19 to 4.2.2.
- [Release notes](https://github.com/tailwindlabs/tailwindcss/releases)
- [Changelog](https://github.com/tailwindlabs/tailwindcss/blob/main/CHANGELOG.md)
- [Commits](https://github.com/tailwindlabs/tailwindcss/commits/v4.2.2/packages/tailwindcss)
---
updated-dependencies:
- dependency-name: tailwindcss
dependency-version: 4.2.2
dependency-type: direct:production
update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-10 21:10:56 +02:00
dependabot[bot]
bea2dc45b4
build(deps): bump actions/upload-pages-artifact from 3 to 4 (#540)
...
Bumps [actions/upload-pages-artifact](https://github.com/actions/upload-pages-artifact) from 3 to 4.
- [Release notes](https://github.com/actions/upload-pages-artifact/releases)
- [Commits](https://github.com/actions/upload-pages-artifact/compare/v3...v4)
---
updated-dependencies:
- dependency-name: actions/upload-pages-artifact
dependency-version: '4'
dependency-type: direct:production
update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-10 21:10:38 +02:00
dependabot[bot]
dd0c8372f9
build(deps): bump actions/setup-node from 4 to 6 (#541)
...
Bumps [actions/setup-node](https://github.com/actions/setup-node) from 4 to 6.
- [Release notes](https://github.com/actions/setup-node/releases)
- [Commits](https://github.com/actions/setup-node/compare/v4...v6)
---
updated-dependencies:
- dependency-name: actions/setup-node
dependency-version: '6'
dependency-type: direct:production
update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-10 21:10:31 +02:00
dependabot[bot]
132329abd8
build(deps): bump actions/deploy-pages from 4 to 5 (#542)
...
Bumps [actions/deploy-pages](https://github.com/actions/deploy-pages) from 4 to 5.
- [Release notes](https://github.com/actions/deploy-pages/releases)
- [Commits](https://github.com/actions/deploy-pages/compare/v4...v5)
---
updated-dependencies:
- dependency-name: actions/deploy-pages
dependency-version: '5'
dependency-type: direct:production
update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-10 21:10:24 +02:00
dependabot[bot]
9e80bb910e
build(deps): bump actions/checkout from 4 to 6 (#539)
...
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v4...v6)
---
updated-dependencies:
- dependency-name: actions/checkout
dependency-version: '6'
dependency-type: direct:production
update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-10 21:10:09 +02:00
Samuel Berthe
79afa21610
feat/astro migration (#538)
...
* feat: migrate website from Jekyll to Astro
Rebuilds the site using Astro (SSG) with Tailwind CSS v4, replacing the
Jekyll/Cayman theme. Key changes:
- Splits the monolithic /rules page into 110 statically-generated pages
(92 per-service + 13 group index + homepage + guide pages) for SEO
- URL structure: /rules/[group-slug]/[service-slug]/ with backward-
compatibility redirect map for old anchor-based URLs (/rules#redis)
- Modern UI: Prometheus-orange accent, dark mode (system + toggle),
sticky sidebar, responsive layout, copy-to-clipboard per rule/section
- SEO: per-page <title>, <meta description>, Open Graph, Twitter Card,
canonical URLs, sitemap.xml via @astrojs/sitemap
- GEO: FAQPage JSON-LD schema on each service page (rules as Q&A pairs
for AI search engines), TechArticle schema, BreadcrumbList
- Search: Pagefind (build-time index, lazy-loaded, ~200KB)
- Zero JS by default; copy buttons and theme toggle use inline scripts
- New CI: .github/workflows/deploy.yml builds Astro + Pagefind and
deploys to GitHub Pages via actions/deploy-pages
- Existing dist.yml and test.yml workflows are untouched
- _data/rules.yml remains the single source of truth
Note: GitHub Pages source must be changed from "Build from branch"
(Jekyll) to "GitHub Actions" in repository settings.
* doc: new website based on astro
* refactor: remove previous website
* chore: add npm dependabot for Astro site + scope CI to _data changes
* Update site/astro.config.mjs
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update site/src/components/CopyButton.astro
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* oops
* fix: strip trailing slash from BASE_URL to prevent double slashes in URLs
Agent-Logs-Url: https://github.com/samber/awesome-prometheus-alerts/sessions/c85937ba-1855-4b8a-a72b-847eab1c8639
Co-authored-by: samber <2951285+samber@users.noreply.github.com>
* fix: resolve Astro build errors in astro.config.mjs
- Remove assetsInclude yml which caused Vite to treat YAML files as static assets instead of running them through the custom YAML transform plugin; data.groups was undefined at runtime because the import resolved to a URL rather than parsed content
- Deduplicate old-path redirects: emit only the slash-less variant per service to avoid Astro router collision warnings (trailing-slash variant is handled automatically)
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: samber <2951285+samber@users.noreply.github.com>
2026-04-10 21:08:06 +02:00
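The BASE_URL fix in this commit (stripping the trailing slash to avoid double slashes) can be illustrated with a small helper. This is a sketch, not the code in site/astro.config.mjs: `stripTrailingSlash` and `joinUrl` are hypothetical names.

```typescript
// Removes any trailing slashes so that joining with "/path" never yields "//".
function stripTrailingSlash(base: string): string {
  return base.replace(/\/+$/, "");
}

// Joins a base URL and a path with exactly one slash between them.
function joinUrl(base: string, path: string): string {
  return `${stripTrailingSlash(base)}/${path.replace(/^\/+/, "")}`;
}
```

With a base of `/awesome-prometheus-alerts/`, naive concatenation with `/rules/` would give `/awesome-prometheus-alerts//rules/`; normalising the base first avoids that.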
samber
0d148832d3
Publish
2026-04-06 19:14:46 +00:00
Samuel Berthe
c2615fae52
fix/promql rules review 2 (#534)
...
* fix(data): fix queries and thresholds across multiple exporters
- Ceph: fix OSD latency metric name (ceph_osd_apply_latency_ms), replace
ceph_osd_utilization with ceph_health_detail{name="OSD_NEARFULL"}, add for: durations
- ZFS: improve description, remove incorrect ON() join on readonly check
- Thanos: filter gRPC errors to actual error codes only (drop NotFound, Cancelled, etc.)
- Loki/Promtail: fix histogram_quantile to aggregate by (namespace, job, route, le)
- Mimir: raise rate()>0 thresholds to >0.05, add missing for: durations
- OTel Collector: raise rate()>0 thresholds to >0.05, add deprecation comments
- Tempo/Cortex: raise >0 thresholds to avoid transient spikes
- APC UPS: add division-by-zero guard on battery voltage ratio
- DigitalOcean: raise increase()>0 to >3
- Grafana Alloy: fix missing name: field on exporter
- Graph Node: add threshold comments
* fix(data): remove official mixin reference from Ceph OSD comment
* fix(data): remove official mixin references from comments
2026-04-06 21:14:15 +02:00
Samuel Berthe
72c9e922c0
docs: update CLAUDE.md with lessons from PromQL review
...
Add guidance on untyped metrics with counter semantics (node_vmstat_*,
MySQL SHOW GLOBAL STATUS), YAML duplicate key pitfall, permanent-firing
cumulative counter alerts, and updated category structure.
2026-04-06 21:08:48 +02:00
samber
ed1515015a
Publish
2026-04-06 18:38:45 +00:00
Samuel Berthe
2258835c30
fix/promql rules review (#533)
...
* fix(data): comprehensive PromQL review across all ~937 rules
Query fixes:
- Replace rate()/increase() with deriv()/delta() on gauge metrics exposed
as untyped by exporters (node_vmstat_oom_kill, mysql_global_status_*,
systemd_socket_refused_connections_total)
- Fix Ceph OSD latency metric name: ceph_osd_perf_apply_latency_seconds
→ ceph_osd_apply_latency_ms (Ceph MGR Prometheus module)
- Fix NATS subscriptions metric: gnatsd_connz_subscriptions (per-conn)
→ gnatsd_varz_subscriptions (server total)
- Fix Caddy reverse proxy down query: count()==0 → direct gauge == 0
- Fix RabbitMQ total connections metric: connectionsTotal → connections
- Fix Cilium ClusterMesh/KVStoreMesh: deriv() on failure gauge → direct
gauge comparison (deriv > 0 misses stable non-zero failure states)
- Fix cert-manager ACME metric name: certmanager_http_acme_client_request_count
→ certmanager_acme_client_request_count (renamed in v1.19+)
- Fix Thanos Query gRPC filter: grpc_code!="OK" → explicit error codes
- Fix Flink duplicate comments: field (YAML last-write-wins bug)
- Add datid!="0" filter to PostgreSQL dead locks query
- Fix PostgreSQL high rollback rate: restructure division-by-zero guard
and move ratio calculation outside sum()
- Add division-by-zero guards: Container Low CPU, Hadoop ResourceManager
memory, Hadoop HBase heap, Vault cluster health
- Add for: 1m to Blackbox probe failed/HTTP failure and Ceph State/
OSD Down/PG unavailable
Threshold fixes:
- Replace > 0 with meaningful thresholds on rate()/increase() queries
across: Alertmanager, eBPF decoder errors, systemd refused connections,
Memcached, Cassandra (Instaclustr + Criteo), ClickHouse distributed
inserts, CouchDB log entries, HAProxy healthcheck failures, RabbitMQ
unroutable messages, Spinnaker, Cilium, Mimir TSDB/alertmanager,
OTel Collector receiver refused metrics
- Fix Elasticsearch High Indexing Latency threshold: 0.0005s → 0.01s
(0.5ms was below normal operating range; 10ms is more realistic)
Description fixes:
- Fix MySQL slow queries: remove duplicate "mysql" word
- Fix SMART device description: remove trailing stray ")" (6 rules)
- Fix host disk IO description: remove duplicate "Check storage for issues."
- Fix EDAC correctable errors: "last 5 minutes" → "last 1 minute"
- Fix EDAC uncorrectable errors: remove time-window claim (raw counter)
- Fix Mimir store-gateway sync description: said "10 minutes" but
threshold is 1800s (30 minutes)
- Fix Vault description false "%" suffix on count values
- Improve descriptions across RabbitMQ, Zookeeper, Kafka, Pulsar, Envoy,
Istio rules to include {{ $labels }} and {{ $value }} template vars
- Downgrade Cassandra key cache hit rate: critical → warning
Comments:
- Add note on node_vmstat_oom_kill gauge type (delta vs increase)
- Add note on systemd_socket_refused_connections_total gauge type
- Add note on mysql_global_status_* gauge type (delta/deriv vs rate)
- Add note on pg_txid_current requiring a custom postgres_exporter query
- Add note on pg_stat_ssl_compression availability (PG 9.5-13 only)
- Add note on cert-manager legacy metric name for users on v1.18 and older
- Add threshold rationale for Elasticsearch, Cassandra, CouchDB rules
- Add note on NATS leaf node spurious fires when leaf nodes not configured
* fix(data): PromQL type fixes, job filter cleanup, query correctness review
- Replace rate()/increase() with deriv()/delta() on gauge metrics:
node_vmstat_pgmajfault, cassandra_stats (criteo exporter),
gitlab_ci_pipeline_failure_reasons, flink_taskmanager_job_task_numRecordsIn
- Fix histogram_quantile on non-_bucket metric: cilium_policy_implementation_delay
- Fix Thanos bucket replicate latency: use _count instead of _bucket for guard clause
- Fix Thanos query latency: use _count instead of _bucket for guard clause
- Restore job filter in Thanos objstore guard clauses (compact + store)
- Remove redundant job= filters from unique metrics: ~30 Thanos rules,
kube_persistentvolume_status_phase, otelcol_process_runtime_*
- Fix high-cardinality Istio latency grouping (drop source labels from by())
- Add division-by-zero guard to host context switch ratio
- Raise noisy ClickHouse thresholds: RejectedInserts > 2, DelayedInserts > 10
- Remove redundant for: 1m from HAProxy check failure rules
- Add job rename comments to up{job=...} rules (Hadoop, OpenStack, SNMP, OTel)
- Remove external mixin references from comments
- Fix Tempo dropped spans metric name: add missing _total suffix
- Fix Thanos bucket replicate run latency: add missing le label in by()
2026-04-06 20:38:12 +02:00
Samuel Berthe
b8fd051a55
Update README.md
2026-03-31 16:41:19 +02:00
samber
87d0610246
Publish
2026-03-31 14:40:08 +00:00
Emil Bostijancic
7ba6b2d367
feat: add OpenSearch alerting rules (OpenSearch exporter plugin) (#532)
2026-03-31 16:39:38 +02:00
dependabot[bot]
b13d59bce6
build(deps-dev): bump activesupport from 7.2.3 to 7.2.3.1 (#531)
...
Bumps [activesupport](https://github.com/rails/rails) from 7.2.3 to 7.2.3.1.
- [Release notes](https://github.com/rails/rails/releases)
- [Changelog](https://github.com/rails/rails/blob/v8.1.2.1/activesupport/CHANGELOG.md)
- [Commits](https://github.com/rails/rails/compare/v7.2.3...v7.2.3.1)
---
updated-dependencies:
- dependency-name: activesupport
dependency-version: 7.2.3.1
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-24 08:24:41 +01:00
dependabot[bot]
9d9c648cdd
build(deps-dev): bump json from 2.18.1 to 2.19.2 (#530)
...
Bumps [json](https://github.com/ruby/json) from 2.18.1 to 2.19.2.
- [Release notes](https://github.com/ruby/json/releases)
- [Changelog](https://github.com/ruby/json/blob/master/CHANGES.md)
- [Commits](https://github.com/ruby/json/compare/v2.18.1...v2.19.2)
---
updated-dependencies:
- dependency-name: json
dependency-version: 2.19.2
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-20 18:22:18 +01:00
samber
af2f277830
Publish
2026-03-18 20:41:01 +00:00
Samuel Berthe
e3a7165a65
fix(data): remove malformed summary fields, replace increase() with rate(), remove redundant avg_over_time
2026-03-18 21:40:30 +01:00
samber
c0e1f7a5f5
Publish
2026-03-18 17:06:34 +00:00
Samuel Berthe
1aafa40913
fix(data): prevent division by 0
2026-03-18 18:06:00 +01:00
samber
4fb1aa9ae4
Publish
2026-03-18 11:23:25 +00:00
Samuel Berthe
a4581ed322
fix(data): fix thresholds, comments, intervals, units... (#529)
2026-03-18 12:22:55 +01:00
samber
f36c23e393
Publish
2026-03-17 12:30:42 +00:00
Samuel Berthe
03963ef6f9
refactor(categories): change categories and move some exporters (#528)
2026-03-17 13:30:13 +01:00
Samuel Berthe
06f8b048a3
fix ci
2026-03-16 19:17:05 +01:00
Samuel Berthe
5d099fcae1
fix ci
2026-03-16 17:44:00 +01:00
samber
9d00396bc8
Publish
2026-03-16 16:11:31 +00:00
Samuel Berthe
2b99cf1f76
Feat/cilium alerting rules (#526)
...
* Add .worktrees/ to .gitignore
* feat: add Cilium alerting rules (32 rules across agent, operator, ClusterMesh, KVStoreMesh, Hubble)
* fix: use job label instead of k8s_app, switch to single-quoted YAML strings
* remove Cilium agent high restart rate alert
2026-03-16 17:10:59 +01:00
samber
e8eb75c2e2
Publish
2026-03-16 15:53:03 +00:00
Samuel Berthe
5071e01ad9
Feature/spinnaker alerts (#527)
...
* Add .worktrees/ to .gitignore
* feat: add Spinnaker alerting rules (12 rules)
Add Prometheus alerting rules for Spinnaker built-in exporter
covering Orca queue health, circuit breakers, Igor polling monitors,
Gate API throttling, Clouddriver errors, and AWS rate limiting.
Metric names validated against uneeq-oss/spinnaker-mixin dashboards.
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-03-16 16:52:31 +01:00
samber
6423f93ba7
Publish
2026-03-16 15:40:08 +00:00
Samuel Berthe
1455e0fd77
feat: add Oracle Database alerting rules (8 rules) (#525)
...
Add Prometheus alerting rules for Oracle Database using iamseth/oracledb_exporter.
Rules based on Grafana oracledb-mixin and exporter default metrics:
- DB down, session/process limit, tablespace capacity (warning+critical),
high rollbacks, active sessions, user I/O wait time.
2026-03-16 16:39:35 +01:00
samber
ba5c9a3280
Publish
2026-03-16 14:01:45 +00:00
Samuel Berthe
d8315eb3bc
Feature/cert manager rules (#524)
...
* Add .worktrees/ to .gitignore
* feat: add cert-manager alerting rules (4 rules)
Add Prometheus alerting rules for cert-manager under the
"Network, security and storage" category:
- Cert-Manager absent (service down detection)
- Certificate expiring soon (21-day threshold)
- Certificate not ready (readiness check)
- Hitting ACME rate limits (rate limit detection)
Based on imusmanmalik/cert-manager-mixin and official
cert-manager metrics documentation.
* docs: add cert-manager to README
2026-03-16 15:01:07 +01:00
samber
7f346ede99
Publish
2026-03-16 13:37:19 +00:00