Mirror of https://github.com/samber/awesome-prometheus-alerts.git (synced 2026-06-20 16:46:37 +08:00)
Add LiteLLM section to Other group with 3 alerting rules (#553)
LiteLLM (https://github.com/BerriAI/litellm) is a popular LLM gateway/proxy that exposes Prometheus metrics via its built-in callback. There were no existing alerting rules for LiteLLM in this repo, despite its growing adoption as an OpenAI/Anthropic-compatible proxy.

Added 3 alerts covering the most common operational concerns:

1. **LiteLLM provider spend over budget**: soft warning on cumulative 24h spend per model-name regex. Useful when LiteLLM's native `provider_budget_config` hard cap is unavailable, disabled, or buggy (e.g. BerriAI/litellm#26701).
2. **LiteLLM proxy failed requests rate high**: error-rate ratio alert for downstream LLM provider availability/auth issues.
3. **LiteLLM request latency p95 high**: histogram-quantile alert for downstream provider response-time degradation.

All 3 rules tested via `promtool check rules` (SUCCESS) and validated on a real LiteLLM v1.83.7 production deployment.

Reference: https://docs.litellm.ai/docs/proxy/prometheus
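For orientation, the second alert might look roughly like this once written as a native Prometheus alerting rule, which is the form `promtool check rules` validates. This is a minimal sketch: the group name `litellm` and the alert identifier are illustrative assumptions, and the annotations the site actually generates may differ.

```yaml
# Sketch only: the group name and alert identifier are assumptions,
# not taken from this commit. The expression, duration, and severity are.
groups:
  - name: litellm
    rules:
      - alert: LitellmProxyFailedRequestsRateHigh
        expr: 'sum(rate(litellm_proxy_failed_requests_metric_total[5m])) / sum(rate(litellm_proxy_total_requests_metric_total[5m])) > 0.05'
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: LiteLLM proxy failed requests rate high
          description: "LiteLLM proxy is returning failed responses to clients (>5% error rate over 5min)."
```

Because both sides of the ratio are wrapped in `sum()`, the alert fires once for the whole proxy rather than once per model or key.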
parent 8ca1fe591f
commit 4c9da9ed24
1 changed file with 25 additions and 0 deletions
@@ -5918,3 +5918,28 @@ groups:
                 severity: critical
                 comments: |
                   Threshold of 20ms. Adjust based on your expected database latency.
+
+      - name: LiteLLM
+        exporters:
+          - slug: embedded-exporter
+            doc_url: https://docs.litellm.ai/docs/proxy/prometheus
+            rules:
+              - name: LiteLLM provider spend over budget
+                description: "Cumulative spend for an LLM provider has exceeded the daily budget threshold. Replace the regex `(claude-|anthropic/).*` with your provider's model-name pattern. Useful as a soft-warning when `provider_budget_config` hard-cap is unavailable or disabled."
+                query: 'sum(increase(litellm_spend_metric_total{model=~"(claude-|anthropic/).*"}[24h])) > 1'
+                severity: warning
+                for: 5m
+                comments: |
+                  The threshold (1) is in USD. The `model` label carries the resolved model-name (post-routing).
+                  PromQL `increase()` needs at least two samples in the window, with the counter growing between them, to return a positive value.
+                  For a brand-new counter series this means at least two separate request bursts, one or more scrape intervals apart.
+              - name: LiteLLM proxy failed requests rate high
+                description: "LiteLLM proxy is returning failed responses to clients (>5% error rate over 5min). Investigate downstream LLM provider availability or auth issues."
+                query: 'sum(rate(litellm_proxy_failed_requests_metric_total[5m])) / sum(rate(litellm_proxy_total_requests_metric_total[5m])) > 0.05'
+                severity: warning
+                for: 10m
+              - name: LiteLLM request latency p95 high
+                description: "LiteLLM request total latency p95 exceeds 10 seconds over 5min. Check downstream LLM provider response-times and proxy queue-depth."
+                query: 'histogram_quantile(0.95, sum(rate(litellm_request_total_latency_metric_bucket[5m])) by (le)) > 10'
+                severity: warning
+                for: 10m
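
The spend rule's description asks users to replace the regex with their own provider's model-name pattern. A minimal sketch of such an adaptation, written in the same entry format as the rules in this diff, could look like the following. It assumes a provider whose resolved model names start with `gpt-` or `openai/` and a 5 USD threshold; the rule name, regex, and threshold are all illustrative assumptions, not part of this commit.

```yaml
# Hypothetical variant of the spend rule. The rule name, the regex, and the
# threshold (5 USD) are illustrative assumptions, not part of this commit.
- name: LiteLLM OpenAI spend over budget
  description: "Cumulative 24h spend for OpenAI models has exceeded the daily budget threshold."
  query: 'sum(increase(litellm_spend_metric_total{model=~"(gpt-|openai/).*"}[24h])) > 5'
  severity: warning
  for: 5m
```

If per-model alerts are preferred over a single provider-wide one, the outer `sum(...)` could be written as `sum by (model) (...)`, so that each model is compared against the threshold separately.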