From f4ddfc665bff1bf337b377be1d4517ae58e92f31 Mon Sep 17 00:00:00 2001
From: "nuco.cloud" <t.adler@ironeaglecapital.com>
Date: Tue, 28 Apr 2026 15:59:58 +0200
Subject: [PATCH] Add LiteLLM section to Other group with 3 alerting rules
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

LiteLLM (https://github.com/BerriAI/litellm) is a popular LLM gateway/proxy
that exposes Prometheus metrics via its built-in callback. This repo had no
alerting rules for LiteLLM, despite its growing adoption as an
OpenAI/Anthropic-compatible proxy.

Added 3 alerts covering the most common operational concerns:

1. **LiteLLM provider spend over budget**: soft warning on cumulative
   24h spend for models matching a model-name regex. Useful when
   LiteLLM's native `provider_budget_config` hard cap is unavailable,
   disabled, or buggy (e.g. BerriAI/litellm#26701).

2. **LiteLLM proxy failed requests rate high**: error-ratio alert
   (failed / total requests) for downstream LLM provider availability
   or auth issues.

3. **LiteLLM request latency p95 high**: histogram-quantile alert
   for downstream provider response-time degradation.

All 3 rules were tested with `promtool check rules` (SUCCESS) and
validated on a real LiteLLM v1.83.7 production deployment.
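
For reviewers who want to repeat the `promtool check rules` step, the
failed-requests rule could be written as a standalone Prometheus rule
file roughly as follows. This is only a sketch: the file name, group
name, alert name, and annotation layout are assumptions and are not
taken from this repo's generated output. Only the expression, the `for`
duration, and the severity come from this patch.

```yaml
# litellm.rules.yml (hypothetical file name)
groups:
  - name: LiteLLM  # group name is an assumption
    rules:
      # Alert name is an assumption; pick whatever naming scheme you use.
      - alert: LitellmProxyFailedRequestsRateHigh
        # Ratio of failed to total proxy requests over the last 5 minutes.
        expr: 'sum(rate(litellm_proxy_failed_requests_metric_total[5m])) / sum(rate(litellm_proxy_total_requests_metric_total[5m])) > 0.05'
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: LiteLLM proxy failed requests rate high
          description: "LiteLLM proxy is returning failed responses to clients (>5% error rate over 5min)."
```

A file like this can then be checked with
`promtool check rules litellm.rules.yml`.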

Reference: https://docs.litellm.ai/docs/proxy/prometheus
---
 _data/rules.yml | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/_data/rules.yml b/_data/rules.yml
index 4f81ea6..25e65c4 100644
--- a/_data/rules.yml
+++ b/_data/rules.yml
@@ -5918,3 +5918,28 @@ groups:
                 severity: critical
                 comments: |
                   Threshold of 20ms. Adjust based on your expected database latency.
+
+      - name: LiteLLM
+        exporters:
+          - slug: embedded-exporter
+            doc_url: https://docs.litellm.ai/docs/proxy/prometheus
+            rules:
+              - name: LiteLLM provider spend over budget
+                description: "Cumulative spend for an LLM provider over the last 24h has exceeded the budget threshold. Replace the regex `(claude-|anthropic/).*` with your provider's model-name pattern. Useful as a soft warning when the `provider_budget_config` hard cap is unavailable or disabled."
+                query: 'sum(increase(litellm_spend_metric_total{model=~"(claude-|anthropic/).*"}[24h])) > 1'
+                severity: warning
+                for: 5m
+                comments: |
+                  The threshold (1) is in USD. The `model` label carries the resolved model name (post-routing).
+                  PromQL `increase()` needs at least 2 samples in the 24h window, with the counter rising between them, to return a positive value.
+                  For a brand-new counter series this means at least 2 request bursts at least 1 scrape interval apart.
+              - name: LiteLLM proxy failed requests rate high
+                description: "LiteLLM proxy is returning failed responses to clients (>5% error rate over 5min). Investigate downstream LLM provider availability or auth issues."
+                query: 'sum(rate(litellm_proxy_failed_requests_metric_total[5m])) / sum(rate(litellm_proxy_total_requests_metric_total[5m])) > 0.05'
+                severity: warning
+                for: 10m
+              - name: LiteLLM request latency p95 high
+                description: "LiteLLM request total latency p95 exceeds 10 seconds over 5min. Check downstream LLM provider response times and proxy queue depth."
+                query: 'histogram_quantile(0.95, sum(rate(litellm_request_total_latency_metric_bucket[5m])) by (le)) > 10'
+                severity: warning
+                for: 10m