- Oct 12, 2022
-
-
Florian Sesser authored
Refs #129
-
- Sep 13, 2022
-
-
Florian Sesser authored
-
Florian Sesser authored
-
Florian Sesser authored
-
Florian Sesser authored
This comes from nixpkgs commit 81291cc793cf88bd6eff3fd8512e5eb9d037066c and will be included in NixOS 22.11.
-
Florian Sesser authored
-
- Sep 12, 2022
-
-
Florian Sesser authored
... and use a smarter Prometheus query to combine the two.
-
- Sep 10, 2022
-
-
Florian Sesser authored
-
- Sep 08, 2022
-
-
Florian Sesser authored
This should implement my actual intention: alert when backups run for longer than 3h, and when the repo check runs for more than 6h.
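As a rough illustration of the rule described here, the threshold check could be sketched in Python. The 3h and 6h limits come from the message above; the job names `backup` and `repo-check` and the function `should_alert` are hypothetical, and the real alert lives in Grafana rather than in code like this.

```python
from datetime import timedelta

# Limits from the commit message; the job names are hypothetical placeholders.
MAX_ACTIVE = {
    "backup": timedelta(hours=3),
    "repo-check": timedelta(hours=6),
}


def should_alert(job: str, active_for: timedelta) -> bool:
    """Return True when a job has been active longer than its limit."""
    return active_for > MAX_ACTIVE[job]
```

With these limits, a job that has been active for four hours would trigger the backup alert but not the repo check alert.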
-
Florian Sesser authored
-
Florian Sesser authored
Forgot to add a second alert when I added the workaround to not count the ZFS ARC as used memory :/
-
Florian Sesser authored
... which are both governed by our retention policy.
-
- Sep 06, 2022
-
-
Florian Sesser authored
-
Florian Sesser authored
-
- Sep 05, 2022
-
-
Florian Sesser authored
-
- Aug 31, 2022
-
-
Florian Sesser authored
This is still a bit buggy in our version of Grafana, but already nice to look at and maybe useful. Refs privatestorageops#429
-
Florian Sesser authored
This adds alerting to the backup job duration graph: Grafana alerting works with systemd unit metrics, i.e. a backup job unit being "active" for too long. Use that fact for alerting on long-running backup jobs.
-
Florian Sesser authored
... instead of the default connected lines, and with a working label for the host. Refs privatestorageops#429
-
Florian Sesser authored
Refs privatestorageops#429
-
- Aug 29, 2022
-
-
Florian Sesser authored
Failed backups now have a filled red area instead of a thin yellow line. Refs privatestorageops#429.
-
- Aug 17, 2022
-
-
Florian Sesser authored
, a dashboard that "displays a lot of data about one single host". This is
-
Florian Sesser authored
-
Florian Sesser authored
One query for hosts with ZFS and one for those without.
-
- Aug 16, 2022
-
-
Florian Sesser authored
Since ZoL frees ARC under memory pressure, let's not count it as "used" but instead as "free" memory.
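A minimal sketch of the arithmetic this implies, assuming "used" is derived from total and available memory and that the ARC size is available as a separate value (for example from a node exporter ZFS metric). The function name and parameters are hypothetical; the actual dashboard query is not shown in this log.

```python
def used_memory_bytes(mem_total: int, mem_available: int, arc_size: int = 0) -> int:
    """Estimate used memory, treating the ZFS ARC as reclaimable (free).

    arc_size defaults to 0, which covers hosts without ZFS.
    """
    used = mem_total - mem_available - arc_size
    # Guard against small negative values caused by samples taken at
    # slightly different times.
    return max(used, 0)
```

On a host without ZFS the third argument is simply left out, so the same function covers both cases.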
-
- Aug 03, 2022
-
-
Florian Sesser authored
-
- Jul 11, 2022
-
-
Florian Sesser authored
-
- Jun 13, 2022
-
-
Florian Sesser authored
According to some Prometheus documentation, node_memory_MemAvailable_bytes is a better estimator than the sum I used before. It yields almost the same value, but reads nicer.
-
- Apr 29, 2022
-
-
Florian Sesser authored
-
- Apr 13, 2022
-
-
Florian Sesser authored
-
- Mar 14, 2022
-
-
Florian Sesser authored
This should fix the current alerts for our RAID arrays. It's only "should" because I can't test it properly without said RAID arrays on the dev or staging machines.
-
- Feb 25, 2022
-
-
Florian Sesser authored
The newer "Time Series" panel does not support two axes.
-
- Feb 22, 2022
-
-
Florian Sesser authored
-
Florian Sesser authored
-
Florian Sesser authored
- Also alert on "negative" (== receiving) errors
- Use 'rate' so the alert clears once the situation normalizes
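To illustrate why a rate lets the alert clear again, here is a simplified Python sketch. It is not how Prometheus computes `rate` (which also handles counter resets and extrapolation), and the function names, the sample format, and the zero threshold are assumptions made for the example.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, cumulative error count)


def per_second_rate(samples: List[Sample]) -> float:
    """Average per-second increase between the oldest and newest sample."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        return 0.0
    # Treat a decrease (e.g. a counter reset) as no increase in this sketch.
    return max(v1 - v0, 0.0) / (t1 - t0)


def errors_alert(rx: List[Sample], tx: List[Sample], threshold: float = 0.0) -> bool:
    """Alert when either the receive or the transmit error rate exceeds the threshold."""
    return per_second_rate(rx) > threshold or per_second_rate(tx) > threshold
```

A raw error counter never goes back down, so an alert on its absolute value would stay active forever. The rate drops to zero as soon as no new errors arrive within the window.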
-
- Feb 17, 2022
-
-
Florian Sesser authored
-
Florian Sesser authored
Grafana 8 recommends these for added performance and capabilities.
-
- Feb 14, 2022
-
-
Florian Sesser authored
-
Florian Sesser authored
-
Florian Sesser authored
-
Florian Sesser authored
- (tryfix) Switch to Loki datasource
- Filter requests to metrics instead of all GET
- Show the last week by default
- Show log times

Interesting, and apparently not quickly fixable: the GET ... line comes *after* the result. We might be logging this wrong to begin with, but probably this is some next gen AI line ordering intelligence
-