- Sep 08, 2022
-
-
Florian Sesser authored
This should implement my actual intentions: alert when backups run for longer than 3h, and when the repo check runs for more than 6h.
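As a rough illustration of the rule described above (not the actual Grafana alert definition), the two limits can be written as a lookup table. The job names `backup` and `repo-check` are hypothetical labels; only the 3h and 6h values come from the commit message.

```python
from datetime import timedelta

# Limits from the commit message; the job names are hypothetical labels.
MAX_DURATION = {
    "backup": timedelta(hours=3),
    "repo-check": timedelta(hours=6),
}


def should_alert(job: str, running_for: timedelta) -> bool:
    """Return True when a job has been running longer than its limit."""
    return running_for > MAX_DURATION[job]
```

For example, `should_alert("backup", timedelta(hours=4))` would fire, while the same four hours for `"repo-check"` would not.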
-
Florian Sesser authored
Fixes privatestorageops#287
-
Florian Sesser authored
Forgot to add a second alert when I added the workaround to not count the ZFS ARC into used memory :/
-
Florian Sesser authored
... which are both governed by our retention policy.
-
- Sep 07, 2022
-
-
Florian Sesser authored
-
-
- Sep 06, 2022
-
-
Florian Sesser authored
-
Florian Sesser authored
-
- Sep 05, 2022
-
-
Florian Sesser authored
-
Florian Sesser authored
-
- Aug 31, 2022
-
-
Florian Sesser authored
This is still a bit buggy in our version of Grafana, but already nice to look at and maybe useful. Refs privatestorageops#429
-
Florian Sesser authored
This adds alerting to the backup job duration graph: Grafana alerting works with systemd unit metrics, i.e. a backup job unit being "active" for too long. Use that fact for alerting on long-running backup jobs.
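The idea of measuring how long a unit has been "active" can be sketched outside Grafana as follows. This is only an assumption about the shape of the data: a time-ordered list of `(timestamp_seconds, is_active)` samples, as one might get from scraping a systemd unit state metric. The function name is hypothetical.

```python
def longest_active_seconds(samples):
    """Return the longest stretch, in seconds, of consecutive active samples.

    `samples` is a time-ordered list of (timestamp_seconds, is_active) pairs.
    """
    longest = 0
    run_start = None  # timestamp of the first sample in the current active run
    for timestamp, is_active in samples:
        if is_active:
            if run_start is None:
                run_start = timestamp
            longest = max(longest, timestamp - run_start)
        else:
            run_start = None  # the unit left the active state; reset the run
    return longest
```

An alert would then compare this value against the allowed job duration.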
-
Florian Sesser authored
... instead of the default connected lines, and with a working label for the host. Refs privatestorageops#429
-
Florian Sesser authored
Refs privatestorageops#429
-
- Aug 29, 2022
-
-
Florian Sesser authored
Failed backups now have a filled red area instead of a thin yellow line. Refs privatestorageops#429.
-
- Aug 17, 2022
-
-
Florian Sesser authored
, a dashboard that "displays a lot of data about one single host". This is
-
Florian Sesser authored
-
Florian Sesser authored
One query for hosts with ZFS and one for those without.
-
- Aug 16, 2022
-
-
Florian Sesser authored
Since ZoL frees ARC under memory pressure, let's not count it as "used" but instead as "free" memory.
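A minimal sketch of that accounting, assuming the inputs are the total memory, the kernel's "available" estimate, and the current ARC size, all in bytes. The function and parameter names are hypothetical, and the real dashboard query may differ. Hosts without ZFS simply pass no ARC size.

```python
def used_memory_bytes(total, available, arc_size=0):
    """Estimate used memory, treating the ZFS ARC as reclaimable.

    The ARC is not part of the kernel's "available" figure, so it is
    added to the free side here before subtracting from the total.
    """
    free = available + arc_size
    # Clamp at zero in case the inputs were sampled at slightly different times.
    return max(total - free, 0)
```

With 16 GiB total, 4 GiB available and an 8 GiB ARC, this reports 4 GiB used instead of 12 GiB.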
-
- Aug 03, 2022
-
-
Florian Sesser authored
-
- Jul 11, 2022
-
-
Jean-Paul Calderone authored
-
Florian Sesser authored
-
- Jun 13, 2022
-
-
Florian Sesser authored
According to some Prometheus documentation, node_memory_MemAvailable_bytes is a better estimator than the sum I used before. It yields almost the same value, but reads nicer.
-
- May 09, 2022
-
-
Jean-Paul Calderone authored
-
Jean-Paul Calderone authored
-
- Apr 29, 2022
-
-
Florian Sesser authored
-
- Apr 13, 2022
-
-
Florian Sesser authored
This is only semi-correct. We keep logs up to what's specified in the privacy policy. Currently, that is implemented as keeping them for up to 30 days on the individual nodes as well as on the central Loki server.
-
Florian Sesser authored
-
Florian Sesser authored
-
Florian Sesser authored
-
- Apr 08, 2022
-
-
Jean-Paul Calderone authored
-
- Mar 14, 2022
-
-
Florian Sesser authored
This should fix the current alerts for our RAID arrays. It's only "should" because I can't test it properly without said RAID arrays on the dev or staging machines.
-
- Feb 27, 2022
-
-
Florian Sesser authored
-
Florian Sesser authored
-
- Feb 25, 2022
-
-
Florian Sesser authored
-
This includes the host-based metrics collector, and the VPN client setup (including key deployment).
-
Florian Sesser authored
The newer "Time Series" panel does not support two axes.
-
- Feb 24, 2022
-
-
Florian Sesser authored
-
Florian Sesser authored
... and fix that option the other commit introduced
-
- Feb 23, 2022
-
-
Florian Sesser authored
Nix sometimes seems peculiar about merging sets?
-