- Sep 08, 2022
-
-
Jean-Paul Calderone authored
Monitoring: Fix backup duration alert See merge request !349
-
Florian Sesser authored
This should implement my actual intentions: Alert when backups run for longer than 3h, and the repo check for more than 6h.
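A minimal sketch of the two thresholds, assuming the alert boils down to comparing how long a job has been active against a per-job limit. The unit names and the `is_overdue` helper are hypothetical and only illustrate the rule; they are not taken from the repository.

```python
from datetime import timedelta

# Hypothetical unit names; the limits are the ones stated in the commit message.
MAX_ACTIVE = {
    "borgbackup-job": timedelta(hours=3),         # backup run
    "borgbackup-check-repo": timedelta(hours=6),  # repository check
}


def is_overdue(unit: str, active_for: timedelta) -> bool:
    """Return True when a unit has been active longer than its limit."""
    limit = MAX_ACTIVE.get(unit)
    if limit is None:
        return False  # units without a limit never alert in this sketch
    return active_for > limit
```

With these limits, a job that has been active for four hours would alert if it is a backup run but not if it is a repo check.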
-
Jean-Paul Calderone authored
Add dashboards for Tahoe Incident Report count and rate Closes privatestorageops#288 See merge request !347
-
Jean-Paul Calderone authored
Publish number of Tahoe Incident Reports Closes privatestorageops#287 See merge request !346
-
Florian Sesser authored
-
Florian Sesser authored
Fixes privatestorageops#287
-
Jean-Paul Calderone authored
Also alert when hosts that run ZFS run out of RAM See merge request !345
-
Jean-Paul Calderone authored
Prometheus should keep metrics as long as Loki keeps logs See merge request !344
-
Florian Sesser authored
Forgot to add a second alert when I added the workaround to not count the ZFS ARC into used memory :/
-
Florian Sesser authored
... which are both governed by our retention policy.
-
- Sep 07, 2022
-
-
Florian Sesser authored
Monitoring: Count Tahoe's corruption advisories Closes privatestorageops#287 See merge request !341
-
Florian Sesser authored
Monitoring: Add Tahoe-LAFS corruption advisory count + rate + alert on rate > 0 Closes privatestorageops#288 See merge request !342
-
Florian Sesser authored
-
-
Jean-Paul Calderone authored
Monitoring: Fix resources dashboard See merge request !343
-
- Sep 06, 2022
-
-
Florian Sesser authored
-
Florian Sesser authored
-
- Sep 05, 2022
-
-
Florian Sesser authored
-
Florian Sesser authored
-
- Sep 02, 2022
-
-
Jean-Paul Calderone authored
Monitoring: Backup duration and backup set size Closes privatestorageops#429 See merge request !339
-
- Aug 31, 2022
-
-
Florian Sesser authored
This is a bit buggy still in our version of Grafana, but already nice to look at / maybe useful. Refs privatestorageops#429
-
Florian Sesser authored
This adds alerting to the backup job duration graph: Grafana alerting works with systemd unit metrics, i.e. a backup job unit being "active" for too long. Use that fact for alerting on long-running backup jobs.
-
Florian Sesser authored
... instead of the connected-lines default, also with a working label for the host. Refs privatestorageops#429
-
Florian Sesser authored
Refs privatestorageops#429
-
- Aug 29, 2022
-
-
Florian Sesser authored
Backup: Borg: All logs as JSON please See merge request !338
-
Florian Sesser authored
That makes working with its output in Grafana easier. Good documentation can be found at https://borgbackup.readthedocs.io/en/stable/internals/frontends.html#json-output
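To illustrate why JSON logs are easier to work with, here is a small sketch that filters log lines by level. It assumes one JSON object per line with `type`, `levelname` and `message` fields, as described in the documentation linked above; the helper names are hypothetical, and non-JSON lines are simply skipped.

```python
import json


def parse_log_line(line: str):
    """Parse one log line; return a dict, or None if it is not a JSON object."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    return record if isinstance(record, dict) else None


def messages_at_level(lines, levelname: str):
    """Collect the messages of all log records with the given level name."""
    found = []
    for line in lines:
        record = parse_log_line(line)
        if record is None:
            continue  # plain-text or malformed line
        if record.get("type") == "log_message" and record.get("levelname") == levelname:
            found.append(record.get("message", ""))
    return found
```

The same field-based filtering is what a log query in Grafana can do once the lines are structured, instead of matching on free text.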
-
Florian Sesser authored
-
Florian Sesser authored
Backup: Add --stats to borgbackup create options See merge request !337
-
Florian Sesser authored
-
Florian Sesser authored
We want to have this in our own monitoring
-
Florian Sesser authored
Monitoring: Backup: Clearer coloring See merge request !336
-
Florian Sesser authored
Failed backups now have a filled red area instead of a thin yellow line. Refs privatestorageops#429.
-
- Aug 24, 2022
-
-
Florian Sesser authored
Disregard ZFS ARC cache when monitoring free RAM Closes #119 See merge request !331
-
Florian Sesser authored
Monitoring: Add a "Node Exporter Full" dashboard See merge request !332
-
- Aug 17, 2022
-
-
Florian Sesser authored
, a dashboard that "displays a lot of data about one single host". This is
-
Florian Sesser authored
-
Florian Sesser authored
One query for hosts with ZFS and one for those without.
-
- Aug 16, 2022
-
-
Florian Sesser authored
Since ZoL frees ARC under memory pressure, let's not count it as "used" but instead as "free" memory.
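The idea can be sketched as follows, assuming the kernel's "available" figure does not include the ARC, so that the ARC size has to be added back in by hand. The function and parameter names are hypothetical; hosts without ZFS pass no ARC size.

```python
def used_memory_fraction(mem_total: int, mem_available: int, arc_size: int = 0) -> float:
    """Fraction of RAM in use, treating the ZFS ARC as reclaimable.

    All values are in bytes. arc_size defaults to 0 for hosts without ZFS.
    """
    if mem_total <= 0:
        raise ValueError("mem_total must be positive")
    # Count the ARC as free; never report more reclaimable memory than exists.
    reclaimable = min(mem_available + arc_size, mem_total)
    return 1 - reclaimable / mem_total
```

For a host with 16 units of RAM, 4 available and 8 held by the ARC, this reports a quarter of the memory as used rather than three quarters.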
-
- Aug 15, 2022
-
-
Jean-Paul Calderone authored
Set minimal nixos profile for our grid machines See merge request !330
-
Florian Sesser authored
-