Monitoring: Available memory should include ZFS ARC buffer cache
The "available memory" graph in Grafana currently excludes the ZFS ARC buffer cache. Since the ARC shrinks under memory pressure, AFAIK it should count as available memory rather than as "unavailable" memory.
I wanted to add that in !304 (merged), but couldn't finish the job. This issue is a reminder to myself, should I stumble upon a way later.
I couldn't come up with a Prometheus query that adds the ARC size to the count of available memory: nodes without ZFS return no value for node_zfs_arc_size, and the Prometheus query language makes it hard for me to just use 0 when that metric is not available:
- The or on() vector(0) trick does not work here, for example: it only adds a single label-less 0 when the metric is missing on every node, not a 0 per node.
- According to this StackOverflow answer, there is unfortunately no simple solution:
Unfortunately Prometheus doesn't provide an easy ability to fill gaps with zeroes if q returns multiple time series with distinct labelsets. Other Prometheus-like solutions such as VictoriaMetrics provide default operator for this case.
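A possible workaround, untested against this setup: instead of filling the gap with vector(0), fill it with a series that exists on every node and is multiplied by 0, so the fallback has a per-node label set. The sketch below assumes that node_memory_MemAvailable_bytes is the metric behind the "available memory" graph (that name is not taken from !304) and that each instance is scraped by exactly one node_exporter job, so matching on instance is one-to-one.

```promql
# Assumption: node_memory_MemAvailable_bytes backs the "available memory" graph.
# Assumption: one node_exporter target per instance, so on(instance) matches one-to-one.
node_memory_MemAvailable_bytes
  + on(instance) (
      node_zfs_arc_size
      # Nodes without ZFS have no node_zfs_arc_size series; give them a per-instance 0.
      or on(instance) (node_memory_MemAvailable_bytes * 0)
    )
```

Because of on(instance), the result carries only the instance label, so a dashboard legend that relies on other labels would need adjusting. This also treats the whole ARC as reclaimable, which slightly overstates available memory, since the ARC does not shrink below its configured minimum.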
Screen shot:
After staging storage001 ran its first full backup and filled the ARC cache, the "RAM used" graph no longer shows what I intended it to show: memory that is committed and cannot be freed.