I see there is a sibling config option, x11, which defaults to true. Maybe we can cut down the closure size by setting it to false. This could also be fine as a follow-up. I don't think this changeset adds any x11 dependencies (at least not according to diff-closures).
I'm not sure if there's a way to disable all GUI dependencies at once, but a related idea might be having a safety net: we could probably have some part of the system fail loudly if it discovers that some basic GUI stuff has made its way into our closure (e.g. libX11 or something similar). Then at least someone would have to stop and fix the issue before proceeding...
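A rough sketch of what such a safety net could look like, assuming we feed it the list of store paths in a closure (one per line). The function name check_closure and the pattern list are made up for illustration, and the package names in the pattern are guesses at what "basic GUI stuff" would look like in store paths:

```shell
# Hypothetical guard: reads store paths on stdin (one per line) and fails
# if any of them looks like a GUI package we never want to deploy.
# FORBIDDEN_PATTERN is an assumption; extend or trim it as needed.
FORBIDDEN_PATTERN='-(libX11|wayland|mesa|gtk\+3)-[0-9]'

check_closure() {
    # grep exits 0 when at least one path matches the pattern.
    if offenders=$(grep -E -e "$FORBIDDEN_PATTERN"); then
        printf 'forbidden GUI dependencies in closure:\n%s\n' "$offenders" >&2
        return 1
    fi
    return 0
}

# Intended use (not run here; needs nix on the machine):
#   nix path-info -r /run/current-system | check_closure
```

Run as a CI step, a non-zero exit status would stop the pipeline and print the offending paths.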
A different (maybe complementary) approach could be to have the closure size of various expressions tracked by a monitoring system? Then we can see how we're doing over time, and maybe understand contributing factors other than GUI stuff.
One thing I noticed while writing up #93 is that we pull in w3m (which has a vulnerability) as a dependency of nixos-help, since it is the default browser used by that script.
cloud-utils depends on QEMU-utils depends on QEMU depends on Mesa depends on Wayland depends on Graphviz depends on OpenEXR (or so)... and GTK... and CUPS, the standards-based, open source printing system.
We could do without many of those *Support options. The expression is such that if we turn off a flag, the related dependencies don't appear in the derivation.
The mechanism for toggling these flags is probably to override the top-level package with a new version. Right now we get our nixpkgs by importing the top-level nixpkgs.nix and calling it with an empty set. I think we can call it with some overlays instead. For example, shell.nix could start with:
    self: super: {
      qemu = super.qemu.override {
        sdlSupport = false;  # also turns off OpenGL and "virgl", whatever that is
      };
    }
Or, at least, something like this. I haven't messed with overlays in a while. There are other ways to get overlays into a system; I'm not sure which would be most appropriate for us: https://nixos.wiki/wiki/Overlays
When I did mess with overlays, I was mostly using them to upgrade Python libraries and this usually spiraled into unmanageable levels of complexity. However, overriding QEMU to turn off some features is pretty much what the overlay system is for so maybe it is simple enough to be feasible.
Another thing to look out for is build-time dependencies vs runtime dependencies. I think QEMU's Mesa dependency is runtime so it translates into software we actually deploy. I think LeaseReport's dependency on imagemagick is build-time so while it does add cost and complexity to our build system I don't think we will actually deploy imagemagick to our servers as a result (but I'm not sure!).
There is probably a way to ask nix to differentiate between these two things.
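One possible way to do this, as a sketch only: nix-store --query --requisites on an output path lists its runtime closure, while running it with --include-outputs on the deriver also lists the outputs of everything needed to build it. The difference between the two lists should be the build-only paths. The helper name build_only_paths and the file names are made up, and this assumes the deriver is recorded and its .drv is still present in the store, which may not hold on a deployed machine.

```shell
# build_only_paths RUNTIME_LIST ALL_LIST
# Prints the store paths that appear in ALL_LIST but not in RUNTIME_LIST.
# Both arguments are files with one store path per line.
build_only_paths() {
    runtime_sorted=$(mktemp)
    all_sorted=$(mktemp)
    # comm needs sorted input; use the C locale so sort and comm agree.
    LC_ALL=C sort -u "$1" > "$runtime_sorted"
    LC_ALL=C sort -u "$2" > "$all_sorted"
    # -13 suppresses lines unique to file 1 and lines common to both,
    # leaving only the lines unique to file 2.
    LC_ALL=C comm -13 "$runtime_sorted" "$all_sorted"
    rm -f "$runtime_sorted" "$all_sorted"
}

# Intended use (not run here; needs nix and a recorded deriver):
#   out=/run/current-system
#   nix-store --query --requisites "$out" > runtime.txt
#   drv=$(nix-store --query --deriver "$out")
#   nix-store --query --requisites --include-outputs "$drv" \
#       | grep -v '\.drv$' > all.txt
#   build_only_paths runtime.txt all.txt
```

Anything printed by build_only_paths is present in the store only because of a build, so if it also shows up on a server it deserves a closer look.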
Some commands I like to use to find out the size of things.
There must be much better tools out there, but this is what I used so far.
du
[root@storage001:~]# du -ms /nix/store/* | sort -rn | head -23
Executed on a grid machine, this returns the size per nix-store object in megabytes, sorts descending, and returns the first 23 (arbitrarily).
Interestingly, the six biggest objects, together 3.6 GB (over 40% of the size of /nix/store on staging storage001), look like source packages, so they might be build dependencies that we deploy by accident?
[root@storage001:~]# du -ms /nix/store/* | sort -rn | head -6
992 /nix/store/p735m5inf57wlqxa6zjc0kd8l3147qcm-pypi-deps-db-src
773 /nix/store/f8v50iw3r5nc4sdnl7lav24dwa0w7jzw-nix-pypi-fetcher
772 /nix/store/bxxh7964pvipmpi5dhld1wc6sxyj4y1l-source
623 /nix/store/xpgy7kf9prc5hchbirjyc1iam9piclzv-source
261 /nix/store/2g1ppmcribsyshvbxa0l6a81pjbdjzfp-source
216 /nix/store/j8yxa77jqxiy4m7aqfx5zfr4a14jcycv-nix-pypi-fetcher
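To avoid adding up the sizes by hand, a small helper along these lines could total the N largest entries of the du output. The name top_n_total is made up for this sketch; it only assumes the "<size in MB> <path>" format that du -ms produces.

```shell
# top_n_total N
# Reads `du -ms`-style lines ("<MB> <path>") on stdin and prints the
# combined size in MB of the N largest entries.
top_n_total() {
    sort -rn | head -n "$1" | awk '{ total += $1 } END { print total + 0 }'
}

# Intended use (not run here):
#   du -ms /nix/store/* | top_n_total 6
```

Fed the six entries from the listing, this prints 3637, which matches the 3.6 GB quoted for the biggest six objects.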
nix why-depends
Should we find a big nix store object that does not look relevant and we want to find out why it's installed at all, nix why-depends can help:
When I find an interesting store object with why-depends I sometimes look into the derivation to get a (bit) better idea of what's going on: nix show-derivation -r /nix/store/zgvf44mgwb3y9i91bp5mbsab3mz8lq9s-qemu-6.1.1.drv
The vulnerability digest
I started to look into this not because of disk space used, but because we install too much software that has security vulnerabilities.
The vulnerability-scan CI job conveniently provides a useful artefact that lists vulnerable software, much of which we shouldn't have installed in the first place...
Find the latest such job and artefact here.
Wanted
A tool (or process) to discern between run-time and build-time dependencies
Know-how on turning miscategorized run-time dependencies into build-time dependencies
Find out why those source packages above land on our grid machines - why-depends comes up empty. (Probably I am using it wrong?)
In some cases, it may be desirable to take advantage of commonly-used, predefined configurations provided by nixpkgs, but different from those that come as default. [...]
Even if some of these profiles seem only useful in the context of install media, many are actually intended to be used in real installs.
Common configuration for headless machines (e.g., Amazon EC2 instances).
Disables sound, vesa, serial consoles, emergency mode, grub splash images and configures the kernel to reboot automatically on panic.
And a minimal profile that keeps the serial console and might thus be better suited for our bare metal servers:
This profile defines a small NixOS configuration. It does not contain any graphical stuff. It’s a very short file that enables noXlibs, sets i18n.supportedLocales to only support the user-selected locale, disables packages’ documentation, and disables sound.
(sound seems to be disabled by default now already, so the impact of these settings probably isn't huge)
Try to find out how your system depends on sagedoc:
nix path-info -r /run/current-system | grep sage
If it does not appear there, try:
nix path-info -r /run/current-system --derivation | grep sage
Then, once you have the path, run:
nix why-depends /run/current-system "storepathhere" --all --precise
If the store path ends in .drv, add --derivation to the end.
Enables some optimizations by default to closure size and startup time:
- defaults documentation to off
- defaults to using systemd in initrd
- use systemd-networkd
- disables systemd-network-wait-online
- disables NixOS system switching if the host store is not mounted
This takes a few hundred MB off the closure size, including qemu, allowing for putting MicroVMs inside Docker containers.