[bug#75810,v6,00/16] Rootless guix-daemon

Message ID cover.1742230219.git.ludo@gnu.org
Headers
Series Rootless guix-daemon |

Message

Ludovic Courtès March 17, 2025, 5:02 p.m. UTC
  Hello,

This new version addresses Reepca’s latest comments¹:

  • Close the read end of ‘logPipe’ in ‘commonChildInit’.

  • Explicitly close the ‘readiness’ pipe.

  • Fix ‘--disable-chroot’ warning in the manual that was misleading.

  • Have ‘test-env’ check whether user namespaces are supported at all,
    which fixes non-Linux support (where it would previously fail to
    pass ‘--disable-chroot’.)

  • Change ‘unprivileged-user-namespace-supported?’ similarly.

  • Fix “build root cannot be made world-readable” test, which could
    not possibly fail and was exposing users unnecessarily.

  • Change ‘guix-install.sh’ on systemd machines: warn when unprivileged
    user namespaces are disabled, attempt to enable them, and error out
    if we failed to enable them.

Hopefully I didn’t forget anything.

I checked that the “debian-install” and “guix-daemon” system tests
still pass.

Thanks,
Ludo’.

¹ https://issues.guix.gnu.org/75810#91

Ludovic Courtès (16):
  daemon: Use ‘close_range’ where available.
  daemon: Close the read end of the logging pipe.
  daemon: Bind-mount /etc/nsswitch.conf & co. only if it exists.
  daemon: Bind-mount all the inputs, not just directories.
  daemon: Remount inputs as read-only.
  daemon: Remount root directory as read-only.
  daemon: Allow running as non-root with unprivileged user namespaces.
  daemon: Create /var/guix/profiles/per-user unconditionally.
  daemon: Drop Linux ambient capabilities before executing builder.
  daemon: Move comments where they belong.
  linux-container: ‘unprivileged-user-namespace-supported?’ returns #f
    on non-Linux.
  tests: Add missing derivation inputs.
  tests: Run in a chroot and unprivileged user namespaces.
  etc: systemd services: Run ‘guix-daemon’ as an unprivileged user.
  guix-install.sh: Support the unprivileged daemon where possible.
  DRAFT gnu: guix: Update to 07c5b1b

 build-aux/test-env.in               |  18 +-
 config-daemon.ac                    |   5 +-
 doc/guix.texi                       | 102 ++++++++---
 etc/gnu-store.mount.in              |   3 +-
 etc/guix-daemon.service.in          |  22 ++-
 etc/guix-install.sh                 | 124 +++++++++++---
 gnu/build/linux-container.scm       |   4 +-
 gnu/packages/package-management.scm |   6 +-
 guix/substitutes.scm                |   2 +-
 nix/libstore/build.cc               | 251 +++++++++++++++++++++-------
 nix/libstore/local-store.cc         |  26 ++-
 nix/libutil/util.cc                 |  26 ++-
 tests/derivations.scm               |  24 ++-
 tests/packages.scm                  |  13 +-
 tests/processes.scm                 |   9 +-
 tests/store.scm                     | 247 +++++++++++++++++++++++----
 16 files changed, 698 insertions(+), 184 deletions(-)


base-commit: 0c497c87ac47206b3e8c6dfa2e1e9b5f3e452292
  

Comments

Reepca Russelstein March 18, 2025, 3:07 a.m. UTC | #1
Ludovic Courtès <ludo@gnu.org> writes:

> @@ -2013,23 +2057,36 @@ void DerivationGoal::runChild()
>  
>          _writeToStderr = 0;
>  
> +	if (readiness.writeSide > 0) readiness.writeSide.close();
> +
> +	if (readiness.readSide > 0) {
> +	     /* Wait for the parent process to initialize the UID/GID mapping
> +		of our user namespace.  */
> +	     char str[20] = { '\0' };
> +	     readFull(readiness.readSide, (unsigned char*)str, 3);
> +	     if (strcmp(str, "go\n") != 0)
> +		  throw Error("failed to initialize process in unprivileged user namespace");
> +	}
> +
>          restoreAffinity();

(in patch 7/16, in nix/libstore/build.cc)

Strictly speaking we should check whether the fds are >= 0, not > 0,
since 0 is technically a valid file descriptor, and we use -1 to
indicate the absence of a file descriptor.

Also, readiness.readSide isn't explicitly closed in the child after
we're done with it.

> +
> +	# The unprivileged daemon cannot create the log directory by itself.
> +	mkdir /var/log/guix
> +	chown guix-daemon:guix-daemon /var/log/guix
> +	chmod 755 /var/log/guix

(in patch 15/16, in etc/guix-install.sh)

Should this be 'mkdir -p' or some other conditional creation?  If I
understand correctly this will fail when overwriting an existing install
using GUIX_ALLOW_OVERWRITE.

Concerning guix-install.sh, to be clear, is the intent to specifically
not support installing the rootful daemon on systemd systems?

For my two cents, I do think that it's still a tradeoff - not just
because of the reliance on different kernel mechanisms for security, but
also because the rootless daemon currently causes visible changes to the
build environment (EROFS on /, and nothing owned by root, for example).
Which one should we consider the "canonical" build environment going
forward?


I decided to do some searching for container escapes / vulnerabilities
online, just to be extra careful, and found one that relies on the
container entry point being run as the user that owns the program that
execs the container entry point: CVE-2019-5736.  It exploits the fact
that /proc/self/exe, despite being displayed like a symlink in the
output of ls, does not actually act like a symlink, and indeed acts more
like a hardlink that readlink happens to have some associated data for.

The demo, modified for guix circumstances, would go something like this:

1. A derivation is created whose builder is /proc/self/exe, and whose
   LD_PRELOAD environment variable points into a malicious store item
   for one of its shared libraries - for example, libc.
2. The daemon reads this in, and, to my knowledge, does no verification
   of the builder string.  Note that this aspect isn't actually
   necessary, as the builder could also be a symlink to /proc/self/exe
   from the store.
3. The daemon sets up the build environment, and execs /proc/self/exe.
4. An attacker-controlled load-time function gets run.
5. It opens /proc/self/exe, initially read-only because it can't be
   opened writable while a process is executing it.  It then execs
   another attacker-controlled process, which inherits the open file
   descriptor and subsequently opens it via /proc/self/fd/<fd>, this
   time read-write (it can do this because it owns the file, and even if
   it's not writable, a quick fchmod will fix that, and the filesystem
   it was originally opened from isn't read-only, because guix-daemon
   starts before gnu-store.mount bind-mounts /gnu/store to itself prior
   to making it read-only).  It then overwrites the resulting file
   descriptor with whatever contents it wants.
6. The next time guix-daemon is started outside the container, it runs
   attacker-controlled code.


There are several points at which that particular attack could be
stopped, and I'd like to try to stop it at as many of them as possible.
A good start would be canonicalizing the builder prior to executing it
and then checking to make sure it is in the store.  A more general
solution could look like writing out and then executing a tiny binary,
something like /tmp/runbuilder, that does nothing but unlink itself and
then exec the actual program.

Here's a writeup of the CVE in question:
https://unit42.paloaltonetworks.com/breaking-docker-via-runc-explaining-cve-2019-5736/

Aside from all that, it looks good to me.

- reepca