[bug#75810,v5,06/14] daemon: Allow running as non-root with unprivileged user namespaces.

Message ID 067fd1219aa1b4354c0a321dc7e2a7d414eabf1b.1741973869.git.ludo@gnu.org
State New
Headers
Series Rootless guix-daemon |

Commit Message

Ludovic Courtès March 14, 2025, 5:48 p.m. UTC
  From: Ludovic Courtès <ludovic.courtes@inria.fr>

Many thanks to Reepca Russelstein for their review and guidance on these
changes.

* nix/libstore/build.cc (guestUID, guestGID): New variables.
(DerivationGoal)[readiness]: New field.
(initializeUserNamespace): New function.
(DerivationGoal::runChild): When ‘readiness.readSide’ is positive, read
from it.
(DerivationGoal::startBuilder): Call ‘chown’
only when ‘buildUser.enabled()’ is true.  Pass CLONE_NEWUSER to ‘clone’
when ‘buildUser.enabled()’ is false or not running as root.  Retry
‘clone’ without CLONE_NEWUSER upon EPERM.
(DerivationGoal::registerOutputs): Make ‘actualPath’ writable before
‘rename’.
(DerivationGoal::deleteTmpDir): Catch ‘SysError’ around ‘_chown’ call.
* nix/libstore/local-store.cc (LocalStore::createUser): Do nothing if
‘dirs’ already exists.  Warn instead of failing when failing to chown
‘dir’.
* guix/substitutes.scm (%narinfo-cache-directory): Check for
‘_NIX_OPTIONS’ rather than getuid() == 0 to determine the cache
location.
* doc/guix.texi (Build Environment Setup): Reorganize a bit.  Add
section headings “Daemon Running as Root” and “The Isolated Build
Environment”.  Add “Daemon Running Without Privileges” subsection.
Remove paragraph about ‘--disable-chroot’.
(Invoking guix-daemon): Warn against ‘--disable-chroot’ and explain why.

Reviewed-by: Reepca Russelstein <reepca@russelstein.xyz>
---
 doc/guix.texi               | 102 +++++++++++++++++------
 guix/substitutes.scm        |   2 +-
 nix/libstore/build.cc       | 156 +++++++++++++++++++++++++++++++-----
 nix/libstore/local-store.cc |  18 +++--
 4 files changed, 225 insertions(+), 53 deletions(-)
  

Patch

diff --git a/doc/guix.texi b/doc/guix.texi
index d109877a32..66d0e42112 100644
--- a/doc/guix.texi
+++ b/doc/guix.texi
@@ -877,6 +877,7 @@  Setting Up the Daemon
 @section Setting Up the Daemon
 
 @cindex daemon
+@cindex build daemon
 During installation, the @dfn{build daemon} that must be running
 to use Guix has already been set up and you can run @command{guix}
 commands in your terminal program, @pxref{Getting Started}:
@@ -921,20 +922,38 @@  Build Environment Setup
 @cindex build environment
 In a standard multi-user setup, Guix and its daemon---the
 @command{guix-daemon} program---are installed by the system
-administrator; @file{/gnu/store} is owned by @code{root} and
-@command{guix-daemon} runs as @code{root}.  Unprivileged users may use
-Guix tools to build packages or otherwise access the store, and the
-daemon will do it on their behalf, ensuring that the store is kept in a
-consistent state, and allowing built packages to be shared among users.
+administrator.  Unprivileged users may use Guix tools to build packages
+or otherwise access the store, and the daemon will do it on their
+behalf, ensuring that the store is kept in a consistent state, and
+allowing built packages to be shared among users.
+
+There are currently two ways to set up and run the build daemon:
+
+@enumerate
+@item
+running @command{guix-daemon} as ``root'', letting it run build
+processes as unprivileged users taken from a pool of build users---this
+is the historical approach;
+
+@item
+running @command{guix-daemon} as a separate unprivileged user, relying
+on Linux's @dfn{unprivileged user namespace} functionality to set up
+isolated environments---this is the option chosen when installing Guix
+on a systemd-based distribution with the installation script
+(@pxref{Binary Installation}).
+@end enumerate
+
+The sections below describe each of these two configurations in more
+detail and summarize the kind of build isolation they provide.
+
+@unnumberedsubsubsec Daemon Running as Root
 
 @cindex build users
 When @command{guix-daemon} runs as @code{root}, you may not want package
 build processes themselves to run as @code{root} too, for obvious
 security reasons.  To avoid that, a special pool of @dfn{build users}
 should be created for use by build processes started by the daemon.
-These build users need not have a shell and a home directory: they will
-just be used when the daemon drops @code{root} privileges in build
-processes.  Having several such users allows the daemon to launch
+Having several such users allows the daemon to launch
 distinct build processes under separate UIDs, which guarantees that they
 do not interfere with each other---an essential feature since builds are
 regarded as pure functions (@pxref{Introduction}).
@@ -977,11 +996,45 @@  Build Environment Setup
 # guix-daemon --build-users-group=guixbuild
 @end example
 
+In this setup, @file{/gnu/store} is owned by @code{root}.
+
+@unnumberedsubsubsec Daemon Running Without Privileges
+
+@cindex rootless build daemon
+@cindex unprivileged build daemon
+@cindex build daemon, unprivileged
+The second and preferred option is to run @command{guix-daemon}
+@emph{as an unprivileged user}.  It has the advantage of reducing the
+harm that can be done should a build process manage to exploit a
+vulnerability in the daemon.  This option requires the user of Linux's
+unprivileged user namespace mechanism; today it is available and enabled
+by most GNU/Linux distributions but can still be disabled.  The
+installation script automatically determines whether this option is
+available on your system (@pxref{Binary Installation}).
+
+When using this option, you only need to create one user account, and
+@command{guix-daemon} will run with the authority of that account:
+
+@example
+# groupadd --system guix-daemon
+# useradd -g guix-daemon -G guix-daemon              \
+          -d /var/empty -s $(which nologin)          \
+          -c "Guix daemon privilege separation user" \
+          --system guix-daemon
+@end example
+
+In this configuration, @file{/gnu/store} is owned by the
+@code{guix-daemon} user.
+
+@unnumberedsubsubsec The Isolated Build Environment
+
 @cindex chroot
-@noindent
-This way, the daemon starts build processes in a chroot, under one of
-the @code{guixbuilder} users.  On GNU/Linux, by default, the chroot
-environment contains nothing but:
+@cindex build environment isolation
+@cindex isolated build environment
+@cindex hermetic build environment
+In both cases, the daemon starts build processes without privileges in
+an @emph{isolated} or @emph{hermetic} build environment---a ``chroot''.
+On GNU/Linux, by default, the build environment contains nothing but:
 
 @c Keep this list in sync with libstore/build.cc! -----------------------
 @itemize
@@ -1015,7 +1068,7 @@  Build Environment Setup
 @file{/homeless-shelter}.  This helps to highlight inappropriate uses of
 @env{HOME} in the build scripts of packages.
 
-All this usually enough to ensure details of the environment do not
+All this is usually enough to ensure details of the environment do not
 influence build processes.  In some exceptional cases where more control
 is needed---typically over the date, kernel, or CPU---you can resort to
 a virtual build machine (@pxref{build-vm, virtual build machines}).
@@ -1035,14 +1088,6 @@  Build Environment Setup
 for fixed-output derivations (@pxref{Derivations}) or for substitutes
 (@pxref{Substitutes}).
 
-If you are installing Guix as an unprivileged user, it is still possible
-to run @command{guix-daemon} provided you pass @option{--disable-chroot}.
-However, build processes will not be isolated from one another, and not
-from the rest of the system.  Thus, build processes may interfere with
-each other, and may access programs, libraries, and other files
-available on the system---making it much harder to view them as
-@emph{pure} functions.
-
 
 @node Daemon Offload Setup
 @subsection Using the Offload Facility
@@ -1567,10 +1612,17 @@  Invoking guix-daemon
 @item --disable-chroot
 Disable chroot builds.
 
-Using this option is not recommended since, again, it would allow build
-processes to gain access to undeclared dependencies.  It is necessary,
-though, when @command{guix-daemon} is running under an unprivileged user
-account.
+@quotation Warning
+Using this option is not recommended since it allows build processes to
+gain access to undeclared dependencies, to interfere with one another,
+and more generally to do anything that can be done with the authority of
+the daemon---which includes at least the ability to tamper with any file
+in the store!
+
+You may find it necessary, though, when support for Linux unprivileged
+user namespaces is missing (@pxref{Build Environment Setup}).  Use at
+your own risk!
+@end quotation
 
 @item --log-compression=@var{type}
 Compress build logs according to @var{type}, one of @code{gzip},
diff --git a/guix/substitutes.scm b/guix/substitutes.scm
index 7ca55788d5..86b9f5472a 100644
--- a/guix/substitutes.scm
+++ b/guix/substitutes.scm
@@ -79,7 +79,7 @@  (define %narinfo-cache-directory
   ;; time, 'guix substitute' is called by guix-daemon as root and stores its
   ;; cached data in /var/guix/….  However, when invoked from 'guix challenge'
   ;; as a user, it stores its cache in ~/.cache.
-  (if (zero? (getuid))
+  (if (getenv "_NIX_OPTIONS")                     ;invoked by guix-daemon
       (or (and=> (getenv "XDG_CACHE_HOME")
                  (cut string-append <> "/guix/substitute"))
           (string-append %state-directory "/substitute/cache"))
diff --git a/nix/libstore/build.cc b/nix/libstore/build.cc
index c8b778362a..76f75e00df 100644
--- a/nix/libstore/build.cc
+++ b/nix/libstore/build.cc
@@ -744,6 +744,10 @@  private:
 
     friend int childEntry(void *);
 
+    /* Pipe to notify readiness to the child process when using unprivileged
+       user namespaces.  */
+    Pipe readiness;
+
     /* Check that the derivation outputs all exist and register them
        as valid. */
     void registerOutputs();
@@ -1619,6 +1623,24 @@  int childEntry(void * arg)
 }
 
 
+/* UID and GID of the build user inside its own user namespace.  */
+static const uid_t guestUID = 30001;
+static const gid_t guestGID = 30000;
+
+/* Initialize the user namespace of CHILD.  */
+static void initializeUserNamespace(pid_t child,
+				    uid_t hostUID = getuid(),
+				    gid_t hostGID = getgid())
+{
+    writeFile("/proc/" + std::to_string(child) + "/uid_map",
+	      (format("%d %d 1") % guestUID % hostUID).str());
+
+    writeFile("/proc/" + std::to_string(child) + "/setgroups", "deny");
+
+    writeFile("/proc/" + std::to_string(child) + "/gid_map",
+	      (format("%d %d 1") % guestGID % hostGID).str());
+}
+
 void DerivationGoal::startBuilder()
 {
     auto f = format(
@@ -1682,7 +1704,7 @@  void DerivationGoal::startBuilder()
 	   then an attacker could create in it a hardlink to a root-owned file
 	   such as /etc/shadow.  If 'keepFailed' is true, the daemon would
 	   then chown that hardlink to the user, giving them write access to
-	   that file.  */
+	   that file.  See CVE-2021-27851.  */
 	tmpDir += "/top";
 	if (mkdir(tmpDir.c_str(), 0700) == 1)
 	    throw SysError("creating top-level build directory");
@@ -1799,7 +1821,7 @@  void DerivationGoal::startBuilder()
         if (mkdir(chrootRootDir.c_str(), 0750) == -1)
             throw SysError(format("cannot create ‘%1%’") % chrootRootDir);
 
-        if (chown(chrootRootDir.c_str(), 0, buildUser.getGID()) == -1)
+        if (buildUser.enabled() && chown(chrootRootDir.c_str(), 0, buildUser.getGID()) == -1)
             throw SysError(format("cannot change ownership of ‘%1%’") % chrootRootDir);
 
         /* Create a writable /tmp in the chroot.  Many builders need
@@ -1818,8 +1840,8 @@  void DerivationGoal::startBuilder()
             (format(
                 "nixbld:x:%1%:%2%:Nix build user:/:/noshell\n"
                 "nobody:x:65534:65534:Nobody:/:/noshell\n")
-                % (buildUser.enabled() ? buildUser.getUID() : getuid())
-                % (buildUser.enabled() ? buildUser.getGID() : getgid())).str());
+                % (buildUser.enabled() ? buildUser.getUID() : guestUID)
+                % (buildUser.enabled() ? buildUser.getGID() : guestGID)).str());
 
         /* Declare the build user's group so that programs get a consistent
            view of the system (e.g., "id -gn"). */
@@ -1854,7 +1876,7 @@  void DerivationGoal::startBuilder()
         createDirs(chrootStoreDir);
         chmod_(chrootStoreDir, 01775);
 
-        if (chown(chrootStoreDir.c_str(), 0, buildUser.getGID()) == -1)
+        if (buildUser.enabled() && chown(chrootStoreDir.c_str(), 0, buildUser.getGID()) == -1)
             throw SysError(format("cannot change ownership of ‘%1%’") % chrootStoreDir);
 
         foreach (PathSet::iterator, i, inputPaths) {
@@ -1960,14 +1982,34 @@  void DerivationGoal::startBuilder()
     if (useChroot) {
 	char stack[32 * 1024];
 	int flags = CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWIPC | CLONE_NEWUTS | SIGCHLD;
-	if (!fixedOutput) flags |= CLONE_NEWNET;
+	if (!fixedOutput) {
+	    flags |= CLONE_NEWNET;
+	}
+	if (!buildUser.enabled() || getuid() != 0) {
+	    flags |= CLONE_NEWUSER;
+	    readiness.create();
+	}
+
 	/* Ensure proper alignment on the stack.  On aarch64, it has to be 16
 	   bytes.  */
-	pid = clone(childEntry,
+ 	pid = clone(childEntry,
 		    (char *)(((uintptr_t)stack + sizeof(stack) - 8) & ~(uintptr_t)0xf),
 		    flags, this);
-	if (pid == -1)
-	    throw SysError("cloning builder process");
+	if (pid == -1) {
+	    if ((flags & CLONE_NEWUSER) != 0 && getuid() != 0)
+		/* 'clone' fails with EPERM on distros where unprivileged user
+		   namespaces are disabled.  Error out instead of giving up on
+		   isolation.  */
+		throw SysError("cannot create process in unprivileged user namespace");
+	    else
+		throw SysError("cloning builder process");
+	}
+
+	if ((flags & CLONE_NEWUSER) != 0) {
+	     /* Initialize the UID/GID mapping of the child process.  */
+	     initializeUserNamespace(pid);
+	     writeFull(readiness.writeSide, (unsigned char*)"go\n", 3);
+	}
     } else
 #endif
     {
@@ -2013,23 +2055,34 @@  void DerivationGoal::runChild()
 
         _writeToStderr = 0;
 
+	if (readiness.readSide > 0) {
+	     /* Wait for the parent process to initialize the UID/GID mapping
+		of our user namespace.  */
+	     char str[20] = { '\0' };
+	     readFull(readiness.readSide, (unsigned char*)str, 3);
+	     if (strcmp(str, "go\n") != 0)
+		  throw Error("failed to initialize process in unprivileged user namespace");
+	}
+
         restoreAffinity();
 
         commonChildInit(builderOut);
 
 #if CHROOT_ENABLED
         if (useChroot) {
-            /* Initialise the loopback interface. */
-            AutoCloseFD fd(socket(PF_INET, SOCK_DGRAM, IPPROTO_IP));
-            if (fd == -1) throw SysError("cannot open IP socket");
+	    if (!fixedOutput) {
+		/* Initialise the loopback interface. */
+		AutoCloseFD fd(socket(PF_INET, SOCK_DGRAM, IPPROTO_IP));
+		if (fd == -1) throw SysError("cannot open IP socket");
 
-            struct ifreq ifr;
-            strcpy(ifr.ifr_name, "lo");
-            ifr.ifr_flags = IFF_UP | IFF_LOOPBACK | IFF_RUNNING;
-            if (ioctl(fd, SIOCSIFFLAGS, &ifr) == -1)
-                throw SysError("cannot set loopback interface flags");
+		struct ifreq ifr;
+		strcpy(ifr.ifr_name, "lo");
+		ifr.ifr_flags = IFF_UP | IFF_LOOPBACK | IFF_RUNNING;
+		if (ioctl(fd, SIOCSIFFLAGS, &ifr) == -1)
+		    throw SysError("cannot set loopback interface flags");
 
-            fd.close();
+		fd.close();
+	    }
 
             /* Set the hostname etc. to fixed values. */
             char hostname[] = "localhost";
@@ -2180,6 +2233,27 @@  void DerivationGoal::runChild()
 	    /* Remount root as read-only.  */
             if (mount("/", "/", 0, MS_BIND | MS_REMOUNT | MS_RDONLY, 0) == -1)
                 throw SysError(format("read-only remount of build root '%1%' failed") % chrootRootDir);
+
+	    if (getuid() != 0) {
+		/* Create a new mount namespace to "lock" previous mounts.
+		   See mount_namespaces(7).  */
+		auto uid = getuid();
+		auto gid = getgid();
+
+		if (unshare(CLONE_NEWNS | CLONE_NEWUSER) == -1)
+		    throw SysError(format("creating new user and mount namespaces"));
+
+		initializeUserNamespace(getpid(), uid, gid);
+
+		/* Check that mounts within the build environment are "locked"
+		   together and cannot be separated from within the build
+		   environment namespace.  Since
+		   umount(2) is documented to fail with EINVAL when attempting
+		   to unmount one of the mounts that are locked together,
+		   check that this is what we get.  */
+		int ret = umount(tmpDirInSandbox.c_str());
+		assert(ret == -1 && errno == EINVAL);
+	    }
         }
 #endif
 
@@ -2476,8 +2550,16 @@  void DerivationGoal::registerOutputs()
             if (buildMode == bmRepair)
               replaceValidPath(path, actualPath);
             else
-              if (buildMode != bmCheck && rename(actualPath.c_str(), path.c_str()) == -1)
-                throw SysError(format("moving build output `%1%' from the chroot to the store") % path);
+		if (buildMode != bmCheck) {
+		    if (S_ISDIR(st.st_mode))
+			/* Change mode on the directory to allow for
+			   rename(2).  */
+			chmod(actualPath.c_str(), st.st_mode | 0700);
+		    if (rename(actualPath.c_str(), path.c_str()) == -1)
+			throw SysError(format("moving build output `%1%' from the chroot to the store") % path);
+		    if (S_ISDIR(st.st_mode) && chmod(path.c_str(), st.st_mode) == -1)
+			throw SysError(format("restoring permissions on directory `%1%'") % actualPath);
+		}
           }
           if (buildMode != bmCheck) actualPath = path;
         }
@@ -2736,16 +2818,46 @@  void DerivationGoal::deleteTmpDir(bool force)
             // Change the ownership if clientUid is set. Never change the
             // ownership or the group to "root" for security reasons.
             if (settings.clientUid != (uid_t) -1 && settings.clientUid != 0) {
-                _chown(tmpDir, settings.clientUid,
-                       settings.clientGid != 0 ? settings.clientGid : -1);
+		uid_t uid = settings.clientUid;
+		gid_t gid = settings.clientGid != 0 ? settings.clientGid : -1;
+		bool reown = false;
+
+		/* First remove setuid/setgid bits.  */
+		secureFilePerms(tmpDir);
+
+		try {
+		    _chown(tmpDir, uid, gid);
+
+		    if (getuid() != 0) {
+			/* If, without being root, the '_chown' call above
+			   succeeded, then it means we have CAP_CHOWN.  Retake
+			   ownership of tmpDir itself so it can be renamed
+			   below.  */
+			reown = true;
+		    }
+
+		} catch (SysError & e) {
+		    /* When running as an unprivileged user and without
+		       CAP_CHOWN, we cannot chown the build tree.  Print a
+		       message and keep going.  */
+		    printMsg(lvlInfo, format("cannot change ownership of build directory '%1%': %2%")
+			     % tmpDir % strerror(e.errNo));
+		}
 
 		if (top != tmpDir) {
+		    if (reown) chown(tmpDir.c_str(), getuid(), getgid());
+
 		    // Rename tmpDir to its parent, with an intermediate step.
 		    string pivot = top + ".pivot";
 		    if (rename(top.c_str(), pivot.c_str()) == -1)
 			throw SysError("pivoting failed build tree");
 		    if (rename((pivot + "/top").c_str(), top.c_str()) == -1)
 			throw SysError("renaming failed build tree");
+
+		    if (reown)
+			/* Running unprivileged but with CAP_CHOWN.  */
+			chown(top.c_str(), uid, gid);
+
 		    rmdir(pivot.c_str());
 		}
             }
diff --git a/nix/libstore/local-store.cc b/nix/libstore/local-store.cc
index 0883a4bbce..83e6c3e16e 100644
--- a/nix/libstore/local-store.cc
+++ b/nix/libstore/local-store.cc
@@ -1614,11 +1614,19 @@  void LocalStore::createUser(const std::string & userName, uid_t userId)
 {
     auto dir = settings.nixStateDir + "/profiles/per-user/" + userName;
 
-    createDirs(dir);
-    if (chmod(dir.c_str(), 0755) == -1)
-	throw SysError(format("changing permissions of directory '%s'") % dir);
-    if (chown(dir.c_str(), userId, -1) == -1)
-	throw SysError(format("changing owner of directory '%s'") % dir);
+    auto created = createDirs(dir);
+    if (!created.empty()) {
+	if (chmod(dir.c_str(), 0755) == -1)
+	    throw SysError(format("changing permissions of directory '%s'") % dir);
+
+	/* The following operation requires CAP_CHOWN or can be handled
+	   manually by a user with CAP_CHOWN.  */
+	if (chown(dir.c_str(), userId, -1) == -1) {
+	    rmdir(dir.c_str());
+	    string message = strerror(errno);
+	    printMsg(lvlInfo, format("failed to change owner of directory '%1%' to %2%: %3%") % dir % userId % message);
+	}
+    }
 }