From patchwork Tue Apr 8 12:24:48 2025
X-Patchwork-Id: 41452
From: Ludovic Courtès
Date: Tue, 8 Apr 2025 14:24:48 +0200
Subject: [bug#77638] [PATCH 8/8] linux-container: Lock mounts by default.
To: 77638@debbugs.gnu.org
Message-ID: <1b6a3534339319d7e792f5888b51e02b27607d65.1744114408.git.ludo@gnu.org>
X-Mailer: git-send-email 2.49.0

This makes it impossible to unmount or remount things from within
‘call-with-container’.

* gnu/build/linux-container.scm (initialize-user-namespace): Add #:host-uid
and #:host-gid and honor them.
(run-container): Add #:lock-mounts?. Honor it by calling ‘unshare’ followed
by ‘initialize-user-namespace’.
(call-with-container): Add #:lock-mounts? and pass it down.
(container-excursion): Get the user namespace owning the PID namespace and
join it, then join the remaining namespaces.
* tests/containers.scm ("call-with-container, mnt namespace, locked mounts"):
New test.
("container-excursion"): Pass #:lock-mounts? #f.

Change-Id: I13be982aef99e68a653d472f0e595c81cfcfa392
---
 gnu/build/linux-container.scm | 111 ++++++++++++++++++++++------------
 tests/containers.scm          |  33 ++++++++--
 2 files changed, 103 insertions(+), 41 deletions(-)

diff --git a/gnu/build/linux-container.scm b/gnu/build/linux-container.scm
index 345ce2de08..51f04bc249 100644
--- a/gnu/build/linux-container.scm
+++ b/gnu/build/linux-container.scm
@@ -189,7 +189,10 @@ (define* (mount-file-systems root mounts #:key mount-/sys? mount-/proc?
         (remount-read-only "/"))))
 
 (define* (initialize-user-namespace pid host-uids
-                                    #:key (guest-uid 0) (guest-gid 0))
+                                    #:key
+                                    (host-uid (getuid))
+                                    (host-gid (getgid))
+                                    (guest-uid 0) (guest-gid 0))
   "Configure the user namespace for PID. HOST-UIDS specifies the number of
 host user identifiers to map into the user namespace. GUEST-UID and GUEST-GID
 specify the first UID (respectively GID) that host UIDs (respectively GIDs)
@@ -200,24 +203,21 @@ (define* (initialize-user-namespace pid host-uids
   (define (scope file)
     (string-append proc-dir file))
 
-  (let ((uid (getuid))
-        (gid (getgid)))
-
-    ;; Only root can write to the gid map without first disabling the
-    ;; setgroups syscall.
-    (unless (and (zero? uid) (zero? gid))
-      (call-with-output-file (scope "/setgroups")
-        (lambda (port)
-          (display "deny" port))))
-
-    ;; Map the user/group that created the container to the root user
-    ;; within the container.
-    (call-with-output-file (scope "/uid_map")
+  ;; Only root can write to the gid map without first disabling the
+  ;; setgroups syscall.
+  (unless (and (zero? host-uid) (zero? host-gid))
+    (call-with-output-file (scope "/setgroups")
       (lambda (port)
-        (format port "~d ~d ~d" guest-uid uid host-uids)))
-    (call-with-output-file (scope "/gid_map")
-      (lambda (port)
-        (format port "~d ~d ~d" guest-gid gid host-uids)))))
+        (display "deny" port))))
+
+  ;; Map the user/group that created the container to the root user
+  ;; within the container.
+  (call-with-output-file (scope "/uid_map")
+    (lambda (port)
+      (format port "~d ~d ~d" guest-uid host-uid host-uids)))
+  (call-with-output-file (scope "/gid_map")
+    (lambda (port)
+      (format port "~d ~d ~d" guest-gid host-gid host-uids))))
 
 (define (namespaces->bit-mask namespaces)
   "Return the number suitable for the 'flags' argument of 'clone' that
@@ -238,12 +238,14 @@ (define* (run-container root mounts namespaces host-uids thunk
                         #:key (guest-uid 0) (guest-gid 0)
                         (populate-file-system (const #t))
                         (loopback-network? #t)
+                        (lock-mounts? #t)
                         writable-root?)
   "Run THUNK in a new container process and return its PID. ROOT specifies
 the root directory for the container. MOUNTS is a list of
 objects that specify file systems to mount inside the container. NAMESPACES
 is a list of symbols that correspond to the possible Linux namespaces: mnt,
-ipc, uts, user, and net.
+ipc, uts, user, and net. When LOCK-MOUNTS? is true, arrange so that none of
+MOUNTS can be unmounted or remounted individually from within THUNK.
 
 When LOOPBACK-NETWORK? is true and 'net is amount NAMESPACES, set up the
 loopback device (\"lo\") and a minimal /etc/hosts.
@@ -303,6 +305,28 @@ (define* (run-container root mounts namespaces host-uids thunk
                            ;; cannot be 'read' so they shouldn't be written as is.
                            (write args child)
                            (primitive-exit 3))))
+
+                       (when (and lock-mounts?
+                                  (memq 'mnt namespaces)
+                                  (memq 'user namespaces))
+                         ;; Create a new mount namespace owned by a new user
+                         ;; namespace to "lock" together previous mounts, such that
+                         ;; they cannot be unmounted or remounted separately--see
+                         ;; mount_namespaces(7).
+                         ;;
+                         ;; Note: at this point, the process is single-threaded (no
+                         ;; GC mark threads, no finalization thread, etc.) which is
+                         ;; why unshare(CLONE_NEWUSER) can be used.
+                         (let ((uid (getuid)) (gid (getgid)))
+                           (unshare (logior CLONE_NEWUSER CLONE_NEWNS))
+                           (when (file-exists? "/proc/self")
+                             (initialize-user-namespace (getpid)
+                                                        host-uids
+                                                        #:host-uid uid
+                                                        #:host-gid gid
+                                                        #:guest-uid guest-uid
+                                                        #:guest-gid guest-gid))))
+
                        ;; TODO: Manage capabilities.
                        (write 'ready child)
                        (close-port child)
@@ -365,6 +389,7 @@ (define (status->exit-status status)
 
 (define* (call-with-container mounts thunk #:key (namespaces %namespaces)
                               (host-uids 1) (guest-uid 0) (guest-gid 0)
+                              (lock-mounts? #t)
                               (relayed-signals (list SIGINT SIGTERM))
                               (child-is-pid1? #t)
                               (populate-file-system (const #t))
@@ -449,6 +474,7 @@ (define* (call-with-container mounts thunk #:key (namespaces %namespaces)
   (call-with-temporary-directory
    (lambda (root)
      (let ((pid (run-container root mounts namespaces host-uids thunk*
+                               #:lock-mounts? lock-mounts?
                                #:guest-uid guest-uid
                                #:guest-gid guest-gid
                                #:populate-file-system populate-file-system
@@ -469,24 +495,35 @@ (define (container-excursion pid thunk)
     (0
      (call-with-clean-exit
       (lambda ()
-        (for-each (lambda (ns)
-                    (let ((source (namespace-file (getpid) ns))
-                          (target (namespace-file pid ns)))
-                      ;; Joining the namespace that the process already
-                      ;; belongs to would throw an error so avoid that.
-                      ;; XXX: This /proc interface leads to TOCTTOU.
-                      (unless (string=? (readlink source) (readlink target))
-                        (call-with-input-file source
-                          (lambda (current-ns-port)
-                            (call-with-input-file target
-                              (lambda (new-ns-port)
-                                (setns (fileno new-ns-port) 0))))))))
-                  ;; It's important that the user namespace is joined first,
-                  ;; so that the user will have the privileges to join the
-                  ;; other namespaces. Furthermore, it's important that the
-                  ;; mount namespace is joined last, otherwise the /proc mount
-                  ;; point would no longer be accessible.
-                  '("user" "ipc" "uts" "net" "pid" "mnt"))
+        ;; First, determine the user namespace that owns the pid namespace and
+        ;; join that user namespace (the assumption is that it also owns all
+        ;; the other namespaces). It's important that the user namespace is
+        ;; joined first, so that the user will have the privileges to join the
+        ;; other namespaces.
+        (let* ((pid-ns (open-fdes (namespace-file pid "pid")
+                                  (logior O_CLOEXEC O_RDONLY)))
+               (user-ns (get-user-ns pid-ns)))
+          (close-fdes pid-ns)
+          (unless (equal? (stat user-ns)
+                          (stat (namespace-file (getpid) "user")))
+            (setns user-ns 0))
+          (close-fdes user-ns)
+
+          ;; Then join all the remaining namespaces.
+          (for-each (lambda (ns)
+                      (let ((source (namespace-file (getpid) ns))
+                            (target (namespace-file pid ns)))
+                        ;; Joining the namespace that the process already
+                        ;; belongs to would throw an error so avoid that.
+                        ;; XXX: This /proc interface leads to TOCTTOU.
+                        (unless (string=? (readlink source) (readlink target))
+                          (call-with-input-file target
+                            (lambda (new-ns-port)
+                              (setns (fileno new-ns-port) 0))))))
+                    ;; It's important that the mount namespace is joined last,
+                    ;; otherwise the /proc mount point would no longer be
+                    ;; accessible.
+                    '("ipc" "uts" "net" "pid" "mnt")))
         (purify-environment)
         (chdir "/")
 
diff --git a/tests/containers.scm b/tests/containers.scm
index 1e915d517e..6edea9631d 100644
--- a/tests/containers.scm
+++ b/tests/containers.scm
@@ -1,6 +1,6 @@
 ;;; GNU Guix --- Functional package management for GNU
 ;;; Copyright © 2015 David Thompson
-;;; Copyright © 2016, 2017, 2019, 2023 Ludovic Courtès
+;;; Copyright © 2016-2017, 2019, 2023, 2025 Ludovic Courtès
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -110,6 +110,26 @@ (define (skip-if-unsupported)
          (assert-exit (file-exists? "/testing")))
        #:namespaces '(user mnt))))
 
+(skip-if-unsupported)
+(test-equal "call-with-container, mnt namespace, locked mounts"
+  EINVAL
+  ;; umount(2) fails with EINVAL when targeting a mount point that is
+  ;; "locked".
+  (status:exit-val
+   (call-with-container (list (file-system
+                                (device "none")
+                                (mount-point "/testing")
+                                (type "tmpfs")
+                                (check? #f)))
+     (lambda ()
+       (primitive-exit (catch 'system-error
+                         (lambda ()
+                           (umount "/testing")
+                           0)
+                         (lambda args
+                           (system-error-errno args)))))
+     #:namespaces '(user mnt))))
+
 (skip-if-unsupported)
 (test-equal "call-with-container, mnt namespace, wrong bind mount"
   `(system-error ,ENOENT)
@@ -169,7 +189,8 @@ (define (skip-if-unsupported)
        #:namespaces '(user mnt))))
 
 (skip-if-unsupported)
-(test-assert "container-excursion"
+(test-equal "container-excursion"
+  0
   (call-with-temporary-directory
    (lambda (root)
      ;; Two pipes: One for the container to signal that the test can begin,
@@ -193,7 +214,11 @@ (define (skip-if-unsupported)
                           (readlink (string-append "/proc/" pid "/ns/" ns)))
                         '("user" "ipc" "uts" "net" "pid" "mnt"))))
 
-         (let* ((pid (run-container root '() %namespaces 1 container))
+         (let* ((pid (run-container root '() %namespaces 1 container
+                                    ;; Do not lock mounts so the user namespace
+                                    ;; appears to be the same seen from inside
+                                    ;; and from outside.
+                                    #:lock-mounts? #f))
                 (container-namespaces (namespaces pid))
                 (result
                  (begin
@@ -213,7 +238,7 @@ (define (skip-if-unsupported)
            (write 'done end-out)
            (close end-out)
            (waitpid pid)
-           (zero? result)))))))
+           result))))))
 
 (skip-if-unsupported)
 (test-equal "container-excursion, same namespaces"
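
Note on the ID mapping in this patch: the lines written to /proc/PID/uid_map and /proc/PID/gid_map have the form "GUEST-ID HOST-ID COUNT", and "deny" is written to /proc/PID/setgroups unless both host IDs are zero. The Guile sketch below restates that logic as pure procedures so it can be tried without creating a namespace. The names id-map-entry and must-deny-setgroups? are hypothetical, introduced here for illustration only; they are not part of gnu/build/linux-container.scm, and 1000 is an arbitrary example ID.

```scheme
;; Illustrative sketch only: these helpers are assumptions made for this
;; note and do not exist in gnu/build/linux-container.scm.
(use-modules (ice-9 format))

(define (id-map-entry guest-id host-id count)
  "Return one line suitable for /proc/PID/uid_map or /proc/PID/gid_map:
COUNT identifiers starting at HOST-ID outside the namespace are seen as
GUEST-ID and up inside it."
  (format #f "~d ~d ~d" guest-id host-id count))

(define (must-deny-setgroups? host-uid host-gid)
  "Mirror the check in the patch: \"deny\" is written to /proc/PID/setgroups
unless both HOST-UID and HOST-GID are zero."
  (not (and (zero? host-uid) (zero? host-gid))))

;; Example: map a single host ID, 1000, to ID 0 inside the namespace.
(display (id-map-entry 0 1000 1))   ; prints "0 1000 1"
(newline)
```

In run-container the host IDs are read with (getuid) and (getgid) before the call to unshare and then passed as #:host-uid and #:host-gid. A likely reason is that, once the new user namespace exists and until its maps are written, those calls would report the kernel's overflow ID rather than the caller's original ID.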