diff mbox series

[bug#65221,1/2] service: make EXTRA-PORTS work as advertised.

Message ID 20230811090615.3707-1-striness@tilde.club
State New
Headers show
Series Fix EXTRA-PORTS edge cases | expand

Commit Message

ulfvonbelow Aug. 11, 2023, 9:06 a.m. UTC
EXEC-COMMAND (and, by extension, FORK+EXEC-COMMAND) has several issues:
1. Despite it being documented that "all other file descriptors are closed
   prior to yielding control to COMMAND", this is not currently the case -
   only other file descriptors that are already marked as FD_CLOEXEC are
   closed.  For example, if user code happens to have a file descriptor open,
   for example with call-with-input-file, while EXEC-COMMAND is run, the new
   process image will inherit that file descriptor.  This may cause some
   resource waste, but more importantly may cause security issues in certain
   situations.
2. EXTRA-PORTS is only honored when either LOG-PORT or LOG-FILE is passed.  I
   have no idea why this is the case, it isn't documented anywhere, and it
   isn't intuitive.
3. Even when LOG-PORT or LOG-FILE is passed, EXTRA-PORTS may not work as
   described, because it copies file descriptor contents in an arbitrary
   order.  For example, suppose that (map fileno EXTRA-PORTS) is (7 6 5 4 3).
   If the underlying file originally stored in fd N is represented by F(N), it
   will assign
   3 <-- F(7)
   4 <-- F(6)
   5 <-- F(5)
   6 <-- F(6)
   7 <-- F(7)

   In other words, the copying of earlier FDs in EXTRA-PORTS may overwrite
   later FDs in EXTRA-PORTS.

Because the process of properly and safely copying those FDs involves many
steps, we've split it, along with marking all file descriptors not being
preserved as FD_CLOEXEC, into a separate procedure named PRESERVE-PORTS.

* modules/shepherd/service.scm (preserve-ports): new procedure.
  (exec-command): use it.
---
 modules/shepherd/service.scm | 119 +++++++++++++++++++++++------------
 1 file changed, 78 insertions(+), 41 deletions(-)

Comments

Ludovic Courtès Aug. 15, 2023, 10:55 a.m. UTC | #1
Hi,

ulfvonbelow <striness@tilde.club> skribis:

> EXEC-COMMAND (and, by extension, FORK+EXEC-COMMAND) has several issues:
> 1. Despite it being documented that "all other file descriptors are closed
>    prior to yielding control to COMMAND", this is not currently the case -
>    only other file descriptors that are already marked as FD_CLOEXEC are
>    closed.  For example, if user code happens to have a file descriptor open,
>    for example with call-with-input-file, while EXEC-COMMAND is run, the new
>    process image will inherit that file descriptor.  This may cause some
>    resource waste, but more importantly may cause security issues in certain
>    situations.

Yes.  This has been the case since 0.9.2, as noted in ‘NEWS’:

  Previously, services started indirectly with ‘exec-command’ (which is usually
  the case) would not inherit any file descriptor from shepherd because
  ‘exec-command’ would explicitly close all of them.  However, services started
  with ‘make-system-constructor’ and processes created by some other means, such
  as calling ‘system*’, would inherit some of those descriptors, giving them
  more authority than intended.

  The change here consists in marking all internally-used file descriptors as
  “close-on-exec” (O_CLOEXEC), a feature that’s been available on GNU/Linux and
  GNU/Hurd for years but that so far wasn’t used consistently in shepherd.  This
  is now fixed.  As a side-effect, the file-descriptor-closing loop in
  ‘exec-command’ is now gone.

The FD-closing loop was removed on purpose, in
2c0354258047133db8b885bcc11afdf0def5d885.

Now, as you write, it means that service writers must be careful now and
not leave any non-CLOEXEC file descriptor behind them.

At the time I audited Guix System to check that this was a reasonable
thing to expect and that we could indeed ensure no file descriptors were
leaked.  There’s also ‘tests/close-on-exec.sh’.

If you found cases where it would be necessary, what we could do is have
‘shepherd’ replace ‘call-with-input-file’ & co. with a variant that
opens files as O_CLOEXEC by default.  WDYT?

> 2. EXTRA-PORTS is only honored when either LOG-PORT or LOG-FILE is passed.  I
>    have no idea why this is the case, it isn't documented anywhere, and it
>    isn't intuitive.

#:extra-ports wasn’t really made to be exposed I guess; it was added for
use by systemd-style services in 965f6b61a473ee57a1fc6ec3ea1ad6e35d596031.

> 3. Even when LOG-PORT or LOG-FILE is passed, EXTRA-PORTS may not work as
>    described, because it copies file descriptor contents in an arbitrary
>    order.  For example, suppose that (map fileno EXTRA-PORTS) is (7 6 5 4 3).
>    If the underlying file originally stored in fd N is represented by F(N), it
>    will assign
>    3 <-- F(7)
>    4 <-- F(6)
>    5 <-- F(5)
>    6 <-- F(6)
>    7 <-- F(7)
>
>    In other words, the copying of earlier FDs in EXTRA-PORTS may overwrite
>    later FDs in EXTRA-PORTS.

Good catch!

Could you make a more minimal patch fixing this specific issue, also
adding a test reproducing the problem being fixed?

Thanks for your work!

Ludo’.
ulfvonbelow Aug. 18, 2023, 8:21 p.m. UTC | #2
Ludovic Courtès <ludo@gnu.org> writes:

>  Previously, services started indirectly with ‘exec-command’ (which is usually
>  the case) would not inherit any file descriptor from shepherd because
>  ‘exec-command’ would explicitly close all of them.  However, services started
>  with ‘make-system-constructor’ and processes created by some other means, such
>  as calling ‘system*’, would inherit some of those descriptors, giving them
>  more authority than intended.

Interestingly enough, guile's system* closes all file descriptors aside
from the first 3, and we now provide a replacement for system* that uses
fork+exec-command and therefore does not.

> The FD-closing loop was removed on purpose, in
> 2c0354258047133db8b885bcc11afdf0def5d885.
>
> Now, as you write, it means that service writers must be careful now and
> not leave any non-CLOEXEC file descriptor behind them.
>
> At the time I audited Guix System to check that this was a reasonable
> thing to expect and that we could indeed ensure no file descriptors were
> leaked.  There’s also ‘tests/close-on-exec.sh’.
>
> If you found cases where it would be necessary, what we could do is have
> ‘shepherd’ replace ‘call-with-input-file’ & co. with a variant that
> opens files as O_CLOEXEC by default.  WDYT?
The problem is that there isn't a way for the user to replicate the old
behavior of default-non-inheriting short of wrapping the desired command
with another file-descriptor-closing program, by which point the user
and group may already have been changed.

O_CLOEXEC is good, but to use it to prevent leaks to any particular
program is a matter of global correctness, while closing everything not
explicitly allowed for a program is a matter of local correctness.  For
example, if I have a program whose quality (and, to some extent,
trustworthiness) I am wary of, I would really prefer to be certain that
it doesn't get any file descriptors it shouldn't.  The only way to be
certain of this currently is to examine the entire shepherd process -
including all services - to be sure there aren't any leaks.

Replacing 'call-with-input-file' and such - as I now see is done in
guix's shepherd config file - would probably be a good idea (I see we
use call-with-input-file in primitive-load* and read-pid-file, for
example), but it's still no guarantee.

For example, I didn't even know SOCK_CLOEXEC was a thing, and so didn't
use it in one of my services that calls socketpair.  I did use fcntl
afterward, but the time between those two introduces all kinds of
questions, like "have you called system, system*, spawn-command, sleep,
done any I/O, or had the misfortune to receive a SIGCHLD on a system
without signalfd support".

And if I hadn't happened to have looked at the source of exec-command, I
wouldn't have even tried fcntl, because the manual entry for
fork+exec-command and exec-command still clearly states:

  File descriptors 1 and 2 are kept as is or redirected to either
  LOG-PORT or LOG-FILE if it’s true, whereas file descriptor 0 (standard
  input) points to INPUT-PORT or ‘/dev/null’; all other file descriptors
  are closed prior to yielding control to COMMAND.

Whatever course of action is taken, it is imperative that the
documentation and code be aligned on this matter.  Personally I'm
inclined to bring the code in line with the documentation.

>>    In other words, the copying of earlier FDs in EXTRA-PORTS may overwrite
>>    later FDs in EXTRA-PORTS.
>
> Good catch!
>
> Could you make a more minimal patch fixing this specific issue, also
> adding a test reproducing the problem being fixed?

I'm sending a more granular v2 series that adds the requested test,
fixes the clobbering, makes extra-ports be honored regardless of the
values of log-port and log-file, fixes edge cases around file
descriptors 0-2, and finally adds support for closing all other file
descriptors to exec-command.  That last one also makes closing all other
file descriptors the default, mostly because it makes the most sense for
the default value of extra-ports, and otherwise I'd have to update the
manual.  It does include the option of #:extra-ports #t, though, in
which case no fds are closed.

One alternative would be to have the close-all-others behavior
controlled by an independent argument; that way the file descriptor
rearranging functionality can be independent of whether other file
descriptors are closed.

Speaking of file descriptor rearranging functionality, the v2 replaces
'preserve-ports' with 'reconfigure-fds', which uses a different,
graph-based approach to avoiding clobbering while using fewer system
calls.  It's also more robust, and will function as long as there is
even a single unused file descriptor, and won't touch any open file
descriptors other than those in the target range.

- ulfvonbelow
diff mbox series

Patch

diff --git a/modules/shepherd/service.scm b/modules/shepherd/service.scm
index 68553d4..ffbd03c 100644
--- a/modules/shepherd/service.scm
+++ b/modules/shepherd/service.scm
@@ -1434,6 +1434,52 @@  FILE."
   (list->vector (map (lambda (group) (group:gid (getgr group)))
                      supplementary-groups)))
 
+(define (preserve-ports extra-ports)
+  "Duplicate the FDs (fd1 fd2 ... fdN) corresponding to the N ports in
+EXTRA-PORTS into the FD range (3 4 ... 3+N).  This will work regardless of the
+numeric values of fd1 ... fdN.  Any open file descriptors not in EXTRA-PORTS
+and numbered 3 or higher WILL be closed or marked FD_CLOEXEC."
+  ;; We employ the following strategy: copy FDs as high as possible, in
+  ;; descending order of FD, so as to avoid clobbering, then copy the high FDs
+  ;; to low FDs, in the order specified in EXTRA-PORTS.  If more than half of
+  ;; the FD range is included in EXTRA-PORTS, this still won't work, and we
+  ;; may reach a point where copying low will require us to copy the
+  ;; still-uncopied FDs high again.  This should be sufficiently rare as to
+  ;; not be a concern.
+  (let* ((max-fds-count (max-file-descriptors))
+         (highest-fd (- max-fds-count 1))
+         (extra-fds-count (length extra-ports))
+         (preserved-fds-count (+ 3 extra-fds-count))
+         (extra-fds (map fileno extra-ports))
+         (index+fd (map cons
+                        (iota extra-fds-count)
+                        extra-fds))
+         (index+fd-by-fileno (sort index+fd
+                                   (lambda (pair1 pair2)
+                                     (> (cdr pair1)
+                                        (cdr pair2)))))
+         (index2+fd-by-fileno (map cons
+                                   (iota extra-fds-count)
+                                   index+fd-by-fileno))
+         (index2+fd (sort index2+fd-by-fileno
+                          (lambda (spec1 spec2)
+                            (< (second spec1) (second spec2))))))
+    (for-each dup2
+              (map cdr index+fd-by-fileno)
+              (iota extra-fds-count highest-fd -1))
+    (for-each (match-lambda
+                ((by-fileno-index original-index . original-fd)
+                 (dup2 (- highest-fd by-fileno-index)
+                       (+ 3 original-index))))
+              index2+fd)
+    (for-each (lambda (fd)
+                (catch-system-error
+                 (let ((flags (fcntl fd F_GETFD)))
+                   (when (zero? (logand flags FD_CLOEXEC))
+                     (fcntl fd F_SETFD (logior FD_CLOEXEC flags))))))
+              (iota (- max-fds-count preserved-fds-count)
+                    preserved-fds-count))))
+
 (define* (exec-command command
                        #:key
                        (user #f)
@@ -1479,48 +1525,39 @@  false."
      (chdir directory)
      (environ environment-variables)
 
-     ;; Close all the file descriptors except stdout and stderr.
-     (let ((max-fd (max-file-descriptors)))
+     ;; Redirect stdin.
+     (catch-system-error (close-fdes 0))
+     ;; Make sure file descriptor zero is used, so we don't end up reusing
+     ;; it for something unrelated, which can confuse some packages.
+     (dup2 (if input-port
+               (fileno input-port)
+               (open-fdes "/dev/null" O_RDONLY))
+           0)
 
-       ;; Redirect stdin.
-       (catch-system-error (close-fdes 0))
-       ;; Make sure file descriptor zero is used, so we don't end up reusing
-       ;; it for something unrelated, which can confuse some packages.
-       (dup2 (if input-port
-                 (fileno input-port)
-                 (open-fdes "/dev/null" O_RDONLY))
-             0)
+     (when (or log-port log-file)
+       (catch #t
+         (lambda ()
+           ;; Redirect stdout and stderr to use LOG-FILE.
+           (catch-system-error (close-fdes 1))
+           (catch-system-error (close-fdes 2))
+           (dup2 (if log-file
+                     (open-fdes log-file (logior O_CREAT O_WRONLY O_APPEND)
+                                #o640)
+                     (fileno log-port))
+                 1)
+           (dup2 1 2))
 
-       (when (or log-port log-file)
-         (catch #t
-           (lambda ()
-             ;; Redirect stout and stderr to use LOG-FILE.
-             (catch-system-error (close-fdes 1))
-             (catch-system-error (close-fdes 2))
-             (dup2 (if log-file
-                       (open-fdes log-file (logior O_CREAT O_WRONLY O_APPEND)
-                                  #o640)
-                       (fileno log-port))
-                   1)
-             (dup2 1 2)
-
-             ;; Make EXTRA-PORTS available starting from file descriptor 3.
-             ;; This clears their FD_CLOEXEC flag.
-             (let loop ((fd    3)
-                        (ports extra-ports))
-               (match ports
-                 (() #t)
-                 ((port rest ...)
-                  (catch-system-error (close-fdes fd))
-                  (dup2 (fileno port) fd)
-                  (loop (+ 1 fd) rest)))))
-
-           (lambda (key . args)
-             (when log-file
-               (format (current-error-port)
-                       "failed to open log-file ~s:~%" log-file))
-             (print-exception (current-error-port) #f key args)
-             (primitive-exit 1))))
+         (lambda (key . args)
+           (when log-file
+             (format (current-error-port)
+                     "failed to open log-file ~s:~%" log-file))
+           (print-exception (current-error-port) #f key args)
+           (primitive-exit 1))))
+
+     ;; Close all the file descriptors except stdout, stderr, and EXTRA-PORTS.
+     ;; Make EXTRA-PORTS available starting from file descriptor 3.
+     ;; This clears their FD_CLOEXEC flag.
+     (preserve-ports extra-ports)
 
      ;; setgid must be done *before* setuid, otherwise the user will
      ;; likely no longer have permissions to setgid.
@@ -1558,7 +1595,7 @@  false."
          (format (current-error-port)
                  "exec of ~s failed: ~a~%"
                  program (strerror (system-error-errno args)))
-         (primitive-exit 1)))))))
+         (primitive-exit 1))))))
 
 (define %precious-signals
   ;; Signals that the shepherd process handles.