diff mbox series

[bug#41507,Shepherd,2/2] shepherd: Use 'signalfd' when possible.

Message ID 20200524143700.6378-2-ludo@gnu.org
State Accepted
Headers show
Series Use 'signalfd' on GNU/Linux | expand

Checks

Context Check Description
cbaines/applying patch fail View Laminar job

Commit Message

Ludovic Courtès May 24, 2020, 2:37 p.m. UTC
This eliminates possible race conditions related to signal delivery and
'select', which in turn means we can pass 'select' an infinite timeout
without any risk of missing a signal.  It also allows us to structure
the code around a single event loop.

* modules/shepherd.scm (maybe-signal-port): New procedure.
(main): Use it and define 'signal-port'.  Add 'ports' argument to
'next-command'.  Pass it to 'select'.  Change the timeout argument of
'select' to #f when SIGNAL-PORT is true.
* modules/shepherd/service.scm (fork+exec-command): Call
'unblock-signals' in the child process, right before 'exec-command'.
---
 modules/shepherd.scm         | 73 ++++++++++++++++++++++++++++++++----
 modules/shepherd/service.scm | 19 ++++++----
 2 files changed, 78 insertions(+), 14 deletions(-)

Comments

Mathieu Othacehe May 25, 2020, 12:31 p.m. UTC | #1
Hello Ludo,

This is a big improvement, thank you!

> +    (lambda args
> +      (if (= ENOSYS (system-error-errno args))
> +          #f
> +          (apply throw args)))))

Maybe:

(and (= ENOSYS (system-error-errno args))
     (apply throw args))

> +
> +(define (handle-signal-port port)
> +  "Read from PORT, a signalfd port, and handle the signal accordingly."
> +  (let ((signal (consume-signalfd-siginfo port)))
> +    (cond ((= signal SIGCHLD)
> +           (handle-SIGCHLD))
> +          ((= signal SIGINT)
> +           (catch 'quit
> +             (lambda ()
> +               (stop root-service))
> +             quit-exception-handler))

Maybe we should create a handle-SIGINT, to make sure that the same code
is shared with the sigaction handler?

> +        (begin
> +          ;; Unblock any signals that might have been blocked by the parent
> +          ;; process.
> +          (unblock-signals (list SIGCHLD SIGINT SIGHUP SIGTERM))

This made me realize that we may want to disable/reset SIGINT and
SIGHUP, in the same way that we do for SIGTERM? This is not related to
your patch anyway.

You could also extend the comment to say that it is only necessary if
using the signalfd mechanism (because signals are not blocked
otherwise).

In any case, this looks fine!

Thanks,

Mathieu
Ludovic Courtès May 26, 2020, 10:13 p.m. UTC | #2
Hi!

Mathieu Othacehe <othacehe@gnu.org> skribis:

>> +    (lambda args
>> +      (if (= ENOSYS (system-error-errno args))
>> +          #f
>> +          (apply throw args)))))
>
> Maybe:
>
> (and (= ENOSYS (system-error-errno args))
>      (apply throw args))

It’s not equivalent.  :-)

>> +(define (handle-signal-port port)
>> +  "Read from PORT, a signalfd port, and handle the signal accordingly."
>> +  (let ((signal (consume-signalfd-siginfo port)))
>> +    (cond ((= signal SIGCHLD)
>> +           (handle-SIGCHLD))
>> +          ((= signal SIGINT)
>> +           (catch 'quit
>> +             (lambda ()
>> +               (stop root-service))
>> +             quit-exception-handler))
>
> Maybe we should create a handle-SIGINT, to make sure that the same code
> is shared with the sigaction handler?

Good idea, done!

>> +        (begin
>> +          ;; Unblock any signals that might have been blocked by the parent
>> +          ;; process.
>> +          (unblock-signals (list SIGCHLD SIGINT SIGHUP SIGTERM))
>
> This made me realize that we may want to disable/reset SIGINT and
> SIGHUP, in the same way that we do for SIGTERM? This is not related to
> your patch anyway.

Right, I’ll let you look into it if that’s fine with you.

> You could also extend the comment to say that it is only necessary if
> using the signalfd mechanism (because signals are not blocked
> otherwise).

Done.

I did:

  make dist
  guix build shepherd --with-source=shepherd-0.8.0.tar.gz \
    -s x86_64-linux -s i686-linux -s armhf-linux -s aarch64-linux
  guix build shepherd --with-source=shepherd-0.8.0.tar.gz \
    --target=i586-pc-gnu

and it all passes.

Janneke, could you check that it works on GNU/Hurd?

Ludo’.
Janneke Nieuwenhuizen May 27, 2020, 5:45 a.m. UTC | #3
Ludovic Courtès writes:

Hello,

> Janneke, could you check that it works on GNU/Hurd?

I haven't caught-up with the discussion, but I just verified that
upstream master~

    399bea2 shepherd: Use 'signalfd' when possible.

works on latest wip-hurd-vm!

I did a make dist after finding out (well that was pretty clear from the
patches, but...) that applying the patches need autotools rebootstrap,
etc.

Janneke
Ludovic Courtès May 27, 2020, 9:23 a.m. UTC | #4
Hi,

Jan Nieuwenhuizen <janneke@gnu.org> skribis:

> Ludovic Courtès writes:
>
> Hello,
>
>> Janneke, could you check that it works on GNU/Hurd?
>
> I haven't caught-up with the discussion, but I just verified that
> upstream master~
>
>     399bea2 shepherd: Use 'signalfd' when possible.
>
> works on latest wip-hurd-vm!

Yay, awesome, thanks for checking!

Ludo’.
diff mbox series

Patch

diff --git a/modules/shepherd.scm b/modules/shepherd.scm
index 46faab6..ac60070 100644
--- a/modules/shepherd.scm
+++ b/modules/shepherd.scm
@@ -24,6 +24,7 @@ 
   #:use-module (ice-9 format)
   #:use-module (ice-9 rdelim)   ;; Line-based I/O.
   #:autoload   (ice-9 readline) (activate-readline) ;for interactive use
+  #:use-module ((ice-9 threads) #:select (all-threads))
   #:use-module (oop goops)      ;; Defining classes and methods.
   #:use-module (srfi srfi-1)    ;; List library.
   #:use-module (srfi srfi-26)
@@ -60,6 +61,53 @@  socket file at FILE-NAME upon exit of PROC.  Return the values of PROC."
         (close sock)
         (catch-system-error (delete-file file-name))))))
 
+(define (maybe-signal-port signals)
+  "Return a signal port for SIGNALS, using 'signalfd' on GNU/Linux, or #f if
+that is not supported."
+  (catch 'system-error
+    (lambda ()
+      (let ((port (signalfd -1 signals)))
+        ;; As per the signalfd(2) man page, block SIGNALS.  The tricky bit is
+        ;; that SIGNALS must be blocked for all the threads; new threads will
+        ;; inherit the signal mask, but we must ensure that neither Guile's
+        ;; signal delivery thread nor its finalization thread are already
+        ;; running, because if they do, they are not blocking SIGNALS.  The
+        ;; signal delivery thread is started on the first call to 'sigaction'
+        ;; so we arrange to not call 'sigaction' beforehand; as for the
+        ;; finalization thread, use 'without-automatic-finalization' to
+        ;; temporarily stop it.
+        (without-automatic-finalization
+         (let ((count (length (all-threads))))
+           (if (= 1 count)
+               (begin
+                 (block-signals signals)
+                 port)
+               (begin
+                 (local-output (l10n "warning: \
+already ~a threads running, disabling 'signalfd' support")
+                               count)
+                 (close-port port)
+                 #f))))))
+    (lambda args
+      (if (= ENOSYS (system-error-errno args))
+          #f
+          (apply throw args)))))
+
+(define (handle-signal-port port)
+  "Read from PORT, a signalfd port, and handle the signal accordingly."
+  (let ((signal (consume-signalfd-siginfo port)))
+    (cond ((= signal SIGCHLD)
+           (handle-SIGCHLD))
+          ((= signal SIGINT)
+           (catch 'quit
+             (lambda ()
+               (stop root-service))
+             quit-exception-handler))
+          ((memv signal (list SIGTERM SIGHUP))
+           (stop root-service))
+          (else
+           #f))))
+
 
 ;; Main program.
 (define (main . args)
@@ -82,6 +130,12 @@  socket file at FILE-NAME upon exit of PROC.  Return the values of PROC."
                    (= EINVAL errno)        ;PR_SET_CHILD_SUBREAPER unavailable
                    (apply throw args)))))))
 
+  (define signal-port
+    ;; Attempt to create a "signal port" via 'signalfd'.  This must be called
+    ;; before the 'sigaction' procedure is called, because 'sigaction' spawns
+    ;; the signal thread.
+    (maybe-signal-port (list SIGCHLD SIGINT SIGTERM SIGHUP)))
+
   (initialize-cli)
 
   (let ((config-file #f)
@@ -281,7 +335,9 @@  socket file at FILE-NAME upon exit of PROC.  Return the values of PROC."
              ;; "Failed to autoload handle-SIGCHLD in (ice-9 readline):"
              (handle-SIGCHLD)
 
-             (let next-command ()
+             (let next-command ((ports (if signal-port
+                                           (list signal-port sock)
+                                           (list sock))))
                (define (read-from sock)
                  (match (accept sock)
                    ((command-source . client-address)
@@ -289,14 +345,17 @@  socket file at FILE-NAME upon exit of PROC.  Return the values of PROC."
                     (process-connection command-source))
                    (_ #f)))
 
-               ;; XXX: Until we use signalfd(2), there's always a time window
+               ;; When not using signalfd(2), there's always a time window
                ;; before 'select' during which a handler async can be queued
                ;; but not executed.  Work around it by exiting 'select' every
                ;; few seconds.
-               (match (select (list sock) (list) (list)
-                              (if poll-services? 0.5 30))
-                 (((sock) _ _)
-                  (read-from sock))
+               (match (select ports (list) (list)
+                              (and (not signal-port)
+                                   (if poll-services? 0.5 30)))
+                 (((port _ ...) _ _)
+                  (if (and signal-port (eq? port signal-port))
+                      (handle-signal-port port)
+                      (read-from sock)))
                  (_
                   ;; 'select' returned an empty set, probably due to EINTR.
                   ;; Explicitly call the SIGCHLD handler because we cannot be
@@ -306,7 +365,7 @@  socket file at FILE-NAME upon exit of PROC.  Return the values of PROC."
 
                (when poll-services?
                  (check-for-dead-services))
-               (next-command))))))))
+               (next-command ports))))))))
 
 ;; Start all of SERVICES, which is a list of canonical names (FIXME?),
 ;; but in a order where all dependencies are fulfilled before we
diff --git a/modules/shepherd/service.scm b/modules/shepherd/service.scm
index 45fcf32..a202a98 100644
--- a/modules/shepherd/service.scm
+++ b/modules/shepherd/service.scm
@@ -898,13 +898,18 @@  its PID."
         (pid (and (sigaction SIGTERM SIG_DFL)
                   (primitive-fork))))
     (if (zero? pid)
-        (exec-command command
-                      #:user user
-                      #:group group
-                      #:log-file log-file
-                      #:directory directory
-                      #:file-creation-mask file-creation-mask
-                      #:environment-variables environment-variables)
+        (begin
+          ;; Unblock any signals that might have been blocked by the parent
+          ;; process.
+          (unblock-signals (list SIGCHLD SIGINT SIGHUP SIGTERM))
+
+          (exec-command command
+                        #:user user
+                        #:group group
+                        #:log-file log-file
+                        #:directory directory
+                        #:file-creation-mask file-creation-mask
+                        #:environment-variables environment-variables))
         (begin
           ;; Restore the initial SIGTERM handler.
           (sigaction SIGTERM term-handler)