[bug#33508] gnu: Add ability to restart services on system reconfigure

Message ID 87efb8m5gy.fsf@zancanaro.id.au
State Accepted
Headers show
Series [bug#33508] gnu: Add ability to restart services on system reconfigure | expand

Checks

Context Check Description
cbaines/applying patch success Successfully applied

Commit Message

Carlo Zancanaro Nov. 26, 2018, 11:41 a.m. UTC
Hey Guix!

A few months ago I mentioned the idea of adding the ability to 
have services automatically restarted when running "guix system 
reconfigure". These patches are a start on making that happen. 
They're incomplete (in particular, documentation is missing), but 
I'm offering them up for comment.

The broad idea is to add a new field to our guix shepherd 
services: restart-strategy. There are three valid values:

- always: this service is always safe to restart when running 
  reconfigure
 
- manual: this service may not be safe to restart when running 
  reconfigure - a message will be printed telling the user to 
  restart the service manually, or they can provide the 
  --restart-services flag to reconfigure to automatically restart 
  them

- never: this service is never safe to restart when running 
  reconfigure (eg. udev)

I have added the flag to the guix daemon's shepherd service to 
show how it works. I tested this by changing my substitute servers 
in config.scm, and after running "reconfigure" I saw my updated 
substitute servers in ps without having to run "sudo herd restart 
guix-daemon".

If nobody has any feedback in the next few days then I'll update 
the manual and send through another patch.

Carlo

Comments

Clément Lassieur Nov. 26, 2018, 12:42 p.m. UTC | #1
Hi Carlo,

It might be safer to 'reload' some services, rather than 'restarting'
them.  E.g. for nginx and prosody.  Do you think it would be possible
add a 'custom' value that would point to a custom Shepherd action?

Thank you for your work on this!
Clément


Carlo Zancanaro <carlo@zancanaro.id.au> writes:

> Hey Guix!
>
> A few months ago I mentioned the idea of adding the ability to 
> have services automatically restarted when running "guix system 
> reconfigure". These patches are a start on making that happen. 
> They're incomplete (in particular, documentation is missing), but 
> I'm offering them up for comment.
>
> The broad idea is to add a new field to our guix shepherd 
> services: restart-strategy. There are three valid values:
>
> - always: this service is always safe to restart when running 
>   reconfigure
>  
> - manual: this service may not be safe to restart when running 
>   reconfigure - a message will be printed telling the user to 
>   restart the service manually, or they can provide the 
>   --restart-services flag to reconfigure to automatically restart 
>   them
>
> - never: this service is never safe to restart when running 
>   reconfigure (eg. udev)
>
> I have added the flag to the guix daemon's shepherd service to 
> show how it works. I tested this by changing my substitute servers 
> in config.scm, and after running "reconfigure" I saw my updated 
> substitute servers in ps without having to run "sudo herd restart 
> guix-daemon".
>
> If nobody has any feedback in the next few days then I'll update 
> the manual and send through another patch.
>
> Carlo
>
> From 8b92ebac4fa13a2a89f279b249be152051f31d94 Mon Sep 17 00:00:00 2001
> From: Carlo Zancanaro <carlo@zancanaro.id.au>
> Date: Mon, 26 Nov 2018 22:38:08 +1100
> Subject: [PATCH 1/3] gnu: Add ability to restart services on system
>  reconfigure
>
> * gnu/services/herd.scm (restart-service): New procedure.
> * gnu/services/shepherd.scm (<shepherd-service>)[restart-strategy]: New
>   field.
>   (shepherd-service-upgrade): Return lists of services to automatically and
>   manually restart.
> * guix/scripts/system.scm (call-with-service-upgrade-info): Pass through
>   services to be automatically and manually restarted.
>   (upgrade-shepherd-services): Automatically restart services that should be
>   automatically restarted, and print a message about manually restarting
>   services that should be manually restarted.
> ---
>  gnu/services/herd.scm     |  5 +++++
>  gnu/services/shepherd.scm | 35 ++++++++++++++++++++++--------
>  guix/scripts/system.scm   | 45 ++++++++++++++++++++++++---------------
>  3 files changed, 59 insertions(+), 26 deletions(-)
>
> diff --git a/gnu/services/herd.scm b/gnu/services/herd.scm
> index 8ff817759..c8d6eb04e 100644
> --- a/gnu/services/herd.scm
> +++ b/gnu/services/herd.scm
> @@ -52,6 +52,7 @@
>              load-services
>              load-services/safe
>              start-service
> +            restart-service
>              stop-service))
>  
>  ;;; Commentary:
> @@ -256,6 +257,10 @@ when passed a service with an already-registered name."
>    (with-shepherd-action name ('start) result
>      result))
>  
> +(define (restart-service name)
> +  (with-shepherd-action name ('restart) result
> +    result))
> +
>  (define (stop-service name)
>    (with-shepherd-action name ('stop) result
>      result))
> diff --git a/gnu/services/shepherd.scm b/gnu/services/shepherd.scm
> index 49d08cc30..0c80e44f2 100644
> --- a/gnu/services/shepherd.scm
> +++ b/gnu/services/shepherd.scm
> @@ -159,7 +159,9 @@ DEFAULT is given, use it as the service's default value."
>    (auto-start?   shepherd-service-auto-start?          ;Boolean
>                   (default #t))
>    (modules       shepherd-service-modules              ;list of module names
> -                 (default %default-modules)))
> +                 (default %default-modules))
> +  (restart-strategy shepherd-service-restart-strategy
> +                    (default 'manual)))
>  
>  (define-record-type* <shepherd-action>
>    shepherd-action make-shepherd-action
> @@ -344,9 +346,10 @@ symbols provided/required by a service."
>                                      #t))))))
>  
>  (define (shepherd-service-upgrade live target)
> -  "Return two values: the subset of LIVE (a list of <live-service>) that needs
> -to be unloaded, and the subset of TARGET (a list of <shepherd-service>) that
> -need to be restarted to complete their upgrade."
> +  "Return three values: (a) the subset of LIVE (a list of <live-service>) that
> +needs to be unloaded, (b) the subset of TARGET (a list of <shepherd-service>)
> +that can be restarted automatically, and (c) the subset of TARGET that must be
> +restarted manually."
>    (define (essential? service)
>      (memq (first (live-service-provision service))
>            '(root shepherd)))
> @@ -373,14 +376,28 @@ need to be restarted to complete their upgrade."
>        (#f (every obsolete? (live-service-dependents service)))
>        (_  #f)))
>  
> -  (define to-restart
> -    ;; Restart services that are currently running.
> -    (filter running? target))
> -
>    (define to-unload
>      ;; Unload services that are no longer required.
>      (remove essential? (filter obsolete? live)))
>  
> -  (values to-unload to-restart))
> +  (define to-automatically-restart
> +    ;; Automatically restart services that are currently running and can
> +    ;; always be restarted.
> +    (filter (lambda (service)
> +              (and (running? service)
> +                   (eq? (shepherd-service-restart-strategy service)
> +                        'always)))
> +            target))
> +
> +  (define to-manually-restart
> +    ;; Manually restart services that are currently running and must be
> +    ;; manually restarted.
> +    (filter (lambda (service)
> +              (and (running? service)
> +                   (eq? (shepherd-service-restart-strategy service)
> +                        'manual)))
> +            target))
> +
> +  (values to-unload to-automatically-restart to-manually-restart))
>  
>  ;;; shepherd.scm ends here
> diff --git a/guix/scripts/system.scm b/guix/scripts/system.scm
> index d92ec7d5a..6f14b1395 100644
> --- a/guix/scripts/system.scm
> +++ b/guix/scripts/system.scm
> @@ -322,11 +322,12 @@ names of services to load (upgrade), and the list of names of services to
>  unload."
>    (match (current-services)
>      ((services ...)
> -     (let-values (((to-unload to-restart)
> +     (let-values (((to-unload to-automatically-restart to-manually-restart)
>                     (shepherd-service-upgrade services new-services)))
> -       (mproc to-restart
> -              (map (compose first live-service-provision)
> -                   to-unload))))
> +       (mproc (map (compose first live-service-provision)
> +                   to-unload)
> +              to-automatically-restart
> +              to-manually-restart)))
>      (#f
>       (with-monad %store-monad
>         (warning (G_ "failed to obtain list of shepherd services~%"))
> @@ -347,7 +348,7 @@ bring the system down."
>    ;; Arrange to simply emit a warning if the service upgrade fails.
>    (with-shepherd-error-handling
>     (call-with-service-upgrade-info new-services
> -     (lambda (to-restart to-unload)
> +     (lambda (to-unload to-automatically-restart to-manually-restart)
>          (for-each (lambda (unload)
>                      (info (G_ "unloading service '~a'...~%") unload)
>                      (unload-service unload))
> @@ -355,27 +356,37 @@ bring the system down."
>  
>          (with-monad %store-monad
>            (munless (null? new-services)
> -            (let ((new-service-names  (map shepherd-service-canonical-name new-services))
> -                  (to-restart-names   (map shepherd-service-canonical-name to-restart))
> -                  (to-start           (filter shepherd-service-auto-start? new-services)))
> -              (info (G_ "loading new services:~{ ~a~}...~%") new-service-names)
> -              (unless (null? to-restart-names)
> -                ;; Listing TO-RESTART-NAMES in the message below wouldn't help
> -                ;; because many essential services cannot be meaningfully
> -                ;; restarted.  See <https://debbugs.gnu.org/cgi/bugreport.cgi?bug=22039#30>.
> -                (format #t (G_ "To complete the upgrade, run 'herd restart SERVICE' to stop,
> -upgrade, and restart each service that was not automatically restarted.\n")))
> +            (let ((new-service-names              (map shepherd-service-canonical-name new-services))
> +                  (to-start-names                 (map shepherd-service-canonical-name (filter shepherd-service-auto-start? new-services)))
> +                  (to-automatically-restart-names (map shepherd-service-canonical-name to-automatically-restart))
> +                  (to-manually-restart-names      (map shepherd-service-canonical-name to-manually-restart)))
> +              (set! to-start-names
> +                (remove (lambda (name)
> +                          (or (member name to-automatically-restart-names)
> +                              (member name to-manually-restart-names)))
> +                        to-start-names))
> +
>                (mlet %store-monad ((files (mapm %store-monad
>                                                 (compose lower-object
>                                                          shepherd-service-file)
>                                                 new-services)))
> +                (for-each restart-service to-automatically-restart-names)
> +
>                  ;; Here we assume that FILES are exactly those that were computed
>                  ;; as part of the derivation that built OS, which is normally the
>                  ;; case.
> +                (info (G_ "loading new services:~{ ~a~}~%") new-service-names)
>                  (load-services/safe (map derivation->output-path files))
> -
> +                (info (G_ "starting services:~{ ~a~}~%") to-start-names)
>                  (for-each start-service
> -                          (map shepherd-service-canonical-name to-start))
> +                          to-start-names)
> +                (info (G_ "restarting services:~{ ~a~}~%") to-automatically-restart-names)
> +                (for-each restart-service
> +                          to-automatically-restart-names)
> +
> +                (unless (null? to-manually-restart-names)
> +                  (format #t (G_ "To complete the upgrade, the following services need to be manually restarted:~{ ~a~}~%")
> +                          to-manually-restart-names))
>                  (return #t)))))))))
>  
>  (define* (switch-to-system os
> -- 
> 2.19.1
>
> From 3fdef27c8f11b6a0f013afa9b6e619659ce78dec Mon Sep 17 00:00:00 2001
> From: Carlo Zancanaro <carlo@zancanaro.id.au>
> Date: Mon, 26 Nov 2018 22:38:18 +1100
> Subject: [PATCH 2/3] system: Add --restart-services flag for reconfigure
>
> * guix/scripts/system.scm (upgrade-shepherd-services): Add parameter to
>   automatically restart services marked as needing manual restart.
>   (switch-to-system): Pass through restart-services? flag.
>   (perform-action): Pass through restart-services? flag.
>   (%options): Add --restart-services flag.
>   (%default-options): Add #f as default value for --restart-services flag.
>   (process-action): Pass through restart-services? flag.
> ---
>  guix/scripts/system.scm | 24 +++++++++++++++++-------
>  1 file changed, 17 insertions(+), 7 deletions(-)
>
> diff --git a/guix/scripts/system.scm b/guix/scripts/system.scm
> index 6f14b1395..bf632c534 100644
> --- a/guix/scripts/system.scm
> +++ b/guix/scripts/system.scm
> @@ -333,7 +333,7 @@ unload."
>         (warning (G_ "failed to obtain list of shepherd services~%"))
>         (return #f)))))
>  
> -(define (upgrade-shepherd-services os)
> +(define (upgrade-shepherd-services os restart-services?)
>    "Upgrade the Shepherd (PID 1) by unloading obsolete services and loading new
>  services specified in OS and not currently running.
>  
> @@ -360,6 +360,10 @@ bring the system down."
>                    (to-start-names                 (map shepherd-service-canonical-name (filter shepherd-service-auto-start? new-services)))
>                    (to-automatically-restart-names (map shepherd-service-canonical-name to-automatically-restart))
>                    (to-manually-restart-names      (map shepherd-service-canonical-name to-manually-restart)))
> +              (when restart-services?
> +                (set! to-automatically-restart-names (append to-automatically-restart-names
> +                                                             to-manually-restart-names))
> +                (set! to-manually-restart-names '()))
>                (set! to-start-names
>                  (remove (lambda (name)
>                            (or (member name to-automatically-restart-names)
> @@ -389,7 +393,7 @@ bring the system down."
>                            to-manually-restart-names))
>                  (return #t)))))))))
>  
> -(define* (switch-to-system os
> +(define* (switch-to-system os restart-services?
>                             #:optional (profile %system-profile))
>    "Make a new generation of PROFILE pointing to the directory of OS, switch to
>  it atomically, and then run OS's activation script."
> @@ -417,7 +421,7 @@ it atomically, and then run OS's activation script."
>          (primitive-load (derivation->output-path script))))
>  
>        ;; Finally, try to update system services.
> -      (upgrade-shepherd-services os))))
> +      (upgrade-shepherd-services os restart-services?))))
>  
>  (define-syntax-rule (unless-file-not-found exp)
>    (catch 'system-error
> @@ -825,7 +829,8 @@ and TARGET arguments."
>                           use-substitutes? bootloader-target target
>                           image-size file-system-type full-boot?
>                           (mappings '())
> -                         (gc-root #f))
> +                         (gc-root #f)
> +                         (restart-services? #f))
>    "Perform ACTION for OS.  INSTALL-BOOTLOADER? specifies whether to install
>  bootloader; BOOTLOADER-TAGET is the target for the bootloader; TARGET is the
>  target root directory; IMAGE-SIZE is the size of the image to be built, for
> @@ -907,7 +912,7 @@ static checks."
>            (case action
>              ((reconfigure)
>               (mbegin %store-monad
> -               (switch-to-system os)
> +               (switch-to-system os restart-services?)
>                 (mwhen install-bootloader?
>                   (install-bootloader bootloader-script
>                                       #:bootcfg bootcfg
> @@ -1090,6 +1095,9 @@ Some ACTIONS support additional ARGS.\n"))
>           (option '(#\r "root") #t #f
>                   (lambda (opt name arg result)
>                     (alist-cons 'gc-root arg result)))
> +         (option '("restart-services") #f #f
> +                 (lambda (opt name arg result)
> +                   (alist-cons 'restart-services? #t result)))
>           %standard-build-options))
>  
>  (define %default-options
> @@ -1104,7 +1112,8 @@ Some ACTIONS support additional ARGS.\n"))
>      (verbosity . 0)
>      (file-system-type . "ext4")
>      (image-size . guess)
> -    (install-bootloader? . #t)))
> +    (install-bootloader? . #t)
> +    (restart-services? . #f)))
>  
>  
>  ;;;
> @@ -1177,7 +1186,8 @@ resulting from command-line parsing."
>                               #:install-bootloader? bootloader?
>                               #:target target
>                               #:bootloader-target bootloader-target
> -                             #:gc-root (assoc-ref opts 'gc-root)))))
> +                             #:gc-root (assoc-ref opts 'gc-root)
> +                             #:restart-services? (assoc-ref opts 'restart-services?)))))
>          #:system system))
>      (warn-about-disk-space)))
>  
> -- 
> 2.19.1
>
> From 099a8e2e6e28b38816ed1ba895c407f1d9efe62e Mon Sep 17 00:00:00 2001
> From: Carlo Zancanaro <carlo@zancanaro.id.au>
> Date: Mon, 26 Nov 2018 22:38:26 +1100
> Subject: [PATCH 3/3] services: Always restart guix daemon
>
> * gnu/services/base.scm (guix-shepherd-service): Add restart-strategy of
>   'always.
> ---
>  gnu/services/base.scm | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/gnu/services/base.scm b/gnu/services/base.scm
> index 228d3c592..7e0fdcb3e 100644
> --- a/gnu/services/base.scm
> +++ b/gnu/services/base.scm
> @@ -1573,6 +1573,7 @@ failed to register hydra.gnu.org public key: ~a~%" status))))))))
>             (documentation "Run the Guix daemon.")
>             (provision '(guix-daemon))
>             (requirement '(user-processes))
> +           (restart-strategy 'always)
>             (modules '((srfi srfi-1)))
>             (start
>              #~(make-forkexec-constructor
Carlo Zancanaro Nov. 26, 2018, 8:11 p.m. UTC | #2
Hey Clément!

On Mon, Nov 26 2018, Clément Lassieur wrote:
> It might be safer to 'reload' some services, rather than 
> 'restarting' them.  E.g. for nginx and prosody.  Do you think it 
> would be possible add a 'custom' value that would point to a 
> custom Shepherd action?

I can add this, but I don't think this is as useful as it 
initially sounds. Most of our services are a specific version of a 
service pointing to a specific version of a configuration file 
(ie. that's in the store). That means that a "reload" shepherd 
action won't be able to know where the new configuration file is 
to load it.

We could solve this in one of two ways:

1) by allowing an arbitrary procedure as the value of 
restart-strategy, because it can then call a shepherd action with 
the appropriate configuration file, but then our action will have 
to detect whether the binary has been changed (which would also 
detect any dependencies changing). This may also lead to an 
inconsistent user experience where a "reconfigure" might lead to a 
reload, or might lead to a restart, and it's not obvious which it 
will be.

2) by changing our services to create configuration files in a 
known location (ie. /etc/nginx/nginx.conf). This would make it so 
a simple "reload" action in the service could meaningfully reload 
the service, but only if the binary was unchanged (because the old 
binary might not be able to read the new configuration format, for 
instance). This still leads to the above problem around the 
inconsistent user experience, and adds some complexity in terms of 
how configuration files are managed.

I lean towards option (1), because it gives us the ability to call 
a shepherd action if we want, but also allows us to do arbitrary 
other things with the extra knowledge on the Guix side.

In the end, though, I'm unconvinced that this is a useful thing to 
add. What do you think?

Carlo
Clément Lassieur Nov. 26, 2018, 9:02 p.m. UTC | #3
Hey Carlo!

Carlo Zancanaro <carlo@zancanaro.id.au> writes:

> I can add this, but I don't think this is as useful as it initially
> sounds. Most of our services are a specific version of a service pointing to a
> specific version of a configuration file (ie. that's in the store). That means
> that a "reload" shepherd action won't be able to know where the new
> configuration file is to load it.
>
> We could solve this in one of two ways:
>
> 1) by allowing an arbitrary procedure as the value of restart-strategy,
> because it can then call a shepherd action with the appropriate configuration
> file

> but then our action will have to detect whether the binary has been
> changed (which would also detect any dependencies changing).

I don't think it needs to detect whether the binary has changed, because
'reload' signals are usually implemented so that they can safely fail.
So, if the configuration file has changed in an incompatible way, the
'reload' action won't work, but the service will keep running.

> This may also lead to an inconsistent user experience where a
> "reconfigure" might lead to a reload, or might lead to a restart, and
> it's not obvious which it will be.

Your patch also leads to this inconsistency, because it allows a service
to either be restarted or not, in my opinion :-)

> 2) by changing our services to create configuration files in a known location
> (ie. /etc/nginx/nginx.conf). This would make it so a simple "reload" action in
> the service could meaningfully reload the service, but only if the binary was
> unchanged (because the old binary might not be able to read the new
> configuration format, for instance). This still leads to the above problem
> around the inconsistent user experience, and adds some complexity in terms of
> how configuration files are managed.
>
> I lean towards option (1), because it gives us the ability to call a shepherd
> action if we want, but also allows us to do arbitrary other things with the
> extra knowledge on the Guix side.

I think both (1) and (2) make sense because both kind of services (the
ones pointing to configuration files in the store and the ones using
/etc/some-file.conf) already exist.  Ideally, the mechanism should be
generic enough to handle both cases.

> In the end, though, I'm unconvinced that this is a useful thing to add. What
> do you think?

I don't agree :-).  A 'restart' is inherently dangerous because there is
a chance for the restart to fail (say, if the new configuration file is
erroneous), whereas the 'reload' action cannot fail (if it is correctly
implemented).

That being said, I agree that adding support for 'reload' would lead to
more complexity, and I would understand if you don't add it :-), but I
still think it's a very useful feature.

One question though: my understanding is that the default value for
'restart-strategy' is set in the Guix repository, but a user would be
able to customize it in their config.scm.  Can you confirm it?

Thank you,
Clément
Carlo Zancanaro Nov. 26, 2018, 9:59 p.m. UTC | #4
Hey Clément,

Thanks for your thoughts! I think you're right that the approach 
I've implemented isn't flexible enough. I potentially haven't 
thought through the failure cases enough. I was more thinking of 
reload as providing "zero downtime" upgrades, rather than 
providing a safer way to upgrade.

I'll respond more specifically inline.

On Tue, Nov 27 2018, Clément Lassieur wrote:
> I don't think it needs to detect whether the binary has changed, 
> because 'reload' signals are usually implemented so that they 
> can safely fail.  So, if the configuration file has changed in 
> an incompatible way, the 'reload' action won't work, but the 
> service will keep running.

We do need to detect whether the binary has changed for the sake 
of security updates, or similar. It would be bad if a user 
reconfigured their system and it didn't upgrade the version of 
nginx (or its dependencies) that they're running.

Broadly speaking, I conceptualise reconfigure as "bring my system 
into this state". Now, thus far we haven't been able to do that, 
because we have lacked the ability to restart services properly, 
but in my mind the ideal situation is that after running "guix 
system reconfigure" our system is completely put into the state 
specified by the config.scm file used.

Although, now that I type that out, I notice that there is one 
obvious way in which that is not true: the kernel. We can't 
hot-swap the kernel, so there can always be a difference between 
what the configuration file specifies and what the system is 
actually running.

At any rate, even if we give services the ability to reload 
without restarting, they would need to print out a message to 
prompt the user to manually restart them if the binary has 
changed. I would also then expect the --restart-services flag to 
fully restart those services, rather than just reloading them.

> Your patch also leads to this inconsistency, because it allows a 
> service to either be restarted or not, in my opinion :-)

Yes, that's true, but then there's no middle-ground. Reloading is 
"new configuration, old binary", whereas the current options are 
"old configuration, old binary" or "new configuration, new binary" 
(or, I guess, "not running because of a failed restart").

> I think both (1) and (2) make sense because both kind of 
> services (the ones pointing to configuration files in the store 
> and the ones using /etc/some-file.conf) already exist.  Ideally, 
> the mechanism should be generic enough to handle both cases.

(1) actually subsumes (2), so I think I'll implement that. It 
actually ends up being slightly easier, because the restart 
strategy can just always be a procedure, with three predefined 
procedures: always-restart, manually-restart, and never-restart.

> That being said, I agree that adding support for 'reload' would 
> lead to more complexity, and I would understand if you don't add 
> it :-), but I still think it's a very useful feature.

I think you've convinced me that there's value in having more 
flexibility around restarts. I'll change the restart-strategy 
value to be a procedure rather than a bare symbol. The downside is 
that we'll lose the ability to statically analyse how services 
will behave when restarting, but the increased flexibility is 
useful to us.

> One question though: my understanding is that the default value 
> for 'restart-strategy' is set in the Guix repository, but a user 
> would be able to customize it in their config.scm.  Can you 
> confirm it?

That is not the current implementation. Individual services can 
add a field to their configuration objects if it's meaningful for 
them to customise their restart behaviour, but there isn't a 
general-purpose mechanism for a user to change the restart 
behaviour of any service (beyond the ability to write arbitrary 
Scheme to do it).

Carlo
Clément Lassieur Nov. 30, 2018, 12:12 p.m. UTC | #5
Hi Carlo,

Carlo Zancanaro <carlo@zancanaro.id.au> writes:

> Hey Clément,
>
> Thanks for your thoughts! I think you're right that the approach I've
> implemented isn't flexible enough. I potentially haven't thought through the
> failure cases enough. I was more thinking of reload as providing "zero
> downtime" upgrades, rather than providing a safer way to upgrade.
>
> I'll respond more specifically inline.
>
> On Tue, Nov 27 2018, Clément Lassieur wrote:
>> I don't think it needs to detect whether the binary has changed, because
>> 'reload' signals are usually implemented so that they can safely fail.  So,
>> if the configuration file has changed in an incompatible way, the 'reload'
>> action won't work, but the service will keep running.
>
> We do need to detect whether the binary has changed for the sake of security
> updates, or similar. It would be bad if a user reconfigured their system and
> it didn't upgrade the version of nginx (or its dependencies) that they're
> running.

If there is a risk for a service to be broken on reconfigure, a user
might want to do a safe reconfigure, and later on deal with each
critical service one after another, so to avoid having several services
down at the same time.  I think we should at least allow a user to do a
'safe reconfigure' if they want.

> Broadly speaking, I conceptualise reconfigure as "bring my system into this
> state". Now, thus far we haven't been able to do that, because we have lacked
> the ability to restart services properly, but in my mind the ideal situation
> is that after running "guix system reconfigure" our system is completely put
> into the state specified by the config.scm file used.

This is ideal, but most services depend on a state (Cuirass, mail
servers...).

> Although, now that I type that out, I notice that there is one obvious way in
> which that is not true: the kernel. We can't hot-swap the kernel, so there can
> always be a difference between what the configuration file specifies and what
> the system is actually running.

And I don't think you can restart Xorg either...

> At any rate, even if we give services the ability to reload without
> restarting, they would need to print out a message to prompt the user to
> manually restart them if the binary has changed. I would also then expect the
> --restart-services flag to fully restart those services, rather than just
> reloading them.

Agreed :-)

>> Your patch also leads to this inconsistency, because it allows a service to
>> either be restarted or not, in my opinion :-)
>
> Yes, that's true, but then there's no middle-ground. Reloading is "new
> configuration, old binary", whereas the current options are "old
> configuration, old binary" or "new configuration, new binary" (or, I guess,
> "not running because of a failed restart").
>
>> I think both (1) and (2) make sense because both kind of services (the ones
>> pointing to configuration files in the store and the ones using
>> /etc/some-file.conf) already exist.  Ideally, the mechanism should be
>> generic enough to handle both cases.
>
> (1) actually subsumes (2), so I think I'll implement that. It actually ends up
> being slightly easier, because the restart strategy can just always be a
> procedure, with three predefined procedures: always-restart, manually-restart,
> and never-restart.
>
>> That being said, I agree that adding support for 'reload' would lead to more
>> complexity, and I would understand if you don't add it :-), but I still
>> think it's a very useful feature.
>
> I think you've convinced me that there's value in having more flexibility
> around restarts. I'll change the restart-strategy value to be a procedure
> rather than a bare symbol. The downside is that we'll lose the ability to
> statically analyse how services will behave when restarting, but the increased
> flexibility is useful to us.

It could also detect whether you pass a symbol or a procedure.  In most
cases that would be a symbol which would allow static analysis.  But one
could still customize it with a procedure.

>> One question though: my understanding is that the default value for
>> 'restart-strategy' is set in the Guix repository, but a user would be able
>> to customize it in their config.scm.  Can you confirm it?
>
> That is not the current implementation. Individual services can add a field to
> their configuration objects if it's meaningful for them to customise their
> restart behaviour, but there isn't a general-purpose mechanism for a user to
> change the restart behaviour of any service (beyond the ability to write
> arbitrary Scheme to do it).

Ok thank you!

Clément
Ludovic Courtès Dec. 9, 2018, 4:59 p.m. UTC | #6
Hi Carlo,

Sorry for not commenting earlier!

Carlo Zancanaro <carlo@zancanaro.id.au> skribis:

> The broad idea is to add a new field to our guix shepherd services:
> restart-strategy. There are three valid values:
>
> - always: this service is always safe to restart when running
> reconfigure
>
> - manual: this service may not be safe to restart when running
> reconfigure - a message will be printed telling the user to restart
> the service manually, or they can provide the --restart-services flag
> to reconfigure to automatically restart them
>
> - never: this service is never safe to restart when running
> reconfigure (eg. udev)
>
> I have added the flag to the guix daemon's shepherd service to show
> how it works. I tested this by changing my substitute servers in
> config.scm, and after running "reconfigure" I saw my updated
> substitute servers in ps without having to run "sudo herd restart
> guix-daemon".

In what sense is guix-daemon “always safe to restart”?  It’s actually a
difficult question for me.

You could argue that its child guix-daemon processes will remain live
when we restart it, meaning that client connections remain active and
valid.  I believe this is indeed the case, though it would be worth
double-checking.

Now, if safe-to-restart means that we automatically invoke the “restart”
action on guix-daemon, that means that anything that depends on it
(‘guix-publish’, ‘cuirass’, ‘hpcguix-web’, etc.) would be restarted as
well (even though I *think* we don’t have to in this case.)  But these
may not be safe to restart: for example, on may want ‘guix-publish’ to
run uninterrupted.

Furthermore, whether something is “safe to restart” is really user
policy.

So the notion here should probably not be “safe to restart” but rather
“live-upgradable”.

sshd, nginx, and maybe guix-daemon can more or less be live-upgraded,
meaning that (1) existing connections are preserved but future
connections will talk to the new daemon, and as a corollary, (2)
dependent services do not need to be stopped & restarted.

Does that make sense?

Thanks,
Ludo’.
Carlo Zancanaro Jan. 1, 2019, 11:25 a.m. UTC | #7
Hey Ludo’,

Sorry for not responding to this email for so long. I've been 
trying to think through some of the issues around this, and I'm 
not confident that I have thought through the issues well enough 
to actually decide on a good course of action, beyond what I have 
already written. I'll respond to a few specific things in your 
message, but I don't even know what a good solution would look 
like, let alone how to build it.

On Mon, Dec 10 2018, Ludovic Courtès wrote:
> In what sense is guix-daemon “always safe to restart”?  It’s 
> actually a difficult question for me.

I agree it's tricky. I had mostly intended that as an example, 
because I used guix-daemon for my testing, but ...

> You could argue that its child guix-daemon processes will remain 
> live when we restart it, meaning that client connections remain 
> active and valid.  I believe this is indeed the case, though it 
> would be worth double-checking.

... this is what I was thinking. I'm fairly sure this is the case, 
given my observations while I was testing these patches.

> Now, if safe-to-restart means that we automatically invoke the 
> “restart” action on guix-daemon, that means that anything that 
> depends on it (‘guix-publish’, ‘cuirass’, ‘hpcguix-web’, etc.) 
> would be restarted as well (even though I *think* we don’t have 
> to in this case.)  But these may not be safe to restart: for 
> example, on may want ‘guix-publish’ to run uninterrupted.

At the moment we have no way to capture this, particularly in the 
Shepherd. There's no way to restart a service without restarting 
dependent services, but I particularly want to pick up on the 
"uninterrupted" by talking about nginx below.

> ...

> sshd, nginx, and maybe guix-daemon can more or less be 
> live-upgraded, meaning that (1) existing connections are 
> preserved but future connections will talk to the new daemon, 
> and as a corollary, (2) dependent services do not need to be 
> stopped & restarted.

I did some research into nginx, and it turns out that it is 
possible to upgrade nginx with zero-downtime by running two 
daemons simultaneously listening on the same port(s), then 
shutting down the old daemon after the new one has successfully 
started[1]. This allows for an "uninterrupted" upgrade, but I'm 
not confident that I would be able to implement it within our 
current framework.

In all, I haven't done anything with this in the last month. I've 
thought about it a few times, but it just feels a bit 
overwhelming.

Carlo

[1]: https://nginx.org/en/docs/control.html#upgrade
Ludovic Courtès Jan. 5, 2019, 2 p.m. UTC | #8
Hi Carlo,

Carlo Zancanaro <carlo@zancanaro.id.au> skribis:

> Sorry for not responding to this email for so long. I've been trying
> to think through some of the issues around this, and I'm not confident
> that I have thought through the issues well enough to actually decide
> on a good course of action, beyond what I have already written. I'll
> respond to a few specific things in your message, but I don't even
> know what a good solution would look like, let alone how to build it.

Sure, we can take more time to think through it.  You earlier work in
this area has already greatly improved the situation so I feel less
pressure now.

> I did some research into nginx, and it turns out that it is possible
> to upgrade nginx with zero-downtime by running two daemons
> simultaneously listening on the same port(s), then shutting down the
> old daemon after the new one has successfully started[1]. This allows
> for an "uninterrupted" upgrade, but I'm not confident that I would be
> able to implement it within our current framework.

Nginx does all this for us.  Basically if you run “nginx -s restart”,
IIRC, it automatically does the multi-process dance and you eventually
end up with only upgraded nginx processes.  However it relies on being
able to read its new configuration file from the same location as
before, which is something that doesn’t quite work in our setting,
unless we make the file available at a fixed location like
/etc/nginx.conf.  Clément looked into this a while back but I cannot
find the reference.

Anyway I think we should probably special-case the ‘restart’ action for
those live-upgradable services.  That doesn’t require any change in the
Shepherd.

Thoughts?

Ludo’.

Patch

From 099a8e2e6e28b38816ed1ba895c407f1d9efe62e Mon Sep 17 00:00:00 2001
From: Carlo Zancanaro <carlo@zancanaro.id.au>
Date: Mon, 26 Nov 2018 22:38:26 +1100
Subject: [PATCH 3/3] services: Always restart guix daemon

* gnu/services/base.scm (guix-shepherd-service): Add restart-strategy of
  'always.
---
 gnu/services/base.scm | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gnu/services/base.scm b/gnu/services/base.scm
index 228d3c592..7e0fdcb3e 100644
--- a/gnu/services/base.scm
+++ b/gnu/services/base.scm
@@ -1573,6 +1573,7 @@  failed to register hydra.gnu.org public key: ~a~%" status))))))))
            (documentation "Run the Guix daemon.")
            (provision '(guix-daemon))
            (requirement '(user-processes))
+           (restart-strategy 'always)
            (modules '((srfi srfi-1)))
            (start
             #~(make-forkexec-constructor
-- 
2.19.1