mbox

[bug#36555,v3,0/3] Refactor out common behavior for system reconfiguration.

Message ID 87v9w1zgon.fsf_-_@sdf.lonestar.org
Headers show

Message

Jakob L. Kreuze July 16, 2019, 11:46 p.m. UTC
Hi, all.

Submitting this reroll to ask for some further feedback. Here's a
summary of the more significant changes since v2:

- All of the system tests for the reconfiguration procedures have been
  implemented.
- 'upgrade-services-program' has been completely reimplemented; '(gnu
  machine ssh)' is now capable of (partially) serializing the
  <live-service> objects returned by 'current-services', so we can use
  'shepherd-service-upgrade' to traverse the service dependency graph.
- Procedures in '(guix scripts system reconfigure)' now use
  'program-file' instead of 'gexp->script'. I hadn't realized the
  difference, but this makes invocations of 'remote-eval' a bit cleaner.
- Thanks to Ludovic's patches to '(guix remote)', the reconfiguration
  procedures no longer need to capture output from the
  activation/installation scripts.
- I've removed my awful hack of a solution for handling Shepherd errors
  in 'upgrade-services-program' in favor of handling exceptions on the
  host side. I have some questions about this.
- 'upgrade-services-program' comes after 'install-bootloader-program' in
  'guix deploy' and 'guix system reconfigure' now, as it's the procedure
  most likely to fail trivially.

I still need to handle failed deployments in 'guix deploy'. I suspect
that, for now, it would make sense to implement remote roll-backs and
just roll-back the system on failure, at least until we've have some
dialog about the proper way to do atomic deployments.

My biggest concern at the moment is error handling reporting in the new
'guix system reconfigure'. I'd like to emulate what was done with the
previous version, but I'm at somewhat of a loss for how I'd go about
that, since the error reporting was mixed with the reconfiguration code.
So I'd like to ask for some suggestions: is the best way to catch errors
in '%store-monad' to do what 'with-shepherd-error-handling' does, and
then 'leave' on failure?

Ludovic suggested guarding against 'message-condition' and having the
expression I send to 'remote-eval' return either ('error message) or
('success). Would it make sense to just do this in all of the
reconfiguration procedures? Or is raising exceptions in the
reconfiguration procedures and catching them in the scripts' code the
way to go?

There's also a slight bug in the new 'guix system reconfigure' that I'll
need to figure out. At the moment, it installs a bootloader entry for
all but the newest generation.

Jakob L. Kreuze <zerodaysfordays@sdf.lonestar.org> writes:

> Noted. That should be a relatively small change, so I'll see about
> tackling that in my next revision for this series.

Oh, how naïve I was four days ago. This reroll doesn't address this.
Having the procedures "parameterized by an evaluation procedure" can be
done in so many ways, and I think it would be best I put some serious
thought into which of those ways would be the best. A 'local-eval' would
clearly be much better than what I'm doing at the present in
'system.scm', but the solution I came up with today involved three
layers of 'primitive-load', which I doubt is the way to go about it. I
had the idea to parameterize on a procedure that takes a
'<program-file>' rather than a G-Expression as I was making dinner
tonight, which seems to me like a sound idea, but we'll see if it works
tomorrow when I try to implement it.

Also, it hit me today that the safety checks done in 'guix system
reconfigure' -- 'check-mapped-devices',
'check-file-system-availability', and 'check-initrd-modules' -- should
also be done in 'guix deploy'. It might make sense for me to submit that
change as a separate patch series so the code review for this doesn't
get too complicated, but since we're on the topic of unifying the code
between 'guix deploy' and 'guix system reconfigure', should I perhaps
reimplement those procedures as '<program-file>' objects like everything
else in '(guix scripts system reconfigure)'? They aren't really
effectful, but they concern system reconfiguration.

And, on the same note, should I go ahead and refactor the rest of the
reconfiguration code in 'system.scm' out into '(guix scripts system
reconfigure)'? I mean, this will probably be a separate patch series for
the same reason that the safety checks would be a separate patch series,
and I'll likely do this _after_ I come up with a decent way to
parameterize on an evaluation procedure, but I'd like to know if it's a
good idea or not before going ahead and ripping apart 'system.scm'.

Regards, and TYIA for reviewing this.
Jakob

Jakob L. Kreuze (3):
  guix system: Add 'reconfigure' module.
  guix system: Reimplement 'reconfigure'.
  tests: Add reconfigure system test.

 Makefile.am                         |   1 +
 gnu/local.mk                        |   1 +
 gnu/machine/ssh.scm                 | 266 ++++++++++-----------------
 gnu/services/herd.scm               |   6 +
 gnu/tests/reconfigure.scm           | 268 ++++++++++++++++++++++++++++
 guix/scripts/system.scm             | 152 +++++-----------
 guix/scripts/system/reconfigure.scm | 122 +++++++++++++
 tests/services.scm                  |   4 -
 8 files changed, 538 insertions(+), 282 deletions(-)
 create mode 100644 gnu/tests/reconfigure.scm
 create mode 100644 guix/scripts/system/reconfigure.scm

Comments

Jakob L. Kreuze July 18, 2019, 10:50 p.m. UTC | #1
Hello to anyone reviewing this patch,

I probably should've held off on sending this reroll out. After taking
some more time to experiment with possible solutions, I was able to
figure most of this out. Comments would still be appreciated, but the
points I specifically asked for comments on no longer need special
treatment. Also, if you haven't already started reviewing this, v4 will
likely hit the mailing list tomorrow; everything's there, it just needs
to be cleaned up.

zerodaysfordays@sdf.lonestar.org (Jakob L. Kreuze) writes:

> I still need to handle failed deployments in 'guix deploy'. I suspect
> that, for now, it would make sense to implement remote roll-backs and
> just roll-back the system on failure, at least until we've have some
> dialog about the proper way to do atomic deployments.

Well, except for this. I'll submit a separate patch series addressing
this.

> My biggest concern at the moment is error handling reporting in the
> new 'guix system reconfigure'. I'd like to emulate what was done with
> the previous version, but I'm at somewhat of a loss for how I'd go
> about that, since the error reporting was mixed with the
> reconfiguration code. So I'd like to ask for some suggestions: is the
> best way to catch errors in '%store-monad' to do what
> 'with-shepherd-error-handling' does, and then 'leave' on failure?
>
> Ludovic suggested guarding against 'message-condition' and having the
> expression I send to 'remote-eval' return either ('error message) or
> ('success). Would it make sense to just do this in all of the
> reconfiguration procedures? Or is raising exceptions in the
> reconfiguration procedures and catching them in the scripts' code the
> way to go?

Comments, if anyone has them, would be appreciated, but I feel that I'm
in a good spot in terms of error handling now.

> There's also a slight bug in the new 'guix system reconfigure' that
> I'll need to figure out. At the moment, it installs a bootloader entry
> for all but the newest generation.

It wasn't actually a bug, I was misinterpreting the intended behavior of
'guix system reconfigure'. :)

> Oh, how naïve I was four days ago. This reroll doesn't address this.
> Having the procedures "parameterized by an evaluation procedure" can
> be done in so many ways, and I think it would be best I put some
> serious thought into which of those ways would be the best. A
> 'local-eval' would clearly be much better than what I'm doing at the
> present in 'system.scm', but the solution I came up with today
> involved three layers of 'primitive-load', which I doubt is the way to
> go about it. I had the idea to parameterize on a procedure that takes
> a '<program-file>' rather than a G-Expression as I was making dinner
> tonight, which seems to me like a sound idea, but we'll see if it
> works tomorrow when I try to implement it.

Actually, a more generalized 'eval' (taking a G-Expression) was the
better way to go: it allowed me to simplify the interface to the
reconfiguration procedures even further. And, thanks to Ludovic's recent
patches with 'lower-gexp', I was able to collapse the Russian nesting
doll of 'primitive-load' calls.

> Also, it hit me today that the safety checks done in 'guix system
> reconfigure' -- 'check-mapped-devices',
> 'check-file-system-availability', and 'check-initrd-modules' -- should
> also be done in 'guix deploy'. It might make sense for me to submit that
> change as a separate patch series so the code review for this doesn't
> get too complicated, but since we're on the topic of unifying the code
> between 'guix deploy' and 'guix system reconfigure', should I perhaps
> reimplement those procedures as '<program-file>' objects like everything
> else in '(guix scripts system reconfigure)'? They aren't really
> effectful, but they concern system reconfiguration.

Again, separate patch series.

> And, on the same note, should I go ahead and refactor the rest of the
> reconfiguration code in 'system.scm' out into '(guix scripts system
> reconfigure)'? I mean, this will probably be a separate patch series for
> the same reason that the safety checks would be a separate patch series,
> and I'll likely do this _after_ I come up with a decent way to
> parameterize on an evaluation procedure, but I'd like to know if it's a
> good idea or not before going ahead and ripping apart 'system.scm'.

I'd still like comments on this, though.

Regards,
Jakob
Jakob L. Kreuze July 19, 2019, 5:54 p.m. UTC | #2
This addresses nearly everything I mentioned in my v3 cover letter;
we're now parameterizing on an 'eval' procedure and we've got error
handling where it counts.

Happy Friday!

Jakob L. Kreuze (3):
  guix system: Add 'reconfigure' module.
  guix system: Reimplement 'reconfigure'.
  tests: Add reconfigure system test.

 Makefile.am                         |   1 +
 gnu/local.mk                        |   1 +
 gnu/machine/ssh.scm                 | 189 ++------------------
 gnu/services/herd.scm               |   6 +
 gnu/tests/reconfigure.scm           | 263 ++++++++++++++++++++++++++++
 guix/scripts/system.scm             | 182 +++++--------------
 guix/scripts/system/reconfigure.scm | 241 +++++++++++++++++++++++++
 tests/services.scm                  |   4 -
 8 files changed, 563 insertions(+), 324 deletions(-)
 create mode 100644 gnu/tests/reconfigure.scm
 create mode 100644 guix/scripts/system/reconfigure.scm
Christine Lemmer-Webber July 19, 2019, 7:36 p.m. UTC | #3
Jakob L. Kreuze writes:

> Hello to anyone reviewing this patch,
>
> I probably should've held off on sending this reroll out. After taking
> some more time to experiment with possible solutions, I was able to
> figure most of this out. Comments would still be appreciated, but the
> points I specifically asked for comments on no longer need special
> treatment. Also, if you haven't already started reviewing this, v4 will
> likely hit the mailing list tomorrow; everything's there, it just needs
> to be cleaned up.
>
> zerodaysfordays@sdf.lonestar.org (Jakob L. Kreuze) writes:
>
>> I still need to handle failed deployments in 'guix deploy'. I suspect
>> that, for now, it would make sense to implement remote roll-backs and
>> just roll-back the system on failure, at least until we've have some
>> dialog about the proper way to do atomic deployments.
>
> Well, except for this. I'll submit a separate patch series addressing
> this.

I think that's fine to do in a separate series, and a good idea too.

>> My biggest concern at the moment is error handling reporting in the
>> new 'guix system reconfigure'. I'd like to emulate what was done with
>> the previous version, but I'm at somewhat of a loss for how I'd go
>> about that, since the error reporting was mixed with the
>> reconfiguration code. So I'd like to ask for some suggestions: is the
>> best way to catch errors in '%store-monad' to do what
>> 'with-shepherd-error-handling' does, and then 'leave' on failure?
>>
>> Ludovic suggested guarding against 'message-condition' and having the
>> expression I send to 'remote-eval' return either ('error message) or
>> ('success). Would it make sense to just do this in all of the
>> reconfiguration procedures? Or is raising exceptions in the
>> reconfiguration procedures and catching them in the scripts' code the
>> way to go?
>
> Comments, if anyone has them, would be appreciated, but I feel that I'm
> in a good spot in terms of error handling now.

Or even:

  ('error <error-type-symbol> "error message here")

(I suppose in case of success, a value would never be returned?)

>> There's also a slight bug in the new 'guix system reconfigure' that
>> I'll need to figure out. At the moment, it installs a bootloader entry
>> for all but the newest generation.
>
> It wasn't actually a bug, I was misinterpreting the intended behavior of
> 'guix system reconfigure'. :)

Heh :)

>> Oh, how naïve I was four days ago. This reroll doesn't address this.
>> Having the procedures "parameterized by an evaluation procedure" can
>> be done in so many ways, and I think it would be best I put some
>> serious thought into which of those ways would be the best. A
>> 'local-eval' would clearly be much better than what I'm doing at the
>> present in 'system.scm', but the solution I came up with today
>> involved three layers of 'primitive-load', which I doubt is the way to
>> go about it. I had the idea to parameterize on a procedure that takes
>> a '<program-file>' rather than a G-Expression as I was making dinner
>> tonight, which seems to me like a sound idea, but we'll see if it
>> works tomorrow when I try to implement it.
>
> Actually, a more generalized 'eval' (taking a G-Expression) was the
> better way to go: it allowed me to simplify the interface to the
> reconfiguration procedures even further. And, thanks to Ludovic's recent
> patches with 'lower-gexp', I was able to collapse the Russian nesting
> doll of 'primitive-load' calls.

Yay!  Generalization!

>> Also, it hit me today that the safety checks done in 'guix system
>> reconfigure' -- 'check-mapped-devices',
>> 'check-file-system-availability', and 'check-initrd-modules' -- should
>> also be done in 'guix deploy'. It might make sense for me to submit that
>> change as a separate patch series so the code review for this doesn't
>> get too complicated, but since we're on the topic of unifying the code
>> between 'guix deploy' and 'guix system reconfigure', should I perhaps
>> reimplement those procedures as '<program-file>' objects like everything
>> else in '(guix scripts system reconfigure)'? They aren't really
>> effectful, but they concern system reconfiguration.
>
> Again, separate patch series.

Yes, please do.

My main worry is that such a patch series may be forgotten.  Would it be
inappropriate to make a "stub" patch issue for both of the followup
patch series, since both seem important and we don't want to forget them?

>> And, on the same note, should I go ahead and refactor the rest of the
>> reconfiguration code in 'system.scm' out into '(guix scripts system
>> reconfigure)'? I mean, this will probably be a separate patch series for
>> the same reason that the safety checks would be a separate patch series,
>> and I'll likely do this _after_ I come up with a decent way to
>> parameterize on an evaluation procedure, but I'd like to know if it's a
>> good idea or not before going ahead and ripping apart 'system.scm'.
>
> I'd still like comments on this, though.

I guess see above.

But I think we shouldn't wait, since I'd like to keep the energy up and
get this merged in.
 - Chris
Jakob L. Kreuze July 22, 2019, 4:18 p.m. UTC | #4
Hey, Chris!

Christopher Lemmer Webber <cwebber@dustycloud.org> writes:

> My main worry is that such a patch series may be forgotten. Would it
> be inappropriate to make a "stub" patch issue for both of the followup
> patch series, since both seem important and we don't want to forget
> them?

Alternatively, because these patches address existing issues with 'guix
deploy', should we open tickets on the issue tracker? I don't have too
much of a preference: either way should work fine for ensuring that we
don't forget about them.

Regards,
Jakob
Christine Lemmer-Webber July 22, 2019, 4:39 p.m. UTC | #5
Jakob L. Kreuze writes:

> Hey, Chris!
>
> Christopher Lemmer Webber <cwebber@dustycloud.org> writes:
>
>> My main worry is that such a patch series may be forgotten. Would it
>> be inappropriate to make a "stub" patch issue for both of the followup
>> patch series, since both seem important and we don't want to forget
>> them?
>
> Alternatively, because these patches address existing issues with 'guix
> deploy', should we open tickets on the issue tracker? I don't have too
> much of a preference: either way should work fine for ensuring that we
> don't forget about them.
>
> Regards,
> Jakob

That's a good call.  Yeah, I think put them there.