
[bug#37083,0/1] (Help needed!) machine: Implement 'digital-ocean-environment-type'.

Message ID 87ftlxf6q3.fsf@sdf.lonestar.org

Message

Jakob L. Kreuze Aug. 19, 2019, 4:41 p.m. UTC
Hi all,

I've spent the past couple of days attempting to add rudimentary support
to 'guix deploy' for some more complicated use-cases. I think I've made
some decent progress, but I've reached a point where I'm having an issue
that's beyond my abilities.

'deploy-digital-ocean' gets to a point where there's a droplet running a
"bootstrap" configuration of the Guix System, but I can't keep an open
SSH channel for sending over the operating-system configuration
specified for the deployment.

sending 3 store items (0 MiB) to '167.71.253.223'...
;;; [2019/08/19 12:21:33.409456, 0] write_to_channel_port: [GSSH ERROR] Remote channel is closed: #<input-output: channel (open) d3b2e0>
Backtrace:
In ice-9/eval.scm:
    619:8 19 (_ #(#(#<directory (guile-user) e17140>)))
In guix/ui.scm:
  1692:12 18 (run-guix-command _ . _)
In guix/store.scm:
   623:10 17 (call-with-store _)
In srfi/srfi-1.scm:
    640:9 16 (for-each #<procedure 48d21c0 at guix/scripts/deploy.s…> …)
In guix/scripts/deploy.scm:
    96:20 15 (_ _)
In ice-9/boot-9.scm:
    829:9 14 (catch _ _ #<procedure 48d4980 at guix/scripts/deploy.…> …)
In guix/store.scm:
  1803:24 13 (run-with-store #<store-connection 256.99 43d6420> _ # _ …)
In unknown file:
          12 (_ #<procedure 48fe260 at ice-9/eval.scm:330:13 ()> #<…> …)
          11 (_ #<procedure 4975a20 at ice-9/eval.scm:330:13 ()> #<…> …)
          10 (_ #<procedure 4975840 at ice-9/eval.scm:330:13 ()> #<…> …)
In guix/monads.scm:
    482:9  9 (_ _)
In unknown file:
           8 (_ #<procedure 4975660 at ice-9/eval.scm:330:13 ()> #<…> …)
In guix/remote.scm:
   134:10  7 (_ _)
In guix/store.scm:
  1696:38  6 (_ #<store-connection 256.99 3606720>)
In guix/ssh.scm:
    358:4  5 (send-files #<store-connection 256.99 3606720> _ _ # _ # …)
In guix/store.scm:
  1568:12  4 (export-paths #<store-connection 256.99 3606720> _ #<i…> …)
  1548:22  3 (export-path #<store-connection 256.99 3606720> _ #<in…> …)
   697:13  2 (process-stderr _ _)
   660:10  1 (dump-port #<input-output: socket 15> #<input-output: …> …)
In unknown file:
           0 (put-bytevector #<input-output: channel (open) d3b2e0> # …)

ERROR: In procedure put-bytevector:
Throw to key `guile-ssh-error' with args `("write_to_channel_port" "Remote channel is closed" #<input-output: channel (open) d3b2e0> #f)'.

I can connect to the droplet over SSH, but manually deploying to the
droplet with 'managed-host-environment-type' fails with the same error.
I can still deploy to my various Guix QEMU guests using
'managed-host-environment-type' without fail, so this seems to be
specific to Digital Ocean droplets running this configuration:

(use-modules (gnu))
(use-service-modules networking ssh)

(operating-system
  (host-name "gnu-bootstrap")
  (timezone "Etc/UTC")
  (bootloader (bootloader-configuration
               (bootloader grub-bootloader)
               (target "/dev/vda")
               (terminal-outputs '(console))))
  (file-systems (cons (file-system
                        (mount-point "/")
                        (device "/dev/vda1")
                        (type "ext4"))
                      %base-file-systems))
  (services
   (append (list (static-networking-service "eth0" "~a"
                    #:netmask "~a"
                    #:gateway "~a"
                    #:name-servers '("84.200.69.80" "84.200.70.40"))
                 (service openssh-service-type
                          (openssh-configuration
                           (permit-root-login 'without-password))))
           %base-services)))

I suspect there may be an issue with the configuration of the bootstrap
system's SSH daemon, but the logs are devoid of anything particularly
telling. If anyone is willing to share their knowledge of SSH and
suggest what could be going wrong, I would greatly appreciate it.

Thank you,
Jakob

Jakob L. Kreuze (1):
  machine: Implement 'digital-ocean-environment-type'.

 doc/guix.texi                 |  21 +-
 gnu/local.mk                  |   1 +
 gnu/machine/digital-ocean.scm | 409 ++++++++++++++++++++++++++++++++++
 3 files changed, 428 insertions(+), 3 deletions(-)
 create mode 100644 gnu/machine/digital-ocean.scm

Comments

Ludovic Courtès Aug. 27, 2019, 10:38 a.m. UTC | #1
Hi Jakob,

Nice that you’re working on Digital Ocean support!

zerodaysfordays@sdf.lonestar.org (Jakob L. Kreuze) skribis:

> 'deploy-digital-ocean' gets to a point where there's a droplet running a
> "bootstrap" configuration of the Guix System, but I can't keep an open
> SSH channel for sending over the operating-system configuration
> specified for the deployment.

[...]

>   (services
>    (append (list (static-networking-service "eth0" "~a"
>                     #:netmask "~a"
>                     #:gateway "~a"
>                     #:name-servers '("84.200.69.80" "84.200.70.40"))
>                  (service openssh-service-type
>                           (openssh-configuration
>                            (permit-root-login 'without-password))))
>            %base-services)))

Could you add (log-level 'debug) to ‘openssh-configuration’, then run
‘guix deploy’ again, and finally grab the OpenSSH log from that machine?
That would allow us to see if there’s something wrong with SSH.
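For reference, a minimal sketch of that change applied to the OpenSSH service from the bootstrap configuration above. Only the log-level field is new ('debug is one of the level names sshd accepts); everything else is assumed to stay as posted.

```scheme
;; Sketch: the bootstrap system's OpenSSH service with sshd debug logging.
;; Assumes (use-service-modules ssh), as in the configuration above.
(service openssh-service-type
         (openssh-configuration
          (permit-root-login 'without-password)
          ;; Raise sshd's log level from the default 'info to 'debug so
          ;; the log records more detail about each connection.
          (log-level 'debug)))
```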

Hmm, now that I think about it, ‘send-files’ may be failing because the
(guix …) modules aren’t in GUILE_LOAD_PATH on the remote side.  On the
berlin build machines, we have this:

  (simple-service 'guile-load-path-in-global-env
                  session-environment-service-type
                  `(("GUILE_LOAD_PATH"
                     . "/run/current-system/profile/share/guile/site/2.2")
                    ("GUILE_LOAD_COMPILED_PATH"
                     . ,(string-append "/run/current-system/profile/lib/guile/2.2/site-ccache:"
                                       "/run/current-system/profile/share/guile/site/2.2"))))

It’s ridiculous that we have to do this, but that’s how it is.

Can you try that?
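As a sketch of where that service would go, assuming the bootstrap configuration posted earlier in the thread: it simply becomes one more entry in the list passed to ‘append’. The "~a" strings are the placeholders from that template, and ‘session-environment-service-type’ is assumed to be visible through (use-modules (gnu)).

```scheme
;; Sketch: the `services' field of the bootstrap configuration with the
;; Guile search-path service added.  The "~a" strings are the template's
;; placeholders, presumably filled in by the deployment code.
(services
 (append (list (static-networking-service "eth0" "~a"
                  #:netmask "~a"
                  #:gateway "~a"
                  #:name-servers '("84.200.69.80" "84.200.70.40"))
               (service openssh-service-type
                        (openssh-configuration
                         (permit-root-login 'without-password)))
               ;; Put the system profile's Guile modules on the search
               ;; path of SSH sessions, so that the remote side of
               ;; 'send-files' can presumably load the (guix …) modules.
               (simple-service 'guile-load-path-in-global-env
                               session-environment-service-type
                               `(("GUILE_LOAD_PATH"
                                  . "/run/current-system/profile/share/guile/site/2.2")
                                 ("GUILE_LOAD_COMPILED_PATH"
                                  . ,(string-append
                                      "/run/current-system/profile/lib/guile/2.2/site-ccache:"
                                      "/run/current-system/profile/share/guile/site/2.2")))))
         %base-services))
```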

HTH,
Ludo’.

Ludovic Courtès Sept. 4, 2019, 12:08 p.m. UTC | #2
Hi Jakob,

Did you have a chance to try this out?

Thanks,
Ludo’.

Jakob L. Kreuze Sept. 5, 2019, 2:15 p.m. UTC | #3
Hi Ludovic,

Ludovic Courtès <ludo@gnu.org> writes:

> Did you have a chance to try this out?

So sorry about this -- I've been busy moving in for fall semester and
the little bit of time I had to work on this was spent migrating the
code to the newer guile-json API. I will have some time this weekend to
see if it fixes the issue.

Regards,
Jakob

Jakob L. Kreuze Sept. 7, 2019, 8:10 p.m. UTC | #4
zerodaysfordays@sdf.lonestar.org (Jakob L. Kreuze) writes:

> So sorry about this -- I've been busy moving in for fall semester and
> the little bit of time I had to work on this was spent migrating the
> code to the newer guile-json API. I will have some time this weekend to
> see if it fixes the issue.

Indeed, it does :)  Now, to fix the other issues with this. I'm getting a
"more than one target service of type 'shepherd-root'" error, which is
unusual. I'll investigate further.

Regards,
Jakob

Ludovic Courtès Sept. 8, 2019, 7:37 p.m. UTC | #5
Hi Jakob,

zerodaysfordays@sdf.lonestar.org (Jakob L. Kreuze) skribis:

> Indeed, it does :)

Yay!

> Now, to fix the other issues with this. I'm getting a "more than one
> target service of type 'shepherd-root'" error, which is unusual. I'll
> investigate further.

Presumably there’s more than one service of type
‘shepherd-root-service-type’ in the ‘services’ field?  Let me know if I
can help.

Good luck with your other endeavors!

Thanks,
Ludo’.

Jakob L. Kreuze Sept. 21, 2019, 8:56 p.m. UTC | #6
Hey Ludovic,

Ludovic Courtès <ludo@gnu.org> writes:

> Presumably there’s more than one service of type
> ‘shepherd-root-service-type’ in the ‘services’ field? Let me know if I
> can help.

Sorry about how long this has been taking. I've been plugging away at it
on the weekends, but I've reached the point where I have to admit that
I'm stuck and I really need help if I'm ever going to finish this.

I have this procedure to create a static networking service for the
Digital Ocean droplet based on an API response:

(define (add-static-networking target network)
  "Return an <operating-system> based on TARGET with a static networking
configuration for the public IPv4 network described by the alist NETWORK."
  (operating-system
    (inherit (machine-operating-system target))
    (services (cons (static-networking-service "eth0"
                        (assoc-ref network "ip_address")
                        #:netmask (assoc-ref network "netmask")
                        #:gateway (assoc-ref network "gateway")
                        #:name-servers '("84.200.69.80" "84.200.70.40"))
                    (operating-system-services
                     (machine-operating-system target))))))

And when this operating system is deployed with the basic SSH
environment-type, I get the following backtrace:

Backtrace:
           6 (apply-smob/1 #<catch-closure 23ab600>)
In ice-9/boot-9.scm:
    705:2  5 (call-with-prompt _ _ #<procedure default-prompt-handle…>)
In ice-9/eval.scm:
    619:8  4 (_ #(#(#<directory (guile-user) 24a1140>)))
In guix/ui.scm:
  1692:12  3 (run-guix-command _ . _)
In guix/store.scm:
   623:10  2 (call-with-store _)
In srfi/srfi-1.scm:
    640:9  1 (for-each #<procedure 4fbf800 at guix/scripts/deploy.s…> …)
In guix/scripts/deploy.scm:
    96:20  0 (_ _)

guix/scripts/deploy.scm:96:20: Throw to key `srfi-34' with args `(#<condition %compound [service: #<<service> type: #<service-type openssh 4246960> value: #<<openssh-configuration> openssh: #<package openssh@8.0p1 gnu/packages/ssh.scm:165 3315210> pid-file: "/var/run/sshd.pid" port-number: 22 permit-root-login: #t allow-empty-passwords?: #f password-authentication?: #t public-key-authentication?: #t x11-forwarding?: #f allow-agent-forwarding?: #t allow-tcp-forwarding?: #t gateway-ports?: #f challenge-response-authentication?: #f use-pam?: #t print-last-log?: #t subsystems: (("sftp" "internal-sftp")) accepted-environment: () log-level: info extra-content: "" authorized-keys: () %auto-start?: #t>> target-type: #<service-type shepherd-root 2c4ac30> message: "more than one target service of type 'shepherd-root'"] 5579510>)'.

I have no idea where to begin with this. Why would the OpenSSH service
be giving me this "more than one target service of type 'shepherd-root'"
error?

Regards,
Jakob

Ludovic Courtès Sept. 23, 2019, 8:24 a.m. UTC | #7
Hi Jakob!

zerodaysfordays@sdf.lonestar.org (Jakob L. Kreuze) skribis:

> Ludovic Courtès <ludo@gnu.org> writes:
>
>> Presumably there’s more than one service of type
>> ‘shepherd-root-service-type’ in the ‘services’ field? Let me know if I
>> can help.
>
> Sorry about how long this has been taking. I've been plugging away at it
> on the weekends, but I've reached the point where I have to admit that
> I'm stuck and I really need help if I'm ever going to finish this.
>
> I have this procedure to create a static networking service for the
> Digital Ocean droplet based on an API response:
>
> (define (add-static-networking target network)
>   "Return an <operating-system> based on TARGET with a static networking
> configuration for the public IPv4 network described by the alist NETWORK."
>   (operating-system
>     (inherit (machine-operating-system target))
>     (services (cons (static-networking-service "eth0"
>                         (assoc-ref network "ip_address")
>                         #:netmask (assoc-ref network "netmask")
>                         #:gateway (assoc-ref network "gateway")
>                         #:name-servers '("84.200.69.80" "84.200.70.40"))
>                     (operating-system-services
>                      (machine-operating-system target))))))

Oooh, got it: right above, you should call
‘operating-system-user-services’, not ‘operating-system-services’.

The latter includes “essential” services like ‘etc’ and ‘shepherd-root’,
which is why we’d end up with two copies of each of these.

Admittedly quite error-prone!
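With that one-word change, the procedure might look like the following sketch. It keeps Jakob's alist keys ("ip_address", "netmask", "gateway") and name servers unchanged, and only binds the inherited operating system once via ‘machine-operating-system’ from (gnu machine); the actual patch may differ.

```scheme
(define (add-static-networking target network)
  "Return an <operating-system> based on TARGET with a static networking
configuration for the public IPv4 network described by the alist NETWORK."
  (let ((os (machine-operating-system target)))
    (operating-system
      (inherit os)
      ;; Only the user-specified services: 'operating-system-services'
      ;; would also return essential services such as 'etc' and
      ;; 'shepherd-root', which the new operating system adds again,
      ;; hence the "more than one target service" error.
      (services (cons (static-networking-service "eth0"
                          (assoc-ref network "ip_address")
                          #:netmask (assoc-ref network "netmask")
                          #:gateway (assoc-ref network "gateway")
                          #:name-servers '("84.200.69.80" "84.200.70.40"))
                      (operating-system-user-services os))))))
```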

Let me know if there are other stumbling blocks.  I look forward to
seeing Digital Ocean support in ‘guix deploy’!

Thanks,
Ludo’.

Jakob L. Kreuze Sept. 28, 2019, 8:46 p.m. UTC | #8
Ludovic Courtès <ludo@gnu.org> writes:

> Oooh, got it: right above, you should call
> ‘operating-system-user-services’, not ‘operating-system-services’.
>
> The latter includes “essential” services like ‘etc’ and ‘shepherd-root’,
> which is why we’d end up with two copies of each of these.
>
> Admittedly quite error-prone!

Ah, thank you. I feel like I've been bitten by that once before and just
forgot.

> Let me know if there are other stumbling blocks.  I look forward to
> seeing Digital Ocean support in ‘guix deploy’!

With that, I think we've got working support for Digital Ocean :)  Patch
to follow.

Regards,
Jakob