diff mbox series

[bug#54723] Check URI when verifying narinfo validity.

Message ID 87a6czbzvh.fsf@kitej
State Accepted
Headers show
Series [bug#54723] Check URI when verifying narinfo validity. | expand

Commit Message

Guillaume Le Vaillant April 5, 2022, 9:58 a.m. UTC
When trying to upgrade a machine using a substitute server on the same
LAN, I get this crash a lot:

--8<---------------cut here---------------start------------->8---
# guix system reconfigure --substitute-urls="http://192.168.0.22:8080 https://ci.guix.gnu.org" /etc/guix/config.scm
substitute: mise à jour des substituts depuis « http://192.168.0.22:8080 »... 100.0 %
substitute: Backtrace:
substitute: In ice-9/boot-9.scm:
substitute:   1752:10 17 (with-exception-handler _ _ #:unwind? _ # _)
substitute: In unknown file:
substitute:           16 (apply-smob/0 #<thunk 7fe08afb72e0>)
substitute: In ice-9/boot-9.scm:
substitute:     724:2 15 (call-with-prompt _ _ #<procedure default-prompt-handle…>)
substitute: In ice-9/eval.scm:
substitute:     619:8 14 (_ #(#(#<directory (guile-user) 7fe08afbcc80>)))
substitute: In guix/ui.scm:
substitute:    2209:7 13 (run-guix . _)
substitute:   2172:10 12 (run-guix-command _ . _)
substitute: In ice-9/boot-9.scm:
substitute:   1752:10 11 (with-exception-handler _ _ #:unwind? _ # _)
substitute:   1752:10 10 (with-exception-handler _ _ #:unwind? _ # _)
substitute: In guix/scripts/substitute.scm:
substitute:    757:18  9 (_)
substitute:    348:26  8 (process-query #<output: file 4> _ #:cache-urls _ #:acl _)
substitute: In guix/substitutes.scm:
substitute:    369:45  7 (lookup-narinfos/diverse _ _ #<procedure 7fe088c9cbc0 …> …)
substitute: In unknown file:
substitute:            6 (filter #<procedure 7fe088c9cbc0 at guix/scripts/subst…> …)
substitute: In guix/narinfo.scm:
substitute:    215:32  5 (valid-narinfo? _ _ #:verbose? _)
substitute: In ice-9/boot-9.scm:
substitute:   1685:16  4 (raise-exception _ #:continuable? _)
substitute:   1685:16  3 (raise-exception _ #:continuable? _)
substitute:   1780:13  2 (_ #<&compound-exception components: (#<&assertion-fail…>)
substitute:   1685:16  1 (raise-exception _ #:continuable? _)
substitute:   1685:16  0 (raise-exception _ #:continuable? _)
substitute: 
substitute: ice-9/boot-9.scm:1685:16: In procedure raise-exception:
substitute: In procedure car: Wrong type argument in position 1 (expecting pair): ()
guix system: erreur : `/gnu/store/wgygsxcdy1z3pfvwhpgyl5vjp4xvwhhh-guix-1.3.0-23.a27e47f/bin/guix substitute' died unexpectedly
--8<---------------cut here---------------end--------------->8---

It looks like the 'narinfo-uri' field is an empty list instead of a list
of URIs. Is that supposed to be possible?
Does the the attached patch adding a check for the validity of
this field in the 'valid-narinfo?' function make sense?


The substitute server configuration is:

--8<---------------cut here---------------start------------->8---
(service guix-publish-service-type
         (guix-publish-configuration
          (host "0.0.0.0")
          (port 8080)
          (compression '(("zstd" 3)))
          (advertise? #t)))
--8<---------------cut here---------------end--------------->8---

Comments

Ludovic Courtès April 5, 2022, 5:08 p.m. UTC | #1
Hi,

Guillaume Le Vaillant <glv@posteo.net> skribis:

> When trying to upgrade a machine using a substitute server on the same
> LAN, I get this crash a lot:
>
> # guix system reconfigure --substitute-urls="http://192.168.0.22:8080 https://ci.guix.gnu.org" /etc/guix/config.scm
> substitute: mise à jour des substituts depuis « http://192.168.0.22:8080 »... 100.0 %

[...]

> It looks like the 'narinfo-uri' field is an empty list instead of a list
> of URIs. Is that supposed to be possible?

I don’t think so.  Could you grab a narinfo and share it?

  wget -qO - http://192.168.0.22:8080/HASH.narinfo

where HASH is the hash part of a store item.

What could happen though is a situation where ‘guix publish’ only offers
a compression method not supported by the client.  In that case,
‘narinfo-best-uri’ throws a match-error because ‘choices’ is the empty
list.  We should fix that.

> Does the the attached patch adding a check for the validity of
> this field in the 'valid-narinfo?' function make sense?

Maybe, but I’d like to make sure we understand the issue.

Thanks,
Ludo’.
Guillaume Le Vaillant April 5, 2022, 5:51 p.m. UTC | #2
Ludovic Courtès <ludo@gnu.org> skribis:

> Hi,
>
> [...]
>
> Could you grab a narinfo and share it?
>
>   wget -qO - http://192.168.0.22:8080/HASH.narinfo
>
> where HASH is the hash part of a store item.
>
> What could happen though is a situation where ‘guix publish’ only offers
> a compression method not supported by the client.  In that case,
> ‘narinfo-best-uri’ throws a match-error because ‘choices’ is the empty
> list.  We should fix that.

I tried downloading a few narinfos and they don't look broken (2 of them
in attachment).

However for one of them it looks like the guix-publish server got stuck
on the request for several minutes (the second attempt worked):

--8<---------------cut here---------------start------------->8---
wget http://192.168.0.22:8080/184b50qkcrkchc6dpwwcd7ypqb5yvrm3.narinfo
--2022-04-05 19:33:56--  http://192.168.0.22:8080/184b50qkcrkchc6dpwwcd7ypqb5yvrm3.narinfo
Connexion à 192.168.0.22:8080… connecté.
requête HTTP transmise, en attente de la réponse… Erreur de lecture (Connexion ré-initialisée par le correspondant) dans les en-têtes.
Nouvel essai.

--2022-04-05 19:36:40--  (essai :  2)  http://192.168.0.22:8080/184b50qkcrkchc6dpwwcd7ypqb5yvrm3.narinfo
Connexion à 192.168.0.22:8080… connecté.
requête HTTP transmise, en attente de la réponse… 200 OK
Taille : 7428 (7,3K) [application/x-nix-narinfo]
Sauvegarde en : « 184b50qkcrkchc6dpwwcd7ypqb5yvrm3.narinfo »

184b50qkcrkchc6dpwwcd7ypqb5yvrm3.narinfo        100%[=====================================================================================================>]   7,25K  --.-KB/s    ds 0,02s   

2022-04-05 19:36:40 (391 KB/s) — « 184b50qkcrkchc6dpwwcd7ypqb5yvrm3.narinfo » sauvegardé [7428/7428]
--8<---------------cut here---------------end--------------->8---

Could that be the cause of the issue?
Ludovic Courtès April 9, 2022, 8:32 p.m. UTC | #3
Hi,

Guillaume Le Vaillant <glv@posteo.net> skribis:

> However for one of them it looks like the guix-publish server got stuck
> on the request for several minutes (the second attempt worked):
>
> wget http://192.168.0.22:8080/184b50qkcrkchc6dpwwcd7ypqb5yvrm3.narinfo
> --2022-04-05 19:33:56--  http://192.168.0.22:8080/184b50qkcrkchc6dpwwcd7ypqb5yvrm3.narinfo
> Connexion à 192.168.0.22:8080… connecté.
> requête HTTP transmise, en attente de la réponse… Erreur de lecture (Connexion ré-initialisée par le correspondant) dans les en-têtes.
> Nouvel essai.
>
> --2022-04-05 19:36:40--  (essai :  2)  http://192.168.0.22:8080/184b50qkcrkchc6dpwwcd7ypqb5yvrm3.narinfo
> Connexion à 192.168.0.22:8080… connecté.
> requête HTTP transmise, en attente de la réponse… 200 OK
> Taille : 7428 (7,3K) [application/x-nix-narinfo]
> Sauvegarde en : « 184b50qkcrkchc6dpwwcd7ypqb5yvrm3.narinfo »
>
> 184b50qkcrkchc6dpwwcd7ypqb5yvrm3.narinfo        100%[=====================================================================================================>]   7,25K  --.-KB/s    ds 0,02s   
>
> 2022-04-05 19:36:40 (391 KB/s) — « 184b50qkcrkchc6dpwwcd7ypqb5yvrm3.narinfo » sauvegardé [7428/7428]
>
> Could that be the cause of the issue?

Yes, it could be the issue.  Is there any clue as to why ‘guix publish’
hung up first?  Something in its log maybe?

Ludo’.
Guillaume Le Vaillant April 9, 2022, 9:06 p.m. UTC | #4
Ludovic Courtès <ludo@gnu.org> skribis:

> Hi,
>
> Guillaume Le Vaillant <glv@posteo.net> skribis:
>
>> However for one of them it looks like the guix-publish server got stuck
>> on the request for several minutes (the second attempt worked):
>>
>> wget http://192.168.0.22:8080/184b50qkcrkchc6dpwwcd7ypqb5yvrm3.narinfo
>> --2022-04-05 19:33:56--  http://192.168.0.22:8080/184b50qkcrkchc6dpwwcd7ypqb5yvrm3.narinfo
>> Connexion à 192.168.0.22:8080… connecté.
>> requête HTTP transmise, en attente de la réponse… Erreur de lecture (Connexion ré-initialisée par le correspondant) dans les en-têtes.
>> Nouvel essai.
>>
>> --2022-04-05 19:36:40--  (essai :  2)  http://192.168.0.22:8080/184b50qkcrkchc6dpwwcd7ypqb5yvrm3.narinfo
>> Connexion à 192.168.0.22:8080… connecté.
>> requête HTTP transmise, en attente de la réponse… 200 OK
>> Taille : 7428 (7,3K) [application/x-nix-narinfo]
>> Sauvegarde en : « 184b50qkcrkchc6dpwwcd7ypqb5yvrm3.narinfo »
>>
>> 184b50qkcrkchc6dpwwcd7ypqb5yvrm3.narinfo
>> 100%[=====================================================================================================>]
>> 7,25K --.-KB/s ds 0,02s
>>
>> 2022-04-05 19:36:40 (391 KB/s) — « 184b50qkcrkchc6dpwwcd7ypqb5yvrm3.narinfo » sauvegardé [7428/7428]
>>
>> Could that be the cause of the issue?
>
> Yes, it could be the issue.  Is there any clue as to why ‘guix publish’
> hung up first?  Something in its log maybe?
>
> Ludo’.

There are 2 errors that occur a lot in the guix-publish log files:

 - "In procedure fport_write: Broken pipe"
    It happens when trying to write to a socket apparently.

 - "In procedure sign: gcrypt: Cannot allocate memory"
    The machine has 64 GiB of RAM, of which at least 50 GiB is free, so
    gcrypt should have enough to make a signature...

Note: full log file with complete error messages in attachment.
Guillaume Le Vaillant April 11, 2022, 1:31 p.m. UTC | #5
Guillaume Le Vaillant <glv@posteo.net> skribis:

> There are 2 errors that occur a lot in the guix-publish log files:
>
>  - "In procedure fport_write: Broken pipe"
>     It happens when trying to write to a socket apparently.
>
>  - "In procedure sign: gcrypt: Cannot allocate memory"
>     The machine has 64 GiB of RAM, of which at least 50 GiB is free, so
>     gcrypt should have enough to make a signature...

I captured the network traffic between the "guix publish" server and
a "guix upgrade" client to see if the "broken pipe" errors could
come from real networking issues.
According to wireshark the problem doesn't come from there, the TCP
stream didn't have any error.

However, looking at the full TCP stream in wireshark I saw that the
"guix publish" server sends some bad narinfo responses.
Sometimes some parts of the response are missing (here, Signature
incomplete, URL and Compression fields missing):
--8<---------------cut here---------------start------------->8---
HTTP/1.1 200 OK
Content-Length: 959
Content-Type: application/x-nix-narinfo;charset=UTF-8

StorePath: /gnu/store/dxpaqmix7zixm8pwcvvmq8q969q50jpp-pngload-2.0.0-2.91f1d70-checkout
NarHash: sha256:0s94fdbrbqj12qvgyn2g4lfwvz7qhhzbclrpz5ni7adwxgrmvxl1
NarSize: 245224
References: 
Deriver: ybdimrfjs090kzmimf5j1x5hs8y4d24p-pngload-2.0.0-2.91f1d70-checkout.drv
Signature: 1;kitej;KHNpZ25hdHVyZSAKIChkYXRhIAogIChmbGFncyByZmM2OTc5KQogIChoYXNoIHNoYTI1NiAjNDY3NDk2RTJEOTZBMzc0QzFGN0M1MzJCNjc3MTM1NzVFOTkyRjQ0Qzc3MzQwRDUwQTcyRTkyMDJGRURDQkQxMyMpCiAgKQogKHNpZy12YWwgCiAgKGVjZHNhIAogICAociAjMDZEQTAwMkQyNjE3MEQ3ODVDNkM3NkMyMUEwM0UzNDlCMkUwMDc4MTUyQzFBQURFNjhFMEZGOUJDRkUyMUFDNSMpCiAgIChzICMwNjNDM0UyNjg2MEU2OTIzNDdEMjNGNTQ4RUM3RDJGRUZGQjc0Q0I4NjNEMjlDMUE3QjA4REFCQjEzQjZDRjAxIykKICAgKQogICkKIChwdWJsaWMta2V5IAogIC
--8<---------------cut here---------------end--------------->8---

Sometimes the response looks like almost complete garbage:
--8<---------------cut here---------------start------------->8---
HTTP/1.1 200 OK
Content-Length: 970
Content-Type: application/x-nix-narinfo;charsetcharsetHTTP/=UTF-8

1
1

1
.S
--8<---------------cut here---------------end--------------->8---

When the client receives these bad narinfos, it often makes it crash
with errors like:
 - Wrong type (expecting exact integer): #f
 - unmatched line "1\r"
 - Wrong type argument in position 1 (expecting pair): ()

So it looks like the broken pipe problem comes from the "guix publish"
server, or from Guile... And making the code reconstructing narinfos
from HTTP responses more robust in case of bad input would be useful.
Ludovic Courtès April 12, 2022, 7:47 a.m. UTC | #6
Hi,

Guillaume Le Vaillant <glv@posteo.net> skribis:

> However, looking at the full TCP stream in wireshark I saw that the
> "guix publish" server sends some bad narinfo responses.
> Sometimes some parts of the response are missing (here, Signature
> incomplete, URL and Compression fields missing):
>
> HTTP/1.1 200 OK
> Content-Length: 959
> Content-Type: application/x-nix-narinfo;charset=UTF-8
>
> StorePath: /gnu/store/dxpaqmix7zixm8pwcvvmq8q969q50jpp-pngload-2.0.0-2.91f1d70-checkout
> NarHash: sha256:0s94fdbrbqj12qvgyn2g4lfwvz7qhhzbclrpz5ni7adwxgrmvxl1
> NarSize: 245224
> References: 
> Deriver: ybdimrfjs090kzmimf5j1x5hs8y4d24p-pngload-2.0.0-2.91f1d70-checkout.drv
> Signature: 1;kitej;KHNpZ25hdHVyZSAKIChkYXRhIAogIChmbGFncyByZmM2OTc5KQogIChoYXNoIHNoYTI1NiAjNDY3NDk2RTJEOTZBMzc0QzFGN0M1MzJCNjc3MTM1NzVFOTkyRjQ0Qzc3MzQwRDUwQTcyRTkyMDJGRURDQkQxMyMpCiAgKQogKHNpZy12YWwgCiAgKGVjZHNhIAogICAociAjMDZEQTAwMkQyNjE3MEQ3ODVDNkM3NkMyMUEwM0UzNDlCMkUwMDc4MTUyQzFBQURFNjhFMEZGOUJDRkUyMUFDNSMpCiAgIChzICMwNjNDM0UyNjg2MEU2OTIzNDdEMjNGNTQ4RUM3RDJGRUZGQjc0Q0I4NjNEMjlDMUE3QjA4REFCQjEzQjZDRjAxIykKICAgKQogICkKIChwdWJsaWMta2V5IAogIC
>
>
> Sometimes the response looks like almost complete garbage:
>
> HTTP/1.1 200 OK
> Content-Length: 970
> Content-Type: application/x-nix-narinfo;charsetcharsetHTTP/=UTF-8
>
> 1
> 1
>
> 1
> .S
>
> When the client receives these bad narinfos, it often makes it crash
> with errors like:
>  - Wrong type (expecting exact integer): #f
>  - unmatched line "1\r"
>  - Wrong type argument in position 1 (expecting pair): ()

Woow.  How do you build and run ‘guix publish’?  Is it a distro package
or is it coming straight from Guix?  What command-line options are you
passing?

I’ve never seen this, although we have it running on several servers,
notably ci.guix.  I wonder what could cause this.

Thanks,
Ludo’.
Guillaume Le Vaillant April 12, 2022, 8:54 a.m. UTC | #7
Ludovic Courtès <ludo@gnu.org> skribis:

> Woow.  How do you build and run ‘guix publish’?  Is it a distro package
> or is it coming straight from Guix?  What command-line options are you
> passing?
>
> I’ve never seen this, although we have it running on several servers,
> notably ci.guix.  I wonder what could cause this.
>
> Thanks,
> Ludo’.

I'm using guix-publish-service-type in the operating-system definition
to manage the "guix publish" server, using the on-the-fly mode and
fast Zstandard compression:

--8<---------------cut here---------------start------------->8---
(service guix-publish-service-type
         (guix-publish-configuration
          (host "0.0.0.0")
          (port 8080)
          (compression '(("zstd" 3)))
          (advertise? #t)))
--8<---------------cut here---------------end--------------->8---

When booting the machine, shepherd starts the server with the following
command-line options:

--8<---------------cut here---------------start------------->8---
/gnu/store/059svbd32i4s0l9s5i7z0krcnl666bjy-guix-1.3.0-24.2fb4304/libexec/guix/guile \ /gnu/store/059svbd32i4s0l9s5i7z0krcnl666bjy-guix-1.3.0-24.2fb4304/bin/guix publish -u guix-publish -p 8080 -C zstd:3 --nar-path=nar --listen=0.0.0.0 --advertise
--8<---------------cut here---------------end--------------->8---

There's another report about this at <https://issues.guix.gnu.org/53668>
I had forgotten about, where Simon Streit and Maxim Cournoyer indicate
that they have seen this issue too.
diff mbox series

Patch

From 8d9a45b2f38809fb3acfacf6f83532b7b556e78c Mon Sep 17 00:00:00 2001
From: Guillaume Le Vaillant <glv@posteo.net>
Date: Tue, 5 Apr 2022 11:50:48 +0200
Subject: [PATCH] narinfo: Check URI when verifying narinfo validity.

* guix/narinfo.scm (valid-narinfo?): Check if the 'uri' field is valid.
---
 guix/narinfo.scm | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/guix/narinfo.scm b/guix/narinfo.scm
index 4fc550aa6c..466ce20deb 100644
--- a/guix/narinfo.scm
+++ b/guix/narinfo.scm
@@ -209,11 +209,13 @@  (define %mandatory-fields
 
 (define* (valid-narinfo? narinfo #:optional (acl (current-acl))
                          #:key verbose?)
-  "Return #t if NARINFO's signature is not valid."
+  "Return #t if NARINFO's signature is valid."
   (let ((hash      (narinfo-sha256 narinfo))
         (signature (narinfo-signature narinfo))
-        (uri       (uri->string (first (narinfo-uris narinfo)))))
-    (and hash signature
+        (uri       (if (null? (narinfo-uris narinfo))
+                       #f
+                       (uri->string (first (narinfo-uris narinfo))))))
+    (and hash signature uri
          (signature-case (signature hash acl)
            (valid-signature #t)
            (invalid-signature
-- 
2.35.1