diff mbox series

[bug#54723] Check URI when verifying narinfo validity.

Message ID 87sfqfc0fq.fsf@kitej
State Accepted
Headers show
Series [bug#54723] Check URI when verifying narinfo validity. | expand

Checks

Context Check Description
cbaines/comparison success View comparision
cbaines/git branch success View Git branch
cbaines/applying patch success View Laminar job
cbaines/issue success View issue

Commit Message

Guillaume Le Vaillant April 14, 2022, 12:18 p.m. UTC
Guillaume Le Vaillant <glv@posteo.net> skribis:

> Ludovic Courtès <ludo@gnu.org> skribis:
>
>> Woow.  How do you build and run ‘guix publish’?  Is it a distro package
>> or is it coming straight from Guix?  What command-line options are you
>> passing?
>>
>> I’ve never seen this, although we have it running on several servers,
>> notably ci.guix.  I wonder what could cause this.
>>
>> Thanks,
>> Ludo’.
>
> I'm using guix-publish-service-type in the operating-system definition
> to manage the "guix publish" server, using the on-the-fly mode and
> fast Zstandard compression:
>
> (service guix-publish-service-type
>          (guix-publish-configuration
>           (host "0.0.0.0")
>           (port 8080)
>           (compression '(("zstd" 3)))
>           (advertise? #t)))
>
>
> When booting the machine, shepherd starts the server with the following
> command-line options:
>
> /gnu/store/059svbd32i4s0l9s5i7z0krcnl666bjy-guix-1.3.0-24.2fb4304/libexec/guix/guile \ /gnu/store/059svbd32i4s0l9s5i7z0krcnl666bjy-guix-1.3.0-24.2fb4304/bin/guix publish -u guix-publish -p 8080 -C zstd:3 --nar-path=nar --listen=0.0.0.0 --advertise
>
> There's another report about this at <https://issues.guix.gnu.org/53668>
> I had forgotten about, where Simon Streit and Maxim Cournoyer indicate
> that they have seen this issue too.

I tested a few things, and it looks like there is a concurrency issue in
the 'http-write' function of the publish server.
After removing the 'call-with-new-thread' in the part of the function
sending narinfos, I can't reproduce the "broken pipe" error anymore.

--8<---------------cut here---------------start------------->8---
--8<---------------cut here---------------end--------------->8---

Comments

Ludovic Courtès April 18, 2022, 7:39 p.m. UTC | #1
Hi!

Guillaume Le Vaillant <glv@posteo.net> skribis:

> I tested a few things, and it looks like there is a concurrency issue in
> the 'http-write' function of the publish server.
> After removing the 'call-with-new-thread' in the part of the function
> sending narinfos, I can't reproduce the "broken pipe" error anymore.

Ah ha!  Little has changed in this code, so I think the culprit must be
f743f2046be2c5a338ab871ae8666d8f6de7440b.

Could you try running it from master with this patch reverted?

You can spawn it from the build tree with:

  sudo -E ./pre-inst-env guix publish -u nobody

Bonus points if you have a script to reproduce the issue.  I guess that
involves making pipelined keep-alive narinfo requests the way ‘guix
substitute’ and ‘guix weather’ do.

Thanks,
Ludo’.
Guillaume Le Vaillant April 20, 2022, 2:10 p.m. UTC | #2
Ludovic Courtès <ludo@gnu.org> skribis:

> Hi!
>
> Guillaume Le Vaillant <glv@posteo.net> skribis:
>
>> I tested a few things, and it looks like there is a concurrency issue in
>> the 'http-write' function of the publish server.
>> After removing the 'call-with-new-thread' in the part of the function
>> sending narinfos, I can't reproduce the "broken pipe" error anymore.
>
> Ah ha!  Little has changed in this code, so I think the culprit must be
> f743f2046be2c5a338ab871ae8666d8f6de7440b.
>
> Could you try running it from master with this patch reverted?
>
> You can spawn it from the build tree with:
>
>   sudo -E ./pre-inst-env guix publish -u nobody
>
> Bonus points if you have a script to reproduce the issue.  I guess that
> involves making pipelined keep-alive narinfo requests the way ‘guix
> substitute’ and ‘guix weather’ do.
>
> Thanks,
> Ludo’.

Hi,

With master at 101edbe63a887678722bc26cd85a7b7f5637428f, I reproduce the
issue very often when trying to upgrade a profile with around 400
packages. The logs of the publish server show hundreds of narinfo
requests and tens of "broken pipe" errors.

With an additional commit reverting
f743f2046be2c5a338ab871ae8666d8f6de7440b, I can't reproduce the issue
anymore. The logs still show hundreds of narinfo requests, but no
errors.


I tried to make a script making many narinfo requests to reproduce the
issue more easily, but I didn't get it to work yet. I thought accessing
the 'fetch-narinfos' function from "guix/substitutes.scm" with:

--8<---------------cut here---------------start------------->8---
(use-modules (guix)
             (guix substitutes))

(define fetch-narinfos (@@ (guix substitutes) fetch-narinfos))
--8<---------------cut here---------------end--------------->8---

would work, but it doesn't:

--8<---------------cut here---------------start------------->8---
$ guix repl fetch-narinfos.scm
Backtrace:
           4 (primitive-load "/home/guillaume/fetch-narinfos.scm")
In ice-9/eval.scm:
   626:19  3 (_ #<directory (guix-user) 7faab8e0e8c0>)
   223:20  2 (proc #<directory (guix-user) 7faab8e0e8c0>)
In unknown file:
           1 (%resolve-variable (5 (guix substitutes) # . #f) #<dire…>)
In ice-9/boot-9.scm:
  1685:16  0 (raise-exception _ #:continuable? _)

ice-9/boot-9.scm:1685:16: In procedure raise-exception:
erreur : fetch-narinfos : variable non liée
--8<---------------cut here---------------end--------------->8---
Ludovic Courtès April 29, 2022, 4:20 p.m. UTC | #3
Hi,

(+Cc: Mathieu.)

Guillaume Le Vaillant <glv@posteo.net> skribis:

> With master at 101edbe63a887678722bc26cd85a7b7f5637428f, I reproduce the
> issue very often when trying to upgrade a profile with around 400
> packages. The logs of the publish server show hundreds of narinfo
> requests and tens of "broken pipe" errors.
>
> With an additional commit reverting
> f743f2046be2c5a338ab871ae8666d8f6de7440b, I can't reproduce the issue
> anymore. The logs still show hundreds of narinfo requests, but no
> errors.

For the record, a simple way to reproduce it is to start ‘guix publish’:

  sudo -E ./pre-inst-env guix publish -u nobody -p 9999

and to spawn ‘guix weather’, for example with:

  guix weather $(guix package -I. -p ~/.guix-home/profile |cut -f1) \
    --substitute-urls=http://localhost:9999

which crashes with the dreaded backtrace:

--8<---------------cut here---------------start------------->8---
Backtrace:
          11 (primitive-load "/home/ludo/.config/guix/current/bin/guix")
In guix/ui.scm:
   2230:7 10 (run-guix . _)
  2193:10  9 (run-guix-command _ . _)
In ice-9/boot-9.scm:
  1752:10  8 (with-exception-handler _ _ #:unwind? _ #:unwind-for-type _)
In guix/scripts/weather.scm:
    595:9  7 (_)
In guix/build/utils.scm:
   677:23  6 (every* #<procedure 7f4af9621de0 at guix/scripts/weather.scm:595:17 (server)> _)
In guix/scripts/weather.scm:
   597:21  5 (_ "http://localhost:9999")
   120:17  4 (report-server-coverage _ ("/gnu/store/428bzp9325mfapyr4ywzwsz4ic7ssx2b-shepherd-0.9.0" "/gnu/store/sll9nfmqk8lkrraqbkyp3y…" …) …)
In unknown file:
           3 (_ #<procedure 7f4af896ca40 at guix/scripts/weather.scm:201:2 ()> #<procedure list _> . #w())
In guix/substitutes.scm:
   322:31  2 (lookup-narinfos "http://localhost:9999" _ #:open-connection _ #:make-progress-reporter _)
   245:26  1 (fetch-narinfos _ _ #:open-connection _ #:make-progress-reporter _)
In ice-9/boot-9.scm:
  1685:16  0 (raise-exception _ #:continuable? _)

ice-9/boot-9.scm:1685:16: In procedure raise-exception:
Wrong type (expecting exact integer): #f
--8<---------------cut here---------------end--------------->8---

I looked more closely at f743f2046be2c5a338ab871ae8666d8f6de7440b and I
think it had a logical flaw: if you pipeline 100 GET requests, then,
with this commit, ‘guix publish’ would spawn 100 threads that would all
reply concurrently (more or less).  This is clearly wrong: replies
should be sent sequentially.

So in commit c1719a0adf3fa7611b56ca4d75b3ac8cf5c9c8ac I went ahead and
reverted it that commit.  I also added a test that reproduces the issue
above.


Now, commit f743f2046be2c5a338ab871ae8666d8f6de7440b was itself an
attempt to fix a bug whereby ‘narinfo-string’ would take too long,
thereby preventing ‘guix publish’ from accepting connections (since it’s
single-threaded).

I think the only reasonable way to fix it is by using Fibers to make
‘guix publish’ concurrent (using a fiberized server like in Cuirass).
We should do that at some point.  That’ll also allow us to remove some
of the hacks we’ve accumulated.

Now, the only way ‘narinfo-string’ can take too long these days is (I
think) if the store GC lock is held (we should check that hypothesis,
but I believe that if the GC lock is held, then the ‘query-path-info’
RPC made from ‘narinfo-string’ might block until the lock is released).
The GC lock is no longer held for hours on berlin, so there’s less
pressure to address that.

To summarize: I think ‘guix publish’ is okay as-is but we should
fiberize it sometime.

Thoughts?

Ludo’.
diff mbox series

Patch

diff --git a/guix/scripts/publish.scm b/guix/scripts/publish.scm
index 870dfc11e9..e9fad5845a 100644
--- a/guix/scripts/publish.scm
+++ b/guix/scripts/publish.scm
@@ -1002,28 +1002,25 @@  (define compression
            (begin
              (when (keep-alive? response)
                (keep-alive client))
-             (call-with-new-thread
-              (lambda ()
-                (set-thread-name "publish narinfo")
-                (let* ((narinfo
-                        (with-store store
-                          (narinfo-string store (utf8->string body)
-                                          #:nar-path nar-path
-                                          #:compressions compressions)))
-                       (narinfo-bv (string->bytevector narinfo "UTF-8"))
-                       (narinfo-length
-                        (bytevector-length narinfo-bv))
-                       (response (write-response
-                                  (with-content-length response
-                                                       narinfo-length)
-                                  client))
-                       (output (response-port response)))
-                  (configure-socket client)
-                  (put-bytevector output narinfo-bv)
-                  (force-output output)
-                  (unless (keep-alive? response)
-                    (close-port output))
-                  (values)))))
+             (let* ((narinfo
+                     (with-store store
+                       (narinfo-string store (utf8->string body)
+                                       #:nar-path nar-path
+                                       #:compressions compressions)))
+                    (narinfo-bv (string->bytevector narinfo "UTF-8"))
+                    (narinfo-length
+                     (bytevector-length narinfo-bv))
+                    (response (write-response
+                               (with-content-length response
+                                                    narinfo-length)
+                               client))
+                    (output (response-port response)))
+               (configure-socket client)
+               (put-bytevector output narinfo-bv)
+               (force-output output)
+               (unless (keep-alive? response)
+                 (close-port output))
+               (values)))
            (%http-write server client response body))))
     (_
      (match (assoc-ref (response-headers response) 'x-raw-file)