Message ID | 87sfqfc0fq.fsf@kitej |
---|---|
State | Accepted |
Headers | show |
Series | [bug#54723] Check URI when verifying narinfo validity. | expand |
Context | Check | Description |
---|---|---|
cbaines/comparison | success | View comparision |
cbaines/git branch | success | View Git branch |
cbaines/applying patch | success | View Laminar job |
cbaines/issue | success | View issue |
Hi! Guillaume Le Vaillant <glv@posteo.net> skribis: > I tested a few things, and it looks like there is a concurrency issue in > the 'http-write' function of the publish server. > After removing the 'call-with-new-thread' in the part of the function > sending narinfos, I can't reproduce the "broken pipe" error anymore. Ah ha! Little has changed in this code, so I think the culprit must be f743f2046be2c5a338ab871ae8666d8f6de7440b. Could you try running it from master with this patch reverted? You can spawn it from the build tree with: sudo -E ./pre-inst-env guix publish -u nobody Bonus points if you have a script to reproduce the issue. I guess that involves making pipelined keep-alive narinfo requests the way ‘guix substitute’ and ‘guix weather’ do. Thanks, Ludo’.
Ludovic Courtès <ludo@gnu.org> skribis: > Hi! > > Guillaume Le Vaillant <glv@posteo.net> skribis: > >> I tested a few things, and it looks like there is a concurrency issue in >> the 'http-write' function of the publish server. >> After removing the 'call-with-new-thread' in the part of the function >> sending narinfos, I can't reproduce the "broken pipe" error anymore. > > Ah ha! Little has changed in this code, so I think the culprit must be > f743f2046be2c5a338ab871ae8666d8f6de7440b. > > Could you try running it from master with this patch reverted? > > You can spawn it from the build tree with: > > sudo -E ./pre-inst-env guix publish -u nobody > > Bonus points if you have a script to reproduce the issue. I guess that > involves making pipelined keep-alive narinfo requests the way ‘guix > substitute’ and ‘guix weather’ do. > > Thanks, > Ludo’. Hi, With master at 101edbe63a887678722bc26cd85a7b7f5637428f, I reproduce the issue very often when trying to upgrade a profile with around 400 packages. The logs of the publish server show hundreds of narinfo requests and tens of "broken pipe" errors. With an additional commit reverting f743f2046be2c5a338ab871ae8666d8f6de7440b, I can't reproduce the issue anymore. The logs still show hundreds of narinfo requests, but no errors. I tried to make a script making many narinfo requests to reproduce the issue more easily, but I didn't get it to work yet. I thought accessing the 'fetch-narinfos' function from "guix/substitutes.scm" with: --8<---------------cut here---------------start------------->8--- (use-modules (guix) (guix substitutes)) (define fetch-narinfos (@@ (guix substitutes) fetch-narinfos)) --8<---------------cut here---------------end--------------->8--- would work, but it doesn't: --8<---------------cut here---------------start------------->8--- $ guix repl fetch-narinfos.scm Backtrace: 4 (primitive-load "/home/guillaume/fetch-narinfos.scm") In ice-9/eval.scm: 626:19 3 (_ #<directory (guix-user) 7faab8e0e8c0>) 223:20 2 (proc #<directory (guix-user) 7faab8e0e8c0>) In unknown file: 1 (%resolve-variable (5 (guix substitutes) # . #f) #<dire…>) In ice-9/boot-9.scm: 1685:16 0 (raise-exception _ #:continuable? _) ice-9/boot-9.scm:1685:16: In procedure raise-exception: erreur : fetch-narinfos : variable non liée --8<---------------cut here---------------end--------------->8---
Hi, (+Cc: Mathieu.) Guillaume Le Vaillant <glv@posteo.net> skribis: > With master at 101edbe63a887678722bc26cd85a7b7f5637428f, I reproduce the > issue very often when trying to upgrade a profile with around 400 > packages. The logs of the publish server show hundreds of narinfo > requests and tens of "broken pipe" errors. > > With an additional commit reverting > f743f2046be2c5a338ab871ae8666d8f6de7440b, I can't reproduce the issue > anymore. The logs still show hundreds of narinfo requests, but no > errors. For the record, a simple way to reproduce it is to start ‘guix publish’: sudo -E ./pre-inst-env guix publish -u nobody -p 9999 and to spawn ‘guix weather’, for example with: guix weather $(guix package -I. -p ~/.guix-home/profile |cut -f1) \ --substitute-urls=http://localhost:9999 which crashes with the dreaded backtrace: --8<---------------cut here---------------start------------->8--- Backtrace: 11 (primitive-load "/home/ludo/.config/guix/current/bin/guix") In guix/ui.scm: 2230:7 10 (run-guix . _) 2193:10 9 (run-guix-command _ . _) In ice-9/boot-9.scm: 1752:10 8 (with-exception-handler _ _ #:unwind? _ #:unwind-for-type _) In guix/scripts/weather.scm: 595:9 7 (_) In guix/build/utils.scm: 677:23 6 (every* #<procedure 7f4af9621de0 at guix/scripts/weather.scm:595:17 (server)> _) In guix/scripts/weather.scm: 597:21 5 (_ "http://localhost:9999") 120:17 4 (report-server-coverage _ ("/gnu/store/428bzp9325mfapyr4ywzwsz4ic7ssx2b-shepherd-0.9.0" "/gnu/store/sll9nfmqk8lkrraqbkyp3y…" …) …) In unknown file: 3 (_ #<procedure 7f4af896ca40 at guix/scripts/weather.scm:201:2 ()> #<procedure list _> . #w()) In guix/substitutes.scm: 322:31 2 (lookup-narinfos "http://localhost:9999" _ #:open-connection _ #:make-progress-reporter _) 245:26 1 (fetch-narinfos _ _ #:open-connection _ #:make-progress-reporter _) In ice-9/boot-9.scm: 1685:16 0 (raise-exception _ #:continuable? _) ice-9/boot-9.scm:1685:16: In procedure raise-exception: Wrong type (expecting exact integer): #f --8<---------------cut here---------------end--------------->8--- I looked more closely at f743f2046be2c5a338ab871ae8666d8f6de7440b and I think it had a logical flaw: if you pipeline 100 GET requests, then, with this commit, ‘guix publish’ would spawn 100 threads that would all reply concurrently (more or less). This is clearly wrong: replies should be sent sequentially. So in commit c1719a0adf3fa7611b56ca4d75b3ac8cf5c9c8ac I went ahead and reverted it that commit. I also added a test that reproduces the issue above. Now, commit f743f2046be2c5a338ab871ae8666d8f6de7440b was itself an attempt to fix a bug whereby ‘narinfo-string’ would take too long, thereby preventing ‘guix publish’ from accepting connections (since it’s single-threaded). I think the only reasonable way to fix it is by using Fibers to make ‘guix publish’ concurrent (using a fiberized server like in Cuirass). We should do that at some point. That’ll also allow us to remove some of the hacks we’ve accumulated. Now, the only way ‘narinfo-string’ can take too long these days is (I think) if the store GC lock is held (we should check that hypothesis, but I believe that if the GC lock is held, then the ‘query-path-info’ RPC made from ‘narinfo-string’ might block until the lock is released). The GC lock is no longer held for hours on berlin, so there’s less pressure to address that. To summarize: I think ‘guix publish’ is okay as-is but we should fiberize it sometime. Thoughts? Ludo’.
diff --git a/guix/scripts/publish.scm b/guix/scripts/publish.scm index 870dfc11e9..e9fad5845a 100644 --- a/guix/scripts/publish.scm +++ b/guix/scripts/publish.scm @@ -1002,28 +1002,25 @@ (define compression (begin (when (keep-alive? response) (keep-alive client)) - (call-with-new-thread - (lambda () - (set-thread-name "publish narinfo") - (let* ((narinfo - (with-store store - (narinfo-string store (utf8->string body) - #:nar-path nar-path - #:compressions compressions))) - (narinfo-bv (string->bytevector narinfo "UTF-8")) - (narinfo-length - (bytevector-length narinfo-bv)) - (response (write-response - (with-content-length response - narinfo-length) - client)) - (output (response-port response))) - (configure-socket client) - (put-bytevector output narinfo-bv) - (force-output output) - (unless (keep-alive? response) - (close-port output)) - (values))))) + (let* ((narinfo + (with-store store + (narinfo-string store (utf8->string body) + #:nar-path nar-path + #:compressions compressions))) + (narinfo-bv (string->bytevector narinfo "UTF-8")) + (narinfo-length + (bytevector-length narinfo-bv)) + (response (write-response + (with-content-length response + narinfo-length) + client)) + (output (response-port response))) + (configure-socket client) + (put-bytevector output narinfo-bv) + (force-output output) + (unless (keep-alive? response) + (close-port output)) + (values))) (%http-write server client response body)))) (_ (match (assoc-ref (response-headers response) 'x-raw-file)