Message ID | 20210315151133.16282-1-mail@cbaines.net |
---|---|
State | Accepted |
Series | [bug#47160] scripts: substitute: Add back some error handling. |
Context | Check | Description |
---|---|---|
cbaines/comparison | success | View comparison |
cbaines/git branch | success | View Git branch |
cbaines/applying patch | success | View Laminar job |
cbaines/issue | success | View issue |
Hi,

Christopher Baines <mail@cbaines.net> skribis:

> In f50f5751fff4cfc6d5abba9681054569694b7a5c, the way fetch was called within
> process-substitution was changed. As with-cached-connection actually includes
> important error handling for the opening of a HTTP request (when using a
> cached connection), this change removed some error handling.
>
> This commit adds that error handling back,
> with-cached-connection/call-with-cached-connection is back, rebranded as
> call-with-fresh-connection-retry.
>
> * guix/scripts/substitute.scm (process-substitution): Retry once for some
> errors when making HTTP requests to fetch substitutes.

[...]

> +  (define (call-with-fresh-connection-retry uri proc)
> +    (define (get-port)
> +      (open-connection-for-uri/cached uri
> +                                      #:verify-certificate? #f))
> +
> +    (let ((port (get-port)))
> +      (catch #t
> +        (lambda ()
> +          (proc port))
> +        (lambda (key . args)
> +          ;; If PORT was cached and the server closed the connection in the
> +          ;; meantime, we get EPIPE. In that case, open a fresh connection
> +          ;; and retry. We might also get 'bad-response or a similar
> +          ;; exception from (web response) later on, once we've sent the
> +          ;; request, or a ERROR/INVALID-SESSION from GnuTLS.
> +          (if (or (and (eq? key 'system-error)
> +                       (= EPIPE (system-error-errno `(,key ,@args))))
> +                  (and (eq? key 'gnutls-error)
> +                       (eq? (first args) error/invalid-session))
> +                  (memq key '(bad-response bad-header bad-header-component)))
> +              (begin
> +                (close-port port) ; close the port to get a fresh one
> +                (proc (get-port)))
> +              (apply throw key args))))))

I think this should be at the top level for clarity. It used to have
‘cached’ in its name because catching all these exceptions is something
you wouldn’t normally do; it only makes sense in the context of cached
connections.

>    (define (fetch uri)
>      (case (uri-scheme uri)
>        ((file)
> @@ -424,11 +450,13 @@ the current output port."
>             (call-with-connection-error-handling
>              uri
>              (lambda ()
> -              (http-fetch uri #:text? #f
> -                          #:open-connection open-connection-for-uri/cached
> -                          #:keep-alive? #t
> -                          #:buffered? #f
> -                          #:verify-certificate? #f))))))
> +              (call-with-fresh-connection-retry
> +               uri
> +               (lambda (port)
> +                 (http-fetch uri #:text? #f
> +                             #:port port
> +                             #:keep-alive? #t
> +                             #:buffered? #f))))))))

Does ‘call-with-connection-error-handling’ still make sense here?
There’s already ‘with-networking’ at the top level to do proper
networking error reporting.

Regarding <https://issues.guix.gnu.org/47157>, I would lean towards
perhaps reverting the connection/error-handling patch series and
starting anew from that “known state”.

This area is unfortunately quite tedious to test and to get right so I’d
err on the path of conservative, incremental changes.

Thought?

Ludo’.
Ludovic Courtès <ludo@gnu.org> writes:

> Hi,
>
> Christopher Baines <mail@cbaines.net> skribis:
>
>> In f50f5751fff4cfc6d5abba9681054569694b7a5c, the way fetch was called within
>> process-substitution was changed. As with-cached-connection actually includes
>> important error handling for the opening of a HTTP request (when using a
>> cached connection), this change removed some error handling.
>>
>> This commit adds that error handling back,
>> with-cached-connection/call-with-cached-connection is back, rebranded as
>> call-with-fresh-connection-retry.
>>
>> * guix/scripts/substitute.scm (process-substitution): Retry once for some
>> errors when making HTTP requests to fetch substitutes.
>
> [...]
>
>> +  (define (call-with-fresh-connection-retry uri proc)
>> +    (define (get-port)
>> +      (open-connection-for-uri/cached uri
>> +                                      #:verify-certificate? #f))
>> +
>> +    (let ((port (get-port)))
>> +      (catch #t
>> +        (lambda ()
>> +          (proc port))
>> +        (lambda (key . args)
>> +          ;; If PORT was cached and the server closed the connection in the
>> +          ;; meantime, we get EPIPE. In that case, open a fresh connection
>> +          ;; and retry. We might also get 'bad-response or a similar
>> +          ;; exception from (web response) later on, once we've sent the
>> +          ;; request, or a ERROR/INVALID-SESSION from GnuTLS.
>> +          (if (or (and (eq? key 'system-error)
>> +                       (= EPIPE (system-error-errno `(,key ,@args))))
>> +                  (and (eq? key 'gnutls-error)
>> +                       (eq? (first args) error/invalid-session))
>> +                  (memq key '(bad-response bad-header bad-header-component)))
>> +              (begin
>> +                (close-port port) ; close the port to get a fresh one
>> +                (proc (get-port)))
>> +              (apply throw key args))))))
>
> I think this should be at the top level for clarity. It used to have
> ‘cached’ in its name because catching all these exceptions is something
> you wouldn’t normally do; it only makes sense in the context of cached
> connections.

I initially tried to just put the error handling in just where it's
needed, but that was difficult since the http-fetch bit needs to happen
again when there's a relevant error.

The two things: getting a port which maybe is a cached connection and
handling some errors plus potentially re-running proc is difficult to
capture in a name, but "call-with-cached-connection-and-error-handling"
is an improvement over "with-cached-connection" I think.

>>    (define (fetch uri)
>>      (case (uri-scheme uri)
>>        ((file)
>> @@ -424,11 +450,13 @@ the current output port."
>>             (call-with-connection-error-handling
>>              uri
>>              (lambda ()
>> -              (http-fetch uri #:text? #f
>> -                          #:open-connection open-connection-for-uri/cached
>> -                          #:keep-alive? #t
>> -                          #:buffered? #f
>> -                          #:verify-certificate? #f))))))
>> +              (call-with-fresh-connection-retry
>> +               uri
>> +               (lambda (port)
>> +                 (http-fetch uri #:text? #f
>> +                             #:port port
>> +                             #:keep-alive? #t
>> +                             #:buffered? #f))))))))
>
> Does ‘call-with-connection-error-handling’ still make sense here?
> There’s already ‘with-networking’ at the top level to do proper
> networking error reporting.

So, looking back, the call-with-connection-error-handling error handling
was related to (call-)with-cached-connection, but it was only relevant
inside of fetch-narinfos, as that's when open-connection-for-uri/maybe
was passed in to call-with-cached-connection.

Which means no, I think it can be removed, at least that's more
consistent with the older behaviour.

I'll send some updated patches.

> Regarding <https://issues.guix.gnu.org/47157>, I would lean towards
> perhaps reverting the connection/error-handling patch series and
> starting anew from that “known state”.
>
> This area is unfortunately quite tedious to test and to get right so I’d
> err on the path of conservative, incremental changes.
>
> Thought?

My preference is still to try and move forward and to make the error
handling easier to see in the code.

Particularly with this change, I think the problem was introduced in
this commit [1], but I think it's hard to tell from the diff, since the
error handling and retrying is within with-cached-connection.

1: https://git.savannah.gnu.org/cgit/guix.git/commit/?id=f50f5751fff4cfc6d5abba9681054569694b7a5c

That commit was one of the commits where I was making small incremental
changes prior to actually getting to the changes I was looking at
making, but a breakage was still introduced.

What I was thinking about with this patch was how to make the error
handling being added back here easier to see, and thus harder to
break/remove.
Christopher Baines <mail@cbaines.net> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:

[...]

>> Regarding <https://issues.guix.gnu.org/47157>, I would lean towards
>> perhaps reverting the connection/error-handling patch series and
>> starting anew from that “known state”.
>>
>> This area is unfortunately quite tedious to test and to get right so I’d
>> err on the path of conservative, incremental changes.
>>
>> Thought?
>
> My preference is still to try and move forward and to make the error
> handling easier to see in the code.
>
> Particularly with this change, I think the problem was introduced in
> this commit [1], but I think it's hard to tell from the diff, since the
> error handling and retrying is within with-cached-connection.
>
> 1: https://git.savannah.gnu.org/cgit/guix.git/commit/?id=f50f5751fff4cfc6d5abba9681054569694b7a5c
>
> That commit was one of the commits where I was making small incremental
> changes prior to actually getting to the changes I was looking at
> making, but a breakage was still introduced.
>
> What I was thinking about with this patch was how to make the error
> handling being added back here easier to see, and thus harder to
> break/remove.

OK.

Though I’m still unsure what the patch series starting at
7b812f7c84c43455cdd68a0e51b6ded018afcc8e was about. What was the end
goal?

I also wonder if it introduced other issues. For
example, 7b812f7c84c43455cdd68a0e51b6ded018afcc8e replaced a reference
to ‘open-connection-for-uri/cached’ by one to
‘open-connection-for-uri/maybe’. Are we still using cached
connections?

Commit f50f5751fff4cfc6d5abba9681054569694b7a5c no longer passes the
#:port parameter to ‘http-fetch’.

Commit 20c08a8a45d0f137ead7c05e720456b2aea44402 does other things but at
first sight I’m not sure what the effect is.

If you’re confident we can move forward to fix the bug, that’s great
(though we’ll need a good deal of testing), but I’d still like to
clarify these points later on.

Ludo’.
Ludovic Courtès <ludo@gnu.org> writes:

> Christopher Baines <mail@cbaines.net> skribis:
>
>> Ludovic Courtès <ludo@gnu.org> writes:
>
> [...]
>
>>> Regarding <https://issues.guix.gnu.org/47157>, I would lean towards
>>> perhaps reverting the connection/error-handling patch series and
>>> starting anew from that “known state”.
>>>
>>> This area is unfortunately quite tedious to test and to get right so I’d
>>> err on the path of conservative, incremental changes.
>>>
>>> Thought?
>>
>> My preference is still to try and move forward and to make the error
>> handling easier to see in the code.
>>
>> Particularly with this change, I think the problem was introduced in
>> this commit [1], but I think it's hard to tell from the diff, since the
>> error handling and retrying is within with-cached-connection.
>>
>> 1: https://git.savannah.gnu.org/cgit/guix.git/commit/?id=f50f5751fff4cfc6d5abba9681054569694b7a5c
>>
>> That commit was one of the commits where I was making small incremental
>> changes prior to actually getting to the changes I was looking at
>> making, but a breakage was still introduced.
>>
>> What I was thinking about with this patch was how to make the error
>> handling being added back here easier to see, and thus harder to
>> break/remove.
>
> OK.
>
> Though I’m still unsure what the patch series starting at
> 7b812f7c84c43455cdd68a0e51b6ded018afcc8e was about. What was the end
> goal?

So that was part of the creation of the (guix substitutes) module,
unpicking the code in the script to separate out some of the connection
caching was a prerequisite (discussion starts here
https://issues.guix.gnu.org/45409#5 ).

I think separating out that module is still a good thing. It's allowed
for improvements in guix, the weather script doesn't now call in to the
substitute script code for example. I'd also like the separation for
things like the Guix Build Coordinator, which currently attempts to use
the substitute code from Guix.

> I also wonder if it introduced other issues. For
> example, 7b812f7c84c43455cdd68a0e51b6ded018afcc8e replaced a reference
> to ‘open-connection-for-uri/cached’ by one to
> ‘open-connection-for-uri/maybe’. Are we still using cached
> connections?

At least on that commit, open-connection-for-uri/maybe calls
open-connection-for-uri/cached, so yes, still using cached connections.

> Commit f50f5751fff4cfc6d5abba9681054569694b7a5c no longer passes the
> #:port parameter to ‘http-fetch’.

Yeah, that change is sort of fine if you're just looking at how the
port/connection is handled, but that area is being fixed up here, and
because closing the port is something that happens, it's better to also
pass the port in.

> Commit 20c08a8a45d0f137ead7c05e720456b2aea44402 does other things but at
> first sight I’m not sure what the effect is.

So, open-connection-for-uri/maybe is like
open-connection-for-uri/cached, but it catches a couple of exceptions
relating to not being able to connect to a substitute server, it also
remembers about showing the messages.

The second commit here is changing that slightly, to not apply to
process-substitution, however I do think that code might have applied in
the past (as open-connection-for-uri/maybe was used I believe). But I
think you're right in saying there's probably some overlap between the
error handling here and done by with-networking.

> If you’re confident we can move forward to fix the bug, that’s great
> (though we’ll need a good deal of testing), but I’d still like to
> clarify these points later on.

Well, the changes I'm suggesting here seem reasonable to me. As for
testing, checking things basically work is easy enough, but I don't
currently have many ideas for how to test for when fetching things
doesn't go to plan (which can of course happen).
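One way the failure path could be exercised without a network is to make the connection opener a parameter and simulate a stale cached connection. The following is only a minimal Guile sketch under that assumption, not the patch's code: `call-with-connection-retry`, `fake-open`, and `flaky-fetch` are hypothetical names, the opener stands in for `open-connection-for-uri/cached`, and the GnuTLS clause is left out to avoid the dependency.

```scheme
(use-modules (ice-9 textual-ports))     ;for 'get-string-all'

;; Hypothetical helper: the same retry-once logic as in the patch, except
;; that OPEN is passed in so a test can substitute a fake opener.
(define (call-with-connection-retry open uri proc)
  (let ((port (open uri)))
    (catch #t
      (lambda ()
        (proc port))
      (lambda (key . args)
        (if (or (and (eq? key 'system-error)
                     (= EPIPE (system-error-errno (cons key args))))
                (memq key '(bad-response bad-header bad-header-component)))
            (begin
              (close-port port)         ;drop the stale connection
              (proc (open uri)))        ;retry exactly once
            (apply throw key args))))))

;; Fake opener: counts calls and returns an in-memory port.
(define opened 0)
(define (fake-open uri)
  (set! opened (+ opened 1))
  (open-input-string "nar contents"))

;; Fake fetch: the first call fails the way a write to a closed socket
;; would; later calls read the port normally.
(define calls 0)
(define (flaky-fetch port)
  (set! calls (+ calls 1))
  (if (= calls 1)
      (throw 'system-error "write" "~A"
             (list (strerror EPIPE)) (list EPIPE))
      (get-string-all port)))

(define result
  (call-with-connection-retry fake-open "http://example.org/nar"
                              flaky-fetch))
```

After this runs, `result` holds the string read on the second attempt, and `opened` and `calls` are both 2, which shows that the retry opened a fresh port. Any other exception key is rethrown without a retry. Whether injecting the opener like this fits the real code in guix/scripts/substitute.scm is a separate design question.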
Howdy!

Christopher Baines <mail@cbaines.net> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:

[...]

>> Though I’m still unsure what the patch series starting at
>> 7b812f7c84c43455cdd68a0e51b6ded018afcc8e was about. What was the end
>> goal?
>
> So that was part of the creation of the (guix substitutes) module,
> unpicking the code in the script to separate out some of the connection
> caching was a prerequisite (discussion starts here
> https://issues.guix.gnu.org/45409#5 ).
>
> I think separating out that module is still a good thing. It's allowed
> for improvements in guix, the weather script doesn't now call in to the
> substitute script code for example. I'd also like the separation for
> things like the Guix Build Coordinator, which currently attempts to use
> the substitute code from Guix.

Right, I agree this is a worthy goal. Untangling the stateful bits is
the hard part, as we see. :-)

>> I also wonder if it introduced other issues. For
>> example, 7b812f7c84c43455cdd68a0e51b6ded018afcc8e replaced a reference
>> to ‘open-connection-for-uri/cached’ by one to
>> ‘open-connection-for-uri/maybe’. Are we still using cached
>> connections?
>
> At least on that commit, open-connection-for-uri/maybe calls
> open-connection-for-uri/cached, so yes, still using cached connections.

OK.

>> Commit f50f5751fff4cfc6d5abba9681054569694b7a5c no longer passes the
>> #:port parameter to ‘http-fetch’.
>
> Yeah, that change is sort of fine if you're just looking at how the
> port/connection is handled, but that area is being fixed up here, and
> because closing the port is something that happens, it's better to also
> pass the port in.

OK.

>> Commit 20c08a8a45d0f137ead7c05e720456b2aea44402 does other things but at
>> first sight I’m not sure what the effect is.
>
> So, open-connection-for-uri/maybe is like
> open-connection-for-uri/cached, but it catches a couple of exceptions
> relating to not being able to connect to a substitute server, it also
> remembers about showing the messages.
>
> The second commit here is changing that slightly, to not apply to
> process-substitution, however I do think that code might have applied in
> the past (as open-connection-for-uri/maybe was used I believe). But I
> think you're right in saying there's probably some overlap between the
> error handling here and done by with-networking.

Alright.

>> If you’re confident we can move forward to fix the bug, that’s great
>> (though we’ll need a good deal of testing), but I’d still like to
>> clarify these points later on.
>
> Well, the changes I'm suggesting here seem reasonable to me. As for
> testing, checking things basically work is easy enough, but I don't
> currently have many ideas for how to test for when fetching things
> doesn't go to plan (which can of course happen).

I’ll do some testing of v2 on my end and report back.

Thanks for the explanations!

Ludo’.
diff --git a/guix/scripts/substitute.scm b/guix/scripts/substitute.scm
index 6892aa999b..2c9b45023f 100755
--- a/guix/scripts/substitute.scm
+++ b/guix/scripts/substitute.scm
@@ -45,6 +45,7 @@
                 #:select (uri-abbreviation nar-uri-abbreviation
                           (open-connection-for-uri
                            . guix:open-connection-for-uri)))
+  #:autoload (gnutls) (error/invalid-session)
   #:use-module (guix progress)
   #:use-module ((guix build syscalls)
                 #:select (set-thread-name))
@@ -401,6 +402,31 @@ the current output port."
     (apply dump-file/deduplicate
            (append args (list #:store (%store-prefix)))))
 
+  (define (call-with-fresh-connection-retry uri proc)
+    (define (get-port)
+      (open-connection-for-uri/cached uri
+                                      #:verify-certificate? #f))
+
+    (let ((port (get-port)))
+      (catch #t
+        (lambda ()
+          (proc port))
+        (lambda (key . args)
+          ;; If PORT was cached and the server closed the connection in the
+          ;; meantime, we get EPIPE. In that case, open a fresh connection
+          ;; and retry. We might also get 'bad-response or a similar
+          ;; exception from (web response) later on, once we've sent the
+          ;; request, or a ERROR/INVALID-SESSION from GnuTLS.
+          (if (or (and (eq? key 'system-error)
+                       (= EPIPE (system-error-errno `(,key ,@args))))
+                  (and (eq? key 'gnutls-error)
+                       (eq? (first args) error/invalid-session))
+                  (memq key '(bad-response bad-header bad-header-component)))
+              (begin
+                (close-port port) ; close the port to get a fresh one
+                (proc (get-port)))
+              (apply throw key args))))))
+
   (define (fetch uri)
     (case (uri-scheme uri)
       ((file)
@@ -424,11 +450,13 @@ the current output port."
            (call-with-connection-error-handling
             uri
             (lambda ()
-              (http-fetch uri #:text? #f
-                          #:open-connection open-connection-for-uri/cached
-                          #:keep-alive? #t
-                          #:buffered? #f
-                          #:verify-certificate? #f))))))
+              (call-with-fresh-connection-retry
+               uri
+               (lambda (port)
+                 (http-fetch uri #:text? #f
+                             #:port port
+                             #:keep-alive? #t
+                             #:buffered? #f))))))))
       (else
        (leave (G_ "unsupported substitute URI scheme: ~a~%")
               (uri->string uri)))))