Message ID | 20220303211444.19928-2-ludo@gnu.org |
---|---|
State | Accepted |
Headers | show |
Series | 'github' importer gracefully handles rate limiting | expand |
Context | Check | Description |
---|---|---|
cbaines/comparison | success | View comparision |
cbaines/git branch | success | View Git branch |
cbaines/applying patch | success | View Laminar job |
cbaines/issue | success | View issue |
Ludovic Courtès schreef op do 03-03-2022 om 22:14 [+0100]: > When running "guix refresh" (with no arguments), the 'github' updater > gets used until the rate limit has been reached, after which "guix > refresh" automatically picks up the next valid updater, typically > 'generic-git'. Wouldn't this make "guix refresh" non-deterministic (though this might be an acceptable trade-off)? Greetings, Maxime.
Ludovic Courtès schreef op do 03-03-2022 om 22:14 [+0100]: > + (and (not (request-rate-limit-reached?)) > + (guard (c ([...] > + ;; See > + ;; <https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting>. > + (match (assq-ref (http-get-error-headers c) > + 'x-ratelimit-remaining) > + (#f > + (raise c)) > + ((? (compose zero? string->number)) > + (let ((reset (update-rate-limit-reset-time! > + (http-get-error-headers c)))) > + (warning (G_ "GitHub rate limit exceeded; \ > +disallowing requests for ~a seconds~%") > + (- reset (car (gettimeofday)))) > + (display-hint (G_ "You can waive the rate limit by > +setting the @env{GUIX_GITHUB_TOKEN} environment variable to a token obtained > +from @url{https://github.com/settings/tokens} with your GitHub account. IIRC, the GitHub documentation doesn't state that this waives the rate limit, rather it increases the rate limit a lot, but there's still a limit. Greetings, Maxime.
Ludovic Courtès schreef op do 03-03-2022 om 22:14 [+0100]:
> [...]
It would be nice to have some tests for the rate limiting code,
in tests/import-github.scm.
Greetings,
Maxime.
Ludovic Courtès schreef op do 03-03-2022 om 22:14 [+0100]: > +(define (update-rate-limit-reset-time! headers) > + "Update the rate limit reset time based on HEADERS, the HTTP response > +headers." > + (match (assq-ref headers 'x-ratelimit-reset) > + ((= string->number (? number? reset)) > + (set! %rate-limit-reset-time reset) > + reset) > + (_ > + 0))) When can this second case happen? Greetings, Maxime.
Ludovic Courtès schreef op do 03-03-2022 om 22:14 [+0100]: > +(define (request-rate-limit-reached?) > + "Return true if the rate limit has been reached." > + (and %rate-limit-reset-time > + (match (< (car (gettimeofday)) %rate-limit-reset-time) > + (#t #t) > + (#f > + (set! %rate-limit-reset-time #f) > + #f)))) The clocks used by the GitHub server cannot exactly be the clock of the local Guix (at least, not in a realistic setting). WDYT of adding a little margin, accounting for the impossibility of clocks exactly matching and allowing for some clock skew? (< (car (gettimeofday)) (+ [5 minutes] %rate-limit-reset-time)) Greetings, Maxime.
Maxime Devos <maximedevos@telenet.be> skribis: > Ludovic Courtès schreef op do 03-03-2022 om 22:14 [+0100]: >> When running "guix refresh" (with no arguments), the 'github' updater >> gets used until the rate limit has been reached, after which "guix >> refresh" automatically picks up the next valid updater, typically >> 'generic-git'. > > Wouldn't this make "guix refresh" non-deterministic (though this might > be an acceptable trade-off)? Correct. It could be said that ‘guix refresh’ is “non-deterministic” anyway since its result depends on external services. I think it’s an acceptable tradeoff. Ludo’.
Maxime Devos <maximedevos@telenet.be> skribis: > Ludovic Courtès schreef op do 03-03-2022 om 22:14 [+0100]: >> + (and (not (request-rate-limit-reached?)) >> + (guard (c ([...] >> + ;; See >> + ;; <https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting>. >> + (match (assq-ref (http-get-error-headers c) >> + 'x-ratelimit-remaining) >> + (#f >> + (raise c)) >> + ((? (compose zero? string->number)) >> + (let ((reset (update-rate-limit-reset-time! >> + (http-get-error-headers c)))) >> + (warning (G_ "GitHub rate limit exceeded; \ >> +disallowing requests for ~a seconds~%") >> + (- reset (car (gettimeofday)))) >> + (display-hint (G_ "You can waive the rate limit by >> +setting the @env{GUIX_GITHUB_TOKEN} environment variable to a token obtained >> +from @url{https://github.com/settings/tokens} with your GitHub account. > > IIRC, the GitHub documentation doesn't state that this waives the rate > limit, rather it increases the rate limit a lot, but there's still a > limit. Good point, I’ll adjust the message accordingly.
Maxime Devos <maximedevos@telenet.be> skribis: > Ludovic Courtès schreef op do 03-03-2022 om 22:14 [+0100]: >> +(define (update-rate-limit-reset-time! headers) >> + "Update the rate limit reset time based on HEADERS, the HTTP response >> +headers." >> + (match (assq-ref headers 'x-ratelimit-reset) >> + ((= string->number (? number? reset)) >> + (set! %rate-limit-reset-time reset) >> + reset) >> + (_ >> + 0))) > > When can this second case happen? I don’t know if it’s supposed to happen. It’s defensive programming: better keep going than crash if the server starts behaving slightly differently.
Maxime Devos <maximedevos@telenet.be> skribis: > Ludovic Courtès schreef op do 03-03-2022 om 22:14 [+0100]: >> +(define (request-rate-limit-reached?) >> + "Return true if the rate limit has been reached." >> + (and %rate-limit-reset-time >> + (match (< (car (gettimeofday)) %rate-limit-reset-time) >> + (#t #t) >> + (#f >> + (set! %rate-limit-reset-time #f) >> + #f)))) > > The clocks used by the GitHub server cannot exactly be the clock of the > local Guix (at least, not in a realistic setting). WDYT of adding a > little margin, accounting for the impossibility of clocks exactly > matching and allowing for some clock skew? > > (< (car (gettimeofday)) (+ [5 minutes] %rate-limit-reset-time)) I don’t think it’s necessary. The worst that can happen is that we retry too early, get another 403 response, and retry later. (In practice, on my NTP-synchronized laptop, things just work.)
Maxime Devos <maximedevos@telenet.be> skribis: > It would be nice to have some tests for the rate limiting code, > in tests/import-github.scm. It would but… I think I’ll punt on that one. :-)
Ludovic Courtès schreef op za 05-03-2022 om 23:03 [+0100]: > I don’t know if it’s supposed to happen. It’s defensive programming: > better keep going than crash if the server starts behaving slightly > differently. That's called total programming I think? From a OOP I'm following: * total: handle all cases without complaints (no throwing exceptions or such), assign every case a well-defined (and documented!) behaviour * nominal: document the preconditions, but don't bother checking them * defensive: check inputs, if they are wrong, throw an exception (it was probably formulated a bit differently but that was the gist of it) At least according to this classification, this 'update-rate-limit-reset-time!' would be total (except for the lack of documentation), not defensive. Greetings, Maxime.
Ludovic Courtès schreef op za 05-03-2022 om 23:03 [+0100]: > Maxime Devos <maximedevos@telenet.be> skribis: > > > Ludovic Courtès schreef op do 03-03-2022 om 22:14 [+0100]: > > > +(define (update-rate-limit-reset-time! headers) > > > + "Update the rate limit reset time based on HEADERS, the HTTP response > > > +headers." > > > + (match (assq-ref headers 'x-ratelimit-reset) > > > + ((= string->number (? number? reset)) > > > + (set! %rate-limit-reset-time reset) > > > + reset) > > > + (_ > > > + 0))) > > > > When can this second case happen? > > I don’t know if it’s supposed to happen. It’s defensive programming: > better keep going than crash if the server starts behaving slightly > differently. If it's not supposed to happen, can it at least be reported with a warning, such that we then know that 'update-rate-limit-reset-time!' needs to be extended or GitHub needs to be contacted? FWIW, I think crashing in case of bogus HTTP answers is fine, as long as it crashes with a _nice_ error message ("guix: error: HTTP server foo.com returned an unrecoginised X-Ratelimit-Reset $SOME_STRING" or something like that) instead of some vague backtrace. Greetings, Maxime.
Hi, Maxime Devos <maximedevos@telenet.be> skribis: > That's called total programming I think? From a OOP I'm following: > > * total: handle all cases without complaints (no throwing exceptions > or such), assign every case a well-defined (and documented!) > behaviour > * nominal: document the preconditions, but don't bother checking them > * defensive: check inputs, if they are wrong, throw an exception Interesting; I stand corrected! > If it's not supposed to happen, can it at least be reported with a > warning, such that we then know that 'update-rate-limit-reset-time!' > needs to be extended or GitHub needs to be contacted? Yes, sounds reasonable. Thanks, Ludo’.
diff --git a/guix/import/github.scm b/guix/import/github.scm index 8c1898c0c5..5062bad80d 100644 --- a/guix/import/github.scm +++ b/guix/import/github.scm @@ -1,6 +1,6 @@ ;;; GNU Guix --- Functional package management for GNU ;;; Copyright © 2016 Ben Woodcroft <donttrustben@gmail.com> -;;; Copyright © 2017, 2018, 2019, 2020 Ludovic Courtès <ludo@gnu.org> +;;; Copyright © 2017-2020, 2022 Ludovic Courtès <ludo@gnu.org> ;;; Copyright © 2018 Eric Bavier <bavier@member.fsf.org> ;;; Copyright © 2019 Arun Isaac <arunisaac@systemreboot.net> ;;; Copyright © 2019 Efraim Flashner <efraim@flashner.co.il> @@ -30,15 +30,16 @@ (define-module (guix import github) #:use-module (guix utils) #:use-module (guix i18n) #:use-module (guix diagnostics) + #:use-module ((guix ui) #:select (display-hint)) #:use-module ((guix download) #:prefix download:) #:use-module ((guix git-download) #:prefix download:) #:use-module (guix import utils) - #:use-module (guix import json) #:use-module (json) #:use-module (guix packages) #:use-module (guix upstream) #:use-module (guix http-client) #:use-module (web uri) + #:use-module (web response) #:export (%github-api %github-updater)) ;; For tests. @@ -140,6 +141,30 @@ (define %github-token ;; limit, or #f. (make-parameter (getenv "GUIX_GITHUB_TOKEN"))) +(define %rate-limit-reset-time + ;; Time (seconds since the Epoch, UTC) when the rate limit for GitHub + ;; requests will be reset, or #f if the rate limit hasn't been reached. + #f) + +(define (update-rate-limit-reset-time! headers) + "Update the rate limit reset time based on HEADERS, the HTTP response +headers." + (match (assq-ref headers 'x-ratelimit-reset) + ((= string->number (? number? reset)) + (set! %rate-limit-reset-time reset) + reset) + (_ + 0))) + +(define (request-rate-limit-reached?) + "Return true if the rate limit has been reached." + (and %rate-limit-reset-time + (match (< (car (gettimeofday)) %rate-limit-reset-time) + (#t #t) + (#f + (set! %rate-limit-reset-time #f) + #f)))) + (define (fetch-releases-or-tags url) "Fetch the list of \"releases\" or, if it's empty, the list of tags for the repository at URL. Return the corresponding JSON dictionaries (alists), @@ -170,20 +195,49 @@ (define headers `((Authorization . ,(string-append "token " (%github-token)))) '()))) - (guard (c ((and (http-get-error? c) - (= 404 (http-get-error-code c))) - (warning (G_ "~a is unreachable (~a)~%") - release-url (http-get-error-code c)) - '#())) ;return an empty release set - (let* ((port (http-fetch release-url #:headers headers)) - (result (json->scm port))) - (close-port port) - (match result - (#() - ;; We got the empty list, presumably because the user didn't use GitHub's - ;; "release" mechanism, but hopefully they did use Git tags. - (json-fetch tag-url #:headers headers)) - (x x))))) + (and (not (request-rate-limit-reached?)) + (guard (c ((and (http-get-error? c) + (= 404 (http-get-error-code c))) + (warning (G_ "~a is unreachable (~a)~%") + (uri->string (http-get-error-uri c)) + (http-get-error-code c)) + '#()) ;return an empty release set + ((and (http-get-error? c) + (= 403 (http-get-error-code c))) + ;; See + ;; <https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting>. + (match (assq-ref (http-get-error-headers c) + 'x-ratelimit-remaining) + (#f + (raise c)) + ((? (compose zero? string->number)) + (let ((reset (update-rate-limit-reset-time! + (http-get-error-headers c)))) + (warning (G_ "GitHub rate limit exceeded; \ +disallowing requests for ~a seconds~%") + (- reset (car (gettimeofday)))) + (display-hint (G_ "You can waive the rate limit by +setting the @env{GUIX_GITHUB_TOKEN} environment variable to a token obtained +from @url{https://github.com/settings/tokens} with your GitHub account. + +Alternatively, you can wait until your rate limit is reset, or use the +@code{generic-git} updater instead.")) + #f)) ;bail out + (_ + (raise c))))) + + (let* ((port (http-fetch release-url #:headers headers)) + (result (json->scm port))) + (close-port port) + (match result + (#() + ;; We got the empty list, presumably because the user didn't use GitHub's + ;; "release" mechanism, but hopefully they did use Git tags. + (let* ((port (http-fetch tag-url #:headers headers)) + (json (json->scm port))) + (close-port port) + json)) + (x x)))))) (define (latest-released-version url package-name) "Return the newest released version and its tag given a string URL like @@ -223,23 +277,16 @@ (define (release->version release) (cons tag tag)) (else #f)))) - (let* ((json (and=> (fetch-releases-or-tags url) - vector->list))) - (if (eq? json #f) - (if (%github-token) - (error "Error downloading release information through the GitHub -API when using a GitHub token") - (error "Error downloading release information through the GitHub -API. This may be fixed by using an access token and setting the environment -variable GUIX_GITHUB_TOKEN, for instance one procured from -https://github.com/settings/tokens")) - (match (sort (filter-map release->version - (match (remove pre-release? json) - (() json) ; keep everything - (releases releases))) - (lambda (x y) (version>? (car x) (car y)))) - (((latest-version . tag) . _) (values latest-version tag)) - (() (values #f #f)))))) + (match (and=> (fetch-releases-or-tags url) vector->list) + (#f (values #f #f)) + (json + (match (sort (filter-map release->version + (match (remove pre-release? json) + (() json) ; keep everything + (releases releases))) + (lambda (x y) (version>? (car x) (car y)))) + (((latest-version . tag) . _) (values latest-version tag)) + (() (values #f #f)))))) (define (latest-release pkg) "Return an <upstream-source> for the latest release of PKG."