diff mbox series

[bug#54241,2/4] import: github: Gracefully handle rate limit exhaustion.

Message ID 20220303211444.19928-2-ludo@gnu.org
State Accepted
Headers show
Series 'github' importer gracefully handles rate limiting | expand

Checks

Context Check Description
cbaines/comparison success View comparision
cbaines/git branch success View Git branch
cbaines/applying patch success View Laminar job
cbaines/issue success View issue

Commit Message

Ludovic Courtès March 3, 2022, 9:14 p.m. UTC
Previously, 'guix refresh' would literally crash when the rate limit was
reached due to the call to 'error'.  With this change, the updater
notices when the rate limit is reached and it turns itself into a no-op
until the rate limit has been reset.

When running "guix refresh" (with no arguments), the 'github' updater
gets used until the rate limit has been reached, after which "guix
refresh" automatically picks up the next valid updater, typically
'generic-git'.

* guix/import/github.scm (fetch-releases-or-tags): Use 'http-fetch'
directly instead of 'json-fetch' to let 'http-get-error?' exceptions
through.  Handle 403 errors with an 'X-RateLimit-Remaining' header.
(%rate-limit-reset-time): New variable.
(update-rate-limit-reset-time!, request-rate-limit-reached?): New
procedures.
(latest-released-version): Remove calls to 'error'.
---
 guix/import/github.scm | 113 +++++++++++++++++++++++++++++------------
 1 file changed, 80 insertions(+), 33 deletions(-)

Comments

M March 5, 2022, 9:35 a.m. UTC | #1
Ludovic Courtès schreef op do 03-03-2022 om 22:14 [+0100]:
> When running "guix refresh" (with no arguments), the 'github' updater
> gets used until the rate limit has been reached, after which "guix
> refresh" automatically picks up the next valid updater, typically
> 'generic-git'.

Wouldn't this make "guix refresh" non-deterministic (though this might
be an acceptable trade-off)?

Greetings,
Maxime.
M March 5, 2022, 9:37 a.m. UTC | #2
Ludovic Courtès schreef op do 03-03-2022 om 22:14 [+0100]:
> +  (and (not (request-rate-limit-reached?))
> +       (guard (c ([...]
> +                  ;; See
> +                  ;; <https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting>.
> +                  (match (assq-ref (http-get-error-headers c)
> +                                   'x-ratelimit-remaining)
> +                    (#f
> +                     (raise c))
> +                    ((? (compose zero? string->number))
> +                     (let ((reset (update-rate-limit-reset-time!
> +                                   (http-get-error-headers c))))
> +                       (warning (G_ "GitHub rate limit exceeded; \
> +disallowing requests for ~a seconds~%")
> +                                (- reset (car (gettimeofday))))
> +                       (display-hint (G_ "You can waive the rate limit by
> +setting the @env{GUIX_GITHUB_TOKEN} environment variable to a token obtained
> +from @url{https://github.com/settings/tokens} with your GitHub account.

IIRC, the GitHub documentation doesn't state that this waives the rate
limit, rather it increases the rate limit a lot, but there's still a
limit.

Greetings,
Maxime.
M March 5, 2022, 9:48 a.m. UTC | #3
Ludovic Courtès schreef op do 03-03-2022 om 22:14 [+0100]:
> [...]

It would be nice to have some tests for the rate limiting code,
in tests/import-github.scm.

Greetings,
Maxime.
M March 5, 2022, 9:48 a.m. UTC | #4
Ludovic Courtès schreef op do 03-03-2022 om 22:14 [+0100]:
> +(define (update-rate-limit-reset-time! headers)
> +  "Update the rate limit reset time based on HEADERS, the HTTP response
> +headers."
> +  (match (assq-ref headers 'x-ratelimit-reset)
> +    ((= string->number (? number? reset))
> +     (set! %rate-limit-reset-time reset)
> +     reset)
> +    (_
> +     0)))

When can this second case happen?

Greetings,
Maxime.
M March 5, 2022, 9:52 a.m. UTC | #5
Ludovic Courtès schreef op do 03-03-2022 om 22:14 [+0100]:
> +(define (request-rate-limit-reached?)
> +  "Return true if the rate limit has been reached."
> +  (and %rate-limit-reset-time
> +       (match (< (car (gettimeofday)) %rate-limit-reset-time)
> +         (#t #t)
> +         (#f
> +          (set! %rate-limit-reset-time #f)
> +          #f))))

The clocks used by the GitHub server cannot exactly be the clock of the
local Guix (at least, not in a realistic setting).  WDYT of adding a
little margin, accounting for the impossibility of clocks exactly
matching and allowing for some clock skew?

  (< (car (gettimeofday)) (+ [5 minutes] %rate-limit-reset-time))

Greetings,
Maxime.
Ludovic Courtès March 5, 2022, 10 p.m. UTC | #6
Maxime Devos <maximedevos@telenet.be> skribis:

> Ludovic Courtès schreef op do 03-03-2022 om 22:14 [+0100]:
>> When running "guix refresh" (with no arguments), the 'github' updater
>> gets used until the rate limit has been reached, after which "guix
>> refresh" automatically picks up the next valid updater, typically
>> 'generic-git'.
>
> Wouldn't this make "guix refresh" non-deterministic (though this might
> be an acceptable trade-off)?

Correct.  It could be said that ‘guix refresh’ is “non-deterministic”
anyway since its result depends on external services.  I think it’s an
acceptable tradeoff.

Ludo’.
Ludovic Courtès March 5, 2022, 10:01 p.m. UTC | #7
Maxime Devos <maximedevos@telenet.be> skribis:

> Ludovic Courtès schreef op do 03-03-2022 om 22:14 [+0100]:
>> +  (and (not (request-rate-limit-reached?))
>> +       (guard (c ([...]
>> +                  ;; See
>> +                  ;; <https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting>.
>> +                  (match (assq-ref (http-get-error-headers c)
>> +                                   'x-ratelimit-remaining)
>> +                    (#f
>> +                     (raise c))
>> +                    ((? (compose zero? string->number))
>> +                     (let ((reset (update-rate-limit-reset-time!
>> +                                   (http-get-error-headers c))))
>> +                       (warning (G_ "GitHub rate limit exceeded; \
>> +disallowing requests for ~a seconds~%")
>> +                                (- reset (car (gettimeofday))))
>> +                       (display-hint (G_ "You can waive the rate limit by
>> +setting the @env{GUIX_GITHUB_TOKEN} environment variable to a token obtained
>> +from @url{https://github.com/settings/tokens} with your GitHub account.
>
> IIRC, the GitHub documentation doesn't state that this waives the rate
> limit, rather it increases the rate limit a lot, but there's still a
> limit.

Good point, I’ll adjust the message accordingly.
Ludovic Courtès March 5, 2022, 10:03 p.m. UTC | #8
Maxime Devos <maximedevos@telenet.be> skribis:

> Ludovic Courtès schreef op do 03-03-2022 om 22:14 [+0100]:
>> +(define (update-rate-limit-reset-time! headers)
>> +  "Update the rate limit reset time based on HEADERS, the HTTP response
>> +headers."
>> +  (match (assq-ref headers 'x-ratelimit-reset)
>> +    ((= string->number (? number? reset))
>> +     (set! %rate-limit-reset-time reset)
>> +     reset)
>> +    (_
>> +     0)))
>
> When can this second case happen?

I don’t know if it’s supposed to happen.  It’s defensive programming:
better keep going than crash if the server starts behaving slightly
differently.
Ludovic Courtès March 5, 2022, 10:06 p.m. UTC | #9
Maxime Devos <maximedevos@telenet.be> skribis:

> Ludovic Courtès schreef op do 03-03-2022 om 22:14 [+0100]:
>> +(define (request-rate-limit-reached?)
>> +  "Return true if the rate limit has been reached."
>> +  (and %rate-limit-reset-time
>> +       (match (< (car (gettimeofday)) %rate-limit-reset-time)
>> +         (#t #t)
>> +         (#f
>> +          (set! %rate-limit-reset-time #f)
>> +          #f))))
>
> The clocks used by the GitHub server cannot exactly be the clock of the
> local Guix (at least, not in a realistic setting).  WDYT of adding a
> little margin, accounting for the impossibility of clocks exactly
> matching and allowing for some clock skew?
>
>   (< (car (gettimeofday)) (+ [5 minutes] %rate-limit-reset-time))

I don’t think it’s necessary.  The worst that can happen is that we
retry too early, get another 403 response, and retry later.  (In
practice, on my NTP-synchronized laptop, things just work.)
Ludovic Courtès March 5, 2022, 10:09 p.m. UTC | #10
Maxime Devos <maximedevos@telenet.be> skribis:

> It would be nice to have some tests for the rate limiting code,
> in tests/import-github.scm.

It would but… I think I’ll punt on that one.  :-)
M March 5, 2022, 10:16 p.m. UTC | #11
Ludovic Courtès schreef op za 05-03-2022 om 23:03 [+0100]:
> I don’t know if it’s supposed to happen.  It’s defensive programming:
> better keep going than crash if the server starts behaving slightly
> differently.

That's called total programming I think?  From a OOP I'm following:

  * total: handle all cases without complaints (no throwing exceptions
    or such), assign every case a well-defined (and documented!)
    behaviour
  * nominal: document the preconditions, but don't bother checking them
  * defensive: check inputs, if they are wrong, throw an exception

(it was probably formulated a bit differently but that was the gist of
it)

At least according to this classification, this
'update-rate-limit-reset-time!' would be total (except for the lack of
documentation), not defensive.

Greetings,
Maxime.
M March 5, 2022, 10:21 p.m. UTC | #12
Ludovic Courtès schreef op za 05-03-2022 om 23:03 [+0100]:
> Maxime Devos <maximedevos@telenet.be> skribis:
> 
> > Ludovic Courtès schreef op do 03-03-2022 om 22:14 [+0100]:
> > > +(define (update-rate-limit-reset-time! headers)
> > > +  "Update the rate limit reset time based on HEADERS, the HTTP response
> > > +headers."
> > > +  (match (assq-ref headers 'x-ratelimit-reset)
> > > +    ((= string->number (? number? reset))
> > > +     (set! %rate-limit-reset-time reset)
> > > +     reset)
> > > +    (_
> > > +     0)))
> > 
> > When can this second case happen?
> 
> I don’t know if it’s supposed to happen.  It’s defensive programming:
> better keep going than crash if the server starts behaving slightly
> differently.

If it's not supposed to happen, can it at least be reported with a
warning, such that we then know that 'update-rate-limit-reset-time!'
needs to be extended or GitHub needs to be contacted?

FWIW, I think crashing in case of bogus HTTP answers is fine, as long
as it crashes with a _nice_ error message ("guix: error: HTTP server
foo.com returned an unrecoginised X-Ratelimit-Reset $SOME_STRING" or
something like that) instead of some vague backtrace.

Greetings,
Maxime.
Ludovic Courtès March 6, 2022, 5:18 p.m. UTC | #13
Hi,

Maxime Devos <maximedevos@telenet.be> skribis:

> That's called total programming I think?  From a OOP I'm following:
>
>   * total: handle all cases without complaints (no throwing exceptions
>     or such), assign every case a well-defined (and documented!)
>     behaviour
>   * nominal: document the preconditions, but don't bother checking them
>   * defensive: check inputs, if they are wrong, throw an exception

Interesting; I stand corrected!

> If it's not supposed to happen, can it at least be reported with a
> warning, such that we then know that 'update-rate-limit-reset-time!'
> needs to be extended or GitHub needs to be contacted?

Yes, sounds reasonable.

Thanks,
Ludo’.
diff mbox series

Patch

diff --git a/guix/import/github.scm b/guix/import/github.scm
index 8c1898c0c5..5062bad80d 100644
--- a/guix/import/github.scm
+++ b/guix/import/github.scm
@@ -1,6 +1,6 @@ 
 ;;; GNU Guix --- Functional package management for GNU
 ;;; Copyright © 2016 Ben Woodcroft <donttrustben@gmail.com>
-;;; Copyright © 2017, 2018, 2019, 2020 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2017-2020, 2022 Ludovic Courtès <ludo@gnu.org>
 ;;; Copyright © 2018 Eric Bavier <bavier@member.fsf.org>
 ;;; Copyright © 2019 Arun Isaac <arunisaac@systemreboot.net>
 ;;; Copyright © 2019 Efraim Flashner <efraim@flashner.co.il>
@@ -30,15 +30,16 @@  (define-module (guix import github)
   #:use-module (guix utils)
   #:use-module (guix i18n)
   #:use-module (guix diagnostics)
+  #:use-module ((guix ui) #:select (display-hint))
   #:use-module ((guix download) #:prefix download:)
   #:use-module ((guix git-download) #:prefix download:)
   #:use-module (guix import utils)
-  #:use-module (guix import json)
   #:use-module (json)
   #:use-module (guix packages)
   #:use-module (guix upstream)
   #:use-module (guix http-client)
   #:use-module (web uri)
+  #:use-module (web response)
   #:export (%github-api %github-updater))
 
 ;; For tests.
@@ -140,6 +141,30 @@  (define %github-token
   ;; limit, or #f.
   (make-parameter (getenv "GUIX_GITHUB_TOKEN")))
 
+(define %rate-limit-reset-time
+  ;; Time (seconds since the Epoch, UTC) when the rate limit for GitHub
+  ;; requests will be reset, or #f if the rate limit hasn't been reached.
+  #f)
+
+(define (update-rate-limit-reset-time! headers)
+  "Update the rate limit reset time based on HEADERS, the HTTP response
+headers."
+  (match (assq-ref headers 'x-ratelimit-reset)
+    ((= string->number (? number? reset))
+     (set! %rate-limit-reset-time reset)
+     reset)
+    (_
+     0)))
+
+(define (request-rate-limit-reached?)
+  "Return true if the rate limit has been reached."
+  (and %rate-limit-reset-time
+       (match (< (car (gettimeofday)) %rate-limit-reset-time)
+         (#t #t)
+         (#f
+          (set! %rate-limit-reset-time #f)
+          #f))))
+
 (define (fetch-releases-or-tags url)
   "Fetch the list of \"releases\" or, if it's empty, the list of tags for the
 repository at URL.  Return the corresponding JSON dictionaries (alists),
@@ -170,20 +195,49 @@  (define headers
             `((Authorization . ,(string-append "token " (%github-token))))
             '())))
 
-  (guard (c ((and (http-get-error? c)
-                  (= 404 (http-get-error-code c)))
-             (warning (G_ "~a is unreachable (~a)~%")
-                      release-url (http-get-error-code c))
-             '#()))                               ;return an empty release set
-    (let* ((port   (http-fetch release-url #:headers headers))
-           (result (json->scm port)))
-      (close-port port)
-      (match result
-        (#()
-         ;; We got the empty list, presumably because the user didn't use GitHub's
-         ;; "release" mechanism, but hopefully they did use Git tags.
-         (json-fetch tag-url #:headers headers))
-        (x x)))))
+  (and (not (request-rate-limit-reached?))
+       (guard (c ((and (http-get-error? c)
+                       (= 404 (http-get-error-code c)))
+                  (warning (G_ "~a is unreachable (~a)~%")
+                           (uri->string (http-get-error-uri c))
+                           (http-get-error-code c))
+                  '#())                           ;return an empty release set
+                 ((and (http-get-error? c)
+                       (= 403 (http-get-error-code c)))
+                  ;; See
+                  ;; <https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting>.
+                  (match (assq-ref (http-get-error-headers c)
+                                   'x-ratelimit-remaining)
+                    (#f
+                     (raise c))
+                    ((? (compose zero? string->number))
+                     (let ((reset (update-rate-limit-reset-time!
+                                   (http-get-error-headers c))))
+                       (warning (G_ "GitHub rate limit exceeded; \
+disallowing requests for ~a seconds~%")
+                                (- reset (car (gettimeofday))))
+                       (display-hint (G_ "You can waive the rate limit by
+setting the @env{GUIX_GITHUB_TOKEN} environment variable to a token obtained
+from @url{https://github.com/settings/tokens} with your GitHub account.
+
+Alternatively, you can wait until your rate limit is reset, or use the
+@code{generic-git} updater instead."))
+                       #f))                       ;bail out
+                    (_
+                     (raise c)))))
+
+         (let* ((port   (http-fetch release-url #:headers headers))
+                (result (json->scm port)))
+           (close-port port)
+           (match result
+             (#()
+              ;; We got the empty list, presumably because the user didn't use GitHub's
+              ;; "release" mechanism, but hopefully they did use Git tags.
+              (let* ((port (http-fetch tag-url #:headers headers))
+                     (json (json->scm port)))
+                (close-port port)
+                json))
+             (x x))))))
 
 (define (latest-released-version url package-name)
   "Return the newest released version and its tag given a string URL like
@@ -223,23 +277,16 @@  (define (release->version release)
         (cons tag tag))
        (else #f))))
 
-  (let* ((json (and=> (fetch-releases-or-tags url)
-                      vector->list)))
-    (if (eq? json #f)
-        (if (%github-token)
-            (error "Error downloading release information through the GitHub
-API when using a GitHub token")
-            (error "Error downloading release information through the GitHub
-API. This may be fixed by using an access token and setting the environment
-variable GUIX_GITHUB_TOKEN, for instance one procured from
-https://github.com/settings/tokens"))
-        (match (sort (filter-map release->version
-                                 (match (remove pre-release? json)
-                                   (() json) ; keep everything
-                                   (releases releases)))
-                     (lambda (x y) (version>? (car x) (car y))))
-          (((latest-version . tag) . _) (values latest-version tag))
-          (() (values #f #f))))))
+  (match (and=> (fetch-releases-or-tags url) vector->list)
+    (#f (values #f #f))
+    (json
+     (match (sort (filter-map release->version
+                              (match (remove pre-release? json)
+                                (() json)         ; keep everything
+                                (releases releases)))
+                  (lambda (x y) (version>? (car x) (car y))))
+       (((latest-version . tag) . _) (values latest-version tag))
+       (() (values #f #f))))))
 
 (define (latest-release pkg)
   "Return an <upstream-source> for the latest release of PKG."