Message ID | cu7mup04rfp.fsf@systemreboot.net |
---|---|
State | Accepted |
Headers | show |
Series | [bug#33801] import: github: Support source URIs that redirect to GitHub | expand |
Context | Check | Description |
---|---|---|
cbaines/applying patch | fail | Apply failed |
Hi, Arun Isaac <arunisaac@systemreboot.net> skribis: >> Do you know how many packages fall into that category? > > With this patch, we have a problem estimating the coverage using `guix > refresh -L'. Now, to estimate coverage, we need to make HTTP requests > for every single source tarball in Guix to determine if it redirects to > GitHub. This is an enormous number of HTTP requests! When I ran `guix > refresh -L', it took a very long time to finish coverage estimation. So, > I cancelled the command. Any better way to handle this? Ouch, that’s a problem indeed. We could maintain a cache of redirects, although that wouldn’t be a real solution. Hmm, now that I think about it, shouldn’t we store the github.com URL directly for these packages? We could add a new lint check along the lines of the ‘check-mirror-url’ procedure that would allow us to find out which URLs need to be changed. If we took that route, the changes you made to the importer would no longer be necessary. :-/ WDYT? Thanks, Ludo’.
> Hmm, now that I think about it, shouldn’t we store the github.com URL > directly for these packages? We could add a new lint check along the > lines of the ‘check-mirror-url’ procedure that would allow us to find > out which URLs need to be changed. If we took that route, the changes > you made to the importer would no longer be necessary. :-/ My changes only took a small amount of effort. I don't have much of an issue with them being made obsolete. Shall I proceed with creating a lint check along the lines of what you proposed?
Arun Isaac <arunisaac@systemreboot.net> skribis: >> Hmm, now that I think about it, shouldn’t we store the github.com URL >> directly for these packages? We could add a new lint check along the >> lines of the ‘check-mirror-url’ procedure that would allow us to find >> out which URLs need to be changed. If we took that route, the changes >> you made to the importer would no longer be necessary. :-/ > > My changes only took a small amount of effort. I don't have much of an > issue with them being made obsolete. Shall I proceed with creating a > lint check along the lines of what you proposed? Yes, if that sounds good to you, please do! Ludo’.
> We could maintain a cache of redirects, although that wouldn’t be a > real solution. > > Hmm, now that I think about it, shouldn’t we store the github.com URL > directly for these packages? We could add a new lint check along the > lines of the ‘check-mirror-url’ procedure that would allow us to find > out which URLs need to be changed. If we took that route, the changes > you made to the importer would no longer be necessary. :-/ On the other hand, if we constrain our updater predicates to not perform network operations, we would be restricting what we can achieve with updaters. For example, suppose in the future we have a cgit-updater to update packages hosted using cgit. The only way to detect that would be using some kind of network operation. The way our updaters currently work, we will only ever be able to properly handle centralized platforms like GitHub, PyPI, etc. And, that's a pity. Another solution, though somewhat inelegant and tedious, would be to add an `updater' field to the package specification so that the packager can explicitly specify what updater to use. WDYT?
Arun Isaac <arunisaac@systemreboot.net> skribis: >> We could maintain a cache of redirects, although that wouldn’t be a >> real solution. >> >> Hmm, now that I think about it, shouldn’t we store the github.com URL >> directly for these packages? We could add a new lint check along the >> lines of the ‘check-mirror-url’ procedure that would allow us to find >> out which URLs need to be changed. If we took that route, the changes >> you made to the importer would no longer be necessary. :-/ > > On the other hand, if we constrain our updater predicates to not perform > network operations, we would be restricting what we can achieve with > updaters. For example, suppose in the future we have a cgit-updater to > update packages hosted using cgit. The only way to detect that would be > using some kind of network operation. The way our updaters currently > work, we will only ever be able to properly handle centralized platforms > like GitHub, PyPI, etc. And, that's a pity. We can use heuristics in many cases to determine whether an updater applies to a package; most of the time, that’s only based on the URL, and it might also work for many cgit instances—those that have “/cgit” in their URL. Then again, I expect that a growing number of packages will be obtained through ‘git-fetch’, and the updater for such a thing is trivial (interestingly it’s not yet implemented though :-)). So I feel like at present we don’t have strong evidence that that the simple URL-based approach would not scale. WDYT? > Another solution, though somewhat inelegant and tedious, would be to add > an `updater' field to the package specification so that the packager can > explicitly specify what updater to use. That could be useful to handle corner cases, but it should remain the exception. Thanks, Ludo’.
> We can use heuristics in many cases to determine whether an updater > applies to a package; most of the time, that’s only based on the URL, > and it might also work for many cgit instances—those that have “/cgit” > in their URL. > > Then again, I expect that a growing number of packages will be obtained > through ‘git-fetch’, and the updater for such a thing is trivial > (interestingly it’s not yet implemented though :-)). > > So I feel like at present we don’t have strong evidence that that the > simple URL-based approach would not scale. Agreed! Better to not complicate things beyond what is necessary at this point. I am not confident that good heuristics exist. But, if we do use git-fetch for the majority of our packages, like you said, updating will be trivial. I will proceed with the lint check as discussed earlier.
On Thu, 20 Dec 2018 17:28:42 +0100 Ludovic Courtès <ludo@gnu.org> wrote: > Then again, I expect that a growing number of packages will be obtained > through ‘git-fetch’, and the updater for such a thing is trivial > (interestingly it’s not yet implemented though :-)). See http://debbugs.gnu.org/cgi/bugreport.cgi?bug=33809 `~Eric
From 7fa1daaf44720fa31813e4f07a2c49a2540a0526 Mon Sep 17 00:00:00 2001 From: Arun Isaac <arunisaac@systemreboot.net> Date: Wed, 19 Dec 2018 15:59:52 +0530 Subject: [PATCH] import: github: Support source URIs that redirect to GitHub. * guix/import/github.scm (follow-redirects-to-github): New function. (updated-github-url)[updated-url]: For source URIs on other domains, replace all instances of the old version with the new version. (latest-release)[origin-github-uri]: If necessary, follow redirects to find the GitHub URI. --- guix/import/github.scm | 41 +++++++++++++++++++++++++++++++++++++---- 1 file changed, 37 insertions(+), 4 deletions(-) diff --git a/guix/import/github.scm b/guix/import/github.scm index af9f56e1d..8db7db305 100644 --- a/guix/import/github.scm +++ b/guix/import/github.scm @@ -1,6 +1,7 @@ ;;; GNU Guix --- Functional package management for GNU ;;; Copyright © 2016 Ben Woodcroft <donttrustben@gmail.com> ;;; Copyright © 2017, 2018 Ludovic Courtès <ludo@gnu.org> +;;; Copyright © 2018 Arun Isaac <arunisaac@systemreboot.net> ;;; ;;; This file is part of GNU Guix. ;;; @@ -19,6 +20,8 @@ (define-module (guix import github) #:use-module (ice-9 match) + #:use-module (ice-9 receive) + #:use-module (ice-9 regex) #:use-module (srfi srfi-1) #:use-module (srfi srfi-26) #:use-module (srfi srfi-34) @@ -29,6 +32,8 @@ #:use-module (guix packages) #:use-module (guix upstream) #:use-module (guix http-client) + #:use-module (web client) + #:use-module (web response) #:use-module (web uri) #:export (%github-updater)) @@ -39,12 +44,30 @@ false if none is recognized" (list ".tar.gz" ".tar.bz2" ".tar.xz" ".zip" ".tar" ".tgz" ".tbz" ".love"))) +(define (follow-redirects-to-github uri) + "Follow redirects of URI until a GitHub URI is found. Return that GitHub +URI. If no GitHub URI is found, return #f." + (define (follow-redirect uri) + (receive (response body) (http-head uri) + (case (response-code response) + ((301 302) + (uri->string (assoc-ref (response-headers response) 'location))) + (else #f)))) + + (cond + ((string-prefix? "https://github.com/" uri) uri) + ((string-prefix? "http" uri) + (and=> (follow-redirect uri) follow-redirects-to-github)) + ;; Do not attempt to follow redirects on URIs other than http and https + ;; (such as mirror, file) + (else #f))) + (define (updated-github-url old-package new-version) ;; Return a url for the OLD-PACKAGE with NEW-VERSION. If no source url in ;; the OLD-PACKAGE is a GitHub url, then return false. (define (updated-url url) - (if (string-prefix? "https://github.com/" url) + (if (follow-redirects-to-github url) (let ((ext (or (find-extension url) "")) (name (package-name old-package)) (version (package-version old-package)) @@ -83,7 +106,14 @@ false if none is recognized" url) (string-append "/releases/download/" repo "-" version "/" repo "-" version ext)) - (#t #f))) ; Some URLs are not recognised. + ;; As a last resort, attempt to replace all instances of the old + ;; version with the new version. This is necessary to handle URIs + ;; hosted on other domains that redirect to GitHub (for an example, + ;; see the youtube-dl package). We do not know the internal + ;; structure of these URIs and cannot handle them more + ;; intelligently. + (else (regexp-substitute/global + #f version url 'pre new-version 'post)))) #f)) (let ((source-url (and=> (package-source old-package) origin-uri)) @@ -210,11 +240,14 @@ https://github.com/settings/tokens")) (define (latest-release pkg) "Return an <upstream-source> for the latest release of PKG." (define (origin-github-uri origin) + ;; We follow redirects to GitHub because the origin URI might appear to be + ;; hosted on some other domain but just redirects to GitHub. For example, + ;; see the youtube-dl package. (match (origin-uri origin) ((? string? url) - url) ;surely a github.com URL + (follow-redirects-to-github url)) ((urls ...) - (find (cut string-contains <> "github.com") urls)))) + (find follow-redirects-to-github urls)))) (let* ((source-uri (origin-github-uri (package-source pkg))) (name (package-name pkg)) -- 2.19.2