Message ID | 87jzzbms54.fsf_-_@gnu.org |
---|---|
State | New |
Headers | show |
Series | [bug#43442] Fixes init of #42162: gforge.inria.fr down Dec. 2020 | expand |
Hello, Ludovic Courtès <ludovic.courtes@inria.fr> writes: > This whole hack looks like a dead end. > > It would be ideal if SWH would compute nar hashes, as you proposed: > > https://gitlab.softwareheritage.org/swh/meta/-/issues/4538 > > As a stopgap, I wonder if we could use “double hashing” on our side, but > only for svn: we’d store both the nar sha256 as we currently do, plus > the swhid. It still seems to me that it’d be hard to scale and to > maintain that over time, even if it’s limited to svn. Plus, there’d > still be the problem of ‘svn-multi-fetch’, which is what most TeX Live > packages use. > > Thoughts? Not too many, but I do have more bad news. Apologies if this is already known, but I’m just getting up to speed with how SWH handles Subversion (for coverage checking) and thought this seemed pretty significant. I was starting with doing a simple check for the “easy” Subversion repositories. That is, no externals (‘recursive?’) and no ‘svn-multi-fetch’ [1]. I immediately hit a problem. Guix hashes the export of the repository with the keywords processed, while SWH hashes it with unprocessed keywords. For example, take ‘libsmpeg’. It has a file called “mkinstalldirs”, which has a keyword in it: “$Id$”. The SWH loader hashes this as $Id$ while we hash it as $Id: mkinstalldirs 9 1999-10-21 15:55:01Z hercules $ This is not a big issue in terms of coverage checking, but it will be an issue for automatic recovery. Even if you know the exact SWH directory ID, you won’t get a directory that satisfies the daemon’s hash check. I have no idea how hard it is to process the keywords with only data from SWH. In this case, you would have to walk revisions to find the last time “mkinstalldirs” was modified, and then format its metadata. However, I assume the Subversion properties are gone, so there might be edge cases like a file with “$Id$” (or whatever) that Subversion wouldn’t processes. Again, apologies if this is old news. Actually apologies either way, ’cause this is a bit of a downer! -- Tim [1] More precisely, I was going to process recursive ‘svn-fetch’ origins because a lot of them are needlessly marked as recursive. In some (many?) cases, the repositories don’t actually have external references, so the flag does nothing. I was only going to skip the ones where it makes a difference.
Hi, On lun., 20 mars 2023 at 15:09, Ludovic Courtès <ludovic.courtes@inria.fr> wrote: > As a stopgap, I wonder if we could use “double hashing” on our side, but > only for svn: we’d store both the nar sha256 as we currently do, plus > the swhid. It still seems to me that it’d be hard to scale and to > maintain that over time, even if it’s limited to svn. Plus, there’d > still be the problem of ‘svn-multi-fetch’, which is what most TeX Live > packages use. As proposed, somehow, in [1], I think we have nothing to loose with optionally “double hashing” (or even triple or more). I do not understand your concerns in [1] and I will discuss overthere. :-) About ’svn-multi-fetch’ and TeX, what is at our advantage is that most TeX Live packages use ’svn-multi-fetch’ via ’simple-texlive-package’ relying on ’texlive-origin’. And from official TeX documentation [2], Subversion is not the recommended access, instead they suggest to rely on ’rsync’. Moreover, some Git mirror is available. Therefore, about TeX we could ask if relying on Subversion is still the “good” way and maybe we could switch to something else as url-fetch or git-fetch. Hum?! Not convinced by my proposal here. :-) Well, IMHO, there is 2 issues: one about using SWH as fallback for packages using svn-fetch and another one about TeX Live packages. It could be nice to solve them both at once but maybe we should start now one foot, now the other. 1: https://yhetil.org/guix/87jzzxd7z8.fsf@gmail.com 2: https://tug.org/texlive/svn/ Cheers, simon
Ludovic Courtès <ludovic.courtes@inria.fr> skribis: > The attached patch does that: > > scheme@(guile-user)> (lookup-subversion-revision "https://scm.gforge.inria.fr/anonscm/svn/mpfi" 680) > $12 = #<<revision> id: "72102de7605a2459ebcb016338ebbf1a997e8c8d" date: #<date nanosecond: 938388 second: 35 minute: 32 hour: 11 day: 6 month: 9 year: 2018 zone-offset: 0> directory: "5c89c025a4cd9d16befdfec12dfc23f7318d0d5b" directory-url: "https://archive.softwareheritage.org/api/1/directory/5c89c025a4cd9d16befdfec12dfc23f7318d0d5b/" parents-ids: ("16da41f1848d77a93aec565320b72b460c429b61") extra-headers: (("svn_repo_uuid" . "e2f78e0c-bb60-4709-9413-9660a31d4696") ("svn_revision" . "680"))> > scheme@(guile-user)> (lookup-subversion-revision "https://scm.gforge.inria.fr/anonscm/svn/mpfi" 666) > $13 = #<<revision> id: "148eb1e7206b111af4075c73c656e54c9efed6cb" date: #<date nanosecond: 654167 second: 2 minute: 52 hour: 12 day: 2 month: 8 year: 2018 zone-offset: 0> directory: "ed7b0bd7019fb85cd86d948a97c23b9d43aa8728" directory-url: "https://archive.softwareheritage.org/api/1/directory/ed7b0bd7019fb85cd86d948a97c23b9d43aa8728/" parents-ids: ("0ba2aa7e0d3fc0a1eb3ba72b32094515415ae47a") extra-headers: (("svn_repo_uuid" . "e2f78e0c-bb60-4709-9413-9660a31d4696") ("svn_revision" . "666"))> > > The implementation is pretty bad though, because it walks the revision > history until it finds the right revision number—so you’re likely to > reach the bandwidth rate limit before you’ve found the revision you’re > looking for. > > More importantly, most svn origins cannot be found, or at least not > by passing the URL as-is: > > https://sympa.inria.fr/sympa/arc/swh-devel/2023-03/msg00009.html > > This whole hack looks like a dead end. > > It would be ideal if SWH would compute nar hashes, as you proposed: > > https://gitlab.softwareheritage.org/swh/meta/-/issues/4538 Happy to report that this is solved with nar-sha256 ExtIDs at SWH, which commit 0e73f933b291c7e154c7e019b6de1e2f3a97e4c1 uses to implement the SWH fallback for Subversion checkouts. Wo0t! Ludo’.
diff --git a/guix/swh.scm b/guix/swh.scm index c7c1c873a2..a65635b1db 100644 --- a/guix/swh.scm +++ b/guix/swh.scm @@ -1,5 +1,5 @@ ;;; GNU Guix --- Functional package management for GNU -;;; Copyright © 2018, 2019, 2020, 2021 Ludovic Courtès <ludo@gnu.org> +;;; Copyright © 2018, 2019, 2020, 2021, 2023 Ludovic Courtès <ludo@gnu.org> ;;; Copyright © 2020 Jakub Kądziołka <kuba@kadziolka.net> ;;; Copyright © 2021 Xinglu Chen <public@yoctocell.xyz> ;;; Copyright © 2021 Simon Tournier <zimon.toutoune@gmail.com> @@ -75,8 +75,10 @@ (define-module (guix swh) revision-id revision-date revision-directory + revision-parents lookup-revision lookup-origin-revision + lookup-subversion-revision content? content-checksums @@ -207,6 +209,14 @@ (define string* ((? null?) #f) ;Guile-JSON 3.x ('null #f))) ;Guile-JSON 4.x +(define pair-vector->alist + (match-lambda + ('null '()) + ((= vector->list lst) + (map (match-lambda + (#(key value) (cons key value))) + lst)))) + (define %allow-request? ;; Takes a URL and method (e.g., the 'http-get' procedure) and returns true ;; to keep going. This can be used to disallow requests when @@ -346,7 +356,14 @@ (define-json-mapping <revision> make-revision revision? (id revision-id) (date revision-date "date" (maybe-null string->date*)) (directory revision-directory) - (directory-url revision-directory-url "directory_url")) + (directory-url revision-directory-url "directory_url") + (parents-ids revision-parent-ids "parents" + (lambda (vector) + (map (lambda (alist) + (assoc-ref alist "id")) + (vector->list vector)))) + (extra-headers revision-extra-headers ;alist--e.g., with "svn_revision" + "extra_headers" pair-vector->alist)) ;; <https://archive.softwareheritage.org/api/1/content/> (define-json-mapping <content> make-content content? @@ -524,6 +541,50 @@ (define (lookup-origin-revision url tag) (() #f))))) +(define (revision-parents revision) + "Return the parent revision(s) of REVISION." + (filter-map lookup-revision (revision-parent-ids revision))) + +(define (lookup-subversion-revision-in-history revision revision-number) + "Look for Subversion REVISION-NUMBER starting from REVISION and going back +in history." + (let loop ((revision revision)) + (let ((number (and=> (assoc-ref (revision-extra-headers revision) + "svn_revision") + string->number))) + (and number + (cond ((= number revision-number) + ;; Found it! + revision) + ((< number revision-number) + ;; REVISION is ancestor of REVISION-NUMBER, so stop here. + #f) + (else + ;; Check the parent(s) of REVISION. + (any loop (revision-parents revision)))))))) + +(define (lookup-subversion-revision url revision-number) + "Return either #f or the revision of the Subversion repository once +available at URL with the given REVISION-NUMBER." + (match (lookup-origin url) + (#f #f) + (origin + (match (filter (lambda (visit) + ;; Return #f if (visit-snapshot VISIT) would return #f. + (and (visit-snapshot-url visit) + (eq? 'full (visit-status visit)))) + (origin-visits origin)) + (() + #f) + ((visit . _) + (any (lambda (branch) + (match (branch-target branch) + ((? revision? revision) + (lookup-subversion-revision-in-history revision + revision-number)) + (_ #f))) + (snapshot-branches (visit-snapshot visit)))))))) + (define (release-target release) "Return the revision that is the target of RELEASE." (match (release-target-type release)