diff mbox series

[bug#43442] Fixes init of #42162: gforge.inria.fr down Dec. 2020

Message ID 87jzzbms54.fsf_-_@gnu.org
State New
Headers show
Series [bug#43442] Fixes init of #42162: gforge.inria.fr down Dec. 2020 | expand

Commit Message

Ludovic Courtès March 20, 2023, 2:09 p.m. UTC
Hi!

Ludovic Courtès <ludo@gnu.org> skribis:

> Ah yes, under “extra_headers” there’s the SVN revision number.  (guix
> swh) doesn’t expose “extra_headers” yet, but once it does, we can walk
> snapshots until we find the SVN revision we’re looking for.
>
> scheme@(guile-user)> (lookup-origin "https://scm.gforge.inria.fr/anonscm/svn/mpfi/")
> $3 = #<<origin> visits-url: "https://archive.softwareheritage.org/api/1/origin/https://scm.gforge.inria.fr/anonscm/svn/mpfi/visits/" type: #f url: "https://scm.gforge.inria.fr/anonscm/svn/mpfi">
> scheme@(guile-user)> (origin-visits $3)
> $4 = (#<<visit> date: #<date nanosecond: 902765 second: 25 minute: 53 hour: 21 day: 21 month: 9 year: 2020 zone-offset: 0> origin: "https://scm.gforge.inria.fr/anonscm/svn/mpfi" url: "https://archive.softwareheritage.org/api/1/origin/https://scm.gforge.inria.fr/anonscm/svn/mpfi/visit/1/" snapshot-url: "https://archive.softwareheritage.org/api/1/snapshot/e7fdd4dc6230f710dbc55c1b308804fa1b5f51f0/" status: full number: 1>)
> scheme@(guile-user)> (visit-snapshot (car $4))
> $5 = #<<snapshot> branches: (#<<branch> name: "HEAD" target-type: revision target-url: "https://archive.softwareheritage.org/api/1/revision/f7b445a6bdc38bf075f29265120ca49824f698ea/">)>
>
> So the next step is to augment (guix swh) with a
> ‘lookup-subversion-revision’ procedure.

The attached patch does that:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> (lookup-subversion-revision "https://scm.gforge.inria.fr/anonscm/svn/mpfi" 680)
$12 = #<<revision> id: "72102de7605a2459ebcb016338ebbf1a997e8c8d" date: #<date nanosecond: 938388 second: 35 minute: 32 hour: 11 day: 6 month: 9 year: 2018 zone-offset: 0> directory: "5c89c025a4cd9d16befdfec12dfc23f7318d0d5b" directory-url: "https://archive.softwareheritage.org/api/1/directory/5c89c025a4cd9d16befdfec12dfc23f7318d0d5b/" parents-ids: ("16da41f1848d77a93aec565320b72b460c429b61") extra-headers: (("svn_repo_uuid" . "e2f78e0c-bb60-4709-9413-9660a31d4696") ("svn_revision" . "680"))>
scheme@(guile-user)> (lookup-subversion-revision "https://scm.gforge.inria.fr/anonscm/svn/mpfi" 666)
$13 = #<<revision> id: "148eb1e7206b111af4075c73c656e54c9efed6cb" date: #<date nanosecond: 654167 second: 2 minute: 52 hour: 12 day: 2 month: 8 year: 2018 zone-offset: 0> directory: "ed7b0bd7019fb85cd86d948a97c23b9d43aa8728" directory-url: "https://archive.softwareheritage.org/api/1/directory/ed7b0bd7019fb85cd86d948a97c23b9d43aa8728/" parents-ids: ("0ba2aa7e0d3fc0a1eb3ba72b32094515415ae47a") extra-headers: (("svn_repo_uuid" . "e2f78e0c-bb60-4709-9413-9660a31d4696") ("svn_revision" . "666"))>
--8<---------------cut here---------------end--------------->8---

The implementation is pretty bad though, because it walks the revision
history until it finds the right revision number—so you’re likely to
reach the bandwidth rate limit before you’ve found the revision you’re
looking for.

More importantly, most svn origins cannot be found, or at least not
by passing the URL as-is:

  https://sympa.inria.fr/sympa/arc/swh-devel/2023-03/msg00009.html

This whole hack looks like a dead end.

It would be ideal if SWH would compute nar hashes, as you proposed:

  https://gitlab.softwareheritage.org/swh/meta/-/issues/4538

As a stopgap, I wonder if we could use “double hashing” on our side, but
only for svn: we’d store both the nar sha256 as we currently do, plus
the swhid.  It still seems to me that it’d be hard to scale and to
maintain that over time, even if it’s limited to svn.  Plus, there’d
still be the problem of ‘svn-multi-fetch’, which is what most TeX Live
packages use.

Thoughts?

Ludo’.

Comments

Timothy Sample March 22, 2023, 10:42 p.m. UTC | #1
Hello,

Ludovic Courtès <ludovic.courtes@inria.fr> writes:

> This whole hack looks like a dead end.
>
> It would be ideal if SWH would compute nar hashes, as you proposed:
>
>   https://gitlab.softwareheritage.org/swh/meta/-/issues/4538
>
> As a stopgap, I wonder if we could use “double hashing” on our side, but
> only for svn: we’d store both the nar sha256 as we currently do, plus
> the swhid.  It still seems to me that it’d be hard to scale and to
> maintain that over time, even if it’s limited to svn.  Plus, there’d
> still be the problem of ‘svn-multi-fetch’, which is what most TeX Live
> packages use.
>
> Thoughts?

Not too many, but I do have more bad news.  Apologies if this is already
known, but I’m just getting up to speed with how SWH handles Subversion
(for coverage checking) and thought this seemed pretty significant.

I was starting with doing a simple check for the “easy” Subversion
repositories.  That is, no externals (‘recursive?’) and no
‘svn-multi-fetch’ [1].  I immediately hit a problem.  Guix hashes the
export of the repository with the keywords processed, while SWH hashes
it with unprocessed keywords.

For example, take ‘libsmpeg’.  It has a file called “mkinstalldirs”,
which has a keyword in it: “$Id$”.  The SWH loader hashes this as

    $Id$

while we hash it as

    $Id: mkinstalldirs 9 1999-10-21 15:55:01Z hercules $

This is not a big issue in terms of coverage checking, but it will be an
issue for automatic recovery.  Even if you know the exact SWH directory
ID, you won’t get a directory that satisfies the daemon’s hash check.  I
have no idea how hard it is to process the keywords with only data from
SWH.  In this case, you would have to walk revisions to find the last
time “mkinstalldirs” was modified, and then format its metadata.
However, I assume the Subversion properties are gone, so there might be
edge cases like a file with “$Id$” (or whatever) that Subversion
wouldn’t processes.

Again, apologies if this is old news.  Actually apologies either way,
’cause this is a bit of a downer!


-- Tim

[1] More precisely, I was going to process recursive ‘svn-fetch’ origins
because a lot of them are needlessly marked as recursive.  In some
(many?) cases, the repositories don’t actually have external references,
so the flag does nothing.  I was only going to skip the ones where it
makes a difference.
Simon Tournier April 3, 2023, 1:34 p.m. UTC | #2
Hi,

On lun., 20 mars 2023 at 15:09, Ludovic Courtès <ludovic.courtes@inria.fr> wrote:

> As a stopgap, I wonder if we could use “double hashing” on our side, but
> only for svn: we’d store both the nar sha256 as we currently do, plus
> the swhid.  It still seems to me that it’d be hard to scale and to
> maintain that over time, even if it’s limited to svn.  Plus, there’d
> still be the problem of ‘svn-multi-fetch’, which is what most TeX Live
> packages use.

As proposed, somehow, in [1], I think we have nothing to loose with
optionally “double hashing” (or even triple or more).  I do not
understand your concerns in [1] and I will discuss overthere. :-)

About ’svn-multi-fetch’ and TeX, what is at our advantage is that most
TeX Live packages use ’svn-multi-fetch’ via ’simple-texlive-package’
relying on ’texlive-origin’.  And from official TeX documentation [2],
Subversion is not the recommended access, instead they suggest to rely
on ’rsync’.  Moreover, some Git mirror is available.

Therefore, about TeX we could ask if relying on Subversion is still the
“good” way and maybe we could switch to something else as url-fetch or
git-fetch.  Hum?!  Not convinced by my proposal here. :-)

Well, IMHO, there is 2 issues: one about using SWH as fallback for
packages using svn-fetch and another one about TeX Live packages.  It
could be nice to solve them both at once but maybe we should start now one
foot, now the other.

1: https://yhetil.org/guix/87jzzxd7z8.fsf@gmail.com
2: https://tug.org/texlive/svn/


Cheers,
simon
Ludovic Courtès March 9, 2024, 10:34 p.m. UTC | #3
Ludovic Courtès <ludovic.courtes@inria.fr> skribis:

> The attached patch does that:
>
> scheme@(guile-user)> (lookup-subversion-revision "https://scm.gforge.inria.fr/anonscm/svn/mpfi" 680)
> $12 = #<<revision> id: "72102de7605a2459ebcb016338ebbf1a997e8c8d" date: #<date nanosecond: 938388 second: 35 minute: 32 hour: 11 day: 6 month: 9 year: 2018 zone-offset: 0> directory: "5c89c025a4cd9d16befdfec12dfc23f7318d0d5b" directory-url: "https://archive.softwareheritage.org/api/1/directory/5c89c025a4cd9d16befdfec12dfc23f7318d0d5b/" parents-ids: ("16da41f1848d77a93aec565320b72b460c429b61") extra-headers: (("svn_repo_uuid" . "e2f78e0c-bb60-4709-9413-9660a31d4696") ("svn_revision" . "680"))>
> scheme@(guile-user)> (lookup-subversion-revision "https://scm.gforge.inria.fr/anonscm/svn/mpfi" 666)
> $13 = #<<revision> id: "148eb1e7206b111af4075c73c656e54c9efed6cb" date: #<date nanosecond: 654167 second: 2 minute: 52 hour: 12 day: 2 month: 8 year: 2018 zone-offset: 0> directory: "ed7b0bd7019fb85cd86d948a97c23b9d43aa8728" directory-url: "https://archive.softwareheritage.org/api/1/directory/ed7b0bd7019fb85cd86d948a97c23b9d43aa8728/" parents-ids: ("0ba2aa7e0d3fc0a1eb3ba72b32094515415ae47a") extra-headers: (("svn_repo_uuid" . "e2f78e0c-bb60-4709-9413-9660a31d4696") ("svn_revision" . "666"))>
>
> The implementation is pretty bad though, because it walks the revision
> history until it finds the right revision number—so you’re likely to
> reach the bandwidth rate limit before you’ve found the revision you’re
> looking for.
>
> More importantly, most svn origins cannot be found, or at least not
> by passing the URL as-is:
>
>   https://sympa.inria.fr/sympa/arc/swh-devel/2023-03/msg00009.html
>
> This whole hack looks like a dead end.
>
> It would be ideal if SWH would compute nar hashes, as you proposed:
>
>   https://gitlab.softwareheritage.org/swh/meta/-/issues/4538

Happy to report that this is solved with nar-sha256 ExtIDs at SWH, which
commit 0e73f933b291c7e154c7e019b6de1e2f3a97e4c1 uses to implement the
SWH fallback for Subversion checkouts.  Wo0t!

Ludo’.
diff mbox series

Patch

diff --git a/guix/swh.scm b/guix/swh.scm
index c7c1c873a2..a65635b1db 100644
--- a/guix/swh.scm
+++ b/guix/swh.scm
@@ -1,5 +1,5 @@ 
 ;;; GNU Guix --- Functional package management for GNU
-;;; Copyright © 2018, 2019, 2020, 2021 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2018, 2019, 2020, 2021, 2023 Ludovic Courtès <ludo@gnu.org>
 ;;; Copyright © 2020 Jakub Kądziołka <kuba@kadziolka.net>
 ;;; Copyright © 2021 Xinglu Chen <public@yoctocell.xyz>
 ;;; Copyright © 2021 Simon Tournier <zimon.toutoune@gmail.com>
@@ -75,8 +75,10 @@  (define-module (guix swh)
             revision-id
             revision-date
             revision-directory
+            revision-parents
             lookup-revision
             lookup-origin-revision
+            lookup-subversion-revision
 
             content?
             content-checksums
@@ -207,6 +209,14 @@  (define string*
     ((? null?) #f)                                ;Guile-JSON 3.x
     ('null #f)))                                  ;Guile-JSON 4.x
 
+(define pair-vector->alist
+  (match-lambda
+    ('null '())
+    ((= vector->list lst)
+     (map (match-lambda
+            (#(key value) (cons key value)))
+          lst))))
+
 (define %allow-request?
   ;; Takes a URL and method (e.g., the 'http-get' procedure) and returns true
   ;; to keep going.  This can be used to disallow requests when
@@ -346,7 +356,14 @@  (define-json-mapping <revision> make-revision revision?
   (id            revision-id)
   (date          revision-date "date" (maybe-null string->date*))
   (directory     revision-directory)
-  (directory-url revision-directory-url "directory_url"))
+  (directory-url revision-directory-url "directory_url")
+  (parents-ids   revision-parent-ids "parents"
+                 (lambda (vector)
+                   (map (lambda (alist)
+                          (assoc-ref alist "id"))
+                        (vector->list vector))))
+  (extra-headers revision-extra-headers      ;alist--e.g., with "svn_revision"
+                 "extra_headers" pair-vector->alist))
 
 ;; <https://archive.softwareheritage.org/api/1/content/>
 (define-json-mapping <content> make-content content?
@@ -524,6 +541,50 @@  (define (lookup-origin-revision url tag)
         (()
          #f)))))
 
+(define (revision-parents revision)
+  "Return the parent revision(s) of REVISION."
+  (filter-map lookup-revision (revision-parent-ids revision)))
+
+(define (lookup-subversion-revision-in-history revision revision-number)
+  "Look for Subversion REVISION-NUMBER starting from REVISION and going back
+in history."
+  (let loop ((revision revision))
+    (let ((number (and=> (assoc-ref (revision-extra-headers revision)
+                                    "svn_revision")
+                         string->number)))
+      (and number
+           (cond ((= number revision-number)
+                  ;; Found it!
+                  revision)
+                 ((< number revision-number)
+                  ;; REVISION is ancestor of REVISION-NUMBER, so stop here.
+                  #f)
+                 (else
+                  ;; Check the parent(s) of REVISION.
+                  (any loop (revision-parents revision))))))))
+
+(define (lookup-subversion-revision url revision-number)
+  "Return either #f or the revision of the Subversion repository once
+available at URL with the given REVISION-NUMBER."
+  (match (lookup-origin url)
+    (#f #f)
+    (origin
+      (match (filter (lambda (visit)
+                       ;; Return #f if (visit-snapshot VISIT) would return #f.
+                       (and (visit-snapshot-url visit)
+                            (eq? 'full (visit-status visit))))
+                     (origin-visits origin))
+        (()
+         #f)
+        ((visit . _)
+         (any (lambda (branch)
+                (match (branch-target branch)
+                  ((? revision? revision)
+                   (lookup-subversion-revision-in-history revision
+                                                          revision-number))
+                  (_ #f)))
+              (snapshot-branches (visit-snapshot visit))))))))
+
 (define (release-target release)
   "Return the revision that is the target of RELEASE."
   (match (release-target-type release)