diff mbox series

[bug#48437] lint: archival: Lookup content in Disarchive database.

Message ID 20210515102814.5944-1-ludo@gnu.org
State Accepted
Headers show
Series [bug#48437] lint: archival: Lookup content in Disarchive database. | expand

Checks

Context Check Description
cbaines/comparison success View comparision
cbaines/git branch success View Git branch
cbaines/applying patch success View Laminar job
cbaines/issue success View issue

Commit Message

Ludovic Courtès May 15, 2021, 10:28 a.m. UTC
* guix/lint.scm (lookup-disarchive-spec): New procedure.
(check-archival): When 'lookup-content' returns #f, call
'lookup-disarchive-spec'.
* guix/download.scm (%disarchive-mirrors): Make public.
---
 guix/download.scm |  1 +
 guix/lint.scm     | 31 +++++++++++++++++++++++++++----
 2 files changed, 28 insertions(+), 4 deletions(-)

Hello!

This patch makes the ‘archival’ checker check the Disarchive database(s)
when SWH ‘lookup-content’ returns #f.  For example, before the patch,
we get:

  $ guix lint -c archival guile-json
  gnu/packages/guile.scm:622:12: guile-json@4.5.2: source not archived on Software Heritage

After the patch, we get nothing (success) thanks to Disarchive metadata
available at:

  https://disarchive.ngyro.com/sha256/1ab046ec36b1c44c041ac275568d818784d71fab9a5d95f9128cfe8a25051933

It assumes that the swhid found in the Disarchive metadata is valid, a
reasonable assumption IMO.

Thoughts?

Ludo’.

Comments

Timothy Sample May 18, 2021, 3:19 a.m. UTC | #1
Hello,

Ludovic Courtès <ludo@gnu.org> writes:

> This patch makes the ‘archival’ checker check the Disarchive database(s)
> when SWH ‘lookup-content’ returns #f.  [...]
>
> It assumes that the swhid found in the Disarchive metadata is valid, a
> reasonable assumption IMO.
>
> Thoughts?

One thing to consider is that just because Disarchive has captured an
archive’s metadata and computed the SWHID of its contents doesn’t mean
that the contents are actually in the SWH archive.  (Maybe that’s what
you meant when you wrote about valid IDs above.)  It would be neat if
the lint check looked up the SWHID to see if it exists.  Unfortunately,
Disarchive doesn’t make getting the underlying SWHID easy at the moment.
One option would be to pass a resolver to “disarchive-assemble” that
exfiltrates the ID using “set!”.  Another one would be to “read” the
specification and search for a form like ‘(swhid "swh:1:dir:...")’.
Neither is particularly lovely....

Other than that, the code looks good and everything seems to work.  :)


-- Tim
Ludovic Courtès May 18, 2021, 9:47 p.m. UTC | #2
Hi!

Timothy Sample <samplet@ngyro.com> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:
>
>> This patch makes the ‘archival’ checker check the Disarchive database(s)
>> when SWH ‘lookup-content’ returns #f.  [...]
>>
>> It assumes that the swhid found in the Disarchive metadata is valid, a
>> reasonable assumption IMO.
>>
>> Thoughts?
>
> One thing to consider is that just because Disarchive has captured an
> archive’s metadata and computed the SWHID of its contents doesn’t mean
> that the contents are actually in the SWH archive.  (Maybe that’s what
> you meant when you wrote about valid IDs above.)

Yes, I thought we could assume the contents were necessarily in the
archive.

> It would be neat if the lint check looked up the SWHID to see if it
> exists.  Unfortunately, Disarchive doesn’t make getting the underlying
> SWHID easy at the moment.  One option would be to pass a resolver to
> “disarchive-assemble” that exfiltrates the ID using “set!”.  Another
> one would be to “read” the specification and search for a form like
> ‘(swhid "swh:1:dir:...")’.  Neither is particularly lovely....

Hmm yeah.  There’s no API to deserialize the (disarchive …) sexp as a
record, right?

> Other than that, the code looks good and everything seems to work.  :)

Maybe we can assume (with a comment) that the SWHID points to a valid
content, and when creating the Disarchive database we actually make sure
this is the case?

More generally, we need to talk about that database, how to create it
and maintain it.  :-)

Thanks,
Ludo’.
diff mbox series

Patch

diff --git a/guix/download.scm b/guix/download.scm
index 72094e7318..b6eb97e6fa 100644
--- a/guix/download.scm
+++ b/guix/download.scm
@@ -35,6 +35,7 @@ 
   #:use-module (srfi srfi-1)
   #:use-module (srfi srfi-26)
   #:export (%mirrors
+            %disarchive-mirrors
             (url-fetch* . url-fetch)
             url-fetch/executable
             url-fetch/tarbomb
diff --git a/guix/lint.scm b/guix/lint.scm
index 1bebfe03d3..c6ad54ddeb 100644
--- a/guix/lint.scm
+++ b/guix/lint.scm
@@ -30,6 +30,7 @@ 
 
 (define-module (guix lint)
   #:use-module (guix store)
+  #:autoload   (guix base16) (bytevector->base16-string)
   #:use-module (guix base32)
   #:use-module (guix diagnostics)
   #:use-module (guix download)
@@ -1227,6 +1228,23 @@  upstream releases")
                             #:field 'source)))))))
 
 
+(define (lookup-disarchive-spec hash)
+  "Return true if Disarchive mirrors have a spec for HASH, false otherwise."
+  (any (lambda (mirror)
+         (with-networking-fail-safe
+          (format #f (G_ "failed to access Disarchive database at ~a")
+                  mirror)
+          #f
+          (let* ((url (string-append mirror
+                                     (symbol->string
+                                      (content-hash-algorithm hash))
+                                     "/"
+                                     (bytevector->base16-string
+                                      (content-hash-value hash))))
+                 (response (http-head url)))
+            (= 200 (response-code response)))))
+       %disarchive-mirrors))
+
 (define (check-archival package)
   "Check whether PACKAGE's source code is archived on Software Heritage.  If
 it's not, and if its source code is a VCS snapshot, then send a \"save\"
@@ -1302,10 +1320,15 @@  try again later")
                                         (symbol->string
                                          (content-hash-algorithm hash)))
                    (#f
-                    (list (make-warning package
-                                        (G_ "source not archived on Software \
-Heritage")
-                                        #:field 'source)))
+                    ;; If SWH doesn't have HASH as is, it may be because it's
+                    ;; a hand-crafted tarball.  In that case, check whether
+                    ;; the Disarchive database has an entry for that tarball.
+                    (if (lookup-disarchive-spec hash)
+                        '()
+                        (list (make-warning package
+                                            (G_ "source not archived on Software \
+Heritage and missing from the Disarchive database")
+                                            #:field 'source))))
                    ((? content?)
                     '())))
                '()))))