
[bug#42019,1/1] website: Add integrity to JSON sources.

Message ID 20200623152139.512-1-zimon.toutoune@gmail.com
State Accepted
Series sources.json compliant with SWH loader

Checks

Context                 Check  Description
cbaines/applying patch  fail   View Laminar job

Commit Message

Simon Tournier June 23, 2020, 3:21 p.m. UTC
* website/apps/packages/builder.scm (origin->json): Add integrity field using
SRI format.
---
 website/apps/packages/builder.scm | 25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)

Comments

Ludovic Courtès June 27, 2020, 5:05 p.m. UTC | #1
Hi!

zimoun <zimon.toutoune@gmail.com> skribis:

> * website/apps/packages/builder.scm (origin->json): Add integrity field using
> SRI format.

[...]

> -             `(("url" . ,(list->vector
> +             `(("urls" . ,(list->vector
>                            (resolve
>                             (match uri
>                               ((? string? url) (list url))

Is this change OK for Repology?  Or should we keep “url” in addition to
“urls”?
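
To make the compatibility question concrete, here is a minimal Python sketch of how a reader of sources.json could tolerate both spellings.  It assumes one entry of "sources" has already been parsed into a dict; the helper name `source_urls` is hypothetical and is not taken from Repology or from the SWH loader.

```python
def source_urls(entry):
    """Return the list of URLs of one sources.json entry.

    Prefers the new "urls" key and falls back to the old "url" key.
    Hypothetical helper, for illustration only.
    """
    urls = entry.get("urls", entry.get("url", []))
    # Tolerate a bare string as well as a list of strings.
    if isinstance(urls, str):
        return [urls]
    return list(urls)
```

A consumer written this way would not care whether we keep “url”, switch to “urls”, or emit both.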

>    (make-page "sources.json"
>               `(("sources" . ,(list->vector (map package->json (all-packages))))
> -               ("version" . "1"))
> +               ("version" . "1")
> +               ("revision" . ,%guix-version))

There’s no guarantee that ‘%guix-version’ is a commit ID, so perhaps we
should do something like:

  (match (current-profile)
    (#f %guix-version)   ;for lack of a better ID
    (profile
     (let ((channel (find guix-channel? (profile-channels profile))))
       (channel-commit channel))))
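
For a reader of sources.json, the difference shows up as follows.  This is only a Python sketch under the assumption that commit IDs are 40-digit hexadecimal SHA-1 strings; the function name `looks_like_commit_id` is made up for the example.

```python
import re

# Assumption: a Git commit ID is 40 lowercase hexadecimal digits (SHA-1).
_COMMIT_RE = re.compile(r"[0-9a-f]{40}")


def looks_like_commit_id(revision):
    """Return True if REVISION has the shape of a full Git commit ID."""
    return _COMMIT_RE.fullmatch(revision) is not None
```

A value such as "1.1.0" would fail this check, so a consumer could not use it to pin a precise Guix revision.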

Otherwise LGTM, thank you!

Ludo’.
Simon Tournier June 27, 2020, 5:41 p.m. UTC | #2
Hi Ludo,

Thank you for the review.

On Sat, 27 Jun 2020 at 19:05, Ludovic Courtès <ludo@gnu.org> wrote:

>> -             `(("url" . ,(list->vector
>> +             `(("urls" . ,(list->vector
>>                            (resolve
>>                             (match uri
>>                               ((? string? url) (list url))
>
> Is this change OK for Repology?  Or should we keep “url” in addition to
> “urls”?

From what I understood of their API [1] when I checked it, I would say yes. :-)
I do not think that Repology parses the 'sources' field.

1: https://repology.org/addrepo


> There’s no guarantee that ‘%guix-version’ is a commit ID, so perhaps we
> should do something like:

Thanks for the tip, I did not know.  I will send a v2 with your
suggestion, or feel free to update the patch and push it. :-)


Cheers,
simon
Simon Tournier June 29, 2020, 5:01 p.m. UTC | #3
Hi Ludo,

On Sat, 27 Jun 2020 at 19:42, zimoun <zimon.toutoune@gmail.com> wrote:

> Thanks for the tip, I did not know.  I will send a v2 with your
> suggestion, or feel free to update the patch and push it. :-)

v2 is sent.

BTW, in the SWH picture and after a video chat with lewo, I do not
think that the website is the right place.  Instead, it should go to
ci.guix.gnu.org or data.guix.gnu.org.  Or maybe be integrated with "guix
publish".  Well, the next step is to have a collection of sources.json
-- that is the point of "revision".  WDYT?

Cheers,
simon
Ludovic Courtès June 29, 2020, 8:41 p.m. UTC | #4
Hi,

zimoun <zimon.toutoune@gmail.com> skribis:

> BTW, in the SWH picture and after a video chat with lewo, I do not
> think that the website is the right place.  Instead, it should go to
> ci.guix.gnu.org or data.guix.gnu.org.  Or maybe be integrated with "guix
> publish".  Well, the next step is to have a collection of sources.json
> -- that is the point of "revision".  WDYT?

The Guix Data Service would be a natural place for ‘sources.json’ IMO.
Thoughts, Chris?

Thanks,
Ludo’.
Simon Tournier June 29, 2020, 11:28 p.m. UTC | #5
Hi Chris,

On Mon, 29 Jun 2020 at 22:41, Ludovic Courtès <ludo@gnu.org> wrote:
> zimoun <zimon.toutoune@gmail.com> skribis:
>
>> BTW, in the SWH picture and after a video chat with lewo, I do not
>> think that the website is the right place.  Instead, it should go to
>> ci.guix.gnu.org or data.guix.gnu.org.  Or maybe be integrated with "guix
>> publish".  Well, the next step is to have a collection of sources.json
>> -- that is the point of "revision".  WDYT?
>
> The Guix Data Service would be a natural place for ‘sources.json’ IMO.
> Thoughts, Chris?

If it goes to the GDS, then first, please point me to where to start. :-)

And second, it would be nice in the "near" future to have at least 2
sources.json: one for the last commit, refreshed every X minutes (or
hours), and another one containing the concatenation of all the sources
of Guix (at least those reachable by guix time-machine, i.e., after the
big overhaul of Inferiors).  I will go on #swh-devel or reach lewo to
know how "near" it is on the SWH side.


Cheers,
simon
Christopher Baines July 1, 2020, 7:35 p.m. UTC | #6
zimoun <zimon.toutoune@gmail.com> writes:

> Hi Chris,
>
> On Mon, 29 Jun 2020 at 22:41, Ludovic Courtès <ludo@gnu.org> wrote:
>> zimoun <zimon.toutoune@gmail.com> skribis:
>>
>>> BTW, in the SWH picture and after a video chat with lewo, I do not
>>> think that the website is the right place.  Instead, it should go to
>>> ci.guix.gnu.org or data.guix.gnu.org.  Or maybe be integrated with "guix
>>> publish".  Well, the next step is to have a collection of sources.json
>>> -- that is the point of "revision".  WDYT?
>>
>> The Guix Data Service would be a natural place for ‘sources.json’ IMO.
>> Thoughts, Chris?
>
> If it goes to the GDS, then first, please point me to where to start. :-)

I think this does sound like a good use of the Guix Data
Service. Unfortunately, the sources of packages aren't currently stored
in the Guix Data Service database, so I'm guessing this will require
storing some new data, then working out how to present it.

A question for you, Simon: what would be the perfect data for this
particular use case? I gather it's something about the (source ...)
field in packages, probably for all the exported (plus maybe
non-exported) packages.

> And second, it would be nice in the "near" future to have at least 2
> sources.json: one for the last commit, refreshed every X minutes (or
> hours), and another one containing the concatenation of all the sources
> of Guix (at least those reachable by guix time-machine, i.e., after the
> big overhaul of Inferiors).  I will go on #swh-devel or reach lewo to
> know how "near" it is on the SWH side.

Once you can get the data for an individual revision in the Guix Data
Service, it should be reasonably easy to just get the data for multiple
revisions, say all for the last week.
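
As a rough illustration of the "multiple revisions" case, here is a Python sketch that concatenates the "sources" of several already-parsed sources.json documents and drops exact duplicates.  It only assumes the top-level "sources" key shown in the patch; the function name `merge_sources` is hypothetical, and nothing here reflects how the Guix Data Service would actually store or serve the data.

```python
import json


def merge_sources(documents):
    """Concatenate the "sources" of several parsed sources.json documents.

    Exact duplicate entries are dropped; first-seen order is kept.
    Hypothetical helper, for illustration only.
    """
    seen = set()
    merged = []
    for doc in documents:
        for entry in doc.get("sources", []):
            # Entries are dicts that may contain lists, so they are not
            # hashable; use a canonical JSON string as the dedup key.
            key = json.dumps(entry, sort_keys=True)
            if key not in seen:
                seen.add(key)
                merged.append(entry)
    return merged
```

Since most sources do not change from one revision to the next, deduplication of this kind should keep the combined list much smaller than the plain concatenation.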

Chris
Simon Tournier July 1, 2020, 8:29 p.m. UTC | #7
Hi Chris,

On Wed, 01 Jul 2020 at 20:35, Christopher Baines <mail@cbaines.net> wrote:

> A question for you, Simon: what would be the perfect data for this
> particular use case? I gather it's something about the (source ...)
> field in packages, probably for all the exported (plus maybe
> non-exported) packages.

Currently the website builds sources.json by using 'fold-packages'
(traversing all the modules and returning all the public variables, if I
read it correctly), then excluding 'package-superseded' and
'package-replacement'.

Well, maybe an example is simpler than a lot of words.  The resulting
JSON looks like:

--8<---------------cut here---------------start------------->8---
    {
      "type": "url",
      "urls": [
        "https://ftpmirror.gnu.org/gnu/a2ps/a2ps-4.14.tar.gz",
        "ftp://ftp.cs.tu-berlin.de/pub/gnu/a2ps/a2ps-4.14.tar.gz",
        "ftp://ftp.funet.fi/pub/mirrors/ftp.gnu.org/gnu/a2ps/a2ps-4.14.tar.gz",
        "http://ftp.gnu.org/pub/gnu/a2ps/a2ps-4.14.tar.gz"
      ],
      "integrity": "sha256-866NPUVkpBtuKiHyN9LysQT0gQhZHouDSXUAGCo6s6Q="
    },
    {
      "type": "git",
      "git_url": "https://github.com/opencog/agi-bio.git",
      "git_ref": "b5c6f3d99e8cca3798bf0cdf2c32f4bdb8098efb"
    },
--8<---------------cut here---------------end--------------->8---

So basically, the data are: origin-method, origin-uri (implies reference
URLs and {git,hg,svn}-{commit,revision}), origin-hash (implies
content-hash-{value,algorithm}).  Note that the list of mirrors is
necessary too.
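
For reference, the "integrity" string above is the hash algorithm name, a dash, and the base64 encoding of the raw digest, as the patch builds it.  A minimal Python sketch of producing and decoding such a string follows; it is not the Guile code of the patch, and the function names `sri_from_bytes` and `sri_to_hex` are made up for the example.

```python
import base64
import hashlib


def sri_from_bytes(data, algorithm="sha256"):
    """Return "<algorithm>-<base64 digest>" for the bytes DATA."""
    digest = hashlib.new(algorithm, data).digest()
    return algorithm + "-" + base64.b64encode(digest).decode("ascii")


def sri_to_hex(integrity):
    """Split an integrity string into (algorithm, hexadecimal digest)."""
    algorithm, _, encoded = integrity.partition("-")
    return algorithm, base64.b64decode(encoded).hex()
```

A consumer could download one of the "urls", feed the bytes to `sri_from_bytes`, and compare the result with the "integrity" field of the entry.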

I have had a look at

  http://git.savannah.gnu.org/cgit/guix/data-service.git/tree/

but I am not sure I understand where the SQL table is defined.


Thanks,
simon

Patch

diff --git a/website/apps/packages/builder.scm b/website/apps/packages/builder.scm
index d2bccd7..e20d672 100644
--- a/website/apps/packages/builder.scm
+++ b/website/apps/packages/builder.scm
@@ -46,6 +46,8 @@ 
   #:use-module (guix hg-download)
   #:use-module (guix utils)                       ;location
   #:use-module ((guix build download) #:select (maybe-expand-mirrors))
+  #:use-module ((guix base64) #:select (base64-encode))
+  #:use-module ((guix config) #:select (%guix-version))
   #:use-module (json)
   #:use-module (ice-9 match)
   #:use-module ((web uri) #:select (string->uri uri->string))
@@ -114,7 +116,7 @@ 
     ,@(cond ((or (eq? url-fetch method)
                  (eq? url-fetch/tarbomb method)
                  (eq? url-fetch/zipbomb method))
-             `(("url" . ,(list->vector
+             `(("urls" . ,(list->vector
                           (resolve
                            (match uri
                              ((? string? url) (list url))
@@ -128,6 +130,16 @@ 
             ((eq? hg-fetch method)
              `(("hg_url" . ,(hg-reference-url uri))))
             (else '()))
+    ,@(if (or (eq? url-fetch method)
+              (eq? url-fetch/tarbomb method)
+              (eq? url-fetch/zipbomb method))
+          (let* ((content-hash (origin-hash origin))
+                 (hash-value (content-hash-value content-hash))
+                 (hash-algorithm (content-hash-algorithm content-hash))
+                 (algorithm-string (symbol->string hash-algorithm)))
+            `(("integrity" . ,(string-append algorithm-string "-"
+                                             (base64-encode hash-value)))))
+          '())
     ,@(if (eq? method git-fetch)
           `(("git_ref" . ,(git-reference-commit uri)))
           '())
@@ -174,9 +186,11 @@ 
              scm->json))
 
 (define (sources-json-builder)
-  "Return a JSON page listing all the sources.
-
-See <https://forge.softwareheritage.org/D2025#51269>."
+  "Return a JSON page listing all the sources."
+  ;; The Software Heritage format is described here:
+  ;; https://forge.softwareheritage.org/source/swh-loader-core/browse/master/swh/loader/package/nixguix/tests/data/https_nix-community.github.io/nixpkgs-swh_sources.json
+  ;; And the loader is implemented here:
+  ;; https://forge.softwareheritage.org/source/swh-loader-core/browse/master/swh/loader/package/nixguix/
   (define (package->json package)
     `(,@(if (origin? (package-source package))
             (origin->json (package-source package))
@@ -185,7 +199,8 @@  See <https://forge.softwareheritage.org/D2025#51269>."
 
   (make-page "sources.json"
              `(("sources" . ,(list->vector (map package->json (all-packages))))
-               ("version" . "1"))
+               ("version" . "1")
+               ("revision" . ,%guix-version))
              scm->json))
 
 (define (index-builder)