[bug#37224,0/4] Add 'archival' checker for 'guix lint'

Message ID 20190829231653.7607-1-ludo@gnu.org

Message

Ludovic Courtès Aug. 29, 2019, 11:16 p.m. UTC
Hello Guix!

This patch series adds an ‘archival’ checker for ‘guix lint’, documented
like this:

     Checks whether the package’s source code is archived at Software
     Heritage (https://www.softwareheritage.org).

     When the source code that is not archived comes from a
     version-control system (VCS)—e.g., it’s obtained with ‘git-fetch’,
     send Software Heritage a “save” request so that it eventually
     archives it.  This ensures that the source will remain available in
     the long term, and that Guix can fall back to Software Heritage
     should the source code disappear from its original host.  The
     status of recent “save” requests can be viewed on-line
     (https://archive.softwareheritage.org/save/#requests).

     When source code is a tarball obtained with ‘url-fetch’, simply
     print a message when it is not archived.  As of this writing
     Software Heritage does not allow requests to save arbitrary
     tarballs; we are working on ways to ensure that non-VCS source code
     is also archived.

     Software Heritage limits the request rate per IP address
     (https://archive.softwareheritage.org/api/#rate-limiting).  When
     the limit is reached, ‘guix lint’ prints a message and the
     ‘archival’ checker stops doing anything until that limit has been
     reset.
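
As a rough illustration of the behavior described above, here is a minimal Python sketch of the decision flow. It is not the Guile code from the patches: `Source`, `RateLimitReached`, `lint_archival`, the two callables, and the message strings are all hypothetical names chosen for this example, and no network access is involved.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Source:
    """Hypothetical stand-in for a package's source."""
    package: str
    method: str  # "git-fetch" or "url-fetch", as named in the description
    url: str


class RateLimitReached(Exception):
    """Raised by the (hypothetical) archive client when the quota is used up."""


def lint_archival(
    sources: Iterable[Source],
    is_archived: Callable[[Source], bool],
    request_save: Callable[[Source], None],
) -> List[str]:
    """Return the messages a checker along these lines might print."""
    messages = []
    for src in sources:
        try:
            if is_archived(src):
                continue  # already archived: nothing to report
            if src.method == "git-fetch":
                # VCS checkout: ask the archive to save it.
                request_save(src)
                messages.append(f"{src.package}: save request sent")
            else:
                # Tarball: a save cannot be requested, so only report it.
                messages.append(f"{src.package}: source not archived")
        except RateLimitReached:
            # Stop doing anything once the request limit has been reached.
            messages.append("rate limit reached; archival checks skipped")
            break
    return messages


# Usage with fake callables standing in for the archive client.
saved = []
sources = [
    Source("foo", "git-fetch", "https://example.org/foo.git"),
    Source("bar", "url-fetch", "https://example.org/bar-1.0.tar.gz"),
]
print(lint_archival(sources, lambda s: False, saved.append))
```

Passing the lookup and save operations in as callables keeps the sketch testable without contacting Software Heritage; a real client would raise the rate-limit condition from inside those calls.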

Currently, only 25% of our packages are not fetched with ‘url-fetch’.
For the remaining 75%, this checker can only report whether the tarball
is missing (and apart from ftp.gnu.org and a few other exceptions, it
usually _is_ missing) and cannot actually save it.

Anyway, it’s a first step in that direction.  Feedback welcome!

The second step will be to write a “lister” for Software Heritage that
grabs the list of source code URLs from
<https://guix.gnu.org/packages.json>.  That code would run at SWH
and it could potentially grab the tarballs, not just the VCS checkouts.
Here are examples:

  https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/packagist/lister.py
  https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/gnu/lister.py

It should be quite easy for a Pythonista to write something similar
for our ‘packages.json’.  Any takers?  :-)
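
To give an idea of the core of such a lister, here is a minimal Python sketch that walks a ‘packages.json’-like document and yields the origins to archive. The JSON layout used here (a list of entries with a "source" list whose items have a "type" of "git" or "url") is an assumption made for this example and may not match the real file; `SAMPLE` and `list_origins` are hypothetical names, and the sketch does not use the swh-lister framework at all.

```python
import json
from typing import Iterator, Tuple

# Assumed shape of the file; the real schema may differ.
SAMPLE = """
[
  {"name": "hello", "version": "2.10",
   "source": [{"type": "url",
               "urls": ["https://ftp.gnu.org/gnu/hello/hello-2.10.tar.gz"]}]},
  {"name": "example", "version": "0.1",
   "source": [{"type": "git",
               "git_url": "https://example.org/example.git",
               "git_ref": "v0.1"}]},
  {"name": "no-source", "version": "1.0"}
]
"""


def list_origins(packages_json: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (package name, kind, URL) for every source found."""
    for package in json.loads(packages_json):
        # Entries without a "source" field are simply skipped.
        for source in package.get("source", []):
            if source.get("type") == "git":
                yield package["name"], "git", source["git_url"]
            elif source.get("type") == "url":
                # A tarball may be reachable through several URLs.
                for url in source.get("urls", []):
                    yield package["name"], "tarball", url


for name, kind, url in list_origins(SAMPLE):
    print(name, kind, url)
```

A real lister would fetch the file over HTTP and hand each origin to the Software Heritage scheduler; both steps are left out here.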

Ludo’.

Ludovic Courtès (4):
  tests: 'with-http-server' accepts multiple responses.
  swh: Add hooks for rate limiting handling.
  swh: Make 'commit-id?' public.
  lint: Add 'archival' checker.

 doc/guix.texi         |  25 ++++++
 guix/lint.scm         |  96 +++++++++++++++++++++-
 guix/swh.scm          |  88 ++++++++++++++++-----
 guix/tests/http.scm   |  39 +++++----
 tests/derivations.scm |  12 +--
 tests/lint.scm        | 179 ++++++++++++++++++++++++++++++++----------
 tests/swh.scm         |  41 +++++++++-
 7 files changed, 395 insertions(+), 85 deletions(-)

Comments

Ludovic Courtès Sept. 2, 2019, 1:28 p.m. UTC | #1
Hello,

Ludovic Courtès <ludo@gnu.org> skribis:

>   tests: 'with-http-server' accepts multiple responses.
>   swh: Add hooks for rate limiting handling.
>   swh: Make 'commit-id?' public.
>   lint: Add 'archival' checker.

I went ahead and pushed these at commit
55549c7b9b778a79d3e1f3d085861ef36aabdca6.

I asked for feedback on #swh-devel and olasd (Nicolas Dandrimont), one
of the SWH developers, replied:

--8<---------------cut here---------------start------------->8---
<olasd> civodul: this seems like a sensible design to me; Does `guix lint`
	automatically call other network services? maybe the save request
	should be an optional flag  [13:55]
<olasd> (automatically _checking_ is fine; automatically _saving_, I don't
	know)
<civodul> olasd: there's a 'refresh' checker that calls out to services to
	  determine whether a newer version of the package is available, for
	  instance  [14:01]
<civodul> initially i thought about not saving at all, and just writing "you
	  should save this"
<civodul> but then i thought it's more convenient to just do it right away
<civodul> it's unlikely to send garbage anyway, and it'll necessarily send
	  only public code, and very likely only free code  [14:02]
<civodul> or did you have other concerns?
<olasd> I don't think it's going to be an issue for us  [14:08]
<olasd> I would just (personally) be surprised if a lint tool I'm using
	started to have side effects on somewhat unrelated systems :)  [14:09]
[...]

<civodul> olasd: ah true, though i guess we just got used to that ;-)  [14:12]
<civodul> anyway, thanks for your feedback!
<olasd> civodul: feel free to quote me by mail if you want to keep it archived
--8<---------------cut here---------------end--------------->8---

Ludo’.

Simon Tournier Sept. 11, 2019, 10:20 a.m. UTC | #2
Hi,

Nice!
And it is so aligned with their recent announcement [1] ;-)

[1] https://www.softwareheritage.org/2019/08/05/saving-and-referencing-research-software-in-software-heritage/

On Fri, 30 Aug 2019 at 01:18, Ludovic Courtès <ludo@gnu.org> wrote:

> Currently, only 25% of our packages are not fetched with ‘url-fetch’.
> For the remaining 75%, this checker can only report whether the tarball
> is missing (and apart from ftp.gnu.org and a few other exceptions, it
> usually _is_ missing) and cannot actually save it.

Maybe I am missing something, but for example guile-2.0 is not yet archived.
I am not able to find it with their search resources. And `guix lint
-c archival guile@2.0' reports "guile@2.0.14: source not archived on
Software Heritage".


> Anyway, it’s a first step in that direction.  Feedback welcome!

I agree with the words on #swh-devel by olasd (Nicolas Dandrimont) from
SWH that the automatic "save" should be optional (even if the default
is save=true).


> The second step will be to write a “lister” for Software Heritage that
> grabs the list of source code URLs from
> <https://guix.gnu.org/packages.json>.  That code would run at SWH
> and it could potentially grab the tarballs, not just the VCS checkouts.
> Here are examples:
>
>   https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/packagist/lister.py
>   https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/gnu/lister.py
>
> It should be quite easy for a Pythonista to write something similar
> for our ‘packages.json’.  Any takers?  :-)

I am not sure I understand everything, but I will take a look... I am reading
their GSoC about this topic [2].

[2] https://wiki.softwareheritage.org/wiki/Google_Summer_of_Code_2019/Increase_archive_coverage


All the best,
simon

Ludovic Courtès Sept. 12, 2019, 7:41 a.m. UTC | #3
Hello!

zimoun <zimon.toutoune@gmail.com> skribis:

> On Fri, 30 Aug 2019 at 01:18, Ludovic Courtès <ludo@gnu.org> wrote:
>
>> Currently, only 25% of our packages are not fetched with ‘url-fetch’.
>> For the remaining 75%, this checker can only report whether the tarball
>> is missing (and apart from ftp.gnu.org and a few other exceptions, it
>> usually _is_ missing) and cannot actually save it.
>
> Maybe I am missing something, but for example guile-2.0 is not yet archived.
> I am not able to find it with their search resources. And `guix lint
> -c archival guile@2.0' reports "guile@2.0.14: source not archived on
> Software Heritage".

Yeah, most not-too-recent tarballs from ftp.gnu.org are archived, so I
don’t know why this one is missing.  We’d have to check with them.

> I agree with the words on #swh-devel by olasd (Nicolas Dandrimont) from
> SWH that the automatic "save" should be optional (even if the default
> is save=true).

Maybe we could have a flag somewhere to turn it off?  The good thing of
having it on (or opt-out) is that we increase the chances that the code
we care about is archived.  :-)

>> The second step will be to write a “lister” for Software Heritage that
>> grabs the list of source code URLs from
>> <https://guix.gnu.org/packages.json>.  That code would run at SWH
>> and it could potentially grab the tarballs, not just the VCS checkouts.
>> Here are examples:
>>
>>   https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/packagist/lister.py
>>   https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/gnu/lister.py
>>
>> It should be quite easy for a Pythonista to write something similar
>> for our ‘packages.json’.  Any takers?  :-)
>
> I am not sure I understand everything, but I will take a look... I am reading
> their GSoC about this topic [2].

Awesome, thank you!  Having a “guix” lister in place would be perfect.

Ludo’.

Simon Tournier Sept. 12, 2019, 9:52 a.m. UTC | #4
Hi Ludo,

On Thu, 12 Sep 2019 at 09:41, Ludovic Courtès <ludo@gnu.org> wrote:

> zimoun <zimon.toutoune@gmail.com> skribis:
>
> > On Fri, 30 Aug 2019 at 01:18, Ludovic Courtès <ludo@gnu.org> wrote:
> >
> >> Currently, only 25% of our packages are not fetched with ‘url-fetch’.
> >> For the remaining 75%, this checker can only report whether the tarball
> >> is missing (and apart from ftp.gnu.org and a few other exceptions, it
> >> usually _is_ missing) and cannot actually save it.

And it is interesting that Nix has the same stats. ;-)

https://sympa.inria.fr/sympa/arc/swh-devel/2019-08/msg00024.html


> > Maybe I am missing something, but for example guile-2.0 is not yet archived.
> > I am not able to find it with their search resources. And `guix lint
> > -c archival guile@2.0' reports "guile@2.0.14: source not archived on
> > Software Heritage".
>
> Yeah, most not-too-recent tarballs from ftp.gnu.org are archived, so I
> don’t know why this one is missing.  We’d have to check with them.

Maybe I am wrong, but a bunch of GNU packages seem to be missing. :-)


> > I agree with the words on #swh-devel by olasd (Nicolas Dandrimont) from
> > SWH that the automatic "save" should be optional (even if the default
> > is save=true).
>
> Maybe we could have a flag somewhere to turn it off?  The good thing of
> having it on (or opt-out) is that we increase the chances that the code
> we care about is archived.  :-)

I agree. :-)


Speaking of UI, I would expect 2 different commands:

 - one to check if the package is in SWH, say:
    guix package <name> --is-in-swh
 - one to send a "save" request
    guix lint <name> -c archival

And add an option to turn "the push" off, say:
  guix lint <name> --no-archival

Because linting is generally an iterative process:
  guix lint <name>
  # fix mistake
  guix lint <name>
  # fix other mistake
  etc.
this would save network resources (latency, etc.) by avoiding checking
again and again during the lint process, I guess.

Or even something in this flavour should be a better UI:

  guix lint <name> --checkers=description,synopsis --no-checkers=license,archival
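
The selection logic behind such options could look roughly like the following Python sketch. It only models the proposal and is not how ‘guix lint’ parses its command line; `ALL_CHECKERS`, `parse_list`, and `select_checkers` are hypothetical names, and the checker list is limited to names mentioned in this thread.

```python
import argparse
from typing import List

# Checker names mentioned in this thread; the real list is longer.
ALL_CHECKERS = ["description", "synopsis", "license", "refresh", "archival"]


def parse_list(value: str) -> List[str]:
    """Split a comma-separated option value into names."""
    return [name for name in value.split(",") if name]


def select_checkers(argv: List[str]) -> List[str]:
    """Return the checkers to run, in the order they were requested."""
    parser = argparse.ArgumentParser(prog="lint-sketch")
    parser.add_argument("package")
    parser.add_argument("--checkers", type=parse_list, default=None)
    parser.add_argument("--no-checkers", type=parse_list, default=[])
    args = parser.parse_args(argv)

    # Without --checkers, every checker is a candidate.
    wanted = args.checkers if args.checkers is not None else ALL_CHECKERS
    unknown = set(wanted) | set(args.no_checkers)
    unknown -= set(ALL_CHECKERS)
    if unknown:
        parser.error("unknown checkers: " + ", ".join(sorted(unknown)))

    # Exclusions win over inclusions.
    return [name for name in wanted if name not in args.no_checkers]


print(select_checkers(["hello", "--no-checkers=archival"]))
```

With this shape, skipping the network-bound checkers during an iterative session only takes one extra option, and the default (everything enabled) stays unchanged.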

What do you think?



Cheers,
simon

Ludovic Courtès Sept. 13, 2019, 8:49 a.m. UTC | #5
Hi!

zimoun <zimon.toutoune@gmail.com> skribis:

> Or even something in this flavour should be a better UI:
>
>   guix lint <name> --checkers=description,synopsis --no-checkers=license,archival
>
> What do you think?

Good idea, this would be simple and effective!

Thanks,
Ludo’.