
[bug#51307,0/2] guix hash: eases conversion

Message ID 20211020165020.3358311-1-zimon.toutoune@gmail.com

Message

Simon Tournier Oct. 20, 2021, 4:50 p.m. UTC
Hi,

The first patch is a tiny improvement on the error handling.

 1. The current code does not handle errors correctly because of
    ‘with-error-handling’.
 2. Using the ‘--recursive’ option changes the result for a tarball, as with:

      $ guix hash $(guix build hello -S)
      0ssi1wpaf7plaswqqjwigppsg5fyh99vdlb9kzl7c9lng89ndq1i

      $ guix hash $(guix build hello -S) --recursive
      1qx3qqk86vgdvpqkhpgzq3gfcxmys29wzfizjb9asn4crbn503x9

    And I cannot imagine a case where that is wanted.  To me, it should be a
    fixed point.  That is what the first patch corrects.

Moreover, it becomes possible to pass several arguments:

--8<---------------cut here---------------start------------->8---
$ find . -maxdepth 1 -type d | xargs guix hash -r
guix hash: error: wrong number of arguments

$ find . -maxdepth 1 -type d | xargs ./pre-inst-env guix hash -r
1rzh9b4b4qc5nf4mq601jr2p3xsw690q6d4137ymgq0an9xsli9v
1cgdvnjlh1ziwb12ax2wcrs7ddr44c2nhjali1v3ilsv7fsm79fq
0x64hc3jqq1jwbym5gvcbnsck4v08xxa3kr44m9961nsml1rpmld
03gzaccd1cws05sf469l9ghf9mhxqsnlkkbr859l13alba5isirb
1qmmppfg65wdzcg137hg62ic31ykzvgb26brcv77is1nszvrqm14
1ajw5s2ykyyvpaisv8xbd8rn77q1whk2fxmyfqn3qyzxjf8vw7sz
1fjnk5hsfvsyahf997f6nca5c01jh7gm590xcx2d2adjj2vm51r2
0hm8s9hc6c4x32v3ff0kz7npd1n2i3ld6p69ya68wxfhhkhwpg6r
1k1y2hax62r2jj7j8vk8wx6mhww42g77x1fp7iy151alplv6mi23
1c3dg3mfl4kg0px7rdj52qyxkpn00sdaf7z1bxib4n2wy175gd9m
15680dqbzr7dcngyqblyzqnr5s74rka4qh76n2pdfndd9gc81j0h
0hvlnas7grx69hrxbxz3zw9z80wr02m2c0lbjs0kcxv6wv3da871
1zvw0k4gl3sj3hagp415iy0dcqx8c1k3zwph3n1xcg0z2ljfqpl2
--8<---------------cut here---------------end--------------->8---
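With a `guix hash` that accepts only one argument (hence the error in the first command above), one workaround is to invoke it once per directory. Note also that the multi-argument output above does not say which hash belongs to which directory. The Bash sketch below runs a given hashing command once for each immediate subdirectory and labels every line; `hash_dirs` is a hypothetical helper name, not part of Guix, and the sketch assumes a `find`/`sort` pair supporting NUL-separated output (`-print0`, `-z`), as GNU and BSD tools do.

```shell
#!/usr/bin/env bash
# hash_dirs CMD [ARGS...]
# Hypothetical helper: runs "CMD ARGS... DIR" once for every immediate
# subdirectory DIR of the current directory and prints "HASH  DIR".
hash_dirs() {
  local dir
  find . -mindepth 1 -maxdepth 1 -type d -print0 | sort -z |
    while IFS= read -r -d '' dir; do
      # </dev/null keeps CMD from consuming the directory list on stdin.
      printf '%s  %s\n' "$("$@" "$dir" < /dev/null)" "$dir"
    done
}

# Usage (one `guix hash' invocation per directory):
#   hash_dirs guix hash -r
```

Unlike the `xargs` form, this does not stop at the first failure; a directory whose hashing fails simply gets an empty hash column.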



Then, when working on Disarchive, which uses base16 encoding, the current
way is annoying for two reasons:

  a) it requires downloading all the sources;
  b) it sometimes requires applying patches.

Compare:

  $ guix hash $(guix build ceph -S)
  0ppd362s177cc47g75v0k27j7aaf27qc31cbbh0j2g30wmhl8gj7

with the checksum in the package definition:
0lmdri415hqczc9565s5m5568pnj97ipqxgnw6085kps0flwq5zh.

With the second patch, it becomes easy to convert the checksum from upstream:

  $ ./pre-inst-env guix hash ceph -f base16
  f017cca903face8280e1f6757ce349d25e644aa945175312fb0cc31248ccad52

and nothing is downloaded.  Getting the checksum of what Guix actually
builds is still done the current way, for instance:

   $ guix hash $(guix build ceph -S) -f base16
   473e4461e5603c21015c8b85c1f0114ea9238f986097f30e61ec9ca08519ed5e

whereas the second patch converts the checksum from the package
definition (without downloading anything).

For instance, now it is really cheap to do:

--8<---------------cut here---------------start------------->8---
guix package -A | cut -f1 | grep julia | xargs ./pre-inst-env guix hash -f base16
--8<---------------cut here---------------end--------------->8---
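Why no download is needed: the `sha256` value in a package definition is the same 32-byte digest as the base16 one, merely written in Nix-style base32 (alphabet `0123456789abcdfghijklmnpqrsvwxyz`, where the last character carries the low 5 bits of the first byte). The Bash function below is a minimal, illustrative decoder, not Guix's implementation (Guix does this in Guile); `nix32_to_base16` is a hypothetical name.

```shell
#!/usr/bin/env bash
# nix32_to_base16 STRING
# Decode a Nix-style base32 string and print the same digest in base16.
nix32_to_base16() {
  local s=$1 alphabet=0123456789abcdfghijklmnpqrsvwxyz
  local len=${#s}
  local size=$(( len * 5 / 8 ))     # 52 characters -> 32 bytes (SHA-256)
  local -a bytes=()
  local k n b i j c prefix digit
  for (( i = 0; i < size; i++ )); do bytes[i]=0; done
  for (( k = 0; k < len; k++ )); do
    c=${s:k:1}
    prefix=${alphabet%%"$c"*}        # alphabet characters before C
    if (( ${#prefix} == ${#alphabet} )); then
      echo "nix32_to_base16: invalid character '$c'" >&2
      return 1
    fi
    digit=${#prefix}                 # 5-bit value of C
    n=$(( len - 1 - k ))             # the last character is 5-bit group 0
    b=$(( n * 5 )); i=$(( b / 8 )); j=$(( b % 8 ))
    bytes[i]=$(( (bytes[i] | (digit << j)) & 255 ))
    if (( i + 1 < size )); then      # high bits spill into the next byte
      bytes[i+1]=$(( bytes[i+1] | (digit >> (8 - j)) ))
    fi
  done
  for (( i = 0; i < size; i++ )); do printf '%02x' "${bytes[i]}"; done
  printf '\n'
}

# Usage, on the ceph checksum from the package definition quoted above:
#   nix32_to_base16 0lmdri415hqczc9565s5m5568pnj97ipqxgnw6085kps0flwq5zh
```

Applied to the ceph checksum, this is expected to print the same `f017cc…` value as the patched `guix hash ceph -f base16` shown earlier.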


All the best,
simon



zimoun (2):
  scripts: hash: Improve error handling.
  scripts: hash: Support file or package.

 guix/scripts/hash.scm | 75 ++++++++++++++++++++++++++++++-------------
 tests/guix-hash.sh    | 10 ++++++
 2 files changed, 63 insertions(+), 22 deletions(-)


base-commit: 19d3cfec72720a4a1339be3d14f4d88ae5bd59f4
--
2.32.0

Comments

Ludovic Courtès Oct. 30, 2021, 2:53 p.m. UTC | #1
Hi,

zimoun <zimon.toutoune@gmail.com> skribis:

>  2. Using the ‘--recursive’ option changes the result for a tarball, as with:
>
>       $ guix hash $(guix build hello -S)
>       0ssi1wpaf7plaswqqjwigppsg5fyh99vdlb9kzl7c9lng89ndq1i
>
>       $ guix hash $(guix build hello -S) --recursive
>       1qx3qqk86vgdvpqkhpgzq3gfcxmys29wzfizjb9asn4crbn503x9
>
>     And I cannot imagine a case where that is wanted.  To me, it should be a
>     fixed point.  That is what the first patch corrects.

That’s expected: ‘--recursive’ uses a different computation method,
including file metadata (technically, it serializes the file as a nar
and computes the hash of the nar).
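To make the difference concrete for a single regular file such as a tarball: without `--recursive`, the hash covers the file's bytes; with it, the hash covers a nar that wraps those bytes with a type tag and the executable bit, so the two digests necessarily differ. Below is a minimal Bash sketch of the nar encoding of one regular file, following the nar format used by Nix and Guix (every string is an 8-byte little-endian length, the bytes, then zero padding to a multiple of 8). It is illustrative only: it does not handle directories or symlinks, it approximates the owner-execute bit with `[ -x ]`, and the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# nar_u64 N: write N as 8 little-endian bytes.
nar_u64() {
  local n=$1 k
  for (( k = 0; k < 8; k++ )); do
    printf "\\$(printf '%03o' $(( (n >> (8 * k)) & 255 )))"
  done
}

# nar_pad LEN: write the zero bytes that round LEN up to a multiple of 8.
nar_pad() {
  local k pad=$(( (8 - $1 % 8) % 8 ))
  for (( k = 0; k < pad; k++ )); do printf '\000'; done
}

# nar_str STRING: length, bytes, padding (ASCII strings only).
nar_str() {
  nar_u64 "${#1}"
  printf '%s' "$1"
  nar_pad "${#1}"
}

# nar_regular_file FILE: write the nar serialization of FILE to stdout.
nar_regular_file() {
  local f=$1 size
  size=$(( $(wc -c < "$f") ))
  nar_str nix-archive-1
  nar_str '('
  nar_str type
  nar_str regular
  if [ -x "$f" ]; then          # approximation of the owner-execute bit
    nar_str executable
    nar_str ''
  fi
  nar_str contents
  nar_u64 "$size"
  cat -- "$f"
  nar_pad "$size"
  nar_str ')'
}

# Usage: flat hash versus nar hash of the same file.
#   sha256sum hello-2.10.tar.gz
#   nar_regular_file hello-2.10.tar.gz | sha256sum
```

The second digest is expected to match `guix hash -r -f base16` on the same file, while the first matches `guix hash -f base16`.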

[...]

> Then, when working on Disarchive, which uses base16 encoding, the current
> way is annoying for two reasons:
>
>   a) it requires downloading all the sources;
>   b) it sometimes requires applying patches.
>
> Compare:
>
>   $ guix hash $(guix build ceph -S)
>   0ppd362s177cc47g75v0k27j7aaf27qc31cbbh0j2g30wmhl8gj7
>
> with the checksum in the package definition:
> 0lmdri415hqczc9565s5m5568pnj97ipqxgnw6085kps0flwq5zh.
>
> With the second patch, it becomes easy to convert the checksum from upstream:
>
>   $ ./pre-inst-env guix hash ceph -f base16
>   f017cca903face8280e1f6757ce349d25e644aa945175312fb0cc31248ccad52
>
> and nothing is downloaded.  Getting the checksum of what Guix actually
> builds is still done the current way, for instance:
>
>    $ guix hash $(guix build ceph -S) -f base16
>    473e4461e5603c21015c8b85c1f0114ea9238f986097f30e61ec9ca08519ed5e
>
> whereas the second patch converts the checksum from the package
> definition (without downloading anything).

Ah yes, got it.  (I should read messages in the right order, oops!)

An obvious problem with the interface you propose is that it’s
ambiguous: are you printing the hash of the ‘ceph’ package, or computing
that of the ‘ceph’ file?  I’m sure the Zen of Python has something on
ambiguity.  ;-)

Do you think there’s another place where we could provide helpers for
the die-hard Disarchive hackers among us?  Maybe we could get ‘guix lint
-c archival’ to print Disarchive URLs upon failure, and that’d already
help?

WDYT?

Thanks!

Ludo’.
Simon Tournier Oct. 30, 2021, 3:19 p.m. UTC | #2
Hi Ludo,

On Sat, 30 Oct 2021 at 16:53, Ludovic Courtès <ludo@gnu.org> wrote:
> zimoun <zimon.toutoune@gmail.com> skribis:
>
>>  2. Using the ‘--recursive’ option changes the result for a tarball, as with:
>>
>>       $ guix hash $(guix build hello -S)
>>       0ssi1wpaf7plaswqqjwigppsg5fyh99vdlb9kzl7c9lng89ndq1i
>>
>>       $ guix hash $(guix build hello -S) --recursive
>>       1qx3qqk86vgdvpqkhpgzq3gfcxmys29wzfizjb9asn4crbn503x9
>>
>>     And I cannot imagine a case where that is wanted.  To me, it should be a
>>     fixed point.  That is what the first patch corrects.
>
> That’s expected: ‘--recursive’ uses a different computation method,
> including file metadata (technically, it serializes the file as a nar
> and computes the hash of the nar).

Yes, but that’s odd.  It should be the same computation method for
tarballs.  Nothing is recursive for a tarball; therefore, the option
should be ignored.  This proposal is perhaps not the best approach,
although I lacked imagination about corner cases.


>> Then, when working on Disarchive, which uses base16 encoding, the current
>> way is annoying for two reasons:
>>
>>   a) it requires downloading all the sources;
>>   b) it sometimes requires applying patches.
>>
>> Compare:
>>
>>   $ guix hash $(guix build ceph -S)
>>   0ppd362s177cc47g75v0k27j7aaf27qc31cbbh0j2g30wmhl8gj7
>>
>> with the checksum in the package definition:
>> 0lmdri415hqczc9565s5m5568pnj97ipqxgnw6085kps0flwq5zh.
>>
>> With the second patch, it becomes easy to convert the checksum from upstream:
>>
>>   $ ./pre-inst-env guix hash ceph -f base16
>>   f017cca903face8280e1f6757ce349d25e644aa945175312fb0cc31248ccad52
>>
>> and nothing is downloaded.  Getting the checksum of what Guix actually
>> builds is still done the current way, for instance:
>>
>>    $ guix hash $(guix build ceph -S) -f base16
>>    473e4461e5603c21015c8b85c1f0114ea9238f986097f30e61ec9ca08519ed5e
>>
>> whereas the second patch converts the checksum from the package
>> definition (without downloading anything).
>
> Ah yes, got it.  (I should read messages in the right order, oops!)
>
> An obvious problem with the interface you propose is that it’s
> ambiguous: are you printing the hash of the ‘ceph’ package, or computing
> that of the ‘ceph’ file?  I’m sure the Zen of Python has something on
> ambiguity.  ;-)

The patch prints the upstream hash, and that is the only hash which
matters, both for packaging and for Disarchive.
Therefore, there is no ambiguity here.  Rather, the ambiguity comes from
“guix build --source”, whose result cannot be predicted beforehand.

For instance, can you guess what “guix build -S graphviz” returns? ;-)
And can you guess the hash?


> Do you think there’s another place where we could provide helpers for
> the die-hard Disarchive hackers among us?  Maybe we could get ‘guix lint
> -c archival’ to print Disarchive URLs upon failure, and that’d already
> help?

To me, “guix hash” is about hashing, so it seems the right place for
getting the hash of something.  For instance, I do not find “guix lint -c
archival” the right place for sending a save request to SWH, as olasd said
at the time, IIRC. :-) However, the good thing is that “guix lint <pkg>”
just works (for archiving). :-)

Last, I do not want Disarchive URLs upon failure; I would like hashes and
upstream URLs on request. :-)

Well, I do not know.  What could be better?  Another subcommand, “guix
archival”, doing all this plumbing: saving, displaying hashes, upstream
URLs, Disarchive URLs, etc.?

WDYT?

Cheers,
simon
Simon Tournier Oct. 30, 2021, 3:24 p.m. UTC | #3
Re,

On Sat, 30 Oct 2021 at 17:19, zimoun <zimon.toutoune@gmail.com> wrote:

>> An obvious problem with the interface you propose is that it’s
>> ambiguous: are you printing the hash of the ‘ceph’ package, or computing
>> that of the ‘ceph’ file?  I’m sure the Zen of Python has something on
>> ambiguity.  ;-)

[...]

> Well, I do not know.  What could be better?  Another subcommand, “guix
> archival”, doing all this plumbing: saving, displaying hashes, upstream
> URLs, Disarchive URLs, etc.?

Ah, I forgot.  The Zen of Python says:

        Now is better than never.

and the proposed patch does the things I need “now”, whereas this
hypothetical other subcommand falls into “never”. :-)


Cheers,
simon
Ludovic Courtès Oct. 31, 2021, 2:03 p.m. UTC | #4
Hi!

zimoun <zimon.toutoune@gmail.com> skribis:

> On Sat, 30 Oct 2021 at 16:53, Ludovic Courtès <ludo@gnu.org> wrote:
>> zimoun <zimon.toutoune@gmail.com> skribis:
>>
>>>  2. Using the ‘--recursive’ option changes the result for a tarball, as with:
>>>
>>>       $ guix hash $(guix build hello -S)
>>>       0ssi1wpaf7plaswqqjwigppsg5fyh99vdlb9kzl7c9lng89ndq1i
>>>
>>>       $ guix hash $(guix build hello -S) --recursive
>>>       1qx3qqk86vgdvpqkhpgzq3gfcxmys29wzfizjb9asn4crbn503x9
>>>
>>>     And I cannot imagine a case where that is wanted.  To me, it should be a
>>>     fixed point.  That is what the first patch corrects.
>>
>> That’s expected: ‘--recursive’ uses a different computation method,
>> including file metadata (technically, it serializes the file as a nar
>> and computes the hash of the nar).
>
> Yes, but that’s odd.  It should be the same computation method for
> tarballs.  Nothing is recursive for a tarball; therefore, the option
> should be ignored.  This proposal is perhaps not the best approach,
> although I lacked imagination about corner cases.

The way I see it, ‘guix hash’ is a low-level tool and it should do what
I ask for and not try to second-guess.

>>> Then, when working on Disarchive, which uses base16 encoding, the current
>>> way is annoying for two reasons:
>>>
>>>   a) it requires downloading all the sources;
>>>   b) it sometimes requires applying patches.
>>>
>>> Compare:
>>>
>>>   $ guix hash $(guix build ceph -S)
>>>   0ppd362s177cc47g75v0k27j7aaf27qc31cbbh0j2g30wmhl8gj7
>>>
>>> with the checksum in the package definition:
>>> 0lmdri415hqczc9565s5m5568pnj97ipqxgnw6085kps0flwq5zh.
>>>
>>> With the second patch, it becomes easy to convert the checksum from upstream:
>>>
>>>   $ ./pre-inst-env guix hash ceph -f base16
>>>   f017cca903face8280e1f6757ce349d25e644aa945175312fb0cc31248ccad52
>>>
>>> and nothing is downloaded.  Getting the checksum of what Guix actually
>>> builds is still done the current way, for instance:
>>>
>>>    $ guix hash $(guix build ceph -S) -f base16
>>>    473e4461e5603c21015c8b85c1f0114ea9238f986097f30e61ec9ca08519ed5e
>>>
>>> whereas the second patch converts the checksum from the package
>>> definition (without downloading anything).
>>
>> Ah yes, got it.  (I should read messages in the right order, oops!)
>>
>> An obvious problem with the interface you propose is that it’s
>> ambiguous: are you printing the hash of the ‘ceph’ package, or computing
>> that of the ‘ceph’ file?  I’m sure the Zen of Python has something on
>> ambiguity.  ;-)
>
> The patch prints the upstream hash, and that is the only hash which
> matters, both for packaging and for Disarchive.
> Therefore, there is no ambiguity here.

Sorry, I think I wasn’t clear.  Consider this:

  touch ceph
  guix hash ceph

What does it print?

If the result depends on external context (the presence or not of a
‘ceph’ file in $PWD), that’s a brittle interface IMO.

This could be addressed by requiring users to be explicit, along these
lines:

  guix hash ceph    # compute the hash of the file called ‘ceph’
  guix hash -P ceph # print the hash of the ‘ceph’ package


But there’s another issue with the interface: ‘guix hash -P ceph’ would
merely print the hash as it appears in the package definition.  Thus
‘-H’ and ‘-r’ would have no effect, which can be confusing.

> For instance, can you guess what “guix build -S graphviz” returns? ;-)

I’m aware it returns the source after applying patches and snippets; I
understand the shortcomings you mention quite well and I don’t deny
there’s a need.  :-)

My comment is on the interface.

>> Do you think there’s another place where we could provide helpers for
>> the die-hard Disarchive hackers among us?  Maybe we could get ‘guix lint
>> -c archival’ to print Disarchive URLs upon failure, and that’d already
>> help?
>
> To me, “guix hash” is about hashing, so it seems the right place for
> getting the hash of something.  For instance, I do not find “guix lint -c
> archival” the right place for sending a save request to SWH, as olasd said
> at the time, IIRC. :-) However, the good thing is that “guix lint <pkg>”
> just works (for archiving). :-)
>
> Last, I do not want Disarchive URLs upon failure; I would like hashes and
> upstream URLs on request. :-)
>
> Well, I do not know.  What could be better?  Another subcommand, “guix
> archival”, doing all this plumbing: saving, displaying hashes, upstream
> URLs, Disarchive URLs, etc.?

Yes, maybe?  I don’t know.  I think it’s important to take a step back:
perhaps we’re in need of a better tool around SWH and Disarchive, rather
than just a tool that displays a hash.  We already have all the APIs to
do these things anyway, so if we clarify the use case, we can surely
glue things together to build a tool that will be more convenient.
(Maybe you’ve already written scripts to help you?)

For example, if the use case is “is this tarball in Disarchive”, this is
answered by ‘guix lint -c archival’, but perhaps we need a more
low-level or more focused tool in that area?

Regarding base16: that too isn’t set in stone.  Commit
3cb5ae8577db28b2c6013b9d9ecf99cb696e3432 provides more flexibility, so
we don’t have to stick to base16.

I hope this perspective is useful!

Ludo’.
Ludovic Courtès Oct. 31, 2021, 2:48 p.m. UTC | #5
zimoun <zimon.toutoune@gmail.com> skribis:

>> That’s expected: ‘--recursive’ uses a different computation method,
>> including file metadata (technically, it serializes the file as a nar
>> and computes the hash of the nar).
>
> Yes, but that’s odd.  It should be the same computation method for
> tarballs.  Nothing is recursive for a tarball

Thinking more about it, I think confusion stems from the term
“recursive” (inherited from Nix) because, as you write, it doesn’t
necessarily have to do with recursion and directory traversal.

Instead, it has to do with the serialization method.

Thus, probably, ‘--recursive’ could be replaced by a ‘-S’ flag:

  guix hash -S nar something

or:

  guix hash -S none something
  guix hash -S git something
  guix hash -S swh something

‘-S none’ would be like not passing ‘-r’; ‘-S git’ would serialize the
file/directory as a Git tree; ‘-S swh’ would serialize it the SWH way,
which is like Git except it preserves empty directories (Disarchive
implements the Git/SWH methods already.)

Thoughts?

As mentioned towards the end of
<https://guix.gnu.org/en/blog/2019/connecting-reproducible-deployment-to-a-long-term-source-code-archive/>,
being able to deal with different tree serialization methods would be
useful going forward; for instance, if we had the option to store
SWH-style content hashes for origins, that’d help a lot.  Allowing ‘guix
hash’ to deal with that is a first step in that direction.

(Apologies for slipping a bit off-topic…)

Ludo’.
Simon Tournier Nov. 9, 2021, 9:18 a.m. UTC | #6
Hi Ludo,

On Sun, 31 Oct 2021 at 15:03, Ludovic Courtès <ludo@gnu.org> wrote:


[...]

>> The patch prints the upstream hash, and that is the only hash which
>> matters, both for packaging and for Disarchive.
>> Therefore, there is no ambiguity here.
>
> Sorry, I think I wasn’t clear.  Consider this:
>
>   touch ceph
>   guix hash ceph
>
> What does it print?

It would print the result of the first clause.  Two things: 1. How many
times do you run “guix hash foo” inside a directory containing a file or
directory named ‘foo’?  2. It is easy to document this corner case, and
“guix hash ./ceph” fixes the issue.

Well, the root of it is that I disagree with your comment, I guess. :-)

        The way I see it, ‘guix hash’ is a low-level tool and it should
        do what I ask for and not try to second-guess.

Bah, it is similar to the garbage collector debate: Pythonistas say devs
are too dumb to manage memory by themselves, so it has to be done
automatically; C devs say memory management is too important to be left
to second-guessing the developer's intent. ;-)


Note that, as I understand it, “guix hash” and “guix download” are
somewhat redundant, i.e., “guix download” should be folded into “guix
hash”.  Another story… but I am not drifting yet. ;-)


> If the result depends on external context (the presence or not of a
> ‘ceph’ file in $PWD), that’s a brittle interface IMO.

I trust your experience on designing interfaces. :-)


> This could be addressed by requiring users to be explicit, along these
> lines:
>
>   guix hash ceph    # compute the hash of the file called ‘ceph’
>   guix hash -P ceph # print the hash of the ‘ceph’ package

Well, let’s go for that.  One last bikeshedding question: what should

   guix hash -P ceph ceph

do?  Print the hash of the ceph package twice?  Or print the hash of the
ceph package and the hash of the ceph file?


> But there’s another issue with the interface: ‘guix hash -P ceph’ would
> merely print the hash as it appears in the package definition.  Thus
> ‘-H’ and ‘-r’ would have no effect, which can be confusing.

Wow, many, many options of many, many Guix commands cannot be composed.

As an aside, see these two still-open bugs:

<http://issues.guix.gnu.org/issue/50472>
<http://issues.guix.gnu.org/issue/50473>

for instance,

   guix package --list-installed --show=hello
   guix package --show=hello     --list-installed

   guix package  --list-available --list-installed
   guix package  --list-installed --list-available

And there are many more:

   guix pull --commit=1234 --branch=core-updates

and hence “guix time-machine” too.  And I am not even talking about build
transformations.


Bah, OK, let’s avoid adding another one. :-)  It seems possible to detect
this and display a warning that -H or -r has no effect because of -P.
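The explicit interface discussed in this thread (plain arguments are files, `-P PACKAGE` names a package) together with the proposed warning can be sketched as plain argument handling. This Bash function is purely illustrative and hypothetical: it hashes nothing, it uses POSIX `getopts` (so options must precede file names, unlike Guix's own option parsing), and it answers the `-P ceph ceph` question with one possible reading, namely one package followed by one file.

```shell
#!/usr/bin/env bash
# plan_hash [-P PACKAGE]... [-H ALGO] [-r] [FILE...]
# Hypothetical sketch: report what a `guix hash' with an explicit -P option
# would do, warning when -H/-r are combined with -P.  Nothing is hashed.
plan_hash() {
  local OPTIND=1 opt hash_opts=0 p f
  local -a packages=()
  while getopts 'P:H:r' opt; do
    case $opt in
      P) packages+=("$OPTARG") ;;
      H|r) hash_opts=1 ;;
      *) return 2 ;;               # getopts has already printed an error
    esac
  done
  shift $(( OPTIND - 1 ))
  if (( ${#packages[@]} > 0 && hash_opts )); then
    echo "warning: -H and -r have no effect on packages given with -P" >&2
  fi
  for p in "${packages[@]}"; do echo "package: $p"; done
  for f in "$@"; do echo "file: $f"; done
}

# Usage:
#   plan_hash -P ceph ceph   # -> "package: ceph", then "file: ceph"
#   plan_hash ./ceph         # -> "file: ./ceph"
```

Making `-P` take an argument, rather than switching every argument to package mode, is what keeps `ceph` and `./ceph` unambiguous regardless of the contents of $PWD.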


> Yes, maybe?  I don’t know.  I think it’s important to take a step back:
> perhaps we’re in need of a better tool around SWH and Disarchive, rather
> than just a tool that displays a hash.  We already have all the APIs to
> do these things anyway, so if we clarify the use case, we can surely
> glue things together to build a tool that will be more convenient.
> (Maybe you’ve already written scripts to help you?)

I will start collecting my needs and what I do when playing with that, and
I will try to put it inside an extension, such as “guix archival”.  That
will be a basis for judging whether it is worth it or not.

No, I do not have scripts.  I mean, each time I work on that topic, I
rewrite some quick-and-dirty stuff coupled with ugly Bash glue code.

This patch exists because I got tired of repeating that again and again. :-)


Well, I am going to send another version adding multiple FILE arguments,
with the first patch, which seems to reach consensus, and a second patch
adding the option --package/-P.


Cheers,
simon
Simon Tournier Nov. 18, 2021, 12:29 a.m. UTC | #7
Hi Ludo,

On Sun, 31 Oct 2021 at 15:48, Ludovic Courtès <ludo@gnu.org> wrote:

> Thinking more about it, I think confusion stems from the term
> “recursive” (inherited from Nix) because, as you write, it doesn’t
> necessarily have to do with recursion and directory traversal.
>
> Instead, it has to do with the serialization method.
>
> Thus, probably, ‘--recursive’ could be replaced by a ‘-S’ flag:
>
>   guix hash -S nar something
>
> or:
>
>   guix hash -S none something
>   guix hash -S git something
>   guix hash -S swh something
>
> ‘-S none’ would be like not passing ‘-r’; ‘-S git’ would serialize the
> file/directory as a Git tree; ‘-S swh’ would serialize it the SWH way,
> which is like Git except it preserves empty directories (Disarchive
> implements the Git/SWH methods already.)

Well, v2 is an attempt at this UI.  It does not solve my initial
problem, but it is a first step in that direction. ;-)

Note that the SWH serializer is not added, probably because I failed to
find its implementation. :-)

Well, the next step could be to add an option “--list-serializers”.
Left as an exercise for the reader. ;-)


> As mentioned towards the end of
> <https://guix.gnu.org/en/blog/2019/connecting-reproducible-deployment-to-a-long-term-source-code-archive/>,
> being able to deal with different tree serialization methods would be
> useful going forward; for instance, if we had the option to store
> SWH-style content hashes for origins, that’d help a lot.  Allowing ‘guix
> hash’ to deal with that is a first step in that direction.

Well, this series does not add SWH-style hashes, but it is a tiny step
in that direction.  After v2 patch #2, it is easy to add as many
serializers* as we want. :-)

> (Apologies for slipping a bit off-topic…)

Thanks for slipping off-topic. :-)

Cheers,
simon

*serializer: it is not an English word, but I lacked imagination.