Message ID | 20211020165020.3358311-1-zimon.toutoune@gmail.com |
---|---|
Series | guix hash: eases conversion |
Hi,

zimoun <zimon.toutoune@gmail.com> skribis:

> 2. Using the option recursive changes the result for tarball, as with:
>
>    $ guix hash $(guix build hello -S)
>    0ssi1wpaf7plaswqqjwigppsg5fyh99vdlb9kzl7c9lng89ndq1i
>
>    $ guix hash $(guix build hello -S) --recursive
>    1qx3qqk86vgdvpqkhpgzq3gfcxmys29wzfizjb9asn4crbn503x9
>
> And I am not able to imagine a case. To me, it should be a fixed-point.
> That’s what the first patch corrects.

That’s expected: ‘--recursive’ uses a different computation method,
including file metadata (technically, it serializes the file as a nar
and computes the hash of the nar).

[...]

> Then, working on Disarchive which uses base16 as encoding, it is annoying
> twice,
>
>   a) because it requires downloading all the sources
>   b) because it sometimes requires applying patches
>
> Compare,
>
>    $ guix hash $(guix build ceph -S)
>    0ppd362s177cc47g75v0k27j7aaf27qc31cbbh0j2g30wmhl8gj7
>
> with the checksum in the package definition:
> 0lmdri415hqczc9565s5m5568pnj97ipqxgnw6085kps0flwq5zh.
>
> With the second patch, it becomes easy to convert the checksum from upstream:
>
>    $ ./pre-inst-env guix hash ceph -f base16
>    f017cca903face8280e1f6757ce349d25e644aa945175312fb0cc31248ccad52
>
> and nothing is downloaded. Getting the checksum of what Guix really builds is
> done the current way, for instance,
>
>    $ guix hash $(guix build ceph -S) -f base16
>    473e4461e5603c21015c8b85c1f0114ea9238f986097f30e61ec9ca08519ed5e
>
> and the second patch allows converting the checksum from the package
> definition (without downloading).

Ah yes, got it. (I should read messages in the right order, oops!)

An obvious problem with the interface you propose is that it’s
ambiguous: are you printing the hash of the ‘ceph’ package, or computing
that of the ‘ceph’ file? I’m sure the Zen of Python has something on
ambiguity. ;-)

Do you think there’s another place where we could provide helpers for
the die-hard Disarchive hackers among us? Maybe we could get ‘guix lint
-c archival’ to print Disarchive URLs upon failure, and that’d already
help?

WDYT?

Thanks!

Ludo’.
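To illustrate why ‘--recursive’ changes the result even for a single tarball, here is a minimal Python sketch of the nar framing for one regular file. It is not Guix's implementation: the helper names (`nar_string`, `nar_of_regular_file`) are invented for this note, directories and symlinks are left out, and the sample contents are made up. It only shows that hashing the nar of a file is a different computation from hashing the file's bytes directly, and that the executable bit (the kind of metadata mentioned above) feeds into the result.

```python
import hashlib
import struct


def nar_string(data: bytes) -> bytes:
    """Encode one nar token: 64-bit little-endian length, the bytes,
    then zero padding up to a multiple of 8 bytes."""
    padding = (8 - len(data) % 8) % 8
    return struct.pack("<Q", len(data)) + data + b"\0" * padding


def nar_of_regular_file(contents: bytes, executable: bool = False) -> bytes:
    """Serialize a single regular file as a nar (hypothetical helper;
    directories and symlinks are not handled in this sketch)."""
    tokens = [b"nix-archive-1", b"(", b"type", b"regular"]
    if executable:
        # The executable bit is recorded as an extra marker token pair.
        tokens += [b"executable", b""]
    tokens += [b"contents", contents, b")"]
    return b"".join(nar_string(token) for token in tokens)


contents = b"pretend this is a tarball\n"  # made-up sample data

flat = hashlib.sha256(contents).hexdigest()
nar = hashlib.sha256(nar_of_regular_file(contents)).hexdigest()
nar_exec = hashlib.sha256(
    nar_of_regular_file(contents, executable=True)
).hexdigest()

print("flat hash:            ", flat)
print("nar hash:             ", nar)
print("nar hash (executable):", nar_exec)
```

The three digests differ because the bytes being hashed differ, which matches the behaviour reported for ‘guix hash’ with and without ‘--recursive’.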
Hi Ludo,

On Sat, 30 Oct 2021 at 16:53, Ludovic Courtès <ludo@gnu.org> wrote:
> zimoun <zimon.toutoune@gmail.com> skribis:
>
>> 2. Using the option recursive changes the result for tarball, as with:
>>
>>    $ guix hash $(guix build hello -S)
>>    0ssi1wpaf7plaswqqjwigppsg5fyh99vdlb9kzl7c9lng89ndq1i
>>
>>    $ guix hash $(guix build hello -S) --recursive
>>    1qx3qqk86vgdvpqkhpgzq3gfcxmys29wzfizjb9asn4crbn503x9
>>
>> And I am not able to imagine a case. To me, it should be a fixed-point.
>> That’s what the first patch corrects.
>
> That’s expected: ‘--recursive’ uses a different computation method,
> including file metadata (technically, it serializes the file as a nar
> and computes the hash of the nar).

Yes, but that’s odd. It should be the same computation method for
tarballs. Nothing is recursive for a tarball; therefore, the option
should be skipped. This proposal is perhaps not the best approach,
although I lacked imagination about corner cases.

>> Then, working on Disarchive which uses base16 as encoding, it is annoying
>> twice,
>>
>>   a) because it requires downloading all the sources
>>   b) because it sometimes requires applying patches
>>
>> Compare,
>>
>>    $ guix hash $(guix build ceph -S)
>>    0ppd362s177cc47g75v0k27j7aaf27qc31cbbh0j2g30wmhl8gj7
>>
>> with the checksum in the package definition:
>> 0lmdri415hqczc9565s5m5568pnj97ipqxgnw6085kps0flwq5zh.
>>
>> With the second patch, it becomes easy to convert the checksum from upstream:
>>
>>    $ ./pre-inst-env guix hash ceph -f base16
>>    f017cca903face8280e1f6757ce349d25e644aa945175312fb0cc31248ccad52
>>
>> and nothing is downloaded. Getting the checksum of what Guix really builds is
>> done the current way, for instance,
>>
>>    $ guix hash $(guix build ceph -S) -f base16
>>    473e4461e5603c21015c8b85c1f0114ea9238f986097f30e61ec9ca08519ed5e
>>
>> and the second patch allows converting the checksum from the package
>> definition (without downloading).
>
> Ah yes, got it. (I should read messages in the right order, oops!)
>
> An obvious problem with the interface you propose is that it’s
> ambiguous: are you printing the hash of the ‘ceph’ package, or computing
> that of the ‘ceph’ file? I’m sure the Zen of Python has something on
> ambiguity. ;-)

The patch is printing the hash of upstream and it is the only hash which
matters – speaking both about packaging and about Disarchive.
Therefore, there is no ambiguity here.

Better said, the ambiguity is from “guix build --source” where it is not
predictable beforehand what it will return. For instance, can you guess
what “guix build -S graphviz” returns? ;-) And can you guess the hash?

> Do you think there’s another place where we could provide helpers for
> the die-hard Disarchive hackers among us? Maybe we could get ‘guix lint
> -c archival’ to print Disarchive URLs upon failure, and that’d already
> help?

To me, “guix hash” is about hashing; therefore it appears to me the
right place for getting the hash of something. For instance, I do not
find “guix lint -c archival” the right place for sending a request and
saving to SWH; as olasd said at the time, IIRC. :-) However, the good
thing is that “guix lint <pkg>” just works (for archiving). :-)

Last, I do not want Disarchive URLs upon failure; I would like hashes
and upstream URLs on request. :-)

Well, I do not know. What could be better? Another subcommand “guix
archival” doing all this plumbing: save, display hashes, upstream URL,
disarchive URL, etc.

WDYT?

Cheers,
simon
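The conversion at stake here, from the nix-base32 string stored in a package definition to the base16 form that Disarchive uses, needs no download at all; it is a pure re-encoding of the same 32 bytes. The Python sketch below is only an illustration, not the code of either patch: the function names are invented for this note, and it assumes the usual Nix-style base32 alphabet (digits plus lowercase letters without e, o, t, u) read as a little-endian number. The input is the ‘ceph’ checksum quoted in the thread; if the sketch is right, it should reproduce the base16 value shown above for `guix hash ceph -f base16`.

```python
# Nix-style base32 alphabet: digits, then lowercase letters minus e, o, t, u.
NIX_BASE32 = "0123456789abcdfghijklmnpqrsvwxyz"


def nix_base32_to_bytes(text: str, size: int = 32) -> bytes:
    """Decode a nix-base32 string into `size` raw bytes.

    The leftmost character is the most significant base-32 digit of the
    hash read as a little-endian integer.
    """
    value = 0
    for char in text:
        value = value * 32 + NIX_BASE32.index(char)
    return value.to_bytes(size, "little")


def bytes_to_nix_base32(raw: bytes) -> str:
    """Encode raw bytes as nix-base32 (inverse of the function above)."""
    length = (len(raw) * 8 - 1) // 5 + 1  # number of base-32 digits
    value = int.from_bytes(raw, "little")
    digits = []
    for _ in range(length):
        digits.append(NIX_BASE32[value % 32])
        value //= 32
    return "".join(reversed(digits))


# Checksum of 'ceph' as quoted from the package definition in the thread.
package_hash = "0lmdri415hqczc9565s5m5568pnj97ipqxgnw6085kps0flwq5zh"
raw = nix_base32_to_bytes(package_hash)

print(raw.hex())                 # base16 rendering of the same bytes
print(bytes_to_nix_base32(raw))  # round trip back to nix-base32
```

Nothing here touches the network or the store, which is the property the second patch is after.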
Re,

On Sat, 30 Oct 2021 at 17:19, zimoun <zimon.toutoune@gmail.com> wrote:

>> An obvious problem with the interface you propose is that it’s
>> ambiguous: are you printing the hash of the ‘ceph’ package, or computing
>> that of the ‘ceph’ file? I’m sure the Zen of Python has something on
>> ambiguity. ;-)

[...]

> Well, I do not know. What could be better? Another subcommand “guix
> archival” doing all this plumbing: save, display hashes, upstream URL,
> disarchive URL, etc.

Ah, I forgot. Zen of Python says,

    Now is better than never.

and the proposed patch does “now” the things I need, whereas this
hypothetical other subcommand falls into “never”. :-)

Cheers,
simon
Hi!

zimoun <zimon.toutoune@gmail.com> skribis:

> On Sat, 30 Oct 2021 at 16:53, Ludovic Courtès <ludo@gnu.org> wrote:
>> zimoun <zimon.toutoune@gmail.com> skribis:
>>
>>> 2. Using the option recursive changes the result for tarball, as with:
>>>
>>>    $ guix hash $(guix build hello -S)
>>>    0ssi1wpaf7plaswqqjwigppsg5fyh99vdlb9kzl7c9lng89ndq1i
>>>
>>>    $ guix hash $(guix build hello -S) --recursive
>>>    1qx3qqk86vgdvpqkhpgzq3gfcxmys29wzfizjb9asn4crbn503x9
>>>
>>> And I am not able to imagine a case. To me, it should be a fixed-point.
>>> That’s what the first patch corrects.
>>
>> That’s expected: ‘--recursive’ uses a different computation method,
>> including file metadata (technically, it serializes the file as a nar
>> and computes the hash of the nar).
>
> Yes, but that’s odd. It should be the same computation method for
> tarballs. Nothing is recursive for a tarball; therefore, the option
> should be skipped. This proposal is perhaps not the best approach,
> although I lacked imagination about corner cases.

The way I see it, ‘guix hash’ is a low-level tool and it should do what
I ask for and not try to second-guess.

>>> Then, working on Disarchive which uses base16 as encoding, it is annoying
>>> twice,
>>>
>>>   a) because it requires downloading all the sources
>>>   b) because it sometimes requires applying patches
>>>
>>> Compare,
>>>
>>>    $ guix hash $(guix build ceph -S)
>>>    0ppd362s177cc47g75v0k27j7aaf27qc31cbbh0j2g30wmhl8gj7
>>>
>>> with the checksum in the package definition:
>>> 0lmdri415hqczc9565s5m5568pnj97ipqxgnw6085kps0flwq5zh.
>>>
>>> With the second patch, it becomes easy to convert the checksum from upstream:
>>>
>>>    $ ./pre-inst-env guix hash ceph -f base16
>>>    f017cca903face8280e1f6757ce349d25e644aa945175312fb0cc31248ccad52
>>>
>>> and nothing is downloaded. Getting the checksum of what Guix really builds is
>>> done the current way, for instance,
>>>
>>>    $ guix hash $(guix build ceph -S) -f base16
>>>    473e4461e5603c21015c8b85c1f0114ea9238f986097f30e61ec9ca08519ed5e
>>>
>>> and the second patch allows converting the checksum from the package
>>> definition (without downloading).
>>
>> Ah yes, got it. (I should read messages in the right order, oops!)
>>
>> An obvious problem with the interface you propose is that it’s
>> ambiguous: are you printing the hash of the ‘ceph’ package, or computing
>> that of the ‘ceph’ file? I’m sure the Zen of Python has something on
>> ambiguity. ;-)
>
> The patch is printing the hash of upstream and it is the only hash which
> matters – speaking both about packaging and about Disarchive.
> Therefore, there is no ambiguity here.

Sorry, I think I wasn’t clear. Consider this:

  touch ceph
  guix hash ceph

What does it print?

If the result depends on external context (the presence or not of a
‘ceph’ file in $PWD), that’s a brittle interface IMO.

This could be addressed by requiring users to be explicit, along these
lines:

  guix hash ceph     # compute the hash of the file called ‘ceph’
  guix hash -P ceph  # print the hash of the ‘ceph’ package

But there’s another issue with the interface: ‘guix hash -P ceph’ would
merely print the hash as it appears in the package definition. Thus
‘-H’ and ‘-r’ would have no effect, which can be confusing.

> For instance, can you guess what “guix build -S graphviz” returns? ;-)

I’m aware it returns the source after applying patches and snippets; I
understand the shortcomings you mention quite well and I don’t deny
there’s a need. :-) My comment is on the interface.

>> Do you think there’s another place where we could provide helpers for
>> the die-hard Disarchive hackers among us? Maybe we could get ‘guix lint
>> -c archival’ to print Disarchive URLs upon failure, and that’d already
>> help?
>
> To me, “guix hash” is about hashing; therefore it appears to me the
> right place for getting the hash of something. For instance, I do not
> find “guix lint -c archival” the right place for sending a request and
> saving to SWH; as olasd said at the time, IIRC. :-) However, the good
> thing is that “guix lint <pkg>” just works (for archiving). :-)
>
> Last, I do not want Disarchive URLs upon failure; I would like hashes
> and upstream URLs on request. :-)
>
> Well, I do not know. What could be better? Another subcommand “guix
> archival” doing all this plumbing: save, display hashes, upstream URL,
> disarchive URL, etc.

Yes, maybe? I don’t know. I think it’s important to take a step back:
perhaps we’re in need of a better tool around SWH and Disarchive, rather
than just a tool that displays a hash. We already have all the APIs to
do these things anyway, so if we clarify the use case, we can surely
glue things together to build a tool that will be more convenient.
(Maybe you’ve already written scripts to help you?)

For example, if the use case is “is this tarball in Disarchive”, this is
answered by ‘guix lint -c archival’, but perhaps we need a more
low-level or more focused tool in that area?

Regarding base16: that too isn’t set in stone. Commit
3cb5ae8577db28b2c6013b9d9ecf99cb696e3432 provides more flexibility, so
we don’t have to stick to base16.

I hope this perspective is useful!

Ludo’.
zimoun <zimon.toutoune@gmail.com> skribis:

>> That’s expected: ‘--recursive’ uses a different computation method,
>> including file metadata (technically, it serializes the file as a nar
>> and computes the hash of the nar).
>
> Yes, but that’s odd. It should be the same computation method for
> tarballs. Nothing is recursive for a tarball

Thinking more about it, I think confusion stems from the term
“recursive” (inherited from Nix) because, as you write, it doesn’t
necessarily have to do with recursion and directory traversal.

Instead, it has to do with the serialization method.

Thus, probably, ‘--recursive’ could be replaced by a ‘-S’ flag:

  guix hash -S nar something

or:

  guix hash -S none something
  guix hash -S git something
  guix hash -S swh something

‘-S none’ would be like not passing ‘-r’; ‘-S git’ would serialize the
file/directory as a Git tree; ‘-S swh’ would serialize it the SWH way,
which is like Git except it preserves empty directories (Disarchive
implements the Git/SWH methods already.)

Thoughts?

As mentioned towards the end of
<https://guix.gnu.org/en/blog/2019/connecting-reproducible-deployment-to-a-long-term-source-code-archive/>,
being able to deal with different tree serialization methods would be
useful going forward; for instance, if we had the option to store
SWH-style content hashes for origins, that’d help a lot. Allowing ‘guix
hash’ to deal with that is a first step in that direction.

(Apologies for slipping a bit off-topic…)

Ludo’.
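The idea that the hash depends on the serialization method rather than on any recursion can be illustrated with a small Python sketch. It is not the proposed ‘-S’ implementation: the `SERIALIZERS` table and the function names are invented for this note, only a single file is handled (a real ‘git’ method for directories would build Git tree objects), and SHA-1 is used simply because that is what Git uses for blobs. The ‘git’ entry applies Git's blob framing, a `blob <size>` header followed by a NUL byte, before hashing.

```python
import hashlib


def serialize_none(contents: bytes) -> bytes:
    """'none': hash the file's bytes as they are."""
    return contents


def serialize_git_blob(contents: bytes) -> bytes:
    """'git' (single-file case only): Git blob framing, i.e. a
    'blob <size>' header and a NUL byte in front of the contents."""
    return b"blob %d\0" % len(contents) + contents


# Hypothetical lookup table; the method names mirror the '-S' values
# floated in the thread, nothing more.
SERIALIZERS = {
    "none": serialize_none,
    "git": serialize_git_blob,
}


def hash_with(method: str, contents: bytes, algorithm: str = "sha1") -> str:
    """Serialize `contents` with the chosen method, then hash the result."""
    serialized = SERIALIZERS[method](contents)
    return hashlib.new(algorithm, serialized).hexdigest()


data = b""  # an empty file keeps the example easy to check by hand
for method in sorted(SERIALIZERS):
    print(method, hash_with(method, data))
```

Supporting another method in this sketch amounts to adding one entry to the table, which is the kind of extensibility the ‘-S’ proposal points at.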
Hi Ludo,

On Sun, 31 Oct 2021 at 15:03, Ludovic Courtès <ludo@gnu.org> wrote:

[...]

>> The patch is printing the hash of upstream and it is the only hash which
>> matters – speaking both about packaging and about Disarchive.
>> Therefore, there is no ambiguity here.
>
> Sorry, I think I wasn’t clear. Consider this:
>
>   touch ceph
>   guix hash ceph
>
> What does it print?

It would print the first clause. Two things:

 1. How many times do you run “guix hash foo” inside a folder where
    there is a folder or file ‘foo’? and
 2. It is easy to document this corner case and “guix hash ./ceph”
    fixes the issue.

Well, the root is that I disagree with your comment, I guess. :-)

    The way I see it, ‘guix hash’ is a low-level tool and it should do
    what I ask for and not try to second-guess.

Bah, it is similar to the garbage collector debate; Pythonistas say:
devs are too dumb to manage memory by themselves, it has to be done
automatically; C devs say: managing memory is too important for
second-guessing dev intent. ;-)

Note that from my understanding, “guix hash” and “guix download” are
somehow redundant, i.e., “guix download” should be included in “guix
hash”. Another story… but I was not drifting yet. ;-)

> If the result depends on external context (the presence or not of a
> ‘ceph’ file in $PWD), that’s a brittle interface IMO.

I trust your experience on designing interfaces. :-)

> This could be addressed by requiring users to be explicit, along these
> lines:
>
>   guix hash ceph     # compute the hash of the file called ‘ceph’
>   guix hash -P ceph  # print the hash of the ‘ceph’ package

Well, let’s go for that. One last bikeshedding question: what should

  guix hash -P ceph ceph

do? Print the hash of the ceph package twice? Or print the hash of the
ceph package and the hash of the ceph file?

> But there’s another issue with the interface: ‘guix hash -P ceph’ would
> merely print the hash as it appears in the package definition. Thus
> ‘-H’ and ‘-r’ would have no effect, which can be confusing.

Wow, many many options of many many Guix commands cannot be composed.
Aside from these two still-open bugs,

  <http://issues.guix.gnu.org/issue/50472>
  <http://issues.guix.gnu.org/issue/50473>

for instance,

  guix package --list-installed --show=hello
  guix package --show=hello --list-installed
  guix package --list-available --list-installed
  guix package --list-installed --list-available

And many more,

  guix pull --commit=1234 --branch=core-updates

and so “guix time-machine” too. And I am not speaking about build
transformations.

Bah, ok, let’s avoid adding another one. :-) It seems possible to
detect the case and display a warning that -H or -r does not take
effect because of -P.

> Yes, maybe? I don’t know. I think it’s important to take a step back:
> perhaps we’re in need of a better tool around SWH and Disarchive, rather
> than just a tool that displays a hash. We already have all the APIs to
> do these things anyway, so if we clarify the use case, we can surely
> glue things together to build a tool that will be more convenient.
> (Maybe you’ve already written scripts to help you?)

I will start to collect my needs and what I am doing when playing with
that. And I will try to put that inside an extension, such as “guix
archival”. It will be a basis for judging whether it is worth it or
not.

No, I do not have scripts. I mean, each time I work on that topic, I
write again and again some quick and dirty stuff coupled to ugly Bash
glue code. This patch exists because I have been annoyed at repeating
myself again and again. :-)

Well, I am going to send another version: a first patch adding multiple
FILE arguments, which has consensus, and a second patch adding the
option --package/-P.

Cheers,
simon
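The explicit interface discussed in this message can be mocked up in a few lines of Python with `argparse`. This is purely an illustration of the design question, not Guix's option parser: the program name `hash-sketch`, the `plan` function, and the chosen answer to “what should -P ceph ceph do?” (one package lookup followed by one file hash) are all assumptions made for this note, since the thread leaves that question open. It also shows the suggested warning when ‘-r’ is combined with ‘-P’.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical command line: -P names packages, bare words are files."""
    parser = argparse.ArgumentParser(prog="hash-sketch")
    parser.add_argument("-P", "--package", action="append", default=[],
                        metavar="NAME",
                        help="print the hash recorded for package NAME")
    parser.add_argument("-r", "--recursive", action="store_true",
                        help="hash the nar serialization of each FILE")
    parser.add_argument("files", nargs="*", metavar="FILE")
    return parser


def plan(argv):
    """Return the list of actions and any warnings for a command line.

    Names given with -P are always packages and bare arguments are always
    files, so the result never depends on what happens to exist in $PWD.
    """
    args = build_parser().parse_args(argv)
    warnings = []
    if args.package and args.recursive:
        # A recorded package hash is printed as-is; -r cannot change it.
        warnings.append("-r has no effect on hashes printed with -P")
    actions = [("package", name) for name in args.package]
    actions += [("file", path) for path in args.files]
    return actions, warnings


actions, warnings = plan(["-P", "ceph", "ceph", "-r"])
for kind, name in actions:
    print(kind, name)
for message in warnings:
    print("warning:", message)
```

Under this reading, ‘-P ceph ceph’ yields the package's recorded hash and then the hash of the file named ‘ceph’; printing the package hash twice would be the other option raised above.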
Hi Ludo,

On Sun, 31 Oct 2021 at 15:48, Ludovic Courtès <ludo@gnu.org> wrote:

> Thinking more about it, I think confusion stems from the term
> “recursive” (inherited from Nix) because, as you write, it doesn’t
> necessarily have to do with recursion and directory traversal.
>
> Instead, it has to do with the serialization method.
>
> Thus, probably, ‘--recursive’ could be replaced by a ‘-S’ flag:
>
>   guix hash -S nar something
>
> or:
>
>   guix hash -S none something
>   guix hash -S git something
>   guix hash -S swh something
>
> ‘-S none’ would be like not passing ‘-r’; ‘-S git’ would serialize the
> file/directory as a Git tree; ‘-S swh’ would serialize it the SWH way,
> which is like Git except it preserves empty directories (Disarchive
> implements the Git/SWH methods already.)

Well, v2 is an attempt at this UI. It does not solve my initial problem
but it is a first step in that direction. ;-)

Note that the SWH serializer is not added, because I probably failed to
find the implementation. :-)

Well, the next step could be to add an option “--list-serializers”.
Left as an exercise for the reader. ;-)

> As mentioned towards the end of
> <https://guix.gnu.org/en/blog/2019/connecting-reproducible-deployment-to-a-long-term-source-code-archive/>,
> being able to deal with different tree serialization methods would be
> useful going forward; for instance, if we had the option to store
> SWH-style content hashes for origins, that’d help a lot. Allowing ‘guix
> hash’ to deal with that is a first step in that direction.

Well, this series does not add SWH-style hashing but it is a tiny step
in that direction. Somehow, after v2-patch#2, it is easy to add as many
serializers as we want. :-)

> (Apologies for slipping a bit off-topic…)

Thanks for slipping off-topic. :-)

Cheers,
simon

*serializer: it is not an English word but I lacked imagination.