Message ID | 20220104200643.43374-2-maximedevos@telenet.be |
---|---|
State | Accepted |
Headers | show |
Series | Add upstream updater for git-fetch origins | expand |
Context | Check | Description |
---|---|---|
cbaines/applying patch | fail | View Laminar job |
cbaines/issue | success | View issue |
cbaines/applying patch | fail | View Laminar job |
cbaines/issue | success | View issue |
cbaines/applying patch | fail | View Laminar job |
cbaines/issue | success | View issue |
Hi Maxime, Thanks! All LGTM and I have two naive remarks. On Tue, 04 Jan 2022 at 20:06, Maxime Devos <maximedevos@telenet.be> wrote: > diff --git a/guix/hash.scm b/guix/hash.scm [...] > +(define-module (guix hash) > + #:use-module (gcrypt hash) > + #:use-module (guix serialization) > + #:use-module (srfi srfi-1) > + #:use-module (srfi srfi-11) > + #:export (vcs-file? > + file-hash*)) > + > +(define (vcs-file? file stat) > + "Returns true if FILE is a version control system file." > + (case (stat:type stat) > + ((directory) > + (member (basename file) '(".bzr" ".git" ".hg" ".svn" "CVS"))) > + ((regular) > + ;; Git sub-modules have a '.git' file that is a regular text file. > + (string=? (basename file) ".git")) > + (else > + #f))) 1) Why ’vcs-file?’ requires to be exported? Is it used elsewhere? > +(define* (file-hash* file #:key > + (algorithm (hash-algorithm sha256)) > + (recursive? 'auto) 2) ’auto’ is confusing… > + (select? (negate vcs-file?))) > + "Compute the hash of FILE with ALGORITHM. If RECURSIVE? is #true or 'auto', > +recurse into subdirectories of FILE, computing the combined hash (nar hash) of …here I understand that ’auto’ means #true… > +Keep in mind that the hash of a regular file depends on RECURSIVE?: > +if the recursive hash is desired, it must be set to #true. Otherwise, it must > +be set to #false or 'auto'. In most situations, the non-recursive hash is desired > +for regular files." …but there it is the contrary. :-) To me, #true/#false or #t/#f are meaningful, especially when… > + (if (or (eq? recursive? #true) > + (and (eq? recursive? 'auto) …the symbol ’auto’ is only used here. IIRC all the series. :-) (I know Ricardo is for instance in favor of #true/#false compared to #t/#f. I have an opinion but I would like to avoid another bikeshed. ;-)) Cheers, simon
zimoun schreef op di 04-01-2022 om 23:22 [+0100]: > 2) ’auto’ is confusing… > > > + (select? (negate vcs-file?))) > > + "Compute the hash of FILE with ALGORITHM. If RECURSIVE? is #true or 'auto', > > +recurse into subdirectories of FILE, computing the combined hash (nar hash) of > > …here I understand that ’auto’ means #true… Precisely, in the sense 'auto' means #true in that 'auto' recurses. But sometimes #true / auto compute a different hash ... > > +Keep in mind that the hash of a regular file depends on RECURSIVE?: > > +if the recursive hash is desired, it must be set to #true. Otherwise, it must > > +be set to #false or 'auto'. In most situations, the non-recursive hash is desired > > +for regular files." > > …but there it is the contrary. :-) No, when #:recursive? is 'auto' and the file is a directory, it recurses. When it is 'auto' and the file is a regular file, then it also recurses, albeit in a trivial way (because regular files don't contain other files). This comment explains that the 'recursive hash' (nar hash) and 'regular hash' of a regular file are different, that usually you want the regular hash for regular files, and implies that '#:recursive? auto' usually does the right thing. But if you really want the recursive hash for regular files, then you can still compute that by setting #:recursive? #true. > To me, #true/#false or #t/#f are > meaningful, especially when… > > > + (if (or (eq? recursive? #true) > > + (and (eq? recursive? 'auto) > > …the symbol ’auto’ is only used here. IIRC all the series. :-) In ‘[PATCH v4 3/4] refresh: Support non-tarball sources.’, there's > + (let ((hash (file-hash* output))) There, #:recursive? is 'auto'. Greetings, Maxime
zimoun schreef op di 04-01-2022 om 23:22 [+0100]: > > +(define (vcs-file? file stat) > > + "Returns true if FILE is a version control system file." > > + (case (stat:type stat) > > + ((directory) > > + (member (basename file) '(".bzr" ".git" ".hg" ".svn" "CVS"))) > > + ((regular) > > + ;; Git sub-modules have a '.git' file that is a regular text > > file. > > + (string=? (basename file) ".git")) > > + (else > > + #f))) > > 1) Why ’vcs-file?’ requires to be exported? Is it used elsewhere? It is used in (guix scripts hash): (select? (if (assq-ref opts 'exclude-vcs?) (negate vcs-file?) (const #t))) Greetings, Maxime.
Hi Maxime, On Wed, 05 Jan 2022 at 11:07, Maxime Devos <maximedevos@telenet.be> wrote: > Precisely, in the sense 'auto' means #true in that 'auto' recurses. > But sometimes #true / auto compute a different hash ... [...] > No, when #:recursive? is 'auto' and the file is a directory, it > recurses. When it is 'auto' and the file is a regular file, then > it also recurses, albeit in a trivial way (because regular files don't > contain other files). > > This comment explains that the 'recursive hash' (nar hash) and 'regular > hash' of a regular file are different, that usually you want the > regular hash for regular files, and implies that '#:recursive? auto' > usually does the right thing. > > But if you really want the recursive hash for regular files, then you > can still compute that by setting #:recursive? #true. Thanks for explaining. Hm, my confusion is probably the same as #51307 [1]. 1: <https://issues.guix.gnu.org/51307#12> Well, I think ’#:recursive?’ is confusing, and ’auto’ too because it is not POLA for a plumbing function, IMHO. Anyway. It is v4 and it is ready to merge. :-) I just propose to replace ’#:recursive?’ by ’#:nar-serializer?’ and a docstring along these lines, --8<---------------cut here---------------start------------->8--- "Compute the hash of FILE with ALGORITHM. If NAR-SERIALIZER? is #true, compute the combined hash (NAR hash) of FILE for which (SELECT? FILE STAT) returns true. If NAR-SERIALIZER? is #false, compute the regular hash using the default serializer. It is meant to be used for a regular file. If NAR-SERIALIZER? is 'auto', when FILE is a directory, compute the combined hash (NAR hash). When FILE is a regular file, compute the regular hash using the default serializer. The option ’auto’ is meant to apply by default the expected hash computation. Symbolic links are not dereferenced unless NAR-SERIALIZER? is false. This procedure must only be used under controlled circumstances; the detection of symbolic links in FILE is racy. --8<---------------cut here---------------end--------------->8--- WDYT? >> > + (if (or (eq? recursive? #true) >> > + (and (eq? recursive? 'auto) >> >> …the symbol ’auto’ is only used here. IIRC all the series. :-) > > In ‘[PATCH v4 3/4] refresh: Support non-tarball sources.’, there's > >> + (let ((hash (file-hash* output))) > > There, #:recursive? is 'auto'. Naive questions: Is it mandatory? Or can be explicitly set? (I have nothing against, just to me ’auto’ is somehow ambiguous and «In the face of ambiguity, refuse the temptation to guess» as ’pyhon3 -c 'import this'’ says ;-)) Cheers, simon
zimoun schreef op wo 05-01-2022 om 12:48 [+0100]: > > > > + (if (or (eq? recursive? #true) > > > > + (and (eq? recursive? 'auto) > > > > > > …the symbol ’auto’ is only used here. IIRC all the series. :-) > > > > In ‘[PATCH v4 3/4] refresh: Support non-tarball sources.’, there's > > > > > + (let ((hash (file-hash* output))) > > > > There, #:recursive? is 'auto'. > > Naive questions: Is it mandatory? Or can be explicitly set? > > (I have nothing against, just to me ’auto’ is somehow ambiguous and > «In > the face of ambiguity, refuse the temptation to guess» as ’pyhon3 -c > 'import this'’ says ;-)) 'auto' is indeed a little ambigious, so I adjusted most calls to file-hash* to set #:recursive? #true/#false appropriately in v3. But in this particular case (guix/scripts/refresh.scm), it not known in advance, so some guesswork is necessary. Anyway, these calls to file-hash* are bothering me a little: can't we just record the hash in the 'upstream-source' record or ask the daemon for the hash of a store item (*) or something? (*) Maybe query-path-hash works or maybe there are problems. Also, would be nice if there was a variant of query-path-hash that works on non-sha256 (in principle guix supports other hashes, though currently they are unused). Or maybe query-path-hash is works differently. That would complicate this patch series more, so I'd prefer to delay that for a future patch series. Greetings, Maxime.
zimoun schreef op wo 05-01-2022 om 12:48 [+0100]: > Well, I think ’#:recursive?’ is confusing, and ’auto’ too because it is > not POLA for a plumbing function, IMHO. [...] Principle of least authority, or principle of least astonishment? I presume the latter. > Anyway. It is v4 and it is ready to merge. :-) I vote for a purple bikeshed! But your orange bikeshed would also keep the bikes out of the rain. > I just propose to replace ’#:recursive?’ by ’#:nar-serializer?’ and a > docstring along these lines, > > --8<---------------cut here---------------start------------->8--- > "Compute the hash of FILE with ALGORITHM. If NAR-SERIALIZER? is > #true, compute the combined hash (NAR hash) of FILE for which (SELECT? > FILE STAT) returns true. > > If NAR-SERIALIZER? is #false, compute the regular hash using the > default serializer. It is meant to be used for a regular file. > > If NAR-SERIALIZER? is 'auto', when FILE is a directory, compute the > combined hash (NAR hash). When FILE is a regular file, compute the > regular hash using the default serializer. The option ’auto’ is meant > to apply by default the expected hash computation. > > Symbolic links are not dereferenced unless NAR-SERIALIZER? is false. > > This procedure must only be used under controlled circumstances; the > detection of symbolic links in FILE is racy. > --8<---------------cut here---------------end--------------->8--- > > WDYT? The nar hash / regular hash difference seems a very low-level detail to me, that most (all?) users don't need to be bothered about. Except maybe if FILE denotes an executable regular file, but file-hash* is currently only used on tarballs/zip files/git checkouts, which aren't executable files unless weirdness or some kind of attack is happening. I think that, the ‘least astonishing’ thing to do here, is computing the hash that would go into the 'hash' / 'sha256' field of 'origin' objects by default, and not the nar hash for regular files that's almost never used. Greetings, Maxime.
Hi Maxime, On Wed, 05 Jan 2022 at 12:27, Maxime Devos <maximedevos@telenet.be> wrote: > zimoun schreef op wo 05-01-2022 om 12:48 [+0100]: >> Well, I think ’#:recursive?’ is confusing, and ’auto’ too because it is >> not POLA for a plumbing function, IMHO. [...] > > Principle of least authority, or principle of least astonishment? > I presume the latter. Latter. :-) >> Anyway. It is v4 and it is ready to merge. :-) > > I vote for a purple bikeshed! But your orange bikeshed would also keep > the bikes out of the rain. :-) >> --8<---------------cut here---------------start------------->8--- >> "Compute the hash of FILE with ALGORITHM. If NAR-SERIALIZER? is >> #true, compute the combined hash (NAR hash) of FILE for which (SELECT? >> FILE STAT) returns true. >> >> If NAR-SERIALIZER? is #false, compute the regular hash using the >> default serializer. It is meant to be used for a regular file. >> >> If NAR-SERIALIZER? is 'auto', when FILE is a directory, compute the >> combined hash (NAR hash). When FILE is a regular file, compute the >> regular hash using the default serializer. The option ’auto’ is meant >> to apply by default the expected hash computation. >> >> Symbolic links are not dereferenced unless NAR-SERIALIZER? is false. >> >> This procedure must only be used under controlled circumstances; the >> detection of symbolic links in FILE is racy. >> --8<---------------cut here---------------end--------------->8--- > The nar hash / regular hash difference seems a very low-level detail to > me, that most (all?) users don't need to be bothered about. Except > maybe if FILE denotes an executable regular file, but file-hash* is > currently only used on tarballs/zip files/git checkouts, which aren't > executable files unless weirdness or some kind of attack is happening. > > I think that, the ‘least astonishing’ thing to do here, is computing > the hash that would go into the 'hash' / 'sha256' field of 'origin' > objects by default, and not the nar hash for regular files that's > almost never used. I do not understand what you mean here. ’file-hash*’ is a low-level detail, no? Whatever. :-) Well, I am sorry if my 3 naive comments are not convenient. Just, to be sure, I am proposing: 1) It is v4 and ready, I guess. About ’auto’, I could have waken up earlier. :-) And it can be still improved later as you are saying in the other answer. So, we are done, right? 2) From my point of view, ’#:recursive?’ needs to be adapted in agreement with the discussion [1], quoting Ludo: Thinking more about it, I think confusion stems from the term “recursive” (inherited from Nix) because, as you write, it doesn’t necessarily have to do with recursion and directory traversal. Instead, it has to do with the serialization method. 1: <http://issues.guix.gnu.org/issue/51307> And I do not have a strong opinion. Just a naive remark. 3) Whatever the keyword for the current v4 ’#:recursive?’ is picked, I still find the current docstring wording unclear. In fact, reading the code is more helpful. :-) I am just proposing a reword which appears to me clearer than the current v4 one. Maybe, I am missing the obvious. Or maybe this proposed rewording is not clearer. :-) WDYT? Cheers, simon
zimoun schreef op wo 05-01-2022 om 13:58 [+0100]: > [...] > > > --8<---------------cut here---------------start------------->8--- > > > "Compute the hash of FILE with ALGORITHM. If NAR-SERIALIZER? is > > > #true, compute the combined hash (NAR hash) of FILE for which (SELECT? > > > FILE STAT) returns true. > > > > > > If NAR-SERIALIZER? is #false, compute the regular hash using the > > > default serializer. It is meant to be used for a regular file. > > > > > > If NAR-SERIALIZER? is 'auto', when FILE is a directory, compute the > > > combined hash (NAR hash). When FILE is a regular file, compute the > > > regular hash using the default serializer. The option ’auto’ is meant > > > to apply by default the expected hash computation. > > > > > > Symbolic links are not dereferenced unless NAR-SERIALIZER? is false. > > > > > > This procedure must only be used under controlled circumstances; the > > > detection of symbolic links in FILE is racy. > > > --8<---------------cut here---------------end--------------->8--- > > > The nar hash / regular hash difference seems a very low-level detail to > > me, that most (all?) users don't need to be bothered about. Except > > maybe if FILE denotes an executable regular file, but file-hash* is > > currently only used on tarballs/zip files/git checkouts, which aren't > > executable files unless weirdness or some kind of attack is happening. > > > > I think that, the ‘least astonishing’ thing to do here, is computing > > the hash that would go into the 'hash' / 'sha256' field of 'origin' > > objects by default, and not the nar hash for regular files that's > > almost never used. > > I do not understand what you mean here. ’file-hash*’ is a low-level > detail, no? Whatever. :-) I don't see what it matters if 'file-hash*' is classified as low-level or high-level. But what I do care about, is how easy to use file-hash* is. A low-level argument like #:nar-hash? #true/#false would make file- hash* much more complicated: this patch series uses file-hash* to compute the hash for 'origin' records, and the documentation of 'origin' doesn't mention 'nar' anywhere and if I search for 'nar hash' in the manual, I find zero results. Instead, file-hash* talks about directories, regular files, recursion and claims that the default value of #:recursive? usually does the right thing, so I don't have to look up any complicated terminology to figure out how to use file-hash* to compute hashes for 'origin' records. And in the rare situation where file-hash* doesn't do the right thing, the documentation tells me I can set #:recursive? #true/#false. > Just, to be sure, I am proposing: > > 1) It is v4 and ready, I guess. About ’auto’, I could have waken up > earlier. :-) And it can be still improved later as you are saying in > the other answer. So, we are done, right? I think so, yes, except for a docstring change I'll send as a v5. I'm also out of bikeshed paint. Anway, keep in mind that I'm not a committer. > 2) From my point of view, ’#:recursive?’ needs to be adapted in > agreement with the discussion [1], quoting Ludo: > > Thinking more about it, I think confusion stems from the term > “recursive” (inherited from Nix) because, as you write, it > doesn’t necessarily have to do with recursion and directory > traversal. > > Instead, it has to do with the serialization method. > > 1: <http://issues.guix.gnu.org/issue/51307> > > And I do not have a strong opinion. Just a naive remark. I don't think the arguments for (guix scripts hash) apply directly to (guix hash) -- (guix scripts hash) supports multiple serialisers: * none (regular in (guix hash) terminology) * git * nar * swh so something like -S nar makes a lot of sense there. But (guix hash) is only for computing the hash of something that would become a store item after interning, more specifically is is currently only used for computing the hash that would go into an (origin ...) object (though I suppose it could be extended to support git/swh/... if someone wants do that). Possibly some name like #:treat-it-as-a-directory-or-an-executable-file-or-a-symlink-and- compute-the-alternative-hash-even-if-it-is-regular? would be clearer and technically more accurate than #:recursive?, but that's a bit of a mouthful. > 3) Whatever the keyword for the current v4 ’#:recursive?’ is picked, I > still find the current docstring wording unclear. In fact, reading > the code is more helpful. :-) I am just proposing a reword which > appears to me clearer than the current v4 one. Maybe, I am missing > the obvious. Or maybe this proposed rewording is not clearer. :-) I've reworded it a bit; it falsely claimed that the nar hash was always computed when recursive? is 'auto' (even if FILE is a regular file). It also mentions executable files and SELECT? now. Greetings, Maxime.
Hi Maxime, On Wed, 05 Jan 2022 at 14:06, Maxime Devos <maximedevos@telenet.be> wrote: > A low-level argument like #:nar-hash? #true/#false would make file- > hash* much more complicated: this patch series uses file-hash* to > compute the hash for 'origin' records, and the documentation of > 'origin' doesn't mention 'nar' anywhere and if I search for 'nar hash' > in the manual, I find zero results. I agree, it was my point #1. :-) > Instead, file-hash* talks about directories, regular files, recursion > and claims that the default value of #:recursive? usually does the > right thing, so I don't have to look up any complicated terminology > to figure out how to use file-hash* to compute hashes for 'origin' > records. I also agree, it was my point #3. :-) > And in the rare situation where file-hash* doesn't do the right thing, > the documentation tells me I can set #:recursive? #true/#false. Yes. >> Just, to be sure, I am proposing: >> >> 1) It is v4 and ready, I guess. About ’auto’, I could have waken up >> earlier. :-) And it can be still improved later as you are saying in >> the other answer. So, we are done, right? > > I think so, yes, except for a docstring change I'll send as a v5. > I'm also out of bikeshed paint. > Anway, keep in mind that I'm not a committer. I am not either. If I had this power, I would have already pushed your v4 with the docstring reword. :-) >> 2) From my point of view, ’#:recursive?’ needs to be adapted in >> agreement with the discussion [1], quoting Ludo: [...] >> And I do not have a strong opinion. Just a naive remark. [...] > Possibly some name like > #:treat-it-as-a-directory-or-an-executable-file-or-a-symlink-and- > compute-the-alternative-hash-even-if-it-is-regular? > would be clearer and technically more accurate than #:recursive?, but > that's a bit of a mouthful. I trust you, I do not have a strong opinion. I was just a naive remark. >> 3) Whatever the keyword for the current v4 ’#:recursive?’ is picked, I >> still find the current docstring wording unclear. In fact, reading >> the code is more helpful. :-) I am just proposing a reword which >> appears to me clearer than the current v4 one. Maybe, I am missing >> the obvious. Or maybe this proposed rewording is not clearer. :-) > > I've reworded it a bit; it falsely claimed that the nar hash was always > computed when recursive? is 'auto' (even if FILE is a regular file). It > also mentions executable files and SELECT? now. Thank you for your patient work. Cheers, simon
zimoun schreef op wo 05-01-2022 om 16:08 [+0100]: > [...] > Thank you for your patient work. And thank you for double-checking things! I'll drop a message on #guix that all appears to be ready for merging. Greetings, Maxime.
Hi, Maxime Devos <maximedevos@telenet.be> skribis: > 'auto' is indeed a little ambigious, so I adjusted most calls to > file-hash* to set #:recursive? #true/#false appropriately in v3. > But in this particular case (guix/scripts/refresh.scm), it not known in > advance, so some guesswork is necessary. We could move guesswork at the call site. No big deal IMO, though. > Anyway, these calls to file-hash* are bothering me a little: can't > we just record the hash in the 'upstream-source' record or ask the > daemon for the hash of a store item (*) or something? <upstream-source> represents available source that has usually not been downloaded yet (that’s what happens when running ‘guix refresh’ without ‘-u’), so it cannot contain the hash. > That would complicate this patch series more, so I'd prefer to delay > that for a future patch series. Yes, this series LGTM as-is, but let’s keep these improvements in mind. Ludo’.
Maxime Devos <maximedevos@telenet.be> skribis: > zimoun schreef op wo 05-01-2022 om 13:58 [+0100]: [...] >> 2) From my point of view, ’#:recursive?’ needs to be adapted in >> agreement with the discussion [1], quoting Ludo: >> >> Thinking more about it, I think confusion stems from the term >> “recursive” (inherited from Nix) because, as you write, it >> doesn’t necessarily have to do with recursion and directory >> traversal. >> >> Instead, it has to do with the serialization method. >> >> 1: <http://issues.guix.gnu.org/issue/51307> >> >> And I do not have a strong opinion. Just a naive remark. > > I don't think the arguments for (guix scripts hash) apply directly > to (guix hash) -- (guix scripts hash) supports multiple serialisers: > > * none (regular in (guix hash) terminology) > * git > * nar > * swh I think IWBN eventually for ‘file-hash*’ to have a #:serializer argument. (guix scripts hash) would then become a thin layer above (guix hash). That #:serializer would have a default value, probably “none”, but no “auto-detection” as this amount of guessing would make for a fragile interface IMO. (Perhaps ‘file-hash’ is too generic-sounding; for a ‘source-code-hash’ or similarly-named procedure, some defaults and guesswork would be more obvious.) Thanks, Ludo’.
Ludovic Courtès schreef op do 06-01-2022 om 11:13 [+0100]: > That #:serializer would have a default value, probably “none”, but no > “auto-detection” as this amount of guessing would make for a fragile > interface IMO. The auto-detection could be put into a separate procedure (define (guess-nar-or-none-serializer file) ; to be renamed (if directory? 'nar 'none)) so that 'file-hash*' is not fragile, but auto-detection can still happen if explicitely asked for: (file-hash* file #:serializer (guess-nar-or-serializer file)) WDYT?
Hi, On Thu, 06 Jan 2022 at 11:13, Ludovic Courtès <ludo@gnu.org> wrote: > I think IWBN eventually for ‘file-hash*’ to have a #:serializer > argument. (guix scripts hash) would then become a thin layer above > (guix hash). > > That #:serializer would have a default value, probably “none”, but no > “auto-detection” as this amount of guessing would make for a fragile > interface IMO. It was my idea behind when I proposed to rename #:recursive? to #:nar-serializer?. ;-) The name #:nar-serializer? was somehow an intermediary step since the patch was already ready and, as Maxime explained, the change for #:serializer was implying another round for adjusting and v4/v5 was already enough effort put in. :-) Anyway, the patch is ready (probably already pushed \o/ :-)). Nothing prevent us (me?) to propose later a patch for this adjustment. ;-) Cheers, simon
diff --git a/Makefile.am b/Makefile.am index 8c5682a1c6..bc3d0087d0 100644 --- a/Makefile.am +++ b/Makefile.am @@ -99,6 +99,7 @@ MODULES = \ guix/extracting-download.scm \ guix/git-download.scm \ guix/hg-download.scm \ + guix/hash.scm \ guix/swh.scm \ guix/monads.scm \ guix/monad-repl.scm \ diff --git a/guix/hash.scm b/guix/hash.scm new file mode 100644 index 0000000000..19cbc41ad1 --- /dev/null +++ b/guix/hash.scm @@ -0,0 +1,68 @@ +;;; GNU Guix --- Functional package management for GNU +;;; Copyright © 2021 Sarah Morgensen <iskarian@mgsn.dev> +;;; Copyright © 2022 Maxime Devos <maximedevos@telenet.be> +;;; +;;; This file is part of GNU Guix. +;;; +;;; GNU Guix is free software; you can redistribute it and/or modify it +;;; under the terms of the GNU General Public License as published by +;;; the Free Software Foundation; either version 3 of the License, or (at +;;; your option) any later version. +;;; +;;; GNU Guix is distributed in the hope that it will be useful, but +;;; WITHOUT ANY WARRANTY; without even the implied warranty of +;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +;;; GNU General Public License for more details. +;;; +;;; You should have received a copy of the GNU General Public License +;;; along with GNU Guix. If not, see <http://www.gnu.org/licenses/>. + +(define-module (guix hash) + #:use-module (gcrypt hash) + #:use-module (guix serialization) + #:use-module (srfi srfi-1) + #:use-module (srfi srfi-11) + #:export (vcs-file? + file-hash*)) + +(define (vcs-file? file stat) + "Returns true if FILE is a version control system file." + (case (stat:type stat) + ((directory) + (member (basename file) '(".bzr" ".git" ".hg" ".svn" "CVS"))) + ((regular) + ;; Git sub-modules have a '.git' file that is a regular text file. + (string=? (basename file) ".git")) + (else + #f))) + +(define* (file-hash* file #:key + (algorithm (hash-algorithm sha256)) + (recursive? 'auto) + (select? (negate vcs-file?))) + "Compute the hash of FILE with ALGORITHM. If RECURSIVE? is #true or 'auto', +recurse into subdirectories of FILE, computing the combined hash (nar hash) of +all files for which (SELECT? FILE STAT) returns true. + +Symbolic links are not dereferenced unless RECURSIVE? is false. + +This procedure must only be used under controlled circumstances; +the detection of symbolic links in FILE is racy. + +Keep in mind that the hash of a regular file depends on RECURSIVE?: +if the recursive hash is desired, it must be set to #true. Otherwise, it must +be set to #false or 'auto'. In most situations, the non-recursive hash is desired +for regular files." + (if (or (eq? recursive? #true) + (and (eq? recursive? 'auto) + ;; Don't change this to (eq? 'directory ...), because otherwise + ;; if 'file' denotes a symbolic link, the 'file-hash' below + ;; would dereference it -- dereferencing symbolic links would + ;; open an avoidable can of potential worms. + (not (eq? 'regular (stat:type (lstat file)))))) + (let-values (((port get-hash) + (open-hash-port algorithm))) + (write-file file port #:select? select?) + (force-output port) + (get-hash)) + (file-hash algorithm file))) diff --git a/guix/scripts/hash.scm b/guix/scripts/hash.scm index d73e3d13dd..28d587b944 100644 --- a/guix/scripts/hash.scm +++ b/guix/scripts/hash.scm @@ -4,6 +4,7 @@ ;;; Copyright © 2016 Jan Nieuwenhuizen <janneke@gnu.org> ;;; Copyright © 2018 Tim Gesthuizen <tim.gesthuizen@yahoo.de> ;;; Copyright © 2021 Simon Tournier <zimon.toutoune@gmail.com> +;;; Copyright © 2021 Sarah Morgensen <iskarian@mgsn.dev> ;;; ;;; This file is part of GNU Guix. ;;; @@ -24,6 +25,7 @@ #:use-module (gcrypt hash) #:use-module (guix serialization) #:use-module (guix ui) + #:use-module (guix hash) #:use-module (guix scripts) #:use-module (guix base16) #:use-module (guix base32) @@ -46,20 +48,14 @@ (define* (nar-hash file #:optional (algorithm (assoc-ref %default-options 'hash-algorithm)) select?) - (let-values (((port get-hash) - (open-hash-port algorithm))) - (write-file file port #:select? select?) - (force-output port) - (get-hash))) + (file-hash* file #:algorithm algorithm #:select? select? #:recursive? #true)) (define* (default-hash file #:optional (algorithm (assoc-ref %default-options 'hash-algorithm)) select?) (match file ("-" (port-hash algorithm (current-input-port))) - (_ - (call-with-input-file file - (cute port-hash algorithm <>))))) + (_ (file-hash* file #:algorithm algorithm #:recursive? #false)))) (define* (git-hash file #:optional (algorithm (assoc-ref %default-options 'hash-algorithm)) @@ -181,16 +177,6 @@ use '--serializer' instead~%")) (parse-command-line args %options (list %default-options) #:build-options? #f)) - (define (vcs-file? file stat) - (case (stat:type stat) - ((directory) - (member (basename file) '(".bzr" ".git" ".hg" ".svn" "CVS"))) - ((regular) - ;; Git sub-modules have a '.git' file that is a regular text file. - (string=? (basename file) ".git")) - (else - #f))) - (let* ((opts (parse-options)) (args (filter-map (match-lambda (('argument . value)
From: Sarah Morgensen <iskarian@mgsn.dev> * guix/scripts/hash.scm (guix-hash)[vcs-file?] (nar-hash, default-hash): Extract hashing logic to... * guix/hash.scm (vcs-file?, file-hash*): ... these new procedures in this new file. Modified-by: Maxime Devos <maximedevos@telenet.be> --- Makefile.am | 1 + guix/hash.scm | 68 +++++++++++++++++++++++++++++++++++++++++++ guix/scripts/hash.scm | 22 +++----------- 3 files changed, 73 insertions(+), 18 deletions(-) create mode 100644 guix/hash.scm