Message ID | 871qle6p4x.fsf_-_@inria.fr |
---|---|
State | New |
Headers | show |
Series | [bug#43442] Subversion keyword substitution | expand |
Hey, Ludovic Courtès <ludovic.courtes@inria.fr> writes: > Hi Timothy, > > Timothy Sample <samplet@ngyro.com> skribis: > >> I was starting with doing a simple check for the “easy” Subversion >> repositories. That is, no externals (‘recursive?’) and no >> ‘svn-multi-fetch’ [1]. I immediately hit a problem. Guix hashes the >> export of the repository with the keywords processed, while SWH hashes >> it with unprocessed keywords. > > Ouch. I had forgotten keywords were also a thing in Subversion. :-/ > > Can we tell Subversion to not expand them? That could be the way > forward in Guix, though it won’t help for past revisions. Thinking entirely abstractly, the keywords should be expanded. I’m not really long enough in the tooth (old enough) to know how people use keywords, but one might be tempted to do something like: printf ("This is foo version %s\n", "$Revision$"); If that ever happens, processing the keywords would be very important. Thinking practically, we’ve never encountered anything like that (see below), and compatibility with SWH is nice. It’s not clear to me why SWH passes ‘--ignore-keywords’ to Subversion in the first place. I guess it saves storage, because having identical files allows deduplication. > How frequent is the use of keywords though? Well, you found 11 in the current Guix, and I see 30 when I process everything I have (from version 1.0 to a few weeks ago). Furthermore, the only usage pattern I see is “$Id” in a comment. > So, how about applying the ‘--ignore-keywords’ change and updating > hashes accordingly? It’s probably the right default given the circumstances. It seems like there’s a direct conflict between ease of packaging and ease of time travel. In the hypothetical case that a keyword mattered, it would be a nasty surprise to the package author. They would have to (a) discover the problem and (b) manually do the keyword substitution in Scheme (or work around it). Similarly, for recursive checkouts (including Git), it would be easier to do time travel if we explicitly listed each source and how to compose them. However, that’s a pain for package authors and maintainers. Add to those two examples all the discussions about multihashing or keeping track of the SWHID and something of a theme emerges.... Having said that, it’s nice that the work we are doing is letting us think very clearly about this trade-off. >> [1] More precisely, I was going to process recursive ‘svn-fetch’ >> origins because a lot of them are needlessly marked as recursive. In >> some (many?) cases, the repositories don’t actually have external >> references, so the flag does nothing. I was only going to skip the >> ones where it makes a difference. > > We should remove that recursive flag when it has no effect. Perhaps we > could proceed similarly? Huh. My scripts tell me that we haven’t needed it at all in the last three years. That’s a suspicious enough result that I wonder if there’s a bug in my scripts. The results are looking good so far, but there are a few things I still need to look over. I ran your same experiment, passing all the packages through: (define (make-svn-package-non-recursive pkg) (package (inherit pkg) (source (origin (inherit (package-source pkg)) (uri (match (origin-uri (package-source pkg)) ((? svn-reference? ref) (svn-reference (inherit ref) (recursive? #f))) ((? svn-multi-reference? ref) (svn-multi-reference (inherit ref) (recursive? #f))))))))) All of them were fine. Maybe we just haven’t had a truly recursive Subversion source in recent memory. -- Tim
Hi, Timothy Sample <samplet@ngyro.com> skribis: > Thinking entirely abstractly, the keywords should be expanded. I’m not > really long enough in the tooth (old enough) to know how people use > keywords, but one might be tempted to do something like: > > printf ("This is foo version %s\n", "$Revision$"); > > If that ever happens, processing the keywords would be very important. “Very” might be an overstatement. :-) In practice, these were typically used in source file headers, so that if you exported or copied files around (outside version control), they’d have a timestamp of sorts at the top. [...] > It’s not clear to me why SWH passes ‘--ignore-keywords’ to Subversion in > the first place. I guess it saves storage, because having identical > files allows deduplication. I asked on #swh-devel and the fine folks there hinted at non-reproducibility. Looking at <https://svnbook.red-bean.com/en/1.7/svn.advanced.props.special.keywords.html>, one thing that’s definitely not reproducible is the “local time zone” bit. From that perspective it makes a lot of sense to disable keyword substitution. >> How frequent is the use of keywords though? > > Well, you found 11 in the current Guix, and I see 30 when I process > everything I have (from version 1.0 to a few weeks ago). Furthermore, > the only usage pattern I see is “$Id” in a comment. Interesting. >> So, how about applying the ‘--ignore-keywords’ change and updating >> hashes accordingly? > > It’s probably the right default given the circumstances. OK. I’ll submit a patch to that effect, unless you beat me at it. :-) > It seems like there’s a direct conflict between ease of packaging and > ease of time travel. In the hypothetical case that a keyword mattered, > it would be a nasty surprise to the package author. They would have to > (a) discover the problem and (b) manually do the keyword substitution in > Scheme (or work around it). My intuition is that the worst “problem” we might have is ‘--version’ showing unexpanded keywords. [...] >> We should remove that recursive flag when it has no effect. Perhaps we >> could proceed similarly? > > Huh. My scripts tell me that we haven’t needed it at all in the last > three years. That’s a suspicious enough result that I wonder if there’s > a bug in my scripts. The results are looking good so far, but there are > a few things I still need to look over. Looks like it might be easily addressed! Thanks, Ludo’.
Hi, On lun., 27 mars 2023 at 11:04, Ludovic Courtès <ludovic.courtes@inria.fr> wrote: >> It seems like there’s a direct conflict between ease of packaging and >> ease of time travel. In the hypothetical case that a keyword mattered, >> it would be a nasty surprise to the package author. They would have to >> (a) discover the problem and (b) manually do the keyword substitution in >> Scheme (or work around it). That’s an application of the non-free lunch theorem, no? :-) > My intuition is that the worst “problem” we might have is ‘--version’ > showing unexpanded keywords. Somehow, I would prefer to pay the surprise by package author using Subversion rather than not being fully reproducible over the time. Cheers, simon
Hi! Ludovic Courtès <ludovic.courtes@inria.fr> writes: > OK. I’ll submit a patch to that effect, unless you beat me at it. :-) I’m on it! It might take me another day or two to actually submit the patch. The keyword expansion change affects ‘texlive-bin’, which means a lot of rebuilds, so I guess we will need to coordinate a feature branch or whatever (my understanding is that core-updates is essentially frozen and deprecated at this point). > Timothy Sample <samplet@ngyro.com> skribis: > >> Thinking entirely abstractly, the keywords should be expanded. I’m not >> really long enough in the tooth (old enough) to know how people use >> keywords, but one might be tempted to do something like: >> >> printf ("This is foo version %s\n", "$Revision$"); >> >> If that ever happens, processing the keywords would be very important. > > “Very” might be an overstatement. :-) > > In practice, these were typically used in source file headers, so that > if you exported or copied files around (outside version control), they’d > have a timestamp of sorts at the top. I ended up finding 17 origins that make use of keyword expansion. Two of them indeed do so outside of comments. (1) The “texlive-scripts” source has some Perl scripts that do the Perl equivalent of my example above. (2) There’s some Java code (“geronimo”) that uses keywords in Javadoc comments, which might show up in generated documentation. So no, definitely not “very” important! :) >> Huh. My scripts tell me that we haven’t needed it at all in the last >> three years. That’s a suspicious enough result that I wonder if there’s >> a bug in my scripts. The results are looking good so far, but there are >> a few things I still need to look over. > > Looks like it might be easily addressed! I’ll switch the ‘recursive?’ field to ‘#f’ by default as part of the series I send in. -- Tim
Hi! Timothy Sample <samplet@ngyro.com> skribis: >> So, how about applying the ‘--ignore-keywords’ change and updating >> hashes accordingly? > > It’s probably the right default given the circumstances. Patch submitted: <https://issues.guix.gnu.org/62712>. Ludo’.
diff --git a/guix/build/svn.scm b/guix/build/svn.scm index 2d960cb364..863c48e46d 100644 --- a/guix/build/svn.scm +++ b/guix/build/svn.scm @@ -1,5 +1,5 @@ ;;; GNU Guix --- Functional package management for GNU -;;; Copyright © 2014, 2020 Ludovic Courtès <ludo@gnu.org> +;;; Copyright © 2014, 2020, 2023 Ludovic Courtès <ludo@gnu.org> ;;; Copyright © 2014 Sree Harsha Totakura <sreeharsha@totakura.in> ;;; Copyright © 2018 Mark H Weaver <mhw@netris.org> ;;; Copyright © 2020 Simon Tournier <zimon.toutoune@gmail.com> @@ -47,6 +47,11 @@ (define* (svn-fetch url revision directory ;; verify the checksum later. This can be removed when ;; ca-certificates package is added. "--trust-server-cert" "-r" (number->string revision) + + ;; Disable keyword substitution (keywords are CVS-like strings + ;; like "$Date$", "$Id$", and so on). + "--ignore-keywords" + `(,@(if (and user-name password) (list (string-append "--username=" user-name) (string-append "--password=" password))