diff mbox series

[bug#43442] Subversion keyword substitution

Message ID 871qle6p4x.fsf_-_@inria.fr
State New
Headers show
Series [bug#43442] Subversion keyword substitution | expand

Commit Message

Ludovic Courtès March 24, 2023, 5:22 p.m. UTC
Hi Timothy,

Timothy Sample <samplet@ngyro.com> skribis:

> I was starting with doing a simple check for the “easy” Subversion
> repositories.  That is, no externals (‘recursive?’) and no
> ‘svn-multi-fetch’ [1].  I immediately hit a problem.  Guix hashes the
> export of the repository with the keywords processed, while SWH hashes
> it with unprocessed keywords.

Ouch.  I had forgotten keywords were also a thing in Subversion.  :-/

Can we tell Subversion to not expand them?  That could be the way
forward in Guix, though it won’t help for past revisions.

How frequent is the use of keywords though?  I tried the patch below to
turn off keyword substitution, and then ran

  guix build -S --check -v1 --keep-going

for the 407 packages identified by:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> (define svn (fold-packages (lambda (p lst)
						  (if (and (not (hidden-package? p))
							   (not (package-superseded p))
							   (origin? (package-source p))
							   (memq (origin-method (package-source p))
								 (list svn-fetch svn-multi-fetch)))
						      (cons p lst)
						      lst))
						'()))
scheme@(guile-user)> (length svn)
$8 = 407
--8<---------------cut here---------------end--------------->8---

That led to:

--8<---------------cut here---------------start------------->8---
guix build: error: build of `/gnu/store/2byn59zmdbc4bz2wknnv0df4n67bdvgr-texlive-pdftex-59745-checkout.drv', `/gnu/store/2gj88z4plmwhraghxj5626zpiir1ck6k-libsmpeg-0.4.5-401-checkout.drv', `/gnu/store/2zygylsb2b333rzrvjyrh4qybw799hl3-ghmm-0.9-rc3-0.2341-checkout.drv', `/gnu/store/4a81qlka5w73rprapzi1w63xzb01n0r8-java-geronimo-xbean-reflect-4.5.drv', `/gnu/store/4mabgwil0ygwm3bkka3nzfbrwg1kk0wz-texlive-kpathsea-59745-checkout.drv', `/gnu/store/5ivk83abj22bs9ka10dk1v67kyczcd80-texlive-dvips-59745-checkout.drv', `/gnu/store/6zhnahylfr1zmpwzb8qzh8qp3yf9yl1p-texlive-tex-plain-59745-checkout.drv', `/gnu/store/f1sjmghs0f4v0y2pnljqaplifq52qbn2-texlive-cm-59745-checkout.drv', `/gnu/store/kd7kahaq71gi8j6zbabr0njqw60sjjxc-libsmpeg-0.4.5-399-checkout.drv', `/gnu/store/q3kip5bxkkdh14kx81afakbmpy7l4lr4-texlive-latexconfig-59745-checkout.drv', `/gnu/store/qskxc2c30fdclmrjp7nk35n1q2sl6sp8-texlive-tetex-59745-checkout.drv', `/gnu/store/wibsxy4kxlpq8lr76wvwgdyc2x5farr4-texlive-hyphen-base-59745-checkout.drv' failed
--8<---------------cut here---------------end--------------->8---

That’s 11 failures out of 407.

So, how about applying the ‘--ignore-keywords’ change and updating
hashes accordingly?

> [1] More precisely, I was going to process recursive ‘svn-fetch’ origins
> because a lot of them are needlessly marked as recursive.  In some
> (many?) cases, the repositories don’t actually have external references,
> so the flag does nothing.  I was only going to skip the ones where it
> makes a difference.

We should remove that recursive flag when it has no effect.  Perhaps we
could proceed similarly?

Thanks,
Ludo’.

Comments

Timothy Sample March 24, 2023, 11:31 p.m. UTC | #1
Hey,

Ludovic Courtès <ludovic.courtes@inria.fr> writes:

> Hi Timothy,
>
> Timothy Sample <samplet@ngyro.com> skribis:
>
>> I was starting with doing a simple check for the “easy” Subversion
>> repositories.  That is, no externals (‘recursive?’) and no
>> ‘svn-multi-fetch’ [1].  I immediately hit a problem.  Guix hashes the
>> export of the repository with the keywords processed, while SWH hashes
>> it with unprocessed keywords.
>
> Ouch.  I had forgotten keywords were also a thing in Subversion.  :-/
>
> Can we tell Subversion to not expand them?  That could be the way
> forward in Guix, though it won’t help for past revisions.

Thinking entirely abstractly, the keywords should be expanded.  I’m not
really long enough in the tooth (old enough) to know how people use
keywords, but one might be tempted to do something like:

    printf ("This is foo version %s\n", "$Revision$");

If that ever happens, processing the keywords would be very important.

Thinking practically, we’ve never encountered anything like that (see
below), and compatibility with SWH is nice.

It’s not clear to me why SWH passes ‘--ignore-keywords’ to Subversion in
the first place.  I guess it saves storage, because having identical
files allows deduplication.

> How frequent is the use of keywords though?

Well, you found 11 in the current Guix, and I see 30 when I process
everything I have (from version 1.0 to a few weeks ago).  Furthermore,
the only usage pattern I see is “$Id” in a comment.

> So, how about applying the ‘--ignore-keywords’ change and updating
> hashes accordingly?

It’s probably the right default given the circumstances.

It seems like there’s a direct conflict between ease of packaging and
ease of time travel.  In the hypothetical case that a keyword mattered,
it would be a nasty surprise to the package author.  They would have to
(a) discover the problem and (b) manually do the keyword substitution in
Scheme (or work around it).

Similarly, for recursive checkouts (including Git), it would be easier
to do time travel if we explicitly listed each source and how to compose
them.  However, that’s a pain for package authors and maintainers.

Add to those two examples all the discussions about multihashing or
keeping track of the SWHID and something of a theme emerges....

Having said that, it’s nice that the work we are doing is letting us
think very clearly about this trade-off.

>> [1] More precisely, I was going to process recursive ‘svn-fetch’
>> origins because a lot of them are needlessly marked as recursive.  In
>> some (many?) cases, the repositories don’t actually have external
>> references, so the flag does nothing.  I was only going to skip the
>> ones where it makes a difference.
>
> We should remove that recursive flag when it has no effect.  Perhaps we
> could proceed similarly?

Huh.  My scripts tell me that we haven’t needed it at all in the last
three years.  That’s a suspicious enough result that I wonder if there’s
a bug in my scripts.  The results are looking good so far, but there are
a few things I still need to look over.

I ran your same experiment, passing all the packages through:

    (define (make-svn-package-non-recursive pkg)
      (package
       (inherit pkg)
       (source
        (origin
         (inherit (package-source pkg))
         (uri (match (origin-uri (package-source pkg))
                ((? svn-reference? ref)
                 (svn-reference
                  (inherit ref)
                  (recursive? #f)))
                ((? svn-multi-reference? ref)
                 (svn-multi-reference
                  (inherit ref)
                  (recursive? #f)))))))))

All of them were fine.  Maybe we just haven’t had a truly recursive
Subversion source in recent memory.


-- Tim
Ludovic Courtès March 27, 2023, 9:04 a.m. UTC | #2
Hi,

Timothy Sample <samplet@ngyro.com> skribis:

> Thinking entirely abstractly, the keywords should be expanded.  I’m not
> really long enough in the tooth (old enough) to know how people use
> keywords, but one might be tempted to do something like:
>
>     printf ("This is foo version %s\n", "$Revision$");
>
> If that ever happens, processing the keywords would be very important.

“Very” might be an overstatement.  :-)

In practice, these were typically used in source file headers, so that
if you exported or copied files around (outside version control), they’d
have a timestamp of sorts at the top.

[...]

> It’s not clear to me why SWH passes ‘--ignore-keywords’ to Subversion in
> the first place.  I guess it saves storage, because having identical
> files allows deduplication.

I asked on #swh-devel and the fine folks there hinted at
non-reproducibility.  Looking at
<https://svnbook.red-bean.com/en/1.7/svn.advanced.props.special.keywords.html>,
one thing that’s definitely not reproducible is the “local time zone”
bit.  From that perspective it makes a lot of sense to disable keyword
substitution.

>> How frequent is the use of keywords though?
>
> Well, you found 11 in the current Guix, and I see 30 when I process
> everything I have (from version 1.0 to a few weeks ago).  Furthermore,
> the only usage pattern I see is “$Id” in a comment.

Interesting.

>> So, how about applying the ‘--ignore-keywords’ change and updating
>> hashes accordingly?
>
> It’s probably the right default given the circumstances.

OK.  I’ll submit a patch to that effect, unless you beat me at it.  :-)

> It seems like there’s a direct conflict between ease of packaging and
> ease of time travel.  In the hypothetical case that a keyword mattered,
> it would be a nasty surprise to the package author.  They would have to
> (a) discover the problem and (b) manually do the keyword substitution in
> Scheme (or work around it).

My intuition is that the worst “problem” we might have is ‘--version’
showing unexpanded keywords.

[...]

>> We should remove that recursive flag when it has no effect.  Perhaps we
>> could proceed similarly?
>
> Huh.  My scripts tell me that we haven’t needed it at all in the last
> three years.  That’s a suspicious enough result that I wonder if there’s
> a bug in my scripts.  The results are looking good so far, but there are
> a few things I still need to look over.

Looks like it might be easily addressed!

Thanks,
Ludo’.
Simon Tournier April 3, 2023, 12:05 p.m. UTC | #3
Hi,

On lun., 27 mars 2023 at 11:04, Ludovic Courtès <ludovic.courtes@inria.fr> wrote:

>> It seems like there’s a direct conflict between ease of packaging and
>> ease of time travel.  In the hypothetical case that a keyword mattered,
>> it would be a nasty surprise to the package author.  They would have to
>> (a) discover the problem and (b) manually do the keyword substitution in
>> Scheme (or work around it).

That’s an application of the non-free lunch theorem, no? :-)


> My intuition is that the worst “problem” we might have is ‘--version’
> showing unexpanded keywords.

Somehow, I would prefer to pay the surprise by package author using
Subversion rather than not being fully reproducible over the time.


Cheers,
simon
Timothy Sample April 4, 2023, 5:16 p.m. UTC | #4
Hi!

Ludovic Courtès <ludovic.courtes@inria.fr> writes:

> OK.  I’ll submit a patch to that effect, unless you beat me at it.  :-)

I’m on it!  It might take me another day or two to actually submit the
patch.  The keyword expansion change affects ‘texlive-bin’, which means
a lot of rebuilds, so I guess we will need to coordinate a feature
branch or whatever (my understanding is that core-updates is essentially
frozen and deprecated at this point).

> Timothy Sample <samplet@ngyro.com> skribis:
>
>> Thinking entirely abstractly, the keywords should be expanded.  I’m not
>> really long enough in the tooth (old enough) to know how people use
>> keywords, but one might be tempted to do something like:
>>
>>     printf ("This is foo version %s\n", "$Revision$");
>>
>> If that ever happens, processing the keywords would be very important.
>
> “Very” might be an overstatement.  :-)
>
> In practice, these were typically used in source file headers, so that
> if you exported or copied files around (outside version control), they’d
> have a timestamp of sorts at the top.

I ended up finding 17 origins that make use of keyword expansion.  Two
of them indeed do so outside of comments.  (1) The “texlive-scripts”
source has some Perl scripts that do the Perl equivalent of my example
above.  (2) There’s some Java code (“geronimo”) that uses keywords in
Javadoc comments, which might show up in generated documentation.

So no, definitely not “very” important!  :)

>> Huh.  My scripts tell me that we haven’t needed it at all in the last
>> three years.  That’s a suspicious enough result that I wonder if there’s
>> a bug in my scripts.  The results are looking good so far, but there are
>> a few things I still need to look over.
>
> Looks like it might be easily addressed!

I’ll switch the ‘recursive?’ field to ‘#f’ by default as part of the
series I send in.


-- Tim
Ludovic Courtès April 7, 2023, 4:45 p.m. UTC | #5
Hi!

Timothy Sample <samplet@ngyro.com> skribis:

>> So, how about applying the ‘--ignore-keywords’ change and updating
>> hashes accordingly?
>
> It’s probably the right default given the circumstances.

Patch submitted: <https://issues.guix.gnu.org/62712>.

Ludo’.
diff mbox series

Patch

diff --git a/guix/build/svn.scm b/guix/build/svn.scm
index 2d960cb364..863c48e46d 100644
--- a/guix/build/svn.scm
+++ b/guix/build/svn.scm
@@ -1,5 +1,5 @@ 
 ;;; GNU Guix --- Functional package management for GNU
-;;; Copyright © 2014, 2020 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2014, 2020, 2023 Ludovic Courtès <ludo@gnu.org>
 ;;; Copyright © 2014 Sree Harsha Totakura <sreeharsha@totakura.in>
 ;;; Copyright © 2018 Mark H Weaver <mhw@netris.org>
 ;;; Copyright © 2020 Simon Tournier <zimon.toutoune@gmail.com>
@@ -47,6 +47,11 @@  (define* (svn-fetch url revision directory
            ;; verify the checksum later.  This can be removed when
            ;; ca-certificates package is added.
            "--trust-server-cert" "-r" (number->string revision)
+
+           ;; Disable keyword substitution (keywords are CVS-like strings
+           ;; like "$Date$", "$Id$", and so on).
+           "--ignore-keywords"
+
            `(,@(if (and user-name password)
                    (list (string-append "--username=" user-name)
                          (string-append "--password=" password))