
[bug#70169,0/7] Reproducible `make dist' tarball in defiance of Autotools and Gettext

Message ID cover.1712165977.git.janneke@gnu.org

Message

Janneke Nieuwenhuizen April 3, 2024, 7:08 p.m. UTC
Hi,

The recent XZ-utils <https://www.openwall.com/lists/oss-security/2024/03/29/4>
debacle inspired me to resurrect and finish my patch set for creating a
reproducible source tarball for Guix, i.e., finally have `make dist' be
reproducible (when run from Git).  I've been using a version of these patches
in simpler projects for some years now and stole one from Timothy Sample's
Gash project.

Autotools and Gettext still make it harder than necessary to do reproducible
(responsible?) computing, which is especially sad given the fact that the
Reproducible Builds project recently had their 10th birthday
<https://reproducible-builds.org/_lfs/presentations/2023-05-27-R-B-the-first-10-years/#/>.

Gettext tooling embeds timestamps found in the file-system, fails to respect
SOURCE_DATE_EPOCH, and lacks options like `--pot-creation-date', so we
have to resort to sed to fix things up.  The caching of all sorts of information, in
separate build stages, also doesn't help.
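Since xgettext has no option to set that header, the fixup amounts to rewriting the `POT-Creation-Date' line after the fact. Below is a minimal sketch. It assumes GNU date and GNU sed, uses a hypothetical file `example.pot', and sets SOURCE_DATE_EPOCH to an arbitrary example value. It is not the exact rule from this patch set:

```shell
#!/bin/sh
# Sketch: pin the POT-Creation-Date header of a POT file to SOURCE_DATE_EPOCH.
# Assumptions: GNU date (for -d @EPOCH) and GNU sed (for -i); example.pot is
# a hypothetical file created here for illustration.
set -eu

SOURCE_DATE_EPOCH=1712178840   # example value only (2024-04-03 21:14 UTC)

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
pot="$tmp/example.pot"
cat > "$pot" <<'EOF'
msgid ""
msgstr ""
"Project-Id-Version: example 1.0\n"
"POT-Creation-Date: 2024-04-03 20:56+0200\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
EOF

# Format the epoch in UTC so the result does not depend on the local TZ.
stamp=$(TZ=UTC0 date -d "@$SOURCE_DATE_EPOCH" '+%Y-%m-%d %H:%M%z')

# Replace the date xgettext wrote (digits, '-', ':', ' ', '+') with the
# pinned one; the trailing \n" of the header line is left untouched.
sed -i "s/^\"POT-Creation-Date: [-0-9: +]*/\"POT-Creation-Date: $stamp/" "$pot"

grep '^"POT-Creation-Date' "$pot"
```

Formatting in UTC (TZ=UTC0) keeps the header independent of the time zone of whoever runs `make dist'.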

To create a reproducible source tarball, having a reproducible build
environment is a prerequisite, so this would have to be recorded too.
Using this patch set, I created a tarball doing something like

--8<---------------cut here---------------start------------->8---
guix pull --commit=1dbe492b993a7629df3b35146ce0272b52913776
guix shell
./bootstrap && ./configure --localstatedir=/var --sysconfdir=/etc && make dist
guix hash guix-1.3.0.57425-80a228.tar.gz
0mk59ay5k2dxmjni9fx4i8qyfhvlgxbhqzsjpg2pbw381nskkxbj
--8<---------------cut here---------------end--------------->8---

and I've uploaded it to

    https://lilypond.org/janneke/guix/guix-1.3.0.57425-80a228.tar.gz

Who can reproduce it...and WDYT?

(I've also pushed this patch set to `wip-tarball', as a slight difference
may already produce another tarball).

Greetings,
Janneke

Janneke Nieuwenhuizen (6):
  maint: Cater for running `make dist' from a worktree.
  maint: Use reproducible timestamps and name for tarball.
  maint: Help help2man generate reproducible man-pages.
  maint: Generate 'doc/version-LANG.texi' reproducibly.
  maint: Use reproducible Git timestamp for POT-Creation-Date.
  maint: Ensure generated file reproducibility for dist.

Timothy Sample (1):
  maint: Generate 'doc/version.texi' reproducibly.

 Makefile.am     | 18 ++++++++++++++---
 doc/local.mk    | 54 +++++++++++++++++++++++++++++++++++++++++++++++++
 po/doc/local.mk | 16 +++++++++++----
 3 files changed, 81 insertions(+), 7 deletions(-)


base-commit: df64d48e6f9f648044aa5279c045b8d6f7bee604

Comments

Ludovic Courtès April 3, 2024, 8:57 p.m. UTC | #1
Hi!

Janneke Nieuwenhuizen <janneke@gnu.org> skribis:

> The recent XZ-utils <https://www.openwall.com/lists/oss-security/2024/03/29/4>
> debacle inspired me to resurrect and finish my patch set for creating a
> reproducible source tarball for Guix, i.e., finally have `make dist' be
> reproducible (when run from Git).  I've been using a version of these patches
> in simpler projects for some years now and stole one from Timothy Sample's
> Gash project.

Yay, kudos to you and Timothy!

> Autotools and Gettext still make it harder than necessary to do reproducible
> (responsible?) computing, which is especially sad given the fact that the
> Reproducible Builds project recently had their 10th birthday
> <https://reproducible-builds.org/_lfs/presentations/2023-05-27-R-B-the-first-10-years/#/>.
>
> Gettext tooling embeds timestamps found in the file-system, fails to respect
> SOURCE_DATE_EPOCH, and lacks options like `--pot-creation-date', so we
> have to resort to sed to fix things up.  The caching of all sorts of information, in
> separate build stages, also doesn't help.

Sadness indeed.  Hopefully things will improve in the coming weeks, now
that there’s an impetus.

> To create a reproducible source tarball, having a reproducible build
> environment is a prerequisite, so this would have to be recorded too.
> Using this patch set, I created a tarball doing something like
>
> guix pull --commit=1dbe492b993a7629df3b35146ce0272b52913776
> guix shell
> ./bootstrap && ./configure --localstatedir=/var --sysconfdir=/etc && make dist
> guix hash guix-1.3.0.57425-80a228.tar.gz
> 0mk59ay5k2dxmjni9fx4i8qyfhvlgxbhqzsjpg2pbw381nskkxbj

I applied the whole series on top of
df64d48e6f9f648044aa5279c045b8d6f7bee604 (the ‘base-commit’ at the
bottom of your message).  Thus I got the same content as you but with a
different commit ID.

“make dist” gave me guix-1.3.0.57425-9f4a4a.tar.gz.  The name indeed
corresponds to the tip of my tree:

--8<---------------cut here---------------start------------->8---
$ guix hash guix-1.3.0.57425-9f4a4a.tar.gz
0z3c4f8g6rsi9n0j8cwzwvw4bc59srg6bl3jj8yi60hbr9vrz5ql
$ git log |head
commit 9f4a4adfa778b281b794b61014e06dad98b6c945
Author: Janneke Nieuwenhuizen <janneke@gnu.org>
Date:   Wed Apr 3 21:11:09 2024 +0200

    maint: Ensure generated file reproducibility for dist.
    
    * doc/local.mk (override $(srcdir)/doc/stamp-vti): New target override.
    (doc-clean, man-clean): New targets.
    (auto-clean): Depend on it in new target.
    (DIST_CONFIGURE_FLAGS): New variable.
--8<---------------cut here---------------end--------------->8---

But as a result, I get a different hash, and since the directory in the
tarball has a different name, ‘diffoscope’ isn’t very helpful.
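Only the top-level directory name differs, so one way to compare the contents anyway is to extract both tarballs with GNU tar's `--strip-components=1' and compare the resulting trees. Below is a sketch that uses two small stand-in tarballs. Their names and contents are made up:

```shell
#!/bin/sh
# Sketch: compare two tarballs whose top-level directory names differ.
# Assumes GNU tar (for --strip-components) and diff(1); the tarballs are
# hypothetical stand-ins built here for illustration.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Two trees with identical content but different top-level names.
for name in guix-a guix-b; do
    mkdir -p "$name/doc"
    printf 'same content\n' > "$name/doc/README"
    tar -czf "$name.tar.gz" "$name"
done

# Extract each tarball without its first path component...
mkdir a b
tar -xzf guix-a.tar.gz -C a --strip-components=1
tar -xzf guix-b.tar.gz -C b --strip-components=1

# ...so that diff -r compares the contents file by file.
if diff -r a b; then
    result=identical
else
    result=different
fi
echo "$result"
```

The two extracted trees can also be passed to diffoscope directly. It then reports per-file differences without the noise from the directory name.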

There’s at least one relevant difference in the gzip metadata:

--8<---------------cut here---------------start------------->8---
--- guix-1.3.0.57425-9f4a4a.tar.gz
+++ /tmp/guix-1.3.0.57425-80a228.tar.gz
├── filetype from file(1)
│ @@ -1 +1 @@
│ -gzip compressed data, from Unix, original size modulo 2^32 208138240 gzip compressed data, reserved method, ASCII, extra field, encrypted, from FAT filesystem (MS-DOS, OS/2, NT)
│ +gzip compressed data, from Unix, original size modulo 2^32 222504960 gzip compressed data, reserved method, ASCII, has CRC, was "", has comment, encrypted, from FAT filesystem (MS-DOS, OS/2, NT)
--8<---------------cut here---------------end--------------->8---

(Your tarball has a CRC and comment, mine doesn’t.)
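Whether or not this particular header difference matters, the gzip layer can embed its own file name and timestamp. The tar layer records file-system mtimes, ownership, modes and member order. Below is a minimal sketch, assuming GNU tar and GNU gzip, that pins all of these so that two runs produce byte-identical output. The tree and the epoch value are hypothetical, and the patch set's Makefile.am rules may handle this differently:

```shell
#!/bin/sh
# Sketch: build a .tar.gz twice and check that the bytes are identical.
# Assumes GNU tar (--sort, --mtime, --owner, --mode, ...) and GNU gzip; the
# tree and the SOURCE_DATE_EPOCH value are made up for illustration.
set -eu

SOURCE_DATE_EPOCH=1712178840
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

mkdir -p pkg-1.0/src
printf 'hello\n' > pkg-1.0/src/hello.txt
printf 'world\n' > pkg-1.0/README

make_tarball () {
    # Touch the files so each run starts from a different file-system mtime.
    touch pkg-1.0/README pkg-1.0/src/hello.txt
    # Fixed member order, timestamps, ownership and modes in the tar archive.
    tar --sort=name --mtime="@$SOURCE_DATE_EPOCH" \
        --owner=0 --group=0 --numeric-owner --mode=go=rX,u+rw,a-s \
        --format=gnu -cf - pkg-1.0 |
        # -n: do not store the original name and timestamp in the gzip header.
        gzip -9 -n > "$1"
}

make_tarball one.tar.gz
sleep 1
make_tarball two.tar.gz

if cmp -s one.tar.gz two.tar.gz; then
    result=identical
else
    result=different
fi
echo "$result"
```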

Maybe we’ll have to iterate once you’ve pushed a first version, so we
can truly build the same thing.  Or we should push the branch somewhere
(or use the one from <https://data.qa.guix.gnu.org/> once it’s been
created).

Thanks!

Ludo’.

Janneke Nieuwenhuizen April 3, 2024, 9:04 p.m. UTC | #2
Ludovic Courtès writes:

Hello,

> Janneke Nieuwenhuizen <janneke@gnu.org> skribis:
>
>> The recent XZ-utils <https://www.openwall.com/lists/oss-security/2024/03/29/4>
>> debacle inspired me to resurrect and finish my patch set for creating a
>> reproducible source tarball for Guix, i.e., finally have `make dist' be
>> reproducible (when run from Git).  I've been using a version of these patches
>> in simpler projects for some years now and stole one from Timothy Sample's
>> Gash project.
>
> Yay, kudos to you and Timothy!

\o/

>> Autotools and Gettext still make it harder than necessary to do reproducible
>> (responsible?) computing, which is especially sad given the fact that the
>> Reproducible Builds project recently had their 10th birthday
>> <https://reproducible-builds.org/_lfs/presentations/2023-05-27-R-B-the-first-10-years/#/>.
>>
>> Gettext tooling embeds timestamps found in the file-system, fails to respect
>> SOURCE_DATE_EPOCH, and lacks options like `--pot-creation-date', so we
>> have to resort to sed to fix things up.  The caching of all sorts of information, in
>> separate build stages, also doesn't help.
>
> Sadness indeed.  Hopefully things will improve in the coming weeks, now
> that there’s an impetus.

Yes, that would be nice.  With more people joining the effort, it could
be fixed brilliantly, in no time :)

>> To create a reproducible source tarball, having a reproducible build
>> environment is a prerequisite, so this would have to be recorded too.
>> Using this patch set, I created a tarball doing something like
>>
>> guix pull --commit=1dbe492b993a7629df3b35146ce0272b52913776
>> guix shell
>> ./bootstrap && ./configure --localstatedir=/var --sysconfdir=/etc && make dist
>> guix hash guix-1.3.0.57425-80a228.tar.gz
>> 0mk59ay5k2dxmjni9fx4i8qyfhvlgxbhqzsjpg2pbw381nskkxbj
>
> I applied the whole series on top of
> df64d48e6f9f648044aa5279c045b8d6f7bee604 (the ‘base-commit’ at the
> bottom of your message).  Thus I got the same content as you but with a
> different commit ID.

Yeah, that's why I pushed `wip-tarball'.  We even look at the
committer's timestamp (not the author's, as that could be quite old).
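Git keeps two dates per commit. Applying a patch with `git am' or rebasing preserves the author date but sets a fresh committer date. Below is a minimal sketch of taking SOURCE_DATE_EPOCH from the committer date. It uses a throw-away repository whose identity and dates are made up, and it requires git:

```shell
#!/bin/sh
# Sketch: take SOURCE_DATE_EPOCH from the committer (not author) timestamp.
# Assumes git is installed; the repository, identity and dates are made up.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q

# An old author date, as kept by e.g. `git am' or `git rebase', and a
# recent committer date.
GIT_AUTHOR_DATE='2020-11-22T20:33:00+0100' \
GIT_COMMITTER_DATE='2024-04-03T21:14:00+0000' \
    git -c user.name=Example -c user.email=example@example.org \
        -c commit.gpgsign=false \
        commit -q --allow-empty -m 'Example commit.'

# %at = author date, %ct = committer date, both as UNIX timestamps.
author_epoch=$(git log -1 --format=%at)
SOURCE_DATE_EPOCH=$(git log -1 --format=%ct)
export SOURCE_DATE_EPOCH
echo "author: $author_epoch committer: $SOURCE_DATE_EPOCH"
```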

> “make dist” gave me guix-1.3.0.57425-9f4a4a.tar.gz.  The name indeed
> corresponds to the tip of my tree:

[..]

> But as a result, I get a different hash, and since the directory in the
> tarball has a different name, ‘diffoscope’ isn’t very helpful.
>
> There’s at least one relevant difference in the gzip metadata:
>
> --- guix-1.3.0.57425-9f4a4a.tar.gz
> +++ /tmp/guix-1.3.0.57425-80a228.tar.gz
> ├── filetype from file(1)
> │ @@ -1 +1 @@
> │ -gzip compressed data, from Unix, original size modulo 2^32 208138240 gzip compressed data, reserved method, ASCII, extra field, encrypted, from FAT filesystem (MS-DOS, OS/2, NT)
> │ +gzip compressed data, from Unix, original size modulo 2^32 222504960 gzip compressed data, reserved method, ASCII, has CRC, was "", has comment, encrypted, from FAT filesystem (MS-DOS, OS/2, NT)
>
>
> (Your tarball has a CRC and comment, mine doesn’t.)

I believe this is a red herring.  I saw this all day whenever one file
had a difference...

> Maybe we’ll have to iterate once you’ve pushed a first version, so we
> can truly build the same thing.  Or we should push the branch somewhere

I boldly pushed `origin/wip-tarball', you may try that :)

Be sure to run `git diff' between your tip and origin/wip-tarball to ascertain
I didn't place easter eggs (just kidding).

> (or use the one from <https://data.qa.guix.gnu.org/> once it’s been
> created).

Thanks a lot for trying!

Greetings,
Janneke

Ludovic Courtès April 3, 2024, 9:28 p.m. UTC | #3
Janneke Nieuwenhuizen <janneke@gnu.org> skribis:

> I boldly pushed `origin/wip-tarball', you may try that :)

Silly me. 🤦

First try: I wasn’t running in a UTF-8 locale (in ‘guix shell -CP’) so I
got things like this:

--8<---------------cut here---------------start------------->8---
│ ├── guix-1.3.0.57425-80a228/AUTHORS
│ │ @@ -9,35 +9,35 @@
│ │  
│ │   10255     Ricardo Wurmus <rekado@elephly.net>
│ │    7293     Nicolas Goaziou <mail@nicolasgoaziou.fr>
│ │    5991     Efraim Flashner <efraim@flashner.co.il>
│ │    4033     Maxim Cournoyer <maxim.cournoyer@gmail.com>
│ │    3124     Tobias Geerinckx-Rice <me@tobias.gr>
│ │    2356     Marius Bakke <marius@gnu.org>
│ │ -  2306     Ludovic Court??s <ludo@gnu.org>
│ │ +  2306     Ludovic Courtès <ludo@gnu.org>
--8<---------------cut here---------------end--------------->8---

Then there’s prolly a timezone issue with the generated ChangeLog:

--8<---------------cut here---------------start------------->8---
│ │ -2024-02-19  Troy Figiel  <troy@troyfigiel.com>
│ │ +2024-02-20  Troy Figiel  <troy@troyfigiel.com>
│ │  
│ │     gnu: Add go-github-com-coocood-freecache.
│ │     * gnu/packages/golang-xyz.scm (go-github-com-coocood-freecache): New variable.
--8<---------------cut here---------------end--------------->8---

The best thing to do is probably to drop ‘ChangeLog’ generation (maybe
‘AUTHORS’ too) and just add a text inviting users to check the Git log.

Then I must have stale ‘help2man’ byproducts:

--8<---------------cut here---------------start------------->8---
│ ├── guix-1.3.0.57425-80a228/doc/guix-challenge.1
│ │ @@ -1,11 +1,11 @@
│ │  .\" DO NOT MODIFY THIS FILE!  It was generated by help2man 1.49.2.
│ │  .TH GUIX "1" "April 2024" "GNU" "User Commands"
│ │  .SH NAME
│ │ -guix \- manual page for guix challenge (GNU Guix) 1.3.0.51884-370f8f3
│ │ +guix \- manual page for guix challenge (GNU Guix) 1.3.0.57425-80a228
--8<---------------cut here---------------end--------------->8---

Lots of differences in Info files:

--8<---------------cut here---------------start------------->8---
│ ├── guix-1.3.0.57425-80a228/doc/guix-cookbook.fr.info
│ │┄ xxd not available in path. Falling back to Python hexlify.
│ │ @@ -1,6 +1,8216 @@
│ │  5468697320697320677569782d636f6f6b626f6f6b2e66722e696e666f2c2070
│ │  726f6475636564206279206d616b65696e666f2076657273696f6e20362e3820
│ │ -66726f6d0a677569782d636f6f6b626f6f6b2e66722e746578692e0a0a0a1f0a
│ │ -546167205461626c653a0a1f0a456e6420546167205461626c650a0a1f0a4c6f
│ │ -63616c205661726961626c65733a0a636f64696e673a207574662d380a456e64
│ │ -3a0a
│ │ +66726f6d0a677569782d636f6f6b626f6f6b2e66722e746578692e0a0a436f70
│ │ +79726967687420c2a920323031392c2032303232205269636172646f20577572
│ │ +6d75730a436f7079726967687420c2a920323031392045667261696d20466c61
│ │ +73686e65720a436f7079726967687420c2a9203230313920506965727265204e
--8<---------------cut here---------------end--------------->8---

Something with PO files not being regenerated (?):

--8<---------------cut here---------------start------------->8---
│ ├── guix-1.3.0.57425-80a228/po/packages/en@boldquot.po
│ │ @@ -1,11 +1,11 @@
│ │  # English translations for guix package.
│ │ -# Copyright (C) 2020 the authors of Guix (msgids)
│ │ +# Copyright (C) 2024 the authors of Guix (msgids)
│ │  # This file is distributed under the same license as the guix package.
│ │ -# Automatically generated, 2020.
│ │ +# Automatically generated, 2024.
│ │  #
│ │  # All this catalog "translates" are quotation characters.
│ │  # The msgids must be ASCII and therefore cannot contain real quotation
│ │  # characters, only substitutes like grave accent (0x60), apostrophe (0x27)
│ │  # and double quote (0x22). These substitutes look strange; see
│ │  # http://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html
│ │  #
│ │ @@ -26,118 +26,85 @@
│ │  # transliterated to 0x22.
│ │  #
│ │  # This catalog furthermore displays the text between the quotation marks in
│ │  # bold face, assuming the VT100/XTerm escape sequences.
│ │  #
│ │  msgid ""
│ │  msgstr ""
│ │ -"Project-Id-Version: guix 1.2.0\n"
│ │ +"Project-Id-Version: guix 1.3.0.57419-5a2b40\n"
│ │  "Report-Msgid-Bugs-To: bug-guix@gnu.org\n"
│ │ -"POT-Creation-Date: 2020-11-22 20:33+0100\n"
│ │ -"PO-Revision-Date: 2020-11-22 20:33+0100\n"
│ │ +"POT-Creation-Date: 2024-04-03 09:04+0200\n"
│ │ +"PO-Revision-Date: 2024-04-03 09:04+0200\n"
--8<---------------cut here---------------end--------------->8---

and possibly a timezone issue for POT files:

--8<---------------cut here---------------start------------->8---
│ ├── guix-1.3.0.57425-80a228/po/packages/guix-packages.pot
│ │ @@ -4,15 +4,15 @@
│ │  # FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
│ │  #
│ │  #, fuzzy
│ │  msgid ""
│ │  msgstr ""
│ │  "Project-Id-Version: guix 1.3.0.57425-80a228\n"
│ │  "Report-Msgid-Bugs-To: bug-guix@gnu.org\n"
│ │ -"POT-Creation-Date: 2024-04-03 21:14+0000\n"
│ │ +"POT-Creation-Date: 2024-04-03 20:56+0200\n"
│ │  "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
--8<---------------cut here---------------end--------------->8---

Note that in ‘guix shell -CP’ I had no TZ or LC_* variables set, and
/etc/timezone was missing.

That’s it for today!

Ludo’.

Janneke Nieuwenhuizen April 6, 2024, 9:16 p.m. UTC | #4
Ludovic Courtès writes:

Hello!

> Janneke Nieuwenhuizen <janneke@gnu.org> skribis:
>
>> I boldly pushed `origin/wip-tarball', you may try that :)
>
> Silly me. 🤦

Yeah, when you see it, it's obvious :)

> First try: I wasn’t running in a UTF-8 locale (in ‘guix shell -CP’) so I
> got things like this:
>
> │ ├── guix-1.3.0.57425-80a228/AUTHORS
[..]
> │ │ -  2306     Ludovic Court??s <ludo@gnu.org>
> │ │ +  2306     Ludovic Courtès <ludo@gnu.org>

Ah, not good.

> Then there’s prolly a timezone issue with the generated ChangeLog:
>
> │ │ -2024-02-19  Troy Figiel  <troy@troyfigiel.com>
> │ │ +2024-02-20  Troy Figiel  <troy@troyfigiel.com>
> │ │  
> │ │     gnu: Add go-github-com-coocood-freecache.
> │ │     * gnu/packages/golang-xyz.scm (go-github-com-coocood-freecache): New variable.

Yes!

> The best thing to do is probably to drop ‘ChangeLog’ generation (maybe
> ‘AUTHORS’ too) and just add a text inviting users to check the Git log.

I think AUTHORS and ChangeLog are amongst the simplest of our worries.
Setting TZ=UTC0 and LC_ALL=en_US.UTF-8 should address these.
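The ChangeLog shift above is consistent with a late-evening commit being formatted in two different time zones. Below is a minimal sketch with GNU date and a made-up timestamp. It uses POSIX TZ strings so that no time-zone database is needed:

```shell
#!/bin/sh
# Sketch: the same instant formatted under two time zones.
# Assumes GNU date (for -d @EPOCH); the timestamp is made up.
set -eu

t=1708385400   # 2024-02-19 23:30:00 UTC

utc_day=$(TZ=UTC0 date -d "@$t" +%F)
# The POSIX TZ string "CET-1" means one hour *ahead* of UTC.
cet_day=$(TZ=CET-1 date -d "@$t" +%F)

echo "UTC0:  $utc_day"    # 2024-02-19
echo "CET-1: $cet_day"    # 2024-02-20
```

With TZ=UTC0 exported before `make dist', both machines would print the UTC date.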

> Then I must have stale ‘help2man’ byproducts:
>
> │ ├── guix-1.3.0.57425-80a228/doc/guix-challenge.1
> │ │ @@ -1,11 +1,11 @@
> │ │  .\" DO NOT MODIFY THIS FILE!  It was generated by help2man 1.49.2.
> │ │  .TH GUIX "1" "April 2024" "GNU" "User Commands"
> │ │  .SH NAME
> │ │ -guix \- manual page for guix challenge (GNU Guix) 1.3.0.51884-370f8f3
> │ │ +guix \- manual page for guix challenge (GNU Guix) 1.3.0.57425-80a228

That looks like a dependency or parallelism problem.  I made the
auto-clean dependency stricter and now also clean more of the Autotools caches.

> Lots of differences in Info files:
> │ ├── guix-1.3.0.57425-80a228/doc/guix-cookbook.fr.info
> │ │┄ xxd not available in path. Falling back to Python hexlify.
> │ │ @@ -1,6 +1,8216 @@

Yeah, there's also something with the (building of) the cookbooks.  I'm
hoping the strict auto-clean dependency fixes this too.

> Something with PO files not being regenerated (?):
>
> │ ├── guix-1.3.0.57425-80a228/po/packages/en@boldquot.po
> │ │ @@ -1,11 +1,11 @@
> │ │  # English translations for guix package.
> │ │ -# Copyright (C) 2020 the authors of Guix (msgids)
> │ │ +# Copyright (C) 2024 the authors of Guix (msgids)

Hmm.  I've added a naive xgettext.scm wrapper to take care of this.

> Note that in ‘guix shell -CP’ I had no TZ or LC_* variables set, and
> /etc/timezone was missing.

Okay, thanks for the hint.  "Of course", you'll have to use something
like

    guix shell -CP -m manifest.scm fontconfig font-ghostscript \
        graphviz imagemagick texlive-bin

to make it succeed now.  When running in a container from a worktree
you'll also have to expose the master .git directory.

I fixed some image generation rules that would silently fail without
graphviz or imagemagick, or when fonts could not be found.

> That’s it for today!

Thanks; expect a V2 soon.

Greetings,
Janneke