Message ID | cover.1712165977.git.janneke@gnu.org |
---|---|
Series | Reproducible `make dist' tarball in defiance of Autotools and Gettext |
Hi!

Janneke Nieuwenhuizen <janneke@gnu.org> skribis:

> The recent XZ-utils <https://www.openwall.com/lists/oss-security/2024/03/29/4>
> debacle inspired me to resurrect and finish my patch set for creating a
> reproducible source tarball for Guix, i.e., finally have `make dist' be
> reproducible (when run from Git). I've been using a version of these patches
> in simpler projects for some years now and stole one from Timothy Samplet's
> Gash project.

Yay, kudos to you and Timothy!

> Autotools and Gettext still make it harder than necessary to do reproducible
> (responsible?) computing, which is especially sad given the fact that the
> Reproducible Builds project recently had their 10th birthday
> <https://reproducible-builds.org/_lfs/presentations/2023-05-27-R-B-the-first-10-years/#/>.
>
> Gettext tooling embeds timestamps found in the file-system, fails to respect
> SOURCE_DATE_EPOCH, and lacks options like `--pot-creation-date' so that we
> have to resort to SED to fixup. The caching of all sorts of information, in
> separate build stages, also doesn't help.

Sadness indeed. Hopefully things will improve in the coming weeks, now
that there’s an impetus.

> To create a reproducible source tarball, having a reproducible build
> environment is a prerequisite, so this would have to be recorded too.
> Using this patch set, I created a tarball doing something like
>
>     guix pull --commit=1dbe492b993a7629df3b35146ce0272b52913776
>     guix shell
>     bootstrap && ./configure --localstatedir=/var --sysconfdir=/etc && make dist
>     guix hash guix-1.3.0.57425-80a228.tar.gz
>     0mk59ay5k2dxmjni9fx4i8qyfhvlgxbhqzsjpg2pbw381nskkxbj

I applied the whole series on top of
df64d48e6f9f648044aa5279c045b8d6f7bee604 (the ‘base-commit’ at the
bottom of your message). Thus I got the same content as you but with a
different commit ID.

“make dist” gave me guix-1.3.0.57425-9f4a4a.tar.gz. The name indeed
corresponds to the tip of my tree:

--8<---------------cut here---------------start------------->8---
$ guix hash guix-1.3.0.57425-9f4a4a.tar.gz
0z3c4f8g6rsi9n0j8cwzwvw4bc59srg6bl3jj8yi60hbr9vrz5ql
$ git log |head
commit 9f4a4adfa778b281b794b61014e06dad98b6c945
Author: Janneke Nieuwenhuizen <janneke@gnu.org>
Date:   Wed Apr 3 21:11:09 2024 +0200

    maint: Ensure generated file reproducibility for dist.

    * doc/local.mk (override $(srcdir)/doc/stamp-vti): New target override.
    (doc-clean, man-clean): New targets.
    (auto-clean): Depend on it in new target.
    (DIST_CONFIGURE_FLAGS): New variable.
--8<---------------cut here---------------end--------------->8---

But as a result, I get a different hash, and since the directory in the
tarball has a different name, ‘diffoscope’ isn’t very helpful.

There’s at least one relevant difference in the gzip metadata:

--8<---------------cut here---------------start------------->8---
--- guix-1.3.0.57425-9f4a4a.tar.gz
+++ /tmp/guix-1.3.0.57425-80a228.tar.gz
├── filetype from file(1)
│ @@ -1 +1 @@
│ -gzip compressed data, from Unix, original size modulo 2^32 208138240 gzip compressed data, reserved method, ASCII, extra field, encrypted, from FAT filesystem (MS-DOS, OS/2, NT)
│ +gzip compressed data, from Unix, original size modulo 2^32 222504960 gzip compressed data, reserved method, ASCII, has CRC, was "", has comment, encrypted, from FAT filesystem (MS-DOS, OS/2, NT
--8<---------------cut here---------------end--------------->8---

(Your tarball has a CRC and comment, mine doesn’t.)

Maybe we’ll have to iterate once you’ve pushed a first version, so we
can truly build the same thing. Or we should push the branch somewhere
(or use the one from <https://data.qa.guix.gnu.org/> once it’s been
created).

Thanks!

Ludo’.
Ludovic Courtès writes:

Hello,

> Janneke Nieuwenhuizen <janneke@gnu.org> skribis:
>
>> The recent XZ-utils <https://www.openwall.com/lists/oss-security/2024/03/29/4>
>> debacle inspired me to resurrect and finish my patch set for creating a
>> reproducible source tarball for Guix, i.e., finally have `make dist' be
>> reproducible (when run from Git). I've been using a version of these patches
>> in simpler projects for some years now and stole one from Timothy Samplet's
>> Gash project.
>
> Yay, kudos to you and Timothy!

\o/

>> Autotools and Gettext still make it harder than necessary to do reproducible
>> (responsible?) computing, which is especially sad given the fact that the
>> Reproducible Builds project recently had their 10th birthday
>> <https://reproducible-builds.org/_lfs/presentations/2023-05-27-R-B-the-first-10-years/#/>.
>>
>> Gettext tooling embeds timestamps found in the file-system, fails to respect
>> SOURCE_DATE_EPOCH, and lacks options like `--pot-creation-date' so that we
>> have to resort to SED to fixup. The caching of all sorts of information, in
>> separate build stages, also doesn't help.
>
> Sadness indeed. Hopefully things will improve in the coming weeks, now
> that there’s an impetus.

Yes, that would be nice. With more people joining the effort, it could
be fixed brilliantly, in no time :)

>> To create a reproducible source tarball, having a reproducible build
>> environment is a prerequisite, so this would have to be recorded too.
>> Using this patch set, I created a tarball doing something like
>>
>>     guix pull --commit=1dbe492b993a7629df3b35146ce0272b52913776
>>     guix shell
>>     bootstrap && ./configure --localstatedir=/var --sysconfdir=/etc && make dist
>>     guix hash guix-1.3.0.57425-80a228.tar.gz
>>     0mk59ay5k2dxmjni9fx4i8qyfhvlgxbhqzsjpg2pbw381nskkxbj
>
> I applied the whole series on top of
> df64d48e6f9f648044aa5279c045b8d6f7bee604 (the ‘base-commit’ at the
> bottom of your message). Thus I got the same content as you but with a
> different commit ID.

Yeah..., that's why I pushed `wip-tarball'. We even look at the
committer's timestamp (not author, as that could be quite old).

> “make dist” gave me guix-1.3.0.57425-9f4a4a.tar.gz. The name indeed
> corresponds to the tip of my tree:

[..]

> But as a result, I get a different hash, and since the directory in the
> tarball has a different name, ‘diffoscope’ isn’t very helpful.
>
> There’s at least one relevant difference in the gzip metadata:
>
> --- guix-1.3.0.57425-9f4a4a.tar.gz
> +++ /tmp/guix-1.3.0.57425-80a228.tar.gz
> ├── filetype from file(1)
> │ @@ -1 +1 @@
> │ -gzip compressed data, from Unix, original size modulo 2^32 208138240 gzip compressed data, reserved method, ASCII, extra field, encrypted, from FAT filesystem (MS-DOS, OS/2, NT)
> │ +gzip compressed data, from Unix, original size modulo 2^32 222504960 gzip compressed data, reserved method, ASCII, has CRC, was "", has comment, encrypted, from FAT filesystem (MS-DOS, OS/2, NT
>
> (Your tarball has a CRC and comment, mine doesn’t.)

I believe this is a red herring. I saw this all day whenever one file
had a difference...

> Maybe we’ll have to iterate once you’ve pushed a first version, so we
> can truly build the same thing. Or we should push the branch somewhere

I boldly pushed `origin/wip-tarball', you may try that :) Be sure to git
diff between your tip and origin/wip-tarball to ascertain I didn't place
easter eggs (just kidding).

> (or use the one from <https://data.qa.guix.gnu.org/> once it’s been
> created).

Thanks a lot for trying!

Greetings,
Janneke
Janneke Nieuwenhuizen <janneke@gnu.org> skribis:
> I boldly pushed `origin/wip-tarball', you may try that :)
Silly me. 🤦
First try: I wasn’t running in a UTF-8 locale (in ‘guix shell -CP’) so I
got things like this:
--8<---------------cut here---------------start------------->8---
│ ├── guix-1.3.0.57425-80a228/AUTHORS
│ │ @@ -9,35 +9,35 @@
│ │
│ │ 10255 Ricardo Wurmus <rekado@elephly.net>
│ │ 7293 Nicolas Goaziou <mail@nicolasgoaziou.fr>
│ │ 5991 Efraim Flashner <efraim@flashner.co.il>
│ │ 4033 Maxim Cournoyer <maxim.cournoyer@gmail.com>
│ │ 3124 Tobias Geerinckx-Rice <me@tobias.gr>
│ │ 2356 Marius Bakke <marius@gnu.org>
│ │ - 2306 Ludovic Court??s <ludo@gnu.org>
│ │ + 2306 Ludovic Courtès <ludo@gnu.org>
--8<---------------cut here---------------end--------------->8---
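For readers wondering where the two question marks come from: the following is only a sketch of one plausible mechanism, not a trace of what Git or the Guix build rules actually do. It assumes a tool that cannot represent non-ASCII bytes in the current locale and prints `?' for each of them; the helper name is made up for illustration.

```python
# Illustration only: mimic a tool that prints '?' for every byte it cannot
# represent in a non-UTF-8 (here: plain ASCII) locale. This is an assumed
# failure mode, not the actual code path used when generating AUTHORS.
def render_in_ascii_locale(text: str) -> str:
    raw = text.encode("utf-8")  # 'è' is encoded as two bytes: 0xC3 0xA8
    return "".join(chr(b) if b < 0x80 else "?" for b in raw)


print(render_in_ascii_locale("Ludovic Courtès"))  # Ludovic Court??s
```

One accented character turning into two `?' is consistent with UTF-8 input being handled byte by byte, which is why a UTF-8 locale makes the difference go away.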
Then there’s prolly a timezone issue with the generated ChangeLog:
--8<---------------cut here---------------start------------->8---
│ │ -2024-02-19 Troy Figiel <troy@troyfigiel.com>
│ │ +2024-02-20 Troy Figiel <troy@troyfigiel.com>
│ │
│ │ gnu: Add go-github-com-coocood-freecache.
│ │ * gnu/packages/golang-xyz.scm (go-github-com-coocood-freecache): New variable.
--8<---------------cut here---------------end--------------->8---
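A one-day shift like this is what one would expect when the same instant is formatted in two different timezones. A minimal sketch, assuming a hypothetical commit time shortly before midnight UTC (the real commit's timestamp is not shown in this thread):

```python
from datetime import datetime, timedelta, timezone

# Hypothetical commit instant, chosen to fall just before midnight UTC.
commit_time = datetime(2024, 2, 19, 23, 30, tzinfo=timezone.utc)


def changelog_date(instant: datetime, utc_offset_hours: int) -> str:
    """Format an instant as a ChangeLog-style date in a fixed-offset zone."""
    zone = timezone(timedelta(hours=utc_offset_hours))
    return instant.astimezone(zone).strftime("%Y-%m-%d")


print(changelog_date(commit_time, 0))  # 2024-02-19
print(changelog_date(commit_time, 1))  # 2024-02-20
```

Pinning the timezone on both sides removes this class of difference without having to drop the generated file.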
The best thing to do is probably to drop ‘ChangeLog’ generation (maybe
‘AUTHORS’ too) and just add a text inviting users to check the Git log.
Then I must have stale ‘help2man’ byproducts:
--8<---------------cut here---------------start------------->8---
│ ├── guix-1.3.0.57425-80a228/doc/guix-challenge.1
│ │ @@ -1,11 +1,11 @@
│ │ .\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.49.2.
│ │ .TH GUIX "1" "April 2024" "GNU" "User Commands"
│ │ .SH NAME
│ │ -guix \- manual page for guix challenge (GNU Guix) 1.3.0.51884-370f8f3
│ │ +guix \- manual page for guix challenge (GNU Guix) 1.3.0.57425-80a228
--8<---------------cut here---------------end--------------->8---
Lots of differences in Info files:
--8<---------------cut here---------------start------------->8---
│ ├── guix-1.3.0.57425-80a228/doc/guix-cookbook.fr.info
│ │┄ xxd not available in path. Falling back to Python hexlify.
│ │ @@ -1,6 +1,8216 @@
│ │ 5468697320697320677569782d636f6f6b626f6f6b2e66722e696e666f2c2070
│ │ 726f6475636564206279206d616b65696e666f2076657273696f6e20362e3820
│ │ -66726f6d0a677569782d636f6f6b626f6f6b2e66722e746578692e0a0a0a1f0a
│ │ -546167205461626c653a0a1f0a456e6420546167205461626c650a0a1f0a4c6f
│ │ -63616c205661726961626c65733a0a636f64696e673a207574662d380a456e64
│ │ -3a0a
│ │ +66726f6d0a677569782d636f6f6b626f6f6b2e66722e746578692e0a0a436f70
│ │ +79726967687420c2a920323031392c2032303232205269636172646f20577572
│ │ +6d75730a436f7079726967687420c2a920323031392045667261696d20466c61
│ │ +73686e65720a436f7079726967687420c2a9203230313920506965727265204e
--8<---------------cut here---------------end--------------->8---
Something with PO files not being regenerated (?):
--8<---------------cut here---------------start------------->8---
│ ├── guix-1.3.0.57425-80a228/po/packages/en@boldquot.po
│ │ @@ -1,11 +1,11 @@
│ │ # English translations for guix package.
│ │ -# Copyright (C) 2020 the authors of Guix (msgids)
│ │ +# Copyright (C) 2024 the authors of Guix (msgids)
│ │ # This file is distributed under the same license as the guix package.
│ │ -# Automatically generated, 2020.
│ │ +# Automatically generated, 2024.
│ │ #
│ │ # All this catalog "translates" are quotation characters.
│ │ # The msgids must be ASCII and therefore cannot contain real quotation
│ │ # characters, only substitutes like grave accent (0x60), apostrophe (0x27)
│ │ # and double quote (0x22). These substitutes look strange; see
│ │ # http://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html
│ │ #
│ │ @@ -26,118 +26,85 @@
│ │ # transliterated to 0x22.
│ │ #
│ │ # This catalog furthermore displays the text between the quotation marks in
│ │ # bold face, assuming the VT100/XTerm escape sequences.
│ │ #
│ │ msgid ""
│ │ msgstr ""
│ │ -"Project-Id-Version: guix 1.2.0\n"
│ │ +"Project-Id-Version: guix 1.3.0.57419-5a2b40\n"
│ │ "Report-Msgid-Bugs-To: bug-guix@gnu.org\n"
│ │ -"POT-Creation-Date: 2020-11-22 20:33+0100\n"
│ │ -"PO-Revision-Date: 2020-11-22 20:33+0100\n"
│ │ +"POT-Creation-Date: 2024-04-03 09:04+0200\n"
│ │ +"PO-Revision-Date: 2024-04-03 09:04+0200\n"
--8<---------------cut here---------------end--------------->8---
and possibly a timezone issue for POT files:
--8<---------------cut here---------------start------------->8---
│ ├── guix-1.3.0.57425-80a228/po/packages/guix-packages.pot
│ │ @@ -4,15 +4,15 @@
│ │ # FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
│ │ #
│ │ #, fuzzy
│ │ msgid ""
│ │ msgstr ""
│ │ "Project-Id-Version: guix 1.3.0.57425-80a228\n"
│ │ "Report-Msgid-Bugs-To: bug-guix@gnu.org\n"
│ │ -"POT-Creation-Date: 2024-04-03 21:14+0000\n"
│ │ +"POT-Creation-Date: 2024-04-03 20:56+0200\n"
│ │ "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
--8<---------------cut here---------------end--------------->8---
Note that in ‘guix shell -CP’ I had no TZ and LC_* variable set and
/etc/timezone is missing.
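As a rough illustration of the kind of fixup this calls for (the series itself resorts to SED and a wrapper, neither of which is reproduced here), the sketch below rewrites the POT-Creation-Date header to a UTC timestamp derived from a SOURCE_DATE_EPOCH-style value. The function name and the epoch value are assumptions made for the example.

```python
import re
from datetime import datetime, timezone

# Matches the header line as it appears in a .pot file, where "\n" is a
# literal backslash followed by 'n' inside the quoted string.
POT_DATE = re.compile(r'("POT-Creation-Date: )[^\\"]*(\\n")')


def pin_pot_creation_date(pot_text: str, source_date_epoch: int) -> str:
    """Replace POT-Creation-Date with a fixed UTC timestamp."""
    stamp = datetime.fromtimestamp(source_date_epoch, tz=timezone.utc)
    fixed = stamp.strftime("%Y-%m-%d %H:%M+0000")
    return POT_DATE.sub(lambda m: m.group(1) + fixed + m.group(2), pot_text)


# The two header variants from the diff above; the epoch is hypothetical.
mine = '"POT-Creation-Date: 2024-04-03 21:14+0000\\n"\n'
yours = '"POT-Creation-Date: 2024-04-03 20:56+0200\\n"\n'
epoch = 1712165977
print(pin_pot_creation_date(mine, epoch) == pin_pot_creation_date(yours, epoch))  # True
```

Taking the timestamp from a recorded value rather than from the clock or the file-system means the result no longer depends on TZ or on when the build ran.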
That’s it for today!
Ludo’.
Ludovic Courtès writes:

Hello!

> Janneke Nieuwenhuizen <janneke@gnu.org> skribis:
>
>> I boldly pushed `origin/wip-tarball', you may try that :)
>
> Silly me. 🤦

Yeah, when you see it, it's obvious :)

> First try: I wasn’t running in a UTF-8 locale (in ‘guix shell -CP’) so I
> got things like this:
>
> │ ├── guix-1.3.0.57425-80a228/AUTHORS

[..]

> │ │ - 2306 Ludovic Court??s <ludo@gnu.org>
> │ │ + 2306 Ludovic Courtès <ludo@gnu.org>

Ah, not good.

> Then there’s prolly a timezone issue with the generated ChangeLog:
>
> │ │ -2024-02-19 Troy Figiel <troy@troyfigiel.com>
> │ │ +2024-02-20 Troy Figiel <troy@troyfigiel.com>
> │ │
> │ │ gnu: Add go-github-com-coocood-freecache.
> │ │ * gnu/packages/golang-xyz.scm (go-github-com-coocood-freecache): New variable.

Yes!

> The best thing to do is probably to drop ‘ChangeLog’ generation (maybe
> ‘AUTHORS’ too) and just add a text inviting users to check the Git log.

I think AUTHORS and ChangeLog are amongst the simplest of our worries.
Setting TZ=UTC0 and LC_ALL=en_US.UTF-8 should address these.

> Then I must have stale ‘help2man’ byproducts:
>
> │ ├── guix-1.3.0.57425-80a228/doc/guix-challenge.1
> │ │ @@ -1,11 +1,11 @@
> │ │ .\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.49.2.
> │ │ .TH GUIX "1" "April 2024" "GNU" "User Commands"
> │ │ .SH NAME
> │ │ -guix \- manual page for guix challenge (GNU Guix) 1.3.0.51884-370f8f3
> │ │ +guix \- manual page for guix challenge (GNU Guix) 1.3.0.57425-80a228

That looks like a dependency or parallelism problem. I made the
auto-clean dependency more strict and clean even more Autotools caching.

> Lots of differences in Info files:
>
> │ ├── guix-1.3.0.57425-80a228/doc/guix-cookbook.fr.info
> │ │┄ xxd not available in path. Falling back to Python hexlify.
> │ │ @@ -1,6 +1,8216 @@

Yeah, there's also something with the (building of) the cookbooks. I'm
hoping the strict auto-clean dependency fixes this too.

> Something with PO files not being regenerated (?):
>
> │ ├── guix-1.3.0.57425-80a228/po/packages/en@boldquot.po
> │ │ @@ -1,11 +1,11 @@
> │ │ # English translations for guix package.
> │ │ -# Copyright (C) 2020 the authors of Guix (msgids)
> │ │ +# Copyright (C) 2024 the authors of Guix (msgids)

Hmm. I've added a naive xgettext.scm wrapper to take care of this.

> Note that in ‘guix shell -CP’ I had no TZ and LC_* variable set and
> /etc/timezone is missing.

Okay, thanks for the hint. "Of course", you'll have to use something
like

    guix shell -CP -m manifest.scm fontconfig font-ghostscript \
      graphviz imagemagick texlive-bin

to make it succeed now. When running in a container from a worktree
you'll also have to expose the master .git directory. I fixed some
image generation rules that would silently fail without graphviz or
imagemagick, or when fonts cannot be found.

> That’s it for today!

Thanks, find a V2 soon.

Greetings,
Janneke