Message ID | cover.1731747114.git.runciter@whispers-vpn.org |
---|---|
Series | Add DICT and FreeDict projects packages |
The build of the package freedict-dictionaries is non-deterministic, at least because the output dictionaries are compressed by the utility dictzip, which includes a timestamp in the compressed file headers.

I have not confirmed it yet but I think I could fix it: it would involve a patch to the freedict-tools package, and then output dictionaries of freedict-dictionaries would not be compressed.

The maintainers should just tell me if it's worth it to fix non-determinism in this way: we have to carry forward a patch (unless a better idea comes up), and we lose the benefit of compressing the outputs.
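To illustrate why a header timestamp alone breaks determinism, here is a minimal sketch using Python's standard gzip module rather than dictzip itself. It relies on the fact that dictzip output is a gzip-compatible file, and that the gzip header stores a 4-byte modification time at offset 4 (RFC 1952). The helper names and the sample payload are made up for the example.

```python
import gzip
import io
import struct


def gzip_bytes(payload: bytes, mtime: int) -> bytes:
    """Compress payload in memory, writing the given mtime into the gzip header."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=mtime) as gz:
        gz.write(payload)
    return buf.getvalue()


def header_mtime(blob: bytes) -> int:
    """Read the little-endian MTIME field stored at bytes 4..7 of a gzip header."""
    return struct.unpack("<I", blob[4:8])[0]


# Hypothetical dictionary content, identical for every "build".
payload = b"headword\tdefinition\n" * 100

# Two builds one second apart: same input, different output bytes.
first = gzip_bytes(payload, mtime=1731747114)
second = gzip_bytes(payload, mtime=1731747115)

# Two builds with a pinned timestamp: byte-for-byte identical.
fixed_a = gzip_bytes(payload, mtime=0)
fixed_b = gzip_bytes(payload, mtime=0)

print(first == second)      # the only difference is the MTIME field
print(fixed_a == fixed_b)
```

The two unpinned outputs differ only in the four MTIME bytes, which suggests that pinning the timestamp could be enough without giving up compression.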
On 2024-11-18 06:37, Runciter via Guix-patches via wrote:

> The build of the package freedict-dictionaries is non-deterministic, at
> least because the output dictionaries are compressed by the utility
> dictzip, which includes a timestamp in the compressed file headers.

Are you sure there are no options at compression time to force determinism on the archive? Usually, there are some, see tar invocations for instance:

    (invoke "tar" "cvfa" (string-append this-file ".tar")
            "--mtime=1"
            "--owner=root:0"
            "--group=root:0" ; determinism
            "--sort=name"
            ".")

The sort is here for determinism too.

> I have not confirmed it yet but I think I could fix it: it would involve
> a patch to the freedict-tools package, and then output dictionaries of
> freedict-dictionaries would not be compressed.
>
> The maintainers should just tell me if it's worth it to fix
> non-determinism in this way: we have to carry forward a patch (unless a
> better idea comes up), and we lose the benefit of compressing the outputs.
"Nicolas Graves" <ngraves@ngraves.fr> writes:

> On 2024-11-18 06:37, Runciter via Guix-patches via wrote:
>
>> The build of the package freedict-dictionaries is non-deterministic, at
>> least because the output dictionaries are compressed by the utility
>> dictzip, which includes a timestamp in the compressed file headers.
>
> Are you sure there are no options at compression time to force
> determinism on the archive?

I'm as sure as one can be when one has read the man page of dictzip. That is to say, not sure about undocumented features.

Independently of dictzip's capabilities, one problem is actually having to patch the freedict-tools package if one wants to change anything about the output compression of freedict-dictionaries. From what I found online, FreeDict does not document how to fine-tune or disable dictzip compression within its build system. In the place where it's done at compile time, I don't see any handles I could use; you can look if you want, it's in the source of the freedict-tools package, the 'install-base' target found in the file mk/dicts.mk.

Now that I think about it, I figure that IF a patch has to be made, then surely some command-line hack could be inserted into the target by the patch to make up for the lack of a usable command-line switch in dictzip... It's lame to have to create, set up and maintain a patch, but if it has to be done we might as well enjoy the flexibility.

Incidentally, gzip as a substitute for dictzip is documented to work in dictd, though some dictd optimizations would be lost. I guess gzipped dictionaries may also work in dicod, and I also guess that dicod probably does not bother to make detailed optimizations tailored to the specifics of the dictzip format. This and other considerations make it potentially worthwhile to experiment a little by creating a dictd service in my system, if that is practical, as well as playing around with the relevant dicod handler.

Let me study in those directions until the end of the week and get back to you.

Regards,
Version 2 of the patch changes the freedict-tools package to fix the non-determinism of the dictzip headers. Before the .dict files are compressed, a shell command involving touch and date sets each file's mtime to the start of the epoch, taken on the Greenwich meridian.

The build is now repeatable on my machine, and probably on any single machine. As for reproducibility across machines, there is a good chance it will hold: the design intent is that the compressed file headers should be insensitive to the system's configured time zone. This will need to be verified by challenging a substitutes build from a computer configured with a time zone different from that of its substitute server.

Runciter
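The idea behind version 2 can be sketched roughly as follows. This is not the actual patch, which uses touch and date inside the freedict-tools makefile; it is a Python stand-in that assumes, as the thread suggests, that the compressor copies the input file's mtime into the header. The function names, file name and sample content are hypothetical. Setting the mtime as a number of seconds since the epoch, instead of as a local calendar date, is what keeps the result independent of the configured time zone.

```python
import gzip
import os
import tempfile


def compress_with_file_mtime(path: str) -> bytes:
    """Gzip a file, copying its mtime into the header (assumed dictzip-like behaviour)."""
    mtime = int(os.stat(path).st_mtime)
    with open(path, "rb") as src:
        data = src.read()
    return gzip.compress(data, mtime=mtime)


def build_once(content: bytes) -> bytes:
    """Simulate one build: write the .dict file, pin its mtime, then compress it."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "example.dict")  # hypothetical file name
        with open(path, "wb") as out:
            out.write(content)
        # Pin atime and mtime to 0 seconds since the epoch (1970-01-01 00:00:00 UTC).
        # This is an absolute instant, so the local time zone plays no part.
        os.utime(path, (0, 0))
        return compress_with_file_mtime(path)


content = b"headword\tdefinition\n" * 100
out_a = build_once(content)
out_b = build_once(content)

print(out_a == out_b)  # identical bytes across the two simulated builds
```

Whether dictzip behaves the same way on every target system is exactly what the cross-machine challenge mentioned above would have to confirm.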