[bug#74411,0/4] Add DICT and FreeDict projects packages

Message ID cover.1731747114.git.runciter@whispers-vpn.org

Message

Runciter Nov. 17, 2024, 6:06 p.m. UTC
Add the dictd package from the DICT project along with its library dependency
libmaa.

Add freedict-dictionaries, the human-written bilingual dictionaries from the
FreeDict project, along with its toolchain dependency freedict-tools.

Runciter (4):
  gnu: Add (gnu packages dictd).
  gnu: Add dictd-1.13.1.
  gnu: Add freedict-tools-0.6.0.
  gnu: Add freedict-dictionaries.

 gnu/local.mk                  |   2 +
 gnu/packages/dictd.scm        |  98 ++++++++++++++++++++++++++++++++
 gnu/packages/dictionaries.scm | 103 +++++++++++++++++++++++++++++++++-
 3 files changed, 202 insertions(+), 1 deletion(-)
 create mode 100644 gnu/packages/dictd.scm


base-commit: b790db7589858fc77989b4d1f369c52bca6d6e7c

Comments

Runciter Nov. 18, 2024, 6:37 a.m. UTC | #1
The build of the package freedict-dictionaries is non-deterministic, at
least because the output dictionaries are compressed by the utility
dictzip, which includes a timestamp in the compressed file headers.
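
To illustrate the point, here is a minimal Python sketch. It assumes that
dictzip output uses the gzip container layout of RFC 1952, in which bytes 4
to 7 of the header hold a little-endian MTIME field. Python's gzip module
stands in for dictzip here, and the helper name header_mtime is made up for
the example.

```python
import gzip
import struct


def header_mtime(blob: bytes) -> int:
    """Return the MTIME field (bytes 4..7, little-endian) of a gzip header."""
    return struct.unpack("<I", blob[4:8])[0]


payload = b"hello, dictionary\n"

# Same payload, two different header timestamps: the outputs differ.
first = gzip.compress(payload, mtime=1731747114)
second = gzip.compress(payload, mtime=1731747115)

# Same payload, pinned timestamp: the outputs are byte-identical.
pinned_a = gzip.compress(payload, mtime=0)
pinned_b = gzip.compress(payload, mtime=0)

print(header_mtime(first), header_mtime(second), first == second)
print(header_mtime(pinned_a), pinned_a == pinned_b)
```

The compressed payload is the same in every case; only the header
timestamp decides whether two builds compare equal.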

I have not confirmed it yet but I think I could fix it: it would involve
a patch to the freedict-tools package, and then output dictionaries of
freedict-dictionaries would not be compressed.

The maintainers should just tell me if it's worth it to fix
non-determinism in this way: we have to carry forward a patch (unless a
better idea comes up), and we lose the benefit of compressing the outputs.

Nicolas Graves Nov. 19, 2024, 8:43 a.m. UTC | #2
On 2024-11-18 06:37, Runciter via Guix-patches wrote:

> The build of the package freedict-dictionaries is non-deterministic, at
> least because the output dictionaries are compressed by the utility
> dictzip, which includes a timestamp in the compressed file headers.

Are you sure there are no options at compression time to force
determinism on the archive?

Usually, there are some, see tar invocations for instance: 
(invoke "tar" "cvfa" (string-append this-file ".tar")
      "--mtime=@1" "--owner=root:0" "--group=root:0" ; determinism
      "--sort=name" ".")

The sort is here for determinism too.
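
The same idea can be sketched in Python with the standard tarfile module,
as an illustration rather than anything Guix itself runs. The sketch pins
the timestamp and ownership of every entry and adds files in sorted order,
mirroring the --mtime, --owner, --group and --sort=name switches. The
helper names deterministic_tar and make_tree are made up for the example,
and only regular files are archived, not directory entries.

```python
import io
import os
import tarfile
import tempfile


def _normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
    """Pin the metadata that would otherwise vary between builds."""
    info.mtime = 1
    info.uid = info.gid = 0
    info.uname = info.gname = "root"
    return info


def deterministic_tar(directory: str) -> bytes:
    """Archive the files under `directory` in sorted order, metadata pinned."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for root, dirs, files in os.walk(directory):
            dirs.sort()  # make the traversal order stable
            for name in sorted(files):
                path = os.path.join(root, name)
                arcname = os.path.relpath(path, directory)
                tar.add(path, arcname=arcname, filter=_normalize)
    return buf.getvalue()


def make_tree(names, mtime):
    """Create a temporary directory holding one small file per name."""
    directory = tempfile.mkdtemp()
    for name in names:
        path = os.path.join(directory, name)
        with open(path, "wb") as handle:
            handle.write(b"entry for " + name.encode() + b"\n")
        os.utime(path, (mtime, mtime))
    return directory


# Same contents, different creation order and different file timestamps.
tree_a = make_tree(["b.txt", "a.txt"], mtime=1000)
tree_b = make_tree(["a.txt", "b.txt"], mtime=2000)
print(deterministic_tar(tree_a) == deterministic_tar(tree_b))
```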

>
> I have not confirmed it yet but I think I could fix it: it would involve
> a patch to the freedict-tools package, and then output dictionaries of
> freedict-dictionaries would not be compressed.
>
> The maintainers should just tell me if it's worth it to fix
> non-determinism in this way: we have to carry forward a patch (unless a
> better idea comes up), and we lose the benefit of compressing the outputs.

Runciter Nov. 19, 2024, 3:13 p.m. UTC | #3
"Nicolas Graves" <ngraves@ngraves.fr> writes:

> On 2024-11-18 06:37, Runciter via Guix-patches wrote:
>
>> The build of the package freedict-dictionaries is non-deterministic, at
>> least because the output dictionaries are compressed by the utility
>> dictzip, which includes a timestamp in the compressed file headers.
>
> Are you sure there are no options at compression time to force
> determinism on the archive?
>

I'm as sure as one can be when one has read the man page of
dictzip. That is to say, not sure about undocumented features.

Independently of dictzip's capabilities, one problem is that changing
anything about the output compression of freedict-dictionaries requires
patching the freedict-tools package. From what I found online, FreeDict
does not document how to fine-tune or disable dictzip compression within
its build system. In the place where it's done at compile time, I don't
see any handles I could use; you can look if you want, it's in the source
of the freedict-tools package, in the 'install-base' target found in the
file mk/dicts.mk.

Now that I think about it, IF a patch has to be made, then surely some
command-line hack could be inserted into the target by that patch to make
up for the lack of a usable command-line switch in dictzip... It's lame
to have to create, set up and maintain a patch, but if it has to be done
we might as well enjoy the flexibility.

Incidentally, gzip is documented to work in dictd as a substitute for
dictzip, though some dictd optimizations would be lost. I guess gzipped
dictionaries may also work in dicod, and I also guess that dicod probably
does not bother with detailed optimizations tailored to the specifics of
the dictzip format. This and other considerations make it potentially
worthwhile to experiment a little by creating a dictd service on my
system, if that is practical, as well as playing around with the
relevant dicod handler.

Let me study in those directions until the end of the week and get back
to you.

Regards,

Runciter Nov. 21, 2024, 1:15 a.m. UTC | #4
Version 2 of the patch changes the freedict-tools package to fix the
non-determinism of dictzip headers.

Before compressing .dict files, a shell command involving touch and date
sets the file's mtime to the start of the epoch taken on the Greenwich
meridian.
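
As a rough illustration of that step, here is a Python sketch; it is not
the patch itself, which uses a shell command. It assumes the compressor
copies the input file's mtime into the gzip-style header, as gzip does by
default. Python's gzip module stands in for dictzip, and the helper names
reset_mtime and compress_with_file_mtime are hypothetical.

```python
import gzip
import os
import struct
import tempfile


def reset_mtime(path: str) -> None:
    """Set atime and mtime to 0, i.e. 1970-01-01T00:00:00 UTC."""
    os.utime(path, (0, 0))


def compress_with_file_mtime(path: str) -> bytes:
    """Stand-in compressor: stamp the header with the input file's mtime."""
    with open(path, "rb") as handle:
        data = handle.read()
    return gzip.compress(data, mtime=int(os.stat(path).st_mtime))


with tempfile.TemporaryDirectory() as workdir:
    dict_path = os.path.join(workdir, "sample.dict")
    with open(dict_path, "wb") as handle:
        handle.write(b"headword\n  definition\n")

    reset_mtime(dict_path)
    blob = compress_with_file_mtime(dict_path)
    # Bytes 4..7 of a gzip header hold the little-endian MTIME field.
    stamp = struct.unpack("<I", blob[4:8])[0]
    print(stamp)
```

Because the mtime is stored as a count of seconds since a fixed UTC
instant, the value written to the header does not depend on the time zone
configured on the build machine.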

The build is now repeatable on my machine, and probably on any single
machine. As for reproducibility across machines, there is a good chance
it will hold: the design intent is that the compressed file headers
should be insensitive to the system's configured time zone. This will
need to be confirmed by challenging a substitute build from a computer
configured with a time zone different from that of its substitute server.

Runciter