[bug#43516,core-updates] packages: Enable multi-threaded xz compression when repacking source.

Message ID 20200919170357.13583-1-maxim.cournoyer@gmail.com
State Accepted

Checks

Context                 Check    Description
cbaines/comparison      success  View comparison
cbaines/git branch      success  View Git branch
cbaines/applying patch  success  View Laminar job

Commit Message

Maxim Cournoyer Sept. 19, 2020, 5:03 p.m. UTC
The xz compression is slow; using multiple threads/cores yields a linear
performance improvement.

* guix/packages.scm (patch-and-repack): Ensure xz is invoked with --threads=N
by setting the XZ_DEFAULTS environment variable.
---
 guix/packages.scm | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Leo Famulari Sept. 19, 2020, 7:38 p.m. UTC | #1
On Sat, Sep 19, 2020 at 01:03:57PM -0400, Maxim Cournoyer wrote:
> The xz compression is slow; using multiple threads/cores yields a linear
> performance improvement.
> 
> * guix/packages.scm (patch-and-repack): Ensure xz is invoked with --threads=N
> by setting the XZ_DEFAULTS environment variable.

We tried this previously but reverted it because the archives were not
bit-reproducible:

https://git.savannah.gnu.org/cgit/guix.git/commit/?id=3e95125e9bd0676d4a9add9105217ad3eaef3ff0

It's really a shame... it would be nice to reduce the time used for XZ
compression. But the bandwidth used to move the results is even more
expensive in terms of time and money, since most people should get
substitutes.

Maxim Cournoyer Sept. 22, 2020, 2 a.m. UTC | #2
Hi Leo!

> On Sat, Sep 19, 2020 at 01:03:57PM -0400, Maxim Cournoyer wrote:
> > The xz compression is slow; using multiple threads/cores yields a linear
> > performance improvement.
> >
> > * guix/packages.scm (patch-and-repack): Ensure xz is invoked with --threads=N
> > by setting the XZ_DEFAULTS environment variable.

> We tried this previously but reverted it because the archives were not
> bit-reproducible:

> https://git.savannah.gnu.org/cgit/guix.git/commit/?id=3e95125e9bd0676d4a9add9105217ad3eaef3ff0

Thanks for bringing this to my attention!  I've studied what others have done
about it, and found, on the OpenEmbedded mailing list, a solution that seems
to work well [0].  Debian does something similar in dpkg [1].

The important point is that xz produces reproducible results as long as it
always operates in the same mode, either single-threaded OR multi-threaded
(the output differs between the two modes).  So the following v2 patch ensures
we always use --threads=2 at a minimum, forcing the xz code path into
multi-threaded operation.  The --memlimit=50% argument limits the RAM use of xz
to at most half of the available memory; xz reduces the number of threads it
uses as needed to stay within that limit.
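
As a rough illustration of the approach described above (not the v2 patch
itself, which is not reproduced on this page), the XZ_DEFAULTS value could be
built along these lines.  The helper name `xz-defaults' is hypothetical, and
`current-processor-count' from plain Guile stands in for the
`parallel-job-count' call used in the diff, so that the snippet runs outside
of a Guix build:

```scheme
;; Sketch only: `xz-defaults' is a hypothetical helper, not a Guix procedure.
(define (xz-defaults job-count)
  "Return an XZ_DEFAULTS value that always requests at least two threads,
so that xz stays in multi-threaded mode, and caps memory use at 50%."
  (string-join
   (list (string-append "--threads="
                        (number->string (max 2 job-count)))
         "--memlimit=50%")))   ;string-join separates items with a space

;; Assumption: `current-processor-count' replaces `parallel-job-count' here.
(setenv "XZ_DEFAULTS" (xz-defaults (current-processor-count)))
```

With a job count of 1, (xz-defaults 1) evaluates to
"--threads=2 --memlimit=50%", so even a single-job build would go through the
multi-threaded code path.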

I've rebuilt the world on core-updates to test this and got impressive
results, such as when building the linux-libre source with 24 cores instead of
1:

$ time guix build --source linux-libre --check

With this change, on a 24-core/32 GiB system: 24 cores used, 2.9 GiB max memory used, 36.76 s.
On master (same machine): 1 core used, 95 MiB max memory used, 4 m 10 s.

[0]  https://patchwork.openembedded.org/patch/170475/
[1]  https://sources.debian.org/src/dpkg/1.19.7/lib/dpkg/compress.c/#L566-L574

> It's really a shame... it would be nice to reduce the time used for XZ
> compression.

Seems we can have our cake and eat it, too!

Maxim

Leo Famulari Sept. 22, 2020, 3:19 p.m. UTC | #3
On Mon, Sep 21, 2020 at 10:00:02PM -0400, Maxim Cournoyer wrote:
> Seems we can have our cake and eat it, too!

Amazing! I don't have time to check it myself but please proceed as you
see fit.

Maxim Cournoyer Oct. 9, 2020, 2:17 a.m. UTC | #4
Leo Famulari <leo@famulari.name> writes:

> On Mon, Sep 21, 2020 at 10:00:02PM -0400, Maxim Cournoyer wrote:
>> Seems we can have our cake and eat it, too!
>
> Amazing! I don't have time to check it myself but please proceed as you
> see fit.

Pushed with commit 5a0997ef7f to core-updates.

Enjoy!

Maxim

Patch

diff --git a/guix/packages.scm b/guix/packages.scm
index 6598bd3149..678007a807 100644
--- a/guix/packages.scm
+++ b/guix/packages.scm
@@ -5,6 +5,7 @@ 
 ;;; Copyright © 2016 Alex Kost <alezost@gmail.com>
 ;;; Copyright © 2017, 2019, 2020 Efraim Flashner <efraim@flashner.co.il>
 ;;; Copyright © 2019 Marius Bakke <mbakke@fastmail.com>
+;;; Copyright © 2020 Maxim Cournoyer <maxim.cournoyer@gmail.com>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -693,6 +694,11 @@  specifies modules in scope when evaluating SNIPPET."
             (setenv "PATH" (string-append #+xz "/bin" ":"
                                           #+decomp "/bin"))
 
+            ;; Enable multi-threaded compression for xz.
+            (setenv "XZ_DEFAULTS" (string-append "--threads="
+                                                 (number->string
+                                                  (parallel-job-count))))
+
             ;; SOURCE may be either a directory or a tarball.
             (if (file-is-directory? #+source)
                 (let* ((store     (%store-directory))