[bug#33643] gnu-build-system: Enable xz to decompress in parallel.

Message ID 20181206075615.4637-1-mail@cbaines.net
State Accepted
Delegated to: Christopher Baines

Checks

Context                 Check    Description
cbaines/comparison      success  View comparison
cbaines/git branch      success  View Git branch
cbaines/applying patch  success  View Laminar job
cbaines/applying patch  success  Successfully applied

Commit Message

Christopher Baines Dec. 6, 2018, 7:56 a.m. UTC
It can take a little while to decompress some packages with large xz
compressed source tar files. xz includes support for parallelism, so enable
this using the parallel job count for the overall derivation.

* guix/build/gnu-build-system.scm (unpack): Set XZ_OPT to pass the -T option
to xz to enable it to work in parallel if appropriate.
---
 guix/build/gnu-build-system.scm | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comments

Christopher Baines Dec. 6, 2018, 8:08 a.m. UTC | #1
Christopher Baines <mail@cbaines.net> writes:

> It can take a little while to decompress some packages with large xz
> compressed source tar files. xz includes support for parallelism, so enable
> this using the parallel job count for the overall derivation.

I'm guessing this is only suitable for core-updates, as it'll cause a
lot of rebuilds. I'm also not sure if it's worth it, but it does seem to
make building some packages at least start faster.
Leo Famulari Dec. 6, 2018, 8:13 a.m. UTC | #2
On Thu, Dec 06, 2018 at 07:56:15AM +0000, Christopher Baines wrote:
> It can take a little while to decompress some packages with large xz
> compressed source tar files. xz includes support for parallelism, so enable
> this using the parallel job count for the overall derivation.

The xz man page says that multi-threaded decompression isn't implemented
yet, unfortunately.
Christopher Baines Dec. 6, 2018, 7:38 p.m. UTC | #3
Leo Famulari <leo@famulari.name> writes:

> On Thu, Dec 06, 2018 at 07:56:15AM +0000, Christopher Baines wrote:
>> It can take a little while to decompress some packages with large xz
>> compressed source tar files. xz includes support for parallelism, so enable
>> this using the parallel job count for the overall derivation.
>
> The xz man page says that multi-threaded decompression isn't implemented
> yet, unfortunately.

Ah, interesting. Having a read myself now, it also says it:

  "will work on files that contain multiple blocks with size information
   in block headers.  All files compressed in multi-threaded mode meet
   this condition, but files compressed in single-threaded mode don't
   even if --block-size=size is used."

So, if -T was used to compress the data, then it sounds like it'll work
to decompress it. I guess this adds a little more uncertainty to the
benefit of this change, as the impact is dependent on the way the source
data is compressed.
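
For anyone wanting to check a particular tarball against the man page
condition quoted above, here is a minimal Python sketch. It is not part of
the patch and not how xz itself does it. It reads the stream footer and
index of an .xz file to count the blocks, then looks at the flags byte of
the first block header to see whether both size fields are present. The
names xz_block_info and _read_vli are made up for illustration. It assumes
a single stream with no trailing padding, and it only inspects the first
block header.

```python
import io
import struct

XZ_HEADER_MAGIC = b"\xfd7zXZ\x00"
XZ_FOOTER_MAGIC = b"YZ"


def _read_vli(buf, pos):
    """Decode one xz variable-length integer starting at buf[pos]."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return value, pos
        shift += 7


def xz_block_info(f):
    """Return (block_count, first_block_has_sizes) for a seekable .xz file.

    Assumption: one stream, no stream padding after the footer.
    """
    # 12-byte stream header, then the first block header:
    # byte 12 is the encoded header size, byte 13 holds the block flags.
    header = f.read(14)
    if header[:6] != XZ_HEADER_MAGIC:
        raise ValueError("missing .xz stream header magic")

    size = f.seek(0, io.SEEK_END)
    f.seek(size - 12)
    # Stream footer: CRC32 (4), Backward Size (4), Stream Flags (2), "YZ".
    footer = f.read(12)
    if footer[10:] != XZ_FOOTER_MAGIC:
        raise ValueError("missing .xz stream footer magic (padding?)")

    (backward_size,) = struct.unpack("<I", footer[4:8])
    index_size = (backward_size + 1) * 4
    f.seek(size - 12 - index_size)
    index = f.read(index_size)
    if index[0] != 0x00:
        raise ValueError("index indicator byte not found")

    # The index starts with the number of records, one record per block.
    block_count, _ = _read_vli(index, 1)
    if block_count == 0:
        return 0, False

    # 0x40: compressed size present, 0x80: uncompressed size present.
    flags = header[13]
    return block_count, (flags & 0xC0) == 0xC0
```

Calling xz_block_info(open("foo.tar.xz", "rb")) on a file written by a
single-threaded compressor would be expected to report no size fields in
the block header, while a file with many blocks and sizes present is the
kind the man page describes. `xz --list` reports the block count too.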
Leo Famulari Dec. 6, 2018, 9:06 p.m. UTC | #4
On Thu, Dec 06, 2018 at 07:38:21PM +0000, Christopher Baines wrote:
> So, if -T was used to compress the data, then it sounds like it'll work
> to decompress it. I guess this adds a little more uncertainty to the
> benefit of this change, as the impact is dependent on the way the source
> data is compressed.

Right. When parallel decompression is implemented, I think we should
enable it in order to get some benefit from upstream tarballs that may
have been created with multi-threaded compression. 

However, we probably won't be able to use the parallel compression
within Guix because it is apparently not deterministic:

<https://bugs.gnu.org/31015>
Efraim Flashner Dec. 9, 2018, 2:32 p.m. UTC | #5
On Thu, Dec 06, 2018 at 04:06:53PM -0500, Leo Famulari wrote:
> On Thu, Dec 06, 2018 at 07:38:21PM +0000, Christopher Baines wrote:
> > So, if -T was used to compress the data, then it sounds like it'll work
> > to decompress it. I guess this adds a little more uncertainty to the
> > benefit of this change, as the impact is dependent on the way the source
> > data is compressed.
> 
> Right. When parallel decompression is implemented, I think we should
> enable it in order to get some benefit from upstream tarballs that may
> have been created with multi-threaded compression. 
> 
> However, we probably won't be able to use the parallel compression
> within Guix because it is apparently not deterministic:
> 
> <https://bugs.gnu.org/31015>

If the tarball is compressed in parallel then it can be decompressed in
parallel.

As for compressing in parallel, it *might work* to pass it through our
non-bootstrap tar for 'tar --sort=name' and then pass it through xz
-T(pick-a-num).
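
The tar half of that idea can be sketched in Python: add members in sorted
order so that the same tree always yields the same tar stream. This is only
an illustration with made-up names (sorted_tar_xz, _normalize). Resetting
mtime and ownership goes beyond what 'tar --sort=name' does on its own and
is an assumption about what a reproducible archive would also need.
Python's lzma module compresses in a single thread, so this says nothing
about whether 'xz -T' output is reproducible.

```python
import io
import lzma
import os
import tarfile


def _normalize(info):
    """Reset metadata that varies between machines and runs."""
    info.mtime = 0
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    return info


def sorted_tar_xz(directory):
    """Archive DIRECTORY with members sorted by name, then xz-compress it."""
    directory = os.path.abspath(directory)
    base = os.path.basename(directory)

    # Collect every path first so the order does not depend on the
    # order in which the file system happens to list entries.
    members = [directory]
    for root, dirs, files in os.walk(directory):
        members.extend(os.path.join(root, name) for name in dirs + files)

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for path in sorted(members):
            rel = os.path.relpath(path, directory)
            arcname = os.path.normpath(os.path.join(base, rel))
            # recursive=False: each member is added exactly once, in order.
            tar.add(path, arcname=arcname, recursive=False, filter=_normalize)

    # Single-threaded compression, unlike the xz -T case discussed here.
    return lzma.compress(buf.getvalue())
```

Two checkouts of the same tree whose files were created in a different
order should give byte-identical output with this approach.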
Leo Famulari Dec. 10, 2018, 4:24 p.m. UTC | #6
On Sun, Dec 09, 2018 at 04:32:01PM +0200, Efraim Flashner wrote:
> If the tarball is compressed in parallel then it can be decompressed in
> parallel.

The xz documentation says that parallel decompression is not
implemented? Is that no longer the case?

> As for compressing in parallel, it *might work* to pass it through our
> non-bootstrap tar for 'tar --sort=name' and then pass it through xz
> -T(pick-a-num).

That could be helpful!
Efraim Flashner Dec. 10, 2018, 6:48 p.m. UTC | #7
On Mon, Dec 10, 2018 at 11:24:29AM -0500, Leo Famulari wrote:
> On Sun, Dec 09, 2018 at 04:32:01PM +0200, Efraim Flashner wrote:
> > If the tarball is compressed in parallel then it can be decompressed in
> > parallel.
> 
> The xz documentation says that parallel decompression is not
> implemented? Is that no longer the case?

Looks like I got caught up with the original release notes.
https://git.tukaani.org/?p=xz.git;a=blob;f=NEWS;hb=HEAD#l94
Looks like it's specifically only compression.

> 
> > As for compressing in parallel, it *might work* to pass it through our
> > non-bootstrap tar for 'tar --sort=name' and then pass it through xz
> > -T(pick-a-num).
> 
> That could be helpful!
Christopher Baines May 13, 2020, 6:20 p.m. UTC | #8
Christopher Baines <mail@cbaines.net> writes:

> It can take a little while to decompress some packages with large xz
> compressed source tar files. xz includes support for parallelism, so enable
> this using the parallel job count for the overall derivation.
>
> * guix/build/gnu-build-system.scm (unpack): Set XZ_OPT to pass the -T option
> to xz to enable it to work in parallel if appropriate.
> ---
>  guix/build/gnu-build-system.scm | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/guix/build/gnu-build-system.scm b/guix/build/gnu-build-system.scm
> index e5f3197b0..9d11e5b1e 100644
> --- a/guix/build/gnu-build-system.scm
> +++ b/guix/build/gnu-build-system.scm
> @@ -147,7 +147,7 @@ chance to be set."
>                locale (strerror (system-error-errno args)))
>        #t)))
>
> -(define* (unpack #:key source #:allow-other-keys)
> +(define* (unpack #:key source parallel-build? #:allow-other-keys)
>    "Unpack SOURCE in the working directory, and change directory within the
>  source.  When SOURCE is a directory, copy it in a sub-directory of the current
>  working directory."
> @@ -161,6 +161,10 @@ working directory."
>          (copy-recursively source "."
>                            #:keep-mtime? #t))
>        (begin
> +        (when parallel-build?
> +          (setenv "XZ_OPT"
> +                  (format #f "-T~d" (parallel-job-count))))
> +
>          (if (string-suffix? ".zip" source)
>              (invoke "unzip" source)
>              (invoke "tar" "xvf" source))

It's been a long long while, but now that core-updates has recently been
merged, I'd like to try and take a look at this again.

I think the consensus was that this will only help for xz compressed
files where they have been compressed in parallel. I think it's still
worth doing though, as some of the big xz files that need decompressing
have been compressed in parallel, and this will speed up the builds when
multiple cores are available.

Thanks,

Chris
Efraim Flashner May 13, 2020, 7:07 p.m. UTC | #9
On Wed, May 13, 2020 at 07:20:08PM +0100, Christopher Baines wrote:
> It's been a long long while, but now that core-updates has recently been
> merged, I'd like to try and take a look at this again.
> 
> I think the consensus was that this will only help for xz compressed
> files where they have been compressed in parallel. I think it's still
> worth doing though, as some of the big xz files that need decompressing
> have been compressed in parallel, and this will speed up the builds when
> multiple cores are available.
> 
> Thanks,
> 
> Chris

I thought the last time we looked into this we figured out that there
was a mistake in release notes or something and that parallel
decompression isn't actually supported.
Christopher Baines May 14, 2020, 7:37 a.m. UTC | #10
Efraim Flashner <efraim@flashner.co.il> writes:

> On Wed, May 13, 2020 at 07:20:08PM +0100, Christopher Baines wrote:
>> It's been a long long while, but now that core-updates has recently been
>> merged, I'd like to try and take a look at this again.
>>
>> I think the consensus was that this will only help for xz compressed
>> files where they have been compressed in parallel. I think it's still
>> worth doing though, as some of the big xz files that need decompressing
>> have been compressed in parallel, and this will speed up the builds when
>> multiple cores are available.
>>
>> Thanks,
>>
>> Chris
>
> I thought the last time we looked into this we figured out that there
> was a mistake in release notes or something and that parallel
> decompression isn't actually supported.

Hmm, I had a look to see if I could find some examples of where this
would apply, but I couldn't find any xz archives that we use in Guix
where it's been compressed in a way that allows multithreaded
decompression...

I'm pretty sure I had some examples before, but maybe something's changed
in the intervening year.

Anyway, if I discover this again, I'll actually make a note of where
it's applicable.

Patch

diff --git a/guix/build/gnu-build-system.scm b/guix/build/gnu-build-system.scm
index e5f3197b0..9d11e5b1e 100644
--- a/guix/build/gnu-build-system.scm
+++ b/guix/build/gnu-build-system.scm
@@ -147,7 +147,7 @@  chance to be set."
               locale (strerror (system-error-errno args)))
       #t)))
 
-(define* (unpack #:key source #:allow-other-keys)
+(define* (unpack #:key source parallel-build? #:allow-other-keys)
   "Unpack SOURCE in the working directory, and change directory within the
 source.  When SOURCE is a directory, copy it in a sub-directory of the current
 working directory."
@@ -161,6 +161,10 @@  working directory."
         (copy-recursively source "."
                           #:keep-mtime? #t))
       (begin
+        (when parallel-build?
+          (setenv "XZ_OPT"
+                  (format #f "-T~d" (parallel-job-count))))
+
         (if (string-suffix? ".zip" source)
             (invoke "unzip" source)
             (invoke "tar" "xvf" source))
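
For readers less familiar with Guile, the behaviour of the patched unpack
phase can be approximated in Python as below. This is a rough sketch and
not the Guix implementation: unpack_env and unpack are hypothetical names,
and os.cpu_count() stands in for Guix's (parallel-job-count). The mechanism
it relies on is that xz reads extra options from the XZ_OPT environment
variable, so the -T setting reaches the xz process that tar starts.

```python
import os
import subprocess


def unpack_env(parallel_build, job_count, base_env=None):
    """Return the environment for the child process, with XZ_OPT set
    to -T<job_count> when a parallel build was requested."""
    env = dict(os.environ if base_env is None else base_env)
    if parallel_build:
        # As in the patch, this replaces any XZ_OPT already set.
        env["XZ_OPT"] = "-T{}".format(job_count)
    return env


def unpack(source, parallel_build=True, job_count=None):
    """Unpack SOURCE with unzip or tar, mirroring the patched phase."""
    if job_count is None:
        # Stand-in for (parallel-job-count); an assumption of this sketch.
        job_count = os.cpu_count() or 1
    env = unpack_env(parallel_build, job_count)
    if source.endswith(".zip"):
        subprocess.run(["unzip", source], check=True, env=env)
    else:
        subprocess.run(["tar", "xvf", source], check=True, env=env)
```

Whether setting -T changes anything for a given archive depends on how
that archive was compressed and on the xz version in use, as discussed in
the comments.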