Message ID | 20181206075615.4637-1-mail@cbaines.net |
---|---|
State | Accepted |
Delegated to: | Christopher Baines |
Series | [bug#33643] gnu-build-system: Enable xz to decompress in parallel. |
Context | Check | Description |
---|---|---|
cbaines/comparison | success | View comparison |
cbaines/git branch | success | View Git branch |
cbaines/applying patch | success | View Laminar job |
cbaines/applying patch | success | Successfully applied |
Christopher Baines <mail@cbaines.net> writes:

> It can take a little while to decompress some packages with large xz
> compressed source tar files. xz includes support for parallelism, so enable
> this using the parallel job count for the overall derivation.

I'm guessing this is only suitable for core-updates, as it'll cause a lot of
rebuilds. I'm also not sure if it's worth it, but it does seem to make
building some packages at least start faster.

On Thu, Dec 06, 2018 at 07:56:15AM +0000, Christopher Baines wrote:
> It can take a little while to decompress some packages with large xz
> compressed source tar files. xz includes support for parallelism, so enable
> this using the parallel job count for the overall derivation.

The xz man page says that multi-threaded decompression isn't implemented yet,
unfortunately.

Leo Famulari <leo@famulari.name> writes:

> On Thu, Dec 06, 2018 at 07:56:15AM +0000, Christopher Baines wrote:
>> It can take a little while to decompress some packages with large xz
>> compressed source tar files. xz includes support for parallelism, so enable
>> this using the parallel job count for the overall derivation.
>
> The xz man page says that multi-threaded decompression isn't implemented
> yet, unfortunately.

Ah, interesting. Having a read myself now, it also says it:

  "will work on files that contain multiple blocks with size information
  in block headers. All files compressed in multi-threaded mode meet this
  condition, but files compressed in single-threaded mode don't even if
  --block-size=size is used."

So, if -T was used to compress the data, then it sounds like it'll work to
decompress it. I guess this adds a little more uncertainty to the benefit of
this change, as the impact is dependent on the way the source data is
compressed.

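The condition quoted from the man page can be checked on a given archive. The following is a minimal Python sketch, not part of the patch or of Guix. It reads the block count from the stream index and tests whether the first block header carries both size fields. The names `read_vli` and `inspect_xz` are hypothetical, and the sketch assumes a single .xz stream with no stream padding and does not verify any CRCs.

```python
import lzma
import struct

XZ_HEADER_MAGIC = b"\xfd7zXZ\x00"
XZ_FOOTER_MAGIC = b"YZ"


def read_vli(buf, pos):
    """Decode one .xz variable-length integer; return (value, next position)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def inspect_xz(data):
    """Return (block_count, sizes_in_first_block_header) for one .xz stream.

    Assumption: DATA holds exactly one stream and no stream padding.
    """
    if (len(data) < 32 or data[:6] != XZ_HEADER_MAGIC
            or data[-2:] != XZ_FOOTER_MAGIC):
        raise ValueError("not a single, unpadded .xz stream")

    # Footer layout: CRC32 (4), Backward Size (4), Stream Flags (2), "YZ" (2).
    (stored_backward_size,) = struct.unpack("<I", data[-8:-4])
    index_size = (stored_backward_size + 1) * 4
    index_start = len(data) - 12 - index_size
    if index_start < 12 or data[index_start] != 0x00:
        raise ValueError("index not found where the footer says it is")

    # The index starts with a 0x00 indicator followed by the record count,
    # which equals the number of blocks in the stream.
    block_count, _ = read_vli(data, index_start + 1)
    if block_count == 0:
        return 0, False

    # The first block header follows the 12-byte stream header: byte 12 is
    # the header size, byte 13 the flags. Bit 6 marks a compressed size
    # field, bit 7 an uncompressed size field.
    block_flags = data[13]
    sizes_present = (block_flags & 0xC0) == 0xC0
    return block_count, sizes_present


if __name__ == "__main__":
    # Python's lzma module compresses in single-threaded mode.
    print(inspect_xz(lzma.compress(b"guix" * 1000)))
```

An archive written in single-threaded mode, such as the one produced by Python's `lzma` module here, should report one block without size fields. Per the man page text above, a candidate for threaded decompression would need more than one block and the size fields present.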
On Thu, Dec 06, 2018 at 07:38:21PM +0000, Christopher Baines wrote:
> So, if -T was used to compress the data, then it sounds like it'll work
> to decompress it. I guess this adds a little more uncertainty to the
> benefit of this change, as the impact is dependent on the way the source
> data is compressed.

Right. When parallel decompression is implemented, I think we should enable
it in order to get some benefit from upstream tarballs that may have been
created with multi-threaded compression.

However, we probably won't be able to use the parallel compression within
Guix because it is apparently not deterministic:

<https://bugs.gnu.org/31015>

On Thu, Dec 06, 2018 at 04:06:53PM -0500, Leo Famulari wrote:
> On Thu, Dec 06, 2018 at 07:38:21PM +0000, Christopher Baines wrote:
> > So, if -T was used to compress the data, then it sounds like it'll work
> > to decompress it. I guess this adds a little more uncertainty to the
> > benefit of this change, as the impact is dependent on the way the source
> > data is compressed.
>
> Right. When parallel decompression is implemented, I think we should
> enable it in order to get some benefit from upstream tarballs that may
> have been created with multi-threaded compression.
>
> However, we probably won't be able to use the parallel compression
> within Guix because it is apparently not deterministic:
>
> <https://bugs.gnu.org/31015>

If the tarball is compressed in parallel then it can be decompressed in
parallel.

As for compressing in parallel, it *might work* to pass it through our
non-bootstrap tar for 'tar --sort=name' and then pass it through xz
-T(pick-a-num).

On Sun, Dec 09, 2018 at 04:32:01PM +0200, Efraim Flashner wrote:
> If the tarball is compressed in parallel then it can be decompressed in
> parallel.

The xz documentation says that parallel decompression is not implemented? Is
that no longer the case?

> As for compressing in parallel, it *might work* to pass it through our
> non-bootstrap tar for 'tar --sort=name' and then pass it through xz
> -T(pick-a-num).

That could be helpful!

On Mon, Dec 10, 2018 at 11:24:29AM -0500, Leo Famulari wrote:
> On Sun, Dec 09, 2018 at 04:32:01PM +0200, Efraim Flashner wrote:
> > If the tarball is compressed in parallel then it can be decompressed in
> > parallel.
>
> The xz documentation says that parallel decompression is not
> implemented? Is that no longer the case?

Looks like I got caught up with the original release notes.

https://git.tukaani.org/?p=xz.git;a=blob;f=NEWS;hb=HEAD#l94

Looks like it's specifically only compression.

>
> > As for compressing in parallel, it *might work* to pass it through our
> > non-bootstrap tar for 'tar --sort=name' and then pass it through xz
> > -T(pick-a-num).
>
> That could be helpful!

Christopher Baines <mail@cbaines.net> writes:

> It can take a little while to decompress some packages with large xz
> compressed source tar files. xz includes support for parallelism, so enable
> this using the parallel job count for the overall derivation.
>
> * guix/build/gnu-build-system.scm (unpack): Set XZ_OPT to pass the -T option
> to xz to enable it to work in parallel if appropriate.
> ---
>  guix/build/gnu-build-system.scm | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/guix/build/gnu-build-system.scm b/guix/build/gnu-build-system.scm
> index e5f3197b0..9d11e5b1e 100644
> --- a/guix/build/gnu-build-system.scm
> +++ b/guix/build/gnu-build-system.scm
> @@ -147,7 +147,7 @@ chance to be set."
>                  locale (strerror (system-error-errno args)))
>          #t)))
>
> -(define* (unpack #:key source #:allow-other-keys)
> +(define* (unpack #:key source parallel-build? #:allow-other-keys)
>    "Unpack SOURCE in the working directory, and change directory within the
>  source. When SOURCE is a directory, copy it in a sub-directory of the current
>  working directory."
> @@ -161,6 +161,10 @@ working directory."
>              (copy-recursively source "."
>                                #:keep-mtime? #t))
>            (begin
> +            (when parallel-build?
> +              (setenv "XZ_OPT"
> +                      (format #f "-T~d" (parallel-job-count))))
> +
>              (if (string-suffix? ".zip" source)
>                  (invoke "unzip" source)
>                  (invoke "tar" "xvf" source))

It's been a long long while, but now that core-updates has recently been
merged, I'd like to try and take a look at this again.

I think the consensus was that this will only help for xz compressed files
where they have been compressed in parallel. I think it's still worth doing
though, as some of the big xz files that need decompressing have been
compressed in parallel, and this will speed up the builds when multiple cores
are available.

Thanks,

Chris

On Wed, May 13, 2020 at 07:20:08PM +0100, Christopher Baines wrote:
>
> Christopher Baines <mail@cbaines.net> writes:
>
> > It can take a little while to decompress some packages with large xz
> > compressed source tar files. xz includes support for parallelism, so enable
> > this using the parallel job count for the overall derivation.
> >
> > * guix/build/gnu-build-system.scm (unpack): Set XZ_OPT to pass the -T option
> > to xz to enable it to work in parallel if appropriate.
> > ---
> >  guix/build/gnu-build-system.scm | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/guix/build/gnu-build-system.scm b/guix/build/gnu-build-system.scm
> > index e5f3197b0..9d11e5b1e 100644
> > --- a/guix/build/gnu-build-system.scm
> > +++ b/guix/build/gnu-build-system.scm
> > @@ -147,7 +147,7 @@ chance to be set."
> >                  locale (strerror (system-error-errno args)))
> >          #t)))
> >
> > -(define* (unpack #:key source #:allow-other-keys)
> > +(define* (unpack #:key source parallel-build? #:allow-other-keys)
> >    "Unpack SOURCE in the working directory, and change directory within the
> >  source. When SOURCE is a directory, copy it in a sub-directory of the current
> >  working directory."
> > @@ -161,6 +161,10 @@ working directory."
> >              (copy-recursively source "."
> >                                #:keep-mtime? #t))
> >            (begin
> > +            (when parallel-build?
> > +              (setenv "XZ_OPT"
> > +                      (format #f "-T~d" (parallel-job-count))))
> > +
> >              (if (string-suffix? ".zip" source)
> >                  (invoke "unzip" source)
> >                  (invoke "tar" "xvf" source))
>
> It's been a long long while, but now that core-updates has recently been
> merged, I'd like to try and take a look at this again.
>
> I think the consensus was that this will only help for xz compressed
> files where they have been compressed in parallel. I think it's still
> worth doing though, as some of the big xz files that need decompressing
> have been compressed in parallel, and this will speed up the builds when
> multiple cores are available.
>
> Thanks,
>
> Chris

I thought the last time we looked into this we figured out that there was a
mistake in release notes or something and that parallel decompression isn't
actually supported.

Efraim Flashner <efraim@flashner.co.il> writes:

> On Wed, May 13, 2020 at 07:20:08PM +0100, Christopher Baines wrote:
>>
>> Christopher Baines <mail@cbaines.net> writes:
>>
>> > It can take a little while to decompress some packages with large xz
>> > compressed source tar files. xz includes support for parallelism, so enable
>> > this using the parallel job count for the overall derivation.
>> >
>> > * guix/build/gnu-build-system.scm (unpack): Set XZ_OPT to pass the -T option
>> > to xz to enable it to work in parallel if appropriate.
>> > ---
>> >  guix/build/gnu-build-system.scm | 6 +++++-
>> >  1 file changed, 5 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/guix/build/gnu-build-system.scm b/guix/build/gnu-build-system.scm
>> > index e5f3197b0..9d11e5b1e 100644
>> > --- a/guix/build/gnu-build-system.scm
>> > +++ b/guix/build/gnu-build-system.scm
>> > @@ -147,7 +147,7 @@ chance to be set."
>> >                  locale (strerror (system-error-errno args)))
>> >          #t)))
>> >
>> > -(define* (unpack #:key source #:allow-other-keys)
>> > +(define* (unpack #:key source parallel-build? #:allow-other-keys)
>> >    "Unpack SOURCE in the working directory, and change directory within the
>> >  source. When SOURCE is a directory, copy it in a sub-directory of the current
>> >  working directory."
>> > @@ -161,6 +161,10 @@ working directory."
>> >              (copy-recursively source "."
>> >                                #:keep-mtime? #t))
>> >            (begin
>> > +            (when parallel-build?
>> > +              (setenv "XZ_OPT"
>> > +                      (format #f "-T~d" (parallel-job-count))))
>> > +
>> >              (if (string-suffix? ".zip" source)
>> >                  (invoke "unzip" source)
>> >                  (invoke "tar" "xvf" source))
>>
>> It's been a long long while, but now that core-updates has recently been
>> merged, I'd like to try and take a look at this again.
>>
>> I think the consensus was that this will only help for xz compressed
>> files where they have been compressed in parallel. I think it's still
>> worth doing though, as some of the big xz files that need decompressing
>> have been compressed in parallel, and this will speed up the builds when
>> multiple cores are available.
>>
>> Thanks,
>>
>> Chris
>
> I thought the last time we looked into this we figured out that there
> was a mistake in release notes or something and that parallel
> decompression isn't actually supported.

Hmm, I had a look to see if I could find some examples of where this would
apply, but I couldn't find any xz archives that we use in Guix that have
been compressed in a way that allows multithreaded decompression...

I'm pretty sure I had some examples before, but maybe something's changed in
the intervening year.

Anyway, if I discover this again, I'll actually make a note of where it's
applicable.

diff --git a/guix/build/gnu-build-system.scm b/guix/build/gnu-build-system.scm
index e5f3197b0..9d11e5b1e 100644
--- a/guix/build/gnu-build-system.scm
+++ b/guix/build/gnu-build-system.scm
@@ -147,7 +147,7 @@ chance to be set."
                 locale (strerror (system-error-errno args)))
         #t)))
 
-(define* (unpack #:key source #:allow-other-keys)
+(define* (unpack #:key source parallel-build? #:allow-other-keys)
   "Unpack SOURCE in the working directory, and change directory within the
 source. When SOURCE is a directory, copy it in a sub-directory of the current
 working directory."
@@ -161,6 +161,10 @@ working directory."
             (copy-recursively source "."
                               #:keep-mtime? #t))
           (begin
+            (when parallel-build?
+              (setenv "XZ_OPT"
+                      (format #f "-T~d" (parallel-job-count))))
+
             (if (string-suffix? ".zip" source)
                 (invoke "unzip" source)
                 (invoke "tar" "xvf" source))
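
The patch does not pass -T to xz directly. It sets the XZ_OPT environment variable, which xz reads when tar runs it as a child process. The following Python sketch is a rough transliteration for illustration only, not the Guix implementation. `unpack_plan` and its parameters are hypothetical names, and `job_count` stands in for the value that `(parallel-job-count)` returns in the Scheme code.

```python
import os


def unpack_plan(source, parallel_build, job_count, base_env=None):
    """Return (argv, env) for unpacking SOURCE, mirroring the patched phase.

    Hypothetical helper: JOB_COUNT plays the role of (parallel-job-count).
    """
    env = dict(os.environ if base_env is None else base_env)

    # As in the patch, XZ_OPT is set before the archive type is examined,
    # so it is also present (and unused) when the source is a .zip file.
    if parallel_build:
        env["XZ_OPT"] = f"-T{job_count}"

    if source.endswith(".zip"):
        argv = ["unzip", source]
    else:
        argv = ["tar", "xvf", source]
    return argv, env


if __name__ == "__main__":
    argv, env = unpack_plan("hello-2.10.tar.xz", True, 4, base_env={})
    print(argv, env)
```

The resulting pair could be handed to `subprocess.run(argv, env=env, check=True)`. Whether the option changes decompression time depends on the xz version and on how the archive was compressed, as discussed in the thread.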