[bug#45253,0/6] Pipeline substitute integrity check, deduplication, and canonicalization

Message ID 20201215093830.10322-1-ludo@gnu.org

Ludovic Courtès Dec. 15, 2020, 9:38 a.m. UTC
Hello Guix!

This is a followup to <https://issues.guix.gnu.org/45018>.  It is
meant to be applied on top of <https://issues.guix.gnu.org/44760>.

Until now, guix-daemon would check the hash of store items just
substituted, reset timestamps/permissions, and deduplicate.  This
would lead to extra I/O: the whole set of files is traversed three
times by the daemon and read two times.

This patch series delegates that work to ‘guix substitute’, which
can do it directly as it restores files, thereby reducing I/O to
the minimum necessary.
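
To illustrate the idea, here is a minimal Python sketch of a single-pass
restore.  It is not the code from guix/scripts/substitute.scm (which is
Guile and works on nar streams); the function name `restore_files`, the
in-memory `entries` list, the `links_dir` layout and the simplified hash
serialization are all assumptions made for illustration.  Each file is
hashed, given canonical permissions and a fixed mtime, and deduplicated
while it is being written, so nothing has to be read back afterwards.

```python
import hashlib
import os


def restore_files(entries, dest_dir, links_dir, expected_digest=None):
    """Restore (relative_path, data, executable) entries in one pass.

    Illustrative only: the serialization hashed here is a simplified
    stand-in for a real archive format.
    """
    os.makedirs(dest_dir, exist_ok=True)
    os.makedirs(links_dir, exist_ok=True)
    overall = hashlib.sha256()

    for rel_path, data, executable in entries:
        flag = b"x" if executable else b"-"

        # Integrity check: feed the running hash while the data is at hand.
        overall.update(rel_path.encode() + b"\0" + flag + data)

        target = os.path.join(dest_dir, rel_path)
        os.makedirs(os.path.dirname(target), exist_ok=True)

        # Deduplication key covers the content and the executable bit,
        # since hard-linked files share their permissions.
        key = hashlib.sha256(flag + data).hexdigest()
        link = os.path.join(links_dir, key)

        if os.path.exists(link):
            # Same file already known: hard-link instead of writing again.
            os.link(link, target)
        else:
            with open(target, "wb") as port:
                port.write(data)
            # Canonicalization: read-only mode and a fixed timestamp
            # (values assumed here: 0444/0555 and mtime 1).
            os.chmod(target, 0o555 if executable else 0o444)
            os.utime(target, (1, 1))
            os.link(target, link)

    digest = overall.hexdigest()
    if expected_digest is not None and digest != expected_digest:
        raise ValueError("hash mismatch: expected %s, got %s"
                         % (expected_digest, digest))
    return digest
```

In this sketch `dest_dir` and `links_dir` must be on the same file
system, since hard links cannot cross file systems.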

I tested with substitutes that contain many files:

  guix build pipewire@0.2 ffmpeg ungoogled-chromium vim-full \
    emacs-no-x emacs-no-x-toolkit

On my laptop with an SSD, the wall-clock time is almost unchanged
when fetching lzip substitutes.  You can see that the throughput
displayed while downloading is slightly lower than before, which
is consistent because lzip downloads are CPU-bound¹, but this is
compensated by the lack of processing time between substitutes.
With gzip substitutes, I see a 10% speedup on the wall-clock time
on my laptop.

Ludo’.

¹ https://lists.gnu.org/archive/html/guix-devel/2020-12/msg00177.html

Ludovic Courtès (6):
  tests: Check the build trace for hash mismatches on substitutes.
  daemon: Let 'guix substitute' perform hash checks.
  tests: Check the mtime and permissions of substituted items.
  daemon: Do not reset timestamps and permissions on substituted items.
  tests: Make sure substituted items are deduplicated.
  daemon: Delegate deduplication to 'guix substitute'.

 guix/scripts/substitute.scm | 70 +++++++++++++++++++++++++-----
 guix/serialization.scm      |  8 +++-
 nix/libstore/build.cc       | 85 ++++++++++++++++++++-----------------
 tests/store.scm             | 82 +++++++++++++++++++++++++++++++++++
 tests/substitute.scm        | 58 ++++++++++++++++++++++---
 5 files changed, 248 insertions(+), 55 deletions(-)

Comments

Maxim Cournoyer Dec. 19, 2020, 2:07 a.m. UTC | #1
Hey Ludo,

Ludovic Courtès <ludo@gnu.org> writes:

> Ludovic Courtès <ludo@gnu.org> writes:
>
>> I tested with substitutes that contain many files:
>>
>>   guix build pipewire@0.2 ffmpeg ungoogled-chromium vim-full \
>>     emacs-no-x emacs-no-x-toolkit
>>
>> On my laptop with an SSD, the wall-clock time is almost unchanged
>> when fetching lzip substitutes.  You can see that the throughput
>> displayed while downloading is slightly lower than before, which
>> is consistent because lzip downloads are CPU-bound¹, but this is
>> compensated by the lack of processing time between substitutes.
>> With gzip substitutes, I see a 10% speedup on the wall-clock time
>> on my laptop.
>
> Picture!  First the timechart with the current daemon (gzip
> substitutes, downloading from the LAN):
>
> [image: timechart with the current daemon]
>
> Notice how guix-daemon is busy in between substitute downloads: that’s
> the time it takes to compute the nar hash of the store item, reset its
> timestamps, and deduplicate its files.
>
> Now the same operation with this patch series applied:
>
> [image: timechart with this patch series]
>
> This time guix-daemon remains idle all along whereas ‘guix substitute’
> is almost 100% busy.  There’s no pause time between substitutes.
>
> Ludo’.

Very nice!  Thanks for this work!  I can't wait to try it.

Maxim