diff mbox series

[bug#50384,v3] Optimise search-patch (reducing I/O)

Message ID 04603bca34f16b284a5e3052a4b0765b60952817.camel@telenet.be
State New
Headers show
Series [bug#50384,v3] Optimise search-patch (reducing I/O) | expand

Checks

Context Check Description
cbaines/applying patch fail View Laminar job
cbaines/issue success View issue

Commit Message

M Sept. 9, 2021, 8:25 p.m. UTC
Hi guix,

This is a v3, without the base16 and base32 optimisations which
are split-off into <https://issues.guix.gnu.org/50456>.  It doesn't
seem this patch series will bring improvements, but feel free to test
(in particular, I wonder if this will help people using a remote daemon,
where transmitting data can take (relatively) long?).

(guix scripts hash) is broken, which would need to be fixed in the final
version, if any.  Ludovic has some concerns about dependency tracking in
search-patch which need to be adressed.

I think a more fruitful goal is to somehow parallelize the derivation
computation, with multiple separate connections to the store, such that
if one connection is blocking, the other one can be used for something
separate (threads aren't necessary if current-read-waiter,
current-write-waiter and non-blocking I/O are used).

Now, what improvements does this version of the patch series bring?
(Make sure to start the daemon with ./pre-inst-env guix daemon ...,
and set --localstatedir=/var!  Some changes to the daemon were made.)

1.  RPC count (tested in a local checkout)

    After the patch series:
make && GUIX_PROFILING=rpc ./pre-inst-env guix build -d pigx --no-grafts
accepted connection from pid 4917, user [USER]

/gnu/store/jfjfg7dnis7v6947a0rncxdn3y1nz0ad-pigx-0.0.3.drv
Remote procedure call summary: 5754 RPCs
  built-in-builders              ...     1
  add-to-store                   ...     3
  add-to-store/tree              ...    26
  add-temp-root-and-valid-path?  ...   195
  add-text-to-store              ...  5529

  After the patch series, with (if sha256 ...) replaced with (if #f ...)
  in (guix gexp), to simulate the situation before the patch series

/gnu/store/jfjfg7dnis7v6947a0rncxdn3y1nz0ad-pigx-0.0.3.drv
Remote procedure call summary: 5749 RPCs
  built-in-builders              ...     1
  add-to-store/tree              ...    26
  add-to-store                   ...   193
  add-text-to-store              ...  5529

(add-to-store RPCs are converted to add-temp-root-and-valid-path? RPCs)

2. Timing

   First do
   	 echo powersave | sudo tee /sys/devices/system/cpu/cpu{0,1,2,3}/cpufreq/scaling_governor
   to eliminate CPU frequency scaling effects.
   To automatically repeat the tests and compute the standard deviation,
   'hyperfine' is used:
   HYP=/gnu/store/3ya4iw6fzq1ns73bv1g3a96jvwhbv60c-hyperfine-1.11.0/bin/hyperfine

   To determine the effect of the change to 'local-file-compiler' and
   'search-patch' and nothing else, I will compare the performance of guix
   after the patch series with the performance of guix after the patch series
   and 'sha256' replaced by #false.

   With #f, --runs=60:
   make && ./pre-inst-env $HYP --runs=60 --warmup 1 -- 'guix build -d pigx --no-grafts'
   Time (mean ± σ):     15.428 s ±  0.385 s    [User: 15.925 s, System: 0.652 s]
   Range (min … max):   14.768 s … 16.550 s    60 runs

   With sha256, --runs=60
   make && ./pre-inst-env $HYP --runs=60 --warmup 1 -- 'guix build -d pigx --no-grafts'
   Time (mean ± σ):     15.493 s ±  0.252 s    [User: 15.585 s, System: 0.680 s]
   Range (min … max):   14.981 s … 16.294 s    60 runs

  These numbers don't have a clear difference.  Maybe statistics can help?   First,
  formulate a null-hypothesis.  As the total number of RPCs didn't change, the amount
  of data sent to the daemon is reduced and some "stats", "open" and "reads" are avoided,
  I would expect that the mean decreases.  Thus, as null-hypothesis, I choose:

  H0: the (theoretical) mean for ‘with sha256’ is less than the mean for ‘with #f’

  In the timing tests, the observed mean for 'with sha256’ is actually larger.
  But is this significant?

  guix environment --ad-hoc r
  before.mean   = 15.428
  before.stddev = 0.385
  after.mean    = 15.493
  after.stddev  = 0.252
  samples = 60

  # ‘statistical’ crate used by hyperfine
  # performs N/(N-1) correction XXX

  t = (before.mean - after.mean)/(sqrt(samples) * sqrt(before.stddev^2 + after.stddev^2))
  v = (samples - 1) * (before.stddev^2 + after.stddev^2)^2/(before.stddev^4 + after.stddev^4)

  q = dt(-t, v); q
  # p-value: 0.5072571
  # Null-hypothesis is not rejected

  It's not rejected, though that doesn't prove much since t is almost zero,
  so this test cannot reject the hypothesis ‘the means are equal’ or ‘the patch
  series makes things slower’ either.

  I don't think this patch series helps on my laptop (at least on a hot disk cache, I'd have
  to check for a cold cache).  However, I wonder if this would help a little for people
  using a remote build daemon (with a nfs setup or something) (see GUIX_DAEMON_SOCKET)?

Greetings,
Maxime.

Comments

M Sept. 10, 2021, 9:54 a.m. UTC | #1
It doesn't seem like the patch series is going
to end up improving anything, so I'm closing it.

Greetings,
Maxime
diff mbox series

Patch

From d359fefabf2831e42aea6edf646a9e0373be5d0f Mon Sep 17 00:00:00 2001
From: Maxime Devos <maximedevos@telenet.be>
Date: Sat, 4 Sep 2021 18:10:32 +0200
Subject: [PATCH 10/10] gexp: Do not intern if the file is already in the
 store.

* guix/gexp.scm (local-file-compiler): When the file is already in the
  store, re-use the fixed output path instead of interning the file
  again.
* guix/gexp.scm (add-temp-root-and-valid-path?*): New procedure.
---
 guix/gexp.scm | 32 +++++++++++++++++++++++++-------
 1 file changed, 25 insertions(+), 7 deletions(-)

diff --git a/guix/gexp.scm b/guix/gexp.scm
index c69e4aa299..20c9d93170 100644
--- a/guix/gexp.scm
+++ b/guix/gexp.scm
@@ -528,16 +528,34 @@  appears."
 'system-error' exception is raised if FILE could not be found."
   (force (%local-file-absolute-file-name file)))
 
+(define add-temp-root-and-valid-path?* (store-lift add-temp-root-and-valid-path?))
+
 (define-gexp-compiler (local-file-compiler (file <local-file>) system target)
   ;; "Compile" FILE by adding it to the store.
   (match file
-    (($ <local-file> file (= force absolute) name sha256 recursive? select?)
-     ;; Canonicalize FILE so that if it's a symlink, it is resolved.  Failing
-     ;; to do that, when RECURSIVE? is #t, we could end up creating a dangling
-     ;; symlink in the store, and when RECURSIVE? is #f 'add-to-store' would
-     ;; just throw an error, both of which are inconvenient.
-     (interned-file absolute name
-                    #:recursive? recursive? #:select? select?))))
+    ;; Delay computing the absolute file name until 'intern', as this
+    ;; might be a relatively expensive computation (e.g. if search-patch
+    ;; is used), especially on a spinning disk.
+    (($ <local-file> file absolute-promise name sha256 recursive? select?)
+     (let ()
+       (define (intern)
+         ;; Canonicalize FILE so that if it's a symlink, it is resolved.
+         ;; Failing to do that, when RECURSIVE? is #t, we could end up creating
+         ;; a dangling symlink in the store, and when RECURSIVE? is #f
+         ;; 'add-to-store' would just throw an error, both of which are
+         ;; inconvenient.
+         (interned-file (force absolute-promise) name
+                        #:recursive? recursive? #:select? select?))
+       ;; If the hash is known in advance and the store already has the
+       ;; item, there is no need to intern the file.
+       (if sha256
+           (let ((path (fixed-output-path name sha256 #:recursive? recursive?)))
+             (mlet %store-monad ((valid? (add-temp-root-and-valid-path?* path)))
+               (if valid?
+                   (return path)
+                   (intern))))
+           ;; If PATH does not yet exist, fall back to interning.
+           (intern))))))
 
 (define-record-type <plain-file>
   (%plain-file name content references)
-- 
2.33.0