diff mbox series

[bug#50456] Optimise bytevector->nix-base32-string and bytevector->base16-string.

Message ID 7831fcdd8b8aab99cc95ba904076014b4c3cb6d2.camel@telenet.be
State Accepted
Headers show
Series [bug#50456] Optimise bytevector->nix-base32-string and bytevector->base16-string. | expand

Checks

Context Check Description
cbaines/applying patch fail View Laminar job
cbaines/issue success View issue

Commit Message

M Sept. 7, 2021, 3:34 p.m. UTC
Hi guix,

The two atached patches optimise bytevector->nix-base32-string and
bytevector->base16-string, making them about 20% and two times
faster respectively, by reducing allocations.  They are called
from 'output-path', 'fixed-output-path' and 'store-path'
in (guix store).

Unfortunately, this does not decrease timings to a noticable degree,
but it does decrease the allocated memory by 2.3% (*), and it does not
seem to increase timings.  (See perf-numbers.txt.)

(*) GUIX_PROFILING=gc guix build -d pigx --no-grafts

Greetings,
Maxime.
while true; do time GUIX_PROFILING=gc ./the-unoptimised-baseN-guix/bin/guix build -d pigx --no-grafts; done

# First run removed
/gnu/store/fq6x8d2vcm6sbjkimg7g8kcgb4c5xv1b-pigx-0.0.3.drv
Garbage collection statistics:
  heap size:        93.85 MiB
  allocated:        325.20 MiB
  GC times:         18
  time spent in GC: 3.69 seconds (24% of user time)

real	0m15,066s
user	0m15,149s
sys	0m0,709s
/gnu/store/fq6x8d2vcm6sbjkimg7g8kcgb4c5xv1b-pigx-0.0.3.drv
Garbage collection statistics:
  heap size:        93.85 MiB
  allocated:        325.19 MiB
  GC times:         18
  time spent in GC: 3.70 seconds (24% of user time)

real	0m15,924s
user	0m15,695s
sys	0m0,836s
/gnu/store/fq6x8d2vcm6sbjkimg7g8kcgb4c5xv1b-pigx-0.0.3.drv
Garbage collection statistics:
  heap size:        93.85 MiB
  allocated:        325.20 MiB
  GC times:         18
  time spent in GC: 3.66 seconds (24% of user time)

real	0m15,369s
user	0m15,339s
sys	0m0,704s
/gnu/store/fq6x8d2vcm6sbjkimg7g8kcgb4c5xv1b-pigx-0.0.3.drv
Garbage collection statistics:
  heap size:        93.85 MiB
  allocated:        325.20 MiB
  GC times:         18
  time spent in GC: 3.69 seconds (25% of user time)

real	0m14,889s
user	0m15,066s
sys	0m0,679s

Summary.

(define (avg . r) (/ (apply + r) (length r)))
(define (stddev . r) (* (/ (length r) (- (length r) 1)) (sqrt (apply avg (map (lambda (x) (expt (- x (apply avg r)) 2)) r)))))

(define  %time/gc '(3.69 3.70 3.66 3.69))
(values (apply avg %time/gc) (apply stddev %time/gc))
$7 = 3.685
$8 = 0.01999999999999997

(define  %real '(15.066 15.924 15.369 14.889))
(define  %user '(15.149 15.695 15.339 15.066))
(define  %sys  '(0.709 0.836 0.704 0.679))

(values (apply avg %real) (apply stddev %real))
$1 = 15.312000000000001
$2 = 0.5237633053202561

(values (apply avg %user) (apply stddev %user))
$3 = 15.31225
$4 = 0.32283655995634153

(values (apply avg %sys) (apply stddev %sys))
$5 = 0.732
$6 = 0.08148074073737369

while true; do time GUIX_PROFILING=gc ./the-optimised-baseN-guix/bin/guix build -d pigx --no-grafts; done

/gnu/store/jfjfg7dnis7v6947a0rncxdn3y1nz0ad-pigx-0.0.3.drv
Garbage collection statistics:
  heap size:        93.85 MiB
  allocated:        317.83 MiB
  GC times:         18
  time spent in GC: 3.71 seconds (22% of user time)

real	0m17,646s
user	0m16,539s
sys	0m0,705s
/gnu/store/jfjfg7dnis7v6947a0rncxdn3y1nz0ad-pigx-0.0.3.drv
Garbage collection statistics:
  heap size:        93.85 MiB
  allocated:        317.83 MiB
  GC times:         18
  time spent in GC: 3.63 seconds (22% of user time)

real	0m18,733s
user	0m16,698s
sys	0m0,691s
/gnu/store/jfjfg7dnis7v6947a0rncxdn3y1nz0ad-pigx-0.0.3.drv
Garbage collection statistics:
  heap size:        93.85 MiB
  allocated:        317.82 MiB
  GC times:         18
  time spent in GC: 3.72 seconds (24% of user time)

real	0m15,429s
user	0m15,448s
sys	0m0,696s
/gnu/store/jfjfg7dnis7v6947a0rncxdn3y1nz0ad-pigx-0.0.3.drv
Garbage collection statistics:
  heap size:        93.85 MiB
  allocated:        317.82 MiB
  GC times:         18
  time spent in GC: 3.70 seconds (24% of user time)

real	0m15,292s
user	0m15,315s
sys	0m0,635s

(define %time/gc '(3.71 3.63 3.72 3.70))
(define %time/real '(17.646 18.733 15.429 15.292))
(define %time/user '(16.539 16.698 15.448 15.315))
(define %time/sys '(0.705 0.691 0.696 0.635))

(values (apply avg %time/gc) (apply stddev %time/gc))
$17 = 3.6900000000000004
$18 = 0.04714045207910329
(values (apply avg %time/real) (apply stddev %time/real))
$19 = 16.775000000000002
$20 = 1.9554380015172506
(values (apply avg %time/user) (apply stddev %time/user))
$21 = 16.0
$22 = 0.8304360300468671
(values (apply avg %time/sys) (apply stddev %time/sys))
$23 = 0.6817499999999999
$24 = 0.03660449274186007


Tests show neither a decrease nor an increase in timings.
Now looking at the allocation count:

The heap size before:
  heap size:        93.85 MiB
  allocated:        325.20 MiB

The heap size after:
  heap size:        93.85 MiB
  allocated:        317.82 MiB

A small improvement (-2.3%).

Comments

M Sept. 7, 2021, 3:36 p.m. UTC | #1
I split off the base16 and base32 optimisations
to a separate patch series: 50456@debbugs.gnu.org.
M Sept. 7, 2021, 3:37 p.m. UTC | #2
retitle [PATCH] Optimise bytevector->nix-base32-string and bytevector->base16-string.
thanks
Ludovic Courtès Sept. 9, 2021, 2:29 p.m. UTC | #3
Hello,

Maxime Devos <maximedevos@telenet.be> skribis:

> The two atached patches optimise bytevector->nix-base32-string and
> bytevector->base16-string, making them about 20% and two times
> faster respectively, by reducing allocations.  They are called
> from 'output-path', 'fixed-output-path' and 'store-path'
> in (guix store).

Thanks a lot for looking into this!

> Unfortunately, this does not decrease timings to a noticable degree,
> but it does decrease the allocated memory by 2.3% (*), and it does not
> seem to increase timings.  (See perf-numbers.txt.)

Yeah, base32 code is usually pretty low in profiles of calls to
‘package-derivation’.

> From a93bad629e2746c77446cacddb9986506ce9ba88 Mon Sep 17 00:00:00 2001
> From: Maxime Devos <maximedevos@telenet.be>
> Date: Sun, 5 Sep 2021 16:28:33 +0200
> Subject: [PATCH 1/2] base32: Reduce GC pressure in
>  make-bytevector->base32-string.
>
> The following code has been used to compare performance:
>
> ;; first 20 bytes of sha256 of #vu8(#xde #xad #xbe #xef)
> (define bv #vu8(95 120 195 50 116 228 63 169 222 86 89 38 92 29 145 126 37 192 55 34))
> ,profile
> (let loop ((n 0))
>   (when (< n #e1e6)
>      ((@ (guix base32) bytevector->nix-base32-string) bv)
>      (loop (+ n 1))))
>
> Before this change, the output was:
>
> [...]
> Sample count: 1140
> Total time: 27.465560018 seconds (10.659331433 seconds in GC)
>
> After this change, the output was:
>
> [...]
> Sample count: 957
> Total time: 20.478847143 seconds (6.139721189 seconds in GC)

Note that ,profile (statprof) is intrusive; additional, the REPL uses
the “debug” VM engine, which is slightly slower than the “regular” one
(info "(guile) Command-line Options").

To measure “actual” performance, it’s best to write the code down in a
file and then run:

  time guile -l that-file.scm

or, alternatively, use (ice-9 time) and wrap the body of the relevant
code in (time …), which is a bit more accurate than using the shell’s
‘time’ command since it allows you to dismiss Guile startup time.

(You also need to make sure that the loop counter remains below
‘most-positive-fixnum’, otherwise you’ll end up measuring GC activity
due to the use of bignums, but 10⁶ is definitely OK.)

> * guix/base32.scm
>   (make-bytevector->base32-string): Eliminate 'reverse', use mutation instead.

[...]

> +    (let* ((start (cons #f #f))
> +           (end (quintet-fold (lambda (q r)
> +                                (define pair
> +                                  (cons (vector-ref base32-chars q) #f))
> +                                (set-cdr! r pair)
> +                                pair)
> +                              start
> +                              bv)))
> +      (set-cdr! end '())
> +      (list->string (cdr start)))))

Does replacing (reverse chars) with (reverse! chars) has the same
effect?

I’m reluctant to resorting to micro-optimizations like the one above
since they make code harder to reason about.

> From dfd9b7557e31823320fcbd7abed77de295b7dce1 Mon Sep 17 00:00:00 2001
> From: Maxime Devos <maximedevos@telenet.be>
> Date: Mon, 6 Sep 2021 00:46:17 +0200
> Subject: [PATCH 2/2] base16: Reduce GC pressure in bytevector->base16-string.
>
> This makes bytevector->base16-string two times faster.
>
> * guix/base16.scm (bytevector->base16-string): Use utf8->string
>   and iteration instead of string-concatenate and named let.

LGTM.  How did you measure performance for this one?

Thanks,
Ludo’.
M Sept. 9, 2021, 2:42 p.m. UTC | #4
Ludovic Courtès schreef op do 09-09-2021 om 16:29 [+0200]:
> Hello,
> 
> Maxime Devos <maximedevos@telenet.be> skribis:
> 
> > The two atached patches optimise bytevector->nix-base32-string and
> > bytevector->base16-string, making them about 20% and two times
> > faster respectively, by reducing allocations.  They are called
> > from 'output-path', 'fixed-output-path' and 'store-path'
> > in (guix store).
> 
> Thanks a lot for looking into this!
> 
> > Unfortunately, this does not decrease timings to a noticable degree,
> > but it does decrease the allocated memory by 2.3% (*), and it does not
> > seem to increase timings.  (See perf-numbers.txt.)
> 
> Yeah, base32 code is usually pretty low in profiles of calls to
> ‘package-derivation’.
> 
> > From a93bad629e2746c77446cacddb9986506ce9ba88 Mon Sep 17 00:00:00 2001
> > From: Maxime Devos <maximedevos@telenet.be>
> > Date: Sun, 5 Sep 2021 16:28:33 +0200
> > Subject: [PATCH 1/2] base32: Reduce GC pressure in
> >  make-bytevector->base32-string.
> > 
> > The following code has been used to compare performance:
> > 
> > ;; first 20 bytes of sha256 of #vu8(#xde #xad #xbe #xef)
> > (define bv #vu8(95 120 195 50 116 228 63 169 222 86 89 38 92 29 145 126 37 192 55 34))
> > ,profile
> > (let loop ((n 0))
> >   (when (< n #e1e6)
> >      ((@ (guix base32) bytevector->nix-base32-string) bv)
> >      (loop (+ n 1))))
> > 
> > Before this change, the output was:
> > 
> > [...]
> > Sample count: 1140
> > Total time: 27.465560018 seconds (10.659331433 seconds in GC)
> > 
> > After this change, the output was:
> > 
> > [...]
> > Sample count: 957
> > Total time: 20.478847143 seconds (6.139721189 seconds in GC)
> 
> Note that ,profile (statprof) is intrusive; additional, the REPL uses
> the “debug” VM engine, which is slightly slower than the “regular” one
> (info "(guile) Command-line Options").
> 
> To measure “actual” performance, it’s best to write the code down in a
> file and then run:
> 
>   time guile -l that-file.scm
> 
> or, alternatively, use (ice-9 time) and wrap the body of the relevant
> code in (time …), which is a bit more accurate than using the shell’s
> ‘time’ command since it allows you to dismiss Guile startup time.

I'll test with ((@ (ice-9 time) ...).

> (You also need to make sure that the loop counter remains below
> ‘most-positive-fixnum’, otherwise you’ll end up measuring GC activity
> due to the use of bignums, but 10⁶ is definitely OK.)
> 
> > * guix/base32.scm
> >   (make-bytevector->base32-string): Eliminate 'reverse', use mutation instead.
> 
> [...]
> 
> > +    (let* ((start (cons #f #f))
> > +           (end (quintet-fold (lambda (q r)
> > +                                (define pair
> > +                                  (cons (vector-ref base32-chars q) #f))
> > +                                (set-cdr! r pair)
> > +                                pair)
> > +                              start
> > +                              bv)))
> > +      (set-cdr! end '())
> > +      (list->string (cdr start)))))
> 
> Does replacing (reverse chars) with (reverse! chars) has the same
> effect?

Not tested.

> I’m reluctant to resorting to micro-optimizations like the one above
> since they make code harder to reason about.

Agreed, let's drop that patch.

> > From dfd9b7557e31823320fcbd7abed77de295b7dce1 Mon Sep 17 00:00:00 2001
> > From: Maxime Devos <maximedevos@telenet.be>
> > Date: Mon, 6 Sep 2021 00:46:17 +0200
> > Subject: [PATCH 2/2] base16: Reduce GC pressure in bytevector->base16-string.
> > 
> > This makes bytevector->base16-string two times faster.
> > 
> > * guix/base16.scm (bytevector->base16-string): Use utf8->string
> >   and iteration instead of string-concatenate and named let.
> 
> LGTM.  How did you measure performance for this one?

IIRC, the same way as with bytevector->base32-string.  I'll retest
with (ice-9 time).

Greetings,
Maxime.
M Sept. 9, 2021, 3:15 p.m. UTC | #5
Hi,

Here are the test results using (ice-9 time) with
the attached "time.scm" and guix/base16.scm,
to be run with ‘make && ./pre-inst-env guix repl time.scm’:

old:
clock utime stime cutime cstime gctime
 3.93  6.32  0.03   0.00   0.00   3.59
clock utime stime cutime cstime gctime
 3.92  6.32  0.03   0.00   0.00   3.59
clock utime stime cutime cstime gctime
 3.86  6.24  0.02   0.00   0.00   3.54
new:
clock utime stime cutime cstime gctime
 2.43  3.60  0.02   0.00   0.00   1.76
clock utime stime cutime cstime gctime
 2.49  3.67  0.01   0.00   0.00   1.77
clock utime stime cutime cstime gctime
 2.64  3.77  0.01   0.00   0.00   1.77

About half as much time is spent in GC.
The ‘utime’ is also half as much.  Not sure
what ‘clock’ means exactly, but it is reduced
as well.

Greetings,
Maxime
M Sept. 11, 2021, 3:54 p.m. UTC | #6
Partially merged with a87d8c912d64382d8d7489c156249bc2b2638df0.
Closing.
diff mbox series

Patch

From dfd9b7557e31823320fcbd7abed77de295b7dce1 Mon Sep 17 00:00:00 2001
From: Maxime Devos <maximedevos@telenet.be>
Date: Mon, 6 Sep 2021 00:46:17 +0200
Subject: [PATCH 2/2] base16: Reduce GC pressure in bytevector->base16-string.

This makes bytevector->base16-string two times faster.

* guix/base16.scm (bytevector->base16-string): Use utf8->string
  and iteration instead of string-concatenate and named let.
---
 guix/base16.scm | 44 +++++++++++++++++++++++---------------------
 1 file changed, 23 insertions(+), 21 deletions(-)

diff --git a/guix/base16.scm b/guix/base16.scm
index 6c15a9f588..9ac964dff0 100644
--- a/guix/base16.scm
+++ b/guix/base16.scm
@@ -1,5 +1,6 @@ 
 ;;; GNU Guix --- Functional package management for GNU
 ;;; Copyright © 2012, 2014, 2017 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2021 Maxime Devos <maximedevos@telenet.be>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -32,27 +33,28 @@ 
 
 (define (bytevector->base16-string bv)
   "Return the hexadecimal representation of BV's contents."
-  (define len
-    (bytevector-length bv))
-
-  (let-syntax ((base16-chars (lambda (s)
-                               (syntax-case s ()
-                                 (_
-                                  (let ((v (list->vector
-                                            (unfold (cut > <> 255)
-                                                    (lambda (n)
-                                                      (format #f "~2,'0x" n))
-                                                    1+
-                                                    0))))
-                                    v))))))
-    (define chars base16-chars)
-    (let loop ((i len)
-               (r '()))
-      (if (zero? i)
-          (string-concatenate r)
-          (let ((i (- i 1)))
-            (loop i
-                  (cons (vector-ref chars (bytevector-u8-ref bv i)) r)))))))
+  (define len (bytevector-length bv))
+  (define utf8 (make-bytevector (* len 2)))
+  (let-syntax ((base16-octet-pairs
+                (lambda (s)
+                  (syntax-case s ()
+                    (_
+                     (string->utf8
+                      (string-concatenate
+                       (unfold (cut > <> 255)
+                               (lambda (n)
+                                 (format #f "~2,'0x" n))
+                               1+
+                               0))))))))
+    (define octet-pairs base16-octet-pairs)
+    (let loop ((i 0))
+      (when (< i len)
+        (bytevector-u16-native-set!
+         utf8 (* 2 i)
+         (bytevector-u16-native-ref octet-pairs
+                                    (* 2 (bytevector-u8-ref bv i))))
+        (loop (+ i 1))))
+    (utf8->string utf8)))
 
 (define base16-string->bytevector
   (let ((chars->value (fold (lambda (i r)
-- 
2.33.0