Message ID | 7831fcdd8b8aab99cc95ba904076014b4c3cb6d2.camel@telenet.be |
---|---|
State | Accepted |
Headers | show |
Series | [bug#50456] Optimise bytevector->nix-base32-string and bytevector->base16-string. | expand |
Context | Check | Description |
---|---|---|
cbaines/applying patch | fail | View Laminar job |
cbaines/issue | success | View issue |
I split off the base16 and base32 optimisations to a separate patch series: 50456@debbugs.gnu.org.
retitle [PATCH] Optimise bytevector->nix-base32-string and bytevector->base16-string. thanks
Hello, Maxime Devos <maximedevos@telenet.be> skribis: > The two atached patches optimise bytevector->nix-base32-string and > bytevector->base16-string, making them about 20% and two times > faster respectively, by reducing allocations. They are called > from 'output-path', 'fixed-output-path' and 'store-path' > in (guix store). Thanks a lot for looking into this! > Unfortunately, this does not decrease timings to a noticable degree, > but it does decrease the allocated memory by 2.3% (*), and it does not > seem to increase timings. (See perf-numbers.txt.) Yeah, base32 code is usually pretty low in profiles of calls to ‘package-derivation’. > From a93bad629e2746c77446cacddb9986506ce9ba88 Mon Sep 17 00:00:00 2001 > From: Maxime Devos <maximedevos@telenet.be> > Date: Sun, 5 Sep 2021 16:28:33 +0200 > Subject: [PATCH 1/2] base32: Reduce GC pressure in > make-bytevector->base32-string. > > The following code has been used to compare performance: > > ;; first 20 bytes of sha256 of #vu8(#xde #xad #xbe #xef) > (define bv #vu8(95 120 195 50 116 228 63 169 222 86 89 38 92 29 145 126 37 192 55 34)) > ,profile > (let loop ((n 0)) > (when (< n #e1e6) > ((@ (guix base32) bytevector->nix-base32-string) bv) > (loop (+ n 1)))) > > Before this change, the output was: > > [...] > Sample count: 1140 > Total time: 27.465560018 seconds (10.659331433 seconds in GC) > > After this change, the output was: > > [...] > Sample count: 957 > Total time: 20.478847143 seconds (6.139721189 seconds in GC) Note that ,profile (statprof) is intrusive; additional, the REPL uses the “debug” VM engine, which is slightly slower than the “regular” one (info "(guile) Command-line Options"). To measure “actual” performance, it’s best to write the code down in a file and then run: time guile -l that-file.scm or, alternatively, use (ice-9 time) and wrap the body of the relevant code in (time …), which is a bit more accurate than using the shell’s ‘time’ command since it allows you to dismiss Guile startup time. (You also need to make sure that the loop counter remains below ‘most-positive-fixnum’, otherwise you’ll end up measuring GC activity due to the use of bignums, but 10⁶ is definitely OK.) > * guix/base32.scm > (make-bytevector->base32-string): Eliminate 'reverse', use mutation instead. [...] > + (let* ((start (cons #f #f)) > + (end (quintet-fold (lambda (q r) > + (define pair > + (cons (vector-ref base32-chars q) #f)) > + (set-cdr! r pair) > + pair) > + start > + bv))) > + (set-cdr! end '()) > + (list->string (cdr start))))) Does replacing (reverse chars) with (reverse! chars) has the same effect? I’m reluctant to resorting to micro-optimizations like the one above since they make code harder to reason about. > From dfd9b7557e31823320fcbd7abed77de295b7dce1 Mon Sep 17 00:00:00 2001 > From: Maxime Devos <maximedevos@telenet.be> > Date: Mon, 6 Sep 2021 00:46:17 +0200 > Subject: [PATCH 2/2] base16: Reduce GC pressure in bytevector->base16-string. > > This makes bytevector->base16-string two times faster. > > * guix/base16.scm (bytevector->base16-string): Use utf8->string > and iteration instead of string-concatenate and named let. LGTM. How did you measure performance for this one? Thanks, Ludo’.
Ludovic Courtès schreef op do 09-09-2021 om 16:29 [+0200]: > Hello, > > Maxime Devos <maximedevos@telenet.be> skribis: > > > The two atached patches optimise bytevector->nix-base32-string and > > bytevector->base16-string, making them about 20% and two times > > faster respectively, by reducing allocations. They are called > > from 'output-path', 'fixed-output-path' and 'store-path' > > in (guix store). > > Thanks a lot for looking into this! > > > Unfortunately, this does not decrease timings to a noticable degree, > > but it does decrease the allocated memory by 2.3% (*), and it does not > > seem to increase timings. (See perf-numbers.txt.) > > Yeah, base32 code is usually pretty low in profiles of calls to > ‘package-derivation’. > > > From a93bad629e2746c77446cacddb9986506ce9ba88 Mon Sep 17 00:00:00 2001 > > From: Maxime Devos <maximedevos@telenet.be> > > Date: Sun, 5 Sep 2021 16:28:33 +0200 > > Subject: [PATCH 1/2] base32: Reduce GC pressure in > > make-bytevector->base32-string. > > > > The following code has been used to compare performance: > > > > ;; first 20 bytes of sha256 of #vu8(#xde #xad #xbe #xef) > > (define bv #vu8(95 120 195 50 116 228 63 169 222 86 89 38 92 29 145 126 37 192 55 34)) > > ,profile > > (let loop ((n 0)) > > (when (< n #e1e6) > > ((@ (guix base32) bytevector->nix-base32-string) bv) > > (loop (+ n 1)))) > > > > Before this change, the output was: > > > > [...] > > Sample count: 1140 > > Total time: 27.465560018 seconds (10.659331433 seconds in GC) > > > > After this change, the output was: > > > > [...] > > Sample count: 957 > > Total time: 20.478847143 seconds (6.139721189 seconds in GC) > > Note that ,profile (statprof) is intrusive; additional, the REPL uses > the “debug” VM engine, which is slightly slower than the “regular” one > (info "(guile) Command-line Options"). > > To measure “actual” performance, it’s best to write the code down in a > file and then run: > > time guile -l that-file.scm > > or, alternatively, use (ice-9 time) and wrap the body of the relevant > code in (time …), which is a bit more accurate than using the shell’s > ‘time’ command since it allows you to dismiss Guile startup time. I'll test with ((@ (ice-9 time) ...). > (You also need to make sure that the loop counter remains below > ‘most-positive-fixnum’, otherwise you’ll end up measuring GC activity > due to the use of bignums, but 10⁶ is definitely OK.) > > > * guix/base32.scm > > (make-bytevector->base32-string): Eliminate 'reverse', use mutation instead. > > [...] > > > + (let* ((start (cons #f #f)) > > + (end (quintet-fold (lambda (q r) > > + (define pair > > + (cons (vector-ref base32-chars q) #f)) > > + (set-cdr! r pair) > > + pair) > > + start > > + bv))) > > + (set-cdr! end '()) > > + (list->string (cdr start))))) > > Does replacing (reverse chars) with (reverse! chars) has the same > effect? Not tested. > I’m reluctant to resorting to micro-optimizations like the one above > since they make code harder to reason about. Agreed, let's drop that patch. > > From dfd9b7557e31823320fcbd7abed77de295b7dce1 Mon Sep 17 00:00:00 2001 > > From: Maxime Devos <maximedevos@telenet.be> > > Date: Mon, 6 Sep 2021 00:46:17 +0200 > > Subject: [PATCH 2/2] base16: Reduce GC pressure in bytevector->base16-string. > > > > This makes bytevector->base16-string two times faster. > > > > * guix/base16.scm (bytevector->base16-string): Use utf8->string > > and iteration instead of string-concatenate and named let. > > LGTM. How did you measure performance for this one? IIRC, the same way as with bytevector->base32-string. I'll retest with (ice-9 time). Greetings, Maxime.
Hi, Here are the test results using (ice-9 time) with the attached "time.scm" and guix/base16.scm, to be run with ‘make && ./pre-inst-env guix repl time.scm’: old: clock utime stime cutime cstime gctime 3.93 6.32 0.03 0.00 0.00 3.59 clock utime stime cutime cstime gctime 3.92 6.32 0.03 0.00 0.00 3.59 clock utime stime cutime cstime gctime 3.86 6.24 0.02 0.00 0.00 3.54 new: clock utime stime cutime cstime gctime 2.43 3.60 0.02 0.00 0.00 1.76 clock utime stime cutime cstime gctime 2.49 3.67 0.01 0.00 0.00 1.77 clock utime stime cutime cstime gctime 2.64 3.77 0.01 0.00 0.00 1.77 About half as much time is spent in GC. The ‘utime’ is also half as much. Not sure what ‘clock’ means exactly, but it is reduced as well. Greetings, Maxime
Partially merged with a87d8c912d64382d8d7489c156249bc2b2638df0. Closing.
From dfd9b7557e31823320fcbd7abed77de295b7dce1 Mon Sep 17 00:00:00 2001 From: Maxime Devos <maximedevos@telenet.be> Date: Mon, 6 Sep 2021 00:46:17 +0200 Subject: [PATCH 2/2] base16: Reduce GC pressure in bytevector->base16-string. This makes bytevector->base16-string two times faster. * guix/base16.scm (bytevector->base16-string): Use utf8->string and iteration instead of string-concatenate and named let. --- guix/base16.scm | 44 +++++++++++++++++++++++--------------------- 1 file changed, 23 insertions(+), 21 deletions(-) diff --git a/guix/base16.scm b/guix/base16.scm index 6c15a9f588..9ac964dff0 100644 --- a/guix/base16.scm +++ b/guix/base16.scm @@ -1,5 +1,6 @@ ;;; GNU Guix --- Functional package management for GNU ;;; Copyright © 2012, 2014, 2017 Ludovic Courtès <ludo@gnu.org> +;;; Copyright © 2021 Maxime Devos <maximedevos@telenet.be> ;;; ;;; This file is part of GNU Guix. ;;; @@ -32,27 +33,28 @@ (define (bytevector->base16-string bv) "Return the hexadecimal representation of BV's contents." - (define len - (bytevector-length bv)) - - (let-syntax ((base16-chars (lambda (s) - (syntax-case s () - (_ - (let ((v (list->vector - (unfold (cut > <> 255) - (lambda (n) - (format #f "~2,'0x" n)) - 1+ - 0)))) - v)))))) - (define chars base16-chars) - (let loop ((i len) - (r '())) - (if (zero? i) - (string-concatenate r) - (let ((i (- i 1))) - (loop i - (cons (vector-ref chars (bytevector-u8-ref bv i)) r))))))) + (define len (bytevector-length bv)) + (define utf8 (make-bytevector (* len 2))) + (let-syntax ((base16-octet-pairs + (lambda (s) + (syntax-case s () + (_ + (string->utf8 + (string-concatenate + (unfold (cut > <> 255) + (lambda (n) + (format #f "~2,'0x" n)) + 1+ + 0)))))))) + (define octet-pairs base16-octet-pairs) + (let loop ((i 0)) + (when (< i len) + (bytevector-u16-native-set! + utf8 (* 2 i) + (bytevector-u16-native-ref octet-pairs + (* 2 (bytevector-u8-ref bv i)))) + (loop (+ i 1)))) + (utf8->string utf8))) (define base16-string->bytevector (let ((chars->value (fold (lambda (i r) -- 2.33.0