diff mbox series

[bug#68266,v2] guix: store: Add report-object-cache-duplication.

Message ID 89c875f974d1ad81ddd03f664ef08e397771d224.1704891443.git.mail@cbaines.net
State New
Headers show
Series [bug#68266,v2] guix: store: Add report-object-cache-duplication. | expand

Commit Message

Christopher Baines Jan. 10, 2024, 12:57 p.m. UTC
This is intended to help with spotting duplication in the object cache, so
where many keys, for example package records map to the same derivation. This
represents an opportunity for improved performance if you can reduce this
duplication in the cache, and better take advantage of the already present
cache entries.

I'm thinking this can be used by the data service, but maybe it could also be
worked in to guix commands.

* guix/store.scm (report-object-cache-duplication): New procedure.

Change-Id: Ia6c816f871d10cae6807543224250110099d764f
---
 guix/store.scm | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)


base-commit: e541f9593f8bfc84b6140c2408b393243289fae6

Comments

Ludovic Courtès Jan. 12, 2024, 2:22 p.m. UTC | #1
Christopher Baines <mail@cbaines.net> skribis:

> This is intended to help with spotting duplication in the object cache, so
> where many keys, for example package records map to the same derivation. This
> represents an opportunity for improved performance if you can reduce this
> duplication in the cache, and better take advantage of the already present
> cache entries.

Another way to detect this is by looking at ‘add-data-to-store-cache’
stats:

--8<---------------cut here---------------start------------->8---
$ GUIX_PROFILING="add-data-to-store-cache object-cache" guix build --no-grafts greetd wlgreet du-dust circtools --target=aarch64-linux-gnu -d 
/gnu/store/mivzv83wryv9gp5bjncg5m1831dx2xwr-circtools-1.0.0.drv
/gnu/store/4xf7kh9mi0vpvs8m1ak4x8w1rpsdpv6z-du-dust-0.8.6.drv
/gnu/store/ayk54gvlbc1qam6irzf9kaig56dhzni0-wlgreet-0.4.1.drv
/gnu/store/r601i40cii9ic5w1k4hy5c2yngfayh64-greetd-0.9.0.drv
Object Cache:
  fresh caches:    22
  lookups:      40435
  hits:         36821 (91.1%)
  cache size:    3613 entries

'add-data-to-store' cache:
  lookups:       4090
  hits:           958 (23.4%)
  .drv files:    3062 (74.9%)
  Scheme files:   916 (22.4%)
$ GUIX_PROFILING="add-data-to-store-cache object-cache" ./pre-inst-env guix build --no-grafts greetd wlgreet du-dust circtools --target=aarch64-linux-gnu -d 
/gnu/store/1wsldmvigjb8w2gk418npbnfznlb0ck1-circtools-1.0.0.drv
/gnu/store/b5c73fawjdvkgy431qxz9l6l9y9a9lhz-du-dust-0.8.6.drv
/gnu/store/zwc7qzsbzf62dgbbzy74lki4hsr406bw-wlgreet-0.4.1.drv
/gnu/store/vjdd23hc82701afb132z1ajcqa7hfd74-greetd-0.9.0.drv
Object Cache:
  fresh caches:    22
  lookups:      37942
  hits:         34523 (91.0%)
  cache size:    3418 entries

'add-data-to-store' cache:
  lookups:       3895
  hits:           763 (19.6%)
  .drv files:    2957 (75.9%)
  Scheme files:   826 (21.2%)
--8<---------------cut here---------------end--------------->8---

Ideally, the hit rate there would be 0% and we could remove it.

If there’s a positive hit rate, it means we keep adding the same .drv
and/or *-builder files to the store, meaning that the object cache was
ineffective.

> +(define* (report-object-cache-duplication store #:key (threshold 10)
> +                                          (port (current-error-port)))

Do you have an example output of this?

How helpful does it look to you in practice?

I guess it we’d need to add a ‘register-profiling-hook!’ call somewhere?

If we were to add this, I’d suggest adding a docstring and following the
same style as the rest of the code (no long identifiers for local
variables and no car/cdr, for example).

Thanks,
Ludo’.
Christopher Baines Jan. 12, 2024, 6:26 p.m. UTC | #2
Ludovic Courtès <ludo@gnu.org> writes:

> Christopher Baines <mail@cbaines.net> skribis:
>
>> This is intended to help with spotting duplication in the object cache, so
>> where many keys, for example package records map to the same derivation. This
>> represents an opportunity for improved performance if you can reduce this
>> duplication in the cache, and better take advantage of the already present
>> cache entries.
>
> Another way to detect this is by looking at ‘add-data-to-store-cache’
> stats:
>
> $ GUIX_PROFILING="add-data-to-store-cache object-cache" guix build --no-grafts greetd wlgreet du-dust circtools --target=aarch64-linux-gnu -d 
> /gnu/store/mivzv83wryv9gp5bjncg5m1831dx2xwr-circtools-1.0.0.drv
> /gnu/store/4xf7kh9mi0vpvs8m1ak4x8w1rpsdpv6z-du-dust-0.8.6.drv
> /gnu/store/ayk54gvlbc1qam6irzf9kaig56dhzni0-wlgreet-0.4.1.drv
> /gnu/store/r601i40cii9ic5w1k4hy5c2yngfayh64-greetd-0.9.0.drv
> Object Cache:
>   fresh caches:    22
>   lookups:      40435
>   hits:         36821 (91.1%)
>   cache size:    3613 entries
>
> 'add-data-to-store' cache:
>   lookups:       4090
>   hits:           958 (23.4%)
>   .drv files:    3062 (74.9%)
>   Scheme files:   916 (22.4%)
> $ GUIX_PROFILING="add-data-to-store-cache object-cache" ./pre-inst-env guix build --no-grafts greetd wlgreet du-dust circtools --target=aarch64-linux-gnu -d 
> /gnu/store/1wsldmvigjb8w2gk418npbnfznlb0ck1-circtools-1.0.0.drv
> /gnu/store/b5c73fawjdvkgy431qxz9l6l9y9a9lhz-du-dust-0.8.6.drv
> /gnu/store/zwc7qzsbzf62dgbbzy74lki4hsr406bw-wlgreet-0.4.1.drv
> /gnu/store/vjdd23hc82701afb132z1ajcqa7hfd74-greetd-0.9.0.drv
> Object Cache:
>   fresh caches:    22
>   lookups:      37942
>   hits:         34523 (91.0%)
>   cache size:    3418 entries
>
> 'add-data-to-store' cache:
>   lookups:       3895
>   hits:           763 (19.6%)
>   .drv files:    2957 (75.9%)
>   Scheme files:   826 (21.2%)
>
> Ideally, the hit rate there would be 0% and we could remove it.
>
> If there’s a positive hit rate, it means we keep adding the same .drv
> and/or *-builder files to the store, meaning that the object cache was
> ineffective.
>
>> +(define* (report-object-cache-duplication store #:key (threshold 10)
>> +                                          (port (current-error-port)))
>
> Do you have an example output of this?
>
> How helpful does it look to you in practice?

Yep, so here's some output I get from computing all the cross
derivations to i586-pc-gnu:

  value #<derivation
    /gnu/store/lqxlksh00cshc816xqfq47r3jjdfj2p9-subversion-1.14.2.drv =>
    /gnu/store/92avphfdcrcaxx8m5a6ihmw558bj3np8-subversion-1.14.2 7fcfbe5f9280>
  cached 4174 times
  example keys:
    - #<package subversion@1.14.2 gnu/packages/version-control.scm:2316 7fcfc1c924d0>
    - #<package subversion@1.14.2 gnu/packages/version-control.scm:2316 7fcfc1c924d0>
    - #<file-append #<package subversion@1.14.2 gnu/packages/version-control.scm:2316 7fcfc1c924d0> "/bin/svn">
    - #<file-append #<package subversion@1.14.2 gnu/packages/version-control.scm:2316 7fcfc1c924d0> "/bin/svn">

So it's not immediately obvious what the issue is, but if you search for
"/bin/svn", then you find the file-append calls in svn-fetch, so there's
a couple of new file-append records added to the cache for each
svn-fetch.


  value /gnu/store/qi1km3qlv5hdsql6h3ibwvz6v4z8lqbz-ld-wrapper.in cached 755 times
  example keys:
    - #<<local-file> file: "/home/chris/Projects/Guix/guix/gnu/packages/ld-wrapper.in" absolute: #<promise "/home/chris/Projects/Guix/guix/gnu/packages/ld-wrapper.in"> name: "ld-wrapper.in" recursive?: #t select?: #<procedure true (file stat)>>
    - #<<local-file> file: "/home/chris/Projects/Guix/guix/gnu/packages/ld-wrapper.in" absolute: #<promise "/home/chris/Projects/Guix/guix/gnu/packages/ld-wrapper.in"> name: "ld-wrapper.in" recursive?: #t select?: #<procedure true (file stat)>>
    - #<<local-file> file: "/home/chris/Projects/Guix/guix/gnu/packages/ld-wrapper.in" absolute: #<promise "/home/chris/Projects/Guix/guix/gnu/packages/ld-wrapper.in"> name: "ld-wrapper.in" recursive?: #t select?: #<procedure true (file stat)>>
    - #<<local-file> file: "/home/chris/Projects/Guix/guix/gnu/packages/ld-wrapper.in" absolute: #<promise "/home/chris/Projects/Guix/guix/gnu/packages/ld-wrapper.in"> name: "ld-wrapper.in" recursive?: #t select?: #<procedure true (file stat)>>


Similar to file-append, it's not immediately obvious where all these
local-file's are coming from, but searching for ld-wrapper.in suggests
make-ld-wrapper.

My thinking here is that maybe it's worth not caching everything in the
object cache. Particularly for file-append, I'm not sure what the cache
is actually doing. As for local-file, given there's lower level caching,
maybe it's worth leaning on that and not caching local-file either. I
tried this out, and it seemed to not make performance worse at least,
and it did remove the duplication from the cache.

Other things that show up include:

  value #<derivation /gnu/store/z28jfsm43wg50cvdfq0548pbf5jfhk8r-binutils-cross-i586-pc-gnu-2.38.drv => /gnu/store/lkmwq399jllig2a5r323v6y9nflpn6gn-binutils-cross-i586-pc-gnu-2.38 7fcfbd358000> cached 1228 times
  example keys:
    - #<package binutils-cross-i586-pc-gnu@2.38 gnu/packages/cross-base.scm:79 7fcfbf3bb8f0>
    - #<package binutils-cross-i586-pc-gnu@2.38 gnu/packages/cross-base.scm:79 7fcfbf3b6630>
    - #<package binutils-cross-i586-pc-gnu@2.38 gnu/packages/cross-base.scm:79 7fcfbf3b6420>
    - #<package binutils-cross-i586-pc-gnu@2.38 gnu/packages/cross-base.scm:79 7fcfbf3b62c0>
diff mbox series

Patch

diff --git a/guix/store.scm b/guix/store.scm
index 97c4f32a5b..86ca293cac 100644
--- a/guix/store.scm
+++ b/guix/store.scm
@@ -70,6 +70,7 @@  (define-module (guix store)
             current-store-protocol-version        ;for internal use
             cache-lookup-recorder                 ;for internal use
             mcached
+            report-object-cache-duplication
 
             &store-error store-error?
             &store-connection-error store-connection-error?
@@ -2037,6 +2038,64 @@  (define-syntax mcached
     ((_ mvalue object keys ...)
      (mcached eq? mvalue object keys ...))))
 
+(define* (report-object-cache-duplication store #:key (threshold 10)
+                                          (port (current-error-port)))
+  (define cache-values-to-keys
+    (make-hash-table))
+
+  (define (insert key val)
+    (hash-set!
+     cache-values-to-keys
+     key
+     (or (and=> (hash-ref cache-values-to-keys
+                          key)
+                (lambda (existing-values)
+                  (cons val existing-values)))
+         (list val))))
+
+  (let* ((cache-size
+          (vhash-fold
+           (lambda (key value result)
+             (match value
+               ((item . keys*)
+                (insert item key)))
+
+             (+ 1 result))
+           0
+           (store-connection-cache store %object-cache-id)))
+         (cached-values-by-key-count
+          (sort
+           (hash-map->list
+            (lambda (key value)
+              (cons key (length value)))
+            cache-values-to-keys)
+           (lambda (a b)
+             (< (cdr a) (cdr b))))))
+
+    (filter-map
+     (match-lambda
+       ((value . count)
+        (if (>= count threshold)
+            (begin
+              (when port
+                (simple-format port "value ~A cached ~A times\n" value count)
+                (simple-format port "example keys:\n"))
+
+              (let ((keys (hash-ref cache-values-to-keys value)))
+                (when port
+                  (for-each
+                   (lambda (key)
+                     (simple-format #t "  - ~A\n" key))
+                   (if (> count 10)
+                       (take keys 10)
+                       keys))
+                  (newline port))
+
+                `((value . ,value)
+                  (keys . ,keys))))
+            #f)))
+     cached-values-by-key-count)))
+
 (define (preserve-documentation original proc)
   "Return PROC with documentation taken from ORIGINAL."
   (set-object-property! proc 'documentation