Message ID | 89c875f974d1ad81ddd03f664ef08e397771d224.1704891443.git.mail@cbaines.net |
---|---|
State | New |
Headers | show |
Series | [bug#68266,v2] guix: store: Add report-object-cache-duplication. | expand |
Christopher Baines <mail@cbaines.net> skribis: > This is intended to help with spotting duplication in the object cache, so > where many keys, for example package records map to the same derivation. This > represents an opportunity for improved performance if you can reduce this > duplication in the cache, and better take advantage of the already present > cache entries. Another way to detect this is by looking at ‘add-data-to-store-cache’ stats: --8<---------------cut here---------------start------------->8--- $ GUIX_PROFILING="add-data-to-store-cache object-cache" guix build --no-grafts greetd wlgreet du-dust circtools --target=aarch64-linux-gnu -d /gnu/store/mivzv83wryv9gp5bjncg5m1831dx2xwr-circtools-1.0.0.drv /gnu/store/4xf7kh9mi0vpvs8m1ak4x8w1rpsdpv6z-du-dust-0.8.6.drv /gnu/store/ayk54gvlbc1qam6irzf9kaig56dhzni0-wlgreet-0.4.1.drv /gnu/store/r601i40cii9ic5w1k4hy5c2yngfayh64-greetd-0.9.0.drv Object Cache: fresh caches: 22 lookups: 40435 hits: 36821 (91.1%) cache size: 3613 entries 'add-data-to-store' cache: lookups: 4090 hits: 958 (23.4%) .drv files: 3062 (74.9%) Scheme files: 916 (22.4%) $ GUIX_PROFILING="add-data-to-store-cache object-cache" ./pre-inst-env guix build --no-grafts greetd wlgreet du-dust circtools --target=aarch64-linux-gnu -d /gnu/store/1wsldmvigjb8w2gk418npbnfznlb0ck1-circtools-1.0.0.drv /gnu/store/b5c73fawjdvkgy431qxz9l6l9y9a9lhz-du-dust-0.8.6.drv /gnu/store/zwc7qzsbzf62dgbbzy74lki4hsr406bw-wlgreet-0.4.1.drv /gnu/store/vjdd23hc82701afb132z1ajcqa7hfd74-greetd-0.9.0.drv Object Cache: fresh caches: 22 lookups: 37942 hits: 34523 (91.0%) cache size: 3418 entries 'add-data-to-store' cache: lookups: 3895 hits: 763 (19.6%) .drv files: 2957 (75.9%) Scheme files: 826 (21.2%) --8<---------------cut here---------------end--------------->8--- Ideally, the hit rate there would be 0% and we could remove it. If there’s a positive hit rate, it means we keep adding the same .drv and/or *-builder files to the store, meaning that the object cache was ineffective. > +(define* (report-object-cache-duplication store #:key (threshold 10) > + (port (current-error-port))) Do you have an example output of this? How helpful does it look to you in practice? I guess it we’d need to add a ‘register-profiling-hook!’ call somewhere? If we were to add this, I’d suggest adding a docstring and following the same style as the rest of the code (no long identifiers for local variables and no car/cdr, for example). Thanks, Ludo’.
Ludovic Courtès <ludo@gnu.org> writes: > Christopher Baines <mail@cbaines.net> skribis: > >> This is intended to help with spotting duplication in the object cache, so >> where many keys, for example package records map to the same derivation. This >> represents an opportunity for improved performance if you can reduce this >> duplication in the cache, and better take advantage of the already present >> cache entries. > > Another way to detect this is by looking at ‘add-data-to-store-cache’ > stats: > > $ GUIX_PROFILING="add-data-to-store-cache object-cache" guix build --no-grafts greetd wlgreet du-dust circtools --target=aarch64-linux-gnu -d > /gnu/store/mivzv83wryv9gp5bjncg5m1831dx2xwr-circtools-1.0.0.drv > /gnu/store/4xf7kh9mi0vpvs8m1ak4x8w1rpsdpv6z-du-dust-0.8.6.drv > /gnu/store/ayk54gvlbc1qam6irzf9kaig56dhzni0-wlgreet-0.4.1.drv > /gnu/store/r601i40cii9ic5w1k4hy5c2yngfayh64-greetd-0.9.0.drv > Object Cache: > fresh caches: 22 > lookups: 40435 > hits: 36821 (91.1%) > cache size: 3613 entries > > 'add-data-to-store' cache: > lookups: 4090 > hits: 958 (23.4%) > .drv files: 3062 (74.9%) > Scheme files: 916 (22.4%) > $ GUIX_PROFILING="add-data-to-store-cache object-cache" ./pre-inst-env guix build --no-grafts greetd wlgreet du-dust circtools --target=aarch64-linux-gnu -d > /gnu/store/1wsldmvigjb8w2gk418npbnfznlb0ck1-circtools-1.0.0.drv > /gnu/store/b5c73fawjdvkgy431qxz9l6l9y9a9lhz-du-dust-0.8.6.drv > /gnu/store/zwc7qzsbzf62dgbbzy74lki4hsr406bw-wlgreet-0.4.1.drv > /gnu/store/vjdd23hc82701afb132z1ajcqa7hfd74-greetd-0.9.0.drv > Object Cache: > fresh caches: 22 > lookups: 37942 > hits: 34523 (91.0%) > cache size: 3418 entries > > 'add-data-to-store' cache: > lookups: 3895 > hits: 763 (19.6%) > .drv files: 2957 (75.9%) > Scheme files: 826 (21.2%) > > Ideally, the hit rate there would be 0% and we could remove it. > > If there’s a positive hit rate, it means we keep adding the same .drv > and/or *-builder files to the store, meaning that the object cache was > ineffective. > >> +(define* (report-object-cache-duplication store #:key (threshold 10) >> + (port (current-error-port))) > > Do you have an example output of this? > > How helpful does it look to you in practice? Yep, so here's some output I get from computing all the cross derivations to i586-pc-gnu: value #<derivation /gnu/store/lqxlksh00cshc816xqfq47r3jjdfj2p9-subversion-1.14.2.drv => /gnu/store/92avphfdcrcaxx8m5a6ihmw558bj3np8-subversion-1.14.2 7fcfbe5f9280> cached 4174 times example keys: - #<package subversion@1.14.2 gnu/packages/version-control.scm:2316 7fcfc1c924d0> - #<package subversion@1.14.2 gnu/packages/version-control.scm:2316 7fcfc1c924d0> - #<file-append #<package subversion@1.14.2 gnu/packages/version-control.scm:2316 7fcfc1c924d0> "/bin/svn"> - #<file-append #<package subversion@1.14.2 gnu/packages/version-control.scm:2316 7fcfc1c924d0> "/bin/svn"> So it's not immediately obvious what the issue is, but if you search for "/bin/svn", then you find the file-append calls in svn-fetch, so there's a couple of new file-append records added to the cache for each svn-fetch. value /gnu/store/qi1km3qlv5hdsql6h3ibwvz6v4z8lqbz-ld-wrapper.in cached 755 times example keys: - #<<local-file> file: "/home/chris/Projects/Guix/guix/gnu/packages/ld-wrapper.in" absolute: #<promise "/home/chris/Projects/Guix/guix/gnu/packages/ld-wrapper.in"> name: "ld-wrapper.in" recursive?: #t select?: #<procedure true (file stat)>> - #<<local-file> file: "/home/chris/Projects/Guix/guix/gnu/packages/ld-wrapper.in" absolute: #<promise "/home/chris/Projects/Guix/guix/gnu/packages/ld-wrapper.in"> name: "ld-wrapper.in" recursive?: #t select?: #<procedure true (file stat)>> - #<<local-file> file: "/home/chris/Projects/Guix/guix/gnu/packages/ld-wrapper.in" absolute: #<promise "/home/chris/Projects/Guix/guix/gnu/packages/ld-wrapper.in"> name: "ld-wrapper.in" recursive?: #t select?: #<procedure true (file stat)>> - #<<local-file> file: "/home/chris/Projects/Guix/guix/gnu/packages/ld-wrapper.in" absolute: #<promise "/home/chris/Projects/Guix/guix/gnu/packages/ld-wrapper.in"> name: "ld-wrapper.in" recursive?: #t select?: #<procedure true (file stat)>> Similar to file-append, it's not immediately obvious where all these local-file's are coming from, but searching for ld-wrapper.in suggests make-ld-wrapper. My thinking here is that maybe it's worth not caching everything in the object cache. Particularly for file-append, I'm not sure what the cache is actually doing. As for local-file, given there's lower level caching, maybe it's worth leaning on that and not caching local-file either. I tried this out, and it seemed to not make performance worse at least, and it did remove the duplication from the cache. Other things that show up include: value #<derivation /gnu/store/z28jfsm43wg50cvdfq0548pbf5jfhk8r-binutils-cross-i586-pc-gnu-2.38.drv => /gnu/store/lkmwq399jllig2a5r323v6y9nflpn6gn-binutils-cross-i586-pc-gnu-2.38 7fcfbd358000> cached 1228 times example keys: - #<package binutils-cross-i586-pc-gnu@2.38 gnu/packages/cross-base.scm:79 7fcfbf3bb8f0> - #<package binutils-cross-i586-pc-gnu@2.38 gnu/packages/cross-base.scm:79 7fcfbf3b6630> - #<package binutils-cross-i586-pc-gnu@2.38 gnu/packages/cross-base.scm:79 7fcfbf3b6420> - #<package binutils-cross-i586-pc-gnu@2.38 gnu/packages/cross-base.scm:79 7fcfbf3b62c0>
diff --git a/guix/store.scm b/guix/store.scm index 97c4f32a5b..86ca293cac 100644 --- a/guix/store.scm +++ b/guix/store.scm @@ -70,6 +70,7 @@ (define-module (guix store) current-store-protocol-version ;for internal use cache-lookup-recorder ;for internal use mcached + report-object-cache-duplication &store-error store-error? &store-connection-error store-connection-error? @@ -2037,6 +2038,64 @@ (define-syntax mcached ((_ mvalue object keys ...) (mcached eq? mvalue object keys ...)))) +(define* (report-object-cache-duplication store #:key (threshold 10) + (port (current-error-port))) + (define cache-values-to-keys + (make-hash-table)) + + (define (insert key val) + (hash-set! + cache-values-to-keys + key + (or (and=> (hash-ref cache-values-to-keys + key) + (lambda (existing-values) + (cons val existing-values))) + (list val)))) + + (let* ((cache-size + (vhash-fold + (lambda (key value result) + (match value + ((item . keys*) + (insert item key))) + + (+ 1 result)) + 0 + (store-connection-cache store %object-cache-id))) + (cached-values-by-key-count + (sort + (hash-map->list + (lambda (key value) + (cons key (length value))) + cache-values-to-keys) + (lambda (a b) + (< (cdr a) (cdr b)))))) + + (filter-map + (match-lambda + ((value . count) + (if (>= count threshold) + (begin + (when port + (simple-format port "value ~A cached ~A times\n" value count) + (simple-format port "example keys:\n")) + + (let ((keys (hash-ref cache-values-to-keys value))) + (when port + (for-each + (lambda (key) + (simple-format #t " - ~A\n" key)) + (if (> count 10) + (take keys 10) + keys)) + (newline port)) + + `((value . ,value) + (keys . ,keys)))) + #f))) + cached-values-by-key-count))) + (define (preserve-documentation original proc) "Return PROC with documentation taken from ORIGINAL." (set-object-property! proc 'documentation