Message ID | 20200327162654.18785-2-arunisaac@systemreboot.net |
---|---|
State | Work in progress |
Headers | show |
Series | Package metadata cache for guix search | expand |
Context | Check | Description |
---|---|---|
cbaines/comparison | success | View comparision |
cbaines/git branch | success | View Git branch |
cbaines/applying patch | success | View Laminar job |
Hi Arun, Arun Isaac <arunisaac@systemreboot.net> skribis: > * gnu/packages.scm (%package-metadata-cache-file): New variable. > (generate-package-metadata-cache): New function. > * guix/channels.scm (package-metadata-cache-file): New function. > (%channel-profile-hooks): Add package-metadata-cache-file. This is short and sweet, nice! > + (define (expand-cache package result) > + (cons `#(,(package-name package) > + ,(package-version package) > + ,(delete-duplicates > + (map package-full-name > + (sort (filter package? (package-direct-inputs package)) > + package<?))) > + ,(package-outputs package) > + ,(package-supported-systems package) > + ,(package-synopsis package) > + ,(package-description package) > + ,(package-home-page package) > + ,(let ((location (package-location package))) > + (list (location-file location) > + (location-line location) > + (location-column location)))) I was wondering if we could omit inputs, which are not that useful. Apart from that it LGTM. Note that this is probably the place where we could eventually add the computation of an inverted index like zimoun suggested in <https://lists.gnu.org/archive/html/guix-devel/2020-01/msg00243.html>. > + #:properties '((type . profile-hook) > + (hook . package-cache)) ‘package-metadata-cache’, even (it’s for UI purposes). Nitpick: I’d use “packages:” as the prefix in the subject line. Thanks, Ludo’.
On Fri, 24 Apr 2020 at 22:48, Ludovic Courtès <ludo@gnu.org> wrote: > > + (define (expand-cache package result) > > + (cons `#(,(package-name package) > > + ,(package-version package) > > + ,(delete-duplicates > > + (map package-full-name > > + (sort (filter package? (package-direct-inputs package)) > > + package<?))) > > + ,(package-outputs package) > > + ,(package-supported-systems package) > > + ,(package-synopsis package) > > + ,(package-description package) > > + ,(package-home-page package) > > + ,(let ((location (package-location package))) > > + (list (location-file location) > > + (location-line location) > > + (location-column location)))) > > I was wondering if we could omit inputs, which are not that useful. Agree. > Note that this is probably the place where we could eventually add the > computation of an inverted index like zimoun suggested in > <https://lists.gnu.org/archive/html/guix-devel/2020-01/msg00243.html>. We should first agree on the extra cost (time) we are ready to pay to build improvements. See the lengthy message [1] about only the caching "inverted index" using the current 'relevance' scoring function. [1] http://issues.guix.gnu.org/39258#78 Cheers, simon
Hi Simon, zimoun <zimon.toutoune@gmail.com> skribis: > On Fri, 24 Apr 2020 at 22:48, Ludovic Courtès <ludo@gnu.org> wrote: [...] >> Note that this is probably the place where we could eventually add the >> computation of an inverted index like zimoun suggested in >> <https://lists.gnu.org/archive/html/guix-devel/2020-01/msg00243.html>. > > We should first agree on the extra cost (time) we are ready to pay to > build improvements. It’s complicated. As it stands, I’d rather not add overhead to ‘guix pull’, especially since current ‘guix search’ on my SSD is fast enough and can hardly be made any faster. Realistically though, I understand that things are different on slower machines and/or spinning disks. That’s why I’m interested in seeing how Arun’s proposed changes can affect such machines. If, as a bonus, it allows us to have an inverted index and thus improve the quality of search results, that’s great! Thanks, Ludo’.
Hi Ludo! Ludovic Courtès <ludo@gnu.org> writes: > It’s complicated. As it stands, I’d rather not add overhead to ‘guix > pull’, especially since current ‘guix search’ on my SSD is fast enough > and can hardly be made any faster. The question is, what is fast enough? I have an NVMe here that has a throughput of some 2GB/s, and yet --8<---------------cut here---------------start------------->8--- time guix search emacs > /dev/null real 0m1.545s user 0m1.938s sys 0m0.080s --8<---------------cut here---------------end--------------->8--- on a hot cache, which is too slow in my opinion :p Mildly impatient users might be slightly discouraged from iterating search queries. It also makes `guix search` very impractical to use in (non-guile) script. Which is too bad considering that the recsel-formatting makes `guix search` a very good candidate for scripting. Cheers!
Hi Ludo, On Sun, 26 Apr 2020 at 16:35, Ludovic Courtès <ludo@gnu.org> wrote: > Realistically though, I understand that things are different on slower > machines and/or spinning disks. That’s why I’m interested in seeing how > Arun’s proposed changes can affect such machines. I understand. I have done a small benchmark [1] of the 3 ways: the current, the v2 using Xapian (which is not an option on the long term) and the v3. My "slower" machine is at my office... but it provides already interesting numbers, IMHO. [1] http://issues.guix.gnu.org/39258#78 > If, as a bonus, it allows us to have an inverted index and thus improve > the quality of search results, that’s great! This "issue" is: any improvement on both sides performance and accuracy would add an somehow extra cost. The question is what is the maximum users would accept to pay for? Well, it is complicated as you said. :-) A trade off between extra cost, maintenance, complexity, etc is not easy to draw, as you said too elsewhere. I am seeing all that as experimental: explore ideas to see if they are worth or not. And what should be concluded now could change in the (near) future; for example if the computations of derivations are faster, resulting on "guix pull" faster, etc.. Cheer, simon
Hey! Pierre Neidhardt <mail@ambrevar.xyz> skribis: > Ludovic Courtès <ludo@gnu.org> writes: > >> It’s complicated. As it stands, I’d rather not add overhead to ‘guix >> pull’, especially since current ‘guix search’ on my SSD is fast enough >> and can hardly be made any faster. > > The question is, what is fast enough? I have an NVMe here that has a > throughput of some 2GB/s, and yet > > time guix search emacs > /dev/null > real 0m1.545s > user 0m1.938s > sys 0m0.080s That accounts for the time to render 864 entries: $ guix search emacs| grep ^name| wc -l 864 Compare with: $ time guix search emacs | head -100 > /dev/null real 0m0.674s user 0m0.802s sys 0m0.048s Again, this is not to say it cannot be improved, but it’s quite a challenge to do better on such hardware. Though as discussed with Arun, there may be low-hanging optimization fruits in Texinfo parsing and rendering. I guess we need to go ahead fire up statprof now. :-) Ludo’.
diff --git a/gnu/packages.scm b/gnu/packages.scm index d22c992bb1..c0b527acf0 100644 --- a/gnu/packages.scm +++ b/gnu/packages.scm @@ -4,6 +4,7 @@ ;;; Copyright © 2014 Eric Bavier <bavier@member.fsf.org> ;;; Copyright © 2016, 2017 Alex Kost <alezost@gmail.com> ;;; Copyright © 2016 Mathieu Lirzin <mthl@gnu.org> +;;; Copyright © 2020 Arun Isaac <arunisaac@systemreboot.net> ;;; ;;; This file is part of GNU Guix. ;;; @@ -64,7 +65,8 @@ specification->location specifications->manifest - generate-package-cache)) + generate-package-cache + generate-package-metadata-cache)) ;;; Commentary: ;;; @@ -426,6 +428,52 @@ reducing the memory footprint." #:opts '(#:to-file? #t))))) cache-file) +(define %package-metadata-cache-file + ;; Location of the package metadata cache. + "/lib/guix/package-metadata.cache") + +(define (generate-package-metadata-cache directory) + "Generate under DIRECTORY a cache of the metadata of all available packages. + +The primary purpose of this cache is to speed up package metadata lookup +during package search so that we don't have to traverse and load all the +package modules." + (define cache-file + (string-append directory %package-metadata-cache-file)) + + (define (package<? p1 p2) + (string<? (package-full-name p1) (package-full-name p2))) + + (define (expand-cache package result) + (cons `#(,(package-name package) + ,(package-version package) + ,(delete-duplicates + (map package-full-name + (sort (filter package? (package-direct-inputs package)) + package<?))) + ,(package-outputs package) + ,(package-supported-systems package) + ,(package-synopsis package) + ,(package-description package) + ,(package-home-page package) + ,(let ((location (package-location package))) + (list (location-file location) + (location-line location) + (location-column location)))) + result)) + + (define exp + (fold-packages expand-cache '())) + + (mkdir-p (dirname cache-file)) + (call-with-output-file cache-file + (lambda (port) + (put-bytevector port + (compile `'(,@exp) + #:to 'bytecode + #:opts '(#:to-file? #t))))) + cache-file) + (define %sigint-prompt ;; The prompt to jump to upon SIGINT. diff --git a/guix/channels.scm b/guix/channels.scm index f0261dc2da..c4efaa7300 100644 --- a/guix/channels.scm +++ b/guix/channels.scm @@ -2,6 +2,7 @@ ;;; Copyright © 2018, 2019, 2020 Ludovic Courtès <ludo@gnu.org> ;;; Copyright © 2018 Ricardo Wurmus <rekado@elephly.net> ;;; Copyright © 2019 Jan (janneke) Nieuwenhuizen <janneke@gnu.org> +;;; Copyright © 2020 Arun Isaac <arunisaac@systemreboot.net> ;;; ;;; This file is part of GNU Guix. ;;; @@ -581,9 +582,40 @@ be used as a profile hook." (hook . package-cache)) #:local-build? #t))) +(define (package-metadata-cache-file manifest) + "Build a package metadata cache file for the instance in MANIFEST. This is +meant to be used as a profile hook." + (mlet %store-monad ((profile (profile-derivation manifest + #:hooks '()))) + + (define build + #~(begin + (use-modules (gnu packages)) + + (if (defined? 'generate-package-metadata-cache) + (begin + ;; Delegate package cache generation to the inferior. + (format (current-error-port) + "Generating package metadata cache for '~a'...~%" + #$profile) + (generate-package-metadata-cache #$output)) + (mkdir #$output)))) + + (gexp->derivation-in-inferior "guix-package-metadata-cache" build + profile + + ;; If the Guix in PROFILE is too old and + ;; lacks 'guix repl', don't build the cache + ;; instead of failing. + #:silent-failure? #t + + #:properties '((type . profile-hook) + (hook . package-cache)) + #:local-build? #t))) + (define %channel-profile-hooks ;; The default channel profile hooks. - (cons package-cache-file %default-profile-hooks)) + (cons* package-cache-file package-metadata-cache-file %default-profile-hooks)) (define (channel-instances->derivation instances) "Return the derivation of the profile containing INSTANCES, a list of