diff mbox series

[bug#39258,v3,1/3] guix: Generate package metadata cache.

Message ID 20200327162654.18785-2-arunisaac@systemreboot.net
State Work in progress
Headers show
Series Package metadata cache for guix search | expand

Checks

Context Check Description
cbaines/comparison success View comparision
cbaines/git branch success View Git branch
cbaines/applying patch success View Laminar job

Commit Message

Arun Isaac March 27, 2020, 4:26 p.m. UTC
* gnu/packages.scm (%package-metadata-cache-file): New variable.
(generate-package-metadata-cache): New function.
* guix/channels.scm (package-metadata-cache-file): New function.
(%channel-profile-hooks): Add package-metadata-cache-file.
---
 gnu/packages.scm  | 50 ++++++++++++++++++++++++++++++++++++++++++++++-
 guix/channels.scm | 34 +++++++++++++++++++++++++++++++-
 2 files changed, 82 insertions(+), 2 deletions(-)

Comments

Ludovic Courtès April 24, 2020, 8:48 p.m. UTC | #1
Hi Arun,

Arun Isaac <arunisaac@systemreboot.net> skribis:

> * gnu/packages.scm (%package-metadata-cache-file): New variable.
> (generate-package-metadata-cache): New function.
> * guix/channels.scm (package-metadata-cache-file): New function.
> (%channel-profile-hooks): Add package-metadata-cache-file.

This is short and sweet, nice!

> +  (define (expand-cache package result)
> +    (cons `#(,(package-name package)
> +             ,(package-version package)
> +             ,(delete-duplicates
> +               (map package-full-name
> +                    (sort (filter package? (package-direct-inputs package))
> +                          package<?)))
> +             ,(package-outputs package)
> +             ,(package-supported-systems package)
> +             ,(package-synopsis package)
> +             ,(package-description package)
> +             ,(package-home-page package)
> +             ,(let ((location (package-location package)))
> +                (list (location-file location)
> +                      (location-line location)
> +                      (location-column location))))

I was wondering if we could omit inputs, which are not that useful.

Apart from that it LGTM.

Note that this is probably the place where we could eventually add the
computation of an inverted index like zimoun suggested in
<https://lists.gnu.org/archive/html/guix-devel/2020-01/msg00243.html>.

> +                                  #:properties '((type . profile-hook)
> +                                                 (hook . package-cache))

‘package-metadata-cache’, even (it’s for UI purposes).

Nitpick: I’d use “packages:” as the prefix in the subject line.

Thanks,
Ludo’.
Simon Tournier April 26, 2020, 9:48 a.m. UTC | #2
On Fri, 24 Apr 2020 at 22:48, Ludovic Courtès <ludo@gnu.org> wrote:

> > +  (define (expand-cache package result)
> > +    (cons `#(,(package-name package)
> > +             ,(package-version package)
> > +             ,(delete-duplicates
> > +               (map package-full-name
> > +                    (sort (filter package? (package-direct-inputs package))
> > +                          package<?)))
> > +             ,(package-outputs package)
> > +             ,(package-supported-systems package)
> > +             ,(package-synopsis package)
> > +             ,(package-description package)
> > +             ,(package-home-page package)
> > +             ,(let ((location (package-location package)))
> > +                (list (location-file location)
> > +                      (location-line location)
> > +                      (location-column location))))
>
> I was wondering if we could omit inputs, which are not that useful.

Agree.


> Note that this is probably the place where we could eventually add the
> computation of an inverted index like zimoun suggested in
> <https://lists.gnu.org/archive/html/guix-devel/2020-01/msg00243.html>.

We should first agree on the extra cost (time) we are ready to pay to
build improvements.
See the lengthy message [1] about only the caching "inverted index"
using the current 'relevance' scoring function.

[1] http://issues.guix.gnu.org/39258#78



Cheers,
simon
Ludovic Courtès April 26, 2020, 2:35 p.m. UTC | #3
Hi Simon,

zimoun <zimon.toutoune@gmail.com> skribis:

> On Fri, 24 Apr 2020 at 22:48, Ludovic Courtès <ludo@gnu.org> wrote:

[...]

>> Note that this is probably the place where we could eventually add the
>> computation of an inverted index like zimoun suggested in
>> <https://lists.gnu.org/archive/html/guix-devel/2020-01/msg00243.html>.
>
> We should first agree on the extra cost (time) we are ready to pay to
> build improvements.

It’s complicated.  As it stands, I’d rather not add overhead to ‘guix
pull’, especially since current ‘guix search’ on my SSD is fast enough
and can hardly be made any faster.

Realistically though, I understand that things are different on slower
machines and/or spinning disks.  That’s why I’m interested in seeing how
Arun’s proposed changes can affect such machines.

If, as a bonus, it allows us to have an inverted index and thus improve
the quality of search results, that’s great!

Thanks,
Ludo’.
Pierre Neidhardt April 26, 2020, 2:54 p.m. UTC | #4
Hi Ludo!

Ludovic Courtès <ludo@gnu.org> writes:

> It’s complicated.  As it stands, I’d rather not add overhead to ‘guix
> pull’, especially since current ‘guix search’ on my SSD is fast enough
> and can hardly be made any faster.

The question is, what is fast enough?  I have an NVMe here that has a
throughput of some 2GB/s, and yet

--8<---------------cut here---------------start------------->8---
time guix search emacs > /dev/null
real    0m1.545s
user    0m1.938s
sys     0m0.080s
--8<---------------cut here---------------end--------------->8---

on a hot cache, which is too slow in my opinion :p

Mildly impatient users might be slightly discouraged from iterating
search queries.

It also makes `guix search` very impractical to use in (non-guile)
script.  Which is too bad considering that the recsel-formatting makes
`guix search` a very good candidate for scripting.

Cheers!
Simon Tournier April 26, 2020, 3:05 p.m. UTC | #5
Hi Ludo,

On Sun, 26 Apr 2020 at 16:35, Ludovic Courtès <ludo@gnu.org> wrote:

> Realistically though, I understand that things are different on slower
> machines and/or spinning disks.  That’s why I’m interested in seeing how
> Arun’s proposed changes can affect such machines.

I understand. I have done a small benchmark [1] of the 3 ways: the
current, the v2 using Xapian (which is not an option on the long term)
and the v3.

My "slower" machine is at my office... but it provides already
interesting numbers, IMHO.

[1] http://issues.guix.gnu.org/39258#78


> If, as a bonus, it allows us to have an inverted index and thus improve
> the quality of search results, that’s great!

This "issue" is: any improvement on both sides performance and
accuracy would add an somehow extra cost. The question is what is the
maximum users would accept to pay for?


Well, it is complicated as you said. :-)
A trade off between extra cost, maintenance, complexity, etc is not
easy to draw, as you said too elsewhere.
I am seeing all that as experimental: explore ideas to see if they are
worth or not.
And what should be concluded now could change in the (near) future;
for example if the computations of derivations are faster, resulting
on "guix pull" faster, etc..


Cheer,
simon
Ludovic Courtès April 26, 2020, 3:33 p.m. UTC | #6
Hey!

Pierre Neidhardt <mail@ambrevar.xyz> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:
>
>> It’s complicated.  As it stands, I’d rather not add overhead to ‘guix
>> pull’, especially since current ‘guix search’ on my SSD is fast enough
>> and can hardly be made any faster.
>
> The question is, what is fast enough?  I have an NVMe here that has a
> throughput of some 2GB/s, and yet
>
> time guix search emacs > /dev/null
> real    0m1.545s
> user    0m1.938s
> sys     0m0.080s

That accounts for the time to render 864 entries:

  $ guix search emacs| grep ^name| wc -l
  864

Compare with:

  $ time guix search emacs | head -100 > /dev/null

  real    0m0.674s
  user    0m0.802s
  sys     0m0.048s

Again, this is not to say it cannot be improved, but it’s quite a
challenge to do better on such hardware.

Though as discussed with Arun, there may be low-hanging optimization
fruits in Texinfo parsing and rendering.  I guess we need to go ahead
fire up statprof now.  :-)

Ludo’.
diff mbox series

Patch

diff --git a/gnu/packages.scm b/gnu/packages.scm
index d22c992bb1..c0b527acf0 100644
--- a/gnu/packages.scm
+++ b/gnu/packages.scm
@@ -4,6 +4,7 @@ 
 ;;; Copyright © 2014 Eric Bavier <bavier@member.fsf.org>
 ;;; Copyright © 2016, 2017 Alex Kost <alezost@gmail.com>
 ;;; Copyright © 2016 Mathieu Lirzin <mthl@gnu.org>
+;;; Copyright © 2020 Arun Isaac <arunisaac@systemreboot.net>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -64,7 +65,8 @@ 
             specification->location
             specifications->manifest
 
-            generate-package-cache))
+            generate-package-cache
+            generate-package-metadata-cache))
 
 ;;; Commentary:
 ;;;
@@ -426,6 +428,52 @@  reducing the memory footprint."
                                #:opts '(#:to-file? #t)))))
   cache-file)
 
+(define %package-metadata-cache-file
+  ;; Location of the package metadata cache.
+  "/lib/guix/package-metadata.cache")
+
+(define (generate-package-metadata-cache directory)
+  "Generate under DIRECTORY a cache of the metadata of all available packages.
+
+The primary purpose of this cache is to speed up package metadata lookup
+during package search so that we don't have to traverse and load all the
+package modules."
+  (define cache-file
+    (string-append directory %package-metadata-cache-file))
+
+  (define (package<? p1 p2)
+    (string<? (package-full-name p1) (package-full-name p2)))
+
+  (define (expand-cache package result)
+    (cons `#(,(package-name package)
+             ,(package-version package)
+             ,(delete-duplicates
+               (map package-full-name
+                    (sort (filter package? (package-direct-inputs package))
+                          package<?)))
+             ,(package-outputs package)
+             ,(package-supported-systems package)
+             ,(package-synopsis package)
+             ,(package-description package)
+             ,(package-home-page package)
+             ,(let ((location (package-location package)))
+                (list (location-file location)
+                      (location-line location)
+                      (location-column location))))
+          result))
+
+  (define exp
+    (fold-packages expand-cache '()))
+
+  (mkdir-p (dirname cache-file))
+  (call-with-output-file cache-file
+    (lambda (port)
+      (put-bytevector port
+                      (compile `'(,@exp)
+                               #:to 'bytecode
+                               #:opts '(#:to-file? #t)))))
+  cache-file)
+
 
 (define %sigint-prompt
   ;; The prompt to jump to upon SIGINT.
diff --git a/guix/channels.scm b/guix/channels.scm
index f0261dc2da..c4efaa7300 100644
--- a/guix/channels.scm
+++ b/guix/channels.scm
@@ -2,6 +2,7 @@ 
 ;;; Copyright © 2018, 2019, 2020 Ludovic Courtès <ludo@gnu.org>
 ;;; Copyright © 2018 Ricardo Wurmus <rekado@elephly.net>
 ;;; Copyright © 2019 Jan (janneke) Nieuwenhuizen <janneke@gnu.org>
+;;; Copyright © 2020 Arun Isaac <arunisaac@systemreboot.net>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -581,9 +582,40 @@  be used as a profile hook."
                                                  (hook . package-cache))
                                   #:local-build? #t)))
 
+(define (package-metadata-cache-file manifest)
+  "Build a package metadata cache file for the instance in MANIFEST.  This is
+meant to be used as a profile hook."
+  (mlet %store-monad ((profile (profile-derivation manifest
+                                                   #:hooks '())))
+
+    (define build
+      #~(begin
+          (use-modules (gnu packages))
+
+          (if (defined? 'generate-package-metadata-cache)
+              (begin
+                ;; Delegate package cache generation to the inferior.
+                (format (current-error-port)
+                        "Generating package metadata cache for '~a'...~%"
+                        #$profile)
+                (generate-package-metadata-cache #$output))
+              (mkdir #$output))))
+
+    (gexp->derivation-in-inferior "guix-package-metadata-cache" build
+                                  profile
+
+                                  ;; If the Guix in PROFILE is too old and
+                                  ;; lacks 'guix repl', don't build the cache
+                                  ;; instead of failing.
+                                  #:silent-failure? #t
+
+                                  #:properties '((type . profile-hook)
+                                                 (hook . package-cache))
+                                  #:local-build? #t)))
+
 (define %channel-profile-hooks
   ;; The default channel profile hooks.
-  (cons package-cache-file %default-profile-hooks))
+  (cons* package-cache-file package-metadata-cache-file %default-profile-hooks))
 
 (define (channel-instances->derivation instances)
   "Return the derivation of the profile containing INSTANCES, a list of