mbox

[bug#39258,0/4] Optimize guix search

Message ID 20200601000030.7443-1-arunisaac@systemreboot.net
Headers show

Message

Arun Isaac June 1, 2020, midnight UTC
Hi,

Sorry for the long delay in replying to this thread.

I think Ludo is right in that we can improve guix search performance with only
simple code improvements rather than including xapian or improving our
existing cache. Here are a few patches on those lines.

In `relevance`, we set our score to 0 if any of the regexps don't match. Then,
we might as well not match the remaining regexps. Patch 1 does this early cut
off optimization.

Often our search strings are only literal strings. So, we can save some time
by using string-contains instead of invoking the regexp engine. Patch 2 does
this. In addition, guile's string-contains uses a naive O(n^2) string search
algorithm. We should perhaps use the O(n) Knuth-Morris-Pratt algorithm[1]. In
fact, a comment on line 2006 of libguile/srfi-13.c in the guile source code
mentions this. If implemented, the KMP algorithm could speed up guix search
further.

[1]: https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm

Patch 3 and 4 are minor improvements.

Here's a rough performance comparison.

--8<---------------cut here---------------start------------->8---
time ./pre-inst-env guix search game

real	0m2.261s
user	0m2.351s
sys	0m0.104s
--8<---------------cut here---------------end--------------->8---

--8<---------------cut here---------------start------------->8---
time guix search game

real	0m2.661s
user	0m2.843s
sys	0m0.080s
--8<---------------cut here---------------end--------------->8---

--8<---------------cut here---------------start------------->8---
time ./pre-inst-env guix search strategy game

real	0m1.613s
user	0m1.635s
sys	0m0.096s
--8<---------------cut here---------------end--------------->8---

--8<---------------cut here---------------start------------->8---
time guix search strategy game

real	0m2.520s
user	0m2.583s
sys	0m0.112s
--8<---------------cut here---------------end--------------->8---

Arun Isaac (4):
  ui: Cut off search early if any regexp does not match.
  ui: Use string matching with literal search strings.
  ui: Do not translate package synopsis a second time.
  ui: Use package-description-string.

 guix/scripts/package.scm | 12 +++++--
 guix/ui.scm              | 68 ++++++++++++++++++++++++----------------
 2 files changed, 50 insertions(+), 30 deletions(-)

Comments

Simon Tournier June 1, 2020, 1:25 a.m. UTC | #1
Hi Arun,

On Mon, 1 Jun 2020 at 02:00, Arun Isaac <arunisaac@systemreboot.net> wrote:

> Sorry for the long delay in replying to this thread.

Based on the Ludo's comments [1] on v4 which is a simple re-write of
your v3, I am finishing a vN+1.. but time flies and I am late on the
topic too. :-)

Well, this still unsent vN+1 series has the same performance of v4 on
"guix pull" which is a key point compared to v3.  Obviously, the
performance on "guix search" are equivalent on both version.  This
vN+1 builds two caches -- to avoid binary breakage -- in only one go;
the consuming 'fold-modules-public-variables*' is applied only once.

[1] http://issues.guix.gnu.org/39258#93


> I think Ludo is right in that we can improve guix search performance with only
> simple code improvements rather than including xapian or improving our
> existing cache. Here are a few patches on those lines.

Well, improving the cache is easy; at least as you did in v3 by adding
another one.
The most annoying part is the arguments rewrite of 'package->recutils'
to be compliant.

However after some comparisons, I am not convinced that BM25 will be
worth to implement...


> In `relevance`, we set our score to 0 if any of the regexps don't match. Then,
> we might as well not match the remaining regexps. Patch 1 does this early cut
> off optimization.

Interesting.


> Often our search strings are only literal strings. So, we can save some time
> by using string-contains instead of invoking the regexp engine. Patch 2 does
> this. In addition, guile's string-contains uses a naive O(n^2) string search
> algorithm. We should perhaps use the O(n) Knuth-Morris-Pratt algorithm[1]. In
> fact, a comment on line 2006 of libguile/srfi-13.c in the guile source code
> mentions this. If implemented, the KMP algorithm could speed up guix search
> further.
>
> [1]: https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm

Really interesting idea,


> Patch 3 and 4 are minor improvements.
>
> Here's a rough performance comparison.

On cold or warm cache?


> --8<---------------cut here---------------start------------->8---
> time ./pre-inst-env guix search game
>
> real    0m2.261s
> user    0m2.351s
> sys     0m0.104s
> --8<---------------cut here---------------end--------------->8---
>
> --8<---------------cut here---------------start------------->8---
> time guix search game
>
> real    0m2.661s
> user    0m2.843s
> sys     0m0.080s
> --8<---------------cut here---------------end--------------->8---
>
> --8<---------------cut here---------------start------------->8---
> time ./pre-inst-env guix search strategy game
>
> real    0m1.613s
> user    0m1.635s
> sys     0m0.096s
> --8<---------------cut here---------------end--------------->8---
>
> --8<---------------cut here---------------start------------->8---
> time guix search strategy game
>
> real    0m2.520s
> user    0m2.583s
> sys     0m0.112s
> --8<---------------cut here---------------end--------------->8---

So in the best case, you have the ratio old/new is 1.5; this new
version is 1.5 faster.

Well, in the extra cache approach (v3 or v4) the ration old/new is
really higher: 3.1 faster on cold cache (which is the one I am
interested in) and 2.4 faster on warm cache.


I will give a look to this new series and report what happens on my
laptop.  But basically, I would like "guix search" under the 1.0
second on my machine. ;-)


Thank you for this new input.

Cheers,
simon