Message ID | 20200601000030.7443-1-arunisaac@systemreboot.net |
---|---|
Headers | show |
Hi Arun, On Mon, 1 Jun 2020 at 02:00, Arun Isaac <arunisaac@systemreboot.net> wrote: > Sorry for the long delay in replying to this thread. Based on the Ludo's comments [1] on v4 which is a simple re-write of your v3, I am finishing a vN+1.. but time flies and I am late on the topic too. :-) Well, this still unsent vN+1 series has the same performance of v4 on "guix pull" which is a key point compared to v3. Obviously, the performance on "guix search" are equivalent on both version. This vN+1 builds two caches -- to avoid binary breakage -- in only one go; the consuming 'fold-modules-public-variables*' is applied only once. [1] http://issues.guix.gnu.org/39258#93 > I think Ludo is right in that we can improve guix search performance with only > simple code improvements rather than including xapian or improving our > existing cache. Here are a few patches on those lines. Well, improving the cache is easy; at least as you did in v3 by adding another one. The most annoying part is the arguments rewrite of 'package->recutils' to be compliant. However after some comparisons, I am not convinced that BM25 will be worth to implement... > In `relevance`, we set our score to 0 if any of the regexps don't match. Then, > we might as well not match the remaining regexps. Patch 1 does this early cut > off optimization. Interesting. > Often our search strings are only literal strings. So, we can save some time > by using string-contains instead of invoking the regexp engine. Patch 2 does > this. In addition, guile's string-contains uses a naive O(n^2) string search > algorithm. We should perhaps use the O(n) Knuth-Morris-Pratt algorithm[1]. In > fact, a comment on line 2006 of libguile/srfi-13.c in the guile source code > mentions this. If implemented, the KMP algorithm could speed up guix search > further. > > [1]: https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm Really interesting idea, > Patch 3 and 4 are minor improvements. > > Here's a rough performance comparison. On cold or warm cache? > --8<---------------cut here---------------start------------->8--- > time ./pre-inst-env guix search game > > real 0m2.261s > user 0m2.351s > sys 0m0.104s > --8<---------------cut here---------------end--------------->8--- > > --8<---------------cut here---------------start------------->8--- > time guix search game > > real 0m2.661s > user 0m2.843s > sys 0m0.080s > --8<---------------cut here---------------end--------------->8--- > > --8<---------------cut here---------------start------------->8--- > time ./pre-inst-env guix search strategy game > > real 0m1.613s > user 0m1.635s > sys 0m0.096s > --8<---------------cut here---------------end--------------->8--- > > --8<---------------cut here---------------start------------->8--- > time guix search strategy game > > real 0m2.520s > user 0m2.583s > sys 0m0.112s > --8<---------------cut here---------------end--------------->8--- So in the best case, you have the ratio old/new is 1.5; this new version is 1.5 faster. Well, in the extra cache approach (v3 or v4) the ration old/new is really higher: 3.1 faster on cold cache (which is the one I am interested in) and 2.4 faster on warm cache. I will give a look to this new series and report what happens on my laptop. But basically, I would like "guix search" under the 1.0 second on my machine. ;-) Thank you for this new input. Cheers, simon