Message ID | 20200227204150.30985-1-arunisaac@systemreboot.net |
---|---|
Hi Arun,
Really cool! Thank you!
On Thu, 27 Feb 2020 at 21:42, Arun Isaac <arunisaac@systemreboot.net> wrote:
> * Speed improvement
>
> Despite search-package-index in gnu/packages.scm taking only around 1.5ms, I
> see an overall speedup in `guix search` of only a factor of 2 -- from around
> 2s to around 1s. I wonder what else in `guix search` is taking up so much
> time.
Interesting... maybe a hidden 'fold-packages'? Well, I have not yet looked into your code.
> * Currently indexing only the package descriptions
>
> In this patchset, I have only indexed the package descriptions. In the next
> version of this patchset, I will index all other terms as specified in
> %package-metrics of guix/ui.scm.
Yes, that looks to me like a detail that should be easy to fix. I mean, it does not seem to be blocking.
> * Should I add guile-xapian as a propagated input to guix in
> gnu/packages/package-management.scm?
IMHO, yes. I mean, I guess. :-)
> * Drop regexp search support
>
> In this patchset, I have retained the older regexp search support. But, I
> think we should drop it and only have xapian search. In cases where the search
> index is not authoritative, we can build an in-memory xapian search index on
> the fly and use it to search. This will slow down the search, but will ensure
> our search results are consistent and do not depend on the authoritativeness
> of the search index.
I understand why you have turned off the regexp support. It is not necessary for a first experiment to see whether the addition is worth it or not. So, before investigating how some better regexp support could be used with Xapian, let's start by benchmarking Xapian against plain 'fold-packages'.
> * Commit messages
>
> Except for patch 1, I am not sure what prefixes (build-self, gnu, etc.) to use
> in the first line of the commit message. Some advice there would be helpful.
I cannot help. )-:
All the best,
simon
Hi Pierre,
On Fri, 28 Feb 2020 at 09:13, Pierre Neidhardt <mail@ambrevar.xyz> wrote:
> Beside this issue, how do you test it? I guess we first need to install
> a bunch of package with `pre-inst-env guix ...` then to a `pre-inst-env search`?
It does not search the installed packages but all the packages. So, to test it, you need to run "./pre-inst-env guix pull -p" or something like that to populate the Xapian index database. Then "./pre-inst-env guix search" will look up your query in it. At least, that is how I understand it should work; I have not yet looked into the code.
Cheers,
simon
zimoun <zimon.toutoune@gmail.com> writes:
> Hi Pierre,
>
> On Fri, 28 Feb 2020 at 09:13, Pierre Neidhardt <mail@ambrevar.xyz> wrote:
>
>> Beside this issue, how do you test it? I guess we first need to install
>> a bunch of package with `pre-inst-env guix ...` then to a `pre-inst-env search`?
>
> It is not searching in the installed packages but in all the packages.
> So, to test it, you need to "./pre-inst-env guix pull -p" or something
> like that to populate the Xapian index database. Then "./pre-inst-env
> guix search" will lookup into.
> I mean, it is how I understand it should work. I have not yet looked
> into the code.
What I meant by "install a bunch of packages" is "guix pull -p", as you said. The Xapian cache is populated by a hook in guix pull, if I understood correctly.
> I can't build your patch though:
>
> ice-9/eval.scm:293:34: no code for module (xapian xapian)
Sorry, I forgot to mention this in my patch cover letter. The above error is happening because of the new guile-xapian dependency. It's a little tricky to get right at the moment. Here goes.
Drop into a guix development environment.
  $ guix environment guix
Commit patch 1 (the patch that adds guile-xapian) alone, and build.
  $ git am 0001-gnu-Add-guile-xapian.patch
  $ make
Then, drop into an environment where guile-xapian is available.
  $ ./pre-inst-env guix environment guix --ad-hoc guile-xapian
Apply the other 3 patches and build.
  $ git am 0002-build-self-Add-guile-xapian-to-Guix-dependencies.patch \
      0003-gnu-Generate-xapian-package-search-index.patch \
      0004-gnu-Use-xapian-index-for-package-search.patch
  $ make
Now, the build should have completed successfully. Let's do a test guix pull to actually test the new guix search.
  $ ./pre-inst-env guix pull -p /tmp/test
Then, run the guix search in /tmp/test.
  $ /tmp/test/bin/guix search game
That's it! :-)
This whole process will be simpler if the guile-xapian package is pushed to master and guile-xapian added as an input to the guix package in gnu/packages/package-management.scm. But, for now...
> $ ./pre-inst-env guix pull -p /tmp/test
One mistake. This command should be
./pre-inst-env guix pull --url=$PWD --branch=xapian -p /tmp/test
where xapian is the name of the branch you committed the patches to.
Also, I acknowledge the corrections you both suggested. I will
incorporate them in v2 of the patchset.
> This whole process will be simpler if the guile-xapian package is pushed
> to master and guile-xapian added as an input to the guix package in
> gnu/packages/package-management.scm. But, for now...
Shall I push patch 1 (add guile-xapian) alone to master?
Hi Arun,
On Sat, 29 Feb 2020 at 09:25, Arun Isaac <arunisaac@systemreboot.net> wrote:
> Shall I push patch 1 (add guile-xapian) alone to master?
Yes, it seems a good idea and it will ease the process for building
and then benchmarking the "guix search" via Xapian.
All the best,
simon
Hi Arun,
Do you have any benchmarks in mind?
On Fri, 28 Feb 2020 at 17:05, Arun Isaac <arunisaac@systemreboot.net> wrote:
> ./pre-inst-env guix pull --url=$PWD --branch=xapian -p /tmp/test
We need to benchmark the new "guix pull" on different machines. Well,
it is nothing compared to the derivation computations. :-)
And, more importantly, run 'make as-derivation' to avoid breaking "guix pull".
Then, on cold caches, benchmark the new "guix search" for a couple of queries.
There is not much inspiration in tests/. :-)
Ah, and do not forget to adapt some tests.
All the best,
simon
Hi,
After a quick benchmark:
a. It is faster, between x2 and x3. Really?
b. The Xapian relevance should be truncated and examined in more detail.
--8<---------------cut here---------------start------------->8---
time guix search emacs | recsel -p name,relevance | head -n18
name: emacs
relevance: 33

name: emacs-with-editor
relevance: 19

name: emacs-restart-emacs
relevance: 19

name: emacs-epkg
relevance: 18

name: guile-emacs
relevance: 17

name: emacs-xwidgets
relevance: 17

real 0m1.530s
user 0m1.827s
sys 0m0.074s
--8<---------------cut here---------------end--------------->8---
--8<---------------cut here---------------start------------->8---
time /tmp/test/bin/guix search emacs | recsel -p name,relevance | head -n18
name: emacs-helm-pass
relevance: 5.0774748262821685

name: emacs-spark
relevance: 4.898640632723127

name: emacs-evil-smartparens
relevance: 4.898640632723127

name: emacs-howm
relevance: 4.8638448958830685

name: emacs-el-mock
relevance: 4.8638448958830685

name: emacs-strace-mode
relevance: 4.693676055650271

real 0m0.440s
user 0m0.482s
sys 0m0.058s
--8<---------------cut here---------------end--------------->8---
Here, for example, Xapian does not return the package 'emacs' itself first. Worse, it does not return it at all.
That said, I do not know if it is really faster, since:
--8<---------------cut here---------------start------------->8---
guix search emacs | recsel -C -P name | wc -l
829
--8<---------------cut here---------------end--------------->8---
and
--8<---------------cut here---------------start------------->8---
/tmp/test/bin/guix search emacs | recsel -C -P name | wc -l
10
--8<---------------cut here---------------end--------------->8---
Maybe I am making a mistake.
Well, thank you Arun for the Xapian bindings, which will improve the searching experience. :-) Now it needs some polishing.
All the best,
simon
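To look at the two relevance scales above side by side (small integers from the old search, long floats from Xapian), the `recsel -p name,relevance` records can be condensed into one rounded, ranked line per package. This is a minimal POSIX shell and awk sketch, not something from the patch series: `rank_by_relevance` is a hypothetical helper, and it assumes each record lists `name` before `relevance`, as in the output shown.

```shell
#!/bin/sh
# Hypothetical helper (not part of Guix): condense the output of
# `guix search ... | recsel -p name,relevance` into one line per package,
# "RELEVANCE NAME", with the relevance rounded to two decimals and the
# highest-scoring package first.
rank_by_relevance() {
  awk '
    /^name: /      { name = substr($0, 7) }   # text after "name: "
    /^relevance: / { printf "%.2f %s\n", substr($0, 12), name }
  ' | sort -k1,1nr
}

# Example usage:
#   guix search emacs | recsel -p name,relevance | rank_by_relevance | head -n 10
#   /tmp/test/bin/guix search emacs | recsel -p name,relevance | rank_by_relevance
```

The rounding is purely cosmetic: it does not change Xapian's ranking, and it cannot bring back a package such as 'emacs' that is missing from the results.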
Hi,
On Mon, 2 Mar 2020 at 20:13, zimoun <zimon.toutoune@gmail.com> wrote:
> --8<---------------cut here---------------start------------->8---
> /tmp/test/bin/guix search emacs | recsel -C -P name | wc -l
> 10
> --8<---------------cut here---------------end--------------->8---
>
> Maybe I am doing a mistake.
I think this issue is fixed by changing the 'pagesize' value. With '(pagesize 4294967295)' and using the same commit (c1febbbf94), I get:
--8<---------------cut here---------------start------------->8---
guix time-machine --commit=c1febbbf94 -- search games | recsel -C -p name | wc -l
247
./pre-inst-env guix search games | recsel -C -p name | wc -l
236
--8<---------------cut here---------------end--------------->8---
(I modified the patches in order to pull once to generate the index at commit c1febbbf94 and then do some stuff.)
Note that the old "guix search" does not return blender, whereas Xapian does: the term 'games' is not in its description, but 'game' is.
I am comparing the two lists, i.e., "guix search games | recsel -C -P name | sort", to see which packages are in one list and not the other.
But before going further, let's polish the patches a bit so that they are easier to test without the double environment etc. Also, since I am using a good old HDD, a comparison on an SSD would be welcome.
All the best,
simon
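The list comparison described above can be automated with `sort` and `comm`. A minimal sketch, assuming each list holds one package name per line (as produced by `recsel -C -P name`); `diff_names` is a hypothetical helper, not part of Guix or of the patches.

```shell
#!/bin/sh
# Hypothetical helper: compare two lists of package names (one per line)
# and print the names that appear in only one of them:
#   "- name"  only in the first list (e.g. the old regexp search)
#   "+ name"  only in the second list (e.g. the Xapian search)
diff_names() {
  old=$(mktemp) && new=$(mktemp) || return 1
  # comm needs both inputs sorted with the same collation.
  sort -u "$1" > "$old"
  sort -u "$2" > "$new"
  comm -23 "$old" "$new" | sed 's/^/- /'   # lines unique to the first file
  comm -13 "$old" "$new" | sed 's/^/+ /'   # lines unique to the second file
  rm -f "$old" "$new"
}

# Example usage:
#   guix time-machine --commit=c1febbbf94 -- search games | recsel -C -P name > old.txt
#   ./pre-inst-env guix search games | recsel -C -P name > new.txt
#   diff_names old.txt new.txt
```

Given the observation above about blender, `+ blender` should appear among the output for the "games" query.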
Hello Arun,
Arun Isaac <arunisaac@systemreboot.net> wrote:
> * Speed improvement
>
> Despite search-package-index in gnu/packages.scm taking only around 1.5ms, I
> see an overall speedup in `guix search` of only a factor of 2 -- from around
> 2s to around 1s. I wonder what else in `guix search` is taking up so much
> time.
Note that ‘guix search’ time is largely dominated by I/O. On my laptop, I get (the first measurement is with a cold cache, the second with a warm cache):
--8<---------------cut here---------------start------------->8---
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time guix search foo >/dev/null
real 0m2.631s
user 0m1.134s
sys 0m0.124s
$ time guix search foo >/dev/null
real 0m0.836s
user 0m1.027s
sys 0m0.053s
--8<---------------cut here---------------end--------------->8---
It’s hard to do better in the warm-cache case because, at this level, there may be other things to optimize that have little to do with searching itself.
Note that this is on an SSD; the cold-cache case must be worse on NFS or on a spinning disk, and there we could gain a lot.
I think we should weigh the pros and cons on all these aspects: speed, complexity and maintenance cost, search result quality, search features, etc.
Thanks,
Ludo’.
PS: I have not yet looked at the whole series as I’m just coming back to the keyboard. :-)
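To repeat the cold-cache/warm-cache measurement above on other machines (HDD, SSD, NFS), as the thread suggests, the two runs can be wrapped in a small shell function. This is a sketch only: `drop_caches` and `measure_cold_warm` are hypothetical helper names, it assumes Linux's /proc/sys/vm/drop_caches and GNU date (for `%N`), and it reports wall-clock time only rather than the full `time` output.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Guix or of the patch series.

# Try to empty the kernel page cache: directly when running as root,
# otherwise through non-interactive sudo (which fails if it would prompt).
drop_caches() {
  sync
  { echo 3 > /proc/sys/vm/drop_caches ||
      sudo -n sh -c 'echo 3 > /proc/sys/vm/drop_caches'; } 2>/dev/null
}

# Time "$@" twice: right after dropping the caches ("cold"), then
# immediately again ("warm").  Prints wall-clock seconds for each run.
# Assumes GNU date, whose %N format gives nanoseconds.
measure_cold_warm() {
  if ! drop_caches; then
    echo "warning: could not drop caches; the cold run may be warm" >&2
  fi
  for label in cold warm; do
    start=$(date +%s.%N)
    "$@" > /dev/null
    end=$(date +%s.%N)
    awk -v l="$label" -v s="$start" -v e="$end" \
      'BEGIN { printf "%s: %.3f s\n", l, e - s }'
  done
}

# Example usage (run `sudo -v` first so that `sudo -n` can drop caches):
#   measure_cold_warm guix search foo
#   measure_cold_warm /tmp/test/bin/guix search foo
```

Running the old and the Xapian-based `guix search` through the same function on the same machine keeps the two sets of figures comparable.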