From patchwork Thu Dec 29 20:18:09 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Arun Isaac
X-Patchwork-Id: 1168
Subject: [bug#60410] [PATCH 0/7] mumi: Boolean prefixes in xapian indexing and others
To: 60410@debbugs.gnu.org, rekado@elephly.net
Cc: Arun Isaac
From: Arun Isaac
Date: Thu, 29 Dec 2022 20:18:09 +0000
Message-Id: <20221229201809.27997-1-arunisaac@systemreboot.net>
X-Mailer: git-send-email 2.38.1

Hi Ricardo,

This is a patchset that has been sleeping for some time in my local
git repo. So, I thought it was about time to send it over!

The main change is that some xapian prefixes should be indexed as
boolean prefixes. This makes the use of an implicit AND operator
unnecessary and lets xapian do the natural thing of ordering results
by relevance. I believe this improves the search significantly.

Also, since we retrieve search results by relevance, we can offload
limiting of search results to xapian. Thus, we improve performance as
well.

For this patchset to be useful, mumi's xapian index will have to be
rebuilt. In general, it is good to periodically rebuild the xapian
index from scratch.
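
To illustrate the idea, here is a toy model in plain Python. It is
only a sketch of the behaviour described above, not mumi's code and
not the xapian API: the prefix names, the sample documents and the
term-count scoring are all made up for illustration (xapian uses its
own weighting scheme). Boolean-prefixed terms only filter, free-text
terms are OR-ed and contribute to the rank, and the limit is applied
by the search function itself rather than by the caller.

```python
from collections import Counter

# Hypothetical prefixes treated as boolean filters (illustrative names only).
BOOLEAN_PREFIXES = {"severity", "tag"}

# Hypothetical sample documents: boolean terms kept apart from free text.
DOCS = {
    1: {"filters": {("severity", "serious")}, "text": "xapian index rebuild fails"},
    2: {"filters": {("severity", "normal")}, "text": "xapian search ordering"},
    3: {"filters": {("severity", "serious")}, "text": "search results xapian xapian"},
}


def parse(query):
    """Split a query into boolean filters and free-text terms."""
    filters, terms = set(), []
    for token in query.lower().split():
        prefix, sep, value = token.partition(":")
        if sep and prefix in BOOLEAN_PREFIXES:
            filters.add((prefix, value))
        else:
            terms.append(token)
    return filters, terms


def search(query, limit):
    """Return at most `limit` document ids, best match first."""
    filters, terms = parse(query)
    scored = []
    for doc_id, doc in DOCS.items():
        # Boolean terms only include or exclude a document; they never rank it.
        if not filters <= doc["filters"]:
            continue
        # Free-text terms are OR-ed: every occurrence adds to a toy score.
        counts = Counter(doc["text"].split())
        score = sum(counts[term] for term in terms)
        if terms and score == 0:
            continue
        scored.append((score, doc_id))
    # Highest score first, ties broken by document id; the limit is applied here.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:limit]]
```

With this model, search("severity:serious xapian search", limit=1)
drops document 2 through the boolean filter, ranks the remaining two
by their free-text matches, and returns only the best one, so the
caller never has to truncate the result list.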

Regards,
Arun

Arun Isaac (7):
  xapian: Index several terms as boolean and without positions.
  xapian: Declare some prefixes as boolean.
  xapian: Do not override the default OR implicit query operator.
  messages: Remove unused set intersection feature in search-bugs.
  messages: Offload limiting search results to xapian.
  cache: Specify that cache! returns the cached value.
  xapian: Preserve order of search results.

 mumi/cache.scm    |   3 +-
 mumi/messages.scm |  29 ++++--------
 mumi/xapian.scm   | 109 +++++++++++++++++++++++++++++++---------------
 3 files changed, 86 insertions(+), 55 deletions(-)