From patchwork Thu Jan 23 19:51:57 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Arun Isaac X-Patchwork-Id: 19962 Return-Path: X-Original-To: patchwork@mira.cbaines.net Delivered-To: patchwork@mira.cbaines.net Received: by mira.cbaines.net (Postfix, from userid 113) id 21C6017B3A; Thu, 23 Jan 2020 19:53:15 +0000 (GMT) X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on mira.cbaines.net X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,T_DKIM_INVALID, URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mira.cbaines.net (Postfix) with ESMTP id 83CE917B38 for ; Thu, 23 Jan 2020 19:53:14 +0000 (GMT) Received: from localhost ([::1]:60784 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1iuiXR-0001Wj-Ss for patchwork@mira.cbaines.net; Thu, 23 Jan 2020 14:53:13 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:39100) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1iuiXI-0001Ks-T9 for guix-patches@gnu.org; Thu, 23 Jan 2020 14:53:07 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1iuiXG-0000Lj-Jv for guix-patches@gnu.org; Thu, 23 Jan 2020 14:53:04 -0500 Received: from debbugs.gnu.org ([209.51.188.43]:47235) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1iuiXG-0000LS-FP for guix-patches@gnu.org; Thu, 23 Jan 2020 14:53:02 -0500 Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1iuiXG-0001A4-Cy for guix-patches@gnu.org; Thu, 23 Jan 2020 14:53:02 -0500 X-Loop: help-debbugs@gnu.org Subject: [bug#39258] Faster guix search using an sqlite cache Resent-From: Arun Isaac Original-Sender: "Debbugs-submit" Resent-CC: guix-patches@gnu.org Resent-Date: Thu, 23 Jan 2020 19:53:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: report 39258 X-GNU-PR-Package: guix-patches X-GNU-PR-Keywords: To: 39258@debbugs.gnu.org X-Debbugs-Original-To: guix-patches@gnu.org Received: via spool by submit@debbugs.gnu.org id=B.15798091534426 (code B ref -1); Thu, 23 Jan 2020 19:53:02 +0000 Received: (at submit) by debbugs.gnu.org; 23 Jan 2020 19:52:33 +0000 Received: from localhost ([127.0.0.1]:53208 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1iuiWn-00019K-8s for submit@debbugs.gnu.org; Thu, 23 Jan 2020 14:52:33 -0500 Received: from lists.gnu.org ([209.51.188.17]:55538) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1iuiWk-00019A-Ju for submit@debbugs.gnu.org; Thu, 23 Jan 2020 14:52:31 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:38921) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1iuiWh-0008EY-Ap for guix-patches@gnu.org; Thu, 23 Jan 2020 14:52:30 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1iuiWe-00087M-FS for guix-patches@gnu.org; Thu, 23 Jan 2020 14:52:26 -0500 Received: from mugam.systemreboot.net ([139.59.75.54]:60578) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1iuiWd-00082X-4H for guix-patches@gnu.org; Thu, 23 Jan 2020 14:52:24 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=systemreboot.net; s=default; h=Content-Type:MIME-Version:Message-ID:Date: Subject:To:From:Sender:Reply-To:Cc:Content-Transfer-Encoding:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:In-Reply-To:References:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=m58VqhXMlYNbWwgkqnBfSm2SHXEBcN2pNnULcRktOt0=; b=lxKPiLfEtbVbt5FKBuTu0mWTDC vVyWwl+Ps0w4MfGSzWCpqt36Yvby5jcnx5o4BkhoKEii3opfHxnin65h/IynHQxTP3DjU8DMP72y9 tsh9x6HEU/Y7q3hJHhg882OczgmkYenh6z5R/zKtLzShLalFSabpDgQoHLBIpqA3DUFY=; Received: from [192.168.2.1] (helo=steel) by systemreboot.net with esmtpsa (TLS1.3) tls TLS_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1iuiWX-0009z3-Pm for guix-patches@gnu.org; Fri, 24 Jan 2020 01:22:17 +0530 From: Arun Isaac Date: Fri, 24 Jan 2020 01:21:57 +0530 Message-ID: MIME-Version: 1.0 X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 209.51.188.43 X-BeenThere: guix-patches@gnu.org List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guix-patches-bounces+patchwork=mira.cbaines.net@gnu.org Sender: "Guix-patches" X-getmail-retrieved-from-mailbox: Patches Hi, As discussed on guix-devel at https://lists.gnu.org/archive/html/guix-devel/2020-01/msg00310.html , I am working on an sqlite cache to improve guix search performance. I have attached a highly incomplete WIP patch. The patch attempts to reimplement the package-cache-file hook in guix/channels.scm using a sqlite database. To this end, it rewrites most of the generate-package-cache and cache-lookup functions in gnu/packages.scm. I am yet to hook this up to guix search. At the moment, I am having some difficulty populating the sqlite database. generate-package-cache populates the database correctly when invoked from a normal guile REPL using geiser, but fails to do so when run by the guix daemon during guix pull. I ran guix pull using $ ./pre-inst-env guix pull --url=$PWD --branch=search -p /tmp/test where search is the branch I am working on. Running $ ls /tmp/test/lib/guix -lh shows total 2.1M -r--r--r-- 2 root root 2.1M ஜன. 1 1970 package-cache.sqlite -r--r--r-- 2 root root 26K ஜன. 1 1970 package-cache.sqlite-journal On examining package-cache.sqlite, I find that no records have been written. And, there is a lingering journal file that shouldn't be there. For some reason, populating the sqlite database does not work with guix pull. sqlite probably crashes and leaves the journal file. If I try to populate the database with each package record being inserted in its own transaction, at least some of the insertions work. But the journal file still lingers. My unverified guess is that everything except the last transaction was successful. Any ideas what's going on? Also, inserting each package in its own transaction is ridiculously slow and so that is out of the question. See https://www.sqlite.org/faq.html#q19 From d1305351a90a84eb75e4769284d5e06927eade3e Mon Sep 17 00:00:00 2001 From: Arun Isaac Date: Tue, 21 Jan 2020 20:45:43 +0530 Subject: [PATCH] fast search --- build-aux/build-self.scm | 5 + gnu/packages.scm | 207 +++++++++++++++++++++++---------------- 2 files changed, 128 insertions(+), 84 deletions(-) diff --git a/build-aux/build-self.scm b/build-aux/build-self.scm index fc13032b73..c123ad3b11 100644 --- a/build-aux/build-self.scm +++ b/build-aux/build-self.scm @@ -264,6 +264,9 @@ interface (FFI) of Guile.") (define fake-git (scheme-file "git.scm" #~(define-module (git)))) + (define fake-sqlite3 + (scheme-file "sqlite3.scm" #~(define-module (sqlite3)))) + (with-imported-modules `(((guix config) => ,(make-config.scm)) @@ -278,6 +281,8 @@ interface (FFI) of Guile.") ;; (git) to placate it. ((git) => ,fake-git) + ((sqlite3) => ,fake-sqlite3) + ,@(source-module-closure `((guix store) (guix self) (guix derivations) diff --git a/gnu/packages.scm b/gnu/packages.scm index d22c992bb1..4e2c52e62d 100644 --- a/gnu/packages.scm +++ b/gnu/packages.scm @@ -43,6 +43,7 @@ #:use-module (srfi srfi-34) #:use-module (srfi srfi-35) #:use-module (srfi srfi-39) + #:use-module (sqlite3) #:export (search-patch search-patches search-auxiliary-file @@ -204,10 +205,8 @@ PROC is called along these lines: PROC can use #:allow-other-keys to ignore the bits it's not interested in. When a package cache is available, this procedure does not actually load any package module." - (define cache - (load-package-cache (current-profile))) - - (if (and cache (cache-is-authoritative?)) + (if (and (cache-is-authoritative?) + (current-profile)) (vhash-fold (lambda (name vector result) (match vector (#(name version module symbol outputs @@ -220,7 +219,7 @@ package module." #:supported? supported? #:deprecated? deprecated?)))) init - cache) + (cache-lookup (current-profile))) (fold-packages (lambda (package result) (proc (package-name package) (package-version package) @@ -252,31 +251,7 @@ is guaranteed to never traverse the same package twice." (define %package-cache-file ;; Location of the package cache. - "/lib/guix/package.cache") - -(define load-package-cache - (mlambda (profile) - "Attempt to load the package cache. On success return a vhash keyed by -package names. Return #f on failure." - (match profile - (#f #f) - (profile - (catch 'system-error - (lambda () - (define lst - (load-compiled (string-append profile %package-cache-file))) - (fold (lambda (item vhash) - (match item - (#(name version module symbol outputs - supported? deprecated? - file line column) - (vhash-cons name item vhash)))) - vlist-null - lst)) - (lambda args - (if (= ENOENT (system-error-errno args)) - #f - (apply throw args)))))))) + "/lib/guix/package-cache.sqlite") (define find-packages-by-name/direct ;bypass the cache (let ((packages (delay @@ -297,25 +272,57 @@ decreasing version order." matching) matching))))) -(define (cache-lookup cache name) +(define* (cache-lookup profile #:optional name) "Lookup package NAME in CACHE. Return a list sorted in increasing version order." (define (package-version? (vector-ref v2 1) (vector-ref v1 1))) - (sort (vhash-fold* cons '() name cache) - package-versionboolean n) + (case n + ((0) #f) + ((1) #t))) + + (define (string->list str) + (call-with-input-string str read)) + + (define select-statement + (string-append + "SELECT name, version, module, symbol, outputs, supported, superseded, locationFile, locationLine, locationColumn from packages" + (if name " WHERE name = :name" ""))) + + (define cache-file + (string-append profile %package-cache-file)) + + (let* ((db (sqlite-open cache-file SQLITE_OPEN_READONLY)) + (statement (sqlite-prepare db select-statement))) + (when name + (sqlite-bind-arguments statement #:name name)) + (let ((result (sqlite-fold (lambda (v result) + (match v + (#(name version module symbol outputs supported superseded file line column) + (cons + (vector name + version + (string->list module) + (string->symbol symbol) + (string->list outputs) + (int->boolean supported) + (int->boolean superseded) + (list file line column)) + result)))) + '() statement))) + (sqlite-finalize statement) + (sqlite-close db) + (sort result package-versionstring x) + (call-with-output-string (cut write x <>))) + (define (generate-package-cache directory) "Generate under DIRECTORY a cache of all the available packages. @@ -381,49 +388,81 @@ reducing the memory footprint." (define cache-file (string-append directory %package-cache-file)) - (define (expand-cache module symbol variable result+seen) + (define schema + "CREATE TABLE packages (name text, +version text, +module text, +symbol text, +outputs text, +supported int, +superseded int, +locationFile text, +locationLine int, +locationColumn int); +CREATE VIRTUAL TABLE packageSearch USING fts5(name, searchText);") + + (define insert-statement + "INSERT INTO packages(name, version, module, symbol, outputs, supported, superseded, locationFile, locationLine, locationColumn) +VALUES(:name, :version, :module, :symbol, :outputs, :supported, :superseded, :locationfile, :locationline, :locationcolumn)") + + (define insert-package-search-statement + "INSERT INTO packageSearch(name, searchText) VALUES(:name, :searchtext)") + + (define (boolean->int x) + (if x 1 0)) + + (define (list->string x) + (call-with-output-string (cut write x <>))) + + (define (insert-package db module symbol variable seen) (match (false-if-exception (variable-ref variable)) ((? package? package) - (match result+seen - ((result . seen) - (if (or (vhash-assq package seen) - (hidden-package? package)) - (cons result seen) - (cons (cons `#(,(package-name package) - ,(package-version package) - ,(module-name module) - ,symbol - ,(package-outputs package) - ,(->bool (supported-package? package)) - ,(->bool (package-superseded package)) - ,@(let ((loc (package-location package))) - (if loc - `(,(location-file loc) - ,(location-line loc) - ,(location-column loc)) - '(#f #f #f)))) - result) - (vhash-consq package #t seen)))))) - (_ - result+seen))) - - (define exp - (first - (fold-module-public-variables* expand-cache - (cons '() vlist-null) - (all-modules (%package-module-path) - #:warn - warn-about-load-error)))) + (cond + ((or (vhash-assq package seen) + (hidden-package? package)) + seen) + (else + (let ((statement (sqlite-prepare db insert-statement))) + (sqlite-bind-arguments statement + #:name (package-name package) + #:version (package-version package) + #:module (list->string (module-name module)) + #:symbol (symbol->string symbol) + #:outputs (list->string (package-outputs package)) + #:supported (boolean->int (supported-package? package)) + #:superseded (boolean->int (package-superseded package)) + #:locationfile (cond + ((package-location package) => location-file) + (else #f)) + #:locationline (cond + ((package-location package) => location-line) + (else #f)) + #:locationcolumn (cond + ((package-location package) => location-column) + (else #f))) + (sqlite-fold cons '() statement) + (sqlite-finalize statement)) + (let ((statement (sqlite-prepare db insert-package-search-statement))) + (sqlite-bind-arguments statement + #:name (package-name package) + #:searchtext (package-description package)) + (sqlite-fold cons '() statement) + (sqlite-finalize statement)) + (vhash-consq package #t seen)))) + (_ seen))) (mkdir-p (dirname cache-file)) - (call-with-output-file cache-file - (lambda (port) - ;; Store the cache as a '.go' file. This makes loading fast and reduces - ;; heap usage since some of the static data is directly mmapped. - (put-bytevector port - (compile `'(,@exp) - #:to 'bytecode - #:opts '(#:to-file? #t))))) + (let ((db (sqlite-open cache-file))) + (sqlite-exec db schema) + (sqlite-exec db "BEGIN") + (fold-module-public-variables* (cut insert-package db <> <> <> <>) + vlist-null + (all-modules (%package-module-path) + #:warn + warn-about-load-error)) + (sqlite-exec db "COMMIT;") + (sqlite-close db)) + cache-file) -- 2.23.0