From patchwork Sat Sep 4 21:17:10 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: M X-Patchwork-Id: 32625 Return-Path: X-Original-To: patchwork@mira.cbaines.net Delivered-To: patchwork@mira.cbaines.net Received: by mira.cbaines.net (Postfix, from userid 113) id 4DA5D27BBE3; Sat, 4 Sep 2021 22:18:14 +0100 (BST) X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on mira.cbaines.net X-Spam-Level: X-Spam-Status: No, score=-5.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, FREEMAIL_FROM,MAILING_LIST_MULTI,RCVD_IN_DNSWL_MED,RCVD_IN_MSPIKE_H2, SPF_HELO_PASS,T_DKIM_INVALID,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.2 Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mira.cbaines.net (Postfix) with ESMTPS id D9CD127BBE1 for ; Sat, 4 Sep 2021 22:18:11 +0100 (BST) Received: from localhost ([::1]:54782 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1mMd3B-00085D-Qy for patchwork@mira.cbaines.net; Sat, 04 Sep 2021 17:18:09 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:33506) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mMd34-00084A-FN for guix-patches@gnu.org; Sat, 04 Sep 2021 17:18:02 -0400 Received: from debbugs.gnu.org ([209.51.188.43]:37253) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1mMd34-0005TU-7q for guix-patches@gnu.org; Sat, 04 Sep 2021 17:18:02 -0400 Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1mMd33-0004Ap-VC for guix-patches@gnu.org; Sat, 04 Sep 2021 17:18:01 -0400 X-Loop: help-debbugs@gnu.org Subject: [bug#50384] [PATCH] Optimise search-patch (reducing I/O) Resent-From: Maxime Devos Original-Sender: "Debbugs-submit" Resent-CC: guix-patches@gnu.org Resent-Date: Sat, 04 Sep 2021 21:18:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: report 50384 X-GNU-PR-Package: guix-patches X-GNU-PR-Keywords: patch To: 50384@debbugs.gnu.org X-Debbugs-Original-To: guix-patches@gnu.org Received: via spool by submit@debbugs.gnu.org id=B.163079026116016 (code B ref -1); Sat, 04 Sep 2021 21:18:01 +0000 Received: (at submit) by debbugs.gnu.org; 4 Sep 2021 21:17:41 +0000 Received: from localhost ([127.0.0.1]:48799 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1mMd2d-0004A7-33 for submit@debbugs.gnu.org; Sat, 04 Sep 2021 17:17:41 -0400 Received: from lists.gnu.org ([209.51.188.17]:58188) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1mMd2a-00049z-E6 for submit@debbugs.gnu.org; Sat, 04 Sep 2021 17:17:33 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:33482) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mMd2Z-00083q-Pf for guix-patches@gnu.org; Sat, 04 Sep 2021 17:17:31 -0400 Received: from michel.telenet-ops.be ([2a02:1800:110:4::f00:18]:54422) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1mMd2V-00054j-Ch for guix-patches@gnu.org; Sat, 04 Sep 2021 17:17:31 -0400 Received: from butterfly.local ([188.189.242.219]) by michel.telenet-ops.be with bizsmtp id pxHK250094kjswU06xHLsQ; Sat, 04 Sep 2021 23:17:21 +0200 Message-ID: <8900fa8c8eef7f72fc97adc2408be26c88de7803.camel@telenet.be> From: Maxime Devos Date: Sat, 04 Sep 2021 23:17:10 +0200 User-Agent: Evolution 3.34.2 MIME-Version: 1.0 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=telenet.be; s=r21; t=1630790241; bh=4FgmsEgZZw7uozIaHnfeIECoft4mn5joQfLTPGH+bqI=; h=Subject:From:To:Date; b=sS2gVWDMRuSWHb3TjAuaSmIbDSyMe9hX05oXgF0WwGuk7QdIoY1MXJ9MxtH5Lqa2R tJihf7VcDR3kSk6YG2Ejsst8MN0jHkJQWEVb66ClYBtmAdqHE//OPKbmXSgpTYs3Vq hw0TnCA/cGIOnXK+5btKXGmTVNeNlSq/XTBuPR7DWGTnFzNqaJ3Rk243eu1Jti7YGu 0o3aMhVdC9Zcz8DCdKQEDSjCliDs7mCTLqWsfqe0TqUByM/Va87jPkoHye95E4O7Ed nYl5VMMw+bg4ENdy4mNaUsx2iY/WbPc6fK4wR6hUnfIBhwiz+PslUR4ttvGeDPQpK/ FflKy32xl3xRw== Received-SPF: pass client-ip=2a02:1800:110:4::f00:18; envelope-from=maximedevos@telenet.be; helo=michel.telenet-ops.be X-Spam_score_int: -27 X-Spam_score: -2.8 X-Spam_bar: -- X-Spam_report: (-2.8 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_LOW=-0.7, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: guix-patches@gnu.org List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guix-patches-bounces+patchwork=mira.cbaines.net@gnu.org Sender: "Guix-patches" X-getmail-retrieved-from-mailbox: Patches Hi guix! The attached patch series optimises the G-exp compiler for local-file, by avoiding interning the file if it's already in the store, but only if the hash is known in advance. To take advantage of this, 'search-patch' has been modified to compute the hash at expansion time. The cost of this optimisation is a little additional complexity, and computing derivations theoretically becomes a little more expensive when the patch isn't already in the store (1 call to 'stat' per non-yet-interned patch). If you want to test this patch series for performance, do _not_ run ./pre-inst-env guix, instead use "guix pull --url=$PWD --branch=...", because the guix from the checkout performs more 'lstat' calls than an ‘user’ guix from "guix pull". I'll show the patch series decreases the number of syscalls below, using the 'strace' command. Use 'strace -c' to gather some statistics: # Run it twice for a hot cache. Ignore the output of the first run. $ strace -c ./the-optimised-guix/bin/guix build -d pigx --no-grafts $ strace -c ./the-optimised-guix/bin/guix build -d pigx --no-grafts # $ strace -c guix build -d pigx --no-grafts $ strace -c guix build -d pigx --no-grafts I've selected some syscalls from the output that seemed relevant and formatted the call count in a table optimised unoptimised result of optimisation: stat 3865 3712 + 4.1% lstat 119 321 -62.9% fstat 59 59 unchanged read 17303 17688 - 2.2% write 6741 6767 - 1.9% openat 885 1076 -17.8% readlink 14 16 -12.5% ------- total 28886 32539 -11.2% Almost all syscalls are now called less (-11.2% in total), which is good. The exception is 'stat'. Because 'search-path' is now being called less often (only when the patch isn't in the store), the number of 'stat' calls decreases. However, 'local-file-compiler' now calls 'stat' more (one or two times per patch). I think it's worth it though, because: (1) the second 'stat' is on the same file as the first 'stat', so presumably the kernel has cached the result, so no need to wait for I/O to complete the second time (there's a context switch though). So ignoring the context switch cost, there are only ‘effectively’ +2.1% extra calls to 'stat'. (2) the total decrease of -11.2% syscalls Now, what about the actual "time to derivation"? First, let's time "guix build -d pigx --no-grafts" to get some raw numbers on guix before the optimisation: time guix build -d pigx --no-grafts # repeated four times, first output is discarded # to eliminate hot/cold cache differences /gnu/store/03vmq94ckxfx6c4rc9zh745yy63n5i5m-pigx-0.0.3.drv real 0m13,470s user 0m13,526s sys 0m0,573s /gnu/store/03vmq94ckxfx6c4rc9zh745yy63n5i5m-pigx-0.0.3.drv real 0m13,582s user 0m13,639s sys 0m0,568s /gnu/store/03vmq94ckxfx6c4rc9zh745yy63n5i5m-pigx-0.0.3.drv real 0m13,834s user 0m13,901s sys 0m0,556s Average numbers: real 0m13,629s user 0m13,689s sys 0m0,566s After the optimisation: time ./the-optimised-guix/bin/guix build -d pigx --no-grafts /gnu/store/fq6x8d2vcm6sbjkimg7g8kcgb4c5xv1b-pigx-0.0.3.drv real 0m14,150s user 0m13,979s sys 0m0,685s /gnu/store/fq6x8d2vcm6sbjkimg7g8kcgb4c5xv1b-pigx-0.0.3.drv real 0m13,781s user 0m13,697s sys 0m0,580s /gnu/store/fq6x8d2vcm6sbjkimg7g8kcgb4c5xv1b-pigx-0.0.3.drv real 0m14,247s user 0m14,160s sys 0m0,548s The numbers are higher somehow after the optimisations? Even the 'sys' time is higher, even though there are less syscalls? I re-ran the time commands, and got a decrease in 'real' time this time. /gnu/store/fq6x8d2vcm6sbjkimg7g8kcgb4c5xv1b-pigx-0.0.3.drv real 0m13,304s user 0m13,146s sys 0m0,609s /gnu/store/fq6x8d2vcm6sbjkimg7g8kcgb4c5xv1b-pigx-0.0.3.drv real 0m12,132s user 0m11,940s sys 0m0,589s /gnu/store/fq6x8d2vcm6sbjkimg7g8kcgb4c5xv1b-pigx-0.0.3.drv real 0m13,716s user 0m13,723s sys 0m0,529s The output of "time ..." seems inconclusive (can possibly be attributed to things like CPU frequency changing?), but the decrease in syscall counts seems quite nice to me. Feel free to run your own tests! Greetings, Maxime. From 0fc54bdd9ccc9729fff54f5935a552e5e608a1d0 Mon Sep 17 00:00:00 2001 From: Maxime Devos Date: Sat, 4 Sep 2021 18:10:32 +0200 Subject: [PATCH 6/6] gexp: Do not intern if the file is already in the store. * guix/gexp.scm (local-file-compiler): When the file is already in the store, re-use the fixed output path instead of interning the file again. --- guix/gexp.scm | 38 +++++++++++++++++++++++++++++++------- 1 file changed, 31 insertions(+), 7 deletions(-) diff --git a/guix/gexp.scm b/guix/gexp.scm index c69e4aa299..da1e918801 100644 --- a/guix/gexp.scm +++ b/guix/gexp.scm @@ -531,13 +531,37 @@ appears." (define-gexp-compiler (local-file-compiler (file ) system target) ;; "Compile" FILE by adding it to the store. (match file - (($ file (= force absolute) name sha256 recursive? select?) - ;; Canonicalize FILE so that if it's a symlink, it is resolved. Failing - ;; to do that, when RECURSIVE? is #t, we could end up creating a dangling - ;; symlink in the store, and when RECURSIVE? is #f 'add-to-store' would - ;; just throw an error, both of which are inconvenient. - (interned-file absolute name - #:recursive? recursive? #:select? select?)))) + ;; Delay computing the absolute file name until 'intern', as this + ;; might be a relatively expensive computation (e.g. if search-patch + ;; is used), especially on a spinning disk. + (($ file absolute-promise name sha256 recursive? select?) + (let () + (define (intern) + ;; Canonicalize FILE so that if it's a symlink, it is resolved. + ;; Failing to do that, when RECURSIVE? is #t, we could end up creating + ;; a dangling symlink in the store, and when RECURSIVE? is #f + ;; 'add-to-store' would just throw an error, both of which are + ;; inconvenient. + (interned-file (force absolute-promise) name + #:recursive? recursive? #:select? select?)) + (if sha256 + (let ((path (fixed-output-path name sha256 #:recursive? recursive?))) + ;; If the hash is known in advance and the store already has the + ;; item, there is no need to intern the file. + (if (file-exists? path) + (mbegin %store-monad + ;; Tell the GC that PATH will be used, such that it won't + ;; be deleted. + ((store-lift add-temp-root) path) + ;; The GC could have deleted the item before add-temp-root + ;; completed, so check again if PATH exists. + (if (file-exists? path) + (return path) + ;; If it has been removed, fall-back interning. + (intern))) + ;; If PATH does not yet exist, fall back to interning. + (intern))) + (intern)))))) (define-record-type (%plain-file name content references) -- 2.33.0