From patchwork Wed Jan 10 12:57:23 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christopher Baines X-Patchwork-Id: 58767 Return-Path: X-Original-To: patchwork@mira.cbaines.net Delivered-To: patchwork@mira.cbaines.net Received: by mira.cbaines.net (Postfix, from userid 113) id 9519827BBEA; Wed, 10 Jan 2024 14:17:33 +0000 (GMT) X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on mira.cbaines.net X-Spam-Level: X-Spam-Status: No, score=-2.9 required=5.0 tests=BAYES_00,MAILING_LIST_MULTI, SPF_HELO_PASS autolearn=unavailable autolearn_force=no version=3.4.6 Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mira.cbaines.net (Postfix) with ESMTPS id E84D327BBE2 for ; Wed, 10 Jan 2024 14:17:32 +0000 (GMT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rNZOO-0001F7-Ut; Wed, 10 Jan 2024 09:17:17 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rNZOD-0001En-N9 for guix-patches@gnu.org; Wed, 10 Jan 2024 09:17:07 -0500 Received: from debbugs.gnu.org ([2001:470:142:5::43]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1rNZOC-0000q2-M9; Wed, 10 Jan 2024 09:17:05 -0500 Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1rNZO9-0000qU-VD; Wed, 10 Jan 2024 09:17:01 -0500 X-Loop: help-debbugs@gnu.org Subject: [bug#68266] [PATCH v2] guix: store: Add report-object-cache-duplication. 
References: <87plyfrb2x.fsf@cbaines.net> In-Reply-To: <87plyfrb2x.fsf@cbaines.net> Resent-From: Christopher Baines Original-Sender: "Debbugs-submit" Resent-CC: guix@cbaines.net, dev@jpoiret.xyz, ludo@gnu.org, othacehe@gnu.org, rekado@elephly.net, zimon.toutoune@gmail.com, me@tobias.gr, guix-patches@gnu.org Resent-Date: Wed, 10 Jan 2024 14:17:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 68266 X-GNU-PR-Package: guix-patches X-GNU-PR-Keywords: patch To: 68266@debbugs.gnu.org Cc: Christopher Baines , Josselin Poiret , Ludovic =?utf-8?q?Court=C3=A8s?= , Mathieu Othacehe , Ricardo Wurmus , Simon Tournier , Tobias Geerinckx-Rice X-Debbugs-Original-Xcc: Christopher Baines , Josselin Poiret , Ludovic =?utf-8?q?Court=C3=A8s?= , Mathieu Othacehe , Ricardo Wurmus , Simon Tournier , Tobias Geerinckx-Rice Received: via spool by 68266-submit@debbugs.gnu.org id=B68266.17048961631254 (code B ref 68266); Wed, 10 Jan 2024 14:17:01 +0000 Received: (at 68266) by debbugs.gnu.org; 10 Jan 2024 14:16:03 +0000 Received: from localhost ([127.0.0.1]:39381 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1rNZNC-0000JT-KS for submit@debbugs.gnu.org; Wed, 10 Jan 2024 09:16:03 -0500 Received: from mira.cbaines.net ([212.71.252.8]:43096) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1rNZNA-0000EG-Bt for 68266@debbugs.gnu.org; Wed, 10 Jan 2024 09:16:00 -0500 Received: from localhost (unknown [217.155.61.229]) by mira.cbaines.net (Postfix) with ESMTPSA id ADD2F27BBE9 for <68266@debbugs.gnu.org>; Wed, 10 Jan 2024 12:57:24 +0000 (GMT) Received: from localhost (localhost [local]) by localhost (OpenSMTPD) with ESMTPA id 32e0a20a for <68266@debbugs.gnu.org>; Wed, 10 Jan 2024 12:57:24 +0000 (UTC) From: Christopher Baines Date: Wed, 10 Jan 2024 12:57:23 +0000 Message-ID: <89c875f974d1ad81ddd03f664ef08e397771d224.1704891443.git.mail@cbaines.net> X-Mailer: git-send-email 2.41.0 MIME-Version: 1.0 X-BeenThere: 
debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: guix-patches@gnu.org List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guix-patches-bounces+patchwork=mira.cbaines.net@gnu.org Sender: guix-patches-bounces+patchwork=mira.cbaines.net@gnu.org X-getmail-retrieved-from-mailbox: Patches

This is intended to help with spotting duplication in the object cache,
that is, cases where many keys (for example, package records) map to the
same derivation.  Reducing this duplication is an opportunity for
improved performance, since lookups can then make better use of the
cache entries that are already present.

I'm thinking this can be used by the data service, but maybe it could
also be worked into guix commands.

* guix/store.scm (report-object-cache-duplication): New procedure.

Change-Id: Ia6c816f871d10cae6807543224250110099d764f
---
 guix/store.scm | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)


base-commit: e541f9593f8bfc84b6140c2408b393243289fae6

diff --git a/guix/store.scm b/guix/store.scm
index 97c4f32a5b..86ca293cac 100644
--- a/guix/store.scm
+++ b/guix/store.scm
@@ -70,6 +70,7 @@ (define-module (guix store)
             current-store-protocol-version        ;for internal use
             cache-lookup-recorder                 ;for internal use
             mcached
+            report-object-cache-duplication
 
             &store-error store-error?
             &store-connection-error store-connection-error?
@@ -2037,6 +2038,64 @@ (define-syntax mcached
     ((_ mvalue object keys ...)
      (mcached eq? mvalue object keys ...))))
 
+(define* (report-object-cache-duplication store #:key (threshold 10)
+                                          (port (current-error-port)))
+  (define cache-values-to-keys
+    (make-hash-table))
+
+  (define (insert key val)
+    (hash-set!
+     cache-values-to-keys
+     key
+     (or (and=> (hash-ref cache-values-to-keys
+                          key)
+                (lambda (existing-values)
+                  (cons val existing-values)))
+         (list val))))
+
+  (let* ((cache-size
+          (vhash-fold
+           (lambda (key value result)
+             (match value
+               ((item . keys*)
+                (insert item key)))
+
+             (+ 1 result))
+           0
+           (store-connection-cache store %object-cache-id)))
+         (cached-values-by-key-count
+          (sort
+           (hash-map->list
+            (lambda (key value)
+              (cons key (length value)))
+            cache-values-to-keys)
+           (lambda (a b)
+             (< (cdr a) (cdr b))))))
+
+    (filter-map
+     (match-lambda
+       ((value . count)
+        (if (>= count threshold)
+            (begin
+              (when port
+                (simple-format port "value ~A cached ~A times\n" value count)
+                (simple-format port "example keys:\n"))
+
+              (let ((keys (hash-ref cache-values-to-keys value)))
+                (when port
+                  (for-each
+                   (lambda (key)
+                     (simple-format port " - ~A\n" key))
+                   (if (> count 10)
+                       (take keys 10)
+                       keys))
+                  (newline port))
+
+                `((value . ,value)
+                  (keys . ,keys))))
+            #f)))
+     cached-values-by-key-count)))
+
 (define (preserve-documentation original proc)
   "Return PROC with documentation taken from ORIGINAL."
   (set-object-property! proc 'documentation
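
To illustrate the idea behind the new procedure, here is a minimal standalone sketch in plain Guile of inverting a key-to-value cache into a value-to-keys table and keeping only the values that many keys map to. It is not the code from the patch: the alist `sample-cache` and the procedure `duplicated-values` are hypothetical stand-ins for the store's object cache and for `report-object-cache-duplication`, and the default threshold of 2 is chosen only to keep the example small.

```scheme
(use-modules (srfi srfi-1)   ; filter-map
             (ice-9 match))  ; match-lambda

;; Hypothetical stand-in for the object cache: an alist of (KEY . VALUE)
;; pairs in which several keys map to the same value.
(define sample-cache
  '((pkg-a . "drv-1")
    (pkg-b . "drv-1")
    (pkg-c . "drv-1")
    (pkg-d . "drv-2")))

(define* (duplicated-values cache #:key (threshold 2))
  "Return a list of (VALUE KEY ...) entries, one for each value in the
alist CACHE that at least THRESHOLD keys map to."
  (define values->keys (make-hash-table))

  ;; Invert the mapping: collect every key under the value it maps to.
  (for-each (match-lambda
              ((key . value)
               (hash-set! values->keys value
                          (cons key (hash-ref values->keys value '())))))
            cache)

  ;; Keep only the values that reach the threshold; keys were consed in
  ;; reverse, so restore their insertion order.
  (filter-map (match-lambda
                ((value . keys)
                 (and (>= (length keys) threshold)
                      (cons value (reverse keys)))))
              (hash-map->list cons values->keys)))
```

With the sample data above, `(duplicated-values sample-cache)` returns a single entry for "drv-1" listing `pkg-a`, `pkg-b` and `pkg-c`, while "drv-2" is dropped because only one key maps to it.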