From patchwork Mon Jun 21 06:12:03 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Maxim Cournoyer X-Patchwork-Id: 30562 Return-Path: X-Original-To: patchwork@mira.cbaines.net Delivered-To: patchwork@mira.cbaines.net Received: by mira.cbaines.net (Postfix, from userid 113) id F043327BC81; Mon, 21 Jun 2021 07:13:20 +0100 (BST) X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on mira.cbaines.net X-Spam-Level: X-Spam-Status: No, score=-2.8 required=5.0 tests=BAYES_00,DKIM_ADSP_CUSTOM_MED, DKIM_SIGNED,FREEMAIL_FROM,MAILING_LIST_MULTI,RCVD_IN_MSPIKE_H2, SPF_HELO_PASS,T_DKIM_INVALID,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.2 Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mira.cbaines.net (Postfix) with ESMTPS id 6BF4527BC78 for ; Mon, 21 Jun 2021 07:13:20 +0100 (BST) Received: from localhost ([::1]:38608 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1lvDBP-00081p-Gj for patchwork@mira.cbaines.net; Mon, 21 Jun 2021 02:13:19 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:47944) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lvDBA-0007w9-Pw for guix-patches@gnu.org; Mon, 21 Jun 2021 02:13:04 -0400 Received: from debbugs.gnu.org ([209.51.188.43]:51897) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1lvDBA-0001tp-HG for guix-patches@gnu.org; Mon, 21 Jun 2021 02:13:04 -0400 Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1lvDBA-0006cw-DJ for guix-patches@gnu.org; Mon, 21 Jun 2021 02:13:04 -0400 X-Loop: help-debbugs@gnu.org Subject: [bug#49149] [PATCH 5/7] pack: Prevent duplicate files in tar archives. Resent-From: Maxim Cournoyer Original-Sender: "Debbugs-submit" Resent-CC: guix-patches@gnu.org Resent-Date: Mon, 21 Jun 2021 06:13:04 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 49149 X-GNU-PR-Package: guix-patches X-GNU-PR-Keywords: patch To: 49149@debbugs.gnu.org Cc: Maxim Cournoyer Received: via spool by 49149-submit@debbugs.gnu.org id=B49149.162425597225399 (code B ref 49149); Mon, 21 Jun 2021 06:13:04 +0000 Received: (at 49149) by debbugs.gnu.org; 21 Jun 2021 06:12:52 +0000 Received: from localhost ([127.0.0.1]:35200 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1lvDAx-0006bU-RU for submit@debbugs.gnu.org; Mon, 21 Jun 2021 02:12:52 -0400 Received: from mail-qv1-f41.google.com ([209.85.219.41]:36371) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1lvDAw-0006aY-B0 for 49149@debbugs.gnu.org; Mon, 21 Jun 2021 02:12:50 -0400 Received: by mail-qv1-f41.google.com with SMTP id im10so6893973qvb.3 for <49149@debbugs.gnu.org>; Sun, 20 Jun 2021 23:12:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=9sYSuoxh/TXX0vpZJIGqEGuZnOV29nWz3iCoA1cYbX4=; b=jIeO8o2AdhRHLnGOzTu6AGjp0/kTxZ4lzWCmSNYQ3X/7LL+3v0WSaq9x6FZ3T8x7x5 5o/TImIr9ZXSRqnW3UTXZ9Vw4yf8RysFt1kGT8a9b5bLXE9gtz5+EytNTFJZpY6HLnHN JCObUfKTG2lrO4Ai3FjR+TUI7089/LRAHVUF5WecUf7sBfBmcQTV2qneAG6NDNUYdACh Iz4TyNCvAcd8IzpipIp/HYbUgYGQP9O0eCbtEvEsplyJlyMKAEhGDjW/Bc575fYy/g0J 4mvCHxrax2zZo6WS/O4x5v/ar8Uu31YaFitZWaBTq249/6A19Ka+/F+c+twJpEPjgmzC t5rw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=9sYSuoxh/TXX0vpZJIGqEGuZnOV29nWz3iCoA1cYbX4=; b=KEo0e1WysN2ez6pFhzgTi3PT4zTnpR5C9EghqVjpdE9KYFouoDS+SVKrxaiT1lsBHn RMyMtK+30qhS4eQO4m4ZHTeVgF9le2QI81U01YaFig+ZIvtBQnSv3lvmNGx3fFeVngfI FUXW2D3xDO2OMYAosgvDK+EFdNJ3I47t/Z77ERckZIoQz1jPbMEGPymK/skGOeHSSfN3 DIG28Pe0xbUrlhzHF0zszTTEWaZFQFFF+Cp9S8IGNFvUXt5ckBKI/Cu2apX/q06Yc7XM i0IwSROnKSrU/oX1r1ksZeORYEqIKYftHnmbU/nvLVWGKm9u3Zhx7HqThusntkkCap+K Prxg== X-Gm-Message-State: AOAM533w14lXECziEQjjPi1RUeJdpRpPRdqSZvHExOSmT9V4QH8+9guS TfNuFoQgzG6qELFPegeAwwst3+fKtE02RA== X-Google-Smtp-Source: ABdhPJwKELnPF5QJFXzuk1HcuLong5Lm4ltkdpl7qGX0ryWnzz/b0dHEZr6hdC7un+BNsGY+lURLhg== X-Received: by 2002:a05:6214:b6c:: with SMTP id ey12mr18374648qvb.48.1624255964788; Sun, 20 Jun 2021 23:12:44 -0700 (PDT) Received: from localhost.localdomain (dsl-148-219.b2b2c.ca. [66.158.148.219]) by smtp.gmail.com with ESMTPSA id i11sm8478663qke.74.2021.06.20.23.12.44 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 20 Jun 2021 23:12:44 -0700 (PDT) From: Maxim Cournoyer Date: Mon, 21 Jun 2021 02:12:03 -0400 Message-Id: <20210621061205.31878-6-maxim.cournoyer@gmail.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210621061205.31878-1-maxim.cournoyer@gmail.com> References: <20210621061205.31878-1-maxim.cournoyer@gmail.com> MIME-Version: 1.0 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: guix-patches@gnu.org List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guix-patches-bounces+patchwork=mira.cbaines.net@gnu.org Sender: "Guix-patches" X-getmail-retrieved-from-mailbox: Patches Tar translate duplicate files in the archive into hard links. These can cause problems, as not every tool support them; for example dpkg doesn't. * gnu/system/file-systems.scm (reduce-directories): New procedure. (file-prefix?): Lift the restriction on file prefix. The procedure can be useful for comparing relative file names. Adjust doc. (file-name-depth): New procedure, extracted from ... (btrfs-store-subvolume-file-name): ... here. * guix/scripts/pack.scm (self-contained-tarball/builder): Use reduce-directories. * tests/file-systems.scm ("reduce-directories"): New test. --- gnu/system/file-systems.scm | 56 +++++++++++++++++++++++++------------ guix/scripts/pack.scm | 6 ++-- tests/file-systems.scm | 7 ++++- 3 files changed, 48 insertions(+), 21 deletions(-) diff --git a/gnu/system/file-systems.scm b/gnu/system/file-systems.scm index 464e87cb18..fb87bfc85b 100644 --- a/gnu/system/file-systems.scm +++ b/gnu/system/file-systems.scm @@ -55,6 +55,7 @@ file-system-dependencies file-system-location + reduce-directories file-system-type-predicate btrfs-subvolume? btrfs-store-subvolume-file-name @@ -231,8 +232,8 @@ (char-set-complement (char-set #\/))) (define (file-prefix? file1 file2) - "Return #t if FILE1 denotes the name of a file that is a parent of FILE2, -where both FILE1 and FILE2 are absolute file name. For example: + "Return #t if FILE1 denotes the name of a file that is a parent of FILE2. +For example: (file-prefix? \"/gnu\" \"/gnu/store\") => #t @@ -240,19 +241,41 @@ where both FILE1 and FILE2 are absolute file name. For example: (file-prefix? \"/gn\" \"/gnu/store\") => #f " - (and (string-prefix? "/" file1) - (string-prefix? "/" file2) - (let loop ((file1 (string-tokenize file1 %not-slash)) - (file2 (string-tokenize file2 %not-slash))) - (match file1 - (() - #t) - ((head1 tail1 ...) - (match file2 - ((head2 tail2 ...) - (and (string=? head1 head2) (loop tail1 tail2))) - (() - #f))))))) + (let loop ((file1 (string-tokenize file1 %not-slash)) + (file2 (string-tokenize file2 %not-slash))) + (match file1 + (() + #t) + ((head1 tail1 ...) + (match file2 + ((head2 tail2 ...) + (and (string=? head1 head2) (loop tail1 tail2))) + (() + #f)))))) + +(define (file-name-depth file-name) + (length (string-tokenize file-name %not-slash))) + +(define (reduce-directories file-names) + "Eliminate entries in FILE-NAMES that are children of other entries in +FILE-NAMES. This is for example useful when passing a list of files to GNU +tar, which would otherwise descend into each directory passed and archive the +duplicate files as hard links, which can be undesirable." + (let* ((file-names/sorted + ;; Ascending sort by file hierarchy depth, then by file name length. + (stable-sort (delete-duplicates file-names) + (lambda (f1 f2) + (let ((depth1 (file-name-depth f1)) + (depth2 (file-name-depth f2))) + (if (= depth1 depth2) + (string< f1 f2) + (< depth1 depth2))))))) + (reverse (fold (lambda (file-name results) + (if (find (cut file-prefix? <> file-name) results) + results ;parent found -- skipping + (cons file-name results))) + '() + file-names/sorted)))) (define* (file-system-device->string device #:key uuid-type) "Return the string representations of the DEVICE field of a @@ -624,9 +647,6 @@ store is located, else #f." s (string-append "/" s))) - (define (file-name-depth file-name) - (length (string-tokenize file-name %not-slash))) - (and-let* ((btrfs-subvolume-fs (filter btrfs-subvolume? file-systems)) (btrfs-subvolume-fs* (sort btrfs-subvolume-fs diff --git a/guix/scripts/pack.scm b/guix/scripts/pack.scm index 9d4bb9f497..8a108b7a1a 100644 --- a/guix/scripts/pack.scm +++ b/guix/scripts/pack.scm @@ -225,13 +225,15 @@ dependencies are registered." `((guix build pack) (guix build utils) (guix build union) - (gnu build install)) + (gnu build install) + (gnu system file-systems)) #:select? import-module?) #~(begin (use-modules (guix build pack) (guix build utils) ((guix build union) #:select (relative-file-name)) (gnu build install) + ((gnu system file-systems) #:select (reduce-directories)) (srfi srfi-1) (srfi srfi-26) (ice-9 match)) @@ -298,7 +300,7 @@ dependencies are registered." ,(string-append "." (%store-directory)) - ,@(delete-duplicates + ,@(reduce-directories (filter-map (match-lambda (('directory directory) (string-append "." directory)) diff --git a/tests/file-systems.scm b/tests/file-systems.scm index 7f7c373884..80acb6d5b9 100644 --- a/tests/file-systems.scm +++ b/tests/file-systems.scm @@ -1,6 +1,6 @@ ;;; GNU Guix --- Functional package management for GNU ;;; Copyright © 2015, 2017 Ludovic Courtès -;;; Copyright © 2020 Maxim Cournoyer +;;; Copyright © 2020, 2021 Maxim Cournoyer ;;; ;;; This file is part of GNU Guix. ;;; @@ -50,6 +50,11 @@ (device "/foo") (flags '(bind-mount read-only))))))))) +(test-equal "reduce-directories" + '("./opt/gnu/" "./opt/gnuism" "a/b/c") + (reduce-directories '("./opt/gnu/etc" "./opt/gnu/" "./opt/gnu/bin" + "./opt/gnu/lib/debug" "./opt/gnuism" "a/b/c" "a/b/c"))) + (test-assert "does not pull (guix config)" ;; This module is meant both for the host side and "build side", so make ;; sure it doesn't pull in (guix config), which depends on the user's