Message ID | cover.1706287537.git.ludo@gnu.org |
---|---|
Headers | From: Ludovic Courtès <ludo@gnu.org>; To: 68741@debbugs.gnu.org; Date: Fri, 26 Jan 2024 18:16:40 +0100; Subject: [bug#68741] [PATCH 0/6] Content-addressed downloads from Software Heritage; X-Mailer: git-send-email 2.41.0 |
Series | Content-addressed downloads from Software Heritage |
Message
Ludovic Courtès
Jan. 26, 2024, 5:16 p.m. UTC
Hello Guix!

For those who’ve been following along, you might remember that the
main impedance mismatch between SWH and Guix is that SWH uses Git
tree SHA1 hashes to identify directories whereas Guix uses nar SHA256
hashes (and possibly other hash functions in the future):

  https://guix.gnu.org/en/blog/2019/connecting-reproducible-deployment-to-a-long-term-source-code-archive/

Because of this, the SWH fallback path for ‘git-download’ had two
options:

  1. If ‘git-reference’ specifies a full SHA1 commit ID, it would
     look it up on SWH and fetch it.

  2. If ‘git-reference’ specifies a tag, which is perhaps the
     majority of cases, Guix would ask SWH for the commit that once
     corresponded to that tag at that URL, and then fetch it.

Case #1 is ideal: it’s content-addressed. Case #2 is brittle: we’re
hoping that the tag hasn’t been modified and that the URL hasn’t been
reused for something else; if that’s not the case, SWH might return
the “wrong” commit and we end up fetching something unrelated.

The good news is that our friends at SWH have just deployed a new
version of their code that lets us look up directories by some
“external identifier” (“ExtID”), among which there’s ‘nar-sha256’:

  https://archive.softwareheritage.org/api/1/extid/doc/

And that, my friends, makes a huge difference: the impedance mismatch
is gone, we can now use content-addressing to fetch our stuff from SWH!!
And that works not just for Git, but also for Mercurial, SVN, CVS, etc.

Well, there’s a caveat: currently the ‘nar-sha256’ is added only on
new visits and it’s apparently not being added yet for Mercurial for
unclear reasons. So right now, we can get guile-sqlite3 0.1.3 (Git) by
nar-sha256, but we cannot get guile-wisp (hg) nor in fact most things.
That’ll improve over time though, and SWH comrades are open to adding
those ExtIDs retroactively.

The patches that follow do several things:

  1. Follow redirects in the Vault: (guix swh) previously did not
     do that (oops!) but the newly-deployed Vault now responds with
     302 redirects so we have to handle that.

  2. Add bindings for the ExtID HTTP interface.

  3. Add ‘swh-download-directory-by-nar-hash’, which does what it
     says.

  4. Use that as the preferred fallback method for ‘git-fetch’.

Here’s a REPLshot:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> (lookup-external-id "nar-sha256" (content-hash-value (origin-hash (package-source (@ (gnu packages guile) guile-sqlite3)))))
$43 = #<<external-id> value: "0b56ba94c2b83b8f74e3772887c1109135802eb3e8962b628377987fe97e1e63" type: "nar-sha256" version: 0 target: "swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153" target-url: "https://archive.softwareheritage.org/swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153">
scheme@(guile-user)> (swh-download-directory-by-nar-hash (content-hash-value (origin-hash (package-source (@ (gnu packages guile) guile-sqlite3)))) 'sha256 "/tmp/gsql")
SWH: found directory with nar-sha256 hash 0b56ba94c2b83b8f74e3772887c1109135802eb3e8962b628377987fe97e1e63 at 'swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153'
swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/
swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/.gitignore
swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/AUTHORS
swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/COPYING
swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/COPYING.LESSER
swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/ChangeLog
swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/Makefile.am
swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/NEWS
swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/README
swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/build-aux/
swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/build-aux/guile.am
swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/build-aux/test-driver.scm
swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/configure.ac
swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/env.in
swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/sqlite3.scm.in
swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/tests/
swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/tests/basic.scm
$46 = #t
--8<---------------cut here---------------end--------------->8---

Huge thanks to everyone over at #swh-devel for helping me out
over the past few days!

Next tasks: implement download fallback for ‘hg-fetch’, change
‘guix lint -c archival’ to make ‘save-origin’ requests not just
for Git repos, assess the situation with SVN and sub-directories
to see what can be done.

Thoughts?

Ludo’.

PS: Apologies for the wall of text!

Ludovic Courtès (6):
  swh: ‘vault-fetch’ follows redirects.
  swh: Add bindings for the “ExtID” API.
  swh: Add ‘swh-download-directory-by-nar-hash’.
  lint: archival: Check with ‘lookup-directory-by-nar-hash’.
  git-download: Download from SWH by nar hash when possible.
  swh: Fix docstring of ‘lookup-directory’.

 guix/build/git.scm                |  20 ++++--
 guix/git-download.scm             |   4 +-
 guix/lint.scm                     |  28 +++++---
 guix/scripts/perform-download.scm |   4 +-
 guix/swh.scm                      | 113 ++++++++++++++++++++++++++----
 tests/lint.scm                    |  33 +++++++--
 tests/swh.scm                     |  21 +++++-
 7 files changed, 189 insertions(+), 34 deletions(-)


base-commit: 8bee6bb9aaaf35c36fe325675d1eb2daebd69c25
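
The fallback order described in the cover letter can be sketched roughly as follows. This is only an illustration in Python, not the code in (guix swh): ‘extid_url’, ‘swh_fallback’, and the three lookup callables are hypothetical names, and the URL layout (‘/api/1/extid/nar-sha256/hex:<digest>/’) is an assumption based on the ExtID API documentation linked above. The sketch tries the content-addressed nar-hash lookup first, and only then falls back to the commit lookup (case #1) or the tag lookup (case #2).

```python
import re
from typing import Callable, Optional, Tuple

# Base URL of the public SWH API, as it appears in the links above.
SWH_API = "https://archive.softwareheritage.org/api/1"

# A "full SHA1 commit ID" is taken to mean exactly 40 lowercase hex digits.
FULL_SHA1 = re.compile(r"[0-9a-f]{40}")


def extid_url(nar_sha256_hex: str) -> str:
    """Build an ExtID lookup URL for a hex-encoded nar SHA256 hash.

    Assumption: the endpoint layout is <api>/extid/<type>/hex:<digest>/.
    """
    digest = nar_sha256_hex.lower()
    if re.fullmatch(r"[0-9a-f]{64}", digest) is None:
        raise ValueError("expected 64 hexadecimal digits")
    return f"{SWH_API}/extid/nar-sha256/hex:{digest}/"


def swh_fallback(
    nar_sha256_hex: str,
    url: str,
    reference: str,
    lookup_by_nar_hash: Callable[[str], Optional[str]],
    lookup_commit: Callable[[str], Optional[str]],
    lookup_tag: Callable[[str, str], Optional[str]],
) -> Optional[Tuple[str, str]]:
    """Return (strategy, target) for the first lookup that succeeds.

    The lookup callables are hypothetical stand-ins for network requests;
    each returns an identifier string, or None when SWH has no match.
    """
    # Preferred: content-addressed lookup by nar hash.
    target = lookup_by_nar_hash(nar_sha256_hex)
    if target is not None:
        return ("nar-hash", target)

    # Case #1: the reference is a full commit ID, still content-addressed.
    if FULL_SHA1.fullmatch(reference):
        target = lookup_commit(reference)
        return ("commit", target) if target is not None else None

    # Case #2: the reference is a tag; brittle, since tags and URLs can change.
    target = lookup_tag(url, reference)
    return ("tag", target) if target is not None else None
```

Passing the lookups in as callables keeps the ordering logic separate from any HTTP details, so the order can be exercised with stubs and without touching the network.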
Comments
Oops, I forgot to Cc: the fine people for the cover letter; fixed! See <https://issues.guix.gnu.org/68741>.
Hi,

Ludovic Courtès <ludo@gnu.org> wrote:

>   swh: ‘vault-fetch’ follows redirects.
>   swh: Add bindings for the “ExtID” API.
>   swh: Add ‘swh-download-directory-by-nar-hash’.
>   lint: archival: Check with ‘lookup-directory-by-nar-hash’.
>   git-download: Download from SWH by nar hash when possible.
>   swh: Fix docstring of ‘lookup-directory’.

Pushed as 5a61ce6bcfbd0882956e40457232da737776abe7.

> Next tasks: implement download fallback for ‘hg-fetch’, change
> ‘guix lint -c archival’ to make ‘save-origin’ requests not just
> for Git repos, assess the situation with SVN and sub-directories
> to see what can be done.

Let’s make it happen!

Ludo’.