Message ID | 20211216161724.547-1-pukkamustard@posteo.net |
---|---|
Series | Decentralized substitute distribution with ERIS |
Hi pukkamustard,

pukkamustard <pukkamustard@posteo.net> writes:

> This is an initial patch and proposal towards decentralizing substitute
> distribution with ERIS.

Woohoo, sounds exciting!

> ERIS (Encoding for Robust Immutable Storage) [1] is an encoding of content into
> uniformly sized, encrypted and content-addressed blocks. The original content
> can be reconstructed only with access to a read capability, which can be
> encoded as a URN.
>
> One key advantage of ERIS is that the encoding is protocol agnostic. Any
> protocol that can transfer small (32KiB) blocks referenced by the hash of
> their content will do. This can be done with things such as GNUNet, IPFS,
> OpenDHT, HTTP or a USB stick on a bicycle.

Yes, that’s nice.

> The following patch allows substitutes to be published over IPFS using ERIS.
> This is inspired by and very similar to previous work on distributing
> substitutes over IPFS [2].
>
> The narinfos served by `guix publish` look like this:
>
>   StorePath: /gnu/store/81bdcd5x4v50i28h98bfkvvkx9cky63w-hello-2.10
>   URL: nar/gzip/81bdcd5x4v50i28h98bfkvvkx9cky63w-hello-2.10
>   Compression: gzip
>   FileSize: 67363
>   ERIS: urn:erisx2:BIBC2LUTIQH43S2KRIAV7TBXNUUVPZTMV6KFA2M7AL5V6FNE77VNUDDVDAGJUEEAFATVO2QQT67SMOPTO3LGWCJFU7BZVCF5VXEQQW25BE
>   URL: nar/zstd/81bdcd5x4v50i28h98bfkvvkx9cky63w-hello-2.10
>   Compression: zstd
>   FileSize: 64917
>   ERIS: urn:erisx2:BIBO7KS7SAWHDNC43DVILOSQ3F3SRRHEV6YPLDCSZ7MMD6LZVCHQMEQ6FUBTJAPSNFF7XR5XPTP4OQ72OPABNEO7UYBUN42O46ARKHBTGM

Do we really need one URN per compression method?  Couldn’t we leave
compression (of individual chunks, possibly) as a “detail” handled by
the encoding or the transport layer?

> If the `--ipfs` option is used for `guix publish` then the encoded blocks
> are also uploaded to the IPFS daemon. The nar could then be retrieved from
> anywhere like this:
>
>   (use-modules (eris)
>                (eris blocks ipfs))
>
>   (eris-decode->bytevector
>    "urn:erisx2:BIBC2LUTIQH43S2KRIAV7TBXNUUVPZTMV6KFA2M7AL5V6FNE77VNUDDVDAGJUEEAFATVO2QQT67SMOPTO3LGWCJFU7BZVCF5VXEQQW25BE"
>    eris-blocks-ipfs-ref)
>
> These patches do not yet retrieve content from IPFS (TODO). But in principle,
> anybody connected to IPFS can get the nar with the ERIS URN. This could be used
> to reduce load on substitute servers as they would only need to publish the
> ERIS URN directly - substitutes could be delivered much more peer-to-peer.

Nice.  So adjusting ‘guix substitute’ should be relatively easy?

> Other transports that I have been looking into and am pretty sure will work
> include: HTTP (with RFC 2169 [3]), GNUNet, OpenDHT. This is, imho, the
> advantage of ERIS over IPFS directly or GNUNet directly. The encoding and
> identifiers (URN) are abstracted away from specific transports (and also
> applications). ERIS is almost exactly the same encoding as used in GNUNet
> (ECRS).

As a first step, ‘guix publish’ could implement RFC 2169, too.

I gather implementing the HTTP and IPFS backends in ‘guix substitute’
should be relatively easy, right?

> Blocks can be stored in any kind of database (see for example the GDBM
> bindings [4]).
>
> A tricky thing is figuring out how to multiplex all these different
> transports and storages...

Yes.  We don’t know yet what performance and data availability will be
like on IPFS, for instance, so it’s important for users to be able to
set priorities.  It’s also important to gracefully fall back to direct
HTTP downloads when fancier p2p methods fail, regardless of how they
fail.

> The ERIS specification is still considered "experimental". However we feel
> confident to stabilize it and intend to do so around February/March 2022 with a
> release 1.0.0 of the specification. This will ensure that the identifiers
> remain stable for the foreseeable future (until the crypto breaks). Before that
> there is also a small external security audit of the specification planned
> (thanks to NGI0/NLnet!).

Neat.

This is all very exciting.  I look forward to playing around with it!

Ludo’.
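The narinfo quoted above carries one `ERIS:` field per compression method. As a rough illustration of how a client could pair each URN with its compression, here is a minimal shell sketch. The function name `eris_urns_by_compression` is hypothetical, and the sketch assumes that each `ERIS:` line follows the `Compression:` line of the same nar variant, as in the quoted example; it is not how `guix substitute` parses narinfos.

```shell
# Sketch: print "compression<TAB>URN" pairs from a narinfo read on stdin.
# Assumption: each "ERIS:" line comes after the "Compression:" line of the
# same nar variant, as in the narinfo quoted in the thread.
eris_urns_by_compression() {
    awk '
        $1 == "Compression:" { compression = $2 }          # remember the current variant
        $1 == "ERIS:"        { print compression "\t" $2 } # emit the pair for that variant
    '
}
```

Fed the narinfo quoted above, this prints one line for gzip and one for zstd, which makes it easy to see that the two URNs identify different encoded content.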
Hi Ludo,

Thanks for your comments!

Ludovic Courtès <ludo@gnu.org> writes:

>>   StorePath: /gnu/store/81bdcd5x4v50i28h98bfkvvkx9cky63w-hello-2.10
>>   URL: nar/gzip/81bdcd5x4v50i28h98bfkvvkx9cky63w-hello-2.10
>>   Compression: gzip
>>   FileSize: 67363
>>   ERIS: urn:erisx2:BIBC2LUTIQH43S2KRIAV7TBXNUUVPZTMV6KFA2M7AL5V6FNE77VNUDDVDAGJUEEAFATVO2QQT67SMOPTO3LGWCJFU7BZVCF5VXEQQW25BE
>>   URL: nar/zstd/81bdcd5x4v50i28h98bfkvvkx9cky63w-hello-2.10
>>   Compression: zstd
>>   FileSize: 64917
>>   ERIS: urn:erisx2:BIBO7KS7SAWHDNC43DVILOSQ3F3SRRHEV6YPLDCSZ7MMD6LZVCHQMEQ6FUBTJAPSNFF7XR5XPTP4OQ72OPABNEO7UYBUN42O46ARKHBTGM
>
> Do we really need one URN per compression method?  Couldn’t we leave
> compression (of individual chunks, possibly) as a “detail” handled by
> the encoding or the transport layer?
>

I agree that it would be nice to leave this to the encoding layer as
that would allow certain optimizations (e.g. de-duplication).

Unfortunately, we haven't figured out yet what the most suitable
compression/format would be. Something like EROSFS seems good (as it
aligns data to fixed block sizes) [1]. But this seems a bit "clunky" for
just an archive format and there do not seem to be any libraries that we
could use to neatly integrate. It seems possible to block-align a Tar
archive, but that seems a bit hacky [2]. Other things to look into
might be Tarlz [3] and ZPAQ [4].

To get started I suggest just using one of the compressions/formats
already in Guix. zstd seems to be a reasonable choice (for the same
reasons why it makes sense to use zstd with `--discover` [5]).

Does that sound like a plan?

[1] https://inqlab.net/git/guile-eris.git/tree/examples/dedup-fs/Readme.org
[2] https://unix.stackexchange.com/questions/276908/make-tar-or-other-archive-with-data-block-aligned-like-in-original-files-for/279384#279384
[3] http://lzip.nongnu.org/tarlz.html
[4] http://mattmahoney.net/dc/zpaq.html
[5] https://guix.gnu.org/en/blog/2021/getting-bytes-to-disk-more-quickly/

>> If the `--ipfs` is used for `guix publish` then the encoded blocks are also
>> uploaded to the IPFS daemon. The nar could then be retrieved from anywhere like
>> this:
>>
>>   (use-modules (eris)
>>                (eris blocks ipfs))
>>
>>   (eris-decode->bytevector
>>    "urn:erisx2:BIBC2LUTIQH43S2KRIAV7TBXNUUVPZTMV6KFA2M7AL5V6FNE77VNUDDVDAGJUEEAFATVO2QQT67SMOPTO3LGWCJFU7BZVCF5VXEQQW25BE"
>>    eris-blocks-ipfs-ref)
>>
>> These patches do not yet retrieve content from IPFS (TODO). But in principle,
>> anybody connected to IPFS can get the nar with the ERIS URN. This could be used
>> to reduce load on substitute server as they would only need to publish the ERIS
>> URN directly - substitutes could be delivered much more peer-to-peer.
>
> Nice.  So adjusting ‘guix substitute’ should be relatively easy?

Yes, relatively! :)

I meant to send in a V2 that does this before going on holidays, but I'm
afraid I won't make it. V2 will come in early January!

>> Other transports that I have been looking in to and am pretty sure will work
>> include: HTTP (with RFC 2169 [3]), GNUNet, OpenDHT. This is, imho, the
>> advantage of ERIS over IPFS directly or GNUNet directly. The encoding and
>> identifiers (URN) are abstracted away from specific transports (and also
>> applications). ERIS is almost exactly the same encoding as used in GNUNet
>> (ECRS).
>
> As a first step, ‘guix publish’ could implement RFC 2169, too.
>
> I gather implementing the HTTP and IPFS backends in ‘guix substitute’
> should be relatively easy, right?

Yes, those seem to be the two easiest backends to implement.

>> A tricky thing is figuring out how to multiplex all these different
>> transports and storages...
>
> Yes.  We don’t know yet what performance and data availability will be
> like on IPFS, for instance, so it’s important for users to be able to
> set priorities.  It’s also important to gracefully fall back to direct
> HTTP downloads when fancier p2p methods fail, regardless of how they
> fail.

Agreed.

Thanks,
-pukkamustard
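The priority-and-fallback behaviour agreed on above can be sketched in a few lines of shell. This is only an illustration under stated assumptions: `fetch_with_fallback` and the fetcher names used in the usage note are hypothetical placeholders, not Guix commands, and a fetcher is assumed to write the content to stdout and return non-zero on any failure.

```shell
# Sketch: try each source in the caller's priority order; any failure,
# whatever its cause, simply moves on to the next source.
# Usage: fetch_with_fallback URN FETCHER...   (fetchers are hypothetical)
fetch_with_fallback() {
    urn=$1; shift
    tmp=$(mktemp) || return 1
    for fetcher in "$@"; do
        # Buffer the output so a half-finished transfer is never passed on.
        if "$fetcher" "$urn" >"$tmp"; then
            cat "$tmp"
            rm -f "$tmp"
            return 0
        fi
        echo "$fetcher failed, trying next source" >&2
    done
    rm -f "$tmp"
    return 1
}
```

A call such as `fetch_with_fallback "$urn" fetch_ipfs fetch_http_blocks fetch_plain_nar` would express "IPFS first, then per-block HTTP, then the ordinary nar download"; the order of the arguments is where a user-set priority would plug in.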
Hi!

pukkamustard <pukkamustard@posteo.net> writes:

> Ludovic Courtès <ludo@gnu.org> writes:
>
>>>   StorePath: /gnu/store/81bdcd5x4v50i28h98bfkvvkx9cky63w-hello-2.10
>>>   URL: nar/gzip/81bdcd5x4v50i28h98bfkvvkx9cky63w-hello-2.10
>>>   Compression: gzip
>>>   FileSize: 67363
>>>   ERIS: urn:erisx2:BIBC2LUTIQH43S2KRIAV7TBXNUUVPZTMV6KFA2M7AL5V6FNE77VNUDDVDAGJUEEAFATVO2QQT67SMOPTO3LGWCJFU7BZVCF5VXEQQW25BE
>>>   URL: nar/zstd/81bdcd5x4v50i28h98bfkvvkx9cky63w-hello-2.10
>>>   Compression: zstd
>>>   FileSize: 64917
>>>   ERIS: urn:erisx2:BIBO7KS7SAWHDNC43DVILOSQ3F3SRRHEV6YPLDCSZ7MMD6LZVCHQMEQ6FUBTJAPSNFF7XR5XPTP4OQ72OPABNEO7UYBUN42O46ARKHBTGM
>>
>> Do we really need one URN per compression method?  Couldn’t we leave
>> compression (of individual chunks, possibly) as a “detail” handled by
>> the encoding or the transport layer?
>>
>
> I agree that it would be nice to leave this to the encoding layer as
> that would allow certain optimizations (e.g. de-duplication).
>
> Unfortunately, we haven't figured out yet what the most suitable
> compression/format would be. Something like EROSFS seems good (as it
> aligns data to fixed block sizes) [1]. But this seems a bit "clunky" for
> just an archive format and there do not seem to be any libraries that we
> could use to neatly integrate. It seems possible to block-align a Tar
> archive, but that seems a bit hacky [2]. Other things to look into
> might be Tarlz [3] and ZPAQ [4].

Yeah.  Though it may be that deduplication at the block level doesn’t
buy us much.  That was the conclusion I reached a long time ago[a], and
it also seems to be supported by the recent guix-daemon deduplication
improvements[b].

[a] https://hal.inria.fr/hal-00187069/en
[b] https://issues.guix.gnu.org/24937#20-lineno0

> To get started I suggest just using one of the compressions/formats
> already in Guix. zstd seems to be a reasonable choice (for the same
> reasons why it makes sense to use zstd with `--discover` [5]).
>
> Does that sound like a plan?

Sure!

> I meant to send in a V2 that does this before going on holidays, but I'm
> afraid I won't make it. V2 will come in early January!

Alright, we’ll see!  :-)

Until then, enjoy your holidays!

Ludo’.
Hello Guix,

Here comes the V2 of a proposal towards decentralizing substitute
distribution with ERIS.

A quick summary (as this has become quite long):

- This adds support for publishing and getting substitutes over IPFS.
- By using the ERIS encoding we are not limited to using IPFS as
  transport. We can also use GNUNet, Named Data Networking (possibly) or
  just plain old HTTP. Support for these can be added in (guix eris).
- These patches are still very rough and we need better logic for when to
  use IPFS et al. and when to fall back to HTTP.
- There might be performance issues when using IPFS via the IPFS daemon
  HTTP API.

I found the setup for testing this a bit tricky. I will try and describe
how I have been testing it. Please let me know how this can be improved!

** Authorize local substitutes

We will be running a local substitute server so we need to add the local
signing key to the list of authorized keys. In the system configuration:

#+BEGIN_SRC scheme
(modify-services %base-services
  (guix-service-type
   config => (guix-configuration
              (inherit config)
              (authorized-keys
               (cons*
                ;; allow substitutes from ourselves for testing purposes
                (local-file "/etc/guix/signing-key.pub")
                %default-authorized-guix-keys)))))
#+END_SRC

** Configure the local Guix checkout

#+BEGIN_SRC shell
./bootstrap && ./configure --localstatedir=/var --sysconfdir=/etc
#+END_SRC

The ~--sysconfdir~ is required so that guix will use the ACL in
~/etc/guix/acl~.

** Start the IPFS daemon

#+BEGIN_SRC shell
guix shell go-ipfs -- ipfs daemon
#+END_SRC

Start a local substitute server:

#+BEGIN_SRC shell
sudo -E ./pre-inst-env guix publish \
  --public-key=/etc/guix/signing-key.pub \
  --private-key=/etc/guix/signing-key.sec \
  --cache=/tmp/guix-publish-cache/ --port=8081 --compression=zstd:19
#+END_SRC

We use port 8081 as IPFS is running on 8080. We use the temporary cache
directory ~/tmp/guix-publish-cache~.

** Build some package locally

First we build some package:

#+BEGIN_SRC shell
./pre-inst-env guix build hello --no-substitutes --no-offload
#+END_SRC

#+RESULTS:
: /gnu/store/khaaib6s836bk5kbik239hlk6n6ianc4-hello-2.11

** Trigger the substitute server to "bake" a substitute

#+BEGIN_SRC shell
curl http://localhost:8081/khaaib6s836bk5kbik239hlk6n6ianc4.narinfo
#+END_SRC

--8<---------------cut here---------------start------------->8---
StorePath: /gnu/store/khaaib6s836bk5kbik239hlk6n6ianc4-hello-2.11
URL: nar/zstd/khaaib6s836bk5kbik239hlk6n6ianc4-hello-2.11
Compression: zstd
NarHash: sha256:11pk3jsh4zk0gigyjk881ay1nnvjfgpd3xpb4rmbaljhbiis4jbm
NarSize: 190480
References: 094bbaq6glba86h1d4cj16xhdi6fk2jl-gcc-10.3.0-lib 5h2w4qi9hk1qzzgi1w83220ydslinr4s-glibc-2.33 khaaib6s836bk5kbik239hlk6n6ianc4-hello-2.11
Deriver: mc7i1cdi42gy89mxl48nhdhgrfa9lpq6-hello-2.11.drv
Signature: 1;strawberry;KHNpZ25hdHVyZSAKIChkYXRhIAogIChmbGFncyByZmM2OTc5KQogIChoYXNoIHNoYTI1NiAjOTE0QTVGNTE4NUZGRUIzMzc4QTEwMzgzQzdFMEU1NDI1MEUyREZDRjk1RDUwOTNCMzU4QTFBNDE4OUFBRDVGNCMpCiAgKQogKHNpZy12YWwgCiAgKGVjZHNhIAogICAociAjMDkxMDA2NDlCMkMyMzhEQzE2ODhFQTgyQTdCOEJFMTc5MTVBMjVDQjc1NzcwQjlGRkNGOTFDRTg2MDgyNzAwQiMpCiAgIChzICMwMUFBQ0VERjY0N0VENTQyRTIwNENDMEM1M0VDMEY0QjQ4QzdEOTAyRkFEQTkxREI4NzRGQjE2MTQ4QTIzNUI2IykKICAgKQogICkKIChwdWJsaWMta2V5IAogIChlY2MgCiAgIChjdXJ2ZSBFZDI1NTE5KQogICAocSAjMDRDMkY4ODk1QTU0NDNGNTlCODk2NDEwMEI1MDY0NzU4RjQ1N0YzMENEREE1MTQyQzE0MDc0NjExNTA1NTc5MCMpCiAgICkKICApCiApCg==
--8<---------------cut here---------------end--------------->8---

If you do this again after a few seconds you will get a different
response that has the ERIS URN and the FileSize. The reason for this is
that guix publish bakes the nars asynchronously in the background:

#+BEGIN_SRC shell
curl http://localhost:8081/khaaib6s836bk5kbik239hlk6n6ianc4.narinfo
#+END_SRC

#+RESULTS:
--8<---------------cut here---------------start------------->8---
StorePath: /gnu/store/khaaib6s836bk5kbik239hlk6n6ianc4-hello-2.11
URL: nar/zstd/khaaib6s836bk5kbik239hlk6n6ianc4-hello-2.11
Compression: zstd
FileSize: 57691
NarHash: sha256:11pk3jsh4zk0gigyjk881ay1nnvjfgpd3xpb4rmbaljhbiis4jbm
NarSize: 190480
ERISFormat: application/x-nix-archive+zstd-19
ERIS: urn:erisx2:B4AYPTXLTACB6WJYJ74RKBCVU3RBLHA4PY6HATUWRZNJ6THVSDUFM34K2ASUF3B6EOYEEBRZ5XEUR4PAAAIED7G7YSEZVZ5V7WWZ2PSC7Q
References: 094bbaq6glba86h1d4cj16xhdi6fk2jl-gcc-10.3.0-lib 5h2w4qi9hk1qzzgi1w83220ydslinr4s-glibc-2.33 khaaib6s836bk5kbik239hlk6n6ianc4-hello-2.11
Deriver: mc7i1cdi42gy89mxl48nhdhgrfa9lpq6-hello-2.11.drv
Signature: 1;strawberry;KHNpZ25hdHVyZSAKIChkYXRhIAogIChmbGFncyByZmM2OTc5KQogIChoYXNoIHNoYTI1NiAjOTE0QTVGNTE4NUZGRUIzMzc4QTEwMzgzQzdFMEU1NDI1MEUyREZDRjk1RDUwOTNCMzU4QTFBNDE4OUFBRDVGNCMpCiAgKQogKHNpZy12YWwgCiAgKGVjZHNhIAogICAociAjMDkxMDA2NDlCMkMyMzhEQzE2ODhFQTgyQTdCOEJFMTc5MTVBMjVDQjc1NzcwQjlGRkNGOTFDRTg2MDgyNzAwQiMpCiAgIChzICMwMUFBQ0VERjY0N0VENTQyRTIwNENDMEM1M0VDMEY0QjQ4QzdEOTAyRkFEQTkxREI4NzRGQjE2MTQ4QTIzNUI2IykKICAgKQogICkKIChwdWJsaWMta2V5IAogIChlY2MgCiAgIChjdXJ2ZSBFZDI1NTE5KQogICAocSAjMDRDMkY4ODk1QTU0NDNGNTlCODk2NDEwMEI1MDY0NzU4RjQ1N0YzMENEREE1MTQyQzE0MDc0NjExNTA1NTc5MCMpCiAgICkKICApCiApCg==
--8<---------------cut here---------------end--------------->8---

These patches have added the ERIS and ERISFormat fields. Eventually we
will figure out what the best format for use over ERIS is; for now we
record the format in the ERISFormat field.

** Removing a package from the store

This is necessary in order to make guix look for a substitute.

#+BEGIN_SRC shell
./pre-inst-env guix gc -D /gnu/store/khaaib6s836bk5kbik239hlk6n6ianc4-hello-2.11
#+END_SRC

** Start the Guix daemon from the repository

#+BEGIN_SRC shell
sudo -E ./pre-inst-env guix-daemon --build-users-group=guixbuild --debug --substitute-urls=http://localhost:8081/
#+END_SRC

Note this will probably stop your system Guix daemon. Run ~sudo herd
restart guix-daemon~ to restart it.

#+BEGIN_SRC shell
./pre-inst-env guix build hello
#+END_SRC

--8<---------------cut here---------------start------------->8---
substituting /gnu/store/khaaib6s836bk5kbik239hlk6n6ianc4-hello-2.11...
downloading from urn:erisx2:B4AYPTXLTACB6WJYJ74RKBCVU3RBLHA4PY6HATUWRZNJ6THVSDUFM34K2ASUF3B6EOYEEBRZ5XEUR4PAAAIED7G7YSEZVZ5V7WWZ2PSC7Q ...
 urn:erisx2:B4AYPTXLTACB6WJYJ74RKBCVU3RBLHA4PY6HATUWRZNJ6THVSDUFM34K2ASUF3B6EOYEEBRZ5XEUR4PAAAIED7G7YSEZVZ5V7WWZ2PSC7Q  502KiB/s 00:00 | 56KiB transferred

/gnu/store/khaaib6s836bk5kbik239hlk6n6ianc4-hello-2.11
--8<---------------cut here---------------end--------------->8---

We have just retrieved the substitute for the hello package from IPFS.
Hello decentralized substitutes!

I have only tested this for fairly small packages (up to a few MB).

One issue with IPFS might be that we have to create a new HTTP
connection to the IPFS daemon for every single block (32KiB). The IPFS
daemon does not seem to support HTTP connection re-use and neither does
Guile's (web client). I fear this might become a performance issue.

It seems possible to use IPFS more directly by exposing the Go code as a
C library and then using that with the Guile FFI [1]. This is however a
bit complicated and adds a lot of dependencies. In particular, this
should not become a dependency of Guix itself. The performance of IPFS
itself also needs to be evaluated; maybe the IPFS HTTP API will not be
the bottleneck.

As mentioned in the previous mail, a simple HTTP transport for blocks
would be a good fallback. This would allow users to get missing blocks
(things that somehow got dropped from IPFS) directly from a substitute
server. This is different from getting the entire NAR from a substitute
server: a user might be missing a single 32KiB block and should be able
to get only that. However, such an HTTP fallback would also suffer from
the one-connection-per-block issue. As part of general ERIS research we
are investigating CoAP as a better fallback transport.

In any case, it would be necessary for the substitute server to store
encoded blocks of the NAR. For this I think it makes sense to use a small
database. We have bindings to use ERIS with GDBM [2]. It might also make
sense to use SQLite, especially if there are other use-cases for such a
database.

I will be looking into the HTTP fallback and also using BitTorrent and
GNUNet as transports.

Thanks for making it this far and happy hacking!

-pukkamustard

[1] https://github.com/scala-network/libipfs/
[2] https://codeberg.org/eris/guile-eris/src/branch/main/eris/blocks/gdbm.scm

pukkamustard (5):
  WIP: gnu: guile-eris: Update to unreleased git version.
  publish: Add ERIS URN to narinfo
  Add (guix eris).
  publish: Add support for storing ERIS encoded blocks to IPFS.
  substitute: Fetch substitutes using ERIS.

 Makefile.am                         |  1 +
 configure.ac                        |  5 +++
 gnu/packages/guile-xyz.scm          | 10 ++---
 gnu/packages/package-management.scm |  1 +
 guix/eris.scm                       | 60 +++++++++++++++++++++++++++++
 guix/narinfo.scm                    | 14 +++++--
 guix/scripts/publish.scm            | 32 +++++++++++++--
 guix/scripts/substitute.scm         | 21 +++++++---
 8 files changed, 126 insertions(+), 18 deletions(-)
 create mode 100644 guix/eris.scm
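
To put the one-connection-per-block concern from the cover letter in numbers, here is a small shell sketch that estimates how many 32KiB blocks (and hence how many separate HTTP requests, if connections are not re-used) a nar of a given size needs. The function name `min_blocks` is hypothetical, and the figure is only a lower bound: it ignores any padding and the intermediate nodes an ERIS encoding adds on top of the content blocks.

```shell
# Sketch: lower bound on the number of content blocks for FILE_SIZE bytes.
# Assumption: 32KiB blocks, as stated in the thread; padding and the
# intermediate nodes of the encoding are deliberately ignored.
BLOCK_SIZE=32768

min_blocks() {
    # Ceiling division with integer arithmetic.
    echo $(( ($1 + BLOCK_SIZE - 1) / BLOCK_SIZE ))
}
```

For the hello nar above (`FileSize: 57691`), `min_blocks 57691` gives 2, so the cost is negligible. For a hypothetical 100MiB nar, `min_blocks 104857600` gives 3200, which is where a fresh connection per block would start to matter.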