
[bug#52555,RFC,v2,0/5] Decentralized substitute distribution with ERIS

Message ID 20220125192201.7582-1-pukkamustard@posteo.net

Message

pukkamustard Jan. 25, 2022, 7:21 p.m. UTC
Hello Guix,

Here comes the V2 of a proposal towards decentralizing substitute distribution
with ERIS.

A quick summary (as this has become quite long):

- This adds support for publishing and getting substitutes over IPFS.
- By using the ERIS encoding we are not limited to using IPFS as transport. We
  can also use GNUNet, Named Data Networking (possibly) or just plain old HTTP. Support
  for these can be added in (guix eris).
- These patches are still very rough and we need better logic for when to use
  IPFS et al. and when to fall back to HTTP.
- There might be performance issues when using IPFS via the IPFS daemon HTTP
  API.

I found the setup for testing this a bit tricky. I will try and describe how I
have been testing it. Please let me know how this can be improved!

** Authorize local substitutes

We will be running a local substitute server so we need to add the local
signing key to the list of authorized keys. In the system configuration:

#+BEGIN_SRC scheme
  (modify-services %base-services
    (guix-service-type
     config =>
     (guix-configuration
      (inherit config)
      (authorized-keys
       (cons*
        ;; allow substitutes from ourselves for testing purposes
        (local-file "/etc/guix/signing-key.pub")
        %default-authorized-guix-keys)))))
#+END_SRC

** Configure the local Guix checkout

#+BEGIN_SRC shell
./bootstrap && ./configure --localstatedir=/var --sysconfdir=/etc
#+END_SRC

The ~--sysconfdir~ is required so that guix will use the ACL in ~/etc/guix/acl~.

** Start the IPFS daemon

#+BEGIN_SRC shell
guix shell go-ipfs -- ipfs daemon
#+END_SRC

Start a local substitute server:

#+BEGIN_SRC shell
  sudo -E ./pre-inst-env guix publish --public-key=/etc/guix/signing-key.pub --private-key=/etc/guix/signing-key.sec --cache=/tmp/guix-publish-cache/ --port=8081 --compression=zstd:19
#+END_SRC

We use port 8081 as IPFS is running on 8080.

We use the temporary cache directory ~/tmp/guix-publish-cache~.

** Build some package locally

First we build some package:

#+BEGIN_SRC shell
./pre-inst-env guix build hello --no-substitutes --no-offload
#+END_SRC

#+RESULTS:
: /gnu/store/khaaib6s836bk5kbik239hlk6n6ianc4-hello-2.11

** Trigger the substitute server to "bake" a substitute

#+BEGIN_SRC shell
curl http://localhost:8081/khaaib6s836bk5kbik239hlk6n6ianc4.narinfo
#+END_SRC
--8<---------------cut here---------------start------------->8---
StorePath: /gnu/store/khaaib6s836bk5kbik239hlk6n6ianc4-hello-2.11
URL: nar/zstd/khaaib6s836bk5kbik239hlk6n6ianc4-hello-2.11
Compression: zstd
NarHash: sha256:11pk3jsh4zk0gigyjk881ay1nnvjfgpd3xpb4rmbaljhbiis4jbm
NarSize: 190480
References: 094bbaq6glba86h1d4cj16xhdi6fk2jl-gcc-10.3.0-lib 5h2w4qi9hk1qzzgi1w83220ydslinr4s-glibc-2.33 khaaib6s836bk5kbik239hlk6n6ianc4-hello-2.11
Deriver: mc7i1cdi42gy89mxl48nhdhgrfa9lpq6-hello-2.11.drv
Signature: 1;strawberry;KHNpZ25hdHVyZSAKIChkYXRhIAogIChmbGFncyByZmM2OTc5KQogIChoYXNoIHNoYTI1NiAjOTE0QTVGNTE4NUZGRUIzMzc4QTEwMzgzQzdFMEU1NDI1MEUyREZDRjk1RDUwOTNCMzU4QTFBNDE4OUFBRDVGNCMpCiAgKQogKHNpZy12YWwgCiAgKGVjZHNhIAogICAociAjMDkxMDA2NDlCMkMyMzhEQzE2ODhFQTgyQTdCOEJFMTc5MTVBMjVDQjc1NzcwQjlGRkNGOTFDRTg2MDgyNzAwQiMpCiAgIChzICMwMUFBQ0VERjY0N0VENTQyRTIwNENDMEM1M0VDMEY0QjQ4QzdEOTAyRkFEQTkxREI4NzRGQjE2MTQ4QTIzNUI2IykKICAgKQogICkKIChwdWJsaWMta2V5IAogIChlY2MgCiAgIChjdXJ2ZSBFZDI1NTE5KQogICAocSAjMDRDMkY4ODk1QTU0NDNGNTlCODk2NDEwMEI1MDY0NzU4RjQ1N0YzMENEREE1MTQyQzE0MDc0NjExNTA1NTc5MCMpCiAgICkKICApCiApCg==
--8<---------------cut here---------------end--------------->8---

If you do this again after a few seconds you will get a different response that
has the ERIS URN and the FileSize. The reason for this is that guix publish
bakes the nars asynchronously in the background:

#+BEGIN_SRC shell
curl http://localhost:8081/khaaib6s836bk5kbik239hlk6n6ianc4.narinfo
#+END_SRC

#+RESULTS:
--8<---------------cut here---------------start------------->8---
StorePath: /gnu/store/khaaib6s836bk5kbik239hlk6n6ianc4-hello-2.11
URL: nar/zstd/khaaib6s836bk5kbik239hlk6n6ianc4-hello-2.11
Compression: zstd
FileSize: 57691
NarHash: sha256:11pk3jsh4zk0gigyjk881ay1nnvjfgpd3xpb4rmbaljhbiis4jbm
NarSize: 190480
ERISFormat: application/x-nix-archive+zstd-19
ERIS: urn:erisx2:B4AYPTXLTACB6WJYJ74RKBCVU3RBLHA4PY6HATUWRZNJ6THVSDUFM34K2ASUF3B6EOYEEBRZ5XEUR4PAAAIED7G7YSEZVZ5V7WWZ2PSC7Q
References: 094bbaq6glba86h1d4cj16xhdi6fk2jl-gcc-10.3.0-lib 5h2w4qi9hk1qzzgi1w83220ydslinr4s-glibc-2.33 khaaib6s836bk5kbik239hlk6n6ianc4-hello-2.11
Deriver: mc7i1cdi42gy89mxl48nhdhgrfa9lpq6-hello-2.11.drv
Signature: 1;strawberry;KHNpZ25hdHVyZSAKIChkYXRhIAogIChmbGFncyByZmM2OTc5KQogIChoYXNoIHNoYTI1NiAjOTE0QTVGNTE4NUZGRUIzMzc4QTEwMzgzQzdFMEU1NDI1MEUyREZDRjk1RDUwOTNCMzU4QTFBNDE4OUFBRDVGNCMpCiAgKQogKHNpZy12YWwgCiAgKGVjZHNhIAogICAociAjMDkxMDA2NDlCMkMyMzhEQzE2ODhFQTgyQTdCOEJFMTc5MTVBMjVDQjc1NzcwQjlGRkNGOTFDRTg2MDgyNzAwQiMpCiAgIChzICMwMUFBQ0VERjY0N0VENTQyRTIwNENDMEM1M0VDMEY0QjQ4QzdEOTAyRkFEQTkxREI4NzRGQjE2MTQ4QTIzNUI2IykKICAgKQogICkKIChwdWJsaWMta2V5IAogIChlY2MgCiAgIChjdXJ2ZSBFZDI1NTE5KQogICAocSAjMDRDMkY4ODk1QTU0NDNGNTlCODk2NDEwMEI1MDY0NzU4RjQ1N0YzMENEREE1MTQyQzE0MDc0NjExNTA1NTc5MCMpCiAgICkKICApCiApCg==
--8<---------------cut here---------------end--------------->8---

These patches have added the ERIS and ERISFormat fields. Eventually we will
have figured out which format works best over ERIS; for now we encode
it in the ERISFormat field.
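
To see whether the baking has finished, it can help to pull just one field out of the narinfo. A minimal sketch, assuming a POSIX shell with awk; ~narinfo_field~ is a hypothetical helper and not part of these patches:

```shell
# narinfo_field KEY
# Print the value of the first "KEY: value" line read from stdin.
# Exits non-zero when the field is absent (e.g. ERIS before baking is done).
# Hypothetical helper for illustration; not part of Guix or these patches.
narinfo_field() {
  awk -v key="$1" '
    index($0, key ": ") == 1 {
      print substr($0, length(key) + 3)
      found = 1
      exit
    }
    END { exit !found }'
}

# Example against the local "guix publish" instance started above:
#   curl -s http://localhost:8081/khaaib6s836bk5kbik239hlk6n6ianc4.narinfo \
#     | narinfo_field ERIS
```

Matching on the full "KEY: " prefix keeps ERIS and ERISFormat apart.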

** Removing a package from the store

This is necessary in order to make guix look for a substitute.

#+BEGIN_SRC shell
./pre-inst-env guix gc -D /gnu/store/khaaib6s836bk5kbik239hlk6n6ianc4-hello-2.11
#+END_SRC

** Start the Guix daemon from the repository

#+BEGIN_SRC shell
sudo -E ./pre-inst-env guix-daemon --build-users-group=guixbuild  --debug  --substitute-urls=http://localhost:8081/
#+END_SRC

Note this will probably stop your system Guix daemon. Run ~sudo herd restart
guix-daemon~ to restart it.

#+BEGIN_SRC shell
./pre-inst-env guix build hello
#+END_SRC
--8<---------------cut here---------------start------------->8---
substituting /gnu/store/khaaib6s836bk5kbik239hlk6n6ianc4-hello-2.11...
downloading from urn:erisx2:B4AYPTXLTACB6WJYJ74RKBCVU3RBLHA4PY6HATUWRZNJ6THVSDUFM34K2ASUF3B6EOYEEBRZ5XEUR4PAAAIED7G7YSEZVZ5V7WWZ2PSC7Q ...
 urn:erisx2:B4AYPTXLTACB6WJYJ74RKBCVU3RBLHA4PY6HATUWRZNJ6THVSDUFM34K2ASUF3B6EOYEEBRZ5XEUR4PAAAIED7G7YSEZVZ5V7WWZ2PSC7Q                                        502KiB/s 00:00 | 56KiB transferred

/gnu/store/khaaib6s836bk5kbik239hlk6n6ianc4-hello-2.11
--8<---------------cut here---------------end--------------->8---

We have just retrieved the substitute for the hello package from IPFS. Hello
decentralized substitutes!

I have only tested this for fairly small packages (up to a few MB).

One issue with IPFS might be that we have to create a new HTTP connection to
the IPFS daemon for every single block (32KiB).  The IPFS daemon does not seem
to support HTTP connection re-use and neither does the Guile (web client).  I
fear this might become a performance issue. It seems possible to use IPFS more
directly by exposing the Go code as a C library and then using that with the
Guile FFI [1]. This is however a bit complicated and adds a lot of
dependencies. In particular, this should not become a dependency of Guix
itself. The performance of IPFS itself also needs to be evaluated; maybe the
IPFS HTTP API will not be the bottleneck.

As mentioned in the previous mail, a simple HTTP transport for blocks would be a
good fallback. This would allow users to get missing blocks (things that
somehow got dropped from IPFS) directly from a substitute server. This is
different from getting the entire NAR from a substitute server. A user might be
missing a single 32KiB block and should be able to get only that. However, such
an HTTP fallback would also suffer from the one-connection-per-block issue. As
part of general ERIS research we are investigating CoAP as a better fallback
transport.

In any case, it would be necessary for the substitute server to store encoded
blocks of the NAR. For this I think it makes sense to use a small database. We
have bindings to use ERIS with GDBM [2]. It might also make sense to use
SQLite, especially if there are other use-cases for such a database.

I will be looking into the HTTP fallback and also using BitTorrent and GNUNet
as transports.

Thanks for making it this far and happy hacking!
-pukkamustard


[1] https://github.com/scala-network/libipfs/
[2] https://codeberg.org/eris/guile-eris/src/branch/main/eris/blocks/gdbm.scm

pukkamustard (5):
  WIP: gnu: guile-eris: Update to unreleased git version.
  publish: Add ERIS URN to narinfo
  Add (guix eris).
  publish: Add support for storing ERIS encoded blocks to IPFS.
  substitute: Fetch substitutes using ERIS.

 Makefile.am                         |  1 +
 configure.ac                        |  5 +++
 gnu/packages/guile-xyz.scm          | 10 ++---
 gnu/packages/package-management.scm |  1 +
 guix/eris.scm                       | 60 +++++++++++++++++++++++++++++
 guix/narinfo.scm                    | 14 +++++--
 guix/scripts/publish.scm            | 32 +++++++++++++--
 guix/scripts/substitute.scm         | 21 +++++++---
 8 files changed, 126 insertions(+), 18 deletions(-)
 create mode 100644 guix/eris.scm

Comments

M Jan. 29, 2022, 9 p.m. UTC | #1
pukkamustard wrote on Tue 25-01-2022 at 19:21 [+0000]:
> ** Authorize local substitutes
> 
> We will be running a local substitute server so we need to add the
> local
> signing key to the list of authorized keys. In the system
> configurations:
> 
> #+BEGIN_SRC scheme
>   (modify-services %base-services
>     (guix-service-type config => [...]))
> #+END_SRC
> [...]
> ** Start the IPFS daemon
> 
> #+BEGIN_SRC shell
> guix shell go-ipfs -- ipfs daemon
> #+END_SRC

There's an ipfs-service-type nowadays, so starting the daemon manually
isn't required (if using Guix System).

Greetings,
Maxime.
M Jan. 29, 2022, 9:08 p.m. UTC | #2
pukkamustard wrote on Tue 25-01-2022 at 19:21 [+0000]:
> I will be looking into the HTTP fallback and also using BitTorrent and GNUNet
> as transports.

I have been writing a (Guile) Scheme port of GNUnet's client libraries
(https://git.gnunet.org/gnunet-scheme.git/).  Currently only NSE is
supported, but I'm working on DHT.  DHT search/put already works to a
degree (see examples/web.scm), but there are plenty of sharp edges
(see TODOs about disconnecting, reconnecting and stopping fibers,
and see guix.scm for Guile bugs that are patched out and extra guile-
fibers features).

Tests are being written and edge cases will be addressed.

Greetings,
Maxime.
M Jan. 29, 2022, 9:52 p.m. UTC | #3
Hi,

Is it possible for the following situation to happen?
If not, why not?

  1. server A is authentic
  2. server M is malicious, it tries to trick the client into
     installing an incorrect substitute
  3. (key of) server A is authorised
  4. (key of) server M is _not_ authorised
  5. server A and M are both in substitute-urls
  6. server A only serves ‘classical’ substitutes, server M also serves
     via ERIS+ipfs
  7. Both A and M set the same FileHash, References, etc. in the
     narinfo
  8. However, M set an ERIS URN pointing to a backdoored substitute.
  9. The client trusts A, and A and M have the same FileHash etc.,
     so the client considers the narinfo of M to be authentic
     because it has the same FileHash.
 10. The client prefers ERIS above HTTP(S), so it downloads via M.
 11. The client now installed a backdoored substitute!

Greetings,
Maxime.
M Jan. 30, 2022, 11:46 a.m. UTC | #4
pukkamustard wrote on Tue 25-01-2022 at 19:21 [+0000]:
> I have only tested this for fairly small packages (up to a few MB).
> 
> One issue with IPFS might be that we have to create a new HTTP connection to
> the IPFS daemon for every single block (32KiB).  The IPFS daemon does not seem
> to support HTTP connection re-use

According to <https://github.com/ipfs/go-ipfs/issues/3767>, the IPFS
daemon supports connection reuse according to some people and doesn't
according to other people.

> and neither does the Guile (web client).

Guix supports connection reuse, see 'call-with-cached-connection'
in (guix scripts substitute).

> I fear this might become a performance issue.

IIUC, the performance problem primarily lies in the round-tripping
between the client and the server.  If the client and the server are on
the same machine, then this round trip time is presumably small
compared to, say, localhost contacting ci.guix.gnu.org.

Still, connection reuse would be nice.

> It seems possible to use IPFS more directly by exposing the Go code as a
> C library and then using that with the Guile FFI [1]. This is however a bit
> complicated and adds a lot of dependencies. In particular, this should not become
> a dependency of Guix itself. The performance of IPFS itself also needs to be
> evaluated, maybe the IPFS HTTP API will not be the bottle-neck.

Security-wise, libipfs doesn't seem great: libipfs starts the IPFS
daemon inside the process and guix/scripts/substitute.scm is run
as root.

> As mentioned in previous mail a simple HTTP transport for blocks would be a
> good fallback. This would allow users to get missing blocks (things that
> somehow got dropped from IPFS) directly from a substitute server. [...]

Seems a good idea to me -- DHTs can be unreliable.  I presume this will
be implemented with some kind of timeout: if no block is received
within N seconds, fallback to HTTP?

Also, don't forget to insert this missing block back into
IPFS/GNUnet/BitTorrent/..., otherwise fewer and fewer blocks will be
available until nothing is available anymore.

> In any case, it would be necessary for the substitute server to store encoded
> blocks of the NAR. For this I think it makes sense to use a small database. We
> have bindings to use ERIS with GDBM [2]. It might also make sense to use
> SQLite, especially if there are other use-cases for such a database.

Wouldn't this be a huge database?  IIRC, according to logs.guix.gnu.org
the size of the nars of the substitute servers is somewhere in the
200G-2T range or something like that.

To reduce the size of the database, perhaps you could let the database
be a mapping from block ids to the name of the nar + the position in
the nar, and encode the block on-demand?

The database doesn't seem necessary, the substitute server could have
some end-point

  /publish-this-nar-again-into-IPFS/name-of-the-nar

which, when contacted, inserts the nar again into IPFS.  Then when a
block was unavailable, the client contacts this end-point and retries.

Greetings,
Maxime.
pukkamustard Feb. 2, 2022, 9:50 a.m. UTC | #5
Hi Maxime,

Maxime Devos <maximedevos@telenet.be> writes:

> There's an ipfs-service-type nowadays, so starting the daemon manually
> isn't required (if using Guix System).

Good point. Starting the daemon manually is only necessary if you don't
use the service. I don't use the IPFS service.

-pukkamustard
pukkamustard Feb. 2, 2022, 9:56 a.m. UTC | #6
Maxime Devos <maximedevos@telenet.be> writes:

> pukkamustard wrote on Tue 25-01-2022 at 19:21 [+0000]:
>> I will be looking into the HTTP fallback and also using BitTorrent and GNUNet
>> as transports.
>
> I have been writing a (Guile) Scheme port of GNUnet's client libraries
> (https://git.gnunet.org/gnunet-scheme.git/).  Currently only NSE is
> supported, but I'm working on DHT.  DHT search/put already works to a
> degree (see examples/web.scm), but there are plenty of sharp edges
> (see TODOs about disconnecting, reconnecting and stopping fibers,
> and see guix.scm for Guile bugs that are patched out and extra guile-
> fibers features).

Very interesting! I have been following your work on that a bit.

From what I understand gnunet-scheme interacts with the GNUNet services
and sends messages to the various GNUNet services. Is that correct?

Have you considered implementing the GNUNet protocols themselves in
Guile? I.e. instead of connecting with the GNUNet services and sending
messages, implement R5N completely in Guile. IMHO this would be very
nice as one could use GNUNet protocols completely in Guile and not rely
on the GNUNet C code.

I believe this is somewhat the direction being taken with the GNUNet Go
implementation (https://github.com/bfix/gnunet-go) and also in line with
recent efforts to specify the individual GNUNet components and protocols
more independently of one another (e.g. R5N is specified to work over IP
- https://lsd.gnunet.org/lsd0004/).

-pukkamustard
pukkamustard Feb. 2, 2022, 10:51 a.m. UTC | #7
Maxime Devos <maximedevos@telenet.be> writes:

> pukkamustard wrote on Tue 25-01-2022 at 19:21 [+0000]:
>> I have only tested this for fairly small packages (up to a few MB).
>> 
>> One issue with IPFS might be that we have to create a new HTTP connection to
>> the IPFS daemon for every single block (32KiB).  The IPFS daemon does not seem
>> to support HTTP connection re-use
>
> According to <https://github.com/ipfs/go-ipfs/issues/3767>, the IPFS
> daemon supports connection reuse according to some people and doesn't
> according to other people.

Hm, from what I understand connection re-use is something introduced in
HTTP/2 and go-ipfs does not do HTTP/2
(https://github.com/ipfs/go-ipfs/issues/5974).

>> and neither does the Guile (web client).
>
> Guix supports connection reuse, see 'call-with-cached-connection'
> in (guix scripts substitute).

Ah ok. Cool!

>> I fear this might become a performance issue.
>
> IIUC, the performance problem primarily lies in the round-tripping
> between the client and the server.  If the client and the server are on
> the same machine, then this round trip time is presumably small
> compared to, say, localhost contacting ci.guix.gnu.org.
>
> Still, connection reuse would be nice.

Remains to be seen if this is a problem.

The effect is considerably more pronounced than with regular usage of IPFS, as
we make an HTTP request to IPFS for every 32KiB block instead of for an
entire file (what most people do when using the IPFS daemon).

>> It seems possible to use IPFS more directly by exposing the Go code as a
>> C library and then using that with the Guile FFI [1]. This is however a bit
>> complicated and adds a lot of dependencies. In particular, this should not become
>> a dependency of Guix itself. The performance of IPFS itself also needs to be
>> evaluated, maybe the IPFS HTTP API will not be the bottle-neck.
>
> Security-wise, libipfs doesn't seem great: libipfs starts the IPFS
> daemon inside the process and guix/scripts/substitute.scm is run
> as root.

I agree.

>> As mentioned in previous mail a simple HTTP transport for blocks would be a
>> good fallback. This would allow users to get missing blocks (things that
>> somehow got dropped from IPFS) directly from a substitute server. [...]
>
> Seems a good idea to me -- DHTs can be unreliable.  I presume this will
> be implemented with some kind of timeout: if no block is received
> within N seconds, fallback to HTTP?

Yes, exactly.
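
A rough sketch of that fallback logic, in shell for brevity. Each transport is a command that prints the block for a given reference and fails (for example after its own timeout) when it cannot deliver it. ~fetch_with_fallback~ and the transport names in the usage comment are hypothetical; nothing here is part of the patches.

```shell
# fetch_with_fallback REF TRANSPORT...
# Run each TRANSPORT with REF as its only argument until one succeeds, then
# print that transport's output (the block) on stdout.
fetch_with_fallback() {
  ref=$1
  shift
  tmp=$(mktemp) || return 1
  for transport in "$@"; do
    # Buffer the output so that a transport failing half-way through does not
    # leak a partial block to the caller.
    if "$transport" "$ref" > "$tmp" 2>/dev/null; then
      cat "$tmp"
      rm -f "$tmp"
      return 0
    fi
  done
  rm -f "$tmp"
  return 1
}

# Usage, with two hypothetical transports that enforce their own time limit
# (for instance via "curl --max-time N"):
#   fetch_with_fallback "$block_ref" fetch_from_ipfs fetch_from_http
```

Keeping the timeout inside each transport means the ordering of transports is the only policy this function has to know about.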

> Also, don't forget to insert this missing block back into
> IPFS/GNUnet/BitTorrent/..., otherwise less and less blocks will be
> available until nothing is available anymore.

This might be a bit of a burden for users. As you mention the size of
such a database might become considerable.

>> In any case, it would be necessary for the substitute server to store encoded
>> blocks of the NAR. For this I think it makes sense to use a small database. We
>> have bindings to use ERIS with GDBM [2]. It might also make sense to use
>> SQLite, especially if there are other use-cases for such a database.
>
> Wouldn't this be a huge database?  IIRC, according to logs.guix.gnu.org
> the size of the nars of the substitute servers are somewhere in the
> 200G-2T range or something like that.
>
> To reduce the size of the database, perhaps you could let the database
> be a mapping from block ids to the name of the nar + the position in
> the nar, and encode the block on-demand?

Yes! I've also been thinking of this - an "in-file" block store. I think
this makes a lot of sense for Guix but also other things (e.g. sharing
your music collection).

Another problem with IPFS/GNUNet is that they have their own storage. So
even if we are clever about storing blocks in Guix, IPFS and GNUNet will
have their own copy of the blocks on disk. I think it would be much
nicer if DHTs/transport layers don't do block storage but are provided
with a callback from where they can get stored blocks. I believe this is
what OpenDHT does
(https://github.com/savoirfairelinux/opendht/wiki/API-Overview).

I think we should propose such a change to the GNUNet R5N specification
(https://lsd.gnunet.org/lsd0004/).

> The database doesn't seem necessary, the substitute server could have
> some end-point
>
>   /publish-this-nar-again-into-IPFS/name-of-the-nar
>
> which, when contacted, inserts the nar again into IPFS.  Then when a
> block was unavailable, the client contacts this end-point and retries.

But for an HTTP block endpoint we would still need such a database/block
storage.

I think it is important that we do not rely on IPFS for block
storage. The decentralized block distribution should work even if the
IPFS daemon is not available.

-pukkamustard
M Feb. 2, 2022, 11:09 a.m. UTC | #8
pukkamustard wrote on Wed 02-02-2022 at 09:56 [+0000]:
> Maxime Devos <maximedevos@telenet.be> writes:
> 
> > pukkamustard wrote on Tue 25-01-2022 at 19:21 [+0000]:
> > > I will be looking into the HTTP fallback and also using BitTorrent and GNUNet
> > > as transports.
> > 
> > I have been writing a (Guile) Scheme port of GNUnet's client libraries
> > (https://git.gnunet.org/gnunet-scheme.git/).  Currently only NSE is
> > supported, but I'm working on DHT.  DHT search/put already works to a
> > degree (see examples/web.scm), but there are plenty of sharp edges
> > (see TODOs about disconnecting, reconnecting and stopping fibers,
> > and see guix.scm for Guile bugs that are patched out and extra guile-
> > fibers features).
> 
> Very interesting! I have been following your work on that a bit.
> 
> From what I understand gnunet-scheme interacts with the GNUNet services
> and sends messages to the various GNUNet services. Is that correct?

Yes, it works like the C GNUnet client libraries, except it's in Guile
Scheme and a few different design decisions were made, e.g. w.r.t.
concurrency.

> Have you considered implementing the GNUNet protocols themselves in
> Guile?  I.e. instead of connecting with the GNUNet services and sending
> messages, implement R5N completely in Guile.

I didn't, at least not _yet_.  As-is, things are already complicated
enough and the client code seems a lot simpler than the service code.
Though perhaps in the future ...

E.g., for testing the DHT service, the test code effectively creates a
tiny, limited, in-memory DHT service (not communicating to any peers)
that's buggy in some respects (not yet committed, but will be in
tests/distributed-hash-table.scm).

> IMHO this would be very nice as one could use GNUNet protocols completely
> in Guile and not rely on the GNUNet C code.

While it's not a priority, I'm not opposed to someday implementing the
services in Guile and testing whether they can communicate with C
peers.

However, keep in mind that GNUnet is supposed to be able to eventually
replace the TCP/IP stack, and running the DHT, NSE, NAT, FS, CADET,
TRANSPORT ... services in every web browser, in every mail client, in
all "guix substitute" and "guix perform-download" processes, etc. is
rather wasteful (memory-wise and CPU-wise), so I'd prefer this not to
be the _default_ option.

(I'm not sure if you were referring to that.)

Greetings,
Maxime.
pukkamustard Feb. 2, 2022, 11:10 a.m. UTC | #9
Maxime Devos <maximedevos@telenet.be> writes:

> Hi,
>
> Is it possible for the following situation to happen?
> If not, why not?
>
>   1. server A is authentic
>   2. server M is malicious, it tries to trick the client into
>      installing an incorrect substitute
>   3. (key of) server A is authorised
>   4. (key of) server M is _not_ authorised
>   5. server A and M are both in substitute-urls
>   6. server A only serves ‘classical’ substitutes, server M also serves
>      via ERIS+ipfs
>   7. Both A and M set the same FileHash, References, etc. in the
>      narinfo
>   8. However, M set an ERIS URN pointing to a backdoored substitute.
>   9. The client trusts A, and A and M have the same FileHash etc.,
>      so the client considers the narinfo of M to be authentic
>      because it has the same FileHash.
>  10. The client prefers ERIS above HTTP(S), so it downloads via M.
>  11. The client now installed a backdoored substitute!
>
> Greetings,
> Maxime.

No, this should not work.

The ERIS URN is only used if the entire narinfo is signed with an
authorized signature. The FileHash is not used when getting substitutes
via ERIS (being able to decode ERIS content implies integrity).

The interesting case that would be allowed with ERIS is the following:

1. Server A is authentic and its key is authorized.
2. Servers M1 to MN are potentially malicious and their keys are not
   authorized.
3. Server A and servers M1 to MN are in the substitute-urls.
4. Client gets Narinfo from server A and uses the ERIS URN from there.
5. Client can get blocks simultaneously from Server A and servers M1 to
   MN.
6. Client decodes content with the ERIS URN and can be sure that they
   have the valid substitute.

So client only needs to trust A but can use M1-MN (simultaneously) for
fetching the content.
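
The reason untrusted servers are acceptable here is that every block can be checked against a reference the client already holds. A minimal sketch of that idea, assuming blocks are addressed by a hash of their bytes; ~sha256sum~ stands in for whatever hash function ERIS actually specifies, and ~verify_block~ is a hypothetical helper:

```shell
# verify_block FILE EXPECTED_HEX
# Succeed only if the bytes in FILE hash to EXPECTED_HEX.
# sha256sum is a stand-in here, not necessarily the hash ERIS uses.
verify_block() {
  actual=$(sha256sum < "$1" | cut -d ' ' -f 1) || return 1
  [ "$actual" = "$2" ]
}

# Usage: keep a block fetched from any server only if it verifies.
#   verify_block downloaded-block "$expected_reference" || rm -f downloaded-block
```

A block from M1..MN that fails this check is simply discarded and requested elsewhere, so a malicious server can waste bandwidth but not alter content.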

-pukkamustard
M Feb. 2, 2022, 11:27 a.m. UTC | #10
pukkamustard wrote on Wed 02-02-2022 at 10:51 [+0000]:
> > The database doesn't seem necessary, the substitute server could
> > have
> > some end-point
> > 
> >    /publish-this-nar-again-into-IPFS/name-of-the-nar
> > 
> > which, when contacted, inserts the nar again into IPFS.  Then when
> > a
> > block was unavailable, the client contacts this end-point and
> > retries.
> 
> But for a HTTP block endpoint we would still need such a
> database/block
> storage.
> 
> I think it is important that we do not rely on IPFS for block
> storage. The decentralized block distribution should work even if the
> IPFS daemon is not available.

Do we need a database at all?

E.g., if the client cannot download the data in the range [start, end]
because the corresponding block has disappeared, can it not simply
download that range from https://ci.guix.gnu.org/nar/[...]
(not sure about the URI) using an HTTP range request?
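
The request itself would be simple; whether a byte range can be mapped back to a particular block is a separate question. A sketch assuming 32KiB blocks laid out back to back in the file that was ERIS-encoded; ~block_range~ is a hypothetical helper and the nar URL is left as a placeholder:

```shell
BLOCK_SIZE=32768   # 32KiB, the block size mentioned earlier in the thread

# block_range INDEX
# Print the inclusive byte range "START-END" of the INDEX-th block,
# counting from 0.
block_range() {
  start=$(( $1 * BLOCK_SIZE ))
  end=$(( start + BLOCK_SIZE - 1 ))
  echo "$start-$end"
}

# Usage with curl's --range option; NAR_URL is a placeholder for the nar's
# location as given in the narinfo:
#   curl --fail --range "$(block_range 1)" --output block-1.bin "$NAR_URL"
```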

(Afterwards, the client should insert the block(s) back into
IPFS/GNUnet/whatever, maybe using this proposed ‘in-file block store’
such that other clients (using the same DHT mechanism) can benefit.)

Greetings,
Maxime.
pukkamustard Feb. 2, 2022, 12:42 p.m. UTC | #11
Maxime Devos <maximedevos@telenet.be> writes:

>> I think it is important that we do not rely on IPFS for block
>> storage. The decentralized block distribution should work even if the
>> IPFS daemon is not available.
>
> Do we need a database at all?
>
> E.g., if the client cannot download the data in the range [start, end]
> because the corresponding block has disappeared, can it not simply
> download that range from https://ci.guix.gnu.org/nar/[...]
> (not sure about the URI) using an HTTP range request?

This does not work as the mapping from block reference to location in
NAR can not be known by the client who only holds the ERIS
URN. Furthermore, some blocks will be intermediary nodes - they hold
references to content blocks (or other intermediary nodes) but not
content itself.

> (Afterwards, the client should insert the block(s) back into
> IPFS/GNUnet/whatever, maybe using this proposed ‘in-file block store’
> such that other clients (using the same DHT mechanism) can benefit.)

It might make sense for some clients to make content available to other
clients and to go through the extra effort of putting blocks back into
IPFS/GNUNet/whatever. But this should be optional. Maybe we can call
such clients "caching peers"?

IMO a client should by default only deal with things that are strictly
necessary for getting substitutes. The substitute servers (and caching
peers) should make sure substitutes are available to clients, whether
over IPFS/GNUNet/whatever or plain old HTTP.

-pukkamustard
M Feb. 2, 2022, 3:07 p.m. UTC | #12
pukkamustard wrote on Wed 02-02-2022 at 12:42 [+0000]:
> > (Afterwards, the client should insert the block(s) back into
> > IPFS/GNUnet/whatever, maybe using this proposed ‘in-file block
> > store’
> > such that other clients (using the same DHT mechanism) can
> > benefit.)
> 
> It might make sense for some clients to make content available to
> other
> clients and to go through the extra effort of putting blocks back into
> IPFS/GNUNet/whatever. But this should be optional. Maybe we can call
> such clients "caching peers"?
> 
> IMO a client should by default only deal with things that are
> strictly
> necessary for getting substitutes. The substitute servers (and
> caching
> peers) should make sure substitutes are available to clients, whether
> over IPFS/GNUNet/whatever or plain old HTTP.

If re-inserting missing blocks back into the IPFS/GNUnet/whatever is
made optional and is off by default, then almost nobody will enable the
‘caching peer’ option and we will have freeloaders, somewhat defeating
the point of GNUnet/whatever.

In a classic setting (‘plain old HTTP’), serving and downloading is a
separate thing.  But in a P2P setting, downloading cannot be separated
from uploading -- strictly speaking, a peer might be able to download
without uploading (depending on the P2P system), but that's anti-
social, not something that should be done by default.

However, if re-inserting missing blocks is _on_ by default, then there
doesn't seem to be any trouble.

Greetings,
Maxime.
M Feb. 2, 2022, 3:27 p.m. UTC | #13
pukkamustard wrote on Wed 02-02-2022 at 12:42 [+0000]:
> > E.g., if the client cannot download the data in the range [start,
> > end]
> > because the corresponding block has disappeared, can it not simply
> > download that range from https://ci.guix.gnu.org/nar/[...]
> > (not sure about the URI) using an HTTP range request?
> 
> This does not work as the mapping from block reference to location in
> NAR can not be known by the client who only holds the ERIS
> URN.

The client not only knows the ERIS URN, it also knows the location of
the nar (over classical HTTP) because it's in the narinfo.

> Furthermore, some blocks will be intermediary nodes - they hold
> references to content blocks (or other intermediary nodes) but not
> content itself.

If an intermediary node (responsible for, say, bytes 900--10000)
is missing, then the bytes 900--10000 could be downloaded via HTTP.
Whether the node is close to the top, or close to the bottom, in ERIS'
variant of Merkle trees, doesn't matter much.

Granted, if the nar is, say, 1 GiB, and the top-level block is missing,
then we'll have to download 1 GiB over HTTP, even if most lower blocks
exist on IPFS/GNUnet/whatever, which isn't really great.

We could also do some combination of the GDBM database and HTTP
Content-Range requests: most nodes are leaf nodes (*).  Instead of
representing all nodes in the database, we could include only
(intermediate) nodes responsible for data of size, say, 4MiB.

(*) At least, that's the case for binary trees, presumably something
similar holds for ERIS.

I don't know the specifics for ERIS, but for (balanced) binary trees,
not storing the leaf nodes would save about 50% (**), which is a rather
nice space saving.

(**) This assumes the ‘block size’ is the size for storing two pointers
to the children, but in practice the block size would be quite a bit
larger, so there would be more space savings?
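
The 50% figure can be checked against a wider tree with a short count
of nodes per level.  This is a rough sketch that assumes 32 KiB blocks
with 64-byte reference-key pairs (512 children per node) and ignores
padding; nodes-per-level and leaf-fraction are hypothetical helpers.

```scheme
;; Assumption: BLOCK-SIZE and REF-SIZE defaults are illustrative and
;; padding is ignored.
(define* (nodes-per-level size #:key (block-size 32768) (ref-size 64))
  "Return the number of nodes on each level for SIZE bytes of content,
content blocks first, root last."
  (let ((arity (quotient block-size ref-size)))
    (let loop ((count  (max 1 (ceiling-quotient size block-size)))
               (levels '()))
      (if (= count 1)
          (reverse (cons 1 levels))
          (loop (ceiling-quotient count arity)
                (cons count levels))))))

(define (leaf-fraction counts)
  "Return the share of all nodes in COUNTS that are content blocks."
  (exact->inexact (/ (car counts) (apply + counts))))

;; Example: a 1 GiB nar.
(define counts (nodes-per-level (* 1024 1024 1024)))
(display counts) (newline)
(display (leaf-fraction counts)) (newline)
```

For a 1 GiB nar this gives 32768 content blocks, 64 level-1 nodes and
one root, so about 99.8% of the nodes are leaves.  With blocks this
wide, leaving the leaf nodes out of the database would save far more
than the 50% of the binary case.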

Perhaps we are overthinking things and the GDBM (***) database isn't
overly large, or perhaps missing blocks are sufficiently rare such that
we could simply download the _entire_ nar from classical HTTP in case
of missing blocks ...

(***) Guix uses SQLite databases, so I would use SQLite instead of GDBM
unless there's a compelling reason to use GDBM instead.

Greetings,
Maxime.
M Feb. 3, 2022, 8:36 p.m. UTC | #14
pukkamustard schreef op wo 02-02-2022 om 11:10 [+0000]:
> The ERIS URN is only used if the entire narinfo is signed with an
> authorized signature.

Perhaps I'm missing something here, but in that case, shouldn't "ERIS"
be added to %mandatory-fields in (guix narinfo)?

Anyway, I don't see what prevents an unauthorised narinfo with an ERIS
URN from being used: the narinfo is chosen with

  (define narinfo
    (lookup-narinfo cache-urls store-item
                    (if (%allow-unauthenticated-substitutes?)
                        (const #t)
                        (cut valid-narinfo? <> acl))))

where lookup-narinfo is a tiny wrapper around lookup-narinfos/diverse.
lookup-narinfos/diverse considers both unauthorised and authorised
narinfos, and can choose an unauthorised narinfo if it's ‘equivalent’
to an authorised narinfo (using equivalent-narinfo?).

equivalent-narinfo? only looks at the hash, path, references and size,
and ignores the ERIS.  As such, an unauthorised narinfo with a
malicious ERIS URN could be selected.

However, it turns out that all this doesn't really matter: whether the
port returned by 'fetch' in (guix scripts substitute) came from
file://, http://, https:// or ERIS, the file hash is verified later
anyway:

                  ;; Compute the actual nar hash as we read it.
                  ((algorithm expected)
                   (narinfo-hash-algorithm+value narinfo))
                  ((hashed get-hash)
                   (open-hash-input-port algorithm input)))

      [...]

      ;; Check whether we got the data announced in NARINFO.
      (let ((actual (get-hash)))
        (if (bytevector=? actual expected)
            [...]

False alarm I guess!

Greetings,
Maxime.
pukkamustard Feb. 4, 2022, 10:20 a.m. UTC | #15
Maxime Devos <maximedevos@telenet.be> writes:

> pukkamustard schreef op wo 02-02-2022 om 11:10 [+0000]:
>> The ERIS URN is only used if the entire narinfo is signed with an
>> authorized signature.
>
> Perhaps I'm missing something here, but in that case, shouldn't "ERIS"
> be added to %mandatory-fields in (guix narinfo)?
>
> Anyway, I don't see what prevents an unauthorised narinfo with an ERIS
> URN from being used: the narinfo is chosen with
>
>   (define narinfo
>     (lookup-narinfo cache-urls store-item
>                     (if (%allow-unauthenticated-substitutes?)
>                         (const #t)
>                         (cut valid-narinfo? <> acl))))
>
> where lookup-narinfo is a tiny wrapper around lookup-narinfos/diverse.
> lookup-narinfos/diverse considers both unauthorised and authorised
> narinfos, and can choose an unauthorised narinfo if it's ‘equivalent’
> to an authorised narinfo (using equivalent-narinfo?)
>
> equivalent-narinfo? only looks at the hash, path, references and size,
> and ignores the ERIS.  As such, an unauthorised narinfo with a
> malicious ERIS URN could be selected.

You're right. I was not aware that parts of unauthorized narinfos are
used when they are deemed equivalent to authorized narinfos by
equivalent-narinfo?.

>
> However, it turns out that all this doesn't really matter: whether the
> port returned by 'fetch' in (guix scripts substitute) came from
> file://, http://, https:// or ERIS, the file hash is verified later
> anyway:
>
>                   ;; Compute the actual nar hash as we read it.
>                   ((algorithm expected)
>                    (narinfo-hash-algorithm+value narinfo))
>                   ((hashed get-hash)
>                    (open-hash-input-port algorithm input)))
>
>       [...]
>
>       ;; Check whether we got the data announced in NARINFO.
>       (let ((actual (get-hash)))
>         (if (bytevector=? actual expected)
>             [...]
>
> False alarm I guess!

Yeah, good that the hash is checked. Still, I think we should not even
try downloading an ERIS URN that is not authorized.

I think adding a check to equivalent-narinfo? that makes sure that the
ERIS URNs are equivalent if present would fix this. wdyt?
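
A minimal, self-contained sketch of such a check follows.  It does not
use the real <narinfo> record from (guix narinfo); <toy-narinfo> and
toy-equivalent-narinfo? are hypothetical stand-ins, and the URN strings
in the example are placeholders.  The sketch also takes the stricter
reading of "if present": a narinfo with an ERIS URN is not considered
equivalent to one without.

```scheme
(use-modules (srfi srfi-9))

;; Hypothetical stand-in for the fields compared by equivalent-narinfo?,
;; plus an ERIS URN field that is a string, or #f when absent.
(define-record-type <toy-narinfo>
  (make-toy-narinfo path hash size references eris-urn)
  toy-narinfo?
  (path       toy-narinfo-path)
  (hash       toy-narinfo-hash)
  (size       toy-narinfo-size)
  (references toy-narinfo-references)
  (eris-urn   toy-narinfo-eris-urn))

(define (toy-equivalent-narinfo? a b)
  "Return true if A and B describe the same store item and agree on the
ERIS URN, including on its absence."
  (and (string=? (toy-narinfo-hash a) (toy-narinfo-hash b))
       (string=? (toy-narinfo-path a) (toy-narinfo-path b))
       (equal? (toy-narinfo-references a) (toy-narinfo-references b))
       (= (toy-narinfo-size a) (toy-narinfo-size b))
       ;; New condition: the URNs must match, so an unauthorized narinfo
       ;; cannot smuggle in a different one.
       (equal? (toy-narinfo-eris-urn a) (toy-narinfo-eris-urn b))))

;; Example with placeholder values.
(define authorized
  (make-toy-narinfo "/gnu/store/example" "sha256:abc" 42 '() "urn:eris:GOOD"))
(define tampered
  (make-toy-narinfo "/gnu/store/example" "sha256:abc" 42 '() "urn:eris:EVIL"))

(display (toy-equivalent-narinfo? authorized tampered))
(newline)
```

With this extra condition the tampered narinfo is no longer treated as
equivalent, so its URN would never be fetched.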

-pukkamustard
M Feb. 4, 2022, 4:16 p.m. UTC | #16
pukkamustard schreef op wo 02-02-2022 om 10:51 [+0000]:
> > Also, don't forget to insert this missing block back into
> > IPFS/GNUnet/BitTorrent/..., otherwise fewer and fewer blocks will be
> > available until nothing is available anymore.
> 
> This might be a bit of a burden for users. As you mention the size of
> such a database might become considerable.

At least in GNUnet, there are quotas on the size of the datastore
(and presumably on whatever the DHT service uses as a database).  When a
quota is exceeded, old blocks are removed.  So I don't see a burden here,
assuming that the quotas aren't overly large by default.

Greetings,
Maxime.