Message ID: 20220125192201.7582-1-pukkamustard@posteo.net
Series: Decentralized substitute distribution with ERIS
pukkamustard schreef op di 25-01-2022 om 19:21 [+0000]:
> ** Authorize local substitutes
>
> We will be running a local substitute server so we need to add the
> local signing key to the list of authorized keys.  In the system
> configuration:
>
> #+BEGIN_SRC scheme
> (modify-services %base-services
>   (guix-service-type config => [...]))
> #+END_SRC
> [...]
> ** Start the IPFS daemon
>
> #+BEGIN_SRC shell
> guix shell go-ipfs -- ipfs daemon
> #+END_SRC

There's an ipfs-service-type nowadays, so starting the daemon manually
isn't required (if using Guix System).

Greetings,
Maxime.
pukkamustard schreef op di 25-01-2022 om 19:21 [+0000]:
> I will be looking into the HTTP fallback and also using BitTorrent
> and GNUnet as transports.

I have been writing a (Guile) Scheme port of GNUnet's client libraries
(https://git.gnunet.org/gnunet-scheme.git/).  Currently only NSE is
supported, but I'm working on DHT.  DHT search/put already works to a
degree (see examples/web.scm), but there are plenty of sharp edges (see
TODOs about disconnecting, reconnecting and stopping fibers, and see
guix.scm for Guile bugs that are patched out and extra guile-fibers
features).  Tests are being written and edge cases will be addressed.

Greetings,
Maxime.
Hi,

Is it possible for the following situation to happen?  If so, why not?

 1. server A is authentic
 2. server M is malicious; it tries to trick the client into installing
    an incorrect substitute
 3. (key of) server A is authorised
 4. (key of) server M is _not_ authorised
 5. server A and M are both in substitute-urls
 6. server A only serves ‘classical’ substitutes; server M also serves
    via ERIS+IPFS
 7. both A and M set the same FileHash, References, etc. in the narinfo
 8. however, M set an ERIS URN pointing to a backdoored substitute
 9. the client trusts A, and A and M have the same FileHash etc., so
    the client considers the narinfo of M to be authentic because it
    has the same FileHash
10. the client prefers ERIS above HTTP(S), so it downloads via M
11. the client now installed a backdoored substitute!

Greetings,
Maxime.
pukkamustard schreef op di 25-01-2022 om 19:21 [+0000]:
> I have only tested this for fairly small packages (up to a few MB).
>
> One issue with IPFS might be that we have to create a new HTTP
> connection to the IPFS daemon for every single block (32KiB).  The
> IPFS daemon does not seem to support HTTP connection re-use

According to <https://github.com/ipfs/go-ipfs/issues/3767>, the IPFS
daemon supports connection reuse according to some people and doesn't
according to other people.

> and neither does the Guile (web client).

Guix supports connection reuse, see 'call-with-cached-connection' in
(guix scripts substitute).

> I fear this might become a performance issue.

IIUC, the performance problem primarily lies in the round-tripping
between the client and the server.  If the client and the server are on
the same machine, then this round-trip time is presumably small
compared to, say, localhost contacting ci.guix.gnu.org.

Still, connection reuse would be nice.

> It seems possible to use IPFS more directly by exposing the Go code
> as a C library and then using that with the Guile FFI [1].  This is
> however a bit complicated and adds a lot of dependencies.  In
> particular, this should not become a dependency of Guix itself.  The
> performance of IPFS itself also needs to be evaluated; maybe the IPFS
> HTTP API will not be the bottleneck.

Security-wise, libipfs doesn't seem great: libipfs starts the IPFS
daemon inside the process and guix/scripts/substitute.scm is run as
root.

> As mentioned in previous mail, a simple HTTP transport for blocks
> would be a good fallback.  This would allow users to get missing
> blocks (things that somehow got dropped from IPFS) directly from a
> substitute server. [...]

Seems a good idea to me -- DHTs can be unreliable.  I presume this will
be implemented with some kind of timeout: if no block is received
within N seconds, fall back to HTTP?
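[Editor's note: the "timeout, then HTTP" fallback Maxime describes could look
roughly like the sketch below.  It is written in Python purely for
illustration (the real implementation would be Guile Scheme), and
fetch_from_dht / fetch_from_http are hypothetical stand-ins, not real Guix
procedures.]

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError
import time

def fetch_block(ref, fetch_from_dht, fetch_from_http, timeout=2.0):
    """Return the block for REF, preferring the DHT, but never waiting
    longer than TIMEOUT seconds before asking the substitute server."""
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(fetch_from_dht, ref).result(timeout=timeout)
    except TimeoutError:
        return fetch_from_http(ref)   # DHT too slow: HTTP fallback
    finally:
        pool.shutdown(wait=False)

# Example: a DHT that answers too late, and the HTTP fallback.
slow_dht = lambda ref: time.sleep(0.3) or b"too late"
http = lambda ref: b"block-from-substitute-server"
print(fetch_block("urn:eris:...", slow_dht, http, timeout=0.05))
# → b'block-from-substitute-server'
```

The same structure works per block, so a download can mix sources: blocks
that arrive in time come from the DHT, the rest from the substitute server.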
Also, don't forget to insert this missing block back into
IPFS/GNUnet/BitTorrent/..., otherwise fewer and fewer blocks will be
available until nothing is available anymore.

> In any case, it would be necessary for the substitute server to store
> encoded blocks of the NAR.  For this I think it makes sense to use a
> small database.  We have bindings to use ERIS with GDBM [2].  It
> might also make sense to use SQLite, especially if there are other
> use-cases for such a database.

Wouldn't this be a huge database?  IIRC, according to logs.guix.gnu.org,
the size of the nars of the substitute servers is somewhere in the
200G-2T range or something like that.

To reduce the size of the database, perhaps you could let the database
be a mapping from block IDs to the name of the nar + the position in
the nar, and encode the block on demand?

The database doesn't seem necessary: the substitute server could have
some end-point

  /publish-this-nar-again-into-IPFS/name-of-the-nar

which, when contacted, inserts the nar again into IPFS.  Then when a
block was unavailable, the client contacts this end-point and retries.

Greetings,
Maxime.
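[Editor's note: the "block ID -> (nar, offset), encode on demand" idea can be
sketched as below.  Python for illustration only; the dict stands in for the
GDBM/SQLite database, and encode_block is a simplified stand-in for real ERIS
encoding (which also encrypts the block), though like ERIS it derives the
reference with BLAKE2b.]

```python
import hashlib

BLOCK_SIZE = 32 * 1024
nars = {"example.nar": bytes(i % 251 for i in range(100_000))}  # toy nar

def encode_block(data):
    # Stand-in for ERIS encoding: pad to the block size and derive a
    # reference by hashing.  (Real ERIS encrypts before hashing.)
    padded = data.ljust(BLOCK_SIZE, b"\0")
    return hashlib.blake2b(padded, digest_size=32).hexdigest(), padded

# The 'database': block reference -> (nar name, byte offset).
index = {}
for name, data in nars.items():
    for offset in range(0, len(data), BLOCK_SIZE):
        ref, _ = encode_block(data[offset:offset + BLOCK_SIZE])
        index[ref] = (name, offset)

def get_block(ref):
    """Re-encode the requested block on demand from its nar position."""
    name, offset = index[ref]
    data = nars[name][offset:offset + BLOCK_SIZE]
    return encode_block(data)[1]

print(len(index))  # → 4  (100 000 bytes => four 32 KiB blocks)
```

The database then stores only a few dozen bytes per 32 KiB block instead of
the block itself, which addresses the 200G-2T concern.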
Hi Maxime,

Maxime Devos <maximedevos@telenet.be> writes:

> There's an ipfs-service-type nowadays, so starting the daemon
> manually isn't required (if using Guix System).

Good point.  Starting the daemon manually is only necessary if you
don't use the service.  I don't use the IPFS service.

-pukkamustard
Maxime Devos <maximedevos@telenet.be> writes:

> [[PGP Signed Part:Undecided]]
> pukkamustard schreef op di 25-01-2022 om 19:21 [+0000]:
>> I will be looking into the HTTP fallback and also using BitTorrent
>> and GNUnet as transports.
>
> I have been writing a (Guile) Scheme port of GNUnet's client
> libraries (https://git.gnunet.org/gnunet-scheme.git/).  Currently
> only NSE is supported, but I'm working on DHT.  DHT search/put
> already works to a degree (see examples/web.scm), but there are
> plenty of sharp edges (see TODOs about disconnecting, reconnecting
> and stopping fibers, and see guix.scm for Guile bugs that are patched
> out and extra guile-fibers features).

Very interesting!  I have been following your work on that a bit.

From what I understand, gnunet-scheme connects to the running GNUnet
services and exchanges messages with them.  Is that correct?

Have you considered implementing the GNUnet protocols themselves in
Guile?  I.e., instead of connecting with the GNUnet services and
sending messages, implement R5N completely in Guile.

IMHO this would be very nice, as one could use GNUnet protocols
completely in Guile and not rely on the GNUnet C code.  I believe this
is somewhat the direction being taken with the GNUnet Go implementation
(https://github.com/bfix/gnunet-go) and also in line with recent
efforts to specify the individual GNUnet components and protocols more
independently of one another (e.g. R5N is specified to work over IP -
https://lsd.gnunet.org/lsd0004/).

-pukkamustard
Maxime Devos <maximedevos@telenet.be> writes:

> [[PGP Signed Part:Undecided]]
> pukkamustard schreef op di 25-01-2022 om 19:21 [+0000]:
>> I have only tested this for fairly small packages (up to a few MB).
>>
>> One issue with IPFS might be that we have to create a new HTTP
>> connection to the IPFS daemon for every single block (32KiB).  The
>> IPFS daemon does not seem to support HTTP connection re-use
>
> According to <https://github.com/ipfs/go-ipfs/issues/3767>, the IPFS
> daemon supports connection reuse according to some people and doesn't
> according to other people.

Hm, from what I understand, connection re-use is something introduced
in HTTP/2, and go-ipfs does not do HTTP/2
(https://github.com/ipfs/go-ipfs/issues/5974).

>> and neither does the Guile (web client).
>
> Guix supports connection reuse, see 'call-with-cached-connection'
> in (guix scripts substitute).

Ah ok.  Cool!

>> I fear this might become a performance issue.
>
> IIUC, the performance problem primarily lies in the round-tripping
> between the client and the server.  If the client and the server are
> on the same machine, then this round-trip time is presumably small
> compared to, say, localhost contacting ci.guix.gnu.org.
>
> Still, connection reuse would be nice.

Remains to be seen if this is a problem.  It is considerably more
pronounced than with regular usage of IPFS, as we make an HTTP request
to IPFS for every 32KiB block instead of for an entire file (what most
people do when using the IPFS daemon).

>> It seems possible to use IPFS more directly by exposing the Go code
>> as a C library and then using that with the Guile FFI [1].  This is
>> however a bit complicated and adds a lot of dependencies.  In
>> particular, this should not become a dependency of Guix itself.  The
>> performance of IPFS itself also needs to be evaluated; maybe the
>> IPFS HTTP API will not be the bottleneck.
>
> Security-wise, libipfs doesn't seem great: libipfs starts the IPFS
> daemon inside the process and guix/scripts/substitute.scm is run
> as root.

I agree.

>> As mentioned in previous mail, a simple HTTP transport for blocks
>> would be a good fallback.  This would allow users to get missing
>> blocks (things that somehow got dropped from IPFS) directly from a
>> substitute server. [...]
>
> Seems a good idea to me -- DHTs can be unreliable.  I presume this
> will be implemented with some kind of timeout: if no block is
> received within N seconds, fall back to HTTP?

Yes, exactly.

> Also, don't forget to insert this missing block back into
> IPFS/GNUnet/BitTorrent/..., otherwise fewer and fewer blocks will be
> available until nothing is available anymore.

This might be a bit of a burden for users.  As you mention, the size of
such a database might become considerable.

>> In any case, it would be necessary for the substitute server to
>> store encoded blocks of the NAR.  For this I think it makes sense to
>> use a small database.  We have bindings to use ERIS with GDBM [2].
>> It might also make sense to use SQLite, especially if there are
>> other use-cases for such a database.
>
> Wouldn't this be a huge database?  IIRC, according to
> logs.guix.gnu.org, the size of the nars of the substitute servers is
> somewhere in the 200G-2T range or something like that.
>
> To reduce the size of the database, perhaps you could let the
> database be a mapping from block IDs to the name of the nar + the
> position in the nar, and encode the block on demand?

Yes!  I've also been thinking of this - an "in-file" block store.  I
think this makes a lot of sense for Guix, but also for other things
(e.g. sharing your music collection).

Another problem with IPFS/GNUnet is that they have their own storage.
So even if we are clever about storing blocks in Guix, IPFS and GNUnet
will have their own copy of the blocks on disk.

I think it would be much nicer if DHTs/transport layers don't do block
storage but are provided with a callback from which they can get stored
blocks.  I believe this is what OpenDHT does
(https://github.com/savoirfairelinux/opendht/wiki/API-Overview).  I
think we should propose such a change to the GNUnet R5N specification
(https://lsd.gnunet.org/lsd0004/).

> The database doesn't seem necessary, the substitute server could have
> some end-point
>
>   /publish-this-nar-again-into-IPFS/name-of-the-nar
>
> which, when contacted, inserts the nar again into IPFS.  Then when a
> block was unavailable, the client contacts this end-point and
> retries.

But for an HTTP block endpoint we would still need such a
database/block storage.

I think it is important that we do not rely on IPFS for block storage.
The decentralized block distribution should work even if the IPFS
daemon is not available.

-pukkamustard
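[Editor's note: the callback-based storage pukkamustard proposes (and which
OpenDHT's API resembles) can be sketched as below.  Python for illustration;
the class and method names are hypothetical, not OpenDHT's or GNUnet's actual
API.]

```python
class Transport:
    """A DHT/transport layer that owns no block store: it is constructed
    with a get_block callback and serves remote requests through it."""

    def __init__(self, get_block):
        self.get_block = get_block   # application-provided callback

    def on_request(self, block_ref):
        """Invoked when a remote peer asks for a block."""
        block = self.get_block(block_ref)
        if block is None:
            return None              # not stored here; DHT routes onward
        return block

# The application plugs in whatever storage it likes, e.g. an
# "in-file" block store over existing nars:
store = {"abc": b"some block"}
transport = Transport(store.get)
print(transport.on_request("abc"))   # → b'some block'
```

With this shape, Guix's nars stay the single copy on disk and IPFS/GNUnet
need not duplicate them in their own datastores.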
pukkamustard schreef op wo 02-02-2022 om 09:56 [+0000]:
> Maxime Devos <maximedevos@telenet.be> writes:
>
>> [[PGP Signed Part:Undecided]]
>> pukkamustard schreef op di 25-01-2022 om 19:21 [+0000]:
>>> I will be looking into the HTTP fallback and also using BitTorrent
>>> and GNUnet as transports.
>>
>> I have been writing a (Guile) Scheme port of GNUnet's client
>> libraries (https://git.gnunet.org/gnunet-scheme.git/).  Currently
>> only NSE is supported, but I'm working on DHT.  DHT search/put
>> already works to a degree (see examples/web.scm), but there are
>> plenty of sharp edges (see TODOs about disconnecting, reconnecting
>> and stopping fibers, and see guix.scm for Guile bugs that are
>> patched out and extra guile-fibers features).
>
> Very interesting!  I have been following your work on that a bit.
>
> From what I understand, gnunet-scheme connects to the running GNUnet
> services and exchanges messages with them.  Is that correct?

Yes, it works like the C GNUnet client libraries, except it's in Guile
Scheme and a few different design decisions were made, e.g. w.r.t.
concurrency.

> Have you considered implementing the GNUnet protocols themselves in
> Guile?  I.e., instead of connecting with the GNUnet services and
> sending messages, implement R5N completely in Guile.

I didn't, at least not _yet_.  As-is, things are already complicated
enough, and the client code seems a lot simpler than the service code.
Though perhaps in the future ...

E.g., for testing the DHT service, the test code effectively creates a
tiny, limited, in-memory DHT service (not communicating with any peers)
that's buggy in some respects (not yet committed, but will be in
tests/distributed-hash-table.scm).

> IMHO this would be very nice, as one could use GNUnet protocols
> completely in Guile and not rely on the GNUnet C code.

While it's not a priority, I'm not opposed to someday implementing the
services in Guile and testing whether they can communicate with C
peers.

However, keep in mind that GNUnet is supposed to be able to eventually
replace the TCP/IP stack, and running the DHT, NSE, NAT, FS, CADET,
TRANSPORT ... services in every web browser, in every mail client, in
all "guix substitute" and "guix perform-download" processes, etc. is
rather wasteful (memory-wise and CPU-wise), so I'd prefer this not to
be the _default_ option.  (I'm not sure if you were referring to that.)

Greetings,
Maxime.
Maxime Devos <maximedevos@telenet.be> writes:

> [[PGP Signed Part:Undecided]]
> Hi,
>
> Is it possible for the following situation to happen?
> If so, why not?
>
>  1. server A is authentic
>  2. server M is malicious; it tries to trick the client into
>     installing an incorrect substitute
>  3. (key of) server A is authorised
>  4. (key of) server M is _not_ authorised
>  5. server A and M are both in substitute-urls
>  6. server A only serves ‘classical’ substitutes; server M also
>     serves via ERIS+IPFS
>  7. both A and M set the same FileHash, References, etc. in the
>     narinfo
>  8. however, M set an ERIS URN pointing to a backdoored substitute
>  9. the client trusts A, and A and M have the same FileHash etc., so
>     the client considers the narinfo of M to be authentic because it
>     has the same FileHash
> 10. the client prefers ERIS above HTTP(S), so it downloads via M
> 11. the client now installed a backdoored substitute!

No, this should not work.  The ERIS URN is only used if the entire
narinfo is signed with an authorized signature.  The FileHash is not
used when getting substitutes via ERIS (being able to decode ERIS
content implies integrity).

The interesting case that _would_ be allowed with ERIS is the
following:

1. Server A is authentic and its key is authorized.
2. Servers M1 to MN are potentially malicious and their keys are not
   authorized.
3. Server A and servers M1 to MN are in the substitute-urls.
4. The client gets the narinfo from server A and uses the ERIS URN from
   there.
5. The client can get blocks simultaneously from server A and servers
   M1 to MN.
6. The client decodes content with the ERIS URN and can be sure that it
   has the valid substitute.

So the client only needs to trust A, but can use M1-MN (simultaneously)
for fetching the content.

-pukkamustard
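[Editor's note: the reason untrusted servers M1-MN are safe to fetch from is
that an ERIS block reference is the hash (BLAKE2b) of the block itself, so
every received block can be verified against a reference the client already
trusts.  A minimal sketch in Python, with dicts simulating servers; the
function names are illustrative.]

```python
import hashlib

def ref_of(block):
    # ERIS derives block references with BLAKE2b (simplified here:
    # real ERIS hashes the *encrypted* block).
    return hashlib.blake2b(block, digest_size=32).hexdigest()

def fetch_verified(ref, servers):
    """Try each (possibly malicious) server; accept only a block whose
    hash matches the trusted reference REF."""
    for server in servers:
        block = server.get(ref)
        if block is not None and ref_of(block) == ref:
            return block             # integrity verified, source irrelevant
    raise LookupError("no server returned a valid block")

good = b"genuine block"
ref = ref_of(good)
malicious = {ref: b"backdoored block"}   # wrong content under the right ref
honest = {ref: good}
print(fetch_verified(ref, [malicious, honest]) == good)  # → True
```

A malicious server can at worst withhold a block, never substitute a
different one, which is what makes fetching from M1-MN simultaneously safe.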
pukkamustard schreef op wo 02-02-2022 om 10:51 [+0000]:
>> The database doesn't seem necessary, the substitute server could
>> have some end-point
>>
>>   /publish-this-nar-again-into-IPFS/name-of-the-nar
>>
>> which, when contacted, inserts the nar again into IPFS.  Then when a
>> block was unavailable, the client contacts this end-point and
>> retries.
>
> But for an HTTP block endpoint we would still need such a
> database/block storage.
>
> I think it is important that we do not rely on IPFS for block
> storage.  The decentralized block distribution should work even if
> the IPFS daemon is not available.

Do we need a database at all?

E.g., if the client cannot download the data in the range [start, end]
because the corresponding block has disappeared, can it not simply
download that range from https://ci.guix.gnu.org/nar/[...] (not sure
about the URI) using an HTTP range request?

(Afterwards, the client should insert the block(s) back into
IPFS/GNUnet/whatever, maybe using this proposed ‘in-file block store’,
such that other clients (using the same DHT mechanism) can benefit.)

Greetings,
Maxime.
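[Editor's note: the mechanics of the range-request fallback look like the
sketch below.  Python for illustration; serve_range() simulates the
substitute server's 206 Partial Content answer, and a real client would send
the same Range header to the nar URL from the narinfo.  Whether the client
can actually compute [start, end) from a block reference is debated in the
follow-ups.]

```python
def range_header(start, end):
    # HTTP Range headers use *inclusive* last-byte offsets (RFC 9110),
    # so a half-open [start, end) range becomes "start-(end-1)".
    return {"Range": f"bytes={start}-{end - 1}"}

def serve_range(nar, headers):
    """Simulate a server answering '206 Partial Content'."""
    spec = headers["Range"].removeprefix("bytes=")
    first, last = (int(x) for x in spec.split("-"))
    return nar[first:last + 1]

nar = bytes(range(256)) * 400          # a 102 400-byte toy nar
start, end = 32_768, 65_536            # extent of the missing 32 KiB block
body = serve_range(nar, range_header(start, end))
print(body == nar[start:end])          # → True
```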
Maxime Devos <maximedevos@telenet.be> writes:

>> I think it is important that we do not rely on IPFS for block
>> storage.  The decentralized block distribution should work even if
>> the IPFS daemon is not available.
>
> Do we need a database at all?
>
> E.g., if the client cannot download the data in the range [start,
> end] because the corresponding block has disappeared, can it not
> simply download that range from https://ci.guix.gnu.org/nar/[...]
> (not sure about the URI) using an HTTP range request?

This does not work, as the mapping from block reference to location in
the NAR cannot be known by the client, who only holds the ERIS URN.
Furthermore, some blocks will be intermediary nodes - they hold
references to content blocks (or other intermediary nodes) but not
content itself.

> (Afterwards, the client should insert the block(s) back into
> IPFS/GNUnet/whatever, maybe using this proposed ‘in-file block
> store’, such that other clients (using the same DHT mechanism) can
> benefit.)

It might make sense for some clients to make content available to
other clients and to go through the extra effort of putting blocks
back into IPFS/GNUnet/whatever.  But this should be optional.  Maybe we
can call such clients "caching peers"?

IMO a client should by default only deal with things that are strictly
necessary for getting substitutes.  The substitute servers (and caching
peers) should make sure substitutes are available to clients, whether
over IPFS/GNUnet/whatever or plain old HTTP.

-pukkamustard
pukkamustard schreef op wo 02-02-2022 om 12:42 [+0000]:
>> (Afterwards, the client should insert the block(s) back into
>> IPFS/GNUnet/whatever, maybe using this proposed ‘in-file block
>> store’, such that other clients (using the same DHT mechanism) can
>> benefit.)
>
> It might make sense for some clients to make content available to
> other clients and to go through the extra effort of putting blocks
> back into IPFS/GNUnet/whatever.  But this should be optional.  Maybe
> we can call such clients "caching peers"?
>
> IMO a client should by default only deal with things that are
> strictly necessary for getting substitutes.  The substitute servers
> (and caching peers) should make sure substitutes are available to
> clients, whether over IPFS/GNUnet/whatever or plain old HTTP.

If re-inserting missing blocks back into IPFS/GNUnet/whatever is made
optional and is off by default, then almost nobody will enable the
‘caching peer’ option and we will have freeloaders, somewhat defeating
the point of GNUnet/whatever.

In a classic setting (‘plain old HTTP’), serving and downloading are
separate things.  But in a P2P setting, downloading cannot be separated
from uploading -- strictly speaking, a peer might be able to download
without uploading (depending on the P2P system), but that's
anti-social, not something that should be done by default.

However, if re-inserting missing blocks is _on_ by default, then there
doesn't seem to be any trouble.

Greetings,
Maxime.
pukkamustard schreef op wo 02-02-2022 om 12:42 [+0000]:
>> E.g., if the client cannot download the data in the range [start,
>> end] because the corresponding block has disappeared, can it not
>> simply download that range from https://ci.guix.gnu.org/nar/[...]
>> (not sure about the URI) using an HTTP range request?
>
> This does not work, as the mapping from block reference to location
> in the NAR cannot be known by the client, who only holds the ERIS
> URN.

The client not only knows the ERIS URN, it also knows the location of
the nar (over classical HTTP), because it's in the narinfo.

> Furthermore, some blocks will be intermediary nodes - they hold
> references to content blocks (or other intermediary nodes) but not
> content itself.

If an intermediary node (responsible for, say, bytes 900--10000) is
missing, then the bytes 900--10000 could be downloaded via HTTP.
Whether the node is close to the top, or close to the bottom, in ERIS'
variant of Merkle trees, doesn't matter much.

Granted, if the nar is, say, 1 GiB, and the top-level block is missing,
then we'll have to download 1 GiB over HTTP, even if most lower blocks
exist on IPFS/GNUnet/whatever, which isn't really great.

We could also do some combination of the GDBM database and HTTP
Content-Range requests: most nodes are leaf nodes (*).  Instead of
representing all nodes in the database, we could include only
(intermediate) nodes responsible for data of size, say, 4MiB.

(*) At least, that's the case for binary trees; presumably something
similar holds for ERIS.

I don't know the specifics for ERIS, but for (balanced) binary trees,
not storing the leaf nodes would save about 50% (**), which is a rather
nice space saving.

(**) This assumes the ‘block size’ is the size for storing two pointers
to the children, but in practice the block size would be quite a bit
larger, so there would be more space savings?

Perhaps we are overthinking things and the GDBM (***) database isn't
overly large, or perhaps missing blocks are sufficiently rare such that
we could simply download the _entire_ nar from classical HTTP in case
of missing blocks ...

(***) Guix uses SQLite databases, so I would use SQLite instead of GDBM
unless there's a compelling reason to use GDBM instead.

Greetings,
Maxime.
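[Editor's note: footnotes (*) and (**) can be checked with a quick count of
tree nodes.  Assumptions, to be checked against the ERIS spec: 32 KiB blocks
and 64-byte reference/key pairs per child, i.e. arity 512 for intermediary
nodes.  Python for illustration.]

```python
def tree_block_counts(content_size, block_size=32 * 1024, arity=512):
    """Count leaf and intermediary blocks of a balanced block tree."""
    leaves = -(-content_size // block_size)   # ceiling division
    internal, level = 0, leaves
    while level > 1:
        level = -(-level // arity)            # parents of this level
        internal += level
    return leaves, internal

leaves, internal = tree_block_counts(2**30)   # a 1 GiB nar
print(leaves, internal)                       # → 32768 65
```

With arity 2 this reproduces Maxime's ~50% figure (32768 leaves, 32767
internal nodes); with arity 512 the intermediary nodes are a negligible
fraction (65 out of 32833), so a database holding only intermediary nodes,
plus an on-demand mapping for leaves, would be tiny.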
pukkamustard schreef op wo 02-02-2022 om 11:10 [+0000]:
> The ERIS URN is only used if the entire narinfo is signed with an
> authorized signature.

Perhaps I'm missing something here, but in that case, shouldn't "ERIS"
be added to %mandatory-fields in (guix narinfo)?

Anyway, I don't see what prevents an unauthorised narinfo with an ERIS
URN from being used: the narinfo is chosen with

  (define narinfo
    (lookup-narinfo cache-urls store-item
                    (if (%allow-unauthenticated-substitutes?)
                        (const #t)
                        (cut valid-narinfo? <> acl))))

where lookup-narinfo is a tiny wrapper around lookup-narinfos/diverse.
lookup-narinfos/diverse considers both unauthorised and authorised
narinfos, and can choose an unauthorised narinfo if it's ‘equivalent’
to an authorised narinfo (using equivalent-narinfo?).

equivalent-narinfo? only looks at the hash, path, references and size,
and ignores the ERIS.  As such, an unauthorised narinfo with a
malicious ERIS URN could be selected.

However, it turns out that all this doesn't really matter: whether the
port returned by 'fetch' in (guix scripts substitute) came from
file://, http://, https:// or ERIS, the file hash is verified later
anyway:

  ;; Compute the actual nar hash as we read it.
  ((algorithm expected)
   (narinfo-hash-algorithm+value narinfo))
  ((hashed get-hash)
   (open-hash-input-port algorithm input)))

  [...]

  ;; Check whether we got the data announced in NARINFO.
  (let ((actual (get-hash)))
    (if (bytevector=? actual expected)
        [...]

False alarm, I guess!

Greetings,
Maxime.
Maxime Devos <maximedevos@telenet.be> writes:

> pukkamustard schreef op wo 02-02-2022 om 11:10 [+0000]:
>> The ERIS URN is only used if the entire narinfo is signed with an
>> authorized signature.
>
> Perhaps I'm missing something here, but in that case, shouldn't
> "ERIS" be added to %mandatory-fields in (guix narinfo)?
>
> Anyway, I don't see what prevents an unauthorised narinfo with an
> ERIS URN from being used: the narinfo is chosen with
>
>   (define narinfo
>     (lookup-narinfo cache-urls store-item
>                     (if (%allow-unauthenticated-substitutes?)
>                         (const #t)
>                         (cut valid-narinfo? <> acl))))
>
> where lookup-narinfo is a tiny wrapper around
> lookup-narinfos/diverse.  lookup-narinfos/diverse considers both
> unauthorised and authorised narinfos, and can choose an unauthorised
> narinfo if it's ‘equivalent’ to an authorised narinfo (using
> equivalent-narinfo?).
>
> equivalent-narinfo? only looks at the hash, path, references and
> size, and ignores the ERIS.  As such, an unauthorised narinfo with a
> malicious ERIS URN could be selected.

You're right.  I was not aware that parts of unauthorized narinfos are
used when they are deemed equivalent to authorized narinfos with
equivalent-narinfo?.

> However, it turns out that all this doesn't really matter: whether
> the port returned by 'fetch' in (guix scripts substitute) came from
> file://, http://, https:// or ERIS, the file hash is verified later
> anyway:
>
>   ;; Compute the actual nar hash as we read it.
>   ((algorithm expected)
>    (narinfo-hash-algorithm+value narinfo))
>   ((hashed get-hash)
>    (open-hash-input-port algorithm input)))
>
>   [...]
>
>   ;; Check whether we got the data announced in NARINFO.
>   (let ((actual (get-hash)))
>     (if (bytevector=? actual expected)
>         [...]
>
> False alarm, I guess!

Yeah, good that the hash is checked.  Still, I think we should not even
try downloading an ERIS URN that is not authorized.  I think adding a
check to equivalent-narinfo? that makes sure that the ERIS URNs are
equivalent, if present, would fix this.  WDYT?

-pukkamustard
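[Editor's note: one reading of the proposed fix, sketched in Python for
illustration (the real predicate is equivalent-narinfo? in Guix's Scheme
(guix narinfo) module; field names here are illustrative).  This takes the
strict interpretation, where a narinfo lacking an ERIS URN is not equivalent
to one carrying it, so an unauthorised narinfo cannot smuggle one in.]

```python
def equivalent_narinfo(a, b):
    """Two narinfos are equivalent iff hash/path/references/size match
    AND their ERIS URNs are identical (both absent counts as identical)."""
    base = all(a[k] == b[k] for k in ("hash", "path", "references", "size"))
    return base and a.get("eris") == b.get("eris")

authorised = {"hash": "h", "path": "/gnu/store/x", "references": (),
              "size": 1, "eris": "urn:eris:GOOD"}
malicious = dict(authorised, eris="urn:eris:EVIL")
print(equivalent_narinfo(authorised, malicious))  # → False
```

Under this check the scenario from the earlier mail is rejected outright,
rather than relying on the later nar-hash verification.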
pukkamustard schreef op wo 02-02-2022 om 10:51 [+0000]:
>> Also, don't forget to insert this missing block back into
>> IPFS/GNUnet/BitTorrent/..., otherwise fewer and fewer blocks will
>> be available until nothing is available anymore.
>
> This might be a bit of a burden for users.  As you mention, the size
> of such a database might become considerable.

At least in GNUnet, there are quotas on the size of the datastore (and
presumably, on whatever the DHT service uses as database).  When a
quota is exceeded, old blocks are removed.  So I don't see a burden
here, assuming that the quotas aren't overly large by default.

Greetings,
Maxime.
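[Editor's note: a quota'd datastore of the kind Maxime describes can be
sketched as a size-capped store that evicts the oldest blocks first.
Python for illustration; GNUnet's actual eviction also weighs priority and
expiry, which this does not model.]

```python
from collections import OrderedDict

class QuotaStore:
    """Block store with a byte quota; oldest blocks are evicted first."""

    def __init__(self, quota_bytes):
        self.quota = quota_bytes
        self.blocks = OrderedDict()   # insertion order = age
        self.used = 0

    def put(self, block_id, data):
        if block_id in self.blocks:
            return                    # already stored
        self.blocks[block_id] = data
        self.used += len(data)
        while self.used > self.quota:             # enforce the quota
            _, old = self.blocks.popitem(last=False)
            self.used -= len(old)                 # oldest block dropped

store = QuotaStore(quota_bytes=64)
for i in range(4):
    store.put(i, b"x" * 32)           # each block is 32 bytes
print(list(store.blocks))             # → [2, 3]  (0 and 1 were evicted)
```

Because eviction is automatic, caching peers never grow past the quota,
which supports the "no burden for users" point.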