[bug#44199,0/1] An origin method for GNUnet FS URI's

Message ID 5c72bcb9c86934deda97d952eb5cd459e615b313.camel@student.kuleuven.be

Message

Maxime Devos Oct. 24, 2020, 7:47 p.m. UTC
This patch defines a `gnunet-fetch' method, allowing for downloading
files from GNUnet by their GNUnet chk-URI.

This patch does not provide:
- a service configuration
- downloading substitutes from GNUnet
- fall-back to non-P2P (e.g. http://) or other P2P (e.g. ipfs://)
  systems
- downloading directories over GNUnet
- actual packages definitions using this method

Some issues and questions:
- (guix build gnunet) would call call-with-temporary-output-file
  from (guix utils), which isn't available when building derivations,
  so it has been copied to (guix build gnunet).  Is there any
  particular reason for call-with-temporary-output-file to be in
  (guix utils) and not (guix build utils)?
- Would it be possible somehow for url-fetch to support gnunet://fs/chk
  URIs?  That way we could fall back to non-P2P URLs, which would be
  useful to bootstrap a P2P distribution from a non-P2P system.  (See
  the sketch after this list.)
- No timeouts have been implemented, so gnunet-download may spin
  forever if a source isn't available on GNUnet FS.
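
To give a rough idea of the fall-back mentioned above, here is what an
origin *could* look like if url-fetch ever learned to handle
gnunet://fs/chk URIs next to ordinary URLs.  This is purely
hypothetical; url-fetch does not accept such URIs today, and the chk
URI below is shortened:

(origin
  (method url-fetch)
  (uri (list "gnunet://fs/chk/…"   ;tried first, over P2P
             "https://ftp.gnu.org/gnu/hello/hello-2.10.tar.gz"))
  (sha256
   (base32
    "0ssi1wpaf7plaswqqjwigppsg5fyh99vdlb9kzl7c9lng89ndq1i")))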

Some problematic points:
- (guix gnunet-download) calls gnunet-config from $PATH
  to figure out connection details for (guix build gnunet)
  (a rough sketch of this lookup follows after this list)
- (guix build gnunet) requires the GNUnet FS daemon to bind
  to loopback, whereas a standard GNUnet setup would have
  the daemon bound to a Unix socket.
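
For reference, the gnunet-config lookup boils down to something like
the following (a simplified sketch, not the exact code from the patch;
it assumes gnunet-config is on $PATH and prints the requested value on
standard output):

(use-modules (ice-9 popen) (ice-9 rdelim))

;; Return the value of OPTION in SECTION of the GNUnet configuration,
;; as a string, or #f if gnunet-config prints nothing.
(define (gnunet-option section option)
  (let* ((pipe  (open-pipe* OPEN_READ "gnunet-config"
                            "-s" section "-o" option))
         (value (read-line pipe)))
    (close-pipe pipe)
    (if (eof-object? value)
        #f
        (string-trim-both value))))

;; Example: (gnunet-option "fs" "PORT") could return "2094" after the
;; setup shown below.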

Example usage:

First make the GNUnet FS daemon accessible to Guix:
$ guix install gnunet
$ gnunet-config -s fs -o port -V 2094
$ gnunet-arm -s

Then publish the source tarball of the package to the GNUnet FS system:
$ guix environment --ad-hoc wget -- wget https://ftp.gnu.org/gnu/hello/hello-2.10.tar.gz
$ gnunet-publish hello-2.10.tar.gz

The output should look like this:

> Publishing `$PWD/hello-2.10.tar.gz' done.
> URI is
> `gnunet://fs/chk/TY48PGS5RVX643NT2B7GDNFCBT4DWG692PF4YNHERR96K6MSFRZ4
> ZWRPQ4KVKZV29MGRZTWAMY9ETTST4B6VFM47JR2JS5PWBTPVXB0.8A9HRYABJ7HDA7B0P
> 37VG6D593>

The following test package can now be compiled:

$ cat > example.scm <<EOF
(define-module (example)
  #:use-module ((guix licenses) #:select (gpl3+))
  #:use-module (gnu packages)
  #:use-module (guix packages)
  #:use-module (guix utils)
  #:use-module (guix gnunet-download)
  #:use-module (guix build-system gnu)
  #:export (hello/gnunet))

(define-public hello/gnunet
  (package
    (name "hello-gnunet")
    (version "2.10")
    (source (origin
              (method gnunet-fetch)
              (uri "gnunet://fs/chk/TY48PGS5RVX643NT2B7GDNFCBT4DWG692PF4YNHERR96K6MSFRZ4ZWRPQ4KVKZV29MGRZTWAMY9ETTST4B6VFM47JR2JS5PWBTPVXB0.8A9HRYABJ7HDA7B0")
              (file-name "gnunet-hello-2.10.tar.gz")
              (sha256
               (base32
                "0ssi1wpaf7plaswqqjwigppsg5fyh99vdlb9kzl7c9lng89ndq1i"))))
    (build-system gnu-build-system)
    (synopsis "Hello, GNUnet world!  An example of a package with a
GNUnet chk-URI origin")
    (description
     "GNU Hello prints the message \"Hello, world!\" and then exits.  It
serves as an example of standard GNU coding practices.  As such, it
supports command-line arguments, multiple languages, and so on.")
    (home-page "https://www.gnu.org/software/hello/")
    (license gpl3+)))
hello/gnunet
EOF

$ ./pre-inst-env guix build -f example.scm

Maxime Devos (1):
  guix: Add (guix gnunet-download).

 Makefile.am              |   2 +
 doc/guix.texi            |   7 +++
 guix/build/gnunet.scm    | 113 +++++++++++++++++++++++++++++++++++++++
 guix/gnunet-download.scm |  89 ++++++++++++++++++++++++++++++
 4 files changed, 211 insertions(+)
 create mode 100644 guix/build/gnunet.scm
 create mode 100644 guix/gnunet-download.scm

Comments

Simon Tournier Oct. 27, 2020, 1:39 p.m. UTC | #1
Dear,

Thank you for the patch.  My questions are totally naive since I do not
know much about GNUnet.


On Sat, 24 Oct 2020 at 21:47, Maxime Devos <maxime.devos@student.kuleuven.be> wrote:
> This patch defines a `gnunet-fetch' method, allowing for downloading
> files from GNUnet by their GNUnet chk-URI.
>
> This patch does not provide:
> - a service configuration
> - downloading substitutes from GNUnet
> - fall-back to non-P2P (e.g. http://) or other P2P (e.g. ipfs://)
>   systems
> - downloading directories over GNUnet

This means it only works for archives such as tarballs, right?


> - actual packages definitions using this method
>
> Some issues and questions:

[...]

> - Would it be possible somehow for url-fetch to support gnunet://fs/chk
>   URIs?  That way we could fall back to non-P2P URLs, which would be
>   useful to bootstrap a P2P distribution from a non-P2P system.

Who is the “we”?  What do you mean by “url-fetch supports gnunet:// and
falls back to non-P2P”?

Some recent discussions are about content addressing and fallbacks.  For
example, roughly speaking, ’git-fetch’ tries upstream first, then the
Guix build farms, then Software Heritage (SWH).  For Git repositories
this works because mapping the address on the Guix side to SWH is
straightforward.  The two other VCSes supported by SWH –hg and svn–
should be implemented soon… who knows! ;-)

The story about tarball archives is a bit more complicated.  The main
issue –as I understand it– can be summarized as: Guix knows the URL, the
integrity checksum, and, only at packaging time, the content of the
tarball.  Later in time, it is difficult to look the content up again by
that address; and several content addresses are around: nar, SWHID,
IPFS, GNUnet, etc.

Bridges to reassemble the content are currently being discussed, e.g.,

   <https://git.ngyro.com/disarchive-db/>
   <https://git.ngyro.com/disarchive>

Well, today the fallback from tarball archives to SWH is not reliable.


What is your question? ;-)


> Then publish the source tarball of the package to the GNUnet FS system:
> $ guix environment --ad-hoc wget -- wget 
> https://ftp.gnu.org/gnu/hello/hello-2.10.tar.gz
> $ gnunet-publish hello-2.10.tar.gz

Naive question:  are packages only available on GNUnet?


All the best,
simon
Maxime Devos Nov. 1, 2020, 12:05 a.m. UTC | #2
[CC'd to Timothy Sample because of the discussion about defining a new
format for disarchive, and to gnunet-developers for obvious reasons]

A small status update!

zimoun wrote on Tue 27 Oct 2020 at 14:39 [+0100]:
> [...]
> 
> The story about tarball archives is a bit more complicated.  The main
> issue –as I understand it– can be summarized as: Guix knows the URL,
> the integrity checksum, and, only at packaging time, the content of
> the tarball.  Later in time, it is difficult to look the content up
> again by that address; and several content addresses are around: nar,
> SWHID, IPFS, GNUnet, etc.
> 
> Bridges to reassemble the content are currently being discussed, e.g.,
> 
>    <https://git.ngyro.com/disarchive-db/>
>    <https://git.ngyro.com/disarchive>
> 
> Well, today the fallback from tarball archives to SWH is not reliable.
> 
> 
> What is your question? ;-)

I looked a bit into the GNUnet FS code and disarchive discussions. The
part about tarballs seemed particularly relevant, as well as some
older discussion on preserving the executable bit when using IPFS.

Some issues to address when using GNUnet's directory format for Guix
substitutes:

* directory entries are not placed in any particular order.
  Solution: sort by file name (see the sketch after this list).

* there is no executable bit.
  Solution: define a new metadata property (*).
  This should only take a small patch to libextractor.

  (*) Not sure about the correct terminology

* GNUnet sometimes inlines small files in directories,
  but strictly speaking when to do so is left up to the implementation.
  Solution: pick a fixed reference implementation.

* By default, when publishing, gnunet-publish uses libextractor
  to figure out some meta-data (e.g. title, mime-type, album name),
  which may return different meta-data depending on the implementation.

  Solution: disable the use of libextractor, at least when GNUnet is
  used by Guix.
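
For the first point, the intended ordering is nothing fancy; here is a
minimal sketch in plain Guile (the procedure name is made up, it is not
taken from GNUnet or from my port):

(use-modules (ice-9 ftw))

;; Return the entries of DIRECTORY sorted with string<?, i.e. in a
;; fixed, locale-independent order, so that the resulting GNUnet
;; directory is reproducible across machines.
(define (reproducible-directory-entries directory)
  (sort (scandir directory
                 (lambda (name)
                   (not (member name '("." "..")))))
        string<?))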

I'm currently porting the directory creation code of GNUnet to Scheme
(but not any other GNUnet code), to be used by Guix (for publishing
substitutes) and disarchive (for reconstructing GNUnet directories).

After addressing these issues, I believe I will end up with a fairly
well-defined archive format.

<friendly-footer/>
Ludovic Courtès Nov. 15, 2020, 9:13 p.m. UTC | #3
Hi Maxime,

Maxime Devos <maxime.devos@student.kuleuven.be> skribis:

> This patch defines a `gnunet-fetch' method, allowing for downloading
> files from GNUnet by their GNUnet chk-URI.

While I think this is a laudable goal, I’m reluctant to include GNUnet
support just yet because, as stated in recent release announcements,
GNUnet is still in flux and not considered “production ready”.

So I think we should keep it around and revisit this issue when GNUnet
is considered “stable”.  WDYT?

Thanks,
Ludo’.
Simon Tournier Nov. 16, 2020, 12:35 a.m. UTC | #4
Hi Maxim,

Thanks for your detailed answer.

You might be interested in the upcoming online Guix Days conference on
Sunday, November 22nd, 2020.  A session is specifically dedicated to a
related topic: how to distribute over P2P?  Please check the blog at
guix.gnu.org this week for the details.


On Tue, 27 Oct 2020 at 19:50, Maxime Devos <maxime.devos@student.kuleuven.be> wrote:

> Q: How does Guix figure out the GNUnet URI from the Guix (or nar, I
>    guess) hash?
> A: Not automatically.  The gnunet-fetch method as defined in this
>    patch needs to be passed the URI manually. However, an additional
>    service for GNUnet can be written that uses the DHT to map Guix
>    (or nar, or something else) hashes to corresponding GNUnet URI's.

From my understanding, this is a show stopper.  It has to be solved
before going further, IMHO.  It is not possible to manually write the
URI for all the packages.  And as you may have read with the
’disarchive’ project, it is not straightforward.


> Q: What about automatically generated tarballs (e.g. from git
>    repositories)?
> A: Not addressed by this patch. The intention is to be able to replace
>    a http://*/*.tar.gz URL with a gnunet://fs/chk URI in package
>    definitions; source code repositories aren't supported by this
>    patch. (But perhaps a future patch could support this!)

I think this is the main issue: we cannot afford to replace the current
http:// with gnunet:// for some packages, especially while GNUnet is
not “stable”.


> * Is package *source code* only available, on *GNUnet*?
>
>   If someone published the source code (e.g. as a tarball) on GNUnet
>   with `gnunet-publish hello-2.10.tar.gz`, it is only published(*) on
>   GNUnet, and not somewhere else as well.
>
>   (*) I don't know exactly when one can be reasonably sure the file
>   will *remain* available for some time when published, except for
>   keeping the GNUnet daemon running continuously.
>
>   However, in practice, $MAINTAINER will publish the source code
>   somewhere else as well (e.g. <https://ftp.gnu.org> or perhaps ipfs).
>   This patch doesn't automatically publish source code of built or
>   downloaded packages on GNUnet, although that seems a useful service 
>   to run as a daemon.

Therefore the corollary question is: how many tarballs currently used as
source by Guix are also available on GNUnet?


Thank you for your interest.  And again, I invite you to join the
discussion about P2P and Guix this Sunday the 22nd; the details are on
the Guix blog. :-)


All the best,
simon
Maxime Devos Nov. 18, 2020, 8:28 p.m. UTC | #5
Hi,

(btw it's Maxim*e*, not Maxim. The ‘e’ isn't pronounced but it's still
written.)

I'll try to address the various issues in separate e-mails.

zimoun wrote on Mon 16 Nov 2020 at 01:35 [+0100]:
> [snip]
> From my understanding, this is a show stopper.  It has to be solved
> before going further, IMHO.  It is not possible to manually write the
> URI for all the packages.  And as you may have read with the
> ’disarchive’ project, it is not straightforward.

I agree!  I see three possible answers to this.

a) Fancy

Write a GNUnet service that uses the DHT to map the hashes used in
origin specifications (*) to URIs for the FS system.  To let the local
contribution to the DHT survive peer restarts, maintain a database
(e.g. SQLite) of (Guix hash -> GNUnet hash) mappings (^), expanded with
each successful source (or binary) substitution or build.  (A rough
sketch of such a database follows at the end of this option.)

(Alternatively, as the DHT isn't anonymous, place hash -> GNUnet hash
references into some well-known namespace.  Then hash lookup + FS
should automatically be anonymous when desired.)

Possible issues: timeout behaviour; the DHT is not anonymous.
Annoyance: probably requires extending the build daemon.

Perhaps try regular downloads (e.g. via HTTP/S, ftp, ...) in parallel
with the GNUnet download after a configurable delay?
Perhaps use a well-known GNUnet FS namespace instead of the DHT
for anonymous downloads?

(*) Also usable for package outputs, if the hash of the output is used
and not the hash of the outputs 
(^) In case the database is full, delete some old entries
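
A very rough sketch of the database from (^), using Guile-SQLite3
(which Guix already uses for its store database); the table layout and
procedure names are made up for illustration:

(use-modules (sqlite3))

(define (open-mapping-db file)
  ;; Open (or create) the Guix hash -> GNUnet FS URI database.
  (let ((db (sqlite-open file)))
    (sqlite-exec db "
CREATE TABLE IF NOT EXISTS mappings (
  guix_hash  TEXT PRIMARY KEY,
  gnunet_uri TEXT NOT NULL);")
    db))

(define (record-mapping! db guix-hash gnunet-uri)
  ;; Remember that GUIX-HASH corresponds to GNUNET-URI.
  (let ((stmt (sqlite-prepare
               db "INSERT OR REPLACE INTO mappings VALUES (?, ?);")))
    (sqlite-bind stmt 1 guix-hash)
    (sqlite-bind stmt 2 gnunet-uri)
    (sqlite-step stmt)
    (sqlite-finalize stmt)))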

b) Simple, slow introduction (no additional GNUnet services required)

Extend (origin ...) with an optional gnunet-hash field.
Adjust ‘guix download’, ‘guix refresh’ and ‘guix import’
to emit the gnunet-hash (%) field. Plumb this field to the guix daemon
somehow. Same approach is possible for IPFS.

As packages are updated and new packages are defined, given sufficient
time, there will be more packages with a gnunet-hash field than not.

(%) Computing the gnunet-hash of a directory doesn't require
a full-fledged GNUnet installation. My scheme-gnunet repository
is not very far from the point where it can convert file trees +
libextractor metadata into bytevectors, without depending on C gnunet.

A TODO: different zlib versions would produce different bytevectors
--> different GNUnet hashes --> perhaps always use a single zlib
version.
A TODO (for nix archives on GNUnet): define
EXTRACTOR_METATYPE_EXECUTABLE (or mimetype: application/x-executable).
Perhaps use mimetype: x-inode/symlink (or something like that) as well?
Repository URL: https://notabug.org/mdevos/scheme-gnunet

c) Not scalable, but may reduce network traffic to ci.guix.gnu.org & co

Like in a) keep a database of known (Guix hash -> GNUnet FS URI).
Perhaps make this available through a web interface or git repository
... wait, this sounds familiar ... this seems to fit well into the
‘disarchive’ project!

Greetings,
Maxime
Simon Tournier Nov. 18, 2020, 10:42 p.m. UTC | #6
Hi Maxime,

On Wed, 18 Nov 2020 at 20:14, Maxime Devos <maxime.devos@student.kuleuven.be> wrote:
> Ludovic Courtès wrote on Sun 15 Nov 2020 at 22:13 [+0100]:

>> [snip]
>> While I think this is a laudable goal, I’m reluctant to include
>> GNUnet support just yet because, as stated in recent release
>> announcements, GNUnet is still in flux and not considered
>> “production ready”.
>> 
>> So I think we should keep it around and revisit this issue when
>> GNUnet is considered “stable”.  WDYT?
>
> Sounds reasonable to me.  There are also a lot of missing parts: a
> service definition for Guix System, finding substitutes, finding
> sources by hash (the one Guix uses, not the GNUnet hash), ...  So it
> isn't as if my rudimentary patch were usable at a large scale anyway.

Therefore, I am closing this issue.  Feel free to reopen once GNUnet is
considered (more) “stable”.

Thank you for your contribution.

All the best,
simon
Maxime Devos Jan. 27, 2021, 1:07 p.m. UTC | #7
Hi Matias (and Guix, which I've CC'ed),

To Matias: a follow up message will follow.

Unfortunately, I've just taken a pause from Guix+GNUnet hacking
(though I'll probably resume hacking once in a while).

Some things that work now:

* The rehash service itself seems to work
(https://notabug.org/mdevos/rehash).
This is the service where peers add SHA512<->GNUnet FS URI mappings they
discover (replace SHA512 by whatever Guix uses).

* Unless I broke anything, the ‘remirror’ service
(actually just a daemon implementing a web
server to run locally) can proxy http: downloads.
Proxying https: is a little difficult, as ‘remirror’
would need to play man-in-the-middle, but
may be implemented eventually. Or maybe guix can
be patched to (optionally) not use the CONNECT method
for proxying https: downloads.

There is no ‘offloading’ to GNUnet yet, though.

* Perhaps a better approach for substitutes:

In the ‘scheme-gnunet’ repository (https://notabug.org/mdevos/scheme-gnunet/src/master/ROADMAP.org),
I've written publish-store.scm and download-store.scm scripts, which
respectively publish a store item to GNUnet FS and download one from it
(using the gnunet-publish and gnunet-download binaries).  A simplified
sketch of the downloading side follows below.
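
Roughly, the downloading side boils down to something like this (a
simplified sketch, not the actual script; see the repository for that):

(use-modules (guix build utils))  ;for 'invoke'

(define (gnunet-download-to uri output)
  ;; Fetch URI, a gnunet://fs/chk/… URI, into the file OUTPUT by
  ;; shelling out to the gnunet-download command-line tool.
  (invoke "gnunet-download" "-o" output uri))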

It's not plugged into the guix substituter
and guix publish yet, though. I'm a bit at a loss how to do this properly,
so I'm more-or-less waiting until (a future revision of) the IPFS patch
is merged, and then I'll try to add GNUnet as ‘just another p2p system’.

Greetings, Maxime