diff mbox series

[bug#46800] Allow defining multiple substituters

Message ID c9f5535b93c0e0e832017e8f6f4ec3182fdad971.camel@telenet.be
State New
Headers show
Series [bug#46800] Allow defining multiple substituters | expand

Checks

Context Check Description
cbaines/applying patch fail View Laminar job
cbaines/issue success View issue

Commit Message

M Feb. 26, 2021, 5:41 p.m. UTC
Hi Guix,

This patch series is my suggestion for allowing
multiple "substitution methods" or "substituters"
as I call them.  Currently, only a method for HTTP/S
is defined, though I hope it will be a good basis
for a common framework for substitutes over GNUnet
and IPFS.

Extending "guix-service-type" to allow configuration
of substitution method is left for later.

Any questions, remarks?

Greetings,
Maxime

Comments

Ludovic Courtès March 2, 2021, 8:37 p.m. UTC | #1
Hi Maxime,

Maxime Devos <maximedevos@telenet.be> skribis:

> This patch series is my suggestion for allowing
> multiple "substitution methods" or "substituters"
> as I call them.  Currently, only a method for HTTP/S
> is defined, though I hope it will be a good basis
> for a common framework for substitutes over GNUnet
> and IPFS.

Thanks for working on this!

As discussed on IRC, the daemon used to have support for multiple
substituters, but as a built-in C++ interface, which I removed in
f6919ebdc6b0ce0286814cc6ab0564b1a4c67f5f.

The Scheme interface you propose is of course nicer :-), but I’m still
not sure it’s necessary.  For example, in the IPFS prototype at
<https://issues.guix.gnu.org/33899>, IPFS support goes hand in hand with
HTTP support: narinfos are retrieved over HTTP and nars can be retrieved
over IPFS, or HTTP.  Likewise with “digests”:
<https://lists.gnu.org/archive/html/guix-devel/2021-01/msg00080.html>.

Another issue is that it may be that, instead of letting users choose
one method and stick to it, we’d rather let them choose a policy that
can automatically pick the “best” method, dynamically adjusting choices.

All in all, I would prefer to wait until there’s a clear need for this
abstraction.

WDYT?

Thanks,
Ludo’.
M March 4, 2021, 7:48 a.m. UTC | #2
On Tue, 2021-03-02 at 21:37 +0100, Ludovic Courtès wrote:
> Hi Maxime,
> 
> Maxime Devos <maximedevos@telenet.be> skribis:
> 
> > This patch series is my suggestion for allowing
> > multiple "substitution methods" or "substituters"
> > as I call them.  Currently, only a method for HTTP/S
> > is defined, though I hope it will be a good basis
> > for a common framework for substitutes over GNUnet
> > and IPFS.
> 
> Thanks for working on this!
> 
> As discussed on IRC, the daemon used to have support for multiple
> substituters, but as a built-in C++ interface, which I removed in
> f6919ebdc6b0ce0286814cc6ab0564b1a4c67f5f.

Was there any particular reason this support was removed, beyond
moving from C++ to Scheme and the absence of any alternative substituters?

> The Scheme interface you propose is of course nicer :-), but I’m still
> not sure it’s necessary.  For example, in the IPFS prototype at
> <https://issues.guix.gnu.org/33899>;, IPFS support goes hand in hand with
> HTTP support: narinfos are retrieved over HTTP and nars can be retrieved
> over IPFS, or HTTP.

About X going hand-in-hand with Y:

Note that fetching narinfos, or fetching the nar itself are separated
A method can support both procedures, or just one of them (or none,
but that's rather useless.)

Users (well, the system administrator) can choose multiple methods, which
will be each fetch narinfos after each other & combine the
results into
one large list (or maybe some other data structure, I don't recall the
details), and each substituter will be asked to produce
a nar until a substituter
succeeds or all have said "sorry, I don't have that nar".

(That's different from C++ interface for multiple substituters I think, where
the methods are only tried sequentialy, they aren't combined.)

In case of IPFS, the idea is that *both* the IPFS and HTTP substituter are
enabled, in that order: "--substitute-methods=ipfs http".  The IPFS substituter
won't be able to produce any narinfos by itself, but that's no problem as
the HTTP substituter can find some.  Then, the IPFS substituter will be asked
first to download a substitute, as it's first in the "--substitute-methods" list.

And what if the narinfo doesn't have a IPFS URI, as the substitute server doesn't
support that?  Then "guix substitute" automatically fall-backs to HTTP.

Summary: some substitution methods can't do everything on their own, but that's ok,
as "guix substitute" will just ask them to try what they can and will see if some
combination of methods works.

About ‘not sure it's necessary’: there presumably will be a GNUnet substituter
at some point.  I suppose it would be possible to define all substitute methods
in (guix scripts substitute), but then you would still end up with a procedure
that tries all methods (e.g. in wip-ipfs-substitutes, process-substitution has
an "if" structures that tries downloading via IPFS with fall-back to HTTP; this
would become a (cond (ipfs? ipfs-code) (gnunet? gnunet-code) (#t http-code?))

Note that there's (guix scripts import X) and (guix build-system X).

> Likewise with “digests”: <https://lists.gnu.org/archive/html/guix-devel/2021-01/msg00080.html>;.

I haven't taken a close look at this yet before (I haven't been around guix
development for long).  To me, this seems compatible with this patch actually.
The HTTP substituter's procedure for downloading the substitute itself
(process-substitution/http in my patch) could be split in two, and look
at the narinfo to see whether the 'digest' or the usual mechanism should be used.

Alternatively, one could define *two* substituters: the ‘standard’ http substituter
‘http’, and the ‘http-digest’ substituter that can't fetch narinfo's, but rather
is an alternative method for downloading the substitute.  The daemon can be started
with "--substitute-methods http-digest http" to prefer downloading via the ‘http-digest’
method when possible, but uses ‘http’ for the narinfos and as a fallback for when the
narinfo does not have a digest.

But what if a non-HTTP substituter wants to use digests?  Well, I don't know any such
substituters (-:.  But for the (hypothetical) GNUnet substituter & the wip IPFS
substituter, I don't think they will use the digests code.

> Another issue is that it may be that, instead of letting users choose
> one method and stick to it,

They (at least the system administrator) can choose a list of substituters,
see above.

>  we’d rather let them choose a policy that
> can automatically pick the “best” method, dynamically adjusting choices.

Who's the user here?
(a) the system administrator, who configuring the daemon to use a certain
    list of substituters and defines a default list of substitute uris.
(b) the ‘user’, that doesn't directly have the capability to modify
    the system's guix daemon (or possibly an administrator that wants to
    to test some things out without the possibility of accidentally messing
    up the ‘real’ system).

If (b), I think it would be ideal to give the (unprivileged) user the
possibility of using their own substituter(s) (under their own capabilities,
not root), albeit at the cost of the guix daemon having to verify the narhash
& narinfo signature.

That could be implemented as a separate patch (though this patch would need
to be rebased then).  WDYT?  Would be useful for developing new substituters
and testing them, I think.

About *automatically* dynamically adjusting choices: would be nice, but how is
this supposed to work?  Any ideas?  The only thing I could think of is a
allowing the user to choose which narinfo to use (e.g. from the list of found
narinfos try to choose a narinfo that has an IPFS URI).

Also, for (a) the shepherd service could use a "set-substitute-methods" option,
and perhaps the user (b) could be allowed to select a subset of these substitute
methods to use when running "guix build PACKAGE" and the like (but only a subset,
as "guix substitute" when invoked by the daemon runs as root and therefore the
potential attack surface shouldn't be increased beyond what the administrator
allows).

> All in all, I would prefer to wait until there’s a clear need for this
> abstraction.

See above responses.

WDYT?

Thanks,
Maxime.
M March 5, 2021, 8:05 p.m. UTC | #3
On Tue, 2021-03-02 at 21:37 +0100, Ludovic Courtès wrote:
> Hi Maxime,
> 
> Maxime Devos <maximedevos@telenet.be> skribis:
> 
> > This patch series is my suggestion for allowing
> > multiple "substitution methods" or "substituters"
> > as I call them.  Currently, only a method for HTTP/S
> > is defined, though I hope it will be a good basis
> > for a common framework for substitutes over GNUnet
> > and IPFS.
> 
> [Ludovic's reply]

(See previous mail for my responses)

FYI: I've implemented a GNUnet substituter using this patch series
and the "publish hooks" patch (+ an unsubmitted patch that passes
some extra information to the publish hook) here:

https://notabug.org/mdevos/guix-gnunet/src/download-hooks3

(Warning: it does some questionable things with add-to-load-path.
Will hopefully be fixed eventually.  Also requires
<https://notabug.org/mdevos/scheme-gnunet> in a special location.)

Also, there's a bug in fetch-narinfos that causes an error if
the "fetch-narinfos" field of a subtituter is #f.  Also,
recognised-uri-scheme should be removed or reworked, as otherwise
the IPFS and GNUnet substituter won't be used for downloading a
substitute if http and https are not in the list.

I've worked around that for now by setting the latter '(http https file),
and setting the former to (const '()).

Greetings,
Maxime.
Ludovic Courtès March 12, 2021, 5:37 p.m. UTC | #4
Hi Maxime,

Maxime Devos <maximedevos@telenet.be> skribis:

> On Tue, 2021-03-02 at 21:37 +0100, Ludovic Courtès wrote:

[...]

>> As discussed on IRC, the daemon used to have support for multiple
>> substituters, but as a built-in C++ interface, which I removed in
>> f6919ebdc6b0ce0286814cc6ab0564b1a4c67f5f.
>
> Was there any particular reason this support was removed, beyond
> moving from C++ to Scheme and the absence of any alternative substituters?

These were the main reasons, yes.

>> The Scheme interface you propose is of course nicer :-), but I’m still
>> not sure it’s necessary.  For example, in the IPFS prototype at
>> <https://issues.guix.gnu.org/33899>;, IPFS support goes hand in hand with
>> HTTP support: narinfos are retrieved over HTTP and nars can be retrieved
>> over IPFS, or HTTP.
>
> About X going hand-in-hand with Y:
>
> Note that fetching narinfos, or fetching the nar itself are separated
> A method can support both procedures, or just one of them (or none,
> but that's rather useless.)
>
> Users (well, the system administrator) can choose multiple methods, which
> will be each fetch narinfos after each other & combine the
> results into
> one large list (or maybe some other data structure, I don't recall the
> details), and each substituter will be asked to produce
> a nar until a substituter
> succeeds or all have said "sorry, I don't have that nar".

OK.

> (That's different from C++ interface for multiple substituters I think, where
> the methods are only tried sequentialy, they aren't combined.)
>
> In case of IPFS, the idea is that *both* the IPFS and HTTP substituter are
> enabled, in that order: "--substitute-methods=ipfs http".  The IPFS substituter
> won't be able to produce any narinfos by itself, but that's no problem as
> the HTTP substituter can find some.  Then, the IPFS substituter will be asked
> first to download a substitute, as it's first in the "--substitute-methods" list.
>
> And what if the narinfo doesn't have a IPFS URI, as the substitute server doesn't
> support that?  Then "guix substitute" automatically fall-backs to HTTP.
>
> Summary: some substitution methods can't do everything on their own, but that's ok,
> as "guix substitute" will just ask them to try what they can and will see if some
> combination of methods works.

Alright.

> About ‘not sure it's necessary’: there presumably will be a GNUnet substituter
> at some point.  I suppose it would be possible to define all substitute methods
> in (guix scripts substitute), but then you would still end up with a procedure
> that tries all methods (e.g. in wip-ipfs-substitutes, process-substitution has
> an "if" structures that tries downloading via IPFS with fall-back to HTTP; this
> would become a (cond (ipfs? ipfs-code) (gnunet? gnunet-code) (#t http-code?))

I guess considerations that are more important to me (and to users, I
suppose) now than a few years back are maintainability and robustness.

Concretely, I wouldn’t want Guix to offer out of the box 4 methods, 3 of
which perform poorly or are downright buggy.  I think it would be more
fruitful if, as a project, we would focus on one or two methods or
method combinations that we have battle-tested, perform well, and a nice
long-term maintenance story, and so on.

[...]

>>  we’d rather let them choose a policy that
>> can automatically pick the “best” method, dynamically adjusting choices.
>
> Who's the user here?
> (a) the system administrator, who configuring the daemon to use a certain
>     list of substituters and defines a default list of substitute uris.
> (b) the ‘user’, that doesn't directly have the capability to modify
>     the system's guix daemon (or possibly an administrator that wants to
>     to test some things out without the possibility of accidentally messing
>     up the ‘real’ system).

I think (b) should be possible, just like users can pass
‘--substitute-urls’.

[...]

> About *automatically* dynamically adjusting choices: would be nice, but how is
> this supposed to work?  Any ideas?  The only thing I could think of is a
> allowing the user to choose which narinfo to use (e.g. from the list of found
> narinfos try to choose a narinfo that has an IPFS URI).

I think it’ll have to be fine-tuned once we have several stable
substitute methods.  After all, we have yet to figure out how to choose
between zstd and lzip for the current substitution mechanism; the
tradeoffs when very different methods are in use may be more complex!

>> All in all, I would prefer to wait until there’s a clear need for this
>> abstraction.
>
> See above responses.

I don’t think my concerns are really addressed :-), but at the same time
I think we need a playground for these things so we can actually grow
new substitute methods like those you’ve been looking at.  Hmmm tricky!

Ludo’.
Tony O June 6, 2021, 5:52 p.m. UTC | #5
Hi, any news on this patch?

Thanks,
ix
diff mbox series

Patch

From b0b3f3e339ff834fd973f2f0f0bc7ad9be6ffd04 Mon Sep 17 00:00:00 2001
From: Maxime Devos <maximedevos@telenet.be>
Date: Fri, 26 Feb 2021 15:30:04 +0100
Subject: [PATCH 4/4] =?UTF-8?q?substitute:=20Unstub=20=E2=80=98verify-hash?=
 =?UTF-8?q?/unknown=E2=80=99.?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This procedure is called if a substitution method
returns 'unpacked'.  While no method does that yet,
it is expected the IPFS and GNUnet substituter will.

* guix/scripts/substitute.scm
  (verify-hash/unknown): Unstub procedure.
  (process-substitution): When the substituter
  returns ‘unpacked’, verify whether we got the right
  substitute and canonicalize permissions and timestamps.
* doc/substituters.texi (Defining Substituters): Document the
  absence of a requirement for substituters to normalize timestamps
  and file permissions.
* tests/substitute.scm (write-string-as-nar): Define procedure,
  and use in test cases.  Also test that the hash is verified
  when a substituter returns 'unpacked'.
---
 doc/substituters.texi       |  4 +++
 guix/scripts/substitute.scm | 43 +++++++++++++++++++----
 tests/substitute.scm        | 70 ++++++++++++++++++++++++++++++++-----
 3 files changed, 102 insertions(+), 15 deletions(-)

diff --git a/doc/substituters.texi b/doc/substituters.texi
index 516e2c1eea..65b1a929b0 100644
--- a/doc/substituters.texi
+++ b/doc/substituters.texi
@@ -47,4 +47,8 @@  are correctly signed and have a correct hash; this is handled
 by @code{(guix scripts substitute)}.  @var{nar-downloader} and
 @var{fetch-narinfos} can be @code{#f} if unimplemented by this
 substituter.
+
+Likewise, when returning @code{unpacked}, @var{nar-downloader}
+does not need to normalize timestamps and file permissions.
+
 @end deffn
diff --git a/guix/scripts/substitute.scm b/guix/scripts/substitute.scm
index 78345dce8f..90fd7dd021 100755
--- a/guix/scripts/substitute.scm
+++ b/guix/scripts/substitute.scm
@@ -31,8 +31,10 @@ 
   #:use-module (guix records)
   #:use-module (guix diagnostics)
   #:use-module (guix i18n)
-  #:use-module ((guix serialization) #:select (restore-file dump-file))
+  #:use-module ((guix serialization)
+                #:select (restore-file write-file dump-file))
   #:autoload   (guix store deduplication) (dump-file/deduplicate)
+  #:autoload   (guix store database) (reset-timestamps)
   #:autoload   (guix scripts discover) (read-substitute-urls)
   #:autoload   (guix scripts substitute http) (open-connection-for-uri/cached)
   #:use-module (gcrypt hash)
@@ -49,6 +51,7 @@ 
   #:use-module ((guix build syscalls)
                 #:select (set-thread-name))
   #:use-module (ice-9 rdelim)
+  #:use-module (ice-9 receive)
   #:use-module (ice-9 regex)
   #:use-module (ice-9 match)
   #:use-module (ice-9 format)
@@ -531,11 +534,20 @@  to the narinfo."
               (bytevector->nix-base32-string expected)
               (bytevector->nix-base32-string actual))))
 
-(define (verify-hash/unknown . rest)
-  ;; Variant of verify-hash where the hash hasn't yet been computed.
-  ;; TODO: this will be implemented later in the patch series!
-  ;; (To be used by the IPFS and GNUnet substituter)
-  (leave (G_ "TODO verify-hash/unknown is unimplemented~%")))
+(define* (verify-hash/unknown file expected algorithm narinfo
+                              #:key thunk)
+  "Check whether we got the data announced in the narinfo NARINFO.
+FILE is the actual file we got and EXPECTED is the hash according
+to the narinfo.  Call THUNK after FILE was read, but before
+the daemon is informed."
+  ;; Recreate the nar, hash it, and let verify-hash
+  ;; produce the 'success' or 'hash-mismatch' output.
+  (receive (hashed get-hash)
+      (open-hash-port algorithm)
+    (write-file file hashed)
+    (close hashed)
+    (thunk)
+    (verify-hash (get-hash) expected algorithm narinfo)))
 
 (define-syntax-rule (receive* kwargs exp exp* exp** ...)
   (call-with-values (lambda () exp)
@@ -606,7 +618,24 @@  the current output port."
              (when after-input-close
                (after-input-close))
              ;; Check whether we got the data announced in the NARINFO.
-             (verify-hash/unknown destination narinfo))
+             (receive (algorithm expected)
+                 (narinfo-hash-algorithm+value narinfo)
+               (verify-hash/unknown
+                destination expected algorithm narinfo
+                ;; Make sure the permissions and timestamps are canonical.
+                ;;
+                ;; This could theoretically be done somewhat more
+                ;; cache-friendly if done in the substitution method,
+                ;; by canonicalising a file right after it has been
+                ;; downloaded, but let's try for correctness first
+                ;; before efficiency.
+                ;;
+                ;; Also, this must be done *after* verifying the hash,
+                ;; in order to make the access time is set correctly.
+                ;;
+                ;; TODO it would be nice to deduplicate DESTINATION.
+                #:thunk
+                (lambda () (reset-timestamps destination)))))
             (else
              (format (current-error-port) "~s~%" input)
              (leave
diff --git a/tests/substitute.scm b/tests/substitute.scm
index 6c754f774d..1cb1a10402 100644
--- a/tests/substitute.scm
+++ b/tests/substitute.scm
@@ -324,6 +324,16 @@  System: mips64el-linux\n")
       (lambda ()
         (guix-substitute "--substitute")))))
 
+(define (write-string-as-nar port content)
+  (write-file-tree "foo" port
+                   #:file-type+size
+                   (lambda _
+                     (values 'regular
+                             (string-length content)))
+                   #:file-port
+                   (lambda _
+                     (open-input-string content))))
+
 (test-equal "substitute, invalid hash"
   (string-append "hash-mismatch sha256 "
                  (bytevector->nix-base32-string (sha256 #vu8())) " "
@@ -331,14 +341,7 @@  System: mips64el-linux\n")
                                (open-hash-port (hash-algorithm sha256)))
                               ((content)
                                "Substitutable data."))
-                   (write-file-tree "foo" port
-                                    #:file-type+size
-                                    (lambda _
-                                      (values 'regular
-                                              (string-length content)))
-                                    #:file-port
-                                    (lambda _
-                                      (open-input-string content)))
+                   (write-string-as-nar port content)
                    (close-port port)
                    (bytevector->nix-base32-string (get-hash)))
                  "\n")
@@ -367,6 +370,57 @@  System: mips64el-linux\n")))
                (lambda ()
                  (guix-substitute "--substitute"))))))))))
 
+(test-equal "substitute (unpacked), invalid hash"
+  (string-append "hash-mismatch sha256 "
+                 (bytevector->nix-base32-string (sha256 #vu8())) " "
+                 (let-values (((port get-hash)
+                               (open-hash-port (hash-algorithm sha256)))
+                              ((content) "Wrong data!"))
+                   (write-string-as-nar port content)
+                   (close-port port)
+                   (bytevector->nix-base32-string (get-hash)))
+                 "\n")
+  (with-output-to-string
+    (lambda ()
+      ;; Arrange so the actual data hash does not match the 'NarHash' field in the
+      ;; narinfo.  Use a substituter that does not produce a nar, but rather
+      ;; writes the item to the store by itself.
+      (define narinfo
+        ;; Pretend this hash and size actually correspond to
+        ;; some nar.
+        (string-append "StorePath: " (%store-prefix)
+                       "/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-wrong-hash
+URL: irrelevant
+Compression: none
+NarHash: sha256:" (bytevector->nix-base32-string (sha256 #vu8())) "
+NarSize: 123
+References:
+Deriver: " (%store-prefix) "/foo.drv
+System: mips64el-linux\n"))
+      (parameterize ((substituters
+                      (list
+                       (make-substituter
+                        'test
+                        (lambda (destination . rest)
+                          (call-with-output-file destination
+                            (cut display "Wrong data!" <>))
+                          'unpacked)
+                        (const
+                         (list
+                          (string->narinfo
+                           (string-append narinfo "Signature: "
+                                          (signature-field narinfo)
+                                          "\n")
+                           "test://")))
+                        '(test))))
+                     (substitute-urls '("test://")))
+        (call-with-temporary-directory
+         (lambda (directory)
+           (request-substitution
+            (string-append (%store-prefix)
+                           "/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-foo")
+            (string-append directory "/wrong-hash"))))))))
+
 (test-quit "substitute, unauthorized key"
     "no valid substitute"
   (with-narinfo (string-append %narinfo "Signature: "
-- 
2.30.0