Message ID | c9f5535b93c0e0e832017e8f6f4ec3182fdad971.camel@telenet.be |
---|---|
State | New |
Headers | show |
Series | [bug#46800] Allow defining multiple substituters | expand |
Context | Check | Description |
---|---|---|
cbaines/applying patch | fail | View Laminar job |
cbaines/issue | success | View issue |
Hi Maxime, Maxime Devos <maximedevos@telenet.be> skribis: > This patch series is my suggestion for allowing > multiple "substitution methods" or "substituters" > as I call them. Currently, only a method for HTTP/S > is defined, though I hope it will be a good basis > for a common framework for substitutes over GNUnet > and IPFS. Thanks for working on this! As discussed on IRC, the daemon used to have support for multiple substituters, but as a built-in C++ interface, which I removed in f6919ebdc6b0ce0286814cc6ab0564b1a4c67f5f. The Scheme interface you propose is of course nicer :-), but I’m still not sure it’s necessary. For example, in the IPFS prototype at <https://issues.guix.gnu.org/33899>, IPFS support goes hand in hand with HTTP support: narinfos are retrieved over HTTP and nars can be retrieved over IPFS, or HTTP. Likewise with “digests”: <https://lists.gnu.org/archive/html/guix-devel/2021-01/msg00080.html>. Another issue is that it may be that, instead of letting users choose one method and stick to it, we’d rather let them choose a policy that can automatically pick the “best” method, dynamically adjusting choices. All in all, I would prefer to wait until there’s a clear need for this abstraction. WDYT? Thanks, Ludo’.
On Tue, 2021-03-02 at 21:37 +0100, Ludovic Courtès wrote: > Hi Maxime, > > Maxime Devos <maximedevos@telenet.be> skribis: > > > This patch series is my suggestion for allowing > > multiple "substitution methods" or "substituters" > > as I call them. Currently, only a method for HTTP/S > > is defined, though I hope it will be a good basis > > for a common framework for substitutes over GNUnet > > and IPFS. > > Thanks for working on this! > > As discussed on IRC, the daemon used to have support for multiple > substituters, but as a built-in C++ interface, which I removed in > f6919ebdc6b0ce0286814cc6ab0564b1a4c67f5f. Was there any particular reason this support was removed, beyond moving from C++ to Scheme and the absence of any alternative substituters? > The Scheme interface you propose is of course nicer :-), but I’m still > not sure it’s necessary. For example, in the IPFS prototype at > <https://issues.guix.gnu.org/33899>;, IPFS support goes hand in hand with > HTTP support: narinfos are retrieved over HTTP and nars can be retrieved > over IPFS, or HTTP. About X going hand-in-hand with Y: Note that fetching narinfos, or fetching the nar itself are separated A method can support both procedures, or just one of them (or none, but that's rather useless.) Users (well, the system administrator) can choose multiple methods, which will be each fetch narinfos after each other & combine the results into one large list (or maybe some other data structure, I don't recall the details), and each substituter will be asked to produce a nar until a substituter succeeds or all have said "sorry, I don't have that nar". (That's different from C++ interface for multiple substituters I think, where the methods are only tried sequentialy, they aren't combined.) In case of IPFS, the idea is that *both* the IPFS and HTTP substituter are enabled, in that order: "--substitute-methods=ipfs http". The IPFS substituter won't be able to produce any narinfos by itself, but that's no problem as the HTTP substituter can find some. Then, the IPFS substituter will be asked first to download a substitute, as it's first in the "--substitute-methods" list. And what if the narinfo doesn't have a IPFS URI, as the substitute server doesn't support that? Then "guix substitute" automatically fall-backs to HTTP. Summary: some substitution methods can't do everything on their own, but that's ok, as "guix substitute" will just ask them to try what they can and will see if some combination of methods works. About ‘not sure it's necessary’: there presumably will be a GNUnet substituter at some point. I suppose it would be possible to define all substitute methods in (guix scripts substitute), but then you would still end up with a procedure that tries all methods (e.g. in wip-ipfs-substitutes, process-substitution has an "if" structures that tries downloading via IPFS with fall-back to HTTP; this would become a (cond (ipfs? ipfs-code) (gnunet? gnunet-code) (#t http-code?)) Note that there's (guix scripts import X) and (guix build-system X). > Likewise with “digests”: <https://lists.gnu.org/archive/html/guix-devel/2021-01/msg00080.html>;. I haven't taken a close look at this yet before (I haven't been around guix development for long). To me, this seems compatible with this patch actually. The HTTP substituter's procedure for downloading the substitute itself (process-substitution/http in my patch) could be split in two, and look at the narinfo to see whether the 'digest' or the usual mechanism should be used. Alternatively, one could define *two* substituters: the ‘standard’ http substituter ‘http’, and the ‘http-digest’ substituter that can't fetch narinfo's, but rather is an alternative method for downloading the substitute. The daemon can be started with "--substitute-methods http-digest http" to prefer downloading via the ‘http-digest’ method when possible, but uses ‘http’ for the narinfos and as a fallback for when the narinfo does not have a digest. But what if a non-HTTP substituter wants to use digests? Well, I don't know any such substituters (-:. But for the (hypothetical) GNUnet substituter & the wip IPFS substituter, I don't think they will use the digests code. > Another issue is that it may be that, instead of letting users choose > one method and stick to it, They (at least the system administrator) can choose a list of substituters, see above. > we’d rather let them choose a policy that > can automatically pick the “best” method, dynamically adjusting choices. Who's the user here? (a) the system administrator, who configuring the daemon to use a certain list of substituters and defines a default list of substitute uris. (b) the ‘user’, that doesn't directly have the capability to modify the system's guix daemon (or possibly an administrator that wants to to test some things out without the possibility of accidentally messing up the ‘real’ system). If (b), I think it would be ideal to give the (unprivileged) user the possibility of using their own substituter(s) (under their own capabilities, not root), albeit at the cost of the guix daemon having to verify the narhash & narinfo signature. That could be implemented as a separate patch (though this patch would need to be rebased then). WDYT? Would be useful for developing new substituters and testing them, I think. About *automatically* dynamically adjusting choices: would be nice, but how is this supposed to work? Any ideas? The only thing I could think of is a allowing the user to choose which narinfo to use (e.g. from the list of found narinfos try to choose a narinfo that has an IPFS URI). Also, for (a) the shepherd service could use a "set-substitute-methods" option, and perhaps the user (b) could be allowed to select a subset of these substitute methods to use when running "guix build PACKAGE" and the like (but only a subset, as "guix substitute" when invoked by the daemon runs as root and therefore the potential attack surface shouldn't be increased beyond what the administrator allows). > All in all, I would prefer to wait until there’s a clear need for this > abstraction. See above responses. WDYT? Thanks, Maxime.
On Tue, 2021-03-02 at 21:37 +0100, Ludovic Courtès wrote: > Hi Maxime, > > Maxime Devos <maximedevos@telenet.be> skribis: > > > This patch series is my suggestion for allowing > > multiple "substitution methods" or "substituters" > > as I call them. Currently, only a method for HTTP/S > > is defined, though I hope it will be a good basis > > for a common framework for substitutes over GNUnet > > and IPFS. > > [Ludovic's reply] (See previous mail for my responses) FYI: I've implemented a GNUnet substituter using this patch series and the "publish hooks" patch (+ an unsubmitted patch that passes some extra information to the publish hook) here: https://notabug.org/mdevos/guix-gnunet/src/download-hooks3 (Warning: it does some questionable things with add-to-load-path. Will hopefully be fixed eventually. Also requires <https://notabug.org/mdevos/scheme-gnunet> in a special location.) Also, there's a bug in fetch-narinfos that causes an error if the "fetch-narinfos" field of a subtituter is #f. Also, recognised-uri-scheme should be removed or reworked, as otherwise the IPFS and GNUnet substituter won't be used for downloading a substitute if http and https are not in the list. I've worked around that for now by setting the latter '(http https file), and setting the former to (const '()). Greetings, Maxime.
Hi Maxime, Maxime Devos <maximedevos@telenet.be> skribis: > On Tue, 2021-03-02 at 21:37 +0100, Ludovic Courtès wrote: [...] >> As discussed on IRC, the daemon used to have support for multiple >> substituters, but as a built-in C++ interface, which I removed in >> f6919ebdc6b0ce0286814cc6ab0564b1a4c67f5f. > > Was there any particular reason this support was removed, beyond > moving from C++ to Scheme and the absence of any alternative substituters? These were the main reasons, yes. >> The Scheme interface you propose is of course nicer :-), but I’m still >> not sure it’s necessary. For example, in the IPFS prototype at >> <https://issues.guix.gnu.org/33899>;, IPFS support goes hand in hand with >> HTTP support: narinfos are retrieved over HTTP and nars can be retrieved >> over IPFS, or HTTP. > > About X going hand-in-hand with Y: > > Note that fetching narinfos, or fetching the nar itself are separated > A method can support both procedures, or just one of them (or none, > but that's rather useless.) > > Users (well, the system administrator) can choose multiple methods, which > will be each fetch narinfos after each other & combine the > results into > one large list (or maybe some other data structure, I don't recall the > details), and each substituter will be asked to produce > a nar until a substituter > succeeds or all have said "sorry, I don't have that nar". OK. > (That's different from C++ interface for multiple substituters I think, where > the methods are only tried sequentialy, they aren't combined.) > > In case of IPFS, the idea is that *both* the IPFS and HTTP substituter are > enabled, in that order: "--substitute-methods=ipfs http". The IPFS substituter > won't be able to produce any narinfos by itself, but that's no problem as > the HTTP substituter can find some. Then, the IPFS substituter will be asked > first to download a substitute, as it's first in the "--substitute-methods" list. > > And what if the narinfo doesn't have a IPFS URI, as the substitute server doesn't > support that? Then "guix substitute" automatically fall-backs to HTTP. > > Summary: some substitution methods can't do everything on their own, but that's ok, > as "guix substitute" will just ask them to try what they can and will see if some > combination of methods works. Alright. > About ‘not sure it's necessary’: there presumably will be a GNUnet substituter > at some point. I suppose it would be possible to define all substitute methods > in (guix scripts substitute), but then you would still end up with a procedure > that tries all methods (e.g. in wip-ipfs-substitutes, process-substitution has > an "if" structures that tries downloading via IPFS with fall-back to HTTP; this > would become a (cond (ipfs? ipfs-code) (gnunet? gnunet-code) (#t http-code?)) I guess considerations that are more important to me (and to users, I suppose) now than a few years back are maintainability and robustness. Concretely, I wouldn’t want Guix to offer out of the box 4 methods, 3 of which perform poorly or are downright buggy. I think it would be more fruitful if, as a project, we would focus on one or two methods or method combinations that we have battle-tested, perform well, and a nice long-term maintenance story, and so on. [...] >> we’d rather let them choose a policy that >> can automatically pick the “best” method, dynamically adjusting choices. > > Who's the user here? > (a) the system administrator, who configuring the daemon to use a certain > list of substituters and defines a default list of substitute uris. > (b) the ‘user’, that doesn't directly have the capability to modify > the system's guix daemon (or possibly an administrator that wants to > to test some things out without the possibility of accidentally messing > up the ‘real’ system). I think (b) should be possible, just like users can pass ‘--substitute-urls’. [...] > About *automatically* dynamically adjusting choices: would be nice, but how is > this supposed to work? Any ideas? The only thing I could think of is a > allowing the user to choose which narinfo to use (e.g. from the list of found > narinfos try to choose a narinfo that has an IPFS URI). I think it’ll have to be fine-tuned once we have several stable substitute methods. After all, we have yet to figure out how to choose between zstd and lzip for the current substitution mechanism; the tradeoffs when very different methods are in use may be more complex! >> All in all, I would prefer to wait until there’s a clear need for this >> abstraction. > > See above responses. I don’t think my concerns are really addressed :-), but at the same time I think we need a playground for these things so we can actually grow new substitute methods like those you’ve been looking at. Hmmm tricky! Ludo’.
Hi, any news on this patch? Thanks, ix
From b0b3f3e339ff834fd973f2f0f0bc7ad9be6ffd04 Mon Sep 17 00:00:00 2001 From: Maxime Devos <maximedevos@telenet.be> Date: Fri, 26 Feb 2021 15:30:04 +0100 Subject: [PATCH 4/4] =?UTF-8?q?substitute:=20Unstub=20=E2=80=98verify-hash?= =?UTF-8?q?/unknown=E2=80=99.?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This procedure is called if a substitution method returns 'unpacked'. While no method does that yet, it is expected the IPFS and GNUnet substituter will. * guix/scripts/substitute.scm (verify-hash/unknown): Unstub procedure. (process-substitution): When the substituter returns ‘unpacked’, verify whether we got the right substitute and canonicalize permissions and timestamps. * doc/substituters.texi (Defining Substituters): Document the absence of a requirement for substituters to normalize timestamps and file permissions. * tests/substitute.scm (write-string-as-nar): Define procedure, and use in test cases. Also test that the hash is verified when a substituter returns 'unpacked'. --- doc/substituters.texi | 4 +++ guix/scripts/substitute.scm | 43 +++++++++++++++++++---- tests/substitute.scm | 70 ++++++++++++++++++++++++++++++++----- 3 files changed, 102 insertions(+), 15 deletions(-) diff --git a/doc/substituters.texi b/doc/substituters.texi index 516e2c1eea..65b1a929b0 100644 --- a/doc/substituters.texi +++ b/doc/substituters.texi @@ -47,4 +47,8 @@ are correctly signed and have a correct hash; this is handled by @code{(guix scripts substitute)}. @var{nar-downloader} and @var{fetch-narinfos} can be @code{#f} if unimplemented by this substituter. + +Likewise, when returning @code{unpacked}, @var{nar-downloader} +does not need to normalize timestamps and file permissions. + @end deffn diff --git a/guix/scripts/substitute.scm b/guix/scripts/substitute.scm index 78345dce8f..90fd7dd021 100755 --- a/guix/scripts/substitute.scm +++ b/guix/scripts/substitute.scm @@ -31,8 +31,10 @@ #:use-module (guix records) #:use-module (guix diagnostics) #:use-module (guix i18n) - #:use-module ((guix serialization) #:select (restore-file dump-file)) + #:use-module ((guix serialization) + #:select (restore-file write-file dump-file)) #:autoload (guix store deduplication) (dump-file/deduplicate) + #:autoload (guix store database) (reset-timestamps) #:autoload (guix scripts discover) (read-substitute-urls) #:autoload (guix scripts substitute http) (open-connection-for-uri/cached) #:use-module (gcrypt hash) @@ -49,6 +51,7 @@ #:use-module ((guix build syscalls) #:select (set-thread-name)) #:use-module (ice-9 rdelim) + #:use-module (ice-9 receive) #:use-module (ice-9 regex) #:use-module (ice-9 match) #:use-module (ice-9 format) @@ -531,11 +534,20 @@ to the narinfo." (bytevector->nix-base32-string expected) (bytevector->nix-base32-string actual)))) -(define (verify-hash/unknown . rest) - ;; Variant of verify-hash where the hash hasn't yet been computed. - ;; TODO: this will be implemented later in the patch series! - ;; (To be used by the IPFS and GNUnet substituter) - (leave (G_ "TODO verify-hash/unknown is unimplemented~%"))) +(define* (verify-hash/unknown file expected algorithm narinfo + #:key thunk) + "Check whether we got the data announced in the narinfo NARINFO. +FILE is the actual file we got and EXPECTED is the hash according +to the narinfo. Call THUNK after FILE was read, but before +the daemon is informed." + ;; Recreate the nar, hash it, and let verify-hash + ;; produce the 'success' or 'hash-mismatch' output. + (receive (hashed get-hash) + (open-hash-port algorithm) + (write-file file hashed) + (close hashed) + (thunk) + (verify-hash (get-hash) expected algorithm narinfo))) (define-syntax-rule (receive* kwargs exp exp* exp** ...) (call-with-values (lambda () exp) @@ -606,7 +618,24 @@ the current output port." (when after-input-close (after-input-close)) ;; Check whether we got the data announced in the NARINFO. - (verify-hash/unknown destination narinfo)) + (receive (algorithm expected) + (narinfo-hash-algorithm+value narinfo) + (verify-hash/unknown + destination expected algorithm narinfo + ;; Make sure the permissions and timestamps are canonical. + ;; + ;; This could theoretically be done somewhat more + ;; cache-friendly if done in the substitution method, + ;; by canonicalising a file right after it has been + ;; downloaded, but let's try for correctness first + ;; before efficiency. + ;; + ;; Also, this must be done *after* verifying the hash, + ;; in order to make the access time is set correctly. + ;; + ;; TODO it would be nice to deduplicate DESTINATION. + #:thunk + (lambda () (reset-timestamps destination))))) (else (format (current-error-port) "~s~%" input) (leave diff --git a/tests/substitute.scm b/tests/substitute.scm index 6c754f774d..1cb1a10402 100644 --- a/tests/substitute.scm +++ b/tests/substitute.scm @@ -324,6 +324,16 @@ System: mips64el-linux\n") (lambda () (guix-substitute "--substitute"))))) +(define (write-string-as-nar port content) + (write-file-tree "foo" port + #:file-type+size + (lambda _ + (values 'regular + (string-length content))) + #:file-port + (lambda _ + (open-input-string content)))) + (test-equal "substitute, invalid hash" (string-append "hash-mismatch sha256 " (bytevector->nix-base32-string (sha256 #vu8())) " " @@ -331,14 +341,7 @@ System: mips64el-linux\n") (open-hash-port (hash-algorithm sha256))) ((content) "Substitutable data.")) - (write-file-tree "foo" port - #:file-type+size - (lambda _ - (values 'regular - (string-length content))) - #:file-port - (lambda _ - (open-input-string content))) + (write-string-as-nar port content) (close-port port) (bytevector->nix-base32-string (get-hash))) "\n") @@ -367,6 +370,57 @@ System: mips64el-linux\n"))) (lambda () (guix-substitute "--substitute")))))))))) +(test-equal "substitute (unpacked), invalid hash" + (string-append "hash-mismatch sha256 " + (bytevector->nix-base32-string (sha256 #vu8())) " " + (let-values (((port get-hash) + (open-hash-port (hash-algorithm sha256))) + ((content) "Wrong data!")) + (write-string-as-nar port content) + (close-port port) + (bytevector->nix-base32-string (get-hash))) + "\n") + (with-output-to-string + (lambda () + ;; Arrange so the actual data hash does not match the 'NarHash' field in the + ;; narinfo. Use a substituter that does not produce a nar, but rather + ;; writes the item to the store by itself. + (define narinfo + ;; Pretend this hash and size actually correspond to + ;; some nar. + (string-append "StorePath: " (%store-prefix) + "/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-wrong-hash +URL: irrelevant +Compression: none +NarHash: sha256:" (bytevector->nix-base32-string (sha256 #vu8())) " +NarSize: 123 +References: +Deriver: " (%store-prefix) "/foo.drv +System: mips64el-linux\n")) + (parameterize ((substituters + (list + (make-substituter + 'test + (lambda (destination . rest) + (call-with-output-file destination + (cut display "Wrong data!" <>)) + 'unpacked) + (const + (list + (string->narinfo + (string-append narinfo "Signature: " + (signature-field narinfo) + "\n") + "test://"))) + '(test)))) + (substitute-urls '("test://"))) + (call-with-temporary-directory + (lambda (directory) + (request-substitution + (string-append (%store-prefix) + "/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-foo") + (string-append directory "/wrong-hash")))))))) + (test-quit "substitute, unauthorized key" "no valid substitute" (with-narinfo (string-append %narinfo "Signature: " -- 2.30.0