[bug#36029,0/2] 'publish' and 'substitute' support several compression methods

Message ID 20190531144828.9585-1-ludo@gnu.org
Message

Ludovic Courtès May 31, 2019, 2:48 p.m. UTC
Hello Guix!

This is a followup to <https://issues.guix.gnu.org/issue/35880>.

One idea we discussed there was to allow clients to pass an
‘X-Guix-Accepted-Encoding’ header in HTTP requests, and the server
would return an lzip narinfo or a gzip narinfo depending on that.
However, I thought that this was not very flexible, and that we
were bound to mess up caching.

This patch implements a different solution: ‘guix publish’ can
be passed multiple ‘-C’ options, in which case it compresses
substitutes with all these compression methods.  The corresponding
narinfo looks like this:

--8<---------------cut here---------------start------------->8---
StorePath: /gnu/store/9czlz7ss3187l2vi1hvrlkwlgrggdg5p-inkscape-0.92.4
URL: nar/gzip/9czlz7ss3187l2vi1hvrlkwlgrggdg5p-inkscape-0.92.4
Compression: gzip
FileSize: 40308611
URL: nar/lzip/9czlz7ss3187l2vi1hvrlkwlgrggdg5p-inkscape-0.92.4
Compression: lzip
FileSize: 19867767
NarHash: sha256:1jv4nkq68a7zwqhi9inrnh340a4jxcpb91wq7d25hgw0nk8isbbk
NarSize: 136499024
References: …
--8<---------------cut here---------------end--------------->8---

IOW, it’s like before, except that there are multiple
URL/Compression/FileSize fields instead of just one of each.

The trick is that old clients take the first occurrence of each
of these fields and ignore subsequent occurrences.  In the example
above, they’d just take gzip and ignore the rest.
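
To make the backward-compatibility argument concrete, here is a minimal
sketch in Python (not the Guile code from the patches) of a parser that
keeps only the first occurrence of each field, which is the behavior
attributed to old clients above.  The function name
`parse_first_occurrence` is hypothetical, and the elided References
field is left out of the sample.

```python
# Sample narinfo from the message above (References field omitted,
# since it is elided in the original).
NARINFO = """\
StorePath: /gnu/store/9czlz7ss3187l2vi1hvrlkwlgrggdg5p-inkscape-0.92.4
URL: nar/gzip/9czlz7ss3187l2vi1hvrlkwlgrggdg5p-inkscape-0.92.4
Compression: gzip
FileSize: 40308611
URL: nar/lzip/9czlz7ss3187l2vi1hvrlkwlgrggdg5p-inkscape-0.92.4
Compression: lzip
FileSize: 19867767
NarHash: sha256:1jv4nkq68a7zwqhi9inrnh340a4jxcpb91wq7d25hgw0nk8isbbk
NarSize: 136499024
"""


def parse_first_occurrence(text):
    """Return a dict of fields, keeping the first value seen for each key.

    Hypothetical helper that mimics the "old client" behavior described
    in the message: later duplicates of a field are silently ignored.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if not sep:
            continue  # skip blank or malformed lines
        fields.setdefault(key, value)  # first occurrence wins
    return fields


old_view = parse_first_occurrence(NARINFO)
print(old_view["Compression"], old_view["URL"])
```

With this sample the old-style view contains only the gzip
URL/Compression/FileSize triple, while fields that occur once (NarHash,
NarSize) are unaffected.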

The new ‘guix substitute’ (second patch) “sees” all these fields
and is able to choose the most appropriate compression method (i.e.,
the best one among those it supports.)
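
The selection step can be sketched as follows, again in Python rather
than the Guile used by the patch.  Two assumptions are mine and not
stated in the message: each URL field starts a new variant, with the
following Compression and FileSize fields belonging to it (as in the
example narinfo), and “best” means the smallest FileSize among the
methods the client supports.  The names `parse_variants` and
`choose_variant` are hypothetical.

```python
# Shortened sample: only the repeated fields matter for this sketch.
NARINFO = """\
URL: nar/gzip/9czlz7ss3187l2vi1hvrlkwlgrggdg5p-inkscape-0.92.4
Compression: gzip
FileSize: 40308611
URL: nar/lzip/9czlz7ss3187l2vi1hvrlkwlgrggdg5p-inkscape-0.92.4
Compression: lzip
FileSize: 19867767
NarSize: 136499024
"""


def parse_variants(text):
    """Group repeated URL/Compression/FileSize fields, one dict per URL.

    Assumption: a URL line opens a new variant and the Compression and
    FileSize lines that follow belong to that variant.
    """
    variants = []
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if not sep:
            continue
        if key == "URL":
            variants.append({"URL": value})
        elif key in ("Compression", "FileSize") and variants:
            variants[-1][key] = value
    return variants


def choose_variant(variants, supported):
    """Pick the smallest supported variant, or None if none is usable.

    Assumption: "best" is taken to mean smallest FileSize; variants
    without a FileSize sort last.
    """
    usable = [v for v in variants if v.get("Compression") in supported]
    if not usable:
        return None
    return min(usable, key=lambda v: int(v.get("FileSize", 2**63)))


variants = parse_variants(NARINFO)
new_client = choose_variant(variants, {"gzip", "lzip"})
gzip_only = choose_variant(variants, {"gzip"})
print(new_client["Compression"], gzip_only["Compression"])
```

A client that supports both methods ends up with the lzip URL here,
while one that only supports gzip still gets a usable answer.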

This adds a bit of complexity that is useless beyond the transitioning
period from gzip to lzip, but I think that’s OK; plus there might be
an lzip to super-lzip transition in the future, who knows.

Thoughts?

When we deploy that, we’ll obviously use more storage and more CPU on
the build farm, but that seems unavoidable.  OTOH, we’ll progressively
end up sending less data over the wire (and paying less for the CDN!),
given that lzip compresses better.
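
For a sense of scale, the figures from the example narinfo above can be
compared directly.  This is only a back-of-the-envelope sketch in Python
for that single inkscape item; it says nothing about other packages or
about the build farm as a whole.

```python
# Sizes in bytes, copied from the example narinfo.
NAR_SIZE = 136499024   # uncompressed nar (NarSize)
GZIP_SIZE = 40308611   # FileSize of the gzip variant
LZIP_SIZE = 19867767   # FileSize of the lzip variant

gzip_ratio = GZIP_SIZE / NAR_SIZE      # gzip size relative to the raw nar
lzip_ratio = LZIP_SIZE / NAR_SIZE      # lzip size relative to the raw nar
lzip_vs_gzip = LZIP_SIZE / GZIP_SIZE   # lzip size relative to gzip

# Extra storage needed when both variants are kept, relative to gzip alone.
storage_overhead = (GZIP_SIZE + LZIP_SIZE) / GZIP_SIZE

print(f"gzip is {gzip_ratio:.1%} of the uncompressed nar")
print(f"lzip is {lzip_ratio:.1%} of the uncompressed nar")
print(f"lzip is {lzip_vs_gzip:.1%} of the gzip size")
print(f"storing both takes {storage_overhead:.2f}x the gzip-only space")
```

For this item the lzip file is roughly half the size of the gzip one, so
keeping both costs about one and a half times the gzip-only storage.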

Ludo’.

Ludovic Courtès (2):
  publish: '--compression' can be repeated.
  substitute: Select the best compression methods.

 doc/guix.texi               |   5 +
 guix/scripts/challenge.scm  |   4 +-
 guix/scripts/publish.scm    | 204 ++++++++++++++++++++++--------------
 guix/scripts/substitute.scm | 141 ++++++++++++++++++-------
 guix/scripts/weather.scm    |   5 +-
 tests/publish.scm           |  89 ++++++++++++++--
 tests/substitute.scm        |  51 ++++++++-
 7 files changed, 370 insertions(+), 129 deletions(-)

Comments

Pierre Neidhardt June 1, 2019, 6:19 a.m. UTC | #1
Hi!

> One idea we discussed there was to allow clients to pass an
> ‘X-Guix-Accepted-Encoding’ header in HTTP requests, and the server
> would return an lzip narinfo or a gzip narinfo depending on that.
> However, I thought that this was not very flexible, and that we
> were bound to mess up caching.
>
> This patch implements a different solution: ‘guix publish’ can
> be passed multiple ‘-C’ options, in which case it compresses
> substitutes with all these compression methods.  The corresponding
> narinfo looks like this:
>
> --8<---------------cut here---------------start------------->8---
> StorePath: /gnu/store/9czlz7ss3187l2vi1hvrlkwlgrggdg5p-inkscape-0.92.4
> URL: nar/gzip/9czlz7ss3187l2vi1hvrlkwlgrggdg5p-inkscape-0.92.4
> Compression: gzip
> FileSize: 40308611
> URL: nar/lzip/9czlz7ss3187l2vi1hvrlkwlgrggdg5p-inkscape-0.92.4
> Compression: lzip
> FileSize: 19867767
> NarHash: sha256:1jv4nkq68a7zwqhi9inrnh340a4jxcpb91wq7d25hgw0nk8isbbk
> NarSize: 136499024
> References: …
> --8<---------------cut here---------------end--------------->8---

Huhu, inkscape's size is already halved ;)

> IOW, it’s like before, except that there are multiple
> URL/Compression/FileSize fields instead of just one of each.
>
> The trick is that old clients take the first occurrence of each
> of these fields and ignore subsequent occurrences.  In the example
> above, they’d just take gzip and ignore the rest.

Smart!  I like it!

I gave the patches a quick skim but I'm not very knowledgeable about those
parts, so there is little for me to comment on, I'm afraid.

Other than that, excited to see .lz substitutes becoming a reality!