diff mbox series

[bug#35880,1/7] lzlib: Add 'make-lzip-input-port/compressed'.

Message ID 20190524134238.22802-1-ludo@gnu.org
State Accepted
Headers show
Series [bug#35880,1/7] lzlib: Add 'make-lzip-input-port/compressed'. | expand

Checks

Context Check Description
cbaines/applying patch fail Apply failed

Commit Message

Ludovic Courtès May 24, 2019, 1:42 p.m. UTC
* guix/lzlib.scm (lzwrite!, make-lzip-input-port/compressed): New
procedures.
* tests/lzlib.scm ("make-lzip-input-port/compressed"): New test.
* guix/tests.scm (%seed): Export.
---
 guix/lzlib.scm  | 62 +++++++++++++++++++++++++++++++++++++++++++++++++
 guix/tests.scm  |  1 +
 tests/lzlib.scm | 10 ++++++++
 3 files changed, 73 insertions(+)

Comments

Pierre Neidhardt May 25, 2019, 5:24 p.m. UTC | #1
As an Lzip enthusiast, I have some questions ;)

I see you are using make-lzip-input-port/compressed in a subsequent
patch, but this does not map how it's done for gzip et al., the latter
being invoked via it's system command "gzip -c ...".  Why did you decide
to do it differently for lzip?

Much of the code induced by make-lzip-input-port/compressed seems to
repeat the lzread! / lzwrite business, maybe there is a way to factor
some of it?

> +(define (lzwrite! encoder source source-offset source-count
> +                  target target-offset target-count)
> +  "Write up to SOURCE-COUNT bytes from SOURCE to ENCODER, and read up to
> +TARGET-COUNT bytes into TARGET at TARGET-OFFSET.  Return two values: the
> +number of bytes read from SOURCE, and the number of bytes written to TARGET."
> +  (define read
> +    (if (< 0 (lz-compress-write-size encoder))
> +        (match (lz-compress-write encoder source source-offset source-count)
> +          (0 (lz-compress-finish encoder) 0)
> +          (n n))
> +        0))
> +
> +  (let loop ()
> +    (match (lz-compress-read encoder target target-offset target-count)
> +      (0       (loop))
> +      (written (values read written)))))

Why looping on 0?  If there is no byte to read, wouldn't this loop indefinitely?
Ludovic Courtès May 26, 2019, 7:51 p.m. UTC | #2
Hi!

Pierre Neidhardt <mail@ambrevar.xyz> skribis:

> As an Lzip enthusiast, I have some questions ;)
>
> I see you are using make-lzip-input-port/compressed in a subsequent
> patch, but this does not map how it's done for gzip et al., the latter
> being invoked via it's system command "gzip -c ...".  Why did you decide
> to do it differently for lzip?
>
> Much of the code induced by make-lzip-input-port/compressed seems to
> repeat the lzread! / lzwrite business, maybe there is a way to factor
> some of it?

Actually, ‘make-lzip-input-port/compressed’ exists solely so we can have
‘compressed-port’ for lzip, which in turn allows us to write tests.

It uses (guix lzlib) instead of invoking the ‘lzip’ command because we
can.  :-)  Invoking commands is not as nice, because it’s more
expensive, requires us to spawn an additional process when the input is
not a file port (e.g., it’s a string port), and it’s forking is not
possible in multi-threaded programs like ‘guix publish’.

>> +(define (lzwrite! encoder source source-offset source-count
>> +                  target target-offset target-count)
>> +  "Write up to SOURCE-COUNT bytes from SOURCE to ENCODER, and read up to
>> +TARGET-COUNT bytes into TARGET at TARGET-OFFSET.  Return two values: the
>> +number of bytes read from SOURCE, and the number of bytes written to TARGET."
>> +  (define read
>> +    (if (< 0 (lz-compress-write-size encoder))
>> +        (match (lz-compress-write encoder source source-offset source-count)
>> +          (0 (lz-compress-finish encoder) 0)
>> +          (n n))
>> +        0))
>> +
>> +  (let loop ()
>> +    (match (lz-compress-read encoder target target-offset target-count)
>> +      (0       (loop))
>> +      (written (values read written)))))
>
> Why looping on 0?  If there is no byte to read, wouldn't this loop indefinitely?

Hmm, good point.  The idea is that ‘lzwrite!’ should return 0 only on
end-of-file, but then the loop should include reading more from SOURCE.
I’ll follow up on this one.

Thanks!

Ludo’.
Ludovic Courtès May 27, 2019, 3:45 p.m. UTC | #3
Ludovic Courtès <ludo@gnu.org> skribis:

> Pierre Neidhardt <mail@ambrevar.xyz> skribis:

[...]

>>> +(define (lzwrite! encoder source source-offset source-count
>>> +                  target target-offset target-count)
>>> +  "Write up to SOURCE-COUNT bytes from SOURCE to ENCODER, and read up to
>>> +TARGET-COUNT bytes into TARGET at TARGET-OFFSET.  Return two values: the
>>> +number of bytes read from SOURCE, and the number of bytes written to TARGET."
>>> +  (define read
>>> +    (if (< 0 (lz-compress-write-size encoder))
>>> +        (match (lz-compress-write encoder source source-offset source-count)
>>> +          (0 (lz-compress-finish encoder) 0)
>>> +          (n n))
>>> +        0))
>>> +
>>> +  (let loop ()
>>> +    (match (lz-compress-read encoder target target-offset target-count)
>>> +      (0       (loop))
>>> +      (written (values read written)))))
>>
>> Why looping on 0?  If there is no byte to read, wouldn't this loop indefinitely?
>
> Hmm, good point.  The idea is that ‘lzwrite!’ should return 0 only on
> end-of-file, but then the loop should include reading more from SOURCE.
> I’ll follow up on this one.

I noticed that ‘lz-compress-read’ is documented to return a “strictly
positive integer”, so I’m changing it to this:

--8<---------------cut here---------------start------------->8---
(define (lzwrite! encoder source source-offset source-count
                  target target-offset target-count)
  "Write up to SOURCE-COUNT bytes from SOURCE to ENCODER, and read up to
TARGET-COUNT bytes into TARGET at TARGET-OFFSET.  Return two values: the
number of bytes read from SOURCE, and the non-zero number of bytes written to
TARGET."
  (define read
    (if (< 0 (lz-compress-write-size encoder))
        (match (lz-compress-write encoder source source-offset source-count)
          (0 (lz-compress-finish encoder) 0)
          (n n))
        0))

  (define written
    ;; Note: 'lz-compress-read' promises to return a non-zero integer.
    (lz-compress-read encoder target target-offset target-count))

  (values read written))
--8<---------------cut here---------------end--------------->8---

Let me know what you think!

Ludo’.
Pierre Neidhardt May 27, 2019, 4:24 p.m. UTC | #4
Ludovic Courtès <ludo@gnu.org> writes:

> I noticed that ‘lz-compress-read’ is documented to return a “strictly
> positive integer”, so I’m changing it to this:

The docstring (that I wrote) says "non-negative positive integer."
"positive" is a typo (sorry about that), it should read "non-negative
integer" since the return value can be zero.

In general, lzlib's "reads" and "writes" don't give any guarantee about
the size of bytes that are actually processed.  You need to loop over
the calls until some condition is met, see the "finish(ed)" functions.

Here in particular, it's not clear that lz-compress-read is going to read
all the bytes in the encoder buffer.  Maybe that's OK for this
particular functions if we don't expect TARGET-COUNT to be reached.  See

  http://www.nongnu.org/lzip/manual/lzlib_manual.html#Buffering

That said, if the encoder buffer is not empty, I think lz-compress-read
should always return something >0.

Yup, lzlib is very low-level! :p
Ludovic Courtès May 27, 2019, 8:53 p.m. UTC | #5
Hi!

Pierre Neidhardt <mail@ambrevar.xyz> skribis:

> That said, if the encoder buffer is not empty, I think lz-compress-read
> should always return something >0.

Yes, probably.  The docstring for ‘lz-compress-read’ says:

    "Read up to COUNT bytes from the encoder stream, storing the results in LZFILE-BV.
  Return the number of uncompressed bytes written, a strictly positive integer."
                                                     ^~~~~~~~~~~~~~~~~

However, the lzlib manual doesn’t say that for ‘LZ_compress_read’ (info
"(lzlib) Compression functions").

But that’s OK: the ‘read!’ method in ‘make-lzip-input-port/compressed’
can just call ‘lzwrite!’ again with more data when that happens, so I’ve
done that.

And I pushed the whole thing! :-)

I think it’d be good to let people play with it in their personal
setups.

Next up: multi-compression support in ‘guix publish’ (possibly?) so we
can smoothly transition on our build farms.

Thanks!

Ludo’.
Pierre Neidhardt May 27, 2019, 9:12 p.m. UTC | #6
Ludovic Courtès <ludo@gnu.org> writes:

>> That said, if the encoder buffer is not empty, I think lz-compress-read
>> should always return something >0.
>
> Yes, probably.  The docstring for ‘lz-compress-read’ says:

Oops, I read the docstring of lz-DEcompress-read.  My bad.

>     "Read up to COUNT bytes from the encoder stream, storing the results in LZFILE-BV.
>   Return the number of uncompressed bytes written, a strictly positive integer."
>                                                      ^~~~~~~~~~~~~~~~~

Bigger oops!  This comes from a copy-paste of the gzip docstring which I
forgot to update properly (I did for the decompression functions, but
not for the compression functions).  The docstrings should be fixed.

> But that’s OK: the ‘read!’ method in ‘make-lzip-input-port/compressed’
> can just call ‘lzwrite!’ again with more data when that happens, so I’ve
> done that.

This could work, but I've had some headaches on such assumptions
before.  Tests are very necessary here to validate those assumptions ;)

The thing is that we are not using lzlib as it is meant to be used
(i.e. with the finish* functions) because of the functional approach of
the binary ports which don't really play well with the procedural
approach of the C library.

> And I pushed the whole thing! :-)

Hurray!  Can't wait to say lz-compressed archives coming to Guix! :)
Ludovic Courtès May 28, 2019, 7:52 a.m. UTC | #7
Pierre Neidhardt <mail@ambrevar.xyz> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:
>
>>> That said, if the encoder buffer is not empty, I think lz-compress-read
>>> should always return something >0.
>>
>> Yes, probably.  The docstring for ‘lz-compress-read’ says:
>
> Oops, I read the docstring of lz-DEcompress-read.  My bad.
>
>>     "Read up to COUNT bytes from the encoder stream, storing the results in LZFILE-BV.
>>   Return the number of uncompressed bytes written, a strictly positive integer."
>>                                                      ^~~~~~~~~~~~~~~~~
>
> Bigger oops!  This comes from a copy-paste of the gzip docstring which I
> forgot to update properly (I did for the decompression functions, but
> not for the compression functions).  The docstrings should be fixed.

I fixed this one in e13354a7ca5a0d5e28e02c4cfce6fecb1ab770e4.

>> But that’s OK: the ‘read!’ method in ‘make-lzip-input-port/compressed’
>> can just call ‘lzwrite!’ again with more data when that happens, so I’ve
>> done that.
>
> This could work, but I've had some headaches on such assumptions
> before.  Tests are very necessary here to validate those assumptions ;)

Definitely!

> The thing is that we are not using lzlib as it is meant to be used
> (i.e. with the finish* functions) because of the functional approach of
> the binary ports which don't really play well with the procedural
> approach of the C library.

I think we’re using it the way it’s meant to be used, roughly along the
lines of the examples of its manual (info "(lzlib) Examples").

(I/O ports are not very “functional”.)

>> And I pushed the whole thing! :-)
>
> Hurray!  Can't wait to say lz-compressed archives coming to Guix! :)

I’ve updated the ‘guix’ package so people can start using
‘guix publish -C lzip’ and fetch substitute from there.

Thanks for making it possible!

Ludo’.
Pierre Neidhardt May 28, 2019, 8:46 a.m. UTC | #8
Ludovic Courtès <ludo@gnu.org> writes:

>> Bigger oops!  This comes from a copy-paste of the gzip docstring which I
>> forgot to update properly (I did for the decompression functions, but
>> not for the compression functions).  The docstrings should be fixed.
>
> I fixed this one in e13354a7ca5a0d5e28e02c4cfce6fecb1ab770e4.

Thanks!

>> The thing is that we are not using lzlib as it is meant to be used
>> (i.e. with the finish* functions) because of the functional approach of
>> the binary ports which don't really play well with the procedural
>> approach of the C library.
>
> I think we’re using it the way it’s meant to be used, roughly along the
> lines of the examples of its manual (info "(lzlib) Examples").
>
> (I/O ports are not very “functional”.)

In the sense that we define  "read!" and "write!" functions which don't
allow us to call the "finish" functions properly.

So maybe we are following the doc too "roughly" :p

This has multiple implications, e.g. it impedes support for multiple
members, (which is OK as long as we don't accept archives produced by a
third-party) and more importantly it lifts the guarantee that the
library is going to work as per the documentation.

>>> And I pushed the whole thing! :-)
>>
>> Hurray!  Can't wait to say lz-compressed archives coming to Guix! :)
>
> I’ve updated the ‘guix’ package so people can start using
> ‘guix publish -C lzip’ and fetch substitute from there.
>
> Thanks for making it possible!

And thanks for doing most of the work! :D
Ludovic Courtès May 28, 2019, 1:47 p.m. UTC | #9
Pierre Neidhardt <mail@ambrevar.xyz> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:

[...]

>>> The thing is that we are not using lzlib as it is meant to be used
>>> (i.e. with the finish* functions) because of the functional approach of
>>> the binary ports which don't really play well with the procedural
>>> approach of the C library.
>>
>> I think we’re using it the way it’s meant to be used, roughly along the
>> lines of the examples of its manual (info "(lzlib) Examples").
>>
>> (I/O ports are not very “functional”.)
>
> In the sense that we define  "read!" and "write!" functions which don't
> allow us to call the "finish" functions properly.
>
> So maybe we are following the doc too "roughly" :p
>
> This has multiple implications, e.g. it impedes support for multiple
> members, (which is OK as long as we don't accept archives produced by a
> third-party) and more importantly it lifts the guarantee that the
> library is going to work as per the documentation.

I’m not sure I follow.  I think ‘make-lzip-input-port/compressed’
corresponds to Example 2 in the manual (info "(lzlib) Examples"),
‘make-lzip-output-port’ corresponds to Example 1, and
‘make-lzip-input-port’ corresponds to Example 4 (with the exception that
‘lzread!’ doesn’t call ‘lz-decompress-finished?’, but it has other means
to tell whether we’re done processing input.)

What do you think is not done as documented?

Thanks,
Ludo’.
Pierre Neidhardt May 29, 2019, 2:57 p.m. UTC | #10
Ludovic Courtès <ludo@gnu.org> writes:

> I’m not sure I follow.  I think ‘make-lzip-input-port/compressed’
> corresponds to Example 2 in the manual (info "(lzlib) Examples"),
> ‘make-lzip-output-port’ corresponds to Example 1, and
> ‘make-lzip-input-port’ corresponds to Example 4 (with the exception that
> ‘lzread!’ doesn’t call ‘lz-decompress-finished?’, but it has other means
> to tell whether we’re done processing input.)

Example 4 is:

     1) LZ_decompress_open
     2) go to step 5 if LZ_decompress_write_size returns 0
     3) LZ_decompress_write
     4) if no more data to write, call LZ_decompress_finish
     5) LZ_decompress_read
     5a) optionally, if LZ_decompress_member_finished returns 1, read
         final values for member with LZ_decompress_data_crc, etc.
     6) go back to step 2 until LZ_decompress_finished returns 1
     7) LZ_decompress_close

In `lzread!', we don't call lz-decompress-finished? nor do we loop on
lz-decompress-finished.

This only works for decompression of single-member archive, but the
documentation does not say that.

--8<---------------cut here---------------start------------->8---
    (match (get-bytevector-n port (lz-decompress-write-size decoder))
      ((? eof-object? eof) eof)
      (bv (lz-decompress-write decoder bv)))
--8<---------------cut here---------------end--------------->8---

In the above if lz-decompress-write-size returns 0, we won't be reading
anything (infinite loop?).  While I understand this should not happen in
practice, the documentation of the library does not give such guarantees.
Antonio told me that explicitly.

--8<---------------cut here---------------start------------->8---
           (match (lz-decompress-read decoder bv start (- count read))
             (0 (if (eof-object? (feed-decoder! decoder))
                    read
                    (loop read start)))
--8<---------------cut here---------------end--------------->8---

I'm not sure I understand the above: if we read nothing, then we try
again?  This might loop forever.

What do you think?
Ludovic Courtès May 31, 2019, 8:54 p.m. UTC | #11
Hello,

Pierre Neidhardt <mail@ambrevar.xyz> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:
>
>> I’m not sure I follow.  I think ‘make-lzip-input-port/compressed’
>> corresponds to Example 2 in the manual (info "(lzlib) Examples"),
>> ‘make-lzip-output-port’ corresponds to Example 1, and
>> ‘make-lzip-input-port’ corresponds to Example 4 (with the exception that
>> ‘lzread!’ doesn’t call ‘lz-decompress-finished?’, but it has other means
>> to tell whether we’re done processing input.)
>
> Example 4 is:
>
>      1) LZ_decompress_open
>      2) go to step 5 if LZ_decompress_write_size returns 0
>      3) LZ_decompress_write
>      4) if no more data to write, call LZ_decompress_finish
>      5) LZ_decompress_read
>      5a) optionally, if LZ_decompress_member_finished returns 1, read
>          final values for member with LZ_decompress_data_crc, etc.
>      6) go back to step 2 until LZ_decompress_finished returns 1
>      7) LZ_decompress_close
>
> In `lzread!', we don't call lz-decompress-finished? nor do we loop on
> lz-decompress-finished.

Indeed, we’re missing a call to ‘lz-decompress-finish’, so lzlib could
in theory think there’s still data coming, and so fail to produce more
output, possibly leading to an infinite loop.

I think the ‘lzread!’ loop should look like this (the tests still pass
with this):

--8<---------------cut here---------------start------------->8---
  (let loop ((read 0)
             (start start))
    (cond ((< read count)
           (match (lz-decompress-read decoder bv start (- count read))
             (0 (cond ((lz-decompress-finished? decoder)
                       read)
                      ((eof-object? (feed-decoder! decoder))
                       (lz-decompress-finish decoder)
                       (loop read start))
                      (else                       ;read again
                       (loop read start))))
             (n (loop (+ read n) (+ start n)))))
          (else
           read)))
--8<---------------cut here---------------end--------------->8---

That way, I believe all cases are correctly handled.

WDYT?

> This only works for decompression of single-member archive, but the
> documentation does not say that.
>
>     (match (get-bytevector-n port (lz-decompress-write-size decoder))
>       ((? eof-object? eof) eof)
>       (bv (lz-decompress-write decoder bv)))
>
>
> In the above if lz-decompress-write-size returns 0, we won't be reading
> anything (infinite loop?).

We’re calling ‘feed-decoder!’ iff ‘lz-decompress-read’ returned 0; when
that happens ‘lz-decompress-write-size’ must return a strictly positive
number.  Otherwise there’s an inconsistency.

>            (match (lz-decompress-read decoder bv start (- count read))
>              (0 (if (eof-object? (feed-decoder! decoder))
>                     read
>                     (loop read start)))
>
> I'm not sure I understand the above: if we read nothing, then we try
> again?

No: if we read *something*, we try again; if we read nothing, we return.

Thanks for your careful review, much appreciated!

Ludo’.
Pierre Neidhardt June 1, 2019, 6:02 a.m. UTC | #12
Ludovic Courtès <ludo@gnu.org> writes:

> I think the ‘lzread!’ loop should look like this (the tests still pass
> with this):
>
> --8<---------------cut here---------------start------------->8---
>   (let loop ((read 0)
>              (start start))
>     (cond ((< read count)
>            (match (lz-decompress-read decoder bv start (- count read))
>              (0 (cond ((lz-decompress-finished? decoder)
>                        read)
>                       ((eof-object? (feed-decoder! decoder))
>                        (lz-decompress-finish decoder)
>                        (loop read start))
>                       (else                       ;read again
>                        (loop read start))))
>              (n (loop (+ read n) (+ start n)))))
>           (else
>            read)))
> --8<---------------cut here---------------end--------------->8---

Looks good to me!

>>            (match (lz-decompress-read decoder bv start (- count read))
>>              (0 (if (eof-object? (feed-decoder! decoder))
>>                     read
>>                     (loop read start)))
>>
>> I'm not sure I understand the above: if we read nothing, then we try
>> again?
>
> No: if we read *something*, we try again; if we read nothing, we return.

If we read nothing _and_ it is not an EOF (it can be an empty vector),
then we loop indefinitely, no?

> Thanks for your careful review, much appreciated!

You are welcome, thanks for your invaluable work!
Ludovic Courtès June 1, 2019, 9:41 a.m. UTC | #13
Hi!

Pierre Neidhardt <mail@ambrevar.xyz> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:
>
>> I think the ‘lzread!’ loop should look like this (the tests still pass
>> with this):
>>
>> --8<---------------cut here---------------start------------->8---
>>   (let loop ((read 0)
>>              (start start))
>>     (cond ((< read count)
>>            (match (lz-decompress-read decoder bv start (- count read))
>>              (0 (cond ((lz-decompress-finished? decoder)
>>                        read)
>>                       ((eof-object? (feed-decoder! decoder))
>>                        (lz-decompress-finish decoder)
>>                        (loop read start))
>>                       (else                       ;read again
>>                        (loop read start))))
>>              (n (loop (+ read n) (+ start n)))))
>>           (else
>>            read)))
>> --8<---------------cut here---------------end--------------->8---
>
> Looks good to me!

OK, committed!

>>>            (match (lz-decompress-read decoder bv start (- count read))
>>>              (0 (if (eof-object? (feed-decoder! decoder))
>>>                     read
>>>                     (loop read start)))
>>>
>>> I'm not sure I understand the above: if we read nothing, then we try
>>> again?
>>
>> No: if we read *something*, we try again; if we read nothing, we return.
>
> If we read nothing _and_ it is not an EOF (it can be an empty vector),
> then we loop indefinitely, no?

‘feed-decoder!’ cannot return an empty bytevector because
‘lz-decompress-write-size’ necessarily returns a strictly positive
integer at this point.

(Imperative programming is hard! :-))

Thanks,
Ludo’.
Pierre Neidhardt June 1, 2019, 9:58 a.m. UTC | #14
Ludovic Courtès <ludo@gnu.org> writes:

> ‘feed-decoder!’ cannot return an empty bytevector because
> ‘lz-decompress-write-size’ necessarily returns a strictly positive
> integer at this point.

I'm not sure that's true: if the buffer is full and the next
lz-decompress-read does not read anything, then the buffer will still be
full and lz-decompress-write-size will return 0.

The specs don't guarantee that lz-decompress-read will always read
something.

But that's the only assumption we are making I believe, and it's fair :)
Ludovic Courtès June 1, 2019, 12:21 p.m. UTC | #15
Pierre Neidhardt <mail@ambrevar.xyz> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:
>
>> ‘feed-decoder!’ cannot return an empty bytevector because
>> ‘lz-decompress-write-size’ necessarily returns a strictly positive
>> integer at this point.
>
> I'm not sure that's true: if the buffer is full and the next
> lz-decompress-read does not read anything, then the buffer will still be
> full and lz-decompress-write-size will return 0.

Hmm I don’t think that’s a reasonable scenario, otherwise it would mean
that you’re in a deadlock anyway (decoder buffer is already full yet it
does not contain enough data to actually decode anything.)

Dunno, I’m rather confident here but we’ll see if that causes any
troubles.

Thanks,
Ludo’.
diff mbox series

Patch

diff --git a/guix/lzlib.scm b/guix/lzlib.scm
index a6dac46049..48927c6262 100644
--- a/guix/lzlib.scm
+++ b/guix/lzlib.scm
@@ -1,5 +1,6 @@ 
 ;;; GNU Guix --- Functional package management for GNU
 ;;; Copyright © 2019 Pierre Neidhardt <mail@ambrevar.xyz>
+;;; Copyright © 2019 Ludovic Courtès <ludo@gnu.org>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -23,9 +24,11 @@ 
   #:use-module (ice-9 match)
   #:use-module (system foreign)
   #:use-module (guix config)
+  #:use-module (srfi srfi-11)
   #:export (lzlib-available?
             make-lzip-input-port
             make-lzip-output-port
+            make-lzip-input-port/compressed
             call-with-lzip-input-port
             call-with-lzip-output-port
             %default-member-length-limit
@@ -515,6 +518,23 @@  the end-of-stream has been reached."
         (loop rd)))
     read))
 
+(define (lzwrite! encoder source source-offset source-count
+                  target target-offset target-count)
+  "Write up to SOURCE-COUNT bytes from SOURCE to ENCODER, and read up to
+TARGET-COUNT bytes into TARGET at TARGET-OFFSET.  Return two values: the
+number of bytes read from SOURCE, and the number of bytes written to TARGET."
+  (define read
+    (if (< 0 (lz-compress-write-size encoder))
+        (match (lz-compress-write encoder source source-offset source-count)
+          (0 (lz-compress-finish encoder) 0)
+          (n n))
+        0))
+
+  (let loop ()
+    (match (lz-compress-read encoder target target-offset target-count)
+      (0       (loop))
+      (written (values read written)))))
+
 (define* (lzwrite encoder bv lz-port
                   #:optional (start 0) (count (bytevector-length bv)))
   "Write up to COUNT bytes from BV at offset START into LZ-PORT.  Return
@@ -597,6 +617,48 @@  port is closed."
                                     (lz-compress-close encoder)
                                     (close-port port))))
 
+(define* (make-lzip-input-port/compressed port
+                                          #:key
+                                          (level %default-compression-level))
+  "Return an input port that compresses data read from PORT, with the given LEVEL.
+PORT is automatically closed when the resulting port is closed."
+  (define encoder (apply lz-compress-open
+                         (car (assoc-ref %compression-levels level))))
+
+  (define input-buffer (make-bytevector 8192))
+  (define input-len 0)
+  (define input-offset 0)
+
+  (define input-eof? #f)
+
+  (define (read! bv start count)
+    (cond
+     (input-eof?
+      (lz-compress-read encoder bv start count))
+     ((= input-offset input-len)
+      (match (get-bytevector-n! port input-buffer 0
+                                (bytevector-length input-buffer))
+        ((? eof-object?)
+         (set! input-eof? #t)
+         (lz-compress-finish encoder))
+        (count
+         (set! input-offset 0)
+         (set! input-len count)))
+      (read! bv start count))
+     (else
+      (let-values (((read written)
+                    (lzwrite! encoder
+                              input-buffer input-offset
+                              (- input-len input-offset)
+                              bv start count)))
+        (set! input-offset (+ input-offset read))
+        written))))
+
+  (make-custom-binary-input-port "lzip-input/compressed"
+                                 read! #f #f
+                                 (lambda ()
+                                   (close-port port))))
+
 (define* (call-with-lzip-input-port port proc)
   "Call PROC with a port that wraps PORT and decompresses data read from it.
 PORT is closed upon completion."
diff --git a/guix/tests.scm b/guix/tests.scm
index 35ebf8464d..66d60e964e 100644
--- a/guix/tests.scm
+++ b/guix/tests.scm
@@ -33,6 +33,7 @@ 
   #:use-module (web uri)
   #:export (open-connection-for-tests
             with-external-store
+            %seed
             random-text
             random-bytevector
             file=?
diff --git a/tests/lzlib.scm b/tests/lzlib.scm
index cf53a9417d..543622bb45 100644
--- a/tests/lzlib.scm
+++ b/tests/lzlib.scm
@@ -108,4 +108,14 @@ 
 (test-assert* "Bytevector of size relative to Lzip internal buffers (1MiB+1)"
   (compress-and-decompress (random-bytevector (1+ (* 1024 1024)))))
 
+(test-assert "make-lzip-input-port/compressed"
+  (let* ((len        (pk 'len (+ 10 (random 4000 %seed))))
+         (data       (random-bytevector len))
+         (compressed (make-lzip-input-port/compressed
+                      (open-bytevector-input-port data)))
+         (result     (call-with-lzip-input-port compressed
+                                                get-bytevector-all)))
+    (pk (bytevector-length result) (bytevector-length data))
+    (bytevector=? result data)))
+
 (test-end)