diff mbox series

[bug#35880,7/7] lzlib: 'lzread!' never returns more than itwas asked for.

Message ID 20190524134238.22802-7-ludo@gnu.org
State Accepted
Headers show
Series [bug#35880,1/7] lzlib: Add 'make-lzip-input-port/compressed'. | expand

Checks

Context Check Description
cbaines/applying patch fail Apply failed

Commit Message

Ludovic Courtès May 24, 2019, 1:42 p.m. UTC
Fixes a bug whereby 'lzread!' could return more than COUNT.

* guix/lzlib.scm (lzread!): Rewrite in a semi-functional style.
---
 guix/lzlib.scm | 39 +++++++++++++++++++--------------------
 1 file changed, 19 insertions(+), 20 deletions(-)

Comments

Pierre Neidhardt May 25, 2019, 5:31 p.m. UTC | #1
> Fixes a bug whereby 'lzread!' could return more than COUNT.

Hmm... But why is this a bug?  lzread! returns the number of
_uncompressed_ bytes, while COUNT is number of _compressed_ bytes
written, so it's often expected that the former is more than the latter.

(By the way, nice rewrite, I like it much better than my C->Scheme
translation ;p)
Ludovic Courtès May 26, 2019, 7:54 p.m. UTC | #2
Pierre Neidhardt <mail@ambrevar.xyz> skribis:

>> Fixes a bug whereby 'lzread!' could return more than COUNT.
>
> Hmm... But why is this a bug?

Because then the ‘read!’ method of the custom binary input port could
return more than ‘count’, which is understandably not permitted.

> (By the way, nice rewrite, I like it much better than my C->Scheme
> translation ;p)

Heheh, I initially tried to minimize changes but I found it easier to
reason about this version.

Ludo’.
Pierre Neidhardt May 26, 2019, 8:57 p.m. UTC | #3
Ludovic Courtès <ludo@gnu.org> writes:

> Pierre Neidhardt <mail@ambrevar.xyz> skribis:
>
>>> Fixes a bug whereby 'lzread!' could return more than COUNT.
>>
>> Hmm... But why is this a bug?
>
> Because then the ‘read!’ method of the custom binary input port could
> return more than ‘count’, which is understandably not permitted.

That's the part where I'm a bit confused because we deal with compressed
data here.

So when we say "(read count)", does COUNT refer to the compressed or
uncompressed data?
Ludovic Courtès May 26, 2019, 9:28 p.m. UTC | #4
Pierre Neidhardt <mail@ambrevar.xyz> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:
>
>> Pierre Neidhardt <mail@ambrevar.xyz> skribis:
>>
>>>> Fixes a bug whereby 'lzread!' could return more than COUNT.
>>>
>>> Hmm... But why is this a bug?
>>
>> Because then the ‘read!’ method of the custom binary input port could
>> return more than ‘count’, which is understandably not permitted.
>
> That's the part where I'm a bit confused because we deal with compressed
> data here.
>
> So when we say "(read count)", does COUNT refer to the compressed or
> uncompressed data?

We have this:

  (define* (make-lzip-input-port port)
    (define decoder (lz-decompress-open))

    (define (read! bv start count)
      (lzread! decoder port bv start count))

    (make-custom-binary-input-port "lzip-input" read! #f #f
                                   (lambda () …)))

Here ‘read!’ must return an integer between 1 and COUNT; it must return
0 if and only if the end-of-file is reached.

IOW, ‘lzread!’ must return the number of uncompressed bytes of BV that
it consumed, and that number is necessarily <= COUNT.

Does that make sense?

Thanks,
Ludo’.
Pierre Neidhardt May 27, 2019, 7 a.m. UTC | #5
It does make sense, but then don't we have the same issue with zlib.scm:

--8<---------------cut here---------------start------------->8---
(define gzread!
  (let ((proc (zlib-procedure int "gzread" (list '* '* unsigned-int))))
    (lambda* (gzfile bv #:optional (start 0) (count (bytevector-length bv)))
      "Read up to COUNT bytes from GZFILE into BV at offset START.  Return the
number of uncompressed bytes actually read; it is zero if COUNT is zero or if
the end-of-stream has been reached."
 ...
--8<---------------cut here---------------end--------------->8---

I initially tried to mimic zlib.scm and this part confused me a lot back then.
Ludovic Courtès May 27, 2019, 10 a.m. UTC | #6
Hi Pierre,

Pierre Neidhardt <mail@ambrevar.xyz> skribis:

> It does make sense, but then don't we have the same issue with zlib.scm:

Which issue?

> (define gzread!
>   (let ((proc (zlib-procedure int "gzread" (list '* '* unsigned-int))))
>     (lambda* (gzfile bv #:optional (start 0) (count (bytevector-length bv)))
>       "Read up to COUNT bytes from GZFILE into BV at offset START.  Return the
> number of uncompressed bytes actually read; it is zero if COUNT is zero or if
> the end-of-stream has been reached."
>  ...
>
> I initially tried to mimic zlib.scm and this part confused me a lot back then.

There’s a key difference: the ‘gzread’ etc. API is high-level and easy
to use, but it wants a file descriptor to read from (thus a file port in
Scheme land.)

That’s enough for ‘guix publish’, which writes gzipped data to files,
but that’s not enough for ‘guix substitute’, which can read data from
non-file ports (e.g., chunked-encoding ports or TLS ports from the HTTP
client.)

HTH,
Ludo’.
diff mbox series

Patch

diff --git a/guix/lzlib.scm b/guix/lzlib.scm
index 48927c6262..0e384f6cb8 100644
--- a/guix/lzlib.scm
+++ b/guix/lzlib.scm
@@ -494,29 +494,28 @@  perhaps not yet read."
 
 
 ;; High level functions.
-(define* (lzread! decoder file-port bv
+
+(define* (lzread! decoder port bv
                   #:optional (start 0) (count (bytevector-length bv)))
-  "Read up to COUNT bytes from FILE-PORT into BV at offset START.  Return the
+  "Read up to COUNT bytes from PORT into BV at offset START.  Return the
 number of uncompressed bytes actually read; it is zero if COUNT is zero or if
 the end-of-stream has been reached."
-  ;; WARNING: Because we don't alternate between lz-reads and lz-writes, we can't
-  ;; process more than lz-decompress-write-size from the file-port.
-  (when (> count (lz-decompress-write-size decoder))
-    (set! count (lz-decompress-write-size decoder)))
-  (let ((file-bv (get-bytevector-n file-port count)))
-    (unless (eof-object? file-bv)
-      (lz-decompress-write decoder file-bv 0 (bytevector-length file-bv))))
-  (let ((read 0))
-    (let loop ((rd 0))
-      (if (< start (bytevector-length bv))
-          (begin
-            (set! rd (lz-decompress-read decoder bv start (- (bytevector-length bv) start)))
-            (set! start (+ start rd))
-            (set! read (+ read rd)))
-          (set! rd 0))
-      (unless (= rd 0)
-        (loop rd)))
-    read))
+  (define (feed-decoder! decoder)
+    ;; Feed DECODER with data read from PORT.
+    (match (get-bytevector-n port (lz-decompress-write-size decoder))
+      ((? eof-object? eof) eof)
+      (bv (lz-decompress-write decoder bv))))
+
+  (let loop ((read 0)
+             (start start))
+    (cond ((< read count)
+           (match (lz-decompress-read decoder bv start (- count read))
+             (0 (if (eof-object? (feed-decoder! decoder))
+                    read
+                    (loop read start)))
+             (n (loop (+ read n) (+ start n)))))
+          (else
+           read))))
 
 (define (lzwrite! encoder source source-offset source-count
                   target target-offset target-count)