Message ID | 20190524134238.22802-7-ludo@gnu.org |
---|---|
State | Accepted |
Headers | show |
Series | [bug#35880,1/7] lzlib: Add 'make-lzip-input-port/compressed'. | expand |
Context | Check | Description |
---|---|---|
cbaines/applying patch | fail | Apply failed |
> Fixes a bug whereby 'lzread!' could return more than COUNT.
Hmm... But why is this a bug? lzread! returns the number of
_uncompressed_ bytes, while COUNT is number of _compressed_ bytes
written, so it's often expected that the former is more than the latter.
(By the way, nice rewrite, I like it much better than my C->Scheme
translation ;p)
Pierre Neidhardt <mail@ambrevar.xyz> skribis: >> Fixes a bug whereby 'lzread!' could return more than COUNT. > > Hmm... But why is this a bug? Because then the ‘read!’ method of the custom binary input port could return more than ‘count’, which is understandably not permitted. > (By the way, nice rewrite, I like it much better than my C->Scheme > translation ;p) Heheh, I initially tried to minimize changes but I found it easier to reason about this version. Ludo’.
Ludovic Courtès <ludo@gnu.org> writes: > Pierre Neidhardt <mail@ambrevar.xyz> skribis: > >>> Fixes a bug whereby 'lzread!' could return more than COUNT. >> >> Hmm... But why is this a bug? > > Because then the ‘read!’ method of the custom binary input port could > return more than ‘count’, which is understandably not permitted. That's the part where I'm a bit confused because we deal with compressed data here. So when we say "(read count)", does COUNT refer to the compressed or uncompressed data?
Pierre Neidhardt <mail@ambrevar.xyz> skribis: > Ludovic Courtès <ludo@gnu.org> writes: > >> Pierre Neidhardt <mail@ambrevar.xyz> skribis: >> >>>> Fixes a bug whereby 'lzread!' could return more than COUNT. >>> >>> Hmm... But why is this a bug? >> >> Because then the ‘read!’ method of the custom binary input port could >> return more than ‘count’, which is understandably not permitted. > > That's the part where I'm a bit confused because we deal with compressed > data here. > > So when we say "(read count)", does COUNT refer to the compressed or > uncompressed data? We have this: (define* (make-lzip-input-port port) (define decoder (lz-decompress-open)) (define (read! bv start count) (lzread! decoder port bv start count)) (make-custom-binary-input-port "lzip-input" read! #f #f (lambda () …))) Here ‘read!’ must return an integer between 1 and COUNT; it must return 0 if and only if the end-of-file is reached. IOW, ‘lzread!’ must return the number of uncompressed bytes of BV that it consumed, and that number is necessarily <= COUNT. Does that make sense? Thanks, Ludo’.
It does make sense, but then don't we have the same issue with zlib.scm: --8<---------------cut here---------------start------------->8--- (define gzread! (let ((proc (zlib-procedure int "gzread" (list '* '* unsigned-int)))) (lambda* (gzfile bv #:optional (start 0) (count (bytevector-length bv))) "Read up to COUNT bytes from GZFILE into BV at offset START. Return the number of uncompressed bytes actually read; it is zero if COUNT is zero or if the end-of-stream has been reached." ... --8<---------------cut here---------------end--------------->8--- I initially tried to mimic zlib.scm and this part confused me a lot back then.
Hi Pierre, Pierre Neidhardt <mail@ambrevar.xyz> skribis: > It does make sense, but then don't we have the same issue with zlib.scm: Which issue? > (define gzread! > (let ((proc (zlib-procedure int "gzread" (list '* '* unsigned-int)))) > (lambda* (gzfile bv #:optional (start 0) (count (bytevector-length bv))) > "Read up to COUNT bytes from GZFILE into BV at offset START. Return the > number of uncompressed bytes actually read; it is zero if COUNT is zero or if > the end-of-stream has been reached." > ... > > I initially tried to mimic zlib.scm and this part confused me a lot back then. There’s a key difference: the ‘gzread’ etc. API is high-level and easy to use, but it wants a file descriptor to read from (thus a file port in Scheme land.) That’s enough for ‘guix publish’, which writes gzipped data to files, but that’s not enough for ‘guix substitute’, which can read data from non-file ports (e.g., chunked-encoding ports or TLS ports from the HTTP client.) HTH, Ludo’.
diff --git a/guix/lzlib.scm b/guix/lzlib.scm index 48927c6262..0e384f6cb8 100644 --- a/guix/lzlib.scm +++ b/guix/lzlib.scm @@ -494,29 +494,28 @@ perhaps not yet read." ;; High level functions. -(define* (lzread! decoder file-port bv + +(define* (lzread! decoder port bv #:optional (start 0) (count (bytevector-length bv))) - "Read up to COUNT bytes from FILE-PORT into BV at offset START. Return the + "Read up to COUNT bytes from PORT into BV at offset START. Return the number of uncompressed bytes actually read; it is zero if COUNT is zero or if the end-of-stream has been reached." - ;; WARNING: Because we don't alternate between lz-reads and lz-writes, we can't - ;; process more than lz-decompress-write-size from the file-port. - (when (> count (lz-decompress-write-size decoder)) - (set! count (lz-decompress-write-size decoder))) - (let ((file-bv (get-bytevector-n file-port count))) - (unless (eof-object? file-bv) - (lz-decompress-write decoder file-bv 0 (bytevector-length file-bv)))) - (let ((read 0)) - (let loop ((rd 0)) - (if (< start (bytevector-length bv)) - (begin - (set! rd (lz-decompress-read decoder bv start (- (bytevector-length bv) start))) - (set! start (+ start rd)) - (set! read (+ read rd))) - (set! rd 0)) - (unless (= rd 0) - (loop rd))) - read)) + (define (feed-decoder! decoder) + ;; Feed DECODER with data read from PORT. + (match (get-bytevector-n port (lz-decompress-write-size decoder)) + ((? eof-object? eof) eof) + (bv (lz-decompress-write decoder bv)))) + + (let loop ((read 0) + (start start)) + (cond ((< read count) + (match (lz-decompress-read decoder bv start (- count read)) + (0 (if (eof-object? (feed-decoder! decoder)) + read + (loop read start))) + (n (loop (+ read n) (+ start n))))) + (else + read)))) (define (lzwrite! encoder source source-offset source-count target target-offset target-count)