Message ID | e13090c1-dabb-da63-cc62-3975f2697527@yahoo.de |
---|---|
State | Accepted |
Series | [bug#34223] Fixing timestamps in archives. |
Context | Check | Description |
---|---|---|
cbaines/applying patch | success | Successfully applied |
Hi Tim,

Sorry for the delay!

Tim Gesthuizen <tim.gesthuizen@yahoo.de> wrote:

> as discussed before I have looked into the problems of timestamps in the
> zip files.
> I looked at the way this is solved in ant-build-system with jar files
> and thought that this could be done in a more elegant way.
> Because of this I wrote a simple frontend for LibArchive in C that
> repacks archives and sets their timestamps to zero and disables
> compression as it is done in the ant-build-system.
> Creative as I am the program is called repack.
> You find a git repository attached with the history of the repack program.
> The attached patches add repack to Guix and use it for pwsafe and the
> ant-build-system.

Nice work!  It’s great that libarchive doesn’t need to actually extract
the zip file to operate on it.

Overall I think the approach of factoring archive-timestamp-resetting
into one place and using it everywhere (‘ant-build-system’ and all) is the
right thing to do.

However, I’m not sure whether we should introduce a new program for this
purpose.  I believe ‘strip-nondeterminism’¹ (in Perl) by fellow
Reproducible Builds hackers also addresses this problem, so it may be
wiser to use it.

But really, since (guix build utils) already implements a significant
subset of ‘strip-nondeterminism’, it would be even better if we could avoid
shelling out to a C or Perl program.

I played a bit with this idea and, as an example, the attached file
allows you to traverse the list of entries in a zip file (it uses
‘guile-bytestructures’).  Specifically, you can get the list of file
names in a zip file by running:

  (call-with-input-file "something.zip"
    (lambda (port)
      (fold-entries cons '() port)))

Resetting timestamps should be just as simple.

How about taking this route?

Thanks,
Ludo’.

¹ https://salsa.debian.org/reproducible-builds/strip-nondeterminism

(define-module (guix zip)
  #:use-module (rnrs bytevectors)
  #:use-module (rnrs io ports)
  #:use-module (bytestructures guile)
  #:use-module (ice-9 match)
  #:export (fold-entries))

(define <file-header>
  ;; File header, see
  ;; <https://en.wikipedia.org/wiki/Zip_(file_format)#File_headers>.
  (bs:struct #t                                   ;packed
             `((signature ,uint32le)
               (version-needed ,uint16le)
               (flags ,uint16le)
               (compression ,uint16le)
               (modification-time ,uint16le)
               (modification-date ,uint16le)
               (crc32 ,uint32le)
               (compressed-size ,uint32le)
               (uncompressed-size ,uint32le)
               (file-name-length ,uint16le)
               (extra-field-length ,uint16le))))

(define-bytestructure-accessors <file-header>
  file-header-unwrap file-header-ref set-file-header!)

(define (fold-entries proc seed port)
  "Fold PROC over all the entries in the zip file at PORT."
  (let loop ((result seed))
    (match (get-bytevector-n port
                             (bytestructure-descriptor-size <file-header>))
      ((? bytevector? bv)
       (match (file-header-ref bv signature)
         (#x04034b50                              ;local file header
          (let* ((len  (file-header-ref bv file-name-length))
                 (name (utf8->string (get-bytevector-n port len))))
            ;; Skip the extra field and the entry's data.
            (set-port-position! port
                                (+ (file-header-ref bv extra-field-length)
                                   (file-header-ref bv compressed-size)
                                   (port-position port)))
            (loop (proc name result))))
         (#x02014b50                              ;central directory record
          result)
         (#x06054b50                              ;end of central directory record
          result)))
      ((? eof-object?)
       result))))

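To make the remark that resetting timestamps "should be just as simple" concrete, here is a minimal sketch that uses only (rnrs bytevectors). It assumes the 30-byte local file header layout described by <file-header> above, where the modification time sits at byte offset 10 and the modification date at offset 12. The names `dos-date`, `dos-time` and `reset-header-timestamp!` are hypothetical and not part of the attached module. Zip headers use the MS-DOS encoding, whose earliest representable date is 1980-01-01, so that date stands in for "zero" here.

```scheme
(use-modules (rnrs bytevectors))

;; MS-DOS date: bits 15-9 year since 1980, bits 8-5 month, bits 4-0 day.
(define (dos-date year month day)
  (logior (ash (- year 1980) 9) (ash month 5) day))

;; MS-DOS time: bits 15-11 hour, bits 10-5 minute, bits 4-0 seconds / 2.
(define (dos-time hour minute second)
  (logior (ash hour 11) (ash minute 5) (quotient second 2)))

;; Byte offsets within the local file header (assumed from <file-header>).
(define %time-offset 10)
(define %date-offset 12)

(define (reset-header-timestamp! bv)
  "Set the timestamp of the local file header BV to 1980-01-01 00:00:00
and return BV."
  (bytevector-u16-set! bv %time-offset (dos-time 0 0 0) (endianness little))
  (bytevector-u16-set! bv %date-offset (dos-date 1980 1 1) (endianness little))
  bv)

;; Usage on a dummy 30-byte header filled with #xff:
(define header (make-bytevector 30 #xff))
(reset-header-timestamp! header)
(bytevector-u16-ref header %date-offset (endianness little)) ;=> 33
```

Note that this only touches the local file header. Each central directory record carries its own copy of the timestamp, so a complete in-place rewrite would have to patch those as well.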
On 16 February 2019 at 23:35:50 GMT+01:00, "Ludovic Courtès" <ludo@gnu.org> wrote:
>Hi Tim,
>
>Sorry for the delay!
>
>Tim Gesthuizen <tim.gesthuizen@yahoo.de> wrote:
>
>> as discussed before I have looked into the problems of timestamps in the
>> zip files.
>> I looked at the way this is solved in ant-build-system with jar files
>> and thought that this could be done in a more elegant way.
>> Because of this I wrote a simple frontend for LibArchive in C that
>> repacks archives and sets their timestamps to zero and disables
>> compression as it is done in the ant-build-system.
>> Creative as I am the program is called repack.
>> You find a git repository attached with the history of the repack program.
>> The attached patches add repack to Guix and use it for pwsafe and the
>> ant-build-system.
>
>Nice work!  It’s great that libarchive doesn’t need to actually extract
>the zip file to operate on it.
>
>Overall I think the approach of factoring archive-timestamp-resetting
>into one place and using it everywhere (‘ant-build-system’ and all) is the
>right thing to do.
>
>However, I’m not sure whether we should introduce a new program for this
>purpose.  I believe ‘strip-nondeterminism’¹ (in Perl) by fellow
>Reproducible Builds hackers also addresses this problem, so it may be
>wiser to use it.
>
>But really, since (guix build utils) already implements a significant
>subset of ‘strip-nondeterminism’, it would be even better if we could avoid
>shelling out to a C or Perl program.
>
>I played a bit with this idea and, as an example, the attached file
>allows you to traverse the list of entries in a zip file (it uses
>‘guile-bytestructures’).  Specifically, you can get the list of file
>names in a zip file by running:
>
>  (call-with-input-file "something.zip"
>    (lambda (port)
>      (fold-entries cons '() port)))
>
>Resetting timestamps should be just as simple.
>
>How about taking this route?
>
>Thanks,
>Ludo’.
>
>¹ https://salsa.debian.org/reproducible-builds/strip-nondeterminism

One of the reasons why we extract jar files is to remove compression:
the content might contain store references that compression would hide,
so grafting, for instance, wouldn't work.

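The point about hidden references can be illustrated with a small sketch. It assumes, as a simplification, that a reference scanner looks for the literal bytes of a store file name inside a file. The procedure `bytevector-contains?` and the store path below are hypothetical and only serve the illustration; this is not the scanner Guix actually uses.

```scheme
(use-modules (rnrs bytevectors))

(define (bytevector-contains? haystack needle)
  "Return #t if the bytevector NEEDLE occurs somewhere in HAYSTACK."
  (let ((hlen (bytevector-length haystack))
        (nlen (bytevector-length needle)))
    (let loop ((start 0))
      (cond ((> (+ start nlen) hlen) #f)            ;ran out of room
            ((let check ((i 0))                     ;compare byte by byte
               (or (= i nlen)
                   (and (= (bytevector-u8-ref haystack (+ start i))
                           (bytevector-u8-ref needle i))
                        (check (+ i 1)))))
             #t)
            (else (loop (+ start 1)))))))

;; A made-up store file name, for illustration only.
(define reference
  (string->utf8 "/gnu/store/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-example-1.0"))

;; Data of an entry stored without compression: the bytes appear as-is.
(define stored-entry
  (string->utf8
   "classpath=/gnu/store/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-example-1.0/lib"))

(bytevector-contains? stored-entry reference) ;=> #t
```

With a deflated entry the same bytes are generally not present literally in the archive, so a byte-level search like this one would miss the reference. That is why the entries are stored uncompressed.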
Hi Ludo,

> Sorry for the delay!

No problem! I have very little time anyway.

> Nice work!  It’s great that libarchive doesn’t need to actually extract
> the zip file to operate on it.
>
> Overall I think the approach of factoring archive-timestamp-resetting
> into one place and using it everywhere (‘ant-build-system’ and all) is the
> right thing to do.
>
> However, I’m not sure whether we should introduce a new program for this
> purpose.  I believe ‘strip-nondeterminism’¹ (in Perl) by fellow
> Reproducible Builds hackers also addresses this problem, so it may be
> wiser to use it.

I also think so. If there is already another program that does the job
we should probably use it.

> But really, since (guix build utils) already implements a significant
> subset of ‘strip-nondeterminism’, it would be even better if we could avoid
> shelling out to a C or Perl program.
>
> I played a bit with this idea and, as an example, the attached file
> allows you to traverse the list of entries in a zip file (it uses
> ‘guile-bytestructures’).  Specifically, you can get the list of file
> names in a zip file by running:
>
>   (call-with-input-file "something.zip"
>     (lambda (port)
>       (fold-entries cons '() port)))
>
> Resetting timestamps should be just as simple.
>
> How about taking this route?

I also thought about taking this route.
There are some problems with it though:

- As Julien pointed out, the archive contents need to be uncompressed.
  This makes the problem much more complex and keeps us from writing
  a partial ZIP parser that replaces the timestamps in place.
- While it would be quite elegant to just implement the parser in Scheme,
  it would be redundant. After all, we are developing a package manager,
  so we should use it. This approach would be more attractive if there
  were a Guile library for this. If we go with Scheme, the best solution
  would be to create a proper library for handling archives.
- Maintaining a ZIP parser in Guix is a burden we should not take on.
- We need to care about a lot of details (ZIP64, probably more exotic
  extensions).

I would be fine with writing our own parser in Scheme, but I would like to
point out that everywhere else in Guix we use external tools for
handling archives (AFAIK).
I am not quite sure which version would be the best, so I am open to
other opinions on this. Maybe you could restate your position, taking
the compression problem into consideration.

Tim.

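As a rough illustration of the "lot of details" mentioned above, the sketch below inspects a 30-byte local file header and reports the conditions under which a naive in-place rewrite (or the size-based skipping done by `fold-entries`) would not be enough. The procedure name `header-caveats` is hypothetical. The field offsets and the meaning of the values follow the zip format: compression method 0 means "stored", bit 3 of the flags means the sizes live in a trailing data descriptor, and a size of #xffffffff signals ZIP64.

```scheme
(use-modules (rnrs bytevectors))

(define (header-caveats bv)
  "Return a list of symbols describing why the local file header BV needs
more than a simple in-place timestamp rewrite."
  (let ((flags  (bytevector-u16-ref bv 6 (endianness little)))
        (method (bytevector-u16-ref bv 8 (endianness little)))
        (csize  (bytevector-u32-ref bv 18 (endianness little))))
    (append
     ;; Anything but method 0 means the data is compressed.
     (if (zero? method) '() '(compressed))
     ;; Bit 3: sizes and CRC follow the data; the header fields are zero.
     (if (logtest flags #x08) '(data-descriptor) '())
     ;; ZIP64: the real size is recorded in the extra field.
     (if (= csize #xffffffff) '(zip64) '()))))

;; Usage on a dummy header: all zeros, then marked as deflated (method 8).
(define header (make-bytevector 30 0))
(header-caveats header)                               ;=> ()
(bytevector-u16-set! header 8 8 (endianness little))
(header-caveats header)                               ;=> (compressed)
```

This covers only three cases; other extensions would need their own checks, which is part of the maintenance burden described above.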
Hello,

Tim Gesthuizen <tim.gesthuizen@yahoo.de> wrote:

>> I played a bit with this idea and, as an example, the attached file
>> allows you to traverse the list of entries in a zip file (it uses
>> ‘guile-bytestructures’).  Specifically, you can get the list of file
>> names in a zip file by running:
>>
>>   (call-with-input-file "something.zip"
>>     (lambda (port)
>>       (fold-entries cons '() port)))
>>
>> Resetting timestamps should be just as simple.
>>
>> How about taking this route?
>
> I also thought about taking this route.
> There are some problems with it though:
>
> - As Julien pointed out, the archive contents need to be uncompressed.
>   This makes the problem much more complex and keeps us from writing
>   a partial ZIP parser that replaces the timestamps in place.

True, I had overlooked that.

In that case, we should definitely unpack and repack using the ‘zip’
package (I wasn’t suggesting to write a complete ‘zip’ implementation; I
do think it would be valuable in the long term, but it’s a project for
another time, no question here.)

In that case though, it probably doesn’t buy us much to use libarchive
in a separate C program, WDYT?  Should we just stick to the current
approach that invokes ‘unzip’ and ‘zip’?

Thanks,
Ludo’.

From 3df6e33f52ac2906ec98cc9b74ef93d9cbb22108 Mon Sep 17 00:00:00 2001
From: Tim Gesthuizen <tim.gesthuizen@yahoo.de>
Date: Sat, 19 Jan 2019 17:13:45 +0100
Subject: [PATCH 11/11] gnu: ant: Use repack for repacking archives

* gnu/packages/java.scm (ant): [native-inputs]: Use repack in favour of
zip and unzip.
---
 gnu/packages/java.scm | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gnu/packages/java.scm b/gnu/packages/java.scm
index ad61bf294..0002c1f83 100644
--- a/gnu/packages/java.scm
+++ b/gnu/packages/java.scm
@@ -1887,8 +1887,7 @@ new Date();"))
                 "1k28mka0m3isy9yr8gz84kz1f3f879rwaxrd44vdn9xbfwvwk86n"))))
     (native-inputs
      `(("jdk" ,icedtea-7 "jdk")
-       ("zip" ,zip)
-       ("unzip" ,unzip)))))
+       ("repack" ,repack)))))
 
 (define-public ant-apache-bcel
   (package
-- 
2.20.1