Message ID | 20220527082519.501697-1-zimon.toutoune@gmail.com |
---|---|
State | Accepted |
Series | [bug#55673] cache: Catch valid integer for 'last-expiry-cleanup'. |
Context | Check | Description |
---|---|---|
cbaines/comparison | success | View comparison |
cbaines/git branch | success | View Git branch |
cbaines/applying patch | success | View Laminar job |
cbaines/issue | success | View issue |
zimoun wrote on Fri 27-05-2022 at 10:25 [+0200]:
>    (catch 'system-error
>      (lambda ()
> -      (call-with-input-file expiry-file read))
> +      (match (call-with-input-file expiry-file read)
> +        ((? integer? date) date)
> +        (_ 0)))

On some file systems it might be possible to end up with something even
more bogus (e.g., "unterminated-string), which 'read' doesn't
understand. I suggest using 'get-string-all' + 'string->number'.

For completeness, a comment like

  ;; Handle the 'write' below being interrupted before the write
  ;; could complete (e.g. with C-c) and handle file system crashes
  ;; causing empty files or corrupted contents.

and a regression test in tests/cache.scm would be nice.

Also, I'd switch the catch and the (match ...) because 'read' and
integer? shouldn't raise any 'system-error'.

Greetings,
Maxime.

Hi Maxime,

On Fri, 27 May 2022 at 11:54, Maxime Devos <maximedevos@telenet.be> wrote:
> zimoun wrote on Fri 27-05-2022 at 10:25 [+0200]:
> >    (catch 'system-error
> >      (lambda ()
> > -      (call-with-input-file expiry-file read))
> > +      (match (call-with-input-file expiry-file read)
> > +        ((? integer? date) date)
> > +        (_ 0)))
>
> On some file systems it might be possible to end up with something even
> more bogus (e.g., "unterminated-string), which 'read' doesn't
> understand. I suggest using 'get-string-all' + 'string->number'.

I do not see how. The integer is written by Guile using 'write'. From
my understanding, 'read' understands 'write', and vice-versa.

> Also, I'd switch the catch and the (match ...) because 'read' and
> integer? shouldn't raise any 'system-error'.

All the cases are covered, IMHO.

Cheers,
simon

zimoun wrote on Fri 27-05-2022 at 12:28 [+0200]:
> > On some file systems it might be possible to end up with something
> > even more bogus (e.g., "unterminated-string), which 'read' doesn't
> > understand. I suggest using 'get-string-all' + 'string->number'.
>
> I do not see how. The integer is written by Guile using 'write'.
> From my understanding, 'read' understands 'write', and vice-versa.

But what if the computer crashes while the file is being made, and due
to file system details, when rebooted, you end up with a non-integer?
E.g. "unterminated-string.

Greetings,
Maxime.

zimoun wrote on Fri 27-05-2022 at 12:28 [+0200]:
> > Also, I'd switch the catch and the (match ...) because 'read' and
> > integer? shouldn't raise any 'system-error'.
>
> All the cases are covered, IMHO.

Too many cases are covered. E.g., what if the file was a directory?

  scheme@(guile-user)> (call-with-input-file "." read)
  ice-9/boot-9.scm:1669:16: In procedure raise-exception:
  In procedure fport_read: Is a directory

Greetings,
Maxime.

On Fri, 27 May 2022 at 13:17, Maxime Devos <maximedevos@telenet.be> wrote:
> scheme@(guile-user)> (call-with-input-file "." read)
> ice-9/boot-9.scm:1669:16: In procedure raise-exception:
> In procedure fport_read: Is a directory

Euh, you are overengineering, no? We are talking about an internal
file used by the Guix cache. Yes, if the user tweaks this cache, then
bad things can happen. That is true for almost everything that lives
in ~/.cache/guix.

Cheers,
simon

On Fri, 27 May 2022 at 13:12, Maxime Devos <maximedevos@telenet.be> wrote:
> But what if the computer crashes while the file is being made, and due
> to file system details, when rebooted, you end up with a non-integer?
> E.g. "unterminated-string.

If the value returned by 'read' is a non-integer, then it is set to 0
by the 'match'.

--8<---------------cut here---------------start------------->8---
$ echo -n -e \\x00\\x01\\x02\\x03 > ~/.cache/guix/inferiors/last-expiry-cleanup
$ cat ~/.cache/guix/inferiors/last-expiry-cleanup
\0 [env]
$ ./pre-inst-env guix time-machine --commit=9d795fb -- help
1653651128[env]
--8<---------------cut here---------------end--------------->8---

The only remaining case is when 'read' fails. Personally, I do not see
how it would be possible. If you have a concrete example, then we can
examine it.

Cheers,
simon

zimoun wrote on Fri 27-05-2022 at 13:24 [+0200]:
> On Fri, 27 May 2022 at 13:17, Maxime Devos <maximedevos@telenet.be> wrote:
>
> > scheme@(guile-user)> (call-with-input-file "." read)
> > ice-9/boot-9.scm:1669:16: In procedure raise-exception:
> > In procedure fport_read: Is a directory
>
> Euh, you are overengineering, no? We are talking about an internal
> file used by the Guix cache. Yes, if the user tweaks this cache, then
> bad things can happen. That is true for almost everything that lives
> in ~/.cache/guix.

Probably yes. Maybe it makes more sense when applied to
get-string-all + string->number in a limited form:

  (or (string->number
       (catch 'system-error
         (lambda ()
           (call-with-input-file [...] get-string-all))
         (lambda arglist
           (if (= ENOENT (system-error-errno arglist))
               "0" ; file does not exist
               (apply throw arglist)))))
      0)

Though even then there remain potential problems, try

  scheme@(guile-user)> (string->number "#e1e1000")
  ice-9/boot-9.scm:1669:16: In procedure raise-exception:
  In procedure string->number: Value out of range: 1000

(seems unlikely to encounter such corruption in practice though).

Greetings,
Maxime.

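Editorial sketch: the get-string-all + string->number idea above can be
made runnable roughly as follows. This is only an illustration under
stated assumptions, not the code that was committed. The name
read-expiry-date is hypothetical, and the whitespace trimming, the
'false-if-exception' wrapper (for inputs such as "#e1e1000") and the
exact-integer check are additions assumed here.

```scheme
(use-modules (ice-9 textual-ports))     ; get-string-all

;; Hypothetical helper: return the integer stored in FILE, or 0 when FILE
;; is missing, empty, or does not contain exactly one exact integer.
(define (read-expiry-date file)
  (let* ((text (catch 'system-error
                 (lambda ()
                   (call-with-input-file file get-string-all))
                 (lambda args
                   (if (= ENOENT (system-error-errno args))
                       "0"                      ; file does not exist
                       (apply throw args)))))   ; e.g. FILE is a directory
         ;; 'string->number' returns #f on most garbage but can also
         ;; raise (e.g. on "#e1e1000"), hence 'false-if-exception'.
         (n    (false-if-exception
                (string->number (string-trim-both text)))))
    (if (and (integer? n) (exact? n))
        n
        0)))
```

With this shape only ENOENT is turned into 0 by the 'catch'; other
system errors still propagate, and anything that is not an exact
integer falls back to 0.
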
zimoun wrote on Fri 27-05-2022 at 13:39 [+0200]:
> The only remaining case is when 'read' fails. Personally, I do not see
> how it would be possible. If you have a concrete example, then we can
> examine it.

I don't have a 100%-concrete example, but wasn't there some file system
crash mode where the contents of a new file have not yet been written
to disk, yet the length of the file is > 0, so effectively the file
points to an arbitrary range on the disk?

E.g., say Guix told the FS to write 1234 to last-expiry-cleanup. Then
the FS created last-expiry-cleanup, chose a range of 4 bytes to save it
in, then crashed before writing the contents. After restarting, the
file contains the _old_ 4 bytes. These old 4 bytes could be the ASCII
representation of

  "foo

. Then, when 'read' is run (after rebooting), it sees an incomplete
string "foo, so it fails.

Greetings,
Maxime.

Hi Maxime,

On Fri, 27 May 2022 at 13:49, Maxime Devos <maximedevos@telenet.be> wrote:
> These old 4 bytes could be the ASCII representation of
>
>   "foo
>
> . Then, when 'read' is run (after rebooting), it sees an incomplete
> string "foo, so it fails.

The question is how would 'read' fail or what would 'read' return? For
instance, the patch works for these cases:

 - empty file
 - non-integer

Now, if you are able to generate an incomplete file (from an integer or
whatever) against which the patch fails, then we can examine it.
However, I fail to see what the difference would be between this
incomplete file and, let's say, this case:

  echo -n -e \\x12 > ~/.cache/guix/inferiors/last-expiry-cleanup

which is handled by the patch.

Cheers,
simon

zimoun wrote on Fri 27-05-2022 at 14:40 [+0200]:
> > These old 4 bytes could be the ASCII representation of
> >
> >   "foo
> >
> > . Then, when 'read' is run (after rebooting), it sees an incomplete
> > string "foo, so it fails.
>
> The question is how would 'read' fail or what would 'read' return?
> For instance, the patch works for these cases:
>
>  - empty file
>  - non-integer
>
> Now, if you are able to generate an incomplete file (from an integer
> or whatever) against which the patch fails, then we can examine it.
> However, I fail to see what the difference would be between this
> incomplete file and, let's say, this case:
>
>   echo -n -e \\x12 > ~/.cache/guix/inferiors/last-expiry-cleanup
>
> which is handled by the patch.

The incomplete file is:

  "foo

as mentioned previously. Here's how it fails:

  scheme@(guile-user)> (call-with-input-file "a" read)
  ice-9/boot-9.scm:1669:16: In procedure raise-exception:
  In procedure scm_lreadr: a:2:1: end of file in string constant

The difference is that ^R is interpreted as a symbol, whereas "foo
cannot be interpreted as anything at all by 'read'.

Greetings,
Maxime.

"foo

Hi Maxime,

On Fri, 27 May 2022 at 15:04, Maxime Devos <maximedevos@telenet.be> wrote:
> scheme@(guile-user)> (call-with-input-file "a" read)
> ice-9/boot-9.scm:1669:16: In procedure raise-exception:
> In procedure scm_lreadr: a:2:1: end of file in string constant

This is an ad-hoc example and not a real test case.

--8<---------------cut here---------------start------------->8---
$ echo "foo" > ~/.cache/guix/inferiors/last-expiry-cleanup
$ cat ~/.cache/guix/inferiors/last-expiry-cleanup
foo
$ ./pre-inst-env guix time-machine --commit=9d795fb -- help
Usage: guix OPTION | COMMAND ARGS...
Run COMMAND with ARGS, if given.
--8<---------------cut here---------------end--------------->8---

As previously mentioned, the patch fixes:

 - empty file
 - non-integer

I am not able to imagine an incomplete file worse than \\x00.

> The difference is that ^R is interpreted as a symbol, whereas "foo
> cannot be interpreted as anything at all by 'read'.

I do not understand what you mean and I think you are overengineering.
If you are able to produce a corrupted file which breaks "guix
time-machine", then we can examine it. Otherwise, let's move on. :-)

Cheers,
simon

On Fri, 27 May 2022 at 15:23, zimoun <zimon.toutoune@gmail.com> wrote:
> On Fri, 27 May 2022 at 15:04, Maxime Devos <maximedevos@telenet.be> wrote:
>
> > scheme@(guile-user)> (call-with-input-file "a" read)
> > ice-9/boot-9.scm:1669:16: In procedure raise-exception:
> > In procedure scm_lreadr: a:2:1: end of file in string constant
>
> This is an ad-hoc example and not a real test case.

[...]

> I am not able to imagine an incomplete file worse than \\x00.

Just to be sure, I mean: an incomplete integer. For sure, any
incomplete (unbalanced) sexp breaks 'read', such as the example "foo or
(1 or whatever else, as you correctly point out. But since the cache
writes an integer with 'write', it would have to be an incomplete
integer.

Cheers,
simon

> > The difference is that ^R is interpreted as a symbol, whereas "foo
> > cannot be interpreted as anything at all by 'read'.
> I do not understand what you mean

^R is interpreted as a symbol:
(symbol? (call-with-input-string "\x12" read)).

"foo" is interpreted as a string:
(string? (call-with-input-string "\"foo\"" read))

"foo without a terminating quote cannot be interpreted at all:
(call-with-input-string "\"foo" read)

> and I think you are overengineering.

It's not any more overengineering than catching not-an-integer IMO.
AFAICT, this does not fit the definition of overengineering I found on
Wikipedia.

Also, I do not understand the resistance -- I have a simple proposal
for generalising your patch to more failure modes, with a demonstration
and test case (see the file "a") of when it is necessary and a proposed
implementation.

zimoun wrote on Fri 27-05-2022 at 15:23 [+0200]:
> If you are able to produce a corrupted file which breaks "guix
> time-machine", then we can examine it. Otherwise, let's move on. :-)

I previously produced the corrupted file, see the file "a". I am not
willing to deliberately corrupt my file system for this, especially
when I can just give a synthetic example of a corrupted file (see the
file "a") and especially since making a synthetic example is much
simpler and faster.

Greetings,
Maxime.

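Editorial sketch: the cases distinguished above (a value that 'read'
returns versus input that 'read' cannot parse at all) can be explored
with a small Guile helper. The name classify-read and the returned tags
are hypothetical and exist only for this illustration.

```scheme
(use-modules (ice-9 match))

;; Hypothetical helper: report what 'read' makes of the string STR.
(define (classify-read str)
  (catch #t
    (lambda ()
      (match (call-with-input-string str read)
        ((? eof-object?) 'eof)
        ((? integer?)    'integer)
        ((? symbol?)     'symbol)
        ((? string?)     'string)
        (_               'other)))
    ;; An unterminated string or list makes 'read' itself raise.
    (lambda _ 'read-failure)))

(classify-read "1653651128")   ; complete integer
(classify-read "")             ; empty file
(classify-read "\"foo\"")      ; complete string
(classify-read "\"foo")        ; unterminated string: 'read' raises
(classify-read "(1")           ; unbalanced list: 'read' raises
```

A 'match' on the value returned by 'read' can only handle the first
three inputs. For the last two, 'read' raises before 'match' ever sees
a value, so they have to be handled by an exception handler (or avoided
by not calling 'read' at all).
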
Hi,

On Fri, 27 May 2022 at 16:02, Maxime Devos <maximedevos@telenet.be> wrote:
> Also, I do not understand the resistance -- I have a simple proposal
> for generalising your patch to more failure modes, with a demonstration
> and test case (see the file "a") of when it is necessary and a
> proposed implementation.

I have sent a v2 using your proposal (which appears overcomplicated to
me).

It is not resistance but pragmatism: the only case of interest is the
empty file, which does happen. For all the others, I am still waiting
for at least one bug report, i.e., a user runs "guix time-machine" and
suddenly the file last-expiry-cleanup is corrupted and "guix
time-machine" is unusable.

Pragmatic because, for instance, from 2 to " or from 8 to ( is one
bit-flip, and thus 'read' would be easily broken. I fail to see why
there is such a lengthy discussion about these theoretical failures of
last-expiry-cleanup when the same is true each time 'read' is used, see
least-authority or ui.scm etc. But I have never read a word about
those. Anyway.

Cheers,
simon

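Editorial sketch: the bit-flip remark above can be checked in Guile
with a few lines. The helper name bit-distance is hypothetical; it
counts the bits that differ between the code points of two characters.

```scheme
;; Hypothetical helper: number of bits that differ between the code
;; points of characters A and B.
(define (bit-distance a b)
  (logcount (logxor (char->integer a) (char->integer b))))

(bit-distance #\2 #\")   ; ASCII 0x32 vs 0x22 => 1
(bit-distance #\8 #\()   ; ASCII 0x38 vs 0x28 => 1
```
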
zimoun wrote on Fri 27-05-2022 at 18:19 [+0200]:
> I fail to see why there is such a lengthy discussion about these
> theoretical failures of last-expiry-cleanup when the same is true each
> time 'read' is used, see least-authority or ui.scm etc.

(guix ui) cannot do anything about corruption except report the read
failure, whereas (guix cache) has a very strict file format, so it is
feasible to detect whether it's corruption or just the user making a
typo (because those files aren't directly written by a user), and
additionally it can very easily handle the corruption.

For (guix authority), there is already a corruption detection mechanism
("guix gc --verify=contents") -- there even already is a repair
mechanism: "guix gc --verify=contents,repair".

> It is not resistance but pragmatism: the only case of interest is the
> empty file, which does happen. For all the others, I am still waiting
> for at least one bug report, i.e., a user runs "guix time-machine" and
> suddenly the file last-expiry-cleanup is corrupted and "guix
> time-machine" is unusable.

* The general issue of file system corruption in Guix is already known
  (the Guix daemon never calls fsync or sync except on the SQLite
  database), though I don't know if a formal bug report exists about
  that. There have been many bug reports on individual cases though.

* This bug report already exists: <http://issues.guix.gnu.org/55638>.
  (You say the file system is not corrupted, but how would you know?
  Even if not, the symptoms are almost identical.)

* I do not see the point of waiting for users who are known to be
  suffering to report the bug before fixing it. That seems negligent to
  me if the fix is easy and known, and not very pragmatic for those
  future (or maybe current and shy) users. It also carries a risk of
  rebase conflicts, which does not seem pragmatic to me.

Greetings,
Maxime

diff --git a/guix/cache.scm b/guix/cache.scm
index 51009809bd..4a74c42afe 100644
--- a/guix/cache.scm
+++ b/guix/cache.scm
@@ -1,5 +1,6 @@
 ;;; GNU Guix --- Functional package management for GNU
 ;;; Copyright © 2013, 2014, 2015, 2016, 2017, 2020, 2021 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2022 Simon Tournier <zimon.toutoune@gmail.com>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -93,7 +94,9 @@ (define expiry-file
   (define last-expiry-date
     (catch 'system-error
       (lambda ()
-        (call-with-input-file expiry-file read))
+        (match (call-with-input-file expiry-file read)
+          ((? integer? date) date)
+          (_ 0)))
       (const 0)))
 
   (when (obsolete? last-expiry-date now cleanup-period)