
[bug#55673] cache: Catch valid integer for 'last-expiry-cleanup'.

Message ID 20220527082519.501697-1-zimon.toutoune@gmail.com
State Accepted
Series [bug#55673] cache: Catch valid integer for 'last-expiry-cleanup'.

Checks

Context                 Check    Description
cbaines/comparison      success  View comparison
cbaines/git branch      success  View Git branch
cbaines/applying patch  success  View Laminar job
cbaines/issue           success  View issue

Commit Message

Simon Tournier May 27, 2022, 8:25 a.m. UTC
Fixes <http://issues.guix.gnu.org/55638>.

* guix/cache.scm (maybe-remove-expired-cache-entries)[last-expiry-date]: Check
if the date is a valid integer.
---
 guix/cache.scm | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)


base-commit: 38bf6c7d0cb588e8d4546db7d2e0bae2ec25183d

Comments

Maxime Devos May 27, 2022, 9:54 a.m. UTC | #1
zimoun wrote on Fri 27-05-2022 at 10:25 [+0200]:
>      (catch 'system-error
>        (lambda ()
> -        (call-with-input-file expiry-file read))
> +        (match (call-with-input-file expiry-file read)
> +          ((? integer? date) date)
> +          (_ 0)))

On some file systems, it's possible to end up with something even more
bogus (e.g., "unterminated-string), which 'read' doesn't understand.
I suggest using 'get-string-all' + 'string->number'.

For completeness, a comment like

   ;; Handle the 'write' below being interrupted before the write
   ;; could complete (e.g. with C-c) and handle file system crashes
   ;; causing empty files or corrupted contents.

and a regression test in tests/cache.scm would be nice.

Also, I'd switch the catch and the (match ...) because 'read' and
integer? shouldn't raise any 'system-error'.
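A minimal sketch of that reordering, assuming a hypothetical standalone helper 'read-expiry-date' that takes the file name as an argument (the patch itself inlines this logic in 'last-expiry-date' and uses 'expiry-file'). Only the file access sits inside the 'catch'; the integer check runs outside it:

```scheme
(use-modules (ice-9 match))

;; Hypothetical helper: in guix/cache.scm this logic is inlined in
;; 'last-expiry-date' and the file is 'expiry-file'.
(define (read-expiry-date file)
  ;; Only opening and reading the file can raise 'system-error, so
  ;; only that part is wrapped; a missing or unreadable file gives #f.
  (match (catch 'system-error
           (lambda ()
             (call-with-input-file file read))
           (const #f))
    ;; 'write' stored an integer: use it as the date.
    ((? integer? date) date)
    ;; EOF object (empty file), #f, or any other datum: fall back to 0.
    (_ 0)))
```

With this ordering, a 'read' failure on unbalanced input such as "foo still propagates instead of being mapped to 0; that is the case the 'get-string-all' suggestion is meant to cover.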

Greetings,
Maxime.
Simon Tournier May 27, 2022, 10:28 a.m. UTC | #2
Hi Maxime,

On Fri, 27 May 2022 at 11:54, Maxime Devos <maximedevos@telenet.be> wrote:
> zimoun wrote on Fri 27-05-2022 at 10:25 [+0200]:

> >      (catch 'system-error
> >        (lambda ()
> > -        (call-with-input-file expiry-file read))
> > +        (match (call-with-input-file expiry-file read)
> > +          ((? integer? date) date)
> > +          (_ 0)))
>
> On some file systems, it's possible to end up with something even
> more bogus (e.g., "unterminated-string), which 'read' doesn't
> understand.  I suggest using 'get-string-all' + 'string->number'.

I do not see how.  The integer is written by Guile using 'write'.
From my understanding, 'read' understands 'write', and vice-versa.


> Also, I'd switch the catch and the (match ...) because 'read' and
> integer? shouldn't raise any 'system-error'.

All the cases are covered, IMHO.


Cheers,
simon
Maxime Devos May 27, 2022, 11:12 a.m. UTC | #3
zimoun wrote on Fri 27-05-2022 at 12:28 [+0200]:
> > On some file systems, it's possible to end up with something even
> > more bogus (e.g., "unterminated-string), which 'read' doesn't
> > understand.  I suggest using 'get-string-all' + 'string->number'.
> 
> I do not see how.  The integer is written by Guile using 'write'.
> From my understanding, 'read' understands 'write', and vice-versa.

But what if the computer crashes while the file is being made, and due
to file system details, when rebooted, you end up with a non-integer?
E.g. "unterminated-string.

Greetings,
Maxime.
Maxime Devos May 27, 2022, 11:17 a.m. UTC | #4
zimoun wrote on Fri 27-05-2022 at 12:28 [+0200]:
> > Also, I'd switch the catch and the (match ...) because 'read' and
> > integer? shouldn't raise any 'system-error'.
> 
> All the cases are covered, IMHO.

Too many cases are covered.  E.g., what if the file was a directory?

scheme@(guile-user)> (call-with-input-file "." read)
ice-9/boot-9.scm:1669:16: In procedure raise-exception:
In procedure fport_read: Is a directory

Greetings,
Maxime.
Simon Tournier May 27, 2022, 11:24 a.m. UTC | #5
On Fri, 27 May 2022 at 13:17, Maxime Devos <maximedevos@telenet.be> wrote:

> scheme@(guile-user)> (call-with-input-file "." read)
> ice-9/boot-9.scm:1669:16: In procedure raise-exception:
> In procedure fport_read: Is a directory

Er, you are overengineering, no?  We are talking about an internal
file used by the Guix cache.  Yes, if the user tweaks this cache, then
bad things can happen.  That is true for almost everything that lives
in ~/.cache/guix.


Cheers,
simon
Simon Tournier May 27, 2022, 11:39 a.m. UTC | #6
On Fri, 27 May 2022 at 13:12, Maxime Devos <maximedevos@telenet.be> wrote:

> But what if the computer crashes while the file is being made, and due
> to file system details, when rebooted, you end up with a non-integer?
> E.g. "unterminated-string.

If the value returned by 'read' is a non-integer, then it is set to 0
by the 'match'.

--8<---------------cut here---------------start------------->8---
$ echo -n -e \\x00\\x01\\x02\\x03 > ~/.cache/guix/inferiors/last-expiry-cleanup
$ cat ~/.cache/guix/inferiors/last-expiry-cleanup
\0 [env]
$ ./pre-inst-env guix time-machine --commit=9d795fb -- help
1653651128[env]
--8<---------------cut here---------------end--------------->8---

The only remaining case is when 'read' itself fails.  Personally, I do
not see how that would be possible.  If you have a concrete example,
then we can examine it.


Cheers,
simon
Maxime Devos May 27, 2022, 11:40 a.m. UTC | #7
zimoun wrote on Fri 27-05-2022 at 13:24 [+0200]:
> On Fri, 27 May 2022 at 13:17, Maxime Devos <maximedevos@telenet.be> wrote:
> 
> > scheme@(guile-user)> (call-with-input-file "." read)
> > ice-9/boot-9.scm:1669:16: In procedure raise-exception:
> > In procedure fport_read: Is a directory
> 
> Er, you are overengineering, no?  We are talking about an internal
> file used by the Guix cache.  Yes, if the user tweaks this cache, then
> bad things can happen.  That is true for almost everything that lives
> in ~/.cache/guix.

Probably yes.  Maybe it makes more sense when applied to get-string-all
+ string->number in a limited form:

   (or (string->number
         (catch 'system-error
           (lambda () (call-with-input-file [...] get-string-all))
           (lambda arglist
             (if (= ENOENT (system-error-errno arglist))
                 "0" ; file does not exist
                 (apply throw arglist)))))
       0)

Though even then there remain potential problems, try

scheme@(guile-user)> (string->number "#e1e1000")
ice-9/boot-9.scm:1669:16: In procedure raise-exception:
In procedure string->number: Value out of range: 1000

(seems unlikely to encounter such corruption in practice though).
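A self-contained, runnable variant of the proposal above, for illustration only: the name 'read-expiry-date/text' and its FILE argument are hypothetical stand-ins for the elided '[...]', the 'integer?' check mirrors the original patch, and 'false-if-exception' is an added guard for inputs such as "#e1e1000" on which 'string->number' raises.

```scheme
(use-modules (ice-9 textual-ports))   ; for 'get-string-all'

;; Hypothetical name and argument; the proposal elides the file name.
(define (read-expiry-date/text file)
  (let* ((text (catch 'system-error
                 (lambda ()
                   (call-with-input-file file get-string-all))
                 (lambda args
                   ;; A missing file is expected (e.g. first run); any
                   ;; other system error, such as EISDIR, is re-raised.
                   (if (= ENOENT (system-error-errno args))
                       "0"
                       (apply throw args)))))
         ;; 'string->number' returns #f for most junk, but it can raise
         ;; on some inputs (e.g. "#e1e1000"), and it would also raise if
         ;; 'get-string-all' returned the EOF object for an empty file;
         ;; in all of these cases N is #f.
         (n (false-if-exception (string->number text))))
    ;; Same check as the original patch: anything else becomes 0.
    (if (integer? n) n 0)))
```

Unlike the 'read'-based version, an unterminated string such as "foo simply fails to parse as a number and yields 0, while errors other than ENOENT (e.g. the directory case) are still reported.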

Greetings,
MAxime.
Maxime Devos May 27, 2022, 11:49 a.m. UTC | #8
zimoun wrote on Fri 27-05-2022 at 13:39 [+0200]:
> The only remaining case is when 'read' itself fails.  Personally, I do
> not see how that would be possible.  If you have a concrete example,
> then we can examine it.

I don't have a 100%-concrete example, but wasn't there some file system
crash mode where the contents of a new file have not yet been written
to disk, yet the length of the file is > 0, so that the file effectively
points to an arbitrary range on the disk?  E.g., say Guix told the FS to
write 1234 to last-expiry-cleanup.  Then the FS created
last-expiry-cleanup, chose a range of 4 bytes to store it in, and then
crashed before writing the contents.  After restarting, the file
contains the _old_ 4 bytes.

These old 4 bytes could be the ASCII representation of

  "foo

.  Then, when 'read' is run (after rebooting), it sees an incomplete
string "foo, so it fails.

Greetings,
Maxime.
Simon Tournier May 27, 2022, 12:40 p.m. UTC | #9
Hi Maxime,

On Fri, 27 May 2022 at 13:49, Maxime Devos <maximedevos@telenet.be> wrote:

> These old 4 bytes could be the ASCII representation of
>
>   "foo
>
> .  Then, when 'read' is run (after rebooting), it sees an incomplete
> string "foo, so it fails.

The question is how would 'read' fail or what would 'read' return?
For instance, the patch works for these cases:

 - empty file
 - non-integer

Now, if you are able to generate an incomplete file (from an integer
or whatever) against which the patch fails, then we can examine it.
However, I do not see what the difference would be between this
incomplete file and, let's say, this case:

     echo -n -e \\x12 > ~/.cache/guix/inferiors/last-expiry-cleanup

handled by the patch.


Cheers,
simon
Maxime Devos May 27, 2022, 1:04 p.m. UTC | #10
zimoun wrote on Fri 27-05-2022 at 14:40 [+0200]:
> > These old 4 bytes could be the ASCII representation of
> > 
> >    "foo
> > 
> > .  Then, when 'read' is run (after rebooting), it sees an incomplete
> > string "foo, so it fails.
> 
> The question is how would 'read' fail or what would 'read' return?
> For instance, the patch works for these cases:
> 
>  - empty file
>  - non-integer
> 
> Now, if you are able to generate an incomplete file (from an integer
> or whatever) against which the patch fails, then we can examine it.
> However, I do not see what the difference would be between this
> incomplete file and, let's say, this case:
> 
>      echo -n -e \\x12 > ~/.cache/guix/inferiors/last-expiry-cleanup
> 
> handled by the patch.

The incomplete file is:

   "foo

as mentioned previously.  Here's how it fails:

scheme@(guile-user)> (call-with-input-file "a" read)
ice-9/boot-9.scm:1669:16: In procedure raise-exception:
In procedure scm_lreadr: a:2:1: end of file in string constant

The difference is that ^R is interpreted as a symbol, whereas "foo
cannot be interpreted as anything at all by 'read'.

Greetings,
Maxime.
"foo
Simon Tournier May 27, 2022, 1:23 p.m. UTC | #11
Hi Maxime,

On Fri, 27 May 2022 at 15:04, Maxime Devos <maximedevos@telenet.be> wrote:

> scheme@(guile-user)> (call-with-input-file "a" read)
> ice-9/boot-9.scm:1669:16: In procedure raise-exception:
> In procedure scm_lreadr: a:2:1: end of file in string constant

This is an ad-hoc example and not a real test case.

--8<---------------cut here---------------start------------->8---
$ echo "foo" > ~/.cache/guix/inferiors/last-expiry-cleanup
$ cat ~/.cache/guix/inferiors/last-expiry-cleanup
foo
$ ./pre-inst-env guix time-machine --commit=9d795fb -- help
Usage: guix OPTION | COMMAND ARGS...
Run COMMAND with ARGS, if given.
--8<---------------cut here---------------end--------------->8---

As previously mentioned, the patch fixes:

 - empty file
 - non-integer

I am not able to imagine an incomplete file worse than \\x00.


> The difference is that ^R is interpreted as a symbol, whereas "foo
> cannot be interpreted as anything at all by 'read'.

I do not understand what you mean and I think you are overengineering.

If you are able to produce a corrupted file which breaks "guix
time-machine", then we can examine it.  Otherwise, let's move on. :-)


Cheers,
simon
Simon Tournier May 27, 2022, 1:30 p.m. UTC | #12
On Fri, 27 May 2022 at 15:23, zimoun <zimon.toutoune@gmail.com> wrote:
> On Fri, 27 May 2022 at 15:04, Maxime Devos <maximedevos@telenet.be> wrote:
>
> > scheme@(guile-user)> (call-with-input-file "a" read)
> > ice-9/boot-9.scm:1669:16: In procedure raise-exception:
> > In procedure scm_lreadr: a:2:1: end of file in string constant
>
> This is an ad-hoc example and not a real test case.

[...]

> I am not able to imagine an incomplete file worse than \\x00.

Just to be sure, I mean: an incomplete integer.

Sure, any incomplete (unbalanced) sexp breaks 'read', for example "foo
or (1 or whatever else, as you correctly point out.  But since the
cache 'write's an integer, an incomplete file would contain an
incomplete integer.
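A quick check of the truncation argument, as a sketch: every non-empty prefix of what 'write' emits for an integer reads back as an integer, and the empty prefix reads as the EOF object, which the patch's catch-all clause maps to 0. This only covers truncation, not stale or flipped bytes; the timestamp is the one printed in the earlier transcript.

```scheme
;; The text 'write' produces for the timestamp seen earlier in the thread.
(define written
  (call-with-output-string
    (lambda (port) (write 1653651128 port))))

;; Read back every prefix of STR, from "" to the full string, as a
;; truncated write could leave behind.
(define (prefix-results str)
  (map (lambda (len)
         (call-with-input-string (substring str 0 len) read))
       (iota (+ 1 (string-length str)))))

;; Prints #<eof> for the empty prefix, then one integer per line.
(for-each (lambda (datum) (write datum) (newline))
          (prefix-results written))
```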


Cheers,
simon
Maxime Devos May 27, 2022, 2:02 p.m. UTC | #13
> > The difference is that ^R is interpreted as a symbol, whereas "foo
> > cannot be interpreted as anything at all by 'read'.
> I do not understand what you mean

^R is interpreted as a symbol:
 (symbol? (call-with-input-string "\x12" read))
"foo" is interpreted as a string:
 (string? (call-with-input-string "\"foo\"" read))
"foo without a terminating quote cannot be interpreted at all:
 (call-with-input-string "\"foo" read)
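The three cases above can be checked together. This sketch (the helper name 'read-outcome' is hypothetical) reports what 'read' returns for each input, or 'error if it raises; it catches every exception key (#t) rather than assuming which key Guile's reader uses.

```scheme
;; Hypothetical helper: classify what 'read' does with STR.
(define (read-outcome str)
  (catch #t
    (lambda ()
      (let ((datum (call-with-input-string str read)))
        (cond ((eof-object? datum) 'eof)
              ((symbol? datum) 'symbol)
              ((string? datum) 'string)
              ((integer? datum) 'integer)
              (else 'other))))
    ;; Any exception while reading is reported as 'error.
    (lambda (key . args) 'error)))

(for-each (lambda (input)
            (write input)
            (display " => ")
            (write (read-outcome input))
            (newline))
          (list "" "\x12" "\"foo\"" "\"foo" "1653651128"))
```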

> and I think you are overengineering.

It's not any more overengineering than catching not-an-integer IMO.

AFAICT, this does not fit the definition of overengineering I found
on Wikipedia.

Also, I do not understand the resistance -- I have a simple proposal
for generalising your patch to more failure modes, with a demonstration
and test case (see the file "a") on when it is necessary and a
proposed implementation.

zimoun wrote on Fri 27-05-2022 at 15:23 [+0200]:
> If you are able to produce a corrupted file which breaks "guix
> time-machine", then we can examine it.  Otherwise, let's move on. :-)

I previously produced the corrupted file, see the file "a".

I am not willing to deliberately corrupt my file system for this,
especially when I can just give a synthetic example of corrupted
file (see the file "a") and especially since making a synthetic
example is much simpler and faster.

Greetings,
Maxime.
Simon Tournier May 27, 2022, 4:19 p.m. UTC | #14
Hi,

On Fri, 27 May 2022 at 16:02, Maxime Devos <maximedevos@telenet.be> wrote:

> Also, I do not understand the resistance -- I have a simple proposal
> for generalising your patch to more failure modes, with a demonstration
> and test case (see the file "a") on when it is necessary and a
> proposed implementation.

I have sent a v2 using your proposal (which appears to me overcomplicated).

It is not resistance but pragmatism: the only case of interest is the
empty file, which does happen -- for all the others, I am still waiting
for at least one bug report, i.e., a user runs "guix time-machine" and
suddenly the file last-expiry-cleanup is corrupted and "guix
time-machine" is unusable.  Pragmatic because, for instance, going from
2 to " or from 8 to ( is a single bit-flip, and thus 'read' would
easily be broken.  I do not understand why there is such a lengthy
discussion about these theoretical failures of last-expiry-cleanup when
the same is true each time 'read' is used, see least-authority or
ui.scm etc.  But I have never read a word about those.  Anyway.


Cheers,
simon
Maxime Devos May 27, 2022, 5:23 p.m. UTC | #15
zimoun wrote on Fri 27-05-2022 at 18:19 [+0200]:
> I do not understand why there is such a lengthy discussion about
> these theoretical failures of last-expiry-cleanup when the same is
> true each time 'read' is used, see least-authority or ui.scm etc.

(guix ui) cannot do anything about corruption except report the read
failure, whereas (guix cache) has a very strict file format so it is
feasible to detect whether it's corruption or just the user making a
typo (because those files aren't directly written by a user) and
additionally it can very easily handle the corruption.

For (guix authority), there is already a corruption detection mechanism
("guix gc --verify=contents") -- there even already is a repair
mechanism: "guix gc --verify=contents,repair".

> It is not resistance but pragmatism: the only case of interest is
> the empty file, which does happen -- for all the others, I am still
> waiting for at least one bug report, i.e., a user runs "guix
> time-machine" and suddenly the file last-expiry-cleanup is corrupted
> and "guix time-machine" is unusable.

* The general issue of file system corruption in Guix is already known
  (the Guix daemon never calls fsync or sync except on the SQLite
  database), though I don't know if a formal bug report exists about
  that.  There have been many bug reports on individual cases though.
* This bug report already exists: <http://issues.guix.gnu.org/55638>.
  (You say the file system is not corrupted, but how would you know?
  Even if not, the symptoms are almost identical.)
* I do not see the point of waiting for any known suffering users
  reporting the bug before fixing the bug.  Seems negligent to me
  if the fix is easy and known, and not very pragmatic for those
  future (or maybe current and shy) users.  Also has a risk of rebase
  conflicts, which does not seem pragmatic to me.

Greetings,
Maxime

Patch

diff --git a/guix/cache.scm b/guix/cache.scm
index 51009809bd..4a74c42afe 100644
--- a/guix/cache.scm
+++ b/guix/cache.scm
@@ -1,5 +1,6 @@ 
 ;;; GNU Guix --- Functional package management for GNU
 ;;; Copyright © 2013, 2014, 2015, 2016, 2017, 2020, 2021 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2022 Simon Tournier <zimon.toutoune@gmail.com>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -93,7 +94,9 @@  (define expiry-file
   (define last-expiry-date
     (catch 'system-error
       (lambda ()
-        (call-with-input-file expiry-file read))
+        (match (call-with-input-file expiry-file read)
+          ((? integer? date) date)
+          (_ 0)))
       (const 0)))
 
   (when (obsolete? last-expiry-date now cleanup-period)