Message ID | 87eeq3vbat.fsf@gnu.org |
---|---|
State | Accepted |
Headers | show |
Series | [bug#42023] database: register-items: reduce transaction scope. | expand |
Context | Check | Description |
---|---|---|
cbaines/comparison | success | View comparision |
cbaines/git branch | success | View Git branch |
cbaines/applying patch | success | View Laminar job |
Hi reepca, Did you have time to look into this (see below)? I still see a lot of contention on /var/guix/db/db.sqlite-shm on berlin and I feel like these changes would be welcome. :-) Thanks, Ludo’. Ludovic Courtès <ludo@gnu.org> skribis: > Hi, > > Caleb Ristvedt <caleb.ristvedt@cune.org> skribis: > >> Ludovic Courtès <ludo@gnu.org> writes: > > [...] > >>> Two questions: >>> >>> 1. Can another process come and fiddle with TO-REGISTER while we’re >>> still in ‘reset-timestamps’? Or can GC happen while we’re in >>> ‘reset-timestamps’ and delete TO-REGISTER and remove it from the >>> database? >> >>> >>> I think none of these scenarios can happen, as long as we’ve taken the >>> .lock file for TO-REGISTER before, like ‘finalize-store-file’ does. >> >> The lock file has no bearing on liveness of the path it locks, though >> the liveness of the path it locks *does* count as liveness for the lock >> file and for the .chroot file; see tryToDelete() in nix/libstore/gc.cc. >> >> (Well, semi-liveness; .lock and .chroot files won't show up in a list of >> live paths, but they will still be protected from deletion if their >> associated store item is a temp root) >> >> So general "fiddling" can't happen, but GC can. The responsibility for >> both locking and registering as temporary roots falls on the caller, >> currently, as I believe it should. We may want to document this >> responsibility in the docstring for register-items, though. > > Yes, it would be good to add a sentence to document it. > >> So currently finalize-store-file is safe from clobbering by another >> process attempting to finalize to the same path, but not safe from >> garbage collection between when the temporary store file is renamed and >> when it is registered. It needs an add-temp-root prior to renaming. > > Ah, so we’d need to do that before applying the patch that reduces the > scope of the transaction, right? > >>> 2. After the transaction, TO-REGISTER is considered valid. But are >>> the effects of the on-going deduplication observable, due to >>> non-atomicity of some operation? >> >> Subdirectories of store items need to be made writable prior to renaming >> the temp link, so there will necessarily be a window of time during >> which various subdirectories will appear writable. I don't think this >> should cause problems; we already assume that the .lock file is held, so >> we should be the only process attempting to deduplicate it. On a related >> note, we should probably use dynamic-wind for setting and restoring the >> permissions in replace-with-link. > > Yes. Here’s a patch for ‘dynamic-wind’: > > diff --git a/.dir-locals.el b/.dir-locals.el > index 92979fc5ed..155255a770 100644 > --- a/.dir-locals.el > +++ b/.dir-locals.el > @@ -37,6 +37,7 @@ > (eval . (put 'with-file-lock 'scheme-indent-function 1)) > (eval . (put 'with-file-lock/no-wait 'scheme-indent-function 1)) > (eval . (put 'with-profile-lock 'scheme-indent-function 1)) > + (eval . (put 'with-writable-file 'scheme-indent-function 1)) > > (eval . (put 'package 'scheme-indent-function 0)) > (eval . (put 'origin 'scheme-indent-function 0)) > diff --git a/guix/store/deduplication.scm b/guix/store/deduplication.scm > index 6784ee0b92..af52c03370 100644 > --- a/guix/store/deduplication.scm > +++ b/guix/store/deduplication.scm > @@ -94,6 +94,20 @@ LINK-PREFIX." > (try (tempname-in link-prefix)) > (apply throw args)))))) > > +(define (call-with-writable-file file thunk) > + (let ((stat (lstat file))) > + (dynamic-wind > + (lambda () > + (make-file-writable file)) > + thunk > + (lambda () > + (set-file-time file stat) > + (chmod file (stat:mode stat)))))) > + > +(define-syntax-rule (with-writable-file file exp ...) > + "Make FILE writable for the dynamic extent of EXP..." > + (call-with-writable-file file (lambda () exp ...))) > + > ;; There are 3 main kinds of errors we can get from hardlinking: "Too many > ;; things link to this" (EMLINK), "this link already exists" (EEXIST), and > ;; "can't fit more stuff in this directory" (ENOSPC). > @@ -120,20 +134,14 @@ Note: TARGET, TO-REPLACE, and SWAP-DIRECTORY must be on the same file system." > ;; If we couldn't create TEMP-LINK, that's OK: just don't do the > ;; replacement, which means TO-REPLACE won't be deduplicated. > (when temp-link > - (let* ((parent (dirname to-replace)) > - (stat (stat parent))) > - (make-file-writable parent) > + (with-writable-file (dirname to-replace) > (catch 'system-error > (lambda () > (rename-file temp-link to-replace)) > (lambda args > (delete-file temp-link) > (unless (= EMLINK (system-error-errno args)) > - (apply throw args)))) > - > - ;; Restore PARENT's mtime and permissions. > - (set-file-time parent stat) > - (chmod parent (stat:mode stat))))) > + (apply throw args))))))) > > (define* (deduplicate path hash #:key (store %store-directory)) > "Check if a store item with sha256 hash HASH already exists. If so, > > >> Also, replace-with-link doesn't check whether the "containing directory" >> is the store like optimisePath_() does, so in theory if another process >> tried to make a permanent change to the store's permissions it could be >> clobbered when replace-with-link restores the permissions. I don't know >> of any instance where this could happen currently, but it's something to >> keep in mind. > > I guess we should also avoid changing permissions on /gnu/store, it > would be wiser. > >> And, of course, there's the inherent visible change of deduplication - >> possible modification of inode number, but I don't see how that could >> cause problems with any reasonable programs. > > Yes, that’s fine. > >>> I think the ‘replace-with-link’ dance is atomic, so we should be fine. >>> >>> Thoughts? >> >> replace-with-link is atomic, but it can fail if the "canonical" link in >> .links is deleted by the garbage collector between when its existence is >> checked in 'deduplicate' and when temp link creation in >> replace-with-link happens. Then ENOENT would be returned, and >> 'deduplicate' wouldn't catch that exception, nor would optimisePath_() >> in nix/libstore/optimise-store.cc. >> >> The proper behavior there, in my opinion, would be to retry the >> deduplication. Attached is a patch that makes 'deduplicate' do that. >> >> Also, while I'm perusing optimisePath_(), there's a minor bug in which >> EMLINK generated by renaming the temp link may still be treated as a >> fatal error. This is because errno may be clobbered by unlink() prior to >> being checked (according to the errno man page it can still be modified >> even if the call succeeds). > > Indeed, good catch! > >> From 461064da9e169df3dd939b734bb2860598d113f4 Mon Sep 17 00:00:00 2001 >> From: Caleb Ristvedt <caleb.ristvedt@cune.org> >> Date: Wed, 24 Jun 2020 00:56:50 -0500 >> Subject: [PATCH 1/2] deduplication: retry on ENOENT. >> >> It's possible for the garbage collector to remove the "canonical" link after >> it's been detected as existing by 'deduplicate'. This would cause an ENOENT >> error when replace-with-link attempts to create the temporary link. This >> changes it so that it will properly handle that by retrying. >> >> * guix/store/deduplication.scm (deduplicate): retry on ENOENT. >> --- >> guix/store/deduplication.scm | 64 +++++++++++++++++++++++------------- >> 1 file changed, 41 insertions(+), 23 deletions(-) >> >> diff --git a/guix/store/deduplication.scm b/guix/store/deduplication.scm >> index 6784ee0b92..b6d94e49c2 100644 >> --- a/guix/store/deduplication.scm >> +++ b/guix/store/deduplication.scm >> @@ -161,26 +161,44 @@ under STORE." >> (scandir* path)) >> (let ((link-file (string-append links-directory "/" >> (bytevector->nix-base32-string hash)))) >> - (if (file-exists? link-file) >> - (replace-with-link link-file path >> - #:swap-directory links-directory) >> - (catch 'system-error >> - (lambda () >> - (link path link-file)) >> - (lambda args >> - (let ((errno (system-error-errno args))) >> - (cond ((= errno EEXIST) >> - ;; Someone else put an entry for PATH in >> - ;; LINKS-DIRECTORY before we could. Let's use it. >> - (replace-with-link path link-file >> - #:swap-directory links-directory)) >> - ((= errno ENOSPC) >> - ;; There's not enough room in the directory index for >> - ;; more entries in .links, but that's fine: we can >> - ;; just stop. >> - #f) >> - ((= errno EMLINK) >> - ;; PATH has reached the maximum number of links, but >> - ;; that's OK: we just can't deduplicate it more. >> - #f) >> - (else (apply throw args))))))))))) >> + (let retry () >> + (if (file-exists? link-file) >> + (catch 'system-error >> + (lambda () >> + (replace-with-link link-file path >> + #:swap-directory links-directory)) >> + (lambda args >> + (if (and (= (system-error-errno args) ENOENT) >> + ;; ENOENT can be produced because: >> + ;; - LINK-FILE has missing directory components >> + ;; - LINKS-DIRECTORY has missing directory >> + ;; components >> + ;; - PATH has missing directory components >> + ;; >> + ;; the last two are errors, the first just >> + ;; requires a retry. >> + (file-exists? (dirname path)) >> + (file-exists? links-directory)) >> + (retry) >> + (apply throw args)))) > > I feel like there are TOCTTOU issues here and redundant ‘stat’ calls, > plus the risk of catching a ‘system-error’ coming from “somewhere else.” > > What about baking this logic directly in ‘replace-with-link’, and > replacing ‘file-exists?’ calls by 'system-error handling? > >> From a14ff0a9dab4d73680befaf9d244d6cce0a73ab3 Mon Sep 17 00:00:00 2001 >> From: Caleb Ristvedt <caleb.ristvedt@cune.org> >> Date: Wed, 24 Jun 2020 01:00:40 -0500 >> Subject: [PATCH 2/2] .dir-locals.el: fix call-with-{retrying-}transaction >> indenting. >> >> * .dir-locals.el (call-with-transaction, call-with-retrying-transaction): >> change scheme-indent-function property from 2 to 1. > > LGTM! > > Thanks for your quick feedback and thorough analyses! > > Ludo’.
diff --git a/.dir-locals.el b/.dir-locals.el index 92979fc5ed..155255a770 100644 --- a/.dir-locals.el +++ b/.dir-locals.el @@ -37,6 +37,7 @@ (eval . (put 'with-file-lock 'scheme-indent-function 1)) (eval . (put 'with-file-lock/no-wait 'scheme-indent-function 1)) (eval . (put 'with-profile-lock 'scheme-indent-function 1)) + (eval . (put 'with-writable-file 'scheme-indent-function 1)) (eval . (put 'package 'scheme-indent-function 0)) (eval . (put 'origin 'scheme-indent-function 0)) diff --git a/guix/store/deduplication.scm b/guix/store/deduplication.scm index 6784ee0b92..af52c03370 100644 --- a/guix/store/deduplication.scm +++ b/guix/store/deduplication.scm @@ -94,6 +94,20 @@ LINK-PREFIX." (try (tempname-in link-prefix)) (apply throw args)))))) +(define (call-with-writable-file file thunk) + (let ((stat (lstat file))) + (dynamic-wind + (lambda () + (make-file-writable file)) + thunk + (lambda () + (set-file-time file stat) + (chmod file (stat:mode stat)))))) + +(define-syntax-rule (with-writable-file file exp ...) + "Make FILE writable for the dynamic extent of EXP..." + (call-with-writable-file file (lambda () exp ...))) + ;; There are 3 main kinds of errors we can get from hardlinking: "Too many ;; things link to this" (EMLINK), "this link already exists" (EEXIST), and ;; "can't fit more stuff in this directory" (ENOSPC). @@ -120,20 +134,14 @@ Note: TARGET, TO-REPLACE, and SWAP-DIRECTORY must be on the same file system." ;; If we couldn't create TEMP-LINK, that's OK: just don't do the ;; replacement, which means TO-REPLACE won't be deduplicated. (when temp-link - (let* ((parent (dirname to-replace)) - (stat (stat parent))) - (make-file-writable parent) + (with-writable-file (dirname to-replace) (catch 'system-error (lambda () (rename-file temp-link to-replace)) (lambda args (delete-file temp-link) (unless (= EMLINK (system-error-errno args)) - (apply throw args)))) - - ;; Restore PARENT's mtime and permissions. - (set-file-time parent stat) - (chmod parent (stat:mode stat))))) + (apply throw args))))))) (define* (deduplicate path hash #:key (store %store-directory)) "Check if a store item with sha256 hash HASH already exists. If so,