Message ID | 20201219220630.24605-1-ludo@gnu.org |
---|---|
State | Accepted |
Headers | show |
Series | [bug#45327] git: Periodically delete least-recently-used cached checkouts. | expand |
Context | Check | Description |
---|---|---|
cbaines/submitting builds | success | |
cbaines/comparison | success | View comparision |
cbaines/git branch | success | View Git branch |
cbaines/applying patch | success | View Laminar job |
cbaines/issue | success | View issue |
Ludovic Courtès <ludo@gnu.org> skribis: > Hi! > > I noticed that my ~/.cache/guix/checkouts directory had accumulated > a lot of cruft from channels, playing with ‘--with-branch’ and such, > and that it would be nice to clean it up once in a while. > > This is what this patch does. It uses the (guix cache) default > strategy, which consists in deleting least-recently-used items by > looking at their atime. > > Thoughts? How does it behave when the cache is on a file system mounted with the 'noatime' option?
Hi, Guillaume Le Vaillant <glv@posteo.net> skribis: > Ludovic Courtès <ludo@gnu.org> skribis: > >> Hi! >> >> I noticed that my ~/.cache/guix/checkouts directory had accumulated >> a lot of cruft from channels, playing with ‘--with-branch’ and such, >> and that it would be nice to clean it up once in a while. >> >> This is what this patch does. It uses the (guix cache) default >> strategy, which consists in deleting least-recently-used items by >> looking at their atime. >> >> Thoughts? > > How does it behave when the cache is on a file system mounted with the > 'noatime' option? I guess the worst that could happen is that checkouts are removed too frequently (because the atime is not updated), meaning that users find themselves making full clones more often than we’d like. Perhaps we could use the mtime instead, since when checkouts are updated, the mtime is presumably updated too. Thoughts? Ludo’.
Ludovic Courtès <ludo@gnu.org> skribis: > Hi, > > Guillaume Le Vaillant <glv@posteo.net> skribis: > >> Ludovic Courtès <ludo@gnu.org> skribis: >> >>> Hi! >>> >>> I noticed that my ~/.cache/guix/checkouts directory had accumulated >>> a lot of cruft from channels, playing with ‘--with-branch’ and such, >>> and that it would be nice to clean it up once in a while. >>> >>> This is what this patch does. It uses the (guix cache) default >>> strategy, which consists in deleting least-recently-used items by >>> looking at their atime. >>> >>> Thoughts? >> >> How does it behave when the cache is on a file system mounted with the >> 'noatime' option? > > I guess the worst that could happen is that checkouts are removed too > frequently (because the atime is not updated), meaning that users find > themselves making full clones more often than we’d like. > > Perhaps we could use the mtime instead, since when checkouts are > updated, the mtime is presumably updated too. > > Thoughts? > > Ludo’. I guess either using mtime or making Guix update the atime when using a cached checkout would work.
Hi Ludo, On Sat, 19 Dec 2020 at 23:06, Ludovic Courtès <ludo@gnu.org> wrote: > This ensures ~/.cache/guix/checkouts is periodically cleaned up. > > * guix/git.scm (cached-checkout-expiration) > (%checkout-cache-cleanup-period): New variables. > (delete-checkout): New procedure. > (update-cached-checkout)[cache-entries]: New procedure. > Add call to 'maybe-remove-expired-cache-entries'. > --- > guix/git.scm | 38 +++++++++++++++++++++++++++++++++++++- > 1 file changed, 37 insertions(+), 1 deletion(-) > > Hi! > > I noticed that my ~/.cache/guix/checkouts directory had accumulated > a lot of cruft from channels, playing with ‘--with-branch’ and such, > and that it would be nice to clean it up once in a while. > > This is what this patch does. It uses the (guix cache) default > strategy, which consists in deleting least-recently-used items by > looking at their atime. This is done at pull time, right? Personally, I would prefer at gc time, and even maybe with an option to “guix gc”. Because, IIUC, every 5 days, the entries older than 1 month will be deleted. As an extensive user of the time-machine, it means that I will do this extra work more than often, slowing down the already slow “time-machine”. Cheers, simon
Hi, zimoun <zimon.toutoune@gmail.com> skribis: > On Sat, 19 Dec 2020 at 23:06, Ludovic Courtès <ludo@gnu.org> wrote: [...] >> I noticed that my ~/.cache/guix/checkouts directory had accumulated >> a lot of cruft from channels, playing with ‘--with-branch’ and such, >> and that it would be nice to clean it up once in a while. >> >> This is what this patch does. It uses the (guix cache) default >> strategy, which consists in deleting least-recently-used items by >> looking at their atime. > > This is done at pull time, right? This is happens when ‘update-cached-checkout’ is called: when updating a channel, using ‘--with-git-url’, etc. > Personally, I would prefer at gc time, and even maybe with an option > to “guix gc”. Hmm yes (currently the two things are unrelated.) I have a slight preference for something automated that you don’t have to worry about. > Because, IIUC, every 5 days, the entries older than 1 month will be > deleted. Correct. > As an extensive user of the time-machine, it means that I will do this > extra work more than often, slowing down the already slow > “time-machine”. Let’s say there’s a couple of channels you regularly pull from, like a few times per months. Their checkout would never be deleted so you wouldn’t notice any change in terms of performance. The difference you’d see is if you pull from a few channels, but less than once per month. In that case, the ‘guix’ channel would remain in cache (because cache cleanup happens after the cached checkout has been updated), but the other channels would be deleted just to be cloned again soon after that. Does that make sense? Thanks, Ludo’.
Hi, On Tue, 22 Dec 2020 at 14:33, Ludovic Courtès <ludo@gnu.org> wrote: > The difference you’d see is if you pull from a few channels, but less > than once per month. In that case, the ‘guix’ channel would remain in > cache (because cache cleanup happens after the cached checkout has been > updated), but the other channels would be deleted just to be cloned > again soon after that. > > Does that make sense? All make sense. Thanks for explaining. As said, my preference of such thing is “guix gc” and not “guix pull”. But since “guix pull --delete-generations” is already here… it pulls by deleting. :-) Anyway, it is “guix pull”. However, I still do not like the “automatic” part. Personally, I would prefer something like: guix pull --delete-generations=2m deleting the cache older than 2 months, in addition to the old profiles. Because, with your patch, if I want to change the expiration or the period, it is not convenient: via channels.scm, maybe. Sometime, I am one or two months off (vacation). And I am sure to forget to tweak the channels.scm when I am back. Well, I have 3-4 channels.scm files; for example I am not pulling guix-past each time I pull. Therefore, I should have these 3-4 channels.scm duplicated with the expiration+period tweaked for when I am back from long breaks. Well, if the automatic is the default, a way to turn off could be nice, at the CLI level or even globally. IMHO. Cheers, simon
Hi, zimoun <zimon.toutoune@gmail.com> skribis: > However, I still do not like the “automatic” part. Personally, I would > prefer something like: > > guix pull --delete-generations=2m > > deleting the cache older than 2 months, in addition to the old profiles. > Because, with your patch, if I want to change the expiration or the > period, it is not convenient: via channels.scm, maybe. There’s a difference though: generations are things you manage by yourself as a user, whereas this cache is really just an internal cache. Of course it becomes a user-visible feature if it gets in the way, and I agree we should avoid that. But it also gets in the way by not being vacuumed, hence this patch. > Sometime, I am one or two months off (vacation). And I am sure to > forget to tweak the channels.scm when I am back. Well, I have 3-4 > channels.scm files; for example I am not pulling guix-past each time I > pull. Therefore, I should have these 3-4 channels.scm duplicated with > the expiration+period tweaked for when I am back from long breaks. > > Well, if the automatic is the default, a way to turn off could be nice, > at the CLI level or even globally. IMHO. Yeah, maybe we can have a more conservative default together with an environment variable to tweak it. Thanks, Ludo’.
diff --git a/guix/git.scm b/guix/git.scm index ca77b9f54b..5df11db38e 100644 --- a/guix/git.scm +++ b/guix/git.scm @@ -23,8 +23,10 @@ #:use-module (git submodule) #:use-module (guix i18n) #:use-module (guix base32) + #:use-module (guix cache) #:use-module (gcrypt hash) - #:use-module ((guix build utils) #:select (mkdir-p)) + #:use-module ((guix build utils) + #:select (mkdir-p delete-file-recursively)) #:use-module (guix store) #:use-module (guix utils) #:use-module (guix records) @@ -35,6 +37,7 @@ #:use-module (rnrs bytevectors) #:use-module (ice-9 format) #:use-module (ice-9 match) + #:use-module (ice-9 ftw) #:use-module (srfi srfi-1) #:use-module (srfi srfi-11) #:use-module (srfi srfi-34) @@ -318,6 +321,20 @@ definitely available in REPOSITORY, false otherwise." (_ #f))) +(define cached-checkout-expiration + ;; Return the expiration time of a cached checkout. + (file-expiration-time (* 30 24 3600))) + +(define %checkout-cache-cleanup-period + ;; Period for the removal of expired cached checkouts. + (* 5 24 3600)) + +(define (delete-checkout directory) + "Delete DIRECTORY recursively, in an atomic fashion." + (let ((trashed (string-append directory ".trashed"))) + (rename-file directory trashed) + (delete-file-recursively trashed))) + (define* (update-cached-checkout url #:key (ref '(branch . "master")) @@ -341,6 +358,14 @@ When RECURSIVE? is true, check out submodules as well, if any. When CHECK-OUT? is true, reset the cached working tree to REF; otherwise leave it unchanged." + (define (cache-entries directory) + (filter-map (match-lambda + ((or "." "..") + #f) + (file + (string-append directory "/" file))) + (or (scandir directory) '()))) + (define canonical-ref ;; We used to require callers to specify "origin/" for each branch, which ;; made little sense since the cache should be transparent to them. So @@ -387,6 +412,17 @@ it unchanged." ;; REPOSITORY as soon as possible. (repository-close! repository) + ;; When CACHE-DIRECTORY is a sub-directory of the default cache + ;; directory, remove expired checkouts that are next to it. + (let ((parent (dirname cache-directory))) + (when (string=? parent (%repository-cache-directory)) + (maybe-remove-expired-cache-entries parent cache-entries + #:entry-expiration + cached-checkout-expiration + #:delete-entry delete-checkout + #:cleanup-period + %checkout-cache-cleanup-period))) + (values cache-directory (oid->string oid) relation))))) (define* (latest-repository-commit store url