diff mbox series

[bug#45327] git: Periodically delete least-recently-used cached checkouts.

Message ID 20201219220630.24605-1-ludo@gnu.org
State Accepted
Headers show
Series [bug#45327] git: Periodically delete least-recently-used cached checkouts. | expand

Checks

Context Check Description
cbaines/submitting builds success
cbaines/comparison success View comparision
cbaines/git branch success View Git branch
cbaines/applying patch success View Laminar job
cbaines/issue success View issue

Commit Message

Ludovic Courtès Dec. 19, 2020, 10:06 p.m. UTC
This ensures ~/.cache/guix/checkouts is periodically cleaned up.

* guix/git.scm (cached-checkout-expiration)
(%checkout-cache-cleanup-period): New variables.
(delete-checkout): New procedure.
(update-cached-checkout)[cache-entries]: New procedure.
Add call to 'maybe-remove-expired-cache-entries'.
---
 guix/git.scm | 38 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 37 insertions(+), 1 deletion(-)

Hi!

I noticed that my ~/.cache/guix/checkouts directory had accumulated
a lot of cruft from channels, playing with ‘--with-branch’ and such,
and that it would be nice to clean it up once in a while.

This is what this patch does.  It uses the (guix cache) default
strategy, which consists in deleting least-recently-used items by
looking at their atime.

Thoughts?

Ludo’.

Comments

Guillaume Le Vaillant Dec. 20, 2020, 10:46 a.m. UTC | #1
Ludovic Courtès <ludo@gnu.org> skribis:

> Hi!
>
> I noticed that my ~/.cache/guix/checkouts directory had accumulated
> a lot of cruft from channels, playing with ‘--with-branch’ and such,
> and that it would be nice to clean it up once in a while.
>
> This is what this patch does.  It uses the (guix cache) default
> strategy, which consists in deleting least-recently-used items by
> looking at their atime.
>
> Thoughts?

How does it behave when the cache is on a file system mounted with the
'noatime' option?
Ludovic Courtès Dec. 20, 2020, 1:47 p.m. UTC | #2
Hi,

Guillaume Le Vaillant <glv@posteo.net> skribis:

> Ludovic Courtès <ludo@gnu.org> skribis:
>
>> Hi!
>>
>> I noticed that my ~/.cache/guix/checkouts directory had accumulated
>> a lot of cruft from channels, playing with ‘--with-branch’ and such,
>> and that it would be nice to clean it up once in a while.
>>
>> This is what this patch does.  It uses the (guix cache) default
>> strategy, which consists in deleting least-recently-used items by
>> looking at their atime.
>>
>> Thoughts?
>
> How does it behave when the cache is on a file system mounted with the
> 'noatime' option?

I guess the worst that could happen is that checkouts are removed too
frequently (because the atime is not updated), meaning that users find
themselves making full clones more often than we’d like.

Perhaps we could use the mtime instead, since when checkouts are
updated, the mtime is presumably updated too.

Thoughts?

Ludo’.
Guillaume Le Vaillant Dec. 20, 2020, 2:16 p.m. UTC | #3
Ludovic Courtès <ludo@gnu.org> skribis:

> Hi,
>
> Guillaume Le Vaillant <glv@posteo.net> skribis:
>
>> Ludovic Courtès <ludo@gnu.org> skribis:
>>
>>> Hi!
>>>
>>> I noticed that my ~/.cache/guix/checkouts directory had accumulated
>>> a lot of cruft from channels, playing with ‘--with-branch’ and such,
>>> and that it would be nice to clean it up once in a while.
>>>
>>> This is what this patch does.  It uses the (guix cache) default
>>> strategy, which consists in deleting least-recently-used items by
>>> looking at their atime.
>>>
>>> Thoughts?
>>
>> How does it behave when the cache is on a file system mounted with the
>> 'noatime' option?
>
> I guess the worst that could happen is that checkouts are removed too
> frequently (because the atime is not updated), meaning that users find
> themselves making full clones more often than we’d like.
>
> Perhaps we could use the mtime instead, since when checkouts are
> updated, the mtime is presumably updated too.
>
> Thoughts?
>
> Ludo’.

I guess either using mtime or making Guix update the atime when using
a cached checkout would work.
Simon Tournier Dec. 21, 2020, 10:26 a.m. UTC | #4
Hi Ludo,

On Sat, 19 Dec 2020 at 23:06, Ludovic Courtès <ludo@gnu.org> wrote:
> This ensures ~/.cache/guix/checkouts is periodically cleaned up.
>
> * guix/git.scm (cached-checkout-expiration)
> (%checkout-cache-cleanup-period): New variables.
> (delete-checkout): New procedure.
> (update-cached-checkout)[cache-entries]: New procedure.
> Add call to 'maybe-remove-expired-cache-entries'.
> ---
>  guix/git.scm | 38 +++++++++++++++++++++++++++++++++++++-
>  1 file changed, 37 insertions(+), 1 deletion(-)
>
> Hi!
>
> I noticed that my ~/.cache/guix/checkouts directory had accumulated
> a lot of cruft from channels, playing with ‘--with-branch’ and such,
> and that it would be nice to clean it up once in a while.
>
> This is what this patch does.  It uses the (guix cache) default
> strategy, which consists in deleting least-recently-used items by
> looking at their atime.

This is done at pull time, right?  Personally, I would prefer at gc
time, and even maybe with an option to “guix gc”.

Because, IIUC, every 5 days, the entries older than 1 month will be
deleted.  As an extensive user of the time-machine, it means that I will
do this extra work more than often, slowing down the already slow
“time-machine”.

Cheers,
simon
Ludovic Courtès Dec. 22, 2020, 1:33 p.m. UTC | #5
Hi,

zimoun <zimon.toutoune@gmail.com> skribis:

> On Sat, 19 Dec 2020 at 23:06, Ludovic Courtès <ludo@gnu.org> wrote:

[...]

>> I noticed that my ~/.cache/guix/checkouts directory had accumulated
>> a lot of cruft from channels, playing with ‘--with-branch’ and such,
>> and that it would be nice to clean it up once in a while.
>>
>> This is what this patch does.  It uses the (guix cache) default
>> strategy, which consists in deleting least-recently-used items by
>> looking at their atime.
>
> This is done at pull time, right?

This is happens when ‘update-cached-checkout’ is called: when updating a
channel, using ‘--with-git-url’, etc.

> Personally, I would prefer at gc time, and even maybe with an option
> to “guix gc”.

Hmm yes (currently the two things are unrelated.)  I have a slight
preference for something automated that you don’t have to worry about.

> Because, IIUC, every 5 days, the entries older than 1 month will be
> deleted.

Correct.

> As an extensive user of the time-machine, it means that I will do this
> extra work more than often, slowing down the already slow
> “time-machine”.

Let’s say there’s a couple of channels you regularly pull from, like a
few times per months.  Their checkout would never be deleted so you
wouldn’t notice any change in terms of performance.

The difference you’d see is if you pull from a few channels, but less
than once per month.  In that case, the ‘guix’ channel would remain in
cache (because cache cleanup happens after the cached checkout has been
updated), but the other channels would be deleted just to be cloned
again soon after that.

Does that make sense?

Thanks,
Ludo’.
Simon Tournier Dec. 22, 2020, 3:19 p.m. UTC | #6
Hi,

On Tue, 22 Dec 2020 at 14:33, Ludovic Courtès <ludo@gnu.org> wrote:

> The difference you’d see is if you pull from a few channels, but less
> than once per month.  In that case, the ‘guix’ channel would remain in
> cache (because cache cleanup happens after the cached checkout has been
> updated), but the other channels would be deleted just to be cloned
> again soon after that.
>
> Does that make sense?

All make sense.  Thanks for explaining.

As said, my preference of such thing is “guix gc” and not “guix pull”.
But since “guix pull --delete-generations” is already here… it pulls by
deleting. :-) Anyway, it is “guix pull”.

However, I still do not like the “automatic” part.  Personally, I would
prefer something like:

  guix pull --delete-generations=2m

deleting the cache older than 2 months, in addition to the old profiles.
Because, with your patch, if I want to change the expiration or the
period, it is not convenient: via channels.scm, maybe.

Sometime, I am one or two months off (vacation).  And I am sure to
forget to tweak the channels.scm when I am back.  Well, I have 3-4
channels.scm files; for example I am not pulling guix-past each time I
pull.  Therefore, I should have these 3-4 channels.scm duplicated with
the expiration+period tweaked for when I am back from long breaks.

Well, if the automatic is the default, a way to turn off could be nice,
at the CLI level or even globally.  IMHO.


Cheers,
simon
Ludovic Courtès Jan. 7, 2021, 9:39 a.m. UTC | #7
Hi,

zimoun <zimon.toutoune@gmail.com> skribis:

> However, I still do not like the “automatic” part.  Personally, I would
> prefer something like:
>
>   guix pull --delete-generations=2m
>
> deleting the cache older than 2 months, in addition to the old profiles.
> Because, with your patch, if I want to change the expiration or the
> period, it is not convenient: via channels.scm, maybe.

There’s a difference though: generations are things you manage by
yourself as a user, whereas this cache is really just an internal cache.

Of course it becomes a user-visible feature if it gets in the way, and I
agree we should avoid that.  But it also gets in the way by not being
vacuumed, hence this patch.

> Sometime, I am one or two months off (vacation).  And I am sure to
> forget to tweak the channels.scm when I am back.  Well, I have 3-4
> channels.scm files; for example I am not pulling guix-past each time I
> pull.  Therefore, I should have these 3-4 channels.scm duplicated with
> the expiration+period tweaked for when I am back from long breaks.
>
> Well, if the automatic is the default, a way to turn off could be nice,
> at the CLI level or even globally.  IMHO.

Yeah, maybe we can have a more conservative default together with an
environment variable to tweak it.

Thanks,
Ludo’.
diff mbox series

Patch

diff --git a/guix/git.scm b/guix/git.scm
index ca77b9f54b..5df11db38e 100644
--- a/guix/git.scm
+++ b/guix/git.scm
@@ -23,8 +23,10 @@ 
   #:use-module (git submodule)
   #:use-module (guix i18n)
   #:use-module (guix base32)
+  #:use-module (guix cache)
   #:use-module (gcrypt hash)
-  #:use-module ((guix build utils) #:select (mkdir-p))
+  #:use-module ((guix build utils)
+                #:select (mkdir-p delete-file-recursively))
   #:use-module (guix store)
   #:use-module (guix utils)
   #:use-module (guix records)
@@ -35,6 +37,7 @@ 
   #:use-module (rnrs bytevectors)
   #:use-module (ice-9 format)
   #:use-module (ice-9 match)
+  #:use-module (ice-9 ftw)
   #:use-module (srfi srfi-1)
   #:use-module (srfi srfi-11)
   #:use-module (srfi srfi-34)
@@ -318,6 +321,20 @@  definitely available in REPOSITORY, false otherwise."
     (_
      #f)))
 
+(define cached-checkout-expiration
+  ;; Return the expiration time of a cached checkout.
+  (file-expiration-time (* 30 24 3600)))
+
+(define %checkout-cache-cleanup-period
+  ;; Period for the removal of expired cached checkouts.
+  (* 5 24 3600))
+
+(define (delete-checkout directory)
+  "Delete DIRECTORY recursively, in an atomic fashion."
+  (let ((trashed (string-append directory ".trashed")))
+    (rename-file directory trashed)
+    (delete-file-recursively trashed)))
+
 (define* (update-cached-checkout url
                                  #:key
                                  (ref '(branch . "master"))
@@ -341,6 +358,14 @@  When RECURSIVE? is true, check out submodules as well, if any.
 
 When CHECK-OUT? is true, reset the cached working tree to REF; otherwise leave
 it unchanged."
+  (define (cache-entries directory)
+    (filter-map (match-lambda
+                  ((or "." "..")
+                   #f)
+                  (file
+                   (string-append directory "/" file)))
+                (or (scandir directory) '())))
+
   (define canonical-ref
     ;; We used to require callers to specify "origin/" for each branch, which
     ;; made little sense since the cache should be transparent to them.  So
@@ -387,6 +412,17 @@  it unchanged."
        ;; REPOSITORY as soon as possible.
        (repository-close! repository)
 
+       ;; When CACHE-DIRECTORY is a sub-directory of the default cache
+       ;; directory, remove expired checkouts that are next to it.
+       (let ((parent (dirname cache-directory)))
+         (when (string=? parent (%repository-cache-directory))
+           (maybe-remove-expired-cache-entries parent cache-entries
+                                               #:entry-expiration
+                                               cached-checkout-expiration
+                                               #:delete-entry delete-checkout
+                                               #:cleanup-period
+                                               %checkout-cache-cleanup-period)))
+
        (values cache-directory (oid->string oid) relation)))))
 
 (define* (latest-repository-commit store url