Message ID | 87k041dui7.fsf@gnu.org |
---|---|
State | New |
Headers | show |
Series | [bug#59078] lint: Split the derivation lint checker by system. | expand |
Context | Check | Description |
---|---|---|
cbaines/comparison | success | View comparision |
cbaines/git-branch | success | View Git branch |
cbaines/applying patch | success | |
cbaines/issue | success | View issue |
Ludovic Courtès <ludo@gnu.org> writes: > Christopher Baines <mail@cbaines.net> skribis: > >> Currently, if you attempt to run the derivation checker on all packages, the >> Guile process will run out of memory. I think a contributing factor to this is >> that the checker forces an inefficient order when you want to generate >> derivations for all the supported systems of each package, constantly >> switching system then package. >> >> This problem also impacts the Guix Data Service, since it tries to run the >> derivation checker for all packages. >> >> The changes in this commit to split the derivation lint checker in to several, >> one for each system, means that you can now treat each system separately, >> which should be better for caching purposes. >> >> If it's desirable to keep some notion of checking all supported systems for a >> single package, I think lint checker groups could be added, so that you could >> ask for the "derivation" checker, and this would run all the derivation >> checkers. > > The ‘derivation’ checker was added for this purpose: making sure that a > package’s derivation can be computed for all the supported systems. > Previously it was easy to overlook that kind of breakage. > > I think it’s important to keep a ‘derivation’ checker that does this. What aspect of it do you think is important? I realise just splitting the checker in to several doesn't quite pick up on all the same problems, but an additional checker could be added that flags when packages support systems that don't appear as platforms. I think that would catch everything that the derivation checker currently does. If it's a "we need a singular checker" kind of problem, I was thinking of having some kind of lint checker groups. So you could have a group called "derivation", which runs all of the relevant checkers. > Now, the memory consumption you report is unacceptable and this needs to > be addressed. > > Most (all?) caches are now per-session (associated with > <store-connection>). Since ‘guix lint’ uses a single session, those > caches keep growing because there’s no eviction mechanism in place. > > A hack like the one below should work around that. Could you check how > well it works for you? I tried in the Guix Data Service processing packages in chunks of 1000 plus closing the store connection after each batch, and that led to a heap size of 3090MiB. But this is still higher than 1778MiB heap usage I got just by splitting the derivation linter. Fundamentally, I think it's still going to be more memory/time efficient to move towards processing the derivations by system, rather than by package. Thanks, Chris
Hi! Christopher Baines <mail@cbaines.net> skribis: > Ludovic Courtès <ludo@gnu.org> writes: [...] >> The ‘derivation’ checker was added for this purpose: making sure that a >> package’s derivation can be computed for all the supported systems. >> Previously it was easy to overlook that kind of breakage. >> >> I think it’s important to keep a ‘derivation’ checker that does this. > > What aspect of it do you think is important? I meant that it’s important to have a single ‘derivation’ checker that checks derivations for all the supported systems. Packagers should be able to run ‘guix lint -c derivation PKG’ and be confident that it’s fine for all systems. >> Now, the memory consumption you report is unacceptable and this needs to >> be addressed. >> >> Most (all?) caches are now per-session (associated with >> <store-connection>). Since ‘guix lint’ uses a single session, those >> caches keep growing because there’s no eviction mechanism in place. >> >> A hack like the one below should work around that. Could you check how >> well it works for you? > > I tried in the Guix Data Service processing packages in chunks of 1000 > plus closing the store connection after each batch, How was it implemented? Was it after the caches came into <store-connection>? > and that led to a heap size of 3090MiB. But this is still higher than > 1778MiB heap usage I got just by splitting the derivation linter. I didn’t take the time to do it, but it would be nice to see, with the patch I gave, how ‘guix lint -c derivations’ behaves. Thanks, Ludo’.
Ludovic Courtès <ludo@gnu.org> writes: > Hi! > > Christopher Baines <mail@cbaines.net> skribis: > >> Ludovic Courtès <ludo@gnu.org> writes: > > [...] > >>> The ‘derivation’ checker was added for this purpose: making sure that a >>> package’s derivation can be computed for all the supported systems. >>> Previously it was easy to overlook that kind of breakage. >>> >>> I think it’s important to keep a ‘derivation’ checker that does this. >> >> What aspect of it do you think is important? > > I meant that it’s important to have a single ‘derivation’ checker that > checks derivations for all the supported systems. Packagers should be > able to run ‘guix lint -c derivation PKG’ and be confident that it’s > fine for all systems. I think we can still keep that by adding support for grouping lint checkers. So have a derivation group, instead of a single checker. Maybe that'll change the command slightly to 'guix lint -g derivation PKG', but I think that can be equivalent. >>> Now, the memory consumption you report is unacceptable and this needs to >>> be addressed. >>> >>> Most (all?) caches are now per-session (associated with >>> <store-connection>). Since ‘guix lint’ uses a single session, those >>> caches keep growing because there’s no eviction mechanism in place. >>> >>> A hack like the one below should work around that. Could you check how >>> well it works for you? >> >> I tried in the Guix Data Service processing packages in chunks of 1000 >> plus closing the store connection after each batch, > > How was it implemented? Was it after the caches came into > <store-connection>? I'm not sure what aspect of the implementation is important, but I think it's working correctly. Closing the store connection wasn't very easy. Previously there was a fresh store connection with each call to inferior-eval-with-store, but for this test I close the connections in %store-table and then clear the hash table between the inferior-eval-with-store calls. >> and that led to a heap size of 3090MiB. But this is still higher than >> 1778MiB heap usage I got just by splitting the derivation linter. > > I didn’t take the time to do it, but it would be nice to see, with the > patch I gave, how ‘guix lint -c derivations’ behaves. I've put some numbers below, with no changes the last batch to finish processing leaves the heap at 7755MiB [1], then Guile crashes after that. With the patch you sent, the heap size seems to stabilise at 4042MiB [2]. It also crashes at the end due to the match block not matching '(), but that's not important. I also hacked the lint script to run the checkers in the same manor as the Guix Data Service, so one checker at a time rather than one package at a time. That led to a max heap size of 3505MiB [3]. By adding in batching (as the Guix Data Service already does), I think it's possible to further reduce this to the 1778MiB number I give above. Reducing the memory usage helps reduce the cost/improve the throughput of loading data in to the Guix Data Service which is my primary motivation here. I'm also not only concerned with reducing the peak memory usage, but trying to have an implementation that'll gracefully handle more systems being supported in the future. It's for that second point that I think arranging the derivation linting so that it's possible to process each system in turn is important for the Guix Data Service, so that when new platforms are added, the memory usage won't grow as much. 1: no batching, one derivation checker 1065.0 MiB 1409.0 MiB 2089.0 MiB 2297.0 MiB 2513.0 MiB 2705.0 MiB 3077.0 MiB 3373.0 MiB 3557.0 MiB 3661.0 MiB 3901.0 MiB 3997.0 MiB 4147.0 MiB 4491.0 MiB 4635.0 MiB 5899.0 MiB 7755.0 MiB 2: batches of 1000 with fresh store connection, one derivation checker 1057.0 MiB 1137.0 MiB 1481.0 MiB 1481.0 MiB 1481.0 MiB 1481.0 MiB 1697.0 MiB 1697.0 MiB 1697.0 MiB 1761.0 MiB 1761.0 MiB 1761.0 MiB 1937.0 MiB 1977.0 MiB 1985.0 MiB 3065.0 MiB 3633.0 MiB 4041.0 MiB 4042.0 MiB 4042.0 MiB 4042.0 MiB 3: multiple derivation checkers, batched by system 1297.0 MiB 1545.0 MiB 1873.0 MiB 2113.0 MiB 2353.0 MiB 2609.0 MiB 2961.0 MiB 3193.0 MiB 3505.0 MiB
Hi, On Mon, 14 Nov 2022 at 13:51, Ludovic Courtès <ludo@gnu.org> wrote: > I meant that it’s important to have a single ‘derivation’ checker that > checks derivations for all the supported systems. Packagers should be > able to run ‘guix lint -c derivation PKG’ and be confident that it’s > fine for all systems. The CLI invokation is unrelated to the invoked checkers, no? As Chris is proposing, it seems being worth to group some checkers. For instance, we already have the option ’-n, --no-network’ which does that but probably at the wrong level. Maybe we could have another command line option, guix lint --group=no-network guix lint --group=derivation guix lint --group=no-network,derivation I do not know… Cheers, simon
Hi, Christopher Baines <mail@cbaines.net> skribis: > I've put some numbers below, with no changes the last batch to finish > processing leaves the heap at 7755MiB [1], then Guile crashes after > that. > > With the patch you sent, the heap size seems to stabilise at 4042MiB > [2]. It also crashes at the end due to the match block not matching '(), > but that's not important. > > I also hacked the lint script to run the checkers in the same manor as > the Guix Data Service, so one checker at a time rather than one package > at a time. That led to a max heap size of 3505MiB [3]. Thanks for testing and reporting back! > Reducing the memory usage helps reduce the cost/improve the throughput > of loading data in to the Guix Data Service which is my primary > motivation here. I'm also not only concerned with reducing the peak > memory usage, but trying to have an implementation that'll gracefully > handle more systems being supported in the future. Understood. From the viewpoint of core Guix, I’d like to understand where all this memory is going; this is unreasonable. The heap profiler I posted recently¹, coupled with ‘gcprof’, might help us understand what’s going on. ¹ https://lists.gnu.org/archive/html/guile-user/2022-11/msg00012.html > 1: no batching, one derivation checker [...] > 7755.0 MiB > > 2: batches of 1000 with fresh store connection, one derivation checker > 1057.0 MiB > 1137.0 MiB > 1481.0 MiB > 1481.0 MiB > 1481.0 MiB > 1481.0 MiB > 1697.0 MiB > 1697.0 MiB > 1697.0 MiB > 1761.0 MiB > 1761.0 MiB > 1761.0 MiB > 1937.0 MiB > 1977.0 MiB > 1985.0 MiB > 3065.0 MiB > 3633.0 MiB > 4041.0 MiB > 4042.0 MiB > 4042.0 MiB > 4042.0 MiB This seems to indicate a memory leak outside <store-connection>, such as a static cache. “GUIX_PROFILING=memoization” may give a hint. I’d really like to get it solved before we come up with workarounds like those we’re talking about. (However I’ll postpone this one because I’d like to focus on the release for now.) Thanks, Ludo’.
Hi, On mar., 15 nov. 2022 at 10:03, zimoun <zimon.toutoune@gmail.com> wrote: > On Mon, 14 Nov 2022 at 13:51, Ludovic Courtès <ludo@gnu.org> wrote: > >> I meant that it’s important to have a single ‘derivation’ checker that >> checks derivations for all the supported systems. Packagers should be >> able to run ‘guix lint -c derivation PKG’ and be confident that it’s >> fine for all systems. > > The CLI invokation is unrelated to the invoked checkers, no? As Chris > is proposing, it seems being worth to group some checkers. For > instance, we already have the option ’-n, --no-network’ which does that > but probably at the wrong level. > > Maybe we could have another command line option, > > guix lint --group=no-network > guix lint --group=derivation > guix lint --group=no-network,derivation What about this? Cheers, simon
Hi, Simon Tournier <zimon.toutoune@gmail.com> skribis: > On mar., 15 nov. 2022 at 10:03, zimoun <zimon.toutoune@gmail.com> wrote: >> On Mon, 14 Nov 2022 at 13:51, Ludovic Courtès <ludo@gnu.org> wrote: >> >>> I meant that it’s important to have a single ‘derivation’ checker that >>> checks derivations for all the supported systems. Packagers should be >>> able to run ‘guix lint -c derivation PKG’ and be confident that it’s >>> fine for all systems. >> >> The CLI invokation is unrelated to the invoked checkers, no? As Chris >> is proposing, it seems being worth to group some checkers. For >> instance, we already have the option ’-n, --no-network’ which does that >> but probably at the wrong level. >> >> Maybe we could have another command line option, >> >> guix lint --group=no-network >> guix lint --group=derivation >> guix lint --group=no-network,derivation > > What about this? In general, being able to tell which category a checker belongs to, and then being able to select checkers by categories sounds like a useful improvement to me. I don’t think it solves the problem Christopher initially reported though (about memory consumption of the ‘derivation’ checker.) Ludo’.
Hi Ludo, On Tue, 31 Jan 2023 at 17:33, Ludovic Courtès <ludo@gnu.org> wrote: > In general, being able to tell which category a checker belongs to, and > then being able to select checkers by categories sounds like a useful > improvement to me. > > I don’t think it solves the problem Christopher initially reported > though (about memory consumption of the ‘derivation’ checker.) Indeed. My understanding of the proposal is to split some checkers (as derivation) and then group them to have the expected behaviour. You wrote [1]: I meant that it’s important to have a single ‘derivation’ checker that checks derivations for all the supported systems. Packagers should be able to run ‘guix lint -c derivation PKG’ and be confident that it’s fine for all systems. and Chris answered [2]: I think we can still keep that by adding support for grouping lint checkers. So have a derivation group, instead of a single checker. Maybe that'll change the command slightly to 'guix lint -g derivation PKG', but I think that can be equivalent. where I try to feed Chris’s proposal. :-) I do not know if it is the correct level but being able to run “smaller“ checkers appears to me worth to try. 1: http://issues.guix.gnu.org/msgid/87h6z1puli.fsf@gnu.org 2: http://issues.guix.gnu.org/msgid/87leocfsnm.fsf@cbaines.net Cheers, simon
diff --git a/guix/scripts/lint.scm b/guix/scripts/lint.scm index 9920c3ee62..6d9b9e96a9 100644 --- a/guix/scripts/lint.scm +++ b/guix/scripts/lint.scm @@ -209,23 +209,31 @@ (define (parse-options) (list-checkers-and-exit checkers)) (with-error-handling - (let ((any-lint-checker-requires-store? - (any lint-checker-requires-store? checkers))) + (let ((store-needed? (any lint-checker-requires-store? checkers))) - (define (call-maybe-with-store proc) - (if any-lint-checker-requires-store? - (with-store store - (proc store)) - (proc #f))) - - (call-maybe-with-store - (lambda (store) - (cond - ((null? args) - (fold-packages (lambda (p r) (run-checkers p checkers - #:store store)) '())) - (else - (for-each (lambda (package) - (run-checkers package checkers - #:store store)) - args))))))))) + (cond + ((null? args) + (let loop ((packages (fold-packages cons '())) + (processed 0) + (store #f)) + (match packages + ((package . rest) + (let ((store (and store-needed? + (if store + ;; XXX: Periodically start a new session + ;; with empty caches to avoid unbounded + ;; heap growth caused by the 'derivation' + ;; checker. + (if (zero? (modulo processed 1000)) + (begin + (close-connection store) + (open-connection)) + store) + (open-connection))))) + (run-checkers package checkers #:store store) + (loop rest (+ 1 processed) store)))))) + (else + (for-each (lambda (package) + (run-checkers package checkers + #:store store)) + args)))))))