[bug#74394,0/2] Skip slow tests by default and run 'check' in Git pre-push hook.

Message ID cover.1731844485.git.maxim.cournoyer@gmail.com

Message

Maxim Cournoyer Nov. 17, 2024, 12:03 p.m. UTC
Hello,

This is a simple change that should ensure test suite breakages are detected
as early as possible and prevent test-breaking changes from being pushed.
This is made possible by skipping a few expensive test suites, bringing the
total test time down to about 1 minute on a fast machine.

We could call it a "distributed CI" approach ;-).

Note: I initially pursued an Automake or Make-based approach, but it turned
out to be far from trivial, hitting old issues such as [0] along the way.
This solution simply puts the skip logic in the tests that must be skipped
(a one-liner).

To run the complete test suite including the slow tests (as was the case
prior to this change):

make check WITH_SLOW_TESTS=1
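
As a rough illustration of the one-line skip logic mentioned above, a slow
test could start with a guard like the following.  This is only a sketch, not
the code from the patch: the helper name skip_unless_slow_tests is
hypothetical, and it assumes WITH_SLOW_TESTS reaches the environment of the
test script.  Exit status 77 is what Automake's test harness reports as SKIP.

```shell
#!/bin/sh
# Sketch only; not the code from the patch.
# Assumption: WITH_SLOW_TESTS is visible in the test's environment.

# Hypothetical helper: exit with status 77 unless slow tests were requested.
# Automake's test harness treats exit status 77 as "test skipped".
skip_unless_slow_tests() {
    [ "${WITH_SLOW_TESTS:-0}" = 1 ] || exit 77
}
```

A slow test such as tests/guix-time-machine.sh would then call
skip_unless_slow_tests (or inline the one-liner) before doing any work.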

[0]  https://debbugs.gnu.org/cgi/bugreport.cgi?bug=74387

Maxim Cournoyer (2):
  build: Exclude expensive tests in check target by default.
  etc: Ensure test suite passes in pre-push git hook.

 Makefile.am                | 9 ++++++++-
 etc/git/pre-push           | 1 +
 tests/guix-home.sh         | 5 +++++
 tests/guix-package.sh      | 5 +++++
 tests/guix-system.sh       | 4 ++++
 tests/guix-time-machine.sh | 4 +++-
 6 files changed, 26 insertions(+), 2 deletions(-)
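
For context, a pre-push hook that refuses to push when the test suite fails
could be sketched as below.  The function name and the message are
hypothetical, and this is not the content of etc/git/pre-push; the only facts
relied on are that Git runs the pre-push hook before pushing and aborts the
push when the hook exits with a non-zero status.

```shell
#!/bin/sh
# Sketch only; not the actual etc/git/pre-push hook.

# Hypothetical helper: run the given check command (default: "make check")
# and return non-zero, with a message on stderr, when it fails.
pre_push_check() {
    if [ "$#" -eq 0 ]; then
        set -- make check
    fi
    if ! "$@"; then
        echo "pre-push: '$*' failed; aborting push" >&2
        return 1
    fi
}

# In a real hook file, the last line would be something like:
#   pre_push_check || exit 1
```

Passing the command as arguments keeps the sketch easy to try with a cheap
stand-in, for example "pre_push_check true".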


base-commit: 94133452aa49de672d69950b2e1a99432111074c
  

Comments

Ludovic Courtès Nov. 29, 2024, 10:05 a.m. UTC | #1
Hi Maxim,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> This is a simple change that should ensure test suite breakages are detected
> as early as possible and prevent test-breaking changes from being pushed.
> This is made possible by skipping a few expensive test suites, bringing the
> total test time down to about 1 minute on a fast machine.
>
> We could call it a "distributed CI" approach ;-).

I agree with the goal, of course, but not with the method: even without
expensive tests, “make check” is going to take maybe 5–10 minutes, and
having that happen when you run “git push” can be a terrible development
experience (especially since the developer most likely either already
ran the test suite or part of it right before, or pushes package changes
that have an infinitesimally small probability of breaking “make check”).

Back to CI and not breaking things: I think that we should have a
workflow where the forge triggers those checks and puts a green light if
it passes, red light otherwise.  (Basically what everybody else is
doing. :-))

To me this should be one of the goals for the project in 2025.

> To run the complete test suite including the slow tests (as was the case
> prior to this change):
>
> make check WITH_SLOW_TESTS=1

This variable itself may still be useful though (I’d call it
‘RUN_EXPENSIVE_TESTS’ or something like that—that’s the name used in
Coreutils—, “expensive” being the key word).  I would also keep it on by
default.

Thanks,
Ludo’.
  
Maxim Cournoyer Dec. 17, 2024, 12:28 a.m. UTC | #2
Hi Ludovic,

Ludovic Courtès <ludo@gnu.org> writes:

> Hi Maxim,
>
> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>
>> This is a simple change that should ensure test suite breakages are detected
>> as early as possible and prevent test-breaking changes from being pushed.
>> This is made possible by skipping a few expensive test suites, bringing the
>> total test time down to about 1 minute on a fast machine.
>>
>> We could call it a "distributed CI" approach ;-).
>
> I agree with the goal, of course, but not with the method: even without
> expensive tests, “make check” is going to take maybe 5–10 minutes, and
> having that happen when you run “git push” can be a terrible development
> experience (especially since the developer most likely either already
> ran the test suite or part of it right before, or pushes package changes
> that have an infinitesimally small probability of breaking “make check”).

As I wrote, 'make check' with this change takes about 1 minute on my
machine.  I'd be curious to know how long it takes on other people's
machines; I suspect a bit more.  If it's too slow, we can skip more, or
find ways to make the tests run faster.
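
To compare timings across machines, something as simple as the following
sketch would do.  The helper name is hypothetical; it measures wall-clock time
with one-second resolution and assumes a date(1) that supports +%s, as the GNU
and BSD implementations do.

```shell
#!/bin/sh
# Hypothetical helper: run a command, print the elapsed wall-clock time
# in seconds on stderr, and return the command's own exit status.
time_command() {
    start=$(date +%s)
    status=0
    "$@" || status=$?
    end=$(date +%s)
    echo "elapsed: $((end - start)) seconds" >&2
    return "$status"
}
```

For example, "time_command make check" reports how long the check target
took; the shell's own "time make check" gives similar information.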

> Back to CI and not breaking things: I think that we should have a
> workflow where the forge triggers those checks and puts a green light if
> it passes, red light otherwise.  (Basically what everybody else is
> doing. :-))
>
> To me this should be one of the goals for the project in 2025.

That would be great.  I wonder how far QA is from making this
achievable.

>> To run the complete test suite including the slow tests (as was the case
>> prior to this change):
>>
>> make check WITH_SLOW_TESTS=1
>
> This variable itself may still be useful though (I’d call it
> ‘RUN_EXPENSIVE_TESTS’ or something like that—that’s the name used in
> Coreutils—, “expensive” being the key word).  I would also keep it on by
> default.

One of the tests that was unbearably long when I measured was the
time-machine test.  It took about 20 minutes to fetch the Git repository
with guile-git (which does extra work compared to the CLI, like
validating each object) and run the test.  I don't think we want this
kind of experience by default (because that would probably convince
people that running the test suite often is not a reasonable thing to
do).  The other tests were more reasonable, with the longer ones in the
2-3 minute range on my machine, IIRC.  Perhaps we could have this
20-minute outlier skipped by default, maybe with a RUN_PROHIBITIVE_TESTS
flag that would default to 0 (false).
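
The two-tier idea discussed here, with expensive tests on by default and
prohibitive ones off by default, could look roughly like this.
RUN_EXPENSIVE_TESTS and RUN_PROHIBITIVE_TESTS are the names floated in this
thread; the helper, the 0/1 convention for both variables, and the defaults
are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only.  Hypothetical helper: decide whether a test of the given
# tier should run.  Usage: should_run_tier expensive|prohibitive
should_run_tier() {
    case "$1" in
        # Assumed default: expensive tests run unless explicitly disabled.
        expensive)   [ "${RUN_EXPENSIVE_TESTS:-1}" = 1 ] ;;
        # Assumed default: prohibitive tests are skipped unless enabled.
        prohibitive) [ "${RUN_PROHIBITIVE_TESTS:-0}" = 1 ] ;;
        # Any other tier name is treated as an ordinary test: always run.
        *)           return 0 ;;
    esac
}
```

The time-machine test could then begin with
"should_run_tier prohibitive || exit 77", relying on Automake's convention
that exit status 77 marks a skipped test.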

A long time ago I read a blog post arguing that unit tests should be
small and fast [0], and there is a lot to be said for that.  Fast tests
usually translate into running the test suite more often and catching
breakage earlier.  As the author states, it also makes it possible to
determine whether a bug lies in the core logic or in the integration of
the many parts (unit tests vs integration tests), when unit tests are
decoupled from the whole system (typically by mocking all external
interfaces).

[0]  https://www.artima.com/weblogs/viewpost.jsp?thread=126923
  
Ludovic Courtès Dec. 17, 2024, 2:51 p.m. UTC | #3
Hi Maxim,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:

[...]

>> I agree with the goal, of course, but not with the method: even without
>> expensive tests, “make check” is going to take maybe 5–10 minutes, and
>> having that happen when you run “git push” can be a terrible development
>> experience (especially since the developer most likely either already
>> ran the test suite or part of it right before, or pushes package changes
>> that have an infinitesimally small probability of breaking “make check”).
>
> As I wrote, 'make check' with this change takes about 1 minute on my
> machine;

Right now, without your patch, we have:

--8<---------------cut here---------------start------------->8---
$ wget -qO- $(guix build --log-file guix --no-grafts)|gunzip |grep "\`check'"
starting phase `check'
phase `check' succeeded after 2049.2 seconds
--8<---------------cut here---------------end--------------->8---

More than 30 minutes on the fast machines of the build farm, and that is
with some of the expensive tests already skipped (those that require
network access: time-machine, pack -RR, etc.).

This patch is not dividing wall-clock time by 30, is it?
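
To make the comparison concrete, the duration can be pulled out of the log
line above and converted to minutes.  The helper names below are hypothetical;
the log format is the one shown in the excerpt.

```shell
#!/bin/sh
# Hypothetical helper: read a build log on stdin and print the duration,
# in seconds, of the named phase (e.g. "check").
phase_seconds() {
    sed -n "s/^phase \`$1' succeeded after \([0-9.]*\) seconds\$/\1/p"
}

# Hypothetical helper: convert seconds to minutes, with one decimal.
seconds_to_minutes() {
    awk -v s="$1" 'BEGIN { printf "%.1f\n", s / 60 }'
}
```

Fed the excerpt above, "phase_seconds check" prints 2049.2, and
"seconds_to_minutes 2049.2" prints 34.2.  Dividing 2049.2 seconds by 30 gives
roughly 68 seconds, which is the order of magnitude of the "about 1 minute"
figure being questioned.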

>> This variable itself may still be useful though (I’d call it
>> ‘RUN_EXPENSIVE_TESTS’ or something like that—that’s the name used in
>> Coreutils—, “expensive” being the key word).  I would also keep it on by
>> default.
>
> One of the tests that was unbearably long when I measured was the
> time-machine test.  It took about 20 minutes to fetch the Git repository
> with guile-git (which does extra work compared to the CLI, like
> validating each object) and run the test.  I don't think we want this
> kind of experience by default (because that would probably convince
> people that running the test suite often is not a reasonable thing to
> do).  The other tests were more reasonable, with the longer ones in the
> 2-3 minute range on my machine, IIRC.  Perhaps we could have this
> 20-minute outlier skipped by default, maybe with a RUN_PROHIBITIVE_TESTS
> flag that would default to 0 (false).

Yeah okay, maybe we should skip them by default, and maybe we can find a
way to ensure developers do run them periodically.

> A long time ago I read a blog post arguing that unit tests should be
> small and fast [0],

I actually agree.  :-)

Thanks!

Ludo’.