
[bug#49659,core-updates] gnu: guile: Fix failing tests on i686-linux.

Message ID 20210720112712.25905-1-maximedevos@telenet.be
State Accepted
Series [bug#49659,core-updates] gnu: guile: Fix failing tests on i686-linux.

Checks

Context                 Check
cbaines/applying patch  fail
cbaines/issue           success

Commit Message

Maxime Devos July 20, 2021, 11:27 a.m. UTC
i586-gnu might have the same issue.

* gnu/packages/guile.scm
  (guile-3.0)[arguments]<#:configure-flags>: Add
  "-fexcess-precision=standard" to CFLAGS.
---
 gnu/packages/guile.scm | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comments

Ludovic Courtès July 20, 2021, 1:55 p.m. UTC | #1
Hi!

Maxime Devos <maximedevos@telenet.be> writes:

> i586-gnu might have the same issue.

Please add a “Fixes …” line.

> * gnu/packages/guile.scm
>   (guile-3.0)[arguments]<#:configure-flags>: Add
>   "-fexcess-precision=standard" to CFLAGS.

Nitpick: the first two lines can be joined.  :-)

>       (substitute-keyword-arguments (package-arguments guile-2.2)
>         ((#:configure-flags flags ''())
> -        (let ((flags `(cons "--enable-mini-gmp" ,flags)))
> +        ;; -fexcess-precision=standard is required when compiling for
> +        ;; i686-linux, otherwise "numbers.test" will fail.
> +        (let ((flags `(cons* "CFLAGS=-g -O2 -fexcess-precision=standard"
> +                              "--enable-mini-gmp" ,flags)))

Yay!  Questions:

  1. Should we make it conditional on
       (or (string-prefix? "i686-" %host-type)
           (string-prefix? "i586-" %host-type))
     ?  (I wonder why armhf-linux doesn’t have the same problem.)

  2. Is there any downside to compiling all of libguile with this flag?

  3. Do we have a clear explanation of why ‘-fexcess-precision=fast’
     (the default) would lead to failures in ‘numbers.test’?

I looked at the GCC manual (info "(gcc) Optimize Options") and at links
you provided earlier on IRC, but I can’t really explain how this would
lead those tests to fail: <https://issues.guix.gnu.org/49368>.

I added a ‘printf’ call in ‘scm_i_inexact_floor_divide’, which is where
it all happens:

--8<---------------cut here---------------start------------->8---
static void
scm_i_inexact_floor_divide (double x, double y, SCM *qp, SCM *rp)
{
  if (SCM_UNLIKELY (y == 0))
    scm_num_overflow (s_scm_floor_divide);  /* or return a NaN? */
  else
    {
      double q = floor (x / y);
      double r = x - q * y;
      printf ("%s x=%f y=%f x/y=%f floor(x/y)=%f q=%f r=%f\n", __func__,
	      x, y, x/y, floor (x/y), q, r);
      *qp = scm_i_from_double (q);
      *rp = scm_i_from_double (r);
    }
}
--8<---------------cut here---------------end--------------->8---

I get this:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> (euclidean/ 130. (exact->inexact 10/7))
scm_i_inexact_floor_divide x=130.000000 y=1.428571 x/y=91.000000 floor(x/y)=90.000000 q=90.000000 r=1.428571
$1 = 90.0
$2 = 1.4285714285714257
--8<---------------cut here---------------end--------------->8---

So it’s really ‘floor’ that’s messing up somehow.
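A caveat when reading that output: printf's "%f" keeps only six digits after the decimal point, so a double a few ulps below 91 is also displayed as "91.000000". A minimal debugging sketch (the helper name fixed_hides_difference is hypothetical, not part of libguile) that checks whether "%f" hides a difference that "%.17g", with enough significant digits to tell any two IEEE doubles apart, would reveal:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical debugging helper, not part of libguile.
   Returns 1 when "%f" prints A and B identically even though
   "%.17g" (17 significant digits, enough to distinguish any two
   IEEE 754 doubles) shows that they differ.  */
static int
fixed_hides_difference (double a, double b)
{
  char fa[64], fb[64], ga[64], gb[64];

  /* Very large values would not fit in 64 bytes with "%f";
     the asserts check that nothing was truncated.  */
  int n1 = snprintf (fa, sizeof fa, "%f", a);
  int n2 = snprintf (fb, sizeof fb, "%f", b);
  int n3 = snprintf (ga, sizeof ga, "%.17g", a);
  int n4 = snprintf (gb, sizeof gb, "%.17g", b);
  assert (n1 >= 0 && (size_t) n1 < sizeof fa);
  assert (n2 >= 0 && (size_t) n2 < sizeof fb);
  assert (n3 >= 0 && (size_t) n3 < sizeof ga);
  assert (n4 >= 0 && (size_t) n4 < sizeof gb);

  /* Same six-decimal rendering, different full-precision rendering.  */
  return strcmp (fa, fb) == 0 && strcmp (ga, gb) != 0;
}
```

For instance, fixed_hides_difference (91.0 - 0x1p-46, 91.0), which compares 91 with the largest double below it, should return 1. Printing with "%.17g" instead of "%f" in the debugging printf would therefore make the x/y versus floor(x/y) comparison less ambiguous.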

Perhaps we have to just accept it, use the flag, and be done with it,
but that’s frustrating.

Thoughts?

Ludo’.
Maxime Devos July 20, 2021, 4:55 p.m. UTC | #2
Ludovic Courtès wrote on Tue 20-07-2021 at 15:55 [+0200]:
> Hi!
> 
> Maxime Devos <maximedevos@telenet.be> writes:
> 
> > i586-gnu might have the same issue.
> 
> Please add a “Fixes …” line.

I didn't find the bug report.

> >       (substitute-keyword-arguments (package-arguments guile-2.2)
> >         ((#:configure-flags flags ''())
> > -        (let ((flags `(cons "--enable-mini-gmp" ,flags)))
> > +        ;; -fexcess-precision=standard is required when compiling for
> > +        ;; i686-linux, otherwise "numbers.test" will fail.
> > +        (let ((flags `(cons* "CFLAGS=-g -O2 -fexcess-precision=standard"
> > +                              "--enable-mini-gmp" ,flags)))
> 
> Yay!  Questions:
> 
>   1. Should we make it conditional on
>        (or (string-prefix? "i686-" %host-type)
>            (string-prefix? "i586-" %host-type))

Rather, (target-x86-32?). target-x86-32? also recognises "i486-linux-gnu"
even though that's not a ‘supported’ cross-target.

>      ?  (I wonder why armhf-linux doesn’t have the same problem.)

AFAIK floats & doubles on arm don't have excess precision.

Floating-point numbers are either 32-bit or 64-bit,
unlike on x86, where the x87 floating-point registers are 80 bits wide
but sizeof(double) == 8 (64 bits).

(I'm ignoring MMX, SSE and the like.)

I don't know any architectures besides x86 which have excess precision.
"-fexcess-precision=standard" should be harmless on architectures
that don't have excess precision.

I'd make it unconditional, but conditional on x86-target? should work
for all ‘supported’ targets in Guix.
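Whether a given compiler configuration evaluates double expressions with excess precision can be queried from C itself: C99's <float.h> defines FLT_EVAL_METHOD (2 means float and double operations are evaluated in the range and precision of long double, which is typical for x87 code on i686; 0 means no excess precision, the usual value on x86-64 and ARM) as well as the mantissa widths DBL_MANT_DIG and LDBL_MANT_DIG. A small sketch; the helper names are hypothetical:

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>

/* Hypothetical helper: describe a FLT_EVAL_METHOD value
   as defined by C99 <float.h>.  */
static const char *
eval_method_name (int method)
{
  switch (method)
    {
    case 0:  return "evaluate to the range and precision of the type";
    case 1:  return "evaluate float and double as double";
    case 2:  return "evaluate float and double as long double";
    default: return "indeterminable or implementation-defined";
    }
}

/* Print what this compiler configuration does.  For 32-bit x87 code
   one would expect method 2 and LDBL_MANT_DIG == 64.  */
static void
report_excess_precision (void)
{
  /* C guarantees every double value is also a long double value.  */
  assert (LDBL_MANT_DIG >= DBL_MANT_DIG);
  printf ("FLT_EVAL_METHOD = %d (%s)\n", (int) FLT_EVAL_METHOD,
          eval_method_name ((int) FLT_EVAL_METHOD));
  printf ("DBL_MANT_DIG = %d, LDBL_MANT_DIG = %d\n",
          DBL_MANT_DIG, LDBL_MANT_DIG);
}
```

Compiling this once with the native x86-64 toolchain and once for i686-linux should make the difference between the two targets visible without running numbers.test.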

>   2. Is there any downside to compiling all of libguile with this flag?

I searched with "git grep -F double" and "git grep -F float".
Floating-point arithmetic doesn't seem to be used much outside numbers.c.

There's vm-engine.c, but there the results of the calculations are written
to (stack?) memory rather than kept in a register, so the excess precision
would be thrown away anyway; I don't expect a slowdown.

No code appears to be relying on excess precision.
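The point that a value written to memory loses its excess precision can be illustrated with a volatile double: the compiler has to perform a real 8-byte store, so a long double (or a value held in an 80-bit x87 register) is rounded to double on the way. A minimal sketch with hypothetical helper names:

```c
#include <assert.h>
#include <float.h>

/* Hypothetical helper: round V to double by storing it through a
   volatile double, much like spilling an x87 register to a double
   stack slot discards its excess precision.  */
static double
store_as_double (long double v)
{
  volatile double slot = (double) v;   /* real 64-bit store */
  return slot;
}

/* Returns 1 if storing 1/3 as a double lost precision, which is
   expected whenever long double is wider than double.  */
static int
store_loses_precision (void)
{
  long double third = 1.0L / 3.0L;
  assert (LDBL_MANT_DIG >= DBL_MANT_DIG);
  return (long double) store_as_double (third) != third;
}
```

On x86, where long double is the 64-bit-mantissa x87 format, store_loses_precision () should return 1; on targets where long double is the same as double it returns 0.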

>   3. Do we have a clear explanation of why ‘-fexcess-precision=fast’
>      (the default) would lead to failures in ‘numbers.test’?

I think the problem is that the rounding choices made in
  scm_i_inexact_floor_divide
must be consistent with those made in
  scm_i_inexact_floor_quotient
and 
  scm_i_inexact_floor_remainder
(There are tests testing whether the results agree.)

"-fexcess-precision=standard" reduces the degrees of freedom GCC has
in choosing when to round, so it is more likely the rounding choices
coincide?  It's only an untested hypothesis though.

FWIW, I think this line:

    /* in scm_i_inexact_floor_remainder */
    return scm_i_from_double (x - y * floor (x / y));

should be (for less fragility in case GCC changes its optimisations and
register allocation / spilling)

    /* in scm_i_inexact_floor_remainder */
    return scm_i_from_double (x - y * (double) floor (x / y));

Even then, there's no guarantee the rounding choices for x/y
are the same in scm_i_inexact_floor_divide, scm_i_inexact_floor_remainder
and scm_i_inexact_floor_quotient.

> I looked at the GCC manual (info "(gcc) Optimize Options") and at links
> you provided earlier on IRC, but I can’t really explain how this would
> lead those tests to fail: <https://issues.guix.gnu.org/49368>.

> I added a ‘printf’ call in ‘scm_i_inexact_floor_divide’, which is where
> it all happens:
> 
> --8<---------------cut here---------------start------------->8---
> static void
> scm_i_inexact_floor_divide (double x, double y, SCM *qp, SCM *rp)
> {
>   if (SCM_UNLIKELY (y == 0))
>     scm_num_overflow (s_scm_floor_divide);  /* or return a NaN? */
>   else
>     {
>       double q = floor (x / y);
>       double r = x - q * y;
>       printf ("%s x=%f y=%f x/y=%f floor(x/y)=%f q=%f r=%f\n", __func__,
> 	      x, y, x/y, floor (x/y), q, r);
>       *qp = scm_i_from_double (q);
>       *rp = scm_i_from_double (r);
>     }
> }
> --8<---------------cut here---------------end--------------->8---
> 
> I get this:
> 
> --8<---------------cut here---------------start------------->8---
> scheme@(guile-user)> (euclidean/ 130. (exact->inexact 10/7))
> scm_i_inexact_floor_divide x=130.000000 y=1.428571 x/y=91.000000 floor(x/y)=90.000000 q=90.000000 r=1.428571
> $1 = 90.0
> $2 = 1.4285714285714257
> --8<---------------cut here---------------end--------------->8---
> 
> So it’s really ‘floor’ that’s messing up somehow.
> 

I dunno if 'floor' is broken.  Let me explain why this output is possible for a
well-implemented 'floor':

I want to note that printf("%f", floor(x/y))
can display up to 16 different strings:

  GCC can choose to round 'x' and/or 'y' by moving them from a register to stack memory.
  (doesn't apply here I think because SCM values discard the excess precision)

  GCC can choose to round the result of x/y (same reasons)

  GCC can choose to round the result of floor(x/y) (same reasons)

Likewise, printf("%f", x/y) can display up to 8 different strings, and the rounding
choices may be different from those made for printf("%f", floor(x/y)).

So this inconsistency (x/y=91.00... > 90.00 = floor(x/y))  doesn't really
surprise me.  A more concrete scenario: suppose the CPU is configured to round
upwards, and 'floor' is a mapping between extended-precision floating-point numbers.

Let 'x' and 'y' be two floating-point numbers, such that:

 (1) the mathematical value of 'x/y' is slightly less than 91
 (2) 'x/y' when calculated in extended precision is slightly less than 91.
     Denote this approximation as F1.
 (3) 'x/y' when calculated in double precision is 91 (or slightly larger)
     (due to the ‘rounding upwards’ mode, in ‘rounding downwards’ it might
      have been slightly less than 91 as in (2))
     Denote this approximation as F2.

Then [floor(x/y)=] floor(F1)=floor(90.999...)=90.0,
and  [x/y=] F2=91.0, thus we seemingly have x/y >= 1 + floor(x/y),
even though that's mathematically nonsense.

Thus, by choosing when to round (in-)appropriately, it is possible (on x86)
that printf("x/y=%f, floor(x/y)=%f", x/y, floor(x/y)) will output "x/y=91.000000, floor(x/y)=90.000000".

(Please tell if I made an error somewhere.)
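The scenario above assumes upward rounding, but a similar split seems to occur even in the default round-to-nearest mode for the very inputs from Ludo's session: 130/(10/7) rounds to exactly 91 in double precision, yet stays just below 91 when kept in x87 extended precision, so floor of the extended value is 90. A minimal sketch, assuming long double is the 80-bit x87 format (true on x86 GNU/Linux, not on ARM); the helper names are hypothetical:

```c
#include <assert.h>
#include <float.h>

/* Quotient rounded to double precision through a real 64-bit store.  */
static double
quotient_as_double (double x, double y)
{
  assert (y != 0.0);
  volatile double q = x / y;
  return q;
}

/* Quotient kept in long double, standing in for a value that stays
   in an 80-bit x87 register.  */
static long double
quotient_as_long_double (double x, double y)
{
  assert (y != 0.0);
  return (long double) x / (long double) y;
}

/* Returns 1 if X/Y is exactly N when rounded to double but strictly
   below N in long double, so floor of the double quotient is N while
   floor of the long double quotient is below N.  */
static int
quotient_straddles (double x, double y, double n)
{
  return quotient_as_double (x, y) == n
         && quotient_as_long_double (x, y) < (long double) n;
}
```

On x86, quotient_straddles (130.0, 10.0 / 7.0, 91.0) should return 1, which is consistent with the "x/y=91.000000 floor(x/y)=90.000000" line in Ludo's output; where long double is no wider than double it returns 0.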

Greetings,
Maxime
Efraim Flashner July 20, 2021, 6:22 p.m. UTC | #3
On Tue, Jul 20, 2021 at 03:55:49PM +0200, Ludovic Courtès wrote:
> Hi!
> 
> Maxime Devos <maximedevos@telenet.be> writes:
> 
> > i586-gnu might have the same issue.
> 
> Please add a “Fixes …” line.
> 
> > * gnu/packages/guile.scm
> >   (guile-3.0)[arguments]<#:configure-flags>: Add
> >   "-fexcess-precision=standard" to CFLAGS.
> 
> Nitpick: the first two lines can be joined.  :-)
> 
> >       (substitute-keyword-arguments (package-arguments guile-2.2)
> >         ((#:configure-flags flags ''())
> > -        (let ((flags `(cons "--enable-mini-gmp" ,flags)))
> > +        ;; -fexcess-precision=standard is required when compiling for
> > +        ;; i686-linux, otherwise "numbers.test" will fail.
> > +        (let ((flags `(cons* "CFLAGS=-g -O2 -fexcess-precision=standard"
> > +                              "--enable-mini-gmp" ,flags)))
> 
> Yay!  Questions:
> 
>   1. Should we make it conditional on
>        (or (string-prefix? "i686-" %host-type)
>            (string-prefix? "i586-" %host-type))
>      ?  (I wonder why armhf-linux doesn’t have the same problem.)
> 
> Thoughts?
> 
> Ludo’.
> 

I'd also like to mention that this bug doesn't show up on 32-bit powerpc.
Ludovic Courtès July 20, 2021, 8:51 p.m. UTC | #4
Maxime Devos <maximedevos@telenet.be> writes:

> Ludovic Courtès wrote on Tue 20-07-2021 at 15:55 [+0200]:

[...]

>>   1. Should we make it conditional on
>>        (or (string-prefix? "i686-" %host-type)
>>            (string-prefix? "i586-" %host-type))
>
> Rather, (target-x86-32?). target-x86-32? also recognises "i486-linux-gnu"
> even though that's not a ‘supported’ cross-target.

Yes, makes sense.

>>      ?  (I wonder why armhf-linux doesn’t have the same problem.)
>
> AFAIK floats & doubles on arm don't have excess precision.
>
> Floating-point numbers are either 32-bit or 64-bit,
> unlike on x86, where the x87 floating-point registers are 80 bits wide
> but sizeof(double) == 8 (64 bits).
>
> (I'm ignoring MMX, SSE and the like.)
>
> I don't know any architectures besides x86 which have excess precision.
> "-fexcess-precision=standard" should be harmless on architectures
> that don't have excess precision.
>
> I'd make it unconditional, but conditional on x86-target? should work
> for all ‘supported’ targets in Guix.

Alright.

I’d still err on the side of making the change only for target-x86-32?,
because that’s the only case where we know it’s needed.

>>   2. Is there any downside to compiling all of libguile with this flag?
>
> I searched with "git grep -F double" and "git grep -F float".
> Floating-point arithmetic doesn't seem to be used much outside numbers.c.
>
> There's vm-engine.c, but there the results of the calculations are written
> to (stack?) memory rather than kept in a register, so the excess precision
> would be thrown away anyway; I don't expect a slowdown.
>
> No code appears to be relying on excess precision.

OK.

>>   3. Do we have a clear explanation of why ‘-fexcess-precision=fast’
>>      (the default) would lead to failures in ‘numbers.test’?
>
> I think the problem is that the rounding choices made in
>   scm_i_inexact_floor_divide
> must be consistent with those made in
>   scm_i_inexact_floor_quotient
> and 
>   scm_i_inexact_floor_remainder
> (There are tests testing whether the results agree.)
>
> "-fexcess-precision=standard" reduces the degrees of freedom GCC has
> in choosing when to round, so it is more likely the rounding choices
> coincide?  It's only an untested hypothesis though.
>
> FWIW, I think this line:
>
>     /* in scm_i_inexact_floor_remainder */
>     return scm_i_from_double (x - y * floor (x / y));
>
> should be (for less fragility in case GCC changes its optimisations and
> register allocation / spilling)
>
>     /* in scm_i_inexact_floor_remainder */
>     return scm_i_from_double (x - y * (double) floor (x / y));
>
> Even then, there's no guarantee the rounding choices for x/y
> are the same in scm_i_inexact_floor_divide, scm_i_inexact_floor_remainder
> and scm_i_inexact_floor_quotient.

Makes sense.  Seems to me that this should simply be implemented
differently to avoid the inconsistency in the first place (or one could
ignore IA32 altogether…).

> I dunno if 'floor' is broken.  Let me explain why this output is possible for a
> well-implemented 'floor':
>
> I want to note that printf("%f", floor(x/y))
> can display up to 16 different strings:
>
>   GCC can choose to round 'x' and/or 'y' by moving them from a register to stack memory.
>   (doesn't apply here I think because SCM values discard the excess precision)
>
>   GCC can choose to round the result of x/y (same reasons)
>
>   GCC can choose to round the result of floor(x/y) (same reasons)
>
> Likewise, printf("%f", x/y) can display up to 8 different strings, and the rounding
> choices may be different from those made for printf("%f", floor(x/y)).
>
> So this inconsistency (x/y=91.00... > 90.00 = floor(x/y))  doesn't really
> surprise me.  A more concrete scenario: suppose the CPU is configured to round
> upwards, and 'floor' is a mapping between extended-precision floating-point numbers.
>
> Let 'x' and 'y' be two floating-point numbers, such that:
>
>  (1) the mathematical value of 'x/y' is slightly less than 91
>  (2) 'x/y' when calculated in extended precision is slightly less than 91.
>      Denote this approximation as F1.
>  (3) 'x/y' when calculated in double precision is 91 (or slightly larger)
>      (due to the ‘rounding upwards’ mode, in ‘rounding downwards’ it might
>       have been slightly less than 91 as in (2))
>      Denote this approximation as F2.
>
> Then [floor(x/y)=] floor(F1)=floor(90.999...)=90.0,
> and  [x/y=] F2=91.0, thus we seemingly have x/y >= 1 + floor(x/y),
> even though that's mathematically nonsense.
>
> Thus, by choosing when to round (in-)appropriately, it is possible (on x86)
> that printf("x/y=%f, floor(x/y)=%f", x/y, floor(x/y)) will output "x/y=91.000000, floor(x/y)=90.000000".

I’m no expert but that makes sense to me.

Could you send an updated patch?

If you think of a way to fix the issue in Guile itself, we can also do
that.  :-)

Thanks for the investigation & explanation!

Ludo’.

Patch

diff --git a/gnu/packages/guile.scm b/gnu/packages/guile.scm
index d78c57e88c..e1f6495837 100644
--- a/gnu/packages/guile.scm
+++ b/gnu/packages/guile.scm
@@ -16,6 +16,7 @@ 
 ;;; Copyright © 2018 Eric Bavier <bavier@member.fsf.org>
 ;;; Copyright © 2019 Taylan Kammer <taylan.kammer@gmail.com>
 ;;; Copyright © 2020, 2021 Efraim Flashner <efraim@flashner.co.il>
+;;; Copyright © 2021 Maxime Devos <maximedevos@telenet.be>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -316,7 +317,10 @@  without requiring the source code to be rewritten.")
     (arguments
      (substitute-keyword-arguments (package-arguments guile-2.2)
        ((#:configure-flags flags ''())
-        (let ((flags `(cons "--enable-mini-gmp" ,flags)))
+        ;; -fexcess-precision=standard is required when compiling for
+        ;; i686-linux, otherwise "numbers.test" will fail.
+        (let ((flags `(cons* "CFLAGS=-g -O2 -fexcess-precision=standard"
+                              "--enable-mini-gmp" ,flags)))
           ;; XXX: JIT-enabled Guile crashes in obscure ways on GNU/Hurd.
           (if (hurd-target?)
               `(cons "--disable-jit" ,flags)