Message ID | 20210720112712.25905-1-maximedevos@telenet.be |
---|---|
State | Accepted |
Headers | show |
Series | [bug#49659,core-updates] gnu: guile: Fix failing tests on i686-linux. | expand |
Context | Check | Description |
---|---|---|
cbaines/applying patch | fail | View Laminar job |
cbaines/issue | success | View issue |
Hi! Maxime Devos <maximedevos@telenet.be> skribis: > i586-gnu might have the same issue. Please add a “Fixes …” line. > * gnu/packages/guile.scm > (guile-3.0)[arguments]<#:configure-flags>: Add > "-fexcess-precision=standard" to CFLAGS. Nitpick: the first two lines can be joined. :-) > (substitute-keyword-arguments (package-arguments guile-2.2) > ((#:configure-flags flags ''()) > - (let ((flags `(cons "--enable-mini-gmp" ,flags))) > + ;; -fexcess-precision=standard is required when compiling for > + ;; i686-linux, otherwise "numbers.test" will fail. > + (let ((flags `(cons* "CFLAGS=-g -O2 -fexcess-precision=standard" > + "--enable-mini-gmp" ,flags))) Yay! Questions: 1. Should we make it conditional on (or (string-prefix? "i686-" %host-type) (string-prefix? "i586-" %host-type)) ? (I wonder why armhf-linux doesn’t have the same problem.) 2. Is there any downside to compiling all of libguile with this flag? 3. Do we have a clear explanation of why ‘-fexcess-precision=fast’ (the default) would lead to failures in ‘numbers.test’? I looked at the GCC manual (info "(gcc) Optimize Options") and at links you provided earlier on IRC, but I can’t really explain how this would lead those tests to fail: <https://issues.guix.gnu.org/49368>. I added a ‘printf’ call in ‘scm_i_inexact_floor_divide’, which is where it all happens: --8<---------------cut here---------------start------------->8--- static void scm_i_inexact_floor_divide (double x, double y, SCM *qp, SCM *rp) { if (SCM_UNLIKELY (y == 0)) scm_num_overflow (s_scm_floor_divide); /* or return a NaN? */ else { double q = floor (x / y); double r = x - q * y; printf ("%s x=%f y=%f x/y=%f floor(x/y)=%f q=%f r=%f\n", __func__, x, y, x/y, floor (x/y), q, r); *qp = scm_i_from_double (q); *rp = scm_i_from_double (r); } } --8<---------------cut here---------------end--------------->8--- I get this: --8<---------------cut here---------------start------------->8--- scheme@(guile-user)> (euclidean/ 130. (exact->inexact 10/7)) scm_i_inexact_floor_divide x=130.000000 y=1.428571 x/y=91.000000 floor(x/y)=90.000000 q=90.000000 r=1.428571 $1 = 90.0 $2 = 1.4285714285714257 --8<---------------cut here---------------end--------------->8--- So it’s really ‘floor’ that’s messing up somehow. Perhaps we have to just accept it, use the flag, and be done with it, but that’s frustrating. Thoughts? Ludo’.
Ludovic Courtès schreef op di 20-07-2021 om 15:55 [+0200]: > Hi! > > Maxime Devos <maximedevos@telenet.be> skribis: > > > i586-gnu might have the same issue. > > Please add a “Fixes …” line. I didn't find the bug report. > > (substitute-keyword-arguments (package-arguments guile-2.2) > > ((#:configure-flags flags ''()) > > - (let ((flags `(cons "--enable-mini-gmp" ,flags))) > > + ;; -fexcess-precision=standard is required when compiling for > > + ;; i686-linux, otherwise "numbers.test" will fail. > > + (let ((flags `(cons* "CFLAGS=-g -O2 -fexcess-precision=standard" > > + "--enable-mini-gmp" ,flags))) > > Yay! Questions: > > 1. Should we make it conditional on > (or (string-prefix? "i686-" %host-type) > (string-prefix? "i586-" %host-type)) Rather, (target-x86-32?). target-x86-32? also recognises "i486-linux-gnu" even though that's not a ‘supported’ cross-target. > ? (I wonder why armhf-linux doesn’t have the same problem.) AFAIK floats & doubles on arm don't have excess precision. Floating-point numbers are either 32-bit or 64-bit, unlike in x86, where the floating-point registers are 80-bit but (sizeof) double==8 (64 bits). (I'm ignoring MMX, SSE and the like.) I don't know any architectures beside x86 which have excess precision. "-fexcess-precision=standard" should be harmless on architectures that don't have excess precision. I'd make it unconditional, but conditional on x86-target? should work for all ‘supported’ targets in Guix. > 2. Is there any downside to compiling all of libguile with this flag? I searched with "git grep -F double" and "git grep -F float". Floating-point arithmetic doen't seem to be used much outside numbers.c. There's vm-engine.c, but the results of the calculations are written to some (stack?) memory (not a register), so the excess precision would be thrown away anyway, so I don't expect a slow-down. No code appears to be relying on excess precision. > 3. Do we have a clear explanation of why ‘-fexcess-precision=fast’ > (the default) would lead to failures in ‘numbers.test’? The problem I think is that the rounding choices made in scm_i_inexact_floor_divide must be consistent with those made in scm_i_inexact_floor_quotient and scm_i_inexact_floor_remainder (There are tests testing whether the results agree.) "-fexcess-precision=standard" reduces the degrees of freedom GCC has in choosing when to round, so it is more likely the rounding choices coincide? It's only an untested hypothesis though. FWIW, I think this line: /* in scm_i_inexact_floor_remainder */ return scm_i_from_double (x - y * floor (x / y)); should be (for less fragility in case GCC changes its optimisations and register allocation / spilling) /* in scm_i_inexact_floor_remainder */ return scm_i_from_double (x - y * (double) floor (x / y)); even then, there's no guarantee the rounding choices for x/y are the same in scm_i_inexact_floor_divide, scm_i_inexact_floor_remainder and scm_i_inexact_floor_quotient. > I looked at the GCC manual (info "(gcc) Optimize Options") and at links > you provided earlier on IRC, but I can’t really explain how this would > lead those tests to fail: <https://issues.guix.gnu.org/49368>;. > I added a ‘printf’ call in ‘scm_i_inexact_floor_divide’, which is where > it all happens: > > --8<---------------cut here---------------start------------->8--- > static void > scm_i_inexact_floor_divide (double x, double y, SCM *qp, SCM *rp) > { > if (SCM_UNLIKELY (y == 0)) > scm_num_overflow (s_scm_floor_divide); /* or return a NaN? */ > else > { > double q = floor (x / y); > double r = x - q * y; > printf ("%s x=%f y=%f x/y=%f floor(x/y)=%f q=%f r=%f\n", __func__, > x, y, x/y, floor (x/y), q, r); > *qp = scm_i_from_double (q); > *rp = scm_i_from_double (r); > } > } > --8<---------------cut here---------------end--------------->8--- > > I get this: > > --8<---------------cut here---------------start------------->8--- > scheme@(guile-user)> (euclidean/ 130. (exact->inexact 10/7)) > scm_i_inexact_floor_divide x=130.000000 y=1.428571 x/y=91.000000 floor(x/y)=90.000000 q=90.000000 r=1.428571 > $1 = 90.0 > $2 = 1.4285714285714257 > --8<---------------cut here---------------end--------------->8--- > > So it’s really ‘floor’ that’s messing up somehow. > I dunno if 'floor' is broken. Let me explain why this output is possible for a well-implemented 'floor': I want to note that printf("%f", floor(x/y)) can display 16 different strings: GCC can choose to round 'x' and/or 'y' by moving it from a register to stack memory. (doesn't apply here I think because SCM values discard the excess precision) GCC can choose to round the result of x/y (same reasons) GCC can choose to round the result of floor(x/y) (same reasons) Likewise, printf("%f", x/y) can display 8 different strings, and the rounding choices may be different from those made for printf("%f", floor(x/y)). So this inconsistency (x/y=91.00... > 90.00 = floor(x/y)) doesn't really surprise me. A more concrete scenario: suppose the CPU is configured to round upwards, and 'floor' is a mapping between extended-precision floating-point numbers. Let 'x' and 'y' be two floating-point numbers, such that: (1) the mathematical value of 'x/y' is slightly less than 91 (2) 'x/y' when calculated in extended precision is slightly less than 91. Denote this approximation as F1. (3) 'x/y' when calculated in double precision is 91 (or slightly larger) (due to the ‘rounding upwards’ mode, in ‘rounding downwards’ it might have been slightly less than 91 as in (2)) Denote this approximation as F2. Then [floor(x/y)=] floor(F1)=floor(90.999...)=90.0, and [x/y=] F2=91.0, thus we seemingly have x/y >= 1 + floor(x/y), even though that's mathematically nonsense. Thus, by choosing when to round (in-)appropriately, it is possible (on x86) that printf("x/y=%f, floor(x/y)=%f",x/y,floor(x/y)) will output "x/y=91.0 floor(x/y)=90.0". (Please tell if I made an error somewhere.) Greetings, Maxime
On Tue, Jul 20, 2021 at 03:55:49PM +0200, Ludovic Courtès wrote: > Hi! > > Maxime Devos <maximedevos@telenet.be> skribis: > > > i586-gnu might have the same issue. > > Please add a “Fixes …” line. > > > * gnu/packages/guile.scm > > (guile-3.0)[arguments]<#:configure-flags>: Add > > "-fexcess-precision=standard" to CFLAGS. > > Nitpick: the first two lines can be joined. :-) > > > (substitute-keyword-arguments (package-arguments guile-2.2) > > ((#:configure-flags flags ''()) > > - (let ((flags `(cons "--enable-mini-gmp" ,flags))) > > + ;; -fexcess-precision=standard is required when compiling for > > + ;; i686-linux, otherwise "numbers.test" will fail. > > + (let ((flags `(cons* "CFLAGS=-g -O2 -fexcess-precision=standard" > > + "--enable-mini-gmp" ,flags))) > > Yay! Questions: > > 1. Should we make it conditional on > (or (string-prefix? "i686-" %host-type) > (string-prefix? "i586-" %host-type)) > ? (I wonder why armhf-linux doesn’t have the same problem.) > > Thoughts? > > Ludo’. > I'd also like to mention that this bug doesn't show up on 32-bit powerpc.
Maxime Devos <maximedevos@telenet.be> skribis: > Ludovic Courtès schreef op di 20-07-2021 om 15:55 [+0200]: [...] >> 1. Should we make it conditional on >> (or (string-prefix? "i686-" %host-type) >> (string-prefix? "i586-" %host-type)) > > Rather, (target-x86-32?). target-x86-32? also recognises "i486-linux-gnu" > even though that's not a ‘supported’ cross-target. Yes, makes sense. >> ? (I wonder why armhf-linux doesn’t have the same problem.) > > AFAIK floats & doubles on arm don't have excess precision. > > Floating-point numbers are either 32-bit or 64-bit, > unlike in x86, where the floating-point registers are 80-bit > but (sizeof) double==8 (64 bits). > > (I'm ignoring MMX, SSE and the like.) > > I don't know any architectures beside x86 which have excess precision. > "-fexcess-precision=standard" should be harmless on architectures > that don't have excess precision. > > I'd make it unconditional, but conditional on x86-target? should work > for all ‘supported’ targets in Guix. Alright. I’d still err on the side of making the change only for target-x86-32?, because that’s the only case where we know it’s needed. >> 2. Is there any downside to compiling all of libguile with this flag? > > I searched with "git grep -F double" and "git grep -F float". > Floating-point arithmetic doen't seem to be used much outside numbers.c. > > There's vm-engine.c, but the results of the calculations are written > to some (stack?) memory (not a register), so the excess precision > would be thrown away anyway, so I don't expect a slow-down. > > No code appears to be relying on excess precision. OK. >> 3. Do we have a clear explanation of why ‘-fexcess-precision=fast’ >> (the default) would lead to failures in ‘numbers.test’? > > The problem I think is that the rounding choices made in > scm_i_inexact_floor_divide > must be consistent with those made in > scm_i_inexact_floor_quotient > and > scm_i_inexact_floor_remainder > (There are tests testing whether the results agree.) > > "-fexcess-precision=standard" reduces the degrees of freedom GCC has > in choosing when to round, so it is more likely the rounding choices > coincide? It's only an untested hypothesis though. > > FWIW, I think this line: > > /* in scm_i_inexact_floor_remainder */ > return scm_i_from_double (x - y * floor (x / y)); > > should be (for less fragility in case GCC changes its optimisations and > register allocation / spilling) > > /* in scm_i_inexact_floor_remainder */ > return scm_i_from_double (x - y * (double) floor (x / y)); > > even then, there's no guarantee the rounding choices for x/y > are the same in scm_i_inexact_floor_divide, scm_i_inexact_floor_remainder > and scm_i_inexact_floor_quotient. Makes sense. Seems to me that this should simply be implemented differently to avoid the inconsistency in the first place (or one could ignore IA32 altogether…). > I dunno if 'floor' is broken. Let me explain why this output is possible for a > well-implemented 'floor': > > I want to note that printf("%f", floor(x/y)) > can display 16 different strings: > > GCC can choose to round 'x' and/or 'y' by moving it from a register to stack memory. > (doesn't apply here I think because SCM values discard the excess precision) > > GCC can choose to round the result of x/y (same reasons) > > GCC can choose to round the result of floor(x/y) (same reasons) > > Likewise, printf("%f", x/y) can display 8 different strings, and the rounding > choices may be different from those made for printf("%f", floor(x/y)). > > So this inconsistency (x/y=91.00... > 90.00 = floor(x/y)) doesn't really > surprise me. A more concrete scenario: suppose the CPU is configured to round > upwards, and 'floor' is a mapping between extended-precision floating-point numbers. > > Let 'x' and 'y' be two floating-point numbers, such that: > > (1) the mathematical value of 'x/y' is slightly less than 91 > (2) 'x/y' when calculated in extended precision is slightly less than 91. > Denote this approximation as F1. > (3) 'x/y' when calculated in double precision is 91 (or slightly larger) > (due to the ‘rounding upwards’ mode, in ‘rounding downwards’ it might > have been slightly less than 91 as in (2)) > Denote this approximation as F2. > > Then [floor(x/y)=] floor(F1)=floor(90.999...)=90.0, > and [x/y=] F2=91.0, thus we seemingly have x/y >= 1 + floor(x/y), > even though that's mathematically nonsense. > > Thus, by choosing when to round (in-)appropriately, it is possible (on x86) > that printf("x/y=%f, floor(x/y)=%f",x/y,floor(x/y)) will output "x/y=91.0 floor(x/y)=90.0". I’m no expert but that makes sense to me. Could you send an updated patch? If you think of a way to fix the issue in Guile itself, we can also do that. :-) Thanks for the investigation & explanation! Ludo’.
diff --git a/gnu/packages/guile.scm b/gnu/packages/guile.scm index d78c57e88c..e1f6495837 100644 --- a/gnu/packages/guile.scm +++ b/gnu/packages/guile.scm @@ -16,6 +16,7 @@ ;;; Copyright © 2018 Eric Bavier <bavier@member.fsf.org> ;;; Copyright © 2019 Taylan Kammer <taylan.kammer@gmail.com> ;;; Copyright © 2020, 2021 Efraim Flashner <efraim@flashner.co.il> +;;; Copyright © 2021 Maxime Devos <maximedevos@telenet.be> ;;; ;;; This file is part of GNU Guix. ;;; @@ -316,7 +317,10 @@ without requiring the source code to be rewritten.") (arguments (substitute-keyword-arguments (package-arguments guile-2.2) ((#:configure-flags flags ''()) - (let ((flags `(cons "--enable-mini-gmp" ,flags))) + ;; -fexcess-precision=standard is required when compiling for + ;; i686-linux, otherwise "numbers.test" will fail. + (let ((flags `(cons* "CFLAGS=-g -O2 -fexcess-precision=standard" + "--enable-mini-gmp" ,flags))) ;; XXX: JIT-enabled Guile crashes in obscure ways on GNU/Hurd. (if (hurd-target?) `(cons "--disable-jit" ,flags)