[bug#61493,0/2] gnu: hwloc: Skip failing test on non-x86 systems.

Message ID cover.1676319305.git.simon@simonsouth.net
Series gnu: hwloc: Skip failing test on non-x86 systems.

Message

Simon South Feb. 13, 2023, 8:56 p.m. UTC
Here's a patch that circumvents a test failure in hwloc 2.9.0 on non-x86
systems (and specifically on AArch64), allowing the package to build
successfully on these machines.

An additional, bonus patch removes a pair of obsolete comments from the hwloc
package definitions.

I've tested these changes on x86-64 and AArch64 and generally, things seem
fine.

- On x86-64, of hwloc's 136 dependents only seven[0] fail to build, and
  these appear to be existing failures, according to ci.guix.gnu.org.

- On AArch64, the package builds fine; many of its dependents fail (in fact I
  am still waiting for builds to complete) but again, none of the failures
  I've investigated appear to be new.

----------

Here's some background information regarding the fix in case it's useful:

One of hwloc's primary functions is to provide information about the host
computer's processor topology, in terms of NUMA nodes, CPU clusters and so on.
At start-up it tries to collect this information by querying a sequence of
"topology backends" that each implement a different strategy for detecting the
host system's configuration.

The first source of information is the operating system, so on most Guix
machines the "Linux" backend runs first.  This tries to pull information from
the /sys filesystem tree but since that's inaccessible from within build
containers, this always fails during hwloc's tests.

For x86 machines specifically, hwloc provides an architecture-specific,
fallback backend that can obtain the same information by querying the hardware
directly.  This normally succeeds within the build environment, and so hwloc
passes its tests without issue on x86 and x86-64 machines.

But those are the only platforms for which an architecture-specific topology
backend is provided: on other systems, once the Linux backend fails, hwloc has
nothing else to try, and so any tests that rely on the host system's topology
having been detected will fail.
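The fallback order described above can be modelled roughly as follows. This is a hypothetical Python sketch, not hwloc's API: the names `linux_backend`, `x86_backend` and `discover_topology` and the `/nonexistent-sys` path are invented for illustration. The real backends are C code, and the real x86 backend queries the processor directly instead of returning a placeholder.

```python
import os
import platform
from typing import Callable, List, Optional

# Hypothetical model of hwloc's backend fallback order; these functions are
# illustrative stand-ins, not hwloc's real (C) API.
Topology = dict
Backend = Callable[[], Optional[Topology]]

X86_MACHINES = {"x86_64", "amd64", "i386", "i486", "i586", "i686"}


def linux_backend(sys_root: str = "/sys") -> Optional[Topology]:
    """Stand-in for the Linux backend: succeeds only if /sys is readable."""
    cpu_dir = os.path.join(sys_root, "devices", "system", "cpu")
    try:
        entries = os.listdir(cpu_dir)
    except OSError:
        return None  # /sys missing or unreadable, e.g. in a Guix build container
    cpus = [e for e in entries if e.startswith("cpu") and e[3:].isdigit()]
    return {"source": "linux", "cpu_count": len(cpus)}


def x86_backend(machine: str) -> Optional[Topology]:
    """Stand-in for the x86 backend: only exists for x86 machines."""
    if machine.lower() not in X86_MACHINES:
        return None
    return {"source": "x86"}  # placeholder for data the real backend reads from the CPU


def discover_topology(backends: List[Backend]) -> Optional[Topology]:
    """Try each backend in order and return the first topology found."""
    for backend in backends:
        topology = backend()
        if topology is not None:
            return topology
    return None  # nothing left to try: topology-dependent tests will fail


if __name__ == "__main__":
    machine = platform.machine()
    result = discover_topology([
        lambda: linux_backend("/nonexistent-sys"),  # simulate a build container
        lambda: x86_backend(machine),
    ])
    print(machine, "->", result)
```

With /sys unavailable, the model returns a topology on x86-64 but `None` on AArch64, which is the situation the failing test runs into.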

My patch fixes the build on these machines by skipping the one (other) test
that relies on this information being available. It does so only on non-x86
systems, where the unavailability of /sys means certain failure.
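The skip condition can be expressed as a small predicate. This is a hypothetical Python illustration of the reasoning, not the package change itself (that is Scheme code in gnu/packages/mpi.scm); `topology_detectable`, `should_skip_topology_test` and the `sys_readable` flag are invented names.

```python
import platform

# Hypothetical illustration of the skip condition; not the actual Guix
# package code, which is written in Scheme.
X86_MACHINES = {"x86_64", "amd64", "i386", "i486", "i586", "i686"}


def topology_detectable(machine: str, sys_readable: bool) -> bool:
    """True if some backend can succeed: Linux needs /sys, x86 needs an x86 CPU."""
    return sys_readable or machine.lower() in X86_MACHINES


def should_skip_topology_test(machine: str, sys_readable: bool = False) -> bool:
    """sys_readable defaults to False, modelling a Guix build container."""
    return not topology_detectable(machine, sys_readable)


if __name__ == "__main__":
    print(platform.machine(), "skip:", should_skip_topology_test(platform.machine()))
```

Inside a build container `sys_readable` is always false, so the predicate reduces to "skip on non-x86", which matches the scope the patch describes.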

For reference, the backends mentioned above are implemented in hwloc's
hwloc/topology-linux.c and hwloc/topology-x86.c.

--
Simon South
simon@simonsouth.net

[0] combinatorial-blas, cube, elemental, elpa-openmpi, python-dolfin-adjoint,
    scorep-openmpi and superlu-dist.


Simon South (2):
  gnu: hwloc: Remove obsolete comments.
  gnu: hwloc: Skip failing test on non-x86 systems.

 gnu/packages/mpi.scm | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)


base-commit: 5b1eab43f011983d9ee560d6935409b6b39706ff

Comments

Ludovic Courtès Feb. 27, 2023, 2:52 p.m. UTC | #1
Hi Simon,

Simon South <simon@simonsouth.net> skribis:

> Here's a patch that circumvents a test failure in hwloc 2.9.0 on non-x86
> systems (and specifically on AArch64), allowing the package to build
> successfully on these machines.
>
> An additional, bonus patch removes a pair of obsolete comments from the hwloc
> package definitions.
>
> I've tested these changes on x86-64 and AArch64 and generally, things seem
> fine.
>
> - On x86-64, of hwloc's 136 dependents the only seven[0] that fail to build
>   appear to be existing failures, according to ci.guix.gnu.org.
>
> - On AArch64, the package builds fine; many of its dependents fail (in fact I
>   am still waiting for builds to complete) but again, none of the failures
>   I've investigated appear to be new.

It’s a clear improvement according to <https://qa.guix.gnu.org/issue/61493>.

> ----------
>
> Here's some background information regarding the fix in case it's useful:
>
> One of hwloc's primary functions is to provide information about the host
> computer's processor topology, in terms of NUMA nodes, CPU clusters and so on.
> At start-up it tries to collect this information by querying a sequence of
> "topology backends" that each implement a different strategy for detecting the
> host system's configuration.
>
> The first source of information is the operating system, so on most Guix
> machines the "Linux" backend runs first.  This tries to pull information from
> the /sys filesystem tree but since that's inaccessible from within build
> containers, this always fails during hwloc's tests.
>
> For x86 machines specifically, hwloc provides an architecture-specific,
> fallback backend that can obtain the same information by querying the hardware
> directly.  This normally succeeds within the build environment, and so hwloc
> passes its tests without issue on x86 and x86-64 machines.
>
> But those are the only platforms for which an architecture-specific topology
> backend is provided: On other systems, once the Linux backend fails, hwloc has
> nothing else to try and so any tests that rely on the host system's topology
> having been detected will fail.
>
> My patch fixes the build on these machines by skipping the one (other) test
> that relies on this information being available, only on non-x86 systems where
> the unavailability of /sys means certain failure.
>
> For reference, the backends mentioned above are implemented in hwloc's
> hwloc/topology-linux.c and hwloc/topology-x86.c.

Interesting, thanks for explaining!

Ludo’.

Ludovic Courtès Feb. 27, 2023, 3:37 p.m. UTC | #2
Hi again,

Simon South <simon@simonsouth.net> skribis:

> Here's a patch that circumvents a test failure in hwloc 2.9.0 on non-x86
> systems (and specifically on AArch64), allowing the package to build
> successfully on these machines.

I forwarded this to Brice Goglin, a colleague of mine and an hwloc
co-maintainer, and they kindly opened a pull request upstream:

  https://github.com/open-mpi/hwloc/pull/570

Feel free to comment there!

Ludo’.