[bug#54635,v2,5/5] gnu: Add wfmash.
Commit Message
* gnu/packages/bioinformatics.scm (wfmash): New variable.
---
gnu/packages/bioinformatics.scm | 51 +++++++++++++++++++++++++++++++++
1 file changed, 51 insertions(+)
Comments
Arun Isaac schreef op do 31-03-2022 om 12:58 [+0530]:
> + (("-march=native") ""))
This is also wrong for x86 systems because it makes the build non-
reproducible. Also, has upstream been informed about some of the
compiler flags being architecture-specific?
Greetings,
Maxime.
On Thu, Mar 31, 2022 at 01:34:29PM +0200, Maxime Devos wrote:
> Arun Isaac schreef op do 31-03-2022 om 12:58 [+0530]:
> > + (("-march=native") ""))
>
> This is also wrong for x86 systems because it makes the build non-
> reproducible. Also, has upstream been informed about some of the
> compiler flags being architecture-specific?
I'm pretty sure upstream is aware of it, and the -mcx16 flag. That whole
phase doesn't need to be non-x86_64 only, upstream prefers it that way
to get fater results but IMO it would be fine to move it into a snippet.
Efraim Flashner schreef op do 31-03-2022 om 15:18 [+0300]:
> > Arun Isaac schreef op do 31-03-2022 om 12:58 [+0530]:
> > > + (("-march=native") ""))
> >
> > This is also wrong for x86 systems because it makes the build non-
> > reproducible. Also, has upstream been informed about some of the
> > compiler flags being architecture-specific?
>
> I'm pretty sure upstream is aware of it, and the -mcx16 flag. That
> whole phase doesn't need to be non-x86_64 only, upstream prefers it
> that way to get fater results
wfmash could be written to detect CPU features at runtime and there is
also --tune. Also, upstream preferring march=native does not make the
build reproducible.
> but IMO it would be fine to move it into a snippet.
It does not have to be in a snippet, it just needs to be reproducible
(so no march=native, whether on x86 or not).
Upstream seems to be aware of the non-x86
(https://github.com/ekg/wfmash/issues/125) but they do not seem to be
aware of the problems with march=native.
Greetings,
Maxime
On Thu, Mar 31, 2022 at 03:07:49PM +0200, Maxime Devos wrote:
> Efraim Flashner schreef op do 31-03-2022 om 15:18 [+0300]:
> > > Arun Isaac schreef op do 31-03-2022 om 12:58 [+0530]:
> > > > + (("-march=native") ""))
> > >
> > > This is also wrong for x86 systems because it makes the build non-
> > > reproducible. Also, has upstream been informed about some of the
> > > compiler flags being architecture-specific?
> >
> > I'm pretty sure upstream is aware of it, and the -mcx16 flag. That
> > whole phase doesn't need to be non-x86_64 only, upstream prefers it
> > that way to get fater results
>
> wfmash could be written to detect CPU features at runtime and there is
> also --tune. Also, upstream preferring march=native does not make the
> build reproducible.
>
> > but IMO it would be fine to move it into a snippet.
>
> It does not have to be in a snippet, it just needs to be reproducible
> (so no march=native, whether on x86 or not).
>
> Upstream seems to be aware of the non-x86
> (https://github.com/ekg/wfmash/issues/125) but they do not seem to be
> aware of the problems with march=native.
I will let them know. I also added a patch which runs a test suite based
on the github action.
On Thu, Mar 31, 2022 at 03:07:49PM +0200, Maxime Devos wrote:
> Efraim Flashner schreef op do 31-03-2022 om 15:18 [+0300]:
> > > Arun Isaac schreef op do 31-03-2022 om 12:58 [+0530]:
> > > > + (("-march=native") ""))
> > >
> > > This is also wrong for x86 systems because it makes the build non-
> > > reproducible. Also, has upstream been informed about some of the
> > > compiler flags being architecture-specific?
> >
> > I'm pretty sure upstream is aware of it, and the -mcx16 flag. That
> > whole phase doesn't need to be non-x86_64 only, upstream prefers it
> > that way to get fater results
>
> wfmash could be written to detect CPU features at runtime and there is
> also --tune. Also, upstream preferring march=native does not make the
> build reproducible.
>
> > but IMO it would be fine to move it into a snippet.
>
> It does not have to be in a snippet, it just needs to be reproducible
> (so no march=native, whether on x86 or not).
I suppose not, but the -mcx16 should be in the snippet, since it adds
compiler flags which only work on some architectures. And while we're at
it we can do the -march=native one too.
> Upstream seems to be aware of the non-x86
> (https://github.com/ekg/wfmash/issues/125) but they do not seem to be
> aware of the problems with march=native.
>
I've pushed these patches with a few tweaks and an additional patch to
run a test suite based on the github workflow in the repository. The
entire check phase takes ~2 minutes on my pinebook pro, so aarch64
doesn't seem to need to skip some tests like riscv64 does.
Thanks for reviewing and merging this patchset!
>> wfmash could be written to detect CPU features at runtime and there is
>> also --tune.
wfmash's speed is one of the critical features it brings to the
table. We might have rendered the package more or less unusable by
disabling CPU optimizations.
I could try talking to upstream about detecting and using CPU features
at runtime. But, I don't know too much about the topic. Any quick links
I can share with them?
> I've pushed these patches with a few tweaks and an additional patch to
> run a test suite based on the github workflow in the repository. The
> entire check phase takes ~2 minutes on my pinebook pro, so aarch64
> doesn't seem to need to skip some tests like riscv64 does.
I think it is better if upstream provides us a `make check' target to
run these tests. I have asked them:
https://github.com/ekg/wfmash/issues/130 Hard-coding their CI tests
makes the package fragile. In the future, upstream could change the
tests they run.
Arun Isaac schreef op vr 01-04-2022 om 13:10 [+0530]:
> wfmash's speed is one of the critical features it brings to the
> table. We might have rendered the package more or less unusable by
> disabling CPU optimizations.
>
> I could try talking to upstream about detecting and using CPU features
> at runtime. But, I don't know too much about the topic. Any quick links
> I can share with them?
For the concept, maybe
<https://hpc.guix.info/blog/2022/01/tuning-packages-for-a-cpu-micro-architecture/>
, which is about combining high performance with reproducibility using
the (Guix-specific) --tune package transformation. To implement run-
time detection, the following could be useful:
* Guix article:
<https://hpc.guix.info/blog/2018/01/pre-built-binaries-vs-performance/>
* LWN article: <https://lwn.net/Articles/691932/>.
Greetings,
Maxime.
Hi Maxime,
> For the concept, maybe
> <https://hpc.guix.info/blog/2022/01/tuning-packages-for-a-cpu-micro-architecture/>
> , which is about combining high performance with reproducibility using
> the (Guix-specific) --tune package transformation. To implement run-
> time detection, the following could be useful:
>
> * Guix article:
> <https://hpc.guix.info/blog/2018/01/pre-built-binaries-vs-performance/>
> * LWN article: <https://lwn.net/Articles/691932/>.
This is helpful. I will read and pass it on to wfmash upstream.
Thanks,
Arun
@@ -16107,3 +16107,54 @@ (define-public ccwl
@acronym{EDSL, Embedded Domain Specific Language} in the Scheme programming
language.")
(license license:gpl3+)))
+
+(define-public wfmash
+ (package
+ (name "wfmash")
+ (version "0.8.1")
+ (source
+ (origin
+ (method url-fetch)
+ (uri (string-append "https://github.com/ekg/wfmash/releases/download/v"
+ version "/wfmash-v" version ".tar.gz"))
+ (sha256
+ (base32
+ "031cm1arpfckvihb28vlk69mirpnmlag81zcscfba1bac58wvr7c"))
+ (modules '((guix build utils)))
+ (snippet
+ '(begin
+ ;; Unbundle atomic-queue.
+ (delete-file-recursively "src/common/atomic_queue")
+ (substitute* "src/align/include/computeAlignments.hpp"
+ (("\"common/atomic_queue/atomic_queue.h\"")
+ "<atomic_queue/atomic_queue.h>"))))))
+ (build-system cmake-build-system)
+ (arguments
+ `(#:tests? #f ; no tests
+ #:phases
+ (modify-phases %standard-phases
+ ;; Remove compiler flags and checks specific to x86 when not
+ ;; targeting it.
+ ,@(if (target-x86-64?)
+ '()
+ '((add-after 'unpack 'remove-x86-specific-compile-flags
+ (lambda _
+ (substitute* (find-files "." "CMakeLists\\.txt")
+ (("-mcx16") "")
+ (("-march=native") ""))
+ (substitute* "src/common/dset64.hpp"
+ (("!__x86_64__") "0")))))))))
+ (inputs
+ (list atomic-queue
+ gsl
+ htslib
+ jemalloc
+ zlib))
+ (synopsis "Base-accurate DNA sequence aligner")
+ (description "@code{wfmash} is a DNA sequence read mapper based on mash
+distances and the wavefront alignment algorithm. It is a fork of MashMap that
+implements base-level alignment via the wflign tiled wavefront global
+alignment algorithm. It completes MashMap with a high-performance alignment
+module capable of computing base-level alignments for very large sequences.")
+ (home-page "https://github.com/ekg/wfmash")
+ (license license:expat)))