diff mbox series

[bug#58140,2/6] gnu: Add kaldi-for-vosk.

Message ID 20220928115755.6292-2-ngraves@ngraves.fr
State Accepted
Headers show
Series [bug#58140,1/6] gnu: Add openfst-for-vosk. | expand

Checks

Context Check Description
cbaines/comparison success View comparision
cbaines/git-branch success View Git branch
cbaines/applying patch success View Laminar job
cbaines/issue success View issue

Commit Message

Nicolas Graves Sept. 28, 2022, 11:57 a.m. UTC
* gnu/packages/machine-learning.scm (kaldi-for-vosk): New variable.
---
 gnu/packages/machine-learning.scm | 98 +++++++++++++++++++++++++++++++
 1 file changed, 98 insertions(+)

Comments

M Sept. 28, 2022, 9:05 p.m. UTC | #1
>+            (add-after 'unpack 'chdir
 >+              (lambda _ (chdir "src") #t))

Trailing #t haven't been required since a long time.

On 28-09-2022 13:57, Nicolas Graves via Guix-patches via wrote:
> +                (when (not (or (string-prefix? "x86_64" system)
> +                               (string-prefix? "i686" system)))
> +                  (substitute* "makefiles/linux_openblas.mk"
> +                    (("-msse -msse2") "")))

You are testing for the system it is being compiled on ('system'), not 
the system it is being compiled for ('target').  For cross-compilation, 
usually the latter is required.  You may find
#$@(if (target-x86?) #~((substitute* ...) #~())
useful, it automatically handles cross-compilation, both 32-bit and 
64-bit, a hypothetical i786 it it ever starts existing, i586 if we ever 
lower the requirements ...

> +                     (string-append "OPENBLASROOT=\"" #$openblas "\""))

#$PACKAGE does not compose with --with-input.  I recommend doing 
something like
(dirname (dirname (search-input-file inputs "lib/libblas.so")))
instead.  Likewise for other uses of #$PACKAGE.

> +                  (setenv "OPENFST_VER" #$(package-version openfst))

I recommend referring to 'openfst' with 'this-package-input', that way 
when --with-input or --with-latest is used to change it to something 
with a different version, the version should still be detetcxed properly.

> +                (substitute* "./Makefile"
> +                  (("USE_SHARED\\?=0")
> +                   "USE_SHARED?=1")

IIUC, ?=0 means that 0 is the default, you can override it by setting 
#:make-flags.

> dialects - English, 

I think one of the special dashes (en dashes, em dashes, figure dash? 
Don't know which one) would be appropriate here.

> +(define-public python-nerd-dictation
> +  (let* ((commit "53ab129a5ee0f8b5df284e8cf2229219b732c59e")
> +         (revision "0"))
> +    (package
> +      (name "python-nerd-dictation")
> +      (version (git-version "0" revision commit))

Going by <https://github.com/ideasman42/nerd-dictation>, 'python-' is 
not part of its name, you can drop the prefix AFAICT.

> +      (synopsis "Offline speech-to-text for desktop Linux")

If it's Linux only, you can use the 'supported-systems' field for that, 
see (gnu packages linux) for examples.

> +                        #$(file-append bash-minimal "/bin/bash")

You can use 'search-input-file' to avoid depending on input labels.

> +          (let* ((out (assoc-ref %outputs "out"))

If you are using G-exps, you can replace (assoc-ref %outputs "out") with 
its G-exp equivalent #$output.

> +(define openfst-for-vosk
> +  (package
> +    (inherit openfst)
> +    (version "1.8.0")
> +    (source
> +     (origin
> +       (method url-fetch)
> +       (uri (string-append "http://www.openfst.org/twiki/pub/FST/"
> +                           "FstDownload/openfst-" version ".tar.gz"))
> +       (sha256
> +        (base32 "0h2lfhhihg63b804hrcljnkggijbjmp84i5g8q735wb09y9z2c4p"))))

Why select an older version?  Would keeping the original (and more 
up-to-date) version work?  To avoid a name conflict between the openfst 
(which would be inconvenient for "guix show", "guix install", "guix 
shell"), you can override the 'name' field.

Greetings,
Maxime.
\( Sept. 29, 2022, 6:44 a.m. UTC | #2
On Wed Sep 28, 2022 at 10:05 PM BST, Maxime Devos wrote:
> > +                     (string-append "OPENBLASROOT=\"" #$openblas "\""))
>
> #$PACKAGE does not compose with --with-input.  I recommend doing 
> something like
> (dirname (dirname (search-input-file inputs "lib/libblas.so")))
> instead.  Likewise for other uses of #$PACKAGE.

Why not ``#$(this-package-input openblas)''?

    -- (
Nicolas Graves Sept. 29, 2022, 8:20 a.m. UTC | #3
> Trailing #t haven't been required since a long time.
A big part of the code, and in particular old forms, come from the code
of the current kaldi package. Should I also change the same code chunks
for kaldi in an additional patch ?

> If it's Linux only, you can use the 'supported-systems' field for that,
> see (gnu packages linux) for examples.
I don't really know that. Ydotool probably only work on Linux, since
they rely on linux keycodes. I don't know for X. Maybe someone should
test. Should I suppose it only supports Linux by default?

> Why select an older version?  Would keeping the original (and more
> up-to-date) version work?  To avoid a name conflict between the openfst
> (which would be inconvenient for "guix show", "guix install", "guix
> shell"), you can override the 'name' field.

No, it doesn't work and that's the reason why I used this version. It
might however work with the version that's present for kaldi (1.7.3
IIRC), I can test that. But the flags aren't the same, so we probably
should do another package anyway.

I didn't change the name, but I also haven't exported the variable
(define instead of define-public). I expected the package to not be
available through "guix search" or "guix install". Is that OK?

I've done a V3, testing it before sending. 

--
Best regards,
Nicolas Graves
M Sept. 29, 2022, 9:32 a.m. UTC | #4
On 29-09-2022 10:20, Nicolas Graves wrote:
> 
>> Trailing #t haven't been required since a long time.
> A big part of the code, and in particular old forms, come from the code
> of the current kaldi package. Should I also change the same code chunks
> for kaldi in an additional patch ?

That would be nice, but not required I'd say.

>> If it's Linux only, you can use the 'supported-systems' field for that,
>> see (gnu packages linux) for examples.
> I don't really know that. Ydotool probably only work on Linux, since
> they rely on linux keycodes. I don't know for X. Maybe someone should
> test. Should I suppose it only supports Linux by default?

I think that usually 'if it works on Linux, it probably can work on 
similar-ish systems as well’ is a reasonable assumption, but perhaps 
with the keycodes, it isn't.

However, if the problem is in 'ydotool', you can mention that in the 
supported-systems of 'ydotool', 'supported-systems' has a kind of 
implicit transitivity going by the use of 
package-transitive-supported-systems in (guix ui).

>> Why select an older version?  Would keeping the original (and more
>> up-to-date) version work?  To avoid a name conflict between the openfst
>> (which would be inconvenient for "guix show", "guix install", "guix
>> shell"), you can override the 'name' field.
> 
> No, it doesn't work and that's the reason why I used this version.

In that case, I recommend adding a comment to the definition, to avoid 
the risk of someone 'helpfully' updating the package anyway, and an 
upstream report, such that upstream can address the compatibility 
problems with the new version.

> It
> might however work with the version that's present for kaldi (1.7.3
> IIRC), I can test that. But the flags aren't the same, so we probably
> should do another package anyway.
> 
> I didn't change the name, but I also haven't exported the variable
> (define instead of define-public). I expected the package to not be
> available through "guix search" or "guix install". Is that OK?

I suppose it is OK, though personally I think it might be a bit 
confusing, e.g. to people using "guix shell -D ..." ending up with a 
package version in their environment that they can't find with "guix 
search".

Greetings,
Maxime.
diff mbox series

Patch

diff --git a/gnu/packages/machine-learning.scm b/gnu/packages/machine-learning.scm
index 3ad907e0c9..9b2e01c102 100644
--- a/gnu/packages/machine-learning.scm
+++ b/gnu/packages/machine-learning.scm
@@ -1596,6 +1596,104 @@  (define-public kaldi
 written in C++.")
       (license license:asl2.0))))
 
+(define kaldi-for-vosk
+  (let* ((commit "6417ac1dece94783e80dfbac0148604685d27579")
+         (revision "0")
+         (openfst openfst-for-vosk))
+    (package
+      (inherit kaldi)
+      (name "kaldi")
+      (version (git-version "0" revision commit))
+      (source
+       (origin
+         (method git-fetch)
+         (uri (git-reference
+               (url "https://github.com/alphacep/kaldi")
+               (commit commit)))
+         (file-name (git-file-name name version))
+         (sha256
+          (base32 "04xw2dpfvpla8skpk08azmgr9k97cd8hn83lj4l85q165gbzql4s"))))
+      (inputs
+       (list alsa-lib
+             ;; `(,gfortran "lib") ;; replaced by lapack
+             lapack
+             glib
+             gstreamer
+             jack-1
+             openblas
+             openfst
+             portaudio
+             python))
+      (arguments
+       (list
+        #:test-target "test"
+        #:make-flags ''("online2" "lm" "rnnlm")
+        #:phases
+        #~(modify-phases %standard-phases
+            (add-after 'unpack 'chdir
+              (lambda _ (chdir "src") #t))
+            (replace 'configure
+              (lambda* (#:key build system inputs outputs #:allow-other-keys)
+                (when (not (or (string-prefix? "x86_64" system)
+                               (string-prefix? "i686" system)))
+                  (substitute* "makefiles/linux_openblas.mk"
+                    (("-msse -msse2") "")))
+                (substitute* "makefiles/default_rules.mk"
+                  (("/bin/bash") (which "bash")))
+                (substitute* "Makefile"
+                  (("ext_depend: check_portaudio")
+                   "ext_depend:"))
+                (substitute* '("online/Makefile"
+                               "onlinebin/Makefile"
+                               "gst-plugin/Makefile")
+                  (("../../tools/portaudio/install")
+                   (assoc-ref inputs "portaudio")))
+                (substitute* "matrix/Makefile"     ;temporary test bypass
+                  (("matrix-lib-test sparse-matrix-test") ""))
+
+                ;; This `configure' script doesn't support variables passed as
+                ;; arguments, nor does it support "prefix".
+                (let ((out (assoc-ref outputs "out")))
+                  (substitute* "configure"
+                    (("check_for_slow_expf;") "")
+                    ;; This affects the RPATH and also serves as the installation
+                    ;; directory.
+                    (("KALDILIBDIR=`pwd`/lib")
+                     (string-append "KALDILIBDIR=" out "/lib"))
+                    (("OPENBLASROOT=\\\"\\$\\(rel2abs ..\\/tools\\/OpenBLAS\\/install\\)\\\"")
+                     (string-append "OPENBLASROOT=\"" #$openblas "\""))
+                    (("-L\\$OPENBLASLIBDIR -l:libopenblas.a -l:libblas.a -l:liblapack.a -l:libf2c.a")
+                     (string-append "-L$OPENBLASLIBDIR -lopenblas "
+                                    "-L" #$lapack "/lib -lblas -llapack")))
+                  (mkdir-p out) ; must exist
+                  (setenv "CONFIG_SHELL" (which "bash"))
+                  (setenv "OPENFST_VER" #$(package-version openfst))
+                  (invoke "./configure"
+                          "--use-cuda=no"
+                          "--mathlib=OPENBLAS_CLAPACK"
+                          "--shared"
+                          (string-append "--fst-root=" #$openfst)))))
+            (add-after 'configure 'optimize-build
+                       (lambda _ (substitute* "kaldi.mk" ((" -O1") " -O3"))))
+            (replace 'install
+              (lambda* (#:key outputs #:allow-other-keys)
+                (let* ((out (assoc-ref outputs "out"))
+                       (inc (string-append out "/include"))
+                       (lib (string-append out "/lib")))
+                  (mkdir-p lib)
+                  ;; The build phase installed symlinks to the actual
+                  ;; libraries.  Install the actual targets.
+                  (for-each (lambda (file)
+                              (let ((target (readlink file)))
+                                (delete-file file)
+                                (install-file target lib)))
+                            (find-files lib "\\.so"))
+                  ;; Install headers
+                  (for-each (lambda (file)
+                              (let ((target-dir (string-append inc "/" (dirname file))))
+                                (install-file file target-dir)))
+                            (find-files "." "\\.h")))))))))))
+
 (define-public gst-kaldi-nnet2-online
   (let ((commit "cb227ef43b66a9835c14eb0ad39e08ee03c210ad")
         (revision "2"))