[bug#72027] gnu: Add whisper-cpp.

Message ID 0c08fdb11c2641ece7d267340877c58f144df110.1720587166.git.atai@atai.org
State New
Series [bug#72027] gnu: Add whisper-cpp.

Commit Message

Andy Tai July 10, 2024, 4:53 a.m. UTC
* gnu/packages/machine-learning.scm (whisper-cpp): New variable.

Change-Id: If191f434a3f66b16afdc702069eaed1ca22e2427
---
 gnu/packages/machine-learning.scm | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)


base-commit: e13f7d48e5b989f5dbd27c19ac81989ec6b3ec6e

Comments

Juliana Sims Aug. 23, 2024, 2:15 p.m. UTC | #1
Hi Andy,

Thanks for submitting this patch!  I'm suffering from RSI pain at the 
moment and was tasked, as I ease back into working, with packaging this 
and the associated Emacs mode to help avoid RSI issues in the future.  
Seeing there is already a patch up, I am quite excited and thankful I 
don't have to write one myself!

I had planned to review this patch, but I notice there are several 
TODOs.  So instead, I will ask: what is the status of this patch?  Is 
it in a state where it is ready for review, or is there more work to be 
done first?  If it is the latter, what (if anything) can I do to help 
out?

Thanks,
Juli
Juliana Sims Aug. 23, 2024, 4:05 p.m. UTC | #2
Well, I know I just asked if this was ready for review, but I figured 
I'd go ahead and do some "low-hanging fruit review" while I wait for 
you to get a chance to respond ;)

Fortunately, most of what I've noticed so far is minor and stylistic.  
There are some bigger, discussion-opening comments near the end.

Firstly, TODO comments are conventionally spelled as one word.  This is 
not grammatically correct, nor is it a requirement, but it is the 
convention and folks searching files for such comments will likely 
search for "TODO" rather than "TO DO".

Secondly, the synopsis should probably either drop the "Port of" or 
change "in" to "to".  I don't think this is a major issue; it just 
flows better that way, imo.  I wouldn't oppose merging over this, so 
if you prefer it the way it is, that's fine.
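
To make that concrete, one possible synopsis (just a sketch of the 
second option; either of the two changes would be fine by me):

  (synopsis "Port of OpenAI's Whisper model to C/C++")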

Next, the description wants expansion.  It should be three to five 
sentences.  The sentence there now is a fine start, though it needs 
ending punctuation.  Perhaps a sentence explaining how this package can 
be used ("whisper-cpp can be integrated with other programs to provide 
speech-to-text support" or something) and one explaining what makes it 
unique ("because whisper-cpp uses a leading speech recognition model, 
it is able to perform speech recognition rapidly with relatively few 
resources").  You could perhaps mention that it can run on CPU as well 
as GPU, that it offers some integrations with certain hardware, or so 
on.  Whatever you think is important or interesting; these are just 
some ideas :)
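
To sketch what I mean (the wording is entirely up to you; these example 
sentences just combine the existing description with the ideas above and 
are not claims I have verified myself):

  (description
   "This package provides a high-performance inference implementation of
OpenAI's Whisper automatic speech recognition (ASR) model, written in
plain C/C++ without dependencies.  It can be integrated with other
programs to provide speech-to-text functionality, and it can run on
CPUs as well as GPUs.")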

Now we enter the discussion-launching comments.

I notice that ggml is vendored.  This one is tricky.  Firstly, there is 
no standalone ggml package (yet; I saw your patch 65284).  Usually, 
Guix only asks that dependencies be unvendored if there is already a 
standalone package for the dependency, so unless and until that is 
merged there is no real issue.  Secondly, though, and this touches on 
the ggml patches, it seems that there are no formal releases of ggml 
yet, and development is happening in the repositories for whisper-cpp 
and llama-cpp as well as in ggml's own repository.  It seems these three 
versions of ggml are all slightly out of sync with each other.  It 
would be nice if upstream used git submodules to ensure their work was 
synchronized.  Alas, we as Guix can't do anything about that, and we 
must ensure the packages we offer work correctly.  The inconsistencies 
between these versions of ggml make me think packaging it separately 
would risk breakage.  With that in mind, might it be best to drop the 
standalone ggml patchset and just let llama-cpp and whisper-cpp vendor 
their versions?  While suboptimal because it results in building "the 
same package" multiple times, I would argue that the divergence in the 
code means they are not, in fact, the same package.

Finally, is this package complete?  Looking at the store directory for 
the package, I see headers and the like but no actual models.  Is this 
sufficient for running inference?  Are client libraries or programs 
supposed to install models themselves?  Or can this package be used to 
generate models as described in the project's README?  If not, should 
it be able to?  I am admittedly fairly ignorant about the machine 
learning ecosystem, so feel free to explain as much as you think may be 
necessary.  My goal in these questions is ensuring users get what they 
expect from this package.

Relatedly, if this package is complete but requires further setup, I 
would strongly support explaining that in the package description.  As 
a user, I've encountered a few packages that require extra setup but 
don't mention it, and it's frustrating and confusing to discover this 
only when trying to use the package, with no help from Guix in making 
things work properly.  (Another step beyond this 
may be offering a system service which performs configuration, but that 
can be a future, separate patch.)

To circle back to what I mentioned in my first email, I would like to 
package the whisper.el Emacs mode[1].  Currently, whisper.el plans to 
install and compile whisper.cpp and its models itself; I think we as 
Guix should make this unnecessary for an imagined future emacs-whisper 
package.
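
To gesture at what I have in mind, here is a deliberately incomplete 
skeleton of such a package.  The version, source, license, and the exact 
mechanism for pointing whisper.el at our binary are all placeholders I 
have not checked, so please treat this as a sketch rather than something 
that would build:

  (define-public emacs-whisper
    (package
      (name "emacs-whisper")
      (version "0")                      ;placeholder
      (source #f)                        ;pinned whisper.el checkout goes here
      (build-system emacs-build-system)
      (inputs (list whisper-cpp))
      ;; A build phase would rewrite whisper.el's references to its own
      ;; whisper.cpp checkout so that it calls the whisper-cpp binary from
      ;; 'inputs' instead of downloading and compiling its own copy.
      (home-page "https://github.com/natrys/whisper.el")
      (synopsis "Speech-to-text input for Emacs using whisper-cpp")
      (description "To be written.")
      (license #f)))                     ;to be confirmed upstream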

Looking forward to hearing from you,
Juli

[1] https://github.com/natrys/whisper.el
Andy Tai Aug. 28, 2024, 2:53 a.m. UTC | #3
I will take a look at your review.  Thanks.


Andy Tai Aug. 29, 2024, 4:11 a.m. UTC | #4
Juliana, I have made some minor adjustments per your review in a new
patch.  I think the patch is usable but not complete; integration with
FFmpeg is missing.  So feel free to take over the patch and update it
with anything you see missing, such as information on ML models.
Generally I do not package ML models together with the software, since
whether a model is free can be tricky to determine, so I avoid that.
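
If it helps, one possible direction for the FFmpeg part might look like
the following.  This is an untested sketch: I have not confirmed the
exact CMake option name for this release, and ffmpeg would need to come
from (gnu packages video), so treat both as assumptions.

  (define-public whisper-cpp-with-ffmpeg    ;hypothetical variant
    (package
      (inherit whisper-cpp)
      (name "whisper-cpp-with-ffmpeg")
      (inputs (list ffmpeg))                ;from (gnu packages video)
      (arguments
       ;; Assumed option name; verify against upstream's CMakeLists.txt.
       '(#:configure-flags (list "-DWHISPER_FFMPEG=ON")))))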


Patch

diff --git a/gnu/packages/machine-learning.scm b/gnu/packages/machine-learning.scm
index 8ca9a6376e..61c25b49da 100644
--- a/gnu/packages/machine-learning.scm
+++ b/gnu/packages/machine-learning.scm
@@ -5982,6 +5982,29 @@  (define-public oneapi-dnnl-for-r-torch
        (sha256
         (base32 "1zyw5rd8x346bb7gac9a7x3saviw3zvp6aqz2z1l9sv163vmjfz6"))))))
 
+(define-public whisper-cpp
+    (package
+      (name "whisper-cpp")
+      (version "1.6.2")
+      (source
+       (origin
+         (method git-fetch)
+         (uri (git-reference
+               (url "https://github.com/ggerganov/whisper.cpp")
+               (commit (string-append "v" version))))
+         (file-name (git-file-name name version))
+         (sha256
+          (base32 "01q4j602wkvsf9vw0nsazzgvjppf4fhpy90vqnm9affynyxhi0c4"))))
+      (build-system cmake-build-system)
+      (properties '((tunable? . #true))) ;use AVX512, FMA, etc. when available
+      (home-page "https://github.com/ggerganov/whisper.cpp")
+      (synopsis "Port of OpenAI's Whisper model in C/C++ ")
+      (description "This package is a high-performance inference of OpenAI's
+Whisper automatic speech recognition (ASR) model, implemented in plain C/C++
+without dependencies")
+      (license license:expat)))
+
+
 (define-public python-gguf
   (package
     (name "python-gguf")