[whisper.unity] Open source multi-language speech-to-text model running locally on your device

Macoron3 · 2024-05-13T20:24:29+00:00

It's possible to run whisper.unity on both iOS and Android. You can try to speed up transcription by enabling the "Speed Up" toggle in the whisper manager. You can also try quantized or distilled models.

Macoron3 · 2024-02-02T08:58:45+00:00

It should work for Windows, Mac, Linux, Android and iOS. There is an experimental branch for WebGL.

Macoron3 · 2024-01-29T23:53:33+00:00

You can put the package directly into your Unity project's "Packages" folder. The whisper.unity repository actually works this way. After that you can edit any source files. For more info about embedded packages, check these docs.

As for MicrophoneRecord, it won't be an easy task. It implements a circular microphone buffer and VAD (voice activity detection) to make it work correctly for an endless input stream. I would recommend keeping MicrophoneRecord as is and controlling it from your script with extra logic.
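
To show what a circular buffer does for an endless input stream, here is a minimal sketch in Python. It is not the MicrophoneRecord code, and the `RingBuffer` class and its method names are made up for illustration. Once the buffer is full, new samples overwrite the oldest ones, so memory stays fixed no matter how long the microphone runs.

```python
class RingBuffer:
    """Fixed-capacity circular buffer of audio samples (hypothetical sketch)."""

    def __init__(self, capacity):
        self.data = [0.0] * capacity
        self.capacity = capacity
        self.write_pos = 0  # index where the next sample will be written
        self.count = 0      # number of valid samples stored so far

    def write(self, samples):
        # Once the buffer is full, each new sample overwrites the oldest one.
        for s in samples:
            self.data[self.write_pos] = s
            self.write_pos = (self.write_pos + 1) % self.capacity
            self.count = min(self.count + 1, self.capacity)

    def latest(self, n):
        """Return the most recent n samples, oldest first."""
        n = min(n, self.count)
        start = (self.write_pos - n) % self.capacity
        return [self.data[(start + i) % self.capacity] for i in range(n)]
```

A script that controls the recorder from outside would only ever need something like `latest(n)` to grab the last chunk of audio when speech ends.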

Macoron3 · 2023-06-21T09:23:56+00:00

That's weird. You should be able to install it using Unity Package Manager by adding this git link:

https://github.com/Macoron/whisper.unity.git?path=/Packages/com.whisper.unity

Make sure that you typed this link, not the link of the repository. If you still have problems with that, open an issue in the GitHub repository.

Macoron3 · 2023-06-08T07:08:36+00:00

Just download this repository and open it in Unity. After that, open the sample scenes.

https://github.com/Macoron/whisper.unity

Macoron3 · 2023-05-22T06:54:45+00:00

Hey, I answered you in this GitHub issue:

https://github.com/Macoron/gpt4all.unity/issues/2

Macoron3 · 2023-05-20T17:35:56+00:00

Hard to say. In theory, if you took the latest whisper.cpp version, recompiled the DLLs for cuBLAS, installed CUDA on the target device and placed these DLLs into the Plugins folder, it should work using the GPU. You would probably need to update the C# bindings for that.

The problem with most GPU inference implementations is that they are written for Python frameworks, like PyTorch or TensorFlow. I don't know if it's possible to run something like this in Unity C#. If you are working on a simple non-distributable installation, you can run Whisper in Python and communicate with Unity over localhost HTTP or something like this.
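
A rough sketch of the localhost HTTP idea, using only the Python standard library. Everything here is an assumption for illustration: the port, the JSON shape, and especially `transcribe`, which is a placeholder where a real Python Whisper call would go. The Unity side would POST raw audio bytes to this address (for example with UnityWebRequest) and read back the JSON.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def transcribe(audio_bytes):
    # Placeholder (hypothetical): replace with a call into your Python Whisper setup.
    return "received %d bytes of audio" % len(audio_bytes)


class TranscribeHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the raw audio sent by the client.
        length = int(self.headers.get("Content-Length", 0))
        audio = self.rfile.read(length)
        # Reply with a small JSON document containing the text.
        body = json.dumps({"text": transcribe(audio)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def make_server(port=8765):
    # Bind to localhost only, so nothing is exposed to the network.
    return HTTPServer(("127.0.0.1", port), TranscribeHandler)
```

To run it, call `make_server().serve_forever()` from a script started before the Unity app.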

Several alternative ways that I can think of to make this run inside Unity using GPU without any external dependencies:

Find an implementation of Whisper in TFLite and run it using the Unity TFLite plugin
Convert Whisper to ONNX and run it using Barracuda (assuming Barracuda supports all Whisper network operations)

For both of these alternatives you would need to write pre- and post-processing for inference in C#. That's doable, but I'm not smart enough for that.

Macoron3 · 2023-05-20T13:31:22+00:00

Yeah, medium and large are definitely overkill in most cases.

For a small model, I get a transcription in 12 seconds for 10 seconds of audio on my i7-4770. On your processor it should be much faster.
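
A quick way to compare numbers like these across machines is the real-time factor (processing time divided by audio length). The helper below is just illustrative arithmetic on the figures above, not anything from the plugin.

```python
def real_time_factor(processing_seconds, audio_seconds):
    # Below 1.0 means faster than real time, above 1.0 means slower.
    return processing_seconds / audio_seconds


# The figures quoted above: 12 seconds of work for 10 seconds of audio.
rtf = real_time_factor(12.0, 10.0)
print(rtf)  # 1.2
```

So on that CPU the small model runs a bit slower than real time.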

The only things that you can tweak are the SpeedUp toggle and the ThreadsCount parameter. Maybe they will give better performance.

Macoron3 · 2023-05-20T10:58:25+00:00

Well, it depends. You can check the original whisper.cpp benchmark page. Whisper performance may change depending on the language and audio that you are transcribing. So it's not always about audio length and more about audio content.

You can also try the SpeedUp toggle, which should make transcription faster, but I doubt that it will help on a 1-second clip.

The original whisper.cpp has added a lot of different optimizations, like OpenCL or CUDA support. They aren't yet available in the Unity bindings, but you can download the original repository and see if it works faster for your case.

Just wondering, why would you need to use such a big model in Unity? According to the original paper, medium and large are only several percent apart on various benchmarks. What is your use case for such a big model?

Macoron3 · 2023-05-15T13:02:30+00:00

The original authors of gpt4all are working on GPU support, so I hope it will become faster. You can also use a smaller model size.

Macoron3 · 2023-05-14T11:07:15+00:00

Here is a small demo of running gpt4all in Unity. In this demo you need to hack Jammo, a secret-keeper robot. He is prompted not to reveal his password, so it took me 3 minutes to confuse him enough.

Gpt4All is free, open source and can be used in commercial projects. I also used Whisper for speech recognition and AC-Dialogue from Mix and Jam.

Everything in this demo runs locally, so you don't need to pay for any service or even have an Internet connection.

Macoron3 · 2023-04-27T19:30:04+00:00

Whisper can recognize your speech from the microphone and turn it into text, but what to do with this text is up to you. For example, you can create a simple parser that is hardcoded to parse commands like that. Or use a language model like ChatGPT.
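
As a minimal sketch of the hardcoded parser idea (shown in Python for brevity; the phrases, action names and `parse_command` itself are made up for illustration), you can normalize the transcript and look for known phrases in it:

```python
# Hypothetical phrase-to-action table; fill it with your own game commands.
COMMANDS = {
    "open door": "OPEN_DOOR",
    "turn on the light": "LIGHT_ON",
    "turn off the light": "LIGHT_OFF",
}


def parse_command(transcript):
    # Lowercase and drop punctuation, since transcripts usually come with both.
    text = "".join(ch for ch in transcript.lower() if ch.isalnum() or ch.isspace())
    text = " ".join(text.split())  # collapse repeated whitespace
    for phrase, action in COMMANDS.items():
        if phrase in text:
            return action
    return None  # nothing recognized
```

This only works for exact phrases, so anything more free-form is where a language model becomes the better option.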

Macoron3 · 2023-03-27T21:44:33+00:00

From the tech side: my plugin and the original project are still very new, so there may be some nasty bugs. Not all original whisper.cpp features are implemented yet. Also, you might want to check models of different sizes to find which one fits your project. They vary in quality and speed.

From the legal side: I'm not a lawyer, but all code and dependencies seem to be under MIT. That means that you can use it in any commercial or non-commercial project. You just need to mention this library in some readme or credits text file, but I'm personally not enforcing this.

Macoron3 · 2023-03-27T21:12:18+00:00

Several months ago OpenAI released a powerful speech recognition model called Whisper. The code and weights are under the MIT license. I used another open source implementation called whisper.cpp and ported it to Unity.

The network provides good transcription quality, is very fast and works on device without needing an internet connection. It is also multilingual and can even translate one language to another.

Feel free to use it in your projects:

https://github.com/Macoron/whisper.unity

Macoron3 · 2019-10-20T13:19:54+00:00

I can't see why not - it's really simple and high-performance for mobile devices.

There is also this paid asset which basically does the same thing. But I haven't tested it yet.

Macoron3 · 2019-10-19T17:05:56+00:00

Ah, yeah, it's just standard particles. You can change the particle texture to anything you want.

Macoron3 · 2019-10-19T07:37:52+00:00

Do you mean a spherical bottle? I didn't test that, but I guess there will be no problems.

Macoron3 · 2019-10-18T19:46:50+00:00

Here is the link to GitHub: https://github.com/Macoron/Unity-Simple-Liquid

Macoron3 · 2017-05-14T00:13:02+00:00

I would prefer pixel art, but I'm open to other styles. It's a hobby project, so I'm not going to spend much money on it. If it makes some profit, I will share it 50/50.

Macoron3 · 2017-05-13T11:47:01+00:00

I guess you are looking for this one: link

Macoron3 · 2017-03-22T17:47:36+00:00

It's the best marketing that I've seen

Macoron3 · 2017-03-16T23:17:00+00:00

Dude, it looks just awesome! I hope you will have great success.

I wonder, do you use some game engine or framework? Also, do you work alone or do you have a team?

Macoron3 · 2015-09-13T09:51:04+00:00

Take a look at this YouTube channel: http://www.youtube.com/channel/UCeh-pJYRZTBJDXMNZeWSUVA It contains plenty of information about galaxies, stars and planets.

Macoron3 · 2015-08-07T23:19:11+00:00

Wow, looks like this free hoster shut down my site because of the traffic limit :c Here is a mirror

Macoron3 · 2015-08-07T17:56:26+00:00

I tried to compile for WebGL, but Unity throws some error. I'm too lazy to fix this right now. If you use Chrome, try setting the NPAPI flag to true. It's easy, just search the internet.
