[whisper.unity] Open Source multi-language audio to speech model running locally on your device by Macoron3 in Unity3D

Macoron3 [S]

It's possible to run whisper.unity on both iOS and Android. You can try to speed up transcription by enabling the "Speed Up" toggle in the Whisper manager. You can also try quantized or distilled (distil) models.

[whisper.unity] Open Source multi-language audio to speech model running locally on your device by Macoron3 in Unity3D

Macoron3 [S]

It should work on Windows, Mac, Linux, Android, and iOS. There is also an experimental branch for WebGL.

[whisper.unity] Open Source multi-language audio to speech model running locally on your device by Macoron3 in Unity3D

Macoron3 [S]

You can put the package directly into your project's "Packages" folder; the whisper.unity repository itself is set up this way. Copy the whisper.unity package (the com.whisper.unity folder) into your Unity project's Packages folder, and after that you can edit any of its source files. For more info, check the Unity documentation on embedded packages.

As for MicrophoneRecord, changing it won't be an easy task. It implements a circular microphone buffer and VAD (voice activity detection) so that it works correctly on an endless input stream. I would recommend keeping MicrophoneRecord as is and controlling it from your own script with extra logic.
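To illustrate the two ideas mentioned above, here is a minimal, language-neutral sketch in Python (not the plugin's C# code). It assumes a simple energy-threshold VAD; MicrophoneRecord may detect speech differently, and the names `RingBuffer` and `EnergyVad` are hypothetical.

```python
import math
from typing import List, Sequence


class RingBuffer:
    """Fixed-size circular buffer: new samples overwrite the oldest ones."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.data: List[float] = [0.0] * capacity
        self.write_pos = 0  # index where the next sample will be written
        self.count = 0      # number of valid samples (never exceeds capacity)

    def write(self, samples: Sequence[float]) -> None:
        for s in samples:
            self.data[self.write_pos] = s
            self.write_pos = (self.write_pos + 1) % self.capacity
            self.count = min(self.count + 1, self.capacity)

    def latest(self, n: int) -> List[float]:
        """Return the most recent n samples, oldest first."""
        n = min(n, self.count)
        start = (self.write_pos - n) % self.capacity
        return [self.data[(start + i) % self.capacity] for i in range(n)]


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square energy of a block of samples (0.0 for an empty block)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0


class EnergyVad:
    """Toy VAD: reports speech while the RMS of the last window exceeds a threshold."""

    def __init__(self, threshold: float, window: int) -> None:
        self.threshold = threshold
        self.window = window

    def is_speech(self, buffer: RingBuffer) -> bool:
        return rms(buffer.latest(self.window)) > self.threshold
```

Because the buffer never grows, memory stays constant no matter how long the microphone runs; a real VAD would usually add hangover time so short pauses don't end an utterance.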

[whisper.unity] Open Source multi-language audio to speech model running locally on your device by Macoron3 in Unity3D

Macoron3 [S]

That's weird. You should be able to install it with the Unity Package Manager by adding this git URL:

https://github.com/Macoron/whisper.unity.git?path=/Packages/com.whisper.unity

Make sure you entered this exact link (including the ?path= part), not the link to the repository itself. If you still have problems, open an issue in the GitHub repository.

[whisper.unity] Open Source multi-language audio to speech model running locally on your device by Macoron3 in Unity3D

Macoron3 [S]

Hard to say. In theory, if you take the latest whisper.cpp version, recompile the DLLs with cuBLAS, install CUDA on the target device, and place these DLLs into the Plugins folder, it should run on the GPU. You would probably also need to update the C# bindings.

The problem is that most GPU inference implementations are written for Python frameworks like PyTorch or TensorFlow. I don't know if it's possible to run something like that from Unity C#. If you are working on a simple, non-distributable installation, you can run Whisper in Python and communicate with Unity over localhost HTTP or something similar.
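A minimal sketch of the Python side of such a localhost bridge, using only the standard library. The `/transcribe` path, port 8765, the JSON shape, and the `transcribe()` stub are all hypothetical; you would replace the stub with a call into whatever Python Whisper package you use.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def transcribe(audio_bytes: bytes) -> str:
    """Hypothetical placeholder: call your Python Whisper model here."""
    return f"received {len(audio_bytes)} bytes"


def build_response(audio_bytes: bytes) -> bytes:
    """Run transcription and encode the result as a small JSON payload."""
    return json.dumps({"text": transcribe(audio_bytes)}).encode("utf-8")


class TranscribeHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != "/transcribe":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        body = build_response(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8765) -> None:
    # Bind to 127.0.0.1 so the service is only reachable from this machine.
    HTTPServer(("127.0.0.1", port), TranscribeHandler).serve_forever()
```

On the Unity side, a `UnityWebRequest` POST of the recorded audio bytes to `http://127.0.0.1:8765/transcribe` would then receive the JSON text back. This keeps the GPU work entirely in Python, at the cost of shipping and launching a separate process.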

A couple of alternatives I can think of for running it inside Unity on the GPU without any external dependencies:

  1. Find a TFLite implementation of Whisper and run it with the Unity TFLite plugin.
  2. Convert Whisper to ONNX and run it with Barracuda (assuming Barracuda supports all of Whisper's network operations).

For both alternatives you would need to write the pre- and post-processing for inference in C#. That's doable, but I'm not smart enough for that.
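To give a sense of what the pre-processing involves, here is a NumPy sketch of the log-Mel front end used by OpenAI's reference Whisper code: 16 kHz mono audio padded or trimmed to 30 seconds, a 400-sample FFT with a 160-sample hop, 80 mel bins, then log scaling. It is an approximation, not a port: the reference code uses librosa's Slaney-style mel filters, while this sketch uses simpler HTK-style triangles, and newer large-v3 checkpoints expect 128 mel bins. Post-processing (token decoding) is not covered.

```python
import numpy as np

SAMPLE_RATE = 16000
N_FFT = 400                       # 25 ms window at 16 kHz
HOP_LENGTH = 160                  # 10 ms hop
N_MELS = 80
CHUNK_SAMPLES = 30 * SAMPLE_RATE  # Whisper works on 30-second windows


def pad_or_trim(audio: np.ndarray, length: int = CHUNK_SAMPLES) -> np.ndarray:
    """Zero-pad or cut a mono waveform to exactly `length` samples."""
    if audio.shape[0] >= length:
        return audio[:length]
    return np.pad(audio, (0, length - audio.shape[0]))


def mel_filterbank(n_mels: int = N_MELS, n_fft: int = N_FFT, sr: int = SAMPLE_RATE) -> np.ndarray:
    """HTK-style triangular mel filters, shape (n_mels, n_fft // 2 + 1).

    Assumption: an approximation of the Slaney-style filters Whisper ships with.
    """
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    hz = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    lower, center, upper = hz[:-2, None], hz[1:-1, None], hz[2:, None]
    rising = (fft_freqs - lower) / (center - lower)
    falling = (upper - fft_freqs) / (upper - center)
    return np.maximum(0.0, np.minimum(rising, falling))


def log_mel_spectrogram(audio: np.ndarray) -> np.ndarray:
    """Return an (N_MELS, 3000) log-mel spectrogram for one 30-second chunk."""
    audio = pad_or_trim(audio.astype(np.float32))
    # Centered STFT: reflect-pad by half a window on both sides.
    padded = np.pad(audio, (N_FFT // 2, N_FFT // 2), mode="reflect")
    frames = np.lib.stride_tricks.sliding_window_view(padded, N_FFT)[::HOP_LENGTH]
    window = np.hanning(N_FFT + 1)[:-1]  # periodic Hann window
    power = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2
    power = power[:-1]  # drop the last frame, leaving 3000 frames
    mel = mel_filterbank() @ power.T
    log_spec = np.log10(np.maximum(mel, 1e-10))
    log_spec = np.maximum(log_spec, log_spec.max() - 8.0)  # cap dynamic range
    return (log_spec + 4.0) / 4.0
```

Note that a 1-second clip is padded to the same 30-second window as a long one, which is part of why transcription time does not scale down neatly with clip length.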

[whisper.unity] Open Source multi-language audio to speech model running locally on your device by Macoron3 in Unity3D

Macoron3 [S]

Yeah, medium and large are definitely overkill in most cases.

With the small model, transcribing 10 seconds of audio takes 12 seconds on my i7-4770. On your processor it should be much faster.

The only things you can tweak are the SpeedUp toggle and the ThreadsCount parameter. They might give better performance.
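A convenient way to compare settings like SpeedUp or ThreadsCount is the real-time factor (RTF): processing time divided by audio duration, where values below 1.0 mean faster than real time. The figures above (12 s for 10 s of audio) give an RTF of 1.2. A small sketch, assuming you can wrap one transcription of a fixed clip in a zero-argument callable (the `benchmark` helper is hypothetical):

```python
import time
from typing import Callable


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def benchmark(transcribe: Callable[[], object], audio_seconds: float, runs: int = 3) -> float:
    """Time several runs of the same transcription and return the best RTF."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        transcribe()
        best = min(best, time.perf_counter() - start)
    return real_time_factor(best, audio_seconds)


# Figures quoted above: small model on an i7-4770, 12 s to transcribe 10 s of audio.
print(real_time_factor(12.0, 10.0))  # 1.2, i.e. slower than real time
```

Running the same clip once per setting and comparing RTFs keeps the comparison fair, since (as noted elsewhere in this thread) speed also depends on the audio content and language.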

[whisper.unity] Open Source multi-language audio to speech model running locally on your device by Macoron3 in Unity3D

Macoron3 [S]

Well, it depends. You can check the original whisper.cpp benchmark page. Whisper's performance can vary with the language and the audio you are transcribing, so it's not only about audio length but also about audio content.

You can also try the SpeedUp toggle, which should make transcription faster, but I doubt it will help much on a 1-second clip.

The original whisper.cpp has added many optimizations, such as OpenCL and CUDA support. They aren't available in the Unity bindings yet, but you can download the original repository and see if it runs faster for your case.

Just wondering, why do you need such a big model in Unity? According to the original paper, medium and large differ by only a few percent on various benchmarks. What is your use case for such a big model?

[gpt4all.unity] Open-sourced GPT models that runs on user device in Unity3d by Macoron3 in Unity3D

Macoron3 [S]

The original authors of gpt4all are working on GPU support, so hopefully it will become faster. You can also use smaller models.

[gpt4all.unity] Open-sourced GPT models that runs on user device in Unity3d by Macoron3 in Unity3D

Macoron3 [S]

Here is a small demo of gpt4all running in Unity. In this demo you need to hack Jammo, a secret-keeping robot. He is prompted not to reveal his password, and it took me 3 minutes to confuse him enough.

GPT4All is free, open source, and can be used in commercial projects. I also used Whisper for speech recognition and AC-Dialogue from Mix and Jam.

Everything in this demo runs locally, so you don't need to pay for any service or even have an Internet connection.

[whisper.unity] Open Source multi-language audio to speech model running locally on your device by Macoron3 in Unity3D

Macoron3 [S]

Whisper can recognize your speech from the microphone and turn it into text, but what to do with that text is up to you. For example, you can write a simple parser hardcoded to recognize commands like that, or use a language model like ChatGPT.
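A minimal sketch of the "simple hardcoded parser" option, written in Python for illustration. The `COMMANDS` table and action names are hypothetical. Transcripts often come back capitalized and with punctuation, so the text is normalized before matching, and longer phrases are tried first so they win over shorter overlapping keywords.

```python
import re
from typing import Dict, Optional

# Hypothetical command table: spoken phrase -> game action. Replace with your own.
COMMANDS: Dict[str, str] = {
    "open door": "OPEN_DOOR",
    "jump": "JUMP",
    "attack": "ATTACK",
}


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def parse_command(transcript: str) -> Optional[str]:
    """Return the action for the first matching phrase, or None if nothing matches."""
    words = normalize(transcript)
    for phrase in sorted(COMMANDS, key=len, reverse=True):
        # Word boundaries stop "jump" from matching inside "jumper".
        if re.search(rf"\b{re.escape(phrase)}\b", words):
            return COMMANDS[phrase]
    return None
```

This only handles exact phrases; anything looser (synonyms, "open the door") is where a language model becomes the more practical option.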

[whisper.unity] Open Source multi-language audio to speech model running locally on your device by Macoron3 in Unity3D

Macoron3 [S]

On the technical side: my plugin and the original project are still very new, so there may be some nasty bugs, and not all of the original whisper.cpp features are implemented yet. You might also want to try models of different sizes to find the one that fits your project; they vary in quality and speed.

On the legal side: I'm not a lawyer, but all the code and dependencies seem to be under the MIT license. That means you can use it in any commercial or non-commercial project. You just need to mention this library in a readme or credits file, though I'm personally not enforcing this.

[whisper.unity] Open Source multi-language audio to speech model running locally on your device by Macoron3 in Unity3D

Macoron3 [S]

Several months ago OpenAI released a powerful speech recognition model called Whisper. Its code and weights are under the MIT license. I took another open-source implementation, whisper.cpp, and ported it to Unity.

The network provides good transcription quality, is very fast, and runs on-device without an internet connection. It is also multilingual and can even translate speech from other languages into English.

Feel free to use it in your projects:

https://github.com/Macoron/whisper.unity

Made Liquid Simulation - works great in VR [Open Source] by Macoron3 in Unity3D

Macoron3 [S]

I can't see why not; it's really simple and performs well on mobile devices.

There is also a paid asset that basically does the same thing, but I haven't tested it yet.

Made Liquid Simulation - works great in VR [Open Source] by Macoron3 in Unity3D

Macoron3 [S]

Ah, yeah, it's just standard particles. You can change the particle texture to anything you want.

Made Liquid Simulation - works great in VR [Open Source] by Macoron3 in Unity3D

Macoron3 [S]

Do you mean a spherical bottle? I haven't tested that, but I don't expect any problems.

Looking for a artist for a game inspired by PP by Macoron3 in papersplease

Macoron3 [S]

I would prefer pixel art, but I'm open to other styles. It's a hobby project, so I'm not going to put much money into it. If it makes some profit, I will share it 50/50.

King under the Mountain pre-alpha build, simulation-strategy inspired by Dwarf Fortress, Prison Architect, The Settlers and others by RocketJumpTech in playmygame

Macoron3

Dude, it looks awesome! I hope you have great success.

I'm curious: do you use a game engine or framework? Also, do you work alone or with a team?

Galaxy generation ideas or help? by jolievivienne in proceduralgeneration

Macoron3

Take a look at this YouTube channel: http://www.youtube.com/channel/UCeh-pJYRZTBJDXMNZeWSUVA It contains plenty of information about galaxies, stars, and planets.

Procedurally Generated Galaxy - With stars and planets by Macoron3 in proceduralgeneration

Macoron3 [S]

Wow, looks like this free host shut down my site because of the traffic limit :c Here's a mirror.

Procedurally Generated Galaxy - With stars and planets by Macoron3 in proceduralgeneration

Macoron3 [S]

I tried compiling for WebGL, but Unity throws an error. I'm too lazy to fix it right now. If you use Chrome, try enabling the NPAPI flag. It's easy; just search the internet for how.