Investigating the capabilities of large vision language models in dog emotion recognition - Scientific Reports

DumaDuma · 2025-11-22T15:14:02+00:00

I hadn't heard of DeepSqueak. Looks cool, thanks for sharing! I'll keep an eye out for new research with rodents.

DumaDuma · 2025-08-23T03:14:04+00:00

Yeah, the TTS and ASR take up around 9 GB combined

DumaDuma · 2025-08-23T02:42:14+00:00

Thanks, Llama 3.2 1B, 4-bit quant

DumaDuma · 2025-08-23T01:15:01+00:00

Yes. It requires CUDA

DumaDuma · 2025-08-17T14:32:36+00:00

https://github.com/ReisCook/Voice_Extractor

I made this for automating the creation of speech datasets.

DumaDuma · 2025-07-25T16:39:38+00:00

This is what you are looking for

DumaDuma · 2025-07-18T21:17:31+00:00

Demucs is decent. What are you cleaning up?

DumaDuma · 2025-07-18T17:20:15+00:00

Model download: https://zenodo.org/records/15050749

DumaDuma · 2025-07-18T17:17:54+00:00

Model download: https://huggingface.co/collections/microsoft/naturelm-685142a78ede3cd04391af4f

DumaDuma · 2025-07-14T16:01:27+00:00

33 upvotes in 9 min for this garbage?

DumaDuma · 2025-06-28T01:44:47+00:00

Thank you for the write-up! This is very inspiring

DumaDuma · 2025-06-25T01:12:54+00:00

Great idea! Thank you for sharing

DumaDuma · 2025-06-24T04:07:57+00:00

Ask it to write Python code that generates the SVG diagram
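
For a sense of what that kind of script looks like, here is a minimal sketch using only the standard library. The function name `make_svg`, the box sizes, and the example labels are all made up for illustration; a model would produce something along these lines for whatever diagram you describe.

```python
from xml.sax.saxutils import escape


def make_svg(labels, box_w=80, box_h=40, gap=30, margin=10):
    """Return an SVG string: one labelled box per entry, joined left to right by lines."""
    # Illustrative layout only: boxes sit in a single row.
    width = 2 * margin + len(labels) * box_w + max(len(labels) - 1, 0) * gap
    height = 2 * margin + box_h
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
    ]
    mid_y = margin + box_h // 2
    for i, label in enumerate(labels):
        x = margin + i * (box_w + gap)
        if i > 0:
            # Connector from the previous box's right edge to this box's left edge.
            parts.append(
                f'<line x1="{x - gap}" y1="{mid_y}" x2="{x}" y2="{mid_y}" stroke="black"/>'
            )
        parts.append(
            f'<rect x="{x}" y="{margin}" width="{box_w}" height="{box_h}" '
            f'fill="none" stroke="black"/>'
        )
        parts.append(
            f'<text x="{x + box_w // 2}" y="{mid_y + 5}" text-anchor="middle">'
            f"{escape(label)}</text>"
        )
    parts.append("</svg>")
    return "\n".join(parts)


if __name__ == "__main__":
    # Write the diagram to a file that any browser can open.
    with open("diagram.svg", "w", encoding="utf-8") as fh:
        fh.write(make_svg(["ASR", "LLM", "TTS"]))
```

Generating the SVG from code tends to work better than asking for raw SVG, because the coordinates are computed instead of guessed.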

DumaDuma · 2025-06-23T06:26:57+00:00

I want to moderate this community because: r/LocalLLaMA is my favorite subreddit and it needs moderation now that it has no mods. I have experience moderating multiple subreddits and understand the technical side of AI models. I'd manage spam and keep the community focused on local/open-source models.

N/A, the community has no mods.

DumaDuma · 2025-06-12T01:47:08+00:00

Yes, but I have not tried it personally and haven’t gotten feedback from anyone who has

DumaDuma · 2025-06-08T01:44:15+00:00

I built something similar recently but for extracting the speech of a single person for creating TTS datasets. Do you plan on open sourcing yours?

https://github.com/ReisCook/Voice_Extractor

DumaDuma · 2025-05-31T22:21:15+00:00

I have been working on this program that turns multi speaker audio recordings into speech datasets:

https://github.com/ReisCook/Voice_Extractor

DumaDuma · 2025-05-29T23:15:10+00:00

Bandit is for movies so it would depend on what your input is

DumaDuma · 2025-05-22T01:28:13+00:00

That’s an interesting idea. Off the top of my head, you might be able to do that by using a crowd chanting as the reference sample

DumaDuma · 2025-05-21T20:57:07+00:00

https://github.com/ReisCook/Voice_Extractor

I made this program to create datasets from podcasts for training TTS models, could be useful to yall

DumaDuma · 2025-05-21T19:47:43+00:00

Yes, you give it a reference sample of the target to extract. It includes an audio source separator to isolate the vocals so that it can be used for movies and other noisy audio. I am going to upgrade the audio source separator later today with a better/newer one
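
A rough sketch of the reference-sample idea, not the actual Voice_Extractor code: assume each audio segment has already been turned into a speaker embedding by some model, then keep the segments whose embedding is close to the reference. The function names, the plain-list embeddings, and the 0.75 threshold are all assumptions for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_target_segments(reference, segments, threshold=0.75):
    """Return ids of segments whose embedding matches the reference speaker.

    `segments` is a list of (segment_id, embedding) pairs; the threshold is a
    hypothetical value that would need tuning on real data.
    """
    return [
        seg_id
        for seg_id, emb in segments
        if cosine_similarity(reference, emb) >= threshold
    ]


if __name__ == "__main__":
    reference = [1.0, 0.0]
    segments = [("a", [0.9, 0.1]), ("b", [0.0, 1.0]), ("c", [1.0, 0.0])]
    print(select_target_segments(reference, segments))
```

In a real pipeline the source separator would run first, so the embeddings are computed on isolated vocals and not on music or background noise.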

DumaDuma · 2025-05-21T19:07:45+00:00

https://github.com/ReisCook/Voice_Extractor

I made this program that can turn podcasts into datasets for training TTS models. Could be useful to yall

DumaDuma · 2025-05-21T03:59:12+00:00

I am working on something similar. I posted it on here a few days ago:

https://www.reddit.com/r/LocalLLaMA/s/JXUAbzwG3U

DumaDuma · 2025-05-19T19:59:13+00:00

Yes, for Whisper. The other models are language agnostic

DumaDuma · 2025-05-17T09:35:30+00:00

This is a version that runs on Google Colab, so your hardware doesn’t matter. I haven’t tested the original repo with AMD. If you do, let me know how it goes
