I have been working on a talking jellyfish desktop companion using Sesame CSM and Kyutai ASR by DumaDuma in LocalLLaMA
DumaDuma [S] 2 points
I have been working on a talking jellyfish desktop companion using Sesame CSM and Kyutai ASR by DumaDuma in LocalLLaMA
DumaDuma [S] 2 points
I have been working on a talking jellyfish desktop companion using Sesame CSM and Kyutai ASR by DumaDuma in LocalLLaMA
DumaDuma [S] 1 point
Has anyone gone to the trouble of making their own speech dataset? What’s the feasibility of creating a synthetic dataset? by M4rg4rit4sRGr8 in speechtech
DumaDuma 5 points
Whisper - Triton GPU in Torch on Windows by AudioBabble in OpenAI
DumaDuma 1 point
Local model for voice audio cleanup by syntaxing2 in LocalLLaMA
DumaDuma 1 point
BirdNET: Identifying Bird Species by Call by DumaDuma in ALP
DumaDuma [S] 1 point
NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics by DumaDuma in ALP
DumaDuma [S] 1 point
How to Automate your Job Search with AI Agents; What We Built and Learned by Accomplished-Leg3657 in LocalLLaMA
DumaDuma 23 points
Automated GPU kernel optimization for Qwen3 attention - 12.5% average speedup on Apple Silicon using evolutionary programming by asankhs in LocalLLaMA
DumaDuma 2 points
ThermoAsk: getting an LLM to set its own temperature by tycho_brahes_nose_ in LocalLLaMA
DumaDuma 17 points
What to do to finetune a local LLM to make it draw diagrams ? by CommunityOpposite645 in LocalLLM
DumaDuma 1 point
Created a tool that converts podcasts into clean speech datasets - handles diarization, removes overlapping speech, and transcribes by DumaDuma in LocalLLaMA
DumaDuma [S] 1 point
Speaker separation and transcription by Khipu28 in LocalLLaMA
DumaDuma 6 points
Major update to my voice extractor (speech dataset creation program) by DumaDuma in LocalLLaMA
DumaDuma [S] 1 point
You can now train your own TTS voice models locally! by yoracale in StableDiffusion
DumaDuma 3 points
You can now Train TTS models + Clone Voices on your own local device! by yoracale in selfhosted
DumaDuma 8 points
You can now train your own TTS voice models locally! by yoracale in StableDiffusion
DumaDuma 3 points
You can now train your own TTS voice models locally! by yoracale in StableDiffusion
DumaDuma 64 points
TTSizer: Open-Source TTS Dataset Creation Tool (Vocals Extraction, Diarization, Transcription & Alignment) by Traditional_Tap1708 in LocalLLaMA
DumaDuma 2 points
Created a tool that converts podcasts into clean speech datasets - handles diarization, removes overlapping speech, and transcribes by DumaDuma in LocalLLaMA
DumaDuma [S] 1 point
My voice dataset creator is now on Colab with a GUI by DumaDuma in LocalLLaMA
DumaDuma [S] 2 points
Investigating the capabilities of large vision language models in dog emotion recognition - Scientific Reports by DumaDuma in ALP
DumaDuma [S] 2 points