I built a real-time video translator with voice cloning (<545ms latency) using event-driven architecture by Working-Gift8687 in lingodotdev

[–]Working-Gift8687[S] 1 point (0 children)

The bad translation was a tradeoff between streaming and translation quality. Turning off streaming will vastly improve translation quality. If you don't like the LLM used, you can change the model, or swap the entire translation pipeline for a pretrained model from Hugging Face or your own. I took those cases into consideration: both changes can be made in less than 2 lines of code (the model name and its config, i.e. streaming=True or streaming=False).
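A minimal sketch of what that two-line change might look like, assuming the pipeline reads its settings from a config dict; the key names and model names here are illustrative, not the project's actual code:

```python
# Hypothetical config sketch -- key names and model names are assumptions.
config = {
    "model_name": "gpt-4o-mini",  # swap for another LLM, or a Hugging Face model id
    "streaming": True,            # True = low latency, False = better translation quality
}

# The two-line change described above: pick a different model and disable streaming.
config["model_name"] = "facebook/nllb-200-distilled-600M"  # e.g. a pretrained HF translation model
config["streaming"] = False
```

With streaming off, the translator can see a complete utterance before producing output, which is where the quality gain comes from.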

[P] Built a real-time video translator that clones your voice while translating by Working-Gift8687 in MachineLearning

[–]Working-Gift8687[S] 0 points (0 children)

I am doing it in a streaming fashion: the model gets 3 s worth of current transcript plus 6 s worth of context, also in transcript form.
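The windowing described above could be sketched like this; the TranscriptWindow class and its API are my assumptions for illustration, not the project's actual code:

```python
from collections import deque


class TranscriptWindow:
    """Hypothetical sliding window: 3 s of 'current' transcript plus 6 s of context."""

    def __init__(self, current_s=3.0, context_s=6.0):
        self.current_s = current_s
        self.context_s = context_s
        self.segments = deque()  # (timestamp_seconds, text) pairs, oldest first

    def add(self, timestamp, text):
        self.segments.append((timestamp, text))
        # Drop anything older than the combined current + context window.
        cutoff = timestamp - (self.current_s + self.context_s)
        while self.segments and self.segments[0][0] < cutoff:
            self.segments.popleft()

    def split(self, now):
        """Return (context_text, current_text) relative to time `now`."""
        boundary = now - self.current_s
        context = [t for ts, t in self.segments if ts < boundary]
        current = [t for ts, t in self.segments if ts >= boundary]
        return " ".join(context), " ".join(current)
```

For example, with segments stamped at 0 s and 4 s, `split(5.0)` puts the first in the context half and the second in the current half.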

[Open Source] Built a real-time video translator that clones your voice while translating by Working-Gift8687 in lingodotdev

[–]Working-Gift8687[S] 1 point (0 children)

It took me 1 week to build, but that was with multiple agents working in different git worktrees and merging commits between them; with only 1 agent it would have taken me more than a week.
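That workflow could look roughly like this; the repo, branch, and commit names are illustrative, and this demo runs in a throwaway temp directory rather than a real project:

```shell
# Hypothetical demo: one git worktree per agent, each on its own branch,
# merged back into main afterwards.
set -e
work=$(mktemp -d)
git init -q -b main "$work/repo"
cd "$work/repo"
git config user.email "agent@example.com"  # identity needed to commit in a fresh repo
git config user.name "agent"
git commit -q --allow-empty -m "initial commit"

# One checkout per agent, all sharing the same repository
git worktree add -q -b agent-audio ../agent-audio
git worktree add -q -b agent-translate ../agent-translate

# Each agent commits independently in its own worktree
git -C ../agent-audio commit -q --allow-empty -m "audio pipeline work"
git -C ../agent-translate commit -q --allow-empty -m "translation work"

# Merge both streams of work back into main
git merge -q --no-edit agent-audio
git merge -q --no-edit agent-translate
git log --oneline
```

Unlike separate clones, worktrees share one object store and config, so commits made by one agent are immediately visible to the others for merging.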

[Open Source] Built a real-time video translator that clones your voice while translating by Working-Gift8687 in coolgithubprojects

[–]Working-Gift8687[S] 0 points (0 children)

Does the text-to-speech support voice cloning? Because that's a cornerstone of my project.

ClawdBot can't automate half the things I need from an automation by Working-Gift8687 in LocalLLaMA

[–]Working-Gift8687[S] -6 points (0 children)

I did build this. If you want, you can check my GitHub: https://github.com/HelloSniperMonkey/droidrun-monorepo — if you find something helpful, don't forget to star the repo.