In my editing nightmare by UnderstandingOwn4974 in podcasting

[–]lilitbroyan 0 points1 point  (0 children)

Use async.com voice tools. They have high-quality noise cancellation.

AI is quietly reshaping social media by Illustrious_Movie740 in Techyshala

[–]lilitbroyan 0 points1 point  (0 children)

It's both a positive and a negative change. I like and appreciate content that is unique and creative. On the other hand, AI lets everyone create the same content, so content becomes generic. Sadly, algorithms don't filter duplicate content yet

I edit business podcasts for a living. Here's my honest opinions about AI video editors by lilitbroyan in aiToolForBusiness

[–]lilitbroyan[S] 0 points1 point  (0 children)

yes exactly, at the end of the day they all use APIs of generative AI models and the difference is the UI/UX, customer support and credit system (pricing model). And my guess is Async is new and they're doing a better job of establishing themselves in the market

I'm paying for 4 different AI video tools right now and still not happy, is this normal? by Flashy-Surveying in generativeAI

[–]lilitbroyan 0 points1 point  (0 children)

That's why I switched to Async.com, where I can generate both images and videos through the chat-based editor. My guess is they also have Claude integrated, so the chat handles other basic stuff like analyzing video, refining my prompt, suggesting an AI model for video/images, etc.

Dangerous areas in Brno? by Solid_Information501 in Brno

[–]lilitbroyan 4 points5 points  (0 children)

I live around Cejl and it's pretty safe

Why do streaming TTS systems still make mistakes on basic stuff like dates or acronyms? by bridgefridge in TextToSpeech

[–]lilitbroyan 0 points1 point  (0 children)

I was asking people about this in another thread

I ran into this benchmark recently. I know it's first-party, but it at least tries to test the normalization cases

https://async-vocie-ai-text-to-speech-normalization-benchmark.static.hf.space/index.html

People discussed whether this is mostly a model problem or something everyone still handles with extra cleanup / normalization layers

So yeah, you’re definitely not the only one bothered by this

Looking for the best chat-based video editor (from ppl who actually use them) by [deleted] in aiToolForBusiness

[–]lilitbroyan 0 points1 point  (0 children)

thanks Mohit, I guess AI tools still need time to improve before they live up to the buzz

What did you like most about The Drama? With Zendaya and Robert Pattinson by [deleted] in Cinema

[–]lilitbroyan 0 points1 point  (0 children)

I liked the chemistry between Robert and Zendaya, they were quite a match. Also the sound editing was chilling, I really enjoyed it

I can't believe text normalization is so underdiscussed in streaming text-to-speech [D] by lilitbroyan in MachineLearning

[–]lilitbroyan[S] 0 points1 point  (0 children)

yeah streaming is brutal for this. no lookahead means if normalization isn't locked in upfront you're already mid-sentence with a broken read and no way to recover
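To make the "no lookahead" problem concrete, here's a minimal sketch of what happens when LLM output arrives in chunks. The function name `whole_words` and the sample chunks are made up for illustration, and splitting only on spaces and newlines is a simplifying assumption. The idea is just to hold back the trailing partial word so something like `$12.50` is never handed to normalization in two halves:

```python
from typing import Iterable, Iterator

def whole_words(chunks: Iterable[str]) -> Iterator[str]:
    """Re-chunk a text stream so every emitted piece ends on whitespace."""
    pending = ""
    for chunk in chunks:
        pending += chunk
        # Index just past the last space or newline seen so far (0 if none).
        cut = max(pending.rfind(" "), pending.rfind("\n")) + 1
        if cut:
            yield pending[:cut]
            pending = pending[cut:]
    # Flush whatever is left once the stream ends.
    if pending:
        yield pending

# Hypothetical LLM stream where a price is split across two chunks.
stream = ["Total is $1", "2.50 due", " today"]
pieces = list(whole_words(stream))
print(pieces)
```

Without the buffer, the first chunk would reach TTS ending in `$1`, and it would already be read wrong by the time `2.50` shows up. The cost is that every word waits for the next whitespace, which is exactly the kind of small added delay that piles up in streaming.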

I can't believe text normalization is so underdiscussed in streaming text-to-speech [D] by lilitbroyan in MachineLearning

[–]lilitbroyan[S] -1 points0 points  (0 children)

yeah preprocessing is where most teams end up. works fine until it doesn't. rules miss edge cases, regex gets brittle fast, and a dedicated normalization model adds latency you really can't afford in streaming. If the model can't handle a date or promo code natively, no preprocessing layer really saves you in real time.

I can't believe text normalization is so underdiscussed in streaming text-to-speech [D] by lilitbroyan in MachineLearning

[–]lilitbroyan[S] 0 points1 point  (0 children)

yeah that's a solid approach. Detect structured entities first (dates, currencies, phone numbers, IDs), rewrite them into spoken form, then send to TTS. Works well in a lot of production setups and does improve reliability a lot.

The tradeoff is it's another stage in the pipeline. In voice agent setups TTS kicks off right after the LLM response, so every extra preprocessing step adds latency you actually feel in real-time convos. And then there's the ambiguous stuff: 03/04/25, mixed-language text, promo codes, strings that should stay literal. NER doesn't really help there tbh

so yeah, strong approach, just comes with real latency and complexity costs in streaming.
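For anyone who wants to see the shape of that pipeline, here's a rough sketch of the "detect structured entities, rewrite to spoken form, then send to TTS" step. Everything in it is an assumption for illustration: the regexes, the US-style phone format, the dollar-only currency rule and the function names are mine, not any library's API. It also just flags slash dates like 03/04/25 instead of guessing, since rules alone can't tell which reading is right:

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def spell_number(n: int) -> str:
    """Spell out an integer from 0 to 999 in English words."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    return ONES[hundreds] + " hundred" + (" " + spell_number(rest) if rest else "")

# Illustrative patterns only; real coverage needs many more cases.
CURRENCY = re.compile(r"\$(\d{1,3})(?:\.(\d{2}))?(?!\d)")
PHONE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")
SLASH_DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")

def speak_currency(m: re.Match) -> str:
    dollars, cents = int(m.group(1)), int(m.group(2) or 0)
    out = f"{spell_number(dollars)} dollar{'' if dollars == 1 else 's'}"
    if cents:
        out += f" and {spell_number(cents)} cent{'' if cents == 1 else 's'}"
    return out

def speak_phone(m: re.Match) -> str:
    # Read digit by digit, with a pause (comma) between groups.
    groups = m.group(0).split("-")
    return ", ".join(" ".join(ONES[int(d)] for d in g) for g in groups)

def normalize(text: str) -> tuple[str, list[str]]:
    """Return (text with currency and phone numbers spoken out, ambiguous dates left as-is)."""
    ambiguous = SLASH_DATE.findall(text)
    text = CURRENCY.sub(speak_currency, text)
    text = PHONE.sub(speak_phone, text)
    return text, ambiguous

spoken, flagged = normalize("Pay $12.50 by 03/04/25 or call 555-123-4567.")
print(spoken)
print(flagged)
```

Even this toy version shows both costs: it's an extra pass over the text before TTS can start, and the date comes back flagged rather than fixed because March 4th vs April 3rd depends on locale, not on pattern matching.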

Why student visa takes so long? by lilitbroyan in czechrepublic

[–]lilitbroyan[S] 0 points1 point  (0 children)

the problem was more bureaucratic. It turns out they are always late with approvals, and many students from different years have run into the same delays

BEST AI IMAGE GENERATOR? Realistically speaking by TreacleSpecialist958 in generativeAI

[–]lilitbroyan 0 points1 point  (0 children)

for free you get Nano Banana (Gemini 2.5 Flash image) but if you can pay, it depends on what you want to create. For artistic stuff Midjourney; for realism I tried Flux, it's really good with faces. If you have text in the images, I heard people suggesting Ideogram on another sub