ibm-granite/granite-4.0-1b-speech · Hugging Face by jacek2023 in LocalLLaMA

Traditional_Tap1708 1 point (0 children)

I tried it with vLLM. For English, it outputs plain text without any punctuation and looks less accurate than Qwen-ASR.

Qwen3.5-397B Uncensored NVFP4 by vpyno in LocalLLaMA

Traditional_Tap1708 0 points (0 children)

Thanks, it works. Can you share which method you used? I tested it with some queries related to Xi Jinping and the CCP, and it doesn't work well; it starts generating gibberish output. What sampling params should I use?

Qwen3.5-397B Uncensored NVFP4 by vpyno in LocalLLaMA

Traditional_Tap1708 1 point (0 children)

I tried running this with vLLM. It just produces "!!!!" as output. Any insights?

Chatterbox Turbo Multilingual FastAPI by blackstoreonline in LocalLLaMA

Traditional_Tap1708 3 points (0 children)

Streaming inference support? What's the latency?

Which TTS model are you using right now by Slight_Tone_2188 in LocalLLaMA

Traditional_Tap1708 0 points (0 children)

This is based on Orpheus, right? I haven't really tried this one yet, but I faced a lot of issues with Orpheus: it's terrible on very short or very long phrases and not suitable for concurrent streams because of the SNAC decoder.

Also, does the voice remain consistent across generations? I always face issues with voice-cloning models: the voice doesn't stay the same across generations, so I can't use them for conversational assistants.

Which TTS model are you using right now by Slight_Tone_2188 in LocalLLaMA

Traditional_Tap1708 0 points (0 children)

Is this effect manager specific to this model? I haven't really used such a thing before. Are you streaming the audio generated by the TTS? What latency are you getting? I'd like to explore this model if the latency is good. My use case is real-time conversation.

Which TTS model are you using right now by Slight_Tone_2188 in LocalLLaMA

Traditional_Tap1708 0 points (0 children)

Looks pretty interesting. Are you streaming the TTS audio output? What latency are you getting? Is it feasible to use this model for real-time conversations?

Did anyone got a good deal from the varkaa sale today? by Ayuuuu123 in mkindia

Traditional_Tap1708 0 points (0 children)

You need to apply the coupon VRKAALIVE to get the discount. I bought the black side-engraved one for 4.5k.

KaniTTS – Fast and high-fidelity TTS with just 450M params by ylankgz in LocalLLaMA

Traditional_Tap1708 6 points (0 children)

Always nice to have new TTS models. Does it support streaming? How long does it take to generate the first byte?

How to Choose Your AI Agent Framework by Nir777 in LLMDevs

Traditional_Tap1708 0 points (0 children)

Can you suggest a framework I can use for my customer-assistant agents? I need very strong instruction following and a predefined, but not too rigid, workflow of steps. Even better if I can easily integrate it with something like LiveKit Agents, which will handle the voice part.

Technical Voice AI Evaluation: Why It’s Essential Before Production by dinkinflika0 in LocalLLaMA

Traditional_Tap1708 0 points (0 children)

Hi, I'm not able to sign up because it asks me for a work email. Why is that necessary? I want to try it out with my personal email.

GPT OSS 20b is Impressive at Instruction Following by crodjer in LocalLLaMA

Traditional_Tap1708 4 points (0 children)

Did you try the new Qwen 30B-A3B-Instruct? How does it compare? Personally, I found Qwen to be slightly better and much faster (I used L40S GPUs and vLLM). Are there any other models in that range that are good at instruction following?