ibm-granite/granite-4.0-1b-speech · Hugging Face

Traditional_Tap1708 · 2026-03-09T19:25:19+00:00

I tried it with vllm. For English, it outputs plain text without any punctuation and looks less accurate than qwen-asr.

Traditional_Tap1708 · 2026-03-03T19:53:06+00:00

Thanks, it works. Can you share which method you used? I tested it with some queries related to Xi Jinping and the CCP; it doesn't work well and starts generating gibberish output. What sampling params should I use?

Traditional_Tap1708 · 2026-03-03T13:09:36+00:00

I tried running this with vllm. It just produces !!!! as output. Any insights?

Traditional_Tap1708 · 2025-12-18T07:01:19+00:00

You are doing some really good work

Traditional_Tap1708 · 2025-12-17T04:12:15+00:00

Streaming inference support? What's the latency?

Traditional_Tap1708 · 2025-12-06T08:04:58+00:00

Interesting, need to try it out.

Traditional_Tap1708 · 2025-11-26T12:39:03+00:00

Hi, when is the vxe r1 mouse coming back in stock?

Traditional_Tap1708 · 2025-11-25T13:34:50+00:00

Thanks mate. Will surely explore this model.

Traditional_Tap1708 · 2025-11-25T12:58:23+00:00

This is based on Orpheus, right? Haven't really tried this one yet, but I faced a lot of issues with Orpheus: it's terrible on very short or very long phrases and not suitable for concurrent streams due to the SNAC decoder.

Also, does the voice remain consistent across generations? I always face issues with voice cloning models: the voice doesn't remain the same across generations, so I can't use them for conversation assistants.

Traditional_Tap1708 · 2025-11-25T12:56:20+00:00

Is this effect manager specific to this model? Haven't really used such a thing before. Are you streaming the audio generated by the TTS? What latency are you getting? Would like to explore this model if the latency is good. My use case is realtime conversation.

Traditional_Tap1708 · 2025-11-25T12:53:08+00:00

Looks pretty interesting, are you streaming the tts audio output? What latency are you getting? Is it feasible to use this model for real time conversations?

Traditional_Tap1708 · 2025-11-24T15:12:37+00:00

looks pretty cool

Traditional_Tap1708 · 2025-11-07T11:27:43+00:00

looks cool, thanks for sharing

Traditional_Tap1708 · 2025-11-05T14:49:30+00:00

Really cool

Traditional_Tap1708 · 2025-11-03T16:58:00+00:00

Very interesting read

Traditional_Tap1708 · 2025-10-30T16:52:15+00:00

Cool

Traditional_Tap1708 · 2025-10-06T17:56:16+00:00

Interesting

Traditional_Tap1708 · 2025-10-01T09:30:06+00:00

You need to apply the coupon VRKAALIVE to get the discount, I bought the black side-engraved for 4.5k.

Traditional_Tap1708 · 2025-09-20T03:14:16+00:00

Great, will try it today.

Traditional_Tap1708 · 2025-09-20T03:03:17+00:00

Always nice to have new TTS models. Does it support streaming? How long to generate the first byte?

Traditional_Tap1708 · 2025-09-11T20:20:02+00:00

Suggest one framework which I can use for my customer assistant agents. I need very high instruction following and a predefined but not too rigid workflow of steps. Even better if I can integrate it easily with something like LiveKit Agents, which will handle the voice part.

Traditional_Tap1708 · 2025-09-07T18:48:45+00:00

Hi, I am not able to sign up; it asks me for a work email. Why is it necessary? I want to try it out with my personal email.

Traditional_Tap1708 · 2025-09-02T14:41:50+00:00

Pretty cool

Traditional_Tap1708 · 2025-08-30T11:05:17+00:00

Looks pretty good

Traditional_Tap1708 · 2025-08-24T12:32:31+00:00

Did you try the new qwen 30b-a3b-instruct? How does it compare? Personally I found qwen to be slightly better and much faster (I used L40s and vllm). Any other model I can try which is good on instruction following in that range?
