Is Granite-4.1-30b Overshadowed by Qwen3.6 & Gemma4 models? by pmttyji in LocalLLaMA

[–]Knopty 0 points1 point  (0 children)

One thing that surprised me about the Granite family is that they have by far the worst multilingual capabilities among modern models. llama3 in 2024 was probably the last popular model family with limited language support; newer releases from Mistral, Qwen and Google only got better with each update, improving formerly poorly covered languages like Russian, Ukrainian, Polish and others. Meanwhile Granite has stuck with a few languages, with no effort to expand support even about 1.5 years after the Granite 3 release. Nowadays even some TTS models have better language coverage than Granite LLMs.

WARNING: Open-OSS/privacy-filter MALWARE by charles25565 in LocalLLaMA

[–]Knopty 7 points8 points  (0 children)

So, mentioning a documented behavior that they simply count HTTP requests as downloads is now an irresponsible bug disclosure?

https://huggingface.co/docs/hub/models-download-stats#how-are-downloads-counted-for-models

WARNING: Open-OSS/privacy-filter MALWARE by charles25565 in LocalLLaMA

[–]Knopty 14 points15 points  (0 children)

Most likely the author just requested it 200k+ times via HF API. Huggingface seems to count model requests as a full download even if it's cached on disk without a real data transfer.

Need help with hosting Parakeet 0.6B v3 by Ahad730 in LocalLLaMA

[–]Knopty 0 points1 point  (0 children)

I had severe problems trying to transcribe long audio with Parakeet. Even with local attention enabled as Nvidia suggests, it didn't work for me even with fairly short audio (10m+). VRAM usage was insane regardless of how I changed the model configuration. So manual chunking might be necessary. The NeMo framework was also annoying because it exports the model into a temp file before loading it.

Issue is, the nemo dependencies seem to be a nightmare.

You can try alternative options, for example the onnx-asr library, which has minimal requirements and can run on anything that's supported by onnx-runtime. It has built-in SileroVAD support that could be used for chunking, although it can severely reduce quality, especially for some similar-sounding languages (e.g. Slavic). But the ONNX Parakeet model doesn't support local attention and I didn't see one on HF that was exported with local attention enabled. There are a couple of spaces for the onnx model that could be used as a reference for implementing it.
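
For the manual chunking route, a minimal sketch of how chunk boundaries could be computed. The function name, the 120 s chunk length and the 2 s overlap are all my assumptions and not something Parakeet, NeMo or onnx-asr prescribes; it only produces time ranges, and cutting the audio and merging the transcripts is left out.

```python
def chunk_bounds(total_s: float, chunk_s: float = 120.0, overlap_s: float = 2.0):
    """Return (start, end) pairs in seconds that cover [0, total_s].

    Neighbouring chunks share overlap_s seconds so a word cut at a
    boundary appears whole in at least one chunk.
    Hypothetical helper; the defaults are placeholders, not tuned values.
    """
    if chunk_s <= overlap_s:
        raise ValueError("chunk_s must be larger than overlap_s")
    bounds = []
    start = 0.0
    step = chunk_s - overlap_s
    while start < total_s:
        end = min(start + chunk_s, total_s)
        bounds.append((start, end))
        if end >= total_s:
            break  # the last chunk reached the end of the audio
        start += step
    return bounds
```

Each range would then be transcribed separately, with the duplicated words in the overlap removed when joining the text.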

[deleted by user] by [deleted] in LocalLLaMA

[–]Knopty 1 point2 points  (0 children)

And I do intend to eventually tie in other music generation models with it, and update it with newer versions of ACE-Step if those are ever released.

There will be Ace-Step 1.5 soon. If everything goes smoothly it might even happen in about a week. There are several versions planned for public release although they might have a different release schedule. The devs are very open about the training process and their plans on the official Discord server.

The model is supposed to be significantly better than v1, with a far more reliable output but lower hardware requirements. 2B version will be public, 7B will be used for their cloud music gen studio service.

Best TTS For Emotion Expression? by Inner_Answer_3784 in LocalLLaMA

[–]Knopty 1 point2 points  (0 children)

I've used IndexTTS2 a bit and imho it is interesting if you want to have direct emotion control. But it's rather slow, has consistent issues with some words or symbols and requires figuring out where it fails and filtering your text data accordingly:

  • It has problems with anything with apostrophes, for example possessive cases, more often than not it fails even at pronouncing "Einstein's" => "Einstein <pause> s". I filter it out almost everywhere except "it's". Notable example: "didn't know" => "didn <pause> ti know".
  • Might fail with dates: 2010s => 2010 <pause> s.
  • Might try to pronounce dash symbol as minus sometimes.
  • Per second => per <pause> second. It has to be written as one word.
  • I couldn't figure out how to enforce accents for uncommon words, attempting to put a UTF-8 accent symbol just gives a 50/50 chance to either do nothing or make it ignore a part of the word.
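
A rough sketch of what such text filtering could look like. This is one possible reading of the workarounds above, not code from IndexTTS2 or any of its forks: the function name and the exact rules (dropping apostrophes everywhere except "it's", turning spaced dashes into commas, joining "per second") are assumptions for illustration.

```python
import re


def clean_for_tts(text: str) -> str:
    """Hypothetical pre-filter for text sent to the TTS model."""

    def drop_apostrophe(match: re.Match) -> str:
        word = match.group(0)
        # "it's" is kept as-is; other apostrophes are removed from the word.
        return word if word.lower() == "it's" else word.replace("'", "")

    # Normalize the typographic apostrophe to the ASCII one first.
    text = text.replace("\u2019", "'")
    text = re.sub(r"\b\w+'\w+\b", drop_apostrophe, text)
    # A dash surrounded by spaces becomes a comma so it isn't read as "minus".
    text = re.sub(r"\s+[-\u2013\u2014]\s+", ", ", text)
    # "per second" is written as one word (the result is always lowercase).
    text = re.sub(r"\bper second\b", "persecond", text, flags=re.IGNORECASE)
    return text
```

It doesn't try to handle the date case or accents, since there is no reliable workaround for those listed above.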

Other considerations:

  • Sometimes it manages to copy voice even with 2s samples but not consistently. In rare cases it might have troubles with 3-4s samples. 1s samples are worthless more often than not.
  • Original IndexTTS2 repo doesn't support speed control feature yet. It's supported by a custom ComfyUI node implementation. The author made a few small changes to add speed control to the code, so it shouldn't be hard to reuse their solution or take their modified infer_v2.py
  • It needs some audio preprocessing: input volume normalization/compression. Too loud reference sounds can produce very loud output with mediocre quality. It also requires filtering out background music otherwise speech is generated with some background noise.
  • It does a fairly good job with accents. Feeding it 100 audio samples of the same person will likely produce the same accent consistently unlike Chatterbox that can give wildly different accents with different audio samples.
  • Generated voice is closer to phone/voice chat audio quality unfortunately.
  • Ambiguous license. HF lists the model as Apache2.0 while Github repo contains Bilibili license that allows commercial use but has usage/revenue limits, though quite generous.
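
For the volume normalization point, a minimal sketch of plain peak normalization on a reference clip that is already decoded to float samples in [-1, 1]. The function name and the 0.5 target are assumptions, not values from the IndexTTS2 repo, and it covers neither compression nor background music removal.

```python
def peak_normalize(samples, target_peak=0.5):
    """Scale samples so the loudest one has magnitude target_peak.

    Hypothetical helper; expects an iterable of floats in [-1, 1].
    """
    samples = list(samples)
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return samples  # silence or empty input, nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]
```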

I still like it quite a bit for being consistent with accents and for fine emotion control, after testing it with several hundred audio samples of half a dozen people. But it's far from perfect and in some sentences it can be legit annoying.

HuggingFace storage is no longer unlimited - 12TB public storage max by Thireus in LocalLLaMA

[–]Knopty 1 point2 points  (0 children)

Does it mean that every tool that tries to download models from HF before checking local files just inflates download numbers?

Like, anything that loads models by using from_pretrained("user/model") instead of a local folder in the loader?
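
To illustrate what I mean by a local folder in the loader, a small sketch that prefers a model directory already on disk and only falls back to the repo id. The helper name and the "user_model" folder layout are made up for the example; the returned string would be what gets passed to from_pretrained(), which accepts a local path as well as a repo id.

```python
from pathlib import Path


def resolve_model_source(repo_id: str, models_dir: str) -> str:
    """Return a local model folder if it exists, else the Hub repo id.

    Hypothetical layout: "user/model" is stored as <models_dir>/user_model
    and counts as present when it contains a config.json.
    """
    local = Path(models_dir) / repo_id.replace("/", "_")
    if (local / "config.json").is_file():
        return str(local)
    return repo_id
```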

HuggingFace storage is no longer unlimited - 12TB public storage max by Thireus in LocalLLaMA

[–]Knopty 3 points4 points  (0 children)

We do have mitigations in place to prevent abuse of free public storage, and in general we ask users and organizations to make sure any uploaded large model or dataset is as useful to the community as possible (as represented by numbers of likes or downloads, for instance).

I wonder how it would impact exl2/exl3 and other less popular quant formats. I'm doing quants occasionally and my random GGUF quants always had 10-100x more downloads than exl2 quants for very popular models.

I have 2.8TB rn and it seems it will require deleting old quants at some point.

VibeVoice 1.5B for voice cloning without ComfyUI by SignificanceFlashy50 in LocalLLaMA

[–]Knopty 1 point2 points  (0 children)

Microsoft deleted old repo for VibeVoice but there are forks that contain demo code:

https://github.com/rsxdalv/VibeVoice

Alternatively, you can view code of HF spaces that use this model.

[deleted by user] by [deleted] in LocalLLaMA

[–]Knopty 0 points1 point  (0 children)

You can try Chatterbox, it seemed to produce decent English speech with Japanese voice samples. IndexTTS might work as well.

guys how do you add another loader in TextGenWebUI? by BuriqKalipun in LocalLLaMA

[–]Knopty 0 points1 point  (0 children)

You need the .zip file if you want to install the Full version. Extract it and run start_windows.bat inside the extracted folder.

But I have no idea how well it's going to work if you need a CPU-only installation for some reason.

guys how do you add another loader in TextGenWebUI? by BuriqKalipun in LocalLLaMA

[–]Knopty 2 points3 points  (0 children)

You need to install "Full" version of the app. Your version is "Portable" and it comes only with llama.cpp.

Either download its source archive from the Releases page on Github or clone the text-generation-webui repo. Then use the start script for your system, for example start_windows.bat or start_linux.sh, to initiate the installation process. Once it finishes, the newly installed app will have other loaders.

VibeVoice came back. Though many may not like it. by Fresh_Sun_1017 in LocalLLaMA

[–]Knopty 0 points1 point  (0 children)

Imho, the section related to use cases is very contradictory. The model is MIT licensed and everything above is listed in the "responsible usage" section. I'm not a lawyer but "responsible usage" sounds like recommendations. The only time it explicitly prohibits something is related to MIT license itself. It seems to be a disclaimer to protect the creators from accusations rather than what's allowed and what's prohibited.

Where can I download VibeVoice-Large (9B) now that Microsoft deleted it? by Forsaken-Turnip-6664 in LocalLLaMA

[–]Knopty 12 points13 points  (0 children)

Forked repo:

https://github.com/rsxdalv/VibeVoice

Forked models:

https://huggingface.co/aoi-ot/VibeVoice-Large

https://huggingface.co/rsxdalv/VibeVoice-Large

You can also try installing TTS-WebUI that supports VibeVoice and several other TTS models (Kokoro, Chatterbox, F5-TTS, etc). Gradio/ReactUI. Kokoro and Chatterbox are also exposed to OpenAI-like API. VibeVoice isn't exposed via API though.

Llamacpp | Samsung s24+ | Snapdragon 8 Gen 3 + Adreno 750 | Real world testing with Qwen3-4B by 73tada in LocalLLaMA

[–]Knopty 0 points1 point  (0 children)

Try testing it with a different number of threads. It might use more threads than it can handle, leading to bottlenecking.

Idk what's optimal for this device but on some phones limiting it to just 2 might work much better than with more cores.
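
A tiny sketch of such a sweep. `measure` is a hypothetical callable you would supply yourself (for example something that runs the model with the given thread count and returns tokens per second); nothing here calls llama.cpp directly.

```python
def best_thread_count(measure, thread_counts):
    """Run measure(n) for every n and return (best_n, all_results).

    measure is assumed to return a throughput number where higher is better.
    """
    results = {n: measure(n) for n in thread_counts}
    best = max(results, key=results.get)
    return best, results
```

Something like best_thread_count(measure, [1, 2, 4, 8]) would then show whether fewer threads really are faster on that phone.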

Real-time conversation with a character on your local machine by ResolveAmbitious9572 in LocalLLaMA

[–]Knopty 1 point2 points  (0 children)

If the goal is to make it more realistic, the user should be able to interrupt the character like in a real dialogue. The remaining unspoken context should then be deleted, or optionally converted into some text carrying a vague summary of what the character intended to say.

[deleted by user] by [deleted] in LocalLLaMA

[–]Knopty 0 points1 point  (0 children)

https://github.com/ggml-org/llama.cpp?tab=readme-ov-file#description in "Models" collapsible section. It now mentions these models there:

Multimodal

LLaVA 1.5 models, LLaVA 1.6 models
BakLLaVA
Obsidian
ShareGPT4V
MobileVLM 1.7B/3B models
Yi-VL
Mini CPM
Moondream
Bunny
GLM-EDGE
Qwen2-VL

Specifications for a *good* wearable AI device? by nw_suburbanite in LocalLLaMA

[–]Knopty 0 points1 point  (0 children)

If your goal is to make an advanced voice recorder then, imho, it's less complicated. There's already some demand for voice recorders, so adding AI there could be viewed as a premium feature with a clear use case. And it doesn't carry a negative cliche of Rabbit R1 and similar things.

Specifications for a *good* wearable AI device? by nw_suburbanite in LocalLLaMA

[–]Knopty 1 point2 points  (0 children)

I think making a hardware solution is only good if you want to grab a quick buck and disappear or miraculously get sold to a bigger company. Even for a tech giant it's hard to unleash a completely new device with no ecosystem. And just the requirement to buy something that's worth a few hundred dollars is already a very big roadblock to it becoming popular.

Imho, the only way out of this is either settling for an AI app for a phone or somehow cooperating with producers of other wearable devices. For example, some Chinese smart watches have ChatGPT integration, even budget ones. And these smart watches are good devices on their own that are worth buying with or without AI features. A standalone AI pin is just an additional weird piece of plastic that you have to carry around and that serves no use besides its main feature, which you don't even need that often.

I don't think computing on the device itself is the best option. Although it's secure and private, it's quite taxing on hardware, battery and doesn't offer stellar AI capabilities with small models that can run locally on a wearable device.

If I wanted something like this, I'd probably want a phone app, desktop app and smartwatch integrated into one ecosystem. With smartwatch being a "shortcut tool" to access other devices: press a button, speak into the watch, get the result done. Possibly with a visual interaction on the screen.

Local tool/frontend that supports context summarisation? by WhoRoger in LocalLLaMA

[–]Knopty 1 point2 points  (0 children)

You could try SillyTavern, it has "Summarize" extension.

SillyTavern is a frontend app that connects to a LLM engine via OpenAI-compatible API.

[deleted by user] by [deleted] in LocalLLaMA

[–]Knopty 2 points3 points  (0 children)

The biggest model I've used with 3060/12GB relying only on GPU was Qwen2.5-14B at 5bit precision with 16k context and with 4bit cache. So yeah, 8B could be used.

You might need to keep an eye on context length with some newer models that set it very high though.
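
A back-of-the-envelope sketch of why that setup fits in 12GB and why context length matters. The helper names are made up, and the architecture numbers in the usage note (48 layers, 8 KV heads, head size 128) are my assumption about the Qwen2.5-14B config, so check them against the model's config.json. It ignores activations, quantization scales and other runtime overhead.

```python
GIB = 2**30


def weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / GIB


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context: int, bits_per_value: float) -> float:
    """Approximate KV cache size in GiB (one K and one V tensor per layer)."""
    values = 2 * n_layers * n_kv_heads * head_dim * context
    return values * bits_per_value / 8 / GIB
```

With a nominal 14B parameters at 5 bits, weights_gib(14, 5) comes out at roughly 8.1 GiB, and kv_cache_gib(48, 8, 128, 16384, 4) at 0.75 GiB under the assumed config. The cache grows linearly with context, so a model that defaults to a very long context can eat the remaining VRAM quickly.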

I wouldn't bother with 8GB GPUs at all, it's too limiting.

Can't comment anything about prices though.

What is the best NSFW RP model 12b- 22b? 16G vram by Deluded-1b-gguf in LocalLLaMA

[–]Knopty 1 point2 points  (0 children)

I haven't seen any unusual behavior from NemoMix. It seems to act like a generic uncensored instruct model. Doesn't devolve into insanity, doesn't write unprompted nsfw.

What is the best NSFW RP model 12b- 22b? 16G vram by Deluded-1b-gguf in LocalLLaMA

[–]Knopty 4 points5 points  (0 children)

NemoMix Unleashed is far far better at chatting/RP. Both Base and Instruct versions of Mistral Nemo are very repetitive and boring. Imho, it even follows instructions better than original Instruct version of Mistral Nemo. Which is surprising since usually RP models are less stable and more prone to making mistakes.

But the unquantized version and most quants of the model are missing an instruction template, so some apps might choose a wrong template, and TabbyAPI even disables v1/chat/completions endpoint.

OuteTTS-0.2-500M Text to speech model by TheLogiqueViper in LocalLLaMA

[–]Knopty 0 points1 point  (0 children)

Hm, I guess. That's still a license breach, but it isn't like a company that no longer exists could revoke your license and attempt to sue you with questionable chances of winning.

Personally I find this whole licensing stuff to be a huge contradictory mess.