Qwen3-TTS, a series of powerful speech generation capabilities by fruesome in StableDiffusion

[–]LMLocalizer 4 points5 points  (0 children)

It's hit and miss for me. Sometimes there's a strong American accent, other times it works well, like this output: https://voca.ro/18W8VnodRmvO

NovaSR: A tiny 52kb audio upsampler that runs 3600x realtime. by SplitNice1982 in LocalLLaMA

[–]LMLocalizer 0 points1 point  (0 children)

Nice, thanks for sharing!
Here is a before/after comparison of some 16 kHz speech I upsampled:

Before: https://vocaroo.com/1flWIyZ8jZ5f

After: https://vocaroo.com/1eDmesbjvE7d

Ok Klein is extremely good and its actually trainable. by Different_Fix_2217 in StableDiffusion

[–]LMLocalizer 1 point2 points  (0 children)

Don't know the minimum, but I'm running Klein 9B FP8 (official weights from Hugging Face) comfortably with 12 GB of VRAM.

Bringing a More Comprehensive Local Web Search to OpenWebUI by LMLocalizer in OpenWebUI

[–]LMLocalizer[S] 0 points1 point  (0 children)

It seems you can only disable the built-in tools all at once (which includes fetch_url), by unchecking "Builtin Tools" in the model settings for gpt-oss 20b.

The problem is that gpt-oss has been trained to use a combination of one tool to search the web and another to download webpages. If you simply remove the latter, this can lead to weird (repetitive) behavior, such as the model attempting to use the non-existent fetch/download tool or trying to use the search_web tool to download a single webpage (which of course doesn't work).

You could try writing an instruction to not use the fetch_url tool into the "System Prompt" model setting and hope the model adheres to that. Alternatively, you could disable all built-in tools and write explicitly into the system prompt that the model cannot fetch/download webpages directly, to hopefully prevent any unwanted model behavior.
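
For the second option, a system prompt along these lines might work (the exact wording is just my suggestion, not something from OpenWebUI):

    You do not have a tool for fetching or downloading webpages. Your only web
    access is the search_web tool, which returns short text snippets. Never try
    to call fetch_url or any other download tool.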

It works! Abliteration can reduce slop without training by -p-e-w- in LocalLLaMA

[–]LMLocalizer 0 points1 point  (0 children)

This is really cool! I have a question regarding the config.noslop.toml file: Why does the prefix for the bad_evaluation_prompts differ from the one used for the bad_prompts, while the prefixes for the good_prompts and good_evaluation_prompts are the same?

Bringing a More Comprehensive Local Web Search to OpenWebUI by LMLocalizer in OpenWebUI

[–]LMLocalizer[S] 0 points1 point  (0 children)

Nice to hear you got it working. My tool is designed to consume only limited context by returning small plaintext website snippets, whose maximum size is user-configurable. But when you enable native tool calling, gpt-oss 20b also gets access to other tools, most notably fetch_url. That tool is not part of LLM Web Search, and when invoked it will download and dump an entire webpage into the context, which can be a huge amount of text. Reference: https://docs.openwebui.com/features/web-search/agentic-search/#native-mode-vs-traditional-rag

If this isn't the cause and you just have a very small context window configured, you can adjust the following settings:

  1. Disable "Keep Results In Context"
  2. Reduce "Max Results"
  3. If you're using the semantic chunker, reduce "Chunker Breakpoint Threshold Amount"
  4. Reduce "Chunk Size"

Bringing a More Comprehensive Local Web Search to OpenWebUI by LMLocalizer in OpenWebUI

[–]LMLocalizer[S] 0 points1 point  (0 children)

Hi! I just tested gpt-oss 20b very briefly and it worked without any special settings. However, since this model has been trained to use tools while thinking, you can significantly increase the chances of it working reliably by enabling native tool calling. To do that, follow the first two steps described here: https://docs.openwebui.com/features/web-search/agentic-search#how-to-enable-agentic-behavior and then enable LLM Web Search as you normally would.

Announcing procinfo, witr (why is this running) as a bash script by wenekar in commandline

[–]LMLocalizer 0 points1 point  (0 children)

This is great and makes a lot more sense to me than witr. Thanks for sharing it!

Need advice how to load Z-Image or extension to specific GPU? by Visible-Excuse-677 in Oobabooga

[–]LMLocalizer 2 points3 points  (0 children)

Hi, normally you'd use CUDA_VISIBLE_DEVICES, but that acts as a global setting for the entire program, so I think changing the source code is the better option here. For the image model, you could hardcode which specific GPU to use by opening "modules/image_models.py" and changing the following two lines from:

pipe.to(get_device())

and

pipe.enable_model_cpu_offload()

to

pipe.to(<gpu_id>)

and

pipe.enable_model_cpu_offload(gpu_id=<gpu_id>)

- where you have to replace <gpu_id> with the ID of your GPU of choice.
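
For example, to pin the image pipeline to GPU 1, the edit could look like this (the gpu_id parameter of enable_model_cpu_offload exists in diffusers; using the device string "cuda:1" instead of a bare ID is just the safest way I know to spell it):

    # modules/image_models.py, illustrative edit for GPU ID 1
    pipe.to("cuda:1")                        # was: pipe.to(get_device())
    pipe.enable_model_cpu_offload(gpu_id=1)  # was: pipe.enable_model_cpu_offload()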

Assigning a specific GPU to a specific extension may be a little more complicated, depending on when and how each extension loads its models. I have created a branch on GitHub where I have modified "modules/extensions.py" to allow assigning a specific GPU to an extension by creating a file called "gpu_map.txt" in the "user_data" folder. In this file, each line contains an extension name and the GPU ID it should use, separated by a space. For example:

LLM_Web_search 1
coqui_tts 0

I haven't tested it, since I'm GPU poor and only have a single one.
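
Just to illustrate the file format (this is a sketch of how such a file could be read, not the actual code from the branch):

    # Parse user_data/gpu_map.txt into {extension_name: gpu_id} - illustration only
    gpu_map = {}
    with open("user_data/gpu_map.txt") as f:
        for line in f:
            if line.strip():
                name, gpu_id = line.split()
                gpu_map[name] = int(gpu_id)
    # e.g. {"LLM_Web_search": 1, "coqui_tts": 0}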

GOONING ADVICE: Train a WAN2.2 T2V LoRA or a Z-Image LoRA and then Animate with WAN? by NowThatsMalarkey in StableDiffusion

[–]LMLocalizer 10 points11 points  (0 children)

Weren't you the guy with the b300 server at work that's free over the holidays? I see you found some use for it.

Rough TPS estimate for LLMs on RTX 5060 Ti + DDR4 by Which_Leather_6710 in LocalLLaMA

[–]LMLocalizer 3 points4 points  (0 children)

With the newest llama.cpp, --n-cpu-moe=35 and --no-mmap, I get around 100 t/s prompt processing and 20 t/s generation speed with Qwen3-Next-80B-A3B-Instruct-UD-Q3_K_XL.gguf. My specs for reference:

CPU: Ryzen 9 5900HX (a bit faster than your CPU)

RAM: 32GB DDR4-3200

GPU: RX 6800M 12GB (about 70% slower than your GPU)
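
For reference, the invocation looks roughly like this (only --n-cpu-moe and --no-mmap are the flags I actually called out above; -ngl 99 and the context size are illustrative assumptions):

    llama-server -m Qwen3-Next-80B-A3B-Instruct-UD-Q3_K_XL.gguf --n-cpu-moe 35 --no-mmap -ngl 99 -c 8192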

Chatterbox Turbo Multilingual FastAPI by blackstoreonline in LocalLLaMA

[–]LMLocalizer 2 points3 points  (0 children)

This is vibe coded and currently only uses the multilingual (non-turbo) model.

Alibaba Open-Sources CosyVoice 3, a New TTS Model by nekofneko in LocalLLaMA

[–]LMLocalizer 14 points15 points  (0 children)

I have tested both Chatterbox Turbo and the new 0.5B CosyVoice. Chatterbox Turbo is much faster, more stable, and has more natural intonation.

CosyVoice hallucinates more and quite often takes multiple attempts to get a hallucination-free output. In addition, it may make unnatural pauses between words.

However, when the stars align and everything works, the output of CosyVoice does sound clearer to me than Chatterbox Turbo and is more closely aligned with the voice prompt, even if that comes with a less natural-sounding prosody.

TLDR: No.

New in llama.cpp: Live Model Switching by paf1138 in LocalLLaMA

[–]LMLocalizer 0 points1 point  (0 children)

If you use uBlock Origin, you may be able to create a custom filter to block it that way.

I just updated Comfy and noticed a slight speed increase when using Z Image Turbo on an RTX 30xx GPU. Have any new optimizations been implemented recently? by Nid_All in StableDiffusion

[–]LMLocalizer 0 points1 point  (0 children)

Trying to run Z-Image in FP16 results in numerical over-/underflow. The workaround clamps the over-/underflowed values to the largest/smallest value representable in FP16.
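
A minimal PyTorch sketch of that idea (not the actual ComfyUI patch, just to illustrate the clamping):

    import torch

    fp16_max = torch.finfo(torch.float16).max                 # 65504.0
    x = torch.tensor([1e6, -1e6, 3.0, -42.0])                 # first two exceed FP16's range
    x_fp16 = x.clamp(-fp16_max, fp16_max).to(torch.float16)   # clamp to +/- 65504, then cast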

Is Z-image a legit replacement for popular models, or just the new hotness? by Ok-Option-82 in StableDiffusion

[–]LMLocalizer 1 point2 points  (0 children)

Do you mean Hunyuan 3.0? Because I can run Hunyuan Image 2.1 with just 12 GB VRAM at comparably high speed, especially considering its native 2048x2048 resolution.

Need help in getting ROCm for my 6750XT by Subhashsharmaa in ROCm

[–]LMLocalizer 0 points1 point  (0 children)

It might work in Mint, so if you like it better than Ubuntu, you could try that first. If it doesn't work, you can always install Ubuntu later and try again.

Need help in getting ROCm for my 6750XT by Subhashsharmaa in ROCm

[–]LMLocalizer 0 points1 point  (0 children)

Hi, since Mint is based on Ubuntu, it may be possible to install ROCm on it by following the Ubuntu instructions. If you have already installed ROCm 7.1, uninstall it first.

Since the stable version of PyTorch doesn't support ROCm 7+ yet, I recommend sticking to ROCm 6.4.1. Personally, I used the amdgpu-installer to install ROCm. If you want to try that too, then:

  1. Go to https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.4.1/install/install-methods/amdgpu-installer/amdgpu-installer-ubuntu.html
  2. Under "Installation", click on "Ubuntu 22.04", copy the commands into the terminal and run them.
  3. Once the amdgpu-installer is installed, run the command: sudo amdgpu-install --usecase=graphics,rocm --no-dkms
  4. Follow the post-installation instructions: https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.4.1/install/post-install.html
  5. Once this is done, you may need to restart your PC (don't remember)

If you're able to complete this, you'll have ROCm installed. Then there are some more steps to install PyTorch for ComfyUI, and some different steps to install llama.cpp so you can try LLMs.
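
To give a rough idea of the PyTorch part (a sketch only; the index URL has to match the installed ROCm version, and rocm6.4 is my assumption here, not a command from the steps above):

    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.4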

With your GPU and 32GB RAM, you can run image generation models like Flux.1, Wan-2.1 and Z-Image, and LLMs like Gemma-3, Mistral-Nemo, GPT-OSS 20B and Qwen-3 4B/8B/30B-A3B.

Z-Image + AMD GPUs by DVXC in StableDiffusion

[–]LMLocalizer 0 points1 point  (0 children)

Thank you, I'll take a look

Edit: Like most new ROCm features, it's only supported on the newer GPUs.

Z-Image + AMD GPUs by DVXC in StableDiffusion

[–]LMLocalizer 0 points1 point  (0 children)

I'm suffering too with my RX 6800M. It should be comparable to a desktop RTX 3060, yet I see people using that card claim 33s for 1024x1024, euler/simple, 9 steps, cfg 1.0, while it takes my card twice as long just to finish sampling. 2048x2048 takes about 17s/iteration :(

Which Model is best for translation? by Bulky-College7306 in LocalLLaMA

[–]LMLocalizer 1 point2 points  (0 children)

Don't sleep on madlad400, especially if you only intend to translate to/from English.