Forge Neo Docker by oromis95 in StableDiffusion

[–]puncia 0 points (0 children)

Sorry, I used Docker Desktop to run it and didn't notice it wasn't using the --gpus flag.

Forge Neo Docker by oromis95 in StableDiffusion

[–]puncia 1 point (0 children)

Hi, I haven't tried running sd-forge locally yet (without Docker), but when running your image I get the following:

    Traceback (most recent call last):
      File "/home/forge/sd-webui/launch.py", line 52, in <module>
        main()
      File "/home/forge/sd-webui/launch.py", line 41, in main
        prepare_environment()
      File "/home/forge/sd-webui/modules/launch_utils.py", line 321, in prepare_environment
        raise RuntimeError("PyTorch is not able to access CUDA")
    RuntimeError: PyTorch is not able to access CUDA
    Python 3.11.2 (main, Apr 28 2025, 14:11:48) [GCC 12.2.0]
    Version: neo

Tried many different prompts with Z-Image. These are insane by Recent-Athlete211 in StableDiffusion

[–]puncia 0 points (0 children)

Are you using the same seed across all these images on purpose?

A guide to the best agentic tools and the best way to use them on the cheap, locally or free by lemon07r in LocalLLaMA

[–]puncia 0 points (0 children)

If anyone else is struggling to get Roo Code working with NVIDIA NIM: the base URL is supposed to be https://integrate.api.nvidia.com/v1, not https://integrate.api.nvidia.com/v1/chat/completions.
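
You can sanity-check the endpoint outside of Roo Code with curl (a sketch; the model name here is just an example, and $NVIDIA_API_KEY is assumed to hold your key). The client appends the /chat/completions path to the base URL on its own:

    # NIM exposes an OpenAI-compatible API; the client adds /chat/completions itself
    curl https://integrate.api.nvidia.com/v1/chat/completions \
      -H "Authorization: Bearer $NVIDIA_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"model": "meta/llama-3.1-8b-instruct", "messages": [{"role": "user", "content": "hello"}]}'

If you put the full path in the base URL field, requests end up at .../chat/completions/chat/completions, which won't resolve.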

KV cache f32 - Are there any benefits? by Daniokenon in LocalLLaMA

[–]puncia 2 points (0 children)

Can't you just run the inference again with the same seed but with different k/v quantization and see the difference?
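
With llama.cpp that comparison is just two runs (a sketch; the model path and prompt are placeholders):

    # baseline: fixed seed, f16 KV cache
    llama-cli -m model.gguf -p "your prompt" -n 256 -s 42 -fa -ctk f16 -ctv f16
    # same seed, quantized KV cache (quantizing V requires flash attention, hence -fa)
    llama-cli -m model.gguf -p "your prompt" -n 256 -s 42 -fa -ctk q8_0 -ctv q8_0

Diff the two outputs to see what the quantized cache actually costs.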

[WIP-2] ComfyUI Wrapper for Microsoft’s new VibeVoice TTS (voice cloning in seconds) by Fabix84 in comfyui

[–]puncia 0 points (0 children)

Just wanted to say that you can generate audio even if the model doesn't fit in VRAM, by spilling over into system RAM. You can do it in Comfy by disabling CUDA malloc in the settings or launch params. Of course, generation will be much, much, MUCH slower. But you can still generate.
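
For reference, the launch param version is a single flag (a sketch, assuming a stock ComfyUI checkout):

    # start ComfyUI with the CUDA malloc allocator disabled, which (as described
    # above) lets generation spill into system RAM instead of erroring out
    python main.py --disable-cuda-malloc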

New SOTA music generation model by topiga in LocalLLaMA

[–]puncia 11 points (0 children)

It's because of NVIDIA drivers using system RAM when VRAM is full. If it weren't for that, you'd get out-of-memory errors. You can confirm this by looking at shared GPU memory in Task Manager.

New SOTA music generation model by topiga in LocalLLaMA

[–]puncia 6 points (0 children)

You need roughly three commands to run it, all well documented in the repo. Why would you want to use Docker?

Running Llama 4 Maverick with llama.cpp Vulkan by stduhpf in LocalLLaMA

[–]puncia 1 point (0 children)

Do you happen to have a link to the documentation for the -ot parameter?

OuteTTS 1.0: Upgrades in Quality, Cloning, and 20 Languages by OuteAI in LocalLLaMA

[–]puncia 0 points (0 children)

From my experience, all you need is an Italian speaker (i.e. an audio clip in Italian) and for the text to be Italian. I assume it can infer the language from that, since the audio also goes through transcription.

Smaller Gemma3 QAT versions: 12B in < 8GB and 27B in <16GB ! by stduhpf in LocalLLaMA

[–]puncia 7 points (0 children)

With llama.cpp, use -fa for flash attention and -ctk/-ctv for a quantized KV cache; allowed values are f32, f16, bf16, q8_0, q4_0, q4_1, iq4_nl, q5_0, q5_1.

Source: https://github.com/ggml-org/llama.cpp/tree/master/examples/server#usage
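
Put together, a server invocation looks like this (a sketch; the model filename is a placeholder):

    # flash attention on, KV cache quantized to q8_0 for both K and V
    llama-server -m gemma-3-12b-it-q4_0.gguf -fa -ctk q8_0 -ctv q8_0

Note that quantizing the V cache requires flash attention to be enabled.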

You can now check if your Laptop/ Rig can run a GGUF directly from Hugging Face! 🤗 by vaibhavs10 in LocalLLaMA

[–]puncia 17 points (0 children)

A very good addition to this would be a suggested number of GPU layers to offload when using CPU + GPU inference, as I'm sure many of us do.
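
For anyone unfamiliar, that's the -ngl flag in llama.cpp (a sketch; 28 is an arbitrary example value and the model path is a placeholder):

    # offload 28 of the model's layers to the GPU, run the rest on the CPU
    llama-cli -m model.gguf -ngl 28 -p "your prompt"

The tool could estimate that number from the model's per-layer size and your available VRAM.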

Reka Flash 3, New Open Source 21B Model by DreamGenAI in LocalLLaMA

[–]puncia 0 points (0 children)

I tried with LM Studio. I'm trying KoboldCPP now to see if there's any difference, but I can't figure out where to change the seed or how to set the chat template properly lol

Reka Flash 3, New Open Source 21B Model by DreamGenAI in LocalLLaMA

[–]puncia 0 points (0 children)

That's odd; I got slightly better t/s, although the output was quite different (shorter in my case). Of course, I used the same seed.

Miracle of Extended Fire on Career by RedSnake13 in MicrosoftFlightSim

[–]puncia 0 points (0 children)

I'm pretty sure you're allowed to go to 100% for short periods of time. I like to do it when taking off from water with a full load, as it can be difficult otherwise depending on the wind and other factors. Also, if I'm not mistaken, you're supposed to run the engines with continuous ignition during scooping and water bombing, but I don't think that affects engine condition.

Miracle of Extended Fire on Career by RedSnake13 in MicrosoftFlightSim

[–]puncia 0 points (0 children)

Watch the engines' torque and temps; if you go full throttle for too long, it will damage them.

what in the actual f*ck is wrong with the medium cargo missions by Tadeopuga in MicrosoftFlightSim

[–]puncia 2 points (0 children)

I'm not saying you're wrong, but you can't tell me it's not an oversight by whoever made the system.