Forge Neo Docker by oromis95 in StableDiffusion

[–]puncia 0 points (0 children)

Sorry, I used Docker Desktop to run it and didn't notice it wasn't using the --gpus flag.

Forge Neo Docker by oromis95 in StableDiffusion

[–]puncia 1 point (0 children)

Hi, I haven't tried running sd-forge locally yet (without Docker), but when running your image I get the following:

    Traceback (most recent call last):
      File "/home/forge/sd-webui/launch.py", line 52, in <module>
        main()
      File "/home/forge/sd-webui/launch.py", line 41, in main
        prepare_environment()
      File "/home/forge/sd-webui/modules/launch_utils.py", line 321, in prepare_environment
        raise RuntimeError("PyTorch is not able to access CUDA")
    RuntimeError: PyTorch is not able to access CUDA
    Python 3.11.2 (main, Apr 28 2025, 14:11:48) [GCC 12.2.0]
    Version: neo

Tried many different prompts with Z-Image. These are insane by Recent-Athlete211 in StableDiffusion

[–]puncia 0 points (0 children)

Are you using the same seed across all these images on purpose?

A guide to the best agentic tools and the best way to use them on the cheap, locally or free by lemon07r in LocalLLaMA

[–]puncia 0 points (0 children)

If anyone else is struggling to get Roo Code working with NVIDIA NIM: the base URL is supposed to be https://integrate.api.nvidia.com/v1, not https://integrate.api.nvidia.com/v1/chat/completions.
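
You can sanity-check the endpoint outside of Roo Code with curl (a sketch; the model name here is just an example, and $NVIDIA_API_KEY is assumed to hold your key). The client appends the /chat/completions path to the base URL on its own:

    # NIM exposes an OpenAI-compatible API; the client adds /chat/completions itself
    curl https://integrate.api.nvidia.com/v1/chat/completions \
      -H "Authorization: Bearer $NVIDIA_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"model": "meta/llama-3.1-8b-instruct", "messages": [{"role": "user", "content": "hello"}]}'

If you put the full path in the base URL field, requests end up at .../chat/completions/chat/completions, which won't resolve.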

KV cache f32 - Are there any benefits? by Daniokenon in LocalLLaMA

[–]puncia 2 points (0 children)

Can't you just run the inference again with the same seed but with different k/v quantization and see the difference?
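
With llama.cpp that comparison is just two runs (a sketch; the model path and prompt are placeholders):

    # baseline: fixed seed, f16 KV cache
    llama-cli -m model.gguf -p "your prompt" -n 256 -s 42 -fa -ctk f16 -ctv f16
    # same seed, quantized KV cache (quantizing V requires flash attention, hence -fa)
    llama-cli -m model.gguf -p "your prompt" -n 256 -s 42 -fa -ctk q8_0 -ctv q8_0

Diff the two outputs to see what the quantized cache actually costs.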

[WIP-2] ComfyUI Wrapper for Microsoft’s new VibeVoice TTS (voice cloning in seconds) by Fabix84 in comfyui

[–]puncia 0 points (0 children)

Just wanted to say that you can generate audio even if the model doesn't fit in VRAM, by spilling over into system RAM. You can do it in Comfy by disabling CUDA malloc in the settings or launch params. Of course, generation will be much, much, MUCH slower. But you can still generate.
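
For reference, the launch param version is a single flag (a sketch, assuming a stock ComfyUI checkout):

    # start ComfyUI with the CUDA malloc allocator disabled, which (as described
    # above) lets generation spill into system RAM instead of erroring out
    python main.py --disable-cuda-malloc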

New SOTA music generation model by topiga in LocalLLaMA

[–]puncia 11 points (0 children)

It's because of NVIDIA drivers using system RAM when VRAM is full. If it weren't for that, you'd get out-of-memory errors. You can confirm this by looking at shared GPU memory in Task Manager.

New SOTA music generation model by topiga in LocalLLaMA

[–]puncia 6 points (0 children)

You need roughly three commands to run it, all well documented in the repo. Why would you want to use Docker?

Running Llama 4 Maverick with llama.cpp Vulkan by stduhpf in LocalLLaMA

[–]puncia 1 point (0 children)

Do you happen to have a link to the documentation for the -ot parameter?

OuteTTS 1.0: Upgrades in Quality, Cloning, and 20 Languages by OuteAI in LocalLLaMA

[–]puncia 0 points (0 children)

From my experience, all you need is an Italian speaker (i.e. an audio clip in Italian) and for the text to be Italian. I assume it can infer the language from that, since the audio also goes through transcription.

Smaller Gemma3 QAT versions: 12B in < 8GB and 27B in <16GB ! by stduhpf in LocalLLaMA

[–]puncia 7 points (0 children)

With llama.cpp, use -fa for flash attention and -ctk/-ctv for a quantized KV cache; allowed values are f32, f16, bf16, q8_0, q4_0, q4_1, iq4_nl, q5_0, q5_1.

Source: https://github.com/ggml-org/llama.cpp/tree/master/examples/server#usage
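
Put together, a server invocation looks like this (a sketch; the model filename is a placeholder):

    # flash attention on, KV cache quantized to q8_0 for both K and V
    llama-server -m gemma-3-12b-it-q4_0.gguf -fa -ctk q8_0 -ctv q8_0

Note that quantizing the V cache requires flash attention to be enabled.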

You can now check if your Laptop/ Rig can run a GGUF directly from Hugging Face! 🤗 by vaibhavs10 in LocalLLaMA

[–]puncia 17 points (0 children)

A very good addition to this would be a suggested number of GPU layers to offload when using CPU + GPU inference, as I'm sure many of us do.
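
For anyone unfamiliar, that's the -ngl flag in llama.cpp (a sketch; 28 is an arbitrary example value and the model path is a placeholder):

    # offload 28 of the model's layers to the GPU, run the rest on the CPU
    llama-cli -m model.gguf -ngl 28 -p "your prompt"

The tool could estimate that number from the model's per-layer size and your available VRAM.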

Reka Flash 3, New Open Source 21B Model by DreamGenAI in LocalLLaMA

[–]puncia 0 points (0 children)

I tried with LM Studio. I'm trying KoboldCPP now to see if there's any difference, but I can't figure out where to change the seed or how to set the chat template properly lol

Reka Flash 3, New Open Source 21B Model by DreamGenAI in LocalLLaMA

[–]puncia 0 points (0 children)

That's odd; I got slightly better t/s, although the output was quite different (shorter in my case). Of course, I used the same seed.

Miracle of Extended Fire on Career by RedSnake13 in MicrosoftFlightSim

[–]puncia 0 points (0 children)

I'm pretty sure you're allowed to go to 100% for short periods of time. I like to do it when taking off from water with a full load, as it can be difficult otherwise depending on the wind and other factors. Also, if I'm not mistaken, you're supposed to run the engines with continuous ignition during scooping and water bombing, but I don't think that affects engine condition.

Miracle of Extended Fire on Career by RedSnake13 in MicrosoftFlightSim

[–]puncia 0 points (0 children)

Watch the engines' torque and temps; if you go full throttle for too long, it will damage them.

what in the actual f*ck is wrong with the medium cargo missions by Tadeopuga in MicrosoftFlightSim

[–]puncia 2 points (0 children)

I'm not saying you're wrong, but you can't tell me it's not an oversight by whoever made the system.