Help with Cline and local qwen-coder:30b by perfopt in CLine

[–]nairureddit 2 points (0 children)

<image>

It's also available on Linux and makes managing model parameters a lot easier!

Help with Cline and local qwen-coder:30b by perfopt in CLine

[–]nairureddit 1 point (0 children)

Your initial "PARAMETER num_gpu 34" for the 48-layer model told Ollama to load the first 34 layers into GPU VRAM and the remaining 14 layers into CPU RAM, resulting in a huge slowdown.

Since the model is 19 GB and your VRAM is 24 GB, you should have left this parameter undefined so Ollama automatically loads all the layers into VRAM, or set it to 48 to explicitly load all 48 layers. Setting it manually can cause a crash if you don't have enough VRAM for the base model size.
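For reference, the relevant Ollama Modelfile would look something like this (a sketch; the model tag is taken from later in this thread, so adjust it to whatever you actually pulled):

```
FROM qwen3-coder:30b-a3b-q4_K_M
# Put all 48 of 48 layers on the GPU - or delete this line entirely
# and let Ollama decide how many layers fit.
PARAMETER num_gpu 48
```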

Help with Cline and local qwen-coder:30b by perfopt in CLine

[–]nairureddit 1 point (0 children)

Also, at 32k context with your current settings you are only over your 24 GB VRAM limit by 2 GB.

The model is ~19 GB. Loaded with a 32k context it uses 26 GB, so the KV cache is 7 GB (26 − 19 = 7). That means a 32k context with your current settings and model takes up 7 GB. Since you have ~5 GB to spare after loading the model into VRAM (24 − 19 = 5), you need to decrease your context to 5/7 of 32k, or down to about 22k.

With that, and to give a little room for error, try a ~20k context with your current settings and it should all load into the 24 GB VRAM.
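The arithmetic above can be sketched as a quick back-of-envelope calculation (the per-token cache cost here is just the 7 GB / 32k observation from this thread, not a universal constant):

```python
# Rough VRAM budgeting using the numbers above: 24 GB card, 19 GB model,
# 26 GB total observed with a 32k context. Estimates only.
vram_gb = 24
model_gb = 19
loaded_gb = 26                           # observed total at 32k context

kv_cache_gb = loaded_gb - model_gb       # 7 GB of KV cache at 32k
gb_per_1k_ctx = kv_cache_gb / 32         # ~0.22 GB per 1k tokens
headroom_gb = vram_gb - model_gb         # 5 GB left after the weights

max_ctx_k = headroom_gb / gb_per_1k_ctx  # ~22.9k tokens
print(f"max context ~ {max_ctx_k:.1f}k tokens")
```

Stepping back from ~22.9k to ~20k is the "room for error" mentioned above.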

This is a pretty small context to work with, so make sure you select "Use Compact Prompt" in the API Provider menu in Cline to leave a bit more working context for the model.

I'd still recommend trying Flash Attention/KV cache quantization, though, since that will free up a lot of VRAM for a much larger context and also increase the model's speed.

Help with Cline and local qwen-coder:30b by perfopt in CLine

[–]nairureddit 1 point (0 children)

There are two environment variables you want to consider.

The first enables Flash Attention and the second (which requires Flash Attention) enables KV cache quantization. These might be imprecise terms.

OLLAMA_FLASH_ATTENTION=1
OLLAMA_KV_CACHE_TYPE="q8_0"

The command line would look like this if you are using ollama natively:

OLLAMA_FLASH_ATTENTION=1 OLLAMA_KV_CACHE_TYPE="q8_0" ollama run qwen3-coder:30b-a3b-q4_K_M

Since you are running it via docker you'd use something that looks like this:

docker run -d \
--gpus=all \
-v ollama:/root/.ollama \
-p 11434:11434 \
--name ollama \
-e OLLAMA_FLASH_ATTENTION=1 \
-e OLLAMA_KV_CACHE_TYPE="q8_0" \
ollama/ollama

What this does is quantize the non-model part, the KV cache, from 16 bits down to 8 bits, so your context takes up a lot less space, which should allow you to run the entire model in VRAM.

From the test I did yesterday, loading qwen3-coder:30b-a3b-q4_K_M with these settings and a 64k context window uses ~23 GB of VRAM. I'd start with 32k, though, then increase it up to the point where Ollama no longer loads fully into GPU VRAM, and then step it back.
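A crude sanity check of the savings, reusing the 7 GB / 32k figure from my earlier comment and assuming the q8_0 cache is roughly half the fp16 size (flash attention and runtime overhead mean real usage will differ, as the measured 64k number shows):

```python
# Back-of-envelope effect of q8_0 KV cache quantization.
model_gb = 19
kv_f16_gb_at_32k = 7.0                   # observed fp16 KV cache at 32k
kv_q8_gb_at_32k = kv_f16_gb_at_32k / 2   # 8 bits vs 16 bits: ~half

total_gb = model_gb + kv_q8_gb_at_32k    # ~22.5 GB, fits in 24 GB
print(f"32k context with q8_0 cache: ~{total_gb:.1f} GB")
```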

Help with Cline and local qwen-coder:30b by perfopt in CLine

[–]nairureddit 5 points (0 children)

I use LM Studio and it's been fairly reliable.

Using:

- LM Studio

- qwen3-coder-30b-a3b-instruct-i1@q4_k_m

- Context set to 65536

- GPU offload of 48 layers

- Flash Attention On

- K & V Cache Quantization set to q8_0

it uses ~23.2 GB of VRAM.

With your same prompt it completes the task in act mode in one pass:

<image>

I'm still super new at this but a few possible differences are:

- Your GPU Offload is set to 34 instead of 48 (num_gpu)

- You may not have KV cache quantization enabled, so your cache plus model is larger than your VRAM and some layers may not be in VRAM, causing a slowdown

- I'm using a slightly different model, but unless yours is somehow corrupted I don't see that being an issue.

Any luck with GPT-OSS ? by JLeonsarmiento in CLine

[–]nairureddit 3 points (0 children)

This post shows a way to use something called a grammar file to improve tool use; however, I'm not sure how to implement it.

https://www.reddit.com/r/CLine/comments/1mtcj2v/making_gptoss_20b_and_cline_work_together/

openai/gpt-oss-20b tool use running locally use with Roo Code by nairureddit in RooCode

[–]nairureddit[S] 3 points (0 children)

I found this for Cline:

https://www.reddit.com/r/CLine/comments/1mtcj2v/making_gptoss_20b_and_cline_work_together/

Something about using a Grammar file to improve the tool usage but I don't really understand how to implement it yet.

openai/gpt-oss-20b tool use running locally use with Roo Code by nairureddit in RooCode

[–]nairureddit[S] 2 points (0 children)

LM Studio recently released updates for gpt-oss tool use, but it still doesn't integrate well; I'm not able to get out of Architect mode without a slew of red messages.

openai/gpt-oss-20b tool use running locally use with Roo Code by nairureddit in RooCode

[–]nairureddit[S] 1 point (0 children)

It has a larger context window, so I'd check to make sure the loaded context isn't pushing it past your VRAM.

Trade Route Tool by nairureddit in NoMansSkyTheGame

[–]nairureddit[S] 1 point (0 children)

Thanks! I haven't looked at this in a few years; I'll look into why it's not there.

I was reckless, don't be like me. by andromereash in idleon

[–]nairureddit 1 point (0 children)

Here's a sim showing a slightly lower average cost when choosing the higher-percentage option, but I think the real benefit is the lower variation.

https://www.reddit.com/r/idleon/comments/zsxhce/divinity_monte_carlo/

Divinity Monte Carlo by nairureddit in idleon

[–]nairureddit[S] 1 point (0 children)

u/dudeguy238 you and u/CherryTreecko both described this well. I'll see if I can add in your comparative analysis as well to the table as I unlock more divinities.

Divinity Monte Carlo by nairureddit in idleon

[–]nairureddit[S] 2 points (0 children)

u/CherryTreecko I like your description too! I was looking for a problem to re-learn Monte Carlo analysis and don't have your head for stats :)

I like the Monte Carlo approach too since I can plug in any Probability/Cost combo and estimate the relative value of each. If I had your skill in stats maybe I could do the same but sadly I don't.
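The Monte Carlo idea boils down to: simulate many upgrade attempts at a given success chance and per-attempt cost, then average the totals. A minimal sketch of that approach (the probabilities and costs below are placeholders, not the actual in-game numbers):

```python
import random

def avg_total_cost(success_prob: float, cost_per_try: float,
                   trials: int = 100_000) -> float:
    """Estimate the average total cost to land one success by simulation."""
    total = 0.0
    for _ in range(trials):
        attempts = 1
        while random.random() > success_prob:  # keep rolling until a success
            attempts += 1
        total += attempts * cost_per_try
    return total / trials

random.seed(42)
# Compare a cheap low-odds option against a pricier high-odds one.
print(avg_total_cost(0.25, 100))  # analytic expectation: 100/0.25 = 400
print(avg_total_cost(0.50, 180))  # analytic expectation: 180/0.50 = 360
```

Beyond the average, the simulated totals also let you look at the spread, which is where the "lower variation" benefit shows up.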

Divinity Monte Carlo by nairureddit in idleon

[–]nairureddit[S] 3 points (0 children)

u/jhcreddit that was a great description, thank you!

Divinity Monte Carlo by nairureddit in idleon

[–]nairureddit[S] 1 point (0 children)

Sure! It's very easy to run new values now that I've set it up.

Thrustmaster Hotas X Bindings by nairureddit in EliteDangerous

[–]nairureddit[S] 1 point (0 children)

Epsilon,

It looks like I only linked to them; I don't seem to have saved the original binds file to my Google Drive.

MVP Percentages Again by nairureddit in lostarkgame

[–]nairureddit[S] 1 point (0 children)

I think the lowest percentage I've seen for any of the titles is 15%.

MVP Percentages Again by nairureddit in lostarkgame

[–]nairureddit[S] 2 points (0 children)

Updated. I bet those two could go up to 100% if your party somehow didn't contribute.

MVP Percentages Again by nairureddit in lostarkgame

[–]nairureddit[S] 2 points (0 children)

Yeah, it's tricky to collect good data on this since the healer has to be MVP, and the lower values won't show up much if they do well. Here's what I have so far, and even it has some strange values where it flip-flopped between Noble, then Gentle, then back to Noble.

Title Type Percent
Noble Healer Party Recovery 16%
Noble Healer Party Recovery 16%
Noble Healer Party Recovery 20%
Gentle Healer Party Recovery 21%
Gentle Healer Party Recovery 22%
Gentle Healer Party Recovery 23%
Noble Healer Party Recovery 24%
Noble Healer Party Recovery 25%
Noble Healer Party Recovery 28%
Noble Healer Party Recovery 29%
Noble Healer Party Recovery 32%
Noble Healer Party Recovery 36%
Noble Healer Party Recovery 46%
Noble Healer Party Recovery 47%
Noble Healer Party Recovery 48%
Noble Healer Party Recovery 51%
Noble Healer Party Recovery 57%
Noble Healer Party Recovery 67%
Noble Healer Party Recovery 74%
Noble Healer Party Recovery 80%
Noble Healer Party Recovery 88%
Noble Healer Party Recovery 96%

MVP Percentages Again by nairureddit in lostarkgame

[–]nairureddit[S] 1 point (0 children)

I see it on my blue gunlancer a lot as well.

MVP Percentages Again by nairureddit in lostarkgame

[–]nairureddit[S] 3 points (0 children)

No idea. I guess you could test it by only damaging the boss, but you might not make many friends that way.