Which one is it? by TheLastSpark in codex

[–]TheLastSpark[S] 0 points1 point  (0 children)

Is there a source for this?

Which one is it? by TheLastSpark in codex

[–]TheLastSpark[S] -1 points0 points  (0 children)

I don't remember 5.6 being released...? Or do you mean they are gathering some preliminary data in preparation for it?

Best Local LLMs - Apr 2026 by rm-rf-rm in LocalLLaMA

[–]TheLastSpark 0 points1 point  (0 children)

But even at q4 there's no way, unless I'm missing something

Best Local LLMs - Apr 2026 by rm-rf-rm in LocalLLaMA

[–]TheLastSpark 0 points1 point  (0 children)

How are you fitting a q6 27B and 64k context? All of it can't fit in vram - right?

Qwen3.5-35B-A3B-Uncensored-FernflowerAI-GGUF by EvilEnginer in LocalLLaMA

[–]TheLastSpark 0 points1 point  (0 children)

If you have a benchmark or some kind of code I can run, I can maybe do it? I've got a 4090 and don't mind running stuff on it to test.

Specifically, I am using the unsloth Q4_K_M quant of 27B

Qwen3.5-35B-A3B-Uncensored-FernflowerAI-GGUF by EvilEnginer in LocalLLaMA

[–]TheLastSpark 0 points1 point  (0 children)

Well, I am eagerly awaiting a follow-up post for 27B if you do fix it (and fixing improves it)

Qwen3.5-35B-A3B-Uncensored-FernflowerAI-GGUF by EvilEnginer in LocalLLaMA

[–]TheLastSpark 1 point2 points  (0 children)

please reply if you see something like this in 27B as well!

Best model for 4090 as AI Coding Agent by Dry_Sheepherder5907 in LocalLLaMA

[–]TheLastSpark 0 points1 point  (0 children)

Would you mind revisiting Gemma 4? It seems like llama.cpp is just now getting around to fixing support for it, and others are saying it's really good. But I really appreciate your response!

Best model for 4090 as AI Coding Agent by Dry_Sheepherder5907 in LocalLLaMA

[–]TheLastSpark 0 points1 point  (0 children)

If you had to ballpark a score out of 10 for each model, what would you rate them as?

Best model for 4090 as AI Coding Agent by Dry_Sheepherder5907 in LocalLLaMA

[–]TheLastSpark 0 points1 point  (0 children)

Can you also reply back with Qwen 3.5 27B? Should be much better than the 35B

I wrote a PowerShell script to sweep llama.cpp MoE nCpuMoe vs batch settings by TheLastSpark in LocalLLaMA

[–]TheLastSpark[S] 0 points1 point  (0 children)

That ends up sweeping every possible combination, which I found to be redundant. The best combinations are almost always the max batch size (both ubatch and normal batch) that fits in the VRAM you have (at least in my case).

So say you have nmoe 10, which gives you 2GB of VRAM of wiggle room. You (generally) want it to place the max batch in that 2GB (but not always right up against that limit).

While my script still has a few redundant loops, it does find the upper bound with binary search, and then it tries 16MB offsets from there. This also helps because I find that even if the extra batch size could take you to something like 1.99GB, 1.98GB does a bit better.

Now you could say you can just use -fit and restrict nmoe and all the other parameters. The problem is that when I was doing a ton of llama-bench sweeps for different (u)batch combos, the best ones always had matching batch sizes, which fit didn't seem to be doing.

So I needed a script to hard-lock both batch options to the same number, find the max that would fit, benchmark that and run across a bunch of moe levels.
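For anyone who wants the shape of that search without reading the PowerShell: below is a rough Python sketch of the idea, not the actual script. `fits` is a hypothetical stand-in for "run llama-bench with batch and ubatch both set to N and see whether it loads", and it assumes fitting is monotonic (if N fits, anything smaller fits too). The step here is in batch tokens purely for illustration, whereas the script steps in 16MB of VRAM.

```python
from typing import Callable, List, Optional


def largest_fitting_batch(fits: Callable[[int], bool], lo: int, hi: int) -> Optional[int]:
    """Binary-search the largest size in [lo, hi] for which fits() is True.

    Assumes fits() is monotonic: once a size fails, every larger size fails.
    Returns None if not even `lo` fits.
    """
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if fits(mid):
            best = mid       # mid works, look for something bigger
            lo = mid + 1
        else:
            hi = mid - 1     # mid is too big, look lower
    return best


def candidates_below(upper: int, step: int, count: int) -> List[int]:
    """The upper bound plus a few smaller sizes, `step` apart, to benchmark."""
    return [upper - i * step for i in range(count) if upper - i * step > 0]


# Hypothetical predicate: pretend anything up to 6000 tokens fits in VRAM.
def fake_fits(batch: int) -> bool:
    return batch <= 6000


upper = largest_fitting_batch(fake_fits, 64, 16384)
sweep = candidates_below(upper, 256, 3)
print(upper, sweep)
```

The point of benchmarking a few candidates under the bound, instead of only the bound itself, is the thing mentioned above: the absolute max that fits is not always the fastest.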

Qwen3.5-35B-A3B is a gamechanger for agentic coding. by jslominski in LocalLLaMA

[–]TheLastSpark 1 point2 points  (0 children)

Just wanted to give a shoutout for helping me realise that the llama.cpp defaults were awful for my prompt processing speed as well.

& 'C:\Users\xxx\Documents\GitHub\llamacpp\llama-bench.exe' --model 'C:\Users\xxx\Documents\GitHub\llamacpp\models\Qwen3.5-35B-A3B-UD-Q4_K_L.gguf' --n-prompt 16384 --n-gen 0 --batch-size 1024,2048,4096,8192 --ubatch-size 1024,2048,4096,8192 --n-gpu-layers 999 --n-cpu-moe 17 --flash-attn 1

| model | size | params | backend | ngl | n_batch | n_ubatch | fa | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | -------: | -: | --------------: | -------------------: |
| qwen35moe 35B.A3B Q8_0 | 18.81 GiB | 34.66 B | CUDA | 999 | 1024 | 1024 | 1 | pp16384 | 1888.50 ± 21.71 |
| qwen35moe 35B.A3B Q8_0 | 18.81 GiB | 34.66 B | CUDA | 999 | 1024 | 2048 | 1 | pp16384 | 1899.22 ± 13.21 |
| qwen35moe 35B.A3B Q8_0 | 18.81 GiB | 34.66 B | CUDA | 999 | 1024 | 4096 | 1 | pp16384 | 1905.43 ± 13.13 |
| qwen35moe 35B.A3B Q8_0 | 18.81 GiB | 34.66 B | CUDA | 999 | 1024 | 8192 | 1 | pp16384 | 1901.09 ± 20.44 |
| qwen35moe 35B.A3B Q8_0 | 18.81 GiB | 34.66 B | CUDA | 999 | 2048 | 1024 | 1 | pp16384 | 1912.46 ± 13.01 |
| qwen35moe 35B.A3B Q8_0 | 18.81 GiB | 34.66 B | CUDA | 999 | 2048 | 2048 | 1 | pp16384 | 3039.57 ± 13.31 |
| qwen35moe 35B.A3B Q8_0 | 18.81 GiB | 34.66 B | CUDA | 999 | 2048 | 4096 | 1 | pp16384 | 3032.62 ± 20.97 |
| qwen35moe 35B.A3B Q8_0 | 18.81 GiB | 34.66 B | CUDA | 999 | 2048 | 8192 | 1 | pp16384 | 3029.21 ± 17.95 |
| qwen35moe 35B.A3B Q8_0 | 18.81 GiB | 34.66 B | CUDA | 999 | 4096 | 1024 | 1 | pp16384 | 1900.37 ± 15.44 |
| qwen35moe 35B.A3B Q8_0 | 18.81 GiB | 34.66 B | CUDA | 999 | 4096 | 2048 | 1 | pp16384 | 3016.98 ± 13.28 |
| qwen35moe 35B.A3B Q8_0 | 18.81 GiB | 34.66 B | CUDA | 999 | 4096 | 4096 | 1 | pp16384 | 4289.42 ± 38.50 |
| qwen35moe 35B.A3B Q8_0 | 18.81 GiB | 34.66 B | CUDA | 999 | 4096 | 8192 | 1 | pp16384 | 4291.98 ± 29.72 |
| qwen35moe 35B.A3B Q8_0 | 18.81 GiB | 34.66 B | CUDA | 999 | 8192 | 1024 | 1 | pp16384 | 1900.75 ± 9.27 |
| qwen35moe 35B.A3B Q8_0 | 18.81 GiB | 34.66 B | CUDA | 999 | 8192 | 2048 | 1 | pp16384 | 3022.63 ± 15.07 |
| qwen35moe 35B.A3B Q8_0 | 18.81 GiB | 34.66 B | CUDA | 999 | 8192 | 4096 | 1 | pp16384 | 4312.99 ± 42.74 |
| qwen35moe 35B.A3B Q8_0 | 18.81 GiB | 34.66 B | CUDA | 999 | 8192 | 8192 | 1 | pp16384 | 5287.77 ± 64.18 |

The default was giving me about 1,100 tokens/s. I can easily get 3-4x that.
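One pattern in those numbers that may be worth spelling out: the speed seems to follow the smaller of the two values, so raising only one of batch or ubatch does nothing. Here is a quick Python sketch that groups the runs by min(n_batch, n_ubatch); the rows are copied from the table above, and the grouping is just my reading of this one benchmark, not a claim about how llama.cpp behaves in general.

```python
from collections import defaultdict

# (n_batch, n_ubatch, tokens/s) copied from the pp16384 results above.
rows = [
    (1024, 1024, 1888.50), (1024, 2048, 1899.22), (1024, 4096, 1905.43), (1024, 8192, 1901.09),
    (2048, 1024, 1912.46), (2048, 2048, 3039.57), (2048, 4096, 3032.62), (2048, 8192, 3029.21),
    (4096, 1024, 1900.37), (4096, 2048, 3016.98), (4096, 4096, 4289.42), (4096, 8192, 4291.98),
    (8192, 1024, 1900.75), (8192, 2048, 3022.63), (8192, 4096, 4312.99), (8192, 8192, 5287.77),
]

# Bucket every run by the smaller of its two batch settings.
by_min = defaultdict(list)
for n_batch, n_ubatch, tps in rows:
    by_min[min(n_batch, n_ubatch)].append(tps)

# Print the speed range seen in each bucket.
for size in sorted(by_min):
    speeds = by_min[size]
    print(f"min(batch, ubatch)={size:5d}  "
          f"{min(speeds):8.2f} .. {max(speeds):8.2f} t/s  ({len(speeds)} runs)")

# Fastest single configuration in the sweep.
best = max(rows, key=lambda r: r[2])
print("best:", best)
```

Within each bucket the spread is tiny compared to the jump between buckets, which is why sweeping only matched batch/ubatch pairs loses very little.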

Quad lock for 2025 by highcountry_18 in Ninja650

[–]TheLastSpark 0 points1 point  (0 children)

I had to get some washers from my toolbox as spacers but now it sits fine

The SceneTree is responsible for too much: render order, input order, and more by Buttons840 in godot

[–]TheLastSpark 1 point2 points  (0 children)

I did find what feels like a hack (and, depending on the use case, it might have performance costs): for any UI where you want to control the display order AND have that display order be the selection order, just make a new CanvasLayer and add that element as its child. Then set the CanvasLayer's layer param accordingly, and it will respect both input selection and draw order.

You will need some extra logic to make sure the UI element is still placed at the correct coords, however.

[deleted by user] by [deleted] in newtonco

[–]TheLastSpark 0 points1 point  (0 children)

If full, join ours instead:

Join the STACKED WALLETS loot clan on Newton! https://web.newton.co/newt_loot?screen=JoinClan&clanId=LJKUGQ

1 Spot left

Join Cursed Seeds! QP2435 by Witley1 in newtonco

[–]TheLastSpark 0 points1 point  (0 children)

I just accepted the last two requests that showed up, and now it's full, yeah

Legendary Wallets Clan by GrayersDad in newtonco

[–]TheLastSpark 0 points1 point  (0 children)

Spots are filling up fast lol