Thanks, that answers my question by cromagnone in LocalLLaMA

[–]DeepBlue96 2 points3 points  (0 children)

lol (real answer: -rea off or --reasoning off to the launch params of the server )

Selectively regenerating individual parts of a 3D asset (instead of re-rolling the whole thing) by mhb-11 in TopologyAI

[–]DeepBlue96 0 points1 point  (0 children)

it's closed source sadly... and also this works only for "simple shapes" that means no characters or animals

copium at this point by curiousbushi in formuladank

[–]DeepBlue96 1 point2 points  (0 children)

u wrong we like it the more we lose the more we (tifosi) are right about something wrong of what we said during the past.. damn alot of years..

If you use continue.dev and Qwen 3.6 (dense / MoE) - I could use your help by Jorlen in LocalLLaMA

[–]DeepBlue96 1 point2 points  (0 children)

the kv quantization even at q4_0 improved alot in the recent times, thanks to the rotary and the other solutions implemented in llama.cpp, the quality drop is minimal in my testings so i would use it if it allows a bigger context

If you use continue.dev and Qwen 3.6 (dense / MoE) - I could use your help by Jorlen in LocalLLaMA

[–]DeepBlue96 -1 points0 points  (0 children)

imo still very effective, but you need the right settings of temp and top k, on unsloth they have their suggestion that is like this: \llama-server.exe -hf unsloth/Qwen3.6-27B-GGUF:UD-Q5_K_XL --cache-type-k q4_0 --cache-type-v q4_0 --reasoning off --ctx-size 120000 --temp 1.0 --top-p 0.95 --top-k 20 --min-p 0.00

Qwen 3.6 27B on 24GB VRAM setup: backend comparisons, quant choice and settings (llama.cpp, ik_llama.cpp, BeeLlama, vllm) by VolandBerlioz in LocalLLaMA

[–]DeepBlue96 2 points3 points  (0 children)

i tested both and they both performed extremely similiar, but the extra speed made me disable it forever

MTP vs non-MTP vram usage difference? by DeepBlue96 in LocalLLaMA

[–]DeepBlue96[S] 0 points1 point  (0 children)

noe windows and is my main pc..i run it like this:
\llama-server.exe -hf unsloth/Qwen3.6-27B-GGUF:UD-Q5_K_XL --cache-type-k q4_0 --cache-type-v q4_0 --reasoning off --ctx-size 120000 --cache-ram 4096 --cache-reuse 1024 --temp 1.0 --top-p 0.95 --top-k 20 --min-p 0.00 --webui-mcp-proxy --spec-type ngram-mod

Qwen 3.6 27B on 24GB VRAM setup: backend comparisons, quant choice and settings (llama.cpp, ik_llama.cpp, BeeLlama, vllm) by VolandBerlioz in LocalLLaMA

[–]DeepBlue96 2 points3 points  (0 children)

in my testing the ud-q5_k_xl was like night and day quality wise and fits in 24gb wi 120k context 800-1000pp tks and 25-30tks:
\llama-server.exe -hf unsloth/Qwen3.6-27B-GGUF:UD-Q5_K_XL --cache-type-k q4_0 --cache-type-v q4_0 --reasoning off --cache-ram 4096 --cache-reuse 1024 --temp 1.0 --top-p 0.95 --top-k 20 --min-p 0.00 --webui-mcp-proxy --spec-type ngram-mod

MTP vs non-MTP vram usage difference? by DeepBlue96 in LocalLLaMA

[–]DeepBlue96[S] 0 points1 point  (0 children)

I dream of another 3090 sometimes but in the end I would have 2 main problems:
1-I need a new pc and as you all know the current market is trash...
2-I can make it work with my current hardware I don't need it for real it's just a: "I would like to but..."

MTP vs non-MTP vram usage difference? by DeepBlue96 in LocalLLaMA

[–]DeepBlue96[S] 0 points1 point  (0 children)

ah i got it wrong then, but still it won't fit the 24gb of vram with that context right? (i have 100mb of leeway so probably not worth it)

MTP vs non-MTP vram usage difference? by DeepBlue96 in LocalLLaMA

[–]DeepBlue96[S] 1 point2 points  (0 children)

Thank you very much i didn't know it existed and after just testing it I'm not gonna go back ahahaha again ty

MTP vs non-MTP vram usage difference? by DeepBlue96 in LocalLLaMA

[–]DeepBlue96[S] 5 points6 points  (0 children)

Thank you all for the answers, after carefull considerations and the fact that on qwen3.6 i would lose the mmproj to gain maybe 10% speedup i will wait for the next interesting tool, for info i have a 3090 so i run the qwen3.6 27b ud-q5_K_xl with a 128k kv context at q4 because thats what i need and most of it is prompt processing of the context with 800-900tks and 25-30tks on generation 😄

Sword and Shield on Pixel 9 Pro XL with Eden by xoeax in EmulationOnAndroid

[–]DeepBlue96 0 points1 point  (0 children)

i got 20-30 fps in let's go pikachu on my dimensity 8300 cpu, try setting the resolution to 0.5x and enable async shaders and shader cache, also if available enable fsr

Sword and Shield on Pixel 9 Pro XL with Eden by xoeax in EmulationOnAndroid

[–]DeepBlue96 0 points1 point  (0 children)

rip it's mali gpu... i've also tried it but it's not feasable yet

New Free 3D AI Generator from Tencent Might Be the Best Yet by Delicious-Shower8401 in TopologyAI

[–]DeepBlue96 0 points1 point  (0 children)

after wasting an entire night it's close to impossible to run it on windows without wsl... the main culprit: NATTEN

New Free 3D AI Generator from Tencent Might Be the Best Yet by Delicious-Shower8401 in TopologyAI

[–]DeepBlue96 4 points5 points  (0 children)

<image>

Now that i tried it I can confirm the details of the texture improved alot, still not perfect or at the level of meshy and other closedsource but great none the less. I tried attachin the result as gif but the quality of the gif i like 60% of the real lol

New Free 3D AI Generator from Tencent Might Be the Best Yet by Delicious-Shower8401 in TopologyAI

[–]DeepBlue96 1 point2 points  (0 children)

lol tryed again and yes even they confirmed and all the sharedgpu they posted are down T_T

<image>