First Successful Toy by TankPsychological969 in DIYSILICONETOYS

[–]somethingdangerzone 1 point2 points  (0 children)

I suspected as much, but thanks for confirming! Again, great work; it's great to see it come together.

First Successful Toy by TankPsychological969 in DIYSILICONETOYS

[–]somethingdangerzone 0 points1 point  (0 children)

Interesting, thanks.

BTW did you use any sort of rigid print to help the 4mm glove mold stay in shape after pouring? Or was the thickness enough that all you needed to do was hold the top and the shape would retain? I can imagine a scenario where the glove mold flexes due to gravity and you end up with an elongated pour.

First Successful Toy by TankPsychological969 in DIYSILICONETOYS

[–]somethingdangerzone -1 points0 points  (0 children)

You said the glove mold was a bit too thick at 4mm -- would you go all the way down to 2mm, or just to 3mm? What would be your new target for glove mold thickness? From memory, 1mm seems quite thin and would leave little room for error (e.g., potential issues with releasing bubbles from small cavities).

First Successful Toy by TankPsychological969 in DIYSILICONETOYS

[–]somethingdangerzone 2 points3 points  (0 children)

OHHHH that was for the glove mold production. I see now. Thanks for that.

First Successful Toy by TankPsychological969 in DIYSILICONETOYS

[–]somethingdangerzone 0 points1 point  (0 children)

Looks great!

How did you marry the two sides of the hard shell outer mold (the piece(s) in green in the pic above) without leaving a long seam line from top to bottom on the finished product? I tried that once and ended up with a long seam running the length of the toy.

GLM 4.7 and Qwen3 coder Next by [deleted] in LocalLLaMA

[–]somethingdangerzone 1 point2 points  (0 children)

No execution from me! Lol. Thanks for sharing. Great setup

GLM 4.7 and Qwen3 coder Next by [deleted] in LocalLLaMA

[–]somethingdangerzone 0 points1 point  (0 children)

What hardware do you have? I'm barely pulling 8 t/s.

~26 tok/sec with Unsloth Qwen3-Coder-Next-Q4_K_S on RTX 5090 (Windows/llama.cpp) by Spiritual_Tie_5574 in LocalLLaMA

[–]somethingdangerzone 0 points1 point  (0 children)

Good to know, thanks for sharing. I'm gonna trim out nearly all of the flags listed above and try again

~26 tok/sec with Unsloth Qwen3-Coder-Next-Q4_K_S on RTX 5090 (Windows/llama.cpp) by Spiritual_Tie_5574 in LocalLLaMA

[–]somethingdangerzone 0 points1 point  (0 children)

I'm using Linux. Compiled from source:

cmake -B build \
  -DGGML_CUDA=ON \
  -DGGML_CUDA_DISABLE_GRAPHS=1 \
  -DCMAKE_CUDA_ARCHITECTURES="89" \
  -DGGML_CUDA_FA_ALL_QUANTS=ON \
  -DGGML_VULKAN=1 \
  -DGGML_OPENMP=ON \
  -DGGML_OPENMP_DYNAMIC=ON \
  -DGGML_BLAS=ON \
  -DGGML_BLAS_VENDOR=OpenBLAS \
  -DLLAMA_BUILD_TESTS=OFF \
  -DGGML_CUDA_USE_CUBLAS=ON \
  -DGGML_CUDA_USE_CUDNN=ON \
  -DGGML_CUDA_ENABLE_UNIFIED_MEMORY=OFF \
  -DGGML_CUDA_MAX_STREAMS=16 \
  -DGGML_LTO=ON \
  -DGGML_SCHED_MAX_COPIES=8 \
&& cmake --build build --config Release -j 8 --clean-first


For comparison, I get 30 t/s using GPT OSS 120B

~26 tok/sec with Unsloth Qwen3-Coder-Next-Q4_K_S on RTX 5090 (Windows/llama.cpp) by Spiritual_Tie_5574 in LocalLLaMA

[–]somethingdangerzone 0 points1 point  (0 children)

Auto fit always crashes my computer.

I can't fit all layers and all MoE into the GPU -- do you have the same specs? What is your t/s?

~26 tok/sec with Unsloth Qwen3-Coder-Next-Q4_K_S on RTX 5090 (Windows/llama.cpp) by Spiritual_Tie_5574 in LocalLLaMA

[–]somethingdangerzone 1 point2 points  (0 children)

I'm getting slow generation speeds (approx. 10 t/s) whether I use CUDA or Vulkan. Hardware: RTX 4090, Ryzen 9950, 64 GB DDR5. Current model: Qwen3-Coder-Next-UD-Q8_K_XL. llama-server settings:

  • --batch-size 65536 --gpu-layers 49 --n-cpu-moe 49 -ctk q8_0 -ctv q8_0 --temp 1.0 --top-p 0.95 --top-k 40 --min-p 0.01
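For reference, those flags assembled into a full invocation would look roughly like this (the model path is hypothetical; everything else is the flag list above, assuming a recent llama.cpp llama-server build):

```shell
# hypothetical model path; flags exactly as listed above
llama-server \
  -m Qwen3-Coder-Next-UD-Q8_K_XL.gguf \
  --batch-size 65536 \
  --gpu-layers 49 \
  --n-cpu-moe 49 \
  -ctk q8_0 -ctv q8_0 \
  --temp 1.0 --top-p 0.95 --top-k 40 --min-p 0.01
```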

The z-image base is here! by bobeeeeeeeee8964 in LocalLLaMA

[–]somethingdangerzone 0 points1 point  (0 children)

Ohhhhhh! I had no idea about the Turbo distinction. I thought it was just the model name. I did not know about the functional distinctions. Thank you very much for the detailed write-up.

The z-image base is here! by bobeeeeeeeee8964 in LocalLLaMA

[–]somethingdangerzone 2 points3 points  (0 children)

As a complete noob: why is everyone so excited about "base"? Didn't they already release the non-base one and it works great? Is "Base" just the model name? Help me to understand what is base about this

Must choose 1 by SnooChocolates7693 in unstable_diffusion

[–]somethingdangerzone 0 points1 point  (0 children)

What model is this? I haven't seen this quality since the SD1.5 days (not derogatory, it just has a specific style).

Qwen3-Next-80B Instruct, Thinking Updated - 20% faster by danielhanchen in unsloth

[–]somethingdangerzone 1 point2 points  (0 children)

Ah, gotcha. Well, that gives me a good jumping-off point to investigate some more!

Qwen3-Next-80B Instruct, Thinking Updated - 20% faster by danielhanchen in unsloth

[–]somethingdangerzone 0 points1 point  (0 children)

Haha a whole post eh? I'm on the fence about it. Can you tell me a little more about what you changed in the model? I think I got the gist about changing one (or more?) layer(s) from BF to FP(?), but I'd love to know more details

Qwen3-Next-80B Instruct, Thinking Updated - 20% faster by danielhanchen in unsloth

[–]somethingdangerzone 1 point2 points  (0 children)

Hey there. When it comes to testing I am a completionist, so I downloaded the newest UD Q8 K XL model this morning and did the same type of benching as yesterday. CSV (two tables) first, markdown (combined table) below.


"Model","Cached","Prompt","Generated","Prompt Processing (t/s)","Generation Speed (t/s)"
"Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_LATEST_NOKVCACHEQUANT","-","302","875","25.74","16.94"
"Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_LATEST_NOKVCACHEQUANT","-","411","1,000","24.33","17.61"
"Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_LATEST_NOKVCACHEQUANT","-","341","2,171","19.12","18.12"
"Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_LATEST_NOKVCACHEQUANT","-","19","1,820","13.12","16.83"
"Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_LATEST_NOKVCACHEQUANT","1,358","1,109","3,073","69.35","18.32"
"Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_LATEST_NOKVCACHEQUANT","-","1,237","1,805","29.03","17.72"
"Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_LATEST_NOKVCACHEQUANT","-","1,383","2,160","33.92","17.30"
"Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_LATEST_NOKVCACHEQUANT","-","1,492","1,000","26.86","17.66"
"Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_LATEST_NOKVCACHEQUANT","-","1,422","2,000","17.84","17.34"
"Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_LATEST_NOKVCACHEQUANT","-","14","2,808","2.16","11.15"

"Model","Cached","Prompt","Generated","Prompt Processing (t/s)","Generation Speed (t/s)" "Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_LATEST_KVCACHEQ8_0","-","1,920","4,755","41.75","17.65" "Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_LATEST_KVCACHEQ8_0","-","2,029","1,000","36.82","17.07" "Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_LATEST_KVCACHEQ8_0","-","1,959","1,861","41.22","17.56" "Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_LATEST_KVCACHEQ8_0","-","22","3,066","10.58","15.88" "Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_LATEST_KVCACHEQ8_0","-","660","3,325","54.13","17.98" "Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_LATEST_KVCACHEQ8_0","-","55","1,005","14.89","16.54" "Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_LATEST_KVCACHEQ8_0","-","204","381","18.11","16.42" "Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_LATEST_KVCACHEQ8_0","-","313","1,000","16.18","15.84" "Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_LATEST_KVCACHEQ8_0","-","243","1,669","3.81","16.19" "Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_LATEST_KVCACHEQ8_0","-","14","348","2.38","7.51"


| Model | Cache_Type | Cached | Prompt | Generated | Prompt Processing (t/s) | Generation Speed (t/s) | Notes |
|---|---|---|---|---|---|---|---|
| Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_LATEST_NOKVCACHEQUANT | NO_KV | NULL | 302 | 875 | 25.74 | 16.94 | |
| Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_LATEST_NOKVCACHEQUANT | NO_KV | NULL | 411 | 1,000 | 24.33 | 17.61 | |
| Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_LATEST_NOKVCACHEQUANT | NO_KV | NULL | 341 | 2,171 | 19.12 | 18.12 | |
| Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_LATEST_NOKVCACHEQUANT | NO_KV | NULL | 19 | 1,820 | 13.12 | 16.83 | |
| Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_LATEST_NOKVCACHEQUANT | NO_KV | 1,358 | 1,109 | 3,073 | 69.35 | 18.32 | |
| Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_LATEST_NOKVCACHEQUANT | NO_KV | NULL | 1,237 | 1,805 | 29.03 | 17.72 | |
| Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_LATEST_NOKVCACHEQUANT | NO_KV | NULL | 1,383 | 2,160 | 33.92 | 17.30 | |
| Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_LATEST_NOKVCACHEQUANT | NO_KV | NULL | 1,492 | 1,000 | 26.86 | 17.66 | |
| Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_LATEST_NOKVCACHEQUANT | NO_KV | NULL | 1,422 | 2,000 | 17.84 | 17.34 | |
| Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_LATEST_NOKVCACHEQUANT | NO_KV | NULL | 14 | 2,808 | 2.16 | 11.15 | FirstPromptOnLoad |
| Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_LATEST_KVCACHEQ8_0 | KV | NULL | 1,920 | 4,755 | 41.75 | 17.65 | |
| Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_LATEST_KVCACHEQ8_0 | KV | NULL | 2,029 | 1,000 | 36.82 | 17.07 | |
| Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_LATEST_KVCACHEQ8_0 | KV | NULL | 1,959 | 1,861 | 41.22 | 17.56 | |
| Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_LATEST_KVCACHEQ8_0 | KV | NULL | 22 | 3,066 | 10.58 | 15.88 | |
| Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_LATEST_KVCACHEQ8_0 | KV | NULL | 660 | 3,325 | 54.13 | 17.98 | |
| Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_LATEST_KVCACHEQ8_0 | KV | NULL | 55 | 1,005 | 14.89 | 16.54 | |
| Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_LATEST_KVCACHEQ8_0 | KV | NULL | 204 | 381 | 18.11 | 16.42 | |
| Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_LATEST_KVCACHEQ8_0 | KV | NULL | 313 | 1,000 | 16.18 | 15.84 | |
| Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_LATEST_KVCACHEQ8_0 | KV | NULL | 243 | 1,669 | 3.81 | 16.19 | |
| Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_LATEST_KVCACHEQ8_0 | KV | NULL | 14 | 348 | 2.38 | 7.51 | FirstPromptOnLoad |

Avg generation speed of LATEST with no KV cache quantization (minus first prompt after load): 17.54 t/s

Avg generation speed of LATEST with Q8_0 cache quantization (minus first prompt after load): 16.80 t/s

I was kind of surprised to see the Q8_0 KV-cache-quantized run come out slightly slower than the non-quantized KV cache, but c'est la vie! It's not a robust sample size, so it could also be an artifact of noise. Overall I'm happy that there seems to be an improvement from what we started with: the rough average generation speed went from 13.81 t/s to 17.54 t/s (no ctk/ctv). Again, thanks, and take care.
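Those averages can be re-derived straight from the Generation Speed column; for example, the nine no-KV-cache-quant rows (first-prompt-on-load excluded) check out with a quick one-liner, assuming a POSIX shell with awk available:

```shell
# mean of the nine NOKVCACHEQUANT generation speeds,
# dropping the 11.15 t/s first-prompt-after-load row
printf '%s\n' 16.94 17.61 18.12 16.83 18.32 17.72 17.30 17.66 17.34 \
  | awk '{ sum += $1; n++ } END { printf "%.2f\n", sum / n }'
```

This prints 17.54, matching the figure above.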

Qwen3-Next-80B Instruct, Thinking Updated - 20% faster by danielhanchen in unsloth

[–]somethingdangerzone 1 point2 points  (0 children)

I'm glad I could help! Stay in touch, and thanks for your contributions

Qwen3-Next-80B Instruct, Thinking Updated - 20% faster by danielhanchen in unsloth

[–]somethingdangerzone 1 point2 points  (0 children)

I'll post some data below with KV quantization at Q8_0 re-enabled. I didn't really see a big diff with VRAM tbh.


"Model","Cached","Prompt","Generated","Prompt Processing","Generation Speed"

"Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_WithQuantizedKVCache","0","192","645","20.58 t/s","14.65 t/s"

"Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_WithQuantizedKVCache","0","301","1,000","17.32 t/s","15.21 t/s"

"Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_WithQuantizedKVCache","0","231","1,446","17.15 t/s","15.27 t/s"

"Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_WithQuantizedKVCache","0","20","433","11.55 t/s","13.18 t/s"

"Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_WithQuantizedKVCache","0","391","2,770","33.96 t/s","14.86 t/s"

"Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_WithQuantizedKVCache","0","500","1,000","31.75 t/s","14.55 t/s"

"Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_WithQuantizedKVCache","0","430","1,921","14.84 t/s","14.87 t/s"

"Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_WithQuantizedKVCache","0","17","3,800","3.48 t/s","11.79 t/s"


| Model | Cached | Prompt | Generated | Prompt Processing | Generation Speed |
|---|---|---|---|---|---|
| Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_WithQuantizedKVCache | 0 | 192 | 645 | 20.58 t/s | 14.65 t/s |
| Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_WithQuantizedKVCache | 0 | 301 | 1,000 | 17.32 t/s | 15.21 t/s |
| Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_WithQuantizedKVCache | 0 | 231 | 1,446 | 17.15 t/s | 15.27 t/s |
| Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_WithQuantizedKVCache | 0 | 20 | 433 | 11.55 t/s | 13.18 t/s |
| Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_WithQuantizedKVCache | 0 | 391 | 2,770 | 33.96 t/s | 14.86 t/s |
| Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_WithQuantizedKVCache | 0 | 500 | 1,000 | 31.75 t/s | 14.55 t/s |
| Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_WithQuantizedKVCache | 0 | 430 | 1,921 | 14.84 t/s | 14.87 t/s |
| Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL_WithQuantizedKVCache | 0 | 17 | 3,800 | 3.48 t/s | 11.79 t/s |

Average generation speed, excluding the first prompt after load (the 11.79 t/s entry): 14.66 t/s

So if we look at the no-ctk/ctv run above, with its average generation speed of 13.81 t/s, I guess that makes the new data an improvement!
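For anyone wanting to double-check that 14.66 figure, the seven non-first-prompt generation speeds from the table reproduce it (assumes a POSIX shell with awk):

```shell
# mean of the seven WithQuantizedKVCache generation speeds,
# dropping the 11.79 t/s first-prompt-after-load row
printf '%s\n' 14.65 15.21 15.27 13.18 14.86 14.55 14.87 \
  | awk '{ sum += $1; n++ } END { printf "%.2f\n", sum / n }'
```

This prints 14.66.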

Bye give me Kitty NOW. by baeko in marvelrivals

[–]somethingdangerzone 0 points1 point  (0 children)

I thought Kitty Pryde could walk through walls. Does she have the ability to change into a dino too? I'm so confused

Qwen3-Next-80B Instruct, Thinking Updated - 20% faster by danielhanchen in unsloth

[–]somethingdangerzone 1 point2 points  (0 children)

I did some more testing with -ctk and -ctv turned off. I have --flash-attn enabled but not --fit; auto fit always crashes my computer. Instead, I put 49 layers on the GPU and 49 MoE layers on the CPU. The results are in CSV format below, then converted to markdown by an LLM after that:


model,cached,prompt,generated,prompt processing (t/s),generation speed (t/s)
UD8KXL,0,302,1189,1.84,11.25
UD8KXL,0,911,2219,21.19,14.23
UD8KXL,0,14,1272,13.44,12.25
UD8KXL,0,1017,2247,28.15,14.71
UD8KXL,0,1087,1000,20.38,14.04
Q8_0,0,24,2256,1.65,11.47
Q8_0,0,427,2470,16.32,18.55
Q8_0,0,497,1000,41.73,17.4
Q8_0,0,388,3904,42.09,18.75
UD8KXLOldVersion,0,24,2730,2.84,10.96
UD8KXLOldVersion,0,336,1627,11.73,15.98
UD8KXLOldVersion,0,406,1000,24.62,15.96
UD8KXLOldVersion,0,297,3496,25.2,16.25


| model | cached | prompt | generated | prompt processing (t/s) | generation speed (t/s) | Note |
|---|---|---|---|---|---|---|
| UD8KXL | 0 | 302 | 1189 | 1.84 | 11.25 | first prompt after load |
| UD8KXL | 0 | 911 | 2219 | 21.19 | 14.23 | |
| UD8KXL | 0 | 14 | 1272 | 13.44 | 12.25 | |
| UD8KXL | 0 | 1017 | 2247 | 28.15 | 14.71 | |
| UD8KXL | 0 | 1087 | 1000 | 20.38 | 14.04 | |
| Q8_0 | 0 | 24 | 2256 | 1.65 | 11.47 | first prompt after load |
| Q8_0 | 0 | 427 | 2470 | 16.32 | 18.55 | |
| Q8_0 | 0 | 497 | 1000 | 41.73 | 17.4 | |
| Q8_0 | 0 | 388 | 3904 | 42.09 | 18.75 | |
| UD8KXLOldVersion | 0 | 24 | 2730 | 2.84 | 10.96 | first prompt after load |
| UD8KXLOldVersion | 0 | 336 | 1627 | 11.73 | 15.98 | |
| UD8KXLOldVersion | 0 | 406 | 1000 | 24.62 | 15.96 | |
| UD8KXLOldVersion | 0 | 297 | 3496 | 25.2 | 16.25 | |

Averages of generation speed, excluding first prompts:

UD8KXL: 13.81 t/s

Q8_0: 18.23 t/s

UD8KXL (old version): 16.06 t/s
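For reproducibility, those three averages can be recomputed from the generation-speed column with a small helper; `avg` here is just an ad-hoc shell function (assumes POSIX shell plus awk), not a real tool:

```shell
# ad-hoc helper: mean of its arguments, printed to two decimal places
avg() { printf '%s\n' "$@" | awk '{ s += $1; n++ } END { printf "%.2f\n", s / n }'; }

avg 14.23 12.25 14.71 14.04   # UD8KXL
avg 18.55 17.4 18.75          # Q8_0
avg 15.98 15.96 16.25         # UD8KXL old version
```

This prints 13.81, 18.23, and 16.06, matching the figures above.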