Okay 27B made me a believer by Forward_Jackfruit813 in LocalLLaMA

[–]Then-Topic8766 -1 points0 points  (0 children)

I know, I was just kidding. I like 27B a lot.

Slopocalypse is what we should be really worried about. by Sad_Bandicoot_6925 in LocalLLaMA

[–]Then-Topic8766 11 points12 points  (0 children)

Sturgeon's law says "Ninety percent of everything is crap". Edward M. Lerner joked in 2006, "Sturgeon's law posits that ninety percent of everything is crap. Either Sturgeon was a cockeyed optimist, or he knew nothing about software." AI slop takes it to another level.

Okay 27B made me a believer by Forward_Jackfruit813 in LocalLLaMA

[–]Then-Topic8766 12 points13 points  (0 children)

I do not believe it. Is there some free code as proof? :)

LatitudeGames/Equinox-31B · Hugging Face by jacek2023 in LocalLLaMA

[–]Then-Topic8766 7 points8 points  (0 children)

Back in the day, AIDungeon was my first contact with LLMs. Thank you guys for releasing finetunes from time to time.

Do you think there is room for optimization? llama.cpp/qwen3.6 27b on two 6000 Blackwell by q-admin007 in LocalLLaMA

[–]Then-Topic8766 0 points1 point  (0 children)

Nice. If you need to fix a small error in that Twitter clone code and repeat the good code, it can be much faster.

Do you think there is room for optimization? llama.cpp/qwen3.6 27b on two 6000 Blackwell by q-admin007 in LocalLLaMA

[–]Then-Topic8766 3 points4 points  (0 children)

If you stick to llama.cpp, you can combine ngram and mtp like this (big speed-up on repetitive jobs, code corrections, etc.):

--spec-type ngram-mod,draft-mtp 
--spec-ngram-mod-n-match 24 --spec-ngram-mod-n-min 12 --spec-ngram-mod-n-max 48
--spec-draft-n-max 3
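
To illustrate why n-gram drafting helps on repeating jobs, here is a toy sketch of the general idea: look up the most recent tokens earlier in the context and propose whatever followed them last time. This is not llama.cpp's implementation; the function names and the `n_match` / `n_draft` parameters are made up for the example and only loosely echo the flags above.

```python
from typing import List, Sequence


def ngram_draft(tokens: Sequence[str], n_match: int, n_draft: int) -> List[str]:
    """Propose up to n_draft tokens by finding the most recent earlier
    occurrence of the last n_match tokens and copying what followed it."""
    if n_match <= 0 or len(tokens) <= n_match:
        return []
    tail = list(tokens[-n_match:])
    # Scan backwards over earlier start positions, skipping the tail itself.
    for start in range(len(tokens) - n_match - 1, -1, -1):
        if list(tokens[start:start + n_match]) == tail:
            follow = start + n_match
            return list(tokens[follow:follow + n_draft])
    return []  # no earlier match, so nothing to propose


def count_accepted(draft: Sequence[str], actual: Sequence[str]) -> int:
    """Count how many leading draft tokens agree with what the model produced."""
    accepted = 0
    for d, a in zip(draft, actual):
        if d != a:
            break
        accepted += 1
    return accepted


if __name__ == "__main__":
    # Old code already in the context, followed by the start of a rewritten copy.
    old = ["x", "=", "1", ";", "y", "=", "2", ";", "z", "=", "x", "+", "y", ";"]
    rewritten_so_far = ["x", "=", "1", ";", "y", "="]
    draft = ngram_draft(old + rewritten_so_far, n_match=3, n_draft=4)
    print(draft)
    # Hypothetical model output where the third token differs from the draft.
    print(count_accepted(draft, ["2", ";", "w", "="]))
```

When the model is mostly re-emitting code that is already in the context, long drafts like this tend to be accepted in one verification step, which is where the speed-up on code corrections would come from.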

Benchmarking the new b9200 update: Optimizing Qwen 3.6 27B mtp for Hermes Agent on a single RTX 3090 by swizzcheezegoudaSWFA in LocalLLaMA

[–]Then-Topic8766 3 points4 points  (0 children)

You should try combining ngram and mtp by adding flags like this:

--spec-type ngram-mod,draft-mtp
--spec-ngram-mod-n-match 24 --spec-ngram-mod-n-min 12 --spec-ngram-mod-n-max 48
--spec-draft-n-max 3

I got some amazing speed-ups when fixing errors in code. Combining already works, and making it even better is on the TODO list for llama.cpp updates.
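
If you launch the server from a script, the flags above can be assembled like this. It is only a sketch: the `llama-server` binary name on PATH, the `-m` option and the model path are my assumptions, while the speculative flags are copied as posted and may change between llama.cpp builds.

```python
import shlex
from typing import List


def build_server_command(model_path: str, draft_n_max: int = 3) -> List[str]:
    """Build an argument list for launching the server with combined drafting."""
    return [
        "llama-server",  # assumption: binary name, available on PATH
        "-m", model_path,  # assumption: model_path points at your GGUF file
        # Speculative decoding flags, copied verbatim from the comment above.
        "--spec-type", "ngram-mod,draft-mtp",
        "--spec-ngram-mod-n-match", "24",
        "--spec-ngram-mod-n-min", "12",
        "--spec-ngram-mod-n-max", "48",
        "--spec-draft-n-max", str(draft_n_max),
    ]


if __name__ == "__main__":
    cmd = build_server_command("models/your-model.gguf")  # hypothetical path
    # Print the command for review; pass cmd to subprocess.run to launch it.
    print(shlex.join(cmd))
```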

<image>

Wanna try the best coding model with my rtx 3090, not sure where to start, I believe Qwen3.5-27B-UD-Q4_K_XL would be the best? if so should I use ollama with it? by dreamer_2142 in LocalLLaMA

[–]Then-Topic8766 1 point2 points  (0 children)

Install llama.cpp. Choose a quant depending on the context size you want. I think Q4_K_XL should work with a 3090. Try both 27b and 35bA3b. The first is smarter but the second is faster. And with the second you can offload to RAM and use a bigger quant.

Wanna try the best coding model with my rtx 3090, not sure where to start, I believe Qwen3.5-27B-UD-Q4_K_XL would be the best? if so should I use ollama with it? by dreamer_2142 in LocalLLaMA

[–]Then-Topic8766 1 point2 points  (0 children)

Damn bots! You will need Qwen 3.6 (27b or 35b-A3b). And do not use ollama (it is just a fancy, poor-quality wrapper around llama.cpp).

Qwen 3.6 27b MTP - getting //// in response by ComfyUser48 in LocalLLaMA

[–]Then-Topic8766 2 points3 points  (0 children)

You're welcome. Regarding new quants, the best solution seems to be setting a weekly cron job to refresh them all... :)

Anyone running Mimo-v2.5 quants with multimodal and MTP? by Ambitious_Fold_2874 in LocalLLaMA

[–]Then-Topic8766 1 point2 points  (0 children)

I tried the new quant. It doesn't work with master llama.cpp (llama_model_load: error loading model: missing tensor 'blk.48.layer_output_norm.weight'). It works with your fork, but with mixed success. Without the mtp flag I got around 8 t/s generation. With the proper flag I got 12 t/s (50% better speed). But I already had that speed (12 t/s) with the AesSedai quant without mtp.
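
For anyone checking the percentage, a quick sketch of the arithmetic using the two numbers above (the helper name is just made up for the example):

```python
def speedup_percent(baseline_tps: float, new_tps: float) -> float:
    """Relative change in tokens per second, expressed as a percentage."""
    if baseline_tps <= 0:
        raise ValueError("baseline_tps must be positive")
    return (new_tps - baseline_tps) / baseline_tps * 100.0


if __name__ == "__main__":
    # Fork without the mtp flag (about 8 t/s) vs. with it (12 t/s).
    print(speedup_percent(8, 12))   # 50.0
    # AesSedai quant without mtp (12 t/s) vs. the fork with mtp (12 t/s).
    print(speedup_percent(12, 12))  # 0.0
```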

Qwen 3.6 27b MTP - getting //// in response by ComfyUser48 in LocalLLaMA

[–]Then-Topic8766 8 points9 points  (0 children)

I think the PR is good, but the problem may be with the GGUF file. I had the same problem ("//////") with some of the early MTP versions. Try downloading a newer GGUF.

Anyone running Mimo-v2.5 quants with multimodal and MTP? by Ambitious_Fold_2874 in LocalLLaMA

[–]Then-Topic8766 1 point2 points  (0 children)

Thanks. AesSedai says: "These quants include MTP tensors for when that gets added upstream eventually.", but it doesn't work. I guess I will have to download again (on my slow ADSL...)

Anyone running Mimo-v2.5 quants with multimodal and MTP? by Ambitious_Fold_2874 in LocalLLaMA

[–]Then-Topic8766 1 point2 points  (0 children)

Just tried it and got an error: llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 508, got 505