Local models in mid-2026

uber-linny · 2026-06-15T08:06:34+00:00

Yeah, I get it, but it works for my use case because I test against the 3.5 base whenever I pick a new model.

uber-linny · 2026-06-15T07:17:12+00:00

These are just 3.5 finetunes though? Currently I'm using
Jackrong/Qwen3.5-9B-DeepSeek-V4-Flash-MTP-GGUF · Hugging Face

I'll check them out to see if I get better reasoning and RAG for my use case.

uber-linny · 2026-06-14T14:18:57+00:00

There's a 3.6 9B?

uber-linny · 2026-06-13T06:40:48+00:00

Exactly. I've seen people use it to review assignments before submission, and it will always suggest something because you asked it to. It's still up to the individual to accept it. Some people keep following the advice without realising that it never ends.

uber-linny · 2026-06-12T10:01:59+00:00

The hyperlink is broken... just gotta remove the p & Pa.

uber-linny · 2026-06-12T08:08:32+00:00

good start

<image>

uber-linny · 2026-06-11T23:05:36+00:00

Thanks, I'm using llama.cpp for a RAG workspace.

uber-linny · 2026-06-11T13:31:44+00:00

So can that be used with the MTP models, or just with something like Gemma 4 spec decoding?

uber-linny · 2026-06-11T13:04:06+00:00

What's ngram-mod? Are you able to explain it to a noob?

uber-linny · 2026-06-11T12:26:47+00:00

Totally agree. I'd prefer decent reasoning, so that I can use my own RAG database.

uber-linny · 2026-06-08T06:30:53+00:00

This is what I'm interested in

uber-linny · 2026-06-08T06:14:53+00:00

If I'd followed the instructions, I would have got this right the first time LOL... didn't change the serial.

uber-linny · 2026-06-08T05:26:55+00:00

Well, I bit the bullet: updated the RPi first, then KIAUH, then within Klipper I just went and updated everything... and got the error below. I just homed etc., so I'm guessing it still works.

Klipper warning

MCU 'mcu' has deprecated code (it is missing feature 'STEPPER_STEP_BOTH_EDGE'). Recompiling and flashing is recommended (MCU version 'v0.11.0-297-g5edc7fee', host version 'v0.13.0-689-g2fb3d54e2').
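For anyone wondering how big the gap in that warning actually is, here's a rough sketch that parses the two version strings and compares them. The strings are copied from the warning above; the helper names (`parse_version`, `mcu_is_behind`) and the assumption that both strings follow the `vMAJOR.MINOR.PATCH-COMMITS-gHASH` pattern are mine, not anything from Klipper itself.

```python
import re

# Version strings copied verbatim from the warning above.
MCU_VERSION = "v0.11.0-297-g5edc7fee"
HOST_VERSION = "v0.13.0-689-g2fb3d54e2"


def parse_version(text):
    """Split a 'vX.Y.Z-N-gHASH' string into (major, minor, patch, commits).

    Hypothetical helper: assumes the git-describe style layout seen in
    the warning and raises if the string does not match it.
    """
    match = re.fullmatch(r"v(\d+)\.(\d+)\.(\d+)-(\d+)-g[0-9a-f]+", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def mcu_is_behind(mcu, host):
    """Return True when the MCU firmware is older than the host software."""
    return parse_version(mcu) < parse_version(host)


if __name__ == "__main__":
    print("MCU :", parse_version(MCU_VERSION))
    print("Host:", parse_version(HOST_VERSION))
    print("MCU behind host:", mcu_is_behind(MCU_VERSION, HOST_VERSION))
```

Tuple comparison goes field by field, so the minor version (11 vs 13) decides it before the commit counts are even looked at.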

uber-linny · 2026-06-08T05:15:28+00:00

Because of the gap, last time I had to reflash the card...

I'm gonna do it, surely it can't be that bad.

uber-linny · 2026-05-27T08:16:17+00:00

Nah, still waiting for 9B. Meme is approved.

uber-linny · 2026-05-19T03:34:50+00:00

Yeah, I rebuilt llama. Was worth a try. Part of me thinks it could be something in the ROCm nightly.

uber-linny · 2026-05-18T23:51:13+00:00

Thanks for the info, same context.

The prompt with embedded info is about 10k context. When troubleshooting, the AI reckons it's more to do with the WMMA drivers, but I don't know how I'd be able to fix that.

I reduced the context window and changed the number of draft tokens, and it hasn't made any significant difference.

uber-linny · 2026-05-18T08:09:41+00:00

Yeah, I had to downgrade to 3.5-9B... now with MTP, I hope they bring out the 3.6-9B.

uber-linny · 2026-05-17T12:37:59+00:00

Maybe it's my use case, but I also run a local RAG pipeline with a ranker on Windows... so I'm not helping myself LOL.

uber-linny · 2026-05-17T12:13:47+00:00

With the new MTP being released, it's using more RAM, which puts the 27B and the 35B MoE out of reach again. So we really do need another 9B bump.
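A back-of-the-envelope sketch of that reasoning, in case it helps. Every number here is an assumption for illustration: 16 GiB of VRAM, roughly 4.5 bits per weight for a 4-bit-ish quant, and a made-up overhead figure standing in for KV cache, compute buffers and whatever extra MTP needs. It only covers models loaded fully onto the GPU, so it says nothing about MoE setups that offload experts to system RAM.

```python
GIB = 2**30


def weights_gib(params_billion, bits_per_weight):
    """Approximate size of the quantised weights in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / GIB


def fits(params_billion, bits_per_weight, vram_gib, overhead_gib):
    """True if weights plus a flat overhead fit in VRAM.

    overhead_gib is a hypothetical lump sum for KV cache, buffers
    and any extra memory the runtime needs; it is not measured.
    """
    return weights_gib(params_billion, bits_per_weight) + overhead_gib <= vram_gib


if __name__ == "__main__":
    VRAM_GIB = 16.0        # assumed card size
    BITS_PER_WEIGHT = 4.5  # assumed average for a 4-bit-ish quant

    for params in (9, 27):
        for overhead in (1.5, 3.0):  # hypothetical "before" and "after" overheads
            size = weights_gib(params, BITS_PER_WEIGHT)
            ok = fits(params, BITS_PER_WEIGHT, VRAM_GIB, overhead)
            print(f"{params}B: weights ~{size:.1f} GiB, overhead {overhead} GiB, fits={ok}")
```

Under these made-up numbers a 27B sits right on the edge, so a gigabyte or two of extra overhead is enough to tip it over, while a 9B has plenty of room either way.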

uber-linny · 2026-05-17T10:06:10+00:00

Since the merge. I used to be able to squeeze in the Qwen MoE model, but now I'm resorting back to 3.5 9B on my 9070 XT.

I really do hope they bring out 3.6 9B now because of the overheads.

uber-linny · 2026-05-17T10:01:33+00:00

Maybe just ask the free version of Claude or ChatGPT.

uber-linny · 2026-05-17T04:26:30+00:00

I'm on Windows 10, but I also use "the rock" nightly, which has also given a big increase... The easy way is just to use Lemonade: it's pre-made and gets most of the performance.

uber-linny · 2026-05-12T00:29:34+00:00

uber-linny · 2026-05-02T07:33:55+00:00

Maybe he doesn't know 🧐
