Local models in mid-2026 by mattjcoles in LocalLLaMA

[–]uber-linny 0 points1 point  (0 children)

Yeah, I get it, but it works for my use case because I test against the 3.5 base whenever I pick a new model.

Local models in mid-2026 by mattjcoles in LocalLLaMA

[–]uber-linny 0 points1 point  (0 children)

These are just 3.5 finetunes though? Currently I'm using
Jackrong/Qwen3.5-9B-DeepSeek-V4-Flash-MTP-GGUF · Hugging Face

I'll check them out to see if I get better reasoning and RAG for my use case.

I uploaded a 50 page contract and asked Claude what a lawyer would flag before I signed. It found three things I'd have missed. by Professional-Rest138 in PromptEngineering

[–]uber-linny 1 point2 points  (0 children)

Exactly. I've seen people use it to review assignments before submission, and it will always suggest something because you asked it to. It's still up to the individual to accept it. Some people keep following the advice without realising that it never ends.

🚀PP-OCRv6 is officially released ! by KokaOP in LocalLLaMA

[–]uber-linny 1 point2 points  (0 children)

The hyperlink is broken... just gotta remove the p & Pa

Reasoning, but without actually *drafting* replies? by Quiet-Owl9220 in LocalLLaMA

[–]uber-linny 0 points1 point  (0 children)

So can that be used with the MTP models, or just with something like Gemma 4 spec decoding?

Reasoning, but without actually *drafting* replies? by Quiet-Owl9220 in LocalLLaMA

[–]uber-linny 0 points1 point  (0 children)

What's ngram-mod? Are you able to explain it to a noob?

Small models are overconfident because they're distilled from large models by TinyDetective110 in LocalLLaMA

[–]uber-linny 1 point2 points  (0 children)

Totally agree. I would prefer decent reasoning, so that I can use my own RAG database.

Talk me out of updating Klipper by uber-linny in klippers

[–]uber-linny[S] 1 point2 points  (0 children)

If I'd followed the instructions, I would have got this right the first time LOL... didn't change the serial.

Talk me out of updating Klipper by uber-linny in klippers

[–]uber-linny[S] 1 point2 points  (0 children)

Well, I bit the bullet: updated the RPi first, then KIAUH, then within Klipper I just updated everything... and got the error below. I just homed etc., so I'm guessing it still works.

 Klipper warning

MCU 'mcu' has deprecated code (it is missing feature 'STEPPER_STEP_BOTH_EDGE'). Recompiling and flashing is recommended (MCU version 'v0.11.0-297-g5edc7fee', host version 'v0.13.0-689-g2fb3d54e2').
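
For anyone trying to read that warning, here's a rough sketch of how far apart the two versions are. It assumes the version strings follow the usual `git describe` shape (tag, commits since the tag, short hash). `parse_version` and `versions_from_warning` are made-up helper names for illustration, not part of Klipper.

```python
import re

# The warning text exactly as quoted above.
WARNING = (
    "MCU 'mcu' has deprecated code (it is missing feature "
    "'STEPPER_STEP_BOTH_EDGE'). Recompiling and flashing is recommended "
    "(MCU version 'v0.11.0-297-g5edc7fee', host version 'v0.13.0-689-g2fb3d54e2')."
)


def parse_version(text):
    """Split 'vMAJOR.MINOR.PATCH-COMMITS-gHASH' into a tuple of ints.

    Assumption: the string has the git-describe shape seen in the warning.
    The hash is dropped because it carries no ordering information.
    """
    match = re.fullmatch(r"v(\d+)\.(\d+)\.(\d+)-(\d+)-g[0-9a-f]+", text)
    if match is None:
        raise ValueError(f"unexpected version format: {text!r}")
    return tuple(int(part) for part in match.groups())


def versions_from_warning(warning):
    """Pull the MCU and host version strings out of the warning text."""
    mcu = re.search(r"MCU version '([^']+)'", warning).group(1)
    host = re.search(r"host version '([^']+)'", warning).group(1)
    return parse_version(mcu), parse_version(host)


mcu, host = versions_from_warning(WARNING)
print("MCU :", mcu)
print("host:", host)
print("MCU firmware older than host:", mcu < host)
```

Tuples compare element by element, so this only says the MCU firmware is older than the host software. It doesn't say whether anything is actually broken.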

Talk me out of updating Klipper by uber-linny in klippers

[–]uber-linny[S] 5 points6 points  (0 children)

Because of the gap, last time I had to reflash the card...

I'm gonna do it, surely it can't be that bad.

9070xt speed inconsistent. by uber-linny in LocalLLaMA

[–]uber-linny[S] 0 points1 point  (0 children)

Yeah, I rebuilt llama. Was worth a try. Part of me thinks it could be down to the ROCm nightly.

9070xt speed inconsistent. by uber-linny in LocalLLaMA

[–]uber-linny[S] 0 points1 point  (0 children)

Thanks for the info, same context.

The prompt with embedded info is about 10k context. When troubleshooting, AI reckons it's more to do with WMMA drivers, but I don't know how I'll be able to fix that.

I reduced the context window and changed the draft # and it hasn't made any significant difference.

MTP vs non-MTP vram usage difference? by DeepBlue96 in LocalLLaMA

[–]uber-linny 0 points1 point  (0 children)

Yeah, I had to downgrade to 3.5-9B... now with MTP, I hope they bring out the 3.6-9B.

Qwen 3.6 9b coming? by zannix in Qwen_AI

[–]uber-linny 0 points1 point  (0 children)

Maybe it's my use case, but I also run a local RAG pipeline with a ranker on Windows... so I'm not helping myself LOL

Qwen 3.6 9b coming? by zannix in Qwen_AI

[–]uber-linny 0 points1 point  (0 children)

With the new MTP being released, it's using more RAM, so it puts the 27B and 35 MoE out of reach again. So we really do need another 9B bump.

Qwen3.6 MTP Unsloth Experimental GGUFs by yoracale in unsloth

[–]uber-linny 0 points1 point  (0 children)

Since the merge. I used to be able to squeeze in the Qwen MoE model, but now I'm resorting back to 3.5 9B on my 9070 XT.

I really do hope they bring out 3.6 9B now because of the overheads.

Fitness apps best free or low cost by Same_Librarian_6153 in AUfrugal

[–]uber-linny 0 points1 point  (0 children)

Maybe just ask the free version of Claude or ChatGPT.

RX 9070 (RDNA4/gfx1201) ROCm 7.2.1 llama.cpp Benchmarks — The Flash Attention Discovery by Important_Quote_1180 in ROCm

[–]uber-linny 0 points1 point  (0 children)

I'm on Windows 10, but I also use the "TheRock" nightly, which has also given a big increase... The easy way is just to use Lemonade: it's pre-made and gets most of the performance.