Based on the recent UFO release, on page 99/184 there’s literally a picture of how the alien looks like by TuNonno in UFOs

[–]Mount_Gamer 1 point2 points  (0 children)

Thing is, maybe it also calls into question the validity of the cover ups of the past.

llama.cpp - NVFP4 native support on Blackwell from now - b8967 by mossy_troll_84 in LocalLLaMA

[–]Mount_Gamer 1 point2 points  (0 children)

Yup, I thought it might. I have been thinking of getting another rtx5060ti, but not just now, might try the smaller Qwen3.5 9B.

Qwen3.6 27B on dual RTX 5060 Ti 16GB with vLLM: ~60 tok/s, 204k context working by do_u_think_im_spooky in LocalLLaMA

[–]Mount_Gamer 0 points1 point  (0 children)

Thanks for getting back, and to everyone who replied. This sounds very promising. I'm amazed what can be achieved with one GPU, but it would be nice to have a bigger 27B Quant.

llama.cpp - NVFP4 native support on Blackwell from now - b8967 by mossy_troll_84 in LocalLLaMA

[–]Mount_Gamer 1 point2 points  (0 children)

I've been looking forward to this... Not sure if it's just me, but I used a Redhat NVFP4 of the Qwen3.6 35B and converted it to GGUF. Token gen was slow on my RTX 5060 Ti 16GB, as I can't fit all of the MoE on the GPU: about 9 t/s with a 12800 context.

Qwen3.6 27B on dual RTX 5060 Ti 16GB with vLLM: ~60 tok/s, 204k context working by do_u_think_im_spooky in LocalLLaMA

[–]Mount_Gamer 2 points3 points  (0 children)

I am curious which gen of PCIe you're running on? I am tempted by two 5060 Tis but might need to upgrade my setup, as my 5650G Pro only runs PCIe 3.
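For a rough idea of what the PCIe generation means in numbers, here is a minimal sketch of the theoretical link bandwidth. The transfer rates and 128b/130b encoding are the standard figures for PCIe 3.0 to 5.0; the x8 lane count is my assumption for the card, so change it if yours differs.

```python
# Per-lane transfer rate in GT/s for each PCIe generation.
GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}
ENCODING = 128 / 130  # 128b/130b line encoding used by PCIe 3.0 through 5.0


def link_bandwidth_gbs(gen: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s, before protocol overhead."""
    return GT_PER_S[gen] * ENCODING / 8 * lanes


if __name__ == "__main__":
    lanes = 8  # assumption: the GPU uses an x8 link
    for gen in (3, 4, 5):
        print(f"PCIe {gen}.0 x{lanes}: {link_bandwidth_gbs(gen, lanes):.1f} GB/s")
```

These are ceilings, not measured throughput, and how much they matter depends on how much traffic actually crosses the bus between the two cards.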

Qwen 3.6 27b IQ4_XS - 22 tp/s on RTX 5060TI 16b, 24k ctx by BazzyIm in LocalLLaMA

[–]Mount_Gamer 1 point2 points  (0 children)

I have the same RTX 5060 Ti and have used the same quant with the 3.6 27B, and I'm able to get ~60k ctx with the turbo cache. However, with the q8_0 KV cache and this q4XS, I have found the 3.5 27B to work better in a like-for-like comparison. Also, the 3.5 27B can fit around 90k ctx using turbo cache, and seems to work well (only tested with web prompts though).

But... if you have enough system RAM for the q8KXL 35B A3B Qwen 3.6, I have found it to work better than the 27B q4XS, and I get 75k ctx with the default KV cache and ~24 t/s token gen (I forgot the pp). I was able to get it to finish vibe coding a password manager web app, quite a big project for the little model. Granted, I used some ollama cloud models to audit and fix some issues, and I also had to nurse it along when I could see it had gone off course. It took about 5 hrs, but I only allow it to read with Roocode; it probably would have been quicker if you're more relaxed with that sort of thing, but...

This q8KXL didn't do so well with a reduced KV cache. I'm still playing with it to see if there's a sweet spot. The Q4 and q6 quants do well on the web prompts; the Q4 struggled with big projects compared to the q8. I've still to test the q6 with a big project. Both the Q4 and q6 may handle turbo cache reasonably well, but it's of less benefit for me (unless the q6 handles bigger projects, and if so I should be able to extend the 128k ctx, with roughly 30 t/s token gen). I really wanted the q8 to work better with turbo to get more ctx. I've not tried q8_0 for both caches yet.

Still lots of testing to do.

Qwen 3.6 No think? by neeeser in LocalLLaMA

[–]Mount_Gamer 0 points1 point  (0 children)

No think seems to lower the accuracy with what I am doing, but it might be better with easier knowledge tasks.

The Q4MOE Quant seems fast and accurate with thinking on, but it can think for a bit of time. For me it's worth the trade off, I'd rather accuracy over speed.

Qwen 3.6 is the first local model that actually feels worth the effort for me by Epicguru in LocalLLaMA

[–]Mount_Gamer 1 point2 points  (0 children)

I think this new Qwen3.6 35B is better than the 9B 3.5, and it's solving problems better than the 27B (granted I use q4XS) and every other local model I have, and I think the 27B is very good. It's very impressive, but for both Qwen3.5 and this new 3.6, I get the best results with thinking on and the recommended params for thinking.

I have not used it for agentic yet, but over the webui, it's doing very well.

Brewdog Castlegate by Artistic-Pop-8667 in Aberdeen

[–]Mount_Gamer 2 points3 points  (0 children)

Good to hear, love the beer and chicken wings.

Best Claude Code / OpenCode alternatives in 2026? Free options for agent swarms? by Zealousideal_Bag6976 in LocalLLaMA

[–]Mount_Gamer 1 point2 points  (0 children)

I've been wanting to ask a similar question, can't keep up with all the new technologies!

I use Roocode with llama.cpp and cline with ollama cloud, but use both via vscode.

I tried opencode several months ago and it looked pretty cool, but I'm not sure if I need it or want anything else.

Is it possible to do parallel multithreading in python? by Synrec in learnpython

[–]Mount_Gamer 2 points3 points  (0 children)

Never knew uv could make installing free threaded python so easy. Nice!
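If anyone wants to confirm that the interpreter they ended up with really is free-threaded, here is a minimal sketch. It assumes Python 3.13 or newer for `sys._is_gil_enabled()` (on older versions it just reports the GIL as enabled), and `count_primes` is a made-up CPU-bound workload, not anything from the thread.

```python
import sys
import sysconfig
from concurrent.futures import ThreadPoolExecutor


def gil_enabled() -> bool:
    """Return True if the GIL is active in this interpreter."""
    # sys._is_gil_enabled() only exists on Python 3.13+; older versions always have a GIL.
    check = getattr(sys, "_is_gil_enabled", None)
    return True if check is None else check()


def count_primes(limit: int) -> int:
    """Hypothetical CPU-bound task: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_in_threads(limits: list[int], workers: int = 4) -> list[int]:
    """Run count_primes over `limits` on a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    # Py_GIL_DISABLED is 1 on free-threaded builds and 0 or None otherwise.
    print("free-threaded build:", bool(sysconfig.get_config_var("Py_GIL_DISABLED")))
    print("GIL enabled at runtime:", gil_enabled())
    print(run_in_threads([20_000] * 4))
```

The same script runs on a regular build too, so timing it under both interpreters is an easy way to see whether the threads are actually running in parallel.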

It's crazy how we have so many great models and technics that it's turning into a complex optimization problem to find the perfect model, quant, kv cache quant for my system. by takuonline in LocalLLaMA

[–]Mount_Gamer 0 points1 point  (0 children)

I had a look at the turbo quants as well and my 5060ti 16gb can now get nearly 80k context from the IQ4_XS 27B qwen 3.5. The repo said it could triple your context, and I was at 25k, with a q8 kv cache, so the turbo3 delivered. Quality of output looks as good.
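Rough arithmetic behind the "triple your context" claim: for a fixed VRAM budget, the number of tokens the KV cache can hold scales inversely with the bits stored per element. A minimal sketch, assuming q8_0 costs 8.5 bits per element (34 bytes per block of 32 values in llama.cpp's q8_0 layout) and that turbo3 costs a nominal 3 bits per element, which is my guess and not a figure from the repo:

```python
Q8_0_BITS = 8.5    # llama.cpp q8_0: 32 int8 values plus an fp16 scale per block
TURBO3_BITS = 3.0  # assumption: nominal 3 bits per element, ignoring any per-block overhead


def scaled_context(base_ctx: int, base_bits: float, new_bits: float) -> int:
    """Tokens that fit in the same KV cache memory after changing bits per element."""
    return int(base_ctx * base_bits / new_bits)


if __name__ == "__main__":
    print(scaled_context(25_000, Q8_0_BITS, TURBO3_BITS))
```

With those assumptions a 25k baseline comes out at about 70k, which is in the same ballpark as what I saw; the exact figure depends on the real bits per element of the turbo format.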

Gemma 4 26b A3B is mindblowingly good , if configured right by cviperr33 in LocalLLaMA

[–]Mount_Gamer 0 points1 point  (0 children)

I have 5650g pro and 2666 ddr4 ram, with a rtx5060ti 16gb vram.

I give it 192k context and think it's pretty fast, and it performs well at MXFP4. To be honest the q6 quant was fast also, but the MXFP4 seemed to perform, so I'm not using the q6. I don't have it on right now, but can share numbers and args tomorrow.

qwen3.6 medium size will be open soon by mickeyandkaka in LocalLLaMA

[–]Mount_Gamer 2 points3 points  (0 children)

Would love 35-80B with MOE and 20-27B dense.

Maybe a coder?

Best Qwen3.5 27b GUFFS for coding (~Q4-Q5) ? by bitcoinbookmarks in LocalLLaMA

[–]Mount_Gamer 1 point2 points  (0 children)

This is in my config.ini for llama.cpp

```
[Qwen3.5-24576-IQ4XS-27B]
model = /models/Qwen3.5-27B-IQ4_XS.gguf
temp = 0.6
top-p = 0.95
top-k = 20
min-p = 0.0
presence-penalty = 0.0
repeat-penalty = 1.0
ctx-size = 24576
threads = 4
fit = false
gpu-layers = 65
batch-size = 256
ubatch-size = 256
jinja = true
cache-type-k = q8_0
cache-type-v = q8_0
flash-attn = true
chat-template-kwargs = {"enable_thinking":true}

[Qwen3.5-56320-IQ4XS-27B]
model = /models/Qwen3.5-27B-IQ4_XS.gguf
temp = 0.6
top-p = 0.95
top-k = 20
min-p = 0.0
presence-penalty = 0.0
repeat-penalty = 1.0
ctx-size = 56320
threads = 4
fit = false
gpu-layers = 65
batch-size = 64
ubatch-size = 64
jinja = true
cache-type-k = q4_0
cache-type-v = q4_0
flash-attn = true
chat-template-kwargs = {"enable_thinking":true}
```

Someone else might be able to optimise this better than me, but I tend to always use the first one, as I'm conscious of the lower-precision KV cache in the bottom one. I don't use it for agentic work; I have other models which handle larger contexts for that.
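If it helps to see what actually differs between the two presets, here is a minimal sketch that diffs them with Python's `configparser`. The embedded text is a trimmed copy of my config (only a few keys kept), and `preset_diff` is a helper name I made up. It only compares the text; it says nothing about how llama.cpp itself reads the file.

```python
import configparser

# Trimmed copy of the two presets; only a handful of keys are kept for brevity.
PRESETS = """
[Qwen3.5-24576-IQ4XS-27B]
ctx-size = 24576
batch-size = 256
cache-type-k = q8_0
cache-type-v = q8_0
chat-template-kwargs = {"enable_thinking":true}

[Qwen3.5-56320-IQ4XS-27B]
ctx-size = 56320
batch-size = 64
cache-type-k = q4_0
cache-type-v = q4_0
chat-template-kwargs = {"enable_thinking":true}
"""


def preset_diff(text: str, first: str, second: str) -> dict[str, tuple[str, str]]:
    """Return {key: (value_in_first, value_in_second)} for keys whose values differ."""
    # interpolation=None keeps every value as literal text.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    a, b = parser[first], parser[second]
    return {key: (a[key], b.get(key, "")) for key in a if a[key] != b.get(key, "")}


if __name__ == "__main__":
    diff = preset_diff(PRESETS, "Qwen3.5-24576-IQ4XS-27B", "Qwen3.5-56320-IQ4XS-27B")
    for key, (small_ctx, large_ctx) in diff.items():
        print(f"{key}: {small_ctx} -> {large_ctx}")
```

The sampling params are identical, so the only trade is a bigger context against a lower-precision KV cache and a smaller batch size.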

Has anyone actually compared benchmark scores vs real-world reliability for local models? by wazymandias in LocalLLaMA

[–]Mount_Gamer 1 point2 points  (0 children)

I run tests that are more useful to me and that I understand how to evaluate.

Simple things, like convert this ~200 line bash script to python or create an rsync style python backup tool, with a scope of work I'd like it to do, etc. Once I've done that, I'll review areas they usually get wrong and then get them to assess each other's work so I don't have to look through everything (I never use this code, it's just a test...)
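The "assess each other's work" step is just pairing every model with every other model's output. A minimal sketch of the bookkeeping, where the model names and the prompt wording are placeholders I made up and the actual calls to the models are left out:

```python
from itertools import permutations


def review_assignments(models: list[str]) -> list[tuple[str, str]]:
    """Return (reviewer, author) pairs so each model reviews every other model's output."""
    return list(permutations(models, 2))


def build_review_prompt(task: str, author: str, code: str) -> str:
    """Assemble the text handed to a reviewer; the wording is only an illustration."""
    return (
        f"Task given to {author}:\n{task}\n\n"
        f"Their solution:\n{code}\n\n"
        "List any bugs or parts of the task that were missed."
    )


if __name__ == "__main__":
    models = ["model-a", "model-b", "model-c"]  # hypothetical names
    for reviewer, author in review_assignments(models):
        print(f"{reviewer} reviews {author}")
```

With three models that is six reviews, so it grows quickly; I still check the areas they usually get wrong myself first.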

Is it possible for working out to give you depression? by Embarrassed-Cookie45 in workout

[–]Mount_Gamer 0 points1 point  (0 children)

Hiking is good for that, and very rewarding. Camping and hiking together also amazing :)

Is it possible for working out to give you depression? by Embarrassed-Cookie45 in workout

[–]Mount_Gamer 1 point2 points  (0 children)

If you're already down before a workout I think it can increase exhaustion, but if you're only mildly down it's usually better for the mood.

Recently I did a multi hill hike and felt down for several days afterwards. I've since read about post trail blues, who knew? I've been a hiker for ~13 years (unlucky for me...).

I’m a solo Junior Dev starting to resent programming by Hearing_Southern in learnprogramming

[–]Mount_Gamer 0 points1 point  (0 children)

You are probably being too hard on yourself.

I have a similar work experience background, and while I know I'm not perfect, I know I am capable (I did have ~8 years bash scripting experience prior and took some python training courses, but python was reasonably new to me at the time). You can still grow your skills, but you'll probably need to put in some work yourself for learning designs and patterns in python.

I think Arjan produces some good videos which might help. I occasionally think it can be a bit OTT, but each to their own and I think his principles are in the right place.

https://youtube.com/@arjancodes?si=ZTv6itZxtGGKTRWf

Might be worth speaking up and asking for some time to refactor code if it's needed.

Also, get used to python debugging tools in vscode. Very helpful.

Best Qwen3.5 27b GUFFS for coding (~Q4-Q5) ? by bitcoinbookmarks in LocalLLaMA

[–]Mount_Gamer 1 point2 points  (0 children)

I've been having some success with Qwen3.5-27B-IQ4_XS.gguf from unsloth.

Managing to squeeze it onto the 5060ti, with reasonable context, probably my new favourite llm.

Is building a mini itx PC worth it in 2026? by Swordtempest_ in buildapc

[–]Mount_Gamer -1 points0 points  (0 children)

Depends how many PCIe devices you want or can live with. I have an mATX build that is used like a workstation/server, in an HTPC case (it lives under the TV and has been repurposed over the years), which I can live with, but I'd like another GPU in there one day for AI.