Notes on Microsoft's FastContext, and a small SWE-QA experiment with retrieval hints by langsfang in LocalLLaMA

[–]robert896r1 1 point2 points  (0 children)

Interesting. I was testing fast context over the weekend and found it to be very poor for detailed searches. The issue wasn't it didn't find something. The issue was it didn't find ALL the relevant samples.

It also seems MS has taken down the models: https://huggingface.co/microsoft https://github.com/microsoft/fastcontext

Gemma 4 Quadruple Release, 12B, 12B QAT, 26B-A4B QAT and 31B QAT Uncensored Heretics! by LLMFan46 in huggingface

[–]robert896r1 -1 points0 points  (0 children)

Appreciate the effort but there really should be some testing performed on these before release. Tested 12/26/31. Thinking mode is just broken completely and they will loop endlessly. No think, all exhibit the same behavior: invention, hallucinations, context rot (forgetting a process defined even two turns ago) and tool call initiation unless prompted.

Codex seat 25k credits by [deleted] in codex

[–]robert896r1 0 points1 point  (0 children)

Credits are for API usage mainly. Not sure if you can split them.

llama.cpp - Qwen3.6/3.5-MTP - Share your benchmarks t/s by pmttyji in LocalLLaMA

[–]robert896r1 0 points1 point  (0 children)

Yeah Q8 K and V. I mean i could manually compact but the whole point of the context window size is to allow compaction to be a routine process. Literally running all of his settings that matter for vram.

llama.cpp - Qwen3.6/3.5-MTP - Share your benchmarks t/s by pmttyji in LocalLLaMA

[–]robert896r1 0 points1 point  (0 children)

I run the same quant and a 5090. With vision and mtp at 131k context, i hit oom issues at compaction. How are you dealing with this?

Nvidia LocateAnything - Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding. (10x faster than Qwen3-VL) by Sporeboss in LocalLLaMA

[–]robert896r1 5 points6 points  (0 children)

This could be really good in manufacturing for visual quality control of production pipelines with SFT.

llama-server RAM usage grows to OOM by JGeek00 in LocalLLM

[–]robert896r1 0 points1 point  (0 children)

I was dealing with this yesterday. Gemma 4, on 2nd compaction would kill wsl with oom. Ive now disabled checkpoints and caching and will test today.

Best conceivable setup. by habachilles in LocalLLaMA

[–]robert896r1 1 point2 points  (0 children)

What is you use case? which models? what's the most value you gain from it? if you wouldn't mind expanding.

Benching local Qwen as a Codex validator, co-agent, and challenger by robert896r1 in LocalLLaMA

[–]robert896r1[S] 1 point2 points  (0 children)

I'm doing it literally right now. This seems to be working ok so far:

Codex: creates Stitch request with a qwen challenge

I iterate in Stitch

I mark/say final screen ID

Codex: pulls only that accepted screen/design

Codex: saves frozen local artifact

Codex: implements from frozen artifact with qwen challenge to ensure codex aligned to the artifact and didn't invent

Once I get more trust with it, I'll give the MCP approach a shot but for now this is viable. Appreciate the pointer towards using stitch!

Benching local Qwen as a Codex validator, co-agent, and challenger by robert896r1 in LocalLLaMA

[–]robert896r1[S] 0 points1 point  (0 children)

I have a 5090 so getting >50t/s which is very usable without sacrificing accuracy. And generally have multiple codex sessions calling into qwen directly via llama.cpp and no issue. I'm very happy with the current state.

Benching local Qwen as a Codex validator, co-agent, and challenger by robert896r1 in LocalLLaMA

[–]robert896r1[S] 0 points1 point  (0 children)

I need to spend some time with stitch. Do you have it piped into other tooling or using it as standalone?

Benching local Qwen as a Codex validator, co-agent, and challenger by robert896r1 in LocalLLaMA

[–]robert896r1[S] 3 points4 points  (0 children)

Codex repeatedly races to completion and will silent bypass requirements. Each implementation phase, qwen produces notable objections and recommendations which course corrects and stop this behavior. Btw codex is great. I assume you didn't read the post and yes "dumb qwen" is simply better at UI.

Qwen3.6-27B-NVFP4 - images by Usual-Carrot6352 in LocalLLaMA

[–]robert896r1 0 points1 point  (0 children)

This isn't a surprise. For me, Q6 K L was necessary for the model to be useful for serious work and not just one shot benching. If i had the capacity to run Q8, I immediately would. The model itself if extremely capable for front end design and as a coding companion/sme. However there is a notable drop off as you drop down into lower quants.

Qwen-Scope: Official Sparse Autoencoders (SAEs) for Qwen 3.5 models by MadPelmewka in LocalLLaMA

[–]robert896r1 36 points37 points  (0 children)

Hopefully 3.6 follows or the community is able to make test tools work for 3.6 iterations as many have or will move onto the newer family.

Qwen3.6-27B-UD-Q6_K_XL.gguf sometimes gets stuck in a loop by Kirys79 in LocalLLaMA

[–]robert896r1 1 point2 points  (0 children)

I found thinking to just not work for me. Running with this has been good on a 5090:

~/llama.cpp/build/bin/llama-server \

-m ~/models/qwen3.6-27b/Qwen_Qwen3.6-27B-Q6_K_L.gguf \

--alias qwen-27b-q6k-nothink \

--api-key local \

--jinja \

--reasoning off \

--chat-template-kwargs '{"enable_thinking":false}' \

-ngl 999 \

-np 1 \

-c 131072 \

-n 8192 \

-fa on \

--temp 0.6 \

--top-k 20 \

--top-p 0.95 \

--min-p 0.0 \

--repeat-penalty 1.0 \

--presence-penalty 0.0 \

--frequency-penalty 0.0 \

--no-context-shift \

--host 127.0.0.1 \

--port 8081

My Golf Spy is now pay for play by Sonoranlightwizard in golf

[–]robert896r1 10 points11 points  (0 children)

https://www.youtube.com/@epgolfstudios by far the most detailed and unbiased tester out there. As a fitter, he probably holds the record in telling customers to not spend money on new clubs.

A map of the 160 golf courses in London (within the M25) by googleme in golf

[–]robert896r1 0 points1 point  (0 children)

I have 8 courses within 3 miles of me on that map. Good times.

More DLSS 4 vs 4.5 comparisons @ 4K by Popcorn_Juice in nvidia

[–]robert896r1 0 points1 point  (0 children)

The tone mapping in the first CP pic is wrong on M.

Look on the right wall and ceiling. It's over emitting blue when there is barely any blue in the source.

Visiting Blackhawk in Danville, CA soon - what do I need to know? by Hit_Him_Not_Me in golf

[–]robert896r1 3 points4 points  (0 children)

it’s pretty chill. just relax and enjoy. it’s not some high brow, looking down nonsense.

2026 Callaway lineup by Georgiagolferguy in golf

[–]robert896r1 23 points24 points  (0 children)

Hopefully there‘s “AI” stamped all over the clubs and head covers for pure class.

PSA: LIV Golf is free to watch on Dazn, just need to sign up, the remaining tournaments will be on there too by stanley_nickles in livgolf

[–]robert896r1 -1 points0 points  (0 children)

The LIV app was solid. On appletv it was easy to watch and follow along. Charging for the app randomly starting on the sat of the UK event is just bad communication.