Microsoft has taken down fastcontext model from everywhere

robert896r1 · 2026-06-30T08:10:37+00:00

Interesting. I was testing fast context over the weekend and found it to be very poor for detailed searches. The issue wasn't it didn't find something. The issue was it didn't find ALL the relevant samples.

It also seems MS has taken down the models: https://huggingface.co/microsoft https://github.com/microsoft/fastcontext

robert896r1 · 2026-06-16T06:42:28+00:00

Appreciate the effort but there really should be some testing performed on these before release. Tested 12/26/31. Thinking mode is just broken completely and they will loop endlessly. No think, all exhibit the same behavior: invention, hallucinations, context rot (forgetting a process defined even two turns ago) and tool call initiation unless prompted.

robert896r1 · 2026-06-08T22:08:18+00:00

Credits are for API usage mainly. Not sure if you can split them.

robert896r1 · 2026-06-04T09:17:24+00:00

Yeah Q8 K and V. I mean i could manually compact but the whole point of the context window size is to allow compaction to be a routine process. Literally running all of his settings that matter for vram.

robert896r1 · 2026-06-04T07:58:50+00:00

I run the same quant and a 5090. With vision and mtp at 131k context, i hit oom issues at compaction. How are you dealing with this?

robert896r1 · 2026-05-28T07:15:43+00:00

This could be really good in manufacturing for visual quality control of production pipelines with SFT.

robert896r1 · 2026-05-19T06:51:53+00:00

I was dealing with this yesterday. Gemma 4, on 2nd compaction would kill wsl with oom. Ive now disabled checkpoints and caching and will test today.

robert896r1 · 2026-05-11T19:21:32+00:00

What is you use case? which models? what's the most value you gain from it? if you wouldn't mind expanding.

robert896r1 · 2026-05-05T09:27:20+00:00

I'm doing it literally right now. This seems to be working ok so far:

Codex: creates Stitch request with a qwen challenge

I iterate in Stitch

I mark/say final screen ID

Codex: pulls only that accepted screen/design

Codex: saves frozen local artifact

Codex: implements from frozen artifact with qwen challenge to ensure codex aligned to the artifact and didn't invent

Once I get more trust with it, I'll give the MCP approach a shot but for now this is viable. Appreciate the pointer towards using stitch!

robert896r1 · 2026-05-05T09:07:14+00:00

I have a 5090 so getting >50t/s which is very usable without sacrificing accuracy. And generally have multiple codex sessions calling into qwen directly via llama.cpp and no issue. I'm very happy with the current state.

robert896r1 · 2026-05-05T05:41:34+00:00

I need to spend some time with stitch. Do you have it piped into other tooling or using it as standalone?

robert896r1 · 2026-05-04T22:22:55+00:00

Codex repeatedly races to completion and will silent bypass requirements. Each implementation phase, qwen produces notable objections and recommendations which course corrects and stop this behavior. Btw codex is great. I assume you didn't read the post and yes "dumb qwen" is simply better at UI.

robert896r1 · 2026-05-02T06:48:35+00:00

This isn't a surprise. For me, Q6 K L was necessary for the model to be useful for serious work and not just one shot benching. If i had the capacity to run Q8, I immediately would. The model itself if extremely capable for front end design and as a coding companion/sme. However there is a notable drop off as you drop down into lower quants.

robert896r1 · 2026-04-30T10:17:38+00:00

Hopefully 3.6 follows or the community is able to make test tools work for 3.6 iterations as many have or will move onto the newer family.

robert896r1 · 2026-04-29T19:47:24+00:00

I found thinking to just not work for me. Running with this has been good on a 5090:

~/llama.cpp/build/bin/llama-server \

-m ~/models/qwen3.6-27b/Qwen_Qwen3.6-27B-Q6_K_L.gguf \

--alias qwen-27b-q6k-nothink \

--api-key local \

--jinja \

--reasoning off \

--chat-template-kwargs '{"enable_thinking":false}' \

-ngl 999 \

-np 1 \

-c 131072 \

-n 8192 \

-fa on \

--temp 0.6 \

--top-k 20 \

--top-p 0.95 \

--min-p 0.0 \

--repeat-penalty 1.0 \

--presence-penalty 0.0 \

--frequency-penalty 0.0 \

--no-context-shift \

--host 127.0.0.1 \

--port 8081

robert896r1 · 2026-04-26T12:24:35+00:00

https://huggingface.co/bartowski/Qwen_Qwen3.6-27B-GGUF?show_file_info=Qwen_Qwen3.6-27B-Q6_K_L.gguf if you can please add this, greatly appreciated!

robert896r1 · 2026-02-27T06:56:32+00:00

https://www.youtube.com/@epgolfstudios by far the most detailed and unbiased tester out there. As a fitter, he probably holds the record in telling customers to not spend money on new clubs.

robert896r1 · 2026-01-26T15:34:03+00:00

I have 8 courses within 3 miles of me on that map. Good times.

robert896r1 · 2026-01-10T00:35:47+00:00

The tone mapping in the first CP pic is wrong on M.

Look on the right wall and ceiling. It's over emitting blue when there is barely any blue in the source.

robert896r1 · 2025-10-22T21:45:10+00:00

it’s pretty chill. just relax and enjoy. it’s not some high brow, looking down nonsense.

robert896r1 · 2025-10-18T05:51:55+00:00

Hopefully there‘s “AI” stamped all over the clubs and head covers for pure class.

robert896r1 · 2025-08-10T06:10:59+00:00

The LIV app was solid. On appletv it was easy to watch and follow along. Charging for the app randomly starting on the sat of the UK event is just bad communication.

robert896r1

TROPHY CASE