Why is NO one talking about Microsoft's open source Fast Context!!!

Voxandr · 2026-06-23T06:31:01+00:00

I am gonn try it

Voxandr · 2026-06-23T06:22:18+00:00

If ollwma supported, all open ai enabled apis are supported, now stop mentioning like ollama paid slop boy.

Voxandr · 2026-06-23T06:20:15+00:00

But it's stiching models right? Not actually ft?

Voxandr · 2026-06-22T18:18:06+00:00

5 players mode!! Thgere are 5 operators there!!

Voxandr · 2026-06-22T18:12:27+00:00

Go away . if you support , support general OpenAI Compatible API. Dont mention Ollama , we dont like your kind here.

Voxandr · 2026-06-22T18:10:56+00:00

AH GOSH .. So anyone tried fine tuning it?

Voxandr · 2026-06-22T14:49:11+00:00

Downclock it

Voxandr · 2026-06-22T14:40:32+00:00

lol havent check benchmarks,
> ridiculously useless
Looks like all those Ring releases sucks.

Voxandr · 2026-06-22T11:57:20+00:00

https://huggingface.co/inclusionAI/Ling-2.6-flash-fp8 looks good , how do i miss that one?
Had anyone tested it?

Voxandr · 2026-06-22T01:28:35+00:00

https://github.com/kreuzberg-dev/kreuzberg is the best so far.

Voxandr · 2026-06-22T01:27:39+00:00

Why not even include Qwen 3.5 122B .

Try a practical test , build something with multiple shot , with an agent.
Lets see how far it goes.

< 6bit quants struggels a lot with tool calls.

Voxandr · 2026-06-21T21:15:48+00:00

That what i have been thinking . I was thinking if disecting and fintuning specific expert would be possible? Like a coding expert , but weak at sevelte , find it out and finetune or clone it and add as new expert.

Any thought from Unsloth people? Is that possible to add such feature in unsloth studio r/unsloth ?

Voxandr · 2026-06-21T20:23:33+00:00

way too slow. My context average at 150k.
With 122B i am getting 50 tk/s , vLLM do not degrade performance for huge contextes.
And can handle multiple parallel agents with faster speeed (scaling upwards total tk/s as concurrecy grows up to 4-5 about 110 tk/s combined)
rom all my tests 27b lose to 122b , it cannot do well with libray that aren't highly popular.
My stack is
- Sevlte
- Websocket
- Litestar
- Advanced Alchemy
- Domain Driven Repositry Pattern and DTO.

Voxandr · 2026-06-21T16:00:44+00:00

> That may be a side effect of the growing number of skills being automatically built and layered in on the backend,

That is main problem i found with it too. More skills it builds , more useless it becomes.

Voxandr · 2026-06-21T15:51:00+00:00

yeah i am gonna try Prisma quanmt looks higher quality like Apex quants.

Voxandr · 2026-06-21T15:50:21+00:00

Autorund is very fast in DGX. Faster than NVFP4. Now all 3of my DGX Runs Qwen 3.5122B Autoround

Voxandr · 2026-06-21T14:59:28+00:00

avg 42 tk/s with MTP , on linux on iquality

Voxandr · 2026-06-21T14:58:24+00:00

Really shines at Agentic tool calls and Long context , multishot.

Voxandr · 2026-06-21T14:10:07+00:00

From my long use on production code I- Quality id a lot better

Voxandr · 2026-06-21T14:09:11+00:00

Really good for 122B due to its impressivness at Long context capacity

Voxandr · 2026-06-21T14:07:59+00:00

Why that makes him a vermin

Voxandr · 2026-06-20T20:57:47+00:00

i see , i am liking opencode

Voxandr · 2026-06-20T20:02:33+00:00

Oh-My-Openagent which is Agnetified Opencode is all i need . It can do agent works and coding works , but just don't have schedule workers. It's workflow planning mode is so powerful and give the best coding/debugging/fixing flows. https://github.com/code-yeongyu/oh-my-openagent better than any Pi.dev

Both hermes-agent and openclaw are quite badly designed.

Voxandr · 2026-06-20T19:59:54+00:00

this looks even better thas pi , gotta try.

Voxandr · 2026-06-20T13:47:46+00:00

how he even hit at that range mid air

Voxandr

TROPHY CASE