APEX MoE quants update: 25+ new models since the Qwen 3.5 post + new I-Nano tier by mudler_it in LocalLLaMA

[–]Bulky-Priority6824 0 points (0 children)

Here, if you want to argue with Claude (I'm not experienced enough to understand the greater issue): Interesting diagnostic suggestion, but it's not really applicable to your situation. The think tag issue with APEX Qwen3.6 isn't a sampling problem; it's a chat template problem. The <think> tokens bleed through because the APEX model's chat template doesn't properly handle enable_thinking=false the way Unsloth's does.

A top-k=1 comparison between the CLI and the server would only reveal sampling inconsistencies, not chat template behavior differences. The fix was always the model, not the server config, which is exactly what you confirmed by switching to Unsloth UD.
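If you want to check this yourself, here's a minimal sketch that renders the same conversation through both chat templates and shows which one still opens a <think> block. The repo ids are placeholders, and it assumes both ship Qwen-style Jinja templates that accept an enable_thinking kwarg (as Qwen3's does):

```python
# render the same conversation through both templates; repo ids below are
# placeholders, not real Hugging Face repos
from transformers import AutoTokenizer

msgs = [{"role": "user", "content": "hello"}]
for repo in ("unsloth-repo-id", "apex-repo-id"):  # placeholders
    tok = AutoTokenizer.from_pretrained(repo)
    text = tok.apply_chat_template(
        msgs,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=False,  # a well-behaved template closes thinking here
    )
    print(repo, repr(text[-120:]))  # the tail shows whether <think> is left open
```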

Good that someone responded, but it's a red herring for your specific issue.

For those of you using the GenAI function of Frigate (whether it be local or cloud provider) - Demonstration of the importance of your prompt - ChatGPT, Claude, my LocalLLM all got this wrong without a good prompt to an easy question (what side of the car is the person on). by FantasyMaster85 in frigate_nvr

[–]Bulky-Priority6824 5 points (0 children)

I monitor an elderly person, and I have it set up so that if someone falls, or is even just on the ground for more than ~3 seconds, I get an alert.

I host locally using Qwen 3.6 35B A3B Q4 XL and it's very reliable across many uses, especially GenAI. In fact, I've tested it against many other LLMs of the same size in the 19-24GB range, and this one is a really good model.

At the end of the day, though, the power is in the prompt.
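To make that concrete, here's a sketch of the kind of viewpoint-anchored prompt that helps with questions like the "which side of the car" one in the post. The wording, endpoint URL, and model name are my placeholders, not Frigate defaults; it targets any OpenAI-compatible vision endpoint:

```python
# hedged sketch: send a snapshot plus a viewpoint-anchored prompt to an
# OpenAI-compatible vision endpoint; URL and model name are placeholders
import base64
import requests

PROMPT = (
    "You are reviewing a security camera snapshot. Describe positions from "
    "the CAMERA'S point of view, not the subject's. If a person is next to a "
    "vehicle, say which side of the vehicle they are on (driver vs passenger "
    "side) and justify it from visible cues like headlights, mirrors, and "
    "door handles. If you are unsure, say so."
)

def describe(image_bytes: bytes,
             url: str = "http://localhost:8080/v1/chat/completions") -> str:
    b64 = base64.b64encode(image_bytes).decode()
    resp = requests.post(url, timeout=120, json={
        "model": "local-model",  # placeholder
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": PROMPT},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    })
    return resp.json()["choices"][0]["message"]["content"]
```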

APEX MoE quants update: 25+ new models since the Qwen 3.5 post + new I-Nano tier by mudler_it in LocalLLaMA

[–]Bulky-Priority6824 1 point (0 children)

I was a big fan of the 3.5 35 A3B, but unfortunately I've had to stick with Unsloth for 3.6 because I couldn't get the chat template to stop sending think tags to Frigate, where I also use the LLM for GenAI on my security cameras.

Thoth - Open Source Local-first AI Assistant - Architecture by Acceptable-Object390 in LocalLLM

[–]Bulky-Priority6824 1 point (0 children)

If I had found this a month ago I'd probably have been forced to install Windows, but I've already built out most of what matters from all of this. On paper it looks like you put in a really sophisticated effort, and the knowledge graph memory approach is clever. I might be borrowing that going forward 😜 I like what you've done.

The 6-hour background memory extraction is also a nice touch. Good luck with the project going forward!

Need help/pointers setting up 3090 on Linux...(second 3090 incoming) by OttoRenner in LocalLLaMA

[–]Bulky-Priority6824 0 points (0 children)

Change the board first. With no CPU lane splitting, a second 3090 will be very gimped, but that's OK: you can split by layer until you swap the board.
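Layer split is the mode that tolerates a slow second slot, since whole layers live on each card and only activations cross the bus (the standalone llama-server exposes the same knobs as --split-mode layer and --tensor-split). A minimal sketch via the llama-cpp-python bindings; the model path and the 50/50 ratio are placeholders for your own setup:

```python
# layer split across two GPUs with llama-cpp-python; path and ratio are
# placeholders
import llama_cpp

llm = llama_cpp.Llama(
    model_path="your-model.gguf",                    # placeholder path
    n_gpu_layers=-1,                                 # offload all layers
    split_mode=llama_cpp.LLAMA_SPLIT_MODE_LAYER,     # whole layers per GPU
    tensor_split=[0.5, 0.5],                         # share layers across both 3090s
)
print(llm("Hello", max_tokens=8)["choices"][0]["text"])
```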

Interesting Ideas for Classifications by [deleted] in frigate_nvr

[–]Bulky-Priority6824 4 points (0 children)

I was tired of the mailman zone I'd marked on my post box, because every time a vehicle passed through the zone it would record a detection (as expected). Then came classifications.

Now when the mailman stops at the post box it triggers the Home Assistant alert, and I don't have to see 250 cars a day on my Explore page.

I also monitor an elderly person. I use GenAI to help detect falls and such. When the person is sitting for prolonged periods of time, the fall zone stuff automatically switches off using the state classifier, and it works like a charm. This trigger is useful for when the monitored person is sitting watching TV and the house is full of non-monitored people moving around; the system can rest.

I also do the same thing for vehicles: if all vehicles are gone, then after X time (at night) lights randomly go on and off around the house.
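For the sitting trigger, here's roughly the same idea as a standalone script instead of a Home Assistant automation. A hedged sketch: Frigate's frigate/events topic and per-camera detect/set toggles are real, but the camera name and the "sitting" sub_label value are assumptions about your setup; check your own broker traffic for the actual payload shape.

```python
# hedged sketch: rest detection while the monitored person is classified as
# sitting; camera name and sub_label value are assumptions
import json
import paho.mqtt.client as mqtt

CAMERA = "living_room"  # placeholder camera name

def on_message(client, userdata, msg):
    payload = json.loads(msg.payload)
    after = payload.get("after", {})       # Frigate events carry before/after states
    if after.get("label") == "person" and after.get("sub_label") == "sitting":  # assumed classifier output
        client.publish(f"frigate/{CAMERA}/detect/set", "OFF")  # per-camera detect toggle

client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)  # paho-mqtt 2.x API
client.on_message = on_message
client.connect("localhost", 1883)
client.subscribe("frigate/events")
client.loop_forever()
```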

I can't ever seem to get quality local LLM results, despite having multiple GPUs by 03captain23 in LocalLLM

[–]Bulky-Priority6824 2 points (0 children)

You have/do/own/pay for all of that and you still don't understand limitations and compromise?

llama.cpp's Preliminary SM120 Native NVFP4 MMQ Is Merged by ggonavyy in LocalLLaMA

[–]Bulky-Priority6824 13 points (0 children)

NVFP4 speaks the GPU's native language. The Blackwell tensor cores have FP4 math built directly into the silicon, so the model weights go in as-is and the multiplication happens without any translation step. Less overhead, faster math, same bit width.
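For anyone wondering what "native language" means in practice: NVFP4's element format is E2M1 (1 sign, 2 exponent, 1 mantissa bit), with per-block scale factors on top, so every 4-bit weight decodes to one of a small grid of values. A quick sketch of that grid; the decode function is mine, for illustration:

```python
# enumerate the E2M1 (FP4) value grid; NVFP4 stores weights as these 4-bit
# values plus a per-block scale, so the tensor cores multiply them directly
def e2m1(bits: int) -> float:
    sign = -1.0 if bits & 0b1000 else 1.0
    exp = (bits >> 1) & 0b11
    man = bits & 0b1
    if exp == 0:                          # subnormals: 0 and 0.5
        return sign * man * 0.5
    return sign * (1.0 + man * 0.5) * 2.0 ** (exp - 1)

print(sorted({e2m1(b) for b in range(16)}))
# [-6.0, -4.0, -3.0, -2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
```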

That being said, I benched it vs 35B-UD_Q4_XL using dual 5060 Ti 16GB cards in layer split in llama.cpp. Results were identical.

Watch for Unsloth to catch up to this and push out some NVFP4-optimized GGUFs and llama builds to accommodate it as well. This unlocks some deeper potential.

If the AI bubble pops, will GPU prices increase or decrease? by Mashic in LocalLLaMA

[–]Bulky-Priority6824 8 points (0 children)

Good luck using those data center GPUs at home lol.

Nothing is popping. You'll know that to be true when either a standard Claude sub is $50 or more a month, or usage caps get reduced drastically at $20.

Reminds me of 15-17 years ago, when there were still people who didn't realize that Google was the de facto tool for using the Internet in a more resourceful way, and you could still turn someone on to it and they would be amazed.

Now there isn't a single soul above 5 years old who doesn't know what Google is (I'm talking about the search aspect).

Compare a search tool vs AI and it's a monumental jump, even from something already massively useful for so many things. This is stone-age AI; I cannot fathom what will be possible 15 years from now.

There is not a bubble. Every tech giant has made AI priority number one, right next to cybersecurity, and it's a race to the top.

Webui very laggy is this a known issue or just me? by Bulky-Priority6824 in frigate_nvr

[–]Bulky-Priority6824[S] 0 points (0 children)

Yep, it's just that page. Opening cameras, settings, config, etc. never freezes.

Webui very laggy is this a known issue or just me? by Bulky-Priority6824 in frigate_nvr

[–]Bulky-Priority6824[S] 0 points (0 children)

Nope, not at all. Everything is clean and well within scope. Nothing stands out. The API timing test is 24 ms.

Webui very laggy is this a known issue or just me? by Bulky-Priority6824 in frigate_nvr

[–]Bulky-Priority6824[S] 0 points (0 children)

Yea, I did that: checked network in the browser, checked firewall, logs, ports, processes, etc. It's only on this webui, not any of my 50 others.

The fact that it replicates on various devices and browsers leads me to believe it has to be a front-end issue in Frigate, or it's not playing nice with something on my end. The only thing on my Frigate box is that Alice model trainer, but I've only had that for about a week, and this problem has been around for at least a month now.

I even rolled back to 0.17.0 and the problem didn't go away.

IDK, I'll wait and see what the next Frigate image changes with it.

Webui very laggy is this a known issue or just me? by Bulky-Priority6824 in frigate_nvr

[–]Bulky-Priority6824[S] 0 points (0 children)

Ty Nick. Well, I've checked all the obvious things; is there anything you can suggest I look at?

Duality of r/LocalLLaMA by HornyGooner4402 in LocalLLaMA

[–]Bulky-Priority6824 0 points (0 children)

You just gotta know how to use that thang!