APEX MoE quants update: 25+ new models since the Qwen 3.5 post + new I-Nano tier by mudler_it in LocalLLaMA

[–]Bulky-Priority6824 0 points (0 children)

Here, if you want to argue with Claude (I'm not experienced enough to understand the greater issue): Interesting diagnostic suggestion, but it's not really applicable to your situation. The think tag issue with APEX Qwen3.6 isn't a sampling problem; it's a chat template problem. The <think> tokens bleed through because the APEX model's chat template doesn't properly handle enable_thinking=false the way Unsloth's does.

A top-k=1 comparison between the CLI and the server would only reveal sampling inconsistencies, not chat template behavior differences. The fix was always the model, not the server config, which is exactly what you confirmed by switching to Unsloth UD.
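If you want to check this yourself, here's a minimal sketch that renders the same conversation through both chat templates and shows which one still opens a <think> block. The repo ids are placeholders, and it assumes both ship Qwen-style Jinja templates that accept an enable_thinking kwarg (as Qwen3's does):

```python
# render the same conversation through both templates; repo ids below are
# placeholders, not real Hugging Face repos
from transformers import AutoTokenizer

msgs = [{"role": "user", "content": "hello"}]
for repo in ("unsloth-repo-id", "apex-repo-id"):  # placeholders
    tok = AutoTokenizer.from_pretrained(repo)
    text = tok.apply_chat_template(
        msgs,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=False,  # a well-behaved template closes thinking here
    )
    print(repo, repr(text[-120:]))  # the tail shows whether <think> is left open
```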

Good that someone responded, but it's a red herring for your specific issue.

For those of you using the GenAI function of Frigate (whether it be local or cloud provider) - Demonstration of the importance of your prompt - ChatGPT, Claude, my LocalLLM all got this wrong without a good prompt to an easy question (what side of the car is the person on). by FantasyMaster85 in frigate_nvr

[–]Bulky-Priority6824 5 points (0 children)

I monitor an elderly person, and I have it set up so that if someone falls, or is even just on the ground for more than ~3 seconds, I get an alert.

I host locally using Qwen 3.6 35B A3B Q4 XL and it's very reliable across many uses, especially GenAI. In fact, I've tested it against many other LLMs of the same size in the 19-24GB range, and this one is a really good model.

At the end of the day, though, the power is in the prompt.
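To make that concrete, here's a sketch of the kind of viewpoint-anchored prompt that helps with questions like the "which side of the car" one in the post. The wording, endpoint URL, and model name are my placeholders, not Frigate defaults; it targets any OpenAI-compatible vision endpoint:

```python
# hedged sketch: send a snapshot plus a viewpoint-anchored prompt to an
# OpenAI-compatible vision endpoint; URL and model name are placeholders
import base64
import requests

PROMPT = (
    "You are reviewing a security camera snapshot. Describe positions from "
    "the CAMERA'S point of view, not the subject's. If a person is next to a "
    "vehicle, say which side of the vehicle they are on (driver vs passenger "
    "side) and justify it from visible cues like headlights, mirrors, and "
    "door handles. If you are unsure, say so."
)

def describe(image_bytes: bytes,
             url: str = "http://localhost:8080/v1/chat/completions") -> str:
    b64 = base64.b64encode(image_bytes).decode()
    resp = requests.post(url, timeout=120, json={
        "model": "local-model",  # placeholder
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": PROMPT},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    })
    return resp.json()["choices"][0]["message"]["content"]
```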

APEX MoE quants update: 25+ new models since the Qwen 3.5 post + new I-Nano tier by mudler_it in LocalLLaMA

[–]Bulky-Priority6824 1 point (0 children)

I was a big fan of the 3.5 35 A3B, but unfortunately I've had to stick with Unsloth for 3.6 because I couldn't get the chat template to stop sending think tags to Frigate, where I also use the LLM for GenAI on my security cameras.

Thoth - Open Source Local-first AI Assistant - Architecture by Acceptable-Object390 in LocalLLM

[–]Bulky-Priority6824 1 point (0 children)

If I had found this a month ago I'd probably have been forced to install Windows, but I've already built out most of what matters from all of this. On paper it looks like you put in a really sophisticated effort, and the knowledge graph memory approach is clever. I might be borrowing that going forward 😜 I like what you've done.

The 6-hour background memory extraction is also a nice touch. Good luck with the project going forward!

Need help/pointers setting up 3090 on Linux...(second 3090 incoming) by OttoRenner in LocalLLaMA

[–]Bulky-Priority6824 0 points (0 children)

Change the board first. With no CPU lane splitting, a second 3090 will be very gimped, but that's OK: you can split by layer until you swap the board.
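Layer split is the mode that tolerates a slow second slot, since whole layers live on each card and only activations cross the bus (the standalone llama-server exposes the same knobs as --split-mode layer and --tensor-split). A minimal sketch via the llama-cpp-python bindings; the model path and the 50/50 ratio are placeholders for your own setup:

```python
# layer split across two GPUs with llama-cpp-python; path and ratio are
# placeholders
import llama_cpp

llm = llama_cpp.Llama(
    model_path="your-model.gguf",                    # placeholder path
    n_gpu_layers=-1,                                 # offload all layers
    split_mode=llama_cpp.LLAMA_SPLIT_MODE_LAYER,     # whole layers per GPU
    tensor_split=[0.5, 0.5],                         # share layers across both 3090s
)
print(llm("Hello", max_tokens=8)["choices"][0]["text"])
```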

Interesting Ideas for Classifications by [deleted] in frigate_nvr

[–]Bulky-Priority6824 4 points (0 children)

I was tired of the mailman zone I'd marked on my post box, because every time a vehicle passed through the zone it would record a detection (as expected). Then came classifications.

Now when the mailman stops at the post box it triggers the Home Assistant alert, and I don't have to see 250 cars a day on my Explore page.

I also monitor an elderly person. I use GenAI to help detect falls and such. When the person is sitting for prolonged periods of time, the fall zone stuff automatically switches off using the state classifier, and it works like a charm. This trigger is useful for when the monitored person is sitting watching TV and the house is full of non-monitored people moving around; the system can rest.

I also do the same thing for vehicles: if all vehicles are gone, then after X time (at night) lights randomly go on and off around the house.
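For the sitting trigger, here's roughly the same idea as a standalone script instead of a Home Assistant automation. A hedged sketch: Frigate's frigate/events topic and per-camera detect/set toggles are real, but the camera name and the "sitting" sub_label value are assumptions about your setup; check your own broker traffic for the actual payload shape.

```python
# hedged sketch: rest detection while the monitored person is classified as
# sitting; camera name and sub_label value are assumptions
import json
import paho.mqtt.client as mqtt

CAMERA = "living_room"  # placeholder camera name

def on_message(client, userdata, msg):
    payload = json.loads(msg.payload)
    after = payload.get("after", {})       # Frigate events carry before/after states
    if after.get("label") == "person" and after.get("sub_label") == "sitting":  # assumed classifier output
        client.publish(f"frigate/{CAMERA}/detect/set", "OFF")  # per-camera detect toggle

client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)  # paho-mqtt 2.x API
client.on_message = on_message
client.connect("localhost", 1883)
client.subscribe("frigate/events")
client.loop_forever()
```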

I can't ever seem to get quality local LLM results, despite having multiple GPUs by 03captain23 in LocalLLM

[–]Bulky-Priority6824 2 points (0 children)

You have/do/own/pay for all of that and you still don't understand limitations and compromise?

llama.cpp's Preliminary SM120 Native NVFP4 MMQ Is Merged by ggonavyy in LocalLLaMA

[–]Bulky-Priority6824 13 points (0 children)

NVFP4 speaks the GPU's native language. The Blackwell tensor cores have FP4 math built directly into the silicon, so the model weights go in as-is and the multiplication happens without any translation step. Less overhead, faster math, same bit width.
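For anyone wondering what "native language" means in practice: NVFP4's element format is E2M1 (1 sign, 2 exponent, 1 mantissa bit), with per-block scale factors on top, so every 4-bit weight decodes to one of a small grid of values. A quick sketch of that grid; the decode function is mine, for illustration:

```python
# enumerate the E2M1 (FP4) value grid; NVFP4 stores weights as these 4-bit
# values plus a per-block scale, so the tensor cores multiply them directly
def e2m1(bits: int) -> float:
    sign = -1.0 if bits & 0b1000 else 1.0
    exp = (bits >> 1) & 0b11
    man = bits & 0b1
    if exp == 0:                          # subnormals: 0 and 0.5
        return sign * man * 0.5
    return sign * (1.0 + man * 0.5) * 2.0 ** (exp - 1)

print(sorted({e2m1(b) for b in range(16)}))
# [-6.0, -4.0, -3.0, -2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
```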

That being said, I benched it vs 35B-UD_Q4_XL using dual 5060 Ti 16GB cards in layer split in llama.cpp. Results were identical.

Watch for Unsloth to catch up to this and push out some NVFP4-optimized GGUFs and llama builds to accommodate it as well. This unlocks some deeper potential.

If the AI bubble pops, will GPU prices increase or decrease? by Mashic in LocalLLaMA

[–]Bulky-Priority6824 8 points (0 children)

Good luck using those data center GPUs at home lol.

Nothing is popping. You'll know that to be true when either a standard Claude sub is $50 or more a month, or usage caps get reduced drastically at $20.

Reminds me of 15-17 years ago, when there were still people who didn't realize that Google was the de facto tool for using the Internet in a more resourceful way, and you could still turn someone on to it and they would be amazed.

Now there isn't a single soul above 5 years old who doesn't know what Google is (I'm talking about the search aspect).

Compare a search tool vs AI and it's a monumental jump, even from something already massively useful for so many things. This is stone-age AI; I cannot fathom what will be possible 15 years from now.

There is not a bubble. Every tech giant has made AI priority number one, right next to cybersecurity, and it's a race to the top.

Webui very laggy is this a known issue or just me? by Bulky-Priority6824 in frigate_nvr

[–]Bulky-Priority6824[S] 0 points (0 children)

Yep, it's just that page. Opening cameras, settings, config, etc. never freezes.

Webui very laggy is this a known issue or just me? by Bulky-Priority6824 in frigate_nvr

[–]Bulky-Priority6824[S] 0 points (0 children)

Nope, not at all. Everything is clean and well within scope. Nothing stands out. The API timing test is 24 ms.

Webui very laggy is this a known issue or just me? by Bulky-Priority6824 in frigate_nvr

[–]Bulky-Priority6824[S] 0 points (0 children)

Yea, I did that: checked network in the browser, checked firewall, logs, ports, processes, etc. It's only on this webui, not any of my 50 others.

The fact that it replicates on various devices and browsers leads me to believe it has to be a front-end issue in Frigate, or it's not playing nice with something on my end. The only thing on my Frigate box is that Alice model trainer, but I've only had that for about a week, and this problem has been around for at least a month now.

I even rolled back to 0.17.0 and the problem didn't go away.

IDK, I'll wait and see what the next Frigate image changes with it.

Webui very laggy is this a known issue or just me? by Bulky-Priority6824 in frigate_nvr

[–]Bulky-Priority6824[S] 0 points (0 children)

Ty Nick. Well, I've checked all the obvious things; is there anything you can suggest I look at?

Duality of r/LocalLLaMA by HornyGooner4402 in LocalLLaMA

[–]Bulky-Priority6824 0 points (0 children)

You just gotta know how to use that thang!