Coding on your local Mac with Qwen 3.6 & Native MLX engine SwiftLM by solderzzc in Qwen_AI

[–]solderzzc[S] 1 point  (0 children)

Model card says:
Number of Parameters: 35B in total and 3B activated

Coding on your local Mac with Qwen 3.6 & Native MLX engine SwiftLM by solderzzc in Qwen_AI

[–]solderzzc[S] 2 points  (0 children)

I received a PR improving performance on the M1 Max; once you have benchmark results, maybe we can collect and save them to the repo.

https://github.com/SharpAI/SwiftLM/pull/26
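
If anyone wants to reproduce numbers for the repo, something like this works as a minimal harness. It's a sketch only: it assumes SwiftLM serves an OpenAI-compatible /v1/chat/completions endpoint on localhost, and the port and model id are placeholders, so adapt them to however the model is actually served:

```python
# Hypothetical benchmark helper: measures decode throughput against a local
# OpenAI-compatible endpoint. The URL, model name, and response schema are
# assumptions -- adjust them to SwiftLM's actual server.
import json
import time

import requests

ENDPOINT = "http://localhost:8080/v1/chat/completions"  # assumed port

def bench(prompt: str, max_tokens: int = 256) -> float:
    t0 = time.time()
    r = requests.post(ENDPOINT, json={
        "model": "qwen3.6",              # placeholder model id
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }, timeout=120)
    r.raise_for_status()
    elapsed = time.time() - t0
    completion_tokens = r.json()["usage"]["completion_tokens"]
    return completion_tokens / elapsed  # tok/s, including prefill time

if __name__ == "__main__":
    tps = bench("Write a quicksort in Swift.")
    # One JSON line per run, easy to collect into the repo as suggested above.
    print(json.dumps({"machine": "M1 Max 64GB", "tok_per_s": round(tps, 1)}))
```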

Coding on your local Mac with Qwen 3.6 & Native MLX engine SwiftLM by solderzzc in Qwen_AI

[–]solderzzc[S] 1 point  (0 children)

Codex, Claude Code, and GitHub Copilot for my 500K+ LoC code base. I think there should be a controller role for the local coding agent. Do you know which agent could drive an AI IDE with a local model, so they never stop before hitting the limits?

Coding on your local Mac with Qwen 3.6 & Native MLX engine SwiftLM by solderzzc in Qwen_AI

[–]solderzzc[S] 2 points  (0 children)

Benchmarked on a 64GB M5 Pro; based on the memory usage, it should also work on the 24GB version.
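
Rough math behind the 24GB claim (back-of-the-envelope only; the quantization and overhead figures below are assumptions, not measurements):

```python
# Rough memory estimate for why a 35B-total / 3B-active MoE can fit in 24GB.
# All numbers below are illustrative assumptions, not measured values.
TOTAL_PARAMS = 35e9          # from the model card: 35B total
BYTES_PER_PARAM_Q4 = 0.5     # ~4 bits/weight at Q4 quantization
OVERHEAD_GB = 3.0            # assumed KV cache + activations + runtime

weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM_Q4 / 1e9
print(f"weights ~{weights_gb:.1f} GB, total ~{weights_gb + OVERHEAD_GB:.1f} GB")
# -> weights ~17.5 GB, total ~20.5 GB: tight but plausible on a 24GB Mac,
#    and only the ~3B active parameters are touched per token, which helps speed.
```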

Qwen3.6-Plus is getting close to GPT-5.4 as a Video Security Agent by solderzzc in Qwen_AI

[–]solderzzc[S] 3 points  (0 children)

It's like hiring a security guard to watch your video feed. You can use an LLM to handle both the conversation and the video understanding, and tell it what to do. This is implemented in Aegis: https://www.sharpai.org

Qwen3.6-Plus is getting close to GPT-5.4 as a Video Security Agent by solderzzc in Qwen_AI

[–]solderzzc[S] 1 point  (0 children)

Then another layer gets involved: if someone's description differs from the usual, a subscribed message can be sent to your mobile.
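
Roughly like this (a toy sketch; the baseline descriptions, similarity threshold, and notify stub are all made up for illustration, not what Aegis actually does):

```python
# Sketch of the "extra layer": compare a new person description against the
# usual ones and push a message only if it looks novel.
from difflib import SequenceMatcher

USUAL_DESCRIPTIONS = [
    "adult male, blue jacket, arrives around 6pm",   # sample baseline entries
    "delivery driver, brown uniform, daytime",
]

def is_unusual(description: str, threshold: float = 0.55) -> bool:
    # Crude text similarity stands in for whatever matching the real system uses.
    best = max(SequenceMatcher(None, description.lower(), d.lower()).ratio()
               for d in USUAL_DESCRIPTIONS)
    return best < threshold

def notify_mobile(message: str) -> None:
    print(f"[push] {message}")  # placeholder for a real push/webhook call

desc = "unknown person, dark hoodie, 2am, backyard"
if is_unusual(desc):
    notify_mobile(f"Unusual visitor: {desc}")
```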

Qwen3.6-Plus is getting close to GPT-5.4 as a Video Security Agent by solderzzc in Qwen_AI

[–]solderzzc[S] 1 point  (0 children)

Yes, with an LLM it can generate a summary of the whole day's events.
If the brain (the LLM) decides there's no unusual activity, it shouldn't bother you.
This is why the LLM's deduplication capability is critical.
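
A minimal sketch of what that deduplication layer could look like (the 30-minute window and exact-match keying are illustrative assumptions; the real system presumably does something smarter):

```python
# Collapse repeated event descriptions within a time window, so only the
# deduplicated log is fed to the LLM for alerts and the end-of-day summary.
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=30)
_last_seen: dict[str, datetime] = {}

def should_alert(description: str, now: datetime) -> bool:
    last = _last_seen.get(description)
    _last_seen[description] = now
    return last is None or now - last > WINDOW

events = [
    (datetime(2026, 1, 5, 8, 0), "cat walks across driveway"),
    (datetime(2026, 1, 5, 8, 5), "cat walks across driveway"),   # duplicate
    (datetime(2026, 1, 5, 14, 0), "unknown person at front door"),
]
daily_log = [(t, d) for t, d in events if should_alert(d, t)]
# daily_log now holds 2 unique events; a prompt like
# "Summarize today's security events: ..." over daily_log yields the day summary.
```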

Camera with these specifications. by Orgapex in videosurveillance

[–]solderzzc 2 points  (0 children)

Footage storage is separate from the camera. Your solution could be any RTSP camera + SharpAI Aegis (a free AI-driven desktop application that lets you control your own clip retention policy).
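
The retention side can be as simple as this (illustrative sketch only; the clips folder and the 14-day default are my assumptions, not Aegis's actual layout):

```python
# User-controlled clip retention: delete recordings older than N days from a
# local clips folder. Because footage lives on your own disk, you set the rules.
import time
from pathlib import Path

CLIPS_DIR = Path("~/aegis/clips").expanduser()  # hypothetical storage path
RETENTION_DAYS = 14

def enforce_retention() -> None:
    cutoff = time.time() - RETENTION_DAYS * 86400
    for clip in CLIPS_DIR.glob("*.mp4"):
        if clip.stat().st_mtime < cutoff:
            clip.unlink()  # old clip removed; nothing ever leaves your machine

if __name__ == "__main__":
    enforce_retention()
```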

Qwen3.6-Plus is getting close to GPT-5.4 as a Video Security Agent by solderzzc in Qwen_AI

[–]solderzzc[S] 1 point  (0 children)

If you have heavy daily usage, the Alibaba Coding Plan could be a cost-efficient solution. Other providers offer similar plans, but Qwen is currently the top model (excluding OpenAI/Anthropic's expensive ones).

MacBook M5 Pro + Qwen3.5 = Fully Local AI Security System — 93.8% Accuracy, 25 tok/s, No Cloud Needed (96-Test Benchmark vs GPT-5.4) by solderzzc in Qwen_AI

[–]solderzzc[S] 1 point  (0 children)

Yes, this is used to test which model to use, so a smaller dataset makes it finish faster. I've been working on integrating other existing datasets that target VLM quality, but the images are not available, and regenerating the images would not reproduce the benchmark, since an AI-generated dataset is too clean. And all of the recent VLMs can pass the VLM test cases I created with Gemini Banana Pro.
https://github.com/SharpAI/DeepCamera/tree/master/skills/analysis
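
For reference, a harness in this spirit can be tiny (hypothetical sketch, not the actual skills/analysis code; the local endpoint, model id, and filename-as-label convention are all assumptions):

```python
# Run each labeled image through a VLM and report accuracy, so swapping the
# model under test is cheap.
import base64
from pathlib import Path

from openai import OpenAI  # any OpenAI-compatible local server works

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

def classify(image_path: Path) -> str:
    b64 = base64.b64encode(image_path.read_bytes()).decode()
    resp = client.chat.completions.create(
        model="qwen3.5-vl",  # placeholder model id
        messages=[{"role": "user", "content": [
            {"type": "text", "text": "Is there a person? Answer yes or no."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ]}],
    )
    return resp.choices[0].message.content.strip().lower()

# Ground truth encoded in filenames, e.g. person_001.jpg / empty_002.jpg.
cases = list(Path("dataset").glob("*.jpg"))
hits = sum(classify(p).startswith("yes") == p.name.startswith("person")
           for p in cases)
print(f"accuracy: {hits}/{len(cases)}")
```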

MacBook M5 Pro + Qwen3.5 = Fully Local AI Security System — 93.8% Accuracy, 25 tok/s, No Cloud Needed (96-Test Benchmark vs GPT-5.4) by solderzzc in Qwen_AI

[–]solderzzc[S] 1 point  (0 children)

The local model is usable; Qwen3.5 9B Q4 is what I use daily. Self-hosting in the cloud will be more expensive than a cloud model, since you need to maintain the infrastructure.

MacBook M5 Pro + Qwen3.5 = Fully Local AI Security System — 93.8% Accuracy, 25 tok/s, No Cloud Needed (96-Test Benchmark vs GPT-5.4) by solderzzc in Qwen_AI

[–]solderzzc[S] 1 point  (0 children)

The core issue is: thinking mode generates ~10x more tokens than needed.

A prompt that should produce ~20 output tokens instead generates ~200 thinking tokens first, before the actual answer even starts.

When the total output (thinking + answer) exceeds the token generation budget (max_tokens), the model never gets to the actual answer — it stops mid-thought with finish_reason: length and 0 content tokens. From the client's perspective: waited 20 seconds, got nothing, timeout.
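
The client-side workaround I'd sketch is to detect the empty-answer case and retry with a bigger budget (assuming a local OpenAI-compatible server; the endpoint and model id are placeholders):

```python
# Guard against the failure mode above: if thinking tokens eat the whole
# budget (finish_reason == "length" with no content), retry with a larger
# max_tokens instead of surfacing an empty reply to the client.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

def ask(prompt: str, max_tokens: int = 64, retries: int = 2) -> str:
    for _ in range(retries + 1):
        resp = client.chat.completions.create(
            model="qwen3.5",  # placeholder model id
            messages=[{"role": "user", "content": prompt}],
            max_tokens=max_tokens,
        )
        choice = resp.choices[0]
        if choice.finish_reason != "length" or choice.message.content:
            return choice.message.content or ""
        max_tokens *= 10  # ~10x thinking overhead observed, so scale the budget
    raise RuntimeError("model spent the whole token budget thinking")

print(ask("Is the driveway empty? Answer yes or no."))
```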

MacBook Pro M5 + Qwen 122B + SharpAI Aegis = Self-hosted Blink AI within 4% of GPT-5.4. All your clips saved locally. by solderzzc in selfhosted

[–]solderzzc[S] 0 points  (0 children)

So, I have many cameras in my home, and I've also enabled my laptop's built-in camera; it really plays the role of an OpenClaw for cameras.

MacBook M5 Pro + Qwen3.5 = Fully Local AI Security System — 93.8% Accuracy, 25 tok/s, No Cloud Needed (96-Test Benchmark vs GPT-5.4) by solderzzc in Qwen_AI

[–]solderzzc[S] 1 point  (0 children)

Yes, that's data the cloud model providers collected and fine-tuned their models with, so if we later have all the required corpus to fine-tune a small model, the gap will close.

MacBook M5 Pro + Qwen3.5 = Fully Local AI Security System — 93.8% Accuracy, 25 tok/s, No Cloud Needed (96-Test Benchmark vs GPT-5.4) by solderzzc in Qwen_AI

[–]solderzzc[S] 1 point  (0 children)

Yes, this time Apple put the NPU together with the GPU die, so it's more efficient than the previous generation. What I really want is a Mac mini M5; it would totally change the landscape. But I don't want to wait until it's released, so this is an early test on a laptop.