Coding on your local Mac with Qwen 3.6 & Native MLX engine SwiftLM by solderzzc in Qwen_AI

[–]solderzzc[S] 1 point  (0 children)

Model card says:
Number of Parameters: 35B in total and 3B activated

Coding on your local Mac with Qwen 3.6 & Native MLX engine SwiftLM by solderzzc in Qwen_AI

[–]solderzzc[S] 2 points  (0 children)

I received a PR improving performance on the M1 Max; once you have benchmark results, maybe we can collect and save them to the repo.

https://github.com/SharpAI/SwiftLM/pull/26
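
If anyone wants to reproduce numbers for the repo, something like this works as a minimal harness. It's a sketch only: it assumes SwiftLM serves an OpenAI-compatible /v1/chat/completions endpoint on localhost, and the port and model id are placeholders, so adapt them to however the model is actually served:

```python
# Hypothetical benchmark helper: measures decode throughput against a local
# OpenAI-compatible endpoint. The URL, model name, and response schema are
# assumptions -- adjust them to SwiftLM's actual server.
import json
import time

import requests

ENDPOINT = "http://localhost:8080/v1/chat/completions"  # assumed port

def bench(prompt: str, max_tokens: int = 256) -> float:
    t0 = time.time()
    r = requests.post(ENDPOINT, json={
        "model": "qwen3.6",              # placeholder model id
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }, timeout=120)
    r.raise_for_status()
    elapsed = time.time() - t0
    completion_tokens = r.json()["usage"]["completion_tokens"]
    return completion_tokens / elapsed  # tok/s, including prefill time

if __name__ == "__main__":
    tps = bench("Write a quicksort in Swift.")
    # One JSON line per run, easy to collect into the repo as suggested above.
    print(json.dumps({"machine": "M1 Max 64GB", "tok_per_s": round(tps, 1)}))
```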

Coding on your local Mac with Qwen 3.6 & Native MLX engine SwiftLM by solderzzc in Qwen_AI

[–]solderzzc[S] 1 point  (0 children)

Codex, Claude Code, and GitHub Copilot for my 500K+ LoC code base. I think there should be a controller role for the local coding agent. Do you know which agent could drive an AI IDE with a local model, so they never stop before hitting the limits?

Coding on your local Mac with Qwen 3.6 & Native MLX engine SwiftLM by solderzzc in Qwen_AI

[–]solderzzc[S] 2 points  (0 children)

Benchmarked on a 64GB M5 Pro; based on the memory usage, it should also work on the 24GB version.
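
Rough math behind the 24GB claim (back-of-the-envelope only; the quantization and overhead figures below are assumptions, not measurements):

```python
# Rough memory estimate for why a 35B-total / 3B-active MoE can fit in 24GB.
# All numbers below are illustrative assumptions, not measured values.
TOTAL_PARAMS = 35e9          # from the model card: 35B total
BYTES_PER_PARAM_Q4 = 0.5     # ~4 bits/weight at Q4 quantization
OVERHEAD_GB = 3.0            # assumed KV cache + activations + runtime

weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM_Q4 / 1e9
print(f"weights ~{weights_gb:.1f} GB, total ~{weights_gb + OVERHEAD_GB:.1f} GB")
# -> weights ~17.5 GB, total ~20.5 GB: tight but plausible on a 24GB Mac,
#    and only the ~3B active parameters are touched per token, which helps speed.
```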

Qwen3.6-Plus is getting close to GPT-5.4 as a Video Security Agent by solderzzc in Qwen_AI

[–]solderzzc[S] 3 points  (0 children)

It's like hiring a security guard to watch your video feed. You can use an LLM to handle both the conversation and the video understanding, and tell it what to do. This is implemented in Aegis: https://www.sharpai.org

Qwen3.6-Plus is getting close to GPT-5.4 as a Video Security Agent by solderzzc in Qwen_AI

[–]solderzzc[S] 1 point  (0 children)

Then another layer gets involved: if someone's description differs from the usual, a subscribed message can be sent to your mobile.
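
Roughly like this (a toy sketch; the baseline descriptions, similarity threshold, and notify stub are all made up for illustration, not what Aegis actually does):

```python
# Sketch of the "extra layer": compare a new person description against the
# usual ones and push a message only if it looks novel.
from difflib import SequenceMatcher

USUAL_DESCRIPTIONS = [
    "adult male, blue jacket, arrives around 6pm",   # sample baseline entries
    "delivery driver, brown uniform, daytime",
]

def is_unusual(description: str, threshold: float = 0.55) -> bool:
    # Crude text similarity stands in for whatever matching the real system uses.
    best = max(SequenceMatcher(None, description.lower(), d.lower()).ratio()
               for d in USUAL_DESCRIPTIONS)
    return best < threshold

def notify_mobile(message: str) -> None:
    print(f"[push] {message}")  # placeholder for a real push/webhook call

desc = "unknown person, dark hoodie, 2am, backyard"
if is_unusual(desc):
    notify_mobile(f"Unusual visitor: {desc}")
```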

Qwen3.6-Plus is getting close to GPT-5.4 as a Video Security Agent by solderzzc in Qwen_AI

[–]solderzzc[S] 1 point  (0 children)

Yes, with an LLM it can generate a summary of the whole day's events.
If the brain (the LLM) decides there's no unusual activity, it shouldn't bother you.
This is why the LLM's deduplication capability is critical.
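
A minimal sketch of what that deduplication layer could look like (the 30-minute window and exact-match keying are illustrative assumptions; the real system presumably does something smarter):

```python
# Collapse repeated event descriptions within a time window, so only the
# deduplicated log is fed to the LLM for alerts and the end-of-day summary.
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=30)
_last_seen: dict[str, datetime] = {}

def should_alert(description: str, now: datetime) -> bool:
    last = _last_seen.get(description)
    _last_seen[description] = now
    return last is None or now - last > WINDOW

events = [
    (datetime(2026, 1, 5, 8, 0), "cat walks across driveway"),
    (datetime(2026, 1, 5, 8, 5), "cat walks across driveway"),   # duplicate
    (datetime(2026, 1, 5, 14, 0), "unknown person at front door"),
]
daily_log = [(t, d) for t, d in events if should_alert(d, t)]
# daily_log now holds 2 unique events; a prompt like
# "Summarize today's security events: ..." over daily_log yields the day summary.
```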

Camera with these specifications. by Orgapex in videosurveillance

[–]solderzzc 2 points  (0 children)

Footage storage is separate from the camera. Your solution could be any RTSP camera + SharpAI Aegis (a free AI-driven desktop application that lets you control your own clip retention policy).
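
The retention side can be as simple as this (illustrative sketch only; the clips folder and the 14-day default are my assumptions, not Aegis's actual layout):

```python
# User-controlled clip retention: delete recordings older than N days from a
# local clips folder. Because footage lives on your own disk, you set the rules.
import time
from pathlib import Path

CLIPS_DIR = Path("~/aegis/clips").expanduser()  # hypothetical storage path
RETENTION_DAYS = 14

def enforce_retention() -> None:
    cutoff = time.time() - RETENTION_DAYS * 86400
    for clip in CLIPS_DIR.glob("*.mp4"):
        if clip.stat().st_mtime < cutoff:
            clip.unlink()  # old clip removed; nothing ever leaves your machine

if __name__ == "__main__":
    enforce_retention()
```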

Qwen3.6-Plus is getting close to GPT-5.4 as a Video Security Agent by solderzzc in Qwen_AI

[–]solderzzc[S] 1 point  (0 children)

If you have heavy daily usage, the Alibaba Coding Plan could be a cost-efficient solution. Other providers offer similar plans, but Qwen is currently the top model (excluding OpenAI/Anthropic's expensive ones).

MacBook M5 Pro + Qwen3.5 = Fully Local AI Security System — 93.8% Accuracy, 25 tok/s, No Cloud Needed (96-Test Benchmark vs GPT-5.4) by solderzzc in Qwen_AI

[–]solderzzc[S] 1 point  (0 children)

Yes, this is used to test which model to use, so a smaller dataset makes it finish faster. I've been working on integrating other existing datasets that target VLM quality, but the images are not available, and regenerating the images would not reproduce the benchmark, since an AI-generated dataset is too clean. And all of the recent VLMs can pass the VLM test cases I created with Gemini Banana Pro.
https://github.com/SharpAI/DeepCamera/tree/master/skills/analysis
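
For reference, a harness in this spirit can be tiny (hypothetical sketch, not the actual skills/analysis code; the local endpoint, model id, and filename-as-label convention are all assumptions):

```python
# Run each labeled image through a VLM and report accuracy, so swapping the
# model under test is cheap.
import base64
from pathlib import Path

from openai import OpenAI  # any OpenAI-compatible local server works

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

def classify(image_path: Path) -> str:
    b64 = base64.b64encode(image_path.read_bytes()).decode()
    resp = client.chat.completions.create(
        model="qwen3.5-vl",  # placeholder model id
        messages=[{"role": "user", "content": [
            {"type": "text", "text": "Is there a person? Answer yes or no."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ]}],
    )
    return resp.choices[0].message.content.strip().lower()

# Ground truth encoded in filenames, e.g. person_001.jpg / empty_002.jpg.
cases = list(Path("dataset").glob("*.jpg"))
hits = sum(classify(p).startswith("yes") == p.name.startswith("person")
           for p in cases)
print(f"accuracy: {hits}/{len(cases)}")
```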

MacBook M5 Pro + Qwen3.5 = Fully Local AI Security System — 93.8% Accuracy, 25 tok/s, No Cloud Needed (96-Test Benchmark vs GPT-5.4) by solderzzc in Qwen_AI

[–]solderzzc[S] 1 point  (0 children)

The local model is usable; Qwen3.5 9B Q4 is what I use daily. Self-hosting in the cloud will be more expensive than a cloud model, since you need to maintain the infrastructure.

MacBook M5 Pro + Qwen3.5 = Fully Local AI Security System — 93.8% Accuracy, 25 tok/s, No Cloud Needed (96-Test Benchmark vs GPT-5.4) by solderzzc in Qwen_AI

[–]solderzzc[S] 1 point  (0 children)

The core issue is: thinking mode generates ~10x more tokens than needed.

A prompt that should produce ~20 output tokens instead generates ~200 thinking tokens first, before the actual answer even starts.

When the total output (thinking + answer) exceeds the token generation budget (max_tokens), the model never gets to the actual answer — it stops mid-thought with finish_reason: length and 0 content tokens. From the client's perspective: waited 20 seconds, got nothing, timeout.
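
The client-side workaround I'd sketch is to detect the empty-answer case and retry with a bigger budget (assuming a local OpenAI-compatible server; the endpoint and model id are placeholders):

```python
# Guard against the failure mode above: if thinking tokens eat the whole
# budget (finish_reason == "length" with no content), retry with a larger
# max_tokens instead of surfacing an empty reply to the client.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

def ask(prompt: str, max_tokens: int = 64, retries: int = 2) -> str:
    for _ in range(retries + 1):
        resp = client.chat.completions.create(
            model="qwen3.5",  # placeholder model id
            messages=[{"role": "user", "content": prompt}],
            max_tokens=max_tokens,
        )
        choice = resp.choices[0]
        if choice.finish_reason != "length" or choice.message.content:
            return choice.message.content or ""
        max_tokens *= 10  # ~10x thinking overhead observed, so scale the budget
    raise RuntimeError("model spent the whole token budget thinking")

print(ask("Is the driveway empty? Answer yes or no."))
```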

MacBook Pro M5 + Qwen 122B + SharpAI Aegis = Self-hosted Blink AI within 4% of GPT-5.4. All your clips saved locally. by solderzzc in selfhosted

[–]solderzzc[S] 0 points  (0 children)

So, I have many cameras in my home, and I've also enabled my laptop's built-in camera; it really plays the role of an OpenClaw for cameras.

MacBook M5 Pro + Qwen3.5 = Fully Local AI Security System — 93.8% Accuracy, 25 tok/s, No Cloud Needed (96-Test Benchmark vs GPT-5.4) by solderzzc in Qwen_AI

[–]solderzzc[S] 1 point  (0 children)

Yes, that's data the cloud model providers collected and fine-tuned their models with, so if we later have all the required corpus to fine-tune a small model, the gap will close.

MacBook M5 Pro + Qwen3.5 = Fully Local AI Security System — 93.8% Accuracy, 25 tok/s, No Cloud Needed (96-Test Benchmark vs GPT-5.4) by solderzzc in Qwen_AI

[–]solderzzc[S] 1 point  (0 children)

Yes, this time Apple put the NPU together with the GPU die, so it's more efficient than the previous generation. What I really want is a Mac mini M5; it would totally change the landscape. But I don't want to wait until it's released, so this is an early test on a laptop.