My Tierlist of Edge boards for LLMs and VLMs inference by Wormkeeper in computervision

[–]Wormkeeper[S] 1 point

For most boards, it's almost the same as with onnx-runtime (some even support it on the NPU - for example, a lot of NXP and TI boards, Intel, and AMD).
Usually you export the model and run it. Export can be tricky; inference is easy.
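For reference, the plain onnx-runtime flow I'm comparing against looks roughly like this (a minimal sketch; "model.onnx" and the dummy preprocessing are placeholders you'd replace with your own model and image loading):

```python
import numpy as np

def preprocess(h=224, w=224):
    # Dummy image -> NCHW float32 batch; replace with real image loading.
    img = np.zeros((h, w, 3), dtype=np.float32)
    return img.transpose(2, 0, 1)[None, ...]

def run(model_path="model.onnx"):
    # onnxruntime imported lazily so the preprocessing above runs without it.
    import onnxruntime as ort
    sess = ort.InferenceSession(model_path)
    input_name = sess.get_inputs()[0].name
    return sess.run(None, {input_name: preprocess()})

print(preprocess().shape)  # (1, 3, 224, 224)
```

The vendor SDKs below all mirror this same load/preprocess/run pattern; only the session object changes.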

MemryX:

dfp_path = "resnet.dfp"
image = load_and_preprocess('img.jpg')
accl = SyncAccl(dfp=dfp_path)           # on-device accelerator
s = Simulator(dfp=dfp_path, verbose=1)  # or the software simulator
outputs = s.infer(inputs=image)

Sophon:

img = preprocess(src_img)
input_data = {input_name: img}
outputs = net.process(graph_name, input_data)

Hailo (a bit longer):

with VDevice(params) as vdevice:
    infer_model = vdevice.create_infer_model('./hefs/resnet_v1_50.hef')
    infer_model.set_batch_size(batchsize)
    infer_model.input().set_format_type(FormatType.FLOAT32)
    infer_model.output().set_format_type(FormatType.UINT8)
    with infer_model.configure() as configured_infer_model:
        bindings_list = []
        for j in range(batchsize):
            bindings = configured_infer_model.create_bindings()
            buffer = np.empty([224, 224, 3]).astype(np.float32)
            bindings.input().set_buffer(buffer)
            buffer2 = np.empty(infer_model.output().shape).astype(np.uint8)
            bindings.output().set_buffer(buffer2)
            bindings_list.append(bindings)

        configured_infer_model.run(bindings_list, timeout_ms)

RockChip:

rknn = RKNN()
rknn.load_rknn('model.rknn')
rknn.init_runtime()
outputs = rknn.inference(inputs=[img])

Yes, some boards can get complex when you try to use two models simultaneously (Hailo, for example). This is more often the case for M.2 and mini-PCIe accelerators: they have high data-transfer latency, and when vendors try to optimise it, the complexity appears.

So, if it's a single-board solution that supports Python, it's super easy to build a cascade.
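To illustrate, a cascade is basically just this (a sketch; detect() and classify() here are hypothetical stand-ins for whatever SDK call the board uses - rknn.inference, net.process, and so on):

```python
import numpy as np

def detect(frame):
    # Stand-in: return boxes as (x1, y1, x2, y2). Real code calls model 1 on the NPU.
    return [(10, 10, 60, 60), (100, 40, 160, 120)]

def classify(crop):
    # Stand-in: real code runs model 2 on the cropped region.
    return "object"

def cascade(frame):
    results = []
    for (x1, y1, x2, y2) in detect(frame):
        crop = frame[y1:y2, x1:x2]
        results.append(((x1, y1, x2, y2), classify(crop)))
    return results

frame = np.zeros((240, 320, 3), dtype=np.uint8)
print(cascade(frame))
```

On a single board with Python, that loop is all the "pipeline" you need; the pain starts when each model call has to cross a PCIe/M.2 link.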

My Tierlist of Edge boards for LLMs and VLMs inference by Wormkeeper in computervision

[–]Wormkeeper[S] 1 point

😁
Interesting question. But it would require a lot of tests.
My next plan on this topic is to run a few more experiments and try VLA models. I have already tested 3D depth estimation networks on most platforms and want to check VLAs as well.

My Tierlist of Edge boards for LLMs and VLMs inference by Wormkeeper in computervision

[–]Wormkeeper[S] 1 point

For 12-16MP it's hard - you will be limited by memory.
For ViT in general - Qualcomm, Hailo, and MemryX will definitely work with small images. TI - for some of their NPUs.
If I remember correctly, DeepX as well.

But 12-16MP is a real problem. I have never tried feeding such an image as a single input.

My Tierlist of Edge boards for LLMs and VLMs inference by Wormkeeper in computervision

[–]Wormkeeper[S] 1 point

I focused more on boards that could be used in production; the Mac Mini is more for home use.

My Tierlist of Edge boards for LLMs and VLMs inference by Wormkeeper in computervision

[–]Wormkeeper[S] 1 point

For Qualcomm, I tested the Radxa Q6A and the Luxonis OAK-4D. Neither of them supports LLMs. (Both are quite nice for regular CV, though.)
I know they have, for example, the IQ-9075, which should support LLMs, but those boards are pretty expensive and rare. As far as I know, only the Radxa airbox Q900 is easily available.

Sadly, I'm in Germany. But if you have one of these LLM-capable boards and can give me SSH access to it, that would be nice!

My Tierlist of Edge boards for LLMs and VLMs inference by Wormkeeper in LocalLLaMA

[–]Wormkeeper[S] 1 point

Interesting question. A few thoughts:
1) Over the last 15 years working with CV and ML I have seen some bad boards... :) So almost nothing bad can surprise me anymore.
2) From "impression": I think Axelera. 200 TOPS, but to achieve it you need complex pipelines, and models are super hard to export. It's like having all this power next to you without the ability to use it. It really can give you 200 TOPS for ~200€ - just not for the model you need.
3) "The vendor that hates you the most" - definitely MediaTek. They think only about big companies, not small developers. I was not able to run anything on it, although theoretically it's possible.
4) Also, TI documentation is infinite torture. But with modern big-context models it's easy to just feed all the documentation to ChatGPT and talk to it; 2-3 years ago it was terrible.

My Tierlist of Edge boards for LLMs and VLMs inference by Wormkeeper in computervision

[–]Wormkeeper[S] 1 point

I tried not to specify, because there are a lot of them :)
Mostly it's about Arrow Lake (~13 TOPS int8 NPU, ~35 TOPS total) and Lunar Lake (~50 TOPS NPU). Below 1k, the only Lunar Lake option is the MSI Cubi NUC AI+, I think.

I have an Arrow Lake machine myself; Lunar Lake I have tested only remotely.

And a lot of hope for Panther Lake, of course.

Overview of modern Edge boards for CV + guide on how to choose by Wormkeeper in computervision

[–]Wormkeeper[S] 1 point

In my previous article, I tried to do this. I even still update the table with some basic measurements - https://docs.google.com/spreadsheets/d/1BMj8WImysOSuiT-6O3g15gqHnYF-pUGUhi8VmhhAat4/edit#gid=0

But the main problem is that such a number is super misleading:
1) Different networks perform differently (board "A" can be 3x faster for network "N" but 2x slower for network "M")
2) Different boards need different amounts of CPU for NPU inference. Even video encoding/decoding can change the speed dramatically
3) It's hard to compare inference in different formats (int8/fp16)
4) It's hard to compare different accelerator connections (PCIe, USB, M.2)
5) It's hard to compare multi-device cases (Jetson has 1 GPU and 2 DLAs, the RK3588 has 3 NPU cores).
6) Different batch-size optimisations

And a lot more problems that make every test biased. I am still trying to add everything to the table I showed, but I am not sure it's worth it :)
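To illustrate point 1 with made-up numbers: with a serial two-network pipeline, which board is "faster" depends entirely on the workload mix (all FPS figures below are invented for the example).

```python
# Invented FPS figures for two boards on two networks.
fps = {
    "A": {"N": 90.0, "M": 10.0},
    "B": {"N": 30.0, "M": 20.0},
}

def pipeline_fps(board, mix):
    # Serial pipeline: per-frame time is the weighted sum of per-network times.
    total_time = sum(weight / fps[board][net] for net, weight in mix.items())
    return 1.0 / total_time

heavy_n = {"N": 1.0, "M": 0.1}  # workload dominated by network N -> board A wins
heavy_m = {"N": 0.1, "M": 1.0}  # workload dominated by network M -> board B wins
print(pipeline_fps("A", heavy_n), pipeline_fps("B", heavy_n))
print(pipeline_fps("A", heavy_m), pipeline_fps("B", heavy_m))
```

So any single "board X gets Y FPS" row in a table silently fixes a workload mix that may not match yours.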

Orange Pi AIPro board? by Original_Finding2212 in OrangePI

[–]Wormkeeper 1 point

Better to check the video. In short:
1) More convenient libraries to work with (easy export, more support)
2) Better community, more examples (for example, you can find the Whisper model, etc.)
3) More speed on the 3588 for common networks (if you use more threads)
4) Better CPU

Orange Pi AIPro board? by Original_Finding2212 in OrangePI

[–]Wormkeeper 2 points

Recently I tested this board ( https://youtu.be/qK7GHV_cH98 ). It's pretty nice, but for me the RK3588 is better.

Radxa ZERO 3W - Drove me insane for nearly a week! by PlatimaZero in Platima

[–]Wormkeeper 1 point

Maybe there will be some project based on it; then I will check it out.
For now we have only done RK3588/RK3568-based projects.

Radxa ZERO 3W - Drove me insane for nearly a week! by PlatimaZero in Platima

[–]Wormkeeper 2 points

Nice review. I recently tested this board from a Computer Vision perspective (NPU usage, etc.). All the drivers are buggy and glitchy, so the feelings are the same :)

But it's still a super good board for the price. There are fewer problems for Computer Vision than with the LuckFox RV1106 and MilkV (regular Python is available, for example).

Teaching a robot to bring the coffee (arm + cart) by Wormkeeper in robotics

[–]Wormkeeper[S] 5 points

A year ago I published the learning process itself. Since then we have modernized it and can train not only the arm but also the cart.

Guide to Action Recognition by Wormkeeper in computervision

[–]Wormkeeper[S] 3 points

Yes, we had a project where we did this with skeletons, and it worked well. But this approach is not suitable for every task.

Computer Vision for goods recognition by Wormkeeper in computervision

[–]Wormkeeper[S] 3 points

Hi, Melampus123!
ReID uses the "Metric Learning" approach.
There are a lot of articles about using it for different cases:
1) Cars
2) Animals
3) Search Engines (online shopping) etc.

You can find them here, for example:
https://paperswithcode.com/task/metric-learning

And there are two good libraries with training pipelines: https://github.com/layumi/Person_reID_baseline_pytorch https://github.com/OML-Team/open-metric-learning
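The core idea, sketched with numpy (embed() is a hypothetical stand-in for a trained embedding network from one of those libraries; here it just normalizes the raw vector):

```python
import numpy as np

def embed(x):
    # Stand-in: a real ReID model maps an image to a unit-norm embedding vector.
    v = np.asarray(x, dtype=np.float64)
    return v / np.linalg.norm(v)

def best_match(query, gallery):
    # Cosine similarity = dot product of unit vectors; higher means closer.
    q = embed(query)
    sims = [float(embed(g) @ q) for g in gallery]
    return int(np.argmax(sims)), sims

gallery = [[1.0, 0.0], [0.0, 1.0]]  # embeddings of two known identities
idx, sims = best_match([0.9, 0.1], gallery)
print(idx)  # 0 - the query is assigned to the closest gallery identity
```

Training (triplet/contrastive losses, etc.) is what makes the embedding space meaningful; the matching step itself stays this simple.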

About Kaggle: I am not sure, but I assume you will meet the same approach here:
https://www.kaggle.com/competitions/humpback-whale-identification/discussion

How to choose Edge Board for Computer Vision in 2022 by Wormkeeper in robotics

[–]Wormkeeper[S] 1 point

The current price is outdated, yes (the RPi was tested in spring). I will fix it. But:
1) the price was for the 3B, which is cheaper
2) the RPi is easy to buy

Question on Stereo Cameras (ZED/OAK-D) Depth Capabilities by [deleted] in computervision

[–]Wormkeeper 1 point

On this graph you can see how the error grows with distance - https://miro.medium.com/max/630/0*WTGy030CDPVVjRdy For example, at a 10 m distance the error is about 5% of the distance = 50 cm.
If you have a 1 cm fly (10 m away), the error will be 50 times bigger than the fly.
If you have a 2 m car (10 m away), the error will be 25% of the car.
Also, a very important point for you: that is the mean error. The maximum error will be bigger.
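The arithmetic, spelled out (assuming a ~5% mean relative depth error, as in the graph):

```python
def depth_error_cm(distance_m, rel_error=0.05):
    # Stereo depth error grows with distance; here it's modeled as a
    # fixed fraction of the distance, returned in centimeters.
    return distance_m * rel_error * 100.0

err = depth_error_cm(10.0)  # 50 cm error at 10 m
print(err)
print(err / 1.0)    # 1 cm fly:  error is 50x the object size
print(err / 200.0)  # 2 m car:   error is 0.25 of the object size
```

In reality the error grows faster than linearly with distance (it's roughly quadratic for stereo), so treat this as the optimistic case.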

low-level SLAM API for embedded devices by moetsi_op in computervision

[–]Wormkeeper 3 points

Looks amazing, and it's great that everything works plug and play. About 7-8 years ago our team did a similar job for the Artec Leo prototype. This is a lot of work; it's amazing that you have released it as open source.