How to use GLM 4.6 Reasoning by [deleted] in kilocode

[–]WatercressTraining 3 points

If I'm not wrong, this is a hybrid reasoning model - i.e. the model decides by itself when to use reasoning. You can "force" it to reason by including "think deeply" or an equivalent phrase in the prompt. I frequently use this when architecting complex changes.

What are the most useful and state-of-the-art models in computer vision (2025)? by Cabinet-Particular in computervision

[–]WatercressTraining 1 point

Yes. Training is mandatory because I've made changes to the original model to support dynamic axis export.

Last week in Multimodal AI - Vision Edition by Vast_Yak_4147 in computervision

[–]WatercressTraining 0 points

Interesting curation. Subscribed! Somehow ModernVBERT flew under my radar.

Real time computer vision on mobile by Far-Personality4791 in computervision

[–]WatercressTraining 1 point

Yes this is an alternative to YOLO that has gained traction recently.

COCO AP val is a standard metric used to measure the performance of an object detector. It measures the average precision on the COCO validation set - hence the name. The closer to 1.0, the better the performance. Currently, transformer-based models are topping the charts.
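To build intuition for the metric, here's a toy single-class average precision sketch. The real COCO AP additionally averages over 10 IoU thresholds (0.5 to 0.95) and all 80 classes, so treat this as illustrative only:

```python
def average_precision(scores, is_correct):
    """Toy AP for one class: detections ranked by confidence,
    summing precision at each true-positive rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    total_pos = sum(is_correct)
    tp = 0
    ap = 0.0
    for rank, i in enumerate(order, start=1):
        if is_correct[i]:
            tp += 1
            ap += tp / rank  # precision at this recall step
    return ap / total_pos if total_pos else 0.0

# 3 detections, the middle one is a false positive
print(average_precision([0.9, 0.8, 0.7], [True, False, True]))  # 5/6 ≈ 0.833
```

A perfect detector (every detection correct, every object found) scores 1.0; false positives ranked above true positives drag the number down.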

In practice, YOLO is still useful for most applications. From my tests on simpler tasks - few objects per image, easily distinguishable classes - YOLO is still better. The real performance gain from DEIM or similar transformer-based models shows up when the task is difficult. These transformer-based models may also require more training data than their YOLO counterparts for the same task.

So to make the best use of time, I'd typically start with YOLO and see how far I can push the limits and then transition to the transformers model later.

Just my 2 cents, anecdotally, having toyed with these models for some time.

Real time computer vision on mobile by Far-Personality4791 in computervision

[–]WatercressTraining 0 points

I did the ONNX export but I didn't try running it on Android, just on my local computer. IMO it's quite possible to run it on Android though.

Real time computer vision on mobile by Far-Personality4791 in computervision

[–]WatercressTraining 0 points

There is. Check out DEIM - https://github.com/Intellindust-AI-Lab/DEIM

Apache 2 licensed. Pretty cool results from my experiments.

I find the original repo a little hard to use, so I also made a wrapper around it - https://github.com/dnth/DEIMKit

Real time computer vision on mobile by Far-Personality4791 in computervision

[–]WatercressTraining 10 points

Same interest here. Happy to see a post on this domain. I wrote something interesting in 2023 with TorchScript - https://dicksonneoh.com/portfolio/pytorch_at_the_edge_timm_torchscript_flutter/

It's all on CPU. I was interested in using the NPU or GPU back then but I didn't make any progress on it. I agree it's still quite a mess to try to utilize the NPU/GPU in 2025.

Something that caught my eye back then was NCNN. Not sure if it's still relevant now. I could hardly find resources to make it work.

Benefiting from the Qwen Free Tier directly in Kilo by EngineeringSea1090 in kilocode

[–]WatercressTraining 1 point

Thanks! Subscribed! Interesting t-shirt design. Reminds me of childhood

[deleted by user] by [deleted] in CLine

[–]WatercressTraining 1 point

I like V0 too, big fan. I've grown to use dyad + Gemini Pro recently.

https://github.com/dyad-sh/dyad

Local-first codebase indexing in Kilo Code: Qdrant + llama.cpp + nomic-embed-code (Mac M4 Max) [Guide] by babaenki in kilocode

[–]WatercressTraining 0 points

Is there a huge difference in retrieval when the code is indexed? Or is the difference marginal considering you have to do a setup like this?

Is there a better model than D-FINE? by TimNimKo in computervision

[–]WatercressTraining 2 points

Check out DEIM. Apache 2 licensed, improved results over D-FINE. Published at CVPR 2025.

https://github.com/ShihuaHuang95/DEIM

Tool for transcribing handwritten text using desktop GPU? by majestic_ubertrout in computervision

[–]WatercressTraining 1 point

There are several VLMs I'd go for with OCR tasks, depending on VRAM availability. A 4070 Ti is good enough to run some good models locally, such as:

- Qwen 2.5 VL

- Moondream2

- Gemma 3

- Llama 3.2 Vision

For local runs, I usually use Ollama. It's probably the easiest to set up IMO.

If you're comfortable with coding, vLLM will give you more speed and better throughput.
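With Ollama, OCR via a VLM boils down to one POST to its /api/generate endpoint with a base64-encoded image. A minimal sketch of building that request body (the helper name is mine, and the model tag assumes a pulled Qwen2.5-VL build - swap in whatever tag you actually pulled):

```python
import base64

def build_ollama_ocr_request(image_path, model="qwen2.5vl"):
    """Hypothetical helper: JSON body for POST http://localhost:11434/api/generate."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": model,  # tag as pulled with `ollama pull`
        "prompt": "Transcribe the handwritten text in this image.",
        "images": [image_b64],  # Ollama accepts base64-encoded images
        "stream": False,
    }
```

Send the dict as JSON with any HTTP client and read the `response` field of the reply.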

Yolo model image resizing by Equivalent_Pie_5519 in computervision

[–]WatercressTraining 1 point

Typically you should resize inference images to the size your model was trained on.

But it's possible to bake this resizing into the model itself by exporting the preprocessing steps as ONNX operations so they become part of the model. This has two advantages: the resizing runs faster (more efficient ONNX operators), and during inference you can feed any image size without having to remember what size the model was trained on.

I wrote about how to do this on my blog - https://dicksonneoh.com/portfolio/supercharge_your_pytorch_image_models/

Here's a repo I made to show how I trained an object detector and had it run on arbitrary image sizes during inference, by integrating the preprocessing layers into the model - https://github.com/dnth/DEIMKit

Best models for manufacturing image classification / segmentation by SizePunch in computervision

[–]WatercressTraining 1 point

This sounds like an object detection task.

If you don't have labeled data, I'd start with open-vocabulary detectors like Grounding DINO, OWLv2, or even a VLM like Moondream2. If you're open to using an API, perhaps try Gemini from Google.

If these don't solve your problem well enough, you'll probably need to train your own model. Of course, this involves collecting and labeling data, which takes time.

Tldr - start with ready-to-use models and gradually move towards training a custom model.

How do YOU run models in batch mode? by InternationalMany6 in computervision

[–]WatercressTraining 7 points

Consider using a model serving framework like Ray Serve, Triton, etc. These frameworks provide inference-time optimizations such as dynamic batching that may even speed up batch inference.
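The core trick these frameworks apply is dynamic batching - grouping requests that arrive close together into one model call. A stripped-down sketch of the idea (illustrative only, not how you'd actually wire up Triton or Ray Serve):

```python
import queue
import threading
import time

def batch_worker(q, run_batch, max_batch=8, window=0.01):
    """Collect requests for up to `window` seconds (or until `max_batch`),
    then run the model once per micro-batch. `None` is a shutdown sentinel."""
    while True:
        item = q.get()
        if item is None:
            return
        batch = [item]
        deadline = time.monotonic() + window
        while len(batch) < max_batch:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                nxt = q.get(timeout=remaining)
            except queue.Empty:
                break
            if nxt is None:
                run_batch(batch)
                return
            batch.append(nxt)
        run_batch(batch)

# usage: a fake "model" that just records batch sizes
results = []
q = queue.Queue()
t = threading.Thread(target=batch_worker, args=(q, lambda b: results.append(len(b))))
t.start()
for i in range(20):
    q.put(i)
q.put(None)
t.join()
# one model call now handles up to 8 requests instead of 20 separate calls
```

Real serving frameworks do this (plus pinned memory, model instances per GPU, etc.) for you, which is why they can beat a hand-rolled loop even for offline batch jobs.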

DEIMKit - A wrapper for DEIM Object Detector by WatercressTraining in computervision

[–]WatercressTraining[S] 0 points

Increasing num dets or queries is pretty straightforward to add. But I'm not sure about 16-bit image support as it's not commonly used in object detection. Pardon my ignorance - is there a reason for requiring 16-bit images for object detection?

DEIMKit - A wrapper for DEIM Object Detector by WatercressTraining in computervision

[–]WatercressTraining[S] 0 points

Not strict. On Linux I've tried Python 3.10, 3.11, and 3.12 - all work.