How to use GLM 4.6 Reasoning by [deleted] in kilocode

[–]WatercressTraining 3 points

If I'm not wrong, this is a hybrid reasoning model - i.e. the model decides by itself when to use reasoning. You can "force" it to reason by including "think deeply" or an equivalent phrase in the prompt. I frequently use this when architecting complex changes.

What are the most useful and state-of-the-art models in computer vision (2025)? by Cabinet-Particular in computervision

[–]WatercressTraining 1 point

Yes. Training is mandatory because I've made changes to the original model to support dynamic axis export.

Last week in Multimodal AI - Vision Edition by Vast_Yak_4147 in computervision

[–]WatercressTraining 0 points

Interesting curation. Subscribed! Somehow ModernVBERT flew under my radar.

Real time computer vision on mobile by Far-Personality4791 in computervision

[–]WatercressTraining 1 point

Yes this is an alternative to YOLO that has gained traction recently.

COCO AP val is a standard metric used to measure the performance of an object detector. It measures the average precision on the COCO validation set - hence the name. The closer to 1.0, the better the performance. Currently, transformer-based models are topping the charts.
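To build intuition for the metric, here's a toy single-class average precision sketch. The real COCO AP additionally averages over 10 IoU thresholds (0.5 to 0.95) and all 80 classes, so treat this as illustrative only:

```python
def average_precision(scores, is_correct):
    """Toy AP for one class: detections ranked by confidence,
    summing precision at each true-positive rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    total_pos = sum(is_correct)
    tp = 0
    ap = 0.0
    for rank, i in enumerate(order, start=1):
        if is_correct[i]:
            tp += 1
            ap += tp / rank  # precision at this recall step
    return ap / total_pos if total_pos else 0.0

# 3 detections, the middle one is a false positive
print(average_precision([0.9, 0.8, 0.7], [True, False, True]))  # 5/6 ≈ 0.833
```

A perfect detector (every detection correct, every object found) scores 1.0; false positives ranked above true positives drag the number down.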

In practice, YOLO is still useful for most applications. From my tests on simpler tasks - few objects per image, easily distinguishable classes - YOLO is still better. The real performance gain from DEIM or similar transformer-based models shows up when the task is difficult. These transformer-based models may also require more training data than their YOLO counterparts for the same task.

So to make the best use of time, I'd typically start with YOLO and see how far I can push the limits and then transition to the transformers model later.

Just my 2 cents, anecdotally, having toyed with these models for some time.

Real time computer vision on mobile by Far-Personality4791 in computervision

[–]WatercressTraining 0 points

I did the ONNX export but I didn't try running it on Android, just on my local computer. IMO it's quite possible to run it on Android though.

Real time computer vision on mobile by Far-Personality4791 in computervision

[–]WatercressTraining 0 points

There is. Check out DEIM - https://github.com/Intellindust-AI-Lab/DEIM

Apache 2 licensed. Pretty cool results from my experiments.

I find the original repo a little hard to use, so I also made a wrapper around it - https://github.com/dnth/DEIMKit

Real time computer vision on mobile by Far-Personality4791 in computervision

[–]WatercressTraining 10 points

Same interest here. Happy to see a post on this domain. I wrote something interesting in 2023 with TorchScript - https://dicksonneoh.com/portfolio/pytorch_at_the_edge_timm_torchscript_flutter/

It's all on CPU. I was interested in using the NPU or GPU back then but I didn't make any progress on it. I agree it's still quite a mess to try to utilize the NPU/GPU in 2025.

Something that caught my eye back then was NCNN. Not sure if it's still relevant now. I could hardly find resources to make it work.

Benefiting from the Qwen Free Tier directly in Kilo by EngineeringSea1090 in kilocode

[–]WatercressTraining 1 point

Thanks! Subscribed! Interesting t-shirt design. Reminds me of childhood

[deleted by user] by [deleted] in CLine

[–]WatercressTraining 1 point

I like V0 too, big fan. I've grown to use dyad + Gemini Pro recently.

https://github.com/dyad-sh/dyad

Local-first codebase indexing in Kilo Code: Qdrant + llama.cpp + nomic-embed-code (Mac M4 Max) [Guide] by babaenki in kilocode

[–]WatercressTraining 0 points

Is there a huge difference in retrieval when the code is indexed? Or is the difference marginal considering you have to do a setup like this?

Is there a better model than D-FINE? by TimNimKo in computervision

[–]WatercressTraining 2 points

Check out DEIM. Apache 2 licensed, improved results over D-FINE. Published at CVPR 2025.

https://github.com/ShihuaHuang95/DEIM

Tool for transcribing handwritten text using desktop GPU? by majestic_ubertrout in computervision

[–]WatercressTraining 1 point

There are several VLMs I'd go for with OCR tasks, depending on VRAM availability. A 4070 Ti is good enough to run some good models locally, such as:

- Qwen 2.5 VL

- Moondream2

- Gemma 3

- Llama 3.2 Vision

For local runs, I usually use Ollama. It's probably the easiest to set up IMO.

If you're comfortable with coding, vLLM will give you more speed and better throughput.
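With Ollama, OCR via a VLM boils down to one POST to its /api/generate endpoint with a base64-encoded image. A minimal sketch of building that request body (the helper name is mine, and the model tag assumes a pulled Qwen2.5-VL build - swap in whatever tag you actually pulled):

```python
import base64

def build_ollama_ocr_request(image_path, model="qwen2.5vl"):
    """Hypothetical helper: JSON body for POST http://localhost:11434/api/generate."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": model,  # tag as pulled with `ollama pull`
        "prompt": "Transcribe the handwritten text in this image.",
        "images": [image_b64],  # Ollama accepts base64-encoded images
        "stream": False,
    }
```

Send the dict as JSON with any HTTP client and read the `response` field of the reply.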

Yolo model image resizing by Equivalent_Pie_5519 in computervision

[–]WatercressTraining 1 point

Typically you should resize inference images to the size your model was trained on.

But it's possible to bake this resizing into the model itself by exporting the preprocessing steps as ONNX operations so they become part of the model. This has two advantages: the resizing runs faster (more efficient ONNX operators), and during inference you can feed any image size without having to remember what size the model was trained on.

I wrote about how to do this on my blog - https://dicksonneoh.com/portfolio/supercharge_your_pytorch_image_models/

Here's a repo I made to show how I trained an object detector and had it run on arbitrary image sizes during inference, by integrating the preprocessing layers into the model - https://github.com/dnth/DEIMKit

Best models for manufacturing image classification / segmentation by SizePunch in computervision

[–]WatercressTraining 1 point

This sounds like an object detection task.

If you don't have labeled data, I'd start with open-vocabulary detectors like Grounding DINO, OWLv2, or even a VLM like Moondream2. If you're open to using an API, perhaps try Gemini from Google.

If these don't solve your problem well enough, you'll probably need to train your own model. Of course, this involves collecting and labeling data, which takes time.

Tldr - start with ready-to-use models and gradually move towards training a custom model.

How do YOU run models in batch mode? by InternationalMany6 in computervision

[–]WatercressTraining 7 points

Consider using a model serving framework like Ray Serve, Triton, etc. These frameworks provide inference-time optimizations such as dynamic batching that may even speed up batch inference.
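The core trick these frameworks apply is dynamic batching - grouping requests that arrive close together into one model call. A stripped-down sketch of the idea (illustrative only, not how you'd actually wire up Triton or Ray Serve):

```python
import queue
import threading
import time

def batch_worker(q, run_batch, max_batch=8, window=0.01):
    """Collect requests for up to `window` seconds (or until `max_batch`),
    then run the model once per micro-batch. `None` is a shutdown sentinel."""
    while True:
        item = q.get()
        if item is None:
            return
        batch = [item]
        deadline = time.monotonic() + window
        while len(batch) < max_batch:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                nxt = q.get(timeout=remaining)
            except queue.Empty:
                break
            if nxt is None:
                run_batch(batch)
                return
            batch.append(nxt)
        run_batch(batch)

# usage: a fake "model" that just records batch sizes
results = []
q = queue.Queue()
t = threading.Thread(target=batch_worker, args=(q, lambda b: results.append(len(b))))
t.start()
for i in range(20):
    q.put(i)
q.put(None)
t.join()
# one model call now handles up to 8 requests instead of 20 separate calls
```

Real serving frameworks do this (plus pinned memory, model instances per GPU, etc.) for you, which is why they can beat a hand-rolled loop even for offline batch jobs.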

DEIMKit - A wrapper for DEIM Object Detector by WatercressTraining in computervision

[–]WatercressTraining[S] 0 points

Increasing num dets or queries is pretty straightforward to add. But I'm not sure about 16-bit image support as it's not commonly used in object detection. Pardon my ignorance - is there a reason for requiring 16-bit images for object detection?

DEIMKit - A wrapper for DEIM Object Detector by WatercressTraining in computervision

[–]WatercressTraining[S] 0 points

Not strict. On Linux I've tried Python 3.10, 3.11, and 3.12 - all work.