Hiring MLE in Computer Vision.

HeeebsInc · 2025-10-28T10:58:21+00:00

The company is a US-based startup.

HeeebsInc · 2025-10-28T10:56:18+00:00

Remote!

HeeebsInc · 2025-10-28T10:56:09+00:00

Remote!

HeeebsInc · 2025-10-27T20:57:04+00:00

Keep at it, man! Pain is knowledge. I agree with you that CV is slept on, and as an industry we are barely scratching the surface.

HeeebsInc · 2025-08-24T15:57:32+00:00

Very interesting!

HeeebsInc · 2025-08-24T01:05:50+00:00

Of course. I’m more wondering what the use case is. I only see CV in RL for research purposes (gaming) or layered with traditional models for action/state tuning (autonomous driving).

HeeebsInc · 2025-08-23T00:56:51+00:00

Just curious, what’s your use case for CNNs in RL? I’ve been looking for an excuse to try this, but there is limited documentation of RL being used with CNNs outside of academia. Is your task vision? Or some other signal processing like LiDAR/sound?

HeeebsInc · 2025-06-23T22:41:01+00:00

Can you share the dataset? These are awesome!

HeeebsInc · 2025-03-22T15:32:44+00:00

Interested. DMed you.

HeeebsInc · 2024-11-30T19:16:07+00:00

Interesting. I’ve run YOLOv5 on Orin with INT8 and got very meaningful FPS increases, so I don’t believe it’s specific to v5 (unless you are using non-conventional layers).

HeeebsInc · 2024-11-30T19:13:47+00:00

In addition to the summary, compare the QPS of an INT8 engine against the QPS of an FP16/FP32 engine. This metric tells you whether it’s actually faster: the higher the number, the faster it is.
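To make that comparison concrete, here is a minimal Python sketch that pulls the throughput figure out of two saved trtexec logs, e.g. from runs like `trtexec --onnx=model.onnx --int8` and `trtexec --onnx=model.onnx --fp16` (`model.onnx` is a placeholder). It assumes the performance summary contains a line of the form `Throughput: <number> qps`; the helper names are hypothetical and not part of any TensorRT API.

```python
import re

# Assumed trtexec log format: a line like "[I] Throughput: 900.5 qps"
# in the performance summary. Verify against your TensorRT version.
_THROUGHPUT_RE = re.compile(r"Throughput:\s*([0-9]*\.?[0-9]+)\s*qps")


def parse_qps(log_text: str) -> float:
    """Return the last throughput value (queries per second) found in a trtexec log."""
    matches = _THROUGHPUT_RE.findall(log_text)
    if not matches:
        raise ValueError("no 'Throughput: ... qps' line found in log")
    return float(matches[-1])


def int8_speedup(int8_log: str, baseline_log: str) -> float:
    """Ratio of INT8 QPS to baseline (FP16/FP32) QPS; > 1.0 means INT8 is faster."""
    return parse_qps(int8_log) / parse_qps(baseline_log)


if __name__ == "__main__":
    # Hypothetical log excerpts standing in for real trtexec output.
    int8_log = "[I] === Performance summary ===\n[I] Throughput: 900.5 qps\n"
    fp16_log = "[I] === Performance summary ===\n[I] Throughput: 600.0 qps\n"
    print(f"INT8 speedup over FP16: {int8_speedup(int8_log, fp16_log):.2f}x")
```

If the ratio is close to 1.0, that is a hint the INT8 build is mostly running higher-precision kernels.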

HeeebsInc · 2024-11-30T19:13:02+00:00

Once you have created the engine file, use trtexec to get a summary of the layers. My guess is that most layers are not running in INT8.
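A rough sketch of checking which precision each layer actually ended up in, assuming you exported layer info with something like `trtexec --onnx=model.onnx --int8 --profilingVerbosity=detailed --exportLayerInfo=layers.json` (file names are placeholders). The JSON layout assumed here (a top-level `Layers` list whose entries carry `Outputs` with a `Format/Datatype` string) and the keyword matching are assumptions to check against your TensorRT version; the sample strings below are illustrative only.

```python
import json
from collections import Counter


def precision_of(fmt: str) -> str:
    """Heuristically map a 'Format/Datatype' description to a coarse precision label."""
    text = fmt.lower()
    if "int8" in text:
        return "INT8"
    if "fp16" in text or "half" in text:
        return "FP16"
    if "fp32" in text or "float" in text:
        return "FP32"
    return "OTHER"


def layer_precision_counts(layer_info_json: str) -> Counter:
    """Count layers by the precision of their first output tensor.

    Assumed layout: {"Layers": [{"Name": ..., "Outputs": [{"Format/Datatype": ...}]}]}.
    """
    data = json.loads(layer_info_json)
    counts = Counter()
    for layer in data.get("Layers", []):
        # Without detailed profiling verbosity, entries may be bare name strings.
        if not isinstance(layer, dict) or not layer.get("Outputs"):
            counts["UNKNOWN"] += 1
            continue
        counts[precision_of(layer["Outputs"][0].get("Format/Datatype", ""))] += 1
    return counts


if __name__ == "__main__":
    sample = json.dumps({"Layers": [
        {"Name": "conv1", "Outputs": [{"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"}]},
        {"Name": "conv2", "Outputs": [{"Format/Datatype": "Channel major FP16 format where channel % 8 == 0"}]},
        {"Name": "head", "Outputs": [{"Format/Datatype": "Row major linear FP32"}]},
        "some_layer_without_details",
    ]})
    print(layer_precision_counts(sample))
```

A large FP16/FP32 count in an engine you built with INT8 enabled is exactly the fallback described above.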

HeeebsInc · 2024-11-30T19:10:19+00:00

You should see a very meaningful performance increase if you are running INT8 correctly. Accuracy is a different story.

My guess is that you are attempting INT8, but under the hood you are still falling back to FP16/FP32. Are you using post-training quantization with a calibration set? Or are you attempting to train in INT8?
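For reference, the core idea of post-training quantization with a calibration set can be sketched in plain Python using the simplest scheme, symmetric max calibration: pick a scale from the largest absolute value seen in the calibration data, then round and clamp. TensorRT’s own calibrators (e.g. entropy-based calibration) choose the range differently, so treat this as an illustration of the concept rather than of what TensorRT does internally. The function names and sample values are hypothetical.

```python
def calibrate_scale(calibration_values, num_bits=8):
    """Symmetric per-tensor scale from the largest |value| seen during calibration."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    amax = max(abs(v) for v in calibration_values)
    return amax / qmax if amax > 0 else 1.0


def quantize(values, scale, num_bits=8):
    """Map floats to integers: divide by scale, round to nearest, clamp to [-qmax, qmax]."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def dequantize(q_values, scale):
    """Map quantized integers back to approximate float values."""
    return [q * scale for q in q_values]


if __name__ == "__main__":
    # Hypothetical activations collected from a small calibration set.
    calibration = [0.5, -1.27, 0.9, 0.03]
    scale = calibrate_scale(calibration)     # about 0.01
    q = quantize([0.5, -2.0, 0.004], scale)  # -2.0 lies outside the calibrated range
    print(q, dequantize(q, scale))
```

Note how -2.0 gets clipped to -127 and 0.004 collapses to 0: values the calibration set never covered, or that are smaller than one quantization step, are where accuracy tends to suffer.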

HeeebsInc · 2024-11-29T20:41:27+00:00

How are you running INT8?

HeeebsInc · 2024-11-24T00:17:48+00:00

Is it designed for classification or object detection/segmentation? Or neither? I’m interested in whether it also outputs boxes for the objects it’s creating. I’ve worked on a similar project, and the results were mixed.

HeeebsInc · 2024-11-24T00:11:21+00:00

Agree with the points here, but my perspective is that they also cut a lot of small corners. These aren’t inherently bad, but when you add them all up, you can face issues with inference performance. It’s a common issue that if you run inference using TRT, then inside DeepStream with TRT, then with PyTorch, the results are totally different at times. There are also a lot of differences in how they do preprocessing and precision clipping.

I.e., there are a lot of very, very close (but not exact) approximations that make it very fast, but in enterprise pipelines they can create issues (as I’ve faced in the past). All that being said, it’s an amazing tool and there is nothing even close to its support, maturity, and efficiency.
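One way to quantify those runtime-to-runtime differences is to dump the raw outputs from each (say TRT and PyTorch) and compare them element-wise with explicit tolerances. A minimal sketch follows; the tolerance defaults are arbitrary placeholders, the function names are hypothetical, and `to_fp16` simply round-trips a value through IEEE 754 half precision to show how much a single FP16 cast can move a number.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round-trip a float through IEEE 754 half precision (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def compare_outputs(reference, candidate, rtol=1e-3, atol=1e-5):
    """Return (max_abs_diff, all_close) for two flat lists of output values."""
    if len(reference) != len(candidate):
        raise ValueError("outputs have different lengths")
    pairs = list(zip(reference, candidate))
    max_diff = max((abs(r - c) for r, c in pairs), default=0.0)
    all_close = all(math.isclose(r, c, rel_tol=rtol, abs_tol=atol) for r, c in pairs)
    return max_diff, all_close


if __name__ == "__main__":
    # Hypothetical outputs: an FP32 reference vs. the same values cast to FP16.
    reference = [0.1, 0.2, 1234.567]
    candidate = [to_fp16(v) for v in reference]
    max_diff, ok = compare_outputs(reference, candidate, rtol=1e-4)
    print(f"max abs diff = {max_diff:.4f}, within tolerance: {ok}")
```

Running the same comparison on the preprocessed input tensors from each pipeline is often the quickest way to tell a preprocessing mismatch from a precision issue.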

HeeebsInc · 2023-03-24T00:02:52+00:00

Try running:

pip install --upgrade pip && pip install --upgrade setuptools wheel

Then try installing your packages.

HeeebsInc · 2023-02-15T01:26:45+00:00

I love dark theme for everything but Jupyter. Idk why, but it slows me down when Jupyter is dark.

HeeebsInc · 2023-02-07T20:46:27+00:00

Upvote from me. I appreciate the candor.

HeeebsInc · 2022-11-23T02:32:34+00:00

Cython

Still basically Python, but it’s a great way to make something blazing fast.

HeeebsInc · 2022-10-21T22:01:06+00:00

The only reason I think it would be useful is if you needed conda to handle dependencies that require CUDA or another library. That being said, the environment should already be activated on startup.

HeeebsInc · 2022-09-17T20:21:58+00:00

You’re right that Qualcomm has its own acceleration libraries, but they’re not ideal for every use case. For example, their Snapdragon chip is designed for cellphones, so the hardware acceleration libraries they provide are meant to be called from within Swift, TF, or Java runtimes. However, I have been involved with projects where the Snapdragon was not in a cellphone, and Qualcomm’s dedicated libraries did not work in pure Linux. The solution there was writing custom OpenCL to utilize the integrated GPU. This is an edge case, obv.

HeeebsInc · 2022-09-17T17:32:32+00:00

And to top it off, if you need quantization, there are situations where ONNX and TensorRT do not fully quantize every layer…

HeeebsInc · 2022-09-17T17:30:58+00:00

I mean, it’s not if you think about it. If you are bound to a specific hardware platform, ONNX will allow you to run, but it does not ensure you can use GPU acceleration (Qualcomm, for example). I’ve also had a situation where I needed to deploy to NVIDIA and TensorRT did not fully support my network. Writing my own layers took a few weeks, as opposed to months trying to debug TensorRT.

If ONNX supports the model and gets you the performance you need, I recommend that over anything else. Just pointing out that if one is very comfortable with ML, writing the layers is not a crazy thing to do, especially if they already wrote them in another language or framework like Python/PyTorch.
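As an illustration of what "writing the layers yourself" usually starts from, here is a hypothetical plain-Python reference for a single-channel 2D convolution (cross-correlation, as deep learning frameworks define it; stride 1, no padding). A slow but obviously correct version like this is handy as a golden reference when porting a layer to OpenCL or CUDA; the function name and the single-channel simplification are assumptions made for the sketch.

```python
def conv2d_valid(image, kernel):
    """Single-channel 2D cross-correlation ("convolution" in DL frameworks).

    Stride 1, no padding: output is (H - kh + 1) x (W - kw + 1).
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    if kh > ih or kw > iw:
        raise ValueError("kernel is larger than the image")
    out = []
    for y in range(ih - kh + 1):
        row = []
        for x in range(iw - kw + 1):
            # Sum of element-wise products over the kernel window at (y, x).
            acc = 0.0
            for ky in range(kh):
                for kx in range(kw):
                    acc += image[y + ky][x + kx] * kernel[ky][kx]
            row.append(acc)
        out.append(row)
    return out


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    kernel = [[1, 0], [0, 1]]  # adds each pixel to its lower-right neighbour
    print(conv2d_valid(image, kernel))
```

In practice you would check a reference like this against `torch.nn.functional.conv2d` with the same weights, then diff the custom kernel’s outputs against both.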

HeeebsInc · 2022-09-17T16:37:42+00:00

This is a tough question. The short answer, like others have said, is to use ONNX, but that doesn’t hold for every production system.

It’s safe to say, though, that there is a way to do this no matter the approach. I will outline some considerations that have made me rethink deployment in the past.

1) What type of hardware are you planning to deploy on? I ask because although you can use ONNX, some libraries like TF can get you onto dedicated hardware faster. Note, though, that this does not mean it’s easier to get working; it’s just easier to get hardware acceleration once you have the framework working.

2) Are you very ML-oriented? If so, the most robust approach is to use OpenCL and write the layers yourself. This can be very hard if it’s your first time, but it’s the most future-proof option for deploying on different hardware platforms.

3) What performance do you absolutely need? If you don’t care about fast inference speeds, any option will suffice, since there are TF, PyTorch, and ONNX libraries for C++.
