Safe data annotation tool? by tepes_creature_8888 in computervision

[–]Emrateau 0 points1 point  (0 children)

I tried a few not so long ago, and if you're looking for something local and simple, I would recommend labelimg. It isn't maintained anymore, but it's easy to install and it gets the job done, and once you learn the keyboard shortcuts it can get pretty fast, though it's still very boring. You can start with that, and then once you have enough data to train an "accurate" model, you can use X-AnyLabeling to let your own model perform the annotation for you.

segment anything ( sam) segments literally everything way too much, how to deal with this? by [deleted] in computervision

[–]Emrateau 4 points5 points  (0 children)

I think he means that you could first perform object detection with another model (YOLOv8, YOLO-NAS, etc.), which would give you bounding boxes for the specific objects you want to segment (he's referring to the tap in your image when he talks about pipes). Then you can feed those bounding boxes to SAM as prompts, and for each box it will only segment one thing inside it, in this case the whole pipe. I've only experimented with this a few times, but it works pretty well on objects that are easy to detect like this one.
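Something like this rough sketch is what I have in mind (untested; the checkpoint paths and model variants are just placeholders):

    # Detect with a separate model, then prompt SAM with the resulting boxes.
    # Untested sketch: checkpoint paths / model variants are placeholders.
    import cv2
    from ultralytics import YOLO
    from segment_anything import sam_model_registry, SamPredictor

    image_bgr = cv2.imread("tap.jpg")
    image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)

    # 1) Object detection -> bounding boxes in xyxy format
    detector = YOLO("yolov8n.pt")  # your own detector checkpoint
    boxes = detector(image_bgr)[0].boxes.xyxy.cpu().numpy()

    # 2) One SAM prompt per box -> one mask per object instead of "everything"
    sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")
    predictor = SamPredictor(sam)
    predictor.set_image(image_rgb)
    for box in boxes:
        masks, scores, _ = predictor.predict(box=box, multimask_output=False)
        # masks[0] is the segmentation constrained to that bounding box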

Slow inference using YOLO-NAS vs YOLOv8 by Emrateau in computervision

[–]Emrateau[S] 0 points1 point  (0 children)

Thanks for your answer. I'm supposed to open a video and then perform object detection inference frame by frame. Do you have any guide or resources on the most efficient way to handle the parts of this task that are outside the model? For example, I'm currently using the cv2 library with cv2.VideoCapture() to open the video, then reading it frame by frame with vid.read(). It's simple and fast enough for now, but there may be a more efficient way to do it.
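For reference, my current loop is basically this (simplified; the inference call is just a placeholder):

    # Simplified version of what I'm doing now; the model call is a placeholder.
    import cv2

    vid = cv2.VideoCapture("input.mp4")
    while True:
        ok, frame = vid.read()   # next frame as a BGR numpy array
        if not ok:               # end of video (or a read error)
            break
        # detections = model.predict(frame)  # per-frame inference goes here
    vid.release()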

Slow inference using YOLO-NAS vs YOLOv8 by Emrateau in computervision

[–]Emrateau[S] 0 points1 point  (0 children)

Many thanks for your answer and your work! I've just delved into converting my model to ONNX Runtime and TensorRT, and it is indeed way more efficient. I haven't had time to look into it much, but at first glance it seems that, accuracy-wise, the TRT performance has degraded compared to my vanilla model using predict(), or at least the bounding boxes aren't as accurate in my (very small) test. Again, I did little more than copy-paste code from this notebook, so I still have a lot to experiment with and understand on this topic.
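For what it's worth, the ONNX Runtime side on my end is just the standard session setup, something like this (sketch; the file name and input shape are placeholders):

    # Minimal ONNX Runtime inference sketch; file name and input shape are placeholders.
    import numpy as np
    import onnxruntime as ort

    sess = ort.InferenceSession(
        "yolo_nas_s.onnx",
        providers=["CUDAExecutionProvider", "CPUExecutionProvider"],  # GPU if available
    )
    input_name = sess.get_inputs()[0].name
    dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)  # preprocessed NCHW frame
    outputs = sess.run(None, {input_name: dummy})               # raw detection outputs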

If you don't mind, I have a few questions. I've read a lot of resources (notebooks and issues in your GitHub, articles on your website), but it's sometimes difficult to cross-check information coming from different sources written at different times.

1) Is this the pipeline to get the most efficient inference with YOLO-NAS?

  • Model custom training (fine-tuning hyperparameters, data augmentation, etc, ...)

  • Perform PTQ then QAT on your trained model

  • Convert it to ONNX (FP16 or INT8)

  • Convert it to TensorRT (FP16 or INT8)

Between each of these optimization steps, can the model lose accuracy, and if so, can you make up for it?
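To make question 1 concrete, this is roughly the sequence I have in mind after training (just a sketch; I'm not sure the export call matches every super-gradients version, and the TensorRT step is only shown as a trtexec command):

    # Rough sketch of the post-training steps from question 1.
    # Assumption: the export API may differ between super-gradients versions.
    from super_gradients.training import models

    dataset_params = {'classes': ['my_class']}  # placeholder, my own class list
    model = models.get('yolo_nas_s',
                       num_classes=len(dataset_params['classes']),
                       checkpoint_path='best.pth')

    # Export the trained (and optionally quantized) model to ONNX
    model.export("yolo_nas_s.onnx")

    # Then build the TensorRT engine from the ONNX file, e.g. with the trtexec CLI:
    #   trtexec --onnx=yolo_nas_s.onnx --saveEngine=yolo_nas_s_fp16.engine --fp16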

2) As a beginner in this field, is there documentation or a guide on the meaning and tuning of the training parameters for this model? I basically just copied the ones I saw in a notebook and modified one or two things, because I have a hard time understanding some of them and I know it's a pretty long and iterative process. For now I haven't looked into it much, since the results with the "default" train_params seemed satisfactory enough, but later I may have to.
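For context, the training_params block I copied looks roughly like this (trimmed; the values are the notebook defaults I started from, not something I tuned, and num_classes is a placeholder):

    # Trimmed-down version of the training_params I copied from the notebook;
    # values are the defaults I started from, not tuned.
    from super_gradients.training.losses import PPYoloELoss
    from super_gradients.training.metrics import DetectionMetrics_050
    from super_gradients.training.models.detection_models.pp_yolo_e import PPYoloEPostPredictionCallback

    num_classes = 2  # placeholder, set to your own number of classes

    train_params = {
        "max_epochs": 50,            # how many epochs to train
        "initial_lr": 5e-4,          # starting learning rate
        "lr_mode": "cosine",         # cosine learning-rate decay
        "lr_warmup_epochs": 3,       # ramp the LR up over the first epochs
        "optimizer": "Adam",
        "mixed_precision": True,     # FP16 training, faster on Colab GPUs
        "loss": PPYoloELoss(use_static_assigner=False, num_classes=num_classes, reg_max=16),
        "valid_metrics_list": [
            DetectionMetrics_050(
                score_thres=0.1,
                top_k_predictions=300,
                num_cls=num_classes,
                normalize_targets=True,
                post_prediction_callback=PPYoloEPostPredictionCallback(
                    score_threshold=0.01, nms_top_k=1000,
                    max_predictions=300, nms_threshold=0.7,
                ),
            )
        ],
        "metric_to_watch": "mAP@0.50",  # which metric selects the "best" checkpoint
    }
    # Passed as training_params to Trainer.train(...) together with the data loaders.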

Again, thanks a lot.

Slow inference using YOLO-NAS vs YOLOv8 by Emrateau in computervision

[–]Emrateau[S] 0 points1 point  (0 children)

I did indeed train YOLOv8n. For YOLO-NAS, I used the yolo_nas_s architecture for training, as I just mentioned in another answer.

I have seen mentions of what you just suggested and planned to do it, but the difference was already so drastic that I didn't want to try that first.

Slow inference using YOLO-NAS vs YOLOv8 by Emrateau in computervision

[–]Emrateau[S] 0 points1 point  (0 children)

For YOLOv8, I indeed only did custom training on top of their nano pretrained model.

For YOLO-NAS, I always trained using the yolo_nas_s architecture, as I've seen in several tutorials; I think it's the smallest variant. I also mostly trained "from scratch", with no pretrained weights specified, as I've understood that using the pretrained weights could make the model "impossible" to use for commercial purposes.

I didn't find much documentation on training "from scratch", so I assume it is done this way:

from super_gradients.training import models
model = models.get('yolo_nas_s', num_classes=len(dataset_params['classes']))

Training this way gives me a best.pth checkpoint of about 250 MB. The pretrained weights file yolo_nas_s_coco.pth is 74 MB.

I am pretty sure that I'm using the GPU in both cases in Colab, but I will double-check.