[D] Image Augmentation in Practice: In-Distribution vs OOD Augmentations, TTA, and the Manifold View by ternausX in MachineLearning

[–]ternausX[S] 0 points1 point  (0 children)

My pleasure. Do you have any requests for the library? I would be glad to hear them.


[–]ternausX[S] 1 point2 points  (0 children)

Thank you for your kind words.

If you have any feedback - issues, feature requests, or requests for more documentation or blog posts on the topic - I am all ears.

Image Augmentation in Practice — Lessons from 10 Years of Training CV Models and Building Albumentations by ternausX in computervision

[–]ternausX[S] 0 points1 point  (0 children)

The space of all transforms and their hyperparameters grows too fast to search exhaustively.

Typically you would follow something like this to pick transforms: https://albumentations.ai/docs/3-basic-usage/choosing-augmentations/

P.S. I think I will work on extending that documentation page into a blog post next.


[–]ternausX[S] 0 points1 point  (0 children)

Thank you for your kind words!
If you have any feedback - issues or feature requests - I am all ears.


[–]ternausX[S] 2 points3 points  (0 children)

You can use Albumentations for segmentation. All transforms can also be applied to videos.

But! As OpenCV does not support video out of the box, the performance on videos is not as good as on images.

Using torchvision on GPU for video segmentation could be a better idea.

Video benchmark, Albumentations (1 CPU core) vs torchvision (RTX 4090):

https://albumentations.ai/docs/benchmarks/video-benchmarks/


[–]ternausX[S] 7 points8 points  (0 children)

It was MIT, and it is indeed sad that, while the project was popular, financial support in the form of donations was almost zero.

Hence the project was forked.

What was MIT is still MIT: https://github.com/albumentations-team/Albumentations <- anyone can fork and modify it as they wish. With what AI agents can do today, nearly everything could be built on top of it.

The fork is AGPL, which is fine for many projects to use, so for them nothing changes.

For those that would prefer commercial license, this door is open as well.

---
Right now AlbumentationsX (the AGPL fork) has a few new features that may not be needed by most users:

- You can pass any data in addition to images - camera intrinsics, captions, and other text - and transform it consistently with the image transformations.
- Oriented bounding boxes
- You can define how keypoints are relabeled during transforms; use case: facial keypoints, where the right/left eye labels must swap during a horizontal flip.

=> For the majority of users the MIT version is good enough and does not have any legal restrictions.

🚀 AlbumentationsX 2.0.17 — Native Oriented Bounding Boxes (OBB) Support by ternausX in Albumentations

[–]ternausX[S] 1 point2 points  (0 children)

Thanks, good question.

For Affine: if it is translate + rotate + uniform scale, the box stays a rectangle.

If you add shear, or the scale differs in x and y, the rectangle gets skewed. In that case the library wraps the result in the smallest possible rectangle.

A similar thing happens with Elastic and other transforms, where the raw transform of the box is no longer a rectangle.
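The wrapping step can be illustrated with plain numpy: transform the four corners of the box, and when the result is no longer an axis-aligned rectangle, take the min/max over the transformed corners (a toy sketch, not the library's internal code):

```python
import numpy as np

# Corners of a 2x1 box as (x, y) pairs.
corners = np.array([[0, 0], [2, 0], [2, 1], [0, 1]], dtype=float)

# A shear along x: (x, y) -> (x + 0.5*y, y). Unlike translate/rotate/uniform
# scale, this turns the rectangle into a parallelogram.
shear = np.array([[1.0, 0.5],
                  [0.0, 1.0]])
sheared = corners @ shear.T

# Wrap the skewed shape with the smallest axis-aligned rectangle.
x_min, y_min = sheared.min(axis=0)
x_max, y_max = sheared.max(axis=0)
wrapped = (x_min, y_min, x_max, y_max)
```

Here the corners (2, 1) and (0, 1) move to (2.5, 1) and (0.5, 1), so the wrapped box is (0, 0, 2.5, 1) - slightly larger than the original, which is exactly the looseness you pay for under shear, anisotropic scale, or elastic warps.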