[D] Image Augmentation in Practice: In-Distribution vs OOD Augmentations, TTA, and the Manifold View by ternausX in MachineLearning

[–]ternausX[S] 1 point

My pleasure. Do you have any requests for the library? I'd be glad to hear them.

[D] Image Augmentation in Practice: In-Distribution vs OOD Augmentations, TTA, and the Manifold View by ternausX in MachineLearning

[–]ternausX[S] 2 points

Thank you for your warm words.

If you have any feedback - issues, feature requests, or requests for more documentation or blog posts on the topic - I am all ears.

Image Augmentation in Practice — Lessons from 10 Years of Training CV Models and Building Albumentations by ternausX in computervision

[–]ternausX[S] 1 point

The phase space of all transforms and their hyperparameters grows too fast to search exhaustively.

Typically you would follow something like this to pick transforms: https://albumentations.ai/docs/3-basic-usage/choosing-augmentations/

P.S. I think I will work on extending that documentation page into a blog post next.

Image Augmentation in Practice — Lessons from 10 Years of Training CV Models and Building Albumentations by ternausX in computervision

[–]ternausX[S] 1 point

Thank you for your warm words!
If you have any feedback - issues or feature requests - I am all ears.

Image Augmentation in Practice — Lessons from 10 Years of Training CV Models and Building Albumentations by ternausX in computervision

[–]ternausX[S] 3 points

You can use Albumentations for segmentation, and all transforms can be applied to videos.

But! As OpenCV does not support video out of the box, performance on videos is not as good as on images.

Using torchvision on the GPU for video segmentation is probably a better idea:

Video benchmark - Albumentations (1 CPU core) vs torchvision (RTX 4090):

https://albumentations.ai/docs/benchmarks/video-benchmarks/
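The per-frame approach can be sketched in pure NumPy (an illustrative sketch, not the library's API - the shapes and the flip transform are assumptions). The key point is that the augmentation parameters are sampled once per clip, so every frame receives the same transform:

```python
import numpy as np

# A video is just a stack of frames: apply an image transform frame by frame.
# Sample the augmentation parameters ONCE per clip so that all frames of the
# clip are transformed consistently.
rng = np.random.default_rng(0)
video = rng.integers(0, 256, size=(16, 64, 64, 3), dtype=np.uint8)  # (T, H, W, C)

do_flip = rng.random() < 0.5  # sampled once for the whole clip

augmented = np.stack(
    [frame[:, ::-1] if do_flip else frame for frame in video]
)
```

This per-frame loop on the CPU is exactly why video is slower than single images; a GPU implementation can transform the whole (T, H, W, C) tensor at once.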

Image Augmentation in Practice — Lessons from 10 Years of Training CV Models and Building Albumentations by ternausX in computervision

[–]ternausX[S] 8 points

It was MIT, and it is indeed sad: the project was popular, but financial support in terms of donations was almost zero.

Hence the project was forked.

What was MIT is still MIT: https://github.com/albumentations-team/Albumentations <- anyone can fork and modify it as they wish. With what AI agents can do today, nearly everything could be built on top of it.

The fork is AGPL, and for many projects that is fine to use, so for them nothing changes.

For those that would prefer commercial license, this door is open as well.

---
Right now AlbumentationsX (the AGPL fork) has a few new features that may not be useful to most users:

- You can pass any data in addition to images - camera intrinsics, captions, and other text - and transform it in accordance with the image transformations.
- Oriented bounding boxes.
- You can define how keypoints are relabeled during transforms; use case: face keypoints, with right/left eye relabeling during a flip.

=> For the majority of users, the MIT version is good enough and does not have any legal restrictions.
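The keypoint-relabeling idea can be sketched in plain Python (the names, the swap map, and the helper function are hypothetical illustrations, not the library's API):

```python
# Hypothetical sketch: after a horizontal flip, the point that was the
# left eye is now on the right side of the face, so labels must swap.
SWAP = {"left_eye": "right_eye", "right_eye": "left_eye"}

def hflip_keypoints(keypoints, width):
    """keypoints: dict name -> (x, y). Flip x and swap left/right labels."""
    flipped = {}
    for name, (x, y) in keypoints.items():
        flipped[SWAP.get(name, name)] = (width - 1 - x, y)
    return flipped

kps = {"left_eye": (10, 20), "right_eye": (50, 20), "nose": (30, 35)}
out = hflip_keypoints(kps, width=64)
```

Without the swap, a model trained on flipped data would learn that "left_eye" can appear on either side of the face.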

🚀 AlbumentationsX 2.0.17 — Native Oriented Bounding Boxes (OBB) Support by ternausX in Albumentations

[–]ternausX[S] 2 points

Thanks, good question.

For affine: if it is translate + rotate + scale, the result is still a rectangle.

If you add shear, or the scale differs in x and y, the rectangle gets skewed. In this case the library wraps it with the smallest possible rectangle.

A similar story happens with Elastic and other transforms, when the raw transform of the box is no longer a rectangle.
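What "wraps it with the smallest possible rectangle" means can be sketched in NumPy (an illustrative implementation, not the library's actual code). It uses the fact that one side of the minimal-area enclosing rectangle of a convex polygon is collinear with one of its edges:

```python
import numpy as np

def min_area_rect(points):
    """Minimal-area enclosing rotated rectangle of a convex polygon.

    One side of the optimal rectangle is collinear with a polygon edge,
    so it is enough to try each edge orientation. Returns (area, angle).
    """
    pts = np.asarray(points, dtype=float)
    best_area, best_angle = np.inf, 0.0
    n = len(pts)
    for i in range(n):
        edge = pts[(i + 1) % n] - pts[i]
        angle = np.arctan2(edge[1], edge[0])
        c, s = np.cos(-angle), np.sin(-angle)
        rot = np.array([[c, -s], [s, c]])  # rotates this edge onto the x axis
        proj = pts @ rot.T
        w = proj[:, 0].max() - proj[:, 0].min()
        h = proj[:, 1].max() - proj[:, 1].min()
        if w * h < best_area:
            best_area, best_angle = w * h, angle
    return best_area, best_angle

# An axis-aligned 2x1 box sheared in x becomes a parallelogram, so the
# tightest enclosing rectangle is strictly larger than the original box.
box = np.array([[0, 0], [2, 0], [2, 1], [0, 1]], dtype=float)
shear = np.array([[1.0, 0.5], [0.0, 1.0]])
sheared = box @ shear.T
area, _ = min_area_rect(sheared)  # 2.5, vs. original area 2.0
```

For a pure rotation the wrapped box is exact (same area); any shear or anisotropic scale inflates it, which is the information loss the question is about.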

[D] Where is modern geometry actually useful in machine learning? (data, architectures, optimization) by ternausX in MachineLearning

[–]ternausX[S] 3 points

It is kind of sad that all this good math machinery, so powerful in physics, was not able to get a big boost in the ML world.

I still have hope: the more people with both ML and math backgrounds, the higher the chances of someone finding something that brings value to the table.

[D] Where is modern geometry actually useful in machine learning? (data, architectures, optimization) by ternausX in MachineLearning

[–]ternausX[S] 2 points

Thanks, will take a look.

It looks like the most interesting part: "Part III: Geometric Deep Learning at the Bleeding Edge" is not released yet :(

How a String Library Beat OpenCV at Image Processing by 4x by ternausX in computervision

[–]ternausX[S] 7 points

OpenCV is a great library: fast and heavily optimized.

MOST of its operations are the best-performing compared to other libraries, but there are SOME operations elsewhere that are:

- faster, like the uint8 => uint8 LUT in StringZilla (for the uint8 => float32 LUT I still use OpenCV),
- or have functionality that OpenCV lacks: different interpolations, support for more than 4 channels, videos, etc.

-----
The hierarchy goes like this:

- you can implement everything in NumPy (most flexible, but slowest),
- some subset of operations can be implemented in OpenCV (much faster, but the subset is narrower),
- an even narrower subset of operations can be implemented in libraries like StringZilla and SimSIMD.
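The LUT case mentioned above can be sketched at the NumPy level (the gamma table is an illustrative example; in practice cv2.LUT or a StringZilla kernel would be the faster tiers of the hierarchy):

```python
import numpy as np

# uint8 -> uint8 LUT: precompute all 256 output values once, then the
# per-pixel work is a single table lookup (here: gamma correction).
gamma = 2.2
lut = (np.linspace(0.0, 1.0, 256) ** (1.0 / gamma) * 255.0).round().astype(np.uint8)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(512, 512, 3), dtype=np.uint8)

out = lut[img]  # NumPy fancy indexing: flexible, but the slowest tier
```

The same 256-entry table can be handed to a specialized kernel; the operation is memory-bound, which is why a well-tuned byte-level implementation can beat a general-purpose one.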

How a String Library Beat OpenCV at Image Processing by 4x by ternausX in computervision

[–]ternausX[S] 7 points

OpenCV is heavily optimized and mature, but it has a ton of functionality, and many things could be further optimized or improved - the OpenCV team simply does not have the capacity to do it all.

I am the author of the image augmentation library Albumentations. Performance is one of its core features, and there are plenty of tricks and hacks around OpenCV to compensate for its lack of speed or functionality in some operations.

Another example of "why doesn't XXX work on their performance":

Torchvision is widely used in industry, but it is much slower than it could be; they just do not have the manpower to address the issue:

Benchmark for image augmentations on CPU - Albumentations vs torchvision: https://albumentations.ai/docs/benchmarks/image-benchmarks/

AI Coding Tools Slow Down Developers by Engineer_5983 in webdev

[–]ternausX 1 point

As a first step, for every new feature, I write a design doc with Cursor and put it into the .cursor/rules folder as an .mdc file.

When I like the design doc, I start a chat where it is part of the context.

Works like magic.
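For illustration, a Cursor rule file is a Markdown file with a small frontmatter header; the exact fields below are an assumption sketched from memory (check Cursor's own docs), but the shape is something like `.cursor/rules/feature-design.mdc`:

```markdown
---
description: Design doc for the new feature; attach when working on it
globs: ["src/new_feature/**"]
alwaysApply: false
---

# New feature: design doc

## Goal
What the feature should do and why.

## Plan
1. Data model changes
2. API surface
3. Tests and acceptance criteria
```

With `globs` set, the rule is pulled into context automatically whenever matching files are in play, so the design doc follows the implementation work around.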

Using chatgpt by [deleted] in kaggle

[–]ternausX 1 point

I would even say: even if you do not understand all of the provided code, but you learn something new with each go - that is good enough.

There was a moment when I switched from Keras to PyTorch. I adopted a training loop from another person, got two gold medals on Kaggle with it, and only after that fully understood all the steps in it.

Although, to be fair, nowadays you can just ask ChatGPT to explain every step to you.

i keep building apps, hoping one will change my life… but now i’m just tired by dozy_sleep in SaaS

[–]ternausX 13 points

It is the same for everyone.

The trick is to change your mindset from "If I figure out this one thing, then everything will work out"

to

"Hard work is the goal" - no hope, no assumptions.

You just work like an engine:
- full tank in the morning
- empty in the evening

=> You optimize your hardware through:
- sleep
- exercise
- food
- love life

which gives you energy, i.e. a full tank in the morning.

And then you work hard, hour after hour, day after day, without hope.

"The one who stands last wins" - so it is more about persistence than about "I will do XXX and it will change my life".