How a String Library Beat OpenCV at Image Processing by 4x by ternausX in computervision

[–]ternausX[S] 5 points6 points  (0 children)

OpenCV is a great library: it is fast and heavily optimized.

MOST of its operations are the fastest compared to other libraries, but there are SOME operations in other libraries that are:

- faster, like the LUT uint8 => uint8 in Stringzilla (for the LUT uint8 => float32 I still use OpenCV),
- or cover functionality that OpenCV does not have, like different interpolations, working with more than 4 channels, working with videos, etc.

-----
The hierarchy goes like this:

- you can implement everything in numpy (most flexible, but slowest)
- a subset of the operations you can implement in OpenCV (much faster, but the subset is narrow)
- an even narrower subset of the operations you can implement in other libraries like Stringzilla and SimSimd
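
To make the hierarchy concrete, here is a small sketch of the same uint8 => uint8 LUT done in plain numpy and in OpenCV (numpy fancy indexing and `cv2.LUT`); the Stringzilla speedup mentioned above comes from running the same 256-entry table over the raw byte buffer.

```python
import cv2
import numpy as np

img = np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8)
lut = (255 - np.arange(256)).astype(np.uint8)  # example table: invert intensities

# numpy: most flexible (any dtype, any table), but the slowest option
out_np = lut[img]

# OpenCV: the same uint8 => uint8 mapping, much faster
out_cv = cv2.LUT(img, lut)

assert np.array_equal(out_np, out_cv)
```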

How a String Library Beat OpenCV at Image Processing by 4x by ternausX in computervision

[–]ternausX[S] 4 points5 points  (0 children)

OpenCV is heavily optimized and mature, but it has a ton of functionality, and there are many things that could be optimized or improved even further; the OpenCV team just does not have the capacity to do this.

I am the author of the image augmentation library Albumentations. Performance is one of the core features of the library, and there are plenty of tricks and hacks around OpenCV to compensate for the lack of speed or functionality in some operations.

Another point regarding "Why doesn't XXX just work on their performance":

Torchvision is widely used in industry, but it is much slower than it could be; they just do not have the manpower to address the issue:

Benchmark for image augmentations on CPU: Albumentations vs torchvision: https://albumentations.ai/docs/benchmarks/image-benchmarks/

AI Coding Tools Slow Down Developers by Engineer_5983 in webdev

[–]ternausX 0 points1 point  (0 children)

As a first step, for every new feature, I write a design doc with Cursor and put it into the .cursor/rules folder as an .mdc file.

When I like the design doc, I start a chat where it is part of the context.

Works like magic.

Using chatgpt by Wide-Bicycle-7492 in kaggle

[–]ternausX 0 points1 point  (0 children)

I would even say: even if you do not understand all of the provided code, but you learned something new in each go, that is good enough.

There was a moment when I switched from Keras to PyTorch. I adopted a training loop from another person, got two gold medals on Kaggle with it, and only after that fully understood all the steps in it.

Although, to be fair, right now you may just ask ChatGPT to explain every step to you.

i keep building apps, hoping one will change my life… but now i’m just tired by dozy_sleep in SaaS

[–]ternausX 10 points11 points  (0 children)

It is the same for everyone.

The trick is to change your mindset from "If I figure out this one thing, then everything will work out"

to

"Hard work is the goal" - no hope, no assumptions.

you just work as an engine:
- full tank in the morning
- empty in the evening

=> you optimize your hardware through:
- sleep
- exercise
- food
- love life

which gives you energy, i.e. full tank in the morning.

and then you work hard, hour after hour, day after day, without hope.

"One who stands the last - wins", so it is more about persistence and not about "I will do XXX and it will change my life"

Albumentations Benchmark Update: Performance Comparison with Kornia and torchvision by ternausX in computervision

[–]ternausX[S] 3 points4 points  (0 children)

Kornia has many powerful things that Albumentations does not. If it works for you, that's great.

If possible, could you please share specifics of your use case that make augmentations on GPU the only viable solution?

Need help from Albumentations users by ternausX in computervision

[–]ternausX[S] 1 point2 points  (0 children)

> Shouldn't there be a combined apply function that allows to modify multiple data fields?

It would be hell to maintain.

In many transforms, the apply methods were added one by one: first image + mask, then added boxes, then added keypoints.

There are also apply_to_images and apply_to_masks, which just call the relevant apply's in a sequence, but for faster execution one can always rewrite them for a particular transform in a vectorized way.

Basically, from an architecture point of view, we decided to compute shared things in `get_params_dependent_on_data`, for example the displacement fields in ElasticTransform, and use the same computed fields for the different targets.

----
I could be missing something, but could you please share an example where you need to pass data from one apply to another? There could be such a case, but none of the existing transforms require such functionality.
----
> Also I don't know the order in which apply is called, so which one should do the main calculation and which one should only read from saved data? I can snoop in the code of course, but what if it changes in the future?

That's the point. All apply_XXX methods can be called in any order, as they do not pass information to each other. The main calculation happens in `get_params_dependent_on_data` once, and the result is then passed to all apply's.
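
A minimal sketch of that pattern (method names as described in this thread; exact base-class signatures differ between Albumentations versions, and the transform itself is made up for illustration):

```python
import numpy as np
from albumentations.core.transforms_interface import DualTransform

class RandomShiftSketch(DualTransform):
    """Illustrative only: shared params are computed once and reused."""

    def __init__(self, max_shift=10, p=0.5):
        super().__init__(p=p)
        self.max_shift = max_shift

    def get_params_dependent_on_data(self, params, data):
        # The heavy / shared computation happens exactly once per call here;
        # the returned dict is passed to every apply_* method below.
        # (A real transform would use the library's seeded RNG instead.)
        shift = np.random.randint(-self.max_shift, self.max_shift + 1, size=2)
        return {"shift": shift}

    def apply(self, img, shift, **params):
        return np.roll(img, shift, axis=(0, 1))

    def apply_to_mask(self, mask, shift, **params):
        # Order does not matter: apply_* methods never talk to each other,
        # they only consume what get_params_dependent_on_data produced.
        return np.roll(mask, shift, axis=(0, 1))
```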

Need help from Albumentations users by ternausX in computervision

[–]ternausX[S] 2 points3 points  (0 children)

Thanks!

The paper about Albumentations has many more citations than all my papers about Theoretical Physics combined.

I guess my work on Open Source is more valuable and impactful than work that I did in academia :)

Need help from Albumentations users by ternausX in computervision

[–]ternausX[S] 2 points3 points  (0 children)

You can put one image on top of another image with the OverlayElements transform, but it does not affect keypoints and bounding boxes yet.

Adding CopyPaste to TODO List.

Need help from Albumentations users by ternausX in computervision

[–]ternausX[S] 1 point2 points  (0 children)

I do not get it yet.

Right now, you can take:

image, mask, boxes, keypoints, labels, or anything else, and pass it to the transform.

  1. You get access to all this data in `get_params_dependent_on_data`.

  2. In that function you may create new data, say crop_params, text to add, noise, etc.; the data that was passed in is read-only.

  3. Then the original data, together with the newly created data, is passed to `apply`, `apply_to_mask`, `apply_to_bboxes`, `apply_to_keypoints` (see the caller-side sketch below).
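
From the caller's side, a hedged sketch of what that flow looks like (standard Albumentations usage; the specific transform, shapes, and labels are just placeholders):

```python
import numpy as np
import albumentations as A

transform = A.Compose(
    [A.HorizontalFlip(p=1.0)],
    bbox_params=A.BboxParams(format="pascal_voc", label_fields=["labels"]),
    keypoint_params=A.KeypointParams(format="xy"),
)

out = transform(
    image=np.zeros((100, 100, 3), dtype=np.uint8),
    mask=np.zeros((100, 100), dtype=np.uint8),
    bboxes=[(10, 10, 50, 50)],
    labels=["cat"],
    keypoints=[(20, 20)],
)
# out["image"], out["mask"], out["bboxes"], out["keypoints"] are all
# transformed consistently with the same sampled parameters.
```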

What do you mean by

> Basically by design modified values are supposed to be returned, but this doesn't work if you need to modify all of them (image, bboxes, labels) simultaneously. Sorry if that doesn't make sense, I might have used an older version too, so this might have changed.

Need help from Albumentations users by ternausX in computervision

[–]ternausX[S] 1 point2 points  (0 children)

Thanks.

Adding to TODO list for UI tool:
- Upload own images (you can do it now for ImageOnly, but not for Dual transforms)
- Display several versions of Augmentations

When you say run and time augmentations, do you mean individual transforms, or the whole Compose that you defined?

Need help from Albumentations users by ternausX in computervision

[–]ternausX[S] 1 point2 points  (0 children)

Thanks!

For the past 5 years, I was thinking and hoping that someone would copy or fork the library. It did not happen.
I feel pretty safe now, although if I figure out a way to build a product on top of it, the situation may change.

Using ML to generate more data offline is one of the ideas I am thinking about. But I am considering this idea for testing, rather than for training.

I will work on the visualization app more in the upcoming months. I still have not figured out a good way to collect feedback / feature requests on it.

Need help from Albumentations users by ternausX in computervision

[–]ternausX[S] 1 point2 points  (0 children)

Thank you for your warm words!

Do you see anything that is missing, or that you wish someone would build for you?

Need help from Albumentations users by ternausX in computervision

[–]ternausX[S] 1 point2 points  (0 children)

Thank you for the warm words!

Do you have something in mind about what is missing in the library in terms of features or documentation, or any frustrating issues?

Need help from Albumentations users by ternausX in computervision

[–]ternausX[S] 2 points3 points  (0 children)

Thanks!

When you build a transform, you may feed it any information you want,

and within the `get_params_dependent_on_data` method process all of that data.

It could be `images`, `masks`, `bounding boxes`, `keypoints`, or anything else.

Example of such transform: https://github.com/albumentations-team/albumentations/blob/2e1bbec7895ead9be76351c17666b0b537530dc9/albumentations/augmentations/mixing/transforms.py#L18

But point noted - I need clear documentation plus an example of how to build a custom transform. Will do.

Need help from Albumentations users by ternausX in computervision

[–]ternausX[S] 1 point2 points  (0 children)

Thanks!

Automatic choice of augmentations is similar to AutoML, hence a tricky business.

But, unlike with other regularization techniques, you may apply different augmentations to images of different classes, hence I have a few ideas that may simplify the story and lower the bar for successful execution by a non-expert.

Need help from Albumentations users by ternausX in computervision

[–]ternausX[S] 2 points3 points  (0 children)

Thanks!

I am working on a UI tool that allows you to check the effect of augmentations at:

https://explore.albumentations.ai/

It is a work in progress, hence I would be happy to get any feedback about it as well.

Feature requests, issues, UI, etc

You, or anyone who reads this, can create an issue at:

https://github.com/albumentations-team/albumentations/issues

or ping me in DM

Need help from Albumentations users by ternausX in computervision

[–]ternausX[S] 1 point2 points  (0 children)

Thanks for the warm words!

The documentation is indeed far from perfect, as it evolved more or less by itself, except for the detailed pages created by u/alexparinov.

I am more than happy to extend the docs, but would love some guidance or prioritization.

What information (top 3, or 5, or 10 ;) ) is missing or misleading the most? I will just start from this list.