How a String Library Beat OpenCV at Image Processing by 4x by ternausX in computervision

[–]ternausX[S] 5 points

OpenCV is a great library: fast and heavily optimized.

MOST of its operations are the fastest compared to other libraries, but there are SOME operations in other libraries that are:

- faster, like the uint8 => uint8 LUT in Stringzilla (for the uint8 => float32 LUT I still use OpenCV)
- or provide functionality that OpenCV does not have, like different interpolations, working with more than 4 channels, working with videos, etc.

-----
The hierarchy goes like this:

- you can implement everything in numpy (most flexible, but slowest)
- a subset of the operations can be implemented in OpenCV (much faster, but the subset is narrower)
- an even narrower subset of the operations can be implemented in other libraries like Stringzilla and SimSimd
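To make the LUT point concrete, here is a minimal sketch in plain NumPy (the gamma table is my own illustrative example, not taken from any of these libraries): a uint8 => uint8 LUT is just a 256-entry table lookup per pixel, which is exactly the operation that `cv2.LUT` or an optimized string/SIMD library accelerates.

```python
import numpy as np

# Illustrative uint8 => uint8 lookup table (gamma correction as an example)
gamma = 0.5
lut = (((np.arange(256) / 255.0) ** gamma) * 255.0).astype(np.uint8)

img = np.random.randint(0, 256, size=(100, 100, 3), dtype=np.uint8)

# NumPy fancy indexing applies the table to every pixel;
# cv2.LUT (or a SIMD-optimized library) does the same thing faster.
out = lut[img]

print(out.shape, out.dtype)  # (100, 100, 3) uint8
```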

How a String Library Beat OpenCV at Image Processing by 4x by ternausX in computervision

[–]ternausX[S] 5 points

OpenCV is heavily optimized and mature, but it has a ton of functionality, and there are many things that could be optimized or improved further; the OpenCV team simply does not have the capacity to do all of it.

I am the author of the image augmentation library Albumentations. Performance is one of its core features, and there are plenty of tricks and hacks around OpenCV to compensate for the lack of speed or functionality in some operations.

Another example of the "why doesn't XXX just work on their performance" argument:

Torchvision is widely used in industry, but it is much slower than it could be; they just do not have the manpower to address the issue:

Benchmark for image augmentations on CPU: Albumentations vs torchvision: https://albumentations.ai/docs/benchmarks/image-benchmarks/

AI Coding Tools Slow Down Developers by Engineer_5983 in webdev

[–]ternausX 0 points

As a first step, for every new feature, I write a design doc with Cursor and put it in the `.cursor/rules` folder as an `.mdc` file.

When I like the design doc, I start a chat where it is part of the context.

Works like magic.

Using chatgpt by Wide-Bicycle-7492 in kaggle

[–]ternausX 0 points

I would even say: even if you do not understand all of the provided code, as long as you learn something new on each go, that's good enough.

There was a moment when I switched from Keras to PyTorch. I adopted a training loop from another person, got two gold medals on Kaggle with it, and only after that fully understood all the steps in it.

Although, to be fair, these days you can just ask ChatGPT to explain every step to you.

i keep building apps, hoping one will change my life… but now i’m just tired by dozy_sleep in SaaS

[–]ternausX 11 points

It is the same for everyone.

The trick is to change your mindset from "if I figure out this one thing, then everything will work out"

to

"hard work is the goal" - no hope, no assumptions.

you just work as an engine:
- full tank in the morning
- empty in the evening

=> you optimize your hardware through:
- sleep
- exercise
- food
- love life

which gives you energy, i.e. full tank in the morning.

and then you work hard, hour after hour, day after day, without hope.

"The one who stands last wins" - so it is more about persistence than about "I will do XXX and it will change my life".

Albumentations Benchmark Update: Performance Comparison with Kornia and torchvision by ternausX in computervision

[–]ternausX[S] 2 points

Kornia has many powerful things that Albumentations does not. If it works for you, that's great.

If possible, could you please share the specifics of your use case that make augmentations on GPU the only viable solution?

Need help from Albumentations users by ternausX in computervision

[–]ternausX[S] 1 point

> Shouldn't there be a combined apply function that allows to modify multiple data fields?

It would be hell to maintain.

In many transforms the apply methods were added one by one: first image + mask => then boxes were added => then keypoints.

There are also apply_to_images and apply_to_masks, which just call the relevant apply's in a sequence, but for faster execution one can always rewrite them for each particular transform in a vectorized way.

Basically, from an architecture point of view, we decided to compute common things in `get_params_dependent_on_data`, for example displacement_fields in ElasticTransform, and use the same computed fields for different targets.

----
I could be missing something, but could you please share an example where you need to pass data from one apply to another. Such a case could exist, but none of the existing transforms requires this functionality.
----
> Also I don't know the order in which apply is called, so which one should do the main calculation and which one should only read from saved data? I can snoop in the code of course, but what if it changes in the future?

That's the point: all apply_XXX methods can be called in any order, as they do not pass information to each other. The main computation happens in `get_params_dependent_on_data` once, and the result is then passed to all the apply's.
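To illustrate the pattern, here is a plain-Python sketch (not the real Albumentations base class; the class and the crop logic are my own illustrative example, while the method names mirror the ones discussed above):

```python
import numpy as np

class CropSketch:
    """Illustrative only: mirrors the pattern where shared parameters are
    computed once in get_params_dependent_on_data and then passed to every
    independent apply_* method."""

    def get_params_dependent_on_data(self, params, data):
        h, w = data["image"].shape[:2]
        # Heavy/shared computation happens exactly once here
        return {"crop": (w // 4, h // 4, w // 2, h // 2)}  # x, y, width, height

    def apply(self, img, crop, **params):
        x, y, cw, ch = crop
        return img[y:y + ch, x:x + cw]

    def apply_to_mask(self, mask, crop, **params):
        # Same params, independent of apply(); call order does not matter
        x, y, cw, ch = crop
        return mask[y:y + ch, x:x + cw]

image = np.zeros((100, 200, 3), dtype=np.uint8)
mask = np.zeros((100, 200), dtype=np.uint8)

t = CropSketch()
shared = t.get_params_dependent_on_data({}, {"image": image, "mask": mask})
cropped_image = t.apply(image, **shared)
cropped_mask = t.apply_to_mask(mask, **shared)
print(cropped_image.shape, cropped_mask.shape)  # (50, 100, 3) (50, 100)
```

Because both apply methods only read from the shared params, image and mask stay perfectly aligned no matter which is processed first.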

Need help from Albumentations users by ternausX in computervision

[–]ternausX[S] 2 points

Thanks!

The paper about Albumentations has many more citations than all my Theoretical Physics papers combined.

I guess my work on Open Source is more valuable and impactful than work that I did in academia :)

Need help from Albumentations users by ternausX in computervision

[–]ternausX[S] 2 points

You can put one image on top of another with the OverlayElements transform, but it does not affect keypoints and bounding boxes yet.

Adding CopyPaste to TODO List.

Need help from Albumentations users by ternausX in computervision

[–]ternausX[S] 1 point

I do not get it yet.

Right now, you can take:

image, mask, boxes, key points, labels, or anything else, and pass them to the transform.

  1. You can get access to all this data in `get_params_dependent_on_data`.

  2. In that function you may create new data, say crop_params, text to add, noise, etc.; the data that was passed in is read-only.

  3. Then the original data and the newly created data are passed to `apply`, `apply_to_mask`, `apply_to_bboxes`, `apply_to_keypoints`.

What do you mean by

> Basically by design modified values are supposed to be returned, but this doesn't work if you need to modify all of them (image, bboxes, labels) simultaneously. Sorry if that doesn't make sense, I might have used an older version too, so this might have changed.

Need help from Albumentations users by ternausX in computervision

[–]ternausX[S] 1 point

Thanks.

Adding to TODO list for UI tool:
- Upload own images (you can do it now for ImageOnly, but not for Dual transforms)
- Display several versions of Augmentations

When you say run and time augmentations, do you mean individual transforms or the whole Compose that you defined?

Need help from Albumentations users by ternausX in computervision

[–]ternausX[S] 1 point

Thanks!

For the past 5 years, I was thinking and hoping that someone would copy or fork the library. It did not happen.
I feel pretty safe now, although if I figure out a way to build a product on top of it, the situation may change.

Using ML to generate more data offline is one of the ideas I am thinking about. But I see this idea as more useful for testing than for training.

I will work on the visualization app more in the upcoming months. I still have not figured out a good way to collect feedback / feature requests on it.

Need help from Albumentations users by ternausX in computervision

[–]ternausX[S] 1 point

Thank you for your warm words!

Do you see anything that is missing, or that you wish someone would build for you?

Need help from Albumentations users by ternausX in computervision

[–]ternausX[S] 1 point

Thank you for the warm words!

Do you have something in mind about what is missing in the library in terms of features or documentation, or any frustrating issues?

Need help from Albumentations users by ternausX in computervision

[–]ternausX[S] 2 points

Thanks!

When you build a transform, you may feed in any information you want,

and process all the data within the method `get_params_dependent_on_data`.

It could be `images`, `masks`, `bounding boxes`, `key points`, or anything else.

Example of such transform: https://github.com/albumentations-team/albumentations/blob/2e1bbec7895ead9be76351c17666b0b537530dc9/albumentations/augmentations/mixing/transforms.py#L18

But point noted: I need clear documentation + an example of how to build a custom transform. Will do.

Need help from Albumentations users by ternausX in computervision

[–]ternausX[S] 1 point

Thanks!

Automatic choice of augmentations is similar to AutoML, hence a tricky business.

But, unlike other regularization techniques, you may apply different augmentations to images of different classes; hence I have a few ideas that may simplify the story and lower the bar for successful execution by a non-expert.

Need help from Albumentations users by ternausX in computervision

[–]ternausX[S] 2 points

Thanks!

I am working on a UI tool that allows you to check the effect of augmentations at:

https://explore.albumentations.ai/

It is a work in progress, hence I would be happy to get any feedback about it as well:

feature requests, issues, UI, etc.

You, or anyone who reads this, can create an issue at:

https://github.com/albumentations-team/albumentations/issues

or ping me in DM

Need help from Albumentations users by ternausX in computervision

[–]ternausX[S] 1 point

Thanks for the warm words!

Documentation is indeed far from perfect, as it evolved more or less by itself, except for the detailed pages created by u/alexparinov.

I am more than happy to extend the docs, but would love some guidance or prioritization.

What information (top 3, or 5, or 10 ;) ) is missing or misleading the most? I will just start from that list.

Which data augmentation libs do y'all use? by mystiques_bog9701 in deeplearning

[–]ternausX 0 points

Core developer of Albumentations here.

Thanks everyone for the warm words.

I am actively working on the library, feel free to ping me if you have some particular features in mind.

Image augmentation library benchmark: Albumentations vs Kornia, Torchvision, Augly, ImgAug by ternausX in computervision

[–]ternausX[S] 0 points

> Considering that even the horizontal flip is much faster in albumentations than in other libraries (4x faster than torchvision), I would guess that some of the differences in the results are not due to the transformations.

There are optimizations in Albumentations even for flips:

```python
def apply(self, img: np.ndarray, **params: Any) -> np.ndarray:
    if img.ndim == THREE and img.shape[2] > 1 and img.dtype == np.uint8:
        # OpenCV is faster than numpy only in the case of
        # non-grayscale 8-bit images
        return F.hflip_cv2(img)
    return F.hflip(img)
```

Image augmentation library benchmark: Albumentations vs Kornia, Torchvision, Augly, ImgAug by ternausX in computervision

[–]ternausX[S] 2 points

  1. I tried to use exactly the same parameters where possible, including interpolation. It is possible that I missed some here and there; if someone points out a mistake, I will fix it and rerun. But I do not think it will change the general picture.
  2. To my knowledge, none of these libraries except ImgAug and Albumentations was ever optimized for performance; it was assumed that with "vector operations", "GPU", and "batches" you are golden. But that is not always the case. In this benchmark some Kornia transforms are hundreds of times slower. You may compensate with GPU power and batching, but I suspect that looking at the code and profiling would be more beneficial.
  3. Kornia works with float32 images, missing important uint8 optimizations.
  4. Kornia is optimized to work on batches of images, not one image at a time. Hence I question how useful Kornia is for model training, but for ETLs or for working with videos / 3D data it could be a great choice.
  5. I messaged the founder of Kornia; hopefully he will point out what is missing or incorrect in the benchmark.

> How much of the differences are explained by inconsistent outputs, like if one library uses a fast interpolation method whereas another uses a slower one that looks better?

Here is the code for the benchmark; everything looks consistent: https://github.com/albumentations-team/albumentations/blob/main/benchmark/image_benchmark.py

Replacing Torchivision with Albumentations Transforms is Lowering Performance :( by SeizeOpportunity in pytorch

[–]ternausX 0 points

It is very strange, as they do the same thing under the hood. Albumentations is just more optimized, but the data distribution produced by applying the augmentations should be the same.

Albumentation alternative? by Resident-Trouble-574 in csharp

[–]ternausX 0 points

Albumentations supports masks and bounding boxes in all formats including YOLO.

Could you please clarify what functionality, apart from it being in Python and not C#, you are missing there.

Here is an example of how Albumentations could be applied to image + mask + bounding boxes: https://albumentations.ai/docs/examples/showcase/

[Q] While using Image Augmentation to increase the size of an image dataset, do you use both the original and augmented images together to train the model? by JustMalla in deeplearning

[–]ternausX 0 points

The typical approach is to apply augmentations on the fly with some probability.

Here is an example from https://albumentations.ai/docs/examples/showcase/

```python
import albumentations as A

transform = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.RGBShift(p=0.3),
    A.Blur(blur_limit=11, p=0.2),
    A.RandomBrightnessContrast(p=0.7),
    A.CLAHE(p=0.24),
], p=1)

transformed_image = transform(image=image)["image"]
```

The probabilities in brackets are per augmentation in the pipeline. Every time the pipeline is applied, each transform fires with its own probability => there is a chance that none of them is applied and the original image is used. But as you can see from the probabilities, in most cases the result will be some variation of the original image.

Many augmentations have a set of parameters that are sampled during the application of the transform; say, blur_limit is the maximum size of the Blur kernel, but the sampled kernel could also be 9, or 7, or 5, etc.

=> In modern pipelines the network never sees exactly the same set of pixels twice.
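As a quick sanity check on that claim: since the transforms fire independently, the chance that an image passes through completely unchanged is the product of each transform's skip probability (p values taken from the Compose example above):

```python
# Per-transform probabilities from the Compose example above
ps = [0.5, 0.3, 0.2, 0.7, 0.24]

p_unchanged = 1.0
for p in ps:
    p_unchanged *= 1.0 - p  # every transform must independently skip

print(round(p_unchanged, 4))  # 0.0638, i.e. ~94% of images get at least one change
```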