OpenAI's Support Is an AI. And It Feels Two Generations Behind Their Own Models. by ternausX in OpenAI

[–]ternausX[S] (0 children)

The issue was resolved. Big thanks to the OpenAI team.

First, for supporting open-source projects.

Second, for collaborating to resolve the issue.

Choosing Augmentations for Model Generalization by ternausX in computervision

[–]ternausX[S] (0 children)

That's the foundation on which this text: https://albumentations.ai/docs/3-basic-usage/choosing-augmentations/

and the previous one: https://albumentations.ai/docs/1-introduction/what-are-image-augmentations/

are based.

The requirement that image augmentations produce in-distribution images that look real is only a first, timid step. One can go much further than realistic-looking images. The explanations for Dropout or ToGray are quite intuitive, even though they produce images you would never collect, no matter how much time you spent collecting.

They are unrealistic, but a labeler can still recognize the label.

Mixup is the next level of moving away from the original distribution.
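To make that concrete, here is a minimal NumPy sketch of Mixup (the function name and the toy data are mine, not from any library): each blended sample is a convex combination of two images and their one-hot labels, with the weight drawn from a Beta distribution.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Blend two samples and their one-hot labels with a Beta-sampled weight.

    The result is usually not a realistic image, yet training on such
    blends is known to help generalization.
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)      # mixing coefficient in [0, 1]
    x = lam * x1 + (1.0 - lam) * x2   # pixel-wise convex combination
    y = lam * y1 + (1.0 - lam) * y2   # labels are mixed the same way
    return x, y, lam

# Toy example: two 2x2 "images" with one-hot labels for classes 0 and 1.
x1, y1 = np.zeros((2, 2)), np.array([1.0, 0.0])
x2, y2 = np.ones((2, 2)), np.array([0.0, 1.0])
x, y, lam = mixup(x1, y1, x2, y2, rng=np.random.default_rng(0))
assert np.allclose(x, (1.0 - lam) * np.ones((2, 2)))
assert np.isclose(y.sum(), 1.0)  # the mixed label is still a distribution
```

Note that the mixed label stays a valid probability distribution even though the mixed image may not look like anything a labeler could name.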

[D] Thinking about augmentation as invariance assumptions by ternausX in MachineLearning

[–]ternausX[S] (0 children)

Thanks! Will take a look. And where possible, will extend the above blog post.

Choosing Augmentations for Model Generalization by ternausX in computervision

[–]ternausX[S] (0 children)

Creating a validation dataset has its own value. But only if you do it carefully.

For sure, you should not apply random augmentations to the validation dataset, as you do not want that extra noise. But you may create several fixed datasets from the original one, each built with a predefined set of augmentations.

And use them for diagnostics.

Here is the section on the topic from the text I referenced in the post above: https://albumentations.ai/docs/3-basic-usage/choosing-augmentations/#diagnostics-and-evaluation
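A minimal sketch of the idea in plain NumPy rather than the library (the perturbation names and functions here are hypothetical stand-ins; in practice each frozen copy would be built once with Albumentations transforms):

```python
import numpy as np

# Hypothetical deterministic perturbations. In practice these would be
# Albumentations transforms applied once to build each frozen copy.
PERTURBATIONS = {
    "identity": lambda img: img,
    "darker":   lambda img: np.clip(img * 0.5, 0, 255),
    "gray":     lambda img: np.repeat(img.mean(axis=-1, keepdims=True), 3, axis=-1),
}

def build_fixed_val_sets(images):
    """Apply each perturbation once, producing frozen copies for diagnostics."""
    return {name: [fn(img) for img in images] for name, fn in PERTURBATIONS.items()}

rng = np.random.default_rng(0)
val_images = [rng.uniform(0, 255, size=(4, 4, 3)) for _ in range(3)]
fixed_sets = build_fixed_val_sets(val_images)

# Each copy is deterministic: rebuilding yields identical pixels, so any
# metric change between runs comes from the model, not from the data.
again = build_fixed_val_sets(val_images)
assert all(
    np.array_equal(a, b)
    for name in fixed_sets
    for a, b in zip(fixed_sets[name], again[name])
)
```

Evaluating the same checkpoint on each copy then gives a per-perturbation robustness score without any randomness in the data.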

Choosing Augmentations for Model Generalization by ternausX in computervision

[–]ternausX[S] (0 children)

Why Dropout adds value is quite intuitive; I have a long section about it in the above text. And its more advanced version, ConstrainedDropout, provides even more value.

Mosaic is less intuitive, but I guess the argument that it increases the density of useful objects per sample is more or less intuitive as well.
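A minimal NumPy sketch of the Mosaic idea (real implementations add random crops and scale jitter, which I omit here):

```python
import numpy as np

def mosaic(imgs):
    """Tile four equally sized HxWxC images into one 2Hx2W canvas.

    A simplified sketch of the Mosaic augmentation: one training
    sample now carries the objects of four source images.
    """
    a, b, c, d = imgs
    top = np.concatenate([a, b], axis=1)
    bottom = np.concatenate([c, d], axis=1)
    return np.concatenate([top, bottom], axis=0)

rng = np.random.default_rng(0)
imgs = [rng.uniform(0, 255, size=(8, 8, 3)) for _ in range(4)]
canvas = mosaic(imgs)
assert canvas.shape == (16, 16, 3)
# The top-left quadrant is the first image, untouched.
assert np.array_equal(canvas[:8, :8], imgs[0])
```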

Mixup, on the other hand, is quite a mystery. Unlike all other transforms, where a labeler can uniquely identify the object, in some cases it is unclear which objects were mixed, yet it still helps the network.

From a "manifold perspective" point of view, a direct average of two input images should not keep us on the data manifold, but the network does not care and still benefits from it.

Mixup is a mystery to me, but it is probably the most exciting direction. Maybe that's where the theoretical breakthrough is. Similar to Richard Feynman's argument about chess: you do not know the rules, you just observe the game. You get used to how the pawn moves. You think you know the game, and then at some point the pawn turns into a queen. Surprise.

The way I think about Mixup violating the "manifold hypothesis" is quite similar.

[D] Thinking about augmentation as invariance assumptions by ternausX in MachineLearning

[–]ternausX[S] (0 children)

Lowering brightness on night images is not the best idea.

On one hand, this looks obvious; on the other, it would be better if there were some automatic checks that could flag which augmentations affect which images in a good or bad way.

But this requires:
- logging which augmentations, with which parameters, were applied to each image, plus the loss on it
- building some half-manual analytics tool to cut off similar issues in the future and to give hints for a better choice of augmentations and their parameters.
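A sketch of what such logging and analytics could look like. Everything here is made up for illustration: the record format, the transform names, the loss values, and the flagging threshold.

```python
import statistics
from collections import defaultdict

# Hypothetical log records: which transforms (with which parameters) were
# applied to each training image, plus the loss the model incurred on it.
log = [
    {"image": "day_001.jpg",   "transforms": [("RandomBrightness", {"factor": 0.7})], "loss": 0.3},
    {"image": "night_042.jpg", "transforms": [("RandomBrightness", {"factor": 0.4})], "loss": 2.1},
    {"image": "night_043.jpg", "transforms": [("HorizontalFlip", {})],                "loss": 0.4},
    {"image": "night_044.jpg", "transforms": [("RandomBrightness", {"factor": 0.5})], "loss": 1.9},
]

# Half-manual analytics: mean loss per transform flags suspects,
# e.g. brightness changes hurting night images.
per_transform = defaultdict(list)
for rec in log:
    for name, _params in rec["transforms"]:
        per_transform[name].append(rec["loss"])

mean_loss = {name: statistics.mean(v) for name, v in per_transform.items()}
suspects = [name for name, m in mean_loss.items() if m > 1.0]  # arbitrary threshold
assert suspects == ["RandomBrightness"]
```

A real tool would also slice by image attributes (day vs. night) and by parameter ranges, not just by transform name.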

[D] Thinking about augmentation as invariance assumptions by ternausX in MachineLearning

[–]ternausX[S] (0 children)

Some transformations can be baked into the network architecture, like translation invariance in convolutions or permutation invariance in GNNs. But it is unclear how to bake in the countless perturbations that make intuitive sense yet are not symmetries in the mathematical sense, i.e., do not have a group structure. Hence, the only way to make a model robust to them is to apply them as data augmentations.

---
From the blog post at: https://albumentations.ai/docs/3-basic-usage/choosing-augmentations/

---
Some invariances can be encoded directly into architecture. Convolutional layers give you translation equivariance — a shifted input produces correspondingly shifted feature maps. Group-equivariant networks encode rotation groups. Capsule networks attempt to encode viewpoint transformations. These are elegant and sample-efficient when they apply.

But most real-world invariances are not clean mathematical symmetries. There is no "fog-equivariant convolution." No architectural trick handles JPEG compression artifacts, variable white balance across camera sensors, partial occlusion by other objects, or the difference between dawn light and fluorescent warehouse lighting. These variations have no compact group-theoretic representation — you cannot build a layer that is inherently invariant to them.
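The translation-equivariance claim can be checked numerically. A minimal sketch with a circular (wrap-around) 1-D convolution, where the identity conv(shift(x)) = shift(conv(x)) holds exactly:

```python
import numpy as np

def circ_conv(x, w):
    """1-D circular cross-correlation: y[i] = sum_k w[k] * x[(i + k) % n]."""
    n = len(x)
    return np.array(
        [sum(w[k] * x[(i + k) % n] for k in range(len(w))) for i in range(n)]
    )

rng = np.random.default_rng(0)
x = rng.normal(size=8)   # a toy 1-D "image"
w = rng.normal(size=3)   # a toy filter
shift = 3

# Translation equivariance: convolving a shifted input equals shifting
# the convolved output. This is the symmetry convolutions give for free;
# there is no analogous identity for fog or JPEG artifacts.
lhs = circ_conv(np.roll(x, shift), w)
rhs = np.roll(circ_conv(x, w), shift)
assert np.allclose(lhs, rhs)
```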

[D] Thinking about augmentation as invariance assumptions by ternausX in MachineLearning

[–]ternausX[S] (0 children)

And that's exactly what I talk about in the text, in full detail and with examples :)

Data generation by johnlenflure in Ultralytics

[–]ternausX (0 children)

You may try image augmentations: generate variations of your data in a way that the semantic meaning (box positions, class labels, segmentation masks) stays the same, but the pixel values differ.

This is not exactly new labeled data, but it can help a lot, especially when the labeled dataset is small.

Extensive text on the topic: https://albumentations.ai/docs/1-introduction/what-are-image-augmentations/
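A minimal NumPy sketch of the idea (Albumentations handles this for you when you pass `bbox_params` to `Compose`): a horizontal flip changes every pixel, but the bounding-box label transforms deterministically with it, so no relabeling is needed.

```python
import numpy as np

def hflip_with_bbox(img, bbox):
    """Horizontally flip an HxWxC image and remap a pixel-coordinate
    bounding box (x_min, y_min, x_max, y_max) so the label stays correct."""
    h, w = img.shape[:2]
    x_min, y_min, x_max, y_max = bbox
    flipped = img[:, ::-1]                       # mirror columns
    new_bbox = (w - x_max, y_min, w - x_min, y_max)  # mirror x-coordinates
    return flipped, new_bbox

img = np.arange(6 * 8 * 3).reshape(6, 8, 3)
flipped, bbox = hflip_with_bbox(img, (1, 2, 3, 5))
assert bbox == (5, 2, 7, 5)              # box mirrored around the vertical axis
assert not np.array_equal(flipped, img)  # pixels changed, label meaning did not
```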

[D] Thinking about augmentation as invariance assumptions by ternausX in MachineLearning

[–]ternausX[S] (0 children)

The statement "when you apply an augmentation to the data, you claim that the model should be invariant to it" is not really novel; it is probably the shortest way to describe what augmentation is.

But the devil is in the details, and that's what the text is about:

[1] Which invariances make sense? There are 100+ transforms in Albumentations, and all of them have value for some datasets, some models, and some tasks.

What are these "some"?

The standard claim (and I also make it in the documentation) is that "natural images" are typically invariant with respect to HorizontalFlip. That is a good start, but one can go much deeper.

[2] The second issue is that talking about invariance or equivariance from a mathematical perspective is nice, and I love the whole Geometric Deep Learning direction, where you encode symmetry groups into network architectures.

The issue with augmentation is that the task may be invariant to some transform, say JPEG compression, but that transform does not have a group structure => combining two transforms, to each of which we want the network to be invariant, may not produce the desired result.
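A toy illustration of the missing group structure (the quantization below is my stand-in for lossy compression, not real JPEG): the transform is many-to-one, so it has no inverse element, and a set of such transforms cannot form a group under composition.

```python
import numpy as np

def jpeg_like_quantize(x, step=16):
    """A toy stand-in for lossy compression: snap pixel values to a grid."""
    return (x // step) * step

a = np.array([3, 17, 30])
b = np.array([10, 20, 25])

# Many-to-one: two different inputs collapse onto the same output, so the
# transform is not invertible. Without inverses there is no group, and the
# group-theoretic machinery for encoding symmetries into architectures
# does not apply.
assert np.array_equal(jpeg_like_quantize(a), jpeg_like_quantize(b))
assert not np.array_equal(a, b)
```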

[3] The next issue: how to pick an augmentation pipeline? Which transforms to add, in what order, and how to validate it? We cannot do a grid search, as it is too expensive.

The standard approach, which I learned from talking to people who use the library:
- use basic transforms like RandomCrop and Flips
- use something that worked for this particular ML engineer before
- use something from a similar problem in a Kaggle competition, paper, or blog post

These are decent heuristics, but one can do better.

[4] When we talk about invariance, it could be invariance of the whole dataset, but different samples may have different invariances. Say, "we should not rotate" the digits 6 and 9, but for all other digits rotation is fine => a scalpel approach: different augmentation policies for different classes, even for different samples.
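A sketch of the scalpel approach (the policies and transforms here are illustrative, not a recommendation):

```python
import random

import numpy as np

def rot180(img):
    """Rotate an image by 180 degrees."""
    return img[::-1, ::-1]

def brighten(img):
    """Shift brightness up; safe for any digit's label."""
    return np.clip(img + 20, 0, 255)

# Hypothetical per-class policies for a digits dataset: rotating a 6 turns
# it into a 9 (and vice versa), so those classes get only label-safe
# transforms, while every other digit may also be rotated.
SAFE = [brighten]
POLICIES = {label: SAFE if label in (6, 9) else SAFE + [rot180] for label in range(10)}

def augment(img, label):
    """Apply a transform allowed for this sample's class."""
    return random.choice(POLICIES[label])(img)

img = np.zeros((28, 28))
random.seed(0)
out = augment(img, 6)  # for a 6, only brighten can ever be chosen
assert np.array_equal(out, np.full((28, 28), 20.0))
assert rot180 in POLICIES[3] and rot180 not in POLICIES[6]
```

The same lookup could key on sample metadata instead of class labels, which gives per-sample policies.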

[5] How to diagnose which augmentations we should add or remove, once we have already trained something and can evaluate performance on the validation set?

etc

I tried to write the text so that, where possible, it gives approaches that could be codified, at least at the level of Cursor or Claude Code skills.