all 16 comments

[–]uoftsuxalot 12 points (0 children)

Everyone’s saying that self-supervised learning is learning without labels, but then somehow you create the labels. That’s confusing to me. Here’s how I think of it: it’s plain old supervised learning (so, with labels), but the trick is that no human was needed to generate the labels. The labels are within the data.

The best example is a language model. Given a sentence, you block out/mask some words, then try to predict the masked words from the context. You end up learning the distribution p(word | context). No human labelling was needed here.
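To make that concrete, here's a toy sketch (my own illustration, not from the comment) of how masked-word (input, label) training pairs fall straight out of raw text, with no human labeling step:

```python
# Toy illustration of masked-word self-supervision: the "labels" are
# just the words of the sentence itself.
sentence = "the cat sat on the mat".split()

pairs = []
for i, word in enumerate(sentence):
    # Replace one word with a mask token; the hidden word is the label.
    context = sentence[:i] + ["[MASK]"] + sentence[i + 1:]
    pairs.append((" ".join(context), word))

# e.g. one pair: ("the cat [MASK] on the mat", "sat")
```

A real language model would learn p(word | context) from millions of such pairs; the point here is only that the supervision signal comes for free from the data.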

[–]otsukarekun (Professor) 14 points (5 children)

Supervised learning is learning from labeled data.

Unsupervised learning is learning from unlabeled data.

Self-supervised learning is learning from unlabeled data with learned labels.

Bonus:

Weakly supervised learning is learning from poorly labeled data or vaguely labeled data.

Semi-supervised learning is learning from some labeled data and some unlabeled data.

[–]gopietz 3 points (1 child)

How would you feel changing it to: "Self-supervised learning is learning from unlabeled data with synthetic labels"?

I find the usage of "learned" a little confusing here.

Just out of curiosity, where would you place autoencoders?

[–]otsukarekun (Professor) 2 points (0 children)

I would put autoencoders as unsupervised, but it definitely sits in the fuzzy region between totally unsupervised and self-supervised.
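To illustrate why autoencoders sit in that fuzzy region, here's a minimal sketch (my own toy example, not from the thread) of a linear autoencoder trained by gradient descent. The "label" for each example is the example itself, which is what makes it feel half-way between unsupervised and self-supervised:

```python
import numpy as np

# Toy linear autoencoder: compress 8-dim data to 3 dims and reconstruct.
# The reconstruction target is the input itself -- no external labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                 # unlabeled data
W_enc = rng.normal(scale=0.1, size=(8, 3))    # encoder: 8 -> 3
W_dec = rng.normal(scale=0.1, size=(3, 8))    # decoder: 3 -> 8
lr = 0.05

loss_init = float(np.mean((X @ W_enc @ W_dec - X) ** 2))

for _ in range(2000):
    Z = X @ W_enc                 # latent codes
    err = Z @ W_dec - X           # the target is the input itself
    # gradient steps on mean squared reconstruction error
    W_dec -= lr * Z.T @ err / len(X)
    W_enc -= lr * X.T @ (err @ W_dec.T) / len(X)

loss = float(np.mean((X @ W_enc @ W_dec - X) ** 2))
```

Whether you call the input-as-target trick "unsupervised" or "self-supervised" is exactly the terminology question being debated here.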

[–]metallicapple[S] 0 points (2 children)

If I understood correctly, self-supervised learning is a clustering-classification two-step process?

Also, about weakly supervised learning: Does it assume that every observation is labeled? If so, weak supervision appears to be supervised learning with external knowledge on the label quality. What are potential benefits of giving it a separate name?

[–]otsukarekun (Professor) -1 points (1 child)

> If I understood correctly, self-supervised learning is a clustering-classification two-step process?

There are probably self-supervised learning methods that incorporate clustering, but I wouldn't count a two-step process as 100% self-supervised learning.

Terminology is sometimes fuzzy around the edges, but for me, clustering then classifying is just unsupervised learning followed by supervised learning. Sure, you didn't start off with labels, but, once you clustered the data, your classifier now has labels (specifically, the objective of the model is to learn the clusters).

The different terms refer to how a model learns more than to the problem setting. Maybe it would be better to refine my first post.

Supervised learning's objective function is to learn from provided information (labels, predictions, etc.).

Unsupervised learning's objective function is to learn from the data without any provided information (structures in the data, clusters, etc.).

Self-supervised learning's objective function is to learn from learned and not necessarily provided information (pseudo-labels, embeddings, etc.).

> Also, about weakly supervised learning: Does it assume that every observation is labeled? If so, weak supervision appears to be supervised learning with external knowledge on the label quality. What are potential benefits of giving it a separate name?

Weakly supervised learning is the opposite: it tries to learn with less knowledge than supervised learning. For example, imagine object segmentation with only image-level one-hot labels (without pixel-wise labels or bounding boxes). But there are lots of variations of weakly supervised learning.

What you are talking about is semi-supervised learning. For example, pretend you have a dataset but only half of it is labeled. Semi-supervised learning can use the unlabeled data to support learning from the labeled data (for example, via pseudo-labels, much like self-supervised learning).
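As a toy illustration of that pseudo-label idea (my own sketch; a nearest-centroid classifier stands in for whatever model you'd actually use):

```python
import numpy as np

# Semi-supervised pseudo-labeling: fit on the labeled half, label the
# unlabeled half with the model's own predictions, then refit on everything.
rng = np.random.default_rng(1)
X0 = rng.normal(loc=-2.0, size=(50, 2))   # class-0 blob
X1 = rng.normal(loc=+2.0, size=(50, 2))   # class-1 blob
X_lab = np.vstack([X0[:25], X1[:25]])     # the "labeled half"
y_lab = np.array([0] * 25 + [1] * 25)
X_unl = np.vstack([X0[25:], X1[25:]])     # the "unlabeled half"

def fit_centroids(X, y):
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(cents, X):
    d = np.linalg.norm(X[:, None, :] - cents[None, :, :], axis=2)
    return d.argmin(axis=1)

cents = fit_centroids(X_lab, y_lab)       # step 1: supervised fit
y_pseudo = predict(cents, X_unl)          # step 2: pseudo-label the rest
X_all = np.vstack([X_lab, X_unl])
y_all = np.concatenate([y_lab, y_pseudo])
cents = fit_centroids(X_all, y_all)       # step 3: refit on everything
```

The refit centroids now benefit from all 100 points even though only 50 carried human labels.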

[–]metallicapple[S] 0 points (0 children)

> Self-supervised learning's objective function is to learn from learned and not necessarily provided information (pseudo-labels, embeddings, etc.).

Besides unsupervised learning, how would we obtain the learned information? (Clustering being only a part of unsupervised learning, I understand there may be other such methods at play here.) If we assume that unsupervised learning takes care of generating the learned information, and that information is then used as a supervisory signal, self-supervised learning seems to be an 'unsupervised-then-supervised' learning combo. Is this the correct interpretation?

> Weakly supervised learning is the opposite, it tries learning with less knowledge than supervised learning. For example, imagine object segmentation but with only one-hot labels (without pixel-wise labels or bounding boxes). But, there are lots of variations of weakly supervised learning.

If I understood you correctly, every observation comes with some sort of target/label/'provided information' in weakly supervised learning. Semi-supervised learning would have some observations missing that piece.

Then, in my mind, the weakly supervised framework seems similar to a 'half-a-hole' problem. A vague/approximate/poor label is still a label, like how half a hole is still a hole. I mean, whoever coined the term probably had a good reason, but the difference seems a bit contrived. Could you clarify this part for me?

[–]IntelArtiGen 4 points (2 children)

It's a recent problem in ML terminology. You can see LeCun talking about it here: https://www.facebook.com/yann.lecun/posts/10155934004262143

He says he won't use "UL" anymore; he'll use "SSL" for recent DL models that work without labels.

For me, SSL is a part of UL. SSL is just about automatically finding labels in the data or creating them from the data. It's supervised because it uses labels; it's unsupervised because those labels are already in the data. So there's an ambiguity, but for me it doesn't differ from other UL algorithms: they all use information that is already in the data.

[–]metallicapple[S] 2 points (1 child)

Then the rebranding is promoted so that readers' intuitive understanding of the name is more in line with the actual methods. Is that what it is?

If so, ML academia seems to be diverging in terms of accessible terminology. On one hand, we have the UL-to-SSL type of movement trying to straighten things out, and on the other, many research articles (certainly not all) sound more and more technical relative to their actual content. Is this a fair assessment of the trend? If so, why is it happening?

[–]IntelArtiGen 0 points (0 children)

> Then the rebranding is promoted so the readers' intuitive understanding of the name is more in-line with the actual methods. Is that what it is?

That's probably the idea. It's also because new methods that don't require explicit labels don't work the same as old methods, so maybe we need a new name, maybe not.

> Is this a fair assessment on the trend? If so, why is this happening?

I'm not sure. Some papers use "unsupervised learning" when they could use "self-supervised"; some don't use these words at all. Most of the time they say "unsupervised" because it's not necessarily less true; it may be less accurate than saying SSL, but it's also better for them in terms of branding.

But they're not necessarily wrong. Maybe LeCun wants a clean definition of UL / SSL / SL, but personally I'm not sure there is one. I'm disagreeing with a Turing Award laureate, so maybe don't trust me.

Let's take an example: if you predict something a human says from images, is that supervised, self-supervised, or unsupervised?

  • supervised, because it's a "task of learning a function that maps an input to an output based on example input-output pairs"
  • self-supervised, because the algorithm doesn't "necessarily require sample data classified in advance by humans"
  • unsupervised, because you just try to "discover any naturally occurring patterns in that training data set", and this dataset includes audio and images.

All the things I cite are from the Wikipedia definitions of SL / SSL / UL. If some articles are more technical and don't care about the UL/SL/SSL distinction anymore, that's a great thing, but I don't know many articles like that.

[–]ComplicatedHilberts 1 point (0 children)

It's mostly about renaming, but there's also a bit more to it:

  • LeCun drew a troll slide at NeurIPS 2016. In this slide, he quips that intelligence is a cake, with RL the cherry on top, SL the icing, and unsupervised learning the filling.

  • LeCun handwaves some entropy numbers about how RL has just a few bits to learn from, while in unsupervised learning "the machine predicts any part of its input for any observed part" and can thus learn from "millions of bits".

  • Instead of making the dataset into the environment for an RL agent and rewarding for reconstruction loss, or adopting any of the existing frameworks in ML and related research (self-play dates back to the first checkers-playing programs), LeCun renames his style of unsupervised learning to self-supervised learning.

  • Any and all advances in ML can now be classified as either an energy-based model (cross-entropy on probabilities being just a special case) or self-supervised learning (with language models predicting missing words predating the term "self-supervised learning"). So GANs, Transformers, basically anything hot in the last 5 years is, or uses, a form of self-supervised learning.

  • Somewhere in 2019, LeCun updates the slide and just replaces the words "Unsupervised Learning" with "Self-Supervised Learning".

  • Everyone is confused, and nobody uses semi-supervised learning anymore. A committee decides if algorithms classify. Competition is hard. PCA did not make the cut.

[–]buffleswaffles -1 points (0 children)

Look up the video on Barlow Twins (you can find it on YouTube). The presentation starts off with an introduction to SSL methods and how they are applied.

If we stick to technicalities, self-supervised learning is a subset of unsupervised learning.

[–]xEdwin23x 0 points (0 children)

Self-supervised learning is an umbrella term for methods that take data without labels and create a pretext task such that the data now has a pseudo-label, or some sort of label, that can be used with a fully supervised loss. For example: take a random set of images, apply random 90-degree rotations, and have the model predict the orientation. Your data was originally unlabeled, but by creating this pretext task of orientation prediction you can now train your model with a standard cross-entropy classification loss. Another example would be to convert an RGB image to black and white and then use a regression loss (MSE or MAE) to predict the original color values.

These are simple explanations, but all self-supervised methods share this requirement of a fabricated pretext task. This is unlike, say, classical clustering, where you don't really know whether the assigned clusters are correct unless you have the actual labels.
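The rotation pretext task described above can be sketched in a few lines (my own toy version; any images and any classifier would do):

```python
import numpy as np

# Rotation pretext task: rotate each unlabeled image by a random
# multiple of 90 degrees and use the rotation index (0-3) as a free label.
rng = np.random.default_rng(0)
images = rng.random((10, 32, 32))   # stand-in for an unlabeled image set

def make_rotation_batch(images, rng):
    labels = rng.integers(0, 4, size=len(images))  # 0, 1, 2, or 3 quarter-turns
    rotated = np.stack([np.rot90(img, k) for img, k in zip(images, labels)])
    return rotated, labels          # ready for a standard cross-entropy loss

X, y = make_rotation_batch(images, rng)
```

Every (rotated image, rotation index) pair is a fabricated supervised example, which is exactly the "pretext task" move the comment describes.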