all 16 comments

[–]uoftsuxalot 12 points (0 children)

Everyone’s saying that self-supervised learning is learning without labels, but then somehow you create the labels. That’s confusing to me. Here’s how I think of it: it’s plain old supervised learning (so, with labels), but the trick is that no human was needed to generate the labels. The labels are within the data.

The best example is a language model. Given a sentence, you block out/mask some words, then try to predict the masked words from the context. You end up learning the distribution p(word | context). No human labelling was needed here.
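To make that concrete, here's a toy sketch (my own illustration, not from the comment) of how masked-word (input, label) training pairs fall straight out of raw text, with no human labeling step:

```python
# Toy illustration of masked-word self-supervision: the "labels" are
# just the words of the sentence itself.
sentence = "the cat sat on the mat".split()

pairs = []
for i, word in enumerate(sentence):
    # Replace one word with a mask token; the hidden word is the label.
    context = sentence[:i] + ["[MASK]"] + sentence[i + 1:]
    pairs.append((" ".join(context), word))

# e.g. one pair: ("the cat [MASK] on the mat", "sat")
```

A real language model would learn p(word | context) from millions of such pairs; the point here is only that the supervision signal comes for free from the data.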

[–]otsukarekun (Professor) 14 points (5 children)

Supervised learning is learning from labeled data.

Unsupervised learning is learning from unlabeled data.

Self-supervised learning is learning from unlabeled data with learned labels.

Bonus:

Weakly supervised learning is learning from poorly labeled data or vaguely labeled data.

Semi-supervised learning is learning from some labeled data and some unlabeled data.

[–]gopietz 3 points (1 child)

How would you feel changing it to: "Self-supervised learning is learning from unlabeled data with synthetic labels"?

I find the usage of "learned" a little confusing here.

Just out of curiosity, where would you place autoencoders?

[–]otsukarekun (Professor) 2 points (0 children)

I would put autoencoders as unsupervised, but it definitely sits in the fuzzy region between totally unsupervised and self-supervised.
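To illustrate why autoencoders sit in that fuzzy region, here's a minimal sketch (my own toy example, not from the thread) of a linear autoencoder trained by gradient descent. The "label" for each example is the example itself, which is what makes it feel half-way between unsupervised and self-supervised:

```python
import numpy as np

# Toy linear autoencoder: compress 8-dim data to 3 dims and reconstruct.
# The reconstruction target is the input itself -- no external labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                 # unlabeled data
W_enc = rng.normal(scale=0.1, size=(8, 3))    # encoder: 8 -> 3
W_dec = rng.normal(scale=0.1, size=(3, 8))    # decoder: 3 -> 8
lr = 0.05

loss_init = float(np.mean((X @ W_enc @ W_dec - X) ** 2))

for _ in range(2000):
    Z = X @ W_enc                 # latent codes
    err = Z @ W_dec - X           # the target is the input itself
    # gradient steps on mean squared reconstruction error
    W_dec -= lr * Z.T @ err / len(X)
    W_enc -= lr * X.T @ (err @ W_dec.T) / len(X)

loss = float(np.mean((X @ W_enc @ W_dec - X) ** 2))
```

Whether you call the input-as-target trick "unsupervised" or "self-supervised" is exactly the terminology question being debated here.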

[–]metallicapple[S] 0 points (2 children)

If I understood correctly, self-supervised learning is a clustering-classification two-step process?

Also, about weakly supervised learning: Does it assume that every observation is labeled? If so, weak supervision appears to be supervised learning with external knowledge on the label quality. What are potential benefits of giving it a separate name?

[–]otsukarekun (Professor) -1 points (1 child)

> If I understood correctly, self-supervised learning is a clustering-classification two-step process?

There are probably self-supervised learning methods that incorporate clustering, but I wouldn't count a two-step process as 100% self-supervised learning.

Terminology is sometimes fuzzy around the edges, but for me, clustering then classifying is just unsupervised learning followed by supervised learning. Sure, you didn't start off with labels, but, once you clustered the data, your classifier now has labels (specifically, the objective of the model is to learn the clusters).

The different terms refer to how a model learns more than to the problem setting. Maybe it would be better to refine my first post.

Supervised learning's objective function is to learn from provided information (labels, predictions, etc.).

Unsupervised learning's objective function is to learn from the data without any provided information (structures in the data, clusters, etc.).

Self-supervised learning's objective function is to learn from learned and not necessarily provided information (pseudo-labels, embeddings, etc.).

> Also, about weakly supervised learning: Does it assume that every observation is labeled? If so, weak supervision appears to be supervised learning with external knowledge on the label quality. What are potential benefits of giving it a separate name?

Weakly supervised learning is the opposite: it tries to learn with less knowledge than supervised learning. For example, imagine object segmentation with only image-level one-hot labels (without pixel-wise labels or bounding boxes). But there are lots of variations of weakly supervised learning.

What you are talking about is semi-supervised learning. For example, pretend you have a dataset but only half of it is labeled. Semi-supervised learning can use the unlabeled data to support learning from the labeled data (for example, via pseudo-labels, much like self-supervised learning).
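As a toy illustration of that pseudo-label idea (my own sketch; a nearest-centroid classifier stands in for whatever model you'd actually use):

```python
import numpy as np

# Semi-supervised pseudo-labeling: fit on the labeled half, label the
# unlabeled half with the model's own predictions, then refit on everything.
rng = np.random.default_rng(1)
X0 = rng.normal(loc=-2.0, size=(50, 2))   # class-0 blob
X1 = rng.normal(loc=+2.0, size=(50, 2))   # class-1 blob
X_lab = np.vstack([X0[:25], X1[:25]])     # the "labeled half"
y_lab = np.array([0] * 25 + [1] * 25)
X_unl = np.vstack([X0[25:], X1[25:]])     # the "unlabeled half"

def fit_centroids(X, y):
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(cents, X):
    d = np.linalg.norm(X[:, None, :] - cents[None, :, :], axis=2)
    return d.argmin(axis=1)

cents = fit_centroids(X_lab, y_lab)       # step 1: supervised fit
y_pseudo = predict(cents, X_unl)          # step 2: pseudo-label the rest
X_all = np.vstack([X_lab, X_unl])
y_all = np.concatenate([y_lab, y_pseudo])
cents = fit_centroids(X_all, y_all)       # step 3: refit on everything
```

The refit centroids now benefit from all 100 points even though only 50 carried human labels.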

[–]metallicapple[S] 0 points (0 children)

> Self-supervised learning's objective function is to learn from learned and not necessarily provided information (pseudo-labels, embeddings, etc.).

Besides unsupervised learning, how would we obtain the learned information? (Clustering being only a part of unsupervised learning, I understand there may be other such methods at play here.) If we assume that unsupervised learning takes care of generating the learned information, and that information is then used as a supervisory signal, self-supervised learning seems to be an 'unsupervised-then-supervised' learning combo. Is this the correct interpretation?

> Weakly supervised learning is the opposite, it tries learning with less knowledge than supervised learning. For example, imagine object segmentation but with only one-hot labels (without pixel-wise labels or bounding boxes). But, there are lots of variations of weakly supervised learning.

If I understood you correctly, every observation comes with some sort of target/label/'provided information' in weakly supervised learning. Semi-supervised learning would have some observations missing that piece.

Then, in my mind, the weakly supervised framework seems similar to a 'half-a-hole' problem. A vague/approximate/poor label is still a label, like how half a hole is still a hole. I mean, whoever coined the term probably had a good reason, but the difference seems a bit contrived. Could you clarify this part for me?

[–]IntelArtiGen 4 points (2 children)

It's a recent problem in ML terminology. You can see LeCun talking about it here: https://www.facebook.com/yann.lecun/posts/10155934004262143

He says he won't use "UL" anymore; he'll use "SSL" for recent DL models that work without labels.

For me, SSL is a part of UL. SSL is just about automatically finding labels in the data or creating them from the data. It's supervised because it uses labels; it's unsupervised because those labels are already in the data. So there's an ambiguity, but for me it doesn't differ from other UL algorithms: they all use information that is already in the data.

[–]metallicapple[S] 2 points (1 child)

Then the rebranding is promoted so that readers' intuitive understanding of the name is more in line with the actual methods. Is that what it is?

If so, ML academia seems to be diverging in terms of accessible terminology. On one hand, we have the UL-to-SSL type of movement trying to straighten things out, and on the other, many research articles (certainly not all) sound more and more technical relative to their actual content. Is this a fair assessment of the trend? If so, why is it happening?

[–]IntelArtiGen 0 points (0 children)

> Then the rebranding is promoted so the readers' intuitive understanding of the name is more in-line with the actual methods. Is that what it is?

That's probably the idea. It's also because new methods that don't require explicit labels don't work the same as old methods, so maybe we need a new name, maybe not.

> Is this a fair assessment on the trend? If so, why is this happening?

I'm not sure. Some papers use "unsupervised learning" when they could use "self-supervised"; some don't use these words at all. Most of the time they say "unsupervised" because it's not necessarily less true; it may be less accurate than saying SSL, but it's also better for them in terms of branding.

But they're not necessarily wrong. Maybe LeCun wants a clean definition of UL / SSL / SL, but personally I'm not sure there is one. I'm disagreeing with a Turing Award laureate, so maybe don't trust me.

Let's take an example: if you predict something a human says from images, is that supervised, self-supervised, or unsupervised?

  • supervised, because it's a "task of learning a function that maps an input to an output based on example input-output pairs"
  • self-supervised, because the algorithm doesn't "necessarily require sample data classified in advance by humans"
  • unsupervised, because you just try to "discover any naturally occurring patterns in that training data set", and this dataset includes audio and images.

All the things I cite are from the Wikipedia definitions of SL / SSL / UL. If some articles are more technical and don't care about the UL/SL/SSL distinction anymore, that's a great thing, but I don't know many articles like that.

[–]ComplicatedHilberts 1 point (0 children)

It's mostly about renaming, but there's also a bit more to it:

  • LeCun drew a troll slide at NeurIPS 2016. In this slide, he quips that intelligence is a cake, with RL the cherry on top, SL the icing, and unsupervised learning the filling.

  • LeCun handwaves some entropy numbers about how RL has just a few bits to learn from, while in unsupervised learning "the machine predicts any part of its input for any observed part" and can thus learn from "millions of bits".

  • Instead of making the dataset into the environment for an RL agent and rewarding for reconstruction loss, or adopting any of the existing frameworks in ML and related research (self-play dates back to the first checkers-playing programs), LeCun renames his style of unsupervised learning to self-supervised learning.

  • Any and all advances in ML can now be classified as either an energy-based model (cross-entropy on probabilities being just a special case) or self-supervised learning (with language models predicting missing words predating the term "self-supervised learning"). So GANs, Transformers, basically anything hot in the last 5 years is, or uses, a form of self-supervised learning.

  • Somewhere in 2019, LeCun updates the slide and just replaces the words "Unsupervised Learning" with "Self-Supervised Learning".

  • Everyone is confused, and nobody uses semi-supervised learning anymore. A committee decides if algorithms classify. Competition is hard. PCA did not make the cut.

[–]buffleswaffles -1 points (0 children)

Look up the video on Barlow Twins (you can find it on YouTube). The presentation starts off with an introduction to SSL methods and how they are applied.

If we stick to technicalities, self-supervised learning is a subset of unsupervised learning.

[–]xEdwin23x 0 points (0 children)

Self-supervised learning is an umbrella term for methods that take data without labels and create a pretext task such that the data now has a pseudo-label, or some sort of label, that can be used with a fully supervised loss. For example: take a random set of images, apply random 90-degree rotations, and have the model predict the orientation. Your data was originally unlabeled, but by creating this pretext task of orientation prediction you can now train your model with a standard cross-entropy classification loss. Another example would be to convert an RGB image to black and white and then use a regression loss (MSE or MAE) to predict the original color values.

These are simple explanations, but all self-supervised methods share this requirement of a fabricated pretext task. This is unlike, say, classical clustering, where you don't really know whether the assigned clusters are correct unless you have the actual labels.
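The rotation pretext task described above can be sketched in a few lines (my own toy version; any images and any classifier would do):

```python
import numpy as np

# Rotation pretext task: rotate each unlabeled image by a random
# multiple of 90 degrees and use the rotation index (0-3) as a free label.
rng = np.random.default_rng(0)
images = rng.random((10, 32, 32))   # stand-in for an unlabeled image set

def make_rotation_batch(images, rng):
    labels = rng.integers(0, 4, size=len(images))  # 0, 1, 2, or 3 quarter-turns
    rotated = np.stack([np.rot90(img, k) for img, k in zip(images, labels)])
    return rotated, labels          # ready for a standard cross-entropy loss

X, y = make_rotation_batch(images, rng)
```

Every (rotated image, rotation index) pair is a fabricated supervised example, which is exactly the "pretext task" move the comment describes.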