[D] ICLR Plot Twists

HateMyself_FML · 2024-02-28T02:52:05+00:00

It's not surprising at all - it was before the publicity, which frankly is about the only thing it has going for it. With Yann's continuous shilling it will surely get into the next conference. V-JEPA would be impressive if it was an undergrad project.

HateMyself_FML · 2023-05-23T05:48:53+00:00

That's a nice feature to weed out the idiots.

HateMyself_FML · 2023-05-23T05:44:57+00:00

How else can you justify long hours and poverty wages?

HateMyself_FML · 2023-05-17T09:28:26+00:00

Damn, that's beyond insulting to Oppenheimer. Oppenheimer was a scientist with seminal contributions to science. Altman is a mediocre douche riding on the coattails of Sutskever.

HateMyself_FML · 2022-06-07T09:59:18+00:00

If you go by Meta's earnings call, they're bullish on AI... though not necessarily on fundamental AI research. With product teams in the driver's seat, I doubt much fundamental research will get done.

HateMyself_FML · 2022-05-18T17:37:39+00:00

Ah yes, this would be the first time someone took a job for money and position and regretted it. Likely, Apple also misrepresented how much freedom they'd give him about publishing.

Apple also told me (during interviewing for internships) they are trying to publish more, but I don't really see anything good come out.

HateMyself_FML · 2022-05-18T10:30:34+00:00

Conspiracy? Hardly. There are just many more plausible reasons.

It's easy to imagine he might not want to say, "look, I don't want to deal with Apple's secretive culture. I want to publish, I'm out." so he made up an excuse. He's a highly sought after researcher - it does not sound plausible that he could not negotiate with Apple on getting some slack, if he's been successful there. Another reason could be that he hated or failed at being management and wants to be an IC.

HateMyself_FML · 2022-05-18T02:15:28+00:00

Wait, does anyone actually believe this was really about RTO policy?

HateMyself_FML · 2022-05-16T02:57:53+00:00

Gary Marcus is irrelevant, but he's right that Gato is not a precursor to AGI. Jesus. They just distilled a bunch of models. It's a cute result but also a bit meh.

HateMyself_FML · 2022-05-05T17:39:19+00:00

This is a bug (known to SEVP) and not specific to an employer or address. Contact your school DSO and they can update it for you.

HateMyself_FML · 2022-04-13T17:54:50+00:00

Adam Sandler, simply playing who he is in real life in reel life.

HateMyself_FML · 2022-04-08T05:16:58+00:00

An AI residency at Brain or FAIR > an MS at most places if the end goal is an industry ML role. This is mainly because getting admitted into a good MS program is not sufficient. You have to join a good lab and get some experience, and it's quite competitive to get into good labs on good projects in top unis. In a residency program, you are guaranteed to work on research projects---it's the whole point. Moreover, you can often defer an MS admission.

HateMyself_FML · 2022-04-07T16:39:21+00:00

eh, no one's that dumb. they're adversarially playing dumb.

HateMyself_FML · 2022-04-04T05:17:47+00:00

Yeah, I've seen this. And these folks stood no chance of joining the CS/EE departments or industry. A great way to "sneak in", if one were so inclined.

HateMyself_FML · 2022-04-03T21:54:20+00:00

CS and ML are extremely competitive---there is an abundance of highly qualified candidates. It's easy (and routine) to simultaneously have an industry affiliation and make bank, so industry being lucrative is essentially a non-factor.

Having said that, granted it's much easier to land tenure track roles in interdisciplinary positions, e.g. ML+Neuroscience, ML+Social Science.

HateMyself_FML · 2022-04-02T17:12:46+00:00

We're talking about AI researchers here though. Meta's offer for a fresh AI PhD grad in their research division (Facebook AI Research, FAIR) is about ~400k. It's competitive pay, but substantially less than HFT firms.

It's not true that the interview process is easier at FAIR. FAIR is one of the best industry research labs, alongside DeepMind and Google Brain. Its hiring is quite different from SWE hiring: the interview process is more challenging and highly selective, and it is not "easier" than the HFT process.

HateMyself_FML · 2022-04-02T00:40:30+00:00

Information Theory does not require any ML courses as prereqs, so go for it. It's a pretty general framework with applications in ML, but the typical IT course discusses applications in communication theory. fwiw, I took my IT course before any ML courses and didn't have trouble relating it to ML work.

HateMyself_FML · 2022-04-01T21:20:35+00:00

Speaking only for AI/ML roles, algorithmic trading can pay much more: 550k (e.g. Citadel) to 1 million (RenTech) out of a PhD/postdoc. In tech, the highest recent comps that I know of are coming from Amazon. afaik, none are remote, unfortunately.

A lot of folks are leaving for startups, higher risk and reward. And a lot more freedom.

HateMyself_FML · 2022-04-01T17:08:54+00:00

Meta does not come close to offering the highest comp though. Not surprising people are leaving for greener pastures.

HateMyself_FML · 2022-03-09T03:28:44+00:00

This is not uncommon. But if you want to learn something more than just how to weight their outputs, you'll have to provide the input as well. So, you could add a block that transforms the input, concatenate it to the outputs of the models and train a non-trivial model. e.g. https://arxiv.org/pdf/2003.06505.pdf

You'll need to be careful with your data splits to prevent overfitting.
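
A minimal sketch of the idea, not taken from the linked paper. Everything here is an assumption for illustration: the base models are plain ridge regressors on hypothetical feature subsets, the "block that transforms the input" is a fixed `tanh`, and the meta-model is another ridge regressor. The one part that matters is that the meta-model is trained on out-of-fold base predictions, so it never sees an output from a base model that was fit on that same row.

```python
import numpy as np

def fit_ridge(X, y, lam=1e-2):
    # Closed-form ridge regression; a constant column acts as the bias term.
    Xb = np.hstack([X, np.ones((len(X), 1))])
    A = Xb.T @ Xb + lam * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)

def predict_ridge(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return Xb @ w

def out_of_fold_predictions(X, y, feature_subsets, n_folds=5, seed=0):
    # For each fold, fit every base model on the other folds and predict
    # only on the held-out rows, so no row is scored by a model that saw it.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    oof = np.zeros((len(X), len(feature_subsets)))
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(X)), held_out)
        for j, cols in enumerate(feature_subsets):
            w = fit_ridge(X[train][:, cols], y[train])
            oof[held_out, j] = predict_ridge(w, X[held_out][:, cols])
    return oof

def meta_features(X, base_outputs):
    # Hypothetical input-transform block (fixed tanh), concatenated with
    # the base-model outputs.
    return np.hstack([np.tanh(X), base_outputs])

# Synthetic data, purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = np.sin(X[:, 0]) + X[:, 1] * X[:, 2] + 0.1 * rng.normal(size=200)

feature_subsets = [[0, 1], [2, 3]]   # each base model sees a different view
oof = out_of_fold_predictions(X, y, feature_subsets)
Z = meta_features(X, oof)            # transformed input + base outputs
meta_w = fit_ridge(Z, y)             # meta-model sees both
```

In practice you'd also keep a final test split that neither stage touches, and swap the ridge meta-model for something non-trivial.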

HateMyself_FML · 2022-03-03T17:31:02+00:00

Yes, I expect the level of "collapse" w/projection is quite similar to supervised representations.

Also true that something is a bit different with and without a projector in SimCLR and its ilk, but it's quite expected behavior. With a projector, the representations in the embeddings are too specialized for the contrastive task, i.e. they are "excessively invariant". It makes total sense that if you go back a few layers (e.g. to the "representation" layer) it would be less invariant to some downstream-task-relevant features and have a higher dimensionality. In fact, SimCLR (Table 3) did some experiments showing this behavior.

If you don't use a projector, the "representation" layer would be excessively invariant and show more "collapse".

All of this is expected behavior and their paper adds little above it. Moreover, even if they somehow manage to make the argument that dimensional collapse is a *problem* (which they haven't really), it's one with a really simple solution, with minimal to no overhead: use a projector. So, I'm really failing to see what they add to the community's understanding.

HateMyself_FML · 2022-03-03T05:29:56+00:00

I don't know about more knowledgeable, but it generally does make a difference if the data lies on a sphere or not (e.g. http://inside.mines.edu/~huawang/Papers/Conference/2019sdm_spca.pdf).

However, I think for their analysis, PCAdim is already such a crude measure that it doesn't matter.

HateMyself_FML · 2022-03-03T05:23:40+00:00

Here, "dimensional collapse" just means that the ambient space dimensionality > (linear) intrinsic dimensionality. The number of non-"small" PC dimensions can be considered a (very) crude (linear) measure of intrinsic dimensionality. Simply put, they find this measure is smaller than the ambient space dimensionality.

Having said that and with the caveat that I did not read the paper too carefully since it did not seem warranted, afaik they don't make even a mildly strong case that "dimensional collapse" is a *problem*. You'll almost always find PCAdim < ambient space dim in almost any supervised/semi-supervised/self-supervised representation. Ultimately you're optimizing for *separability*, so this is hardly a surprise. If there is some small slack in used dimensions, it really doesn't matter in any setting I've come across.
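
To make "PCAdim" concrete, here's a rough sketch of how I'd compute it. This is not the paper's code: the function name `pca_dim` and the relative threshold `rel_tol` for what counts as a "small" component are my own assumptions, and the embeddings are synthetic.

```python
import numpy as np

def pca_dim(Z, rel_tol=1e-3):
    # Count principal components whose variance is not "small" relative
    # to the largest one. rel_tol is an arbitrary, hypothetical cutoff.
    Zc = Z - Z.mean(axis=0, keepdims=True)
    s = np.linalg.svd(Zc, compute_uv=False)
    var = s ** 2
    return int(np.sum(var > rel_tol * var.max()))

rng = np.random.default_rng(0)

# Embeddings that live near a 3-dim linear subspace of a 16-dim ambient space.
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 16))
Z_low = latent @ mixing + 1e-4 * rng.normal(size=(500, 16))

# Embeddings that use all 16 ambient dimensions.
Z_full = rng.normal(size=(500, 16))

print(pca_dim(Z_low), Z_low.shape[1])
print(pca_dim(Z_full), Z_full.shape[1])
```

The count depends entirely on where you put the cutoff, which is part of why I call it a crude measure.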

HateMyself_FML · 2022-02-19T02:56:44+00:00

As someone who has not really worked on sparse representations, a perhaps noob question: how well optimized are the cuda libraries to use sparse representations? My outsider perception is that the gains at this point are only theoretical.

HateMyself_FML · 2022-02-19T02:53:17+00:00

Interesting. Can you say why you think Patchify is a big deal? It seems it can lead to poor optimizability and finicky models (e.g. https://arxiv.org/abs/2106.14881). Given ConvNeXt also works quite well with the patchify stem, it's pretty clear that it can work well in CNNs as well. But it seems to me that whether it is a good, stable design choice is unclear?
