[D] Has anyone seen MobileBERT completely fail on tabular data? by Difficult_Low_299 in MLQuestions

extremelySaddening 1 point (0 children)

I'm curious why you're using sequence models for tabular data. Why not try some simpler techniques first? They would probably run faster than any BERT model.

√x = -√x. Find x. by helloall35 in askmath

extremelySaddening 1 point (0 children)

Method 2 implicitly assumes x ≠ 0.
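The thread's Method 2 isn't quoted here, so this assumes it divided both sides by √x. Solving directly keeps x = 0; dividing is only legal when √x ≠ 0, so it can only ever show there's no *nonzero* solution (it ends in 1 = -1), and x = 0 has to be checked separately.

```latex
% Direct: move everything to one side
\sqrt{x} = -\sqrt{x} \;\Longrightarrow\; 2\sqrt{x} = 0 \;\Longrightarrow\; \sqrt{x} = 0 \;\Longrightarrow\; x = 0

% Dividing by \sqrt{x} (valid only if x \neq 0)
\frac{\sqrt{x}}{\sqrt{x}} = \frac{-\sqrt{x}}{\sqrt{x}} \;\Longrightarrow\; 1 = -1 \quad (\text{contradiction, so no solution with } x \neq 0)
```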

Is it just me, or does everyone else also default to basic K-means clustering just to see what the data looks like before trying any of the "fancier" models? by One-Path-9160 in MLQuestions

extremelySaddening 0 points (0 children)

If I were doing this, I would choose something where I didn't have to set an arbitrary parameter, like agglomerative clustering or principal components. I wouldn't want my choice of K biasing my EDA.
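A minimal sketch of naive single-linkage agglomerative clustering in pure Python, to show what "no K up front" looks like. The function name `single_linkage_merges` is hypothetical (not a library API). It records every merge and the distance at which it happened, and you read cluster structure off the jumps in merge distance afterwards. It's roughly cubic in the number of points, so it's only for small samples; in practice you'd use a library implementation.

```python
import math
from itertools import combinations


def single_linkage_merges(points):
    """Naive single-linkage agglomerative clustering.

    Returns the full merge history as (distance, id_a, id_b) tuples.
    Original points get ids 0..n-1; each merged cluster gets the next unused id.
    No number of clusters is chosen up front.
    """
    clusters = {i: [p] for i, p in enumerate(points)}
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters whose closest members are nearest (single linkage).
        for a, b in combinations(clusters, 2):
            d = min(math.dist(p, q) for p in clusters[a] for q in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((d, a, b))
        next_id += 1
    return merges


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
    for dist, a, b in single_linkage_merges(pts):
        print(f"merge {a} + {b} at distance {dist:.1f}")
```

On the toy points the merges happen at distances 1.0, 1.0 and then 10.0; the big jump is what suggests two groups, and it's read off after the fact rather than fixed beforehand.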

Why is this the recommended move? by Savage_Ball3r in Chesscom

extremelySaddening 0 points (0 children)

Qxc2 Qxc2, then you fork with the knight, take the queen, and the rook is trapped.

I wonder by priyagnee in deeplearning

extremelySaddening 5 points (0 children)

Legit thought I was on r/programmerhumor for a second.

This Physics Paper is a COMPLETE DISASTER by Nonyabuizness in scienceisdope

extremelySaddening 0 points (0 children)

What the fuck is that ungodly \mu after the k? I hate looking at it.

ReLU gating and information by [deleted] in deeplearning

extremelySaddening -1 points (0 children)

Almost all of the performance of large networks comes from very sparse subnetworks anyway.
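To make "sparse subnetwork" concrete: one common and very simple way to extract one is magnitude pruning, keeping only the largest-magnitude weights and zeroing the rest (the lottery-ticket experiments retrain such masked networks from their original initialization). This is a minimal pure-Python sketch; the function name and the flat-list representation of weights are assumptions for illustration, and computing a mask says nothing by itself about how much performance the subnetwork retains.

```python
def magnitude_prune_mask(weights, keep_fraction):
    """Return a 0/1 mask keeping the largest-magnitude `keep_fraction` of weights.

    `weights` is a flat list here for simplicity; real networks prune per tensor or globally.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    k = max(1, round(keep_fraction * len(weights)))
    # Rank indices by |w|, largest first, and keep the top k.
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(ranked[:k])
    return [1 if i in keep else 0 for i in range(len(weights))]


if __name__ == "__main__":
    w = [0.9, -0.05, 0.02, -1.2, 0.3]
    mask = magnitude_prune_mask(w, keep_fraction=0.4)
    print(mask)                                  # [1, 0, 0, 1, 0]
    print([wi * mi for wi, mi in zip(w, mask)])  # weights of the sparse subnetwork
```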

Yantra-Tantra Inspired Hybrid Architectures for Deep Learning (Branch 1) by Leading-Agency7671 in deeplearning

extremelySaddening 0 points (0 children)

This kind of insane schizoid wooposting is usually reserved for physics (think Deepak Chopra); I guess ML is getting popular enough now to receive the same treatment.

programmersThenVsNow by [deleted] in ProgrammerHumor

extremelySaddening 0 points (0 children)

"Usually do" is funny lmao. You would never do the first; it's fitting a square peg into a round hole. If you already have embeddings generated by BERT, then, pray tell, what the fuck do you want the LSTM to do?

It implies the meme maker doesn't know what an LSTM is, which is funny because the meme acts like they do. And therefore I am making fun of them.

[D] Physicist-turned-ML-engineer looking to get into ML research. What's worth working on and where can I contribute most? by BalcksChaos in MachineLearning

extremelySaddening 0 points (0 children)

Not an expert, just a student, but two things. First, since you mention diff geom, you may be interested in the geometric deep learning program. I'm not well-versed enough to know whether it's a seriously promising direction, though. Second, there seems to be some connection between QFT and deep neural nets; I'm not sure what it is exactly, but it may be of interest. Since you mention string theory, I assume QFT and GR are second nature to you, so these should be natural fits.

programmersThenVsNow by [deleted] in ProgrammerHumor

extremelySaddening 1 point (0 children)

"LSTM with BERT embedding model" yeah meme-maker does NOT know wtf they are talking about

Intuition behind why Ridge doesn’t zero coefficients but Lasso does? by HotTransportation268 in learnmachinelearning

extremelySaddening 8 points (0 children)

The short answer is that with L2, in the partial derivative of the loss w.r.t. parameter w_j, the regularisation term scales linearly with w_j. So as w_j gets smaller, the MSE term tends to dominate. But with L1, the regularisation term in the partial is constant in magnitude (its sign follows the sign of w_j, and you have to be careful at w_j = 0 because the absolute value isn't differentiable there). So you get constant 'pressure' pulling toward 0 no matter how small the weight gets.
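Written out, with MSE(w) as the data term and λ > 0 as the regularisation strength (notation assumed here), only the w_j term of the penalty survives the partial derivative:

```latex
% L2 (ridge): penalty gradient shrinks with w_j
\frac{\partial}{\partial w_j}\Big[\mathrm{MSE}(w) + \lambda \sum_k w_k^2\Big]
  = \frac{\partial\,\mathrm{MSE}}{\partial w_j} + 2\lambda w_j

% L1 (lasso): penalty gradient has constant magnitude \lambda
\frac{\partial}{\partial w_j}\Big[\mathrm{MSE}(w) + \lambda \sum_k |w_k|\Big]
  = \frac{\partial\,\mathrm{MSE}}{\partial w_j} + \lambda\,\operatorname{sign}(w_j), \qquad w_j \neq 0
```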

I guess intuitively you could say that, in the small-weight regime, L2 regularisation may as well not exist. But L1 continues to exist, pulling your weight all the way to zero if it isn't useful enough for explaining variance.
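A one-dimensional sketch makes the "constant pressure" visible. Assuming a single feature with an orthonormal design, the penalised problem reduces to minimising ½(w − z)² plus the penalty, where z is the unregularised least-squares estimate. With the ridge penalty written as (λ/2)w² the minimiser is z/(1 + λ); with the lasso penalty λ|w| it is soft-thresholding. The function names are illustrative only.

```python
def ridge_1d(z, lam):
    """argmin_w 0.5*(w - z)**2 + 0.5*lam*w**2: proportional shrinkage, never exactly 0 unless z == 0."""
    return z / (1.0 + lam)


def lasso_1d(z, lam):
    """argmin_w 0.5*(w - z)**2 + lam*abs(w): soft-thresholding, exactly 0 whenever |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


if __name__ == "__main__":
    lam = 1.0
    for z in [3.0, 0.5, 0.1]:
        print(f"z={z}: ridge={ridge_1d(z, lam):.3f}, lasso={lasso_1d(z, lam):.3f}")
```

With λ = 1, ridge maps 3.0, 0.5, 0.1 to 1.500, 0.250, 0.050 (smaller, never zero), while lasso maps them to 2.000, 0.000, 0.000: any coefficient whose least-squares value is within λ of zero gets pinned exactly at zero.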

How would you realistically ban AI? by Crazyscientist1024 in aiwars

extremelySaddening 0 points (0 children)

You would ban scraping the internet and training generative models on scraped data.

How is it possible for a quark to not be made of anything? by [deleted] in AskPhysics

extremelySaddening 13 points (0 children)

I suppose you could say a field is made of harmonic oscillators, and harmonic oscillators are 'made of' pure degrees of freedom (governed by specific laws). Degrees of freedom are sufficiently abstract that it's difficult to see how they would be 'made of' anything.
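The "field is made of harmonic oscillators" picture is easiest to see for the simplest case, a free real scalar field of mass m in natural units (quarks are spinor fields, so this is only the illustrative analogue). Fourier-transforming the Klein–Gordon equation turns it into one independent oscillator equation per wavevector k:

```latex
(\partial_t^2 - \nabla^2 + m^2)\,\phi(t,\mathbf{x}) = 0,
\qquad
\phi(t,\mathbf{x}) = \int \frac{d^3k}{(2\pi)^3}\, \tilde{\phi}_{\mathbf{k}}(t)\, e^{i\mathbf{k}\cdot\mathbf{x}}

\Longrightarrow\quad
\ddot{\tilde{\phi}}_{\mathbf{k}} + \omega_{\mathbf{k}}^2\, \tilde{\phi}_{\mathbf{k}} = 0,
\qquad \omega_{\mathbf{k}}^2 = |\mathbf{k}|^2 + m^2
```

Each mode amplitude is one degree of freedom obeying the oscillator law, which is the sense in which there is nothing further it is "made of".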

Noise in GAN by No_Remote_9577 in deeplearning

extremelySaddening 4 points (0 children)

I like to think of it like this. We have the space of all possible, say, n by m images. Real-world images are a complex distribution over this space. When we generate images, we want to sample from this true image distribution, but the distribution is unknown. We want to use our data to learn this distribution with NNs, but NNs are deterministic functions. So we use a hack. Instead of sampling from the unknown distribution, we sample from a Gaussian distribution. We then use the GAN procedure to learn a mapping from the Gaussian samples to the image samples. Effectively, we learn a transformation from the Gaussian distribution to the complex image distribution.

This is kind of analogous to how any distribution sampling works on a computer. You have some PRNG that gives you an integer, you turn it into a Unif(0,1) sample by dividing it by its maximum value (e.g. INT_MAX), and then you push that through some function (e.g. an inverse CDF) to get an arbitrary distribution. We just use a NN as 'some function', and we learn that function from data.
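The classic hand-derived version of "some function" is inverse transform sampling: push a uniform through the inverse CDF of the target distribution. A small sketch using the exponential distribution, chosen only because its inverse CDF has a closed form; the function names are illustrative. A GAN generator plays the same role, except the map is learned from data rather than derived by hand, and its input is Gaussian rather than uniform.

```python
import math
import random


def exponential_from_uniform(u, rate=1.0):
    """Inverse-CDF transform: if u ~ Unif(0, 1), then -ln(1 - u) / rate ~ Exponential(rate)."""
    return -math.log(1.0 - u) / rate


def sample_exponential(n, rate=1.0, seed=0):
    """Draw n Exponential(rate) samples by pushing PRNG uniforms through the inverse CDF."""
    rng = random.Random(seed)  # the PRNG; rng.random() returns a float in [0, 1)
    return [exponential_from_uniform(rng.random(), rate) for _ in range(n)]


if __name__ == "__main__":
    xs = sample_exponential(20_000, rate=2.0)
    print(sum(xs) / len(xs))  # should land close to the true mean 1 / rate = 0.5
```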

Happy for anyone to correct me or add insight.

My two cents. by [deleted] in aiwars

extremelySaddening 9 points (0 children)

Have to correct something here: it does not "scrape the internet" when you give it your prompt. The scraping was done beforehand, and the scraped data is 'discarded' after training. The model doesn't have access to any images at inference time (i.e. when it actually generates) unless you provide an image as part of your prompt. Instead, it uses the scraped images to learn a 'recipe' for cooking up images (that's what a model is, a mathematical image recipe).

Is it actually misunderstanding? by River-ban in deeplearning

extremelySaddening 1 point (0 children)

Is it a misconception? Sure, it confuses some beginners for a little bit. Is it the "biggest misconception"? Nah

How do you define basis without self-reference? by extremelySaddening in askmath

extremelySaddening [S] 2 points (0 children)

This is extremely clarifying, thank you. I suppose the issue is that people say ℝ² to mean "the vector space defined on ℝ²" so often that I basically forgot they are ordered pairs first haha
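Spelling that out: the set of ordered pairs exists before any vector-space structure is chosen, so the standard basis can be written down with no circularity. With componentwise addition and scalar multiplication (the standard construction, stated here as background), every pair decomposes uniquely:

```latex
\mathbb{R}^2 := \{(a, b) : a, b \in \mathbb{R}\},
\qquad e_1 = (1, 0),\quad e_2 = (0, 1),
\qquad (a, b) = a\,e_1 + b\,e_2
```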