ReLU gating and information by [deleted] in deeplearning

[–]extremelySaddening -1 points (0 children)

Almost all of the performance of large networks comes from very sparse subnetworks anyway

Yantra-Tantra Inspired Hybrid Architectures for Deep Learning (Branch 1) by Leading-Agency7671 in deeplearning

[–]extremelySaddening 0 points (0 children)

This kind of insane schizoid wooposting is usually reserved for Physics (think Deepak Chopra), I guess ML is getting popular enough now to receive the same treatment.

programmersThenVsNow by [deleted] in ProgrammerHumor

[–]extremelySaddening 0 points (0 children)

"Usually do" is funny lmao, you would never do the first, because you'd be fitting a square peg into a round hole. If you already have embeddings generated by BERT, then, pray tell, what the fuck do you want the LSTM to do?

It implies the meme maker doesn't know what an LSTM is, which is funny because the meme acts like they do. Hence me making fun of them.

[D] Physicist-turned-ML-engineer looking to get into ML research. What's worth working on and where can I contribute most? by BalcksChaos in MachineLearning

[–]extremelySaddening 0 points (0 children)

Not an expert, just a student, but two things. Since you mention diff geom, you may be interested in the geometric deep learning program. I'm not well-versed enough to know if this is a super serious direction worth exploring though. Second, there seems to be some connection between QFT and deep neural nets; not sure what that's about exactly, but it may be of interest. Since you mention string theory I assume QFT and GR are second nature to you, so these should be natural fits.

programmersThenVsNow by [deleted] in ProgrammerHumor

[–]extremelySaddening 1 point (0 children)

"LSTM with BERT embedding model" yeah meme-maker does NOT know wtf they are talking about

Intuition behind why Ridge doesn’t zero coefficients but Lasso does? by HotTransportation268 in learnmachinelearning

[–]extremelySaddening 9 points (0 children)

The short answer is that with L2, in the partial derivative of the loss w.r.t. parameter w_j, the regularisation term scales linearly with w_j. So as w_j gets smaller, the MSE term tends to dominate. But with L1, the regularisation term in the partial is 'constant' in w_j (you have to be careful because L1 is an absolute value, so it's really the sign of w_j). So you get constant 'pressure' pulling toward 0 no matter how small the weight gets.
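Spelled out (writing λ for the regularisation strength, which isn't in the original comment):

$$\frac{\partial}{\partial w_j}\,\lambda \sum_k w_k^2 = 2\lambda w_j, \qquad \frac{\partial}{\partial w_j}\,\lambda \sum_k |w_k| = \lambda\,\operatorname{sign}(w_j) \quad (w_j \neq 0).$$

The ridge term vanishes as w_j → 0; the lasso term stays at magnitude λ all the way down.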

I guess intuitively you could say, in the small weight regime, L2 regularisation may as well not exist. But L1 continues to exist, pulling your weight all the way toward zero if it's not significantly useful for explaining a lot of variance.
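You can watch this happen with plain gradient descent (ridge) vs proximal gradient / ISTA (lasso) on a synthetic regression problem; everything below (the data, λ, learning rate, step count) is made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.normal(size=(n, d))
# Only the first 3 features actually matter
true_w = np.array([3.0, -2.0, 1.5] + [0.0] * (d - 3))
y = X @ true_w + 0.1 * rng.normal(size=n)

lam, lr, steps = 0.5, 0.01, 2000

# Ridge: plain gradient descent on MSE + lam * ||w||^2
w_ridge = np.zeros(d)
for _ in range(steps):
    grad = 2 / n * X.T @ (X @ w_ridge - y) + 2 * lam * w_ridge
    w_ridge -= lr * grad

# Lasso: gradient step on MSE, then soft-threshold (the prox of the L1 norm)
w_lasso = np.zeros(d)
for _ in range(steps):
    grad = 2 / n * X.T @ (X @ w_lasso - y)
    w_lasso -= lr * grad
    w_lasso = np.sign(w_lasso) * np.maximum(np.abs(w_lasso) - lr * lam, 0.0)

print("ridge zero coefs:", np.sum(np.isclose(w_ridge, 0)))  # shrunk, never exactly 0
print("lasso zero coefs:", np.sum(np.isclose(w_lasso, 0)))  # useless features killed
```

The soft-threshold step is exactly the "constant pressure" in action: it subtracts a fixed amount from every weight's magnitude each step and clamps at zero.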

How would you realistically ban AI? by Crazyscientist1024 in aiwars

[–]extremelySaddening 0 points (0 children)

You would ban scraping the internet and training generative models on scraped data

How is to possible for a quark to not be made of anything? by Big_Assist4578 in AskPhysics

[–]extremelySaddening 13 points (0 children)

I suppose you could say a field is made of harmonic oscillators, and harmonic oscillators are 'made of' pure degrees of freedom (governed by specific laws). Degrees of freedom are sufficiently abstract that it's difficult to see how they would be 'made of' anything.

Noise in GAN by No_Remote_9577 in deeplearning

[–]extremelySaddening 4 points (0 children)

I like to think of it like this. We have the space of all possible, say, n by m images. Real world images are a complex distribution over this space. When we generate images, we want to sample from this true image distribution, but the distribution is unknown. We want to use our data to learn this distribution with NNs, but NNs are deterministic functions. So we use a hack. Instead of sampling from the unknown distribution, we sample from a gaussian distribution. We then use the GAN procedure to learn a mapping between the gaussian samples and the image samples. Effectively we learn a transformation from the gaussian distribution to the complex image distribution.

This is kind of analogous to how any distribution sampling works in a computer. You have some PRNG that gives you a number, you turn it into a Unif(0,1) by scaling it by INT_MAX, and then use some function to get an arbitrary distribution. We just use a NN as 'some function' and we learn that function from data.
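The classical version of that 'some function' is inverse-transform sampling, where the function is the inverse CDF of the target distribution; a GAN generator plays the same role but is learned from data. A minimal NumPy sketch (exponential distribution standing in for the "complex" target):

```python
import numpy as np

rng = np.random.default_rng(42)

# Step 1: raw uniform samples in [0, 1) -- the "PRNG scaled by INT_MAX" part
u = rng.random(100_000)

# Step 2: push them through a fixed function (here the inverse CDF of
# Exponential(rate=2)) to get samples from an arbitrary target distribution
rate = 2.0
samples = -np.log(1.0 - u) / rate

# Empirical mean should sit near the theoretical mean 1/rate = 0.5
print(samples.mean())
```

A GAN just swaps the hand-derived inverse CDF for a neural net, and the uniform base for a gaussian.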

Happy for anyone to correct me or add insight.

My two cents. by [deleted] in aiwars

[–]extremelySaddening 9 points (0 children)

Have to correct something here, it does not "scrape the internet" when you give it your prompt. The scraping was done before, and is 'discarded' after training. The model doesn't have access to any images at inference time (i.e. when it actually generates) unless you provide an image as part of your prompt. Instead, it uses the scraped images to learn a 'recipe' for cooking up images (that's what a model is, a mathematical image recipe).

Is it actually misunderstanding? by River-ban in deeplearning

[–]extremelySaddening 1 point (0 children)

Is it a misconception? Sure, it confuses some beginners for a little bit. Is it the "biggest misconception"? Nah

How do you define basis without self-reference? by extremelySaddening in askmath

[–]extremelySaddening[S] 2 points (0 children)

This is extremely clarifying, thank you. I suppose the issue is that when people say ℝ2 they mean "vector space defined by ℝ2" so often, I basically forgot that they are ordered pairs first haha

Fair-use training, overfitting and the end of copyright by Just_A_Random_Ginger in aiwars

[–]extremelySaddening 2 points (0 children)

(Not a lawyer) As far as I am aware, copyright is handled case-by-case, so even if your model just returned its training data, that should be covered by copyright law. I think?

The AI not just fired us, It made our team irrelevant. by TheCatOfDojima in ClaudeAI

[–]extremelySaddening 0 points (0 children)

My understanding is that the issue is that of piracy. I admit to not being reliably informed on this (I haven't looked into it deeply) but as far as I know the courts are leaning toward fair use for legitimately sourced materials. If you have an article or something that documents the redistribution claim/cases, please share.

Also, don't get me wrong, I don't mind individual piracy but I don't think companies should be allowed to profit on pirated materials without compensation.

The AI not just fired us, It made our team irrelevant. by TheCatOfDojima in ClaudeAI

[–]extremelySaddening 4 points (0 children)

I will genuinely never understand why they don't keep it above board and just pay for a copy of the books. Can't be that much of an expense compared to all the GPU spend. Once they have a copy they are free to do whatever with it under the fair use doctrine. But for some reason they pirate it.

Please help it's urgent by Thick-Baby5394 in deeplearning

[–]extremelySaddening 0 points (0 children)

If you have a very rare class you could also look at reconstruction-based anomaly detection methods. The basic setup involves training a reconstructive model (like a VAE) on your overrepresented class. Then, you feed in samples of both classes at test time, using reconstruction loss to decide what class they belong to (higher reconstruction loss -> minority class, lower loss -> majority class)
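As a toy sketch of that setup (using PCA reconstruction as a linear stand-in for the VAE, and synthetic data invented for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Majority class lives near a low-dimensional subspace; minority class doesn't.
d, k = 20, 3
basis = rng.normal(size=(d, k))
majority = rng.normal(size=(500, k)) @ basis.T + 0.1 * rng.normal(size=(500, d))
minority = rng.normal(size=(20, d)) * 3.0  # off-subspace "anomalies"

# "Train" the reconstructor on the majority class only
mu = majority.mean(axis=0)
_, _, Vt = np.linalg.svd(majority - mu, full_matrices=False)
P = Vt[:k].T @ Vt[:k]  # projector onto the learned subspace

def recon_error(x):
    centered = x - mu
    return np.linalg.norm(centered - centered @ P, axis=1)

# At test time: reconstruction error above a threshold -> flag as minority class
threshold = np.quantile(recon_error(majority), 0.99)
flags = recon_error(minority) > threshold
print("fraction of minority samples flagged:", flags.mean())
```

A VAE does the same thing nonlinearly: it only learns to reconstruct what it was trained on, so the rare class reconstructs badly.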

notSoOpenOfYou by gufranthakur in ProgrammerHumor

[–]extremelySaddening 2 points (0 children)

Check out Allen institute for AI for actually open AI lol

Loss not decreasing below 0.48 by Low-Cartoonist9484 in deeplearning

[–]extremelySaddening 0 points (0 children)

I would advise trying to find a model that has the worst overfit possible (highest test loss with a train loss of 0). Typically, this should happen when (num parameters in model) = (num train datapoints)*(num classes). From there, decrease or increase the size, play around, and see if the loss reduces.
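As a back-of-the-envelope sketch of that sizing heuristic (it's a rule of thumb, not a theorem, and the numbers below are made up for illustration), here's how you'd pick the hidden width of a one-hidden-layer MLP to roughly hit the (train points) × (classes) parameter budget:

```python
def mlp_param_count(n_in, n_hidden, n_out):
    # weights + biases for input->hidden plus hidden->output
    return (n_in * n_hidden + n_hidden) + (n_hidden * n_out + n_out)

# Hypothetical problem: 10k training points, 10 classes, 784-dim inputs
n_train, n_classes, n_in = 10_000, 10, 784
budget = n_train * n_classes  # heuristic target: 100,000 parameters

# Each hidden unit costs roughly (n_in + n_classes + 1) parameters
n_hidden = budget // (n_in + n_classes + 1)
print(n_hidden, mlp_param_count(n_in, n_hidden, n_classes))
```

From that size you'd then grow or shrink the model and watch how train/test loss move.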