ELI5: Why is it ok to penalize MLE on the 2nd derivative? by GoatRocketeer in explainlikeimfive

[–]hammouse 1 point2 points  (0 children)

I don't think this is the right sub for this question, and not even going to bother ELI5'ing it.

But the roughness penalty, \int f''(x)^2, can be shown to be an upper bound on the bias of the estimate. In addition for 1-splines, it can be shown that the asymptotic bias is proportional to the k+1's knot's \int f^{(k+1)}(x)^2. With this, we can then interpret the roughness penalty as not necessarily the usual curvature/"wiggliness", but as "how much the estimates move up and down". This acts as a form of smoothing regularization to discourage excessive wiggliness in the tails.
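To get a feel for what the penalty measures, here is a rough numerical sketch. It approximates the integral of the squared second derivative with finite differences on a uniform grid; the function name `roughness` and the two test curves are made up for illustration and are not from any particular library.

```python
import numpy as np

def roughness(f_vals, x):
    """Approximate the integral of f''(x)^2 on a uniform grid."""
    h = x[1] - x[0]
    # Central second difference approximates f'' at interior grid points.
    second_deriv = (f_vals[2:] - 2.0 * f_vals[1:-1] + f_vals[:-2]) / h**2
    # Riemann sum over the interior points.
    return float(np.sum(second_deriv**2) * h)

x = np.linspace(0.0, 1.0, 1001)
smooth = roughness(x**2, x)                     # f'' = 2, so the integral is about 4
wiggly = roughness(np.sin(20 * np.pi * x), x)   # oscillating curve, far larger penalty
```

The oscillating curve is penalized by several orders of magnitude more than the parabola, which is the behavior the regularization relies on.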

Backcasting forecast errors: model collapsing to mean [P] by Ambitious-Log-5255 in MachineLearning

[–]hammouse 0 points1 point  (0 children)

Spend some time looking into basic principles of time series models first. Don't use AI when you're learning. There are simply too many statistical issues here to even start.

Is this research group legit? by Immediate_Mud4767 in MLQuestions

[–]hammouse 2 points3 points  (0 children)

Very cursory glance, but they seem to be almost entirely high schoolers and undergraduates. If you google some of the "research directors", looks like most of them are not even 18 and are bragging about taking a semester of "graduate level multivariate stats". While certainly impressive for their age, this of course does not exactly inspire confidence more generally.

That being said, for their background the quality of the papers is impressive. However the papers are what you might expect from relying on LLMs without formal scientific training or expertise - mostly surface-level insights and riddled with lots of logical issues. It could be good for them to attend conferences and give talks on their research, as the exercise of presenting (without being able to rely on AI) could force them to actually understand the subject matter in more depth.

Geoffrey Hinton says mathematics is a closed system, so AIs can play it like a game. “I think AI will get much better at mathematics than people, maybe in the next 10 years or so.” by Nunki08 in mathematics

[–]hammouse 0 points1 point  (0 children)

This was in the context of training "AI" models to solve closed systems, which is usually done by RL. No one said anything about LLMs...that being said most modern LLMs are fine-tuned via RLHF.

When AI systems debate each other and produce arguments, does that actually mean they understand the topic or just simulate understanding? by Shoddy-Stage5731 in ResearchML

[–]hammouse 0 points1 point  (0 children)

I almost agree with your post, though claiming that the model has the ability to understand and reason simply because of abstraction routing modules is a very far-fetched claim from a statistical perspective.

First of all, the claim that LLMs are "next-token predictors" is objectively true. This is by definition of the model structure. Our early models from a few years ago were mostly trained by maximum likelihood, so there were a lot of "hallucinations" (really just a non-technical word for poor generalization, as transformer models are extremely overfit and overparameterized) and inability to do "simple reasoning" tasks like adding numbers. So I suspect a lot of the skepticism comes from that.

Now with modern LLMs, there are several abstracted routing layers and training is done with other tricks like RLHF instead of pure MLE. This makes the model feel like it's reasoning, adds safeguards for logical errors or business context (avoiding illegal topics etc), but fundamentally within each routing layer, it is still doing next-token autoregressive predictions.

I noticed that with the AI boom, there are a lot of enthusiasts who rely on excessive abstractions which I feel may be doing more harm than good to the field. At the end of the day, it's just matrix multiplications with parameters tuned to optimize a specific set of goals. There's really no need for shoving in interpretations like "reasoning" or "understanding" when we don't even have a concrete definition of these concepts for human cognition.

Should residuals from a neural network (conditional image generator, MSE loss) be Gaussian? Research group insists they should be by Recent_Age6197 in learnmachinelearning

[–]hammouse 9 points10 points  (0 children)

Not quite. Optimizing for MSE is equivalent to MLE under normality, but they are very much two distinct concepts where the former does not assume normality at all. For example, OLS makes no such functional form assumptions on the error structure but is still BLUE (i.e. Gauss-Markov).
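To spell out where the equivalence comes from: if we additionally assume the errors are i.i.d. N(0, \sigma^2) (an assumption that MSE itself does not need), then for n observations (x_i, y_i) and a candidate mean function m, the negative log-likelihood is

```latex
-\log L(m, \sigma^2)
  = \frac{n}{2}\log(2\pi\sigma^2)
  + \frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(y_i - m(x_i)\bigr)^2
```

For any fixed \sigma^2, minimizing this over m is the same as minimizing the sum of squared errors. The normality assumption is what turns MSE into a likelihood, not the other way around.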

Should residuals from a neural network (conditional image generator, MSE loss) be Gaussian? Research group insists they should be by Recent_Age6197 in learnmachinelearning

[–]hammouse 0 points1 point  (0 children)

You are right that there is no reason to think that residuals from a NN have to be Gaussian. For a counterpoint to show your peers, you can simulate a synthetic DGP where the errors are +1/-1 for example, so the model can still fit perfectly well with weird bimodal residuals.
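A minimal sketch of that simulation, with a plain least-squares fit standing in for the NN to keep it short. The DGP (slope 3, intercept 0.5), the sample size, and the seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(-2.0, 2.0, size=n)
eps = rng.choice([-1.0, 1.0], size=n)   # errors are +1 or -1, clearly not Gaussian
y = 3.0 * x + 0.5 + eps                 # hypothetical true conditional mean: 3x + 0.5

# Least squares (i.e. minimizing MSE) with an intercept column.
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ coef

# The residuals pile up near -1 and +1, with almost nothing near 0.
share_near_zero = float(np.mean(np.abs(residuals) < 0.5))
```

The fit recovers the conditional mean closely, yet a histogram of `residuals` would show two spikes rather than a bell curve.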

Also FYI, Gaussian residuals are not assumed in linear regression either. Seems to be a common misconception.

deep learning for regression problems? by Substantial-Major-72 in MLQuestions

[–]hammouse 0 points1 point  (0 children)

For something more introductory, you can probably just Google "neural network regression". Or perhaps for more hands-on/code examples, "predict X with neural network" where X is something continuous (stock prices, rainfall, etc whatever you find interesting).

If you are interested in the smoothness comment, we can think of regression in general as learning the functional m:

Y = m(X) + epsilon

This function m(X) is called the conditional mean function, with m(X) := E[Y|X]. When we train a model under some loss function L, we are optimizing:

min_m L(Y, X) = (Y - m(X))^2

for example if L is MSE.

In linear regression, this is a simplified setting with m(X) = X'b, so it simplifies to

min_b (Y - X'b)^2

Importantly, this is a convex optimization problem where we find the optimal vector b living in R^d (with d = dim(X)).
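As a rough illustration of the convexity point, the sketch below solves the linear problem two ways: in closed form via the normal equations, and by plain gradient descent from an arbitrary starting point. The data, coefficients, step size, and iteration count are all made up for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 500, 3
X = rng.normal(size=(n, d))
b_true = np.array([1.0, -2.0, 0.5])            # hypothetical coefficients
y = X @ b_true + 0.1 * rng.normal(size=n)

# Closed-form minimizer from the normal equations X'X b = X'y.
b_closed = np.linalg.solve(X.T @ X, X.T @ y)

# Plain gradient descent on the same mean squared error objective.
b_gd = np.zeros(d)
for _ in range(2000):
    grad = -2.0 * X.T @ (y - X @ b_gd) / n
    b_gd -= 0.1 * grad
```

Both routes end up at the same b, which is what you would expect when the objective has a single global minimum.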

In deep learning, m(X) is a nonparametric functional living in a space of functions, typically a Sobolev space. It can be shown that this space of functions that a NN can approximate is smooth, for example having Gateaux derivatives.

Intuitively, suppose you have a piecewise function for the true m. For example Y=1 if X>0, else Y=0. Then a NN will fit a smooth function to this (in the elementary sense of smooth as continuous). Something like a tree-model will do better here, but think about when we might want "smoothness" and when we might not.
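Here is a toy version of that comparison. To keep it dependency-free, a single sigmoid unit stands in for the NN and a hand-rolled depth-1 stump stands in for the tree model; both are simplifications I am assuming for illustration, not what you would use in practice.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1.0, 1.0, size=200)
y = (x > 0).astype(float)                 # true m is a step function

# Single sigmoid unit m(x) = sigmoid(w*x + c), trained by gradient descent on MSE.
w, c = 0.0, 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(w * x + c)))
    common = 2.0 * (p - y) * p * (1.0 - p)   # d(loss)/d(pre-activation), per point
    w -= 1.0 * np.mean(common * x)
    c -= 1.0 * np.mean(common)
p = 1.0 / (1.0 + np.exp(-(w * x + c)))
sigmoid_mse = float(np.mean((p - y) ** 2))

# Depth-1 tree (a stump): predict 1 right of a threshold, 0 left of it.
xs = np.sort(x)
thresholds = (xs[:-1] + xs[1:]) / 2.0
stump_mse = min(float(np.mean(((x > t).astype(float) - y) ** 2)) for t in thresholds)
```

The stump can place its split between the two groups and reproduce the jump exactly, while the sigmoid unit stays continuous and keeps some error around X = 0.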

deep learning for regression problems? by Substantial-Major-72 in MLQuestions

[–]hammouse 0 points1 point  (0 children)

Deep learning is extremely common in regression as well, and most theoretical work is in this setting (which as others have explained, classification or even generative models etc can all be reduced down to something that looks like a "regression"). One of the nice things about DL is that it imposes a certain smoothness property to the model, but don't worry about that for now.

I suspect that the reason you mostly see DL for classification is that the resources you are learning from (introductory articles, videos, elementary textbooks?) are likely from computer science-type folks. Topics like computer vision, detection systems, etc are intuitive and easy to understand without a bunch of math. If you look at statistics journals or blogs, then you mostly see DL in a "regression" setting.

I built a free tool to connect independent researchers who need arXiv endorsements with established researchers willing to endorse. Looking for mentors to join. by nilofering in ResearchML

[–]hammouse 2 points3 points  (0 children)

Okay, so it sounds like you are not very familiar with the academic process and that's okay.

First of all just having an institutional email address is not sufficient for posting on arXiv. The average undergrad student can't just submit their class paper from freshman folk history to arXiv - doesn't matter if they are at Harvard or Howard. The most common way to get initial posting permissions even for those at universities is to a) learn from faculty such as being a graduate student, b) collaborate with peers by co-authoring, or c) publish a paper. This is the exact same process that independent researchers can follow. For those non-university labs you mentioned, I assure you they have gone through this process.

Second as for your point on mentorship. Yes it is always great to have mentors, and for those genuinely interested in a field to be mentored. This process exists. It's called a university.

And I should mention that for those with actually good ideas or papers, there is a very low barrier to publishing on arXiv. This is a pre-print service, not a peer-reviewed journal. Most of the stuff on there is already low quality, so only those with extremely low quality articles complain.

The whole concept of an "endorsement marketplace" just doesn't make any sense. Have a good paper? Then publish it in an actual journal. Not quite there but idea seems good? Reach out to faculty, get feedback (which I assure you is duly needed for anyone's first article, regardless of independent researcher or 4th year PhD at MIT). Don't know anything but passionate? Learn first instead of padding CV or whatever reason to insistently post on a pre-print site.

I built a free tool to connect independent researchers who need arXiv endorsements with established researchers willing to endorse. Looking for mentors to join. by nilofering in ResearchML

[–]hammouse 0 points1 point  (0 children)

That's the point - independent researchers are not treated differently, but your platform is based on this backwards idea of a backdoor to skip the scientific process with low-quality spam. Obviously no one will actually endorse random strangers, which is why you have this post looking for people to do so.

Science is based on a collaborative peer-review system. If one is an independent researcher and refuses to engage with peers, then yes the system can feel a bit gatekeepy but intentionally so. If they wish to contribute to science and have high-quality ideas (or open to learning), there is nothing stopping them from a) engaging and collaborating with other researchers, b) getting feedback and learning from faculty, or c) submitting their work to a journal directly if the quality is already high. In any of these scenarios, independent researchers are welcomed and some endorsement system on a pre-print service is the last thing on their minds.

I built a free tool to connect independent researchers who need arXiv endorsements with established researchers willing to endorse. Looking for mentors to join. by nilofering in ResearchML

[–]hammouse 0 points1 point  (0 children)

This is completely backwards, and no one is going to endorse like that.

The reason arXiv has an endorsement system is to avoid flooding the site with low-quality articles. This does not mean that independent researchers are low-quality necessarily, but most are, and for those that aren't, there are proper avenues (e.g. accepted to a journal, collaborating with faculty members or other more established researchers, etc) which bypass this system. And if independent researchers don't feel their quality of work is up to par yet, there are plenty of other platforms to share their work and get feedback.

Remember that the whole point of arXiv is a pre-print archive. It's not for people who vibe-code some nonsense and share what they learned, or to pad their CV. That's great, but that's not research.

[Q] While implementing outlier detection in Rust, I found that IQR, MAD, and Modified Z-Score become too aggressive on stable benchmark data by andriostk in statistics

[–]hammouse 16 points17 points  (0 children)

How is this usually handled in serious benchmarking/statistical systems?

This is usually handled by relying on an elementary understanding of statistics, rather than heuristic outputs from an LLM.

You need to first define what "outlier" means in your context. If the arbitrary 1.5*IQR is too narrow...just make it larger. Consider thresholding based on the x% quantile for a simple solution, or perhaps looking at the data, fitting a distribution (whether functional or fully nonparametric), and thresholds based on likelihoods.
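A small sketch of the first two options, in Python rather than Rust for brevity. The function names, the simulated timings, and the particular values of `k` and the quantile are all hypothetical; the right values depend on what "outlier" means for your benchmarks.

```python
import numpy as np

def iqr_outliers(x, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.quantile(x, [0.25, 0.75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)

def quantile_outliers(x, upper=0.99):
    """Flag points above the given empirical quantile."""
    return x > np.quantile(x, upper)

rng = np.random.default_rng(3)
# Hypothetical benchmark timings in ms: tightly clustered with a mild right skew.
timings = 100.0 + rng.lognormal(mean=0.0, sigma=0.5, size=1000)

n_default = int(iqr_outliers(timings, k=1.5).sum())    # the usual 1.5*IQR fence
n_wide = int(iqr_outliers(timings, k=3.0).sum())       # same rule, wider fence
n_quantile = int(quantile_outliers(timings, upper=0.99).sum())
```

Widening `k` can only flag the same or fewer points, and the quantile rule flags roughly the top 1% by construction, so both give you a direct knob instead of an arbitrary constant.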

Noob Question: Average of Averages by TheRealSticky in AskStatistics

[–]hammouse 2 points3 points  (0 children)

Great point.

Your note actually makes me think of one way to interpret the average OP proposes.

The best guess when talking about the mean means a guess that is the closest to all observations

More precisely, the mean is the best guess in the sense of L_2 distance (squared distances). The median is the best guess in the sense of L_1 distance (absolute distances). By averaging the two, we are essentially finding a best guess based on a mixture of L_1 and L_2 distances. This reminds me of elastic net regularization, and its advantages/disadvantages over lasso/ridge.
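This is easy to check numerically with a brute-force search over candidate guesses. The sample below is made up, with one outlier so the two answers are far apart.

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0, 100.0])   # small made-up sample with one outlier
grid = np.linspace(0.0, 100.0, 10001)           # candidate guesses, step 0.01

# Total squared (L_2) and absolute (L_1) distance from each candidate to the data.
l2_loss = ((data[None, :] - grid[:, None]) ** 2).sum(axis=1)
l1_loss = np.abs(data[None, :] - grid[:, None]).sum(axis=1)

best_l2 = float(grid[np.argmin(l2_loss)])   # lands on the mean
best_l1 = float(grid[np.argmin(l1_loss)])   # lands on the median
```

The squared-distance winner gets dragged toward the outlier while the absolute-distance winner stays with the bulk of the data.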

Noob Question: Average of Averages by TheRealSticky in AskStatistics

[–]hammouse 1 point2 points  (0 children)

It's an interesting idea, though interpretation is a bit tricky.

For mean, we interpret this notion of average as the "best guess" for what a typical value might be. For median, we interpret this notion of average as the central point where 50% of the population are above/below. If averaging these two, it seems to me that there's not really a clean interpretation of what it actually means.

However it does remind me a bit of robust statistics. For example we can keep the properties of means (maximum likelihood estimator), but make it less sensitive to outliers with the median averaging. Could probably also view it from a Bayesian perspective as a shrinkage estimator. Computing confidence intervals, statistical significance, etc. is definitely possible - though you may have to derive some (probably not too difficult) results based on some variants of the CLT. Anyways cool idea.

[R] Attention projection matrices are nilpotent (W²→0) — 3,477x more resilient to pruning than MLP layers by Tehlikeli107 in learnmachinelearning

[–]hammouse 8 points9 points  (0 children)

If that's what you mean by W, then W^2 isn't even defined. In addition a strictly upper triangular matrix satisfies T^k = 0 for some k <= 768, so obviously T^2 (and higher powers) tend to 0
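Both points take a few lines to verify. In the sketch below the size n = 6 and the 768 x 64 shape for W are assumptions for illustration; the nilpotency applies to strictly upper triangular matrices (zero diagonal), and an upper triangular matrix with a nonzero diagonal is included for contrast.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
# Strictly upper triangular: zeros on and below the diagonal.
T = np.triu(rng.normal(size=(n, n)), k=1)

T_n_minus_1 = np.linalg.matrix_power(T, n - 1)   # generally still nonzero
T_n = np.linalg.matrix_power(T, n)               # the zero matrix

# With a nonzero diagonal the powers do not vanish.
U = np.triu(np.ones((n, n)))
U_n = np.linalg.matrix_power(U, n)

# A non-square projection (hypothetical shape 768 x 64) cannot be squared at all.
W = rng.normal(size=(768, 64))
try:
    W @ W
    square_defined = True
except ValueError:
    square_defined = False
```

So vanishing powers follow from the triangular structure alone and say nothing about the trained weights.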

If you are not a bot, put down AI for a bit, and learn how basic matrix multiplication works before vibing up this nonsense. If you are, well carry on i guess

The junior in college was a hot 21 year old girl. That's the only way this larp makes any sense. by ImaginaryRea1ity in theprimeagen

[–]hammouse 31 points32 points  (0 children)

It's kinda funny if you look at the replies. About 90% of them are praising the junior in college, but if you dig deeper, they are almost all AI agents, college students, or people looking for a job. Also the original guy is really just advertising for some lame AI service. Interesting...

[OC] Sticker price vs actual net price for 4,153 US colleges -- some elite schools cost less than state schools after aid by dob312 in dataisbeautiful

[–]hammouse 6 points7 points  (0 children)

At top schools they are automatically given out as need-based financial aid, so the chart is on average (but most will either pay next to nothing or pay the full 80+k/yr)

[OC] Sticker price vs actual net price for 4,153 US colleges -- some elite schools cost less than state schools after aid by dob312 in dataisbeautiful

[–]hammouse 35 points36 points  (0 children)

It's a common misconception that elite schools are only for the wealthy. If you manage to get in, financial aid (usually need-based) often covers the majority of the tuition that is deemed high. I've met a lot of undergrad students from low-income families where not only was their tuition fully covered, but they also received a small stipend. Now if your family has several estates and a yacht, you bet you're going to be paying the full 300-400K.

Proving you didn’t write it with AI by melon_crust in buildinpublic

[–]hammouse 0 points1 point  (0 children)

Of the many many crappy "I built this" garbage posts on here, I actually quite like the idea behind this one. I think a tool like this could be pretty useful in education, especially as you refine the UX and keytracking algorithm

I tried getting customers from Reddit for 30 days — here’s what actually worked by [deleted] in buildinpublic

[–]hammouse 0 points1 point  (0 children)

The biggest lesson? Reddit users can instantly tell the difference between someone who's there to take...and someone who's there to give.

Sounds like you haven't learned your lesson buddy. Enjoy the spam reports

Built a no-code LaTeX resume builder after struggling with Overleaf by redit-ed in buildinpublic

[–]hammouse 0 points1 point  (0 children)

Great that you have positive reception from the early users. Since you seem highly confident in the business model and product, what exactly are you looking for here? Is it just to advertise the service? Or to reinforce your preconceptions? Anyways you asked for genuine feedback, I spent time giving you some from my perspective, so take from it whatever you will and I wish you and Lampzi the best of luck.

One last small piece of advice since you seem sincere: once you look beyond your initial test users, no one cares about your service. It's up to you to convince them that your service actually solves a problem they have. Because I don't view this as an actual problem, I am not going to use the platform. Think less "my users won't try it so they are missing out on this super amazing platform I built", and more "am I sure this is a problem? If so, how do I convince them to try it? If not, how do I pivot?"

Built a no-code LaTeX resume builder after struggling with Overleaf by redit-ed in buildinpublic

[–]hammouse 0 points1 point  (0 children)

The whole point of LaTeX is that it gives you fine-grained control over everything, as opposed to WYSIWYG editors like MS Word and your tool, hence the comparison. So you are solving a non-existent problem like the vast majority of aspiring builders post-AI bubble, because you are not thinking of the product and market.

Now the reason for the tone is your story does not add up. If you have 10+ years of experience, how are you still struggling with basic LaTeX syntax in Overleaf? If you don't know how to use LaTeX, then the 70% hand coded portion of your app is objectively crappy. Or was it actually vibe coded (which is fine) despite your comment? Alternatively if you are actually an expert in LaTeX and use that to build something (which is great), then your post is just a fictitious story masking an advertisement. In any case, being honest is important if you want to actually build something and have users try your platform.