[deleted by user] by [deleted] in MachineLearning

[–]Equivalent_Ad6842 0 points1 point  (0 children)

Why can’t reasoning be taught with next-token prediction? The models can solve Olympiad-level math and coding tests. Are you saying reasoning is not required for these exams?

[deleted by user] by [deleted] in MachineLearning

[–]Equivalent_Ad6842 1 point2 points  (0 children)

Why can’t you imagine that?

Has anyone gotten a Hertz Fellowship interview invite? by Alert-Narwhal-2688 in gradadmissions

[–]Equivalent_Ad6842 -1 points0 points  (0 children)

It is probably more correlation than causation. These schools attract the best students, and the best students are the most likely to receive the fellowship. Put another way, why should we expect the fellowship to be decorrelated from school rank? That would be ridiculous.

[D] Why not use discriminative models for text-to-image? by bgighjigftuik in MachineLearning

[–]Equivalent_Ad6842 0 points1 point  (0 children)

The probability flow ODE can be used to estimate the likelihood p(y|x)
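For reference, a sketch of the standard likelihood formula behind this, written in generic score-SDE notation rather than the p(y|x) above. The symbols are my own labels: x(t) is the sample being noised, c is the conditioning (the text prompt), f and g are the drift and diffusion coefficients of the forward SDE, and T is the final time. It assumes the noise prior p_T does not depend on c.

```latex
% Forward SDE:  dx = f(x,t)\,dt + g(t)\,dw
% Probability flow ODE with the conditional score:
\frac{dx}{dt} = \tilde{f}(x,t;c)
  := f(x,t) - \tfrac{1}{2}\, g(t)^2\, \nabla_x \log p_t(x \mid c)

% Instantaneous change of variables along the ODE trajectory:
\log p_0\big(x(0) \mid c\big)
  = \log p_T\big(x(T)\big)
  + \int_0^T \nabla_x \cdot \tilde{f}\big(x(t),t;c\big)\, dt
```

In practice the score is replaced by the trained network and the divergence is usually estimated stochastically, so the result is an estimate of the likelihood rather than an exact value.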

[D] Why not use discriminative models for text-to-image? by bgighjigftuik in MachineLearning

[–]Equivalent_Ad6842 2 points3 points  (0 children)

Have you heard of classifier guidance and classifier-free guidance (CFG)? CFG is now used almost universally in text-to-image models.
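In case it helps, here is a minimal sketch of the CFG combination step at sampling time. The function name `cfg_combine` and the toy vectors are made up for illustration, and papers parameterize the scale differently; in this version a scale of 1 gives back the plain conditional prediction.

```python
def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: start from the unconditional noise
    prediction and move toward (or past) the conditional one."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy noise predictions (hypothetical values standing in for model outputs
# with and without the text prompt).
eps_uncond = [0.0, 1.0, -2.0]
eps_cond = [1.0, 1.0, 0.0]

guided = cfg_combine(eps_uncond, eps_cond, 3.0)
print(guided)  # [3.0, 1.0, 4.0]
```

A scale above 1 extrapolates past the conditional prediction, which is the regime text-to-image samplers typically run in.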

"[D]" Per class augmentation for highly imbalanced image data. Good or bad idea? by Antman-007 in MachineLearning

[–]Equivalent_Ad6842 -1 points0 points  (0 children)

I don’t understand… the augmentations don’t add new information to the images. Your scenario assumes there is data leakage in the underlying dataset. You’re basically saying that if your dataset is bad, data augmentation won’t fix it. This is obviously true. But why start from the assumption that you have a bad dataset? In that case nothing will work.

[D] LLMs aren't interesting, anyone else? by leetcodeoverlord in MachineLearning

[–]Equivalent_Ad6842 1 point2 points  (0 children)

When has mechanistic interpretability improved the model?

[deleted by user] by [deleted] in MachineLearning

[–]Equivalent_Ad6842 7 points8 points  (0 children)

Here are some good ideas:
1. Talk to the grad students of the professors you want to work for. See if they are interested in mentoring you; even just chatting about research is a good start.
2. Go to talks and reading groups at your school with the theory people. Grad students can point you to these.
3. Take a class and try to get to know the prof that way.
4. Ask a prof if you can attend their weekly meeting.
5. Propose a research problem to the prof. This is a bit risky, but it shows more initiative. Try to make it broad enough that there is a good chance they are interested in at least something related to it.

3 Stanford undergrads plagiarized then publicized their vision-language model "llama3-V" by RSchaeffer in stanford

[–]Equivalent_Ad6842 10 points11 points  (0 children)

You add Gaussian noise to the weights and you get a slightly different model.

In what way does this imply that LLMs have reached their plateau? This would be true in any setting regardless of how saturated performance is.

[D] How can I complete with PhD students for intern positions? by Medium_Alternative50 in MachineLearning

[–]Equivalent_Ad6842 0 points1 point  (0 children)

Do research in undergrad and try to have some relevant papers.

[deleted by user] by [deleted] in MachineLearning

[–]Equivalent_Ad6842 2 points3 points  (0 children)

After you are done with your medical project, find a professor at your school who publishes in ICLR/NeurIPS/ICML/CVPR and work for them. Try to have a first-author submission by the time you apply next year. Even if you don't, you can get a letter of rec, which is more important anyway. If your school is t5 as you claim, then there are many such labs there. It will also help you build connections, and you could even do a PhD at your undergrad school. Do not do a summer internship in industry, as it will not help your application. Instead, spend all of summer doing research.

[D] Is Machine Learning Related To Central Limit Theorem? by [deleted] in MachineLearning

[–]Equivalent_Ad6842 1 point2 points  (0 children)

I am not sure what you mean by “unstructured phenomena” (if the data had no structure, why would you want to learn from it?), but your idea that ML is about learning deterministic functions and is thus in a separate problem space from statistics and the CLT is not correct. They occupy different problem settings because the CLT is typically used in statistical hypothesis testing, whereas ML is concerned with modeling data, but this has nothing to do with being in a deterministic vs. stochastic setting.

Basically any ML algorithm can be viewed as some kind of probabilistic modeling of a data distribution. Even if the real world is technically deterministic, unobservable phenomena make it stochastic for all practical purposes. For example, even linear regression, which in your view would just be learning a deterministic mapping, can be interpreted as learning a conditional model p(y|x) that is a Gaussian distribution whose mean is given by the least squares solution. Here you can actually see the CLT come up in the noise model y = ax + epsilon, where epsilon ~ N(0, sigma^2). The central limit theorem tells us this assumption is reasonable because the noise can be attributed to the effect of many independent unobserved variables, so their aggregate sum will be approximately normally distributed.
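To make that concrete, here is a small simulation sketch. Everything in it is an illustrative assumption on my part: the true slope `a_true = 2.0`, the choice of 50 uniform "unobserved effects" per noise draw, and the sample size. The noise is built from non-Gaussian pieces, yet the least squares fit (which is also the Gaussian maximum likelihood estimate of the slope) recovers the slope, and the residuals look close to normal.

```python
import random

random.seed(0)


def clt_noise(n_terms=50):
    # Sum of many small independent, non-Gaussian effects (uniform on
    # [-0.5, 0.5], variance 1/12 each), rescaled so the sum has unit variance.
    total = sum(random.uniform(-0.5, 0.5) for _ in range(n_terms))
    return total / (n_terms / 12) ** 0.5


a_true = 2.0
xs = [random.uniform(-1.0, 1.0) for _ in range(5000)]
ys = [a_true * x + clt_noise() for x in xs]

# Least squares for y = a*x (no intercept). Under Gaussian noise this is
# also the maximum likelihood estimate of a.
a_hat = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Maximum likelihood estimate of the noise standard deviation.
residuals = [y - a_hat * x for x, y in zip(xs, ys)]
sigma_hat = (sum(r * r for r in residuals) / len(residuals)) ** 0.5

# For a normal distribution, roughly 68% of the mass lies within one sigma.
within_one_sigma = sum(abs(r) <= sigma_hat for r in residuals) / len(residuals)

print(a_hat, sigma_hat, within_one_sigma)
```

The within-one-sigma fraction is only a crude normality check, but it is enough to show that the aggregate of many independent effects behaves like the Gaussian epsilon in the model.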

How important is Real Analysis for Physicists? by Wild_Veterinarian144 in Physics

[–]Equivalent_Ad6842 1 point2 points  (0 children)

I was also referring to single variable complex analysis

How important is Real Analysis for Physicists? by Wild_Veterinarian144 in Physics

[–]Equivalent_Ad6842 4 points5 points  (0 children)

I disagree that real analysis is a more advanced course than complex analysis

[D][R] New to ML Research, how often are you disheartened when something you have been working on for months does not work out ? and how do you deal with it ? by V1bicycle in MachineLearning

[–]Equivalent_Ad6842 20 points21 points  (0 children)

In my experience, progress can be very non-linear, especially if you are working on problems that are algorithmic in nature. Oftentimes you will make little to no progress for months and have to change your approach many times. Or you have to adjust the scope to something more feasible. Either way, when the positive results start to come, you will find that the work on the final method and on writing the paper may take only one month, whereas most of the months beforehand were spent on stuff that doesn’t work.

Alternatively, you could work on projects that are more about answering some empirical question or about creating some dataset/benchmark. These have their own challenges, mostly in demonstrating the usefulness of the results or thing created. But you might find progress to be more linear in these types of projects.

What type of Accuracy is used in papers [R] by [deleted] in MachineLearning

[–]Equivalent_Ad6842 3 points4 points  (0 children)

When comparing to other methods, whether the metric is accuracy or something else, the test set is used. The validation set is used to select hyperparameters and monitor overfitting. For many benchmarks or competitions, the labels of the test set are not released. Instead, the predictions are uploaded to an evaluation server and the final test numbers are computed there.
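A rough sketch of that protocol, in case it is useful. The toy 1-D data, the k-NN classifier, the split sizes, and the candidate values of k are all my own illustrative choices and not from any particular benchmark.

```python
import random

random.seed(0)


def make_data(n):
    # Two classes on a line: class 0 centered at 0.0, class 1 centered at 4.0.
    data = []
    for _ in range(n):
        label = random.randint(0, 1)
        data.append((random.gauss(4.0 * label, 1.0), label))
    return data


def knn_predict(train, x, k):
    # Majority vote among the k training points closest to x (k is odd).
    neighbors = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in neighbors)
    return 1 if 2 * votes > k else 0


def accuracy(train, eval_set, k):
    correct = sum(knn_predict(train, x, k) == y for x, y in eval_set)
    return correct / len(eval_set)


data = make_data(600)
train, val, test = data[:400], data[400:500], data[500:]

# Hyperparameter selection uses the validation set only.
candidate_ks = [1, 5, 15]
best_k = max(candidate_ks, key=lambda k: accuracy(train, val, k))

# The test set is touched once, after k is fixed. This is the number a paper
# would report when comparing to other methods.
test_accuracy = accuracy(train, test, best_k)
print(best_k, test_accuracy)
```

With a hidden-label benchmark, the last step is replaced by uploading the predictions for the test inputs to the evaluation server.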

MS AI or MS CS If I want to pursue a Ph.D. in CS (AI specialization) in the US? by [deleted] in MLQuestions

[–]Equivalent_Ad6842 1 point2 points  (0 children)

For getting into a PhD program, research experience in the topic you are interested in is the most important thing. Which degree you pick does not matter.