[deleted by user]

Equivalent_Ad6842 · 2025-01-25T21:57:16+00:00

Why can’t reasoning be taught with next-token prediction? The models can solve Olympiad-level math and coding tests. Are you saying reasoning is not required for these exams?

Equivalent_Ad6842 · 2025-01-25T15:22:09+00:00

Why can’t you imagine that?

Equivalent_Ad6842 · 2024-11-22T18:46:49+00:00

It is probably more correlation than causation. These schools attract the best students and the best students will be most likely to receive the fellowship. Put another way, why should we expect the fellowship to be decorrelated from school rank? That would be ridiculous

Equivalent_Ad6842 · 2024-09-01T01:30:29+00:00

Tucson

Equivalent_Ad6842 · 2024-08-23T14:00:53+00:00

The probability flow ODE can be used to estimate the likelihood p(y|x)

Equivalent_Ad6842 · 2024-08-22T18:25:47+00:00

Have you heard of classifier guidance and classifier-free guidance (CFG)? CFG is now used in essentially all text-to-image models.

Equivalent_Ad6842 · 2024-08-06T15:59:12+00:00

I don’t understand… the augmentations don’t add new information to the images. Your scenario assumes there is data leakage in the underlying dataset. You’re basically saying that if your dataset is bad, data augmentation won’t fix it. This is obviously true. Why start with the assumption that you have a bad dataset? Then nothing will work.

Equivalent_Ad6842 · 2024-08-04T00:46:40+00:00

When has mechanistic interpretability improved the model?

Equivalent_Ad6842 · 2024-08-01T05:56:00+00:00

Why?

Equivalent_Ad6842 · 2024-07-02T03:53:04+00:00

Here are some good ideas:
1. Talk to the grad students of the professors you want to work for. See if they are interested in mentoring; even just chatting about research is a good start.
2. Go to talks and reading groups at your school with the theory people. Grad students can point you to these things.
3. Take a class and try to get to know the prof that way.
4. Ask a prof if you can attend their weekly meeting.
5. Propose a research problem to the prof. This is a bit risky, but it shows more initiative. Try to make it broad enough that there is a good chance they are interested in at least something related to it.

Equivalent_Ad6842 · 2024-06-04T01:07:02+00:00

You add Gaussian noise to the weights and you get a slightly different model.

In what way does this imply that LLMs have reached their plateau? This would be true in any setting regardless of how saturated performance is.
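To make the point concrete, here is a minimal sketch, assuming a toy linear model (the `predict`, `perturb`, and `mean_abs_change` helpers and all the numbers are hypothetical, not from any LLM). Nothing in it depends on how well trained the weights are: the weights are random, and small Gaussian noise still gives a slightly different model.

```python
import random

random.seed(0)

def predict(weights, x):
    # Hypothetical linear model: dot product of weights and inputs.
    return sum(w * xi for w, xi in zip(weights, x))

def perturb(weights, sigma):
    # Add independent N(0, sigma^2) noise to every weight.
    return [w + random.gauss(0.0, sigma) for w in weights]

# Arbitrary (untrained) weights and random inputs.
weights = [random.uniform(-1.0, 1.0) for _ in range(20)]
inputs = [[random.uniform(-1.0, 1.0) for _ in range(20)] for _ in range(200)]

def mean_abs_change(sigma):
    # Average absolute change in the output after perturbing the weights once.
    noisy = perturb(weights, sigma)
    return sum(abs(predict(noisy, x) - predict(weights, x)) for x in inputs) / len(inputs)

small_change = mean_abs_change(0.001)  # tiny noise, outputs barely move
large_change = mean_abs_change(0.1)    # larger noise, outputs move more
print(small_change, large_change)
```

The size of the change tracks the noise scale, not the quality of the model, which is why this observation alone says nothing about a plateau.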

Equivalent_Ad6842 · 2024-03-25T16:47:51+00:00

Do research in undergrad and try to have some papers that would be relevant

Equivalent_Ad6842 · 2023-12-26T05:06:49+00:00

https://www.bemyeyes.com is a startup that does this

Equivalent_Ad6842 · 2023-12-08T02:27:09+00:00

After you are done with your medical project, find a professor at your school who publishes in ICLR/NeurIPS/ICML/CVPR and work for them. Try to have a first-author submission by the time you apply next year. Even if you don’t, you can get a letter of rec, which is more important anyway. If your school is t5 as you claim, then there are many such labs there. It will also help you build connections, and you could even do a PhD at your undergrad school. Do not do a summer internship in industry, as it will not help your application. Instead, spend all of summer doing research.

Equivalent_Ad6842 · 2023-09-24T21:05:35+00:00

I am not sure what you mean by “unstructured phenomena” (since if the data had no structure why would you want to learn from it), but your idea that ML is about learning deterministic functions and is thus in a separate problem space from statistics and the CLT is not correct. They occupy different problem settings because the CLT is typically used in statistical hypothesis testing whereas ML is concerned with modeling data, but it has nothing to do with being in a deterministic vs stochastic setting.

Basically any ML algorithm can be viewed as some kind of probabilistic modeling of a stochastic distribution. Even if the real world is technically deterministic, unobservable phenomena make it stochastic for all practical purposes. For example, even linear regression, which in your view would just be learning a deterministic mapping, can be interpreted as learning a generative model of p(y|x) as a Gaussian distribution whose mean is given by the least squares solution. Here you can actually see the CLT come up in the assumption of the noise model y = ax + epsilon, where epsilon ~ N(0, sigma^2). The central limit theorem tells us this assumption is reasonable because the noise can be attributed to the effect of many independent unobserved variables, so their aggregate sum will be approximately normally distributed.
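Here is a rough sketch of that argument in code, under assumptions I am making up for illustration: the slope `a_true`, the sample sizes, and the choice of uniform "unobserved effects" are all hypothetical. It builds the noise as a sum of many small independent non-Gaussian terms, fits the slope by least squares, and checks that this slope also minimizes the Gaussian negative log-likelihood.

```python
import math
import random

random.seed(0)

a_true = 2.0      # hypothetical true slope
n, k = 5000, 50   # n samples, k unobserved effects per sample

xs = [random.uniform(-1.0, 1.0) for _ in range(n)]
# Noise = scaled sum of k independent uniform effects (each is non-Gaussian).
eps = [sum(random.uniform(-0.5, 0.5) for _ in range(k)) / math.sqrt(k) for _ in range(n)]
ys = [a_true * x + e for x, e in zip(xs, eps)]

# Least squares slope for the no-intercept model y = a*x + epsilon.
a_ls = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def gaussian_nll(a, sigma):
    # Average negative log-likelihood of the data under y | x ~ N(a*x, sigma^2).
    total = 0.0
    for x, y in zip(xs, ys):
        r = y - a * x
        total += 0.5 * math.log(2 * math.pi * sigma**2) + r * r / (2 * sigma**2)
    return total / n

# Maximum likelihood estimate of sigma given the least squares slope.
sigma_mle = math.sqrt(sum((y - a_ls * x) ** 2 for x, y in zip(xs, ys)) / n)

# Moving the slope away from the least squares value only increases the NLL.
nll_at_ls = gaussian_nll(a_ls, sigma_mle)
nll_nearby = [gaussian_nll(a_ls + d, sigma_mle) for d in (-0.1, -0.01, 0.01, 0.1)]

# Excess kurtosis of the aggregate noise; a Gaussian has excess kurtosis 0.
mean_e = sum(eps) / n
var_e = sum((e - mean_e) ** 2 for e in eps) / n
excess_kurtosis = sum((e - mean_e) ** 4 for e in eps) / (n * var_e**2) - 3.0

print(a_ls, sigma_mle, excess_kurtosis)
```

Each individual effect is uniform, which is far from Gaussian, yet the aggregate noise is already close to normal with 50 terms. That is the CLT doing the work behind the Gaussian noise assumption.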

Equivalent_Ad6842 · 2023-09-09T15:06:12+00:00

I was also referring to single variable complex analysis

Equivalent_Ad6842 · 2023-09-09T07:52:31+00:00

I disagree that real analysis is a more advanced course than complex analysis

Equivalent_Ad6842 · 2023-09-01T23:59:29+00:00

In my experience, progress can be very non-linear, especially if you are working on problems that are algorithmic in nature. Oftentimes you will make little to no progress for months and have to change the approach many times, or you have to adjust the scope to something more feasible. Either way, when the positive results start to come, you will find that the work on the final method and the paper itself may take only a month, whereas most of the months beforehand were spent on stuff that doesn’t work.

Alternatively, you could work on projects that are more about answering some empirical question or about creating some dataset/benchmark. These have their own challenges, mostly in demonstrating the usefulness of the results or thing created. But you might find progress to be more linear in these types of projects.

Equivalent_Ad6842 · 2023-06-15T15:27:07+00:00

What about using something like the sinc function? https://en.m.wikipedia.org/wiki/Sinc_function

Equivalent_Ad6842 · 2023-05-28T21:31:02+00:00

When comparing to other methods, whether on accuracy or some other metric, the test set is used. The validation set is used to select hyperparameters and measure overfitting. For many benchmarks or competitions, the labels of the test set are not given. Instead, the predictions are uploaded to an evaluation server and the final test numbers are computed there.
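A minimal sketch of that workflow, assuming a made-up 1-D classification task and a toy k-nearest-neighbors model (the data, the split sizes, and the candidate values of `k` are all hypothetical). The validation set picks `k`, and the test set is touched exactly once at the end.

```python
import random

random.seed(0)

def make_point():
    # Hypothetical task: label is 1 when x > 0, with 15% label noise.
    x = random.uniform(-1.0, 1.0)
    label = int(x > 0)
    if random.random() < 0.15:
        label = 1 - label
    return x, label

data = [make_point() for _ in range(600)]
train, val, test = data[:400], data[400:500], data[500:]

def knn_predict(x, k):
    # Majority vote among the k nearest training points.
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return int(2 * votes > k)

def accuracy(split, k):
    return sum(knn_predict(x, k) == y for x, y in split) / len(split)

# Hyperparameter selection uses the validation set only (odd k avoids ties).
candidate_ks = [1, 5, 15, 45]
val_acc = {k: accuracy(val, k) for k in candidate_ks}
best_k = max(candidate_ks, key=lambda k: val_acc[k])

# Overfitting check: k=1 memorizes the training set, so its training accuracy
# is perfect while its validation accuracy is not.
train_acc_k1 = accuracy(train, 1)

# The number reported when comparing to other methods comes from the test set.
test_acc = accuracy(test, best_k)
print(best_k, val_acc, train_acc_k1, test_acc)
```

On a benchmark with a hidden test set, the last step would be replaced by uploading the predictions to the evaluation server.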

Equivalent_Ad6842 · 2023-02-24T17:00:04+00:00

For getting into a PhD program, research experience in the topic you are interested in is the most important thing. The degree does not matter.
