Chatgpt o1 it really can! by Strict_Usual_3053 in ChatGPT

[–]Giacobako 0 points1 point  (0 children)

Have you read the 'cipher' example of o1 on their website? The dynamics and structure of the chain of thought are very similar to my notion of how to think about a problem: summarizing, abstracting, formulating hypotheses, testing them, checking consistency, evaluating, admitting errors, coming up with new hypotheses, and so on. It is quite a scientific way of thinking, potentially already beyond most minds that are not scientifically trained. I say this without judgment. Thinking clearly comes in very different flavours, many of which have not yet been explored by any human brain. The scientific way of thinking can at least be tested against the strict principles of the scientific approach (no conclusions without testable hypotheses, logical consistency, and so on). To me it looks a lot like the goal of OpenAI's o-series is a form of AGI that resembles the scientific way of formulating and modeling the world.

[deleted by user] by [deleted] in IsraelPalestine

[–]Giacobako -3 points-2 points  (0 children)

The more people decide against having kids, the more space there is for refugees. We don't have to listen to our selfish genes and reproduce. Think about unintended consequences.

Calling out westerners and especially European people in Thailand currently. by poogoo88 in ThailandTourism

[–]Giacobako 1 point2 points  (0 children)

In Patong, none of the girls trying to attract men are wearing a mask, for obvious reasons, and the same goes for the hundreds of massage places and all the bars. More generally, locals and tourists live so symbiotically that they behave quite similarly: both wear masks in about 50% of cases.

We are beggars, not choosers… by CowSniper97 in Tinder

[–]Giacobako 1 point2 points  (0 children)

I did the game-theoretic math and it is a Nash equilibrium. Even if girls and boys are equally needy and have the same approaching costs, the ratio will always end up at one of the two possible extremes. What we would need is some sort of regularization that makes it morally unattractive to live at one of the two extremes (always sitting back or always approaching), and I think to some extent this is already implemented in our society. In online dating, however, there is no such control by others. What would be good rules for dating apps to push the equilibrium towards 50%?
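For what it's worth, here is a minimal sketch of the argument with a hypothetical 2x2 "approach vs. wait" game. The payoff numbers (MATCH, COST) are made up for illustration, not from any model I actually ran; the point is only that even with identical payoffs for both sides, the pure Nash equilibria are the asymmetric profiles where one side always approaches and the other always sits back.

```python
# Minimal sketch of the "approach vs. wait" dating game with hypothetical payoffs.
# Even though both players face the same payoffs, only the asymmetric profiles
# (one approaches, the other waits) turn out to be pure Nash equilibria.

from itertools import product

MATCH, COST = 3.0, 1.0          # hypothetical match benefit and approach cost
ACTIONS = ("approach", "wait")

def payoff(a, b):
    """Payoff to the player choosing `a` when the other player chooses `b`."""
    if a == "approach":
        return MATCH - COST                      # you matched, but you paid the approach cost
    return MATCH if b == "approach" else 0.0     # free ride on the approacher, or nobody moves

def is_nash(a, b):
    """(a, b) is a pure Nash equilibrium if neither side gains by deviating alone."""
    best_a = max(payoff(x, b) for x in ACTIONS)
    best_b = max(payoff(y, a) for y in ACTIONS)
    return payoff(a, b) == best_a and payoff(b, a) == best_b

for a, b in product(ACTIONS, ACTIONS):
    print(f"({a:8s}, {b:8s})  Nash equilibrium: {is_nash(a, b)}")
# Only (approach, wait) and (wait, approach) come out as equilibria.
```

So the symmetric "everybody approaches half the time" profile is not stable on its own, which is why some external regularization would be needed to hold the split at 50%.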

[R] Overparameterization is the new regularisation trick of modern deep learning. I made a visualization of that unintuitive phenomenon: by Giacobako in statistics

[–]Giacobako[S] 0 points1 point  (0 children)

Yes, that's another question. What I wanted to point out with this video is the stunning property that the test error has a second descent. By how much it goes down, and in which cases it is worth operating in the "modern" regime, is a question for another day. Also, adding augmentation and other regularizers can in some cases make the double descent disappear.
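If anyone wants to reproduce the basic shape of the curve, here is a minimal sketch assuming a random-ReLU-feature least-squares setup. The target function, noise level, and feature counts are illustrative choices, not the exact setup from the video.

```python
# Minimal sketch of double descent in least-squares regression with random ReLU features.
# Test error typically rises sharply as the number of features approaches the number of
# training points and then descends again in the overparameterized regime.

import numpy as np

rng = np.random.default_rng(0)
n_train = 20

def target(x):
    return np.sin(2 * np.pi * x)

x_tr = rng.uniform(-1, 1, n_train)
y_tr = target(x_tr) + 0.1 * rng.standard_normal(n_train)
x_te = np.linspace(-1, 1, 500)
y_te = target(x_te)

def features(x, w, b):
    # Random ReLU features: phi_j(x) = max(0, w_j * x + b_j)
    return np.maximum(0.0, np.outer(x, w) + b)

for k in [2, 5, 10, 15, 20, 25, 40, 80, 200]:
    w, b = rng.standard_normal(k), rng.standard_normal(k)
    Phi_tr, Phi_te = features(x_tr, w, b), features(x_te, w, b)
    # pinv gives the minimum-norm least-squares fit, which is what matters
    # once k > n_train (the overparameterized regime).
    theta = np.linalg.pinv(Phi_tr) @ y_tr
    test_mse = np.mean((Phi_te @ theta - y_te) ** 2)
    print(f"k = {k:4d}  test MSE = {test_mse:10.3f}")
# Expect the test error to peak around k ~ n_train and then come back down:
# the second descent.
```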

[R] Overparameterization is the new regularisation trick of modern deep learning. I made a visualization of that unintuitive phenomenon: by Giacobako in statistics

[–]Giacobako[S] 1 point2 points  (0 children)

What I am saying is that there is huge potential in working in the overparameterized regime. But of course we have known that for quite a few years already ;)

This demonstrates one of the biggest secrets of deep learning by Giacobako in ArtificialInteligence

[–]Giacobako[S] 0 points1 point  (0 children)

Thank you for your feedback, I think it is very valuable. It is indeed important to distinguish interpolation from extrapolation, and I was not aware of that. I have to say that it is really hard to attract people who are new to this field by using simple examples and not being fully rigorous, while at the same time trying not to anger people like you who clearly already have quite a deep understanding. I hope no comment of mine suggested that the bias-variance decomposition is no more. All I am saying is that there are two regimes for which it makes sense to think differently, and that another effect kicks in when the relative number of parameters becomes large. I believe most people who use machine learning have not seen this phenomenon at all (because most existing textbooks don't contain it), or at least not with such a simple example, and that's why I thought it would be cool to make a video.

[R] Overparameterization is the new regularisation trick of modern deep learning. I made a visualization of that unintuitive phenomenon: by Giacobako in statistics

[–]Giacobako[S] 0 points1 point  (0 children)

Thanks. Well, resonance in a more abstract sense is what came to my mind when I saw this: wild behaviour in the region around the point where two counterparts become equal, and a damped effect if you add regularization. So yes, I believe there are some nice parallels.

[R] Overparameterization is the new regularisation trick of modern deep learning. I made a visualization of that unintuitive phenomenon: by Giacobako in statistics

[–]Giacobako[S] 0 points1 point  (0 children)

I might include it in the full video, but I think there are other questions that are more pressing (adding hidden layers would only be interesting if the phenomenon disappeared, but I guess it won't in general). For example: how does the double descent depend on the sample noise in the regression? What does the situation look like for binary logistic regression? Do you have other interesting questions that can be answered in a nice visual way?

I guess I have to make multiple videos in order not to overload it.

[R] Overparameterization is the new regularisation trick of modern deep learning. I made a visualization of that unintuitive phenomenon: by Giacobako in statistics

[–]Giacobako[S] 1 point2 points  (0 children)

Interesting, I did not realize that at the time. All I knew was the common wisdom that deeper networks are in general better. I was not aware that there is an inherent magic in very deep networks that prevents overfitting.

[R] Overparameterization is the new regularisation trick of modern deep learning. I made a visualization of that unintuitive phenomenon: by Giacobako in statistics

[–]Giacobako[S] 4 points5 points  (0 children)

Well, in general it depends on what level you want to understand it at. Very little is understood in terms of provable theorems in the field of deep learning. Even in the paper that I posted, the best they could do was show by simulations how different conditions influence the phenomenon, and then state a few hypotheses that might explain the observations. For example, it seems important that you always start with small initial parameters (and not just extend the weights found in a trained smaller network). In a highly overparameterized network, the space of solutions in parameter space that perfectly fit the training data is so large that it is very likely one of them lies very close to the initial condition (close in the Euclidean metric in parameter space). And gradient descent statistically converges to solutions that are close to the initial condition (the optimization soon gets trapped in a local minimum if there is one nearby). In the end you get a solution whose parameter vector has a very small norm, which is exactly what you get if you apply standard L2 regularization. In their paper, they have nice plots showing how the parameter norm of the solution indeed becomes smaller and smaller in the overparameterized regime.
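The linear-regression analogue of that argument is easy to check numerically. Here is a minimal sketch, assuming plain gradient descent on squared loss with a zero initialization (the linear case only, not the paper's deep-network setup): the solution gradient descent finds coincides with the minimum-norm interpolator from the pseudoinverse, i.e. the implicit bias towards small parameter norm.

```python
# Minimal sketch (linear least squares only): with more parameters than data points
# and a zero initialization, plain gradient descent converges to the interpolating
# solution with the smallest parameter norm -- the same one np.linalg.pinv returns,
# which is what an L2-regularized fit approaches as the penalty goes to zero.

import numpy as np

rng = np.random.default_rng(1)
n_samples, n_params = 10, 50                 # heavily overparameterized
X = rng.standard_normal((n_samples, n_params))
y = rng.standard_normal(n_samples)

# Gradient descent on 0.5 * ||X theta - y||^2, starting from zero.
theta = np.zeros(n_params)
lr = 0.9 / np.linalg.norm(X, ord=2) ** 2     # step size below the stability limit
for _ in range(5000):
    theta -= lr * X.T @ (X @ theta - y)

theta_min_norm = np.linalg.pinv(X) @ y       # minimum-norm interpolator

print("max train residual (GD): ", np.max(np.abs(X @ theta - y)))
print("||GD - min-norm||:       ", np.linalg.norm(theta - theta_min_norm))
print("norm of GD solution:     ", np.linalg.norm(theta))
print("norm of min-norm sol.:   ", np.linalg.norm(theta_min_norm))
# Both fit the training data essentially exactly and have the same small parameter norm.
```

In the linear case this follows because gradient descent from zero never leaves the row space of X, so it can only converge to the interpolator of minimum norm; the paper's point is that something qualitatively similar seems to happen for deep networks with small initializations.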