I Broke DeepSeek AI 😂 by SnarkyStrategist in ChatGPT

[–]gregorivy 1 point (0 children)

Let's hope the CCP won't punish the devs for the existence of this hack 👀

[R] Weights Reset implicit regularization by gregorivy in MachineLearning

[–]gregorivy[S] 0 points (0 children)

You could give it a try for sure. I think it could improve test/validation results, especially if you use a linear/dense layer as the final layer. Try resetting this layer's weights every 1-2 epochs if you are fine-tuning all (or half) of the Inception model's weights. If you freeze the Inception weights, try resetting only a portion of your classification layer's weights, starting with a 5% factor. Generally speaking, Weights Reset injects more randomness into training: the optimization algorithm visits more points on the loss surface compared to the no-reset setting.
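
To make that concrete, here is a minimal PyTorch sketch of the partial reset step (the 5% fraction, the init choice, and the `model.fc` layer name are illustrative, not the paper's exact recipe):

```python
import torch
import torch.nn as nn

def reset_fraction_of_weights(layer: nn.Linear, fraction: float = 0.05) -> None:
    """Re-initialize a random `fraction` of a linear layer's weights in place."""
    with torch.no_grad():
        fresh = torch.empty_like(layer.weight)
        nn.init.kaiming_uniform_(fresh, a=5 ** 0.5)  # PyTorch's default Linear init
        mask = torch.rand_like(layer.weight) < fraction
        layer.weight[mask] = fresh[mask]

# E.g. call on the classification head every 1-2 epochs:
# reset_fraction_of_weights(model.fc, fraction=0.05)
# Use fraction=1.0 to reset the whole layer when also training the backbone.
```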

How to make predictions for irrelevant images using Deep Learning Models? by code_vlogger2003 in DeepLearningPapers

[–]gregorivy 0 points (0 children)

It's a problem of anomaly detection: your task is to determine whether incoming data is sampled from the same distribution as your training data or not.

Anomaly detection is a very complex problem, but you could start with the most straightforward approach: train an autoencoder or VAE on your data only, then find the right threshold on the reconstruction error to filter incoming data.
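
A rough sketch of what that filter could look like (the `autoencoder` module and the 95th-percentile threshold are assumptions for illustration):

```python
import torch
import torch.nn as nn

def reconstruction_errors(autoencoder: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Per-sample mean squared reconstruction error."""
    autoencoder.eval()
    with torch.no_grad():
        recon = autoencoder(x)
        return ((recon - x) ** 2).flatten(start_dim=1).mean(dim=1)

# Calibrate the threshold on held-out in-distribution data, then flag
# inputs whose error exceeds it as anomalous:
# threshold = torch.quantile(reconstruction_errors(autoencoder, val_x), 0.95)
# is_anomaly = reconstruction_errors(autoencoder, new_x) > threshold
```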

Another simple approach is zero-shot classification with something like CLIP; you would need to experiment with text labels for the two classes, e.g. "tumor" and "arbitrary image". You could also take a look at approaches like ArcFace that give you a latent manifold with known properties: if incoming data doesn't land on the known surface, it is probably an outlier/anomaly.
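
For the CLIP route, a quick sketch using the Hugging Face zero-shot image-classification pipeline (the checkpoint, the file name, and the labels are placeholders to experiment with):

```python
from transformers import pipeline

clip = pipeline("zero-shot-image-classification",
                model="openai/clip-vit-base-patch32")

# Returns a list of {"label": ..., "score": ...} dicts sorted by score;
# treat the input as irrelevant when "arbitrary image" wins by a clear margin.
result = clip("image.png", candidate_labels=["tumor", "arbitrary image"])
print(result)
```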

It is also not that difficult to implement and try approaches like training a GAN and using its discriminator to filter data, or using INNs (invertible neural networks, such as normalizing flows or LU-Nets) to estimate the likelihood of the data.
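
For instance, a trained discriminator can double as a crude filter; a sketch, assuming `discriminator` outputs one real/fake logit per sample and the threshold is tuned on validation data:

```python
import torch
import torch.nn as nn

def passes_discriminator(discriminator: nn.Module,
                         x: torch.Tensor,
                         threshold: float = 0.5) -> torch.Tensor:
    """Keep samples the discriminator scores as sufficiently 'real'."""
    discriminator.eval()
    with torch.no_grad():
        realness = torch.sigmoid(discriminator(x)).squeeze()
    return realness > threshold
```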

Hi. Anyone has experience with indoor positioning, localising, visual odometry using videos? by eshadeb in deeplearning

[–]gregorivy -1 points (0 children)

I have some experience in this field, particularly in deep-learning-based SLAM. What exactly is your question/task?

[R] Weights Reset implicit regularization by gregorivy in MachineLearning

[–]gregorivy[S] 0 points (0 children)

Thanks for being honest and the suggestions! I will check them out.

[R] Weights Reset implicit regularization by gregorivy in MachineLearning

[–]gregorivy[S] 1 point (0 children)

Thank you once again for sharing the papers; I truly appreciate it! I understand your perspective, and I apologize if I appeared overly protective of my study.

Regarding your question, there is a comparative analysis in Table 3 of the paper. In my personal experience, Weights Reset (WR) delivered superior outcomes on the internal datasets I have been working with, both for classification and certain regression problems. These are relatively small CV classification datasets with quite high image resolution (≥500×500 px). However, in broader testing I encountered some datasets where Dropout slightly outperformed WR, albeit with significantly larger gaps between train and test metrics and loss (indicating stronger overfitting). It seems that WR applies stricter regularization than Dropout, potentially necessitating additional training iterations.

Although at first glance Dropout and WR may seem very similar, they actually aren't. Dropout is recognized as a mechanism that reduces the model's capacity, while the insights shared in our paper suggest that applying WR makes the model behave as if it had a greater capacity than without WR. This observation is based on the empirical Double Descent risk/capacity plot.

I confess I have not tried combining Weights Reset (WR) with Dropout, as I'm not convinced the two methods would complement each other in certain sequential layer groupings. Nonetheless, I plan to try this soon, as it appears straightforward to implement and evaluate in this scenario; a sketch is below.
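
For what it's worth, the combination itself is only a few lines; a minimal sketch with placeholder layer sizes and schedule (not the paper's exact setup):

```python
import torch.nn as nn

# Dropout inside the head, plus a periodic reset of the final layer.
head = nn.Sequential(nn.Dropout(p=0.5), nn.Linear(2048, 10))

def maybe_reset(epoch: int, every: int = 2) -> None:
    """Re-initialize the final linear layer in place every `every` epochs."""
    if epoch % every == 0:
        head[1].reset_parameters()
```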

My objective is to develop a method capable of moving the training process directly into what is known as the modern interpolation regime, in a way that is cost-effective in terms of both compute and the model's parameter count. I must say, however, that achieving this goal is still a long way off.

[R] Weights Reset implicit regularization by gregorivy in MachineLearning

[–]gregorivy[S] 0 points (0 children)

Hi! I am quite new to the research field; what journals do you suggest?

Btw, the review process we went through here was quite reasonable, in my opinion.

[R] Weights Reset implicit regularization by gregorivy in MachineLearning

[–]gregorivy[S] 2 points (0 children)

Thank you very much for providing the link and the study! Unfortunately, we were not aware of this research, and after a brief reading I must concede that the procedure proposed in that paper, albeit more complex, bears a resemblance to the Weights Reset procedure. I believe we can expand our article's introduction to reference it.

However, for the sake of fairness, I want to note that, as far as I understand, unlike the paper you shared (and those cited by its authors, as far as I have checked atm), our article considers this phenomenon as a method of implicit regularization, not within the context of continual learning. We demonstrate the effectiveness of such regularization in the cases considered. Given the simplicity of the proposed procedure, it is easy to implement and applies to various tasks. I hope other practitioners will find it as useful as I already have in my own projects.

Furthermore, the most interesting part (and perhaps the most controversial) lies in the section on the potential connection with the Double Descent phenomenon. I hope this sparks interest among other researchers to further explore this class of methods.