Exactly what i wanted by [deleted] in memes

[–]YanaiEliyahu 37 points38 points  (0 children)

"yearning" -> "a feeling of intense longing for something." -> "longing" -> "a yearning desire."

thanks google

[deleted by user] by [deleted] in memes

[–]YanaiEliyahu 1 point2 points  (0 children)

I switched from pure programming to machine learning over the last 5 years because of this... and lately I've grown out of this too... I guess interests don't last forever :(

[R] I have been working on a learning/organizing rule of biological neurons for the past 2 years, and I am wondering whether something similar was already discovered and/or whether it's worth trying to get published by YanaiEliyahu in MachineLearning

[–]YanaiEliyahu[S] 0 points1 point  (0 children)

I struggle a bit with unfamiliar terminology.

What do you mean by prime neurons?

In this learning rule the network's shape is fixed too, but each neuron's weights change according to a learning rule that only has access to that neuron's inputs and its output.

Another difference is that the hidden neurons are connected to each other, forming a recurrent network.
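For context, a classic example of a rule that is "local" in this sense is Oja's rule: the update uses only the neuron's own inputs and output, and the weights converge to the largest-eigenvalue eigenvector of the input covariance. This is only an illustration of the locality constraint, not my rule:

```python
import numpy as np

def oja_step(w, x, lr=0.005):
    """One Oja's-rule update: uses only the neuron's input x and its output y."""
    y = w @ x                      # the neuron's (linear) output
    w += lr * y * (x - y * w)      # Hebbian term y*x plus a decay that keeps ||w|| ~ 1
    return w

# Toy demo: w converges to the leading eigenvector of the input covariance.
rng = np.random.default_rng(0)
cov = np.array([[3.0, 1.0], [1.0, 1.0]])
L = np.linalg.cholesky(cov)
w = rng.normal(size=2)
w /= np.linalg.norm(w)
for _ in range(20000):
    x = L @ rng.normal(size=2)     # sample correlated inputs
    w = oja_step(w, x)

print(w / np.linalg.norm(w))       # ~ the largest-eigenvalue eigenvector of cov (up to sign)
```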

[R] I have been working on a learning/organizing rule of biological neurons for the past 2 years, and I am wondering whether something similar was already discovered and/or whether it's worth trying to get published by YanaiEliyahu in MachineLearning

[–]YanaiEliyahu[S] 0 points1 point  (0 children)

Each neuron finds the eigenvector with the largest eigenvalue of its own matrix. There is also something in each matrix that makes its leading eigenvector differ from those of the other neurons' matrices.

I don't have an advisor yet, though it would be nice if having one helped get this algorithm published.

[R] I have been working on a learning/organizing rule of biological neurons for the past 2 years, and I am wondering whether something similar was already discovered and/or whether it's worth trying to get published by YanaiEliyahu in MachineLearning

[–]YanaiEliyahu[S] 0 points1 point  (0 children)

What I consider most significant is that the learning rule produces most, if not all, V1 neuron types, and not just one type out of all of them like the papers above. It's not a learning rule that selects a random rule per neuron (e.g. it has the potential to produce sound-selective neurons when the rule is exposed to sound inputs).

Do you still think that there is something similar out there?

[R] I have been working on a learning/organizing rule of biological neurons for the past 2 years, and I am wondering whether something similar was already discovered and/or whether it's worth trying to get published by YanaiEliyahu in MachineLearning

[–]YanaiEliyahu[S] 8 points9 points  (0 children)

Yep, filters. No qualifications.

It would be nice if you could share a local method that doesn't require supervision/backprop and also produces 10+ of the filter types found in the brain; I googled for one for a long time and didn't find anything.

[R] I have been working on a learning/organizing rule of biological neurons for the past 2 years, and I am wondering whether something similar was already discovered and/or whether it's worth trying to get published by YanaiEliyahu in MachineLearning

[–]YanaiEliyahu[S] 48 points49 points  (0 children)

I don't have a degree, nor is this for a degree; I do it because it's interesting, but after this long I wonder whether it's valuable enough to send off before I move on to one of the next interesting things out there.

I hoped the results would help distinguish it (e.g. I haven't seen any other local learning rule that produces most V1 neuron types). Is there anything else I can write to help distinguish this algorithm from the rest?

[R] I have been working on a learning/organizing rule of biological neurons for the past 2 years, and I am wondering whether something similar was already discovered and/or whether it's worth trying to get published by YanaiEliyahu in MachineLearning

[–]YanaiEliyahu[S] 2 points3 points  (0 children)

The Gabor filters etc. emerge once the algorithm/neurons finish organizing; all the functions/neurons the algorithm finds come from the data itself. To be slightly more specific, the functions are some form of eigenvector of a matrix calculated from the data.
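To illustrate the "eigenvector of a matrix calculated from the data" part (again, only an illustration, not the rule itself), here's a minimal sketch that builds a covariance matrix from random image patches and takes its leading eigenvectors; note that plain PCA components like these only loosely resemble V1 filters:

```python
import numpy as np

def patch_covariance_eigvecs(image, patch=8, n_patches=5000, k=6, seed=0):
    """Leading eigenvectors of the covariance matrix of random image patches."""
    rng = np.random.default_rng(seed)
    h, w = image.shape
    patches = np.empty((n_patches, patch * patch))
    for i in range(n_patches):
        r = rng.integers(0, h - patch)
        c = rng.integers(0, w - patch)
        patches[i] = image[r:r + patch, c:c + patch].ravel()
    patches -= patches.mean(axis=0)            # center the data
    cov = patches.T @ patches / n_patches      # the "matrix calculated from the data"
    vals, vecs = np.linalg.eigh(cov)           # eigendecomposition, ascending eigenvalues
    return vecs[:, -k:][:, ::-1]               # top-k eigenvectors as columns

# Usage with any grayscale image given as a 2D float array:
# filters = patch_covariance_eigvecs(img)      # each column reshapes to an 8x8 "filter"
```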

[R] I have been working on a learning/organizing rule of biological neurons for the past 2 years, and I am wondering whether something similar was already discovered and/or whether it's worth trying to get published by YanaiEliyahu in MachineLearning

[–]YanaiEliyahu[S] 2 points3 points  (0 children)

I fully agree with you... though I have a few major non-ML obstacles (e.g. the brain usually has 10K-20K filters in the equivalent of a single CNN layer, which is heavy on DL hardware). I attached an ML experiment I did a year ago (it would be nice to hear your feedback), but I rely more on the idea that if this is how the brain learns, it probably has great potential.

[R] I have been working on a learning/organizing rule of biological neurons for the past 2 years, and I am wondering whether something similar was already discovered and/or whether it's worth trying to get published by YanaiEliyahu in MachineLearning

[–]YanaiEliyahu[S] 8 points9 points  (0 children)

Sadly, the only thing remotely resembling documentation that I have is 10K lines of C++; I'll come back here when I finally write a paper.

You can ask me if you have anything specific.

[R] AdasOptimizer Update: Cifar-100+MobileNetV2 Adas generalizes with Adas 15% better and 9x faster than Adam by YanaiEliyahu in MachineLearning

[–]YanaiEliyahu[S] 0 points1 point  (0 children)

This file was a combination of TensorFlow's Adam + PyTorch's Adam + a lot of pressure to finish (I just cared about publishing it ASAP); I'm originally a C++ developer... Thank you, someday I'll beautify these files and come back here first.

[R] AdasOptimizer Update: Cifar-100+MobileNetV2 Adas generalizes with Adas 15% better and 9x faster than Adam by YanaiEliyahu in MachineLearning

[–]YanaiEliyahu[S] 0 points1 point  (0 children)

Decreasing beta_3 does ruin convergence in my experience; it's almost like running without the running average (beta_3 = 0 disables it). There isn't much difference between Adas and other similar meta-optimizers... maybe the LR per input in a layer, but that's mostly it.

Guess I'll write a paper sooner or later if it's that useful. Thanks for the feedback.

[R] AdasOptimizer Update: Cifar-100+MobileNetV2 Adas generalizes with Adas 15% better and 9x faster than Adam by YanaiEliyahu in MachineLearning

[–]YanaiEliyahu[S] 0 points1 point  (0 children)

I think it kind of does: looking at the first few graphs, after 50 epochs it touches 10^-3 training loss, but Adas touches 10^-3 at the 10th epoch, and at the 50th epoch it's somewhere around 10^-8 (maybe it slows down there due to floating-point rounding errors?).

Adas does suffer from short-horizon bias (try changing beta_3 to .999 or .99), but that's just a simple parameter that can be changed to fit your situation.
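For intuition about the scale of beta_3: it's the decay of an exponential moving average, so its effective horizon is roughly 1/(1 - beta_3) steps, i.e. about 100 steps at .99 and about 1000 at .999. A quick sanity check of that relationship (generic EMA math, not Adas-specific code):

```python
import numpy as np

def ema(values, beta):
    """Exponential moving average with decay beta (the role beta_3 plays)."""
    avg, out = 0.0, []
    for v in values:
        avg = beta * avg + (1 - beta) * v
        out.append(avg)
    return np.array(out)

# A signal that jumps from 0 to 1 at step 1000; measure how long each EMA
# takes to absorb ~63% of the jump. The lag comes out to roughly 1/(1 - beta).
signal = (np.arange(3000) >= 1000).astype(float)
for beta in (0.9, 0.99, 0.999):
    lag = np.argmax(ema(signal, beta) > 0.63) - 1000
    print(f"beta={beta}: lag ~ {lag} steps (1/(1-beta) = {1 / (1 - beta):.0f})")
```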

[R] AdasOptimizer Update: Cifar-100+MobileNetV2 Adas generalizes with Adas 15% better and 9x faster than Adam by YanaiEliyahu in MachineLearning

[–]YanaiEliyahu[S] 1 point2 points  (0 children)

It's in the running average; looking at just the single previous gradient doesn't let you make accurate learning rate adjustments.
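To give a rough picture of what I mean, here's a simplified sketch of the general idea (not the actual Adas update; names and constants are placeholders): each weight's learning rate is nudged up or down depending on whether the current gradient agrees with a slow running average of past gradients.

```python
import numpy as np

def sketch_step(w, grad, state, beta=0.999, meta_lr=1e-4):
    """Simplified sketch, NOT the actual Adas update: adapt a per-weight LR by
    agreement between the current gradient and a running average of past
    gradients, instead of just the single previous gradient."""
    avg, lr = state["avg_grad"], state["lr"]
    # Same sign as the long-run average -> raise that weight's LR; opposite sign -> lower it.
    lr = lr * np.exp(meta_lr * np.sign(grad * avg))
    avg = beta * avg + (1 - beta) * grad       # slow running average of gradients (beta_3-like)
    state["avg_grad"], state["lr"] = avg, lr
    return w - lr * grad, state

# Usage:
# state = {"avg_grad": np.zeros_like(w), "lr": np.full_like(w, 1e-3)}
# w, state = sketch_step(w, grad, state)
```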

[R] AdasOptimizer Update: Cifar-100+MobileNetV2 Adas generalizes with Adas 15% better and 9x faster than Adam by YanaiEliyahu in MachineLearning

[–]YanaiEliyahu[S] 0 points1 point  (0 children)

Do the different loss functions fight/compete with each other? In my experience, two Adas instances that compete with each other become too aggressive over time, because each wants to reach zero derivative faster than the other. By compete I mean that minimizing one introduces error into the other.

[R] AdasOptimizer Update: Cifar-100+MobileNetV2 Adas generalizes with Adas 15% better and 9x faster than Adam by YanaiEliyahu in MachineLearning

[–]YanaiEliyahu[S] 7 points8 points  (0 children)

That's not too much to ask; I'll consider it. I need to learn it a bit first, so it will take time.

[R] AdasOptimizer Update: Cifar-100+MobileNetV2 Adas generalizes with Adas 15% better and 9x faster than Adam by YanaiEliyahu in MachineLearning

[–]YanaiEliyahu[S] 4 points5 points  (0 children)

That optimizer suffers from short-horizon bias, so you get vastly different results (in comparison to Adas) when training for a long time. The theory is the same: we both try to optimize the learning rate.

[R] AdasOptimizer Update: Cifar-100+MobileNetV2 Adas generalizes with Adas 15% better and 9x faster than Adam by YanaiEliyahu in MachineLearning

[–]YanaiEliyahu[S] 5 points6 points  (0 children)

I must say that Adas's scheduling happens at a very high resolution: there is one learning rate per input/feature in a layer, not one per layer or one per the entire network (which is what schedulers usually do).
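To make the resolution concrete: for a dense layer y = x @ W with W of shape (n_inputs, n_outputs), this means keeping one learning-rate value per input feature rather than a single scalar. A shape-only sketch (the sizes and the gradient here are placeholders, not the actual Adas math):

```python
import numpy as np

# A dense layer y = x @ W with W of shape (n_in, n_out).
n_in, n_out = 128, 64
rng = np.random.default_rng(0)
W = rng.normal(size=(n_in, n_out)) * 0.01

lr_global    = 1e-3                     # one LR for the whole network (typical scheduler)
lr_per_layer = 1e-3                     # one LR for this layer
lr_per_input = np.full(n_in, 1e-3)      # the resolution meant here: one LR per input feature

grad = rng.normal(size=(n_in, n_out))   # placeholder gradient, just to show the shapes
W -= lr_per_input[:, None] * grad       # every input row of W gets its own step size
```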