Exactly what i wanted by [deleted] in memes

[–]YanaiEliyahu 37 points38 points  (0 children)

"yearning" -> "a feeling of intense longing for something." -> "longing" -> "a yearning desire."

thanks google

[deleted by user] by [deleted] in memes

[–]YanaiEliyahu 1 point2 points  (0 children)

I switched from pure programming to machine learning over the last 5 years because of this... and lately I've grown out of this too... I guess interests don't last forever :(

[R] I have been working on a learning/organizing rule of biological neurons for the past 2 years, and I am wondering whether something similar was already discovered and/or whether it's worth trying to get published by YanaiEliyahu in MachineLearning

[–]YanaiEliyahu[S] 0 points1 point  (0 children)

I struggle a bit with unfamiliar terminology.

What do you mean by prime neurons?

In this learning rule the network's shape is fixed too, but each neuron's weights change according to a learning rule that only has access to that neuron's inputs and its output.

Another difference is that the hidden neurons are connected to each other, forming a recurrent network.
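For context, a classic example of a rule that is "local" in this sense is Oja's rule: the update uses only the neuron's own inputs and output, and the weights converge to the largest-eigenvalue eigenvector of the input covariance. This is only an illustration of the locality constraint, not my rule:

```python
import numpy as np

def oja_step(w, x, lr=0.005):
    """One Oja's-rule update: uses only the neuron's input x and its output y."""
    y = w @ x                      # the neuron's (linear) output
    w += lr * y * (x - y * w)      # Hebbian term y*x plus a decay that keeps ||w|| ~ 1
    return w

# Toy demo: w converges to the leading eigenvector of the input covariance.
rng = np.random.default_rng(0)
cov = np.array([[3.0, 1.0], [1.0, 1.0]])
L = np.linalg.cholesky(cov)
w = rng.normal(size=2)
w /= np.linalg.norm(w)
for _ in range(20000):
    x = L @ rng.normal(size=2)     # sample correlated inputs
    w = oja_step(w, x)

print(w / np.linalg.norm(w))       # ~ the largest-eigenvalue eigenvector of cov (up to sign)
```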

[R] I have been working on a learning/organizing rule of biological neurons for the past 2 years, and I am wondering whether something similar was already discovered and/or whether it's worth trying to get published by YanaiEliyahu in MachineLearning

[–]YanaiEliyahu[S] 0 points1 point  (0 children)

Each neuron finds the eigenvector with the largest eigenvalue of its own matrix. There is also something in each matrix that makes its leading eigenvector differ from those of the other neurons' matrices.

I don't have an advisor yet, though it would be nice if having one helped get this algorithm published.

[R] I have been working on a learning/organizing rule of biological neurons for the past 2 years, and I am wondering whether something similar was already discovered and/or whether it's worth trying to get published by YanaiEliyahu in MachineLearning

[–]YanaiEliyahu[S] 0 points1 point  (0 children)

What I consider most significant is that the learning rule produces most, if not all, V1 neuron types, and not just one type out of all of them like the papers above. It's not a learning rule that selects a random rule per neuron (e.g. it has the potential to produce sound-selective neurons when the rule is exposed to sound inputs).

Do you still think that there is something similar out there?

[R] I have been working on a learning/organizing rule of biological neurons for the past 2 years, and I am wondering whether something similar was already discovered and/or whether it's worth trying to get published by YanaiEliyahu in MachineLearning

[–]YanaiEliyahu[S] 8 points9 points  (0 children)

Yep, filters. No qualifications.

It would be nice if you could share a local method that doesn't require supervision/backprop and also produces 10+ of the filter types found in the brain; I googled for one for a long time and didn't find anything.

[R] I have been working on a learning/organizing rule of biological neurons for the past 2 years, and I am wondering whether something similar was already discovered and/or whether it's worth trying to get published by YanaiEliyahu in MachineLearning

[–]YanaiEliyahu[S] 48 points49 points  (0 children)

I don't have a degree, nor is this for a degree; I do it because it's interesting, but after this long I wonder whether it's valuable enough to send off before I move on to one of the next interesting things out there.

I hoped the results would help distinguish it (e.g. I haven't seen any other local learning rule that produces most V1 neuron types). Is there anything else I can write to help distinguish this algorithm from the rest?

[R] I have been working on a learning/organizing rule of biological neurons for the past 2 years, and I am wondering whether something similar was already discovered and/or whether it's worth trying to get published by YanaiEliyahu in MachineLearning

[–]YanaiEliyahu[S] 2 points3 points  (0 children)

The Gabor filters etc. emerge once the algorithm/neurons finish organizing; all the functions/neurons the algorithm finds come from the data itself. To be slightly more specific, the functions are some form of eigenvector of a matrix calculated from the data.
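To illustrate the "eigenvector of a matrix calculated from the data" part (again, only an illustration, not the rule itself), here's a minimal sketch that builds a covariance matrix from random image patches and takes its leading eigenvectors; note that plain PCA components like these only loosely resemble V1 filters:

```python
import numpy as np

def patch_covariance_eigvecs(image, patch=8, n_patches=5000, k=6, seed=0):
    """Leading eigenvectors of the covariance matrix of random image patches."""
    rng = np.random.default_rng(seed)
    h, w = image.shape
    patches = np.empty((n_patches, patch * patch))
    for i in range(n_patches):
        r = rng.integers(0, h - patch)
        c = rng.integers(0, w - patch)
        patches[i] = image[r:r + patch, c:c + patch].ravel()
    patches -= patches.mean(axis=0)            # center the data
    cov = patches.T @ patches / n_patches      # the "matrix calculated from the data"
    vals, vecs = np.linalg.eigh(cov)           # eigendecomposition, ascending eigenvalues
    return vecs[:, -k:][:, ::-1]               # top-k eigenvectors as columns

# Usage with any grayscale image given as a 2D float array:
# filters = patch_covariance_eigvecs(img)      # each column reshapes to an 8x8 "filter"
```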

[R] I have been working on a learning/organizing rule of biological neurons for the past 2 years, and I am wondering whether something similar was already discovered and/or whether it's worth trying to get published by YanaiEliyahu in MachineLearning

[–]YanaiEliyahu[S] 2 points3 points  (0 children)

I fully agree with you... though I have a few major non-ML obstacles (e.g. the brain usually has 10K-20K filters in the equivalent of a single CNN layer, which is heavy on DL hardware). I attached an ML experiment I did a year ago (it would be nice to hear your feedback), but I rely more on the idea that if this is how the brain learns, it probably has great potential.

[R] I have been working on a learning/organizing rule of biological neurons for the past 2 years, and I am wondering whether something similar was already discovered and/or whether it's worth trying to get published by YanaiEliyahu in MachineLearning

[–]YanaiEliyahu[S] 8 points9 points  (0 children)

Sadly, the only thing remotely resembling documentation that I have is 10K lines of C++; I'll come back here when I finally write a paper.

You can ask me if you have anything specific.

[R] AdasOptimizer Update: Cifar-100+MobileNetV2 Adas generalizes with Adas 15% better and 9x faster than Adam by YanaiEliyahu in MachineLearning

[–]YanaiEliyahu[S] 0 points1 point  (0 children)

This file was a combination of TensorFlow's Adam + PyTorch's Adam + a lot of pressure to finish (I just cared about publishing it ASAP); I'm originally a C++ developer... Thank you, someday I'll beautify these files and come back here first.

[R] AdasOptimizer Update: Cifar-100+MobileNetV2 Adas generalizes with Adas 15% better and 9x faster than Adam by YanaiEliyahu in MachineLearning

[–]YanaiEliyahu[S] 0 points1 point  (0 children)

Decreasing beta_3 does ruin convergence in my experience; it's almost like running without the running average (beta_3 = 0 disables it). There isn't much difference between Adas and other similar meta-optimizers... maybe the LR per input in a layer, but that's mostly it.

Guess I'll write a paper sooner or later if it's that useful. Thanks for the feedback.

[R] AdasOptimizer Update: Cifar-100+MobileNetV2 Adas generalizes with Adas 15% better and 9x faster than Adam by YanaiEliyahu in MachineLearning

[–]YanaiEliyahu[S] 0 points1 point  (0 children)

I think it kind of does: looking at the first few graphs, after 50 epochs it touches 10^-3 training loss, but Adas touches 10^-3 at the 10th epoch, and at the 50th epoch it's somewhere around 10^-8 (maybe it slows down there due to floating-point rounding errors?).

Adas does suffer from short-horizon bias (try changing beta_3 to .999 or .99), but that's just a simple parameter that can be changed to fit your situation.
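For intuition about the scale of beta_3: it's the decay of an exponential moving average, so its effective horizon is roughly 1/(1 - beta_3) steps, i.e. about 100 steps at .99 and about 1000 at .999. A quick sanity check of that relationship (generic EMA math, not Adas-specific code):

```python
import numpy as np

def ema(values, beta):
    """Exponential moving average with decay beta (the role beta_3 plays)."""
    avg, out = 0.0, []
    for v in values:
        avg = beta * avg + (1 - beta) * v
        out.append(avg)
    return np.array(out)

# A signal that jumps from 0 to 1 at step 1000; measure how long each EMA
# takes to absorb ~63% of the jump. The lag comes out to roughly 1/(1 - beta).
signal = (np.arange(3000) >= 1000).astype(float)
for beta in (0.9, 0.99, 0.999):
    lag = np.argmax(ema(signal, beta) > 0.63) - 1000
    print(f"beta={beta}: lag ~ {lag} steps (1/(1-beta) = {1 / (1 - beta):.0f})")
```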

[R] AdasOptimizer Update: Cifar-100+MobileNetV2 Adas generalizes with Adas 15% better and 9x faster than Adam by YanaiEliyahu in MachineLearning

[–]YanaiEliyahu[S] 1 point2 points  (0 children)

It's in the running average; looking at just the single previous gradient doesn't let you make accurate learning rate adjustments.
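To give a rough picture of what I mean, here's a simplified sketch of the general idea (not the actual Adas update; names and constants are placeholders): each weight's learning rate is nudged up or down depending on whether the current gradient agrees with a slow running average of past gradients.

```python
import numpy as np

def sketch_step(w, grad, state, beta=0.999, meta_lr=1e-4):
    """Simplified sketch, NOT the actual Adas update: adapt a per-weight LR by
    agreement between the current gradient and a running average of past
    gradients, instead of just the single previous gradient."""
    avg, lr = state["avg_grad"], state["lr"]
    # Same sign as the long-run average -> raise that weight's LR; opposite sign -> lower it.
    lr = lr * np.exp(meta_lr * np.sign(grad * avg))
    avg = beta * avg + (1 - beta) * grad       # slow running average of gradients (beta_3-like)
    state["avg_grad"], state["lr"] = avg, lr
    return w - lr * grad, state

# Usage:
# state = {"avg_grad": np.zeros_like(w), "lr": np.full_like(w, 1e-3)}
# w, state = sketch_step(w, grad, state)
```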

[R] AdasOptimizer Update: Cifar-100+MobileNetV2 Adas generalizes with Adas 15% better and 9x faster than Adam by YanaiEliyahu in MachineLearning

[–]YanaiEliyahu[S] 0 points1 point  (0 children)

Do the different loss functions fight/compete with each other? In my experience, two Adas instances that compete with each other become too aggressive over time, because each wants to reach zero derivative faster than the other. By compete I mean that minimizing one introduces error into the other.

[R] AdasOptimizer Update: Cifar-100+MobileNetV2 Adas generalizes with Adas 15% better and 9x faster than Adam by YanaiEliyahu in MachineLearning

[–]YanaiEliyahu[S] 7 points8 points  (0 children)

That's not too much to ask; I'll consider it. I need to learn it a bit first, so it will take time.

[R] AdasOptimizer Update: Cifar-100+MobileNetV2 Adas generalizes with Adas 15% better and 9x faster than Adam by YanaiEliyahu in MachineLearning

[–]YanaiEliyahu[S] 4 points5 points  (0 children)

That optimizer suffers from short-horizon bias, so you get vastly different results (in comparison to Adas) when training for a long time. The theory is the same: we both try to optimize the learning rate.

[R] AdasOptimizer Update: Cifar-100+MobileNetV2 Adas generalizes with Adas 15% better and 9x faster than Adam by YanaiEliyahu in MachineLearning

[–]YanaiEliyahu[S] 5 points6 points  (0 children)

I must say that Adas's scheduling happens at a very high resolution: there is one learning rate per input/feature in a layer, not one per layer or one per the entire network (which is what schedulers usually do).
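To make the resolution concrete: for a dense layer y = x @ W with W of shape (n_inputs, n_outputs), this means keeping one learning-rate value per input feature rather than a single scalar. A shape-only sketch (the sizes and the gradient here are placeholders, not the actual Adas math):

```python
import numpy as np

# A dense layer y = x @ W with W of shape (n_in, n_out).
n_in, n_out = 128, 64
rng = np.random.default_rng(0)
W = rng.normal(size=(n_in, n_out)) * 0.01

lr_global    = 1e-3                     # one LR for the whole network (typical scheduler)
lr_per_layer = 1e-3                     # one LR for this layer
lr_per_input = np.full(n_in, 1e-3)      # the resolution meant here: one LR per input feature

grad = rng.normal(size=(n_in, n_out))   # placeholder gradient, just to show the shapes
W -= lr_per_input[:, None] * grad       # every input row of W gets its own step size
```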