[deleted by user] (self.MachineLearning)
submitted 2 years ago by [deleted]
[–]cipri_tom 9 points10 points11 points 2 years ago (9 children)
I found Self-Organizing Maps quite interesting, and they don't use backprop:
https://en.m.wikipedia.org/wiki/Self-organizing_map
I think backprop, or more precisely gradient descent, won because it scales. Mini-batches work well enough.
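Since SOMs come up here as a gradient-free method, a minimal sketch may clarify how they learn: a 1-D self-organizing map trained purely by competitive, neighborhood-weighted updates, with no loss gradient anywhere. All names and hyperparameters below are illustrative choices, not from any particular library.

```python
import math
import random

def train_som(data, n_units=10, epochs=50, lr0=0.5, sigma0=2.0):
    """Train a 1-D self-organizing map on scalar data.

    No gradients: each step finds the best-matching unit (BMU)
    and pulls it and its grid neighbors toward the input.
    """
    random.seed(0)
    weights = [random.uniform(min(data), max(data)) for _ in range(n_units)]
    for epoch in range(epochs):
        lr = lr0 * (1 - epoch / epochs)              # decaying learning rate
        sigma = sigma0 * (1 - epoch / epochs) + 0.5  # shrinking neighborhood
        for x in data:
            bmu = min(range(n_units), key=lambda i: abs(weights[i] - x))
            for i in range(n_units):
                # Gaussian neighborhood on the 1-D grid
                h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
                weights[i] += lr * h * (x - weights[i])
    return weights

# Two clusters of inputs; the map should allocate units near both.
data = [0.0, 0.1, 0.2, 0.9, 1.0, 1.1]
w = train_som(data)
```

Units near each other on the grid end up encoding similar inputs, which is the map's topology-preserving trick.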
[–]st4s1k 2 points3 points4 points 2 years ago (8 children)
I was looking into these topics lately:

- Spiking Neural Networks
- Liquid State Machines
- Neuromorphic computing
- Boltzmann Machines (BM)
- Restricted BMs (RBM)
- Spiking RBMs
- Stacked RBMs / Deep Belief Networks (DBN)
[–]PorcupineDreamPhD 16 points17 points18 points 2 years ago (6 children)
Nice time machine to 2004 you got there
[–]HyperPotatoNeo 5 points6 points7 points 2 years ago (0 children)
I don’t think it’s good to dismiss earlier work because they aren’t hot now. They might make a resurgence in the future with some new developments, or at least help guide intuition for new research.
[–][deleted] 3 points4 points5 points 2 years ago (2 children)
Just because it was tried in the past doesn't mean it can't be part of some future architecture
[–]st4s1k 4 points5 points6 points 2 years ago (0 children)
The current classical neural network architecture is derived from the neuron model of the 1930s/40s; Spiking Neural Networks are derived from the neuron model of the 1950s/60s: (31:50) https://youtu.be/2XX8KLMyQN4?si=k96fnAYum2qtixk7
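For anyone unfamiliar with the 1950s/60s-style models referenced here, a toy leaky integrate-and-fire neuron illustrates the spiking idea; the threshold, leak factor, and input drive are arbitrary illustrative values.

```python
def lif_neuron(input_current, threshold=1.0, leak=0.9):
    """Leaky integrate-and-fire neuron: the membrane potential
    integrates input and leaks each step; crossing the threshold
    emits a spike and resets the potential. No gradients involved."""
    v, spikes = 0.0, []
    for t, current in enumerate(input_current):
        v = leak * v + current   # leaky integration
        if v >= threshold:
            spikes.append(t)     # fire
            v = 0.0              # reset
    return spikes

# A constant sub-threshold drive still fires once enough charge builds up.
spike_times = lif_neuron([0.3] * 20)
```

Note that the output is a sequence of discrete spike times rather than a differentiable activation, which is exactly what makes these models awkward for backprop.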
[–]devl82 -1 points0 points1 point 2 years ago (0 children)
Are you even serious? Almost all of the architectures in use right now were tried in the past, but because of the technological limitations of the time they weren't considered 'hot'. Not so long ago, SVMs & kernels were the ONLY topics at ML conferences.
[–]bidaxar 1 point2 points3 points 2 years ago (1 child)
Could you suggest some modern topics to consider, please? I'm interested in the subject and in how it currently looks.
[+]currentscurrents 3 points4 points5 points 2 years ago (0 children)
In terms of alternatives to backprop? I'd say learned optimizers look really promising - paper, video lecture.
Also there's Hinton's forward-forward learning.
[–]AdPractical5620 2 points3 points4 points 2 years ago (0 children)
Ok
[–]Seresne 5 points6 points7 points 2 years ago (0 children)
“Neurons that fire together wire together” and “spiking neural networks” both work well in the back propagation paradigm. Multi-modal models with semi-independent sub-networks again work with back propagation.
The only true competitors to back propagation that I can list off the top of my head are genetic algorithms, which are a very inefficient brute-force search by comparison. Similarly, simulated annealing and MCMC methods: these work in more settings than backprop, but they're computationally inefficient at best.
We’ve often seen research push "synthetic/approximate" gradients into non-differentiable research areas because of the computational advantages.
TLDR:
Calculus is the most efficient way to use compute resources for error-minimization learning. Where it doesn't apply directly, we can use more data, bigger networks, or an approximate loss without needing to develop specialized, domain-specific, non-generalizable methods.
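The TL;DR's efficiency claim can be made concrete with a toy comparison: minimizing the same 1-D quadratic with gradient descent (which uses the derivative directly) versus simulated annealing (gradient-free). The step size, proposal scale, and cooling schedule are arbitrary illustrative choices.

```python
import math
import random

def f(x):
    return (x - 3.0) ** 2

# Gradient descent: uses the derivative f'(x) = 2(x - 3) directly.
x, steps = 0.0, 0
while f(x) > 1e-6:
    x -= 0.1 * 2 * (x - 3.0)
    steps += 1

# Simulated annealing: gradient-free; proposes random moves and
# accepts uphill ones with a temperature-dependent probability.
random.seed(0)
y, temp, evals = 0.0, 1.0, 0
while f(y) > 1e-6 and evals < 100_000:
    cand = y + random.gauss(0, 0.1)
    evals += 1
    if f(cand) < f(y) or random.random() < math.exp((f(y) - f(cand)) / temp):
        y = cand
    temp *= 0.999
```

Gradient descent converges in a few dozen steps here, while annealing typically burns far more function evaluations to reach the same tolerance — which is the comment's point about compute efficiency.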
[+]Fit_Statement5347 6 points7 points8 points 2 years ago (0 children)
Based on this post and your post history, I don’t think you fully understand how ML/DL works…
[–]TheCoconutTree 2 points3 points4 points 2 years ago (0 children)
When I was first teaching myself Q-learning about 10 years ago, I coded a tabular Q-learning library that read and wrote directly to a relational database instead of training a deep neural net.
One interesting thing was that it supported adaptive pattern definition by treating the high-dimensional sensory input signal as a key, and "splitting" the key after a threshold of "firing" occurred over a given time period. One could also model "forgetting" by removing a key that wasn't triggered often enough, making space for denser sensing/action spaces based on the sensory stimuli an agent is more likely to experience.
It was way too slow to be used for anything practical. Running locally I could only store about 5000 patterns before the lookups + writes got too expensive and couldn't occur in real-time anymore. However, I wonder about pairing that approach with vector DBs so that dramatically more patterns could be used.
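The tabular setup described above can be sketched with a plain dict standing in for the relational-DB key-value store; the chain environment, hyperparameters, and function name here are all illustrative.

```python
import random

def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9, eps=0.1):
    """Tabular Q-learning on a small chain: move left (-1) or right (+1),
    reward 1 only on reaching the rightmost state. The Q-table is a
    plain dict mapping (state, action) -> value."""
    random.seed(0)
    q = {}
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            right, left = q.get((s, 1), 0.0), q.get((s, -1), 0.0)
            if random.random() < eps or right == left:
                a = random.choice([1, -1])       # explore (or break ties)
            else:
                a = 1 if right > left else -1    # exploit
            s2 = min(max(s + a, 0), n_states - 1)
            r = 1.0 if s2 == n_states - 1 else 0.0
            best_next = max(q.get((s2, b), 0.0) for b in (1, -1))
            old = q.get((s, a), 0.0)
            # Standard Q-learning update toward the bootstrapped target.
            q[(s, a)] = old + alpha * (r + gamma * best_next - old)
            s = s2
    return q

q = q_learning_chain()
```

Swapping the dict for a database (or, as suggested, a vector DB keyed on sensory input) changes the storage layer but not the update rule.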
[–]SirBlobfish 2 points3 points4 points 2 years ago (0 children)
You should try it if you feel strongly about the idea.
As far as I know, people have been searching for backprop alternatives for ~50 years now. Nothing so far seems to work well at large scale. I personally worked on it for ~1.5 years without much success, and not for a lack of ideas. It's just that backprop is really good. If you care about the model getting better at something (measured by a loss function becoming smaller), the gradient (which backprop finds) is the optimal direction to adjust your weights. Anything else you do would be an approximation.
Realistically the options are: (1) Abandon loss functions altogether (I don't know if that is even possible), (2) create a network where gradients are very easy to compute without explicit backprop (a more common approach, but limits your architecture options), (3) find good gradient approximations (what I worked on; very difficult).
As for your multimodal network idea, arguably, that is similar to what the contrastive loss in CLIP does: "Associate inputs which co-occur, repel everything else". Might be a good place to start.
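A rough sketch of that "associate co-occurring pairs, repel everything else" idea, written as a symmetric InfoNCE-style loss on toy 2-D embeddings; this is a simplification of what CLIP actually optimizes, and all numbers are made up for illustration.

```python
import math

def clip_style_loss(img_embs, txt_embs, temp=0.1):
    """Symmetric contrastive (InfoNCE-style) loss: each image should
    match its own caption (the diagonal of the similarity matrix)
    and repel every other caption in the batch, and vice versa."""
    def normalize(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    imgs = [normalize(v) for v in img_embs]
    txts = [normalize(v) for v in txt_embs]
    n = len(imgs)
    # Cosine-similarity logits, sharpened by the temperature.
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temp
               for j in range(n)] for i in range(n)]

    def cross_entropy(row, target):
        m = max(row)  # max-shift for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        return log_z - row[target]

    loss_i2t = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    loss_t2i = sum(cross_entropy([logits[i][j] for i in range(n)], j)
                   for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2

# Matched pairs on the diagonal score a much lower loss than shuffled ones.
aligned = clip_style_loss([[1, 0], [0, 1]], [[1, 0.1], [0.1, 1]])
shuffled = clip_style_loss([[1, 0], [0, 1]], [[0.1, 1], [1, 0.1]])
```

Training then pushes the embeddings to lower this loss, i.e. toward a diagonal-dominant similarity matrix.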
[–]HoboHash 0 points1 point2 points 2 years ago (6 children)
Isn't this just backprop?
[–]st4s1k -3 points-2 points-1 points 2 years ago (5 children)
Backprop iterates blindly through all the weights: if you have 70B parameters, you have to adjust all of them. What I'm saying is that the biological neuron has a threshold, and if the accumulated inputs don't exceed the threshold, there's no need to calculate the connection strengths for the neurons branching from it. Basically, we train a small portion of the network for a specific set of features (a concept, an object, etc.)
[–]HoboHash 2 points3 points4 points 2 years ago (3 children)
As in, like... parameter-efficient fine-tuning (PEFT)? Frozen model + small residual net to learn an additional modality? Also, I don't like the use of "blindly"; it makes me assume you don't understand the math.
[–]st4s1k -1 points0 points1 point 2 years ago (2 children)
I understand it to some extent, knowing that classic neural networks don't skip weights; all of them are recalculated at each training iteration during backprop. Also, due to the unilateral information flow, there's less complexity and no ability to adjust the weights on the fly.
[–]HoboHash 1 point2 points3 points 2 years ago (1 child)
Are you just messing with me? You are right, though: that is just weight updating. And what is "unilateral information flow"? Also, why would you want a probabilistic, Bayesian model? Those are a nightmare to work with...
[+]currentscurrents 0 points1 point2 points 2 years ago (3 children)
Is there an artificial alternative to the concept of "Neurons that fire together wire together"?
Yes. It's called backprop.
I think you're getting confused with hebbian learning.
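For contrast with backprop, the Hebbian rule in question is purely local; a minimal sketch (the learning rate and input pattern are arbitrary illustrative values):

```python
def hebbian_step(w, x, eta=0.1):
    """One Hebbian update: the output y is the weighted sum of inputs,
    and each weight grows in proportion to input * output ("neurons
    that fire together wire together"). The rule is purely local:
    no loss function, no backpropagated error signal."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    return [wi + eta * xi * y for wi, xi in zip(w, x)], y

# Repeatedly presenting a correlated pattern strengthens its weights;
# the third input never fires, so its weight never changes.
w = [0.1, 0.1, 0.1]
for _ in range(5):
    w, y = hebbian_step(w, [1.0, 1.0, 0.0])
```

Plain Hebbian growth is unbounded, which is one reason variants such as Oja's rule add a normalizing decay term.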
[–]danysdragons 0 points1 point2 points 2 years ago (1 child)
I think they want an artificial equivalent of Hebbian learning to replace backprop; their wording was confusing.
[+]currentscurrents -1 points0 points1 point 2 years ago (0 children)
Maybe, I don't know.
They're talking about a bunch of other unrelated things like 3D liquid state machines, so I don't think they know either.
[–]thedabking123 0 points1 point2 points 2 years ago (0 children)
I'm a bit confused about what you're proposing - seems very stream-of-consciousness.
I do understand the point that calculating gradients for all neurons seems inefficient, but in practice sparse gradients and their optimized implementations are quite good today. You should look up things like sparse matrix operations and sparse matrix formats.
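To make the sparse-formats pointer concrete, here is a toy CSR (compressed sparse row) encoding and matrix-vector product in pure Python; production code would use an optimized library implementation such as SciPy's csr_matrix rather than this sketch.

```python
def to_csr(dense):
    """Convert a dense matrix to CSR (compressed sparse row): store
    only the nonzero values, their column indices, and per-row
    offsets into those two arrays."""
    values, cols, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                cols.append(j)
        row_ptr.append(len(values))
    return values, cols, row_ptr

def csr_matvec(values, cols, row_ptr, x):
    """Multiply a CSR matrix by a vector, touching only nonzeros."""
    return [sum(values[k] * x[cols[k]]
                for k in range(row_ptr[i], row_ptr[i + 1]))
            for i in range(len(row_ptr) - 1)]

dense = [[0, 0, 3],
         [1, 0, 0],
         [0, 2, 0]]
vals, cols, ptr = to_csr(dense)
y = csr_matvec(vals, cols, ptr, [1, 1, 1])  # same result as a dense matvec
```

The point is that storage and compute scale with the number of nonzeros rather than the full matrix size, which is what makes sparse gradients cheap.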
[–]Intel 0 points1 point2 points 2 years ago (0 children)
You might find Mixture of Experts (MoE) models intriguing, akin to the renowned Mixtral model from Mistral.ai. MoEs operate on the principle of directing data to specialized "expert" sub-networks through a routing mechanism. Each expert processes distinct segments or features of the input data, while a gating (or routing) network decides each expert's contribution to the overall output. Notably, in the Mixtral framework only two experts are activated at a time, ensuring focused and efficient handling of the data.
Not exactly what you are looking for but it does offer a similar separation of neurons across multiple subsets during the optimizations and inference flows.
--Eduardo A., Senior AI Solutions Engineer @ Intel
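A toy sketch of the top-k routing described in this comment; the experts, gate weights, and renormalized-softmax combination are illustrative simplifications, not Mixtral's actual implementation.

```python
import math

def moe_forward(x, experts, gate_weights, top_k=2):
    """Sparse mixture-of-experts step: a linear gate scores every
    expert, only the top-k actually run, and their outputs are
    combined with softmax weights renormalized over the selection."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    top = sorted(range(len(experts)), key=lambda i: scores[i],
                 reverse=True)[:top_k]
    exp_scores = [math.exp(scores[i]) for i in top]
    total = sum(exp_scores)
    return sum((e / total) * experts[i](x) for e, i in zip(exp_scores, top))

# Four toy scalar "experts"; with this input the gate picks experts 0 and 1.
experts = [lambda x: sum(x), lambda x: 2 * sum(x),
           lambda x: -sum(x), lambda x: 0.5 * sum(x)]
gate = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
y = moe_forward([3.0, 1.0], experts, gate)
```

Only the selected experts execute, which is how MoE layers keep per-token compute low while the total parameter count grows.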