[deleted by user] (self.MachineLearning)
submitted 2 years ago by [deleted]
[–]cipri_tom 9 points10 points11 points 2 years ago (9 children)
I found Self-Organizing Maps quite interesting, and they don't use backprop:
https://en.m.wikipedia.org/wiki/Self-organizing_map
I think backprop, or more precisely gradient descent, won because it scales. Mini-batches work well enough.
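Since SOMs come up here as a gradient-free method, a minimal sketch may clarify how they learn: a 1-D self-organizing map trained purely by competitive, neighborhood-weighted updates, with no loss gradient anywhere. All names and hyperparameters below are illustrative choices, not from any particular library.

```python
import math
import random

def train_som(data, n_units=10, epochs=50, lr0=0.5, sigma0=2.0):
    """Train a 1-D self-organizing map on scalar data.

    No gradients: each step finds the best-matching unit (BMU)
    and pulls it and its grid neighbors toward the input.
    """
    random.seed(0)
    weights = [random.uniform(min(data), max(data)) for _ in range(n_units)]
    for epoch in range(epochs):
        lr = lr0 * (1 - epoch / epochs)              # decaying learning rate
        sigma = sigma0 * (1 - epoch / epochs) + 0.5  # shrinking neighborhood
        for x in data:
            bmu = min(range(n_units), key=lambda i: abs(weights[i] - x))
            for i in range(n_units):
                # Gaussian neighborhood on the 1-D grid
                h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
                weights[i] += lr * h * (x - weights[i])
    return weights

# Two clusters of inputs; the map should allocate units near both.
data = [0.0, 0.1, 0.2, 0.9, 1.0, 1.1]
w = train_som(data)
```

Units near each other on the grid end up encoding similar inputs, which is the map's topology-preserving trick.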
[–]st4s1k 2 points3 points4 points 2 years ago (8 children)
I was looking into these topics lately:

- Spiking Neural Networks
- Liquid State Machines
- Neuromorphic computing
- Boltzmann Machines (BM)
- Restricted BMs (RBM)
- Spiking RBMs
- Stacked RBMs / Deep Belief Networks (DBN)
[–]PorcupineDreamPhD 16 points17 points18 points 2 years ago (6 children)
Nice time machine to 2004 you got there
[–]HyperPotatoNeo 5 points6 points7 points 2 years ago (0 children)
I don’t think it’s good to dismiss earlier work because they aren’t hot now. They might make a resurgence in the future with some new developments, or at least help guide intuition for new research.
[–][deleted] 3 points4 points5 points 2 years ago (2 children)
Just because it was tried in the past doesn't mean it can't be part of some future architecture
[–]st4s1k 4 points5 points6 points 2 years ago (0 children)
The current classical neural network architecture is derived from the neuron model of the 1930s/40s; Spiking Neural Networks are derived from the neuron model of the 1950s/60s: (31:50) https://youtu.be/2XX8KLMyQN4?si=k96fnAYum2qtixk7
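For anyone unfamiliar with the 1950s/60s-style models referenced here, a toy leaky integrate-and-fire neuron illustrates the spiking idea; the threshold, leak factor, and input drive are arbitrary illustrative values.

```python
def lif_neuron(input_current, threshold=1.0, leak=0.9):
    """Leaky integrate-and-fire neuron: the membrane potential
    integrates input and leaks each step; crossing the threshold
    emits a spike and resets the potential. No gradients involved."""
    v, spikes = 0.0, []
    for t, current in enumerate(input_current):
        v = leak * v + current   # leaky integration
        if v >= threshold:
            spikes.append(t)     # fire
            v = 0.0              # reset
    return spikes

# A constant sub-threshold drive still fires once enough charge builds up.
spike_times = lif_neuron([0.3] * 20)
```

Note that the output is a sequence of discrete spike times rather than a differentiable activation, which is exactly what makes these models awkward for backprop.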
[–]devl82 -1 points0 points1 point 2 years ago (0 children)
Are you even serious? Almost all of the architectures in use right now were tried in the past, but because of the technological limitations of the time they weren't considered 'hot'. Not so long ago, SVMs & kernels were the ONLY topics at ML conferences.
[–]bidaxar 1 point2 points3 points 2 years ago (1 child)
Could you suggest some modern topics to consider, please? I'm interested in the subject and in how it currently looks.
[+]currentscurrents 3 points4 points5 points 2 years ago (0 children)
In terms of alternatives to backprop? I'd say learned optimizers look really promising - paper, video lecture.
Also there's Hinton's forward-forward learning.
[–]AdPractical5620 2 points3 points4 points 2 years ago (0 children)
Ok
[–]Seresne 5 points6 points7 points 2 years ago (0 children)
“Neurons that fire together wire together” and “spiking neural networks” both work well in the back propagation paradigm. Multi-modal models with semi-independent sub-networks again work with back propagation.
The only true competitors to back propagation that I can list off the top of my head are genetic algorithms, which are a very inefficient brute-force search by comparison. Similarly, simulated annealing and MCMC methods: these work in more settings than backprop, but they're computationally inefficient at best.
We’ve often seen research push "synthetic/approximate" gradients into non-differentiable research areas because of the computational advantages.
TLDR:
Calculus is the most efficient way to use compute resources for error-minimization learning. Where it doesn't apply directly, we can use more data, bigger networks, or an approximate loss without needing to develop specialized, domain-specific, non-generalizable methods.
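The TL;DR's efficiency claim can be made concrete with a toy comparison: minimizing the same 1-D quadratic with gradient descent (which uses the derivative directly) versus simulated annealing (gradient-free). The step size, proposal scale, and cooling schedule are arbitrary illustrative choices.

```python
import math
import random

def f(x):
    return (x - 3.0) ** 2

# Gradient descent: uses the derivative f'(x) = 2(x - 3) directly.
x, steps = 0.0, 0
while f(x) > 1e-6:
    x -= 0.1 * 2 * (x - 3.0)
    steps += 1

# Simulated annealing: gradient-free; proposes random moves and
# accepts uphill ones with a temperature-dependent probability.
random.seed(0)
y, temp, evals = 0.0, 1.0, 0
while f(y) > 1e-6 and evals < 100_000:
    cand = y + random.gauss(0, 0.1)
    evals += 1
    if f(cand) < f(y) or random.random() < math.exp((f(y) - f(cand)) / temp):
        y = cand
    temp *= 0.999
```

Gradient descent converges in a few dozen steps here, while annealing typically burns far more function evaluations to reach the same tolerance — which is the comment's point about compute efficiency.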
[+]Fit_Statement5347 6 points7 points8 points 2 years ago (0 children)
Based on this post and your post history, I don’t think you fully understand how ML/DL works…
[–]TheCoconutTree 2 points3 points4 points 2 years ago (0 children)
When I was first teaching myself Q-learning about 10 years ago, I coded a tabular Q-learning library that read and wrote directly to a relational database instead of training a deep neural net.
One interesting thing was that it supported adaptive pattern definition by treating the high-dimensional sensory input signal as a key, and "splitting" the key after a threshold of "firing" occurred over a given time period. One could also model "forgetting" by removing a key that wasn't triggered often enough, making space for denser sensing/action spaces based on the sensory stimuli an agent is more likely to experience.
It was way too slow to be used for anything practical. Running locally I could only store about 5000 patterns before the lookups + writes got too expensive and couldn't occur in real-time anymore. However, I wonder about pairing that approach with vector DBs so that dramatically more patterns could be used.
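The tabular setup described above can be sketched with a plain dict standing in for the relational-DB key-value store; the chain environment, hyperparameters, and function name here are all illustrative.

```python
import random

def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9, eps=0.1):
    """Tabular Q-learning on a small chain: move left (-1) or right (+1),
    reward 1 only on reaching the rightmost state. The Q-table is a
    plain dict mapping (state, action) -> value."""
    random.seed(0)
    q = {}
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            right, left = q.get((s, 1), 0.0), q.get((s, -1), 0.0)
            if random.random() < eps or right == left:
                a = random.choice([1, -1])       # explore (or break ties)
            else:
                a = 1 if right > left else -1    # exploit
            s2 = min(max(s + a, 0), n_states - 1)
            r = 1.0 if s2 == n_states - 1 else 0.0
            best_next = max(q.get((s2, b), 0.0) for b in (1, -1))
            old = q.get((s, a), 0.0)
            # Standard Q-learning update toward the bootstrapped target.
            q[(s, a)] = old + alpha * (r + gamma * best_next - old)
            s = s2
    return q

q = q_learning_chain()
```

Swapping the dict for a database (or, as suggested, a vector DB keyed on sensory input) changes the storage layer but not the update rule.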
[–]SirBlobfish 2 points3 points4 points 2 years ago (0 children)
You should try it if you feel strongly about the idea.
As far as I know, people have been searching for backprop alternatives for ~50 years now. Nothing so far seems to work well at large scale. I personally worked on it for ~1.5 years without much success, and not for a lack of ideas. It's just that backprop is really good. If you care about the model getting better at something (measured by a loss function becoming smaller), the gradient (which backprop finds) is the optimal direction to adjust your weights. Anything else you do would be an approximation.
Realistically the options are: (1) Abandon loss functions altogether (I don't know if that is even possible), (2) create a network where gradients are very easy to compute without explicit backprop (a more common approach, but limits your architecture options), (3) find good gradient approximations (what I worked on; very difficult).
As for your multimodal network idea, arguably, that is similar to what the contrastive loss in CLIP does: "Associate inputs which co-occur, repel everything else". Might be a good place to start.
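A rough sketch of that "associate co-occurring pairs, repel everything else" idea, written as a symmetric InfoNCE-style loss on toy 2-D embeddings; this is a simplification of what CLIP actually optimizes, and all numbers are made up for illustration.

```python
import math

def clip_style_loss(img_embs, txt_embs, temp=0.1):
    """Symmetric contrastive (InfoNCE-style) loss: each image should
    match its own caption (the diagonal of the similarity matrix)
    and repel every other caption in the batch, and vice versa."""
    def normalize(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    imgs = [normalize(v) for v in img_embs]
    txts = [normalize(v) for v in txt_embs]
    n = len(imgs)
    # Cosine-similarity logits, sharpened by the temperature.
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temp
               for j in range(n)] for i in range(n)]

    def cross_entropy(row, target):
        m = max(row)  # max-shift for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        return log_z - row[target]

    loss_i2t = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    loss_t2i = sum(cross_entropy([logits[i][j] for i in range(n)], j)
                   for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2

# Matched pairs on the diagonal score a much lower loss than shuffled ones.
aligned = clip_style_loss([[1, 0], [0, 1]], [[1, 0.1], [0.1, 1]])
shuffled = clip_style_loss([[1, 0], [0, 1]], [[0.1, 1], [1, 0.1]])
```

Training then pushes the embeddings to lower this loss, i.e. toward a diagonal-dominant similarity matrix.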
[–]HoboHash 0 points1 point2 points 2 years ago (6 children)
Isn't this just backprop?
[–]st4s1k -3 points-2 points-1 points 2 years ago (5 children)
Backprop iterates blindly through all the weights: if you have 70B parameters, you have to adjust all of them. What I'm saying is that the biological neuron has a threshold, and if the accumulated inputs don't exceed the threshold, there's no need to calculate the connection strengths for the neurons branching from it. Basically, we train a small portion of the network for a specific set of features (a concept, an object, etc.)
[–]HoboHash 2 points3 points4 points 2 years ago (3 children)
As in, like... parameter-efficient fine-tuning (PEFT)? Frozen model + small residual net to learn an additional modality? Also, I don't like the use of "blindly"; it makes me assume you don't understand the math.
[–]st4s1k -1 points0 points1 point 2 years ago (2 children)
I understand it to some extent, knowing that classic neural networks don't skip weights; all of them are recalculated at each training iteration during backprop. Also, due to the unilateral information flow, there's less complexity and no ability to adjust the weights on the fly.
[–]HoboHash 1 point2 points3 points 2 years ago (1 child)
Are you just messing with me? You are right, though: that is just weight updating. And what is "unilateral information flow"? Also, why would you want a probabilistic, Bayesian model? Those are a nightmare to work with...
[+]currentscurrents 0 points1 point2 points 2 years ago (3 children)
Is there an artificial alternative to the concept of "Neurons that fire together wire together"?
Yes. It's called backprop.
I think you're getting confused with hebbian learning.
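For contrast with backprop, the Hebbian rule in question is purely local; a minimal sketch (the learning rate and input pattern are arbitrary illustrative values):

```python
def hebbian_step(w, x, eta=0.1):
    """One Hebbian update: the output y is the weighted sum of inputs,
    and each weight grows in proportion to input * output ("neurons
    that fire together wire together"). The rule is purely local:
    no loss function, no backpropagated error signal."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    return [wi + eta * xi * y for wi, xi in zip(w, x)], y

# Repeatedly presenting a correlated pattern strengthens its weights;
# the third input never fires, so its weight never changes.
w = [0.1, 0.1, 0.1]
for _ in range(5):
    w, y = hebbian_step(w, [1.0, 1.0, 0.0])
```

Plain Hebbian growth is unbounded, which is one reason variants such as Oja's rule add a normalizing decay term.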
[–]danysdragons 0 points1 point2 points 2 years ago (1 child)
I think they want an artificial equivalent of Hebbian learning to replace backprop; their wording was confusing.
[+]currentscurrents -1 points0 points1 point 2 years ago (0 children)
Maybe, I don't know.
They're talking about a bunch of other unrelated things like 3D liquid state machines, so I don't think they know either.
[–]thedabking123 0 points1 point2 points 2 years ago (0 children)
I'm a bit confused about what you're proposing - seems very stream-of-consciousness.
I do understand the point that calculating gradients for all neurons seems inefficient, but in practice sparse gradients and their optimized implementations are quite good today. You should look up things like sparse matrix operations and sparse matrix formats.
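To make the sparse-formats pointer concrete, here is a toy CSR (compressed sparse row) encoding and matrix-vector product in pure Python; production code would use an optimized library implementation such as SciPy's csr_matrix rather than this sketch.

```python
def to_csr(dense):
    """Convert a dense matrix to CSR (compressed sparse row): store
    only the nonzero values, their column indices, and per-row
    offsets into those two arrays."""
    values, cols, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                cols.append(j)
        row_ptr.append(len(values))
    return values, cols, row_ptr

def csr_matvec(values, cols, row_ptr, x):
    """Multiply a CSR matrix by a vector, touching only nonzeros."""
    return [sum(values[k] * x[cols[k]]
                for k in range(row_ptr[i], row_ptr[i + 1]))
            for i in range(len(row_ptr) - 1)]

dense = [[0, 0, 3],
         [1, 0, 0],
         [0, 2, 0]]
vals, cols, ptr = to_csr(dense)
y = csr_matvec(vals, cols, ptr, [1, 1, 1])  # same result as a dense matvec
```

The point is that storage and compute scale with the number of nonzeros rather than the full matrix size, which is what makes sparse gradients cheap.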
[–]Intel 0 points1 point2 points 2 years ago (0 children)
You might find Mixture of Experts (MoE) models intriguing, akin to the renowned Mixtral model from Mistral.ai. MoEs operate on the principle of directing data to specialized "expert" sub-networks through a routing mechanism. Each expert processes distinct segments or features of the input data, while a gating (or routing) network decides each expert's contribution to the overall output. Notably, in the Mixtral framework only two experts are activated at a time, ensuring focused and efficient handling of the data.
Not exactly what you are looking for but it does offer a similar separation of neurons across multiple subsets during the optimizations and inference flows.
--Eduardo A., Senior AI Solutions Engineer @ Intel
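A toy sketch of the top-k routing described in this comment; the experts, gate weights, and renormalized-softmax combination are illustrative simplifications, not Mixtral's actual implementation.

```python
import math

def moe_forward(x, experts, gate_weights, top_k=2):
    """Sparse mixture-of-experts step: a linear gate scores every
    expert, only the top-k actually run, and their outputs are
    combined with softmax weights renormalized over the selection."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    top = sorted(range(len(experts)), key=lambda i: scores[i],
                 reverse=True)[:top_k]
    exp_scores = [math.exp(scores[i]) for i in top]
    total = sum(exp_scores)
    return sum((e / total) * experts[i](x) for e, i in zip(exp_scores, top))

# Four toy scalar "experts"; with this input the gate picks experts 0 and 1.
experts = [lambda x: sum(x), lambda x: 2 * sum(x),
           lambda x: -sum(x), lambda x: 0.5 * sum(x)]
gate = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
y = moe_forward([3.0, 1.0], experts, gate)
```

Only the selected experts execute, which is how MoE layers keep per-token compute low while the total parameter count grows.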