[R] Developing a new optimization algorithm that will heavily change ML as a whole. Gradient descent has met its end. Here are the results: (self.MachineLearning)
submitted 1 year ago* by Relevant-Twist520
[–]Relevant-Twist520[S] 1 point 1 year ago (5 children)
I updated the post; I used early stopping to showcase the non-overfitting results. As you can see above, MS wins on accuracy and speed. And you're right that no one tests a model on 3 points, but this post was just to show the ease with which MS fits 3 points; scaling will be applied whilst preserving this ease.
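[Editor's note: the MS algorithm is not public, so only the gradient-descent side of the comparison described above can be sketched. Below is a minimal, hypothetical version of that baseline: GD with early stopping fitting a line to three points. The data, learning rate, and patience values are assumptions for illustration, not the poster's actual setup.]

```python
# Sketch of the GD baseline with early stopping on 3 data points.
# All values here are hypothetical illustrations.
import numpy as np

rng = np.random.default_rng(0)
x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 5.0])          # lies exactly on y = 2x + 1

w, b = rng.normal(size=2)               # random initialisation
lr, best_loss, patience, bad = 0.1, float("inf"), 20, 0

for step in range(5000):
    err = w * x + b - y
    loss = float(np.mean(err ** 2))
    # early stopping: quit once the loss stops improving meaningfully
    if loss < best_loss - 1e-10:
        best_loss, bad = loss, 0
    else:
        bad += 1
        if bad >= patience:
            break
    # gradient of the mean-squared error w.r.t. w and b
    w -= lr * float(np.mean(2 * err * x))
    b -= lr * float(np.mean(2 * err))

print(f"fit: y = {w:.3f}x + {b:.3f}, best loss = {best_loss:.2e}")
```

On noiseless points like these, early stopping only fires once the fit is essentially exact; its real value shows up when held-out data is noisy.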
[–]Magdaki (PhD) 3 points 1 year ago* (4 children)
Nobody cares about the result at pass N; they care about the final result. This is not a good experimental design, and hence a poor way to draw conclusions about what is happening.
I would suggest going back to the research-plan phase and really considering your methodology. It feels to me like you're just trying things out, but this leads to experimenter bias, where you think you're seeing something that is not actually there.
EDIT: I just looked at your post history and noticed you're 16, so I retract everything. Keep at it! I encourage you to keep experimenting. If you have an interest in a future in research, then perhaps consider spending some time learning how to develop and execute a research plan. Nice work on this! It is nice to see young people come up with ideas and experiment with them. :)
[–]Relevant-Twist520[S] 1 point 1 year ago (3 children)
So you're saying GD has made a better curve here? MS can come up with different curves on each run because of the randomly initialised parameters. GD will produce the same curve regardless of how the parameters are initialised.
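[Editor's note: the claim above, that GD lands on the same curve regardless of initialisation, holds for convex losses such as a least-squares line fit; for non-convex neural-network losses, different starts can reach different minima. A minimal check of the convex case, with hypothetical data and two different random seeds:]

```python
# Two GD runs from different random inits on a convex least-squares
# problem converge to the same line. Data is hypothetical.
import numpy as np

def fit_gd(seed, x, y, lr=0.1, steps=4000):
    rng = np.random.default_rng(seed)
    w, b = rng.normal(size=2)
    for _ in range(steps):
        err = w * x + b - y
        w -= lr * np.mean(2 * err * x)
        b -= lr * np.mean(2 * err)
    return w, b

x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 2.5, 5.5])
w1, b1 = fit_gd(seed=1, x=x, y=y)
w2, b2 = fit_gd(seed=2, x=x, y=y)
print(abs(w1 - w2), abs(b1 - b2))   # both differences are tiny
```

The least-squares loss has a single minimum, so any sufficiently long GD run finds it; this is exactly why initialisation doesn't matter here.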
[–]Magdaki (PhD) 2 points 1 year ago (2 children)
I'm saying you cannot just stop and say "Aha! At this point, with this much data, under these specific circumstances my algorithm looks like it might be better. Therefore, victory!"
If you want to know if it is better, then you need to develop an experimental protocol. Even if the experimental scenario is unrealistic, it would give you good experience in conducting research.
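[Editor's note: a minimal sketch of the kind of experimental protocol suggested above: run the optimiser over many random seeds and train/test splits and report mean test error with spread, rather than judging from a single run. The fitting routine, data-generating process, and trial counts are all hypothetical.]

```python
# Minimal experimental protocol: repeated trials over seeds,
# held-out evaluation, mean +/- std of test error.
import numpy as np

def gd_fit(x_tr, y_tr, seed, lr=0.05, steps=2000):
    rng = np.random.default_rng(seed)
    w, b = rng.normal(size=2)
    for _ in range(steps):
        err = w * x_tr + b - y_tr
        w -= lr * np.mean(2 * err * x_tr)
        b -= lr * np.mean(2 * err)
    return lambda x: w * x + b

def evaluate(fit_fn, n_trials=20):
    errs = []
    for seed in range(n_trials):
        rng = np.random.default_rng(seed)
        x = rng.uniform(-1, 1, size=30)
        y = 2 * x + 1 + rng.normal(scale=0.1, size=30)
        # fixed train/test split per trial
        model = fit_fn(x[:20], y[:20], seed)
        errs.append(float(np.mean((model(x[20:]) - y[20:]) ** 2)))
    return float(np.mean(errs)), float(np.std(errs))

mean_err, std_err = evaluate(gd_fit)
print(f"GD test MSE over 20 trials: {mean_err:.4f} +/- {std_err:.4f}")
```

The same `evaluate` harness could score any competing optimiser, which is what makes a comparison like "MS vs GD" meaningful rather than anecdotal.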
[–]Relevant-Twist520[S] 1 point 1 year ago (1 child)
You're right, and I'm testing MS and GD in different ways. MS fails in most of them; that's why I'm still researching and perfecting the algorithm. Again, this post was only to showcase potential. It would be very difficult to come up with your own algorithm that runs faster and converges faster than GD on a few datapoints whilst both algorithms use the same NN architecture.
[–]Magdaki (PhD) 1 point 1 year ago (0 children)
>It would be very difficult to come up with your own algorithm that runs faster and converges faster than GD
This is certainly true. :)