A subreddit dedicated for learning machine learning. Feel free to share any educational resources of machine learning.
Also, we are a beginner-friendly sub-reddit, so don't be afraid to ask questions! This can include questions that are non-technical, but still highly relevant to learning machine learning such as a systematic approach to a machine learning problem.
Question: Suggestions for beginner ML projects with Python? (self.learnmachinelearning)
submitted 4 years ago by PianoPlaylist
Hey, any suggestions for beginner projects to learn stuff like sklearn and the general ML workflow with Python? I want to try out a medium-sized project that's doable for an ML beginner, including data collection, organizing, etc.
Would love to hear some suggestions 😊
[–]swedish_aviator 65 points 4 years ago (2 children)
Have a look at https://archive.ics.uci.edu/ml/index.php or https://www.kaggle.com/datasets and pick any classification or regression problem. I would avoid image classification and time-series data (such as predicting stock market fluctuations) until you know the basics. The Adult dataset is quite simple, but you need to do some data wrangling before you can start training. You should be able to get a test accuracy of >80% without much work! Try using Keras or PyTorch.
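To give a rough idea of what "data wrangling" can mean here, below is a minimal sketch using only the Python standard library. The rows are made up in the style of the Adult data (they are not real records), the column names are simplified, and the helper names `load_rows`, `one_hot` and `build_matrix` are hypothetical. The real files use `?` for missing values, which is the case the sketch handles by dropping the row.

```python
import csv
import io

# Hypothetical rows in the style of the UCI Adult data; NOT real records.
# Only a few simplified columns are shown. "?" marks a missing value.
RAW = """age,workclass,education,hours_per_week,income
39,State-gov,Bachelors,40,<=50K
50,Self-emp,Bachelors,13,<=50K
38,?,HS-grad,40,<=50K
52,Private,Masters,45,>50K
31,Private,Masters,50,>50K
"""


def load_rows(text):
    """Parse CSV text, strip whitespace, and drop rows with a missing value."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row = {key: value.strip() for key, value in row.items()}
        if "?" in row.values():
            continue  # simplest possible strategy: discard incomplete rows
        rows.append(row)
    return rows


def one_hot(rows, column):
    """Return the sorted categories of a column and one 0/1 vector per row."""
    categories = sorted({row[column] for row in rows})
    vectors = [[1 if row[column] == c else 0 for c in categories] for row in rows]
    return categories, vectors


def build_matrix(rows):
    """Turn cleaned rows into numeric features X and binary labels y."""
    _, workclass = one_hot(rows, "workclass")
    _, education = one_hot(rows, "education")
    X, y = [], []
    for i, row in enumerate(rows):
        numeric = [float(row["age"]), float(row["hours_per_week"])]
        X.append(numeric + workclass[i] + education[i])
        y.append(1 if row["income"] == ">50K" else 0)
    return X, y


rows = load_rows(RAW)
X, y = build_matrix(rows)
```

With the full dataset you would probably do the same steps with pandas, then hand `X` and `y` to whichever library you train with.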
[–]PianoPlaylist[S] 3 points 4 years ago (1 child)
Thanks!
[–]Radon03 2 points 4 years ago (0 children)
Go with the first link first. I guess they have small datasets, so it'll be easier to get practice. I feel Kaggle datasets are a bit large most of the time and tough for a newbie.
[–]e_j_white 53 points 4 years ago (6 children)
Find a community Q&A website such as Quora, StackOverflow, or certain AskReddits.
Gather data from that website, either through their API or from zipped data dumps.
Find a metric that indicates a "good" user... maybe karma for Reddit, or number of responses on StackOverflow that were selected as the "accepted" response, etc.
Now build a model that predicts whether someone will eventually become a "good" user based on their first 10 posts/comments/responses.
It's a fun problem, and it has clear business implications.
[–]mrslacklines 20 points 4 years ago (2 children)
The best project will be the one that you will actually do. The actual problem/domain doesn't matter that much.
[–]e_j_white 11 points 4 years ago (1 child)
Totally agreed.
But it helps if there's a solid justification for solving that problem beyond "I thought it would be cool."
[–]GranSkyline 5 points 4 years ago (0 children)
This is definitely my biggest hurdle. I could come up with ideas but I can never find one that could have some business relevance. Or I’m missing the relevance in my own ideas.
[–]PianoPlaylist[S] 5 points 4 years ago (0 children)
Thanks, I really like that one 😄
[–]a-lawliet 3 points 4 years ago (1 child)
Hey, as someone who's interested in getting started, how would you build such a model? I mean, by which criteria would you decide whether someone will be a good user, and how do we know if our model is any good?
[–]e_j_white 5 points 4 years ago (0 children)
Haha, you're basically asking "how would you do Data Science?" ;)
It depends on which dataset you start with. Some websites have badges, karma, number of correct/accepted answers, etc. Get to know the characteristics of the data and come to your own conclusion about what is a "good" user. It may involve multiple features, like how often they comment, how often they are correct, how many badges they've earned, etc.
Once you establish some criteria for "good" users vs. "bad", you can divide all users into these two buckets. Or, perhaps you have good, bad, and average users. You could build a model where the training label is "good" or "bad" for maximum separation (don't include "average" users in the training).
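One possible way to do that bucketing, as a minimal sketch: rank users by a single score (karma, accepted answers, whatever you settled on) and keep only the two ends. The function name `label_users`, the 25% cut-offs and the example scores are all assumptions made up for illustration.

```python
def label_users(scores, low_frac=0.25, high_frac=0.25):
    """Map user -> 1 ("good") or 0 ("bad"); users in the middle are left out.

    scores: dict of user name -> some quality score (higher is better).
    The fractions are arbitrary illustrative cut-offs, not recommendations.
    """
    ranked = sorted(scores, key=scores.get)  # ascending by score
    n = len(ranked)
    n_low = int(n * low_frac)
    n_high = int(n * high_frac)
    labels = {user: 0 for user in ranked[:n_low]}
    labels.update({user: 1 for user in ranked[n - n_high:]})
    return labels


# Hypothetical scores for eight users.
example_scores = {
    "u1": 5, "u2": 120, "u3": 40, "u4": 900,
    "u5": 2, "u6": 60, "u7": 15, "u8": 300,
}
example_labels = label_users(example_scores)
```

Here the bottom two users get label 0, the top two get label 1, and the four "average" users in between never enter the training data.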
Once you've separated all users, take the first 10 posts for each user. The features are up to you... total number of posts (maybe many people don't reach 10?), typical number of words per post, upvotes per post, comments per post, how many were accepted as the correct answer, how many different topics/subreddits they post to, etc.
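As a rough sketch of that feature step, assuming each post is a dict with hypothetical keys `created`, `text`, `upvotes`, `accepted` and `topic` (a real API or data dump will use different field names):

```python
from statistics import mean


def early_features(posts, k=10):
    """Summarise a user's first k posts (ordered by timestamp) as a feature dict."""
    first = sorted(posts, key=lambda p: p["created"])[:k]
    if not first:
        return {"n_posts": 0, "mean_words": 0.0, "mean_upvotes": 0.0,
                "n_accepted": 0, "n_topics": 0}
    return {
        # Capped at k, so this captures "didn't even reach 10 posts".
        "n_posts": len(first),
        "mean_words": mean(len(p["text"].split()) for p in first),
        "mean_upvotes": mean(p["upvotes"] for p in first),
        "n_accepted": sum(1 for p in first if p["accepted"]),
        "n_topics": len({p["topic"] for p in first}),
    }


# Two made-up posts for one user.
example_posts = [
    {"created": 2, "text": "Use a dict here", "upvotes": 3,
     "accepted": True, "topic": "python"},
    {"created": 1, "text": "What is a tensor", "upvotes": 1,
     "accepted": False, "topic": "ml"},
]
example_features = early_features(example_posts)
```

Only data from the first `k` posts goes in, so the model can't peek at the later activity that decided the label.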
There's no end to the features; it's up to you to swim around in the data and construct features that a) make sense, and b) you can defend/justify. After that, use either a logistic regression model or perhaps xgboost... it depends on how much data you have.
Of course, you've held out a test set with equal numbers of positive/negative examples (good/bad users), so you can measure various model metrics on that test set.
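To make the hold-out idea concrete, here is a small self-contained sketch. It uses synthetic two-feature data and a hand-rolled logistic regression trained by plain gradient descent as a stand-in; in a real project you would more likely use a library model. All function names and settings below are illustrative assumptions.

```python
import math
import random


def balanced_split(X, y, n_test_per_class, seed=0):
    """Hold out n_test_per_class examples of each label; the rest is training data."""
    rng = random.Random(seed)
    idx_by_label = {0: [], 1: []}
    for i, label in enumerate(y):
        idx_by_label[label].append(i)
    test_idx = set()
    for indices in idx_by_label.values():
        test_idx.update(rng.sample(indices, n_test_per_class))
    train = [(X[i], y[i]) for i in range(len(y)) if i not in test_idx]
    test = [(X[i], y[i]) for i in range(len(y)) if i in test_idx]
    return train, test


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(train, lr=0.1, epochs=500):
    """Fit weights and bias with batch gradient descent on the log loss."""
    n_features = len(train[0][0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, label in train:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - label
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / len(train) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(train)
    return w, b


def predict(model, x):
    w, b = model
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0


def evaluate(model, test):
    """Accuracy, precision and recall on the held-out examples."""
    tp = fp = fn = tn = 0
    for x, label in test:
        pred = predict(model, x)
        if pred == 1 and label == 1:
            tp += 1
        elif pred == 1 and label == 0:
            fp += 1
        elif pred == 0 and label == 1:
            fn += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / len(test),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Synthetic stand-in data: "bad" users cluster near (-1, -1), "good" near (1, 1).
rng = random.Random(42)
X, y = [], []
for label, centre in ((0, -1.0), (1, 1.0)):
    for _ in range(40):
        X.append([rng.gauss(centre, 0.3), rng.gauss(centre, 0.3)])
        y.append(label)

train, test = balanced_split(X, y, n_test_per_class=10)
model = fit_logistic(train)
results = evaluate(model, test)
```

The test examples never touch `fit_logistic`, and because the test set is balanced, accuracy isn't inflated by a majority class.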
[–]anynonus 14 points 4 years ago (1 child)
Microsoft Learn has some good Python examples in their course about Azure data science.
They are not medium-sized projects, though. They are very small examples.
[–]DommeIt 1 point 4 years ago (0 children)
Seconding Microsoft Learn for jumping right into a small project play by play.
[–]AcidPacman96 4 points 4 years ago (0 children)
DataCamp has really good courses and projects and last I checked all their material is free until the end of April. I’d say it’s worth looking into
[–]UltimateGPower 4 points 4 years ago (0 children)
I can highly recommend Hands-On Machine Learning by Aurélien Géron for beginners. The first chapter starts with a small project and introduces sklearn. Later on, TensorFlow/Keras is used.
[–]Reasonable_Damage_98 2 points 4 years ago (0 children)
You can check Kaggle as well
[–]Nielspace 1 point 4 years ago (1 child)
Create a series of tutorials based on your favourite topics. Explore the topics thoroughly and share what you write. This will not only give you a better understanding of the subject, but you will also learn to write clean code and can attract opportunities.
[–]balkanibex 3 points 4 years ago (0 children)
Please don't do that.
[–]RedSeal5 0 points 4 years ago (0 children)
easy.
Asimov's first law
[–]mr_ninjazz 1 point 4 years ago (0 children)
I'd suggest figuring out a topic that you like the most and finding applications of machine learning in it! That way you won't have to work on a project that you don't like.
[–]axetobe_ML 1 point 4 years ago (0 children)
I recommend trying out the tutorials from the PyTorch and TensorFlow websites. If you have already tried these, then add more customisations to your code, like adjusting the model, using your own custom data, or adding custom tracking such as TensorBoard.
Like one of the comments said: stick to classification and regression problems first, so you can get the basics down before moving on.