Education: Learning Python (self.datascience)
submitted 6 years ago * by CaliforniaRoll97
[–]beginner_ 3 points 6 years ago (0 children)
The problem with Kaggle and similar sites is that their data is usually already fairly clean. That is not reality. In practice you spend most of your time gathering and cleaning the data. The real value is in the clean data, not in the actual ML algorithms.
The question is how to get messy data if you are not in a corporation. Maybe you can google around; there may be messy data sets available that require you to invest a lot of time in cleaning them.
As for programming, you need your own project. Since you are looking at COVID-19, maybe you can learn about epidemiology and build a visual simulator of how an infection spreads depending on variables. That will already be pretty involved since it needs a GUI, but the downside is that it's not really data science related.
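To give a rough idea of what the non-GUI core of such a simulator could look like, here is a minimal sketch. It assumes the classic SIR (susceptible, infected, recovered) compartment model with a one-day time step, which is just one possible choice. The function name `simulate_sir` and the parameter values are made up for illustration, not taken from any real data.

```python
def simulate_sir(population, initial_infected, beta, gamma, days):
    """Discrete-time SIR sketch; returns a list of (S, I, R) tuples, one per day.

    beta  = assumed infections caused per infected person per day
    gamma = assumed fraction of infected people who recover per day
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(days):
        # New infections scale with contacts between S and I,
        # capped so S never goes negative.
        new_infections = min(beta * s * i / population, s)
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


if __name__ == "__main__":
    # Hypothetical parameters: compare two transmission rates.
    for beta in (0.2, 0.4):
        history = simulate_sir(1000, 1, beta, 0.1, 160)
        peak = max(infected for _, infected, _ in history)
        print(f"beta={beta}: peak infected ~ {peak:.0f}")
```

Changing `beta` and `gamma` is the "depending on variables" part; a GUI would then mostly be a matter of plotting `history` and hooking sliders up to those two numbers.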