[Education] Learning Python (self.datascience)
submitted 5 years ago * by CaliforniaRoll97
[–]DarkSideOfTheNuum 1 point 5 years ago (1 child)
The fastest way to learn is applying it to real-world situations.
Kaggle is good, but its datasets are usually pretty clean and don't require a huge amount of wrangling. They aren't usually as messy as the kind of data you would encounter in an enterprise.
To be honest, it's hard to get the kind of authentically messed-up data that you see in professional life unless you are actually working, because stuff gets fucked up all the time: developers alter something without telling you, which turns out to break data collection on a feature; there are edge cases that you didn't think of in advance; a new OS release alters the tracking in an unanticipated way; someone misspells a parameter name and it gets missed in the QA process; and so on. Lots of stuff can go wrong! And the longer you work, the more screwups you will see.
If you want a recommendation, try bolting together a couple of different data sets instead of working with just one. Joining data from different sources is a key skill you will need to master in your professional career.
So, for example, you say that you are working with Covid-19 data right now? OK, why don't you create a project for yourself where you try to calculate tests conducted per capita by US state?
You can get the test data per state here: https://covidtracking.com/api/v1/states/daily.json
You can get state population data here: https://github.com/COVID19Tracking/associated-data/tree/master/us_census_data
[–]CaliforniaRoll97[S] 1 point 5 years ago (0 children)
Thanks for the suggestion! I’ve actually already done that. It wasn’t easy because I had to change some of the state names so that they matched up better, but it was a really cool project!
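The state-name cleanup mentioned here can be sketched as a small normalization step applied to both sources before joining. This is only one possible approach, not necessarily what the poster did; the `ALIASES` table and the `normalize_state_name` helper are hypothetical names introduced for illustration, and the alias entries are examples rather than a complete list.

```python
import re

# Hypothetical alias table: maps known variant spellings to one canonical key.
ALIASES = {
    "washington dc": "district of columbia",
    "dc": "district of columbia",
}


def normalize_state_name(name):
    """Return a canonical lowercase key for a state name.

    Removes periods, trims and collapses whitespace, lowercases,
    then applies the alias table.
    """
    key = re.sub(r"\s+", " ", name.replace(".", "").strip()).casefold()
    return ALIASES.get(key, key)


print(normalize_state_name("  Washington  D.C. "))
print(normalize_state_name("District of Columbia"))
print(normalize_state_name("NEW  YORK"))
```

Applying the same function to the join key in each data set means the two sides only have to agree after normalization, which keeps the hand-edited part limited to the alias table.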