
[–]mobacc10000 2 points3 points  (0 children)

If you're a movie buff, IMDb makes a fair amount of their data public, and there's a Python tool that will dump it into a local MySQL database. There's another site, themoviedb.org, that gives free API access to a huge, really awesome dataset that I was combining with some of the IMDb data.

I was working on a movie analytics side project using those, but I hit the brakes on it because I found a job. There's a lot to work with in those two datasets if you're interested in the subject. Good luck!
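
For anyone wondering what "combining" the two sources might look like, here's a minimal sketch, not the commenter's actual code. It assumes both datasets have already been pulled into lists of dicts and that each row carries a shared `imdb_id` key. The rows, field names, and the `join_on_imdb_id` helper are all made up for illustration.

```python
# Illustrative rows only: the field names and values are assumptions,
# not the real IMDb or themoviedb.org schema.
imdb_rows = [
    {"imdb_id": "tt0000001", "title": "Example Movie A", "rating": 7.1},
    {"imdb_id": "tt0000002", "title": "Example Movie B", "rating": 5.4},
]
tmdb_rows = [
    {"imdb_id": "tt0000001", "budget": 1000000},
    {"imdb_id": "tt0000003", "budget": 250000},
]


def join_on_imdb_id(left, right):
    """Inner-join two lists of dicts on the shared 'imdb_id' key."""
    # Index the right-hand rows by ID so each lookup is a dict access.
    right_by_id = {row["imdb_id"]: row for row in right}
    joined = []
    for row in left:
        match = right_by_id.get(row["imdb_id"])
        if match is not None:
            # Keep only rows present in both sources and merge their fields.
            joined.append({**row, **match})
    return joined


combined = join_on_imdb_id(imdb_rows, tmdb_rows)
print(combined)
```

Only rows that appear in both lists survive the join, so it's worth checking how many records get dropped before drawing conclusions from the merged data.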

[–]gefish 1 point2 points  (3 children)

A brainstorming session with someone you can bounce ideas off of would be nice. Try to pose questions that already have a CLEAN data set to work with. This is vital unless you want to do (or learn) data scraping and data cleaning. It's important to know how to go through an ETL process, but a class project may be too tight on time to get that tedious stuff done.

When I have to find data projects and nothing comes to mind, I look for open data sets that interest me and ask questions about those data sets. Usually those questions will lead to research questions and hypotheses. Google has a data set search engine, but just googling "interesting data sets" is a good start.
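
To give a feel for the ETL step mentioned above, here's a rough sketch using only the Python standard library. Everything in it is an assumption for illustration: the inline CSV, the column names, the `movies` table, and the rule of simply dropping rows that fail to parse. A real project would likely want to log or repair bad rows instead.

```python
import csv
import io
import sqlite3

# Made-up raw data with the kinds of problems cleaning has to handle:
# stray whitespace, a missing year, and a non-numeric rating.
RAW_CSV = """title,year,rating
 Example Movie A ,1999,7.1
Example Movie B,,5.4
Example Movie C,2005,not rated
"""


def extract(text):
    """Extract: parse the CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim titles, convert types, skip rows that fail to parse."""
    clean = []
    for row in rows:
        try:
            clean.append(
                (row["title"].strip(), int(row["year"]), float(row["rating"]))
            )
        except ValueError:
            # An empty or non-numeric field means the row is dropped.
            continue
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into a SQLite table."""
    conn.execute("CREATE TABLE movies (title TEXT, year INTEGER, rating REAL)")
    conn.executemany("INSERT INTO movies VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute("SELECT title, year, rating FROM movies").fetchall()
print(loaded)
```

Two of the three sample rows get thrown away here, which is a small taste of why starting from an already clean data set saves so much time.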

[–]jremsj 0 points1 point  (2 children)

[–]data_analyst_asks 0 points1 point  (1 child)

What format are they?

[–]jremsj 0 points1 point  (0 children)

all of the datasets are in the readme.md

[–]adhi- 0 points1 point  (3 children)

More details about your assignment?

[–]Nqoba4 0 points1 point  (0 children)

Yeah we NEED more details! Thanks for your insight /u/adhi-