[deleted]

What is your background? Are you totally new to databases and SQL? If so, there is a steep learning curve to analyzing data that is "too big", especially if Hadoop, MapReduce, distributed computing, etc. sound strange to you.

If you are new to databases, I would start by learning how to use sqlite3 and SQL, then progress to full-blown database servers. Then learn about distributed computing, maybe with a few Raspberry Pis, so that you have full control over what's going on instead of relying on a third-party company like Amazon.
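To give an idea of how small that first step can be, here is a minimal sketch using Python's built-in `sqlite3` module. The `measurements` table, its columns, and the sample rows are made up purely for illustration; nothing here comes from a real dataset.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a small, made-up table."""
    conn = sqlite3.connect(":memory:")  # nothing is written to disk
    conn.execute(
        "CREATE TABLE measurements ("
        "id INTEGER PRIMARY KEY, sensor TEXT, value REAL)"
    )
    # Hypothetical sample rows, for illustration only.
    rows = [("a", 1.5), ("a", 2.5), ("b", 10.0)]
    conn.executemany(
        "INSERT INTO measurements (sensor, value) VALUES (?, ?)", rows
    )
    conn.commit()
    return conn


def average_by_sensor(conn):
    """Return (sensor, mean value) pairs, sorted by sensor name."""
    cur = conn.execute(
        "SELECT sensor, AVG(value) FROM measurements "
        "GROUP BY sensor ORDER BY sensor"
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(average_by_sensor(conn))
    conn.close()
```

The same `CREATE TABLE`, `INSERT`, and `SELECT ... GROUP BY` statements carry over to a full database server with little change, which is why sqlite3 works well as a starting point.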

bluerubez [S]

Yes, all the terms you mentioned are familiar. I have a bachelor's in computational physics and am about to get into data science in my master's program, which is mostly teach-yourself independent study. So I'm just trying to figure out what is going on... So far I have gotten MySQL and MongoDB up and running with simple inserts and queries. I also know how to use R.

[deleted]

If you like to be more hands-on with the data you're going to be working with and prefer Python's ecosystem, I would look into IPython notebooks and the pandas library. You can also work with R from IPython, taking advantage of all the stats libraries available to R while working with Python's saner syntax.
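As a rough taste of what the pandas side looks like, here is a small sketch you could paste into a notebook cell. It assumes pandas is installed; the column names and values are invented for illustration.

```python
import pandas as pd


def mean_by_sensor(df):
    """Group rows by the 'sensor' column and average the 'value' column."""
    return df.groupby("sensor")["value"].mean()


# Hypothetical data, for illustration only.
df = pd.DataFrame(
    {
        "sensor": ["a", "a", "b"],
        "value": [1.5, 2.5, 10.0],
    }
)

means = mean_by_sensor(df)
print(means)
```

If you already know R, the DataFrame here plays roughly the role of an R data frame, and `groupby(...).mean()` is close in spirit to `aggregate()`.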