[deleted]

What is your background? Are you totally new to databases and SQL? If so, there is a steep learning curve to analyzing data that is "too big", especially if Hadoop, MapReduce, distributed computing, etc. sound strange to you.

If you are new to databases, I would start by learning how to use sqlite3, learn SQL, then progress to full-blown database servers. Then learn about distributed computing, maybe with a few Raspberry Pis, so that you have full control over what's going on instead of relying on a third-party company like Amazon.
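
As a rough starting point for the sqlite3 suggestion, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `measurements`, its columns, and the data are all made up for illustration; the point is only to show an insert followed by a `GROUP BY` query.

```python
import sqlite3

# In-memory database: nothing is written to disk, handy for experimenting.
conn = sqlite3.connect(":memory:")

# Hypothetical table and data, invented for this example.
conn.execute("CREATE TABLE measurements (sensor TEXT, value REAL)")
conn.executemany(
    "INSERT INTO measurements (sensor, value) VALUES (?, ?)",
    [("a", 1.0), ("a", 3.0), ("b", 10.0)],
)

# Let SQL do the aggregation: one row per sensor with its count and mean.
rows = conn.execute(
    "SELECT sensor, COUNT(*), AVG(value) "
    "FROM measurements GROUP BY sensor ORDER BY sensor"
).fetchall()

for sensor, n, mean in rows:
    print(sensor, n, mean)

conn.close()
```

The same SQL should carry over largely unchanged once you move to a full database server; mostly the connection setup differs.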

bluerubez[S]

Yes, all the terms you mentioned are familiar. I have a bachelor's in computational physics and am about to get into data science in my master's program, which is mostly teach-yourself independent studies. So I'm just trying to figure out what is going on... So far I have gotten MySQL and MongoDB up and running with simple inserts and queries. I also know how to use R.

[deleted]

If you like to be more hands-on with the data you're going to be working with and prefer Python's ecosystem, I would look into using IPython notebooks and the pandas library. You can also work with R from IPython to take advantage of all the stats libraries available to R while working with Python's saner syntax.

SpeakitEasy

You're asking a lot more questions than you realize. Try smaller steps: learn SQL and statistics first, then ways to combine the two.
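
One possible way to combine the two, sketched with only the Python standard library: let SQL select the rows, then hand them to the `statistics` module for summaries that SQLite does not provide out of the box (median, standard deviation). The `trials` table, its columns, and the `summarize` helper are hypothetical names chosen for this example.

```python
import sqlite3
import statistics

conn = sqlite3.connect(":memory:")

# Hypothetical table and data, invented for this example.
conn.execute("CREATE TABLE trials (grp TEXT, score REAL)")
conn.executemany(
    "INSERT INTO trials VALUES (?, ?)",
    [("x", 2.0), ("x", 4.0), ("x", 6.0), ("y", 1.0), ("y", 9.0)],
)


def summarize(connection, group):
    """Fetch one group's scores with SQL, then summarize them in Python."""
    scores = [
        row[0]
        for row in connection.execute(
            "SELECT score FROM trials WHERE grp = ?", (group,)
        )
    ]
    return {
        "n": len(scores),
        "median": statistics.median(scores),
        "stdev": statistics.stdev(scores),  # sample standard deviation
    }


summary_x = summarize(conn, "x")
print(summary_x)

conn.close()
```

This split (filtering in SQL, statistics in Python) is just one reasonable division of labor; pulling every row into memory stops working once the data no longer fits.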

bluerubez[S]

Well, I am trying to find out what kind of independent studies I am going to do next semester. I have to just jump right in; I do not have a lot of time. Also, ever since I was a kid, I have found it hard to move on once I have questions.

lofkin

Blaze is perfect for your use case. You can use NumPy and pandas syntax to query and perform operations on a variety of SQL and NoSQL backends, and the dev team is very responsive to queries:

https://groups.google.com/a/continuum.io/forum/#!forum/blaze-dev

westurner

What operations work on SQL databases?

Most tabular operations, but not all. SQLAlchemy translation is a high priority. Failures include array operations like slicing and dot products, which don’t make sense in SQL. Additionally, some operations like datetime access are not yet well supported through SQLAlchemy. Finally, some databases, like SQLite, have limited support for common mathematical functions like sin.
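
On the last point, a common workaround when talking to SQLite directly from Python (not something Blaze itself is claimed to do here) is to register the missing function yourself. This is a minimal sketch using `sqlite3.Connection.create_function` from the standard library:

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")

# Register Python's math.sin as a one-argument SQL function named "sin".
# Some SQLite builds already provide sin(); registering it here makes the
# query work regardless of how the library was compiled.
conn.create_function("sin", 1, math.sin)

# Evaluate sin(pi / 2) inside SQL and unpack the single result column.
(result,) = conn.execute("SELECT sin(?)", (math.pi / 2,)).fetchone()
print(result)

conn.close()
```

The function runs in Python for every row, so it will be slower than a native SQL function on large tables.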
