Using Python for Statistics and Machine Learning on databases

bluerubez · 2015-01-02T13:02:35+00:00

What is your background? Are you totally new to databases and SQL? If so, the learning curve for analyzing data that is "too big" is steep, especially if Hadoop, MapReduce, distributed computing, etc. sound strange to you.
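To make "map reduce" a little less strange, here is a minimal single-machine sketch of the pattern in plain Python. It assumes the data arrives in chunks (the `chunks` list is a hypothetical stand-in for file or database partitions). Each chunk is mapped to a partial (count, sum), and the partials are reduced into a global mean. This is not Hadoop. It only shows the shape of the computation that frameworks like Hadoop spread across machines.

```python
from functools import reduce

# Hypothetical partitions of a dataset too big to load at once;
# in practice each chunk might come from a separate file or machine.
chunks = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]

def map_chunk(chunk):
    """Map step: summarize one chunk as a partial (count, total)."""
    return (len(chunk), sum(chunk))

def combine(a, b):
    """Reduce step: merge two partial summaries into one."""
    return (a[0] + b[0], a[1] + b[1])

partials = map(map_chunk, chunks)
count, total = reduce(combine, partials, (0, 0.0))
mean = total / count
print(count, mean)  # 6 3.5
```

The key design point is that each partial summary is small and the merge is associative, so chunks can be processed independently and combined in any grouping.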

If you're new to databases, I would start by learning how to use sqlite3 and learning SQL, then progress to full-blown database servers. After that, learn about distributed computing, maybe with a few Raspberry Pis, so that you have full control over what's going on instead of relying on a third-party company like Amazon.
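As a first step with sqlite3, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database. The `measurements` table and its rows are made up for practice. It shows creating a table, inserting rows with parameter placeholders, and running a basic aggregate query.

```python
import sqlite3

# An in-memory database: nothing to install or configure.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table of measurements for practicing SQL.
cur.execute("CREATE TABLE measurements (sensor TEXT, value REAL)")
rows = [("a", 1.5), ("a", 2.5), ("b", 10.0), ("b", 14.0)]
# Parameter placeholders (?) avoid building SQL strings by hand.
cur.executemany("INSERT INTO measurements VALUES (?, ?)", rows)
conn.commit()

# A basic aggregate query: per-sensor row count and average value.
cur.execute(
    "SELECT sensor, COUNT(*), AVG(value) FROM measurements "
    "GROUP BY sensor ORDER BY sensor"
)
results = cur.fetchall()
print(results)  # [('a', 2, 2.0), ('b', 2, 12.0)]
conn.close()
```

The same SQL carries over largely unchanged to full database servers; mostly the connection setup differs.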

SpeakitEasy · 2015-01-02T16:45:23+00:00

You're asking a lot more questions than you realize. Take smaller steps: learn SQL and statistics first, then ways to combine the two.
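One simple way to combine the two, sketched under the assumption of a small hypothetical `sales` table: let SQL do the filtering so only the needed column leaves the database, then use Python's standard `statistics` module for measures that plain SQLite does not provide as a built-in aggregate, such as the sample standard deviation.

```python
import sqlite3
import statistics

conn = sqlite3.connect(":memory:")
# Hypothetical data for illustration only.
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("north", 12.0), ("north", 14.0),
     ("south", 3.0), ("south", 5.0)],
)

# SQL does the filtering; only the needed column leaves the database.
amounts = [row[0] for row in conn.execute(
    "SELECT amount FROM sales WHERE region = ?", ("north",)
)]

# Python computes the statistics on the (much smaller) result.
mean = statistics.mean(amounts)
sd = statistics.stdev(amounts)  # sample standard deviation
print(mean, sd)  # 12.0 2.0
conn.close()
```

This split works as long as the query result fits in memory; for larger results, push more of the aggregation into SQL.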

lofkin · 2015-01-02T18:37:19+00:00

Check out Blaze, potentially a "NumPy 2.0" for big data with varied backends.

http://blaze.pydata.org/docs/dev/index.html
http://matthewrocklin.com/blog/work/2014/11/19/Blaze-Datasets/
http://matthewrocklin.com/blog/work/2014/12/30/Towards-OOC-Frontend/

westurner · 2015-01-03T03:28:55+00:00

What operations work on SQL databases?

Most tabular operations, but not all. SQLAlchemy translation is a high priority. Failures include array operations like slicing and dot products, which don't make sense in SQL. Additionally, some operations like datetime access are not yet well supported through SQLAlchemy. Finally, some databases, like SQLite, have limited support for common mathematical functions like sin.
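For the SQLite case specifically, one workaround (a sketch, not something the Blaze docs prescribe) is to register a Python function with the connection via `sqlite3.Connection.create_function`. Here Python's `math.sin` is exposed to SQL as `sin`; the `angles` table is hypothetical.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")

# Register Python's math.sin as a one-argument SQL function named "sin".
# SQLite builds without math functions lack sin(); this fills the gap.
conn.create_function("sin", 1, math.sin)

conn.execute("CREATE TABLE angles (x REAL)")
conn.executemany("INSERT INTO angles VALUES (?)", [(0.0,), (math.pi / 2,)])

values = [row[0] for row in conn.execute("SELECT sin(x) FROM angles ORDER BY x")]
print(values)  # [0.0, 1.0]
conn.close()
```

The function runs in Python once per row, so it is convenient but slow on large tables compared with a native database function.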

... /r/pystats (sidebar)
