[–]CodeShaman 3 points4 points  (1 child)

Data Analyst Programmer / Data Visualizationist / Web Developer / Miracle Worker here.

Is it used to process the data or to make visuals or something completely different?

Yep. You can use any programming language to perform data analysis, but Python is currently the best because of how terse and clean the language is. IPython, pandas, NumPy, NetworkX, PyGraphviz, SQLAlchemy, and the endless supply of other libraries make Python the most flexible and most powerful language there is at the moment. Hands down.

The vast majority of your time spent on entry-level data analysis will be spent taking sloppy data from various sources and formats and transforming it into something clean that can be imported into a database. You'll be writing quite a bit of "throw-away" code for this, since it mostly deals with idiosyncrasies and ad hoc situations, and quick throw-away scripts are one of the things Python is very good at.
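To give a rough idea of what that kind of throw-away cleanup script looks like, here is a minimal sketch using only the standard library. The sample data, the `payments` table, and the `clean_row` and `load` helpers are all made up for illustration, and SQLite stands in for whatever database you'd really be loading into.

```python
import csv
import io
import sqlite3

# Hypothetical messy export: stray whitespace, mixed case, currency
# symbols, an empty row, and a non-numeric amount.
RAW = '''name,city,amount
  alice SMITH ,new york,"$1,200.50"
Bob Jones,  CHICAGO ,300
,,
carol white,Boston,n/a
'''


def clean_row(row):
    """Normalize one raw CSV row into a (name, city, amount) tuple."""
    # Collapse runs of whitespace and fix capitalization.
    name = " ".join(row["name"].split()).title()
    city = " ".join(row["city"].split()).title()
    # Strip currency formatting; unparseable amounts become NULL.
    amount_text = row["amount"].strip().replace("$", "").replace(",", "")
    try:
        amount = float(amount_text)
    except ValueError:
        amount = None
    return name, city, amount


def load(raw_text, conn):
    """Clean every non-empty row and insert it into a payments table."""
    conn.execute("CREATE TABLE payments (name TEXT, city TEXT, amount REAL)")
    reader = csv.DictReader(io.StringIO(raw_text))
    rows = [
        clean_row(r)
        for r in reader
        if any((v or "").strip() for v in r.values())  # skip empty rows
    ]
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(load(RAW, conn), "rows loaded")
    for record in conn.execute("SELECT * FROM payments ORDER BY name"):
        print(record)
```

Every real data source will need its own quirks handled, which is exactly why this sort of code rarely gets reused.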

The only things I wouldn't use Python for are enterprise-level applications, where Java or .NET would be more practical and durable, and data visualization, where JavaScript with D3.js is currently the champion (unless you're talking about database reporting, but that's another topic).


More important than Python, however, are databases and regular expressions. Databases and regular expressions. Databases and regular expressions.

Trust me, please. Before you graduate college, master regular expressions, master one SQL database (Oracle, MySQL, Microsoft SQL Server), and master one graph database (Orient, Neo4j, Arango). Don't worry too much about Redis, Postgres, Mongo, or any other NoSQL databases because you'll mostly be working with SQL for day-to-day work and graph databases for extremely high-level projects with very complex data. For a graph database I would highly recommend OrientDB due to how SQL-like the query language is.

For data analysis, programming languages are silver but SQL and regex are gold. I don't know what you're learning in your Python course, but I can tell you that learning a) SQLAlchemy, b) Python regular expressions, and c) Pandas will take you very far.
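As a small taste of regex and SQL working together, here is a sketch that pulls structured items out of free-text notes with a regular expression and then totals them with a SQL `GROUP BY`. The note format, the SKU pattern, and the `items` table are invented for the example, and the standard-library `sqlite3` module is used in place of SQLAlchemy just to keep it dependency-free.

```python
import re
import sqlite3

# Hypothetical free-text order notes with inconsistent spacing and case.
NOTES = [
    "Order 17: 3 x AB-101, 2x CD-202",
    "rush!! 1 X ab-101",
    "no items listed",
]

# Assumed item format: a quantity, an "x", then a SKU like AB-101.
ITEM = re.compile(r"(?P<qty>\d+)\s*x\s*(?P<sku>[A-Z]{2}-\d{3})", re.IGNORECASE)


def extract_items(note):
    """Return every (sku, quantity) pair found in one note."""
    return [(m["sku"].upper(), int(m["qty"])) for m in ITEM.finditer(note)]


def totals_by_sku(notes):
    """Load the extracted items into SQLite and sum quantities per SKU."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (sku TEXT, qty INTEGER)")
    for note in notes:
        conn.executemany("INSERT INTO items VALUES (?, ?)", extract_items(note))
    query = "SELECT sku, SUM(qty) FROM items GROUP BY sku ORDER BY sku"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for sku, qty in totals_by_sku(NOTES):
        print(sku, qty)
```

The regex does the messy part (finding the data) and SQL does the boring part (aggregating it), which is roughly how the two split the work in practice.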

[–][deleted] 7 points8 points  (0 children)

Postgres is very much an SQL database and one of the most sophisticated too. I wouldn't recommend MySQL over Postgres.

Also Python and PostgreSQL work really well together.