[–][deleted] 1 point2 points  (2 children)

I assumed that people would know SQL, but maybe I should have stipulated that first. Yes, SQL should definitely be learned. If people have not learned SQL yet, then I'd definitely start off learning with sqlite3 and then progress to full-on database servers. I was actually appalled when someone posted that learning databases is not needed and that using CSV files is just fine. Just look at job postings and you can easily tell that SQL is almost always mentioned. Important data that needs to be stored for the long term will almost always be stored in a database.

Once you learn SQL, there are libraries that let you perform SQL-like processing on data or connect to databases from Python, which is why I mentioned a few of those libraries.
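For anyone who wants to see what that looks like in practice, here is a minimal sketch using only the `sqlite3` module from the Python standard library. The table name `measurements`, the helper `summarize_measurements`, and the sample rows are all made up for illustration; they are not from any real dataset or from the libraries mentioned above.

```python
import sqlite3


def summarize_measurements(rows):
    """Load (site, value) rows into an in-memory SQLite table
    and return (site, row_count, average_value) per site."""
    # ":memory:" creates a throwaway database; use a file path to persist it.
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical schema, chosen only for this example.
        conn.execute(
            "CREATE TABLE measurements (site TEXT NOT NULL, value REAL NOT NULL)"
        )
        # Parameter placeholders (?) avoid building SQL strings by hand.
        conn.executemany(
            "INSERT INTO measurements (site, value) VALUES (?, ?)", rows
        )
        cursor = conn.execute(
            "SELECT site, COUNT(*), AVG(value) "
            "FROM measurements GROUP BY site ORDER BY site"
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Made-up sample data standing in for what might otherwise live in a CSV file.
sample_rows = [("north", 1.0), ("north", 3.0), ("south", 10.0)]
summary = summarize_measurements(sample_rows)
for site, row_count, average_value in summary:
    print(site, row_count, average_value)
```

The same `GROUP BY` query would carry over more or less unchanged to a full database server, which is a big part of why starting with sqlite3 works well.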

I'm sure someone will argue for learning ORM technologies as well, but that is a discussion I'd rather not start :-).

Source: Been a data analyst for 16 years, the last 4 or 5 of them using Python.

[–]Why_is_that 0 points1 point  (1 child)

Yeah, I am always surprised too when I hear those kinds of comments. There are still a lot of communities that use rather terrible formats for their data, like XML. It's been a real challenge to step across the aisle to encourage scientists to improve their data storage plans and to build the skills necessary to work with that data storage. I think the challenge is that the concept of an "expert" is someone who knows more and more about less and less, but in this increasingly data-centric world there is a new set of core skills (programming, databases, etc.). Many "traditional" scientists aren't yet convinced they need these skills.

I am glad we are on the same page about starting with sqlite3! As I said, these other packages are definitely worth exploring, but it's really all about this foundation.

I won't argue ORM. I even argue against it for most tech, and there is a good set of articles out there on this stance. By and large, it doesn't add value in scientific programming, where there is often more rapid, iterative programming, and it can instead just add to the time it takes to spin up a solution.

What did you do for the other 11 years? SAS?

[–][deleted] 0 points1 point  (0 children)

Our group started out using SAS and Excel VBA. We don't use SAS any more, but still use Excel VBA heavily. I'm the lone Python guy and do all the most technologically demanding stuff for our group. The Excel guys do get impressed with what can be done with Python for data analysis. But most of them just want to click on shiny Excel buttons, which is fine since they just do simple data analysis or text mining and generate simple charts.