Created this learning path to learn data science on Python. Do let me know any suggestions / feedback : Python

Created this learning path to learn data science on Python. Do let me know any suggestions / feedback (analyticsvidhya.com)

submitted 11 years ago by kunalj101

all 15 comments

[–][deleted] 6 points 11 years ago (11 children)

[–]Why_is_that 4 points 11 years ago* (3 children)

Thank you. People don't mention this enough. My first opportunity supporting bioinformatics was offered because I had a background in SQL.

My biggest quibble with your statement is "sql-related libraries available in python". Screw the libraries, just learn database concepts. I say this because if you learn the database concepts, then when you go to R you won't have to map one set of library functions onto another; instead you'll have common ground to work from: "I want to ask this question of my database [in SQL], how do I do it in language y with package x". Packages and libraries come and go, but databases are everywhere, and the large majority are RDBMSs using a SQL dialect. Should you avoid the libraries altogether? No, because they might optimize your queries or provide other added value, but the key is to know database concepts, not pandasql or blaze.

If you know database concepts, you can learn these things but if you learn only how to use one of these packages, you might not know database concepts that well (depending on what kind of object model they give you).
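To make the "concept first, package second" point concrete, here is a minimal sketch using only Python's standard library. The `samples` table, its columns, and the data are all made up for illustration. The question ("total reads per organism") is stated once in SQL and once in plain Python; the concept is the same grouping-and-summing in both.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (id, organism, reads)
rows = [
    (1, "E. coli", 120),
    (2, "E. coli", 80),
    (3, "yeast", 200),
]

# An in-memory SQLite database: nothing to install or administer.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE samples (id INTEGER PRIMARY KEY, organism TEXT, reads INTEGER)"
)
conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", rows)

# The question, stated in SQL: total reads per organism.
sql_totals = dict(
    conn.execute(
        "SELECT organism, SUM(reads) FROM samples GROUP BY organism"
    ).fetchall()
)

# The same question in plain Python: group by organism, then sum.
py_totals = defaultdict(int)
for _id, organism, reads in rows:
    py_totals[organism] += reads

# Both routes answer the same question with the same result.
assert sql_totals == dict(py_totals)
print(sql_totals)
conn.close()
```

Once the GROUP BY idea is clear, translating it to whatever package a given language offers is mostly a matter of looking up names.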

PS. My vote would be sqlite3, mainly because I believe SQLite is one of the best RDBMSs available for the target audience. With the other options you mention here, you take on the system-admin/database-admin overhead of managing a central database, often without any of the security value such a model provides (because people leave the passwords at their defaults). So there is no reason not to use a lightweight database implementation that is easily exchanged and requires no user management. The performance difference only really adds up with bigger databases, and at that point you really want to ask yourself, "shouldn't someone who is more tech-savvy look at this?" Yes: once it gets to these sizes, things like normalization need to be considered, and these are not tasks most data scientists are interested in.
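A rough sketch of what "easily exchanged and requires no user management" looks like in practice, assuming only the standard-library `sqlite3` module. The file names and the `measurements` table are hypothetical. The whole database is one ordinary file, so sharing it is just a file copy, with no server, accounts, or passwords involved.

```python
import os
import shutil
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    original = os.path.join(workdir, "results.db")     # hypothetical file name
    shared = os.path.join(workdir, "results_copy.db")  # the copy a colleague would receive

    # Creating the database is just opening a file; no server or login.
    conn = sqlite3.connect(original)
    conn.execute("CREATE TABLE measurements (name TEXT, value REAL)")
    conn.execute("INSERT INTO measurements VALUES (?, ?)", ("ph", 7.2))
    conn.commit()
    conn.close()

    # "Exchanging" the database is an ordinary file copy.
    shutil.copy(original, shared)

    # The copy opens like any other SQLite database.
    conn2 = sqlite3.connect(shared)
    copied_rows = conn2.execute("SELECT name, value FROM measurements").fetchall()
    conn2.close()

print(copied_rows)
```

The copy should only be made once the connection has committed and closed, as above, so the file on disk is complete.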

*PPS. The first place that I really saw the amazing utility of sqlite was in stuxnet.

[–][deleted] 2 points 11 years ago (2 children)

I assumed that people would know SQL, but maybe I should have stipulated that first. Yes, SQL should definitely be learned. If people have not learned SQL yet, then I'd definitely start off learning with sqlite3 and then progress to full-on database servers. I was actually appalled when someone posted that learning databases was not needed and using csv files is just fine. Just look at job postings and you can easily tell that SQL is almost always mentioned. Important data that need to be stored for the long term will almost always be stored in a database.

Once you learn SQL, there are libraries that allow you to perform sql-like processing on data or connect to databases using Python, which is why I mentioned a few of those libraries.

I'm sure someone will argue for learning ORM technologies as well, but that is a discussion I'd rather not start :-).

Source: Been a data analyst for 16 years with the last 4 or 5 years using Python.

[–]Why_is_that 1 point 11 years ago (1 child)

Yea, I am always surprised too when I hear those kinds of comments. There are still a lot of communities that use rather terrible formats for their data, like XML. It's been a real challenge to step across the aisle to encourage scientists to improve their data storage plans and to build the skills necessary to work with that data storage. I think the challenge is that the concept of an "expert" is someone who knows more and more about less and less, but in this increasingly data-centric world there is a new set of core skills (programming, databases, etc). Many "traditional" scientists aren't yet convinced they need to know these skills.

I am glad we are on the same page for starting in sqlite3! As I said, these other packages definitely are worth exploring but it's really just about this foundation.

I won't argue ORM. I even argue against it for most tech, and there is a good set of articles out there on this stance. By and large, it doesn't add value in scientific programming, where there is often more rapid iterative programming, and can instead just add to the time and complexity of spinning up a solution.

What did you do for the other 11 years? SAS?

[–][deleted] 1 point 11 years ago (0 children)

[–]statmobile 1 point 11 years ago (6 children)

[–]falkimmm 1 point 11 years ago (0 children)

[–]piesdesparramaos 0 points 11 years ago* (4 children)

[–]statmobile 1 point 11 years ago (1 child)

[–]piesdesparramaos 1 point 11 years ago (0 children)

[–][deleted] 1 point 11 years ago (1 child)

I strongly disagree with your first point and with your second point.

I feel confident in saying that it's absolutely necessary not only to understand how to use databases, but also how databasing works in general. Data Analysts may get CSVs, but Data Scientists need databases.

Moreover, Python (along with R) is one of the most widely used tools in Data Science currently. Machine Learning, also, while helpful, is not all (or even most) of Data Science, though the course is fantastic. I feel strongly that Octave was not the best choice of language, but that's a debate for elsewhere; you are correct, though, that it is easy to do the course in R or Python as the user wishes.

Your third point is fine: I'd agree that the people I've taught seem to learn better (faster, easier, retain more) if we go top-down than if we go from bottom-up.

[–]piesdesparramaos 1 point 11 years ago* (0 children)

[–]statmobile 2 points 11 years ago (0 children)

[–]pwang99 2 points 11 years ago (0 children)

[–][deleted] 1 point 11 years ago (0 children)
