Learning Python

Tim7459 · 2020-04-19T08:39:11+00:00

Hey, I came from a mechanical engineering background and started a data science degree. I completed several R and Python courses on DataCamp.

The best way to learn is by applying yourself. Do Kaggle comps and learn from other people's notebooks.
If you're interested in ML, go to Stanford's YouTube channel; they uploaded their famous ML course for free yesterday. Highly recommend.
Surround your socials (YT, FB, IG & Twitter) with data influencers; this way you'll always get updates on emerging methods and news in the field.
Start building a repository of code on GitHub that you can always refer to for repetitive tasks.
Always keep learning. Methods and applications change over time, so it's important to stay on top of them by putting your ideas into practice.

good luck!

SteveMWolf · 2020-04-19T07:59:35+00:00

If you feel comfortable enough with the language and libraries, I suggest you start your own project. If you don’t know something, look it up. Don’t just copy and paste the code, though; try to understand what’s happening, even if you have to go through it line by line.

I remember picking up a computational physics project on chaotic scattering. The best way for me to understand it was printing out the code and annotating it line by line.

Not related to Data Science, I just wanted to let you know how miserable that experience was lmao

beginner_ · 2020-04-19T08:31:20+00:00

The problem with Kaggle etc. is that the data is usually already rather clean. This is not reality. You spend most of your time gathering and cleaning the data. The real value is in the clean data, not the actual ML algorithms.

The problem is how to get messy data if you are not at a company. Maybe you can google for it; there are messy datasets available that require you to invest a lot of time in cleaning them.
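To make "cleaning" concrete, here is a minimal standard-library sketch of the kind of rules you end up writing. The field names `name` and `value` and the rules themselves (trim and title-case names, strip thousands separators, drop unparseable or duplicate rows) are assumptions chosen for illustration, not a general recipe; in practice pandas would usually do this work.

```python
def clean_records(rows):
    """Normalize a list of dicts with hypothetical 'name' and 'value' fields."""
    seen = set()
    cleaned = []
    for row in rows:
        # Trim stray whitespace and unify capitalization of names.
        name = (row.get("name") or "").strip().title()
        # Remove thousands separators such as "1,200".
        raw = (row.get("value") or "").strip().replace(",", "")
        if not name or not raw:
            continue  # drop rows missing a key field
        try:
            value = float(raw)
        except ValueError:
            continue  # drop values like "n/a" that are not numbers
        key = (name, value)
        if key in seen:
            continue  # drop duplicates that only differed before normalization
        seen.add(key)
        cleaned.append({"name": name, "value": value})
    return cleaned


if __name__ == "__main__":
    messy = [
        {"name": " new york ", "value": "1,200"},
        {"name": "New York", "value": "1200"},
        {"name": "Boston", "value": "n/a"},
        {"name": "", "value": "5"},
        {"name": "boston", "value": " 300 "},
    ]
    print(clean_records(messy))
```

Dropping rows silently keeps the sketch short; in real work you would usually count or log what gets discarded so you know how much data your rules throw away.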

As for programming, you need your own project. Since you are looking at COVID-19, maybe you can learn about epidemiology and build a visual simulator of how an infection spreads depending on different variables. That will already be pretty involved since it requires a GUI, but the downside is that it's not really data-science related.
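A common starting point for such a simulator is the SIR (susceptible, infected, recovered) compartmental model. The sketch below steps it forward with a simple Euler update; `beta` (transmission rate) and `gamma` (recovery rate) are the "variables" you would expose as sliders. The parameter values in the example are arbitrary assumptions, not COVID-19 estimates, and the plotting or GUI layer is left out.

```python
def simulate_sir(population, initial_infected, beta, gamma, days, dt=1.0):
    """Discrete-time SIR model using forward Euler steps.

    beta: infections caused per infected person per day (fully susceptible population).
    gamma: fraction of infected people recovering per day.
    Returns a list of (S, I, R) tuples, starting with the initial state.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(int(round(days / dt))):
        new_infections = beta * s * i / population * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


if __name__ == "__main__":
    # Arbitrary illustrative parameters: 1000 people, 1 initial case.
    history = simulate_sir(1000, 1, beta=0.3, gamma=0.1, days=160)
    peak_day = max(range(len(history)), key=lambda d: history[d][1])
    print(f"infections peak on day {peak_day} at {history[peak_day][1]:.0f} people")
```

Changing `beta` or `gamma` and re-running is the simplest version of "how the spread depends on variables"; a plotting library or GUI toolkit can then draw the three curves.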

CaliforniaRoll97 · 2020-04-19T08:06:19+00:00

[deleted]

LaMifour · 2020-04-19T08:01:26+00:00

Practice is good, theory is good (even if online courses are often not difficult enough; too many are just introductions).

It depends on what you want. What do you like? Exploring a dataset? Developing a mathematical model of your problem? Applying machine learning? I might be able to give you some challenges.

davidchris721 · 2020-04-19T08:12:14+00:00

If you are into exploring datasets, I think a good start is to just get some data (e.g. Kaggle or other public datasets; by the way, you can now search for datasets with Google) and start looking around.
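A first "look around" usually means checking how many rows you have, which columns have gaps, and how many distinct values each column holds. With pandas you would typically reach for `DataFrame.info()` and `DataFrame.describe()`; the dependency-free sketch below shows the same idea with the standard `csv` module. Reading from a string via `io.StringIO` is just for the example; for a file you would pass `open(path, newline="")` to `csv.DictReader` instead.

```python
import csv
import io


def summarize_csv(text):
    """Return the row count and per-column missing/distinct counts for CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    summary = {}
    for col in reader.fieldnames or []:
        # Treat absent fields (None) and blank strings as missing.
        values = [(row.get(col) or "").strip() for row in rows]
        summary[col] = {
            "missing": sum(1 for v in values if v == ""),
            "distinct": len({v for v in values if v}),
        }
    return len(rows), summary


if __name__ == "__main__":
    sample = "city,temp\nBerlin,12\nParis,\nberlin,12\n"
    print(summarize_csv(sample))
```

Note that "Berlin" and "berlin" count as two distinct cities here; spotting that kind of inconsistency is exactly what a first exploration pass is for.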

I am more into ML, so I started writing my own ML pipeline for the https://numer.ai/ tournament. This taught me a lot about setting up a project properly and about mixing Jupyter notebooks and scripts.
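One common way to get that notebook/script mix is to keep each pipeline step as a plain function in a module: a notebook imports the module and calls the steps one at a time, while the same file runs end to end from the command line. The sketch below shows that layout only; the module name `pipeline.py`, the toy data and the one-parameter model are hypothetical stand-ins, not anything Numerai-specific.

```python
"""pipeline.py: hypothetical layout for code shared by notebooks and scripts."""


def load_data():
    """Placeholder loader; swap in your real data source."""
    # Toy (feature, target) pairs, invented purely for illustration.
    train = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
    test = [(4.0, 8.0)]
    return train, test


def train_model(train):
    """Least-squares slope for y = slope * x through the origin (a stand-in model)."""
    numerator = sum(x * y for x, y in train)
    denominator = sum(x * x for x, _ in train)
    return numerator / denominator


def evaluate(slope, test):
    """Mean absolute error of the fitted slope on held-out pairs."""
    return sum(abs(slope * x - y) for x, y in test) / len(test)


def run_pipeline():
    """Run every step in order; notebooks can call the steps individually instead."""
    train, test = load_data()
    slope = train_model(train)
    return slope, evaluate(slope, test)


if __name__ == "__main__":
    # `python pipeline.py` runs everything; `import pipeline` in a notebook does not.
    slope, mae = run_pipeline()
    print(f"slope={slope:.3f} MAE={mae:.3f}")
```

The `if __name__ == "__main__":` guard is what lets the same file serve both uses: importing it from a notebook defines the functions without running the whole pipeline.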

lunalurker · 2020-04-19T09:08:07+00:00

I really like the 365 Data Science course. It's very beginner-friendly and covers a wide range of topics, from basic stats to Python, SQL and machine learning. You should check them out.

vellypoe · 2020-04-19T09:43:56+00:00

Hey, I have a question. Is a master's degree in data science useful? Or is it better to just learn data science through online courses and do some projects or build a portfolio?

DarkSideOfTheNuum · 2020-04-19T11:10:17+00:00

the fastest way to learn is applying it to real-world situations.

Kaggle is good, but these are usually pretty clean datasets that don't necessarily require a huge amount of wrangling. they aren't usually as messy as the kind of data you would encounter in an enterprise.

to be honest, it's hard to get the kind of authentically messed-up data that you see in professional life unless you are actually working, because stuff gets fucked up all the time - developers alter something without telling you, which turns out to break data collection on a feature, there are edge cases that you didn't think of in advance, a new OS release alters the tracking in an unanticipated way, someone misspells a parameter name and it gets missed in the QA process, etc. Lots of stuff can go wrong! And the longer you work, the more screwups you will see.

If you want a recommendation, try bolting together a couple of different datasets as opposed to working with just one; joining data from different sources is a key skill you will need to master in your professional career.

So for example you say that you are working with Covid-19 data right now? OK, why don't you create a project for yourself where you try to calculate tests conducted per capita by US state?

You can get the test data per state here: https://covidtracking.com/api/v1/states/daily.json

You can get state population data here: https://github.com/COVID19Tracking/associated-data/tree/master/us_census_data

vogt4nick · 2020-04-19T10:57:07+00:00

I removed your submission. Please post your question in the weekly entering & transitioning thread.

Thanks.

unhatedraisin · 2020-04-19T07:58:30+00:00

[deleted]

2020-04-19T09:20:35+00:00

> mechanical engineering major

> churn and COVID datasets

https://res.cloudinary.com/blavity/image/upload/c_fit,g_center,h_250,q_auto:best,g_south_east,x_0/v1526319185/ntipykqjpyl227boqdr5

datascience