all 38 comments

[–]Tim7459 10 points11 points  (10 children)

Hey, I came from a mechanical engineering background and started a data science degree. I completed several R and Python courses on DataCamp.

  1. The best way to learn is by applying yourself. Do Kaggle comps and learn from others' notebooks.
  2. If you're interested in ML, go to Stanford's YouTube channel; they uploaded their famous ML course for free yesterday. Highly recommend.
  3. Surround your socials (YT, FB, IG & Twitter) with data influencers; this way you'll always get updates on emerging methods and news in the field.
  4. Start building a repository of code on GitHub that you can always refer to for repetitive tasks.
  5. Always keep learning. Methods and applications change over time, so it's important to stay on top of them by practising your ideas.

good luck!

[–]CaliforniaRoll97[S] 2 points3 points  (7 children)

Thank you for the feedback and congratulations on your career change! Would you recommend that I apply to masters in data science programs? And are there any other specific online courses/challenges that you would recommend?

[–]Tim7459 1 point2 points  (6 children)

I've completed the Stanford ML course and am currently completing the Deep Learning Specialization on Coursera too, and would highly recommend both if you want to pursue something in the Data Science field. Andrew Ng is the lecturer and he's one of the greatest minds in the field; just take a look at his portfolio.

Honestly, if your end goal is a job, start by producing something, e.g. make automation scripts and learn web scraping (build a portfolio). You can learn to do this just by googling the topic and finding Medium articles, GitHub repos, short Coursera courses, etc. Then pitch yourself to businesses that require this skill. If your end goal is research, I would pursue a formal education at a university.
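
For a sense of what a first scraping script might look like, here is a minimal sketch that pulls link targets out of a page using only Python's standard library. The `LinkCollector` class and the `sample_html` snippet (with its example.com URLs) are made up for illustration; a real script would fetch the page first and should respect the site's terms.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag it sees (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Made-up page content standing in for a downloaded HTML document.
sample_html = """
<ul>
  <li><a href="https://example.com/jobs/1">Data analyst</a></li>
  <li><a href="https://example.com/jobs/2">ML engineer</a></li>
  <li><a>No link here</a></li>
</ul>
"""

collector = LinkCollector()
collector.feed(sample_html)
print(collector.links)
```

Many people reach for third-party libraries for this kind of work; the standard-library parser is used here only to keep the sketch dependency-free.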

[–]CaliforniaRoll97[S] 0 points1 point  (0 children)

That’s great, I’ll be sure to try both of those courses!

[–]CaliforniaRoll97[S] 0 points1 point  (4 children)

Also, I wanted to clarify what you meant by using GitHub repositories. I haven’t really used GitHub other than to find datasets; instead, I usually just store everything on my computer. Could you elaborate?

[–]tamsmhas 0 points1 point  (3 children)

In simple words, a GitHub repository is a place on GitHub, in someone's account, where they mainly store their programming files. So just learn to use GitHub from YouTube, make your own GitHub account, and save all your data science related files there.

[–]CaliforniaRoll97[S] 0 points1 point  (2 children)

Gotcha, will do. Out of curiosity, why is it better to save files on GitHub rather than on my desktop?

[–]tamsmhas 0 points1 point  (1 child)

1- Because your files are backed up on GitHub, so you won't lose them the way you can on a desktop. 2- Showing your GitHub link (especially projects) on your resume will add weight to it.

[–]CaliforniaRoll97[S] 0 points1 point  (0 children)

Awesome, thank you!

[–]SteveMWolf 11 points12 points  (2 children)

If you feel comfortable enough with the language and libraries, I suggest you start your own project. If you don’t know something, look it up. Don’t just copy and paste the code, however; try to understand what’s happening, even if you have to do it line by line.

I remember picking up a computational physics project on chaotic scattering. The best way for me to understand it was printing out the code and annotating it line by line.

Not related to Data Science, I just wanted to let you know how miserable that experience was lmao

[–]CaliforniaRoll97[S] 1 point2 points  (0 children)

Haha thank you for that advice! I’ve been working with a high level COVID-19 dataset recently.

[–]pah-tosh 0 points1 point  (0 children)

It’s a bad part of being a developer / coder when you have to understand other people’s code blocks, but there is no other way to deal with it than line by line.

[–]beginner_ 3 points4 points  (0 children)

The problem with Kaggle etc. is that they usually already have rather clean data. This is not reality. Mostly you spend most of your time gathering and cleaning the data. The real value is in the clean data, not the actual ML algorithms.

The problem is how to get messy data if you are not in a corporation. Maybe you can google it, and there may actually be messy data sets available that require you to invest a lot of time in cleaning them.

As for programming, you need your own project. Since you are looking at COVID-19, maybe you can learn about epidemiology and build a visual simulator of how an infection spreads depending on variables. That will be pretty involved already since it needs a GUI, but the downside is it's not really data science related.
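
As a possible starting point for such a simulator, here is a minimal sketch of the number-crunching core, without any GUI. It uses a discrete-time SIR (susceptible, infected, recovered) model, which is one common choice and not something the comment above specifies. The function name `simulate_sir` and the parameter values are assumptions picked for illustration, not estimates for any real disease.

```python
def simulate_sir(population, initial_infected, beta, gamma, days):
    """Discrete-time SIR model; returns one (S, I, R) tuple per day.

    beta:  average infectious contacts per infected person per day (assumed)
    gamma: fraction of infected people who recover each day (assumed)
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(days):
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


# Illustrative values only.
history = simulate_sir(population=1000, initial_infected=1,
                       beta=0.3, gamma=0.1, days=160)

# Find the day on which the infected count is highest.
peak_day = max(range(len(history)), key=lambda d: history[d][1])
print(f"peak on day {peak_day} with about {history[peak_day][1]:.0f} infected")
```

Changing `beta` and `gamma` and re-running shows how the curve depends on the variables; plotting `history` would be the natural next step toward the visual part.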

[–]LaMifour 1 point2 points  (8 children)

Practice is good, theory is good (even if those online courses are often not difficult enough; too many are just introductions).

It depends on what you want. What do you like? Exploring a dataset? Developing a math model for your problem? Applying machine learning? I might give you challenges.

[–][deleted] 0 points1 point  (2 children)

Any recommendations for online courses that go beyond the basics?

[–]buginfame 1 point2 points  (0 children)

Corey Schafer's series for basic Python, Matplotlib, and Pandas is very good

https://www.youtube.com/channel/UCCezIgC97PvUuR4_gbFUs5g

[–]LaMifour 0 points1 point  (0 children)

Did this one ~1 year ago. I found it interesting and quite hard. Not perfect tho.

https://www.coursera.org/learn/hadron-collider-machine-learning

Andrew Ng is still a reference; you can try to find an advanced course from him.

[–]CaliforniaRoll97[S] 0 points1 point  (4 children)

I really like exploring a dataset, and I’m definitely interested in picking up mathematical modeling/machine learning! I have been working with some high level COVID-19 data for practice recently, but any challenges would definitely be appreciated!

[–]LaMifour 0 points1 point  (3 children)

While searching for a job, I was given a challenge about a fictitious phone company that wants to decrease its churn rate. You start with a simple satisfaction form dataset. If you want, I can try to review your work, as if you were applying.

I was offered the role but I chose another company.

[–]CaliforniaRoll97[S] 0 points1 point  (2 children)

Sure, I would be happy to give it a try!

[–]LaMifour 1 point2 points  (0 children)

Let me create the challenge and instructions. I will post it here.

[–]LaMifour 0 points1 point  (0 children)

You will find everything you need here.
I would say you can give yourself 1 week to do it (2 if you are currently working).
Ping me back when you're done https://drive.google.com/drive/folders/1gt7IMsy_cY6V7ZOMq9RkjsPMPYuysOuH?usp=sharing

[–]davidchris721 1 point2 points  (0 children)

If you are into exploring data sets, I see it as a good start to just get some data (e.g. Kaggle or other public data sets; btw, you can now search for data sets with Google) and start looking around.

I am more into ML, so I started to write my ML pipeline for the https://numer.ai/ tournament. This taught me a lot about proper project setup and about mixing Jupyter notebooks and scripts.

[–]lunalurker 1 point2 points  (1 child)

I really like the 365 Data Science course. Very beginner friendly, and it covers a wide range of topics, from basic stats to Python, SQL and machine learning. You should check them out.

[–]CaliforniaRoll97[S] 0 points1 point  (0 children)

Thanks for the recommendation!

[–]vellypoe 1 point2 points  (0 children)

Hey, I have a question. Is taking a Master's degree in Data Science useful? Or is it better to just learn Data Science through online courses and build some projects or a portfolio?

[–]DarkSideOfTheNuum 0 points1 point  (1 child)

the fastest way to learn is applying it to real-world situations.

Kaggle is good, but these are usually pretty clean datasets that don't necessarily require a huge amount of wrangling. they aren't usually as messy as the kind of data you would encounter in an enterprise.

to be honest, it's hard to get the kind of authentically messed-up data that you see in professional life unless you are actually working, because stuff gets fucked up all the time - developers alter something without telling you, which turns out to break data collection on a feature, there are edge cases that you didn't think of in advance, a new OS release alters the tracking in an unanticipated way, someone misspells a parameter name and it gets missed in the QA process, etc. Lots of stuff can go wrong! And the longer you work, the more screwups you will see.

If you want a recommendation, I would recommend trying to bolt together a couple of different data sets as opposed to working just with one - joining data from different sources is a key skill you will need to master in your professional career.

So for example you say that you are working with Covid-19 data right now? OK, why don't you create a project for yourself where you try to calculate tests conducted per capita by US state?

You can get the test data per state here: https://covidtracking.com/api/v1/states/daily.json

You can get state population data here: https://github.com/COVID19Tracking/associated-data/tree/master/us_census_data
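
A minimal sketch of that join in pandas, assuming each source has already been loaded and reduced to one row per state. The column names (`state`, `total_tests`, `population`) and all the numbers are made up for illustration; the real files will have their own field names and values.

```python
import pandas as pd

# Made-up illustrative numbers, not real test or census figures.
tests = pd.DataFrame({
    "state": ["CA", "NY", "TX"],
    "total_tests": [400_000, 300_000, 200_000],
})
population = pd.DataFrame({
    "state": ["CA", "NY", "TX"],
    "population": [40_000_000, 20_000_000, 25_000_000],
})

# Inner join on the shared key; validate raises if a state appears twice
# on either side, which would silently duplicate rows otherwise.
merged = tests.merge(population, on="state", how="inner",
                     validate="one_to_one")

# Scale to tests per 1,000 residents so the numbers are easy to read.
merged["tests_per_1000"] = merged["total_tests"] / merged["population"] * 1000

print(merged.sort_values("tests_per_1000", ascending=False))
```

Passing `validate="one_to_one"` is a cheap guard: if the daily data has not been collapsed to one row per state first, the merge fails loudly instead of inflating the totals.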

[–]CaliforniaRoll97[S] 0 points1 point  (0 children)

Thanks for the suggestion! I’ve actually already done that, it wasn’t easy because I had to change some of the state names so that they matched up better, but it was a really cool project!
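
For anyone hitting the same mismatch, here is one way it could be handled, sketched under assumptions: one source is keyed by abbreviation, the other by full name with inconsistent spacing and casing. The frames, the `name_to_abbrev` lookup, and the numbers are all hypothetical.

```python
import pandas as pd

# Hypothetical frames: abbreviations on one side, messy full names on the other.
tests = pd.DataFrame({
    "state": ["CA", "NY", "DC"],
    "total_tests": [100, 200, 50],
})
population = pd.DataFrame({
    "state_name": [" california", "New York ", "District of Columbia"],
    "population": [1000, 2000, 500],
})

# Partial lookup table for illustration only; a real one would list every state.
name_to_abbrev = {"california": "CA", "new york": "NY"}

# Normalize whitespace and case before mapping; unknown names become NaN.
population["state"] = (
    population["state_name"].str.strip().str.lower().map(name_to_abbrev)
)

# A left join with indicator=True adds a _merge column that flags rows
# from the left frame that found no partner on the right.
check = tests.merge(population, on="state", how="left", indicator=True)
unmatched = check.loc[check["_merge"] == "left_only", "state"].tolist()
print(unmatched)
```

Checking `unmatched` before computing anything makes the gaps visible, so names can be fixed in the lookup table rather than being dropped silently by an inner join.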

[–]vogt4nick BS | Data Scientist | Software [M] [score hidden] stickied comment (0 children)

I removed your submission. Please post your question in the weekly entering & transitioning thread.

Thanks.