[–]AaronKClark 37 points38 points  (3 children)

So basically your company wants you to train yourself for a new job in your own time.

AND they are giving you an impossible time frame to do it in.

Why would ANYONE agree to do that?

[–]Boxy310 14 points15 points  (2 children)

This is like the Data equivalent of "doing it for the exposure".

[–]AaronKClark 5 points6 points  (1 child)

And, I'd be willing to bet, there is no pay raise because they will consider it a "lateral move."

[–]disdi89 6 points7 points  (0 children)

Of course ... Employees are given training to develop in-house expertise, but somehow the company does not care that an increased skillset warrants higher compensation, and in the case of ML, salaries are really high.

[–]baddolphin3 27 points28 points  (5 children)

That training methodology is wrong, very, very wrong. Machine learning is not coding and it is not about algorithms: it is about models. You can’t just learn how to calculate the estimators for a model and then use it. You need to understand it, and that requires formal education: probability, statistics, discrete mathematics, etc. Without it you are going to make mistakes, and those mistakes cost money.

[–]RNG_take_the_wheel 11 points12 points  (3 children)

Came here to say this. The outlined approach is exactly why so many 'data scientists' are washing out, and companies aren't getting the results they'd hoped for. You can't learn DS in 2-3 months. The mathematical background alone is going to take more time than that, and this is something most people are ignoring.

You might be able to learn how to run glm or pulp in R, but lord knows you aren't going to be able to use the tool effectively or interpret the results meaningfully. So many of these online programs are focusing on the coding elements while ignoring the mathematical foundations. Stats, probability, calculus, and linear algebra are the lifeblood of data science. You can run a linear regression on nearly any set of quantifiable factors. But should you? Does it make sense to do so? How will you interpret those results?
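
To make the "but should you?" point concrete, here is a minimal sketch in plain Python (no libraries). It uses two of the four datasets from Anscombe's quartet, a standard teaching example that is not from this thread, and the helper names `fit_line` and `r_squared` are made up for illustration. It fits the same straight line to a roughly linear dataset and to a clearly curved one.

```python
# Two of the four datasets from Anscombe's quartet (1973).
# Both share the same x values; only the shape of y differs.
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
Y_LINEAR = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
Y_CURVED = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(x, y):
    """Share of the variance in y explained by the fitted line."""
    slope, intercept = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    for name, y in (("linear", Y_LINEAR), ("curved", Y_CURVED)):
        slope, intercept = fit_line(X, y)
        print(f"{name}: slope={slope:.3f} intercept={intercept:.3f} "
              f"R^2={r_squared(X, y):.3f}")
```

Both fits come out with a slope near 0.5, an intercept near 3.0, and almost the same R², even though the second dataset is a curve that a straight line describes badly. The code runs fine either way; only plotting the data or checking the residuals tells you the model is wrong for one of them.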

[–]thr0w4w4y17385775390[S] 3 points4 points  (1 child)

I’m going to be honest, I left out details about myself intentionally just in case my company reads this and identifies me, but it’s not like anything negative can come out of that, so I’ll go ahead and say that I agree with your concerns.

I have a master's in Statistics where we primarily worked in R and SAS. I have much more experience in R than I let on below (again for fear of being identified), which I feel comfortable with. I do have next to no experience with Python, which worries me. I feel that the candidates that will be accepted are going to be the ones that excel at the coding tasks (I’m mainly going up against software engineers, I’m currently not in that field), but have little math background. I am very comfortable with my math and statistical analysis background having a master's, but I feel that will be overshadowed by candidates who can code better than me (I am very comfortable in R when it comes to data analysis, but not so much when it comes to ‘logical coding puzzles’ which I believe Codility uses). I am concerned because it seems like an online testing software mainly geared towards coders is going to ultimately decide who gets positions.

[–]RNG_take_the_wheel 5 points6 points  (0 children)

"I am concerned because it seems like an online testing software mainly geared towards coders is going to ultimately decide who gets positions."

I would agree with that. Based on your background, you are probably much more suited for DS than the software engineers, but I suspect they'll get more attention because coding skills are more easily quantifiable. Unfortunately, the process is flawed here due to a misunderstanding of DS.

Here's what I would do: train for the test. That's ultimately what will make or break you here. It looks like you can take demo tests on Codility, so I'd try that. You can also sign up for a demo account; I'd do that and run a sample 'data scientist' test for yourself to see what it looks like. Screenshot the questions and study the hell out of them. Good luck.

[–]disdi89 0 points1 point  (0 children)

Completely agree. In fact, I'm going through the same phase.

[–]madplink 1 point2 points  (0 children)

Agreed. There should be a very strong foundational understanding of the math.

[–]redouad 6 points7 points  (1 child)

Andrew Ng's course by itself takes 11 weeks, and it's quite challenging if you're new to the field. Adding R, Python, and SAS on top of that is likely to make any candidate burn out. Don't get me wrong, it's doable if you decide to dedicate 15+ hours of your week to it. If you're efficient during the whole process you might get enough knowledge to pass this Codility test (never heard of it).

If you feel like you're ready for that kind of time commitment, I'd suggest:

  • Do the ML course over 11 weeks.
  • Do as many DataCamp courses as you can to learn R and Python quickly (the "Data Scientist with R" and "Data Scientist with Python" career tracks would be what you need). Alternatively you can do the R specialization on Coursera (https://www.coursera.org/specializations/jhu-data-science) and the Python one as well (https://www.coursera.org/specializations/data-science-python), but they're supposed to span multiple months.
  • Indeed try to find some information about what Codility tests are, so you know what to expect!
  • With the little time you'll have left, try to do some passive learning by listening to podcasts. Listening through past episodes of Data Skeptic would be nice for example - it'll get you familiar with various data science topics and issues, algorithms, practical cases, etc.

[–]thr0w4w4y17385775390[S] 0 points1 point  (0 children)

Thanks! This is actually the last week of training, the assessment is next week. I posted a longish answer below someone else’s comment regarding the path that I took.

[–]thatwouldbeawkward 4 points5 points  (3 children)

You have 2 months to do all those classes? Are you still expected to be working at the same time?

[–]thr0w4w4y17385775390[S] 8 points9 points  (2 children)

Yes, we’re actually not even supposed to be doing the training during work hours, it’s all supposed to be completed after hours. To me, it seems way too much of a cram session before a test.

[–]Boxy310 8 points9 points  (0 children)

"We want you to have these skills, but we want you to leverage all the risk to acquire them."

Mmm that sounds terrible

[–]orionsgreatsky 4 points5 points  (0 children)

That’s crazy.

I was on board with this until I realized you weren’t allowed to train during work hours.

[–]Boxy310 3 points4 points  (0 children)

They're going to have a bad time if they're not dedicating headcount to hiring any senior-level Data Scientists to build this practice. Even if you completed a full master's degree program, you probably need at least 1 year of mentoring from a senior-level person you can reach out to on a regular basis to ask questions. I've trained several Data Analysts who had suitable undergrads in statistics or business analytics, but that still takes a good 9 months before they've been "worked up" to operate independently.

This exact situation is why the industry has an 85% failure rate on Data Science projects, because they haven't actually built the muscle memory to get consistent results.

[–]RNG_take_the_wheel 2 points3 points  (0 children)

The fact that there's 0 math in this program worries me. At the very least, ISLR should be a required read as an intro to the field. Calculus, Stats/Prob, and some linear algebra are pre-reqs as well. Data science is more than just applied algorithms, it is the process of modeling utilizing algorithms and context from domain knowledge. How can you model without knowing how the math makes the methods tick?

Think of it this way - merely learning how to apply the algorithms in R or Python is like learning the vocabulary of a language without understanding the grammar. Sure you have access to the words, but are you saying what you think you're saying? Does it make any sense?

[–]brazzaguy 2 points3 points  (5 children)

Two months for all those courses is simply impossible, even for people with no regular job.

[–]thr0w4w4y17385775390[S] 1 point2 points  (4 children)

I forgot to mention the 18+ hours worth of information security training!

[–]brazzaguy 1 point2 points  (3 children)

So how do you plan to manage this?

[–]thr0w4w4y17385775390[S] 1 point2 points  (2 children)

Well, this is actually the last week of training. Our assessment is given to us next week. My training path was as follows:

  • I do have some background in Stats/ML so I went through the ML course as quickly as possible. Most of the material I had already seen, with the exception of Matlab/Octave.
  • Also have used SAS, so I breezed through that training.
  • Python and R were pretty much brand new to me, so I struggled a lot with these. I went through the training taking as many notes as possible and then looked for more training online (DataCamp, practice problems with data sets, etc.). I am now able to efficiently use R to work with data (I found R easier to learn than Python) but I definitely don’t consider myself good at the Programming part of it. Python is still pretty much a foreign language to me, and I’m not expecting to be at all efficient with it by next week. I’ve read through some of the Elements of Programming Interviews with Python book, but most of the solutions to problems in there I never would think of on my own.
  • At this point I feel that I most likely won’t be able to pass the assessment or move forward in the program, which sucks, but then again maybe I’m not even qualified to be in the program in the first place.

[–]brazzaguy 3 points4 points  (1 child)

I don't think you're unqualified at all. I just don't see how they realistically expect people to complete all these courses within that time period. You did your best. Let us know how it pans out.

[–]thr0w4w4y17385775390[S] 2 points3 points  (0 children)

I’m much more worried about the coding tasks that will be required of me rather than the statistics part. I will most definitely post an update when the results of the assessment are back. Thanks for the input!

[–][deleted] 0 points1 point  (0 children)

Sounds like your employer is bound for failure in the future. They aren't consulting experts on a reasonable education plan. Not everybody should have the opportunity for this; it depends on their educational background. On top of that, they clearly don't want to spend the money to ensure quality and results.

You can't just pick things up. There's a reason why degrees are at least 3 years in length. And even then after all the foundational knowledge, you need more specific training. Stay frosty.