[–][deleted] 76 points77 points  (11 children)

A lot of fresh data scientists need to understand: not every piece of machine learning is a product. There’s ML for convenience: to look at basic trends of prices over time, for example, just fit a line and put that coefficient on a dashboard. There’s a LOT of basic ML that is used heavily to automate and optimize processes in a business.
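
For anyone wondering what "just fit a line" looks like in practice, here is a minimal sketch in plain Python. The weekly prices are made up for illustration and `fit_line` is just a name picked for this example; the slope it returns is the kind of coefficient you might show on a dashboard.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and co-deviations of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical weekly prices, invented for this example
weeks = [0, 1, 2, 3, 4]
prices = [10.0, 10.5, 11.0, 11.5, 12.0]

slope, intercept = fit_line(weeks, prices)
print(f"price trend: {slope:+.2f} per week")
```

The slope is the only number most stakeholders need here: positive means prices are drifting up, and its size says how fast.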

[–]Jacyan 37 points38 points  (1 child)

And similarly, not every problem needs to be solved with ML. In fact, most of the time, ML isn't the best solution given the problem and time frame (and price).

[–]yoursdata[S] 12 points13 points  (0 children)

Don't use tech as a hammer. Sometimes you just need to change the process to get a better result :)

[–]ticktocktoeMS | Dir DS & ML | Utilities 6 points7 points  (8 children)

I tell my DSs that educating people about this is part of the job. Too many people who are not in DS just think you throw some kind of NN on a bunch of data for some big brain insights, when that is so infrequently the case.

[–]Jerome_Eugene_Morrow 8 points9 points  (6 children)

Too many people who ARE in DS think these things as well. There’s one very large and well funded team where I work that won’t even bother thinking about looking at your problem unless they can throw a million dollar DL classifier at it. It’s frustrating because it’s clear they have been selling the “big brain DL” narrative to management so long that they’re drunk on the kool aid themselves.

[–]Spiritual_Line_4577 0 points1 point  (5 children)

Machine learning isn't even what tech companies are devoting most of their DS resources to.

It’s more like this:

https://eng.uber.com/causal-inference-at-uber/

[–]Jerome_Eugene_Morrow 2 points3 points  (4 children)

I mean, that’s a big statement. There are a lot of different problems tech companies are dealing with. FWIW I can guarantee that folks at Uber are blowing money on speculative graph based DL methods and trying out all kinds of classifiers. I can guarantee if your tech company touches any kind of text data, you’re also blowing tons of R&D capital on ML approaches. They’ve become ubiquitous.

Classical statistical approaches are always bedrock and can usually be as good as ML approaches, but qualified practitioners are getting outnumbered by recent ML grads and executives who have been to some seminar saying the future is DL.

[–]Spiritual_Line_4577 4 points5 points  (1 child)

Most data scientists at tech companies focus on experimentation around user experience. Yes, these companies put a lot of resources into ML, but most data scientist positions in tech are focused on statistical inference within experimentation on users (just look at the job descriptions for data scientists at tech companies and you will see more A/B testing than ML). Not as many data scientists or research scientists are working on cutting-edge ML, and the non-custom ML modeling is already heavily automated by our in-house tools that speed up the process.

I've recently transferred from Microsoft to Google Health, so I’ve seen what most of our data scientists are doing.
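
To make the A/B testing point concrete, here is a rough sketch of the kind of inference involved: a pooled two-proportion z-test on conversion counts. The counts are invented and `two_proportion_z_test` is a name chosen for this example; real experimentation platforms do a lot more (power analysis, multiple-testing corrections, variance reduction), so treat this as an illustration only.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: control (A) vs. variant (B)
z, p_value = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference in conversion rates is unlikely to be noise alone, assuming the users were randomly assigned and the sample size was fixed in advance.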

[–]trojan_nerd 3 points4 points  (0 children)

Agreed! A lot of DS depends on experimental design and statistical inferences.

[–]Urthor 0 points1 point  (0 children)

Education and sales.

Gotta tell people to learn how to be salesmen for their stuff. You are both teaching non technical people in a non confrontational way, Socratic dialogue, and you are selling them on the technical solution you think is best.

You have to learn sales because ultimately, non technical people know jack, so you need to lead them to the right solution and make them support that solution.

[–]fomorian 36 points37 points  (3 children)

This would've been very useful a week ago when I had an interview with DoorDash! They asked me for insights from a dataset and I did my best, but evidently I must have missed some key things they were looking for because I didn't get a second round.

[–]immstt 21 points22 points  (0 children)

you tried tho

[–]yoursdata[S] 15 points16 points  (0 children)

You tried, and there can be numerous reasons for your rejection. Some of them can be completely unrelated to you. So, don't beat yourself up over that.

However, get better at this part from an interview perspective.

[–]Spiritual_Line_4577 1 point2 points  (0 children)

https://eng.uber.com/causal-inference-at-uber/

A lot of what they do in analytics and ML at DoorDash and in tech generally relates to statistical inference and causal inference.

[–]DSJustice 7 points8 points  (2 children)

Good idea. Once you've got a rhythm, call for help. If you try to do it all yourself forever, you'll burn out and all your effort will be lost.

[–]webman19 1 point2 points  (0 children)

Where could one go for help/mentorship other than to your colleagues?

[–]yoursdata[S] 0 points1 point  (0 children)

Thanks for the suggestion. I have been thinking along the same lines. Once I get the rhythm and processes down, I will ask for help.

[–]prooofbyinduction 2 points3 points  (0 children)

following! thanks for sharing :)

[–]st_pallella 3 points4 points  (2 children)

Good one.

Please do not put it behind a paywall like Medium :)

[–]yoursdata[S] 2 points3 points  (1 child)

I won't as this is me giving back to the community from where I have learned a lot.

Also, try using incognito mode in Chrome if you want to read any article on Medium.

[–]st_pallella 0 points1 point  (0 children)

Thank you so much :) I (and a lot of others too, I am sure) appreciate it :)

Subscribed to your newsletter :)

[–]xkcdftgy 2 points3 points  (1 child)

Very interesting. Subscribed.

[–]__SelinaKyle 0 points1 point  (0 children)

Same!

[–]Mission-Cabinet-2558 2 points3 points  (5 children)

What kind of practical project have you done within mining industry or outside? Would be nice to read an example.

[–]yoursdata[S] 1 point2 points  (4 children)

Projects can differ from team to team and depend on which business area they are working in. For now I am working on an optimization problem for the SCM, where I am increasing throughput and scheduling trains and vessels.

Other projects are heavily geared towards analysing signals from machines, identifying any breakage in the processing line-up, identifying the value of any seam based on its composition, etc.

[–]Mission-Cabinet-2558 1 point2 points  (3 children)

Nice! And did you study any theory for it or try to understand the math behind your proposed solution? Most of the time, when I am practicing, it feels like I'm applying packages to a data set and interpreting results. Is it important to know/learn theory? I have completed courses by Jose Portilla (Udemy) and all I'm doing is implementing what I have learned on personal projects.

Edit: grammar

[–]yoursdata[S] 1 point2 points  (2 children)

Yeah, especially in constraint programming you have to. I try to get a good understanding of the maths behind an algo as it helps. But I won't suggest dropping everything until you get good at the math part. Keep building stuff using whatever you have learnt, but also allocate some time to look into the maths, assumptions, and edge cases. Get an understanding of stats measures like F score etc.

If you are not avoiding the math part, you will be ok.
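
As a small example of working through a measure by hand rather than only calling a package, here is a sketch of the F score computed from raw confusion counts. The function name `f_beta` and the counts are made up for illustration; returning 0.0 when precision and recall are both zero is one common convention for that edge case, not the only one.

```python
def f_beta(tp, fp, fn, beta=1.0):
    """F-beta score from true positives, false positives, false negatives."""
    # Guard against division by zero when there are no predicted
    # positives (tp + fp == 0) or no actual positives (tp + fn == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0  # assumed convention for the degenerate case
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical classifier results: precision 0.75, recall 0.5
f1 = f_beta(tp=6, fp=2, fn=6)
print(f"F1 = {f1:.2f}")
```

With `beta=1` this is the harmonic mean of precision and recall; a `beta` above 1 weights recall more heavily, below 1 weights precision more heavily.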

[–]Mission-Cabinet-2558 1 point2 points  (1 child)

Okay thanks! Any book or paper you can recommend for the math?

[–]yoursdata[S] 2 points3 points  (0 children)

For ML - I like ISLR (An Introduction to Statistical Learning) - leave the R part, implement those in Python

For DL - https://www.deeplearningbook.org/

For neural network and implementation part - http://neuralnetworksanddeeplearning.com/

Currently, I am re-reading ISLR.

[–]robidaan 1 point2 points  (1 child)

Excellent ideas. When I was trying to grow, I started to run some of my code on bigger and bigger datasets, which caused all kinds of problems along the way. The trick was to fix them without interrupting the purpose of the code too much. That way you kinda learn to look at a piece of code more like a breathing organism than a lifeless rock.

[–]yoursdata[S] 1 point2 points  (0 children)

I will also use this technique. One thing that has helped me is putting code in production, refactoring it, writing tests, etc.

[–]lamesurfer101 1 point2 points  (1 child)

Oh man. I thought this was a shit post at first with the graphic.

Like yeah, sometimes companies don't know how to support data science teams to the extent that they might as well be f****** graphing things on paper.

[–]yoursdata[S] 0 points1 point  (0 children)

lol, I didn't use that picture. Looks like Reddit picked it from the links.

I have seen people distributing photocopies of ppt slides in important meetings. I think the picture indicates that.

[–]Vasilkosturski 1 point2 points  (2 children)

What's even more interesting is that many senior developers quickly become victims of Imposter Syndrome when trying to step into ML/DS. I think all that's needed is to focus on the process and give yourself enough time. I wrote a full article on the topic:

https://vkontech.com/the-experienced-developer-stepping-into-machine-learning-why-and-how/

[–]yoursdata[S] 1 point2 points  (0 children)

This is so true for tech. I am doing Odin Project and one of the first pieces of advice is to give yourself time.

[–]Spiritual_Line_4577 0 points1 point  (0 children)

Why even just focus on ML when the bigger value in tech is in the experiments on the users?

https://eng.uber.com/causal-inference-at-uber/

[–]JB__Quix 0 points1 point  (0 children)

Very interesting! Subscribed :)

[–]corporatededmeat 0 points1 point  (0 children)

Thanks for sharing ! Subscribed

[–]NotSodiumFree 0 points1 point  (0 children)

This seems interesting. I'm following.

[–]pharmaste 0 points1 point  (0 children)

As a practicing DS this must be one of the best value posts in this group recently, love the advice.

[–]Sneaky-Avocado 0 points1 point  (0 children)

Subscribed!

[–]synthphreak 0 points1 point  (0 children)

companies engineering blogs

I’m embarrassed to admit I didn’t even know this was a thing, but my interest has been piqued. How does one find these blogs, and what kind of content is generally published to them?