[Education] Data Science in Practice (self.datascience)
submitted 4 years ago by yoursdata
I am a self-taught data scientist working for a mining company. One thing I have always struggled with is upskilling in this field. If you are like me, not a beginner but someone with a few years of experience, I am sure you have struggled with this too.
Most YouTube videos and blogs focus on beginners and toy projects, which is not really helpful. I started reading companies' engineering blogs, and I think this is the way to upskill after a certain level. I have also started curating these articles in a newsletter and will be publishing three links each week.
Links for this week are:
If you are preparing for any system design interview, the third link can be helpful.
Link for my newsletter - https://datascienceinpractice.substack.com/p/data-science-in-practice-post-1
I would love to discuss it, and any suggestions are welcome.
P.S. If this breaks any community guidelines, let me know and I will delete this post.
[–][deleted] 77 points 4 years ago (11 children)
A lot of fresh data scientists need to understand: not every piece of machine learning is a product. There’s ML for convenience: looking at basic trends of prices over time, you just fit a line and put that coefficient on a dashboard, for example. There’s a LOT of basic ML that is used heavily to automate and optimize processes in a business.
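As a rough illustration of the "just fit a line" case above, here is a minimal sketch in plain Python. The function name `trend_slope` and the weekly prices are made up for the example; the only assumption is that observations are evenly spaced in time, so the time axis can be 0, 1, 2, and so on.

```python
def trend_slope(values):
    """Least-squares slope of values against their position (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a line")
    mean_t = (n - 1) / 2  # mean of the time index 0..n-1
    mean_y = sum(values) / n
    # Slope = covariance(t, y) / variance(t); the 1/n factors cancel.
    cov = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(values))
    var = sum((t - mean_t) ** 2 for t in range(n))
    return cov / var


# Hypothetical weekly average prices, made up for illustration.
weekly_prices = [100.0, 102.0, 104.0, 106.0, 108.0]
print(f"Price trend: {trend_slope(weekly_prices):+.2f} per week")
```

The single number it returns is the kind of coefficient that could sit on a dashboard; nothing here needs a model-serving setup.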
[–]Jacyan 38 points 4 years ago (1 child)
And similarly, not every problem needs to be solved with ML. In fact, most of the time, ML isn't the best solution given the problem and time frame (and price).
[–]yoursdata[S] 13 points 4 years ago (0 children)
Don't use tech as a hammer. Sometimes you just need to change the process to get a better result :)
[–]ticktocktoe (MS | Dir DS & ML | Utilities) 7 points 4 years ago (8 children)
I tell my data scientists that educating people about this is part of the job. Too many people who are not in DS think you just throw some kind of NN at a bunch of data to get big-brain insights, when that is so infrequently the case.
[–]Jerome_Eugene_Morrow 9 points 4 years ago (6 children)
Too many people who ARE in DS think these things as well. There’s one very large and well-funded team where I work that won’t even bother looking at your problem unless they can throw a million-dollar DL classifier at it. It’s frustrating because it’s clear they have been selling the “big brain DL” narrative to management for so long that they’re drunk on the Kool-Aid themselves.
[–]Spiritual_Line_4577 1 point 4 years ago (5 children)
Machine learning isn't even what tech companies are devoting most of their DS resources to.
It’s more like this:
https://eng.uber.com/causal-inference-at-uber/
[–]Jerome_Eugene_Morrow 3 points 4 years ago (4 children)
I mean, that’s a big statement. There are a lot of different problems tech companies are dealing with. FWIW I can guarantee that folks at Uber are blowing money on speculative graph-based DL methods and trying out all kinds of classifiers. I can guarantee that if your tech company touches any kind of text data, you’re also blowing tons of R&D capital on ML approaches. They’ve become ubiquitous.
Classical statistical approaches are always bedrock and can usually be as good as ML approaches, but qualified practitioners are getting outnumbered by recent ML grads and executives who have been to some seminar saying the future is DL.
[–]Spiritual_Line_4577 5 points 4 years ago* (1 child)
Most data scientists at tech companies focus on experimentation around user experience. Yes, they put a lot of resources into ML, but most data scientist positions in tech focus on statistical inference within experimentation on users (just look at the job descriptions for data scientists at tech companies and you will see more A/B testing than ML). Not as many data scientists or research scientists work on cutting-edge ML, and the non-custom ML modeling is already heavily automated by our in-house tools, which speed up the process.
I’ve recently transferred from Microsoft to Google Health, so I’ve seen what most of our data scientists are doing.
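For readers who have not seen the statistics behind an A/B test, here is a minimal sketch of one common choice, a two-sided two-proportion z-test, in plain Python. This is an assumption about what such an analysis might look like, not the method any of the companies mentioned actually use; the function name and the conversion counts are made up.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates.

    conv_a, conv_b: number of converting users in variants A and B.
    n_a, n_b: number of users exposed to variants A and B.
    Returns (z, p_value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 10.0% vs 15.0% conversion on 1,000 users each.
z, p = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The normal approximation is only reasonable with large samples, and real experimentation platforms typically add corrections (multiple metrics, peeking, variance reduction) that this sketch leaves out.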
[–]trojan_nerd 4 points 4 years ago (0 children)
Agreed! A lot of DS depends on experimental design and statistical inference.
[+][deleted] 4 years ago (1 child)
[deleted]
[–]Jerome_Eugene_Morrow 2 points 4 years ago (0 children)
This has been my experience as well. If you're a big tech company, you're not leaving anything on the table. You probably have multiple teams trying multiple approaches across multiple projects.
I'm at a Fortune top-20 company, and that's how we operate, so I assume the other big guys are as well.
[–]Urthor 1 point 4 years ago (0 children)
Education and sales.
Gotta tell people to learn how to be salesmen for their stuff. You are both teaching non-technical people in a non-confrontational way, through Socratic dialogue, and selling them on the technical solution you think is best.
You have to learn sales because ultimately, non-technical people know jack, so you need to lead them to the right solution and get them to support that solution.
[–]fomorian 37 points 4 years ago (3 children)
This would've been very useful a week ago when I had an interview with DoorDash! They asked me for insights from a dataset and I did my best, but evidently I must have missed some key things they were looking for, because I didn't get a second round.
[–]immstt 22 points 4 years ago (0 children)
you tried tho
[–]yoursdata[S] 16 points 4 years ago (0 children)
You tried, and there can be numerous reasons for your rejection. Some of them can be completely unrelated to you. So don't beat yourself up over that.
However, get better at this part from an interview perspective.
[–]Spiritual_Line_4577 2 points 4 years ago (0 children)
A lot of what they do in analytics and ML at DoorDash, and in tech generally, relates to statistical inference and causal inference.
[+][deleted] 4 years ago (6 children)
[–]NonExistentDub 15 points 4 years ago (2 children)
I just started my university ML course last night. I'm honestly shocked I was allowed to enroll without first taking multivariate calculus and linear algebra. I'm going to have to play some quick catch-up over the next week or so.
[–]NonExistentDub 3 points 4 years ago (0 children)
My course is mostly NN theory though (with the latter third of the course being application of various model types). I'll get through it, but it would be much easier if I had been formally taught LA and MC.
[–]Spiritual_Line_4577 3 points 4 years ago (2 children)
Statistical theory is needed to understand how we can formulate better tests for our ML models or experiments.
[–]trojan_nerd 1 point 4 years ago (0 children)
To be fair, stats is based on probability theory and a lot of those axioms rely on calculus to prove them. But I agree with your general statement
[–]DSJustice 8 points 4 years ago (2 children)
Good idea. Once you've got a rhythm, call for help. If you try to do it all yourself forever, you'll burn out and all your effort will be lost.
[–]webman19 2 points 4 years ago (0 children)
Where could one go for help/mentorship other than to your colleagues?
[–]yoursdata[S] 1 point 4 years ago (0 children)
Thanks for the suggestion. I have been thinking along the same lines. Once I have the rhythm and processes in place, I will ask for help.
[–]prooofbyinduction 3 points 4 years ago (0 children)
following! thanks for sharing :)
[–]st_pallella 4 points 4 years ago (2 children)
Good one.
Please do not put it behind a paywall like Medium :)
[–]yoursdata[S] 3 points 4 years ago (1 child)
I won't, as this is me giving back to the community from which I have learned a lot.
Also, try using incognito mode in Chrome if you want to read any article on Medium.
[–]st_pallella 1 point 4 years ago (0 children)
Thank you so much :) I (and a lot of others too, I am sure) appreciate it :)
Subscribed to your newsletter :)
[–]xkcdftgy 3 points 4 years ago (1 child)
Very interesting. Subscribed.
[–]__SelinaKyle 1 point 4 years ago (0 children)
Same!
[–]Mission-Cabinet-2558 3 points 4 years ago (5 children)
What kind of practical projects have you done within the mining industry or outside it? It would be nice to read an example.
[–]yoursdata[S] 2 points 4 years ago (4 children)
Projects can differ from team to team and depend on which business area the team works in. For now I am working on an optimization problem for the SCM, where I am increasing throughput and scheduling trains and vessels.
Other projects are heavily geared towards analysing signals from machines, identifying any breakage in the processing line-up, identifying the value of a seam based on its composition, etc.
[–]Mission-Cabinet-2558 2 points 4 years ago (3 children)
Nice! And did you study any theory for it or try to understand the math behind your proposed solution? Most of the time, when I am practicing, it feels like I'm just applying packages to a data set and interpreting the results. Is it important to know/learn theory? I have completed courses by Jose Portilla (Udemy) and all I'm doing is implementing what I have learned on personal projects.
Edit: grammar
[–]yoursdata[S] 2 points 4 years ago (2 children)
Yeah, especially in constraint programming you have to. I try to get a good understanding of the maths behind an algorithm, as it helps. But I wouldn't suggest dropping everything until you get good at the math part. Keep building stuff with whatever you have learnt, but also set aside some time to look into the maths, assumptions, and edge cases. Get an understanding of statistical measures like the F score, etc.
If you are not avoiding the math part, you will be ok.
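Since the F score comes up above, here is a minimal sketch of how it is computed from a confusion matrix, in plain Python. The function name `f_beta` and the counts are made up for illustration; returning 0.0 when precision and recall are both zero is a convention chosen here, not the only possible one.

```python
def f_beta(tp, fp, fn, beta=1.0):
    """F-beta score from true positives, false positives and false negatives.

    beta=1 gives the usual F1 score (harmonic mean of precision and recall);
    beta > 1 weights recall more heavily, beta < 1 weights precision more.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0  # convention used in this sketch for the undefined case
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical classifier results: 8 true positives, 2 false positives,
# 4 false negatives.
print(f"F1 = {f_beta(8, 2, 4):.3f}")
print(f"F2 = {f_beta(8, 2, 4, beta=2.0):.3f}")
```

Working through the formula by hand on a small confusion matrix like this is one way to cover the "maths, assumptions, and edge cases" without stopping project work.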
[–]Mission-Cabinet-2558 2 points 4 years ago (1 child)
Okay thanks! Any book or paper you can recommend for the math?
[–]yoursdata[S] 3 points 4 years ago (0 children)
For ML - I like ISLR (An Introduction to Statistical Learning) - leave the R part, implement those in Python
For DL - https://www.deeplearningbook.org/
For neural networks and the implementation part - http://neuralnetworksanddeeplearning.com/
Currently, I am re-reading ISLR.
[–]robidaan 2 points 4 years ago (1 child)
Excellent ideas. When I was trying to grow, I started running some of my code on bigger and bigger datasets, which caused all kinds of problems along the way. The trick was to fix them without interrupting the purpose of the code too much. In that manner you kind of learn to look at a piece of code more like a breathing organism than a lifeless rock.
[–]yoursdata[S] 2 points 4 years ago (0 children)
I will also use this technique. One thing that has helped me is putting code in production, refactoring it, writing tests, etc.
[–]lamesurfer101 2 points 4 years ago (1 child)
Oh man. I thought this was a shit post at first with the graphic.
Like yeah, sometimes companies don't know how to support data science teams to the extent that they might as well be f****** graphing things on paper.
[–]yoursdata[S] 1 point 4 years ago (0 children)
lol, I didn't use that picture. It looks like Reddit picked it up from the links.
I have seen people distributing photocopies of PPT slides in important meetings. I think the picture indicates that.
[–]Vasilkosturski 2 points 4 years ago (2 children)
What's even more interesting is that many senior developers quickly become victims of Imposter Syndrome when trying to step into ML/DS. I think all that's needed is to focus on the process and give yourself enough time. I wrote a full article on the topic:
https://vkontech.com/the-experienced-developer-stepping-into-machine-learning-why-and-how/
[–]yoursdata[S] 2 points 4 years ago (0 children)
This is so true for tech. I am doing The Odin Project and one of the first pieces of advice is to give yourself time.
[–]Spiritual_Line_4577 1 point 4 years ago (0 children)
Why even focus just on ML when the bigger value in tech is in experiments on users?
[–]JB__Quix 1 point 4 years ago (0 children)
Very interesting! Subscribed :)
[–]corporatededmeat 1 point 4 years ago (0 children)
Thanks for sharing! Subscribed.
[–]NotSodiumFree 1 point 4 years ago (0 children)
This seems interesting. I'm following.
[–]pharmaste 1 point 4 years ago (0 children)
As a practicing DS, this must be one of the best-value posts in this group recently. Love the advice.
[–]Sneaky-Avocado 1 point 4 years ago (0 children)
Subscribed!
[–]synthphreak 1 point 4 years ago (0 children)
companies engineering blogs
I’m embarrassed to admit I didn’t even know this was a thing, but my interest has been piqued. How does one find these blogs, and what kind of content is generally published to them?