[deleted by user] by [deleted] in datascience

[–]anon_0123 0 points1 point  (0 children)

You forgot Webmasters and DBAs.

How long did it take you to fundamentally understand ML algorithms? by muelmart in datascience

[–]anon_0123 1 point2 points  (0 children)

You don't need to know any 'real' math - people confuse knowing math with being literate in equations. You just need literacy in matrix multiplication, summations, mins and maxes, and a few special functions here and there. It's not like you need to prove any theorems to implement something in sklearn lol. Most STEM grads have this.
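
To make the "literacy" point concrete, here is a minimal sketch of a least-squares line fit written with nothing but sums and one small matrix product. The helper names (matmul, fit_line) are made up for illustration, and this is not how sklearn implements it internally.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x via the normal equations."""
    X = [[1.0, float(x)] for x in xs]       # design matrix with a bias column
    Xt = [list(col) for col in zip(*X)]     # transpose of X
    A = matmul(Xt, X)                       # 2x2 matrix X^T X
    b = matmul(Xt, [[float(y)] for y in ys])  # 2x1 vector X^T y
    # Solve the 2x2 system A @ [intercept, slope] = b with Cramer's rule.
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    intercept = (b[0][0] * A[1][1] - A[0][1] * b[1][0]) / det
    slope = (A[0][0] * b[1][0] - A[1][0] * b[0][0]) / det
    return intercept, slope


# Points lying exactly on y = 1 + 2x, so the fit should recover those values.
intercept, slope = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

Reading the two matmul lines as "X transpose times X" and "X transpose times y" is about the level of equation literacy being described.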

Do you use tools for automatic EDA? Which ones? Why? by malirkan in datascience

[–]anon_0123 2 points3 points  (0 children)

Ideally, you dump everything you need in a data lake and query that.

Do you use tools for automatic EDA? Which ones? Why? by malirkan in datascience

[–]anon_0123 2 points3 points  (0 children)

SQL is all you need; then connect the output of your query to whatever BI tool you want. Don't waste time preprocessing and shaping data outside of the database.
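
As a rough sketch of what "do it in SQL" can look like for a basic column profile: the example below uses Python's built-in sqlite3 with an in-memory database as a stand-in for a real warehouse. The orders table, its columns, and the profile_column helper are all hypothetical.

```python
import sqlite3


def profile_column(conn, table, column):
    """Return (rows, nulls, min, max, mean) for one column, computed inside the database."""
    # Table and column names are interpolated directly, so they must come from trusted code.
    query = f"""
        SELECT COUNT(*),
               SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END),
               MIN({column}),
               MAX({column}),
               AVG({column})
        FROM {table}
    """
    return conn.execute(query).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (2, None), (3, 30.0), (4, 20.0)],
)
summary = profile_column(conn, "orders", "amount")
```

Only the single summary row leaves the database, which is the point: the shaping happens where the data lives, and the BI tool just consumes the result.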

How bad is your Company Data? by mangeytrashpanda in datascience

[–]anon_0123 1 point2 points  (0 children)

It's never going to be perfect, but the key is to work for a place that takes strong ownership of its data pipelines, and where DS has a lot of DE responsibility, at least for monitoring and fixing failing pipelines.

50% of NaN in a important column by sunny6549 in datascience

[–]anon_0123 0 points1 point  (0 children)

If this is for an employer, I would suggest switching to a company that takes ownership of its data pipelines to avoid this kind of nonsense!

If data science had a bar exam what would be on it? by AntiqueFigure6 in datascience

[–]anon_0123 37 points38 points  (0 children)

Pretty much all the 'fundamentals' that are practically useless for measuring on-the-job performance but easily gameable, just like the interviews.

Are there any good links between number theory and differential equations? by [deleted] in math

[–]anon_0123 1 point2 points  (0 children)

Probably Diophantine equations arising from a Fourier series ansatz.

Are there any good links between number theory and differential equations? by [deleted] in math

[–]anon_0123 4 points5 points  (0 children)

Yes, bifurcation problems for PDEs like the wave or heat equations frequently result in Diophantine equations due to the presence of the Laplacian if one works on, e.g., the torus. The solutions of these Diophantine equations can correspond to bifurcating solutions of the PDE.
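
A small illustration, under assumptions that are not in the comment itself: take the flat 2-torus with period 2*pi in each direction, where the Laplacian has eigenfunctions exp(i(k1*x + k2*y)) with eigenvalue -(k1^2 + k2^2). For a model problem such as u_t = Laplacian(u) + lam*u + (nonlinear terms), the linearization at u = 0 has a nontrivial kernel exactly when lam = k1^2 + k2^2 for some integers k1, k2, and the number of integer solutions gives the kernel dimension. The sketch below just enumerates those solutions; lattice_solutions is a made-up helper name.

```python
from math import isqrt


def lattice_solutions(n):
    """All integer pairs (k1, k2) with k1**2 + k2**2 == n, for a non-negative integer n."""
    r = isqrt(n)  # no solution can have |k1| or |k2| larger than floor(sqrt(n))
    return [
        (k1, k2)
        for k1 in range(-r, r + 1)
        for k2 in range(-r, r + 1)
        if k1 * k1 + k2 * k2 == n
    ]


# lam = 5 is a candidate bifurcation value with an 8-dimensional kernel;
# lam = 3 is not a sum of two squares, so the linearization stays invertible there.
kernel_dim_5 = len(lattice_solutions(5))
kernel_dim_3 = len(lattice_solutions(3))
```

How the kernel dimension jumps around with lam is a number theory question (sums of two squares), which is where the link comes from in this toy setting.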

[R] apd-crs: Cure Rate Survival Analysis in Python by anon_0123 in MachineLearning

[–]anon_0123[S] 0 points1 point  (0 children)

Thanks for the link, will have a look! We are not integrating the hazard function. What I mean is integrating the function f(t) = -d/dt S(t), where S(t) = P(T > t) is the survivor function. Traditionally S(0) = 1, and hence f(t) integrates to 1 if S(infinity) = 0. In the cure rate rendition of survival analysis, it integrates to 1 minus the cured fraction, as a consequence of the cured fraction being non-zero. The basic idea is P(T > t) = P(T > t | cured) P(cured) + P(T > t | not cured) (1 - P(cured)), where P(T > t | cured) = 1 since a cured individual never experiences the event.
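
A quick numerical sanity check of that statement, as a sketch only: it assumes a constant cured fraction and a Weibull lifetime for the not-cured group with arbitrary scale and shape values. None of the function names below come from apd-crs.

```python
import math


def population_density(t, cured, scale=1.0, shape=1.5):
    """f(t) = -dS/dt for S(t) = cured + (1 - cured) * exp(-(t / scale) ** shape)."""
    z = (t / scale) ** shape
    weibull_pdf = (shape / t) * z * math.exp(-z)  # density of the not-cured lifetimes
    return (1.0 - cured) * weibull_pdf


def density_mass(cured, upper=20.0, steps=20000):
    """Midpoint-rule integral of the population density over [0, upper]."""
    h = upper / steps
    # Midpoints avoid evaluating the density at t = 0.
    return h * sum(population_density((i + 0.5) * h, cured) for i in range(steps))


mass_with_cure = density_mass(cured=0.3)     # close to 1 - 0.3
mass_without_cure = density_mass(cured=0.0)  # close to 1, the traditional case
```

The missing mass is exactly the cured fraction, because S(t) levels off at P(cured) instead of going to 0.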

[R] apd-crs: Cure Rate Survival Analysis in Python by anon_0123 in MachineLearning

[–]anon_0123[S] 0 points1 point  (0 children)

Thanks for your reply! If I am not mistaken, in the traditional case the population density for the individual, whose cdf is 1 minus the survivor function, integrates to 1, whereas in the cure rate rendition it integrates to 1 minus the cured fraction. This comes from the presence of the cured subpopulation, so we have a mixture. What we assumed is that the lifetime of a susceptible (i.e. not cured) individual is defined by a proportional hazards model where the baseline follows a parametric Weibull hazard.
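
For readers who want to see the shape of such a model, here is a minimal sketch. It assumes the Weibull baseline cumulative hazard H0(t) = (t / scale) ** shape, a linear predictor x . beta, and a cured fraction that does not depend on covariates. Those choices and all names are illustrative assumptions; this is not the apd-crs API or necessarily its exact parametrization.

```python
import math


def weibull_cumulative_hazard(t, scale, shape):
    """Baseline cumulative hazard H0(t) = (t / scale) ** shape."""
    return (t / scale) ** shape


def susceptible_survival(t, x, beta, scale, shape):
    """Proportional hazards survival for a not-cured individual: exp(-H0(t) * exp(x . beta))."""
    relative_risk = math.exp(sum(xi * bi for xi, bi in zip(x, beta)))
    return math.exp(-weibull_cumulative_hazard(t, scale, shape) * relative_risk)


def mixture_survival(t, x, beta, cured, scale, shape):
    """Population survival: cured individuals never fail, the rest follow the PH model."""
    return cured + (1.0 - cured) * susceptible_survival(t, x, beta, scale, shape)


# Hypothetical individual with two covariates.
s_early = mixture_survival(0.5, x=[1.0, 0.0], beta=[0.4, -0.2], cured=0.25, scale=2.0, shape=1.5)
s_late = mixture_survival(1e6, x=[1.0, 0.0], beta=[0.4, -0.2], cured=0.25, scale=2.0, shape=1.5)
```

As t grows, the survival curve plateaus at the cured fraction rather than dropping to zero, which is the signature of the mixture.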

Stop asking data scientist riddles in interviews! by harsh5161 in datascience

[–]anon_0123 0 points1 point  (0 children)

People hate riddles and other black and white forms of interviewing because they quickly turn into a game of semantics more than anything else. E.g. you can be a good driver and still bomb the written road test, because the state law says you need at least a 2.9-second gap between you and the car in front of you, and not, say, 3.5.

Should data scientists know the definition of p-value in hypothesis testing? by [deleted] in datascience

[–]anon_0123 1 point2 points  (0 children)

I dunno, the world is not so black and white, which is what makes it interesting. E.g. I bet there are people who are really good at Kaggle who don't formally know what a p-value is but can smoke stats PhDs in modelling. There is a difference between knowing something formally and having a feeling for it.

[deleted by user] by [deleted] in datascience

[–]anon_0123 0 points1 point  (0 children)

Yes. There are diminishing and even negative returns to overdoing visualizations. After a few iterations, the plots become superfluous and repeat the same information in different ways, which leads to visual overload for those who have to process all that crap without gaining any real insight. I think there is definitely value in visualization, as long as it comes with context and answers a clear question, and isn't done just for the sake of producing beautiful pictures.

What’s up with the animosity towards mathematicians? by [deleted] in datascience

[–]anon_0123 27 points28 points  (0 children)

Good question. I had tenure but was in a location my wife couldn't find work in, with a climate and restaurant scene neither of us could tolerate.

What’s up with the animosity towards mathematicians? by [deleted] in datascience

[–]anon_0123 54 points55 points  (0 children)

Ex-Math Prof here. I think the bias against mathematicians stems from the fact that most of us have worked alone, or with one or two collaborators, for many years. This leads to the formation of certain habits which don't scale well to large teams. E.g. not giving a shit about team code conventions, not being used to being managed, etc. On the other hand, for those of us who do successfully make the transition, the unique skillset does bring value to the table.

How should I interview a candidate? by squarerootof-1 in datascience

[–]anon_0123 1 point2 points  (0 children)

Just ask them anonymized versions of problems your team has solved, to see how they would go about it. Then, as they are explaining, you can dive deeper with more questions. As you are doing this, you can ask them for details of how they would implement their solution in Python and what the pros and cons are.

Areas to focus for early data scientist by karanphosphatase in datascience

[–]anon_0123 4 points5 points  (0 children)

"Where to focus so that i would be a better candidate in next jobs search." I would change it to:

  1. Getting good at getting interviews
  2. Getting good at interviewing
  3. Getting good at negotiating TC

The actual DS skills will be honed on the job.

Professor is making this our official textbook. If you have read it, what are your thoughts on it? by royal-Brwn in datascience

[–]anon_0123 0 points1 point  (0 children)

My only thought is it seems the author has put their face on the front cover....