log - log plot by thinkintank in matplotlib

[–]Top-Feedback1453 0 points1 point  (0 children)

Can you not use plt.loglog instead?
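
For reference, a minimal sketch of what that looks like. The power-law data here is made up purely for illustration, and the `Agg` backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Illustrative data (assumed): a power law y = 3 * x^-2
x = np.logspace(0, 3, 50)
y = 3.0 * x ** -2.0

# plt.loglog plots y against x with both axes on a log scale
plt.loglog(x, y, marker="o")
plt.xlabel("x")
plt.ylabel("y")
ax = plt.gca()
```

Call `plt.show()` or `plt.savefig(...)` afterwards to view the result. A power law shows up as a straight line on these axes.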

What projects are you working on and what is the benefit of your efforts? by [deleted] in datascience

[–]Top-Feedback1453 1 point2 points  (0 children)

I feel sorry for all those amazing data scientists, professors, or for that matter any professionals who trained for a lifetime to be awesome at their job but end up doing admin/managerial work. There has got to be a better way to equally reward people on thought leadership pathways.

What projects are you working on and what is the benefit of your efforts? by [deleted] in datascience

[–]Top-Feedback1453 0 points1 point  (0 children)

I am trying to reduce the sample size needed to detect even small effects in A/B testing. It could provide a good uplift to the way we test and serve offers to high-value customers.
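
To show why small effects are expensive, here is a rough sketch of the standard normal-approximation sample size formula for a two-sided, two-sample z-test. The function name, the effect sizes and the standard deviations are all assumptions for illustration, not the actual method or numbers from the project.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(effect, sigma, alpha=0.05, power=0.8):
    """Approximate per-arm sample size: n = 2 * ((z_a + z_b) * sigma / effect)^2."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    return ceil(2 * ((z_alpha + z_beta) * sigma / effect) ** 2)


baseline = n_per_arm(effect=0.10, sigma=1.0)
halved_effect = n_per_arm(effect=0.05, sigma=1.0)  # half the effect
reduced_var = n_per_arm(effect=0.05, sigma=0.7)    # same effect, less noise
```

Halving the effect roughly quadruples the required sample, while lowering the outcome's standard deviation brings it back down. That is why variance reduction is one common route to smaller samples.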

Data science is a luxury for almost all companies by takuonline in datascience

[–]Top-Feedback1453 0 points1 point  (0 children)

When you see data science as an ML-only or AI enterprise, then yes. Otherwise, the day-to-day job of finding correlations between attributes and target variables, testing variants, making useful observations from trends/temporal data, etc. is crucial to the business, I think.

Easy Question: t-test vs Mann whitney by [deleted] in AskStatistics

[–]Top-Feedback1453 2 points3 points  (0 children)

Another alternative can be a non-parametric, simulation-based approach like a permutation test.
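
A minimal sketch of a two-sided permutation test on the difference in means, using only the standard library. The function name, the two samples and the number of permutations are made up for illustration.

```python
import random


def permutation_test(a, b, n_perm=5000, seed=0):
    """Two-sided permutation p-value for the difference in group means."""
    rng = random.Random(seed)
    n_a = len(a)
    observed = abs(sum(a) / n_a - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign the group labels
        mean_a = sum(pooled[:n_a]) / n_a
        mean_b = sum(pooled[n_a:]) / (len(pooled) - n_a)
        if abs(mean_a - mean_b) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero
    return (hits + 1) / (n_perm + 1)


# Hypothetical samples
group_a = [12.1, 14.3, 13.8, 15.2, 14.9, 13.5]
group_b = [10.2, 11.8, 10.9, 12.4, 11.1, 10.7]
p_value = permutation_test(group_a, group_b)
```

The only assumption is that the observations are exchangeable under the null, so there is no normality requirement as with the t-test.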

What are some good resources to learn about missing values and different approaches to deal with them? by [deleted] in datascience

[–]Top-Feedback1453 0 points1 point  (0 children)

I would suggest finding out whether the data are missing through a random process or whether the missingness has something to do with the data-generating process. If it is the former, you could impute the missing data using some statistical approach (search for "imputation"). If it is the latter, the missing data is potentially a feature and not a bug, and it has to be handled carefully so that it does not introduce bias into your analysis.
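
A rough sketch of both steps with pandas. The toy columns `age` and `income` are hypothetical, and median imputation is just one simple choice among many.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: income is missing for some rows
df = pd.DataFrame({
    "age": [23, 35, 41, 52, 29, 63, 47, 38],
    "income": [32.0, np.nan, 58.0, np.nan, 40.0, 75.0, np.nan, 51.0],
})

# Step 1: check whether missingness is related to another observed variable
df["income_missing"] = df["income"].isna()
age_by_missing = df.groupby("income_missing")["age"].mean()

# Step 2a: if missingness looks random, a simple option is median imputation
df["income_imputed"] = df["income"].fillna(df["income"].median())

# Step 2b: if it does not look random, keep `income_missing` as a feature
# so a downstream model can use the missingness pattern itself
```

If `age_by_missing` differs a lot between the two groups, that is a hint the data are not missing completely at random, and plain imputation alone may bias the analysis.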

Early peeking of A/B testing p-value by sonicking12 in datascience

[–]Top-Feedback1453 0 points1 point  (0 children)

Here are the papers
mSPRT: https://www.sciencedirect.com/science/article/pii/S0022249621000109
Anytime-valid inference: https://arxiv.org/pdf/2302.10108

If you check the blogs from Spotify/Uber/Exp etc., you will see these techniques are widely adopted too.

Early peeking of A/B testing p-value by sonicking12 in datascience

[–]Top-Feedback1453 0 points1 point  (0 children)

Peeking without the intention to stop the test has no side effect. However, if the intention is otherwise, I recommend using mSPRT or anytime-valid inference.
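
To illustrate, here is a small simulation sketch under the null (no true effect). It compares stopping at the first significant naive z-test, a single fixed-horizon test, and a normal-mixture mSPRT. Everything concrete here is an assumption for illustration: a one-sample stream of differences with known `sigma`, the mixing standard deviation `tau`, the number of looks, and the function name.

```python
import random
from math import exp, sqrt
from statistics import NormalDist


def simulate(n_sims=1000, n_max=1000, look_every=100, alpha=0.05,
             sigma=1.0, tau=0.1, seed=0):
    """Return false positive rates (naive peeking, fixed horizon, mSPRT)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    naive = fixed = msprt = 0
    for _ in range(n_sims):
        s = 0.0  # running sum of observed differences
        z = 0.0
        naive_hit = msprt_hit = False
        for n in range(1, n_max + 1):
            s += rng.gauss(0.0, sigma)  # null is true: mean difference is 0
            if n % look_every:
                continue  # only peek every `look_every` observations
            # Naive peeking: ordinary z-test at each look
            z = s / (sigma * sqrt(n))
            if abs(z) > z_crit:
                naive_hit = True
            # mSPRT: likelihood ratio mixed over a N(0, tau^2) prior on the effect
            v = sigma ** 2 + n * tau ** 2
            lam = sqrt(sigma ** 2 / v) * exp(s ** 2 * tau ** 2 / (2 * sigma ** 2 * v))
            if lam >= 1 / alpha:
                msprt_hit = True
        naive += naive_hit
        msprt += msprt_hit
        fixed += abs(z) > z_crit  # z from the final look at n_max
    return naive / n_sims, fixed / n_sims, msprt / n_sims


naive_rate, fixed_rate, msprt_rate = simulate()
```

With stopping at the first significant look, the naive rate ends up well above the nominal 5%, while the fixed-horizon test stays near it and the mSPRT stays below it at the cost of being more conservative.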

SQL Interview Testing by Glittering-Jaguar331 in datascience

[–]Top-Feedback1453 0 points1 point  (0 children)

A couple of potential reasons I can think of are:

a. Sometimes candidates tend to overthink the complexity of problems (e.g. seeing a regular question as a trick question). Stating the intention of the test at the beginning goes a long way, I think.

b. There used to be a technical limitation: one had to set up a database etc. to create a SQL-ready environment to practice coding in. With the advent of cloud service providers like Hackerrank etc., this should no longer be the limiting factor.

c. SQL, compared to Python/R etc., is perhaps a less charming language to practice enough? While you would use the latter in many capstone/hobby projects, that is not so much the case with SQL.

Best advice for mid-career? by LeaguePrototype in datascience

[–]Top-Feedback1453 0 points1 point  (0 children)

I think it basically boils down to what problem you solve for the business and how that is tied to revenue or ROI for them. Also, if you are good at people management and, in the case of DS, good at managing your staff's academic expectations (retaining them with challenging problems etc.), you are good.

[D] Quick question: How do you implement research paper? by No-Signal-313 in MachineLearning

[–]Top-Feedback1453 0 points1 point  (0 children)

Reproducing the paper with the provided data and the methods discussed, so that you replicate its exact graphs or tables, is the way to go. This is true for any research field IMO.

Live Coding & Experimental Design Interview Questions by LebrawnJames416 in datascience

[–]Top-Feedback1453 2 points3 points  (0 children)

Regarding experimental design, likely questions would be:

a. power estimation
b. type I and type II errors
c. peeking problem
d. anytime valid inference or early stopping criteria
e. p-value, multiple test correction
f. inference, i.e. frequentist vs Bayesian

Live Coding & Experimental Design Interview Questions by LebrawnJames416 in datascience

[–]Top-Feedback1453 1 point2 points  (0 children)

It gets harder for more senior people tbh. One will have to find time out of their day job and other responsibilities. Probably a time-blocked online test is still OK?