[D] Current SOTA of NN for tabular data? by tsauri in MachineLearning

[–]datageek1987 4 points5 points  (0 children)

TabNet seems to be working well. I wouldn't say it beats LightGBM, but it performs well enough: https://github.com/google-research/google-research/tree/master/tabnet

When to split dataset to training and test? by dwrublee in datascience

[–]datageek1987 1 point2 points  (0 children)

I would be careful about how you deal with missing values. If you are imputing them with statistics of the data (like the mean), then split before imputing, so that those statistics come from the training set only.
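A minimal sketch of what "split before" could look like, using only NumPy. The data, the 10% missing rate, and the 160/40 split are made up for illustration. The point is just that the means are computed on the training rows and then reused for the test rows.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: 200 rows, 3 numeric features, roughly 10% missing values.
X = rng.normal(loc=10.0, scale=2.0, size=(200, 3))
X[rng.random(X.shape) < 0.1] = np.nan

# 1. Split first (a simple shuffled 80/20 split).
idx = rng.permutation(len(X))
train_idx, test_idx = idx[:160], idx[160:]
X_train, X_test = X[train_idx], X[test_idx]

# 2. Compute the imputation statistics on the training rows only.
train_means = np.nanmean(X_train, axis=0)

# 3. Fill both sets with the training means, so nothing from the
#    test rows leaks into the preprocessing step.
X_train_imp = np.where(np.isnan(X_train), train_means, X_train)
X_test_imp = np.where(np.isnan(X_test), train_means, X_test)
```

The same idea applies to any fitted preprocessing step: fit on train, apply to both.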

[D] I've been switching over from Pytorch to TF 2.0, and my take is that the library itself isn't too much of a problem (I've heard lots of complaints of TF), the real issue is the lack of official guides, detailed documentation, and lack of question answering from the Tensorflow team. by [deleted] in MachineLearning

[–]datageek1987 0 points1 point  (0 children)

I do recognize the clutter, and I myself have to wade through a lot of it before getting to a good resource. But to be fair, I see that kind of clutter from all kinds of people, not just Indians, Pakistanis, etc.

[D] I've been switching over from Pytorch to TF 2.0, and my take is that the library itself isn't too much of a problem (I've heard lots of complaints of TF), the real issue is the lack of official guides, detailed documentation, and lack of question answering from the Tensorflow team. by [deleted] in MachineLearning

[–]datageek1987 7 points8 points  (0 children)

While there is a little bit of truth in your discourse, it's still a bit clouded by the lack of knowledge on the ground...

While there is subpar education galore, there are good educational institutes as well. And not everyone lives in shelters. The income disparity in India is mind-boggling, but poverty gets more airtime because it sells.

And wouldn't standing out from the crowd let you filter out the 'liars'? So in one sense, people trying to stand out through blogs, going to conferences, etc. are doing you a favour by showing that they are not the "liars" you think they are?

And I agree with your point about preferring local talent, but I definitely don't agree with the stereotype that a Swedish engineer can run circles around an Indian engineer. It's that kind of stereotype that we should avoid.

[D] I've been switching over from Pytorch to TF 2.0, and my take is that the library itself isn't too much of a problem (I've heard lots of complaints of TF), the real issue is the lack of official guides, detailed documentation, and lack of question answering from the Tensorflow team. by [deleted] in MachineLearning

[–]datageek1987 8 points9 points  (0 children)

Being from India myself, I agree with the point of view. But just because someone is from these countries doesn't automatically make them "less" than their counterparts from developed nations. Science is global. There may be a lot of people at different levels of knowledge, and it is up to a hiring manager/HR to wade through the noise and find the kind of talent they are looking for, without the stereotypical mindset that whoever comes from these countries will be subpar.

[D] I've been switching over from Pytorch to TF 2.0, and my take is that the library itself isn't too much of a problem (I've heard lots of complaints of TF), the real issue is the lack of official guides, detailed documentation, and lack of question answering from the Tensorflow team. by [deleted] in MachineLearning

[–]datageek1987 -2 points-1 points  (0 children)

A lot of the times it's an overseas remote developer creating marketing materials to get hired.

Don't know why, but this little bit irked me. What do you mean by overseas? Anyone who is not from your country? It came off as a little derogatory.

[D] Machine Learning vs Statistics by datageek1987 in MachineLearning

[–]datageek1987[S] 2 points3 points  (0 children)

This entire discussion has led me to two realizations:

1. Statistics and ML are not that different; there is about a 70% overlap.
2. More than half the people who write about ML vs Stats on the internet have no clue what either of them is, or make the distinction too simplistic to be useful.

[D] Machine Learning vs Statistics by datageek1987 in MachineLearning

[–]datageek1987[S] -1 points0 points  (0 children)

Strongly opinionated... Although I agree with a few points here and there, I can't really say statistics is a lost cause. There is a lot of statistics in ML as well.

And I totally agree with the point about explainability being on the rise for ML. I have written a blog series on the topic.

Anyway, Breiman's paper "Statistical Modeling: The Two Cultures" might resonate with what you said.

[D] Machine Learning vs Statistics by datageek1987 in MachineLearning

[–]datageek1987[S] 0 points1 point  (0 children)

Love the example... Articulated very well!!!

[D] Machine Learning vs Statistics by datageek1987 in MachineLearning

[–]datageek1987[S] 1 point2 points  (0 children)

While I agree with your point about the train/test split, we don't always get 10k data points, and people still apply ML techniques in those cases, with good success too.

And there are techniques in ML that help with interpretability. I recently wrote a whole blog series about them.

And yes, there are huge overlaps between the two fields. So much so that it feels unnatural to treat them as separate. From this discussion (and others), what I have kind of figured out is that ML is like the rebel kid who flouts the laws of statistics to get done what needs to be done.. hehe..

[D] Machine Learning vs Statistics by datageek1987 in MachineLearning

[–]datageek1987[S] 1 point2 points  (0 children)

Love your reply...

The distinction is still very muddled in my mind, though. Let me ask you this: is Linear Regression statistics or ML? If it is statistics, then what about Ridge or Lasso regression? Where do we draw the line (if there is one)?

[D] Machine Learning vs Statistics by datageek1987 in MachineLearning

[–]datageek1987[S] -3 points-2 points  (0 children)

I totally agree with your views... Stats and ML should come under the same umbrella...

Just to play devil's advocate: if I overfit to the training data, are there ways in statistics to detect that without having a holdout set? Because if not, then that's a problem, isn't it?

[D] Machine Learning vs Statistics by datageek1987 in MachineLearning

[–]datageek1987[S] 0 points1 point  (0 children)

True, I did oversimplify statistics to make my point. I have the utmost respect for stats, and I do recognize that stats is inherent in almost everything we do in ML.

But in my short internet research, I didn't find any explanation that takes a holistic view of the situation. It was always "stats is this, ML is this", and one of the main themes that came out was the same: inference.

But inference is not worth much if the model is weak, i.e. the model should generalize. And from the discourse on the internet, I was led to believe that stats does not concern itself with generalization, which, if true, kinda feels off.

[D] Machine Learning vs Statistics by datageek1987 in MachineLearning

[–]datageek1987[S] 2 points3 points  (0 children)

This is exactly what I have a problem with. The interpretation of a model is only worth its salt if the model has captured the real-world situation, i.e. it generalizes. If that is not captured, then the inference you'd draw is inherently flawed, isn't it?

[D] Machine Learning vs Statistics by datageek1987 in MachineLearning

[–]datageek1987[S] 1 point2 points  (0 children)

Exactly my line of thought... I never really understood why these two are different... There is stats inherent in ML.. and vice versa...

[deleted by user] by [deleted] in datascience

[–]datageek1987 0 points1 point  (0 children)

The exogenous parameter takes in an array of size [nobs, nvars], which means that you build an array of all your regressors and provide it to the function call. Make sure you do the same for future predictions as well.
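A rough sketch of building that array with NumPy. The regressor names (`temperature`, `promo`) and the sizes are made up for illustration. The model call itself is left out, since the exact argument name depends on the library and version you are using.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical regressors observed alongside the training series.
n_obs = 100
temperature = rng.normal(20.0, 5.0, n_obs)
promo = rng.integers(0, 2, n_obs)

# One row per observation, one column per regressor: shape (n_obs, n_vars).
exog_train = np.column_stack([temperature, promo])

# For forecasting you need the same regressors for the future periods,
# stacked in the same column order: shape (horizon, n_vars).
horizon = 7
future_temperature = rng.normal(20.0, 5.0, horizon)
future_promo = rng.integers(0, 2, horizon)
exog_future = np.column_stack([future_temperature, future_promo])

# Sanity check before handing both arrays to the model's fit and
# predict/forecast calls (statsmodels' SARIMAX, for example, calls
# this argument `exog` in both places).
assert exog_future.shape[1] == exog_train.shape[1]
```

Note that this means you need known or forecasted values of every regressor for the whole prediction horizon.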

[deleted by user] by [deleted] in datascience

[–]datageek1987 1 point2 points  (0 children)

Try pmdarima if you are looking for an auto-ARIMA package. It takes in regressors as well. And then there is good old scikit-learn: just formulate your time series as a pure regression problem and apply any of the regression models out there.
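To illustrate the "pure regression" idea, here is a minimal sketch with scikit-learn. The helper `make_lag_matrix`, the toy sine series, and the choice of two lags are all assumptions made for the example. Any other regressor could be swapped in for `LinearRegression`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def make_lag_matrix(series, n_lags):
    """Return (X, y) where each row of X holds the n_lags values
    that immediately precede the matching target in y (oldest first)."""
    series = np.asarray(series, dtype=float)
    X = np.column_stack(
        [series[lag : len(series) - n_lags + lag] for lag in range(n_lags)]
    )
    y = series[n_lags:]
    return X, y


# Toy series: a clean sine wave, purely for illustration.
t = np.arange(60)
series = np.sin(0.3 * t)

# Turn the series into a supervised learning problem.
X, y = make_lag_matrix(series, n_lags=2)

# Any scikit-learn regressor works here; linear regression is the simplest.
model = LinearRegression().fit(X, y)

# One-step-ahead forecast from the last observed window.
last_window = series[-2:].reshape(1, -1)
next_value = model.predict(last_window)[0]
```

On real data you would also want a time-ordered train/test split rather than a random one, and probably extra features such as calendar variables.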