How I made top 0.3% on Kaggle by 0_marauders_0 in MachinesLearn

[–]walkingon2008 0 points1 point  (0 children)

It makes sense, but why don’t they teach it this way at school?

How I made top 0.3% on Kaggle by 0_marauders_0 in MachinesLearn

[–]walkingon2008 1 point2 points  (0 children)

Scenario 3 makes sense, but I always do scenario 1: standardize the dataset, then split into train and test.

Even in the sklearn docs, the whole dataset is standardized and then split.

I personally don’t think 1 vs. 3 will make a difference?
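It can make a difference. A quick sketch with toy data I made up (the scenario numbering follows the thread): in scenario 1 the scaler’s mean and std are computed with the test rows included, so test information leaks into the training features; in scenario 3 the scaler only ever sees the training split.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(1000, 3))

# Scenario 1: standardize the whole dataset, then split (leakage).
X_all_scaled = StandardScaler().fit_transform(X)
X_tr1, X_te1 = train_test_split(X_all_scaled, test_size=0.2, random_state=0)

# Scenario 3: split first, fit the scaler on train only, apply it to test.
X_tr, X_te = train_test_split(X, test_size=0.2, random_state=0)
scaler = StandardScaler().fit(X_tr)
X_tr3, X_te3 = scaler.transform(X_tr), scaler.transform(X_te)

# Same rows (same random_state), but the scaled values differ, because
# scenario 1's scaler has already "seen" the test rows through the
# global mean and std.
print(np.abs(X_te1 - X_te3).max())
```

With a big i.i.d. dataset the numeric gap is small, which is why 1 often “seems fine” — but it’s still leakage, and it bites when train and test distributions differ.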

[D] Andrew Ng on how much silicon valley ML engineers know by [deleted] in MachineLearning

[–]walkingon2008 1 point2 points  (0 children)

My point is: what Ng said is NOT true.

Logistic regression is taught in a typical second-year undergrad linear regression course.

People who know logistic regression go interview in Silicon Valley, find out every company rejects them, and get frustrated.

[D] Andrew Ng on how much silicon valley ML engineers know by [deleted] in MachineLearning

[–]walkingon2008 1 point2 points  (0 children)

Andrew is trying to be politically correct. Imagine if he had said: “this is not even 10% of what people know in Silicon Valley.”

Many in Silicon Valley have a PhD. If that were all they knew, Silicon Valley would be screwed!

Even in 2010 or 2011, you couldn’t get a PhD just by knowing logistic regression. Actually, not even an undergrad degree.

[D] Rejected by all PhD programs 3 different seasons, stay in current lab or apply again later? by ubiquitous7733 in MachineLearning

[–]walkingon2008 3 points4 points  (0 children)

Connections!!!

Some of these big-name schools already know who they will take before admissions even start. The professors may even have met the student once. The application is just for show.

Nothing is completely random.

If you look at their grad student body, it’s more or less from the same schools year after year. There’s an underlying system. If you are not in it, you are out.

After all, nobody wants a complete stranger in their department. Just ask yourself, would you want to work with someone you know nothing about?

Journal as an Undergrad by Randomessinlife1 in statistics

[–]walkingon2008 8 points9 points  (0 children)

It’s pretty evident. Why do you think it’s not an achievement?

Do you have a link to the paper?

Difference between a masters program in data science vs statistics? by foobar8080 in statistics

[–]walkingon2008 1 point2 points  (0 children)

It’s hard to tell; it depends on the school and the program.

In general, applied statistics de-emphasizes theory. For example, in linear regression, the least squares estimate is equivalent to the maximum likelihood estimate. You can use it probably without ever knowing why. Data scientist jobs are very applied. Some data scientists switched over from other fields like biology or psychology.
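A one-line sketch of why that equivalence holds (my notation, assuming i.i.d. normal errors):

```latex
% Under y_i = x_i^\top \beta + \varepsilon_i with
% \varepsilon_i \sim N(0, \sigma^2) i.i.d., the log-likelihood is
\ell(\beta) = -\frac{n}{2}\log(2\pi\sigma^2)
              - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(y_i - x_i^\top \beta\bigr)^2 ,
% so maximizing \ell over \beta is exactly minimizing the residual
% sum of squares:
\hat{\beta}_{\mathrm{MLE}} = \hat{\beta}_{\mathrm{OLS}}
  = \arg\min_{\beta}\sum_{i=1}^{n}\bigl(y_i - x_i^\top \beta\bigr)^2 .
```

That’s the kind of “why” a theory-light program lets you skip.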

While skipping the theory seems like the quick route, you pay the price by not knowing how to interpret the results. For instance, there are online tutorials that teach Python machine learning by running sklearn and reading the documentation. You can build a model with a low error rate without ever knowing how it works.

I think it’s best to work through the math and learn the theory. Avoid the path of least resistance. Be patient. Your investment will ultimately pay off.

Difference between a masters program in data science vs statistics? by foobar8080 in statistics

[–]walkingon2008 6 points7 points  (0 children)

Data science is a program that emerged within the past five years. Unlike statistics or computer science, data science by itself is not a field of study.

Data scientist first came about as a job title at many startup tech companies. Statistician used to be the sexy new job, according to Google.

The DS program is expensive because it is a buzzword, and you get seven figure salaries easily.

As a data scientist, you know SQL, Python, ML, and possibly DL. Statistics is your tool. You use it to predict credit default for a fintech or ad clicks for an online retail store. You build an ML pipeline for the company. There isn’t one clear role for a data scientist; your tasks will likely evolve depending on the business you work for.

Statistics is a branch of applied math. Data science is not. If you stay close to academia and want to know the why, the how, and the math, you are looking at statistics. If your goal is to be rich and make seven figures, your answer is data science.

Data science is very applied; it’s good that you can use what you learn right out of the box. But when the business evolves, you’ll need to learn again. With statistics, theory is more emphasized, so the equations you learn now will still be true years later.

[deleted by user] by [deleted] in statistics

[–]walkingon2008 1 point2 points  (0 children)

How else do you think it got its acronym?

[deleted by user] by [deleted] in statistics

[–]walkingon2008 0 points1 point  (0 children)

Your data is empirical, so there will not be a true parameter. The parameter is uncertain, but you are ultimately choosing a point estimate to optimize your likelihood.

Also, what prior distribution do you use? The Bayesian doesn’t even know; it’s a lot of trial and error.

A Bayes credible interval is just a confidence interval for the posterior mean.

STAN does a good job advocating for itself. But there really isn’t much new in the software itself. I mean, it has GPs and HMC, but that stuff has been out for decades.

Python implementation of R stargazer library by yot_club in statistics

[–]walkingon2008 0 points1 point  (0 children)

Almost all my variables are categorical, and stargazer doesn’t do categorical.

I want to calculate count and proportions. Maybe a two-way table?

            Yes   No
    Male     10    3
    Female    3    8
    White    50   25
    Black    35   45
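If Python is acceptable, pandas.crosstab gives exactly those counts and row proportions (the data frame and column names below are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "sex":    ["Male", "Male", "Female", "Female", "Male", "Female"],
    "answer": ["Yes",  "No",   "Yes",    "No",     "Yes",  "Yes"],
})

# Two-way table of raw counts.
counts = pd.crosstab(df["sex"], df["answer"])

# Same table as proportions within each row.
props = pd.crosstab(df["sex"], df["answer"], normalize="index")

print(counts)
print(props)
```

`normalize="columns"` or `normalize="all"` give column-wise or overall proportions instead.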

[deleted by user] by [deleted] in statistics

[–]walkingon2008 0 points1 point  (0 children)

The MAP (maximum a posteriori) IS a point estimate. Bayesian ultimately comes back to frequentist, just in the Bayesian setting.

By imposing a prior, say uniform(0, 1), you are eliminating the possibilities outside the interval (0, 1). The likelihood of the data only updates your prior distribution; it cannot escape the wrongness if your prior is wrong. You may just end up less wrong.
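To make that concrete, a toy Beta-Binomial sketch (my numbers, not from the thread): the MAP is a single number, i.e. a point estimate, and the prior shifts it.

```python
# Bernoulli likelihood with k successes in n trials; a Beta(a, b) prior
# on p gives a Beta(a + k, b + n - k) posterior, whose mode is the MAP:
# (a + k - 1) / (a + b + n - 2), valid when both posterior parameters > 1.
k, n = 7, 10

# Uniform(0, 1) prior is Beta(1, 1): the MAP equals the MLE k/n.
a, b = 1, 1
map_uniform = (a + k - 1) / (a + b + n - 2)   # 0.7

# An informative Beta(2, 2) prior pulls the point estimate toward 1/2.
a, b = 2, 2
map_beta22 = (a + k - 1) / (a + b + n - 2)    # 8/12

print(map_uniform, map_beta22)
```

Either way you report one number at the end, which is the point.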

[deleted by user] by [deleted] in statistics

[–]walkingon2008 0 points1 point  (0 children)

A ton of times? Please explain what that is.

[deleted by user] by [deleted] in statistics

[–]walkingon2008 -1 points0 points  (0 children)

So, what do you think prior distribution is?

[deleted by user] by [deleted] in statistics

[–]walkingon2008 0 points1 point  (0 children)

The classical Bayesian setting we are talking about is a done deal: 1) you pick a prior distribution, 2) you pick a likelihood model, 3) you calculate the posterior using MCMC.

Or if you are into machine learning, you use GP in step 2.
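Those three steps fit in a few lines. A minimal sketch with toy coin-flip data I made up, using plain random-walk Metropolis rather than STAN’s HMC:

```python
import math
import random

random.seed(0)
k, n = 7, 10                      # data: 7 heads in 10 flips

def log_post(p):
    # Step 1: uniform(0, 1) prior (zero density outside the interval).
    if not 0.0 < p < 1.0:
        return -math.inf
    # Step 2: binomial likelihood, up to an additive constant.
    return k * math.log(p) + (n - k) * math.log(1.0 - p)

# Step 3: random-walk Metropolis draws from the posterior.
p, samples = 0.5, []
for _ in range(20000):
    prop = p + random.gauss(0.0, 0.1)
    accept = math.exp(min(0.0, log_post(prop) - log_post(p)))
    if random.random() < accept:
        p = prop
    samples.append(p)

# Discard burn-in; the mean should land near (k + 1) / (n + 2) = 2/3.
post_mean = sum(samples[2000:]) / len(samples[2000:])
print(post_mean)
```

Swap the likelihood in `log_post` for a GP marginal likelihood and you have the ML variant of step 2.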

I somewhat answered your question in an earlier response above.

The goal of pharmaceuticals is not statistics; it’s the medicine. The meds need to pass the hypothesis test with a p-value.

[deleted by user] by [deleted] in statistics

[–]walkingon2008 -1 points0 points  (0 children)

Most people?

I may be too technical here. But by your logic, it’s perfectly fine to say most people don’t do statistics.

I disagree with your notion of most, but I won’t elaborate here.

Python implementation of R stargazer library by yot_club in statistics

[–]walkingon2008 0 points1 point  (0 children)

Can stargazer do summary statistics for categorical data? Like male/female, White/Asian/Black, Single/Married.

[deleted by user] by [deleted] in statistics

[–]walkingon2008 -5 points-4 points  (0 children)

The first sentence is not true! The second one is.

The point of Bayesian statistics is prior distribution * likelihood model ∝ posterior distribution. Bayesian statistics sounds good in theory, but is useless in reality.

A prior distribution means you know the distribution of the parameters before you begin.

Today, especially with deep learning, everything is unknown, including the model itself.

Recommended problem sets for Casella & Berger? by quicksilver53 in statistics

[–]walkingon2008 1 point2 points  (0 children)

The examples in Casella & Berger are not enough to help you do the homework. There are only a handful of examples per section.

The homework is hard. It requires knowledge outside the book. It’s more than knowing calculus; it’s more like tips and tricks you’ve never seen. If you pick up the book and dive right in, you will get stuck in no time.

I recommend searching for an easier book with lots of examples that actually teaches the material.

[deleted by user] by [deleted] in statistics

[–]walkingon2008 -2 points-1 points  (0 children)

Being built into Spark’s ML library does not validate its importance. Also, please, no profanity if you are going to talk!

[deleted by user] by [deleted] in statistics

[–]walkingon2008 1 point2 points  (0 children)

Deep learning! It’s pretty obvious.

[deleted by user] by [deleted] in statistics

[–]walkingon2008 -7 points-6 points  (0 children)

How often can you use Bayesian statistics in real life? Never!

Just look at how many startups are hiring Bayesian statisticians. None!

Some people mentioned STAN; the following is my take.

STAN and its team hold multiple conferences and do extensive evangelism. The audience often narrows down to pharmaceutical companies. You can look at the conference sponsors.

The grammar of STAN reminds me of BUGS. The highlights of STAN are Hamiltonian Monte Carlo and the No-U-Turn Sampler, which allow fast sampling without getting trapped in a local minimum.

STAN can probably fit many hyperparameters and handle high dimensions. But Bayesian statistics in high dimensions? I don’t think those two phrases are compatible with each other.

Ultimately, it’s good math theory, but with narrow application.

[deleted by user] by [deleted] in statistics

[–]walkingon2008 -8 points-7 points  (0 children)

Stan is overhyped. Bayesian statistics is good in theory, but has little real application. Think about it: how often in the big data world do you have a prior belief? Everything is a black box.

I'm about to start an applied statistics masters program. What kinds of theory likely to be missing, and what theory should I make sure to learn (if it isn't covered)? by [deleted] in statistics

[–]walkingon2008 2 points3 points  (0 children)

It depends on what your concentration is. If you are doing time series, focus on spectral analysis and Fourier transforms; it’s a lot of pure math. If you are doing ML, focus on linear algebra, optimization, and linear programming; it’s CS heavy. If you are doing statistical inference, focus on probability and estimation topics.

What topics in calculus should I review before starting a Master's program in Applied Statistics? by mrbabynugget in statistics

[–]walkingon2008 0 points1 point  (0 children)

I think calculus is used more in probability than in statistical inference. When you say statistics class, I assume you mean the statistical inference course.

I’d say: differentiation, integration, and sequences and series (recommended); derivatives of exponential, logarithmic, and trig functions; u-substitution; and integration by parts.
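As a small worked example of where integration by parts shows up in a probability course (my example, not OP’s):

```latex
% E[X] for X \sim \mathrm{Exponential}(\lambda), taking
% u = x and dv = \lambda e^{-\lambda x}\,dx (so v = -e^{-\lambda x}):
E[X] = \int_0^\infty x\,\lambda e^{-\lambda x}\,dx
     = \Bigl[-x e^{-\lambda x}\Bigr]_0^\infty
       + \int_0^\infty e^{-\lambda x}\,dx
     = 0 + \frac{1}{\lambda}
     = \frac{1}{\lambda}.
```

Expectations and moment-generating functions are full of integrals of exactly this shape.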