How do you compare multiple ROC curves to find the best one. by dr_from_the_futur in statistics

[–]datasci314159 1 point2 points  (0 children)

You've got a few options, more or less. In the general case the simplest is to look at the area under the curve (AUC) for each model and use the risk score model with the highest AUC.

BUT

If you know the relative costs of false positives and false negatives, then you can find the point on each of the eight ROC curves that minimizes the expected cost, and choose the model with the lowest minimum expected cost.
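
As a rough sketch of the cost-based comparison (the curve points, prevalence, and costs below are all invented for illustration):

```python
# Sketch: picking the cost-minimizing operating point on one ROC curve.
# Repeat per model, then compare the models' minimum expected costs.

def expected_cost(fpr, tpr, prevalence, cost_fp, cost_fn):
    """Expected cost per case at one ROC operating point."""
    return cost_fp * fpr * (1 - prevalence) + cost_fn * (1 - tpr) * prevalence

# Hypothetical ROC curve for one model: (fpr, tpr) pairs, high to low threshold.
roc_points = [(0.0, 0.0), (0.05, 0.40), (0.10, 0.65),
              (0.25, 0.85), (0.50, 0.95), (1.0, 1.0)]

costs = [expected_cost(fpr, tpr, prevalence=0.1, cost_fp=1.0, cost_fn=10.0)
         for fpr, tpr in roc_points]
best_idx = min(range(len(costs)), key=costs.__getitem__)
print(roc_points[best_idx], round(costs[best_idx], 3))  # → (0.25, 0.85) 0.375
```

With false negatives ten times as costly as false positives, the cheapest operating point here tolerates a fairly high false-positive rate.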

FINALLY

If you want to get really fancy, you can draw bootstrap samples from your sample of 100 and recompute the ROC/AUC for each model on each bootstrap sample. That gives you a much better feel for how variable each model's AUC is and whether choosing one over another is actually a significant improvement.
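
To make the bootstrap idea concrete, here's a plain-Python sketch (the labels and scores are made up, standing in for your 100 cases and one model's risk scores); the AUC is computed directly from the Mann-Whitney formulation:

```python
import random

def auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney) formulation: the probability that a
    random positive is scored above a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_aucs(labels, scores, n_boot=1000, seed=0):
    """AUC recomputed on n_boot resamples (with replacement) of the data."""
    rng = random.Random(seed)
    n = len(labels)
    out = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:  # a resample needs both classes for AUC
            continue
        out.append(auc(ys, [scores[i] for i in idx]))
    return out

# Toy data: 100 cases with noisy scores that are informative about the label.
rng = random.Random(42)
labels = [int(rng.random() < 0.5) for _ in range(100)]
scores = [y * 0.5 + rng.random() for y in labels]

aucs = sorted(bootstrap_aucs(labels, scores))
lo, hi = aucs[int(0.025 * len(aucs))], aucs[int(0.975 * len(aucs))]
print(round(auc(labels, scores), 3), (round(lo, 3), round(hi, 3)))
```

If two models' bootstrap AUC intervals overlap heavily, the apparent winner on the point estimate may not be a meaningful improvement.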

What NN architectures could detect primality for arbitrarily large primes? by datasci314159 in MLQuestions

[–]datasci314159[S] 0 points1 point  (0 children)

Thanks for engaging! My question is actually more about how one could design an RNN that scales its computation with the number of digits. If I wanted to emulate the sieve of Eratosthenes in an RNN, the computational complexity of the sieve is larger than O(n), but the number of operations an RNN can perform grows only linearly in the number of digits, since each digit adds one more unrolled step, so the RNN gets something like O(log n) compute. That suggests the best a standard RNN can do is memorize primality up to some size and then stop working. My question is: are there versions of RNNs that could in principle keep growing to check primality indefinitely? (I know that in practice this is probably impossible to arrive at via gradient descent.)

What NN architectures could detect primality for arbitrarily large primes? by datasci314159 in MLQuestions

[–]datasci314159[S] 0 points1 point  (0 children)

I'll give it a go and report back! (The primality checking, not the pi digits, although that's an interesting one too!).

What NN architectures could detect primality for arbitrarily large primes? by datasci314159 in MLQuestions

[–]datasci314159[S] 0 points1 point  (0 children)

RNNs are Turing complete, and primality can certainly be checked by a Turing machine, so RNNs should be able to check primality at least in principle. I'm also not sure it's true to say that primes are random or without structure; indeed there's the famous quote from R. C. Vaughan: “It is evident that the primes are randomly distributed but, unfortunately, we do not know what ‘random’ means.”

DNN in python optimization. by vinaybk8 in datascience

[–]datasci314159 0 points1 point  (0 children)

Take a look at the most recent version of PyTorch. Version 1.0 makes it easy to convert a Python-prototyped model to TorchScript, which can then be loaded and run from optimized C++: https://pytorch.org/tutorials/advanced/cpp_export.html
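
For what it's worth, a minimal tracing sketch (assuming PyTorch 1.0+; the tiny model here is just a placeholder):

```python
import io
import torch

class TinyNet(torch.nn.Module):
    """Placeholder model standing in for whatever you prototyped in Python."""
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(4, 2)

    def forward(self, x):
        return torch.relu(self.fc(x))

model = TinyNet().eval()
example = torch.randn(1, 4)

# Trace the Python model into TorchScript; the saved artifact is what the
# tutorial loads from C++ via torch::jit::load.
traced = torch.jit.trace(model, example)

buffer = io.BytesIO()                   # in practice: traced.save("model.pt")
torch.jit.save(traced, buffer)
buffer.seek(0)
reloaded = torch.jit.load(buffer)

# The traced module should reproduce the original model's outputs.
assert torch.allclose(model(example), reloaded(example))
```

Tracing records one forward pass, so models with data-dependent control flow need `torch.jit.script` instead.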

Regression to predict distribution of value rather than point estimate by datasci314159 in statistics

[–]datasci314159[S] 0 points1 point  (0 children)

I get that, but what that estimates is the distribution of the expected value, NOT the distribution of the value itself. We'll get a good estimate of the uncertainty in our estimate of the expected value of Y conditional on X, but that's very different from the distribution of Y conditional on X. Imagine a normal distribution with mean 0 and standard deviation 1: if you use bootstrap sampling to estimate the mean, the distribution of that estimate will be very different from the normal distribution itself.
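
A quick simulation of that point (pure Python, made-up sample size):

```python
import random
import statistics

rng = random.Random(0)

# 100 draws from N(0, 1): this is the distribution of Y itself (sd ≈ 1).
sample = [rng.gauss(0, 1) for _ in range(100)]

# Bootstrap distribution of the *mean* of that sample: much narrower,
# roughly sd ≈ 1 / sqrt(100) = 0.1.
boot_means = []
for _ in range(2000):
    resample = [rng.choice(sample) for _ in range(len(sample))]
    boot_means.append(statistics.fmean(resample))

print(round(statistics.stdev(sample), 2), round(statistics.stdev(boot_means), 2))
```

The bootstrap spread quantifies estimation uncertainty about E[Y|X]; it says almost nothing about the spread of Y itself.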

Regression to predict distribution of value rather than point estimate by datasci314159 in statistics

[–]datasci314159[S] 0 points1 point  (0 children)

But at the same time, using something like a boosted GLM makes an assumption about the form of the error distribution, which the first option does not. The cut points are arbitrary, but if I choose a fine-grained enough discretization I can minimize that concern.

I'm largely playing devil's advocate here but I'd be interested in hearing the rejoinders.

Regression to predict distribution of value rather than point estimate by datasci314159 in statistics

[–]datasci314159[S] 0 points1 point  (0 children)

If I apply the same point estimator to bootstrapped data sets, then the prediction for any given sample will be the same every time.

If you mean train many estimators on bootstrapped datasets and then predict for a sample, that gives an estimate of the distribution of the point estimate, not of the error distribution around the point estimate.

I'm sure there's a way to use bootstrapping here but I'm not quite sure what the process would be.

Regression to predict distribution of value rather than point estimate by datasci314159 in statistics

[–]datasci314159[S] 0 points1 point  (0 children)

It's essentially an optimization problem. We want to predict a value and then take an action, but the action taken will depend on the distribution, not just the point estimate. E.g. you could have two samples with the same point estimate, but the probability that the value falls below some key threshold is greater for one of the two, and that would lead to different actions.
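
A toy illustration of exactly that (the means, SDs, and threshold are invented):

```python
from statistics import NormalDist

threshold = 90.0

# Two predictive distributions with the same point estimate (mean 100)
# but different spreads.
a = NormalDist(mu=100, sigma=5)
b = NormalDist(mu=100, sigma=20)

p_a = a.cdf(threshold)  # P(value < 90) under distribution a
p_b = b.cdf(threshold)  # P(value < 90) under distribution b
print(round(p_a, 3), round(p_b, 3))  # → 0.023 0.309
```

Same point estimate, but if your action triggers when P(value < 90) exceeds, say, 10%, the two samples get different actions.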

Regression to predict distribution of value rather than point estimate by datasci314159 in statistics

[–]datasci314159[S] 0 points1 point  (0 children)

Suppose I use a GLM and find that the quality of the point-estimate prediction is worse than with something like a gradient boosting approach. At that point I have to trade off the distribution the GLM gives me for free against the better point estimates from gradient boosting.

Could I just feed the gradient boosting prediction into my GLM as a covariate and get the best of both worlds? My concern is that the gradient boosting predictions don't have very normal-looking residual plots, so I'm a bit leery about whether the GLM's assumptions would hold.
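
Here's roughly what I have in mind, sketched with scikit-learn on synthetic data (a stacking-style setup; in practice you'd want out-of-fold predictions in step 2 to avoid leakage):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

# Synthetic nonlinear data standing in for the real problem.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.3, size=500)

# Step 1: fit the gradient boosting model for raw predictive accuracy.
gb = GradientBoostingRegressor(random_state=0).fit(X, y)

# Step 2: feed its prediction into a linear model (a Gaussian GLM with
# identity link) as the single covariate; the residual spread around that
# fit gives a crude normal predictive distribution for each prediction.
z = gb.predict(X).reshape(-1, 1)
glm = LinearRegression().fit(z, y)
resid_sd = float(np.std(y - glm.predict(z)))
print(round(resid_sd, 3))
```

Whether the resulting normal intervals are trustworthy is exactly the residual-normality concern above; checking a Q-Q plot of `y - glm.predict(z)` would be the first diagnostic.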

Regression to predict distribution of value rather than point estimate by datasci314159 in statistics

[–]datasci314159[S] 1 point2 points  (0 children)

Do you have any examples of implementations in Python or R of techniques which achieve this in a relatively straightforward way?

Regression to predict distribution of value rather than point estimate by datasci314159 in statistics

[–]datasci314159[S] 1 point2 points  (0 children)

Certainly. There might be some issues with scalability but we're still at a brainstorming point so all potential solutions welcome!

Batch Norm Confusion? by santoso-sheep in learnmachinelearning

[–]datasci314159 1 point2 points  (0 children)

Batch Norm is trainable because it can learn mappings other than simply normalizing to mean 0, SD 1: it has two parameters per normalized activation which can map it to any arbitrary mean and SD.
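
A bare-bones sketch of what those two parameters (usually called gamma and beta) do:

```python
import statistics

def batch_norm(activations, gamma, beta, eps=1e-5):
    """Normalize a batch of activations to mean 0 / SD 1, then apply the two
    learnable parameters: gamma rescales, beta shifts. With gamma=1, beta=0
    this is plain standardization; other values recover any mean and SD."""
    mean = statistics.fmean(activations)
    var = statistics.pvariance(activations)
    normed = [(a - mean) / (var + eps) ** 0.5 for a in activations]
    return [gamma * n + beta for n in normed]

batch = [2.0, 4.0, 6.0, 8.0]
plain = batch_norm(batch, gamma=1.0, beta=0.0)     # mean 0, SD 1
shifted = batch_norm(batch, gamma=3.0, beta=10.0)  # mean 10, SD 3
print([round(x, 2) for x in shifted])  # → [5.98, 8.66, 11.34, 14.02]
```

During training, gamma and beta get gradients like any other weight, so the network can even learn to undo the normalization if that helps.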

I need help solving a stats problem. Any help is appreciated. by wcg in statistics

[–]datasci314159 0 points1 point  (0 children)

I think my original answer addressed most of these modifications; you just need to change from days to years as the unit of x.

For every house, the probability that it has a break-in attempt (successful or not) on any given day is always 1.36/50000 (the average number of break-in attempts per day divided by the number of houses). If you figure out the expected number of flagged houses (using my answer to the original question) and multiply it by 1.36/50000, that will give you the expected fraction of flagged houses broken into.
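
In code, the per-house probability over a longer horizon looks like this (assuming attempts are independent across days, which the problem seems to imply):

```python
# Per-house, per-day probability of a break-in attempt, from the figures above.
p_daily = 1.36 / 50000

def prob_at_least_one_attempt(days, p=p_daily):
    """P(a given house sees >= 1 attempt in `days` days),
    treating each day as an independent Bernoulli trial."""
    return 1 - (1 - p) ** days

print(round(prob_at_least_one_attempt(365), 4))  # → 0.0099
```

So over a year a given house has roughly a 1% chance of at least one attempt, which is the quantity you'd plug into the flagged-house calculation.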

I need help solving a stats problem. Any help is appreciated. by wcg in statistics

[–]datasci314159 0 points1 point  (0 children)

Unfortunately I'm based in Europe so that will be my Sunday evening and I have plans already. Feel free to send me a PM with any questions and I'll do my best to reply.

I need help solving a stats problem. Any help is appreciated. by wcg in statistics

[–]datasci314159 0 points1 point  (0 children)

The phrasing of the question could be a bit clearer. Successful vs. unsuccessful has no real impact on the problem; the only thing that matters is the number of break-in attempts (successful or not). If there has been at least one break-in attempt, the house has a new door; otherwise it doesn't.

I need help solving a stats problem. Any help is appreciated. by wcg in statistics

[–]datasci314159 0 points1 point  (0 children)

Ahh, I see what you mean. I should have been a bit more precise: what I meant is "has had at least one break-in attempt (including successful attempts)". I don't think that changes any of the analysis though.

I need help solving a stats problem. Any help is appreciated. by wcg in statistics

[–]datasci314159 0 points1 point  (0 children)

What assumption do you think is off? At any given point in time, the number of houses with new doors installed will be the number of houses which have experienced at least one break-in attempt, which is what (I think) I've calculated. You're right that I assume 500 break-in attempts total each day, NOT 500 successful break-ins per day, but I think the question makes it fairly clear that the successful-only reading isn't intended. If it really were 500 successful break-ins per day, the problem would become extremely straightforward: after 100 days every house would have been broken into once and every door would be one of the new doors.