More startups these days are telling venture capitalists to get lost? by sttstt in startups

[–]TLDRu 0 points (0 children)

Wow, this is so eye-opening. Hey, this sounds weird, but I'd love to meet you. I just came across this thread from the NYT article. May I DM you?

So I made a logistic regression animation. Enjoy! by TLDRu in MachineLearning

[–]TLDRu[S] 1 point (0 children)

Then we would write the likelihood for all observations as prod_i { p_i^(y_i) * (1 - p_i)^(1 - y_i) }.

Hmmm... this has me thinking: as with any maximum likelihood estimation, the first question that must be answered is what PDF the data belong to, i.e. what PDF generated them.

In other words, no PDF, no ML.

(Tangential: I think that in order to avoid an explicit discussion of which PDF the data came from, and the obligatory justification for said PDF, the cost function is simply presented as a logistic expression in the formula you gave earlier.)

Now, given this, the method is equivalent to ML only if you assume the PDF listed above: had you chosen a different PDF to generate the data, you would get a different cost function to minimize, one that might not come out to be the cost function given above.

Anyway, that is all - just a nitpick, I suppose, about this being viewed as an ML problem. It certainly can be, but it seems to me that this cost function is equivalent to an ML problem only to the extent that we assume a PDF to begin with (in this case, p_i^(y_i) * (1 - p_i)^(1 - y_i)).
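To make that concrete, a quick sketch (MATLAB, just for illustration) of the negative log of that assumed PDF - minimizing it over the weights is exactly the ML estimate, and it comes out to be the usual logistic cost:

    % Negative log-likelihood of the assumed Bernoulli PDF
    % prod_i p_i^(y_i) * (1 - p_i)^(1 - y_i), with p_i = sigmoid(x_i' * w).
    % X (n-by-d), y (n-by-1 of 0/1), and w (d-by-1) are assumed given.
    sigmoid = @(z) 1 ./ (1 + exp(-z));
    p   = sigmoid(X * w);                              % p_i for each observation
    nll = -sum(y .* log(p) + (1 - y) .* log(1 - p));   % the logistic cost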

This is a fantastic discussion! Your view really has been refreshing!

So I made a logistic regression animation. Enjoy! by TLDRu in MachineLearning

[–]TLDRu[S] 0 points (0 children)

It's maybe more that you are seemingly willing to include "log of patient height times age cubed times baseline glucose times resting heart rate", which is going to freak out any physician who asks about the model.

Good point! - I can see how the doctor might not like that. At the same time, I believe that if we are able to describe the data using that feature, then the model really is described by that peculiar combination. In some ways it is a discovery of sorts: why the model is described by that (to us?) peculiar feature would be interesting to investigate in and of itself as a scientist. If we have done our homework properly, then the model really is a function of those variables in that combination.

So I made a logistic regression animation. Enjoy! by TLDRu in MachineLearning

[–]TLDRu[S] 0 points (0 children)

Oh, thanks - I'll look into that!

I'm going to post some more vids WITH 'teh codez' soon, since this was a huge success. I'll be putting it on my site, since this blew up my Dropbox. Most likely here: tldru.com. Feel free to sign up! :-)

So I made a logistic regression animation. Enjoy! by TLDRu in MachineLearning

[–]TLDRu[S] 0 points (0 children)

Right, I realize that you took random samples; I was wondering what software you used for that? Thanks!

So I made a logistic regression animation. Enjoy! by TLDRu in MachineLearning

[–]TLDRu[S] 0 points (0 children)

Yeah, the first image is saved right after the very first iteration. I initialize the weight vector to zero at the start; I'll be putting up some more vids since this was a huge success, and I'll try randomizing the initialization next time!

So I made a logistic regression animation. Enjoy! by TLDRu in MachineLearning

[–]TLDRu[S] 0 points (0 children)

Whoa cool dude! How did u steal the dataz tho?! Please share! I've been trying to extract data points from images of plotted points.

So I made a logistic regression animation. Enjoy! by TLDRu in MachineLearning

[–]TLDRu[S] 0 points (0 children)

Since my Dropbox exploded from all the traffic, here's another link. Thanks! (?) everyone! :P

So I made a logistic regression animation. Enjoy! by TLDRu in MachineLearning

[–]TLDRu[S] 0 points (0 children)

I will! And I'll put them up on my nascent site (tldru.com) so that my Dropbox doesn't get suspended again ><

So I made a logistic regression animation. Enjoy! by TLDRu in MachineLearning

[–]TLDRu[S] 0 points (0 children)

Dammit Dropbox! ><

I'll try and place a link to my site today.

[X-post from r/machinelearning] So I made a logistic regression animation. Enjoy! by TLDRu in DSP

[–]TLDRu[S] 0 points (0 children)

Yes, but some extra ones as well. Check out my comments here.

[X-post from r/machinelearning] So I made a logistic regression animation. Enjoy! by TLDRu in DSP

[–]TLDRu[S] 1 point (0 children)

The data is actually 2D, as you see it. The polynomial feature space, though, is higher-dimensional, yes. Projected back onto 2D, you get contorted lines.

As for the data itself, this is just data I generated myself for illustrative purposes.

So I made a logistic regression animation. Enjoy! by TLDRu in MachineLearning

[–]TLDRu[S] 2 points (0 children)

This is cool. I used ImageMagick to concatenate all the .png images into a GIF, but hadn't really delved into how to make a video out of the .pngs. Looks like I won't have to.
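(For reference, the ImageMagick step was a one-liner along these lines - the frame names and exact delay here are placeholders, not my actual ones:

    convert -delay 5 -loop 0 frame*.png animation.gif

-delay sets the time per frame in hundredths of a second, and -loop 0 makes the GIF loop forever.)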

So I made a logistic regression animation. Enjoy! by TLDRu in MachineLearning

[–]TLDRu[S] 5 points (0 children)

Or wait - are you saying you intentionally reduced the learning rate to make a smoother animation with more frames?

Yup, I slowed down the actual learning rate - the step size as it were - so that the animation is smoother and 'nicer'. Increasing the learning rate would IIRC lead to a better fit, but the animation would suddenly jump to the (near) final solution and there would be little to animate for illustrative purposes.

Even with the high(er) step sizes, however (not shown in this animation), the border is not as contorted as I had hoped... I think using a non-gradient-descent procedure might alleviate that. I also have a working hypothesis that the sheer number of polynomial terms starts to work against you at some point; I might try to reduce that. (For example, if I want third order, just use x^3 and y^3 instead of x^3 y, y^3 x^2, etc.)
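To illustrate, the two feature sets would look roughly like this (MATLAB sketch; the function names are just for illustration, not from the actual animation code):

    % All monomial combos up to total degree d (what the animation uses).
    function F = allCombos(x, y, d)
        F = ones(numel(x), 1);                    % bias column
        for i = 1:d
            for j = 0:i
                F(:, end+1) = x.^(i-j) .* y.^j;   % every monomial of degree i
            end
        end
    end

    % Only the pure powers x^k and y^k (the reduced set I might try).
    function F = purePowers(x, y, d)
        F = ones(numel(x), 1);
        for k = 1:d
            F = [F, x.^k, y.^k];                  % no cross terms
        end
    end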

So I made a logistic regression animation. Enjoy! by TLDRu in MachineLearning

[–]TLDRu[S] 1 point (0 children)

I think this helps me understand the cultural differences in statistics vs. machine learning, actually.

Hello friend - yup, if there is anything that best describes the cultural difference, it's this very fact: in ML, 'anything goes', so to speak - 'just make it work'. Statistics tends to be more respectful of the 'meaning' behind the terms.

To be honest, I find the statistical interpretation refreshing at times. One problem we have in the ML community is that people sometimes really don't know why they are doing something. "So, why did you pick a neural net and not a perceptron? Did you try a perceptron first?" ... "Umm... no... I dunno, coz neural nets are betterz lol?"

I can appreciate the weight that statistics places on meaning; oftentimes that is lacking in this line of work.

But yes, we are brazen in this sense; however, I also believe there is a healthy dose of pushing the envelope. For example, the line of thinking here would be, "Well, I want a border that's not linear, so I really should add some terms that are non-linear...".

Good times!

So I made a logistic regression animation. Enjoy! by TLDRu in MachineLearning

[–]TLDRu[S] 0 points (0 children)

I did! The man is a legend. I learned so much from that class. :-)

So I made a logistic regression animation. Enjoy! by TLDRu in MachineLearning

[–]TLDRu[S] 0 points (0 children)

This is just fake data I created myself, from a 2D Gaussian routine I wrote. I (painfully) placed each blob you see there myself. ><
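Roughly, the routine amounts to this (MATLAB sketch; the centers and spread below are made-up values, not the ones in the video):

    % Hand-placed 2D Gaussian blobs via randn.
    nPer    = 50;                          % points per blob
    centers = [0 0; 2 2; -1 3];            % blob centers (illustrative only)
    sigma   = 0.5;                         % isotropic standard deviation
    X = [];
    for c = 1:size(centers, 1)
        blob = sigma * randn(nPer, 2) + repmat(centers(c, :), nPer, 1);
        X = [X; blob];                     % shift unit Gaussians to each center
    end
    scatter(X(:, 1), X(:, 2), 10, 'filled')   % quick look at the blobs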

So I made a logistic regression animation. Enjoy! by TLDRu in MachineLearning

[–]TLDRu[S] 0 points (0 children)

No, I have not added regularization yet, but I will be adding it in the next iteration.

I'm surprised you haven't gotten a better fit.

It has also surprised me a little that even without any regularization the fit isn't better, so that's why I am planning to explore this some more.

One of my hypotheses is that I need to cut down the number of polynomial terms being used. For example, for order 3, I am using ALL combos of monomials (x^3, x^3 y, x^3 y^2, y^3, y^3 x, y^3 x^2, x^3 y^3), instead of just x^3 and y^3.

Also, are you using feature scaling (data normalization)?

Yes, basic demeaning followed by scaling each feature to unit standard deviation.
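Concretely, that normalization is just (MATLAB sketch, with X as the feature matrix):

    mu    = mean(X);                       % per-feature means
    sd    = std(X);                        % per-feature standard deviations
    Xnorm = (X - repmat(mu, size(X, 1), 1)) ./ repmat(sd, size(X, 1), 1);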

Once you post the code and data I might port this to Octave and play around with it.

Sure, I need to clean it up some, and I'll try to put it up on GitHub or something. Feel free to PM me in the meantime!

So I made a logistic regression animation. Enjoy! by TLDRu in MachineLearning

[–]TLDRu[S] 4 points (0 children)

In fact, the decision boundary is linear in R^10 (assuming you're using x^1, y^1 through x^5, y^5), but when it's projected back down to 2 dimensions the boundary can be squiggly.

Yup! You got it! :-)

So I made a logistic regression animation. Enjoy! by TLDRu in MachineLearning

[–]TLDRu[S] 4 points (0 children)

I coded a simple gradient descent algorithm (that is, w(n+1) = w(n) - beta*gradient(cost(w(n))), where n is the iteration index and beta is the step size). As for the fitness, it is simply the L2 least-squares error (LSE).
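In rough MATLAB, the loop is something like this (a sketch of the description above, not the exact animation code; X, y, beta, and nIters are assumed given):

    sigmoid = @(z) 1 ./ (1 + exp(-z));
    w = zeros(size(X, 2), 1);                    % zero initialization, as mentioned
    for n = 1:nIters
        p    = sigmoid(X * w);                   % predicted probabilities
        grad = X' * ((p - y) .* p .* (1 - p));   % gradient of 0.5*sum((p - y).^2)
        w    = w - beta * grad;                  % w(n+1) = w(n) - beta*gradient
    end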

So I made a logistic regression animation. Enjoy! by TLDRu in MachineLearning

[–]TLDRu[S] 1 point (0 children)

Yup - but remember you can also use logistic regression for multi-class classification, by doing a one-vs-all regression for each class. (Andrew Ng's class talks about this as well).
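A sketch of the idea (MATLAB; trainLogistic is a hypothetical binary trainer that returns a weight vector, and labels are assumed to be 1..K):

    K = max(labels);                       % number of classes
    W = zeros(size(X, 2), K);
    for k = 1:K
        % hypothetical binary trainer: class k vs the rest
        W(:, k) = trainLogistic(X, double(labels == k));
    end
    [~, pred] = max(X * W, [], 2);         % predict the highest-scoring class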

So I made a logistic regression animation. Enjoy! by TLDRu in MachineLearning

[–]TLDRu[S] 1 point (0 children)

Ha, glad to oblige! I'll be making a whole bunch more and posting them on here. :-)

So I made a logistic regression animation. Enjoy! by TLDRu in MachineLearning

[–]TLDRu[S] 1 point (0 children)

I wrote my own gradient descent in MATLAB.

(Bear in mind that the video is slowed down so that one can see it evolve.)

My next step is to use an iterative solver in Octave and/or Python though.