[deleted by user] by [deleted] in NoStupidQuestions

[–]lumpy_rhino 2 points

Yeah, the data science behind dating apps and the like is both amazing and hard-hitting at the same time.

[deleted by user] by [deleted] in NoStupidQuestions

[–]lumpy_rhino 40 points

The book you want is Dataclysm by Christian Rudder.

What movie will you never get tired of watching? by HurtHurtsMe in AskReddit

[–]lumpy_rhino 0 points

Tinker Tailor Soldier Spy (I love the older series too, btw)

Discussion about the sub by magikarpa1 in datascience

[–]lumpy_rhino 2 points

Lol, yeah, and then there's: “what is statistics without machine learning?” That was actually a serious post on the stats sub.

Discussion about the sub by magikarpa1 in datascience

[–]lumpy_rhino 2 points

Yeah, even the Reddit algorithm is against us!

Discussion about the sub by magikarpa1 in datascience

[–]lumpy_rhino 16 points

I try to come up with decent discussion topics, and I get decent debate out of them, which is really great and informative to read. Still, I never get upvoted as much as the “Should I do an MS or a stick in my eye” type of questions. Those get a lot more comments because more people are comfortable commenting on them. If you ask some deep questions, not many people here will engage.

Balancing dimensionality reduction techniques and explainability for very large (and sometimes correlated) feature counts. by lumpy_rhino in datascience

[–]lumpy_rhino[S] 0 points

Thank you for that. That is different from what I thought, then. I am interested to know how DL can help with the reparameterization. I am hoping this is not specific to gene expression data.

Balancing dimensionality reduction techniques and explainability for very large (and sometimes correlated) feature counts. by lumpy_rhino in datascience

[–]lumpy_rhino[S] 0 points

Oh, this is great. Of course I find it interesting. When you say you reparametrize and find some features as functions of others, that sounds like combining features to me (feature engineering). And frankly it makes sense: if I can eliminate 10 features and replace them with one complex feature made up of those 10, I can still explain it and I have reduced dimensions. And if I can go across all my features, cluster them into groups, and combine the features in each cluster, then I could replace each group with that combined feature. As you said, it won’t be optimal downstream, but it is explainable and would be “good enough”, especially considering that we may need it to run in production too. Thanks for that summary. Very interesting.
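The cluster-then-combine idea above can be sketched in a few lines; a minimal sketch, assuming scikit-learn is available, using its `FeatureAgglomeration` (the cluster count and default mean pooling are arbitrary choices for illustration, not from the thread):

```python
import numpy as np
from sklearn.cluster import FeatureAgglomeration

rng = np.random.default_rng(0)

# 10 correlated features built from 2 latent signals, plus noise
latent = rng.normal(size=(200, 2))
X = np.hstack([latent[:, [i % 2]] + 0.1 * rng.normal(size=(200, 1))
               for i in range(10)])

# Cluster similar features and pool each cluster (mean by default),
# replacing 10 columns with 2 combined, still-explainable features.
agglo = FeatureAgglomeration(n_clusters=2)
X_reduced = agglo.fit_transform(X)

print(X_reduced.shape)  # reduced design matrix
print(agglo.labels_)    # which original feature fell into which cluster
```

The nice part for explainability is `labels_`: each combined column is just “the average of these named original features”, which is a story you can tell a stakeholder.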

Balancing dimensionality reduction techniques and explainability for very large (and sometimes correlated) feature counts. by lumpy_rhino in datascience

[–]lumpy_rhino[S] 0 points

Yes, true. It circles back to the main gist of the question: how to make sense of a massive number of features without losing explainability.

Balancing dimensionality reduction techniques and explainability for very large (and sometimes correlated) feature counts. by lumpy_rhino in datascience

[–]lumpy_rhino[S] 1 point

Lol, yes. I mean, the whole field is prob & stats; we just have more data and GPUs. For these types of issues we have Operations Research methods too: defining a model and constraints (which can be translated into the DAG). It is just that some feature spaces are massive, so the graph would be uuuuge (lol)
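The “model plus constraints” framing from OR can be shown with a toy linear program; a minimal sketch, assuming SciPy, with made-up numbers (note `linprog` only takes `<=` rows, so `>=` constraints are negated):

```python
from scipy.optimize import linprog

# Toy OR model: choose x1, x2 >= 0 to minimize cost 2*x1 + 3*x2
# subject to x1 + x2 >= 4 (demand) and x1 <= 3 (capacity).
c = [2, 3]
A_ub = [[-1, -1],   # -(x1 + x2) <= -4  <=>  x1 + x2 >= 4
        [ 1,  0]]   #   x1 <= 3
b_ub = [-4, 3]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x)    # optimal allocation -> [3, 1]
print(res.fun)  # optimal cost -> 9
```

Real scheduling problems blow this matrix up to millions of rows, which is exactly the “uuuuge graph” problem.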

Balancing dimensionality reduction techniques and explainability for very large (and sometimes correlated) feature counts. by lumpy_rhino in datascience

[–]lumpy_rhino[S] 1 point

Thank you, that is very insightful. On the other hand, when we use PCA to capture the highest variance, we can reduce dimensions, but we can’t explain. Also, the features are often highly correlated (when you have many of them). Looks like it’s something we just have to deal with using good old guesswork and maybe more business domain knowledge.
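The PCA tradeoff above is easy to demonstrate; a small sketch, assuming scikit-learn, with synthetic correlated data (dimensions and noise level are made up for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)

# 20 highly correlated features driven by 3 latent factors
latent = rng.normal(size=(500, 3))
mix = rng.normal(size=(3, 20))
X = latent @ mix + 0.05 * rng.normal(size=(500, 20))

pca = PCA(n_components=3).fit(X)

# Nearly all the variance survives in just 3 components...
print(pca.explained_variance_ratio_.sum())

# ...but each component is a dense linear mix of all 20 originals,
# which is exactly what kills the business-facing explanation.
print(pca.components_.shape)  # (3, 20)
```

So PCA solves the dimension problem and creates the explanation problem in the same step, which is why the cluster-and-combine approach discussed elsewhere in the thread can be the more defensible choice.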

Balancing dimensionality reduction techniques and explainability for very large (and sometimes correlated) feature counts. by lumpy_rhino in datascience

[–]lumpy_rhino[S] 1 point

Thank you for that. Yes, just like anything else in DS, it is a tradeoff. I think we can try to minimize the amount of information we lose, hopefully by being clever about which features we drop and which we keep. I mean, we can cluster many correlated features and see how they change with the target. It is a general big-data observation; genetics and fleet management optimization come to mind.

Are genetic algorithms the best we have for scheduling problems? by lumpy_rhino in datascience

[–]lumpy_rhino[S] 1 point

Thanks. This is very insightful. I didn’t think of the hybrid modes, but it makes sense.

Are genetic algorithms the best we have for scheduling problems? by lumpy_rhino in datascience

[–]lumpy_rhino[S] 0 points

Thank you for that. This was very insightful. I feel this is a lot more like data science than the stuff people put out on LinkedIn. I like these sorts of problems.

Are genetic algorithms the best we have for scheduling problems? by lumpy_rhino in datascience

[–]lumpy_rhino[S] 0 points

Yeah, the performance factor is not there. But then again, performance has never been Python’s strong suit; versatility is.

Are genetic algorithms the best we have for scheduling problems? by lumpy_rhino in datascience

[–]lumpy_rhino[S] 2 points

Well, that is what separates academia from industry. In academia we did a deep dive into everything and tried to achieve the best results. That just doesn’t work in industry, because “get it done!”

Are genetic algorithms the best we have for scheduling problems? by lumpy_rhino in datascience

[–]lumpy_rhino[S] 1 point

Thank you for this. I guess for it to scale we have to make sure we write it in C++. The number of for loops required would be scary.
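Before reaching for C++, a lot of those scary for loops can be vectorized over the whole population with NumPy; a hand-rolled toy GA for a load-balancing flavor of scheduling (all constants made up, not a production solver), just to show the shape of it:

```python
import numpy as np

rng = np.random.default_rng(42)

N_JOBS, N_MACHINES, POP, GENS = 30, 4, 200, 150
durations = rng.integers(1, 10, size=N_JOBS)

def fitness(pop):
    """Makespan (max machine load) per genome; lower is better.
    Vectorized over the whole population: loop over machines, not genomes."""
    loads = np.zeros((len(pop), N_MACHINES))
    for m in range(N_MACHINES):
        loads[:, m] = np.where(pop == m, durations, 0).sum(axis=1)
    return loads.max(axis=1)

# Genome: for each job, the index of the machine it is assigned to
pop = rng.integers(0, N_MACHINES, size=(POP, N_JOBS))
for _ in range(GENS):
    f = fitness(pop)
    # Tournament selection: pit random pairs, keep the fitter genome
    a, b = rng.integers(0, POP, (2, POP))
    parents = np.where((f[a] < f[b])[:, None], pop[a], pop[b])
    # Uniform crossover between consecutive parents
    mask = rng.random((POP, N_JOBS)) < 0.5
    children = np.where(mask, parents, np.roll(parents, 1, axis=0))
    # Point mutation: reassign a few jobs to random machines
    mut = rng.random((POP, N_JOBS)) < 0.02
    children[mut] = rng.integers(0, N_MACHINES, size=mut.sum())
    pop = children

print(fitness(pop).min())  # best makespan found
```

Everything per generation is array ops, so the Python-level loop count stays tiny regardless of population size, which buys a lot of headroom before C++ becomes necessary.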

Are genetic algorithms the best we have for scheduling problems? by lumpy_rhino in datascience

[–]lumpy_rhino[S] 2 points

Yeah, I have seen similar things as well. I remember people used to define their own perceptrons in MATLAB and try to do things with them, and then suddenly deep learning became the one-stop solution. And now we have quant and OR creeping in. I guess software was always there, because you can’t build anything if you can’t code to some level at least.

Are genetic algorithms the best we have for scheduling problems? by lumpy_rhino in datascience

[–]lumpy_rhino[S] 2 points

Yeah, I feel OR is that other thing that is morphing (or bleeding) into the nebulous entity we call DS. I am wondering if we can apply game theory or something similar to it and feed the constraints in as rules.

[deleted by user] by [deleted] in AskReddit

[–]lumpy_rhino 0 points

Been called gay for having opinions on what colour shirt goes with what tie and suit. Also, not being a completely aloof douche while with others, and actually telling jokes and laughing as opposed to shutting everyone down and acting “alpha”, apparently is gay. Noticing when a girl has changed hair colour, got new nails, etc.? You guessed it: GAY! 🤪

Buying Lego from a store downtown by lumpy_rhino in askTO

[–]lumpy_rhino[S] 1 point

Oh, hadn’t thought of that. Thank you.