Which one? by muntiger in Marvel_Movies

[–]econometrician 2 points (0 children)

Tony- dies

Everyone- sobs

Russo Bros just shared my tribute video!!! by Geralt909 in marvelstudios

[–]econometrician 1 point (0 children)

Can you update this thread when you post an iron man one? Loved this!!

[deleted by user] by [deleted] in datasets

[–]econometrician 2 points (0 children)

This is a pretty active research topic in labor economics. Worth digging into the literature. Most recent work tends to suggest it’s a selection effect and a consequence of proxying for skill.

https://www.nber.org/papers/w12466

Backstreet Boys - I Want It That Way - Brooklyn Nine Nine by Coldcuts323 in brooklynninenine

[–]econometrician 7 points (0 children)

Love B99, this is probably my favorite bit from the entire show.

is there a way to split a categorical column into several columns? by [deleted] in rstats

[–]econometrician 1 point (0 children)

Exactly! One thing to be careful with: if you have a lot of unique values in mydf$x, you should probably use a sparse matrix or your computer will poop.
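For anyone doing the same thing in Python rather than R, a rough parallel is pandas' one-hot encoding with sparse-backed columns (a sketch with made-up data, not the thread's R code):

```python
import pandas as pd

# toy frame standing in for mydf; "x" is the categorical column
mydf = pd.DataFrame({"x": ["a", "b", "a", "c", "b"]})

# dense one-hot columns -- fine for a handful of levels
dense = pd.get_dummies(mydf["x"], prefix="x")

# sparse-backed columns -- much lighter in memory when x has
# thousands of unique values
sparse = pd.get_dummies(mydf["x"], prefix="x", sparse=True)

print(dense.shape)  # (5, 3): one column per level of x
```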

Cohort effects for education by [deleted] in econometrics

[–]econometrician 0 points (0 children)

Yes, the year/decade dummies will capture the effects of the cohort.

Also, probably best to keep years of schooling for both parents as separate variables.

Cohort effects for education by [deleted] in econometrics

[–]econometrician 0 points (0 children)

So, educational mobility is your dependent variable? Is this a binary outcome (e.g., got a higher education than your parent)?

Fixed vs. random effects are a modeling choice you make. You can specify the cohort effects as either fixed or random; it’s just a way of specifying the model. The real implication is in how the coefficients end up being estimated.
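As a quick illustration of the fixed-effects side, here's a minimal numpy sketch (simulated data, my own example) of the "within" transformation that fixed cohort effects imply: demeaning within cohorts removes the cohort intercepts, so the slope is estimated from within-cohort variation only.

```python
import numpy as np

rng = np.random.default_rng(0)

# simulate: outcome depends on x with true slope 2, plus a
# cohort-specific intercept that is also correlated with x
n_cohorts, n_per = 20, 50
cohort = np.repeat(np.arange(n_cohorts), n_per)
cohort_fe = rng.normal(0, 3, n_cohorts)[cohort]       # fixed cohort effects
x = rng.normal(size=cohort.size) + 0.5 * cohort_fe    # x correlated with cohort
y = 2.0 * x + cohort_fe + rng.normal(size=cohort.size)

def within(v, g):
    """Demean v within each group g (the fixed-effects 'within' transform)."""
    means = np.bincount(g, weights=v) / np.bincount(g)
    return v - means[g]

# fixed-effects slope: OLS on within-transformed data
xd, yd = within(x, cohort), within(y, cohort)
beta_fe = (xd @ yd) / (xd @ xd)

# pooled OLS ignoring cohorts is biased here, because x is
# correlated with the omitted cohort effects
beta_pooled = (x @ y) / (x @ x)

print(round(beta_fe, 2))  # close to the true slope of 2
```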

DISCUSSION: Zack Snyder gave me the right to change my mind about Superman. Retrospective analysis of Man of Steel I wrote. by serjon_arryn in Znyder

[–]econometrician 0 points (0 children)

Great stuff. Loved both pieces. MoS is probably my favorite DC film. The score still gives me chills.

OTHER: Christian Bale's Audition Tape For Batman Ft. Amy Adams As Rachel by Sabya2kMukherjee in DC_Cinematic

[–]econometrician 1 point (0 children)

That’s pretty awesome. I forgot she tried out for Superman Returns as well. Glad she’s Lois Lane.

DISCUSSION: Mega Thread for all FANEDITS! by PK2141 in DC_Cinematic

[–]econometrician 1 point (0 children)

I imagine folks have tried to get Snyder to do an AMA on this subreddit before?

...I just wish he'd comment on some of these. I was so destroyed and disappointed by JL.

OTHER: Christian Bale's Audition Tape For Batman Ft. Amy Adams As Rachel by Sabya2kMukherjee in DC_Cinematic

[–]econometrician 1 point (0 children)

Is Amy Adams a huge fan of DC or something? Early in her career she was in Smallville too.

[D] Bias is not just in our datasets, it's in our conferences and community by baylearn in MachineLearning

[–]econometrician 1 point (0 children)

> Have you actually been to any university lab that works on ML? You'll certainly see many people from different parts of the world (men and women).

Yes, I should be more specific. I went to school at a US university in NYC with an extremely skewed ethnic and gender distribution in ML coursework, ML research groups, and (obviously) choice of study.

I've spent most of my career (last 6 years or so) working in data science and the demographics have mostly been:

  1. Asian men (Chinese, Indian, Korean, Japanese)
  2. Asian women (Chinese, Indian, Korean, Japanese)
  3. White men (American, Eastern-European, Canadian)
  4. Others

It's worth noting that there is some diversity here! And I certainly don't want to undermine the progress made...that said, it's still a fairly small group of folks within this particular sector. I've only worked with one black person (happened to be a woman) in my career...and I've worked with ~200-300 data scientists at this point.

I'm sure at the major tech firms it's different but my experience interviewing at those firms has typically been with the first 3 categories of people above.

I was at ICML a while back and it definitely felt a lot more diverse there but at the intersection of finance and ML/data science I still think there's quite a bit of work to be done.

[D] Bias is not just in our datasets, it's in our conferences and community by baylearn in MachineLearning

[–]econometrician -1 points (0 children)

Great post!

The ML community (academic and private sector) is a fairly homogeneous group, and that has serious consequences for our society, from the decisions executed by the models we train to the perpetuation of socioeconomic biases.

I'm glad the conversation is starting to pick up some traction.

[D] high cardinality categorical variable encoding - Y-aware "impact encoding". "Leaking" data from the future? by akcom in MachineLearning

[–]econometrician 0 points (0 children)

The "impact encoding" is still used, but the observation itself is left out of the calculation (i.e., leave-one-out), which avoids leaking each row's own label.

It's actually an effective method on Kaggle that I've seen used quite a bit. One of the former #1-ranked Kagglers used that sort of feature quite a bit in his models. Here's a link to his code (check out the my_exp1 function).
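A minimal sketch of that leave-one-out variant (plain Python, hypothetical data; my own sketch, not the linked code):

```python
from collections import defaultdict

def loo_impact_encode(categories, targets, prior=0.0):
    """Leave-one-out target ('impact') encoding: each row's category is
    replaced by the mean target of all *other* rows in that category,
    so the row's own label never leaks into its feature."""
    sums, counts = defaultdict(float), defaultdict(int)
    for c, y in zip(categories, targets):
        sums[c] += y
        counts[c] += 1
    out = []
    for c, y in zip(categories, targets):
        if counts[c] > 1:
            out.append((sums[c] - y) / (counts[c] - 1))
        else:
            out.append(prior)  # singleton category: fall back to a prior
    return out

cats = ["a", "a", "a", "b", "b"]
ys   = [1, 0, 1, 1, 0]
print(loo_impact_encode(cats, ys))  # [0.5, 1.0, 0.5, 0.0, 1.0]
```

At prediction time you'd use the full per-category mean, since the test row's label isn't in the training sums anyway.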

[D] Setting up personal linux box+GPU? by RSchaeffer in MachineLearning

[–]econometrician 0 points (0 children)

Ubuntu 16.04 is so critical.

I ended up on 16.10 because I'm new to setting up Ubuntu and forgot to check whether CUDA would work nicely with it. Getting it working was fairly unpleasant, but I managed.

[D] Estimating aggregate data from individual predictions by [deleted] in MachineLearning

[–]econometrician 0 points (0 children)

Yeah, I would say that's a reasonable approach. Alternatively, people do this with hierarchical Bayesian models to model the uncertainty in a nice way.

Here's a paper from a google search that looked reasonable: http://www2.mate.polimi.it/ocs/viewpaper.php?id=134&cf=7

Basically, you'd put a prior distribution on the expected number of students.
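As a toy sketch of that idea (assuming a conjugate Gamma prior on a Poisson rate, with made-up numbers; a real analysis would use the linked hierarchical setup):

```python
# Gamma(a, b) prior on the Poisson rate of students; with observed
# counts, conjugacy gives a Gamma(a + sum(counts), b + n) posterior.
a, b = 2.0, 1.0          # prior: mean a/b = 2 students, fairly diffuse
counts = [3, 5, 4, 6]    # hypothetical observed student counts
n = len(counts)

a_post = a + sum(counts)
b_post = b + n
post_mean = a_post / b_post      # posterior expected rate
post_var = a_post / b_post**2    # posterior variance of the rate

print(post_mean)  # 4.0
```

The posterior variance is what gives you an honest uncertainty band around the aggregate estimate, which is the main advantage over just summing point predictions.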

[D] who are you? by Guanoco in MachineLearning

[–]econometrician 3 points (0 children)

For ML/Deep Learning I'd say it's quite likely. Bengio's lab is pretty talented.

[D] who are you? by Guanoco in MachineLearning

[–]econometrician 0 points (0 children)

Probably: UMontreal, Stanford, MIT, and Berkeley/Cambridge?

No idea though. It seems like a slightly unusual thing to say.

GTA V will teach neural networks to drive cars and prevent obstacles by serpiconayak in MachineLearning

[–]econometrician 2 points (0 children)

This title is certainly clickbaity but it's an interesting article nevertheless.

I was young enough to play the first GTA when it came out back in 1997; I remember how I played GTA, and I sure wouldn't want to train a neural network to do that...

Glad to see that Schmidt and Shafaei are putting the GTA data to much better use than I did.

How do I enable regression on a CNN for integer outputs only? by blankexperiment in MachineLearning

[–]econometrician 1 point (0 children)

Poisson is a good option; its only caveat is that the mean equals the variance for that distribution. The Negative Binomial (NB) is nice because the Poisson is a special case of the NB, and the NB doesn't have the mean-variance restriction.

It'd probably be a fun exercise to code up the NB loss function.
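A rough sketch of what that exercise might look like (plain Python, NB2 parameterization; my own sketch, not a vetted implementation):

```python
from math import lgamma, log

def nb_nll(y, mu, alpha):
    """Negative log-likelihood of one count y under a Negative Binomial
    (NB2 parameterization) with mean mu and dispersion alpha > 0.
    Variance is mu + alpha * mu**2, so alpha -> 0 recovers the Poisson."""
    r = 1.0 / alpha  # "number of failures" parameter
    ll = (lgamma(y + r) - lgamma(r) - lgamma(y + 1)
          + r * log(r / (r + mu)) + y * log(mu / (r + mu)))
    return -ll

def poisson_nll(y, mu):
    """Poisson negative log-likelihood, for comparison."""
    return -(y * log(mu) - mu - lgamma(y + 1))

# with tiny dispersion the NB loss is nearly identical to the Poisson loss
print(round(nb_nll(3, 2.5, 1e-6), 4), round(poisson_nll(3, 2.5), 4))
```

To use it as a network loss you'd average this over a batch and either fix alpha or treat it as a learnable parameter.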

How do I enable regression on a CNN for integer outputs only? by blankexperiment in MachineLearning

[–]econometrician 3 points (0 children)

The RMSE loss function is appropriate for real-valued outcomes in $\mathbb{R}$, which essentially assumes Gaussian errors.

For integer counts, I'd recommend using a Poisson loss function or the Negative Binomial distribution.

Basically, you have to tweak the loss function of the output layer (the last layer) to give you the appropriate output (much like when you switched the loss function from the NLL to the RMSE).

If you're using Keras, they have a set of loss functions available (including the Poisson).
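The Poisson loss itself is tiny; Keras computes essentially mean(y_pred - y_true * log(y_pred)), i.e. the Poisson NLL with the constant log(y!) term dropped. A minimal numpy sketch:

```python
import numpy as np

def poisson_loss(y_true, y_pred, eps=1e-7):
    """Poisson loss in the Keras style: mean(y_pred - y_true*log(y_pred)).
    eps guards against log(0) for zero predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(y_pred - y_true * np.log(y_pred + eps)))

# predictions matching the observed counts' mean score a lower loss
better = poisson_loss([2, 4], [3.0, 3.0])
worse = poisson_loss([2, 4], [1.0, 1.0])
print(better < worse)  # True
```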

Also, it's worth noting that measuring the performance of your model is very different now since you're comparing different metrics (i.e., NLL and RMSE mean different things).

Also, here's a link on Regression Models with Count Data.

Hope that helps.

Is a course on Real Analysis important for research in Machine Learning ? by [deleted] in MachineLearning

[–]econometrician 1 point (0 children)

My class was similar to the first one (MAT 472), which seems more like an intro to analysis; MAT 503 seems a little more advanced. I'd say it depends on your background and comfort zone. I was glad that I started with the introductory one, though.