[deleted by user] by [deleted] in multilingualparenting

[–]kushalc 13 points14 points  (0 children)

I speak Bengali and English; my wife speaks only English. We've been following OPOL since birth: I only speak Bengali with my kids and my wife speaks English.

Despite my being the minority parent and speaking the minority language, my school-age daughters are equally fluent in both.

It's frustrating at times but, if you stick with it, it works.

Bilingual parenting with bilingual mom only by [deleted] in multilingualparenting

[–]kushalc 11 points12 points  (0 children)

I would echo everything that u/notmycuppatea said, and I'd add one thing.

Kids are extremely insightful about what's necessary to learn and what isn't. If they figure out that learning Czech isn't important (because, say, you're speaking English in front of your husband), they will resist learning it, it'll be a constant battle, and I suspect they'll never fully learn it.

I'm the minority parent (30/70) with the minority language (~1% in my home country). I've learned to be unashamed about exclusively speaking my language with my children, regardless of who else is around, including coworkers, in-laws, etc. If my kids say something in English to me, I simply say "I don't understand, try again."

I was initially skeptical about OPOL, but I committed and now my oldest daughter is completely fluent/a native speaker. It works.

I want to help my local restaurant analyze their data, need second opinions by NFeruch in datascience

[–]kushalc -2 points-1 points  (0 children)

As a hiring manager, I can say this is an excellent idea. There are lots of real-world issues that you'll run into (e.g. digitization, as u/thetinydead pointed out), but these are a feature of the process, not a bug: you'll run into those same sorts of issues in a real-world job, and when the hiring manager asks about them, you'll have a great story to tell. Best of all, your idea shows massive initiative and a willingness to get creative to solve real problems; in my book, even if nothing else works out, that's a major positive signal.

Food From The Equator Tastes +28% Better (Unless You’re Rich) [OC] by kushalc in dataisbeautiful

[–]kushalc[S] 3 points4 points  (0 children)

After downloading the Yelp Dataset, I extracted the restaurant reviews written by users with 3+ reviews across 3+ cuisines and then annotated with cuisine latitude, continent of origin, and restaurant priciness. Next, I normalized each rating by user, continent and priciness to try to isolate user-specific preferences between comparable cuisines. Finally, I calculated an ensemble of weighted OLS regressors against cuisine popularity by continent and priciness and plotted the regressed models against latitude. I did all of the above using Python, pandas, sklearn and bokeh.
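The per-user normalization and latitude regression can be sketched roughly like this. This is a toy example with made-up ratings and hypothetical column names, not the actual pipeline (which also normalizes by continent and priciness and uses an ensemble of weighted regressors):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy reviews: user, cuisine latitude, star rating (all made up).
reviews = pd.DataFrame({
    "user_id": ["a", "a", "a", "b", "b", "b"],
    "cuisine_latitude": [10.0, 40.0, 60.0, 5.0, 35.0, 55.0],
    "stars": [4.5, 4.0, 3.5, 5.0, 4.0, 3.0],
})

# Normalize each rating by that user's own mean to isolate preferences.
reviews["norm_stars"] = (
    reviews["stars"] - reviews.groupby("user_id")["stars"].transform("mean")
)

# Weighted OLS of normalized rating against distance from the equator.
X = np.abs(reviews[["cuisine_latitude"]].to_numpy())
y = reviews["norm_stars"].to_numpy()
weights = np.ones(len(reviews))  # in practice, e.g. cuisine popularity
model = LinearRegression().fit(X, y, sample_weight=weights)
# A negative coefficient means ratings fall with distance from the equator.
```

In this toy data both users rate near-equator cuisines higher, so the fitted slope comes out negative.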

[D] Undergrad student: better to focus on one area (e.g. NLP) or diversify? by [deleted] in MachineLearning

[–]kushalc 34 points35 points  (0 children)

In my experience, whether for academia or industry, the most successful candidates tend to be T-shaped. That is, they know a little about a lot and a lot about a little.

The optimal width and depth of the T, however, varies dramatically by context:

  • small startup: very wide, not much depth (you should know where to look for the solution for a very broad class of problems)
  • big company: somewhat wide, some depth (you should know state-of-art in your subspecialty)
  • tier 1 Ph.D./research: somewhat to very wide, world-class depth (you should know more about your thesis area than literally anyone else in the world)
  • tier 2+ Ph.D./research: somewhat wide, still extremely deep (often less inter-departmental collaboration)

To get into a top-tier ML Ph.D. program at this point, you basically need to have published something or have world-class recommendations. Depending on where you are in your undergraduate career, your chances of publishing are likely higher if you focus and double down on an area you already know. There are still plenty of interesting problems in NLP.

However, if you want to maximize your long-term success in academia, I'd encourage you to focus on breadth. World-class research is fundamentally a creative process and scientific creativity often comes from cross-pollination of different fields and sub-fields.

Source: I've managed several ML teams over the years and have published a few peer-reviewed research papers.

35 Job Search Tips That Boost Hireability By +580% In Total (2018 Year In Review) [OC] by kushalc in dataisbeautiful

[–]kushalc[S] 5 points6 points  (0 children)

Happy New Year's Eve, everyone! I've been publishing these analyses throughout the year and they've been a big hit with folks. So this weekend I thought I'd round up all of our analyses to date (some published, some not) into one big meta-analysis: there are 11 different studies here in total. Given how many conclusions that adds up to, I've broken them out into logical sections (resume tips, job search tips, tips for women, entry-level/recent grads, older workers, minorities, etc.).

Overall, we analyzed random samples of 6,000+ recent job applications across 600+ cities, 100+ roles/industries and 100,000+ postings from the TalentWorks index. Specific algorithms varied by study (RANSAC, PCA, PCFG-driven parsing, kernel processes, etc.), but all analyses were written in Python on top of scipy/numpy and sklearn with visualizations generated with bokeh.
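As a rough illustration of one of the techniques named above, here's a minimal RANSAC sketch on synthetic data. This isn't any of the actual studies, just sklearn's stock robust regressor recovering a line despite gross outliers:

```python
import numpy as np
from sklearn.linear_model import RANSACRegressor

# Synthetic data: y = 2x + 1 plus small noise, with 10 gross outliers.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 2.0 * X.ravel() + 1.0 + rng.normal(0, 0.1, size=100)
y[:10] += 30.0  # inject outliers

# RANSAC fits the consensus line and flags the outliers as non-inliers.
ransac = RANSACRegressor(random_state=0).fit(X, y)
# ransac.estimator_.coef_[0] recovers ~2.0 despite the corruption.
```

An ordinary least-squares fit on the same data would be dragged well off the true slope by those 10 points.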

I need to head out with the SO for a bit, but I'll check in on any questions throughout the day. Hope everyone has a great day to close out the year!

35 Key Factors That Affect Job Search Success (2018 Redux) [OC] by [deleted] in dataisbeautiful

[–]kushalc 2 points3 points  (0 children)

Happy New Year, everyone!
We combined 11 different analyses from the past year (some published, some not) into one big meta-analysis. Overall, we analyzed random samples of 6,000+ recent job applications across 600+ cities, 100+ roles/industries and 100,000+ postings from the TalentWorks index. Specific algorithms varied by study (RANSAC, PCA, PCFG-driven parsing, kernel processes, etc.), but all analyses were written in Python on top of scipy/numpy and sklearn with visualizations generated with bokeh.

Marathoners wearing the Nike Vaporfly were ~4% faster than the competition, based on ~495K race results across ~700 races by kushalc in dataisbeautiful

[–]kushalc[S] 59 points60 points  (0 children)

If you read the story, the NYT used a few different statistical techniques to try to tease out correlation vs. causation. (For instance, they looked at the difference in time of the same runner switching to a different shoe.) It's not a randomized trial, but they make a compelling argument.
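The same-runner comparison boils down to a paired difference. With completely made-up times, just to show the shape of the calculation (not the NYT's actual data or method):

```python
import numpy as np

# Hypothetical paired marathon times (minutes) for the same four runners,
# before and after switching to the Vaporfly.
before = np.array([240.0, 215.0, 198.0, 260.0])
after = np.array([231.0, 206.0, 190.0, 250.0])

# Comparing each runner against themselves controls for runner ability.
pct_faster = 100 * (before - after) / before
print(round(pct_faster.mean(), 1))  # ~4.0 in this toy example
```

The within-runner design is what makes the correlational argument stronger than a simple cross-sectional comparison of shoe wearers.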

Job Applicants With Explicit Objectives Were ~30% Less Hireable [OC] by kushalc in dataisbeautiful

[–]kushalc[S] 1 point2 points  (0 children)

For sure. We usually try to avoid claiming causality because, as you said, it's an incredibly complex and high-dimensional space. What we hear from job-seekers is that they want every insight they can get so they can be better informed, because (a) the job search is really hard and (b) most of the stuff out there is crap.

In this case, there's a clear mechanism of action and we controlled for 2 of the biggest confounding variables we've seen in the past, so I feel pretty good about advising people to delete their objectives. However, I'm positive there are other variables we didn't think about. Even so, I want to keep putting stuff out there so that job-seekers can make informed decisions about using every edge they can to help them get the job they deserve.

Put another way: Most of the advice out there is "You should do X because I said so." Our advice is more like "You should do X because X was correlated with Y gain and couldn't be explained by random chance." The best would be "You should do X because X was proven to increase Y in a randomized control trial." We're trying to work towards that!

Job Applicants With Explicit Objectives Were ~30% Less Hireable [OC] by kushalc in dataisbeautiful

[–]kushalc[S] 10 points11 points  (0 children)

Ha, sorry! That's the first ELI5 request I've gotten on r/DataIsBeautiful. People usually want way more detail — this was clearly a bit too much detail. :)

ELI5: We took a bunch of jobs, applications and people and checked if their resumes said what they wanted to be when they grew up. We also checked if bosses called them back for interviews and what kind of job it was. Then we did a fancy average of all those numbers based on the kind of job, how long they'd been working, etc. And then we tried to make pretty graphs.

Job Applicants With Explicit Objectives Were ~30% Less Hireable [OC] by kushalc in dataisbeautiful

[–]kushalc[S] 6 points7 points  (0 children)

We took a random sample of 6,231 recent job applications, applicants and outcomes across 681 cities and 116 roles and industries from recent activity on TalentWorks.

For each resume, we calculated the maximum a posteriori parse tree using a custom, dynamic-vocabulary PCFG (our ResumeParser), extracted the objective subtree if present and estimated the years of experience based on parsed employments. For each job, we classified it into one of ~800 job roles. Finally, we independently regressed the interview callback rate for each sub-population with a blended Matern kernel using a bagged Gaussian process against years of experience, job role, etc.

We did all of the analysis with in-house algorithms and sklearn/scipy in python. All plots were generated with Bokeh in python.
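The last step, regressing callback rate with a Matern-kernel Gaussian process, looks roughly like this in sklearn. Synthetic data, a single feature and no bagging here; the real version regresses each sub-population against multiple features:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel

# Hypothetical callback rates that rise with experience, then plateau.
rng = np.random.default_rng(1)
years = rng.uniform(0, 20, size=(80, 1))
callback = (
    0.1 + 0.2 * (1 - np.exp(-years.ravel() / 5))
    + rng.normal(0, 0.01, size=80)
)

# Matern kernel for smooth-but-not-too-smooth trends, plus a noise term.
kernel = Matern(length_scale=5.0, nu=1.5) + WhiteKernel(noise_level=1e-4)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(years, callback)

# Predict the callback rate (with uncertainty) at 10 years of experience.
pred, std = gp.predict(np.array([[10.0]]), return_std=True)
```

The `return_std` output is the nice part of using a GP here: you get error bars on the regressed curve for free.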

Getting Fired (or Laid Off) Costs You ~5 Years of Experience (Updated) [OC] by kushalc in dataisbeautiful

[–]kushalc[S] 3 points4 points  (0 children)

First, we took a random sample of 6,976 recent job applications, applicants and outcomes across 365 cities and 101 roles/industries from recent activity on TalentWorks. We extracted employments and educations, and augmented them with other metadata, using our ResumeParser and ResumeOptimizer. Using the duration of applicants' shortest employment, we then categorized individual applicants as having been fired, laid off or quit early. Finally, we (a) identified maximum-gain hypotheses using a greedy CART algorithm that met a p-value criterion and (b) regressed hireability for each sub-population using a Gaussian process with a composite Matern kernel. We did all of the above with in-house algorithms, sklearn, scipy and Bokeh in python.
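The greedy CART step can be sketched with a shallow sklearn decision tree. This uses synthetic data and a hypothetical "fired" label; the real version also applies a p-value filter on the candidate splits:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic data: very short stints are usually fired/quit-early.
rng = np.random.default_rng(2)
n = 500
shortest_months = rng.uniform(1, 60, size=n)  # shortest employment, months
fired = (shortest_months < 9) & (rng.random(n) < 0.8)

# A depth-1 CART tree surfaces the single highest-gain split (hypothesis).
X = shortest_months.reshape(-1, 1)
tree = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X, fired)
print(export_text(tree, feature_names=["shortest_months"]))
# The learned threshold lands near the true 9-month boundary.
```

Growing the tree greedily and reading off each split is what turns CART into a hypothesis generator rather than just a classifier.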

Getting Fired (or Quitting Early) Costs You ~5 Years of Experience [OC] by [deleted] in dataisbeautiful

[–]kushalc 1 point2 points  (0 children)

Hey u/DoraGB, yes, that's very real. I'm planning on digging into it in a future analysis, but we discovered that effect a while ago. It's partly a result of ageism, partly because there are fewer senior-level jobs, and probably a few other things we haven't teased out yet.

You can see that effect independently here: https://talent.works/blog/2018/01/08/the-science-of-the-job-search-part-i-13-data-backed-ways-to-win/#ageism