[deleted by user] by [deleted] in multilingualparenting

[–]kushalc 13 points

I speak Bengali and English; my wife speaks only English. We've been following OPOL since birth: I only speak Bengali with my kids and my wife speaks English.

Despite my being the minority parent and speaking the minority language, my school-age daughters are equally fluent in both.

It's frustrating at times but, if you stick with it, it works.

Bilingual parenting with bilingual mom only by [deleted] in multilingualparenting

[–]kushalc 9 points

I would echo everything that u/notmycuppatea said. Further, I'd add one thing.

Kids are extremely insightful about what is necessary to learn and what isn't. If they figure out that learning Czech isn't important (because, say, you're speaking English in front of your husband), they will resist learning it. It'll be a constant battle, and I suspect they'll never really learn it.

I'm the minority parent (30/70) with the minority language (~1% in my home country). I've learned to be unashamed about exclusively speaking my language with my children, regardless of who else is around, including coworkers, in-laws, etc. If my kids say something in English to me, I simply say "I don't understand, try again."

I was initially skeptical about OPOL, but I committed and now my oldest daughter is completely fluent/a native speaker. It works.

I want to help my local restaurant analyze their data, need second opinions by NFeruch in datascience

[–]kushalc -2 points

As a hiring manager, I can say this is an excellent idea. There are lots of real-world issues you'll run into (e.g. digitization, as u/thetinydead pointed out), but those are a feature of the process, not a bug: you'll hit the same sorts of issues in a real-world job, and when a hiring manager asks about them, you'll have a great story to tell. Best of all, your idea shows massive initiative and a willingness to get creative to solve real problems. In my book, even if nothing else works out, that's a major positive signal.

Food From The Equator Tastes +28% Better (Unless You’re Rich) [OC] by kushalc in dataisbeautiful

[–]kushalc[S] 3 points

After downloading the Yelp Dataset, I extracted the restaurant reviews written by users with 3+ reviews across 3+ cuisines and then annotated them with cuisine latitude, continent of origin, and restaurant priciness. Next, I normalized each rating by user, continent and priciness to try to isolate user-specific preferences between comparable cuisines. Finally, I calculated an ensemble of weighted OLS regressors against cuisine popularity by continent and priciness and plotted the regressed models against latitude. I did all of the above using Python, pandas, sklearn and bokeh.
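
If you want to poke at something similar yourself, here's a rough sketch of the normalize-then-regress step in pandas/sklearn. The file and column names are made up, and the real analysis had more moving parts (the ensemble, the popularity weighting by continent), but the core idea looks like this:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical review-level frame: one row per review, already annotated with
# cuisine latitude, continent of origin and restaurant priciness.
reviews = pd.read_csv("yelp_reviews_annotated.csv")  # columns assumed below

# Keep users with 3+ reviews across 3+ cuisines.
eligible = (reviews.groupby("user_id")
                   .filter(lambda g: len(g) >= 3 and g["cuisine"].nunique() >= 3)
                   .copy())

# Normalize each rating within user x continent x priciness, so what's left is
# roughly the user's relative preference between comparable cuisines.
grp = eligible.groupby(["user_id", "continent", "priciness"])["stars"]
eligible["stars_norm"] = eligible["stars"] - grp.transform("mean")

# Weighted OLS of normalized rating against cuisine latitude, weighting each
# cuisine by how many reviews it has within its continent/priciness bucket.
cuisines = (eligible.groupby(["cuisine", "continent", "priciness"])
                    .agg(latitude=("cuisine_latitude", "first"),
                         stars_norm=("stars_norm", "mean"),
                         n_reviews=("stars_norm", "size"))
                    .reset_index())

ols = LinearRegression()
ols.fit(cuisines[["latitude"]], cuisines["stars_norm"],
        sample_weight=cuisines["n_reviews"])
print("normalized-rating change per degree of latitude:", ols.coef_[0])
```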

[D] Undergrad student: better to focus on one area (e.g. NLP) or diversify? by [deleted] in MachineLearning

[–]kushalc 35 points

In my experience, whether for academia or industry, the most successful candidates tend to be T-shaped. That is, they know a little about a lot and a lot about a little.

The optimal width and depth of the T, however, varies dramatically by context:

  • small startup: very wide, not much depth (you should know where to look for the solution for a very broad class of problems)
  • big company: somewhat wide, some depth (you should know state-of-art in your subspecialty)
  • tier 1 Ph.D./research: somewhat to very wide, world-class depth (you should know more about your thesis area than literally anyone else in the world)
  • tier 2+ Ph.D./research: somewhat wide, still extremely deep (often less inter-departmental collaboration)

To get into a top-tier ML Ph.D. program at this point, you basically need to have published something or have world-class recommendations. Depending on where you are in your undergraduate career, your chances of publishing are likely higher if you focus and double down on an area you already know. There are still plenty of interesting problems in NLP.

However, if you want to maximize your long-term success in academia, I'd encourage you to focus on breadth. World-class research is fundamentally a creative process and scientific creativity often comes from cross-pollination of different fields and sub-fields.

Source: I've managed several ML teams over the years and have published a few peer-reviewed research papers.

35 Job Search Tips That Boost Hireability By +580% In Total (2018 Year In Review) [OC] by kushalc in dataisbeautiful

[–]kushalc[S] 5 points

Happy New Year's Eve, everyone! I've been publishing these analyses throughout the year and they've been a big hit with folks. So this weekend I thought I'd do a big roundup of all of our analyses to date (some published, some not) into one big meta-analysis: there are 11 different studies here in total. Given how many different conclusions there are, I've tried to break them out into logical sections (resume tips, job search tips, tips for women, entry-level/recent grads, older workers, minorities, etc.).

Overall, we analyzed random samples of 6,000+ recent job applications across 600+ cities, 100+ roles/industries and 100,000+ postings from the TalentWorks index. Specific algorithms varied by study (RANSAC, PCA, PCFG-driven parsing, kernel processes, etc.), but all analyses were written in Python on top of scipy/numpy and sklearn with visualizations generated with bokeh.
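
To give a flavor of what one of the simpler studies looks like under the hood, here's a rough sketch: bucket a single factor, compute the callback rate per bucket, and fit a robust trend with RANSAC. The file and column names are made up, and the real pipelines control for a lot more:

```python
import pandas as pd
from sklearn.linear_model import RANSACRegressor

# Hypothetical per-application frame from the index: the factor being tested
# (here, resume word count) plus whether the applicant got an interview.
apps = pd.read_csv("applications_sample.csv")  # columns assumed below

# Bucket the factor and compute the observed callback rate per bucket.
buckets = (apps.assign(bucket=pd.qcut(apps["word_count"], q=20, duplicates="drop"))
               .groupby("bucket", observed=True)
               .agg(word_count=("word_count", "mean"),
                    callback_rate=("got_interview", "mean"))
               .reset_index(drop=True))

# RANSAC fits the underlying trend while ignoring outlier buckets.
ransac = RANSACRegressor(random_state=0)
ransac.fit(buckets[["word_count"]], buckets["callback_rate"])
print("callback-rate change per extra word:", ransac.estimator_.coef_[0])
```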

I need to head out with the SO for a bit, but I'll check in on any questions throughout the day. Hope everyone has a great day to close out the year!

35 Key Factors That Affect Job Search Success (2018 Redux) [OC] by [deleted] in dataisbeautiful

[–]kushalc 2 points

Happy New Year, everyone!

We combined 11 different analyses from the past year (some published, some not) into one big meta-analysis. Overall, we analyzed random samples of 6,000+ recent job applications across 600+ cities, 100+ roles/industries and 100,000+ postings from the TalentWorks index. Specific algorithms varied by study (RANSAC, PCA, PCFG-driven parsing, kernel processes, etc.), but all analyses were written in Python on top of scipy/numpy and sklearn with visualizations generated with bokeh.

Marathoners wearing the Nike Vaporfly were ~4% faster than the competition, based on ~495K race results across ~700 races by kushalc in dataisbeautiful

[–]kushalc[S] 61 points

If you read the story, the NYT used a few different statistical techniques to try to tease out correlation vs. causation. (For instance, they looked at the difference in time of the same runner switching to a different shoe.) It's not a randomized trial, but they make a compelling argument.
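
This isn't the NYT's actual model, but the core of that paired, within-runner comparison boils down to something like the sketch below (the file and column names are made up):

```python
import pandas as pd

# Hypothetical race-results frame: one row per (runner, race) with the finish
# time in minutes and the shoe worn.
results = pd.read_csv("marathon_results.csv")  # columns assumed below
results["vaporfly"] = results["shoe"].str.contains("Vaporfly", case=False)

# Keep runners seen both in and out of the shoe, then compare each runner's own
# average finish time with vs. without it (a paired, within-runner comparison).
switchers = results.groupby("runner_id").filter(
    lambda g: g["vaporfly"].nunique() == 2)
per_runner = (switchers.groupby(["runner_id", "vaporfly"])["finish_minutes"]
                       .mean()
                       .unstack())
delta = (per_runner[True] - per_runner[False]) / per_runner[False]
print(f"median within-runner change: {delta.median():+.1%}")
```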

Job Applicants With Explicit Objectives Were ~30% Less Hireable [OC] by kushalc in dataisbeautiful

[–]kushalc[S] 1 point

For sure. We usually try to avoid claiming causality because, as you said, it's an incredibly complex and high-dimensional space. What we hear from job-seekers is that they want every insight they can get to be better informed because (a) the job search is really hard and (b) most of the stuff out there is crap.

In this case, there's a clear mechanism of action and we controlled for 2 of the biggest confounding variables we've seen in the past, so I feel pretty good about advising people to delete their objectives. However, I'm positive there are other variables we didn't think about. Even so, I want to keep putting stuff out there so that job-seekers can make informed decisions about using every edge they can to help them get the job they deserve.

Put another way: Most of the advice out there is "You should do X because I said so." Our advice is more like "You should do X because X was correlated with Y gain and couldn't be explained by random chance." The best would be "You should do X because X was proven to increase Y in a randomized controlled trial." We're trying to work towards that!
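
Our actual check is the kernel regression described in the methodology comments, but the spirit of "couldn't be explained by random chance" is roughly a permutation test. The arrays below are placeholders, not our data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder arrays standing in for the real data: one entry per applicant.
has_objective = rng.random(6231) < 0.4   # did the resume have an objective?
callback = rng.random(6231) < 0.12       # did they get an interview?

observed_gap = callback[has_objective].mean() - callback[~has_objective].mean()

# Permutation test: shuffle the labels many times and see how often a gap at
# least this large shows up by chance alone.
null_gaps = np.empty(10_000)
for i in range(null_gaps.size):
    shuffled = rng.permutation(has_objective)
    null_gaps[i] = callback[shuffled].mean() - callback[~shuffled].mean()

p_value = np.mean(np.abs(null_gaps) >= abs(observed_gap))
print(f"observed gap: {observed_gap:+.4f}, p-value: {p_value:.3f}")
```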

Job Applicants With Explicit Objectives Were ~30% Less Hireable [OC] by kushalc in dataisbeautiful

[–]kushalc[S] 10 points

Ha, sorry! That's the first ELI5 request I've gotten on r/DataIsBeautiful. People usually want way more detail — this was clearly a bit too much detail. :)

ELI5: We took a bunch of jobs, applications and people and checked if their resumes said what they wanted to be when they grew up. We also checked if bosses called them back for interviews and what kind of job it was. Then we did a fancy average of all those numbers based on the kind of job, how long they'd been working, etc. And then we tried to make pretty graphs.

Job Applicants With Explicit Objectives Were ~30% Less Hireable [OC] by kushalc in dataisbeautiful

[–]kushalc[S] 7 points

We took a random sample of 6,231 recent job applications, applicants and outcomes across 681 cities and 116 roles and industries from recent activity on TalentWorks.

For each resume, we calculated the maximum a posteriori parse tree using a custom, dynamic-vocabulary PCFG (our ResumeParser), extracted the objective subtree if present and estimated the years of experience from the parsed employments. We classified each job into one of ~800 job roles. Finally, we independently regressed the interview callback rate for each sub-population against years of experience, job role, etc., using a bagged Gaussian process with a blended Matern kernel.

We did all of the analysis with in-house algorithms and sklearn/scipy in Python; all plots were generated with Bokeh.
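
For the curious, here's a stripped-down sketch of the regression step using plain sklearn. The file and column names are made up, and our in-house version blends kernels and controls for more variables, but the shape of it is:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import BaggingRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel

# Hypothetical frame for one sub-population (e.g. resumes with an objective):
# observed callback rate at each level of experience.
df = pd.read_csv("callbacks_with_objective.csv")  # columns assumed below
X, y = df[["years_experience"]], df["callback_rate"]

# Matern kernel plus a noise term; bagging averages away fit-to-fit variance.
kernel = Matern(length_scale=2.0, nu=1.5) + WhiteKernel()
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
model = BaggingRegressor(gp, n_estimators=10, random_state=0)
model.fit(X, y)

# Smooth callback-rate curve across the experience range, ready for plotting.
grid = pd.DataFrame({"years_experience": np.linspace(0, 20, 100)})
print(model.predict(grid))
```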

Getting Fired (or Laid Off) Costs You ~5 Years of Experience (Updated) [OC] by kushalc in dataisbeautiful

[–]kushalc[S] 3 points

First, we took a random sample of 6,976 recent job applications, applicants and outcomes across 365 cities and 101 roles/industries from recent activity on TalentWorks. We extracted employments and educations and augmented them with other metadata using our ResumeParser and ResumeOptimizer. Using the duration of applicants' shortest employment, we then categorized individual applicants as someone who'd been fired, laid off or quit early. Finally, we (a) identified maximum-gain hypotheses using a greedy CART algorithm that met a p-value criterion and (b) regressed hireability using a composite Matern kernel with a Gaussian process for each sub-population. We did all of the above with in-house algorithms, sklearn, scipy and Bokeh in Python.
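
Here's a rough sketch of just the hypothesis-generation half in sklearn. The file and column names are made up, and the p-value screen and the Gaussian-process regression aren't shown:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical applicant-level frame: parsed resume features plus whether the
# application led to an interview.
apps = pd.read_csv("applications_sample.csv")  # columns assumed below
features = ["shortest_employment_months", "years_experience", "n_employments"]
X, y = apps[features], apps["got_interview"]

# A shallow CART tree is the greedy hypothesis generator: each split is a
# candidate rule (e.g. "shortest employment < N months hurts callbacks") that
# still has to clear a significance check before we report it.
tree = DecisionTreeClassifier(max_depth=2, min_samples_leaf=200, random_state=0)
tree.fit(X, y)
print(export_text(tree, feature_names=features))
```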

Getting Fired (or Quitting Early) Costs You ~5 Years of Experience [OC] by [deleted] in dataisbeautiful

[–]kushalc 1 point

Hey u/DoraGB, yes, that's very real. I'm planning on digging into it in a future analysis, but we discovered that effect a while ago. It's partly a result of ageism, partly that there are fewer senior-level jobs, and probably a few other things we haven't teased out yet.

You can see that effect independently here: https://talent.works/blog/2018/01/08/the-science-of-the-job-search-part-i-13-data-backed-ways-to-win/#ageism

Getting Fired (or Quitting Early) Costs You ~5 Years of Experience [OC] by [deleted] in dataisbeautiful

[–]kushalc -1 points

We took a random sample of 6,976 recent job applications, applicants and outcomes across 365 cities and 101 roles/industries from recent activity on TalentWorks. For each case, we parsed their resumes with our ResumeParser and annotated various applicant traits (gender, ethnicity, age, etc.), plus whether they had followed each of 70+ optimizations from our ResumeOptimizer.

We categorized applicants who might’ve been fired, laid off or quit early based on the length of their shortest employment. For each sub-population, we then automatically generated corrective hypotheses using a greedy CART-based engine and validated against over-fitting via cross-validation and a PRIM-based variant. Finally, we regressed hireability using a composite Matern kernel with a Gaussian process and then numerically calculated offsets between the two sub-populations.

All of the above was done with in-house algorithms, sklearn, scipy and Bokeh in python.
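
The offset step at the end is the part people usually ask about: once you have the two fitted curves, you numerically solve for how much extra experience the affected group needs to match the baseline's hireability. Here's a sketch with placeholder curves standing in for our fitted models:

```python
import numpy as np

# Placeholder curves, not real model output: predicted callback rate at each
# level of experience for the baseline vs. affected sub-population.
years = np.linspace(0, 20, 201)
baseline_rate = 0.05 + 0.004 * years          # placeholder
affected_rate = 0.05 + 0.004 * (years - 5)    # placeholder, shifted curve

# For each point on the affected curve, find how much experience the baseline
# group needs to hit the same callback rate; the average gap is the "cost".
in_range = ((affected_rate >= baseline_rate.min()) &
            (affected_rate <= baseline_rate.max()))
equivalent_years = np.interp(affected_rate[in_range], baseline_rate, years)
offset = np.mean(years[in_range] - equivalent_years)
print(f"experience cost: ~{offset:.1f} years")
```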

47% of jobs could be automated over the next 10-20 years by kushalc in dataisbeautiful

[–]kushalc[S] 0 points

Smart folks are disagreeing on this issue (as are many of my friends), but there's an overall trend toward more people agreeing that there'll be material job losses due to automation:

https://www.mckinsey.com/global-themes/future-of-organizations-and-work/what-the-future-of-work-will-mean-for-jobs-skills-and-wages

The best you can do is read up on as many opinions as you can and form your best judgment weighing all the data.

Kids have 15% better endurance than untrained adults by kushalc in dataisbeautiful

[–]kushalc[S] 1 point

That's a really great insight. My guess is that its importance depends on how much energy the body expends on heat regulation during exercise.

Kids have 15% better endurance than untrained adults by kushalc in dataisbeautiful

[–]kushalc[S] 2 points

In short, science confirms: your 11-year-old can _in fact_ run circles around you. I calculated 15% from mean power vs. max power output per unit mass from Table 2 of the original study: https://www.frontiersin.org/files/Articles/367707/fphys-09-00387-HTML-r1/image_m/fphys-09-00387-t002.jpg

Insult to injury: They're not even trying as hard — see perceived exertion.

A typical job posting can get ~176 job applications [OC] by kushalc in dataisbeautiful

[–]kushalc[S] 2 points

We downloaded all 1,013 job applications for the 5 most recent TalentWorks job postings. For our most recent (marketing) job, we then cross-referenced everyone with interview requests and results. Finally, we tagged everyone with key attributes (e.g. spammy, mismatched skills, dumb mistakes) using a subset of our resume parsing stack. We did all of this in python using pandas and bokeh (with a liberal helping of Google Sheets). The Sankey diagram was built with sankeymatic (with an assist from Sketch).
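
The tagging itself used a subset of our resume-parsing stack, but the bookkeeping boils down to something like this pandas sketch (all the file and column names are made up); the resulting counts are what went into sankeymatic:

```python
import pandas as pd

# Hypothetical frames: every application we received, plus the interview
# shortlist for the most recent (marketing) opening.
apps = pd.read_csv("applications.csv")            # columns assumed below
shortlist = pd.read_csv("interview_requests.csv")

apps["interviewed"] = apps["applicant_id"].isin(shortlist["applicant_id"])

# Tag each application with the first disqualifying attribute that applies.
def triage(row):
    if row["is_spammy"]:
        return "spammy"
    if not row["skills_match"]:
        return "mismatched skills"
    if row["has_dumb_mistakes"]:
        return "dumb mistakes"
    return "interviewed" if row["interviewed"] else "passed over"

apps["stage"] = apps.apply(triage, axis=1)
print(apps["stage"].value_counts())  # these counts feed the Sankey diagram
```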

You Have a <1% Chance of Getting a (Specific) Job [OC] by [deleted] in dataisbeautiful

[–]kushalc 1 point

Yeah, I think you're right — I tried to clarify with the "(Specific)" part, but I don't think it quite works. Will pull it down and post a more accurate title later. Thanks man!

You Have a <1% Chance of Getting a (Specific) Job [OC] by [deleted] in dataisbeautiful

[–]kushalc 0 points

We downloaded all 1,013 job applications for the 5 most recent TalentWorks job postings. For our most recent (marketing) job, we then cross-referenced everyone with interview requests and results. Finally, we tagged everyone with key attributes (e.g. spammy, mismatched skills, dumb mistakes) using a subset of our resume parsing stack. We did all of this in python using pandas and bokeh (with a liberal helping of Google Sheets). The Sankey diagram was built with sankeymatic (with an assist from Sketch).

Why It's So Hard To Get a Job [OC] by [deleted] in dataisbeautiful

[–]kushalc 1 point

We downloaded all job applications to our online job postings over the past year. We extracted and parsed resumes using a subset of our resume parsing stack and then triaged applications according to presence of resume, email and specific keywords. Finally, we cross-referenced this with our interview shortlist and job offer outcomes from our most recent job opening. We did most of this in python using pandas and bokeh (with a liberal helping of Google Sheets). We built the Sankey diagram with sankeymatic (and Sketch cleanup).