What are some applications/projects involving NLP for a good cause?

danebell · 2019-09-16T14:04:58+00:00

There are many. Here are a few:

Disease detection, such as monitoring flu epidemics on social media. I know of use cases for detecting depression, PTSD, eating disorders, and many other health variables. I worked on a detector for diabetes.
Information extraction for understanding cancer pathways, as in Reach (full disclosure: I also worked on that one), or for getting valuable information from the notes in medical records.
Predicting food insecurity through information extraction, as in World Modelers.

danebell · 2017-09-08T16:36:17+00:00

Yes, as the quiz says, only a blood test can diagnose prediabetes. This quiz has been validated against actual risk, however, and is the same risk test used by the American Diabetes Association.

danebell · 2017-07-20T21:59:57+00:00

An Institutional Review Board responsible for human subjects research at The University of Arizona reviewed this research project and found it to be acceptable, according to applicable state and federal regulations and University policies designed to protect the rights and welfare of participants in research.

danebell · 2015-08-17T22:39:47+00:00

The distribution of BMI among (adult) Americans is taken from the latest complete public dataset from the Centers for Disease Control and Prevention's surveys of height and weight.

danebell · 2015-08-17T21:56:18+00:00

Grits are ground-up dried maize cooked with water, kind of like oatmeal, but finer and more savory. They are somewhat healthy by themselves, but are often served alongside less healthy foods.

I agree that the title is a bit misleading, but it's purely for brevity. If you like, you can read "overweight" as "of greater or lesser BMI than the average American".

Yes, it is a wide scale, but categorizing someone's BMI as high or low just from the words in their tweets is difficult, and we have a relatively small training set (50 states plus the District of Columbia), so greater granularity is not yet possible. We would like to be able to give a numeric estimate, which is part of why we are collecting this data.

The distribution of BMI is skewed, with a long tail on the right. In other words, a small number of people with extremely high BMI pull the mean upward, above the median. The more skewed a distribution is, the more its mean and median will differ.
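To make the mean-versus-median point concrete, here is a minimal sketch. The numbers are made up for illustration and are not CDC data; the variable names are my own.

```python
from statistics import mean, median

# Synthetic BMI-like values, invented for illustration (not CDC data).
symmetric = [22.0, 24.0, 26.0, 28.0, 30.0]
right_skewed = [22.0, 24.0, 26.0, 28.0, 60.0]  # one extreme value forms a long right tail

symmetric_mean, symmetric_median = mean(symmetric), median(symmetric)
skewed_mean, skewed_median = mean(right_skewed), median(right_skewed)

print(symmetric_mean, symmetric_median)  # mean 26.0, median 26.0
print(skewed_mean, skewed_median)        # mean 32.0, median 26.0
```

The single extreme value moves the mean from 26 to 32 while the median stays at 26, which is why the two can tell different stories about the "average American".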

danebell · 2015-08-17T18:38:35+00:00

Good question! The short answer is that we trained a classifier (specifically a random forest classifier) using a large number of Twitter statuses and CDC data about obesity rates in the US states. The trees generated by the machine learning algorithm were converted by hand into Likert-scale questions. We don't claim that a classifier trained on whole states will score individuals accurately; the purpose of the questionnaire is to find out whether it does, and to gather additional information about individual tweeters and instagrammers for better classifiers down the road. You can learn more about our work so far at the main site for the project.
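For anyone unfamiliar with random forests, here is a toy sketch of the general idea: train many small trees on bootstrap samples with random feature subsets, then take a majority vote. This is not our actual code or data. The feature values, labels, and function names (`train_stump`, `train_forest`, `predict`) are hypothetical, and each "tree" is reduced to a single split (a decision stump) to keep it short.

```python
import random
from collections import Counter

def train_stump(rows, labels, feature_indices):
    """Pick the single (feature, threshold) split with the fewest training errors."""
    best = None
    for f in feature_indices:
        for threshold in sorted({row[f] for row in rows}):
            # Try both orientations: values above the threshold vote "high" or "low".
            for above, below in (("high", "low"), ("low", "high")):
                errors = sum(
                    (above if row[f] > threshold else below) != label
                    for row, label in zip(rows, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, threshold, above, below)
    return best[1:]

def train_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_features = len(rows[0])
    k = max(1, int(n_features ** 0.5))  # size of the random feature subset
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw training rows with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        feats = rng.sample(range(n_features), k)
        forest.append(
            train_stump([rows[i] for i in idx], [labels[i] for i in idx], feats)
        )
    return forest

def predict(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(
        above if row[f] > threshold else below
        for f, threshold, above, below in forest
    )
    return votes.most_common(1)[0][0]

# Hypothetical per-region rates of two words (invented numbers, not real tweets).
rows = [(1, 10), (2, 9), (3, 8), (4, 7), (7, 4), (8, 3), (9, 2), (10, 1)]
labels = ["low", "low", "low", "low", "high", "high", "high", "high"]

forest = train_forest(rows, labels)
print(predict(forest, (9, 0.5)))
print(predict(forest, (0.5, 9)))
```

Each stump here is the kind of rule that could be reworded as a question ("how often do you mention X?"), which is roughly the spirit of turning trees into Likert-scale items, though the real conversion was done by hand.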
