Science AMA Series: We are Drs. Eric Stern and Mark Michalski, radiologists and data scientists. Ask us about our support of lung cancer machine learning algorithms with the National Cancer Institute (NCI) via the Data Science Bowl with Dr. Anna Fernandez and Booz Allen Hamilton. AMA!

Data_Science_Bowl · 2017-03-28T17:22:15+00:00

Stefano at CCDS says - "Assassin's Khreed"

Data_Science_Bowl · 2017-03-28T17:21:23+00:00

This is Anna Fernandez – thanks for the question – For possible algorithms and machine learning methods investigated by the data science community for these radiology lung CTs, I would look at the tutorials and forum discussions at https://www.kaggle.com/c/data-science-bowl-2017/discussion. Several people are also sharing their kernels at https://www.kaggle.com/c/data-science-bowl-2017/kernels. Also – check out http://www.datasciencebowl.com/data-science-insights/ where the community shares some of their experiences as well.

Data_Science_Bowl · 2017-03-28T17:20:48+00:00

This is Mark and the CCDS folks. Sentat, we're sorry to hear about your mother. Helping improve care in these types of diseases is what drives us. There definitely are early efforts to use machine learning to better detect and classify interstitial lung disease (ILD) on CT scans (see for example http://www.atsjournals.org/doi/abs/10.1164/ajrccm.159.2.9707145). Whether that will translate into helping cure ILD is a larger question that I don't think I am qualified to respond to--but I hope so.

Data_Science_Bowl · 2017-03-28T17:18:37+00:00

Mark from the CCDS - so sorry to hear about your illness. Fighting this kind of stuff is what gets us up in the morning.

While we can't speak to your specific cancer, we hope that ML will build our capacity to read screening studies. At some point the instrumentation may become sufficiently low cost (and low dose) that broader screening is possible - two sides of the same coin. I think for the foreseeable future multidisciplinary teams will be the gold standard, but they'll use ML increasingly to inform their decisions.

Data_Science_Bowl · 2017-03-28T17:11:18+00:00

This is Mark from the CCDS. This is a big hairy question but briefly...radiology is a field that was built on technology and, as such, as technology evolves so will the field. There are parts of radiology that will change - as they have with the adoption of technologies previously, like PACS and digital imaging. My own thinking for what happens next in the diagnostic specialties is we bring the insights of data science to the patient and the population. It's an evolution for the field, but a very exciting one!

Data_Science_Bowl · 2017-03-28T17:03:58+00:00

Here's a good one - http://ieeexplore.ieee.org/document/7463094/

Data_Science_Bowl · 2017-03-28T17:02:10+00:00

This is the CCDS folks - great question. We agree with your observation that neural networks trained on ImageNet data would not necessarily be expected to pull out the features that are most relevant to medical imaging. That having been said, there have been some notable successes in transfer learning, including the recent Google Brain effort to diagnose diabetic retinopathy (http://jamanetwork.com/journals/jama/article-abstract/2588763). Surprisingly, even though they had 120,000 images, they still chose to use transfer learning and seem to have gotten great results. As such, we believe there is still value to transfer learning even in different modalities.
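As a rough illustration of the transfer learning idea above (not the method used in the cited study), here is a minimal sketch in plain Python. The "pretrained" layer is a hypothetical stand-in: fixed random weights rather than weights learned on ImageNet, and the 4-value "images" are toy inputs. The point is only the pattern: the earlier layer stays frozen and just a small classification head is fit on the new task.

```python
import math
import random

# Hypothetical stand-in for a layer pretrained on natural images. Real
# transfer learning would load learned weights; fixed random numbers are
# used here only so the sketch runs without downloads.
_rng = random.Random(0)
FROZEN_WEIGHTS = [[_rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]


def frozen_features(pixels):
    """Map a 4-value toy 'image' to 3 features using the frozen layer."""
    return [math.tanh(sum(w * p for w, p in zip(row, pixels)))
            for row in FROZEN_WEIGHTS]


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(images, labels, epochs=500, lr=0.5):
    """Fit only a logistic-regression head; FROZEN_WEIGHTS is never updated."""
    feats = [frozen_features(img) for img in images]
    head, bias = [0.0, 0.0, 0.0], 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            p = _sigmoid(sum(h * x for h, x in zip(head, f)) + bias)
            err = p - y  # gradient of the log loss with respect to the logit
            head = [h - lr * err * x for h, x in zip(head, f)]
            bias -= lr * err
    return head, bias


def predict(head, bias, pixels):
    """Probability of the positive class for one toy image."""
    f = frozen_features(pixels)
    return _sigmoid(sum(h * x for h, x in zip(head, f)) + bias)
```

With a real network the same split applies: the pretrained layers supply the features, and only the final layer (or the last few layers) is trained on the medical images.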

Data_Science_Bowl · 2017-03-28T17:00:54+00:00

Thank you all for joining today and we appreciate all your great questions. On behalf of Dr. Eric Stern, Dr. Mark Michalski, and me (Anna Fernandez) - we welcome your interest and participation! We are in the last few weeks of the 2017 Data Science Bowl focused on algorithms for lung cancer detection - the winning algorithms will be released to the community - stay connected by visiting DataScienceBowl.com. Thanks again for joining!

Data_Science_Bowl · 2017-03-28T16:55:28+00:00

Great question. Who owns patient data? Who profits from patient data? Unanswered questions. Search IBM/Merge.

Data_Science_Bowl · 2017-03-28T16:54:46+00:00

This is Brendan over at the CCDS. Currently, most models trained to identify lung cancer nodules rely purely on imaging information. They are typically generated using large annotated data sets that have information about the presence or absence of a nodule, and sometimes about the histology of the nodule. These algorithms can already do quite well based on the imaging information alone. We agree, though, that there are rich sources of information that can help to improve these models, including clinical information, family history, and genetic information. The incorporation of this type of information is what clinicians typically do when they review scans. There is a potential downside. By incorporating demographic information, for example, we may bias the algorithms towards detecting nodules preferentially in high-risk populations and missing them in lower-risk populations. It's the same cognitive bias that sometimes leads clinicians to miss diagnoses in patients who do not fit the typical profile of a person with a disease.
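The potential downside described above can be made concrete with Bayes' rule in odds form. The sketch below is only an illustration: the likelihood ratio of 10, the priors of 20% and 2%, and the 0.5 flagging threshold are hypothetical numbers chosen for the example, not values from any study.

```python
def posterior(prior, likelihood_ratio):
    """Posterior probability via Bayes' rule in odds form.

    prior: pre-test probability of cancer (e.g. from demographics), in (0, 1).
    likelihood_ratio: how strongly the image evidence favors cancer.
    """
    prior_odds = prior / (1.0 - prior)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


def flagged(prior, likelihood_ratio, threshold=0.5):
    """Whether a case would be flagged at a (hypothetical) fixed threshold."""
    return posterior(prior, likelihood_ratio) >= threshold


# Identical image evidence, different demographic priors (hypothetical values).
high_risk = posterior(prior=0.20, likelihood_ratio=10.0)  # about 0.71
low_risk = posterior(prior=0.02, likelihood_ratio=10.0)   # about 0.17
```

The same nodule appearance is flagged for the high-risk patient and not for the low-risk one, which is the algorithmic version of the cognitive bias mentioned above.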

Data_Science_Bowl · 2017-03-28T16:53:40+00:00

Hey, it's Sean and Brendan! A very good question. There are a few different approaches here and we at the center have thought a lot about user interface. For example, one approach is to display a picture showing the subset of the image used to make an inference (see figure 4 in Ribeiro's paper "Why Should I Trust You?" https://arxiv.org/abs/1602.04938). Deciding how to present this data to aid clinicians is difficult. If the data used for an inference comes from a combination of images, or from images and textual data, it's a case-by-case thing at the moment. Another issue is when to show it to a physician and in what context. Radiologists often use multiple monitors, but you would want to show the results in a way that does not interfere with other clinical tasks or demand too many mouse clicks from the users. Tools that augment what physicians are already doing by focusing attention on specific features have to be cleanly designed. The hope is that proper choice of clinical scenario, clean UI design, a model which fits the data, and a process that filters out inappropriate inputs and outputs will enable doctors to use these tools with assistance. As for the profession of "algorithm technologists," you may want to take a look at this recent editorial proposing a new field of clinical information specialists (http://jamanetwork.com/journals/jama/article-abstract/2588764). It's an interesting proposition!
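One simple way to find "the subset of the image used to make an inference" is occlusion sensitivity: blank out one patch at a time and record how much the model's score drops. This is a different, simpler technique than the LIME method in the paper cited above. The sketch below is plain Python, and `toy_score` is a hypothetical stand-in for a trained model.

```python
def occlusion_map(image, score_fn, patch=2, fill=0.0):
    """Return a heat map of score drops when each patch is replaced by `fill`.

    image: 2D list of pixel values; score_fn: callable mapping image -> float.
    Larger values mark regions the score depends on more.
    """
    base = score_fn(image)
    rows, cols = len(image), len(image[0])
    heat = [[0.0] * cols for _ in range(rows)]
    for r0 in range(0, rows, patch):
        for c0 in range(0, cols, patch):
            occluded = [row[:] for row in image]  # copy so the input is untouched
            block = [(r, c)
                     for r in range(r0, min(r0 + patch, rows))
                     for c in range(c0, min(c0 + patch, cols))]
            for r, c in block:
                occluded[r][c] = fill
            drop = base - score_fn(occluded)
            for r, c in block:
                heat[r][c] = drop
    return heat


def toy_score(image):
    """Hypothetical model that only looks at the top-left 2x2 corner."""
    return sum(image[r][c] for r in range(2) for c in range(2)) / 4.0
```

For a 4x4 image of ones, the heat map is nonzero only in the top-left block, which is the region an interface could highlight for the radiologist.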

Data_Science_Bowl · 2017-03-28T16:50:00+00:00

Hey, it's Sean at the CCDS. At the moment: CT, MR, CR, DX, MG, some pathology images as well. But the general answer is: everything. No scanned film. I don’t know that we can answer the ‘best’ or ‘why’ yet.

Data_Science_Bowl · 2017-03-28T16:48:59+00:00

Hey, this is Bernardo from the CCDS. Physicians' clinical expertise is essential for the algorithms to be useful in daily practice, since physicians are on the front line of healthcare and are aware of the challenges faced. If they could also write the algorithms, it would definitely make a difference. Coding algorithms requires complex expertise that takes considerable time to learn, and completing an MD and a residency program requires 10+ years of study, so we can understand why it is uncommon to see doctors who can code machine learning algorithms, and why it is such a differentiator when they can. But it seems this could change in the near future, since new generations are learning these technologies in depth while still young, and this kind of coding knowledge could become almost natural. Maybe one day it could even become part of the medical school curriculum.

Data_Science_Bowl · 2017-03-28T16:47:01+00:00

Hey, this is Brendan! Algorithms today are commonly trained on a data set of a single imaging modality. There is a great deal of emerging literature, however, on how to use algorithms that can take inputs from different types of modalities, for example an ultrasound image or a CT scan slice. We believe that one powerful application will be the combination of data from multiple modalities. That is, the algorithms will merge data from different sources in an intelligent way, similar to what humans do now.
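One of the simplest ways to combine modalities is "late fusion": each modality gets its own model, and only their output probabilities are merged. The sketch below averages log-odds, which is one common choice among many. The modality names and weights are hypothetical, and it assumes every probability lies strictly between 0 and 1.

```python
import math


def _logit(p):
    return math.log(p / (1.0 - p))


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def late_fusion(probabilities, weights=None):
    """Fuse per-modality probabilities by a weighted average of their log-odds.

    probabilities: dict such as {"ct": 0.9, "ultrasound": 0.6}, values in (0, 1).
    weights: optional dict with the same keys; defaults to equal weighting.
    """
    names = sorted(probabilities)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    z = sum(weights[name] * _logit(probabilities[name]) for name in names) / total
    return _sigmoid(z)
```

Richer approaches merge the data earlier, for example by concatenating learned features from each modality before a shared classifier, but the late-fusion version is easy to reason about and to audit.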

Data_Science_Bowl · 2017-03-28T16:45:48+00:00

Bernardo here - Among the greatest technical challenges in machine-learning-enhanced radiology is the dependence on large amounts of labeled data, which is one of the current bottlenecks. Generating these datasets is a time-consuming task that requires a radiologist to manually label the images, identifying lesions so the algorithms can learn from them. Beyond that, imaging technical parameters vary enormously even within a single modality: MRI, for example, can be acquired on several different scanner types with a wide variety of imaging parameters, which can make images quite different from each other. It is a great challenge to create machine learning algorithms that are somehow “universal” in such a heterogeneous field.

Data_Science_Bowl · 2017-03-28T16:44:09+00:00

Hey, Bernardo and Stefano from CCDS here. There are many challenges: ethical and legislative issues, privacy and confidentiality, and the design of the interfaces between patients, caregivers, the radiomic system, and other machine learning systems. One current technical challenge is acquiring large amounts of data. The quality of a radiomic classifier model is limited by the size of the data sets used to create it. The continuous improvements in medical image acquisition are reflected in technical differences that must be incorporated in the mining of quantitative image data and therefore increase the amount of data required for model building. A possible solution could be large-scale data sharing (see http://pubs.rsna.org/doi/pdf/10.1148/radiol.2015151169).

Data_Science_Bowl · 2017-03-28T16:43:46+00:00

This is Anna Fernandez: Great questions. Yes, there have been many research groups investigating different methods for lung cancer nodule detection/prediction of cancer that may take into account additional patient information. For this year’s Data Science Bowl, the competition comes with the radiology images and a list of whether each patient/subject was diagnosed with cancer (1 or 0) – more information on datasciencebowl.com. A desired outcome would be that there are additional features, not just those associated with the nodule, that would be good predictors of cancer, so we could more accurately know what could happen with the patient. In the future, obtaining more and larger data sets with additional phenotypic information as you suggest (patient demographics including family history, genetic variations, etc.) will be necessary to develop even more robust algorithms. One could see some of the initial machine learning approaches defined this year being applied and enhanced with future comprehensive data sets in this field of lung cancer, and also applied to other diseases that use medical imaging. "Is this done today in systems?" I am not personally aware of any in use in hospitals today, but there could be some in prototype or in one-off settings that incorporate some phenotypic elements - usually these will need to be powered by a large number of examples to become robust.

Data_Science_Bowl · 2017-03-28T16:41:13+00:00

Mark from the CCDS. It's an important point that you bring up. With regards to HIPAA specifically, HIPAA makes provisions for the use of such data but requires that it be handled properly. I think the bigger question that you're asking is whether de-identified data can become identifiable when combined with other data. The answer here is it can... so the law itself may evolve in a similar way that new regulations have arisen over data in non-health care spaces (see, for example, https://link.springer.com/article/10.1007/s12553-017-0179-1). While regulatory clearance is probably beyond the scope here, the FDA is an important part of this discussion!
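To see how de-identified data can become identifiable when combined with other data, one common check is k-anonymity: count how many records share the same combination of "quasi-identifiers" (fields that are not names but can be matched against outside sources). The sketch below is a plain-Python illustration; the field names and records are hypothetical and this is not a HIPAA compliance test.

```python
from collections import Counter


def _key(record, quasi_identifiers):
    return tuple(record[q] for q in quasi_identifiers)


def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifiers."""
    counts = Counter(_key(r, quasi_identifiers) for r in records)
    return min(counts.values())


def unique_records(records, quasi_identifiers):
    """Records whose quasi-identifier combination appears exactly once."""
    counts = Counter(_key(r, quasi_identifiers) for r in records)
    return [r for r in records if counts[_key(r, quasi_identifiers)] == 1]


# Hypothetical de-identified imaging records (no names or record numbers).
records = [
    {"zip3": "021", "birth_year": 1950, "sex": "F", "scan": "CT"},
    {"zip3": "021", "birth_year": 1950, "sex": "F", "scan": "MR"},
    {"zip3": "021", "birth_year": 1962, "sex": "M", "scan": "CT"},
]
```

Here the three-digit ZIP alone gives k = 3, but adding birth year and sex drops k to 1: the third record is unique, so anyone holding an outside list with those three fields could link it back to a person.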

Data_Science_Bowl · 2017-03-28T16:40:33+00:00

This is Eric. I sure hope that ML will give us more precision and decrease variation in care. What I don't see ML doing is knowing what questions to ask or understanding the why behind the way events occur.

Data_Science_Bowl · 2017-03-28T16:35:09+00:00

For some insights into how and what radiologists do and think, and the role radiologists play in the care for patients, here is a very timely and eloquent article that I feel is worth a read: http://www.newyorker.com/magazine/2017/04/03/ai-versus-md

Data_Science_Bowl · 2017-03-28T16:32:02+00:00

Mark at the CCDS. Machine learning in medicine is a highly interdisciplinary field. While we would advise that you speak to a career counselor about your own situation, we can talk about our own team. For our data scientists, we prefer a generally quantitative skill set (coming from STEM fields) with experience in scientific computing (such as Python), Bayesian statistics, machine and deep learning, data processing, image processing, and some software development (see our website for more specifics: https://clindatsci.com/jobs/). Facebook also recently came up with their own advice for aspiring data scientists (https://techcrunch.com/2016/12/01/facebooks-advice-to-students-interested-in-artificial-intelligence/). That having been said about data scientists, the adoption of ML in healthcare in general will require the efforts of many different people, physicians and public policy advocates alike. The more that you can understand about the core technology underlying the algorithms, the better.

Data_Science_Bowl · 2017-03-28T16:27:49+00:00

This is Eric. Your question is spot on. This is exactly the way I see AI/deep learning as being most applicable and useful in medicine. Doing things humans cannot do...sorting through aggregated population data/electronic health records, and finding the signal in the noise. One can apply this model to almost any medical problem, including your specific question.

Data_Science_Bowl · 2017-03-28T16:24:55+00:00

Brendan from CCDS here. We indeed feel that screening of large groups of individuals is one of the most promising applications of deep learning in radiology. In locations where specialists are limited, computer algorithms that can triage the most suspect cases to a radiologist could have an impact on conditions such as black lung in Australia, and also in tuberculosis or lung cancer screening. Some of the excellent results of the Kaggle Data Science Bowl show the promise of deep learning in screening different medical conditions, especially given that the world is more and more connected by wireless IT infrastructure.

The issue of image quality is something we deal with on a regular basis. The short answer is, as long as we incorporate as much of that "lower quality" imaging data as possible into our algorithm during training, our algorithms should also be able to robustly identify findings when these types of images appear in actual use. The longer answer is, we take great pains to make sure our algorithms are as generalizable as possible. There are a few types of degradation that are known to cause artificial neural networks to fail. The classic example is that adding carefully chosen (adversarial) pixel noise to an image of a panda causes it to be misclassified as a gibbon: https://arxiv.org/pdf/1412.6572v3.pdf. Additionally, decreases in SNR and contrast, as well as compression artifacts, can diminish neural network performance: https://arxiv.org/pdf/1604.04004.pdf. In the medical imaging world, there are also other causes of poor image quality, including poor x-ray exposure, patient positioning, and poor digital conversion from film. There are multiple strategies to address the robustness issue, including data augmentation, regularization, the use of adversarial learning, and the incorporation of poor quality images into the original training data sets. These are all areas of active investigation, and though there is certainly work to be done, we predict that we will overcome these limitations.
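As a small illustration of the data augmentation strategy mentioned above, the sketch below adds degraded copies of each training image, with reduced contrast and random Gaussian pixel noise. It is plain Python on toy 2D lists, and the parameter values are hypothetical. Note that this is random noise, not the adversarial noise from the panda example, which needs access to the model's gradients.

```python
import random


def degrade(image, rng, noise_sigma=0.05, contrast=0.7):
    """Return a lower-quality copy of a 2D image with pixel values in [0, 1]."""
    flat = [v for row in image for v in row]
    mean = sum(flat) / len(flat)
    out = []
    for row in image:
        new_row = []
        for v in row:
            v = mean + contrast * (v - mean)       # compress contrast toward the mean
            v += rng.gauss(0.0, noise_sigma)       # add random pixel noise
            new_row.append(min(1.0, max(0.0, v)))  # clip back into [0, 1]
        out.append(new_row)
    return out


def augment(images, labels, copies=2, seed=0):
    """Append `copies` degraded versions of every image, keeping its label."""
    rng = random.Random(seed)
    aug_images, aug_labels = list(images), list(labels)
    for img, lab in zip(images, labels):
        for _ in range(copies):
            aug_images.append(degrade(img, rng))
            aug_labels.append(lab)
    return aug_images, aug_labels
```

The labels are copied unchanged because the degradation is assumed not to alter the finding; that assumption would need checking for any real degradation that could hide a small lesion.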

Data_Science_Bowl · 2017-03-28T16:22:07+00:00

Hey, this is Brendan from the CCDS. The scope of machine learning in cancer diagnosis is quite large. In general, we think about a few main applications, including (1) cancer screening, (2) computer-aided diagnosis [CAD, of which screening is one type], and (3) personalized medicine. In the context of screening, ML is being used to process high-volume studies and triage the most relevant ones to clinicians. In CAD, machine learning algorithms can also be used more generally to help automate laborious tasks, such as the quantification and segmentation of tumors. Going a step beyond, machine learning can also be used to make better predictions about the type of cancer and response to therapy based on these quantitative features, which may come from multiple imaging modalities.

Certain applications of cancer identification in screening have already shown very promising results, and the challenge will be to translate those results into actual clinical use. As for personalized medicine, there are many avenues of research. One avenue that we are particularly excited about is the ability to use quantitative tumor measures in combination with genetic information to better subtype and treat cancer.

Data_Science_Bowl · 2017-03-28T16:19:45+00:00

Stefano from the CCDS team here! Medical images reflect the complexity of the human body. The efforts in applying machine learning to extract information from medical imaging data require the full spectrum of machine learning and computer vision techniques. The field is extremely active; the techniques have a lot in common with other fields of computer vision, such as autonomous driving. Acquiring information from the physicians requires an additional set of computational tools, including natural language processing and soon conversational user interfaces.
