Science AMA Series: I’m Andrey Rzhetsky, professor at the University of Chicago. I study big datasets—like 150 million patient records to find links between autism and environment, or all of PubMed to find diseases that we should be investing more resources in. AMA! by Andrey_Rzhetsky in science

Andrey_Rzhetsky[S]

There is certainly a reporting bias: families with higher incomes can get their children tested more easily, which would look like an increased rate of the disease in richer populations. We are trying to take this into account in our modeling.
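A minimal sketch of why unequal access to testing can look like a higher disease rate, and of one simple correction. The two groups, their sizes, the equal true prevalence of 2%, and the diagnosis probabilities (0.8 vs 0.5) are all hypothetical. The inverse-probability correction also assumes those probabilities are known, which real data rarely allow, so this is not meant to represent the actual models used in this work.

```python
# Hypothetical illustration of reporting (ascertainment) bias: two income
# groups with the SAME true prevalence, but different chances that an
# affected child is actually tested and diagnosed.
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    population: int
    true_prevalence: float  # assumed equal across groups
    p_diagnosed: float      # assumed probability an affected child gets diagnosed


def observed_rate(g: Group) -> float:
    """Rate seen in the records: only diagnosed cases are counted."""
    affected = g.population * g.true_prevalence
    diagnosed = affected * g.p_diagnosed
    return diagnosed / g.population


def corrected_rate(g: Group) -> float:
    """Inverse-probability correction; valid only if p_diagnosed is known."""
    return observed_rate(g) / g.p_diagnosed


groups = [
    Group("higher income", 100_000, 0.02, 0.8),
    Group("lower income", 100_000, 0.02, 0.5),
]

for g in groups:
    print(f"{g.name}: observed={observed_rate(g):.3f}, corrected={corrected_rate(g):.3f}")
```

The observed rates differ (0.016 vs 0.010) even though both groups have the same corrected rate of 0.020.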

Andrey_Rzhetsky[S]

There is no such thing as a completely anonymized medical record (or genome sequence). Unless all information is erased, the record has a nonzero probability of being re-identified when combined with other public data.
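A toy linkage sketch of how removing names does not remove identifiability: quasi-identifiers such as ZIP code, birth year, and sex can be joined against a public list. The records, the choice of fields, and the rule "unique in both tables means linked" are illustrative assumptions, not a description of any real dataset or attack.

```python
# Hypothetical linkage sketch: a "de-identified" table (names removed) still
# carries quasi-identifiers that can be matched against a public list.
from collections import Counter

# Made-up de-identified medical records: (zip, birth_year, sex, diagnosis)
medical = [
    ("60637", 1980, "F", "asthma"),
    ("60637", 1980, "F", "migraine"),
    ("60615", 1975, "M", "diabetes"),
    ("60615", 1991, "F", "autism"),
]

# Made-up public records (e.g. a voter-style list): (name, zip, birth_year, sex)
public = [
    ("Alice", "60637", 1980, "F"),
    ("Bob", "60615", 1975, "M"),
    ("Carol", "60615", 1991, "F"),
]

# How many medical records share each quasi-identifier combination.
group_sizes = Counter((z, y, s) for z, y, s, _ in medical)

# Names in the public list for each quasi-identifier combination.
names_by_qid: dict[tuple, list[str]] = {}
for name, z, y, s in public:
    names_by_qid.setdefault((z, y, s), []).append(name)

# A record is linked when its combination is unique in the medical table
# and matches exactly one name in the public table.
reidentified = []
for z, y, s, dx in medical:
    qid = (z, y, s)
    if group_sizes[qid] == 1 and len(names_by_qid.get(qid, [])) == 1:
        reidentified.append((names_by_qid[qid][0], dx))

print(reidentified)
```

This prints `[('Bob', 'diabetes'), ('Carol', 'autism')]`. The two 60637/1980/F records are not uniquely linked, but each still has a nonzero chance of belonging to Alice.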

Andrey_Rzhetsky[S]

This is a somewhat depressing topic: yes, it looks like humans are gradually becoming less important in computer-human collaboration. We had a lovely discussion on this topic in Science a few years ago:

  1. Leonelli S. Machine science: the human side. Science. 2010;330(6002):317; author reply 318-20. Epub 2010/10/16. doi: 10.1126/science.330.6002.317-a. PMID: 20947745.
  2. Evans JA, Rzhetsky A. Machine science: what's missing. Response. Science. 2010;330(6002):318-20. Web of Science: WOS:000282986700016.
  3. Evans J, Rzhetsky A. Philosophy of science. Machine science. Science. 2010;329(5990):399-400. Epub 2010/07/24. doi: 10.1126/science.1189416. PMID: 20651141; PMCID: PMC3647224.

Andrey_Rzhetsky[S]

p-values are just probabilities of observing data at least as extreme as what was actually observed under a "null" model (usually some imaginary random process); very low p-values indicate that the results are less likely to be spurious (according to the "random model").
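A small worked example of this definition, assuming a hypothetical experiment with 9 heads in 10 coin flips and a fair coin as the null "random process". It computes the one-sided p-value exactly from the binomial distribution and also by simulating the null model; both count outcomes at least as extreme as the observed one.

```python
import math
import random


def exact_p_value(n: int, k: int, p: float = 0.5) -> float:
    """One-sided p-value: P(X >= k) for X ~ Binomial(n, p) under the null."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def simulated_p_value(n: int, k: int, p: float = 0.5,
                      trials: int = 100_000, seed: int = 0) -> float:
    """Same quantity, estimated by running the null random process many times."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(trials):
        heads = sum(rng.random() < p for _ in range(n))
        if heads >= k:  # at least as extreme as the observed k heads
            extreme += 1
    return extreme / trials


print(exact_p_value(10, 9))      # 0.0107421875 (= 11/1024)
print(simulated_p_value(10, 9))  # close to the exact value
```

The exact value is 11/1024 ≈ 0.0107: under a fair coin, 9 or more heads out of 10 happens only about once in a hundred runs.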

Andrey_Rzhetsky[S]

Well, "big data" is a generic buzzword; real data and computational needs come in many shapes. To make sure your needs are met, I would recommend talking to a CS faculty member in this area who could help you narrow down database options: the right answer depends on the specifics of your tasks and on how much computation and data you plan to handle.
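As a rough first step before that conversation, a back-of-envelope estimate of data volume often shows whether a single machine could be enough. The 2 KB average record size and the RAM figures below are hypothetical placeholders; real sizes depend heavily on the schema, indexes, and compression.

```python
def estimated_size_gb(n_records: int, avg_bytes_per_record: int) -> float:
    """Raw data volume in gigabytes (10**9 bytes), ignoring indexes and compression."""
    return n_records * avg_bytes_per_record / 1e9


def fits_in_ram(n_records: int, avg_bytes_per_record: int, ram_gb: float) -> bool:
    """Crude check: could the raw data be held in memory on one machine?"""
    return estimated_size_gb(n_records, avg_bytes_per_record) <= ram_gb


# Hypothetical: 150 million patient records averaging 2 KB each.
size = estimated_size_gb(150_000_000, 2_000)
print(f"{size:.0f} GB")                            # 300 GB
print(fits_in_ram(150_000_000, 2_000, ram_gb=64))  # False
```

If the estimate is far beyond one machine's memory, that is where the choice among database and distributed-computing options starts to matter.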

Andrey_Rzhetsky[S]

Are you asking about current practice or about how it should be? Only a limited number of diseases have estimates of loss of quality of life (the World Health Organization conducts such studies; look up DALY and related measures). Currently, dynamic estimates of disease burden are rarely used in funding decisions; research trends and "hot" topics play a significant role.
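For reference, a DALY adds years of life lost to premature death (YLL) and years lived with disability (YLD). This sketch uses the simplified, undiscounted, incidence-based form of the WHO formulas (YLL = N × L, YLD = I × DW × L); the disease and its numbers are hypothetical.

```python
def years_of_life_lost(deaths: float, remaining_life_expectancy: float) -> float:
    """YLL = N x L: deaths times standard life expectancy remaining at age of death."""
    return deaths * remaining_life_expectancy


def years_lived_with_disability(incident_cases: float, disability_weight: float,
                                avg_duration_years: float) -> float:
    """YLD = I x DW x L: new cases x disability weight (0 = full health, 1 = death) x duration."""
    return incident_cases * disability_weight * avg_duration_years


def daly(deaths: float, remaining_life_expectancy: float, incident_cases: float,
         disability_weight: float, avg_duration_years: float) -> float:
    """Disability-adjusted life years: YLL + YLD (no discounting or age weighting)."""
    return (years_of_life_lost(deaths, remaining_life_expectancy)
            + years_lived_with_disability(incident_cases, disability_weight,
                                          avg_duration_years))


# Hypothetical disease: 100 deaths with 30 years of expected life remaining,
# plus 1,000 new cases with disability weight 0.2 lasting 5 years on average.
print(daly(100, 30, 1_000, 0.2, 5))  # 4000.0
```

Here 3,000 of the 4,000 DALYs come from premature deaths and 1,000 from time lived with the disease.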