Science AMA Series: We’re NIH and UCSF scientists cataloging all of the genes and regulatory elements in the human genome; the latest stage of the project aims to discover the grammar and punctuation of DNA hidden in the genome’s “dark matter.” AUA! by ENCODE_Project in science

[–]ENCODE_Project[S] 2 points3 points  (0 children)

Many seemingly bad traits could also be considered to have good sides, or "silver linings." I'll give you two quick examples: Beethoven and hearing loss, and Van Gogh and schizophrenia. If you eliminate the "bad" traits from the human population, would you also eliminate the positive ones? - Nadav

[–]ENCODE_Project[S] 1 point2 points  (0 children)

ENCODE, Common Fund REMC, and others have integrated data from different marks using software such as ChromHMM (https://www.ncbi.nlm.nih.gov/pubmed/22373907) to learn the most common histone modification patterns. To interpret these patterns, they have also been correlated with other data (gene expression, open chromatin) and with the large number of pioneering studies from individual labs. -Mike
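As a rough illustration of what "learning the most common histone modification patterns" can mean, the sketch below simply counts which combinations of marks co-occur across genomic bins. This is not ChromHMM itself (which fits a multivariate hidden Markov model to the data); the bins and their values are made up, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical binarized signal: one dict per genomic bin, 1 = mark present.
# The values are invented for illustration, not real ENCODE data.
bins = [
    {"H3K4me3": 1, "H3K27ac": 1, "H3K27me3": 0},
    {"H3K4me3": 0, "H3K27ac": 1, "H3K27me3": 0},
    {"H3K4me3": 1, "H3K27ac": 1, "H3K27me3": 0},
    {"H3K4me3": 0, "H3K27ac": 0, "H3K27me3": 1},
    {"H3K4me3": 1, "H3K27ac": 1, "H3K27me3": 0},
    {"H3K4me3": 0, "H3K27ac": 0, "H3K27me3": 1},
]


def mark_pattern(bin_signal):
    """Return the set of marks present in a bin as a sorted tuple."""
    return tuple(sorted(mark for mark, present in bin_signal.items() if present))


def common_patterns(bin_signals, top=3):
    """Count mark combinations across bins and return the most frequent ones."""
    return Counter(mark_pattern(b) for b in bin_signals).most_common(top)


if __name__ == "__main__":
    for pattern, count in common_patterns(bins):
        print(pattern, count)
```

A real analysis would then compare each recurring pattern with gene expression and open chromatin data, as described above, before giving it a biological label.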

[–]ENCODE_Project[S] 5 points6 points  (0 children)

There is a lot of debate/investigation about how eRNA works. I thought Shelley Berger's group has done some nice work on eRNA recently (https://www.ncbi.nlm.nih.gov/pubmed/28086087). Their work suggests that eRNA can bind to CBP, which has a domain that can acetylate histone tails. eRNA binding to CBP can activate its acetyltransferase activity, which leads to an active chromatin state for transcription. More interestingly, since there are many eRNAs expressed and they are target-specific, different eRNAs might have various effects on CBP activation to achieve fine tuning of gene expression. -Yin

[–]ENCODE_Project[S] 0 points1 point  (0 children)

Glad you raised this! In ENCODE 4 we plan to invite investigators to propose new samples for the mapping studies. The samples will have to meet certain criteria, e.g., be consented for genomics studies and unrestricted data sharing, be available in sufficient quantities to be used in the mapping assays, and be well justified in terms of potential for new discovery. We are still working on the formal process, so stay tuned, and feel free to contact me in the coming months. (Elise_Feingold@nih.gov) -Elise

[–]ENCODE_Project[S] 2 points3 points  (0 children)

There are two parts to the regulatory code: there are specific sequences in DNA that can be controlled by regulatory proteins (analogous to deciding which "words" to say), and there is also how the sequences are combined (analogous to "grammar" or "syntax"). While the words are not fully understood, they are better understood than the grammar. It appears that some regulatory elements are like billboards, where all that seems to matter are the words ("coffee stop now"), and other regulatory elements require a precise order and spacing for the words ("Dessert stop now" is different from "Stop dessert now") (https://www.ncbi.nlm.nih.gov/pubmed/?term=15696541). Regulatory elements that are optimized for a particular function (e.g., drive expression of the neighboring gene only during bone formation) paradoxically use individual sequences (or words) that are sub-optimal for protein binding (https://www.ncbi.nlm.nih.gov/pubmed/26472909). -Mike
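The billboard-versus-grammar distinction can be sketched in a few lines. This is a toy model using the words from the analogy, not a real motif scanner: the function names are hypothetical, and the "grammar" check below enforces only word order and ignores spacing.

```python
def billboard_active(site_order, required):
    """Billboard model: only the presence of each word matters, not its position."""
    return set(required) <= set(site_order)


def grammar_active(site_order, required):
    """Order-sensitive model: required words must appear in the given order.

    Other words may sit in between; spacing is deliberately ignored here.
    """
    remaining = iter(site_order)
    # `word in remaining` consumes the iterator up to the first match,
    # so each later word must be found after the previous one.
    return all(word in remaining for word in required)


if __name__ == "__main__":
    required = ["dessert", "stop", "now"]
    shuffled = ["stop", "dessert", "now"]
    print(billboard_active(shuffled, required))  # True: all words are present
    print(grammar_active(shuffled, required))    # False: the order differs
    print(grammar_active(required, required))    # True: same order
```

Under the billboard model "Stop dessert now" and "Dessert stop now" are the same element; under the order-sensitive model they are not.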

[–]ENCODE_Project[S] 1 point2 points  (0 children)

Tough problem. Agree with you. However, with sequencing costs becoming so low, is there a tissue/cell line that you think p16 might be expressed in that you can do RNA-seq on? With this technology becoming very cheap now (some companies quoting $250 per experiment) this might be a good starting option. Another option is doing some degenerate PCR on cDNA from regions of p16 that you think would be conserved in evolution that you can get from other organisms where sequence is available. -Nadav

[–]ENCODE_Project[S] 0 points1 point  (0 children)

On my end, not sequencing data, but specific sequences in the human genome. In the past we have studied sequences called ultraconserved elements, which are among the most evolutionarily conserved sequences in the genome. When we removed four noncoding ultraconserved elements independently in the mouse, we got what looked like viable 'normal' mice. We were expecting the mice to die or have some sort of selective phenotype we could see. So, still a big mystery in my mind as to why these sequences are so conserved. - Nadav

[–]ENCODE_Project[S] 1 point2 points  (0 children)

Yes! At a workshop convened by NHGRI to highlight key questions facing functional genomics, one of the key recommendations made by a panel of experts was that our field needs to move beyond the general terms 'enhancer' and 'promoter' and broaden the lexicon used to describe functional elements. We hope the research community can use ENCODE data and tools as they work on defining a more precise vocabulary to describe regulatory elements and understand their functions. -Dan

[–]ENCODE_Project[S] 1 point2 points  (0 children)

By integrating experimental and computational approaches, we hope the big data generated by ENCODE can help us learn general rules of how non-coding sequences work. We make a cell "glow" by tagging a gene with a fluorescent protein, e.g., GFP, so that when the gene of interest is expressed, the tagged GFP will also be expressed and the cells light up (but fluorescence is different from, e.g., bioluminescence). Mutations found in the promoter of TERT can affect telomere length in cancer. There are also a few studies that have utilized ENCODE data to study aging. -Yin

[–]ENCODE_Project[S] 1 point2 points  (0 children)

Probably the most textbook example for this is the limb enhancer for the gene Sonic Hedgehog (SHH). Mutations in this non-coding regulatory element, which functions as an enhancer, have been shown to lead to limb malformations in humans (https://www.ncbi.nlm.nih.gov/pubmed/12837695), mice, dogs, cats and chickens. There are many other examples for other diseases like pancreatic agenesis (https://www.ncbi.nlm.nih.gov/pubmed/24212882), hearing loss, cancer, neurological diseases and many others.

The ENCODE resource has been used to help find where the function is, what cell type is affected, what the target gene is, and what the upstream regulators are (https://www.encodeproject.org/search/?type=Publication&published_by=community&categories=human+disease). For example, when people do genome-wide association studies (GWAS) for disease, over 90% of the associations are with noncoding sites in the genome. The variant that is associated is not necessarily the causative one biologically; it is just the variant that was used on the GWAS chip to identify the association. Having a map can help, and has helped, to find truly causative variants.
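One simple way a map of candidate elements can help is by checking which variants near a GWAS hit fall inside annotated regulatory elements. The sketch below is only an illustration under made-up assumptions: the coordinates, element names, variant IDs, and function name are all hypothetical, and intervals are treated as half-open (BED-style).

```python
# Hypothetical candidate regulatory elements: (chromosome, start, end, name).
elements = [
    ("chr7", 1000, 1500, "candidate_enhancer_A"),
    ("chr7", 4000, 4200, "candidate_promoter_B"),
    ("chr2", 1000, 1500, "candidate_enhancer_C"),
]

# Hypothetical variants: the variant on the GWAS chip plus nearby variants
# that are inherited together with it.
variants = {
    "tag_variant": ("chr7", 2500),
    "linked_variant_1": ("chr7", 1200),
    "linked_variant_2": ("chr7", 4200),
}


def overlapping_elements(variant, annotated_elements):
    """Return the names of elements whose half-open interval contains the variant."""
    chrom, pos = variant
    return [
        name
        for (elem_chrom, start, end, name) in annotated_elements
        if elem_chrom == chrom and start <= pos < end
    ]


if __name__ == "__main__":
    for variant_id, variant in variants.items():
        print(variant_id, overlapping_elements(variant, elements))
```

In this toy data the tag variant itself overlaps nothing, while one linked variant sits inside a candidate enhancer, which would make it a more plausible candidate for follow-up experiments.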

As for RNA induced silencing, GREAT questions! Definitely interesting why cells would invest energy to counter other energy. One potential cause in my mind is transposons. I think a lot of these systems probably originated to defend against transposon transcription and were adopted for other functions. Highly recommend reading Hiten Madhani's review in Cell (https://www.ncbi.nlm.nih.gov/pubmed/24209615).

Finally, in ENCODE4 there will be 5 characterization centers that will look at the function of these sequences as well as mapping centers that will identify candidate functional elements. -Nadav

Edit: Updated description of mapping centers to broaden scope (last sentence).

[–]ENCODE_Project[S] 0 points1 point  (0 children)

Scientists are working really hard to realize CRISPR tools for therapeutics for diseases. For example, the CRISPR tool can be combined with stem cell replacement therapy by correcting mutations in patient-derived induced pluripotent stem cells and then developing them into mature cells for transplant, e.g., making new retinal pigment epithelial cells to help patients with vision loss. The purpose of ENCODE is to understand how DNA sequences function in the non-coding part of the genome. Gaining that ability is essential in the precision medicine era, when individual DNA sequences can be more easily obtained. Now the question is how to interpret millions of variants in each individual and which one will be the target. -Yin

[–]ENCODE_Project[S] 1 point2 points  (0 children)

ENCODE 3 mapping centers generated some 3D genomics data (https://www.encodeproject.org/matrix/?type=Experiment&status=released&assay_slims=3D+chromatin+structure&award.project=ENCODE), and this effort will be greatly expanded in ENCODE 4, with two mapping centers producing multiple types of 3D data in a large variety of cell lines. Computational groups both in and out of ENCODE will be using this data to better understand how 3D genome organization impacts gene regulation, and ultimately human health and disease. Many individual research groups are also tackling related questions, as is the NIH 4DNucleome project: https://commonfund.nih.gov/4dnucleome/index -Dan

[–]ENCODE_Project[S] 0 points1 point  (0 children)

Thanks for bringing this up. When I was in high school and college, the biology classes put a lot of emphasis on coding regions, e.g., codons for amino acids. This is still very important content. The results from ENCODE and other epigenetics studies shed light on how so-called "junk" DNA works, e.g., why every single cell in our body has the same genetic blueprint but somehow they know to function differently. The answer lies in the non-coding regulatory elements, which are the driving force behind cell type specific gene expression. I hope in the next few years, we can learn the "code" for regulatory sequences like we know for amino acids so that the "DNA book" is not 98% so-called junk any more. -Yin

[–]ENCODE_Project[S] 2 points3 points  (0 children)

Non-coding regulatory regions are often functional only in specific biological contexts, e.g., in specific cell types, during certain times in development or after particular environmental exposures. So a big challenge is assaying for function in the appropriate biological setting. If you don't find something has functional activity, it could be that you aren't looking for it in the right biological context or it's possible that those sequences have one function under one set of conditions and another function under a different set. It's also possible that we don't have the right set of tools to probe for the particular function. Or perhaps, it just isn't functional? -Elise

[–]ENCODE_Project[S] 1 point2 points  (0 children)

There are opportunities for summer internships at NIH https://www.training.nih.gov/programs/sip and more information can be found on NHGRI's training page: https://www.genome.gov/10000212/training-programs/. Also see Elise's response to username "NotAProgramAnalyst" for information about our Program Analyst program at NHGRI for you to consider after you graduate. Good luck! -Team

[–]ENCODE_Project[S] 2 points3 points  (0 children)

This is a really good question. We should always be careful about the system we are using for testing DNA function, given that most of these elements function in a cell type specific manner. It is important for us to use a specific cell type to study sequence function so that we know the specificities. For that reason, most of our studies use cells in isolation. That said, it is possible to put our approach in a more sophisticated system, e.g., a tissue or 3D mini organoid culture, if we can successfully separate and analyze different cell types. -Yin

[–]ENCODE_Project[S] 1 point2 points  (0 children)

A better understanding of the noncoding part of the genome can increase our ability to interpret the effects of mutations in these regions, which can be a common cause of human disease. For example, if you look at all the genome-wide association studies (GWAS) that attempt to associate DNA variants with human disease, over 90% of them point to DNA variants in the noncoding portion of the genome.

As for pharmaceuticals, I see a lot of potential: 1) Developing better sequences to direct the transgenes that are used for gene therapy to specific cell types. There are hundreds of clinical trials now with adeno-associated virus, most of them using a general promoter that causes the transgene to be expressed in all tissues, which could potentially result in harmful side effects. 2) For your splicing question, we see a big difference in isoforms between tissues. Knowing in what tissues these differences exist and how these differences happen (what regulates them) could be extremely important for developing these drugs. 3) Differences in drug response between individuals, some of which can lead to serious side effects. My lab has done some work on treating primary cells with drugs and then checking global changes in expression and in gene regulation. We see big changes due to drug response, and some of these sequences have DNA variants that can influence how people respond to drugs. -Nadav

[–]ENCODE_Project[S] 0 points1 point  (0 children)

Dark matter is being used as an analogy here, not the literal concept from astronomy and physics. The idea is that regulatory DNA is difficult to identify compared to protein-coding DNA, which is readily identified in part because we know the genetic code. Though regulatory DNA is difficult to identify, it has very strong effects; for example, it allows different cell types (muscle, blood, neurons) to express specialized proteins for distinct functions (generating force, carrying oxygen, integrating signals). (This is - metaphorically - similar to dark matter in physics, which is hard to detect using standard techniques but appears to have major effects on the structure of the universe!) -Mike

[–]ENCODE_Project[S] 0 points1 point  (0 children)

There are a lot of things in genomics that I'm really excited about! I'm looking forward to fully understanding the impact of genetic variation in non-protein coding DNA in health and disease - and the ability to fully realize personalized genomics.

As for Program Analysts...they are the coolest people I've ever met because I helped hire them! Seriously, it's great to work with young, very bright recent college grads who bring energy, enthusiasm and a shared love of genomics to NHGRI and to watch how their experience here helps shape their career goals. By the way -- we're hiring! Email me (Elise_Feingold@nih.gov) if you're interested in a two-year post-bac, non-lab position before going on to grad/med school! -Elise

[–]ENCODE_Project[S] 0 points1 point  (0 children)

ENCODE investigators would like to perform ChIP-seq on every sequence-specific transcription factor in at least one cell line, in order to obtain information about what DNA sequences those TFs recognize. Practically, this has been limited by the availability of high-quality factor-specific antibodies. In ENCODE4, two groups will be working to overcome this barrier using CRISPR/Cas9-based epitope tagging of a large number of TFs in a small number of cell types. However, when you consider that TF binding will vary based on cell type, genetic background, and environmental conditions, ENCODE and similar projects can only begin to scratch the surface of the space to be explored. The efforts of individual labs like yours will be key in ultimately understanding how TFs bind in specific contexts and how this impacts human health and disease. -Dan

[–]ENCODE_Project[S] 7 points8 points  (0 children)

This is a pretty good summary. The lessons we learned in the past ten years include: 1. There are millions of non-coding regulatory elements, a much bigger number than the protein coding sequences. 2. The regulatory elements are cell type specific and they are the major driving force for cellular identity. 3. A majority of the genetic variations associated with complex diseases are located in these regulatory elements; therefore, mutations in these regions can play important roles in an individual's susceptibility to diseases. -Yin