Since the completion of the Human Genome Project in 2003, research efforts have been aimed at analyzing the functions of various sequences in the genome, using both experimental and computational strategies. The June issue of Genome Research (www.genome.org) is devoted to The ENCODE (ENCyclopedia Of DNA Elements) Project, whose goal is to characterize all functional elements in the human genome. Genome Research's ENCODE issue includes 25 research papers, which report on the validation of the main results of the pilot project and are essential to the community as they scale up to cataloguing functional elements in the whole genome.
The issue also contains commentary and perspectives on how our views of the genome have changed as a result of the ENCODE investigations. The major findings span the areas of chromatin and replication, gene transcription and regulation, and evolutionary constraint, some of which are highlighted below. The entire issue will be freely available online on June 14 to coordinate with the ENCODE consortium publication in the journal Nature.
1. Pervasive transcription, fewer boundaries
Dr. Alexandre Reymond and colleagues conducted a series of experiments to annotate all 399 protein-coding genes in the ENCODE regions. In doing so, they found that more than half of the genes produced transcripts that contained sequences mapping outside of the known boundaries of these genes. Interestingly, these transcribed sequences often overlapped with other genes, were located a significant distance from the main portion of the coding sequence, and spanned large genomic segments.
"Our results modify our current understanding of the architecture and regulation of protein-coding genes," explains Dr. Reymond. "Furthermore, some sequence polymorphisms hitherto considered to be located in 'non-coding' regions may ultimately be related to disease."
Alexandre Reymond, Ph.D.
University of Lausanne, Switzerland
Denoeud, F. et al. 2007. Prominent use of 5' transcription start sites and discovery of a large number of additional exons in ENCODE regions. Genome Res. 17: 746-759. (doi:10.1101/gr.5660607)
2. More green lights
Promoters, or DNA sequences that bind proteins to initiate gene transcription, were the focus of a team of researchers led by Drs. Zhiping Weng and Richard Myers. Using an integrated computational and experimental approach, they estimated that at least 35% more promoters exist in the human genome than are currently annotated.
Interestingly, about one-fourth of the newly identified and validated promoters were located on the antisense strand of annotated transcripts, mostly in terminal exons. The authors speculate that these promoters may regulate transcription that occurs in the reverse direction (antisense) to protein-coding genes.
Future research will be necessary to determine whether these newly identified promoters are alternate transcription start sites of known genes, or whether they represent the first evidence of heretofore unidentified genes.
Zhiping Weng, Ph.D.
Richard M. Myers, Ph.D.
Stanford University School of Medicine
Trinklein, N.D. et al. 2007. Integrated analysis of experimental datasets reveals many novel promoters in 1% of the human genome. Genome Res. 17: 720-731. (doi:10.1101/gr.5716607)
3. Evolving notions of biological function
Dr. Elliott Margulies and colleagues sequenced the ENCODE regions in 23 mammalian species, aligned the sequences, and identified regions of evolutionary constraint (in other words, sequences that have changed little during evolutionary time). They used four different methods to align the sequences, and three different algorithms to assess constraint. The comparison among the different approaches, as well as the newly generated genomic data from the 23 mammalian species, will be valuable resources for the genomics community.
They also determined which evolutionarily constrained regions overlapped with ENCODE experimental annotations. "The signature of conservation was most apparent in protein-coding regions," explained Margulies. "In other regions, the situation was more complex, with different annotations showing different patterns of constraint."
"We were quite surprised by the fact that many experimental annotations had no evidence of mammalian sequence constraint," he said. Their manuscript describes several possible explanations for this low correlation, the most intriguing of which is that smaller portions of the annotated regions than expected are evolutionarily constrained.
Elliott H. Margulies, Ph.D.
National Human Genome Research Institute (NHGRI)
Margulies, E.H. et al. 2007. Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome. Genome Res. 17: 760-774. (doi:10.1101/gr.6034307)
4. Genome activity in black and white
By integrating ENCODE experimental data, Dr. John Stamatoyannopoulos and his colleagues developed and employed a computational approach to classify genomic regions as "active" or "repressed." They found that the pattern of active versus repressed domains was strikingly conserved between different cell types, and thus may be a universal feature of human genome architecture.
"For over four decades, we have speculated that human chromosomes are partitioned into discrete functional territories that either facilitate or inhibit gene activity," explains Stamatoyannopoulos. "But for the first time, the availability of the ENCODE data has allowed us to systematically evaluate this concept at high resolution and to produce the first human 'domain map.'"
The methodology developed in this study simplifies the interpretation of large amounts of genomic data and will be employed in future genome-wide analyses aimed at understanding the large-scale functional organization of complex genomes.
John A. Stamatoyannopoulos, M.D.
Depts. of Genome Sciences and Medicine
University of Washington, Seattle
Thurman, R.E. et al. 2007. Identification of higher-order functional domains in the human ENCODE regions. Genome Res. 17: 917-927. (doi:10.1101/gr.6081407)
5. Classified transcripts
The ENCODE Project produced an enormous amount of data on transcriptionally active regions (TARs). Because TARs are difficult to work with, Dr. Mark Gerstein and colleagues constructed the Database of Active Regions and Tools (DART; dart.gersteinlab.org), a Web resource for classifying, storing, manipulating, and visualizing TARs.
Using the DART classification system, the scientists categorized 6,988 unannotated TARs based on expression profiles, sequence composition, relatedness to similar sequences from other organisms, and genomic location. Of the new TARs identified, approximately 20% were produced from previously unidentified potential genes. In addition, many of the TARs associated with known genes were found to have the potential to form functional secondary structures.
Mark Gerstein, Ph.D.
Joel Rozowsky, Ph.D.
Rozowsky, J. et al. 2007. The DART classification of unannotated transcription within the ENCODE regions: associating transcription with known and novel loci. Genome Res. 17: 732-745. (doi:10.1101/gr.5696007)
Please direct requests for pre-print copies of the manuscripts to Peggy Calicchia, the Editorial Secretary for Genome Research (firstname.lastname@example.org; +1-516-422-4012). In addition to the five articles highlighted above, the following will also appear in the issue:
6. Weinstock, G.M. 2007. ENCODE: More genomic empowerment. Genome Res. 17: 667-668. (doi:10.1101/gr.6534207)
7. Gerstein, M. et al. 2007. What is a gene, post-ENCODE: A history culminating in an updated definition. Genome Res. 17: 669-681. (doi:10.1101/gr.6339607)
8. Gingeras, T.R. 2007. Origin of phenotypes: Genes and transcripts. Genome Res. 17: 682-690. (doi:10.1101/gr.6525007)
9. Koch, C.M. et al. 2007. The landscape of histone modifications across 1% of the human genome in five human cell lines. Genome Res. 17: 691-707. (doi:10.1101/gr.5704207)
10. Rada-Iglesias, A. et al. 2007. Butyrate mediates decrease of histone acetylation centered on transcription start sites and down-regulation of associated genes. Genome Res. 17: 708-719. (doi:10.1101/gr.5540007)
11. King, D.C. et al. 2007. Finding cis-regulatory elements using comparative genomics: Some lessons from ENCODE data. Genome Res. 17: 775-786. (doi:10.1101/gr.5592107)
12. Zhang, Z.D. et al. 2007. Statistical analysis of the genomic distribution and correlation of regulatory elements in the ENCODE regions. Genome Res. 17: 787-797. (doi:10.1101/gr.5573107)
13. Xi, H. et al. 2007. Analysis of overrepresented motifs in human core promoters reveals dual regulatory roles of YY1. Genome Res. 17: 798-806. (doi:10.1101/gr.5754707)
14. Jin, V.X. et al. 2007. Identification of an OCT4 and SRY regulatory module using integrated computational and experimental genomics approaches. Genome Res. 17: 807-817. (doi:10.1101/gr.6006107)
15. Lin, J.M. et al. 2007. Transcription factor binding and modified histones in human bidirectional promoters. Genome Res. 17: 818-827. (doi:10.1101/gr.5623407)
16. Ruan, Y. et al. 2007. Fusion transcripts and transcribed retrotransposed loci discovered through comprehensive transcriptome analysis using Paired-End diTags (PETs). Genome Res. 17: 828-838. (doi:10.1101/gr.6018607)
17. Zheng, D. et al. 2007. Pseudogenes in the ENCODE regions: Consensus annotation, analysis of transcription, and evolution. Genome Res. 17: 839-851. (doi:10.1101/gr.5586307)
18. Washietl, S. et al. 2007. Structured RNAs in the ENCODE selected regions of the human genome. Genome Res. 17: 852-864. (doi:10.1101/gr.5650707)
19. Karnani, N. et al. 2007. Pan-S replication patterns and chromosomal domains defined by genome-tiling arrays of ENCODE genomic areas. Genome Res. 17: 865-876. (doi:10.1101/gr.5427007)
20. Giresi, P.G. et al. 2007. FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements) isolates active regulatory elements from human chromatin. Genome Res. 17: 877-885. (doi:10.1101/gr.5533507)
21. Emanuelsson, O. et al. 2007. Assessing the performance of different high-density tiling microarray strategies for mapping transcribed regions of the human genome. Genome Res. 17: 886-897. (doi:10.1101/gr.5014607)
22. Euskirchen, G.M. et al. 2007. Mapping of transcription factor binding regions in mammalian cells by ChIP: Comparison of array- and sequencing-based technologies. Genome Res. 17: 898-909. (doi:10.1101/gr.5583007)
23. Bhinge, A.A. et al. 2007. Mapping the chromosomal targets of STAT1 by Sequence Tag Analysis of Genomic Enrichment (STAGE). Genome Res. 17: 910-916. (doi:10.1101/gr.5574907)
24. Dennis, J.H. et al. 2007. Independent and complementary methods for large-scale structural analysis of mammalian chromatin. Genome Res. 17: 928-939. (doi:10.1101/gr.5636607)
25. Greenbaum, J.A. et al. 2007. Detection of DNA structural motifs in functional genomic elements. Genome Res. 17: 940-946. (doi:10.1101/gr.5602807)
26. Greenbaum, J.A. et al. 2007. Construction of a genome-scale structural map at single-nucleotide resolution. Genome Res. 17: 947-953. (doi:10.1101/gr.6073107)
27. Elnitski, L.L. et al. 2007. The ENCODEdb portal: Simplified access to ENCODE Consortium data. Genome Res. 17: 954-959. (doi:10.1101/gr.5582207)
28. Blankenberg, D. et al. 2007. A framework for collaborative analysis of ENCODE data: Making large-scale analyses biologist-friendly. Genome Res. 17: 960-964. (doi:10.1101/gr.5578007)
About Genome Research
Genome Research (www.genome.org) is an international, monthly, peer-reviewed journal published by Cold Spring Harbor Laboratory Press. Launched in 1995, it is one of the five most highly cited primary research journals in genetics and genomics.
About Cold Spring Harbor Laboratory Press
Cold Spring Harbor Laboratory Press is an internationally renowned publisher of books, journals, and electronic media located on Long Island, New York. It is a division of Cold Spring Harbor Laboratory, an innovator in life science research and the education of scientists, students, and the public. For more information, visit www.cshlpress.com.