May 26, 2015, Shenzhen, China - Researchers from BGI have reported the most complete haplotype-resolved diploid genome (HDG) sequence to date, based on de novo assembly with NGS technology. The pipeline they developed lays the foundation for de novo assembly of genomes with high levels of heterozygosity. The study was published online today in Nature Biotechnology.
The human genome is diploid, and knowledge of the variants on each chromosome is important for the interpretation of genomic information. In this study, researchers presented the assembly of a haplotype-resolved diploid genome without using a reference genome. They developed a pipeline that combines a fosmid-pooling strategy with whole-genome shotgun strategies, based solely on next-generation sequencing (NGS) and hierarchical assembly methods.
In the study, researchers applied the pipeline to sequence the genome of an Asian individual (YH) and generated a 5.15 Gb assembled genome with a haplotype N50 of 484 kb. The analysis provided exhaustive variant information for a diploid genome, including intermediate-sized heterozygous indels (51-200 bp) and novel sequences and genes that were difficult or impossible to detect in previous studies, and revealed their impact on gene function.
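For readers unfamiliar with the N50 statistic quoted above: it is commonly defined as the length L such that blocks of length L or longer together contain at least half of all assembled bases. The short Python sketch below illustrates that standard definition only. The function name `n50` and the toy block lengths are illustrative assumptions; they are not data or code from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of block (or contig) lengths.

    Uses the common definition: walk the lengths from longest to
    shortest and return the first length at which the running total
    reaches at least half of the summed length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")

    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running total to avoid fractional halves.
        if running * 2 >= total:
            return length


# Hypothetical haplotype block lengths in kb (not from the study).
toy_blocks_kb = [100, 200, 300, 400, 500]
print(n50(toy_blocks_kb))
```

A larger N50 means that more of the assembly sits in long, contiguous blocks, which is why the figure is often reported as a measure of contiguity.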
Haplotype-resolved information for the human genome is essential for understanding the relationship between genotype and phenotype. This HDG represents the most complete de novo genome assembly to date. Together with other omics data resources available from this individual, the work can be used as a benchmark for developing new sequencing and assembly techniques, and for functional studies involving RNA or protein analysis.
Hongzhi Cao, Principal Investigator of this project at BGI, said, "Our study revealed the importance of comprehensive genome information in translating genotypes to phenotypes in personalized medicine. Moreover, the method reported here opens a door to assembling complex genomes with high heterozygosity and polyploidy."