Article Highlight | 27-Dec-2022

Chinese researchers build cell atlas using scattered single-cell datasets

Higher Education Press

Imagine a virtual human body, rich in complexity and detail, that enables scientists to simulate experiments that can’t be conducted in vivo or in vitro. A team of Chinese researchers has brought this vision closer to reality by developing a framework for seamless cell-centric data assembly and using it to build the human Ensemble Cell Atlas (hECA) from data collected from scattered public datasets.

They presented their unified informatics framework in a study published April 28 in iScience. In a comment published in Quantitative Biology on July 4, Zemin Zhang, a well-known bioinformatics scientist at Peking University, wrote that hECA has made a landmark contribution to integrating human single-cell data from multiple sources and performing downstream analysis.

“Case studies of the hECA demonstrated the revolution that such a cell-centric ensemble cell atlas can bring to biomedical research,” said study author Xuegong Zhang from Tsinghua University.

The rapid development of single-cell sequencing technologies, especially an RNA-sequencing method known as single-cell transcriptomics, has allowed scientists to profile individual cells and examine which genes are switched on in different types of cells.

Scientists around the world are engaged in building single-cell-resolution “atlases” of all the different cell types in projects such as the Human Cell Atlas (HCA) and the Human BioMolecular Atlas Program. But there is still some uncertainty about how a cell atlas should be defined and assembled.

“The key point of cell atlas assembly is the organization of cell information,” Zhang said.

Since the launch of the HCA project in 2017, many papers about cell atlases have been published, and most of them are collections of a large variety of single-cell data documented and indexed on a project-by-project basis. Previous studies argued that cell mapping is about creating a three-dimensional skeleton of the human body and simply assembling the observed cells into their corresponding positions. However, a human body is too complex for this type of assembly.

Instead, “the assembly of a cell atlas should convey the multifaceted nature of the data and allow users to search with customized conditions among different indexing methods,” Zhang said.

In the meantime, massive amounts of single-cell transcriptomic data are pouring into the public domain from multi-institutional collaborations, generating petabytes of data covering all major adult human organs as well as key developmental or pathological stages.

To Zhang’s team, these scattered public single-cell data suggested an alternative approach to building a cell atlas: start from the bottom up by assembling data from multiple sources.

To assemble data of this scale from multiple sources into an ensemble atlas, the researchers developed a unified informatics framework, which included a special database infrastructure for storing single-cell data with ultra-high dimensionality and volume, as well as a unified hierarchical annotation framework to make cell type labels from different datasets comparable and consistent. The researchers also designed an application programming interface to efficiently retrieve cells in the atlas.
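As a rough illustration of how a hierarchical annotation scheme can make labels from different datasets comparable, the minimal Python sketch below maps dataset-specific labels onto nodes of a small cell type tree and checks whether one label falls under another. The hierarchy, the synonym table, and the function names are all hypothetical and chosen for this example; they are not taken from hECA.

```python
# Hypothetical cell type hierarchy: child label -> parent label (None marks the root).
PARENT = {
    "cell": None,
    "lymphocyte": "cell",
    "T cell": "lymphocyte",
    "CD4 T cell": "T cell",
    "CD8 T cell": "T cell",
    "B cell": "lymphocyte",
}

# Hypothetical synonyms that map dataset-specific labels onto hierarchy nodes.
SYNONYMS = {
    "CD4+ T": "CD4 T cell",
    "T-cell": "T cell",
    "B lymphocyte": "B cell",
}


def normalize(label):
    """Translate a dataset-specific label into a node of the shared hierarchy."""
    name = SYNONYMS.get(label, label)
    if name not in PARENT:
        raise KeyError(f"unknown cell type label: {label!r}")
    return name


def lineage(label):
    """Return the path from a label up to the root of the hierarchy."""
    node = normalize(label)
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def is_a(label, ancestor):
    """True if `label` is the same as, or a descendant of, `ancestor`."""
    return normalize(ancestor) in lineage(label)
```

With a shared tree like this, a cell labeled "CD4+ T" in one dataset and a cell labeled "T-cell" in another can both be retrieved by a query for T cells, even though the original labels differ in wording and granularity.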

With these technologies, the team developed three new schemes for applying the assembled atlas. First, they enabled “in data” cell sorting, which selects cells from the virtual human body of assembled cells using flexible combinations of logic expressions. Second, they created a “quantitative portraiture” system for representing the complete information of genes, cell types, and organs. Third, they built a customizable reference creation scheme that lets users build their own references for cell type annotation tasks.
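The idea of selecting cells with combinations of logic expressions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Cell` record, the helper functions, and the toy expression values are invented for this example and do not reflect the hECA interface or its data.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    """A toy cell record: organ, cell type label, and gene expression values."""
    organ: str
    cell_type: str
    expression: dict = field(default_factory=dict)


def gene_above(gene, threshold):
    """Condition: expression of `gene` exceeds `threshold` (missing genes count as 0)."""
    return lambda cell: cell.expression.get(gene, 0.0) > threshold


def organ_is(name):
    """Condition: the cell comes from the named organ."""
    return lambda cell: cell.organ == name


def all_of(*conditions):
    """Logical AND over conditions."""
    return lambda cell: all(cond(cell) for cond in conditions)


def any_of(*conditions):
    """Logical OR over conditions."""
    return lambda cell: any(cond(cell) for cond in conditions)


def negate(condition):
    """Logical NOT of a condition."""
    return lambda cell: not condition(cell)


def sort_cells(cells, condition):
    """Return the cells that satisfy the combined condition."""
    return [cell for cell in cells if condition(cell)]


# Toy data with made-up expression values, for illustration only.
cells = [
    Cell("heart", "T cell", {"PTPRC": 2.1, "CD3E": 1.4}),
    Cell("heart", "cardiomyocyte", {"TNNT2": 5.0}),
    Cell("blood", "T cell", {"PTPRC": 1.8, "CD3E": 2.0}),
    Cell("lung", "B cell", {"PTPRC": 1.2, "CD79A": 3.3}),
]

# Select cells that express both PTPRC and CD3E and do not come from blood.
condition = all_of(
    gene_above("PTPRC", 0.5),
    gene_above("CD3E", 0.5),
    negate(organ_is("blood")),
)
selected = sort_cells(cells, condition)
```

Because each condition is an ordinary function, conditions on organs, cell types, and gene expression can be nested freely, which loosely mirrors how gates are combined in physical cell sorting.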

The researchers conducted a series of experiments to verify and illustrate the quality and usability of the assembled data in multiple application scenarios. Case examples included the investigation of drug off-targets — unintended biological consequences of a drug — throughout the whole body, which demonstrated the power of the ensemble cell atlas to open new possibilities in biomedical research.

According to the study, this type of “in data” cell sorting can reveal important organ-specific patterns and help scientists determine which organs are more susceptible to side effects of targeted drug therapy.

The researchers have developed strategies and technologies to integrate more high-quality data from other comprehensive datasets and will continue to improve and update future versions of the hECA.

This work was supported by the National Key R&D Program of China and the Tsinghua-Fuzhou Institute of Data Technologies.

Other contributors include Sijie Chen, Yanting Luo, Haoxiang Gao, Fanhong Li, Yixin Chen, Jiaqi Li, Minsheng Hao, Haiyang Bian, Xi Xi, Wenrui Li, Qiuchen Meng, Ziheng Zou, Chen Li, Yangyuan Zhang, Yanfei Cui, Lei Wei, Xiaowo Wang, Hairong Lv, Haochen Li, Kui Hua and Rui Jiang from Tsinghua University, and Renke You, Weiyu Li, Mingli Ye and Fufeng Chen from Fuzhou Institute of Data Technology. Hairong Lv is also affiliated with Fuzhou Institute of Data Technology.

###

About Higher Education Press

Founded in May 1954, Higher Education Press Limited Company (HEP), affiliated with the Ministry of Education, is one of the earliest institutions committed to educational publishing after the establishment of P. R. China in 1949. After striving for six decades, HEP has developed into a major comprehensive publisher, with products in various forms and at different levels. Both for import and export, HEP has been striving to fill in the gap of domestic and foreign markets and meet the demand of global customers by collaborating with more than 200 partners throughout the world and selling products and services in 32 languages globally. Now, HEP ranks among China's top publishers in terms of copyright export volume and the world's top 50 largest publishing enterprises in terms of comprehensive strength.

The Frontiers Journals series published by HEP includes 28 English academic journals, covering the largest academic fields in China at present. Among the series, 13 have been indexed by SCI, 6 by EI, 2 by MEDLINE, 1 by A&HCI. HEP's academic monographs have won about 300 different kinds of publishing funds and awards both at home and abroad.

About Quantitative Biology

Quantitative Biology is an interdisciplinary journal that focuses on original research that uses quantitative approaches and technologies to analyze and integrate biological systems, construct and model engineered life systems, and gain a deeper understanding of the life sciences. It aims to provide a platform for not only the analysis but also the integration and construction of biological systems. It is a quarterly journal seeking to provide an inter- and multi-disciplinary forum for a broad blend of peer-reviewed academic papers, in order to promote rapid communication among scientists.

Quantitative Biology is a prestigious and authoritative international journal, abstracted/indexed in: Emerging Sources Citation Index (ESCI), SCOPUS, EMBASE, SCImago, CSCD, etc.
