A new tool for categorising and organising single-cell data has been launched. It can be used to create harmonised datasets for studying human health and disease.
Researchers at the Wellcome Sanger Institute, the University of Cambridge, EMBL’s European Bioinformatics Institute (EMBL-EBI), and collaborators developed the tool, known as CellHint. CellHint uses machine learning to unify data produced across the world, allowing it to be accessed by the wider research community, potentially driving new discoveries.
In a new study, published today (21 December) in Cell, researchers applied CellHint to reveal underexplored connections between healthy and diseased lung cell states. They looked at eight diseases, including interstitial lung disease and chronic obstructive pulmonary disease, to demonstrate the potential benefits of the tool. They also applied CellHint to 12 tissues from 38 datasets, providing a deeply curated cross-tissue database of around 3.7 million cells.
CellHint is freely available worldwide and was created as part of the Human Cell Atlas initiative¹, which aims to map every cell type in the human body to transform understanding of health and disease.
Single-cell genomics allows researchers to study individual cells in the context of the whole human body at high resolution. However, assembling the diverse datasets produced by single-cell research is currently difficult because there is no unified system for naming and organising the cell types they describe.
To address this, researchers from the Wellcome Sanger Institute and collaborators developed CellHint, which can unify the cell types defined by independent laboratories. CellHint then organises these cell types into a graph that shows how cell types and subtypes from different datasets relate to one another, giving a full picture of all the cells identified across the datasets.
The team applied CellHint to existing data and revealed underexplored relationships between healthy and diseased lung cell states across eight diseases. It also identified cell types in the adult human hippocampus that may merit future research.
The researchers also applied CellHint to 12 tissues from 38 datasets, providing a deeply curated cross-tissue database of around 3.7 million cells. Each cell was annotated, meaning it was labelled with information such as its cell type. They also showed how this resource can be used to create models for automatic cell annotation across human tissues.
Dr Chuan Xu, first author from the Wellcome Sanger Institute, said: “CellHint stands out from other tools because it makes full use of the often inconsistent but valuable cell annotation information from individual studies, to achieve biologically-driven data integration. We are excited that with CellHint, cells from independent laboratories can be re-annotated and researchers can utilise the resulting information to put each cell into different contexts beyond the original study. We hope that this tool will greatly facilitate the reuse of molecular and cellular data and information across laboratories, potentially driving new discoveries in biology.”
Dr Sarah Teichmann, senior author from the Wellcome Sanger Institute and co-founder of the Human Cell Atlas, said: “The Human Cell Atlas is creating detailed reference maps of all cells in the human body to transform our understanding of biology, health and disease, and single-cell technologies underpin this hugely ambitious project. Global collaboration and open data sharing are vital to achieve the aim of a representative Human Cell Atlas that will benefit humanity worldwide. CellHint enables the unification and sharing of single-cell data, which allows the global research community to contribute to and benefit from the ongoing research that is happening around the world, and help drive advances in health and healthcare.”
ENDS
Notes to Editors:
Contact details:
Rachael Smith
Press Office
Wellcome Sanger Institute
Cambridge, CB10 1SA
Email: press.office@sanger.ac.uk
1. This study is part of the international Human Cell Atlas (HCA) consortium, which aims to map every cell type in the human body as a basis for both understanding human health and diagnosing, monitoring and treating disease. An open, scientist-led consortium, the HCA is a collaborative effort of researchers, institutes and funders worldwide, with more than 3,100 members from 99 countries. The HCA is likely to impact every aspect of biology and medicine, propelling translational discoveries and applications and ultimately leading to a new era of precision medicine. More information can be found at https://www.humancellatlas.org/
CellHint can be found at https://github.com/Teichlab/cellhint
Publication: C. Xu, M. Prete, S. Webb, et al. (2023) Automatic cell-type harmonization and integration across Human Cell Atlas datasets. Cell.
Funding: This research is part-funded by Wellcome and the Engineering and Physical Sciences Research Council (EPSRC).
Selected websites:
The Wellcome Sanger Institute
The Wellcome Sanger Institute is a world leader in genomics research. We apply and explore genomic technologies at scale to advance understanding of biology and improve health. Making discoveries not easily made elsewhere, our research delivers insights across health, disease, evolution and pathogen biology. We are open and collaborative; our data, results, tools, technologies and training are freely shared across the globe to advance science.
Funded by Wellcome, we have the freedom to think long-term and push the boundaries of genomics. We take on the challenges of applying our research to the real world, where we aim to bring benefit to people and society.
Find out more at www.sanger.ac.uk or follow us on Twitter, Instagram, Facebook, LinkedIn and on our Blog.
About Wellcome
Wellcome supports science to solve the urgent health challenges facing everyone. We support discovery research into life, health and wellbeing, and we’re taking on three worldwide health challenges: mental health, infectious disease and climate and health. https://wellcome.org/
Journal: Cell
Subject of Research: Cells
Article Title: Automatic cell-type harmonization and integration across Human Cell Atlas datasets
Article Publication Date: 21-Dec-2023