image: (A) Selected human single-cell transcriptome profiles from HCL and other public datasets were used to train and validate gPRINT. The human single-cell data encompass 159,302 cells from 26 tissues and 5 platforms. (B) For each cell, a neural network of that cell and its “gene print” was constructed for supervised learning of gPRINT using known cell labels from each tissue’s transcriptome atlas. (C) The performance of gPRINT was tested using both an internal human dataset and an external test dataset, which contains single-cell transcriptome data from multiple tissues. For the external human test dataset, gPRINT was validated at the levels of cell type, hybrid hierarchy type, and cell subtype.
Credit: X Yan R, Fan C, Gu S, Wang T, Yin Z, Chen X
gPRINT is a computational framework that integrates gene expression levels and chromosomal positional information to generate unique "gene prints" for annotating disease-specific cell subtypes (DSCSs). Inspired by speech recognition, this approach leverages spatial gene organization (e.g., co-regulated genes within nucleosomes) to reduce noise and improve resolution in heterogeneous datasets.
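The core idea can be illustrated with a minimal sketch. It assumes, for illustration only, that a gene print can be approximated as one cell's expression values laid out in chromosomal order, so that the profile reads like a one-dimensional signal. The function name `build_gene_print`, the gene names, and the coordinates are all hypothetical and are not taken from the gPRINT code or the paper.

```python
from typing import Dict, List, Tuple

# Chromosome ordering assumed for this illustration (autosomes, then X and Y).
CHROM_ORDER = [str(i) for i in range(1, 23)] + ["X", "Y"]


def build_gene_print(
    expression: Dict[str, float],
    positions: Dict[str, Tuple[str, int]],
) -> Tuple[List[str], List[float]]:
    """Order one cell's expression values by genomic position.

    `expression` maps gene name -> expression level for a single cell.
    `positions` maps gene name -> (chromosome, start coordinate).
    Genes with a known position but no measured expression are set to 0.
    """

    def sort_key(gene: str) -> Tuple[int, int]:
        chrom, start = positions[gene]
        return (CHROM_ORDER.index(chrom), start)

    ordered_genes = sorted(positions, key=sort_key)
    signal = [float(expression.get(gene, 0.0)) for gene in ordered_genes]
    return ordered_genes, signal


# Toy data: hypothetical genes and coordinates, not real annotations.
toy_positions = {
    "geneA": ("2", 5000),
    "geneB": ("1", 900),
    "geneC": ("1", 100),
    "geneD": ("X", 42),
}
toy_cell = {"geneA": 3.0, "geneC": 1.5, "geneD": 0.5}

genes, gene_print = build_gene_print(toy_cell, toy_positions)
print(genes)       # genes in chromosomal order
print(gene_print)  # expression values in that same order
```

In a sketch like this, the ordered vector is what a downstream classifier would consume; how gPRINT actually encodes position and trains its network is described in the paper, not here.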
Targeted benchmarking against marker-based (SingleR) and clustering-based (Seurat) methods showed gPRINT’s superiority in resolving ambiguities within mixed-cell populations (e.g., tumor-stroma interfaces). In tendinopathy, gPRINT identified novel chondrogenic tendon cells marked by SOX9/COL2A1 co-expression, a population undetectable by conventional methods. Cross-species alignment further validated conserved fibroblast subtypes driving fibrotic cascades in human, mouse, and primate models.
Further comparative analyses highlighted the generalizability of the "gene print" approach across diverse tissue types and disease models. These findings establish gPRINT as a powerful tool for single-cell data integration and subtype annotation, providing a unified platform for decoding cellular heterogeneity in human diseases.
Key findings from the study include:
1. Gene Print Framework for Cross-Dataset Annotation: The gPRINT algorithm integrates gene expression and chromosomal positional information to generate unique "gene prints," enabling platform-agnostic identification of disease-specific cell subtypes (DSCSs). Validated across 1.2 million cells, gPRINT achieved 98.37% cross-platform accuracy, outperforming traditional methods (SingleR, Seurat) in resolving ambiguous populations (e.g., tumor-stroma interfaces) and identifying novel subtypes like SOX9/COL2A1-expressing chondrogenic tendon cells in tendinopathy.
2. Mechanistic Link to 3D Genome Architecture: Hi-C data confirmed that gene prints reflect spatial co-localization of signature genes (e.g., COL1A1/ACTA2 clusters on chromosome 7) in DSCSs. Disrupting chromosomal topology (e.g., CTCF anchor deletions) reduced annotation accuracy by 63%, while CRISPR-mediated enhancer deletions abolished subtype-specific pathways (e.g., TGF-β signaling).
3. Therapeutic Discovery and Universal Utility: gPRINT prioritized drug candidates (e.g., ascorbic acid, celastrol) via CMAP database integration and revealed conserved fibrotic networks across species (human/mouse/primate). Its application to TendonBase, a multi-omics database, established a universal framework for decoding cellular heterogeneity in fibrosis, cancer, and degenerative diseases.
This study established gPRINT, a computational framework that unifies cell subtype annotation across single-cell datasets by integrating gene expression and chromosomal spatial organization into unique "gene prints." Validated on 1.2 million cells, gPRINT achieved 98.37% cross-platform accuracy, identifying novel pathological subtypes and linking their gene prints to 3D chromatin architecture via Hi-C. Disrupting chromosomal topology reduced annotation accuracy by 63%, while drug-database integration prioritized candidates like ascorbic acid for fibrosis. The work, entitled “Gene print-based cell subtypes annotation of human disease across heterogeneous datasets with gPRINT,” was published in Protein & Cell on Mar. 14, 2025.
DOI: 10.1093/procel/pwaf001
Journal: Protein & Cell
Method of Research: Experimental study
Subject of Research: People
Article Title: Gene print-based cell subtypes annotation of human disease across heterogeneous datasets with gPRINT
Article Publication Date: 14-Mar-2025