image: Framework of the proposed model SIKE for source-incremental missing data recovery
Credit: HIGHER EDUCATION PRESS
Heterogeneous graphs organize data as nodes and edges and are widely used in graph-centric applications. Because these graphs are often built manually, some data are omitted during construction, which leaves the graphs incomplete and degrades performance on downstream tasks. Existing methods recover missing data using only the data within a single graph. They overlook the fact that graphs from different sources share some common nodes because their scopes overlap.
To address this problem, a research team led by Wei Hu published new research on 15 December 2025 in Frontiers of Computer Science, co-published by Higher Education Press and Springer Nature. The team focused on recovering missing data in multi-source heterogeneous graphs under an incremental scenario, in which graphs from different sources appear one after another. They designed a novel framework that recovers missing data by fusing complementary data from previously seen graphs.
In the research, the team proposes a model named SIKE, which is built on a pre-trained language model with graph-specific adapters. To exploit the complementary data in multi-source graphs, the team further designs an embedding-based data fusion method that aggregates data across graphs.
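The release does not describe how SIKE's fusion works internally. As a rough illustration of the general idea of embedding-based fusion only, the sketch below assumes that each source graph already provides a vector for each of its nodes and that shared nodes carry the same identifier in every graph. It then fuses the vectors of shared nodes by simple averaging. The function name `fuse_embeddings` and the averaging rule are hypothetical choices made for this example and are not the authors' method.

```python
from typing import Dict, List

Vector = List[float]


def fuse_embeddings(graph_embeddings: List[Dict[str, Vector]]) -> Dict[str, Vector]:
    """Hypothetical fusion: element-wise mean of a node's vectors across graphs.

    Assumes node identifiers are already aligned across sources. Nodes that
    appear in only one graph keep their original vector.
    """
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = {}
    for emb in graph_embeddings:
        for node, vec in emb.items():
            if node not in sums:
                # First time we see this node: copy its vector.
                sums[node] = list(vec)
                counts[node] = 1
            else:
                if len(vec) != len(sums[node]):
                    raise ValueError(f"dimension mismatch for node {node!r}")
                # Shared node: accumulate its vector from another source.
                sums[node] = [a + b for a, b in zip(sums[node], vec)]
                counts[node] += 1
    # Divide each accumulated vector by the number of graphs containing the node.
    return {node: [x / counts[node] for x in vec] for node, vec in sums.items()}


# Two toy source graphs that share the node "Paris".
g1 = {"Paris": [1.0, 0.0], "France": [0.0, 1.0]}
g2 = {"Paris": [3.0, 2.0], "Berlin": [1.0, 1.0]}
fused = fuse_embeddings([g1, g2])
```

In this toy run, the shared node "Paris" receives `[2.0, 1.0]`, while "France" and "Berlin" keep their single-source vectors. In this simplified picture, a graph that lacks information about a shared node can borrow what an earlier graph contributed.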
For evaluation, the team built two new datasets, DWY15K and CFW, from real-world heterogeneous graphs. Experimental results on both datasets show that the proposed model outperforms the compared methods. Compared with the most competitive baseline, EWC, SIKE improves MRR (mean reciprocal rank) by 7.79% on DWY15K and by 10.25% on CFW. These results demonstrate the effectiveness of the proposed model and shed light on multi-source data fusion for data governance.
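MRR is a standard ranking metric. For each test query, a model ranks candidate answers, and the metric averages the reciprocal of the rank at which the correct answer appears. The minimal sketch below uses hypothetical helper names and toy scores. It ranks candidates by score and resolves ties optimistically, so only strictly higher scores push the answer down. The paper's exact evaluation protocol, such as tie handling or candidate filtering, is not specified in the release.

```python
from typing import Dict, List


def rank_of_answer(scores: Dict[str, float], answer: str) -> int:
    """1-based rank of `answer`; higher score is better, ties resolved optimistically."""
    target = scores[answer]
    return 1 + sum(1 for s in scores.values() if s > target)


def mean_reciprocal_rank(ranks: List[int]) -> float:
    """Average of 1/rank over all test queries."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1.0 / r for r in ranks) / len(ranks)


# Toy queries: candidate scores plus the correct answer for each query.
queries = [
    ({"a": 0.9, "b": 0.5, "c": 0.1}, "a"),  # correct answer ranked 1st
    ({"a": 0.2, "b": 0.7, "c": 0.4}, "c"),  # correct answer ranked 2nd
    ({"a": 0.3, "b": 0.6, "c": 0.8}, "a"),  # correct answer ranked 3rd
]
ranks = [rank_of_answer(scores, answer) for scores, answer in queries]
mrr = mean_reciprocal_rank(ranks)  # (1 + 1/2 + 1/3) / 3 = 11/18
```

A higher MRR means that correct answers tend to appear closer to the top of the ranking, with a maximum value of 1.0.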
Journal
Frontiers of Computer Science
Method of Research
Experimental study
Subject of Research
Not applicable
Article Title
Missing data recovery for heterogeneous graphs with incremental multi-source data fusion
Article Publication Date
15-Dec-2025