News Release

Accelerated streaming subgraph matching framework is faster, more robust, and scalable

Innovative framework offers better caching and performance for high-volume queries

Peer-Reviewed Publication

Intelligent Computing

Framework Overview

image: 

The new method works in two phases: offline and online. In the offline phase, data graph embeddings are created and stored in a vector database with an index for fast search. In the online phase, streaming query graphs are embedded and matched through k-nearest neighbor (KNN) search. The inverted file (IVF) and product quantization (PQ) indexing methods speed up similarity search and improve scalability.

view more 

Credit: Liuyi Chen et al.

Graphs are widely used to represent complex relationships in everyday applications such as social networks, bioinformatics, and recommendation systems, where they model how people or things (nodes) are connected through interactions (edges). Subgraph matching—the task of finding a smaller pattern, or query subgraph, within a larger graph—is crucial for detecting fraud, recognizing patterns, and performing semantic searches. However, current research on streaming subgraph, a similar task where timing is important, matching faces major challenges in scalability and latency, including difficulties in handling large graphs, low cache efficiency, limited query result reuse, and slow indexing performance. To address these issues, Liuyi Chen et al. presented a new framework that leverages a subgraph index based on graph embeddings, enabling effective caching and reuse of query results while demonstrating robustness and consistency across varying batch sizes and datasets. Their work was published in Intelligent Computing, a Science Partner Journal, under the title “Accelerating Streaming Subgraph Matching via Vector Databases”.

The results show that the new method reduces processing time by an average of 87.7% compared to the state-of-the-art approach, which is the baseline “efficient streaming” method by Duong CT et al., and achieves cache hit rates between 70% and 90%. The framework demonstrated linear scalability, with search times increasing from 40 ms to 176 ms as batch sizes grew from 5,000 to 20,000. This shows that the new method can handle high query loads efficiently. The authors conducted experiments on six real-world datasets: WordNet, Human, Yeast, CiteSeer, Cora, and PubMed. These datasets are in the areas of semantic graphs, biological networks, and social network graphs, confirming the new framework’s suitability for real-world applications.

Streaming subgraph matching requires performing the static subgraph matching task rapidly and repeatedly, as graph queries arrive in a stream. Previous attempts to adapt the static task to a dynamic environment include the “efficient streaming” method. The core change in the new method is the use of indexing of graph embeddings stored in a vector database. Specifically, the database uses inverted file and product quantization indexing to speed up the return of the k-nearest neighbors’ algorithm.

The authors showcase improvements in efficiency and system robustness by selecting evaluation metrics like processing time, cache hit rates, and scalability, and robustness. The authors also used varied batch sizes ranging from 1,000 to 20,000 queries per batch to demonstrate the scalability of their new method. Additionally, the authors adjusted the number of vertices and edges of the input graph and evaluated the impact of periodically updating embeddings to mimic the varied complexity of graph tasks in the real world.

As streaming subgraph matching is increasingly used in fraud detection, cybersecurity, network traffic and user behavior analysis, dynamic social and communication analysis, and bioinformatics and medical research, advancements in its technologies bring more possibilities. In the future, the authors may improve their framework to further decrease processing times and increase its real-world robustness.


Disclaimer: AAAS and EurekAlert! are not responsible for the accuracy of news releases posted to EurekAlert! by contributing institutions or for the use of any information through the EurekAlert system.