image: BingoGCN employs cross-partition message quantization to summarize inter-partition message flow, eliminating the need for irregular off-chip memory access, and uses a fine-grained structured training algorithm based on strong lottery ticket (SLT) theory to improve computational efficiency.
Credit: Institute of Science Tokyo, Japan
BingoGCN, a scalable and efficient graph neural network accelerator that enables real-time inference on large-scale graphs through graph partitioning, has been developed by researchers at Institute of Science Tokyo, Japan. The framework combines an innovative cross-partition message quantization technique with a novel training algorithm to significantly reduce memory demands and increase computational and energy efficiency.
Graph neural networks (GNNs) are powerful artificial intelligence (AI) models designed for analyzing complex, unstructured graph data, in which entities are represented as nodes and the relationships between them as edges. GNNs have been successfully employed in many real-world applications, including social networks, drug discovery, autonomous driving, and recommendation systems. Despite their potential, achieving real-time, large-scale GNN inference, critical for tasks such as autonomous driving, remains challenging.
Large graphs require extensive memory and often overflow on-chip buffers, the memory regions integrated into a chip. This forces the system to fall back on slower off-chip memory, and because graph data is stored irregularly, it produces irregular memory access patterns that degrade computational efficiency and increase energy consumption. One promising solution is graph partitioning, in which a large graph is divided into smaller subgraphs, each assigned its own on-chip buffer. Memory accesses become more localized, and the required buffer size shrinks as the number of partitions increases. However, this approach is only partially effective: as the number of partitions grows, the number of inter-partition edges, the links connecting nodes in different partitions, grows substantially. Communicating across these edges requires increased off-chip memory access, limiting scalability.
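To make this limitation concrete, the following minimal Python sketch (our illustration, not the authors' code) assigns the nodes of a random graph to k partitions and counts the edges that cross partition boundaries. A random assignment overstates the cut relative to a real partitioner such as METIS, but the trend is the same: more partitions mean more boundary edges.

```python
import random

import networkx as nx


def edge_cut(graph: nx.Graph, num_parts: int, seed: int = 0) -> int:
    """Randomly assign nodes to partitions and count cross-partition edges."""
    rng = random.Random(seed)
    part = {v: rng.randrange(num_parts) for v in graph.nodes}
    return sum(1 for u, v in graph.edges if part[u] != part[v])


g = nx.gnm_random_graph(n=1_000, m=5_000, seed=42)
for k in (2, 4, 8, 16, 32):
    print(f"{k:>2} partitions -> {edge_cut(g, k)} inter-partition edges")
```

Each crossing edge is a message that cannot be served from the partition's own on-chip buffer, which is exactly the traffic CMQ is designed to eliminate.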
To address this issue, a research team led by Associate Professor Daichi Fujiki from Institute of Science Tokyo, Japan, developed a novel, scalable, and efficient GNN accelerator called BingoGCN. “BingoGCN employs a new technique called cross-partition message quantization (CMQ) that summarizes inter-partition message flow, eliminating irregular off-chip memory access, and a new training algorithm that significantly boosts computational efficiency,” explains Fujiki. Their findings will be presented at the 52nd Annual International Symposium on Computer Architecture (ISCA ’25), held June 21–25, 2025, and published in the conference proceedings.
CMQ uses a technique called vector quantization, which clusters inter-partition nodes and represents each cluster with a point called a centroid. Nodes are clustered by distance, with each node assigned to its nearest centroid. For a given partition, these centroids stand in for the remote inter-partition nodes, effectively compressing their data. The centroids are stored in tables called codebooks, which reside directly in the on-chip buffer, so inter-partition communication no longer requires irregular and costly off-chip memory access. Because quantization involves frequent reads and writes of nodes and centroids, CMQ organizes its codebooks in a hierarchical, tree-like structure of parent and child centroids, reducing computational demands while maintaining accuracy.
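The clustering step described above is, in essence, vector quantization. The sketch below, a simplified illustration assuming plain k-means over boundary-node features (the function names and sizes are our own, and it omits BingoGCN's hierarchical codebooks), shows how a small codebook of centroids can stand in for thousands of remote node features.

```python
import numpy as np


def build_codebook(features, num_centroids, iters=10, seed=0):
    """K-means vector quantization: returns (codebook, assignments)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from randomly chosen boundary-node features.
    codebook = features[rng.choice(len(features), num_centroids, replace=False)]
    for _ in range(iters):
        # Squared Euclidean distance from every node to every centroid,
        # computed without materializing a (nodes, centroids, dim) tensor.
        d2 = ((features ** 2).sum(1)[:, None]
              - 2.0 * features @ codebook.T
              + (codebook ** 2).sum(1)[None, :])
        assign = d2.argmin(axis=1)          # nearest centroid per node
        for c in range(num_centroids):      # move centroids to cluster means
            members = features[assign == c]
            if len(members):
                codebook[c] = members.mean(axis=0)
    return codebook, assign


# 10,000 remote (inter-partition) node features, 64-dimensional.
feats = np.random.default_rng(1).normal(size=(10_000, 64)).astype(np.float32)
codebook, assign = build_codebook(feats, num_centroids=256)
# A partition now keeps a 256-entry codebook on chip and reconstructs
# remote node features from it, instead of fetching all 10,000 off chip.
approx = codebook[assign]
print(codebook.shape, approx.shape)  # (256, 64) (10000, 64)
```

In the hierarchical variant the team describes, a node would first be matched against a small set of parent centroids and then only against the children of the winning parent, reducing the number of distance computations per node while keeping the approximation accurate.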
While CMQ solves the memory bottleneck, it shifts the burden to computation. To counter this, the researchers developed a novel training algorithm based on strong lottery ticket theory. In this method, the GNN is initialized with random weights generated on-chip by random number generators. Unnecessary weights are then pruned with a mask, yielding a smaller, sparse sub-network that matches the accuracy of the full GNN but is significantly cheaper to compute. Further, the method incorporates fine-grained (FG) structured pruning, which uses multiple masks with different levels of sparsity to construct an even smaller and more efficient sub-network.
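The following sketch illustrates the strong-lottery-ticket idea under our own simplifying assumptions (a single per-block top-k mask rather than the paper's exact FG structured scheme; all names and the scoring rule are hypothetical): weights are regenerated from a fixed seed instead of being stored, and only a learned score mask decides which of them are used.

```python
import numpy as np


def slt_layer(x, shape, sparsity, scores, seed=0, block=4):
    """One linear layer under the strong-lottery-ticket scheme.

    Weights are never stored or updated: they are regenerated from `seed`
    on the fly (in hardware, by an on-chip random number generator).
    Training learns only `scores`, from which a binary keep/prune mask
    is derived.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(shape).astype(np.float32)  # fixed random weights
    # Fine-grained structured mask: within every `block`-sized group of
    # weights, keep the top (1 - sparsity) fraction by learned score.
    # Assumes shape[0] * shape[1] is divisible by `block`.
    groups = scores.reshape(-1, block)
    keep = max(1, int(round(block * (1.0 - sparsity))))
    thresh = np.sort(groups, axis=1)[:, -keep][:, None]
    mask = (groups >= thresh).reshape(shape).astype(np.float32)
    return x @ (w * mask)


in_dim, out_dim = 64, 32
scores = np.random.default_rng(1).random(in_dim * out_dim)  # learned in training
x = np.random.default_rng(2).standard_normal((8, in_dim)).astype(np.float32)
y = slt_layer(x, (in_dim, out_dim), sparsity=0.75, scores=scores)
print(y.shape)  # (8, 32), computed with ~25% of the random weights
```

Because the dense weights never leave the chip (they are reproduced by the generator) and most of them are masked out, memory traffic and arithmetic shrink together, which is roughly what allows the accelerator to tolerate finely partitioned graphs.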
“Through these techniques, BingoGCN achieves high-performance GNN inference even on finely partitioned graph data, which was previously considered difficult,” remarks Fujiki. “Our hardware implementation, tested on seven real-world datasets, achieves up to a 65-fold speedup and up to a 107-fold increase in energy efficiency compared with the state-of-the-art accelerator FlowGNN.”
This breakthrough opens the door to real-time processing of large-scale graph data, paving the way for diverse real-world applications of GNNs.
***
About Institute of Science Tokyo (Science Tokyo)
Institute of Science Tokyo (Science Tokyo) was established on October 1, 2024, following the merger between Tokyo Medical and Dental University (TMDU) and Tokyo Institute of Technology (Tokyo Tech), with the mission of “Advancing science and human wellbeing to create value for and with society.”
Method of Research
Experimental study
Subject of Research
Not applicable
Article Title
BingoGCN: Towards Scalable and Efficient GNN Acceleration with Fine-Grained Partitioning and SLT
Article Publication Date
20-Jun-2025
COI Statement
The authors have no conflict of interest to declare.