WA-GNN: accelerating graph neural networks with Tensor Core optimization
Authors
Liu, Yang
Abstract
Graph Neural Networks (GNNs) have been widely applied in various domains, such as social
network classification, biological prediction, and financial fraud detection,
offering effective solutions for non-Euclidean problems. A typical GNN consists of two major
phases: combination and aggregation. In the combination phase, the original feature vectors
are processed by a deep neural network with learnable weights, typically a multi-layer
perceptron (MLP), to generate new embeddings. This phase can efficiently utilize Tensor
Cores, specialized matrix computation units in modern Graphics Processing Units (GPUs)
optimized for high-throughput computation. In contrast, the aggregation phase collects feature
data from neighbouring nodes based on the sparse adjacency structure, leading to irregular data
access that significantly limits Tensor Core utilization. Consequently, the overall performance
of GNNs on GPUs is primarily constrained by the inefficient aggregation phase, where sparse
computation patterns hinder hardware utilization.
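
To make the phase split concrete, the following is a minimal NumPy/SciPy sketch of a single GNN layer; it is not code from the thesis, and all variable names (X, W, A, H, Z) are illustrative.

import numpy as np
import scipy.sparse as sp

num_nodes, in_dim, out_dim = 1000, 64, 32
X = np.random.rand(num_nodes, in_dim).astype(np.float32)   # node features
W = np.random.rand(in_dim, out_dim).astype(np.float32)     # learnable MLP weights
A = sp.random(num_nodes, num_nodes, density=0.01,
              format="csr", dtype=np.float32)               # sparse adjacency matrix

# Combination phase: a dense GEMM, the kind of regular computation that maps
# directly onto Tensor Cores on a modern GPU.
H = X @ W

# Aggregation phase: a sparse-matrix times dense-matrix product (SpMM). The CSR
# row-pointer / column-index structure forces irregular, data-dependent memory
# accesses, which is what limits Tensor Core utilization in this phase.
Z = A @ H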
To address this challenge, we propose WA-GNN (Warp-Specialization Accelerated GNNs),
a framework designed to fully exploit Tensor Core capabilities for
GNN inference. Our approach introduces the K-Concat data format to reorganize the adjacency
matrix into a Tensor Core-friendly layout. A warp specialization mechanism is designed to
optimize data loading and computation, while a C-allocation strategy is employed to assign
warp workloads. These techniques are integrated into three representative GNN models:
the Graph Convolutional Network (GCN), the Graph Isomorphism Network (GIN), and the Graph
Attention Network (GAT), each implemented with a customized kernel.
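
As a purely illustrative sketch of the general idea (the actual K-Concat layout, warp specialization, and C-allocation are defined in the thesis itself), one common way to make a sparse adjacency Tensor Core-friendly is to pad each node's neighbour list to a fixed width K and concatenate the rows into dense index/value tiles, so aggregation becomes regular, batched work. The function name pad_neighbours and the parameter K below are hypothetical.

import numpy as np
import scipy.sparse as sp

def pad_neighbours(A_csr, K, pad_index=0):
    """Return (num_nodes, K) arrays of neighbour indices and edge values,
    padded with zero-valued entries so every row has exactly K slots."""
    n = A_csr.shape[0]
    idx = np.full((n, K), pad_index, dtype=np.int64)
    val = np.zeros((n, K), dtype=np.float32)
    for row in range(n):
        start, end = A_csr.indptr[row], A_csr.indptr[row + 1]
        cols = A_csr.indices[start:end][:K]          # truncate rows longer than K
        idx[row, :len(cols)] = cols
        val[row, :len(cols)] = A_csr.data[start:start + len(cols)]
    return idx, val

A = sp.random(8, 8, density=0.3, format="csr", dtype=np.float32)
H = np.random.rand(8, 4).astype(np.float32)          # embeddings after combination
idx, val = pad_neighbours(A, K=4)

# With a fixed width K, aggregation becomes a gather followed by a dense, regular
# reduction -- the shape of work that Tensor Core tiles and specialized
# loader/compute warps can consume without per-row branching.
Z = (H[idx] * val[:, :, None]).sum(axis=1)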
Experimental results on multiple benchmark datasets demonstrate that WA-GNN achieves
an average 2× end-to-end speedup over the baselines across datasets for the GCN model,
with the performance gap widening as the dataset size increases. For the GIN model, WA-GNN
delivers performance comparable to the Deep Graph Library (DGL) on the H100 GPU. For
the GAT model, WA-GNN achieves an average 3× speedup across datasets. These results
demonstrate WA-GNN's effectiveness in leveraging Tensor Cores for GNN workloads and
similar sparse-matrix operations.
Description
This thesis is embargoed until December 12, 2026.
