Open Access · Posted Content
Program Classification Using Gated Graph Attention Neural Network for Online Programming Service
TL;DR: A Graph Neural Network (GNN) based model is proposed that integrates data flow and function call information into the AST and applies an improved GNN model to the integrated graph, achieving state-of-the-art program classification accuracy.
Abstract: Online programming services such as GitHub, TopCoder, and EduCoder have promoted many social interactions among their users. However, these interactions remain limited and inefficient because source-code repositories are growing rapidly and are difficult to explore manually. Source-code mining offers a promising way to analyze these source codes so that they are easier to understand and share among users. Among all source-code mining tasks, program classification lays a foundation for various tasks related to source-code understanding, because a machine cannot understand a computer program if it cannot classify the program correctly. Although numerous machine learning models, such as Natural Language Processing (NLP) based models and Abstract Syntax Tree (AST) based models, have been proposed to classify computer programs from their source code, existing works cannot fully characterize source code in terms of both syntactic and semantic information. To address this problem, we propose a Graph Neural Network (GNN) based model that integrates data flow and function call information into the AST and applies an improved GNN model to the integrated graph, achieving state-of-the-art program classification accuracy. Experimental results show that the proposed model classifies programs with an accuracy above 97%.
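The abstract does not spell out the "improved GNN", so the following is only one plausible reading of a gated graph attention layer, written in dependency-free Python for readability. It assumes GAT-style attention scores over each node's incoming neighbors plus a self-loop, and a sigmoid update gate that blends each node's previous state with its attention-weighted message. The function `gated_graph_attention_layer` and the parameters `W`, `a`, and `U` are hypothetical, not taken from the paper.

```python
import math


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_graph_attention_layer(h, edges, W, a, U):
    """One hypothetical gated graph-attention step (not the paper's exact layer).

    h:     node features, n rows of length d
    edges: directed (src, dst) pairs; messages flow src -> dst
    W:     d x d projection matrix
    a:     attention vector of length 2d
    U:     d x 2d gate matrix applied to [h_i || m_i]
    Returns (new_h, attention), where attention[i] lists (neighbor, weight) pairs.
    """
    n = len(h)
    z = [matvec(W, h_i) for h_i in h]          # projected features
    incoming = {i: [i] for i in range(n)}      # every node attends to itself
    for src, dst in edges:
        incoming[dst].append(src)

    new_h, attention = [], []
    for i in range(n):
        nbrs = incoming[i]
        # GAT-style scores e_ij = LeakyReLU(a . [z_i || z_j]), softmax over neighbors
        scores = [leaky_relu(sum(w * x for w, x in zip(a, z[i] + z[j]))) for j in nbrs]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(exps)
        alpha = [e / total for e in exps]
        # Attention-weighted message m_i = sum_j alpha_ij * z_j
        m_i = [sum(al * z[j][k] for al, j in zip(alpha, nbrs)) for k in range(len(z[i]))]

        # Update gate g_i = sigmoid(U [h_i || m_i]) blends old state and new message
        g = [sigmoid(x) for x in matvec(U, h[i] + m_i)]
        new_h.append([(1 - gk) * hk + gk * math.tanh(mk) for gk, hk, mk in zip(g, h[i], m_i)])
        attention.append(list(zip(nbrs, alpha)))
    return new_h, attention
```

Because the gate blends old and new states elementwise, the layer keeps the feature dimension fixed, so several layers can be stacked. For program classification, a graph-level readout (for instance, mean pooling of the final node states) followed by a classifier would still be needed; this sketch omits both.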
Figures

Fig. 1: An example illustrating the drawback of the NLP-based method.
Fig. 2: The function called by the code in Figure 1. 
TABLE I: Data set statistics 
TABLE II: Program classification accuracy of different models on similar programming tasks.
Fig. 12: Loss values of the four models over training iterations.
Fig. 13: A t-SNE plot of the learned node representations, where different node colors denote different clusters.
Citations
HELoC: Hierarchical Contrastive Learning of Source Code Representation
Xiao Wang, Qiong Wu, Hongyu Zhang, Chen Lyu, Xue Jiang, Zhuoran Zheng, Lei Lyu, Songlin Hu (27 Mar 2022)
TL;DR: HELoC is a hierarchical contrastive learning model for source code representation that places the representation vectors of nodes with greater differences in AST levels farther apart in the embedding space, so that the structural similarities between code snippets can be measured more precisely.
Cited by 29
Heterogeneous tree structure classification to label Java programmers according to their expertise level
TL;DR: A new approach to classifying ASTs with traditional supervised-learning algorithms, in which a feature learning process selects the most representative syntax patterns for the child subtrees of different syntax constructs; these patterns enrich the context information of each AST, allowing compound heterogeneous tree structures to be classified.
Cited by 16
Exploring GNN Based Program Embedding Technologies for Binary Related Tasks
Yixin Guo, Pengcheng Li, Yingwei Luo, Xiaoli Wang, Zhenlin Wang (01 May 2022)
TL;DR: This work proposes a new program analysis approach that aims to solve program-level and procedure-level tasks with one model by applying graph neural networks at the binary-code level, and it can effectively work around emerging compilation-related problems.
Cited by 12
GRAPHSPY: Fused Program Semantic Embedding through Graph Neural Networks for Memory Efficiency
Guo Yixin, Pengcheng Li, Yingwei Luo, Xiaolin Wang, Zhenlin Wang (05 Dec 2021)
TL;DR: In this paper, a learning-aided approach is proposed to identify unnecessary memory operations by applying several prevalent graph neural network models to extract program semantics with respect to program structure, execution semantics, and dynamic states.
Cited by 3
Fast selection of compiler optimizations using performance prediction with graph neural networks
Vanderson Martins do Rosario, Anderson Faustino da Silva, André Felipe Zanella, Otávio Oliveira Napoli, Edson Borin
TL;DR: In this article, the authors propose a graph neural network (GNN) architecture that quickly predicts the performance of applications without executing them; it achieves 91% accuracy on their dataset, compared with 79% for a non-graph-aware architecture.
Cited by 2