DeMinify: Neural Variable Name Recovery and Type Inference

Published in ESEC/FSE, 2023

To avoid exposing the original source code, the variable names in code deployed in the wild are often replaced with short, meaningless names, which makes the code difficult to understand and analyze. We introduce DeMinify, a deep-learning (DL)-based approach that formulates this name recovery problem as the prediction of missing features in a Graph Convolutional Network–Missing Features. The graph represents both the relations among the variables and the relations among their types, and the names or types of some of its nodes are missing. Moreover, DeMinify leverages dual-task learning to propagate the mutual impact between learning the variable names and learning their types. We conducted experiments to evaluate DeMinify on both name recovery and type prediction, using a Python dataset with 180k methods and a JavaScript (JS) dataset with 322k files. For variable name prediction, DeMinify correctly predicts a variable's name with a single suggested name in 76.7% of the cases in Python code and 81.6% of the cases in JS code. In top-1 accuracy, DeMinify achieves relative improvements of 15.3%–40.7% for Python code and 7.7%–49.7% for JS code over the state-of-the-art variable name recovery approaches. It also achieves relative improvements of 14.5%–51.9% in top-1 accuracy over the existing type prediction approaches. Our experimental results show that learning data types helps improve variable name recovery, and vice versa.
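
The abstract reports results as top-1 accuracy and as relative improvement over baselines, two measures that are easy to confuse with absolute percentage-point gains. The following is a minimal sketch of how such metrics are commonly computed. It is not taken from the paper's evaluation code: the function names, the sample predictions, and the accuracy values 0.60 and 0.50 are all hypothetical and chosen only for illustration.

```python
def top_k_accuracy(ranked_predictions, gold_names, k=1):
    """Fraction of variables whose true name is among the first k suggestions."""
    hits = sum(
        gold in ranked[:k]
        for ranked, gold in zip(ranked_predictions, gold_names)
    )
    return hits / len(gold_names)


def relative_improvement(new_accuracy, baseline_accuracy):
    """Gain of the new accuracy expressed as a fraction of the baseline."""
    return (new_accuracy - baseline_accuracy) / baseline_accuracy


# Hypothetical ranked suggestions for four minified variables
# (illustrative data, not from the paper's datasets).
ranked = [
    ["count", "total"],
    ["idx", "i"],
    ["name", "label"],
    ["buf", "data"],
]
gold = ["count", "i", "name", "data"]

top1 = top_k_accuracy(ranked, gold, k=1)  # first suggestion is right for 2 of 4
top2 = top_k_accuracy(ranked, gold, k=2)  # true name is in the top 2 for all 4

# Hypothetical accuracies: moving from 0.50 to 0.60 is a gain of
# 10 percentage points, but roughly 20% in relative terms.
gain = relative_improvement(0.60, 0.50)

print(f"top-1: {top1:.2f}, top-2: {top2:.2f}, relative gain: {gain:.1%}")
```

Under this reading, a reported relative improvement describes the size of the gain in proportion to the baseline's accuracy, not the difference in percentage points between the two systems.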

Recommended citation: Yi Li, Aashish Yadavally, Jiaxing Zhang, Shaohua Wang, Tien Nguyen. 2023. DeMinify: Neural Variable Name Recovery and Type Inference. Accepted by ESEC/FSE 2023.
