Scispace (Formerly Typeset)
Showing papers presented at "Mining Software Repositories in 2019"
Proceedings Article•10.1109/MSR.2019.00077•
A large-scale study about quality and reproducibility of jupyter notebooks

João Felipe Pimentel1, Leonardo Murta1, Vanessa Braganholo1, Juliana Freire2•
Federal Fluminense University1, New York University2
26 May 2019
TL;DR: To understand good and bad practices used in the development of real notebooks, 1.4 million notebooks from GitHub are studied and a detailed analysis of their characteristics that impact reproducibility is presented.
Abstract: Jupyter Notebooks have been widely adopted by many different communities, both in science and industry. They support the creation of literate programming documents that combine code, text, and execution results with visualizations and all sorts of rich media. The self-documenting aspects and the ability to reproduce results have been touted as significant benefits of notebooks. At the same time, there has been growing criticism that the way notebooks are being used leads to unexpected behavior, encourages poor coding practices, and that their results can be hard to reproduce. To understand good and bad practices used in the development of real notebooks, we studied 1.4 million notebooks from GitHub. We present a detailed analysis of their characteristics that impact reproducibility. We also propose a set of best practices that can improve the rate of reproducibility and discuss open challenges that require further research and development.

253 citations
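Execution order is one notebook characteristic that can affect reproducibility: a notebook saved in the nbformat v4 JSON layout records an `execution_count` for every code cell (null if the cell was never run). The abstract does not list the exact checks used in the study, so the Python sketch below is only an assumed illustration of how unexecuted and out-of-order cells could be flagged; the function name `execution_order_issues` is hypothetical.

```python
import json

def execution_order_issues(notebook_json: str) -> dict:
    """Flag code cells that were never run or that ran out of top-to-bottom order.

    Assumes the nbformat v4 layout: a top-level "cells" list whose code cells
    carry an "execution_count" (null when the cell was never executed).
    """
    nb = json.loads(notebook_json)
    counts = [
        cell.get("execution_count")
        for cell in nb.get("cells", [])
        if cell.get("cell_type") == "code"
    ]
    executed = [c for c in counts if c is not None]
    # Out of order: a later executed cell has a lower counter than the one before it.
    out_of_order = any(b < a for a, b in zip(executed, executed[1:]))
    return {
        "code_cells": len(counts),
        "unexecuted": len(counts) - len(executed),
        "out_of_order": out_of_order,
    }
```

A notebook whose second code cell shows a lower counter than the first would be re-run top to bottom in a different order than the one that produced its saved outputs, which is why such notebooks are hard to reproduce.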

Proceedings Article•10.1109/MSR.2019.00016•
DeepJIT: an end-to-end deep learning framework for just-in-time defect prediction

Thong Hoang1, Hoa Khanh Dam2, Yasutaka Kamei3, David Lo1, Naoyasu Ubayashi3 •
Singapore Management University1, University of Wollongong2, Kyushu University3
26 May 2019
TL;DR: This paper proposes an end-to-end deep learning framework, named DeepJIT, that automatically extracts features from commit messages and code changes and uses them to identify defects.
Abstract: Software quality assurance efforts often focus on identifying defective code. To find likely defective code early, change-level defect prediction, also known as Just-In-Time (JIT) defect prediction, has been proposed. JIT defect prediction models identify likely defective changes and they are trained using machine learning techniques with the assumption that historical changes are similar to future ones. Most existing JIT defect prediction approaches make use of manually engineered features. Unlike those approaches, in this paper, we propose an end-to-end deep learning framework, named DeepJIT, that automatically extracts features from commit messages and code changes and uses them to identify defects. Experiments on two popular software projects (i.e., QT and OPENSTACK) on three evaluation settings (i.e., cross-validation, short-period, and long-period) show that the best variant of DeepJIT (DeepJIT-Combined), compared with the best performing state-of-the-art approach, achieves improvements of 10.36--11.02% for the project QT and 9.51--13.69% for the project OPENSTACK in terms of the Area Under the Curve (AUC).

248 citations
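DeepJIT's improvements are reported in terms of AUC. As a reminder of what that metric measures, the sketch below computes AUC through its rank interpretation: the probability that a randomly chosen defective change receives a higher score than a randomly chosen clean one, with ties counted as one half. It is a generic, quadratic-time illustration using a hypothetical `auc` helper, not the evaluation code used in the paper.

```python
def auc(scores, labels):
    """AUC as P(score of a defective change > score of a clean change); ties count half.

    scores: predicted defect-proneness per change.
    labels: 1 for a defective (bug-inducing) change, 0 for a clean one.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one defective and one clean change")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, scores `[0.9, 0.8, 0.3, 0.1]` with labels `[1, 0, 1, 0]` yield 3 correctly ordered pairs out of 4, i.e. an AUC of 0.75; a random ranking tends towards 0.5.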

Proceedings Article•10.1109/MSR.2019.00017•
Lessons learned from using a deep tree-based model for software defect prediction in practice

Hoa Khanh Dam1, Trang Pham2, Shien Wee Ng1, Truyen Tran2, John Grundy3, Aditya Ghose1, Taeksu Kim4, Chul-Joo Kim4 •
University of Wollongong1, Deakin University2, Monash University3, Samsung4
26 May 2019
TL;DR: This paper reports on the experience of deploying a new deep learning tree-based defect prediction model built upon the tree-structured Long Short Term Memory network which directly matches with the Abstract Syntax Tree representation of source code.
Abstract: Defects are common in software systems and cause many problems for software users. Different methods have been developed to make early predictions about the most likely defective modules in large codebases. Most focus on designing features (e.g. complexity metrics) that correlate with potentially defective code. Those approaches however do not sufficiently capture the syntax and multiple levels of semantics of source code, a potentially important capability for building accurate prediction models. In this paper, we report on our experience of deploying a new deep learning tree-based defect prediction model in practice. This model is built upon the tree-structured Long Short Term Memory network which directly matches with the Abstract Syntax Tree representation of source code. We discuss a number of lessons learned from developing the model and evaluating it on two datasets, one from open source projects contributed by our industry partner Samsung and the other from the public PROMISE repository.

144 citations

Proceedings Article•10.1109/MSR.2019.00064•
A manually-curated dataset of fixes to vulnerabilities of open-source software

Serena Elisa Ponta, Henrik Plate, Antonino Sabetta, Michele Bezzi, Cedric Dangremont 
26 May 2019
TL;DR: In this paper, the authors present a dataset that maps 624 publicly disclosed vulnerabilities affecting 205 distinct open-source Java projects, used in SAP products or internal tools, onto the 1282 commits that fix them. Of these vulnerabilities, 29 have no CVE (Common Vulnerabilities and Exposures) identifier at all, and 46 that do have such an identifier assigned by a numbering authority are not yet available in the NVD.
Abstract: Advancing our understanding of software vulnerabilities, automating their identification, the analysis of their impact, and ultimately their mitigation is necessary to enable the development of software that is more secure. While operating a vulnerability assessment tool that we developed and that is currently used by hundreds of development units at SAP, we manually collected and curated a dataset of vulnerabilities of open-source software and the commits fixing them. The data were obtained both from the National Vulnerability Database (NVD) and from project-specific web resources, which we monitor on a continuous basis. From that data, we extracted a dataset that maps 624 publicly disclosed vulnerabilities affecting 205 distinct open-source Java projects, used in SAP products or internal tools, onto the 1282 commits that fix them. Out of 624 vulnerabilities, 29 do not have a CVE (Common Vulnerabilities and Exposures) identifier at all, and 46, which do have such an identifier assigned by a numbering authority, are not available in the NVD yet. The dataset is released under an open-source license, together with supporting scripts that allow researchers to automatically retrieve the actual content of the commits from the corresponding repositories and to augment the attributes available for each instance. Moreover, these scripts make it possible to complement the dataset with additional instances that are not security fixes (which is useful, for example, in machine learning applications). Our dataset has been successfully used to train classifiers that could automatically identify security-relevant commits in code repositories. The release of this dataset and the supporting code as open source will allow future research to be based on data of industrial relevance; it also represents a concrete step towards making the maintenance of this dataset a shared effort involving open-source communities, academia, and the industry.

107 citations

Proceedings Article•10.1109/MSR.2019.00061•
Dependency versioning in the wild

Jens Dietrich1, David J. Pearce1, Jacob Stringer2, Amjed Tahir2, Kelly Blincoe3 •
Victoria University of Wellington1, Massey University2, University of Auckland3
26 May 2019
TL;DR: Many package managers support, and their communities adopt, flexible versioning practices, but this does not always work; there is no evidence that projects switch to semantic versioning on a large scale.
Abstract: Many modern software systems are built on top of existing packages (modules, components, libraries). The increasing number and complexity of dependencies has given rise to automated dependency management where package managers resolve symbolic dependencies against a central repository. When declaring dependencies, developers face various choices, such as whether or not to declare a fixed version or a range of versions. The former results in runtime behaviour that is easier to predict, whilst the latter enables flexibility in resolution that can, for example, prevent different versions of the same package from being included and facilitates the automated deployment of bug fixes. We study the choices developers make across 17 different package managers, investigating over 70 million dependencies. This is complemented by a survey of 170 developers. We find that many package managers support, and the respective communities adopt, flexible versioning practices. This does not always work: developers struggle to find the sweet spot between the predictability of fixed version dependencies and the agility of flexible ones, and they adjust their practices depending on their experience. We see some uptake of semantic versioning in some package managers, supported by tools. However, there is no evidence that projects switch to semantic versioning on a large scale. The results of this study can guide further research into better practices for automated dependency management, and aid the adoption of semantic versioning.

94 citations
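The trade-off between fixed versions and version ranges can be made concrete with a tiny resolver. The Python sketch below assumes npm-style caret semantics (for example, `^1.2.3` accepts `>=1.2.3 <2.0.0`, and `^0.2.3` accepts `>=0.2.3 <0.3.0`) and ignores pre-release tags and all other range operators; the helper names are hypothetical, and this is not how any of the 17 package managers studied in the paper is implemented.

```python
def parse(version):
    """'1.2.3' -> (1, 2, 3); pre-release and build tags are not supported in this sketch."""
    return tuple(int(part) for part in version.split("."))

def satisfies(version, spec):
    """Check `version` against a fixed spec ('1.2.3') or a caret range ('^1.2.3')."""
    v = parse(version)
    if spec.startswith("^"):
        base = parse(spec[1:])
        # The caret allows any change that keeps the left-most non-zero component fixed.
        idx = next((i for i, part in enumerate(base) if part != 0), len(base) - 1)
        upper = base[:idx] + (base[idx] + 1,) + (0,) * (len(base) - idx - 1)
        return base <= v < upper
    return v == parse(spec)

def resolve(available, specs):
    """Pick the highest available version that satisfies every declared spec, if any."""
    candidates = [v for v in available if all(satisfies(v, s) for s in specs)]
    return max(candidates, key=parse) if candidates else None
```

Here two consumers declaring `^1.2.3` and `^1.3.0` can share `1.4.0`, whereas pinning one of them to exactly `1.2.3` leaves no single shared version; depending on the package manager, that means a resolution failure or duplicate copies of the package.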

Proceedings Article•10.1109/MSR.2019.00067•
RmvDroid: towards a reliable Android malware dataset with app metadata

Haoyu Wang1, Junjun Si, Hao Li, Yao Guo2•
Beijing University of Posts and Telecommunications1, Peking University2
26 May 2019
TL;DR: The authors created a reliable Android malware dataset containing 9,133 samples that belong to 56 malware families with high confidence, which they expect to boost research on Android malware detection and classification, mining apps for anomalies, and app store mining.
Abstract: A large number of research studies have focused on detecting Android malware in recent years. As a result, a reliable and large-scale malware dataset is essential to build effective malware classifiers and evaluate the performance of different detection techniques. Although several Android malware benchmarks have been widely used in our research community, these benchmarks face several major limitations. First, most of the existing datasets are outdated and cannot reflect current malware evolution trends. Second, most of them only rely on VirusTotal to label the ground truth of malware, while some anti-virus engines on VirusTotal may not always report reliable results. Third, all of them only contain the apps themselves (apks), while other important app information (e.g., app description, user rating, and app installs) is missing, which greatly limits the usage scenarios of these datasets. In this paper, we have created a reliable Android malware dataset based on Google Play's app maintenance results over several years. We first created four snapshots of Google Play in 2014, 2015, 2017 and 2018 respectively. Then we used VirusTotal to label apps with possible sensitive behaviors and monitored these apps on Google Play to see whether Google had removed them. Based on this approach, we have created a malware dataset containing 9,133 samples that belong to 56 malware families with high confidence. We believe this dataset will boost a series of research studies including Android malware detection and classification, mining apps for anomalies, and app store mining.

82 citations
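The labeling idea, combining a VirusTotal verdict with later removal from Google Play, can be sketched as a simple rule. The threshold `MIN_ENGINES`, the `AppRecord` fields, and the three label names below are hypothetical; the abstract does not state the exact criteria, so this is only an assumed illustration of how the two signals might be combined.

```python
from dataclasses import dataclass

# Hypothetical threshold: the abstract does not state how many engines must flag an app.
MIN_ENGINES = 10

@dataclass
class AppRecord:
    package: str
    vt_positives: int  # number of VirusTotal engines that flagged the APK
    listed: tuple      # presence on Google Play per snapshot, oldest first

def label(app: AppRecord) -> str:
    flagged = app.vt_positives >= MIN_ENGINES
    # Treat the app as removed if it was listed at some point but is gone from the latest snapshot.
    removed = any(app.listed) and not app.listed[-1]
    if flagged and removed:
        return "malware"     # both signals agree: high-confidence label
    if flagged:
        return "suspicious"  # flagged by engines, but Google Play still lists it
    return "unlabeled"
```

Requiring both signals trades recall for confidence: an app that engines flag but that Google keeps listing is not counted as malware under this rule.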

Proceedings Article•10.1109/MSR.2019.00056•
Generating commit messages from diffs using pointer-generator network

Qin Liu1, Zihe Liu1, Hongming Zhu1, Hongfei Fan1, Bowen Du2, Yu Qian1 •
Tongji University1, University of Warwick2
26 May 2019
TL;DR: PtrGNCMsg, a novel approach based on an improved sequence-to-sequence model with a pointer-generator network that translates code diffs into commit messages, outperforms recent approaches based on neural machine translation and is the first to enable the prediction of OOV words.
Abstract: The commit messages in source code repositories are valuable for tracking issues, reporting bugs, and understanding code, but they are not easy to write manually in a timely fashion. Recently published work indicates that deep neural machine translation approaches have drawn considerable attention for the automatic generation of commit messages. However, they cannot deal with out-of-vocabulary (OOV) words, which are essential context-specific identifiers such as class names and method names in code diffs. In this paper, we propose PtrGNCMsg, a novel approach which is based on an improved sequence-to-sequence model with the pointer-generator network to translate code diffs into commit messages. By searching the smallest identifier set with the highest probability, PtrGNCMsg outperforms recent approaches based on neural machine translation, and is the first to enable the prediction of OOV words. The experimental results based on the corpus of diffs and manual commit messages from the top 2,000 Java projects in GitHub show that PtrGNCMsg outperforms the state-of-the-art approach with improved BLEU by 1.02, ROUGE-1 by 4.00 and ROUGE-L by 3.78, respectively.

78 citations
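The reason a pointer-generator network can emit OOV identifiers is that its output distribution mixes generation from a fixed vocabulary with copying from the input. The sketch below follows the standard pointer-generator mixing rule, P(w) = p_gen · P_vocab(w) + (1 − p_gen) · Σ attention over source positions holding w; PtrGNCMsg's "improved" variant may differ in its details, and the function name is hypothetical.

```python
from collections import defaultdict

def final_distribution(p_gen, vocab_probs, attention, source_tokens):
    """Mix a generation distribution with a copy distribution over source tokens.

    p_gen: probability of generating from the fixed vocabulary (between 0 and 1).
    vocab_probs: dict word -> probability under the decoder's vocabulary softmax.
    attention: attention weights over source positions (summing to 1).
    source_tokens: tokens of the input diff, aligned with `attention`.
    """
    if len(attention) != len(source_tokens):
        raise ValueError("attention and source_tokens must have the same length")
    final = defaultdict(float)
    for word, p in vocab_probs.items():
        final[word] += p_gen * p
    for weight, token in zip(attention, source_tokens):
        # An OOV identifier from the diff can still receive probability mass by being copied.
        final[token] += (1.0 - p_gen) * weight
    return dict(final)
```

With `p_gen = 0.5`, vocabulary probabilities `{"fix": 0.6, "add": 0.4}`, and attention `[0.7, 0.3]` over the diff tokens `["getUserName", "fix"]`, the identifier `getUserName` gets probability 0.35 even though it is not in the vocabulary, and `fix` gets 0.45 from both paths.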

Proceedings Article•10.1109/MSR.2019.00075•
Exploratory study of slack Q&A chats as a mining source for software engineering tools

Preetha Chatterjee1, Kostadin Damevski2, Lori Pollock1, Vinay Augustine, Nicholas A. Kraft •
University of Delaware1, Virginia Commonwealth University2
26 May 2019
TL;DR: The study investigates the availability of information that has been successfully mined from other developer communications, particularly Stack Overflow; the results indicate the prevalence of useful information, including API mentions and code snippets with descriptions, as well as several hurdles that need to be overcome to automate mining that information.
Abstract: Modern software development communities are increasingly social. Popular chat platforms such as Slack host public chat communities that focus on specific development topics such as Python or Ruby-on-Rails. Conversations in these public chats often follow a Q&A format, with someone seeking information and others providing answers in chat form. In this paper, we describe an exploratory study into the potential usefulness and challenges of mining developer Q&A conversations for supporting software maintenance and evolution tools. We designed the study to investigate the availability of information that has been successfully mined from other developer communications, particularly Stack Overflow. We also analyze characteristics of chat conversations that might inhibit accurate automated analysis. Our results indicate the prevalence of useful information, including API mentions and code snippets with descriptions, and several hurdles that need to be overcome to automate mining that information.

73 citations

Proceedings Article•10.1109/MSR.2019.00078•
Cross-language clone detection by learning over abstract syntax trees

Daniel Perez1, Shigeru Chiba2•
Imperial College London1, University of Tokyo2
26 May 2019
TL;DR: This paper presents a clone detection method based on semi-supervised machine learning designed to detect clones across programming languages with similar syntax, using an unsupervised learning approach to learn token-level vector representations and an LSTM-based neural network to predict whether two code fragments are clones.
Abstract: Clone detection across programs written in the same programming language has been studied extensively in the literature. By contrast, the task of detecting clones across multiple programming languages has not been studied as much, and approaches based on comparison cannot be directly applied. In this paper, we present a clone detection method based on semi-supervised machine learning designed to detect clones across programming languages with similar syntax. Our method uses an unsupervised learning approach to learn token-level vector representations and an LSTM-based neural network to predict whether two code fragments are clones. To train our network, we present a cross-language code clone dataset, which is to the best of our knowledge the first of its kind, containing around 45,000 code fragments written in Java and Python. We evaluate our approach on the dataset we created and show that our method gives promising results when detecting similarities between code fragments written in Java and Python.

66 citations

Proceedings Article•10.1109/MSR.2019.00052•
What do developers know about machine learning: a study of ML discussions on StackOverflow

Abdul Ali Bangash1, Hareem Sahar1, Shaiful Alam Chowdhury1, Alexander William Wong1, Abram Hindle1, Karim Ali1 •
University of Alberta1
26 May 2019
TL;DR: It is found that topic generation with Latent Dirichlet Allocation can suggest more appropriate tags that can make a machine learning post more visible and thus can help in receiving immediate feedback from sites like SO.
Abstract: Machine learning, a branch of Artificial Intelligence, is now popular in the software engineering community and is successfully used for problems like bug prediction and software development effort estimation. Developers' understanding of machine learning, however, remains unclear, and investigation is needed to understand what educators should focus on and how different online programming discussion communities can be more helpful. We conduct a study on Stack Overflow (SO) machine learning related posts using the SOTorrent dataset. We found that some machine learning topics are significantly more discussed than others, and others need more attention. We also found that topic generation with Latent Dirichlet Allocation (LDA) can suggest more appropriate tags that can make a machine learning post more visible and thus can help in receiving immediate feedback from sites like SO.

58 citations

Proceedings Article•10.1109/MSR.2019.00022•
Predicting good configurations for GitHub and stack overflow topic models

Christoph Treude1, Markus Wagner1•
University of Adelaide1
26 May 2019
TL;DR: This article contributes a broad study of parameters to arrive at good local optima for GitHub and Stack Overflow text corpora, an a-posteriori characterisation of text corpora related to eight programming languages, and an analysis of corpus feature importance via per-corpus LDA configuration.
Abstract: Software repositories contain large amounts of textual data, ranging from source code comments and issue descriptions to questions, answers, and comments on Stack Overflow. To make sense of this textual data, topic modelling is frequently used as a text-mining tool for the discovery of hidden semantic structures in text bodies. Latent Dirichlet allocation (LDA) is a commonly used topic model that aims to explain the structure of a corpus by grouping texts. LDA requires multiple parameters to work well, and there are only rough and sometimes conflicting guidelines available on how these parameters should be set. In this paper, we contribute (i) a broad study of parameters to arrive at good local optima for GitHub and Stack Overflow text corpora, (ii) an a-posteriori characterisation of text corpora related to eight programming languages, and (iii) an analysis of corpus feature importance via per-corpus LDA configuration. We find that (1) popular rules of thumb for topic modelling parameter configuration are not applicable to the corpora used in our experiments, (2) corpora sampled from GitHub and Stack Overflow have different characteristics and require different configurations to achieve good model fit, and (3) we can predict good configurations for unseen corpora reliably. These findings support researchers and practitioners in efficiently determining suitable configurations for topic modelling when analysing textual data contained in software repositories.
Proceedings Article•10.1109/MSR.2019.00063•
Automated software vulnerability assessment with concept drift

Triet H. M. Le1, Bushra Sabir1, M. Ali Babar1•
University of Adelaide1
26 May 2019
TL;DR: The proposed systematic approach can effectively tackle the concept drift issue of the SVs' descriptions reported from 2000 to 2018 in NVD even without retraining the model and performs competitively compared to the existing word-only method.
Abstract: Software Engineering researchers are increasingly using Natural Language Processing (NLP) techniques to automate Software Vulnerabilities (SVs) assessment using the descriptions in public repositories. However, the existing NLP-based approaches suffer from concept drift. This problem is caused by a lack of proper treatment of new (out-of-vocabulary) terms for the evaluation of unseen SVs over time. To perform automated SVs assessment with concept drift using SVs' descriptions, we propose a systematic approach that combines both character and word features. The proposed approach is used to predict seven Vulnerability Characteristics (VCs). The optimal model of each VC is selected using our customized time-based cross-validation method from a list of eight NLP representations and six well-known Machine Learning models. We have used the proposed approach to conduct large-scale experiments on more than 100,000 SVs in the National Vulnerability Database (NVD). The results show that our approach can effectively tackle the concept drift issue of the SVs' descriptions reported from 2000 to 2018 in NVD even without retraining the model. In addition, our approach performs competitively compared to the existing word-only method. We also investigate how to build compact concept-drift-aware models with far fewer features and give some recommendations on the choice of classifiers and NLP representations for SVs assessment.
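The key property of time-based validation under concept drift is that a model is never trained on vulnerabilities reported after the ones it is tested on. The abstract does not describe the authors' customized method in detail, so the sketch below shows only a generic expanding-window scheme under that assumption; `time_based_splits` and its parameters are hypothetical.

```python
def time_based_splits(years, n_folds):
    """Yield (train_idx, test_idx) pairs in which each fold tests on one year
    and trains only on strictly earlier years (an expanding window).

    years: publication year of each vulnerability description, in dataset order.
    n_folds: how many of the most recent years to use as test years.
    """
    ordered = sorted(set(years))
    if not 0 < n_folds < len(ordered):
        raise ValueError("every fold needs at least one earlier year to train on")
    for test_year in ordered[-n_folds:]:
        train_idx = [i for i, y in enumerate(years) if y < test_year]
        test_idx = [i for i, y in enumerate(years) if y == test_year]
        yield train_idx, test_idx
```

Unlike shuffled k-fold cross-validation, every test fold here may contain terms that never appeared in its training data, which is exactly the out-of-vocabulary situation that concept drift creates.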
Proceedings Article•10.1109/MSR.2019.00055•
Data-driven solutions to detect API compatibility issues in Android: an empirical study

Simone Scalabrino1, Gabriele Bavota2, Mario Linares-Vasquez3, Michele Lanza2, Rocco Oliveto1 •
University of Molise1, University of Lugano2, University of Los Andes3
26 May 2019
TL;DR: ACRyL learns from changes implemented in other apps in response to API changes, which makes it possible not only to detect compatibility issues but also to suggest a fix; the results point to the future possibility of combining ACRyL with the API-side approach CiD, learning detection/fixing rules on both the API and the client side.
Abstract: Android apps are inextricably linked to the official Android APIs. Such a strong form of dependency implies that changes introduced in new versions of the Android APIs can severely impact the apps' code, for example because of deprecated or removed APIs. In reaction to those changes, mobile app developers are expected to adapt their code and avoid compatibility issues. To support developers, approaches have been proposed to automatically identify API compatibility issues in Android apps. The state-of-the-art approach, named CiD, is a data-driven solution learning how to detect those issues by analyzing the changes in the history of Android APIs ("API side" learning). While it can successfully identify compatibility issues, it cannot recommend coding solutions. We devised an alternative data-driven approach, named ACRyL. ACRyL learns from changes implemented in other apps in response to API changes ("client side" learning). This makes it possible not only to detect compatibility issues but also to suggest a fix. When empirically comparing the two tools, we found that there is no clear winner, since the two approaches are highly complementary, in that they identify almost disjoint sets of API compatibility issues. Our results point to the future possibility of combining the two approaches, trying to learn detection/fixing rules on both the API and the client side.
Proceedings Article•10.1109/MSR.2019.00065•
Negative results on mining crypto-API usage rules in Android apps

Jun Gao1, Pingfan Kong1, Li Li2, Tegawendé F. Bissyandé1, Jacques Klein1 •
University of Luxembourg1, Monash University2
26 May 2019
TL;DR: This work proposes to mine a large dataset of updates within about 40,000 real-world app lineages to infer API usage rules, and yields negative results on the assumption that API usage updates tend to correct misuses.
Abstract: Android app developers recurrently use crypto-APIs to provide data security to app users. Unfortunately, misuse of APIs only creates an illusion of security and even exposes apps to systematic attacks. It is thus necessary to provide developers with a statically-enforceable list of specifications of crypto-API usage rules. On the one hand, such rules cannot be manually written as the process does not scale to all available APIs. On the other hand, a classical mining approach based on common usage patterns is not relevant in Android, given that a large share of usages include mistakes. In this work, building on the assumption that "developers update API usage instances to fix misuses", we propose to mine a large dataset of updates within about 40,000 real-world app lineages to infer API usage rules. Ultimately, our investigations yield negative results on our assumption that API usage updates tend to correct misuses. In fact, it appears that updates that fix misuses may be unintentional: the same misuse patterns are quickly re-introduced by subsequent updates.
Proceedings Article•10.1109/MSR.2019.00053•
Investigating next steps in static API-misuse detection

Sven Amann, Hoan Anh Nguyen1, Sarah Nadi2, Tien N. Nguyen3, Mira Mezini4 •
Amazon.com1, University of Alberta2, University of Texas at Dallas3, Technische Universität Darmstadt4
26 May 2019
TL;DR: The authors design MuDetect, an API-misuse detector that builds on the strengths of existing detectors and tries to mitigate their weaknesses; it uses a new graph representation of API usages that captures different types of API misuses and a systematically designed ranking strategy that effectively improves precision.
Abstract: Application Programming Interfaces (APIs) often impose constraints such as call order or preconditions. API misuses, i.e., usages violating these constraints, may cause software crashes, data loss, and vulnerabilities. Researchers have developed several approaches to detect API misuses, which typically still suffer from low recall and precision. In this work, we investigate ways to improve API-misuse detection. We design MuDetect, an API-misuse detector that builds on the strengths of existing detectors and tries to mitigate their weaknesses. MuDetect uses a new graph representation of API usages that captures different types of API misuses and a systematically designed ranking strategy that effectively improves precision. Evaluation shows that MuDetect identifies real-world API misuses with twice the recall of previous detectors and 2.5x higher precision. It even achieves almost 4x higher precision and recall when mining patterns across projects rather than from only the target project.
Proceedings Article•10.1109/MSR.2019.00033•
Recommending energy-efficient Java collections

Wellington Oliveira1, Renato Oliveira1, Fernando Castor1, Benito Fernandes1, Gustavo Pinto2 •
Federal University of Pernambuco1, Federal University of Pará2
26 May 2019
TL;DR: This work proposes an approach for energy-aware development that combines application-independent energy profiles of Java collections with static analysis that estimates how, and how intensively, a system employs these collections.
Abstract: In recent years, increasing attention has been given to creating energy-efficient software systems. However, developers still lack the knowledge and the tools to support them in that task. In this work, we explore our vision that energy consumption non-specialists can build software that consumes less energy by alternating, at development time, between third-party, readily available, diversely-designed pieces of software, without increasing the development complexity. To support our vision, we propose an approach for energy-aware development that combines the construction of application-independent energy profiles of Java collections and static analysis to produce an estimate of how, and how intensively, a system employs these collections. By combining these two pieces of information, it is possible to produce energy-saving recommendations for alternative collection implementations to be used in different parts of the system. We implement this approach in a tool named CT+ that works with both desktop and mobile Java systems, and is capable of analyzing 40 different collection implementations of lists, maps, and sets. We applied CT+ to twelve software systems: two mobile-based, seven desktop-based, and three that can run in both environments. Our evaluation infrastructure involved a high-end server, a notebook, and three mobile devices. When applying the (mostly trivial) recommendations, we achieved up to 17.34% reduction in energy consumption just by replacing collection implementations. Even for a real-world, mature, highly-optimized system such as Xalan, CT+ could achieve a 5.81% reduction in energy consumption. Our results indicate that some widely used collections, e.g., ArrayList, HashMap, and Hashtable, are not energy-efficient and sometimes should be avoided when energy consumption is a major concern.
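The combination step described above, an energy profile per collection implementation multiplied by how often a system uses each operation, can be sketched as a weighted sum. All costs, implementation names (`ListA`, `ListB`), and helper names below are made up for illustration and say nothing about the real profiles measured for CT+.

```python
# Made-up per-operation energy costs (arbitrary units) for two hypothetical list
# implementations; a real profile would come from measurements on the target device.
PROFILES = {
    "ListA": {"add": 1.0, "get": 0.5, "contains": 4.0},
    "ListB": {"add": 0.8, "get": 3.0, "contains": 4.5},
}

def estimate(impl, usage):
    """Estimated energy = sum over operations of (cost per call) x (estimated call count)."""
    profile = PROFILES[impl]
    return sum(profile[op] * count for op, count in usage.items())

def recommend(current, usage):
    """Return the implementation with the lowest estimate if it differs from `current`, else None."""
    best = min(PROFILES, key=lambda impl: estimate(impl, usage))
    return None if best == current else best
```

The same collection site can receive different recommendations depending on its usage: an add-heavy site favours `ListB` in this toy profile, while a site with many `get` calls favours `ListA`, which is why the static usage estimate matters as much as the profile.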
Proceedings Article•10.1109/MSR.2019.00072•
Assessing diffusion and perception of test smells in scala projects

Jonas De Bleser1, Dario Di Nucci1, Coen De Roover1•
Vrije Universiteit Brussel1
26 May 2019
TL;DR: Two empirical studies conducted for the combination of Scala and ScalaTest show that test smells have a low diffusion across test classes, that the most frequently occurring test smells are Lazy Test, Eager Test, and Assertion Roulette, and that many developers were able to perceive but not to identify the smells.
Abstract: Test smells are, analogously to code smells, defined as the characteristics exhibited by poorly designed unit tests. Their negative impact on test effectiveness, understanding, and maintenance has been demonstrated by several empirical studies. However, the scope of these studies has been limited mostly to Java in combination with the JUnit testing framework. Results for other language and framework combinations are, despite their prevalence in practice, few and far between, which might skew our understanding of test smells. The combination of Scala and ScalaTest, for instance, offers more comprehensive means for defining and reusing test fixtures, thereby possibly reducing the diffusion and perception of fixture-related test smells. This paper therefore reports on two empirical studies conducted for this combination. In the first study, we analyse the tests of 164 open-source Scala projects hosted on GitHub for the diffusion of test smells. This required the transposition of their original definition to this new context, and the implementation of a tool (SoCRATES) for their automated detection. In the second study, we assess the perception and the ability of 14 Scala developers to identify test smells. For this context, our results show (i) that test smells have a low diffusion across test classes, (ii) that the most frequently occurring test smells are Lazy Test, Eager Test, and Assertion Roulette, and (iii) that many developers were able to perceive but not to identify the smells.
Proceedings Article•10.1109/MSR.2019.00060•
The maven dependency graph: a temporal graph-based representation of maven central

Amine Benelallam1, Nicolas Harrand2, César Soto-Valero2, Benoit Baudry2, Olivier Barais1 •
University of Rennes1, Royal Institute of Technology2
26 May 2019
TL;DR: The Maven Dependency Graph is a dataset of 2.8M artifacts from the Maven Central Repository, each with metadata such as exact version, date of upload, and list of dependencies on other artifacts.
Abstract: The Maven Central Repository provides an extraordinary source of data to understand complex architecture and evolution phenomena among Java applications. As of September 6, 2018, this repository includes 2.8M artifacts (compiled pieces of code implemented in a JVM-based language), each of which is characterized by metadata such as exact version, date of upload, and list of dependencies on other artifacts. Today, anyone who wants to analyze the complete ecosystem of Maven artifacts and their dependencies faces two key challenges: (i) this is a huge dataset; and (ii) dependency relationships among artifacts are not modeled explicitly and cannot be queried. In this paper, we present the Maven Dependency Graph. This open source dataset provides two contributions: (i) a snapshot of the whole Maven Central taken on September 6, 2018, stored in a graph database in which we explicitly model all dependencies; and (ii) an open source infrastructure to query this huge dataset.
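As a rough illustration of why explicit dependency edges matter, the sketch below stores a tiny, made-up set of artifacts as an adjacency mapping and answers two queries that are awkward without explicit edges: the transitive dependencies of an artifact and its direct dependents. It is a plain-Python stand-in built on our own assumptions (the `groupId:artifactId:version` strings are hypothetical, and Maven's version-conflict resolution and dependency scopes are ignored); it is not the Maven Dependency Graph's graph-database schema or its query infrastructure.

```python
from collections import deque

# Hypothetical artifacts: each coordinate maps to the coordinates it declares
# as direct dependencies (version resolution and scopes are ignored here).
GRAPH = {
    "org.example:app:1.0": ["org.example:lib:2.1", "org.example:util:1.3"],
    "org.example:lib:2.1": ["org.example:util:1.3"],
    "org.example:util:1.3": [],
}

def transitive_dependencies(graph, root):
    """Return every artifact reachable from `root` through dependency edges
    (breadth-first search; `root` itself is excluded unless a cycle leads back to it)."""
    seen = set()
    queue = deque(graph.get(root, ()))
    while queue:
        artifact = queue.popleft()
        if artifact in seen:
            continue
        seen.add(artifact)
        queue.extend(graph.get(artifact, ()))
    return seen

def direct_dependents(graph, target):
    """Reverse query: artifacts that declare a direct dependency on `target`."""
    return {artifact for artifact, deps in graph.items() if target in deps}

print(sorted(transitive_dependencies(GRAPH, "org.example:app:1.0")))
print(sorted(direct_dependents(GRAPH, "org.example:util:1.3")))
```

At the scale of the full repository, a linear scan like `direct_dependents` is exactly what an indexed graph database avoids; the sketch only shows the shape of the queries.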
Proceedings Article•10.1109/MSR.2019.00059•
The emergence of software diversity in Maven Central

César Soto-Valero1, Amine Benelallam2, Nicolas Harrand1, Olivier Barais2, Benoit Baudry1 •
Royal Institute of Technology1, University of Rennes2
26 May 2019
TL;DR: In this paper, the authors hypothesize that the immutability of Maven artifacts and the ability to choose any version naturally support the emergence of software diversity within Maven Central.
Abstract: Maven artifacts are immutable: an artifact that is uploaded to Maven Central can be neither removed nor modified. The only way for developers to upgrade their library is to release a new version. Consequently, Maven Central accumulates all the versions of all the libraries that are published there, and applications that declare a dependency on a library can pick any version. In this work, we hypothesize that the immutability of Maven artifacts and the ability to choose any version naturally support the emergence of software diversity within Maven Central. We analyze 1,487,956 artifacts that represent all the versions of 73,653 libraries. We observe that more than 30% of libraries have multiple versions that are actively used by the latest artifacts. In the case of popular libraries, more than 50% of their versions are used. We also observe that more than 17% of libraries have several versions that are significantly more used than the other versions. Our results indicate that the immutability of artifacts in Maven Central does support a sustained level of diversity among versions of libraries in the repository.
Proceedings Article•10.1109/MSR.2019.00086•
Boa meets Python: a Boa dataset of data science software in Python language

Sumon Biswas1, Johirul Islam1, Yijia Huang1, Hridesh Rajan1•
Iowa State University1
26 May 2019
TL;DR: The authors create a new dataset of 1,558 mature GitHub projects that develop Python software for data science tasks; the projects use a diverse set of machine learning libraries and are managed by a variety of users and organizations.
Abstract: The popularity of the Python programming language has surged in recent years due to its increasing usage in Data Science. The availability of Python repositories on GitHub presents an opportunity for mining software repositories research, e.g., suggesting best practices in developing Data Science applications, identifying bug patterns, and recommending code enhancements. To enable this research, we have created a new dataset that includes 1,558 mature GitHub projects that develop Python software for Data Science tasks. By analyzing the metadata and code, we have included in our dataset projects that use a diverse set of machine learning libraries and are managed by a variety of users and organizations. The dataset is made publicly available through the Boa infrastructure, both as a collection of raw projects and in a processed form that can be used for large-scale analysis with the Boa language. We also present two initial applications that demonstrate the potential of the dataset to be leveraged by the community.
Proceedings Article•10.1109/MSR.2019.00020•
Exploring word embedding techniques to improve sentiment analysis of software engineering texts

Eeshita Biswas1, K. Vijay-Shanker1, Lori Pollock1•
University of Delaware1
26 May 2019
TL;DR: The authors investigate the impact of two machine learning techniques, oversampling and undersampling of data, on training a sentiment classifier to handle small SE datasets with a skewed distribution.
Abstract: Sentiment analysis (SA) of text-based software artifacts is increasingly used to extract information for various tasks, including providing code suggestions, improving development team productivity, recommending software packages and libraries, and recommending comments on defects in source code, code quality, and possibilities for improving applications. Studies of state-of-the-art sentiment analysis tools applied to software-related texts have shown varying results depending on the techniques and training approaches. In this paper, we investigate two potential opportunities to improve the training for sentiment analysis of SE artifacts, in the context of neural networks customized using the Stack Overflow data developed by Lin et al. First, we customize the process of sentiment analysis to the software domain by using software domain-specific word embeddings learned from Stack Overflow (SO) posts, and we study their impact on the performance of the sentiment analysis tool compared to generic word embeddings learned from Google News. We find that the word embeddings learned from the Google News data perform mostly similarly to, and in some cases better than, the word embeddings learned from SO posts. Second, we study the impact of two machine learning techniques, oversampling and undersampling of data, on the training of a sentiment classifier for handling small SE datasets with a skewed distribution. We find that oversampling alone, as well as the combination of oversampling and undersampling, helps improve the performance of a sentiment classifier.
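For readers unfamiliar with the two resampling techniques, the sketch below shows their simplest random variants on a toy list of (text, label) pairs: random oversampling duplicates examples of smaller classes until they match the largest class, and random undersampling discards examples until every class matches the smallest one. The function names, the balancing targets, and the `seed` parameter are our assumptions; the abstract does not specify how the authors resampled their data or combined the two techniques.

```python
import random
from collections import Counter, defaultdict

def _group_by_label(samples):
    """Group (text, label) pairs by their label."""
    groups = defaultdict(list)
    for text, label in samples:
        groups[label].append((text, label))
    return groups

def random_oversample(samples, seed=0):
    """Duplicate randomly chosen examples of the smaller classes until every
    class is as large as the largest one."""
    rng = random.Random(seed)
    groups = _group_by_label(samples)
    target = max(len(g) for g in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced

def random_undersample(samples, seed=0):
    """Keep a random subset of every class, sized to the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_label(samples)
    target = min(len(g) for g in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(rng.sample(group, target))
    return balanced

# Toy, skewed dataset (made up for illustration).
data = ([("thanks, works great", "positive")] * 2
        + [("see the docs", "neutral")] * 8
        + [("this is broken", "negative")] * 4)

print(Counter(label for _, label in random_oversample(data)))
print(Counter(label for _, label in random_undersample(data)))
```

Oversampling keeps every original example at the cost of repeated ones, while undersampling throws data away, which is one reason it tends to hurt on datasets that are already small.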
Proceedings Article•10.1109/MSR.2019.00040•
Snakes in paradise?: insecure Python-related coding practices in Stack Overflow

Akond Rahman1, Effat Farhana1, Nasif Imtiaz1•
North Carolina State University1
26 May 2019
TL;DR: An empirical study of 529,054 code blocks collected from 44,966 Python-related answers posted on Stack Overflow finds that user reputation is not related to the presence of insecure code blocks, suggesting that both high- and low-reputation users are likely to introduce insecure code blocks.
Abstract: Despite Stack Overflow (SO) being the most popular question and answer website for software developers, answers posted on it are susceptible to containing Python-related insecure coding practices. A systematic analysis of how frequently insecure coding practices appear in SO answers can help the SO community assess the prevalence of insecure Python code blocks in SO. An insecure coding practice is the recurrent use of insecure coding patterns in Python. We conduct an empirical study using 529,054 code blocks collected from 44,966 Python-related answers posted on SO. We observe that 7.1% of the 44,966 Python-related answers include at least one insecure coding practice. The most frequently occurring insecure coding practice is code injection. We observe that 9.8% of the 7,444 accepted answers include at least one insecure code block. We also find that user reputation is not related to the presence of insecure code blocks, suggesting that both high- and low-reputation users are likely to introduce insecure code blocks.
Proceedings Article•10.1109/MSR.2019.00012•
SCOR: source code retrieval with semantics and order

Shayan A. Akbar1, Avinash C. Kak1•
Purdue University1
26 May 2019
TL;DR: This work demonstrates that by combining word2vec with the power of MRF, it is possible to achieve improvements between 6% and 30% in retrieval accuracy over the best results that can be obtained with the more traditional applications of MRF to representations based on term and term-term frequencies.
Abstract: Word embeddings produced by the word2vec algorithm provide us with a strong mechanism to discover relationships between words based on the degree to which they are contextually related to one another. By themselves, algorithms like word2vec do not give us a mechanism to impose ordering constraints on the embedded word representations. Our main goal in this paper is to exploit the semantic word vectors obtained from word2vec in a way that allows ordering constraints to be imposed on them when comparing a sequence of words in a query with a sequence of words in a file for source code retrieval. These ordering constraints employ the logic of Markov Random Fields (MRF), a framework used previously to enhance the precision of source-code retrieval engines based on the Bag-of-Words (BoW) assumption. The work we present here demonstrates that by combining word2vec with the power of MRF, it is possible to achieve improvements between 6% and 30% in retrieval accuracy over the best results that can be obtained with the more traditional applications of MRF to representations based on term and term-term frequencies. The performance improvement was 30% for the Java AspectJ repository using only the titles of the bug reports provided by iBUGS, and 6% for the Eclipse repository using both titles and descriptions of the bug reports provided by BUGLinks.
Proceedings Article•10.1109/MSR.2019.00074•
Can issues reported at Stack Overflow questions be reproduced?: an exploratory study

Saikat Mondal1, Mohammad Masudur Rahman1, Chanchal K. Roy1•
University of Saskatchewan1
26 May 2019
TL;DR: An exploratory study on the reproducibility of the issues discussed in 400 Stack Overflow questions finds that 68% of the code segments require minor or major modifications in order to reproduce the issues reported by the developers.
Abstract: Software developers often look for solutions to their code-level problems on Stack Overflow. Hence, they frequently submit their questions with sample code segments and issue descriptions. Unfortunately, it is not always possible to reproduce their reported issues from such code segments. This phenomenon might prevent their questions from getting prompt and appropriate solutions. In this paper, we report an exploratory study on the reproducibility of the issues discussed in 400 questions of Stack Overflow. In particular, we parse, compile, execute, and carefully examine the code segments from these questions, spending a total of 200 man-hours, and then attempt to reproduce their programming issues. The outcomes of our study are two-fold. First, we find that 68% of the code segments require minor or major modifications in order to reproduce the issues reported by the developers. In contrast, 22% of the code segments completely fail to reproduce the issues. We also carefully investigate why these issues could not be reproduced and then provide evidence-based guidelines for writing effective code examples for Stack Overflow questions. Second, we investigate the correlation between the issue reproducibility status of questions and corresponding answer metadata, such as the presence of an accepted answer. According to our analysis, a question with reproducible issues has at least a three times higher chance of receiving an accepted answer than a question with irreproducible issues.
Proceedings Article•10.1109/MSR.2019.00039•
Mining rule violations in JavaScript code snippets

Uriel Ferreira Campos1, Guilherme Smethurst1, João Pedro Moraes1, Rodrigo Bonifácio2, Gustavo Pinto1 •
Federal University of Pará1, University of Brasília2
26 May 2019
TL;DR: Every studied JavaScript code snippet contains at least one rule violation, and rules related to stylistic issues are by far the most violated ones.
Abstract: Programming code snippets readily available on platforms such as StackOverflow are undoubtedly useful for software engineers. Unfortunately, these code snippets might contain issues such as deprecated, misused, or even buggy code. These issues could go unnoticed if developers do not have adequate knowledge, time, or tool support to catch them. In this work, we expand the understanding of such issues (the so-called "violations") hidden in code snippets written in JavaScript, the programming language with the highest number of questions on StackOverflow. To characterize the violations, we extracted 336k code snippets from answers to JavaScript questions on StackOverflow and statically analyzed them using ESLint, a JavaScript linter. We discovered that there is no single JavaScript code snippet without a rule violation. On average, our studied code snippets have 11 violations, but we found instances with more than 200 violations. In particular, rules related to stylistic issues are by far the most violated ones (82.9% of the violations pertain to this category). Possible errors, which developers might be more interested in, represent only 0.1% of the violations. Finally, we found a small fraction of code snippets flagged with possible errors being reused in actual GitHub software projects. Indeed, one single code snippet with possible errors was reused 1,261 times.
Proceedings Article•10.1109/MSR.2019.00070•
git2net: mining time-stamped co-editing networks from large git repositories

Christoph Gote1, Ingo Scholtes2, Frank Schweitzer1•
ETH Zurich1, University of Zurich2
26 May 2019
TL;DR: git2net is a scalable Python tool that facilitates the extraction of fine-grained co-editing networks from large git repositories, where a link signifies that one developer has edited a block of source code originally written by another developer.
Abstract: Data from software repositories have become an important foundation for the empirical study of software engineering processes. A recurring theme in the repository mining literature is the inference of developer networks capturing, e.g., collaboration, coordination, or communication from the commit history of projects. Most of the studied networks are based on the co-authorship of software artefacts defined at the level of files, modules, or packages. While this approach has led to insights into the social aspects of software development, it neglects detailed information on code changes and code ownership that is contained in the commit log of software projects, e.g., which exact lines of code have been authored by which developers. Addressing this issue, we introduce git2net, a scalable Python tool that facilitates the extraction of fine-grained co-editing networks from large git repositories. It uses text mining techniques to analyse the detailed history of textual modifications within files. This information allows us to construct directed, weighted, and time-stamped networks, where a link signifies that one developer has edited a block of source code originally written by another developer. We apply our tool in case studies of an Open Source and a commercial software project. We argue that it opens up a massive new source of high-resolution data on human collaboration patterns.
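To make the link semantics concrete, the sketch below turns a list of hypothetical co-edit records (who edited, who originally wrote the edited block, and when) into directed, weighted, time-stamped edges. The `CoEdit` record, the function name, and the choice to skip self-edits are our assumptions; git2net itself derives this information by mining textual modifications in the git history, and its actual API is not shown here.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class CoEdit:
    """Hypothetical record: `editor` modified, at `timestamp`, a block of
    code that was originally written by `original_author`."""
    editor: str
    original_author: str
    timestamp: int  # e.g., Unix time of the commit

def co_editing_network(edits):
    """Build directed, time-stamped edges editor -> original author.

    Returns a dict mapping (editor, original_author) to the sorted list of
    co-edit timestamps; the length of that list is the edge weight.
    Developers editing their own code are skipped (an assumption made here,
    since such edits do not connect two developers)."""
    edges = defaultdict(list)
    for e in edits:
        if e.editor != e.original_author:
            edges[(e.editor, e.original_author)].append(e.timestamp)
    return {pair: sorted(times) for pair, times in edges.items()}

edits = [
    CoEdit("bob", "alice", 1_600_000_100),
    CoEdit("bob", "alice", 1_600_000_050),
    CoEdit("alice", "bob", 1_600_000_200),
    CoEdit("alice", "alice", 1_600_000_300),
]
for (src, dst), times in co_editing_network(edits).items():
    print(f"{src} -> {dst}: weight={len(times)}, timestamps={times}")
```

Keeping the individual timestamps, rather than only a count, is what lets such a network be sliced into time windows later.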
Proceedings Article•10.1109/MSR.2019.00019•
Snoring: a noise in defect prediction datasets

Aalok Ahluwalia1, Davide Falessi1, Massimiliano Di Penta2•
California Polytechnic State University1, University of Sannio2
26 May 2019
TL;DR: Analyzing data from 282 releases of six open source projects from the Apache ecosystem, the authors measure the magnitude of sleeping defects and snoring classes and find that, in all projects, most of the defects slept for more than 20% of the existing releases.
Abstract: In order to develop and train defect prediction models, researchers rely on datasets in which a defect is often attributed to the release in which the defect itself is discovered. However, in many circumstances, a defect is only discovered several releases after its introduction. This might introduce a bias in the dataset, i.e., treating the intermediate releases as defect-free and the release of discovery as defect-prone. We call this phenomenon "sleeping defects". We call "snoring" the phenomenon where classes are affected by sleeping defects only, so that they would be treated as defect-free until the defect is discovered. In this paper, we analyze, on data from 282 releases of six open source projects from the Apache ecosystem, the magnitude of the sleeping defects and of the snoring classes. Our results indicate that 1) in all projects, most of the defects slept for more than 20% of the existing releases, and 2) in the majority of the projects, the missing rate is more than 25% even if we remove the last 50% of releases.
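The two notions can be made concrete with a small sketch over hypothetical defect records. Here a defect "sleeps" in every release from its introduction up to, but not including, its discovery release, and the sleep measure is that count divided by the total number of releases; a (class, release) pair is "snoring" when it is affected only by sleeping defects. The record fields, function names, and these exact definitions are our assumptions, and the paper's own measures, including its missing rate, may differ.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Defect:
    """Hypothetical record: a defect in class `cls`, introduced in release
    index `introduced` and discovered in release index `discovered`."""
    cls: str
    introduced: int
    discovered: int

def sleep_fraction(defect, n_releases):
    """Share of all releases during which the defect existed but was not yet
    discovered (assumed: releases introduced .. discovered - 1)."""
    return (defect.discovered - defect.introduced) / n_releases

def share_sleeping_longer_than(defects, n_releases, threshold=0.2):
    """Share of defects whose sleep fraction exceeds `threshold`."""
    return sum(sleep_fraction(d, n_releases) > threshold for d in defects) / len(defects)

def snoring_pairs(defects):
    """(class, release) pairs affected only by defects that are still asleep.

    A dataset that attributes each defect to its discovery release labels
    such pairs defect-free, although the class is actually defective there."""
    affected = {(d.cls, r) for d in defects for r in range(d.introduced, d.discovered)}
    labeled = {(d.cls, d.discovered) for d in defects}
    return affected - labeled

defects = [Defect("A", 0, 3), Defect("A", 2, 2), Defect("B", 1, 2)]
print(share_sleeping_longer_than(defects, n_releases=10))
print(sorted(snoring_pairs(defects)))
```

In this toy example, class A in release 2 is not snoring: a sleeping defect affects it, but another defect is discovered there, so the naive dataset already labels it defect-prone.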
Proceedings Article•10.1109/MSR.2019.00066•
A dataset of non-functional bugs

Aida Radu1, Sarah Nadi1•
University of Alberta1
26 May 2019
TL;DR: The authors introduce NFBugs, a dataset of 133 non-functional bug fixes collected from 65 open-source projects written in Java and Python, which can be used to support code recommender systems focusing on non-functional properties.
Abstract: While several researchers have published bug data sets in the past, there has been less focus on bugs related to non-functional requirements. Non-functional requirements describe the quality attributes of a program. In this work, we introduce NFBugs, a data set of 133 non-functional bug fixes collected from 65 open-source projects written in Java and Python. NFBugs can be used to support code recommender systems focusing on non-functional properties.
Proceedings Article•10.1109/MSR.2019.00028•
On the effectiveness of manual and automatic unit test generation: ten years later

Domenico Serra1, Giovanni Grano2, Fabio Palomba2, Filomena Ferrucci1, Harald C. Gall2, Alberto Bacchelli2 •
University of Salerno1, University of Zurich2
26 May 2019
TL;DR: This paper revisits a 2008 case study comparing automatically and manually generated test suites, using current tools and complementing the original research method by evaluating these tools' ability to find regressions.
Abstract: Good unit tests play a paramount role in fostering and evaluating software quality. However, writing effective tests is an extremely costly and time-consuming practice. To reduce this burden on developers, researchers have devised ingenious techniques to automatically generate test suites for existing code bases. Nevertheless, how automatically generated test cases fare against manually written ones is an open research question. In 2008, Bacchelli et al. conducted an initial case study comparing automatically and manually generated test suites. Since the last ten years have seen a huge amount of work on novel approaches and tools for automatic test generation, in this paper we revisit their study using current tools and complement their research method by evaluating these tools' ability to find regressions. Preprint: https://doi.org/10.5281/zenodo.2595232; dataset: https://doi.org/10.6084/m9.figshare.7628642.
Proceedings Article•10.1109/MSR.2019.00082•
Beyond GumTree: a hybrid approach to generate edit scripts

Junnosuke Matsumoto1, Yoshiki Higo1, Shinji Kusumoto1•
Osaka University1
26 May 2019
TL;DR: This research proposes generating easier-to-understand edit scripts (ESs) by using not only AST structure but also line-difference information, and confirms that ESs generated by this methodology are more helpful for understanding source code differences than those generated by GumTree.
Abstract: When developing with a version control system, understanding differences in source code is important. Edit scripts (ESs) represent differences between two versions of source code. One of the tools that generate ESs is GumTree. GumTree takes two versions of source code as input and generates an ES consisting of insert, delete, update, and move actions on abstract syntax tree (AST) nodes. However, the accuracy of the move and update actions generated by GumTree is insufficient, which makes ESs more difficult to understand. One reason for this insufficient accuracy is that GumTree generates ESs from AST information only. Thus, in this research, we propose to generate easier-to-understand ESs by using not only the structure of the AST but also line-difference information. To evaluate our methodology, we applied it to some open source software and confirmed that ESs generated by our methodology are more helpful for understanding source code differences than those generated by GumTree.
