Top 87 papers presented at Mining Software Repositories in 2016

Proceedings Article•10.1145/2901739.2903508•

AndroZoo: collecting millions of Android apps for the research community

Kevin Allix¹, Tegawendé F. Bissyandé¹, Jacques Klein¹, Yves Le Traon¹

14 May 2016

TL;DR: This work presents a growing collection of Android apps gathered from several sources, including the official Google Play market. The collection contains more than three million apps, each analysed by tens of different antivirus products to determine which applications are detected as malware.

Abstract: We present a growing collection of Android Applications collected from several sources, including the official GooglePlay app market. Our dataset, AndroZoo, currently contains more than three million apps, each of which has been analysed by tens of different AntiVirus products to know which applications are detected as Malware. We provide this dataset to contribute to ongoing research efforts, as well as to enable new potential research topics on Android Apps. By releasing our dataset to the research community, we also aim at encouraging our fellow researchers to engage in reproducible experiments.

855 citations

Proceedings Article•10.1145/2901739.2901742•

A large-scale empirical study on self-admitted technical debt

Gabriele Bavota¹, Barbara Russo¹

Free University of Bozen-Bolzano¹

14 May 2016

TL;DR: This paper runs a study across 159 software projects to investigate the diffusion and evolution of self-admitted technical debt and its relationship with software quality. It shows that self-admitted technical debt is diffused and increases over time, because new instances are introduced and not fixed by developers.

Abstract: Technical debt is a metaphor introduced by Cunningham to indicate "not quite right code which we postpone making it right". Examples of technical debt are code smells and bug hazards. Several techniques have been proposed to detect different types of technical debt. Among those, Potdar and Shihab defined heuristics to detect instances of self-admitted technical debt in code comments, and used them to perform an empirical study on five software systems to investigate the phenomenon. Still, very little is known about the diffusion and evolution of technical debt in software projects. This paper presents a differentiated replication of the work by Potdar and Shihab. We run a study across 159 software projects to investigate the diffusion and evolution of self-admitted technical debt and its relationship with software quality. The study required the mining of over 600K commits and 2 billion comments as well as a qualitative analysis performed via open coding. Our main findings show that self-admitted technical debt (i) is diffused, with an average of 51 instances per system, (ii) is mostly represented by code (30%), defect, and requirement debt (20% each), (iii) increases over time due to the introduction of new instances that are not fixed by developers, and (iv) even when fixed, survives a long time (over 1,000 commits on average) in the system.
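
The comment-based detection idea behind Potdar and Shihab's heuristics can be illustrated with a minimal Python sketch. The pattern list below is a hypothetical, deliberately short subset chosen for illustration; it is not the published pattern set, and the matching is plain case-insensitive keyword search with word boundaries.

```python
import re

# Hypothetical, illustrative patterns only; the published heuristics use a
# longer, curated list of comment patterns.
SATD_PATTERNS = ["hack", "fixme", "workaround", "kludge", "temporary solution", "ugly"]

# One case-insensitive regex; word boundaries stop "hack" matching inside "shackle".
_SATD_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in SATD_PATTERNS) + r")\b",
    re.IGNORECASE,
)

def is_self_admitted_debt(comment: str) -> bool:
    """Return True if the comment matches any illustrative SATD pattern."""
    return _SATD_RE.search(comment) is not None

def count_satd(comments) -> int:
    """Count comments flagged as self-admitted technical debt."""
    return sum(1 for c in comments if is_self_admitted_debt(c))

if __name__ == "__main__":
    comments = [
        "// FIXME: this breaks on empty input",
        "/* Compute the checksum */",
        "// ugly workaround until the API is fixed",
        "// shackle the connection pool",
    ]
    print(count_satd(comments))  # 2
```

Keyword matching of this kind is cheap enough to run over millions of comments, which is why it suits repository-scale studies; its precision depends entirely on how carefully the pattern list is curated.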

171 citations

Proceedings Article•10.1145/2901739.2903505•

The emotional side of software developers in JIRA

Marco Ortu¹, Alessandro Murgia², Giuseppe Destefanis³, Parastou Tourani⁴, Roberto Tonelli¹, Michele Marchesi¹, Bram Adams⁴

University of Cagliari¹, University of Antwerp², Brunel University London³, École Polytechnique de Montréal⁴

14 May 2016

TL;DR: This paper manually labeled 2,000 issue comments and 4,000 sentences written by developers with emotions such as love, joy, surprise, anger, sadness and fear, allowing the investigation of the role of affects in software development.

Abstract: Issue tracking systems store valuable data for testing hypotheses concerning maintenance, building statistical prediction models and (recently) investigating developer affectiveness. For the latter, issue tracking systems can be mined to explore developers' emotions, sentiments and politeness, affects for short. However, research on affect detection in software artefacts is still in its early stage due to the lack of manually validated data and tools. In this paper, we contribute to the research of affects on software artefacts by providing a labeling of emotions present on issue comments. We manually labeled 2,000 issue comments and 4,000 sentences written by developers with emotions such as love, joy, surprise, anger, sadness and fear. Labeled comments and sentences are linked to software artefacts reported in our previously published dataset (containing more than 1K projects, more than 700K issue reports and more than 2 million issue comments). The enriched dataset presented in this paper allows the investigation of the role of affects in software development.

152 citations

Proceedings Article•10.1145/2901739.2901770•

Mining duplicate questions in stack overflow

Muhammad Ahasanuzzaman¹, Muhammad Asaduzzaman², Chanchal K. Roy², Kevin A. Schneider²

University of Dhaka¹, University of Saskatchewan²

14 May 2016

TL;DR: A manual investigation is performed to understand why users submit duplicate questions in Stack Overflow and a classification technique is proposed that uses a number of carefully chosen features to identify duplicate questions with reasonable accuracy.

Abstract: Stack Overflow is a popular question answering site that is focused on programming problems. Despite efforts to prevent asking questions that have already been answered, the site contains duplicate questions. This may cause developers to unnecessarily wait for a question to be answered when it has already been asked and answered. The site currently depends on its moderators and users with high reputation to manually mark those questions as duplicates, which not only results in delayed responses but also requires additional effort. In this paper, we first perform a manual investigation to understand why users submit duplicate questions in Stack Overflow. Based on our manual investigation we propose a classification technique that uses a number of carefully chosen features to identify duplicate questions. Evaluation using a large number of questions shows that our technique can detect duplicate questions with reasonable accuracy. We also compare our technique with DupPredictor, a state-of-the-art technique for detecting duplicate questions, and we found that our proposed technique has a better recall rate than DupPredictor.
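
The abstract does not list the classifier's features. As an illustration only, one lexical feature a duplicate-question classifier might use is the word overlap between two question titles; the Python sketch below computes the Jaccard similarity of title tokens. The tokenizer and the feature itself are assumptions, not the paper's feature set.

```python
import re

def tokenize(text: str) -> set:
    """Lower-case word tokens; '#' and '+' are kept so 'c#' and 'c++' survive."""
    return set(re.findall(r"[a-z0-9#+]+", text.lower()))

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity |A & B| / |A | B| of two titles' token sets (0.0 to 1.0)."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta and not tb:
        return 0.0  # two empty titles carry no evidence either way
    return len(ta & tb) / len(ta | tb)

if __name__ == "__main__":
    print(jaccard("How to sort a list in Python", "Sort a list in Python"))
```

Here the two titles share 5 of 7 distinct tokens, giving 5/7. A real classifier would combine several such scores (for example over bodies and tags) and learn a decision boundary; a single overlap score is only one input.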

148 citations

Proceedings Article•10.1145/2901739.2901752•

Mining valence, arousal, and dominance: possibilities for detecting burnout and productivity?

Mika V. Mäntylä¹, Bram Adams², Giuseppe Destefanis³, Daniel Graziotin⁴, Marco Ortu⁵

University of Oulu¹, École Polytechnique de Montréal², Brunel University London³, Free University of Bozen-Bolzano⁴, University of Cagliari⁵

14 May 2016

TL;DR: In this article, the authors explored the VAD metrics and their properties on 700,000 Jira issue reports containing over 2,000,000 comments, since issue reports keep track of a developer's progress on addressing bugs or new features.

Abstract: Similar to other industries, the software engineering domain is plagued by psychological diseases such as burnout, which lead developers to lose interest, exhibit lower activity and/or feel powerless. Prevention is essential for such diseases, which in turn requires early identification of symptoms. The emotional dimensions of Valence, Arousal and Dominance (VAD) are able to derive a person’s interest (attraction), level of activation and perceived level of control for a particular situation from textual communication, such as emails. As an initial step towards identifying symptoms of productivity loss in software engineering, this paper explores the VAD metrics and their properties on 700,000 Jira issue reports containing over 2,000,000 comments, since issue reports keep track of a developer’s progress on addressing bugs or new features. Using a general-purpose lexicon of 14,000 English words with known VAD scores, our results show that issue reports of different types (e.g., Feature Request vs. Bug) have a fair variation of Valence, while an increase in issue priority (e.g., from Minor to Critical) typically increases Arousal. Furthermore, we show that as an issue’s resolution time increases, so does the arousal of the individual the issue is assigned to. Finally, the resolution of an issue increases valence, especially for the issue Reporter and for quickly addressed issues. The existence of such relations between VAD and issue report activities shows promise that text mining in the future could offer an alternative to work health assessment surveys.
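
Lexicon-based VAD scoring can be sketched as averaging the scores of those words in a comment that appear in the lexicon. In the Python sketch below, the tiny lexicon and its numbers are made-up placeholders on an assumed 1 to 9 scale, not values from the 14,000-word lexicon the study used, and the tokenizer is likewise an assumption.

```python
import re
from statistics import mean

# Made-up (valence, arousal, dominance) placeholders on an assumed 1-9 scale.
# These are NOT values from the real lexicon used in the study.
TOY_VAD = {
    "great": (8.0, 5.5, 6.5),
    "thanks": (7.5, 3.5, 6.0),
    "crash": (2.5, 6.5, 3.5),
    "broken": (2.5, 5.0, 3.5),
}

def vad_scores(comment: str, lexicon=TOY_VAD):
    """Average V, A and D over the words of `comment` found in `lexicon`.

    Returns None when no word of the comment is covered by the lexicon.
    """
    words = re.findall(r"[a-z']+", comment.lower())
    hits = [lexicon[w] for w in words if w in lexicon]
    if not hits:
        return None
    # zip(*hits) regroups the tuples into one sequence per dimension.
    return tuple(mean(dim) for dim in zip(*hits))

if __name__ == "__main__":
    print(vad_scores("Thanks, great fix!"))      # (7.75, 4.5, 6.25)
    print(vad_scores("no known words here"))     # None
```

Averaging only over covered words keeps short comments comparable with long ones, at the cost of ignoring negation ("not great") and words missing from the lexicon.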

107 citations

Proceedings Article•10.1145/2901739.2903506•

MUBench: a benchmark for API-misuse detectors

Sven Amann¹, Sarah Nadi¹, Hoan Anh Nguyen², Tien N. Nguyen², Mira Mezini³

Technische Universität Darmstadt¹, Iowa State University², Lancaster University³

14 May 2016

TL;DR: Using MUBench, a dataset of 89 API misuses collected from 33 real-world projects and a survey, the authors analyze how prevalent API misuses are compared to other types of bugs, finding that they are rare but almost always cause crashes.

Abstract: Over the last few years, researchers have proposed a multitude of automated bug-detection approaches that mine a class of bugs that we call API misuses. Evaluations on a variety of software products show both the omnipresence of such misuses and the ability of the approaches to detect them. This work presents MUBench, a dataset of 89 API misuses that we collected from 33 real-world projects and a survey. With the dataset we empirically analyze the prevalence of API misuses compared to other types of bugs, finding that they are rare, but almost always cause crashes. Furthermore, we discuss how to use the dataset to benchmark and compare API-misuse detectors.

105 citations

Proceedings Article•10.1145/2901739.2901776•

Findings from GitHub: methods, datasets and limitations

Valerio Cosentino¹, Javier Luis Cánovas Izquierdo, Jordi Cabot²

French Institute for Research in Computer Science and Automation¹, Catalan Institution for Research and Advanced Studies (ICREA)²

14 May 2016

TL;DR: A meta-analysis of 93 research papers, covering i) the empirical methods employed, ii) the datasets used and iii) the limitations reported, raises concerns about the dataset collection process and dataset size.

Abstract: GitHub, one of the most popular social coding platforms, is the platform of reference when mining Open Source repositories to learn from past experiences. In recent years, a number of research papers have been published reporting findings based on data mined from GitHub. As the community continues to deepen its understanding of software engineering thanks to the analysis performed on this platform, we believe it is worthwhile to reflect on how research papers have addressed the task of mining GitHub repositories over the last years. In this regard, we present a meta-analysis of 93 research papers which addresses three main dimensions of those papers: i) the empirical methods employed, ii) the datasets they used and iii) the limitations reported. Results of our meta-analysis show some concerns regarding the dataset collection process and size, the low level of replicability, poor sampling techniques, lack of longitudinal studies and scarce variety of methodologies.

104 citations

Proceedings Article•10.1145/2901739.2901767•

From query to usable code: an analysis of stack overflow code snippets

Di Yang¹, Aftab Hussain¹, Cristina V. Lopes¹

University of California, Irvine¹

14 May 2016

TL;DR: In this paper, Stack Overflow code snippets are analyzed across four languages (C#, Java, JavaScript, and Python), and a qualitative analysis of usable Python snippets shows the characteristics of the answers that solve the original question.

Abstract: Enriched by natural language texts, Stack Overflow code snippets are an invaluable code-centric knowledge base of small units of source code. Besides being useful for software developers, these annotated snippets can potentially serve as the basis for automated tools that provide working code solutions to specific natural language queries. With the goal of developing automated tools with the Stack Overflow snippets and surrounding text, this paper investigates the following questions: (1) How usable are the Stack Overflow code snippets? and (2) When using text search engines for matching on the natural language questions and answers around the snippets, what percentage of the top results contain usable code snippets? A total of 3M code snippets are analyzed across four languages: C#, Java, JavaScript, and Python. Python and JavaScript proved to be the languages for which the most code snippets are usable. Conversely, Java and C# proved to be the languages with the lowest usability rate. Further qualitative analysis on usable Python snippets shows the characteristics of the answers that solve the original question. Finally, we use Google search to investigate the alignment of usability and the natural language annotations around code snippets, and explore how to make snippets in Stack Overflow an adequate base for future automatic program generation.
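
The abstract does not spell out how usability was judged. As one plausible, hedged proxy for Python, a snippet can be counted as usable if it parses; the sketch below uses the standard `ast` module. It checks syntax only, so it is a weaker test than compiling and running a snippet, and the helper names are hypothetical.

```python
import ast

def parses_as_python(snippet: str) -> bool:
    """Return True if the snippet is syntactically valid Python 3.

    Syntax check only: the code is not executed and imports are not resolved.
    """
    try:
        ast.parse(snippet)
    except (SyntaxError, ValueError):  # ValueError: e.g. null bytes on older Pythons
        return False
    return True

def usable_rate(snippets) -> float:
    """Fraction of snippets that parse; 0.0 for an empty input."""
    snippets = list(snippets)
    if not snippets:
        return 0.0
    return sum(parses_as_python(s) for s in snippets) / len(snippets)

if __name__ == "__main__":
    sample = [
        "print('hi')",
        "for x in range(3):\n    print(x)",
        "def f(:\n    pass",   # broken signature
        ">>> x = 1",           # interactive-prompt residue, not valid source
    ]
    print(usable_rate(sample))  # 0.5
```

Interactive-prompt residue such as `>>>` is a common reason a copied snippet fails to parse, which is why a real usability check would likely also attempt some cleanup before parsing.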

103 citations

Proceedings Article•10.1145/2901739.2901751•

Using dynamic and contextual features to predict issue lifetime in GitHub projects

Riivo Kikas¹, Marlon Dumas¹, Dietmar Pfahl¹

University of Tartu¹

14 May 2016

TL;DR: This work analyzes issues from more than 4000 GitHub projects and builds models to predict, at different points in an issue's lifetime, whether or not the issue will close within a given calendric period, by combining static, dynamic and contextual features.

Abstract: Methods for predicting issue lifetime can help software project managers to prioritize issues and allocate resources accordingly. Previous studies on issue lifetime prediction have focused on models built from static features, meaning features calculated at one snapshot of the issue's lifetime based on data associated with the issue itself. However, during its lifetime, an issue typically receives comments from various stakeholders, which may carry valuable insights into its perceived priority and difficulty and may thus be exploited to update lifetime predictions. Moreover, the lifetime of an issue depends not only on characteristics of the issue itself, but also on the state of the project as a whole. Hence, issue lifetime prediction may benefit from taking into account features capturing the issue's context (contextual features). In this work, we analyze issues from more than 4000 GitHub projects and build models to predict, at different points in an issue's lifetime, whether or not the issue will close within a given calendric period, by combining static, dynamic and contextual features. The results show that dynamic and contextual features complement the predictive power of static ones, particularly for long-term predictions.

96 citations

Proceedings Article•10.1145/2901739.2901745•

Feature toggles: practitioner practices and a case study

Tajmilur Rahman¹, Louis-Philippe Querel¹, Peter C. Rigby¹, Bram Adams²

Concordia University¹, École Polytechnique de Montréal²

14 May 2016

TL;DR: Toggles can reconcile rapid releases with long-term feature development and allow flexible control over which features to deploy; however, they also introduce technical debt and additional maintenance for developers.

Abstract: Continuous delivery and rapid releases have led to innovative techniques for integrating new features and bug fixes into a new release faster. To reduce the probability of integration conflicts, major software companies, including Google, Facebook and Netflix, use feature toggles to incrementally integrate and test new features instead of integrating the feature only when it’s ready. Even after release, feature toggles allow operations managers to quickly disable a new feature that is behaving erratically or to enable certain features only for certain groups of customers. Since literature on feature toggles is surprisingly slim, this paper tries to understand the prevalence and impact of feature toggles. First, we conducted a quantitative analysis of feature toggle usage across 39 releases of Google Chrome (spanning five years of release history). Then, we studied the technical debt involved with feature toggles by mining a spreadsheet used by Google developers for feature toggle maintenance. Finally, we performed thematic analysis of videos and blog posts of release engineers at major software companies in order to further understand the strengths and drawbacks of feature toggles in practice. We also validated our findings with four Google developers. We find that toggles can reconcile rapid releases with long-term feature development and allow flexible control over which features to deploy. However, they also introduce technical debt and additional maintenance for developers.
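
A feature toggle, in its simplest form, is a runtime switch around a code path. The Python sketch below uses hypothetical names (`ToggleRegistry`, `new_checkout`, the `beta` group) to show the uses described above: shipping an unfinished feature dark in a release, enabling a feature only for certain groups of customers, and switching it off again without a new release. It is an in-memory illustration, not how Chrome or any company cited above implements toggles.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Toggle:
    """A feature toggle: a global switch, optionally limited to some user groups."""
    enabled: bool = False
    groups: frozenset = frozenset()  # empty means "every group" when enabled

class ToggleRegistry:
    """Hypothetical in-memory toggle store; a real system might load it from config."""

    def __init__(self):
        self._toggles = {}

    def set(self, name, enabled, groups=()):
        self._toggles[name] = Toggle(enabled, frozenset(groups))

    def is_enabled(self, name, group=None):
        toggle = self._toggles.get(name)
        if toggle is None or not toggle.enabled:
            return False  # unknown or disabled toggles default to "off"
        return not toggle.groups or group in toggle.groups

def render_checkout(registry, group):
    # The unfinished feature ships in the release but stays dark unless toggled on.
    if registry.is_enabled("new_checkout", group):
        return "new checkout"
    return "old checkout"

if __name__ == "__main__":
    registry = ToggleRegistry()
    registry.set("new_checkout", True, groups={"beta"})
    print(render_checkout(registry, "beta"))    # new checkout
    print(render_checkout(registry, "stable"))  # old checkout
    registry.set("new_checkout", False)         # kill switch for erratic behaviour
    print(render_checkout(registry, "beta"))    # old checkout
```

The `if` branch in `render_checkout` is also where the technical debt reported above accumulates: once a feature is fully rolled out, the toggle check and the old path have to be removed by hand, or they linger as dead configuration.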

84 citations

Proceedings Article•10.1145/2901739.2901774•

Studying the effectiveness of application performance management (APM) tools for detecting performance regressions for web applications: an experience report

Tarek M. Ahmed¹, Cor-Paul Bezemer¹, Tse-Hsun Chen¹, Ahmed E. Hassan¹, Weiyi Shang²

Queen's University¹, Concordia University²

14 May 2016

TL;DR: It is found that APM tools can detect most of the injected performance regressions, making them good candidates to detect performance regressions in practice.

Abstract: Performance regressions, such as a higher CPU utilization than in the previous version of an application, are caused by software application updates that negatively affect the performance of an application. Although a plethora of mining software repository research has been done to detect such regressions, research tools are generally not readily available to practitioners. Application Performance Management (APM) tools are commonly used in practice for detecting performance issues in the field by mining operational data. In contrast to performance regression detection tools that assume a changing code base and a stable workload, APM tools mine operational data to detect performance anomalies caused by a changing workload in an otherwise stable code base. Although APM tools are widely used in practice, no research has been done to understand 1) whether APM tools can identify performance regressions caused by code changes and 2) how well these APM tools support diagnosing the root cause of these regressions. In this paper, we explore if the readily accessible APM tools can help practitioners detect performance regressions. We perform a case study using four APM tools: three commercial (AppDynamics, New Relic and Dynatrace) and one open source (Pinpoint). In particular, we examine the effectiveness of leveraging these APM tools in detecting and diagnosing injected performance regressions (excessive memory usage, high CPU utilization and inefficient database queries) in three open source applications. We find that APM tools can detect most of the injected performance regressions, making them good candidates to detect performance regressions in practice. However, there is a gap between mining approaches that are proposed in state-of-the-art performance regression detection research and the ones used by APM tools. In addition, APM tools lack the ability to be extended, which makes it hard to enhance them when exploring novel mining approaches for detecting performance regressions.

Proceedings Article•10.1145/2901739.2901763•

GreenOracle: estimating software energy consumption with energy measurement corpora

Shaiful Alam Chowdhury¹, Abram Hindle¹

University of Alberta¹

14 May 2016

TL;DR: This paper presents a model that estimates software energy consumption mostly within 10% error (in joules), does not require developers to train on energy measurements of their own applications, and can be applied to estimate any foreign application’s energy consumption for a test run.

Abstract: Software energy consumption is a relatively new concern for mobile application developers. Poor energy performance can harm adoption and sales of applications. Unfortunately for the developers, the measurement of software energy consumption is expensive in terms of hardware and difficult in terms of expertise. Many prior models of software energy consumption assume that developers can use hardware instrumentation and thus cannot evaluate software running within emulators or virtual machines. Some prior models require actual energy measurements from the previous versions of applications in order to model the energy consumption of later versions of the same application. In this paper, we take a big-data approach to software energy consumption and present a model that can estimate software energy consumption mostly within 10% error (in joules) and does not require the developer to train on energy measurements of their own applications. This model leverages a big-data approach whereby a collection of prior applications’ energy measurements allows us to train, transmit, and apply the model to estimate any foreign application’s energy consumption for a test run. Our model is based on the dynamic traces of system calls and CPU utilization.

Proceedings Article•10.1145/2901739.2901777•

Recognizing gender of stack overflow users

Bin Lin¹, Alexander Serebrenik¹

Eindhoven University of Technology¹

14 May 2016

TL;DR: This paper evaluates the applicability of different gender guessing approaches on several datasets derived from Stack Overflow, and suggests that the approaches combining different data sources perform the best.

Abstract: Software development remains a predominantly male activity, despite coordinated efforts from research, industry, and policy makers. This gender imbalance is most visible in social programming, on platforms such as Stack Overflow. To better understand the reasons behind this disparity, and offer support for (corrective) decision making, we and others have been engaged in large-scale empirical studies of activity in these online platforms, in which gender is one of the variables of interest. However, since gender is not explicitly recorded, it is typically inferred by automatic "gender guessers", based on cues derived from an individual's online presence, such as their name and profile picture. As opposed to self-reporting, used in earlier studies, gender guessers scale better, but their accuracy depends on the quantity and quality of data available in one's online profile. In this paper we evaluate the applicability of different gender guessing approaches on several datasets derived from Stack Overflow. Our results suggest that the approaches combining different data sources perform the best.

Proceedings Article•10.1145/2901739.2901748•

How android app developers manage power consumption?: an empirical study by mining power management commits

Lingfeng Bao¹, David Lo², Xin Xia¹, Xinyu Wang¹, Cong Tian³

Zhejiang University¹, Singapore Management University², Xidian University³

14 May 2016

TL;DR: An empirical study of power management commits in Android applications reveals that for different kinds of Android application (e.g., Games, Connectivity, Navigation, etc.), the dominant power management activities differ.

Abstract: As the Android platform becomes more and more popular, a large number of Android applications have been developed. When developers design and implement Android applications, power consumption management is an important factor to consider since it affects the usability of the applications. Thus, it is important to help developers adopt proper strategies to manage power consumption. Interestingly, today, there are a large number of Android application repositories made publicly available on sites such as GitHub. These repositories can be mined to help crystallize common power management activities that developers do. These in turn can be used to help other developers to perform similar tasks to improve their own Android applications. In this paper, we present an empirical study of power management commits in Android applications. Our study extends that of Moura et al. who perform an empirical study on energy aware commits; however, they do not focus on Android applications and only a few of the commits that they study come from Android applications. Android applications are often different from other applications (e.g., those running on a server) due to the issue of limited battery life and the use of specialized APIs. As subjects of our empirical study, we obtain a list of open source Android applications from F-Droid and crawl their commits from GitHub. We get 468 power management commits after we filter the commits using a set of keywords and by performing manual analysis. These 468 power management commits are from 154 different Android applications and belong to 15 different application categories. Furthermore, we use open card sort to categorize these power management commits and we obtain 6 groups which correspond to different power management activities. Our study also reveals that for different kinds of Android application (e.g., Games, Connectivity, Navigation, etc.), the dominant power management activities differ. For example, the percentage of power management commits belonging to the Power Adaptation activity is larger for Navigation applications than for those belonging to other categories.

Proceedings Article•10.1145/2901739.2901757•

Understanding the exception handling strategies of Java libraries: an empirical study

Demóstenes Sena¹, Roberta Coelho¹, Uirá Kulesza¹, Rodrigo Bonifácio²

Federal University of Rio Grande do Norte¹, University of Brasília²

14 May 2016

TL;DR: This study used an existing static analysis tool to reason about exception flows and handler actions of 656 Java libraries selected from 145 categories in the Maven Central Repository and identified a current trend of a high number of undocumented API runtime exceptions.

Abstract: This paper presents an empirical study whose goal was to investigate the exception handling strategies adopted by Java libraries and their potential impact on the client applications. In this study, exception flow analysis was used in combination with manual inspections in order: (i) to characterize the exception handling strategies of existing Java libraries from the perspective of their users; and (ii) to identify exception handling anti-patterns. We extended an existing static analysis tool to reason about exception flows and handler actions of 656 Java libraries selected from 145 categories in the Maven Central Repository. The study findings suggest a current trend of a high number of undocumented API runtime exceptions (i.e., @throws in Javadoc) and Unintended Handler problem. Moreover, we could also identify a considerable number of occurrences of exception handling anti-patterns (e.g. Catch and Ignore). Finally, we have also analyzed 647 bug issues of the 7 most popular libraries and identified that 20.71% of the reports are defects related to the problems of the exception strategies and anti-patterns identified in our study. The results of this study point to the need for tools to better understand and document the exception handling behavior of libraries.

Proceedings Article•10.1145/2901739.2903500•

How developers use exception handling in Java

Muhammad Asaduzzaman¹, Muhammad Ahasanuzzaman², Chanchal K. Roy¹, Kevin A. Schneider¹

University of Saskatchewan¹, University of Dhaka²

14 May 2016

TL;DR: This paper uses the Boa language and infrastructure to analyze 274k open source Java projects in GitHub to discover how developers use exception handling, and finds that bad exception handling coding practices are common in open source Java projects and that developers use them regardless of experience.

Abstract: Exception handling is a technique that addresses exceptional conditions in applications, allowing the normal flow of execution to continue in the event of an exception and/or to report on such events. Although exception handling techniques, features and bad coding practices have been discussed both in developer communities and in the literature, there is a marked lack of empirical evidence on how developers use exception handling in practice. In this paper we use the Boa language and infrastructure to analyze 274k open source Java projects in GitHub to discover how developers use exception handling. We not only consider various exception handling features but also explore bad coding practices and their relation to the experience of developers. Our results provide some interesting insights. For example, we found that bad exception handling coding practices are common in open source Java projects and regardless of experience all developers use bad exception handling coding practices.

Proceedings Article•10.1145/2901739.2903496•

The relationship between commit message detail and defect proneness in Java projects on GitHub

Jacob G. Barnett¹, Charles K. Gathuru¹, Luke S. Soldano¹, Shane McIntosh¹

McGill University¹

14 May 2016

TL;DR: This paper analyzes the relationship between the defect proneness of commits and commit message volume and content, and finds that JIT models outperform random guessing models, achieving AUC and Brier scores that range between 0.63-0.96 and 0.01-0.21, respectively.

Abstract: Just-In-Time (JIT) defect prediction models aim to predict the commits that will introduce defects in the future. Traditionally, JIT defect prediction models are trained using metrics that are primarily derived from aspects of the code change itself (e.g., the size of the change, the author’s prior experience). In addition to the code that is submitted during a commit, authors write commit messages, which describe the commit for archival purposes. It is our position that the level of detail in these commit messages can provide additional explanatory power to JIT defect prediction models. Hence, in this paper, we analyze the relationship between the defect proneness of commits and commit message volume (i.e., the length of the commit message) and commit message content (approximated using spam filtering technology). Through analysis of JIT models that were trained using 342 GitHub repositories, we find that our JIT models outperform random guessing models, achieving AUC and Brier scores that range between 0.63-0.96 and 0.01-0.21, respectively. Furthermore, our metrics that are derived from commit message detail provide a statistically significant boost to the explanatory power of the JIT models in 43%-80% of the studied systems, accounting for up to 72% of the explanatory power. Future JIT studies should consider adding commit message detail metrics.
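
For readers unfamiliar with the two reported metrics, the Python sketch below gives their standard definitions. The Brier score is the mean squared difference between a predicted defect probability and the 0/1 outcome (lower is better); ROC AUC is the probability that a randomly chosen defect-inducing commit is scored above a randomly chosen clean one, with ties counting half (0.5 corresponds to random guessing). This is a generic illustration, not the study's evaluation code.

```python
def brier_score(probs, labels):
    """Mean of (p_i - y_i)^2 over predictions p_i and 0/1 outcomes y_i."""
    if len(probs) != len(labels) or not probs:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)

def auc(probs, labels):
    """ROC AUC by pairwise comparison: P(score of positive > score of negative).

    Ties count as 0.5. Quadratic in the number of commits, fine for illustration.
    """
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one defective and one clean commit")
    wins = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                wins += 1.0
            elif pp == pn:
                wins += 0.5
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    probs, labels = [0.9, 0.2, 0.6, 0.4], [1, 0, 1, 0]
    print(brier_score(probs, labels), auc(probs, labels))
```

On this toy data the Brier score is 0.0925 and the AUC is 1.0, since every defective commit outranks every clean one. A constant prediction of 0.5 for every commit yields an AUC of 0.5 and a Brier score of 0.25, which is one way to read the "random guessing" baseline in the abstract.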

Proceedings Article•10.1145/2901739.2903497•

Examining programmer practices for locally handling exceptions

[...]

Mary Beth Kery¹, Claire Le Goues¹, Brad A. Myers¹•Institutions (1)

Carnegie Mellon University¹

14 May 2016

TL;DR: It is found that programmers handle exceptions locally in catch blocks much of the time, rather than propagating by throwing an Exception, and face a tension between handlers that directly address local program statement failure and handlers that consider the program-wide implications of an exception.

Abstract: Many have argued that the current try/catch mechanism for handling exceptions in Java is flawed. A major complaint is that programmers often write minimal and low quality handlers. We used the Boa tool to examine a large number of Java projects on GitHub to provide empirical evidence about how programmers currently deal with exceptions. We found that programmers handle exceptions locally in catch blocks much of the time, rather than propagating by throwing an Exception. Programmers make heavy use of actions like Log, Print, Return, or Throw in catch blocks, and also frequently copy code between handlers. We found bad practices like empty catch blocks or catching Exception are indeed widespread. We discuss evidence that programmers may misjudge risk when catching Exception, and face a tension between handlers that directly address local program statement failure and handlers that consider the program-wide implications of an exception. Some of these issues might be addressed by future tools which autocomplete more complete handlers.

Proceedings Article•10.1145/2901739.2901753•

Adressing problems with external validity of repository mining studies through a smart data platform

[...]

Fabian Trautsch¹, Steffen Herbold¹, Philip Makedonski¹, Jens Grabowski¹•Institutions (1)

University of Göttingen¹

14 May 2016

TL;DR: A potential solution to problems within the current state-of-the-art that pose a threat to the external validity of results is discussed through a cloud-based platform that integrates data collection and analytics.

Abstract: Research in software repository mining has grown considerably over the last decade. Due to the data-driven nature of this line of investigation, we identified several problems within the current state of the art that pose a threat to the external validity of results. The heavy reuse of data sets in many studies may invalidate the results if problems with the data itself are identified. Moreover, for many studies the data and/or the implementations are not available, which hinders replication of the results and, thereby, decreases the comparability between studies. Even if all information about a study is available, the diversity of the tooling used can still make replication very hard. In this paper, we discuss a potential solution to these problems through a cloud-based platform that integrates data collection and analytics. We created the prototype SmartSHARK, which implements our approach. Using SmartSHARK, we collected data from several projects and created different analytic examples. In this article, we present SmartSHARK and discuss our experiences regarding its use and the problems mentioned above.

Proceedings Article•10.1145/2901739.2901764•

The impact of switching to a rapid release cycle on the integration delay of addressed issues: an empirical study of the Mozilla Firefox project

[...]

Daniel Alencar da Costa¹, Shane McIntosh², Uirá Kulesza¹, Ahmed E. Hassan³•Institutions (3)

Federal University of Rio Grande do Norte¹, McGill University², Queen's University³

14 May 2016

TL;DR: It is observed that addressed issues take a median of 50 days longer to be integrated in rapid Firefox releases than the traditional ones, suggesting that rapid release cycles may not be a silver bullet for the rapid delivery of addressed issues to users.

Abstract: The release frequency of software projects has increased in recent years. Adopters of so-called rapid release cycles claim that they can deliver addressed issues (i.e., bugs, enhancements, and new features) to users more quickly. However, there is little empirical evidence to support these claims. In fact, in our prior work, we found that code integration phases may introduce delays in rapidly releasing software: 98% of addressed issues in the rapidly releasing Firefox project had their integration delayed by at least one release. To better understand the impact that rapid release cycles have on the integration delay of addressed issues, we perform a comparative study of traditional and rapid release cycles. Through an empirical study of 72,114 issue reports from the Firefox system, we observe that, surprisingly, addressed issues take a median of 50 days longer to be integrated in rapid Firefox releases than in traditional ones. To investigate the factors that are related to integration delay in traditional and rapid release cycles, we train regression models that explain whether or not an addressed issue will have its integration delayed. Our explanatory models achieve good discrimination (ROC areas of 0.81-0.83) and calibration scores (Brier scores of 0.05-0.16). Deeper analysis of our explanatory models indicates that traditional releases prioritize the integration of backlog issues, while rapid releases prioritize issues that were addressed during the current release cycle. Our results suggest that rapid release cycles may not be a silver bullet for the rapid delivery of addressed issues to users.

Proceedings Article•10.1145/2901739.2901775•

Locating bugs without looking back

[...]

Tezcan Dilshener¹, Michel Wermelinger¹, Yijun Yu¹•Institutions (1)

Open University¹

14 May 2016

TL;DR: This work presents a novel approach that directly scores each current file against the given report, thus not requiring past code and reports, and shows the applicability of this approach to software projects without history.

Abstract: Bug localisation is a core program comprehension task in software maintenance: given the observation of a bug, where is it located in the source code files? Information retrieval (IR) approaches see a bug report as the query, and the source code files as the documents to be retrieved, ranked by relevance. Such approaches have the advantage of not requiring expensive static or dynamic analysis of the code. However, most state-of-the-art IR approaches rely on project history, in particular previously fixed bugs and previous versions of the source code. We present a novel approach that directly scores each current file against the given report, thus not requiring past code and reports. The scoring is based on heuristics identified through manual inspection of a small set of bug reports. We compare our approach to five others, using their own five metrics on their own six open source projects. Out of 30 performance indicators, we improve 28. For example, on average we find one or more affected files in the top 10 ranked files for 77% of the bug reports. These results show the applicability of our approach to software projects without history.
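
The 77% figure is a top-10 hit rate: the share of bug reports for which at least one file actually changed by the fix appears among the 10 highest-ranked files. The following Python sketch computes only this metric; it does not reproduce the authors' scoring heuristics, and the report IDs and file names are hypothetical.

```python
from typing import Dict, List, Set


def top_n_hit_rate(rankings: Dict[str, List[str]],
                   fixed_files: Dict[str, Set[str]],
                   n: int = 10) -> float:
    """Share of bug reports with at least one truly affected file among the top-n ranked files."""
    if not rankings:
        raise ValueError("no bug reports given")
    hits = sum(
        1 for report, ranked in rankings.items()
        if fixed_files.get(report, set()) & set(ranked[:n])
    )
    return hits / len(rankings)


# Hypothetical ranked output of a bug localisation tool and the files each fix touched.
rankings = {
    "BUG-1": ["Parser.java", "Lexer.java", "Ast.java"],
    "BUG-2": ["Ui.java", "Dialog.java"],
    "BUG-3": ["Cache.java"],
}
fixed_files = {"BUG-1": {"Ast.java"}, "BUG-2": {"Net.java"}, "BUG-3": {"Cache.java"}}
print(top_n_hit_rate(rankings, fixed_files, n=2))  # only BUG-3 is a hit
print(top_n_hit_rate(rankings, fixed_files, n=3))  # BUG-1 and BUG-3 are hits
```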

Proceedings Article•10.1145/2901739.2901750•

Grouping Android tag synonyms on Stack Overflow

[...]

Stefanie Beyer¹, Martin Pinzger¹•Institutions (1)

Alpen-Adria-Universität Klagenfurt¹

14 May 2016

TL;DR: This work represents their synonyms as directed, weighted graphs, and investigates several graph community detection algorithms to build meaningful groups of tags, also called tag communities, and shows how these tag communities can be used to derive trends of topics of Android-related questions on Stack Overflow.

Abstract: On Stack Overflow, more than 38,000 diverse tags are used to classify posts. The Stack Overflow community provides tag synonyms to reduce the number of tags that have the same or similar meaning. In our previous research, we used those synonym pairs to derive a number of strategies to create tag synonyms automatically. In this work, we continue this line of research and present an approach to group tag synonyms into meaningful topics. We represent our synonyms as directed, weighted graphs, and investigate several graph community detection algorithms to build meaningful groups of tags, also called tag communities. We apply our approach to the tags obtained from Android-related Stack Overflow posts and evaluate the resulting tag communities quantitatively with various community metrics. In addition, we evaluate our approach qualitatively through a manual inspection and comparison of a random sample of tag communities. Our results show that we can cluster the Android tags into 2,481 meaningful tag communities. We also show how these tag communities can be used to derive trends of topics of Android-related questions on Stack Overflow.

Proceedings Article•10.1145/2901739.2901760•

Topic modeling of NASA space system problem reports: research in practice

[...]

Lucas Layman, Allen P. Nikora¹, Joshua Meek, Tim Menzies²•Institutions (2)

California Institute of Technology¹, North Carolina State University²

14 May 2016

TL;DR: Topic modeling is applied to a corpus of NASA problem reports to extract trends in testing and operational failures, and finds that hardware material and flight software issues are common during the integration and testing phase, while ground station software and equipment issues are more common during the operations phase.

Abstract: Problem reports at NASA are similar to bug reports: they capture defects found during testing and post-launch operational anomalies, and they document the investigation and corrective action of the issue. These artifacts are a rich source of lessons learned for NASA, but are expensive to analyze since problem reports consist primarily of natural language text. We apply topic modeling to a corpus of NASA problem reports to extract trends in testing and operational failures. We collected 16,669 problem reports from six NASA space flight missions and applied Latent Dirichlet Allocation topic modeling to the document corpus. We analyze the most popular topics within and across missions, and how popular topics changed over the lifetime of a mission. We find that hardware material and flight software issues are common during the integration and testing phase, while ground station software and equipment issues are more common during the operations phase. We identify a number of challenges in topic modeling for trend analysis: 1) the process of selecting the topic modeling parameters lacks definitive guidance, 2) defining semantically meaningful topic labels requires non-trivial effort and domain expertise, 3) topic models derived from the combined corpus of the six missions were biased toward the larger missions, and 4) topics must be semantically distinct as well as cohesive to be useful. Nonetheless, topic modeling can identify problem themes within missions and across mission lifetimes, providing useful feedback to engineers and project managers.
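
Latent Dirichlet Allocation is usually run through an existing library, and the abstract does not say which implementation or inference method was used. For illustration only, the following is a compact collapsed Gibbs sampler in plain Python; the tokenized reports are hypothetical and far smaller than a real problem-report corpus, and the hyperparameters `alpha`, `beta` and the iteration count are arbitrary choices.

```python
import random
from typing import List, Tuple


def lda_gibbs(docs: List[List[str]], k: int, alpha: float = 0.1, beta: float = 0.01,
              iters: int = 200, seed: int = 0
              ) -> Tuple[List[str], List[List[float]], List[List[float]]]:
    """Collapsed Gibbs sampling for LDA.

    Returns the vocabulary, the topic-word distributions (phi, k x V)
    and the document-topic distributions (theta, D x k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    n_dk = [[0] * k for _ in docs]      # topic counts per document
    n_kw = [[0] * V for _ in range(k)]  # word counts per topic
    n_k = [0] * k                       # total tokens per topic
    z: List[List[int]] = []             # topic assignment of every token
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            t = rng.randrange(k)
            zd.append(t)
            n_dk[d][t] += 1
            n_kw[t][index[w]] += 1
            n_k[t] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, t = index[w], z[d][i]
                # Remove the current token from all counts.
                n_dk[d][t] -= 1
                n_kw[t][wi] -= 1
                n_k[t] -= 1
                # Unnormalized full conditional p(z = j | all other assignments).
                weights = [(n_dk[d][j] + alpha) * (n_kw[j][wi] + beta) / (n_k[j] + V * beta)
                           for j in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t
                n_dk[d][t] += 1
                n_kw[t][wi] += 1
                n_k[t] += 1
    phi = [[(n_kw[j][v] + beta) / (n_k[j] + V * beta) for v in range(V)] for j in range(k)]
    theta = [[(n_dk[d][j] + alpha) / (len(doc) + k * alpha) for j in range(k)]
             for d, doc in enumerate(docs)]
    return vocab, phi, theta


# Hypothetical, pre-tokenized problem reports.
reports = [
    ["valve", "leak", "pressure", "valve"],
    ["ground", "station", "antenna", "ground"],
    ["leak", "seal", "valve", "pressure"],
    ["station", "uplink", "antenna", "ground"],
]
vocab, phi, theta = lda_gibbs(reports, k=2, iters=100, seed=42)
for j, dist in enumerate(phi):
    top = sorted(range(len(vocab)), key=lambda v: -dist[v])[:3]
    print(f"topic {j}:", [vocab[v] for v in top])
```

The number of topics `k` is exactly the kind of parameter the abstract says lacks definitive guidance, and results also vary with the random seed.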

Proceedings Article•10.1145/2901739.2901762•

Inter-app communication in Android: developer challenges

[...]

Waqar Ahmad¹, Christian Kästner¹, Joshua Sunshine¹, Jonathan Aldrich¹•Institutions (1)

Carnegie Mellon University¹

14 May 2016

TL;DR: A broad corpus of apps was studied; the findings suggest that design limitations do indeed cause development problems, and possible mitigation strategies are proposed.

Abstract: The Android platform is designed to support mutually untrusted third-party apps, which run as isolated processes but may interact via platform-controlled mechanisms, called Intents. Interactions among third-party apps are intended and can contribute to a rich user experience, for example, the ability to share pictures from one app with another. The Android platform presents an interesting point in a design space of module systems that is biased toward isolation, extensibility, and untrusted contributions. The Intent mechanism essentially provides message channels among modules, in which the set of message types is extensible. However, the module system has design limitations including the lack of consistent mechanisms to document message types, very limited checking that a message conforms to its specifications, the inability to explicitly declare dependencies on other modules, and the lack of checks for backward compatibility as message types evolve over time. In order to understand the degree to which these design limitations result in real issues, we studied a broad corpus of apps and cross-validated our results against app documentation and Android support forums. Our findings suggest that design limitations do indeed cause development problems. Based on our results, we outline further research questions and propose possible mitigation strategies.

Proceedings Article•10.1145/2901739.2901759•

A large-scale study on repetitiveness, containment, and composability of routines in open-source projects

[...]

Anh Tuan Nguyen¹, Hoan Anh Nguyen¹, Tien N. Nguyen¹•Institutions (1)

Iowa State University¹

14 May 2016

TL;DR: This paper presents a large-scale study on the repetitiveness, containment, and composability of source code at the semantic level, and collects 8,764,971 unique subroutines as basic units for code searching/synthesis.

Abstract: Source code in software systems has been shown to have a good degree of repetitiveness at the lexical, syntactical, and API usage levels. This paper presents a large-scale study on the repetitiveness, containment, and composability of source code at the semantic level. We collected a large dataset consisting of 9,224 Java projects with 2.79M class files and 17.54M methods, with 187M SLOCs. For each method in a project, we build the program dependency graph (PDG) to represent a routine, and compare PDGs with one another as well as the subgraphs within them. We found that within a project, 12.1% of the routines are repeated, and most of them repeat 2–7 times. Taken as a whole, the routines are quite project-specific, with only 3.3% of them exactly repeating in 1–4 other projects, at most 8 times. We also found that 26.1% and 7.27% of the routines are contained in other routine(s), i.e., implemented as part of other routine(s) elsewhere within a project and in other projects, respectively. Except for trivial routines, their repetitiveness and containment are independent of their complexity. Defining a subroutine via a per-variable slicing subgraph in a PDG, we found that 14.3% of all routines have all of their subroutines repeated. A high percentage of subroutines in a routine can be found/reused elsewhere. We collected 8,764,971 unique subroutines (with 323,564 unique JDK subroutines) as basic units for code searching/synthesis. We also provide practical implications of our findings for automated tools.

Proceedings Article•10.1145/2901739.2901773•

Software ingredients: detection of third-party component reuse in Java software release

[...]

Takashi Ishio¹, Raula Gaikovina Kula¹, Tetsuya Kanda¹, Daniel M. German², Katsuro Inoue¹•Institutions (2)

Osaka University¹, University of Victoria²

14 May 2016

TL;DR: A method to automatically select the most likely origin of components reused in a product, based on an assumption that a product tends to include an entire copy of a component rather than a partial copy, is proposed.

Abstract: A software product is often dependent on a large number of third-party components. To assess potential risks, such as security vulnerabilities and license violations, a list of components and their versions in a product is important for release engineers and security analysts. Since such a list is not always available, a code comparison technique named Software Bertillonage has been proposed to test whether a product likely includes a copy of a particular component or not. Although the technique can extract candidates of reused components, a user still has to manually identify the original components among the candidates. In this paper, we propose a method to automatically select the most likely origin of components reused in a product, based on an assumption that a product tends to include an entire copy of a component rather than a partial copy. More concretely, given a Java product and a repository of jar files of existing components, our method selects jar files that can provide Java classes to the product in a greedy manner. To compare the method with the existing technique, we have conducted an evaluation using randomly created jar files including up to 1,000 components. The Software Bertillonage technique reports many candidates; its precision and recall are 0.357 and 0.993, respectively. Our method reports a list of original components whose precision and recall are 0.998 and 0.997, respectively.
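
The greedy selection step can be sketched as a set-cover style loop. In this hypothetical Python version, each jar is reduced to the set of class names it contains; at each step the jar that explains the most still-unexplained product classes is chosen, and ties are broken in favor of the jar with the fewest classes absent from the product, as a rough stand-in for the "entire copy" assumption. The paper's exact scoring may differ, and all jar and class names are made up.

```python
from typing import Dict, List, Set, Tuple


def greedy_select_jars(product_classes: Set[str],
                       jars: Dict[str, Set[str]]) -> Tuple[List[str], Set[str]]:
    """Greedily pick jars that explain the product's classes.

    Returns the selected jars in order and the classes no jar could explain.
    """
    remaining = set(product_classes)
    selected: List[str] = []
    while remaining and jars:
        # Most newly covered classes first; then fewest classes missing from the
        # product (prefer entire copies); then the name, for reproducibility.
        best = min(jars, key=lambda j: (-len(jars[j] & remaining),
                                        len(jars[j] - product_classes), j))
        if not jars[best] & remaining:
            break  # no jar explains any remaining class
        selected.append(best)
        remaining -= jars[best]
    return selected, remaining


product = {"a.X", "a.Y", "b.Z", "c.W"}
jars = {
    "liba-1.0.jar": {"a.X", "a.Y"},
    "liba-2.0.jar": {"a.X", "a.Y", "a.New"},
    "libb.jar": {"b.Z"},
    "big.jar": {"a.X", "b.Z", "q.Q"},
}
print(greedy_select_jars(product, jars))
```

In the example, liba-1.0.jar wins over liba-2.0.jar and big.jar because all of its classes appear in the product; c.W stays unexplained because no jar provides it.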

Proceedings Article•10.1145/2901739.2901766•

The unreasonable effectiveness of traditional information retrieval in crash report deduplication

[...]

Joshua Charles Campbell¹, Eddie Antonio Santos¹, Abram Hindle¹•Institutions (1)

University of Alberta¹

14 May 2016

TL;DR: In this article, a variety of crash report bucketing methods are evaluated using data collected by Ubuntu's Apport automated crash reporting system, and a set of criteria that a crash deduplication method must meet is presented; several methods that meet these criteria are evaluated on a new dataset.

Abstract: Organizations like Mozilla, Microsoft, and Apple are flooded with thousands of automated crash reports per day. Although crash reports contain valuable information for debugging, there are often too many for developers to examine individually. Therefore, in industry, crash reports are often automatically grouped together in buckets. Ubuntu’s repository contains crashes from hundreds of software systems available with Ubuntu. A variety of crash report bucketing methods are evaluated using data collected by Ubuntu’s Apport automated crash reporting system. The trade-off between precision and recall of numerous scalable crash deduplication techniques is explored. A set of criteria that a crash deduplication method must meet is presented and several methods that meet these criteria are evaluated on a new dataset. The evaluations presented in this paper show that off-the-shelf information retrieval techniques, which were not designed to be used with crash reports, outperform other techniques that are specifically designed for the task of crash bucketing at realistic industrial scales. This research indicates that automated crash bucketing still has a lot of room for improvement, especially in terms of identifier tokenization.
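
A minimal sketch of off-the-shelf IR bucketing, assuming TF-IDF weighting, cosine similarity, and a hypothetical similarity threshold of 0.5; this is not the configuration evaluated in the paper, and the stack traces are invented. The tokenizer splits identifiers on underscores and other punctuation, which matters because the abstract singles out identifier tokenization as a weak point.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(trace: str) -> List[str]:
    """Lowercase alphanumeric runs; underscores, dots and spaces act as separators."""
    return re.findall(r"[a-z0-9]+", trace.lower())


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Term frequency times smoothed IDF, computed over the whole batch."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log((1 + n) / (1 + c)) + 1 for tok, c in df.items()}
    return [{tok: tf * idf[tok] for tok, tf in Counter(doc).items()} for doc in docs]


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def bucket(traces: List[str], threshold: float = 0.5) -> List[List[int]]:
    """Put each trace into the most similar existing bucket, or open a new one."""
    vecs = tfidf_vectors([tokenize(t) for t in traces])
    buckets: List[List[int]] = []
    for i, vec in enumerate(vecs):
        best, best_sim = None, 0.0
        for b, members in enumerate(buckets):
            sim = max(cosine(vec, vecs[m]) for m in members)
            if sim >= threshold and (best is None or sim > best_sim):
                best, best_sim = b, sim
        if best is None:
            buckets.append([i])
        else:
            buckets[best].append(i)
    return buckets


traces = [
    "#0 g_main_loop_run libglib\n#1 gtk_main libgtk\n#2 main app",
    "#0 g_main_loop_run libglib\n#1 gtk_main libgtk\n#2 main app2",
    "#0 memcpy libc\n#1 png_read_row libpng\n#2 load_image viewer",
]
print(bucket(traces))
```

The first two traces share most frames and land in one bucket; the third crashes in unrelated code and opens a new one. Computing IDF over the whole batch is a simplification; a production system would maintain an incremental index.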

Proceedings Article•10.1145/2901739.2901741•

Interactive exploration of developer interaction traces using a hidden Markov model

[...]

Kostadin Damevski¹, Hui Chen², David C. Shepherd, Lori Pollock³•Institutions (3)

Virginia Commonwealth University¹, Virginia State University², University of Delaware³

14 May 2016

TL;DR: This paper describes a technique that leverages Hidden Markov Models (HMMs) as a means of mining high-level developer behavior from low-level IDE interaction traces of many developers in the field, using a large IDE interaction dataset collected from nearly 200 developers at ABB, Inc.

Abstract: Using IDE usage data to analyze the behavior of software developers in the field, during the course of their daily work, can lend support to (or dispute) laboratory studies of developers. This paper describes a technique that leverages Hidden Markov Models (HMMs) as a means of mining high-level developer behavior from low-level IDE interaction traces of many developers in the field. HMMs use dual stochastic processes to model higher-level hidden behavior using observable input sequences of events. We propose an interactive approach to mining interpretable HMMs, based on guiding a human expert in building a high-quality HMM iteratively, one state at a time. The final result is a model that is both representative of the field data and captures the field phenomena of interest. We apply our HMM construction approach to study debugging behavior, using a large IDE interaction dataset collected from nearly 200 developers at ABB, Inc. Our results highlight the different modes and constituent actions in debugging, exhibited by the developers in our dataset.
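
To make the HMM terminology concrete, here is a small Python sketch of the forward algorithm, which computes the probability of an observed sequence of IDE events by summing over every hidden-state path. The two hidden states, the event names, and all probabilities are hypothetical; the paper's interactive, one-state-at-a-time model construction is not shown.

```python
from typing import Dict, Sequence


def forward_likelihood(obs: Sequence[str], states: Sequence[str],
                       start: Dict[str, float],
                       trans: Dict[str, Dict[str, float]],
                       emit: Dict[str, Dict[str, float]]) -> float:
    """P(observations | model), summed over all hidden state paths (forward algorithm)."""
    if not obs:
        return 1.0
    # alpha[s] = P(first t observations, hidden state at step t is s)
    alpha = {s: start[s] * emit[s].get(obs[0], 0.0) for s in states}
    for o in obs[1:]:
        alpha = {s: sum(alpha[r] * trans[r][s] for r in states) * emit[s].get(o, 0.0)
                 for s in states}
    return sum(alpha.values())


# Hypothetical two-state model of developer behaviour.
states = ["Navigate", "Debug"]
start = {"Navigate": 0.6, "Debug": 0.4}
trans = {"Navigate": {"Navigate": 0.7, "Debug": 0.3},
         "Debug": {"Navigate": 0.2, "Debug": 0.8}}
emit = {"Navigate": {"open_file": 0.7, "set_breakpoint": 0.2, "step_over": 0.1},
        "Debug": {"open_file": 0.1, "set_breakpoint": 0.4, "step_over": 0.5}}
print(forward_likelihood(["open_file", "set_breakpoint"], states, start, trans, emit))
```

For this toy model the printed likelihood is about 0.1236. The forward pass costs O(T·S²) for T events and S states, instead of enumerating all S^T paths.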

Journal Article•10.21858/MSR.18.08•

Territorial capital as a goal of integrated development planning (Kapitał terytorialny jako cel zintegrowanego planowania rozwoju)

[...]

Tadeusz Markowski

1 Jun 2016

TL;DR: The author shows that adopting the concept of development based on building territorial capital will not be possible without introducing integrated planning procedures and an integrated development strategy for defined functional areas.

Abstract: The article attempts to show the possibility and necessity of introducing an integrated planning model and integrated system projects in the implementation of the new development policy, which is based on the so-called territorial dimension. The author argues that the territorial dimension of development policy requires integrated development planning. A leading factor in this concept is the orientation of policies toward creating a new resource in the form of territorial capital. The author defines territorial capital as a special type of relational human capital whose quality depends on the sustainable development of three related factors: economic, social, and environmental. Particular emphasis is placed on the relationship between physical planning, the quality of economic and social relations, and the productivity of human capital. In the conclusions, the author shows that adopting the concept of development based on building territorial capital will not be possible without introducing integrated planning procedures and an integrated development strategy for defined functional areas.

Proceedings Article•10.1145/2901739.2901756•

Improving change recommendation using aggregated association rules

[...]

Thomas Rolfsnes¹, Leon Moonen¹, Stefano Di Alesio¹, Razieh Behjati¹, Dave Binkley²•Institutions (2)

Simula Research Laboratory¹, Loyola University Maryland²

14 May 2016

TL;DR: An empirical study on the change histories of two large industrial systems and four large open source systems shows that aggregation has a significant impact on the values of most measures and leads to a significant improvement in the resulting recommendations.

Abstract: Past research has proposed association rule mining as a means to uncover the evolutionary coupling from a system’s change history. These couplings have various applications, such as improving system decomposition and recommending related changes during development. The strength of the coupling can be characterized using a variety of interestingness measures. Existing recommendation engines typically use only the rule with the highest interestingness value in situations where more than one rule applies. In contrast, we argue that multiple applicable rules indicate increased evidence, and hypothesize that the aggregation of such rules can be exploited to provide more accurate recommendations. To investigate this hypothesis we conduct an empirical study on the change histories of two large industrial systems and four large open source systems. As aggregators we adopt three cumulative gain functions from information retrieval. The experiments evaluate the three using 39 different rule interestingness measures. The results show that aggregation has a significant impact on the values of most measures and, furthermore, leads to a significant improvement in the resulting recommendations.
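
The difference between using only the strongest rule and aggregating all applicable rules can be sketched as follows. The abstract does not name its three cumulative gain functions, so this Python example assumes plain cumulative gain (CG) and discounted cumulative gain (DCG) as standard IR examples; the file names and interestingness values are hypothetical.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

Rule = Tuple[FrozenSet[str], str, float]  # (antecedent, consequent, interestingness)


def max_agg(values: List[float]) -> float:
    """Baseline: keep only the strongest applicable rule."""
    return max(values)


def cg(values: List[float]) -> float:
    """Cumulative gain: plain sum of the rule values."""
    return sum(values)


def dcg(values: List[float]) -> float:
    """Discounted cumulative gain: the i-th best value is divided by log2(i + 1)."""
    ranked = sorted(values, reverse=True)
    return sum(v / math.log2(i + 1) for i, v in enumerate(ranked, start=1))


def recommend(changed: Set[str], rules: List[Rule],
              aggregate: Callable[[List[float]], float]) -> List[Tuple[str, float]]:
    """Rank candidate files by aggregating every applicable rule per consequent."""
    evidence: Dict[str, List[float]] = defaultdict(list)
    for antecedent, consequent, value in rules:
        if antecedent <= changed and consequent not in changed:
            evidence[consequent].append(value)
    scored = [(c, aggregate(vs)) for c, vs in evidence.items()]
    return sorted(scored, key=lambda cv: (-cv[1], cv[0]))


changed = {"a.c", "b.c"}
rules = [
    (frozenset({"a.c"}), "x.c", 0.6),
    (frozenset({"b.c"}), "x.c", 0.5),
    (frozenset({"a.c", "b.c"}), "x.c", 0.4),
    (frozenset({"a.c"}), "y.c", 0.7),
    (frozenset({"z.c"}), "w.c", 0.9),  # not applicable: z.c was not changed
]
for name, agg in [("max", max_agg), ("CG", cg), ("DCG", dcg)]:
    print(name, recommend(changed, rules, agg))
```

With max aggregation, y.c (one rule, 0.7) outranks x.c (best rule 0.6); with CG or DCG, the three weaker rules supporting x.c add up and x.c moves to the top.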
