Top 74 papers presented at Mining Software Repositories in 2012

Showing papers presented at "Mining Software Repositories in 2012"

Proceedings Article • 10.1109/MSR.2012.6224306

App store mining and analysis: MSR for app stores

1 Jan 2012

TL;DR: We use data mining to extract feature information, which we then combine with more readily available information to analyse apps' technical, customer and business aspects.

336 citations

Proceedings Article • 10.1109/MSR.2012.6224294

GHTorrent: Github's data from a firehose

Gousios, Spinellis

1 Jan 2012

190 citations

Proceedings Article • 10.5555/2664446.2664455

Think locally, act globally: improving defect and effort prediction models

Nicolas Bettenburg¹, Meiyappan Nagappan¹, Ahmed E. Hassan¹

Queen's University¹

2 Jun 2012

TL;DR: A comparison of three different approaches for creating statistical regression models to model and predict software defects and development effort finds that for both types of data, local models show a significantly increased fit to the data compared to global models.

Abstract: Much research energy in software engineering is focused on the creation of effort and defect prediction models. Such models are important means for practitioners to judge their current project situation, optimize the allocation of their resources, and make informed future decisions. However, software engineering data contains a large amount of variability. Recent research demonstrates that such variability leads to poor fits of machine learning models to the underlying data, and suggests splitting datasets into more fine-grained subsets with similar properties. In this paper, we present a comparison of three different approaches for creating statistical regression models to model and predict software defects and development effort. Global models are trained on the whole dataset. In contrast, local models are trained on subsets of the dataset. Last, we build a global model that takes into account local characteristics of the data. We evaluate the performance of these three approaches in a case study on two defect and two effort datasets. We find that for both types of data, local models show a significantly increased fit to the data compared to global models. The substantial improvements in both relative and absolute prediction errors demonstrate that this increased goodness of fit is valuable in practice. Finally, our experiments suggest that trends obtained from global models are too general for practical recommendations. At the same time, local models provide a multitude of trends which are only valid for specific subsets of the data. Instead, we advocate the use of trends obtained from global models that take into account local characteristics, as they combine the best of both worlds.

133 citations

Proceedings Article • 10.5555/2664446.2664472

Inferring semantically related words from software context

Jinqiu Yang¹, Lin Tan¹

University of Waterloo¹

2 Jun 2012

TL;DR: This paper proposes a simple and general technique to automatically infer semantically related words in software by leveraging the context of words in comments and code and achieves a reasonable accuracy in seven large and popular code bases written in C and Java.

Abstract: Code search is an integral part of software development and program comprehension. The difficulty of code search lies in the inability to guess the exact words used in the code. Therefore, it is crucial for keyword-based code search to expand queries with semantically related words, e.g., synonyms and abbreviations, to increase the search effectiveness. However, relying on resources such as English dictionaries and WordNet to obtain semantically related words in software is of limited use, because many words that are semantically related in software are not semantically related in English. This paper proposes a simple and general technique to automatically infer semantically related words in software by leveraging the context of words in comments and code. We achieve a reasonable accuracy in seven large and popular code bases written in C and Java. Our further evaluation against the state of the art shows that our technique can achieve a higher precision and recall.

103 citations

Proceedings Article • 10.5555/2664446.2664458

Green mining: a methodology of relating software change to power consumption

Abram Hindle¹

University of Alberta¹

2 Jun 2012

TL;DR: Using the Firefox web browser and the Azureus/Vuze BitTorrent client, it is demonstrated that software change can affect power consumption, and there is evidence of a potential relationship between some software metrics and power consumption.

Abstract: Power consumption is becoming more and more important with the increased popularity of smartphones, tablets and laptops. The threat of reducing a customer's battery life now hangs over the software developer who asks, "will this next change be the one that causes my software to drain a customer's battery?" One solution is to detect power consumption regressions by measuring the power usage of tests, but this is time-consuming and often noisy. An alternative is to rely on software metrics that allow us to estimate the impact that a change might have on power consumption, thus relieving the developer from expensive testing. This paper presents a general methodology for investigating the impact of software change on power consumption: we relate power consumption to software changes, and then investigate the impact of static OO software metrics on power consumption. We demonstrated that software change can affect power consumption using the Firefox web browser and the Azureus/Vuze BitTorrent client. We found evidence of a potential relationship between some software metrics and power consumption. In conclusion, we explored the effect of software change on power consumption on two projects, and we provide an initial investigation of the impact of software metrics on power consumption.

97 citations

Proceedings Article • 10.1109/MSR.2012.6224279

Do faster releases improve software quality? An empirical case study of Mozilla Firefox

Khomh, Dhaliwal, Zou, Adams

1 Jan 2012

90 citations

Proceedings Article • 10.1109/MSR.2012.6224300

Think locally, act globally: Improving defect and effort prediction models

Bettenburg, Nagappan, Hassan

1 Jan 2012

88 citations

Proceedings Article • 10.1109/MSR.2012.6224281

A qualitative study on performance bugs

Zaman, Adams, Hassan

1 Jan 2012

83 citations

Proceedings Article • 10.5555/2664446.2664457

Are faults localizable?

Lucia¹, Ferdian Thung¹, David Lo¹, Lingxiao Jiang¹

Singapore Management University¹

2 Jun 2012

TL;DR: This work investigates hundreds of real faults in several software systems, and finds that many faults may not be localizable to a few lines of code and these include faults with high severity level.

Abstract: Many fault localization techniques have been proposed to facilitate debugging activities. Most of them attempt to pinpoint the location of faults (i.e., localize faults) based on a set of failing and correct executions and expect debuggers to investigate a certain number of located program elements to find faults. These techniques thus assume that faults are localizable, i.e., only one or a few lines of code that are close to one another are responsible for each fault. However, in reality, are faults localizable? In this work, we investigate hundreds of real faults in several software systems, and find that many faults may not be localizable to a few lines of code and these include faults with high severity level.

56 citations

Proceedings Article • 10.5555/2664446.2664470

Why do software packages conflict?

Cyrille Artho¹, Kuniyasu Suzaki¹, Roberto Di Cosmo², Ralf Treinen², Stefano Zacchiroli²

National Institute of Advanced Industrial Science and Technology¹, Paris Diderot University²

2 Jun 2012

TL;DR: An extensive case study of conflict defects extracted from the bug tracking systems of Debian and Red Hat shows that with more detailed package meta-data, about 30 % of all conflict defects could be prevented relatively easily, while another 30 % could be found by targeted testing of packages that share common resources or characteristics.

Abstract: Determining whether two or more packages cannot be installed together is an important issue in the quality assurance process of package-based distributions. Unfortunately, the sheer number of different configurations to test makes this task particularly challenging, and hundreds of such incompatibilities go undetected by the normal testing and distribution process until they are later reported by a user as bugs that we call “conflict defects”. We performed an extensive case study of conflict defects extracted from the bug tracking systems of Debian and Red Hat. According to our results, conflict defects can be grouped into five main categories. We show that with more detailed package meta-data, about 30 % of all conflict defects could be prevented relatively easily, while another 30 % could be found by targeted testing of packages that share common resources or characteristics. These results allow us to make precise suggestions on how to prevent and detect conflict defects in the future.

45 citations

Proceedings Article • 10.5555/2664446.2664452

How distributed version control systems impact open source software projects

Christian Rodriguez-Bustos¹, Jairo Aponte¹

National University of Colombia¹

2 Jun 2012

TL;DR: An analysis of the Mozilla repositories, which migrated from CVS to Mercurial in 2007, reveals both expected and unexpected aspects of the contributors' activities.

Abstract: Centralized Version Control Systems have been used by many open source projects for a long time. However, in recent years several widely-known projects have migrated their repositories to Distributed Version Control Systems, such as Mercurial, Bazaar, and Git. Such systems have technical features that allow contributors to work in new ways, since many different workflows become possible. We plan to study this migration process to assess how developers' organization and their contributions are affected. As a first step, we present an analysis of the Mozilla repositories, which migrated from CVS to Mercurial in 2007. This analysis reveals both expected and unexpected aspects of the contributors' activities.

Proceedings Article • 10.1109/MSR.2012.6224276

Inferring semantically related words from software context

Yang, Tan

1 Jan 2012

Proceedings Article • 10.5555/2664446.2664448

Towards improving bug tracking systems with game mechanisms

Rafael Lotufo¹, Leonardo Passos¹, Krzysztof Czarnecki¹

University of Waterloo¹

2 Jun 2012

TL;DR: This work investigates the use of game mechanisms in Stack Overflow, an online community organized to resolve computer programming related problems, and finds that most benefits are applicable to open-source bug tracking systems.

Abstract: Low bug report quality and human conflicts pose challenges to keeping bug tracking systems productive. This work proposes to address these issues by applying game mechanisms to bug tracking systems. We investigate the use of game mechanisms in Stack Overflow, an online community organized to resolve computer programming related problems, for which the improvements we seek for bug tracking systems also turn out to be relevant. The results of our Stack Overflow investigation show that its game mechanisms could be used to address these issues by motivating contributors to increase contribution frequency and quality, by filtering useful contributions, and by creating an agile and dependable moderation system. We proceed by mapping these mechanisms to open-source bug tracking systems, and find that most benefits are applicable. Additionally, our results motivate tailoring a reward and reputation system and summarizing bug reports as future directions for increasing the benefits of game mechanisms in bug tracking systems.

Proceedings Article • 10.1109/MSR.2012.6224296

A Linked Data platform for mining software repositories

Keivanloo, Hmood, Erfani, Neal, Peristerakis, Rilling

1 Jan 2012

Proceedings Article • 10.5555/2664446.2664462

Mining challenge 2012: the Android platform

Emad Shihab¹, Yasutaka Kamei², Pamela Bhattacharya³

Queen's University¹, Kyushu University², University of California, Riverside³

2 Jun 2012

TL;DR: The role of the MSR Challenge is described, the change and bug report data provided are highlighted and the papers accepted for inclusion in this year's challenge are summarized.

Abstract: The MSR Challenge offers researchers and practitioners in the area of Mining Software Repositories a common data set and asks them to put their mining tools and approaches on a dare. This year, the challenge is on the Android platform. We provided the change and bug report data for the Android platform and asked researchers to uncover interesting findings related to the Android platform. In this paper, we describe the role of the MSR Challenge, highlight the data provided and summarize the papers accepted for inclusion in this year's challenge.

Proceedings Article • 10.5555/2664446.2664463

Bug introducing changes: a case study with Android

Muhammad Asaduzzaman¹, Michael C. Bullock¹, Chanchal K. Roy¹, Kevin A. Schneider¹

University of Saskatchewan¹

2 Jun 2012

TL;DR: In this paper, the authors mine the bug introducing changes in the Android platform by mapping bug reports to the changes that introduced the bugs and then use the change information to look for both potential problematic parts and dynamics in development that can cause maintenance implications.

Abstract: Changes, a rather inevitable part of software development, can cause maintenance implications if they introduce bugs into the system. By isolating and characterizing these bug introducing changes, it is possible to uncover potentially risky source code entities or issues that produce bugs. In this paper, we mine the bug introducing changes in the Android platform by mapping bug reports to the changes that introduced the bugs. We then use the change information to look for both potential problematic parts and dynamics in development that can cause maintenance implications. We believe that the results of our study can help better manage Android software development.

Proceedings Article • 10.1109/MSR.2012.6224307

Mining challenge 2012: The Android platform

Shihab, Kamei, Bhattacharya

1 Jan 2012

Proceedings Article • 10.5555/2664446.2664464

Trendy bugs: topic trends in the Android bug reports

Lee Martie¹, Vijay Krishna Palepu¹, Hitesh Sajnani¹, Cristina V. Lopes¹

University of California, Irvine¹

2 Jun 2012

TL;DR: An approach to analyze the development of the Android open source project by observing trends in the bug discussions in the Android open source project public issue tracker, which informs us of the features or parts of the project that are more problematic at any given point in time.

Abstract: Studying vast volumes of bug and issue discussions can give an understanding of what the community has been most concerned about; however, the sheer number of documents can overload the analyst. We present an approach to analyze the development of the Android open source project by observing trends in the bug discussions in the Android open source project public issue tracker. This informs us of the features or parts of the project that are more problematic at any given point in time. In turn, this can be used to aid resource allocation (such as time and manpower) to parts or features. We support these ideas by presenting the results of issue topic distributions over time using statistical analysis of the bug descriptions and comments for the Android open source project. Furthermore, we show relationships between those time distributions and major development releases of the Android OS.

Proceedings Article • 10.1109/MSR.2012.6224287

What does software engineering community microblog about?

Tian, Achananuparp, Lubis, Lo, Lim

1 Jan 2012

TL;DR: The authors' experiments show that microblogs commonly contain job openings, news, questions and answers, or links to download new tools and code, and that microblogs concerning real-world events are more widely diffused in the Twitter network.

Proceedings Article • 10.1109/MSR.2012.6224268

Trendy bugs: Topic trends in the Android bug reports

Martie, Palepu, Sajnani, Lopes

1 Jan 2012

Proceedings Article • 10.1109/MSR.2012.6224298

An empirical study of supplementary bug fixes

Park, Kim, Ray, Bae

1 Jan 2012

Proceedings Article • 10.1109/MSR.2012.6224293

Towards improving bug tracking systems with game mechanisms

Lotufo, Passos, Czarnecki

1 Jan 2012

Proceedings Article • 10.5555/2664446.2664479

Co-evolution of logical couplings and commits for defect estimation

Maximilian Steff¹, Barbara Russo¹

Free University of Bozen-Bolzano¹

2 Jun 2012

TL;DR: The history of logical couplings is correlated with the history of defects for every commit in the graph, and sub-structures of bug-fixing commits are identified over sub-structures of normal commits, indicating that co-evolutionary graphs are a promising new instrument for detecting defective software structures.

Abstract: Logical couplings between files in the commit history of a software repository are instances of files being changed together. The evolution of couplings over commits' history has been used for the localization and prediction of software defects in software reliability. Couplings have been represented in class graphs and change histories on the class-level have been used to identify defective modules. Our new approach inverts this perspective and constructs graphs of ordered commits coupled by common changed classes. These graphs, thus, represent the co-evolution of commits, structured by the change patterns among classes. We believe that co-evolutionary graphs are a promising new instrument for detecting defective software structures. As a first result, we have been able to correlate the history of logical couplings to the history of defects for every commit in the graph and to identify sub-structures of bug-fixing commits over sub-structures of normal commits.

Proceedings Article • 10.1109/MSR.2012.6224297

How Distributed Version Control Systems impact open source software projects

Rodriguez-Bustos, Aponte

1 Jan 2012

Proceedings Article • 10.5555/2664446.2664465

Do the stars align?: multidimensional analysis of Android's layered architecture

Victor Guana¹, Fabio Rocha¹, Abram Hindle¹, Eleni Stroulia¹

University of Alberta¹

2 Jun 2012

TL;DR: This paper identifies the locality of Android bugs in the architectural layers of its infrastructure, analyses the bug lifetime patterns in each of them, and identifies one particular layer that is more important to developers and users alike.

Abstract: In this paper we mine the Android bug tracker repository and study the characteristics of the architectural layers of the Android system. We have identified the locality of the Android bugs in the architectural layers of its infrastructure, and analysed the bug lifetime patterns in each one of them. Additionally, we mined the bug tracker reporters and classified them according to their social centrality in the Android bug tracker community. We report three interesting findings. First, while some architectural layers have a diverse interaction of people, attracting not only non-central reporters but also highly important ones, other layers mostly attract peripheral actors. Second, we found that even though the bug lifetime is similar across the architectural layers, some of them have higher bug density and different percentages of unsolved bugs. Finally, comparing the popularity distribution between layers, we have identified one particular layer that is more important to developers and users alike.

Proceedings Article • 10.5555/2664446.2664460

Mining usage data and development artifacts

Olga Baysal¹, Reid Holmes¹, Michael W. Godfrey¹

University of Waterloo¹

2 Jun 2012

TL;DR: This work explores how usage data that has been extracted from web server logs can be unified with product release history to study questions that concern both users' detailed dynamic behaviour as well as broad adoption trends across different deployment environments.

Abstract: Software repository mining techniques generally focus on analyzing, unifying, and querying different kinds of development artifacts, such as source code, version control meta-data, defect tracking data, and electronic communication. In this work, we demonstrate how adding real-world usage data enables addressing broader questions of how software systems are actually used in practice, and by inference how development characteristics ultimately affect deployment, adoption, and usage. In particular, we explore how usage data that has been extracted from web server logs can be unified with product release history to study questions that concern both users' detailed dynamic behaviour as well as broad adoption trends across different deployment environments. To validate our approach, we performed a study of two open source web browsers: Firefox and Chrome. We found that while Chrome is being adopted at a consistent rate across platforms, Linux users have an order of magnitude higher rate of Firefox adoption. Also, Firefox adoption has been concentrated mainly in North America, while Chrome users appear to be more evenly distributed across the globe. Finally, we detected no evidence of age-specific differences in navigation behaviour between Chrome and Firefox users; however, we hypothesize that younger users are more likely to have more up-to-date versions than more mature users.

Proceedings Article • 10.1109/MSR.2012.6224286

Who? Where? What? Examining distributed development in two large open source projects

Nagappan

1 Jan 2012

Proceedings Article • 10.1109/MSR.2012.6224269

Do the stars align? Multidimensional analysis of Android's layered architecture

Guana, Rocha, Hindle, Stroulia

1 Jan 2012

Proceedings Article • 10.5555/2664446.2664469

The evolution of the social programmer

Margaret-Anne Storey¹

University of Victoria¹

2 Jun 2012

TL;DR: This paradigm shift is particularly evident in software engineering in three distinct ways: firstly, in how software stakeholders co-develop and form communities of practice; secondly, in the complex and distributed software ecosystems enabled through insourcing, outsourcing, open sourcing and crowdsourcing of components and related artifacts; and thirdly, by the emergence of socially-enabled software repositories and collaborative development environments.

Abstract: Social media has revolutionized how humans create and curate knowledge artifacts [1]. It has increased individual engagement, broadened community participation and led to the formation of new social networks. This paradigm shift is particularly evident in software engineering in three distinct ways: firstly, in how software stakeholders co-develop and form communities of practice; secondly, in the complex and distributed software ecosystems that are enabled through insourcing, outsourcing, open sourcing and crowdsourcing of components and related artifacts; and thirdly, by the emergence of socially-enabled software repositories and collaborative development environments [2].

Proceedings Article • 10.1109/MSR.2012.6224283

Co-evolution of logical couplings and commits for defect estimation

Steff, Russo

1 Jan 2012