Conference
Mining Software Repositories
About: Mining Software Repositories is an academic conference. The conference publishes majorly in the area(s): Computer science & Source code. Over the lifetime, 1061 publications have been published by the conference receiving 43279 citations.
Papers published on a yearly basis
Papers
17 May 2005
TL;DR: In a first investigation of the MOZILLA and ECLIPSE history, it turns out that fix-inducing changes show distinct patterns with respect to their size and the day of week they were applied.
Abstract: As a software system evolves, programmers make changes that sometimes cause problems. We analyze CVS archives for fix-inducing changes---changes that lead to problems, indicated by fixes. We show how to automatically locate fix-inducing changes by linking a version archive (such as CVS) to a bug database (such as BUGZILLA). In a first investigation of the MOZILLA and ECLIPSE history, it turns out that fix-inducing changes show distinct patterns with respect to their size and the day of week they were applied.
1,067 citations
16 May 2009
TL;DR: It is shown, for example, that the majority of the projects are personal and inactive; that GitHub is also being used for free storage and as a Web hosting service; and that almost 40% of all pull requests do not appear as merged, even though they were.
Abstract: We are now witnessing the rapid growth of decentralized source code management (DSCM) systems, in which every developer has her own repository. DSCMs facilitate a style of collaboration in which work output can flow sideways (and privately) between collaborators, rather than always up and down (and publicly) via a central repository. Decentralization comes with both the promise of new data and the peril of its misinterpretation. We focus on git, a very popular DSCM used in high-profile projects. Decentralization, and other features of git, such as automatically recorded contributor attribution, lead to richer content histories, giving rise to new questions such as “How do contributions flow between developers to the official project repository?” However, there are pitfalls. Commits may be reordered, deleted, or edited as they move between repositories. The semantics of terms common to SCMs and DSCMs sometimes differ markedly, potentially creating confusion. For example, a commit is immediately visible to all developers in centralized SCMs, but not in DSCMs. Our goal is to help researchers interested in DSCMs avoid these and other perils when mining and analyzing git data.
979 citations
14 May 2016
TL;DR: This work presents a growing collection of Android Applications collected from several sources, including the official GooglePlay app market, which contains more than three million apps that have been analysed by tens of different AntiVirus products to know which applications are detected as Malware.
Abstract: We present a growing collection of Android Applications col-lected from several sources, including the official GooglePlay app market. Our dataset, AndroZoo, currently contains more than three million apps, each of which has beenanalysed by tens of different AntiVirus products to knowwhich applications are detected as Malware. We provide thisdataset to contribute to ongoing research efforts, as well asto enable new potential research topics on Android Apps.By releasing our dataset to the research community, we alsoaim at encouraging our fellow researchers to engage in reproducible experiments.
855 citations
2 May 2010
TL;DR: The absence of an established benchmark makes it hard, if not impossible, to compare approaches to bug prediction, and this paper aims to address this problem.
Abstract: Reliably predicting software defects is one of software engineering's holy grails. Researchers have devised and implemented a plethora of bug prediction approaches varying in terms of accuracy, complexity and the input data they require. However, the absence of an established benchmark makes it hard, if not impossible, to compare approaches.
701 citations
22 May 2006
TL;DR: This paper begins with a discussion of the infrastructure (including a novel use of Scientific Workflow software) and then discusses the approach to mining the email archives, and presents some preliminary results from the data analysis.
Abstract: Communication & Co-ordination activities are central to large software projects, but are difficult to observe and study in traditional (closed-source, commercial) settings because of the prevalence of informal, direct communication modes. OSS projects, on the other hand, use the internet as the communication medium,and typically conduct discussions in an open, public manner. As a result, the email archives of OSS projects provide a useful trace of the communication and co-ordination activities of the participants. However, there are various challenges that must be addressed before this data can be effectively mined. Once this is done, we can construct social networks of email correspondents, and begin to address some interesting questions. These include questions relating to participation in the email; the social status of different types of OSS participants; the relationship of email activity and commit activity (in the CVS repositories) and the relationship of social status with commit activity. In this paper, we begin with a discussion of our infrastructure (including a novel use of Scientific Workflow software) and then discuss our approach to mining the email archives; and finally we present some preliminary results from our data analysis.
661 citations
Performance Metrics
| Year | Papers |
|---|---|
| 2021 | 77 |
| 2020 | 83 |
| 2019 | 87 |
| 2018 | 116 |
| 2017 | 98 |
| 2016 | 87 |