Top 42 papers presented at Mining Software Repositories in 2006

Proceedings Article•10.1145/1137983.1138016•

Mining email social networks

Christian Bird¹, Alex Gourley¹, Prem Devanbu¹, Michael Gertz¹, Anand Swaminathan¹•Institutions (1)

22 May 2006

TL;DR: This paper begins with a discussion of the infrastructure (including a novel use of Scientific Workflow software) and then discusses the approach to mining the email archives, and presents some preliminary results from the data analysis.

Abstract: Communication & Co-ordination activities are central to large software projects, but are difficult to observe and study in traditional (closed-source, commercial) settings because of the prevalence of informal, direct communication modes. OSS projects, on the other hand, use the internet as the communication medium, and typically conduct discussions in an open, public manner. As a result, the email archives of OSS projects provide a useful trace of the communication and co-ordination activities of the participants. However, there are various challenges that must be addressed before this data can be effectively mined. Once this is done, we can construct social networks of email correspondents, and begin to address some interesting questions. These include questions relating to participation in the email; the social status of different types of OSS participants; the relationship of email activity and commit activity (in the CVS repositories); and the relationship of social status with commit activity. In this paper, we begin with a discussion of our infrastructure (including a novel use of Scientific Workflow software), then discuss our approach to mining the email archives, and finally present some preliminary results from our data analysis.
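
The network construction mentioned in this abstract can be illustrated with a minimal sketch. It assumes each message has already been reduced to a (replier, original poster) pair; the sample data and the names `build_reply_network` and `in_degree` are hypothetical and are not taken from the paper.

```python
from collections import Counter

def build_reply_network(replies):
    """Count directed reply edges: (replier, original_poster) -> number of replies."""
    edges = Counter()
    for replier, poster in replies:
        if replier != poster:  # ignore self-replies
            edges[(replier, poster)] += 1
    return edges

def in_degree(edges):
    """Number of distinct correspondents who replied to each person."""
    degree = Counter()
    for (_replier, poster) in edges:
        degree[poster] += 1
    return degree

# Hypothetical reply pairs, for illustration only.
replies = [("alice", "bob"), ("carol", "bob"), ("bob", "alice"), ("alice", "bob")]
network = build_reply_network(replies)
degrees = in_degree(network)
```

In-degree is only one possible proxy for social status; the abstract does not say which measures the authors used.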

661 citations

Proceedings Article•10.1145/1137983.1137997•

MAPO: mining API usages from open source repositories

Tao Xie¹, Jian Pei²•Institutions (2)

North Carolina State University¹, Simon Fraser University²

22 May 2006

TL;DR: Presents an API usage mining framework and its supporting tool, MAPO, which leverages existing source code search engines to gather relevant source files and conducts data mining; preliminary results show that the framework is practical for providing informative and succinct API usage patterns.

Abstract: To improve software productivity, when constructing new software systems, developers often reuse existing class libraries or frameworks by invoking their APIs. Those APIs, however, are often complex and not well documented, posing barriers for developers to use them in new client code. To get familiar with how those APIs are used, developers may search the Web using a general search engine to find relevant documents or code examples. Developers can also use a source code search engine to search open source repositories for source files that use the same APIs. Nevertheless, the number of returned source files is often large. It is difficult for developers to learn API usages from a large number of returned results. In order to help developers understand API usages and write API client code more effectively, we have developed an API usage mining framework and its supporting tool called MAPO (for Mining API usages from Open source repositories). Given a query that describes a method, class, or package for an API, MAPO leverages the existing source code search engines to gather relevant source files and conducts data mining. The mining leads to a short list of frequent API usages for developers to inspect. MAPO currently consists of five components: a code search engine, a source code analyzer, a sequence preprocessor, a frequent sequence miner, and a frequent sequence post processor. We have examined the effectiveness of MAPO using a set of various queries. The preliminary results show that the framework is practical for providing informative and succinct API usage patterns.
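
To make the "frequent sequence miner" component concrete, here is a minimal sketch. It is a simplified stand-in that counts contiguous call subsequences, not MAPO's actual mining algorithm; the function name, parameters, and sample call sequences are all hypothetical.

```python
from collections import Counter

def frequent_call_sequences(call_sequences, length, min_support):
    """Return contiguous call subsequences of the given length that occur
    in at least `min_support` distinct input sequences."""
    support = Counter()
    for calls in call_sequences:
        # Collect each subsequence once per input so support counts inputs, not occurrences.
        seen = {tuple(calls[i:i + length]) for i in range(len(calls) - length + 1)}
        support.update(seen)
    return {seq: n for seq, n in support.items() if n >= min_support}

# Hypothetical API call sequences, one per analyzed method.
sequences = [
    ["File.open", "File.read", "File.close"],
    ["Log.info", "File.open", "File.read", "File.close"],
    ["File.open", "File.write", "File.close"],
]
patterns = frequent_call_sequences(sequences, length=2, min_support=2)
```

A real sequence miner would also handle non-contiguous patterns and variable lengths; the sketch only shows how support counting turns many returned source files into a short list of usages.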

256 citations

Proceedings Article•10.1145/1137983.1138027•

How long did it take to fix bugs

Sunghun Kim¹, E. James Whitehead¹•Institutions (1)

University of California, Santa Cruz¹

22 May 2006

TL;DR: This report computes the bug-fix time of files in ArgoUML and PostgreSQL by identifying when bugs are introduced and when they are fixed, and lists the top 20 bug-fix time files of the two projects.

Abstract: The number of bugs (or fixes) is a common factor used to measure the quality of software and assist bug related analysis. For example, if software files have many bugs, they may be unstable. In comparison, the bug-fix time--the time to fix a bug after the bug was introduced--is neglected. We believe that the bug-fix time is an important factor for bug related analysis, such as measuring software quality. For example, if bugs in a file take a relatively long time to be fixed, the file may have some structural problems that make it difficult to make changes. In this report, we compute the bug-fix time of files in ArgoUML and PostgreSQL by identifying when bugs are introduced and when the bugs are fixed. This report includes bug-fix time statistics such as average bug-fix time, and distributions of bug-fix time. We also list the top 20 bug-fix time files of two projects.
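
The bug-fix time statistics described above can be sketched as follows, assuming the introduction and fix dates of each bug are already known. How the paper identifies those dates is not shown here; the record layout, file names, and function names are hypothetical.

```python
from datetime import date
from statistics import mean

def bug_fix_days(introduced, fixed):
    """Bug-fix time in days: time from bug introduction to bug fix."""
    return (fixed - introduced).days

def average_fix_time_per_file(bugs):
    """bugs: iterable of (file, introduced_date, fixed_date) tuples."""
    per_file = {}
    for path, introduced, fixed in bugs:
        per_file.setdefault(path, []).append(bug_fix_days(introduced, fixed))
    return {path: mean(days) for path, days in per_file.items()}

def top_files(averages, n):
    """The n files with the longest average bug-fix time."""
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)[:n]

# Hypothetical bug records, for illustration only.
bugs = [
    ("a.c", date(2005, 1, 1), date(2005, 1, 11)),  # 10 days
    ("a.c", date(2005, 2, 1), date(2005, 2, 21)),  # 20 days
    ("b.c", date(2005, 3, 1), date(2005, 3, 6)),   # 5 days
]
averages = average_fix_time_per_file(bugs)
ranking = top_files(averages, 1)
```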

208 citations

Proceedings Article•10.1145/1137983.1138012•

Predicting defect densities in source code files with decision tree learners

Patrick Knab¹, Martin Pinzger¹, Abraham Bernstein¹•Institutions (1)

University of Zurich¹

22 May 2006

TL;DR: This work focuses on defect density prediction and presents an approach that applies a decision tree learner on evolution data extracted from the Mozilla open source web browser project, which includes different source code, modification, and defect measures computed from seven recent Mozilla releases.

Abstract: With the advent of open source software repositories, the data available for defect prediction in source files increased tremendously. Although traditional statistics turned out to derive reasonable results, the sheer amount of data and the problem context of defect prediction demand sophisticated analysis such as that provided by current data mining and machine learning techniques. In this work we focus on defect density prediction and present an approach that applies a decision tree learner on evolution data extracted from the Mozilla open source web browser project. The evolution data includes different source code, modification, and defect measures computed from seven recent Mozilla releases. Among the modification measures we also take into account the change coupling, a measure for the number of change-dependencies between source files. The main reason for choosing decision tree learners, instead of, for example, neural nets, was the goal of finding underlying rules which can be easily interpreted by humans. To find these rules, we set up a number of experiments to test common hypotheses regarding defects in software entities. Our experiments showed that a simple tree learner can produce good results with various sets of input data.

152 citations

Proceedings Article•10.1145/1137983.1138000•

Detecting similar Java classes using tree algorithms

Tobias Sager¹, Abraham Bernstein¹, Martin Pinzger¹, Christoph Kiefer¹•Institutions (1)

University of Zurich¹

22 May 2006

TL;DR: Initial results of the technique indicate that it is indeed useful to identify similar Java classes, and it successfully identifies the ex ante and ex post versions of refactored classes and provides some interesting insights into within-version and between-version dependencies of classes within a Java project.

Abstract: Similarity analysis of source code is helpful during development to provide, for instance, better support for code reuse. Consider a development environment that analyzes code while typing and that suggests similar code examples or existing implementations from a source code repository. Mining software repositories by means of similarity measures enables and enforces reusing existing code and reduces the development effort needed by creating a shared knowledge base of code fragments. In information retrieval, similarity measures are often used to find documents similar to a given query document. This paper extends this idea to source code repositories. It introduces our approach to detect similar Java classes in software projects using tree similarity algorithms. We show how our approach makes it possible to find similar Java classes based on an evaluation of three tree-based similarity measures in the context of five user-defined test cases as well as a preliminary software evolution analysis of a medium-sized Java project. Initial results of our technique indicate that it (1) is indeed useful to identify similar Java classes, (2) successfully identifies the ex ante and ex post versions of refactored classes, and (3) provides some interesting insights into within-version and between-version dependencies of classes within a Java project.

101 citations

Proceedings Article•10.1145/1137983.1138030•

Examining the evolution of code comments in PostgreSQL

Zhen Ming Jiang¹, Ahmed E. Hassan¹•Institutions (1)

University of Waterloo¹

22 May 2006

TL;DR: Using data recovered from CVS, this study reveals that over time the percentage of commented functions remains constant except for early fluctuation due to the commenting style of a particular active developer.

Abstract: It is common, especially in large software systems, for developers to change code without updating its associated comments due to their unfamiliarity with the code or due to time constraints. This is a potential problem since outdated comments may confuse or mislead developers who perform future development. Using data recovered from CVS, we study the evolution of code comments in the PostgreSQL project. Our study reveals that over time the percentage of commented functions remains constant except for early fluctuation due to the commenting style of a particular active developer.

92 citations

Proceedings Article•10.1145/1137983.1138014•

Tracking defect warnings across versions

Jaime Spacco¹, David Hovemeyer², William Pugh¹•Institutions (2)

University of Maryland, College Park¹, Vassar College²

22 May 2006

TL;DR: Discusses two techniques the authors have implemented in FindBugs for tracking defects across versions, their relative merits, how they can be incorporated into the software development process, and the results of tracking defect warnings across Sun's Java runtime library.

Abstract: Various static analysis tools will analyze a software artifact in order to identify potential defects, such as misused APIs, race conditions and deadlocks, and security vulnerabilities. For a number of reasons, it is important to be able to track the occurrence of each potential defect over multiple versions of a software artifact under study: in other words, to determine when warnings reported in multiple versions of the software all correspond to the same underlying issue. One motivation for this capability is to remember decisions about code that has been reviewed and found to be safe despite the occurrence of a warning. Another motivation is constructing warning deltas between versions, showing which warnings are new, which have persisted, and which have disappeared. This allows reviewers to focus their efforts on inspecting new warnings. Finally, tracking warnings through a series of software versions reveals where potential defects are introduced and fixed, and how long they persist, exposing interesting trends and patterns. We will discuss two different techniques we have implemented in FindBugs (a static analysis tool to find bugs in Java programs) for tracking defects across versions, discuss their relative merits and how they can be incorporated into the software development process, and discuss the results of tracking defect warnings across Sun's Java runtime library.
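
The idea of a warning delta can be sketched in a few lines, assuming each warning has already been reduced to a key that stays stable across versions. The key layout (warning type, class, method) and the warning type names below are hypothetical; the sketch does not reproduce either of the two FindBugs matching techniques.

```python
def warning_delta(old_warnings, new_warnings):
    """Classify warnings as new, persisted, or disappeared between two versions."""
    old, new = set(old_warnings), set(new_warnings)
    return {
        "new": new - old,          # reported only in the newer version
        "persisted": new & old,    # reported in both versions
        "disappeared": old - new,  # reported only in the older version
    }

# Hypothetical warning keys: (warning type, class, method).
v1 = {("NULL_DEREF", "Foo", "run"), ("UNUSED_FIELD", "Bar", "x")}
v2 = {("NULL_DEREF", "Foo", "run"), ("RACE", "Baz", "update")}
delta = warning_delta(v1, v2)
```

The hard part in practice is choosing a key that survives edits such as line shifts and renames, which is the matching problem the abstract refers to.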

83 citations

Proceedings Article•10.1145/1137983.1138001•

Mining version archives for co-changed lines

Thomas Zimmermann¹, Sunghun Kim², Andreas Zeller¹, E. James Whitehead²•Institutions (2)

Saarland University¹, University of California, Santa Cruz²

22 May 2006

TL;DR: The annotation graph provides more fine-grained software evolution information such as life cycles of each line and related changes: "Whenever a developer changed line 1 of version.txt she also changed line 25 of Library.java."

Abstract: Files, classes, or methods have frequently been investigated in recent research on co-change. In this paper, we present a first study at the level of lines. To identify line changes across several versions, we define the annotation graph which captures how lines evolve over time. The annotation graph provides more fine-grained software evolution information such as life cycles of each line and related changes: "Whenever a developer changed line 1 of version.txt she also changed line 25 of Library.java."

79 citations

Proceedings Article•10.1145/1137983.1138017•

Geographic location of developers at SourceForge

Gregorio Robles¹, Jesus M. Gonzalez-Barahona¹•Institutions (1)

King Juan Carlos University¹

22 May 2006

TL;DR: This paper has taken the database of users registered at SourceForge, the largest libre software development web-based platform, and inferred their geographical locations, and shows a snapshot of the regional distribution of SourceForge users, which may be a good proxy of the actual distribution of libre software developers.

Abstract: The development of libre (free/open source) software is usually performed by geographically distributed teams. Participation in most cases is voluntary, sometimes sporadic, and often not framed by a pre-defined management structure. This means that anybody can contribute, and in principle no national origin has advantages over others, except for the differences in availability and quality of Internet connections and language. However, differences in participation across regions do exist, although there are few studies about them. In this paper we present some data which can be the basis for some of those studies. We have taken the database of users registered at SourceForge, the largest libre software development web-based platform, and have inferred their geographical locations. For this, we have applied several techniques and heuristics on the available data (mainly e-mail addresses and time zones), which are presented and discussed in detail. The results show a snapshot of the regional distribution of SourceForge users, which may be a good proxy of the actual distribution of libre software developers. In addition, the methodology may be of interest for similar studies in other domains, when the available data is similar (as is the case of mailing lists related to software projects).

43 citations

Proceedings Article•10.1145/1137983.1137993•

An open framework for CVS repository querying, analysis and visualization

Lucian Voinea¹, Alexandru Telea¹•Institutions (1)

Eindhoven University of Technology¹

22 May 2006

TL;DR: An open framework for visual mining of CVS software repositories is presented, along with a new technique to enrich the raw data with information about artifacts showing similar evolution.

Abstract: We present an open framework for visual mining of CVS software repositories. We address three aspects: data extraction, analysis and visualization. We first discuss the challenges of CVS data extraction and storage, and propose a flexible way to deal with CVS implementation inconsistencies. We next present a new technique to enrich the raw data with information about artifacts showing similar evolution. Finally, we propose a visualization backend and show its applicability on industry-size repositories.

41 citations

Proceedings Article•10.1145/1137983.1137995•

Micro pattern evolution

Sunghun Kim¹, Kai Pan¹, E. James Whitehead¹•Institutions (1)

University of California, Santa Cruz¹

22 May 2006

TL;DR: This work performs micro-pattern evolution analysis on three open source projects, ArgoUML, Columba, and jEdit to identify micro pattern frequencies, common kinds of pattern evolution, and bug-prone patterns.

Abstract: When analyzing the evolution history of a software project, we wish to develop results that generalize across projects. One approach is to analyze design patterns, permitting characteristics of the evolution to be associated with patterns, instead of source code. Traditional design patterns are generally not amenable to reliable automatic extraction from source code, yet automation is crucial for scalable evolution analysis. Instead, we analyze "micro pattern" evolution; patterns whose abstraction level is closer to source code, and designed to be automatically extractable from Java source code or bytecode. We perform micro-pattern evolution analysis on three open source projects, ArgoUML, Columba, and jEdit to identify micro pattern frequencies, common kinds of pattern evolution, and bug-prone patterns. In all analyzed projects, we found that the micro patterns of Java classes do not change often. Common bug-prone pattern evolution kinds are 'Pool → Pool', 'Implementor → NONE', and 'Sampler → Sampler'. Among all pattern evolution kinds, 'Box', 'CompoundBox', 'Pool', 'CommonState', and 'Outline' micro patterns have high bug rates, but they have low frequencies and a small number of changes. The pattern evolution kinds that are bug-prone are somewhat similar across projects. The bug-prone pattern evolution kinds of two different periods of the same project are almost identical.

Proceedings Article•10.1145/1137983.1137990•

TA-RE: an exchange language for mining software repositories

Sunghun Kim¹, Thomas Zimmermann², Miryung Kim³, Ahmed E. Hassan⁴, Audris Mockus⁵, Tudor Gîrba⁶, Martin Pinzger⁷, E. James Whitehead¹, Andreas Zeller²•Institutions (7)

University of California, Santa Cruz¹, Saarland University², University of Washington³, University of Waterloo⁴, Avaya⁵, University of Bern⁶, University of Zurich⁷

22 May 2006

TL;DR: The TA-RE corpus is presented, which collects extracted data from software repositories in order to build a collection of projects that will simplify the extraction process, and an exchange language capable of making sharing and reusing data as simple as possible is proposed.

Abstract: Software repositories have been getting a lot of attention from researchers in recent years. In order to analyze software repositories, it is necessary to first extract raw data from the version control and problem tracking systems. This poses two challenges: (1) extraction requires a non-trivial effort, and (2) the results depend on the heuristics used during extraction. These challenges burden researchers who are new to the community and make it difficult to benchmark software repository mining, since it is almost impossible to reproduce experiments done by another team. In this paper we present the TA-RE corpus. TA-RE collects extracted data from software repositories in order to build a collection of projects that will simplify the extraction process. Additionally, the collection can be used for benchmarking. As the first step we propose an exchange language capable of making sharing and reusing data as simple as possible.

Proceedings Article•10.1145/1137983.1138013•

Information theoretic evaluation of change prediction models for large-scale software

Mina Askari¹, Ric Holt¹•Institutions (1)

University of Waterloo¹

22 May 2006

TL;DR: This paper analyzes the data extracted from several open source software repositories and develops three probabilistic models to predict which files will have changes or bugs, and evaluates the performance of different prediction models empirically using the proposed information-theoretic approach.

Abstract: In this paper, we analyze the data extracted from several open source software repositories. We observe that the change data follows a Zipf distribution. Based on the extracted data, we then develop three probabilistic models to predict which files will have changes or bugs. The first model is Maximum Likelihood Estimation (MLE), which simply counts the number of events, i.e., changes or bugs, that happen to each file and normalizes the counts to compute a probability distribution. The second model is Reflexive Exponential Decay (RED) in which we postulate that the predictive rate of modification in a file is incremented by any modification to that file and decays exponentially. The third model is called RED-Co-Change. With each modification to a given file, the RED-Co-Change model not only increments its predictive rate, but also increments the rate for other files that are related to the given file through previous co-changes. We then present an information-theoretic approach to evaluate the performance of different prediction models. In this approach, the closeness of model distribution to the actual unknown probability distribution of the system is measured using cross entropy. We evaluate our prediction models empirically using the proposed information-theoretic approach for six large open source systems. Based on this evaluation, we observe that of our three prediction models, the RED-Co-Change model predicts the distribution that is closest to the actual distribution for all the studied systems.
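
The MLE and RED models and the cross-entropy evaluation can be sketched as below. This is an illustrative reading of the abstract, not the authors' implementation: the `half_life` parameterization of the decay, the `floor` probability for unseen files, the empirical (per-event average) form of the cross entropy, and all names and sample data are assumptions. RED-Co-Change is not covered.

```python
import math
from collections import Counter

def mle_distribution(events):
    """MLE: count events per file and normalize the counts to probabilities."""
    counts = Counter(events)
    total = sum(counts.values())
    return {f: c / total for f, c in counts.items()}

def red_distribution(timed_events, now, half_life):
    """RED-style model: each modification adds one unit to a file's rate,
    and that unit decays exponentially with its age (assumed half-life form)."""
    decay = math.log(2) / half_life
    rates = Counter()
    for f, t in timed_events:
        rates[f] += math.exp(-decay * (now - t))
    total = sum(rates.values())
    return {f: r / total for f, r in rates.items()}

def empirical_cross_entropy(model, observed, floor=1e-6):
    """Average of -log2 q(file) over observed events; lower means a closer model.
    `floor` is a crude assumed stand-in for smoothing unseen files."""
    return -sum(math.log2(model.get(f, floor)) for f in observed) / len(observed)

# Hypothetical change history, for illustration only.
mle = mle_distribution(["a.c", "a.c", "b.c", "c.c"])
red = red_distribution([("a.c", 0), ("b.c", 10)], now=10, half_life=10)
score = empirical_cross_entropy(mle, ["a.c", "b.c"])
```

In the example, the RED model ranks `b.c` above `a.c` because its change is more recent, whereas MLE only looks at counts.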

Proceedings Article•10.1145/1137983.1138018•

Textual Allusions to Artifacts in Software-Related Repositories

Gina Venolia¹•Institutions (1)

Microsoft¹

22 May 2006

TL;DR: To effectively implement full-text search in the absence of hyperlinks, a method for detecting textual allusions to software artifacts in natural-language prose is proposed.

Abstract: Much of what is written about a software project is soon forgotten. Software repositories are full of valuable information about the project: Bug descriptions, check-in messages, email and newsgroup archives, specifications, design documents, product documentation, and product support logs contain a wealth of information that can potentially help software developers resolve crucial questions about the history, rationale, and future plans for source code. For a variety of reasons, developers rarely turn to these resources when trying to answer these questions. We are building a full-text search that encompasses multiple repositories. To effectively implement full-text search in the absence of hyperlinks we propose detecting textual allusions to software artifacts in natural-language prose. Allusions are shown to contribute a significant portion of the relationships represented in the graph.

Proceedings Article•10.1145/1137983.1138022•

A study of the contributors of PostgreSQL

Daniel M. German¹•Institutions (1)

University of Victoria¹

22 May 2006

TL;DR: This report describes some characteristics of the development team of PostgreSQL that were uncovered by analyzing the history of its software artifacts as recorded by the project's CVS repository.

Abstract: This report describes some characteristics of the development team of PostgreSQL that were uncovered by analyzing the history of its software artifacts as recorded by the project's CVS repository.

Proceedings Article•10.1145/1137983.1138024•

Mining software repositories with CVSgrab

Lucian Voinea¹, Alexandru Telea¹•Institutions (1)

Eindhoven University of Technology¹

22 May 2006

TL;DR: The CVSgrab tool is used to acquire the data and interactively visualize the evolution of ArgoUML and PostgreSQL, in order to answer three relevant questions about the process and team analysis categories of the MSR Mining Challenge 2006.

Abstract: In this paper we address the process and team analysis categories of the MSR Mining Challenge 2006. We use our CVSgrab tool to acquire the data and interactively visualize the evolution of ArgoUML and PostgreSQL, in order to answer three relevant questions. We conclude summarizing the strong and weak points of using CVSgrab for mining large software repositories.

Proceedings Article•10.1145/1137983.1138031•

Analyzing OSS developers' working time using mailing lists archives

Masateru Tsunoda¹, Akito Monden¹, Takeshi Kakimoto¹, Yasutaka Kamei¹, Kenichi Matsumoto¹•Institutions (1)

Nara Institute of Science and Technology¹

22 May 2006

TL;DR: This work used mailing lists (MLs) archives of Postgres to identify developers’ working time and found that the ML of hackers had many more messages than other MLs.

Abstract: We used mailing lists (MLs) archives of PostgreSQL, downloaded from http://www.postgresql.org/community/lists/. The MLs mainly consist of user lists and developer lists. We used the developer lists archive since we needed developers’ working time. Table 1 explains details of each ML. Figure 1 shows the number of messages of each ML in the developer lists. The number of messages increased year by year. The ML of hackers had many more messages than other MLs. We extracted MLs archives till December 2005. Note that most of the committers’ messages were automatically generated when source code was checked into the software configuration management repository. We picked up “mail sent time” to identify developers’ working time. Getting mail sent time from the MLs archives consists of the following two steps: First, we downloaded the MLs archives with ...

Proceedings Article•10.1145/1137983.1138019•

Enriching revision history with interactions

Chris Parnin¹, Carsten Görg¹, Spencer Rugaber¹•Institutions (1)

Georgia Institute of Technology¹

22 May 2006

TL;DR: This work suggests augmenting revision histories with the interaction history of programmers (program viewing and editing history), which enables the development of several interesting applications including an influence-recommendation system and a task-mining system.

Abstract: Revision history provides a rich source of information to improve the understanding of changes made to programs, but it yields only limited insight into how these changes occurred. We explore an additional source of information - program viewing and editing history - where all historical artifacts associated with the program are included. In particular, we suggest augmenting revision histories with the interaction history of programmers. Using this additional information source enables the development of several interesting applications including an influence-recommendation system and a task-mining system. We present some results from a case study in which interaction histories from professional programmers were obtained and analyzed.

Proceedings Article•10.1145/1137983.1137988•

Productivity analysis of Japanese enterprise software development projects

Masateru Tsunoda¹, Akito Monden¹, Hiroshi Yadohisa², Nahomi Kikuchi, Kenichi Matsumoto¹•Institutions (2)

Nara Institute of Science and Technology¹, Doshisha University²

22 May 2006

TL;DR: This paper experimentally analyzes a software project repository (SEC repository) consisting of 253 enterprise software development projects in Japanese companies, established by the Software Engineering Center (SEC), Information-technology Promotion Agency, Japan, to clarify the relation between controllable project attributes and productivity.

Abstract: To clarify the relation between controllable attributes of a software development project and its productivity, this paper experimentally analyzed a software project repository (SEC repository), consisting of 253 enterprise software development projects in Japanese companies, established by the Software Engineering Center (SEC), Information-technology Promotion Agency, Japan. In the experiment, as controllable attributes, we focused on the outsourcing ratio of a software project, defined as the effort outsourced to subcontract companies divided by the whole development effort, and on the effort allocation balance among development phases. Our major findings include that both a larger outsourcing ratio and a smaller upstream process effort lead to worse productivity.

Proceedings Article•10.1145/1137983.1138026•

Using software birthmarks to identify similar classes and major functionalities

Takeshi Kakimoto¹, Akito Monden¹, Yasutaka Kamei¹, Haruaki Tamada¹, Masateru Tsunoda¹, Kenichi Matsumoto¹•Institutions (1)

Nara Institute of Science and Technology¹

22 May 2006

TL;DR: This paper analyzes the similarity of birthmarks for all pairs of classes in ArgoUML and visualizes them using Multi-Dimensional Scaling (MDS), identifying three pairs of very similar class files that seem to be made by copy-and-paste programming.

Abstract: Software birthmarks are unique and native characteristics of every software component. Two components having similar birthmarks indicate that they are similar in functionality, structure and implementation. Questions addressed in this paper include: Which are similar class files? Can they be gathered into one class file? What are major functionalities among class files? To answer these questions, this paper analyzed the similarity of birthmarks for all pairs of classes in ArgoUML, and visualized them using Multi-Dimensional Scaling (MDS). As a result, three pairs of very similar class files, which seem to be made by copy-and-paste programming, were identified. Also, four major functionalities were identified in the MDS space.

Proceedings Article•

The Evolution Radar: Integrating Fine-grained and Coarse-grained Logical Coupling Information

Marco D'Ambros, Michele Lanza, Mircea Lungu

1 Jan 2006

Proceedings Article•10.1145/1137983.1138029•

Applying the evolution radar to PostgreSQL

Marco D'Ambros¹, Michele Lanza¹•Institutions (1)

University of Lugano¹

22 May 2006

TL;DR: The database populating process, performed in batch mode, consists in doing a checkout of the system, parsing it and storing the structure information in the database, and parsing the CVS logs and storing all the commit-related information.

Abstract: To analyze the target system, i.e., PostgreSQL, we use its whole history, as recorded by the CVS version control system, stored in a database called Release History Database (RHDB) [1, 3]. The database populating process, performed in batch mode, consists in (i) doing a checkout of the system, parsing it and storing the structure information in the database, and (ii) parsing the CVS logs and storing all the commit-related information. The RHDB includes information about all the files in the system, i.e., source code, documentation, make-files, etc. For our analysis we consider only the source code data, i.e., .c and .h files (since PostgreSQL is written in C). We decompose the system using the top-most directories in the src directory tree, i.e., we define a module as all the files belonging to a directory subtree.
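
The module decomposition described above (top-most directories under src, restricted to .c and .h files) can be sketched as follows. The function name and the sample paths are hypothetical, and the sketch works on plain path strings rather than on the RHDB.

```python
from collections import defaultdict
from pathlib import PurePosixPath

def group_by_module(paths, root="src"):
    """Map each top-most directory under `root` to the .c/.h files in its subtree."""
    modules = defaultdict(list)
    for p in paths:
        parts = PurePosixPath(p).parts
        # Keep only .c and .h files that sit inside a subdirectory of root.
        if len(parts) >= 3 and parts[0] == root and parts[-1].endswith((".c", ".h")):
            modules[parts[1]].append(p)
    return dict(modules)

# Hypothetical file paths, for illustration only.
paths = [
    "src/backend/a/x.c",
    "src/backend/b/y.h",
    "src/include/z.h",
    "src/Makefile",
    "doc/README",
]
modules = group_by_module(paths)
```

Files directly under src, such as the makefile in the example, are skipped because they belong to no directory subtree.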

Proceedings Article•10.1145/1137983.1138020•

Using evolutionary annotations from change logs to enhance program comprehension

Daniel M. German¹, Peter C. Rigby¹, Margaret-Anne Storey¹•Institutions (1)

University of Victoria¹

22 May 2006

TL;DR: This paper describes a method to automatically create evolutionary annotations from change logs, defect tracking systems and mailing lists, along with the design of a prototype for Eclipse that can filter and present these annotations alongside their corresponding source code and in workbench views.

Abstract: Evolutionary annotations are descriptions of how source code evolves over time. Typical source comments, given their static nature, are usually inadequate for describing how a program has evolved over time; instead, source code comments are typically a description of what a program currently does. We propose the use of evolutionary annotations as a way of describing the rationale behind changes applied to a given program (for example "These lines were added to ..."). Evolutionary annotations can help a software developer understand how a given portion of source code works by showing how the source has evolved into its current form. In this paper we describe a method to automatically create evolutionary annotations from change logs, defect tracking systems and mailing lists. We describe the design of a prototype for Eclipse that can filter and present these annotations alongside their corresponding source code and in workbench views. We use Apache as a test case to demonstrate the feasibility of this approach.

Proceedings Article•10.1145/1137983.1138028•

Mining refactorings in ARGOUML

Peter Weißgerber¹, Stephan Diehl¹, Carsten Görg²•Institutions (2)

University of Trier¹, Georgia Institute of Technology²

22 May 2006

TL;DR: This paper combines the results of the refactoring reconstruction technique with bug, mail and release information to perform process and bug analyses of the ARGOUML CVS archive.

Abstract: In this paper we combine the results of our refactoring reconstruction technique with bug, mail and release information to perform process and bug analyses of the ARGOUML CVS archive.

Proceedings Article•10.1145/1137983.1138025•

Mining additions of method calls in ArgoUML

Thomas Zimmermann¹, Silvia Breu², Christian Lindig¹, Benjamin Livshits³•Institutions (3)

Saarland University¹, University of Cambridge², Stanford University³

22 May 2006

TL;DR: In this paper, the authors refine the classical co-change to the addition of method calls and use this concept to find usage patterns and to identify cross-cutting concerns for ArgoUML.

Abstract: In this paper we refine the classical co-change to the addition of method calls. We use this concept to find usage patterns and to identify cross-cutting concerns for ArgoUML.
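
As a rough illustration of mining co-added method calls, the sketch below counts how often two method calls are added in the same transaction. It is an assumption about what such an analysis might look like, not the authors' technique: the function name `co_added_call_pairs`, the `min_support` threshold and the example call names are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_added_call_pairs(transactions, min_support=2):
    """Count pairs of method calls added together in one transaction.

    `transactions` is an iterable of collections, each holding the names
    of the method calls added by one commit. Pairs seen in fewer than
    `min_support` transactions are dropped.
    """
    counts = Counter()
    for added_calls in transactions:
        # Deduplicate and sort so each pair has one canonical order.
        for pair in combinations(sorted(set(added_calls)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical transactions, for illustration only.
example = [
    ["lock", "unlock"],
    ["lock", "unlock", "log"],
    ["log"],
]
frequent = co_added_call_pairs(example)
```

Pairs that recur across many transactions are candidates for usage patterns; whether they also indicate cross-cutting concerns would need the further analysis the paper describes.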

Proceedings Article•10.1145/1137983.1137987•

Scenarios for mining the software architecture evolution

Yaojin Yang¹, Claudio Riva¹•Institutions (1)

Nokia¹

22 May 2006

TL;DR: This position paper introduces the latest activities on architecture evolution analysis through software repository mining, and introduces a meta-model covering the design and implementation spaces, and defines a set of scenarios that demonstrate the architecturally significant analysis that can be conducted by mining the software repository.

Abstract: In this position paper, we introduce our latest activities on architecture evolution analysis through software repository mining. The traditional approaches for software repository mining provide means for analyzing source-level information. However, we believe that software repository mining can also provide valuable results for analyzing the system evolution at the architectural level. There are two challenges for analyzing the architecture evolution. The first one is to have in place a process for recovering the architectural models of the various releases. Architecture evolution is often visible only in the evolution of the implementation and this complicates the monitoring process. The second one is to have access to the past design models that were created by the architects during the design phase. A practical solution for versioning the architectural models is not in use yet, and this makes it difficult to access past design decisions. Analyzing architecture evolution through software repository mining represents the most promising choice. In order to conduct the analysis through software repository mining, we introduce our meta-model covering the design and implementation spaces. Then, we define a set of scenarios that demonstrate the architecturally significant analysis that we can conduct by mining the software repository.

Proceedings Article•10.1145/1137983.1138032•

Where is bug resolution knowledge stored

Gerardo Canfora¹, Luigi Cerulo¹•Institutions (1)

University of Sannio¹

22 May 2006

TL;DR: This paper analyzes ArgoUML software repositories with a tool and shows which Bugzilla fields best predict the code entities impacted by a new bug report, that is, where knowledge about bug resolution is stored.

Abstract: ArgoUML has used both CVS and Bugzilla to keep track of bug-fixing activities since 1998. A common practice is to reference the source code changes that resolve a bug stored in Bugzilla by inserting the id number of the bug in the CVS commit notes. This relationship proves useful for predicting the code entities impacted by a new bug report. In this paper we analyze ArgoUML software repositories with a tool we have implemented, showing which Bugzilla fields best predict this impact relationship, that is, where knowledge about bug resolution is stored.
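
The linking practice mentioned in this abstract (bug ids written into CVS commit notes) can be sketched as follows. This is a minimal illustration, not the authors' tool: the regular expression, the function name `link_bugs_to_files` and the example commits are assumptions, and real commit notes may reference bugs in other formats.

```python
import re
from collections import defaultdict

# Assumed convention: the word "issue" or "bug", an optional "#",
# then the numeric id. Real projects may use other conventions.
BUG_ID_PATTERN = re.compile(r"\b(?:issue|bug)\s*#?\s*(\d+)", re.IGNORECASE)


def link_bugs_to_files(commits):
    """Map each referenced bug id to the files changed in its commits.

    `commits` is an iterable of (commit_note, changed_files) pairs.
    """
    links = defaultdict(set)
    for note, files in commits:
        for bug_id in BUG_ID_PATTERN.findall(note):
            links[int(bug_id)].update(files)
    return {bug: sorted(files) for bug, files in links.items()}


# Hypothetical commits, for illustration only.
example = [
    ("Fixed issue 1234: crash in diagram", ["A.java", "B.java"]),
    ("bug #1234 follow-up", ["C.java"]),
    ("cleanup", ["D.java"]),
]
links = link_bugs_to_files(example)
```

The resulting mapping from bug id to changed files is the kind of relationship that could then be compared against Bugzilla fields; that comparison is outside the scope of this sketch.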

Proceedings Article•10.1145/1137983.1138005•

Software engineering applications of logic file system: application to automated multi-criteria indexation of software components

Benjamin Sigonneau¹, Olivier Ridoux¹•Institutions (1)

University of Rennes¹

22 May 2006

TL;DR: This work presents several applications of Logic file system to software engineering: multi-criteria indexation of software components, multi-concern browsing of source files, and bug finding in test traces.

Abstract: Logic information systems use formal concept analysis in a novel way to manage information. A file system implementation has been designed under the name of Logic file system. It offers a flexible management of non-hierarchical data. We present several applications of Logic file system to software engineering: multi-criteria indexation of software components, multi-concern browsing of source files, and bug finding in test traces. We detail multi-criteria indexing of software components. Three independent indexing frameworks are developed and merged in a single multi-criteria framework. The three indexing frameworks capture formal criteria like type isomorphisms and inheritance relations, semi-formal criteria like naming conventions, and informal criteria like keywords of comments. We show how the logical orientation of Logic file system helps in capturing all these criteria in a single framework.

Proceedings Article•10.1145/1137983.1138004•

Concern based mining of heterogeneous software repositories

Imed Hammouda¹, Kai Koskimies¹•Institutions (1)

Tampere University of Technology¹

22 May 2006

TL;DR: This work proposes a conceptual framework for a concern-oriented query language for software repositories, and a pattern-based implementation scheme is discussed, exploiting existing tools.

Abstract: In the current trend of software engineering, software systems are viewed as clusters of overlapping structures representing various concerns, covering heterogeneous artifacts like models, code, resource files, etc. In those cases, adequate search mechanisms for software repositories should be based on this fragmented nature of software systems, allowing concern-oriented queries on the system data. For this purpose, we propose a conceptual framework for a concern-oriented query language for software repositories. A pattern-based implementation scheme is discussed, exploiting existing tools. The applicability of the approach is studied in the context of an industrial case study.

Proceedings Article•10.1145/1137983.1137984•

Introduction to MSR 2006

Stephan Diehl¹, Harald C. Gall², Martin Pinzger², Ahmed E. Hassan³•Institutions (3)

University of Trier¹, University of Zurich², BlackBerry Limited³

22 May 2006

TL;DR: Following the success of the first two iterations of the MSR workshop in 2004 and 2005, MSR 2006 attracted even more submissions: 45 papers from 15 different countries, of which 16 full and 12 short papers were accepted for presentation at the workshop and inclusion in the proceedings.

Abstract: Software repositories such as source control systems, defect tracking systems, or archived communications between project personnel are used to help manage the progress of software projects. Software practitioners and researchers are beginning to recognize the potential benefit of mining this information to support the maintenance of software systems, improve software design/reuse, and empirically validate novel ideas and techniques. Research is now proceeding to uncover the ways in which mining these repositories can help to understand software development, to support predictions about software development, and to plan various aspects of software projects. Following the success of the first two iterations of the MSR workshop in 2004 and 2005, MSR 2006 attracted even more submissions: We received 45 papers from 15 different countries. The international program committee accepted 16 full and 12 short papers for presentation at the workshop and inclusion in the proceedings. We are grateful for the excellent and professional review job done by the reviewers on such a tight schedule.
