Proceedings Article, DOI: 10.1145/1882362.1882410
Software is data too
Andrian Marcus, Tim Menzies +1 more
- 07 Nov 2010
- pp 229-232
TL;DR: It is argued in this position paper that data mining, statistical analysis, machine learning, information retrieval, data integration, etc., are necessary solutions to deal with software data.
Abstract: Software systems are designed and engineered to process data. However, software is data too. The size and variety of today's software artifacts and the multitude of stakeholder activities result in so much data that individuals can no longer reason about all of it. We argue in this position paper that data mining, statistical analysis, machine learning, information retrieval, data integration, etc., are necessary solutions to deal with software data. New research is needed to adapt existing algorithms and tools for software engineering data and processes, and new ones will have to be created. In order for this type of research to succeed, it should be supported with new approaches to empirical work, where data and results are shared globally among researchers and practitioners. Software engineering researchers can draw inspiration from other fields, such as bioinformatics, where results of mining and analyzing biological data are often stored in databases shared across the world.
Citations
Evaluating source code summarization techniques: Replication and expansion
Brian P. Eddy, Jeffrey Robinson, Nicholas A. Kraft, Jeffrey C. Carver +3 more
- 20 May 2013
TL;DR: A new topic modeling based approach to source code summarization is proposed, and via a study of 14 developers, source code summaries generated using the proposed technique are evaluated.
Data journals: A survey
Leonardo Candela, Donatella Castelli, Paolo Manghi, Alice Tani +3 more
- 01 Sep 2015
TL;DR: This study of more than 100 currently existing data journals describes the approaches they promote for data set description, availability, citation, quality, and open access and identifies ways to expand and strengthen the data journals approach as a means to promote data set access and exploitation.
Structural information based term weighting in text retrieval for feature location
Blake Bassett, Nicholas A. Kraft +1 more
- 20 May 2013
TL;DR: This paper studies over 400 bugs and features from five open source Java systems and finds that structural term weighting can cause a statistically significant improvement in the accuracy of the feature location technique (FLT).
Goldfish bowl panel: software development analytics
Tim Menzies, Thomas Zimmermann +1 more
- 02 Jun 2012
TL;DR: This panel will address open issues in analytics and discuss the potential, strengths, and weaknesses of the current generation of analytics tools.
Modeling the ownership of source code topics
Christopher S. Corley, Elizabeth A. Kammer, Nicholas A. Kraft +2 more
- 11 Jun 2012
TL;DR: This paper combines software repository mining and topic modeling to measure the ownership of linguistic topics in source code and finds that classes that belong to the same linguistic topic tend to have similar ownership characteristics, which suggests that conceptually related classes often share the same owner.