About: Software feature is a research topic. Over its lifetime, 120 publications have been published within this topic, receiving 924 citations. The topic is also known as: feature & capability.
TL;DR: The investigation shows that some combinations of machine learning features derived from natural language and issue tracking system meta-data outperform traditional approaches, and that the choice of machine learning algorithm should depend on the goal, e.g. maximizing the detection rate or balancing detection rate and precision.
Abstract: Communication about requirements is often handled in issue tracking systems, especially in a distributed setting. As issue tracking systems also contain bug reports or programming tasks, the software feature requests of the users are often difficult to identify. This paper investigates natural language processing and machine learning features to detect software feature requests in natural language data of issue tracking systems. It compares traditional linguistic machine learning features, such as "bag of words", with more advanced features, such as subject-action-object, and evaluates combinations of machine learning features derived from the natural language and features taken from the issue tracking system meta-data. Our investigation shows that some combinations of machine learning features derived from natural language and the issue tracking system meta-data outperform traditional approaches. We show that issues or data fields (e.g. descriptions or comments), which contain software feature requests, can be identified reasonably well, but hardly the exact sentence. Finally, we show that the choice of machine learning algorithms should depend on the goal, e.g. maximization of the detection rate or balance between detection rate and precision. In addition, the paper contributes a double coded gold standard and an open-source implementation to further pursue this topic.
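The feature combination described in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the issue fields `text`, `type`, and `field`, the feature-name prefixes, and all function names are assumptions made for illustration. The sketch merges "bag of words" counts with one-hot encoded issue tracker meta-data, and computes precision and detection rate (recall) so the two goals named in the abstract can be compared.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased token counts, prefixed so they cannot collide with meta-data keys."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return {f"bow:{token}": count for token, count in Counter(tokens).items()}


def metadata_features(issue):
    """One-hot encode two hypothetical issue tracker fields."""
    return {
        f"type:{issue['type']}": 1,    # e.g. bug, task, improvement
        f"field:{issue['field']}": 1,  # e.g. description, comment
    }


def combined_features(issue):
    """Merge natural language features with meta-data features into one vector."""
    features = bag_of_words(issue["text"])
    features.update(metadata_features(issue))
    return features


def precision_recall(predicted, actual):
    """Precision and detection rate (recall) for boolean predictions."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    issue = {
        "text": "Please add export to PDF, export is missing",
        "type": "improvement",
        "field": "description",
    }
    print(combined_features(issue))
    print(precision_recall([True, True, False, True], [True, False, True, True]))
```

The resulting dictionary could be fed to any classifier that accepts sparse feature maps. Which classifier and decision threshold to prefer would then depend on whether recall alone or a balance of precision and recall matters more.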
TL;DR: The use of time in the weighting of terms, along with the use of only the noun terms, makes significant improvements to a feature location approach that relies on textual information.
Abstract: Context: Feature location aims to identify the source code location corresponding to the implementation of a software feature. Many existing feature location methods apply text retrieval to determine the relevancy of the features to the text data extracted from the software repositories. One of the preprocessing activities in text retrieval is term-weighting, which is used to adjust the importance of a term within a document or corpus. Common term-weighting techniques may not be optimal for text data from software repositories because these techniques originate from a natural language context. Objective: This paper describes how the consideration of when the terms were used in the repositories, under the condition of weighting only the noun terms, can improve a feature location approach. Method: We propose a feature location approach using a new term-weighting technique that takes into account how recently a term has been used in the repositories. In this approach, only the noun terms are weighted to reduce the dataset volume and avoid dealing with dimensionality reduction. Results: An empirical evaluation of the approach on four open-source projects reveals improvements in accuracy, effectiveness, and performance of up to 50%, 17%, and 13%, respectively, when compared to the commonly used Vector Space Model approach. The comparison of the proposed term-weighting technique with the Term Frequency-Inverse Document Frequency technique shows accuracy, effectiveness, and performance improvements of as much as 15%, 10%, and 40%, respectively. The investigation of using only noun terms, instead of using all terms, in the proposed approach also indicates improvements of up to 28%, 21%, and 58% in accuracy, effectiveness, and performance, respectively. Conclusion: In general, the use of time in the weighting of terms, along with the use of only the noun terms, makes significant improvements to a feature location approach that relies on textual information.
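The abstract does not give the weighting formula, so the following is only a sketch of the general idea of recency-aware term weighting restricted to nouns. The exponential decay, the `half_life_days` parameter, the `"NOUN"` tag, and the assumption that terms arrive already part-of-speech tagged are all illustrative choices, not the paper's method.

```python
from collections import defaultdict


def recency_weights(occurrences, now_day, half_life_days=180.0):
    """Sum a decayed contribution for each noun occurrence.

    occurrences: iterable of (term, pos_tag, day_used) tuples, where day_used
    is a day number (for example, days since the first commit).
    An occurrence from today contributes 1.0; one that is half_life_days old
    contributes 0.5; non-noun terms are skipped entirely.
    """
    weights = defaultdict(float)
    for term, pos_tag, day_used in occurrences:
        if pos_tag != "NOUN":
            continue  # only noun terms are weighted
        age = max(0, now_day - day_used)
        weights[term] += 0.5 ** (age / half_life_days)
    return dict(weights)


if __name__ == "__main__":
    history = [
        ("parser", "NOUN", 1000),  # used today
        ("parser", "NOUN", 820),   # used 180 days ago
        ("fix", "VERB", 1000),     # ignored: not a noun
        ("cache", "NOUN", 640),    # used 360 days ago
    ]
    print(recency_weights(history, now_day=1000))
```

Under these assumptions a term that was used recently outweighs a term used equally often but long ago, which is the intuition behind weighting by time rather than by frequency alone.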
TL;DR: It is argued in this work that over-requirement is due partially to the emotional involvement of developers with the software features they specify, and insights into behavioral effects in the context of software development are provided.
TL;DR: The proposed framework applies to a wide range of software products with large volumes of user reviews, is semi-automatic and needs little human effort or intervention, and helps managers better understand user feedback on software functionality and plan feature refinements for upcoming releases.
Abstract: Context: Online software reviews have provided a wealth of user feedback on software applications. User reviews along with ratings have been influential in a series of software engineering tasks, e.g. software maintenance and release planning. Objective: Our research aims to assist managers in prioritizing features to be refined in the next release, from the perspective of enhancing user ratings via mining online reviews. Method: We first extract software features from user reviews and determine their probability distribution in each review with LDA. Then the ground truth rating of each feature is estimated by linear regression under the assumption that the software functionality rating is a convex combination of all feature ratings weighted by their distribution probabilities over the review. Finally, we formalize feature refinement prioritization as an optimization problem which maximizes the user group’s rating on the software functionality under the constraint of the development budget. Results: The proposed approach can use a topic model to jointly extract features from user reviews in a semi-supervised manner and determine each feature’s weight in each user’s rating on the software functionality. The estimated ground truth ratings of all features reveal how the reviewer group evaluates those features. Finally, we provide an illustrative example to demonstrate the key idea of our framework. Conclusion: Our proposed framework is general enough to apply to various software products with large volumes of user reviews, and it is semi-automatic, requiring little human effort or intervention. The framework’s interpretability helps managers better understand user feedback on the software functionality and make feature refinement plans for upcoming releases.
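The regression and prioritization steps in this abstract can be sketched as follows, assuming the LDA step has already produced a per-review feature distribution `theta` whose rows sum to 1. This is an illustrative reading, not the authors' formulation: the normal-equation solver, the `share`, `cost`, and `gain` inputs, the rating cap of 5.0, and the brute-force search are all assumptions.

```python
from itertools import combinations


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def estimate_feature_ratings(theta, ratings):
    """Least squares fit of ratings[i] ~ sum_k theta[i][k] * q[k] (normal equations)."""
    k = len(theta[0])
    ata = [[sum(row[i] * row[j] for row in theta) for j in range(k)] for i in range(k)]
    atb = [sum(row[i] * r for row, r in zip(theta, ratings)) for i in range(k)]
    return solve_linear(ata, atb)


def best_refinement_plan(q, share, cost, gain, budget, max_rating=5.0):
    """Brute-force the feature subset that maximizes the predicted overall rating.

    share: hypothetical average weight of each feature across reviews.
    gain:  hypothetical rating increase if the feature is refined.
    """
    k = len(q)
    best_score = sum(s * r for s, r in zip(share, q))  # baseline: refine nothing
    best_subset = ()
    for size in range(1, k + 1):
        for subset in combinations(range(k), size):
            if sum(cost[i] for i in subset) > budget:
                continue
            score = sum(
                share[i] * min(max_rating, q[i] + (gain[i] if i in subset else 0.0))
                for i in range(k)
            )
            if score > best_score:
                best_score, best_subset = score, subset
    return best_score, best_subset


if __name__ == "__main__":
    theta = [[0.8, 0.2], [0.5, 0.5], [0.1, 0.9]]  # per-review feature distributions
    ratings = [2.4, 3.0, 3.8]                     # per-review overall ratings
    q = estimate_feature_ratings(theta, ratings)
    print(q)
    print(best_refinement_plan(q, share=[0.5, 0.5], cost=[3, 2], gain=[1.5, 0.5], budget=3))
```

The exhaustive search is only practical for a handful of features. A larger feature set would call for a knapsack-style or integer programming solver instead.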