TL;DR: This work investigates using natural language processing (NLP) techniques to identify duplicates in defect reports at Sony Ericsson Mobile Communications, and shows that about 2/3 of the duplicates can possibly be found using the NLP techniques.
Abstract: Defect reports are generated from various testing and development activities in software engineering. Sometimes two reports are submitted that describe the same problem, leading to duplicate reports. These reports are mostly written in structured natural language, and as such, it is hard to compare two reports for similarity with formal methods. In order to identify duplicates, we investigate using natural language processing (NLP) techniques to support the identification. A prototype tool is developed and evaluated in a case study analyzing defect reports at Sony Ericsson Mobile Communications. The evaluation shows that about 2/3 of the duplicates can possibly be found using the NLP techniques. Different variants of the techniques provide only minor result differences, indicating a robust technology. User testing shows that the overall attitude towards the technique is positive and that it has growth potential.
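As a rough illustration of the kind of text-similarity ranking such a tool might perform, the sketch below scores existing reports against a new one using TF-IDF weighting and cosine similarity. The abstract does not specify the exact NLP pipeline, so the weighting scheme, the tokenizer, and all function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_candidates`) are assumptions made for this example, not the authors' method.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF (1 + log(n / df)) keeps every weight positive.
        vectors.append({t: c * (1.0 + math.log(n / df[t])) for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(new_report, existing_reports, top_k=3):
    """Return up to top_k (index, score) pairs, most similar report first."""
    vectors = tfidf_vectors(existing_reports + [new_report])
    query = vectors[-1]
    scored = [(cosine(query, v), i) for i, v in enumerate(vectors[:-1])]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(i, score) for score, i in scored[:top_k]]


# Hypothetical example reports, invented for illustration only.
reports = [
    "Phone crashes when opening the camera application",
    "Battery drains quickly while in standby mode",
    "Incoming text messages are not displayed",
]
ranking = rank_candidates("Camera application crashes the phone on opening", reports)
```

In a setting like the one described, the top-ranked candidates would be shown to a human for confirmation rather than merged automatically.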
TL;DR: A comprehensive survey of the use of crowdsourcing in software engineering, seeking to cover all literature on this topic, and exposing trends, open issues and opportunities for future research on Crowdsourced Software Engineering.
TL;DR: This paper leverages recent advances on using discriminative models for information retrieval to detect duplicate bug reports more accurately and shows that this technique could result in 17-31%, 22-26%, and 35-43% relative improvement over state-of-the-art techniques in OpenOffice, Firefox, and Eclipse datasets respectively using commonly available natural language information only.
Abstract: Bug repositories are usually maintained in software projects. Testers or users submit bug reports to identify various issues with systems. Sometimes two or more bug reports correspond to the same defect. To address the problem with duplicate bug reports, a person called a triager needs to manually label these bug reports as duplicates, and link them to their "master" reports for subsequent maintenance work. However, in practice a considerable number of duplicate bug reports are sent daily; requesting triagers to manually label these bugs could be highly time-consuming. To address this issue, several techniques have recently been proposed that use various similarity-based metrics to detect candidate duplicate bug reports for manual verification. Automating triaging has proved challenging, as two reports of the same bug could be written in various ways. There is still much room for improvement in the accuracy of the duplicate detection process. In this paper, we leverage recent advances on using discriminative models for information retrieval to detect duplicate bug reports more accurately. We have validated our approach on three large software bug repositories from Firefox, Eclipse, and OpenOffice. We show that our technique could result in 17-31%, 22-26%, and 35-43% relative improvement over state-of-the-art techniques in OpenOffice, Firefox, and Eclipse datasets respectively using commonly available natural language information only.
TL;DR: A usability study which evaluated a graduate school’s website using a crowdsourcing platform and a similar but not identical traditional lab usability test on the same site finds that crowdsourcing exhibits some notable limitations in comparison to the traditional lab environment.
Abstract: While usability evaluation is critical to designing usable websites, traditional usability testing can be both expensive and time-consuming. The advent of crowdsourcing platforms such as Amazon Mechanical Turk and CrowdFlower offers an intriguing new avenue for performing remote usability testing with potentially many users, quick turn-around, and significant cost savings. To investigate the potential of such crowdsourced usability testing, we conducted a usability study which evaluated a graduate school’s website using a crowdsourcing platform. In addition, we performed a similar but not identical traditional lab usability test on the same site. While we find that crowdsourcing exhibits some notable limitations in comparison to the traditional lab environment, its applicability and value for usability testing are clearly evidenced. We discuss both methodological differences for crowdsourced usability testing and empirical contrasts to results from more traditional, face-to-face usability testing.
TL;DR: The experiments demonstrate that CrowdOracles are a viable solution to automate the oracle problem, yet taming the crowd to get useful results is a difficult task.
Abstract: Despite the recent advances in test generation, fully automatic software testing remains a dream: Ultimately, any generated test input depends on a test oracle that determines correctness, and, except for generic properties such as “the program shall not crash”, such oracles require human input in one form or another. CrowdSourcing is a recently popular technique to automate computations that cannot be performed by machines, but only by humans. A problem is split into small chunks that are then solved by a crowd of users on the Internet. In this paper we investigate whether it is possible to exploit CrowdSourcing to solve the oracle problem: We produce tasks asking users to evaluate CrowdOracles - assertions that reflect the current behavior of the program. If the crowd determines that an assertion does not match the behavior described in the code documentation, then a bug has been found. Our experiments demonstrate that CrowdOracles are a viable solution to automate the oracle problem, yet taming the crowd to get useful results is a difficult task.
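One way to picture how crowd judgments on assertions could be turned into bug candidates is simple majority voting, sketched below. The abstract does not say how worker answers are aggregated, so majority voting, the `min_votes` threshold, and the names `aggregate_votes` and `find_suspected_bugs` are all assumptions for illustration, as are the sample assertions.

```python
from collections import Counter


def aggregate_votes(votes, min_votes=3):
    """Combine worker answers for one assertion.

    Each vote is True if the worker thinks the assertion matches the
    documented behavior, False otherwise. Returns "match", "mismatch",
    or "undecided" (too few votes, or a tie).
    """
    if len(votes) < min_votes:
        return "undecided"
    counts = Counter(votes)
    if counts[True] > counts[False]:
        return "match"
    if counts[False] > counts[True]:
        return "mismatch"
    return "undecided"


def find_suspected_bugs(judgments, min_votes=3):
    """Return the assertions that the crowd majority judged as mismatching."""
    return [
        assertion
        for assertion, votes in judgments.items()
        if aggregate_votes(votes, min_votes) == "mismatch"
    ]


# Hypothetical assertions and votes, invented for illustration only.
judgments = {
    "assertEquals(3, add(1, 2))": [True, True, False],
    "assertEquals(0, abs(-5))": [False, False, True, False],
    "assertTrue(isEmpty(newList()))": [True, False],
}
suspected = find_suspected_bugs(judgments)
```

The "undecided" outcome reflects the difficulty the abstract mentions: with few or conflicting answers, the crowd's verdict may not be usable without collecting more votes or filtering unreliable workers.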