TL;DR: tsDetect is an automated test smell detection tool for Java software systems that uses a set of detection rules to locate existing test smells in test code; its effectiveness is evaluated on a benchmark of 65 unit test files containing instances of 19 test smell types.
Abstract: The test code, just like production source code, is subject to bad design and programming practices, also known as smells. The presence of test smells in a software project may affect the quality, maintainability, and extendability of test suites, making them less effective in finding potential faults and quality issues in the project's production code. In this paper, we introduce tsDetect, an automated test smell detection tool for Java software systems that uses a set of detection rules to locate existing test smells in test code. We evaluate the effectiveness of tsDetect on a benchmark of 65 unit test files containing instances of 19 test smell types. Results show that tsDetect achieves a high detection accuracy with an average precision score of 96% and an average recall score of 97%. tsDetect is publicly available, with a demo video, at: https://testsmells.github.io/
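To make the idea of rule-based detection concrete, here is a minimal Python sketch of three smell types from the test smell literature (Empty Test, Sleepy Test, Redundant Print). It is a simplified approximation, not tsDetect's implementation: it assumes each test method is already available as a name-to-body mapping and applies regular expressions to raw Java text, so it can be fooled by, for example, comment markers inside string literals.

```python
import re

# Simplified, regex-based approximations of three test smell rules.
SLEEP_CALL = re.compile(r"\bThread\.sleep\s*\(")
PRINT_CALL = re.compile(r"\bSystem\.(?:out|err)\.print(?:ln)?\s*\(")
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
LINE_COMMENT = re.compile(r"//[^\n]*")


def strip_comments(body: str) -> str:
    """Drop /* */ and // comments so that comment-only bodies count as empty."""
    return LINE_COMMENT.sub("", BLOCK_COMMENT.sub("", body))


def detect_smells(test_methods: dict[str, str]) -> dict[str, list[str]]:
    """Map each test method name to the list of smell types its body matches."""
    report = {}
    for name, body in test_methods.items():
        code = strip_comments(body)
        smells = []
        if not code.strip():
            smells.append("Empty Test")
        if SLEEP_CALL.search(code):
            smells.append("Sleepy Test")
        if PRINT_CALL.search(code):
            smells.append("Redundant Print")
        report[name] = smells
    return report


if __name__ == "__main__":
    methods = {
        "testEmpty": "// TODO: write this test",
        "testWaits": "Thread.sleep(500);\nassertTrue(done);",
        "testLogs": "System.out.println(value);\nassertEquals(1, value);",
    }
    print(detect_smells(methods))
    # {'testEmpty': ['Empty Test'], 'testWaits': ['Sleepy Test'], 'testLogs': ['Redundant Print']}
```

A real detector works on a parsed syntax tree, which is what makes rules such as Assertion Roulette (assertions without explanation messages) practical to check reliably.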
TL;DR: Atlas (AuTomatic Learning of Assert Statements) is a Neural Machine Translation (NMT) based approach that automatically generates meaningful assert statements for test methods, intended as a complement to automatic test case generation techniques and as code completion support for developers.
Abstract: Software testing is an essential part of the software lifecycle and requires a substantial amount of time and effort. It has been estimated that software developers spend close to 50% of their time on testing the code they write. For these reasons, a long-standing goal within the research community is to (partially) automate software testing. While several techniques and tools have been proposed to automatically generate test methods, recent work has criticized the quality and usefulness of the assert statements they generate. Therefore, we employ a Neural Machine Translation (NMT) based approach called Atlas (AuTomatic Learning of Assert Statements) to automatically generate meaningful assert statements for test methods. Given a test method and a focal method (i.e., the main method under test), Atlas can predict a meaningful assert statement to assess the correctness of the focal method. We applied Atlas to thousands of test methods from GitHub projects and it was able to predict the exact assert statement manually written by developers in 31% of the cases when only considering the top-1 predicted assert. When considering the top-5 predicted assert statements, Atlas is able to predict exact matches in 50% of the cases. These promising results hint at the potential usefulness of our approach as (i) a complement to automatic test case generation techniques, and (ii) a code completion support for developers, who can benefit from the recommended assert statements while writing test code.
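The top-1 and top-5 figures above are exact-match rates over ranked predictions: a test method counts as a hit if the developer-written assert appears among the first k suggestions. A small Python sketch of that metric follows; collapsing whitespace before comparing is an assumption made here, not necessarily how Atlas compares asserts.

```python
def normalize(assertion: str) -> str:
    """Collapse runs of whitespace so pure formatting differences do not matter."""
    return " ".join(assertion.split())


def top_k_exact_match(predictions: list[list[str]], references: list[str], k: int) -> float:
    """Fraction of test methods whose reference assert is among the top-k predictions."""
    if len(predictions) != len(references):
        raise ValueError("one ranked prediction list per reference is required")
    hits = 0
    for ranked, reference in zip(predictions, references):
        target = normalize(reference)
        if any(normalize(p) == target for p in ranked[:k]):
            hits += 1
    return hits / len(references)


if __name__ == "__main__":
    refs = ["assertEquals(3, list.size())", "assertTrue(result)"]
    preds = [
        ["assertEquals(3, list.size())", "assertNotNull(list)"],
        ["assertNotNull(result)", "assertTrue(result)"],
    ]
    print(top_k_exact_match(preds, refs, k=1))  # 0.5
    print(top_k_exact_match(preds, refs, k=5))  # 1.0
```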
TL;DR: In this article, a fully unsupervised neural transcompiler is proposed to translate functions between C++, Java, and Python with high accuracy using monolingual source code.
Abstract: A transcompiler, also known as source-to-source translator, is a system that converts source code from a high-level programming language (such as C++ or Python) to another. Transcompilers are primarily used for interoperability, and to port codebases written in an obsolete or deprecated language (e.g., COBOL, Python 2) to a modern one. They typically rely on handcrafted rewrite rules, applied to the source code abstract syntax tree. Unfortunately, the resulting translations often lack readability, fail to respect the target language conventions, and require manual modifications in order to work properly. The overall translation process is time-consuming and requires expertise in both the source and target languages, making code-translation projects expensive. Although neural models significantly outperform their rule-based counterparts in the context of natural language translation, their applications to transcompilation have been limited due to the scarcity of parallel data in this domain. In this paper, we propose to leverage recent approaches in unsupervised machine translation to train a fully unsupervised neural transcompiler. We train our model on source code from open source GitHub projects, and show that it can translate functions between C++, Java, and Python with high accuracy. Our method relies exclusively on monolingual source code, requires no expertise in the source or target languages, and can easily be generalized to other programming languages. We also build and release a test set composed of 852 parallel functions, along with unit tests to check the correctness of translations. We show that our model outperforms rule-based commercial baselines by a significant margin.
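The abstract mentions unit tests that check the correctness of translations. One way to score translations with such tests is to count how many translated functions reproduce a reference function's outputs on every test input. The Python sketch below does this; using Python callables to stand in for translated C++, Java, or Python functions is a simplifying assumption, and `reference_gcd` is a hypothetical example.

```python
import math
from typing import Any, Callable, Iterable, Sequence, Tuple

# (translated candidate, reference implementation, test inputs)
Case = Tuple[Callable[..., Any], Callable[..., Any], Sequence[tuple]]


def passes_unit_tests(candidate, reference, test_inputs) -> bool:
    """True if `candidate` matches `reference` on every test input without raising."""
    for args in test_inputs:
        expected = reference(*args)  # the reference is assumed to be correct
        try:
            if candidate(*args) != expected:
                return False
        except Exception:
            return False  # a crashing translation counts as a failure
    return True


def share_passing(cases: Iterable[Case]) -> float:
    """Fraction of translated functions that pass all of their unit tests."""
    cases = list(cases)
    return sum(passes_unit_tests(c, r, inputs) for c, r, inputs in cases) / len(cases)


def reference_gcd(a: int, b: int) -> int:
    """Reference implementation (Euclid's algorithm)."""
    while b:
        a, b = b, a % b
    return a


def buggy_gcd(a: int, b: int) -> int:
    """A wrong 'translation': agrees with gcd on some inputs only."""
    return a % b


if __name__ == "__main__":
    inputs = [(12, 8), (7, 3), (10, 5)]
    cases = [(math.gcd, reference_gcd, inputs), (buggy_gcd, reference_gcd, inputs)]
    print(share_passing(cases))  # 0.5: buggy_gcd fails on (10, 5)
```

Requiring every test to pass, rather than averaging per-test results, mirrors the idea that a translation is only useful if it is fully correct.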
TL;DR: Pynguin is an automated unit test generation framework for dynamically typed languages such as Python; experiments with it show that evolutionary algorithms can outperform random test generation in Python and can partly alleviate the problem of missing type information.
Abstract: Automated unit test generation is an established research field, and mature test generation tools exist for statically typed programming languages such as Java. It is, however, substantially more difficult to automatically generate supportive tests for dynamically typed programming languages such as Python, due to the lack of type information and the dynamic nature of the language. In this paper, we describe a foray into the problem of unit test generation for dynamically typed languages. We introduce Pynguin, an automated unit test generation framework for Python. Using Pynguin, we aim to empirically shed light on two central questions: (1) Do well-established search-based test generation methods, previously evaluated only on statically typed languages, generalise to dynamically typed languages? (2) What is the influence of incomplete type information and dynamic typing on the problem of automated test generation? Our experiments confirm that evolutionary algorithms can outperform random test generation also in the context of Python, and can even alleviate the problem of absent type information to some degree. However, our results demonstrate that dynamic typing nevertheless poses a fundamental issue for test generation, suggesting future work on integrating type inference.
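Why search-based methods can beat random generation is easiest to see on a single branch. The toy Python sketch below compares uniform random inputs with a (1+1) evolutionary algorithm guided by branch distance for a hypothetical condition `x == 4242`. It is not Pynguin's algorithm, only an illustration of how a fitness signal steers the search where random sampling has no guidance.

```python
import random
from typing import Optional

TARGET = 4242  # hypothetical branch condition: `if x == TARGET:`


def branch_distance(x: int) -> int:
    """0 when the true branch of `x == TARGET` is taken; larger means further away."""
    return abs(x - TARGET)


def random_search(budget: int, rng: random.Random) -> Optional[int]:
    """Sample inputs uniformly at random; return one that covers the branch, if found."""
    for _ in range(budget):
        x = rng.randint(-10_000, 10_000)
        if branch_distance(x) == 0:
            return x
    return None


def one_plus_one_ea(budget: int, rng: random.Random) -> Optional[int]:
    """(1+1) EA: mutate the current input and keep the child if it is no worse."""
    parent = rng.randint(-10_000, 10_000)
    for _ in range(budget):
        if branch_distance(parent) == 0:
            return parent
        child = parent + rng.choice([-1, 1]) * rng.choice([1, 10, 100, 1000])
        if branch_distance(child) <= branch_distance(parent):
            parent = child
    return parent if branch_distance(parent) == 0 else None


if __name__ == "__main__":
    rng = random.Random(0)
    print("random search:", random_search(2_000, rng))
    print("(1+1) EA:", one_plus_one_ea(2_000, rng))
```

With 20,001 possible inputs, random search covers the branch only by luck, while the EA can always reduce the distance by a step of 1 in the right direction, so it never gets stuck. Missing type information makes the real problem harder, because the generator must also guess what kind of value to pass.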
TL;DR: RAIDE is an open-source, IDE-integrated tool that provides testers with an environment for automated detection of lines of code affected by test smells, as well as semi-automated refactoring, for Java projects using the JUnit framework.
Abstract: Test smells are fragments of code that can affect the comprehensibility and the maintainability of the test code. Preventing, detecting, and correcting test smells are tasks that may require a lot of effort, and might not scale to large-sized projects when carried out manually. Currently, there are many tools available to support test smell detection. However, they usually provide neither a user-friendly interface nor automated support for refactoring the test code to remove test smells. In this work, we propose RAIDE, an open-source and IDE-integrated tool. RAIDE assists testers with an environment for automated detection of lines of code affected by test smells, as well as semi-automated refactoring for Java projects using the JUnit framework.
TL;DR: This paper proposes AthenaTest, an approach that aims at generating unit test cases by learning from real-world, developer-written test cases, relying on a state-of-the-art sequence-to-sequence transformer model which is able to write useful test cases for a given method under test.
Abstract: Automated unit test case generation has been the focus of extensive literature within the research community. Existing approaches are usually guided by test coverage criteria, generating synthetic test cases that are often difficult for developers to read or understand. In this paper, we propose AthenaTest, an approach that aims at generating unit test cases by learning from real-world, developer-written test cases. Our approach relies on a state-of-the-art sequence-to-sequence transformer model which is able to write useful test cases for a given method under test (i.e., focal method). We also introduce methods2test, the largest publicly available supervised parallel corpus of unit test case methods and corresponding focal methods in Java, which comprises 630k test cases mined from 70k open-source repositories hosted on GitHub. We use this dataset to train a transformer model to translate focal methods into the corresponding test cases. We evaluate the ability of our model to generate test cases using natural language processing as well as code-specific criteria. First, we assess the quality of the translation compared to the target test case, then we analyze properties of the test case such as syntactic correctness and number and variety of testing APIs (e.g., asserts). We execute the test cases, collect test coverage information, and compare them with test cases generated by EvoSuite and GPT-3. Finally, we survey professional developers on their preference in terms of readability, understandability, and testing effectiveness of the generated test cases.
TL;DR: The experiments showed that the types of refactoring operations developers perform on test files differ from those they perform on non-test files; results on test smells also show a co-occurrence between certain smell types and refactorings, and reveal how refactorings are used to eliminate smells.
Abstract: An essential activity of software maintenance is the refactoring of source code. Refactoring operations enable developers to take necessary actions to correct bad programming practices (i.e., smells) in the source code of both production and test files. With unit testing being a vital and fundamental part of ensuring the quality of a system, developers must address smelly test code. In this paper, we empirically explore the impact and relationship between refactoring operations and test smells in 250 open-source Android applications (apps). Our experiments showed that the types of refactoring operations performed by developers on test files differ from those performed on non-test files. Further, results on test smells show a co-occurrence between certain smell types and refactorings, and how refactorings are used to eliminate smells. Findings from this study will not only further our knowledge of refactoring operations on test files, but will also help developers understand possible ways to maintain their apps.
TL;DR: This paper presents a methodology, supported by the Asm2C++ tool, that lets users generate C++ code from abstract state machine models, together with a process that tests the generated code by reusing unit tests.
Abstract: According to best practices of model-driven engineering, the implementation of a system should be obtained from its model through a systematic model-to-code transformation. We present in this paper a methodology supported by the Asm2C++ tool, which allows users to generate C++ code from abstract state machine models. Thanks to Asm2C++, the implementation is generated seamlessly, with an assurance of the potential bug-freeness of the generated code. Following the same approach, model-based testing suggests also deriving (unit) tests from abstract models. We extend the Asm2C++ tool so that it can automatically produce unit tests for the generated code. Abstract test sequences, generated either randomly or through model checking, are translated into concrete C++ unit tests using the Boost library. Similarly, scenarios are also generated following a behavior-driven development (BDD) approach. To guarantee the correctness of the transformation process, we define a mechanism to test the correctness of the model-to-code transformation with respect to two main criteria: syntactic correctness and semantic correctness, the latter based on the definition of conformance between the specification and the code. Using this approach, we have devised a process able to test the generated code by reusing unit tests. The process has been used to validate our model-to-code transformations.
TL;DR: The paper describes the methodology and statistical analysis of the eighth Java unit testing tool competition, presents the results achieved by the contestant tools, and highlights the challenges the organizers faced during the competition.
Abstract: We report on the results of the eighth edition of the Java unit testing tool competition. This year, two tools, EvoSuite and Randoop, were executed on a benchmark with (i) new classes under test, selected from open-source software projects, and (ii) the set of classes from one project considered in the previous edition. We relied on an updated infrastructure, based on Docker containers, for the execution of the different tools and the subsequent coverage and mutation analysis. We considered two different time budgets for test case generation: one and three minutes. This paper describes our methodology and statistical analysis of the results, presents the results achieved by the contestant tools and highlights the challenges we faced during the competition.
TL;DR: The paper introduces a novel, formal, and operational approach to the open challenges of modeling, verifying, and testing intelligent critical avionics systems, which unifies the three challenges and treats the intelligence, autonomy, and accountability of components as first-class concepts.
Abstract: The paper contributes a novel, formal, and operational approach that addresses the open, challenging issues of modeling, verifying, and testing intelligent critical avionics systems. We advance the state of the art by unifying the three challenges and treating the intelligence, autonomy, and accountability of the components as first-class concepts. The proposed methodology is effectively applied to a real, practical, and complex case study of intelligent avionics systems, namely the landing gear system, and uses multi-agent systems to model each main component of the system as an intelligent agent. We also introduce the formalism of extended interpreted systems, which supports intelligence, autonomy, communication, input and output actions, predicate conditions, and post-conditions. The paper adopts the computation tree logic of conditional commitments to model communication among autonomous agents and trace its progress. The symbolic model checker of this logic is used to verify the system model, encoded in an extended input language, against coverage criteria and properties. Furthermore, we introduce a new testing methodology that: 1) follows a test-driven development approach; 2) performs unit testing, component testing, and system testing in each increment; and 3) uses model checking to automatically generate counterexamples and witness traces, which are interpreted into concrete test suites that achieve new coverage criteria. The experimental results showed the efficiency and scalability of the developed approach compared with a transformation-based technique. Finally, the computational complexity of the developed approach is analysed.
TL;DR: This work analyzes whether the existing standard definitions of unit and integration tests are still valid in modern software development contexts and suggests that the current property-based definitions may be replaced with usage-based definitions.
TL;DR: Pedal, so named because it supports the PEDAgogical goals of instructors and is an expandable Library of components motivated by these goals, comes with components for type inferencing, flow analysis, pattern matching, and unit testing.
Abstract: This paper describes Pedal, an innovative approach to the automated creation of feedback given to students in programming classes. Pedal is so named because it supports the PEDAgogical goals of instructors and is an expandable Library of components motivated by these goals. Pedal currently comes with components for type inferencing, flow analysis, pattern matching, and unit testing to provide an instructor with a rich set of resources to use in authoring and prioritizing feedback. The larger vision is a loosely coupled architecture whose components can be readily expanded or replaced. The Pedal library components are motivated by a study of contemporary automated feedback systems and our own experience. Pedal's components are described and examples are given of Pedal-based feedback from three different introductory classes at two different universities. The integration of Pedal into several programming and autograding environments is briefly described.
TL;DR: A new, pattern-based approach helps developers make the names of JUnit tests more descriptive by detecting non-descriptive test names and, in some cases, providing additional information about how a name can be improved.
Abstract: Unit tests are an important artifact that supports the software development process in several ways. For example, when a test fails, its name can provide the first step towards understanding the purpose of the test. Unfortunately, unit tests often lack descriptive names. In this paper, we propose a new, pattern-based approach that can help developers improve the quality of test names of JUnit tests by making them more descriptive. It does this by detecting non-descriptive test names and, in some cases, providing additional information about how the name can be improved. Our approach was assessed using an empirical evaluation on 34,352 JUnit tests. The results of the evaluation show that the approach is feasible, accurate, and useful in discriminating between descriptive and non-descriptive names, with a 95% true-positive rate.
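A pattern-based name checker can be as simple as a list of rules, each paired with a hint. The two rules in this Python sketch are hypothetical examples of such patterns (a name that is only "test" plus a number, and a name that merely repeats the focal method); the paper's actual pattern set is not reproduced here.

```python
import re
from typing import List, Optional


def check_test_name(test_name: str, focal_method: Optional[str] = None) -> List[str]:
    """Return hints on why a JUnit test name looks non-descriptive (empty list if none).

    Both rules are hypothetical examples of name patterns, not a published rule set.
    """
    hints = []
    # Rule 1: the name is just "test", optionally followed by digits (test, test1, Test42).
    if re.fullmatch(r"test\d*", test_name, flags=re.IGNORECASE):
        hints.append("name is only 'test' plus a number; describe the scenario instead")
    # Rule 2: the name only repeats the focal method (testAdd for add()).
    if focal_method and test_name.lower() == "test" + focal_method.lower():
        hints.append(
            f"name only repeats '{focal_method}'; add the input condition and expected outcome"
        )
    return hints


if __name__ == "__main__":
    for name in ["test1", "testAdd", "addReturnsSumOfTwoPositiveNumbers"]:
        print(name, check_test_name(name, focal_method="add"))
```

Returning hints rather than a bare yes/no corresponds to the abstract's point that the approach can sometimes explain how a name could be improved.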
TL;DR: This paper presents a mocking solution prototype for the OutSystems low-code development platform that removes dependencies on components from which the developer wants to isolate a test, for instance web services or other pieces of an application's logic.
Abstract: Unit testing is a core component of continuous integration and delivery, which in turn is key to faster and more frequent delivery of solutions to customers. Testing at the unit level allows program components to be tested in complete isolation, so these tests can be carried out more quickly, reducing troubleshooting time. To test at this level, however, dependencies between application components (e.g., a web service connection) need to be removed. There have been advances in mocking and stubbing techniques that remove these dependencies. However, these advances have been made for high-level programming languages, while low-code development technology has yet to take full advantage of these techniques. This paper presents a mocking solution prototype for the OutSystems low-code development platform. The proposed mocking mechanism removes dependencies on components from which the developer wants to isolate a test, for instance web services or other pieces of an application's logic.
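For comparison with the high-level-language techniques the abstract mentions, this is what removing a web-service dependency looks like with Python's standard `unittest.mock`. `WeatherService` and `clothing_advice` are hypothetical; the point is that the test replaces the remote call with a stub returning a fixed value, so the unit runs in isolation.

```python
import unittest
from unittest import mock


class WeatherService:
    """Hypothetical dependency that would call a remote web service."""

    def current_temperature(self, city: str) -> float:
        raise RuntimeError("no network access in unit tests")


def clothing_advice(service: WeatherService, city: str) -> str:
    """Unit under test: reaches the web service only through `service`."""
    return "coat" if service.current_temperature(city) < 10 else "t-shirt"


class ClothingAdviceTest(unittest.TestCase):
    def make_stub(self, temperature: float) -> mock.Mock:
        # spec= makes the stub reject attributes that WeatherService does not have.
        stub = mock.Mock(spec=WeatherService)
        stub.current_temperature.return_value = temperature
        return stub

    def test_cold_city_needs_coat(self):
        stub = self.make_stub(3.0)
        self.assertEqual(clothing_advice(stub, "Oslo"), "coat")
        stub.current_temperature.assert_called_once_with("Oslo")

    def test_warm_city_needs_t_shirt(self):
        self.assertEqual(clothing_advice(self.make_stub(25.0), "Lisbon"), "t-shirt")
```

Run it with `python -m unittest` on the module. Passing the dependency in as a parameter keeps the stub simple; `mock.patch` can replace it in place when the code under test creates the dependency itself.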
TL;DR: WebJShrink is a visual analytics tool for analyzing and pruning bloated software projects that provides rich visualizations of the bloat lurking within a target project's internal structure, and returns a safer, slimmer variant of the software project.
Abstract: As software projects grow in complexity, they come packaged with under-utilized libraries and therefore become bloated. Though several software debloating tools exist, none of them helps developers gain insight into how under-utilized those libraries are or build confidence in the behavior preservation of software after debloating. To bridge this gap, we developed WebJShrink, a visual analytics tool for analyzing and pruning bloated software projects. WebJShrink is built on JShrink which uses static and dynamic reachability analysis to determine the extent of software bloat. WebJShrink provides rich visualizations of the bloat lurking within a target project's internal structure. It then removes unused features, and returns a safer, slimmer variant of the software project. To illustrate the target project's behavior preservation, WebJShrink examines the debloated software with its JUnit tests and visualizes the test results. In evaluating WebJShrink against 26 real-world systems, we found WebJShrink could reduce software size by up to 42%, 11% on average, while still passing 100% of unit tests after debloating. We provide a video demonstrating WebJShrink at https://youtu.be/yzVzcd-MJ1w.
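The static half of a reachability-based debloater can be sketched as a graph search: methods not reachable from any entry point are removal candidates. The Python sketch below works on a hypothetical, hand-written call graph. JShrink analyzes real Java programs and also uses dynamic information, which helps with calls a purely static graph can miss, such as reflective ones.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def reachable_methods(call_graph: Dict[str, List[str]], entry_points: Iterable[str]) -> Set[str]:
    """Breadth-first search over caller -> callee edges, starting at the entry points."""
    seen = set(entry_points)
    queue = deque(seen)
    while queue:
        method = queue.popleft()
        for callee in call_graph.get(method, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def debloat_candidates(call_graph: Dict[str, List[str]], entry_points: Iterable[str]) -> Set[str]:
    """Methods that no entry point can reach: candidates for removal."""
    every_method = set(call_graph)
    for callees in call_graph.values():
        every_method.update(callees)
    return every_method - reachable_methods(call_graph, entry_points)


if __name__ == "__main__":
    graph = {
        "App.main": ["Lib.parse", "Lib.format"],
        "Lib.parse": ["Lib.tokenize"],
        "Lib.legacyExport": ["Lib.format"],
        "Lib.unusedHelper": [],
    }
    print(sorted(debloat_candidates(graph, ["App.main"])))
    # ['Lib.legacyExport', 'Lib.unusedHelper']
```

Running the existing test suite on the pruned program, as WebJShrink does, is a practical check that nothing the analysis considered unreachable was actually needed.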
TL;DR: This work finds which code is missing tests by identifying code entities that are not tested in the same way as other, similar entities, and shows how an entity with a missing test should be tested by leveraging the tests written for those similar entities.
Abstract: Because tests are important to the development process, developers need to know when a test suite is missing tests. Missing tests (tests that should be included in a test suite but are not) reduce the utility that developers can derive from a test suite. Currently, developers find missing tests by using coverage information such as line coverage or mutation coverage. However, coverage metrics are limited in their ability to reveal missing tests and show only what code needs to be tested, not how to test it. We present a method for finding missing tests that addresses the shortcomings of coverage metrics, based on the fact that similar code entities are often tested in the same way. We are able to find which code is missing tests by identifying code entities that are not tested in the same way as other similar entities. We then show how a code entity with a missing test should be tested by leveraging the tests written for those similar entities. Our results show that our approach offers several benefits over a coverage-based approach and is able to find missing tests in a range of software projects while generating few erroneous identifications of missing tests.
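A simplified reading of the similarity idea: represent each code entity by a set of descriptive tokens, and flag an untested entity when sufficiently similar entities do have tests, offering their tests as templates. The token sets, Jaccard similarity, and the 0.5 threshold in this Python sketch are all assumptions for illustration, and it only checks whether a test exists rather than whether entities are tested "in the same way".

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two token sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_missing_tests(
    features: Dict[str, Set[str]],
    tests_for: Dict[str, List[str]],
    threshold: float = 0.5,
) -> Dict[str, List[str]]:
    """Map each untested entity with tested look-alikes to those look-alikes' tests."""
    suggestions = {}
    for entity, tokens in features.items():
        if tests_for.get(entity):
            continue  # already tested
        templates = []
        for other, other_tokens in features.items():
            if other != entity and tests_for.get(other) and jaccard(tokens, other_tokens) >= threshold:
                templates.extend(tests_for[other])
        if templates:
            suggestions[entity] = templates
    return suggestions


if __name__ == "__main__":
    features = {
        "Parser.parseInt": {"parse", "int", "string", "number"},
        "Parser.parseLong": {"parse", "long", "string", "number"},
        "Logger.flush": {"flush", "buffer", "io"},
    }
    tests_for = {"Parser.parseInt": ["testParseIntRejectsLetters"]}
    print(find_missing_tests(features, tests_for))
    # {'Parser.parseLong': ['testParseIntRejectsLetters']}
```

Unlike a coverage report, which would list every uncovered entity, `Logger.flush` is not flagged here because no similar entity suggests how it should be tested.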
TL;DR: PANKTI is an approach that monitors applications as they execute in production and automatically generates unit tests from the collected production data; the generated tests are shown to improve the quality of the test suite of the application under consideration.
Abstract: In this paper, we propose to use production executions to improve the quality of testing for certain methods of interest for developers. These methods can be methods that are not covered by the existing test suite, or methods that are poorly tested. We devise an approach called PANKTI which monitors applications as they execute in production, and then automatically generates differential unit tests, as well as derived oracles, from the collected data. PANKTI's monitoring and generation focuses on one single programming language, Java. We evaluate it on three real-world, open-source projects: a videoconferencing system, a PDF manipulation library, and an e-commerce application. We show that PANKTI is able to generate differential unit tests by monitoring target methods in production, and that the generated tests improve the quality of the test suite of the application under consideration.
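The core of a differential unit test is that outputs observed in production become the expected values. The Python sketch below records calls with a decorator (a hypothetical stand-in for PANKTI's production monitoring of Java methods) and later replays the recorded inputs against a changed implementation; integer cents keep the comparisons exact. All names are illustrative.

```python
import functools
from typing import Any, Callable, List, Tuple

# (function name, positional args, keyword args, observed result)
Recorded = Tuple[str, tuple, dict, Any]
RECORDED_CALLS: List[Recorded] = []


def monitor(func: Callable) -> Callable:
    """Record each call and its result; a stand-in for production monitoring."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        RECORDED_CALLS.append((func.__name__, args, kwargs, result))
        return result
    return wrapper


def differential_check(new_impl: Callable, recorded: List[Recorded], name: str) -> list:
    """Replay recorded inputs of `name` on `new_impl`; return (args, kwargs, expected, actual) mismatches."""
    mismatches = []
    for func_name, args, kwargs, expected in recorded:
        if func_name != name:
            continue
        actual = new_impl(*args, **kwargs)
        if actual != expected:
            mismatches.append((args, kwargs, expected, actual))
    return mismatches


@monitor
def discount(price_cents: int, percent: int) -> int:
    """Original implementation, observed 'in production'."""
    return price_cents * (100 - percent) // 100


def discount_refactored(price_cents: int, percent: int) -> int:
    """A plausible-looking refactoring that rounds differently."""
    return price_cents - price_cents * percent // 100


if __name__ == "__main__":
    discount(1000, 10)  # simulated production traffic
    discount(999, 10)
    print(differential_check(discount_refactored, RECORDED_CALLS, "discount"))
    # [((999, 10), {}, 899, 900)]
```

The recorded outputs act as derived oracles: nobody wrote down that `discount(999, 10)` must be 899, yet the replay exposes the behavior change.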
TL;DR: This article presents a testing library called Advanced Program Organization Unit Testing Framework, written in the native language, built according to the unit testing paradigm, and supporting automated testing of simple and complex scenarios for IEC 61131-3-compliant PLCs.
Abstract: Programmable logic controllers (PLCs) are the most used digital systems in the manufacturing industry, but there is little support for test automation of such systems. As a net result, testing is mostly done manually or not at all, despite the recommendations of the IEC 61131-3 Standard. Attempts to provide an automated testing framework for PLCs have recently been made, with first successful results. The most advanced and promising framework proposes an approach close to object orientation that relies on a nonnative language and platform. In this article, we propose a testing library called Advanced Program Organization Unit Testing Framework, written in the native language, built according to the unit testing paradigm, and supporting automated testing of simple and complex scenarios for IEC 61131-3-compliant PLCs. We present this library, discuss its performance and advantages, and illustrate its application to a real case study.
TL;DR: In this article, the authors demonstrate the capabilities of information retrieval for prioritizing tests in dynamic programming languages using Python as an example, and conclude that lightweight IR-based prioritization strategies are effective tools to predict failing tests in the absence of coverage data.
Abstract: The practice of unit testing enables programmers to obtain automated feedback on whether a currently edited program is consistent with the expectations specified in test cases. Feedback is most valuable when it happens immediately, as defects can be corrected instantly before they become harder to fix. With growing and longer running test suites, however, feedback is obtained less frequently and lags behind program changes.
The objective of test prioritization is to rank tests so that defects, if present, are found as early as possible or at the lowest cost. While there are numerous static approaches that output a ranking of tests solely based on the current version of a program, we focus on change-based test prioritization, which recommends tests that are likely to fail in response to the most recent program change. The canonical approach relies on coverage data and prioritizes tests that cover the changed region, but obtaining and updating coverage data is costly. More recently, information retrieval techniques that exploit overlapping vocabulary between change and tests have proven to be powerful, yet lightweight.
In this work, we demonstrate the capabilities of information retrieval for prioritizing tests in dynamic programming languages using Python as an example. We discuss and measure previously understudied variation points, including how contextual information around a program change can be used, and design alternatives to the widespread TF-IDF retrieval model tailored to retrieving failing tests.
To obtain program changes with associated test failures, we designed a tool that generates a large set of faulty changes from version history along with their test results. Using this data set, we compared existing and new lexical prioritization strategies using four open-source Python projects, showing large improvements over untreated and random test orders and results consistent with related work in statically typed languages.
We conclude that lightweight IR-based prioritization strategies are effective tools to predict failing tests in the absence of coverage data or when static analysis is intractable, as in dynamic languages. This knowledge can benefit both individual programmers who rely on fast feedback and operators of continuous integration infrastructure, where resources can be freed sooner by detecting defects earlier in the build cycle.
TL;DR: This dissertation proposes an approach that automatically predicts whether a test would manifest performance regressions in a code commit, an approach to recovering field-representative workloads that can be used to detect performance regressions, and the use of execution logs generated by unit tests to predict performance regressions in load tests.
Abstract: Performance is an important aspect of software quality. Performance goals are typically defined by setting upper and lower bounds on a system's response time and throughput and on physical-level measurements such as CPU, memory and I/O. To meet such performance goals, several performance-related activities are needed in development (Dev) and operations (Ops). In fact, large software system failures are often due to performance issues rather than functional bugs. One of the most important performance issues is performance regression. Although not all performance regressions are bugs, they often have a direct impact on users' experience of the system. Detecting performance regressions in development and operations faces two challenges. First, detection is conducted after the fact, i.e., after the system is built and deployed in the field or in dedicated performance testing environments. Large amounts of resources are required to detect, locate, understand and fix performance regressions at such a late stage in the development cycle. Second, even if we can detect a performance regression, it is extremely hard to fix because other changes have been applied to the system since the regression was introduced. These challenges call for further in-depth analyses of performance regressions. In this dissertation, to prevent performance regressions from slipping into operation, we first perform an exploratory study of the source code changes that introduce performance regressions in order to understand their root causes at the source code level. Second, we propose an approach that automatically predicts whether a test would manifest performance regressions in a code commit. To help practitioners analyze system performance with operational data, we propose an approach for recovering field-representative workload that can be used to detect performance regressions. We also propose using execution logs generated by unit tests to predict performance regressions in load tests.
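The abstract does not say how a regression is flagged once response times have been measured for two versions. A minimal sketch, assuming a common non-parametric approach rather than the dissertation's own method, compares the two samples with Cliff's delta effect size. The function names, the sample data and the 0.474 "large effect" threshold are all assumptions for illustration.

```python
def cliffs_delta(baseline, candidate):
    """Cliff's delta effect size in [-1, 1]: the probability that a candidate
    sample is larger than a baseline sample minus the reverse probability."""
    greater = sum(1 for c in candidate for b in baseline if c > b)
    less = sum(1 for c in candidate for b in baseline if c < b)
    return (greater - less) / (len(baseline) * len(candidate))


def is_performance_regression(baseline_ms, candidate_ms, threshold=0.474):
    """Flag a regression when the candidate build's response times are larger
    than the baseline's with at least a 'large' effect (assumed threshold)."""
    return cliffs_delta(baseline_ms, candidate_ms) >= threshold


# Hypothetical response times (ms) of one test before and after a commit.
before = [100, 102, 98, 101, 99]
after = [120, 125, 118, 122, 119]
print(is_performance_regression(before, after))  # True: delta is 1.0
```

An effect size is used here instead of a difference of means because response-time samples are often skewed; in practice it is frequently paired with a significance test such as Mann-Whitney U.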
TL;DR: This research work describes a concrete use of property-based testing for quality assurance in the parameter learning algorithm of a probabilistic graphical model, and analyzes the necessity and effectiveness of this method compared to unit tests using concrete code examples for enhanced retraceability and interpretability.
Abstract: Code quality is a requirement for successful and sustainable software development. The emergence of Artificial Intelligence and data-driven Machine Learning in current applications makes customized solutions for both data and code quality a requirement. The diversity and stochastic nature of Machine Learning algorithms call for different test methods, each suited to a particular algorithm. Conventional unit tests in test-automation environments provide the common, well-studied approach to tackling code quality issues, but Machine Learning applications pose new challenges and have different requirements, mostly as far as numerical computations are concerned. In this research work, a concrete use of property-based testing for quality assurance in the parameter learning algorithm of a probabilistic graphical model is described. The necessity and effectiveness of this method in comparison to unit tests are analyzed with concrete code examples for enhanced retraceability and interpretability, which makes the work highly relevant to what is called explainable AI.
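To make the contrast with example-based unit tests concrete, here is a minimal sketch of a property-based test for parameter learning. It assumes a single discrete node with one parent, learned by counting with Laplace smoothing; the model, the names `learn_cpt` and `check_cpt_properties`, and the hand-rolled random input generator (standing in for a library such as Hypothesis) are all hypothetical and not taken from the paper.

```python
import random
from collections import Counter


def learn_cpt(samples, parent_values, child_values, alpha=1.0):
    """Estimate P(child | parent) from (parent, child) pairs with Laplace
    smoothing (alpha); returns {parent: {child: probability}}."""
    counts = Counter(samples)
    cpt = {}
    for p in parent_values:
        total = sum(counts[(p, c)] for c in child_values) + alpha * len(child_values)
        cpt[p] = {c: (counts[(p, c)] + alpha) / total for c in child_values}
    return cpt


def check_cpt_properties(trials=200, seed=0):
    """Property-based check: for many randomly generated data sets, every
    learned conditional distribution must lie in [0, 1] and sum to 1."""
    rng = random.Random(seed)
    parents, children = ["low", "high"], ["a", "b", "c"]
    for _ in range(trials):
        n = rng.randint(0, 50)  # includes the empty data set as an edge case
        data = [(rng.choice(parents), rng.choice(children)) for _ in range(n)]
        cpt = learn_cpt(data, parents, children)
        for p in parents:
            probs = cpt[p].values()
            assert all(0.0 <= x <= 1.0 for x in probs), (data, cpt)
            assert abs(sum(probs) - 1.0) < 1e-9, (data, cpt)
    return True


print(check_cpt_properties())  # True
```

Instead of asserting one hand-computed table, the test states invariants that must hold for every generated input, which is what makes the approach useful for numerical code whose exact outputs are hard to enumerate.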
TL;DR: TeSRS is a test recommendation system that can effectively assist test novices in learning unit testing and improve the capabilities (e.g., branch coverage rate and mutation coverage rate) of their test scripts.
Abstract: Software testing plays a crucial role in the software lifecycle. As a basic approach to software testing, unit testing is one of the necessary skills for software practitioners. Since testers must understand the inner code of the software under test (SUT) while writing a test case, they usually need to learn how to detect bugs within the SUT effectively. When novice programmers start learning to write unit tests, they generally watch video lessons or read unit tests written by others. These learning approaches are either time-consuming or too hard for a novice. To solve these problems, we developed a system named TeSRS to assist novice programmers in learning unit testing. TeSRS is a test recommendation system that can effectively assist test novices in learning unit testing. Using a program slicing technique, TeSRS has extracted a large number of test snippets from high-quality crowdsourced test scripts. Based on these test snippets, TeSRS provides novices with an easier way to learn unit testing. In summary, TeSRS can help test novices (1) obtain high-level design ideas for unit test cases and (2) improve the capabilities (e.g., branch coverage rate and mutation coverage rate) of their test scripts. TeSRS has built a scalable corpus of over 8000 test snippets from more than 25 test problems. Its stable performance shows its effectiveness in unit test learning. A demo video can be found at https://youtu.be/xvrLdvU8zFA
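The abstract does not describe how TeSRS slices test scripts. As a rough illustration of the general technique, the sketch below computes a backward data-dependence slice over straight-line test code, starting from an assertion, so that only the statements the assertion depends on remain as a snippet. The `Stmt` type, the hand-supplied defs/uses sets (a real tool would derive them from a parser), and treating a mutating call such as `s.push(1)` as both a use and a def of `s` are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stmt:
    text: str
    defs: frozenset  # variables assigned (or mutated) by the statement
    uses: frozenset  # variables read by the statement


def backward_slice(stmts, criterion_index):
    """Return, in original order, the statements that the statement at
    criterion_index (e.g. an assertion) depends on. Straight-line code only:
    no branches or loops are handled."""
    relevant = set(stmts[criterion_index].uses)
    kept = [criterion_index]
    for i in range(criterion_index - 1, -1, -1):
        if stmts[i].defs & relevant:      # statement defines something we need
            kept.append(i)
            relevant -= stmts[i].defs     # that definition is now satisfied
            relevant |= stmts[i].uses     # but its own inputs become relevant
    return [stmts[i].text for i in sorted(kept)]


# Hypothetical test script mixing two unrelated objects.
test = [
    Stmt("s = Stack()", frozenset({"s"}), frozenset()),
    Stmt("q = Queue()", frozenset({"q"}), frozenset()),
    Stmt("s.push(1)", frozenset({"s"}), frozenset({"s"})),
    Stmt("q.add(2)", frozenset({"q"}), frozenset({"q"})),
    Stmt("assert s.pop() == 1", frozenset(), frozenset({"s"})),
]
print(backward_slice(test, 4))
# ['s = Stack()', 's.push(1)', 'assert s.pop() == 1']
```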
TL;DR: A new test data generation approach based on reinforcement learning is proposed, which uses an analogy with a game in which a gamer (the test engineer) plays in an environment (a unit under test) and tries to achieve the highest possible reward (MC/DC coverage).
Abstract: Unit testing focused on the MC/DC criterion is essential in the development of safety-critical systems. However, designing test data that meet the MC/DC criterion requires detailed manual analysis of branching in units under test by test engineers. To deal with this problem, we propose a new test data generation approach based on reinforcement learning, which uses an analogy with a game in which a gamer (the test engineer) plays in an environment (a unit under test) and tries to achieve the highest possible reward (MC/DC coverage). We evaluated our approach at two different granularity levels, test suite and test case, and for two different action types allowed to the gamer, discrete and continuous action spaces. Preliminary results show that the proposed approach could solve the path explosion problem of symbolic approaches and achieves results at least comparable to current state-of-the-art search-based test data generation approaches.
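The reward in this game is MC/DC coverage, so it helps to see what that number measures. The sketch below computes unique-cause MC/DC coverage for one decision: a condition counts as covered when two test vectors differ only in that condition and produce different decision outcomes. The decision `a and (b or c)` and the test vectors are hypothetical, and how the paper turns coverage into a per-step reward is not specified; one plausible choice is the coverage gain after each action.

```python
from itertools import combinations


def mcdc_coverage(decision, n_conditions, test_vectors):
    """Fraction of conditions whose independent effect on the decision is
    demonstrated (unique-cause MC/DC): some pair of vectors differs only in
    that condition and yields different decision outcomes."""
    covered = set()
    for u, v in combinations(set(test_vectors), 2):
        diff = [i for i in range(n_conditions) if u[i] != v[i]]
        if len(diff) == 1 and decision(*u) != decision(*v):
            covered.add(diff[0])
    return len(covered) / n_conditions


def decision(a, b, c):  # hypothetical branch condition in a unit under test
    return a and (b or c)


T, F = True, False
suite = [(T, T, F), (F, T, F), (T, F, F), (T, F, T)]  # n + 1 vectors
print(mcdc_coverage(decision, 3, suite))  # 1.0
```

With only `(T, T, F)` and `(F, T, F)`, coverage would be 1/3, since only condition `a` has been shown to independently affect the outcome.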
TL;DR: By tracking HTTP request processing and object construction on the server, WebRTS can collect accurate test dependencies for each test in isolation, and it supports parallel regression testing of distributed Web applications.
Abstract: Regression testing is an expensive activity in software development. To speed it up, regression test selection (RTS) is a promising approach that selects the subset of tests affected by code changes. Although there are many regression test selection tools, most of them target unit tests, require a direct code dependency between tests and the code under test, and cannot be applied to Web applications to select end-to-end web tests. This paper presents WebRTS, a dynamic RTS tool for regression testing of Web applications. By tracking HTTP request processing and object construction on the server, WebRTS can collect accurate test dependencies for each test in isolation, and it supports parallel regression testing of distributed Web applications. The design of WebRTS is also flexible, and it can be combined with different web testing frameworks. The experimental results show that WebRTS is effective and can be used to select regression tests for Java Web applications. Video: https://youtu.be/OlAsvrX7HXc. Source code: https://gitlab.com/aozeliu18/webrts
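The selection step of a dynamic RTS tool can be sketched independently of how dependencies are traced. The example below assumes a server-side tracer has already logged `(test_id, class_name)` events while each end-to-end test ran in isolation; it groups them into per-test dependency sets and reruns only tests whose dependencies intersect the changed classes. All test and class names are hypothetical, and WebRTS's actual tracing of HTTP requests and object construction is not shown.

```python
from collections import defaultdict


def build_dependencies(trace_events):
    """Group (test_id, class_name) events, e.g. logged by a server-side tracer
    while each test runs in isolation, into per-test dependency sets."""
    deps = defaultdict(set)
    for test_id, class_name in trace_events:
        deps[test_id].add(class_name)
    return dict(deps)


def select_tests(dependencies, changed):
    """Dynamic regression test selection: rerun exactly those tests whose
    recorded dependencies intersect the set of changed classes."""
    changed = set(changed)
    return sorted(t for t, deps in dependencies.items() if deps & changed)


events = [
    ("LoginE2ETest", "LoginController"), ("LoginE2ETest", "UserService"),
    ("CartE2ETest", "CartController"), ("CartE2ETest", "CartService"),
    ("CheckoutE2ETest", "CartService"), ("CheckoutE2ETest", "PaymentService"),
]
deps = build_dependencies(events)
print(select_tests(deps, ["CartService"]))  # ['CartE2ETest', 'CheckoutE2ETest']
```

Because the dependencies are collected per test, the selected tests can also be distributed across machines and run in parallel.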
TL;DR: DCI detects behavioral changes in commits by generating variations of the existing test cases through assertion amplification and a search-based exploration of the input space, and it can be integrated into current development processes.
Abstract: When a developer pushes a change to an application’s codebase, a good practice is to have a test case specifying this behavioral change. Thanks to continuous integration (CI), the test is run on subsequent commits to check that they do not introduce a regression for that behavior. In this paper, we propose an approach that detects behavioral changes in commits. As input, it takes a program, its test suite, and a commit. Its output is a set of test methods that capture the behavioral difference between the pre-commit and post-commit versions of the program. We call our approach DCI (Detecting behavioral changes in CI). It works by generating variations of the existing test cases through (i) assertion amplification and (ii) a search-based exploration of the input space. We evaluate our approach on a curated set of 60 commits from 6 open source Java projects. To our knowledge, this is the first ever curated dataset of real-world behavioral changes. Our evaluation shows that DCI is able to generate test methods that detect behavioral changes. Our approach is fully automated and can be integrated into current development processes. The main limitations are that it targets unit tests and works on a relatively small fraction of commits. More specifically, DCI works on commits that have a unit test that already executes the modified code. In practice, in our benchmark projects, we found that 15.29% of commits meet the conditions required by DCI.
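The two ingredients named above can be illustrated on a toy program. In this sketch, assuming hypothetical `price_v1` (pre-commit) and `price_v2` (post-commit) functions, inputs from an existing test are mutated at random (a simple stand-in for DCI's search-based exploration), the pre-commit outputs are recorded as new assertions (assertion amplification), and the amplified tests that fail on the post-commit version are reported as capturing the behavioral change. DCI itself operates on Java unit tests; none of these names come from it.

```python
import random


def price_v1(qty, unit):  # hypothetical pre-commit version
    return qty * unit


def price_v2(qty, unit):  # hypothetical post-commit version: bulk discount
    total = qty * unit
    return total * 0.9 if qty >= 10 else total


def amplify(func_pre, seed_inputs, n_variants=20, rng_seed=0):
    """Explore inputs around the existing test's inputs and record the
    pre-commit output of each as the expected value of a new assertion."""
    rng = random.Random(rng_seed)
    inputs = list(seed_inputs)
    for _ in range(n_variants):
        qty, unit = rng.choice(seed_inputs)
        inputs.append((qty + rng.randint(-5, 15), unit))  # mutate a literal
    return [(args, func_pre(*args)) for args in inputs]


def detect_behavioral_change(amplified_tests, func_post):
    """Return the amplified tests whose recorded assertion fails on the
    post-commit version, i.e. tests that capture the behavioral change."""
    return [(args, expected) for args, expected in amplified_tests
            if func_post(*args) != expected]


# The existing test only calls price(2, 5.0), which the commit does not affect.
amplified = amplify(price_v1, [(2, 5.0)], n_variants=50)
for args, expected in detect_behavioral_change(amplified, price_v2):
    print(f"assert price{args} == {expected}  # fails after the commit")
```

Only inputs with `qty >= 10` reach the changed branch, which mirrors the abstract's condition that DCI needs an existing unit test that already executes the modified code and then explores around it.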
TL;DR: The proposed approach, PerfDetect, can automatically detect the root cause of a performance regression in less time than alternative performance detection approaches, and, owing to its reliance on source code analysis techniques, it detects performance regressions that other performance regression techniques missed.
TL;DR: In this paper, the authors present a developer-friendly unit testing guide organized as a catalog of 53 test cases for token authentication, representing unique combinations of 17 scenarios, 40 conditions, and 30 expected outcomes learned from the data set in their analysis.
Abstract: Authentication is a critical security feature for confirming the identity of a system's users, typically implemented with help from frameworks like Spring Security. It is a complex feature that should be robustly tested at all stages of development. Unit testing is an effective technique for fine-grained verification of feature behaviors, but it is not widely used to test authentication. Part of the problem is that resources to help developers unit test security features are limited. Most security testing guides recommend test cases from a "black box" or penetration testing perspective. These resources are not easily applicable to developers who are writing new unit tests or who want a security-focused perspective on coverage. In this paper, we address these issues by applying a grounded theory-based approach to identify common (unit) test cases for token authentication through analysis of 481 JUnit tests exercising Spring Security-based authentication implementations from 53 open source Java projects. The outcome of this study is a developer-friendly unit testing guide organized as a catalog of 53 test cases for token authentication, representing unique combinations of 17 scenarios, 40 conditions, and 30 expected outcomes learned from the data set in our analysis. We supplement the test guide with common test smells to avoid. To verify the accuracy and usefulness of our testing guide, we sought feedback from selected developers, some of whom authored unit tests in our dataset.
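To show what a "scenario, condition, expected outcome" test case looks like in practice, here is a language-neutral sketch in Python rather than the paper's JUnit and Spring Security setting. It assumes a hypothetical minimal HMAC-signed token (`issue_token`, `authenticate`, `SECRET` are illustrative and this is not a JWT or Spring Security implementation); each test names one scenario (a request with a token), one condition (valid, expired, tampered, malformed) and one expected outcome (accepted or rejected). These four cases are examples, not entries from the paper's catalog.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"test-only-secret"  # hypothetical signing key for this sketch


def _b64(data):
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def issue_token(user, ttl_s, now=None):
    """Create a minimal 'payload.signature' token signed with HMAC-SHA256."""
    now = time.time() if now is None else now
    payload = _b64(json.dumps({"sub": user, "exp": now + ttl_s}).encode())
    sig = _b64(hmac.new(SECRET, payload.encode(), hashlib.sha256).digest())
    return f"{payload}.{sig}"


def authenticate(token, now=None):
    """Return the user name for a valid, unexpired token, else None."""
    now = time.time() if now is None else now
    try:
        payload, sig = token.split(".")
    except (AttributeError, ValueError):  # not a string, or wrong shape
        return None
    expected = _b64(hmac.new(SECRET, payload.encode(), hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):  # constant-time comparison
        return None
    claims = json.loads(base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4)))
    return claims["sub"] if claims["exp"] > now else None


# Scenario: a request carries a token. Condition -> expected outcome.
def test_valid_token_is_accepted():
    assert authenticate(issue_token("alice", 60, now=0), now=30) == "alice"


def test_expired_token_is_rejected():
    assert authenticate(issue_token("alice", 60, now=0), now=61) is None


def test_tampered_signature_is_rejected():
    payload, sig = issue_token("alice", 60, now=0).split(".")
    forged = payload + "." + ("A" if sig[0] != "A" else "B") + sig[1:]
    assert authenticate(forged, now=30) is None


def test_malformed_token_is_rejected():
    assert authenticate("not-a-token", now=0) is None


if __name__ == "__main__":
    for case in (test_valid_token_is_accepted, test_expired_token_is_rejected,
                 test_tampered_signature_is_rejected, test_malformed_token_is_rejected):
        case()
    print("all token authentication tests passed")
```

Passing `now` explicitly keeps the expiry tests deterministic, which avoids the time-dependent flakiness that often shows up as a test smell in authentication tests.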