TL;DR: A study has been made of the software errors committed during development of an interactive special-purpose editor system, developed for commercial production use, and a new fault categorization scheme was developed and used to classify the 173 faults that resulted from the project's errors.
Abstract: A study has been made of the software errors committed during development of an interactive special-purpose editor system. This product, developed for commercial production use, has been followed during nine months of coding, unit testing, function testing, and system testing. Detected problems and their fixes have been described by testers and debuggers. A new fault categorization scheme was developed from these descriptions and used to classify the 173 faults that resulted from the project's errors. For each error, we asked the programmers to select its most likely cause, report the stages of the software development cycle in which the error was committed and the problem first noticed, and the circumstances of the problem's detection and isolation, including time required, techniques tried, and successful techniques. The results collected in this study are compared to results from earlier studies, and similarities and differences are noted.
TL;DR: Nighthawk is described, a system which uses a genetic algorithm (GA) to find parameters for randomized unit testing that optimize test coverage; the results suggest that feature subset selection (FSS) could significantly optimize metaheuristic search-based software engineering tools.
Abstract: Randomized testing is an effective method for testing software units. The thoroughness of randomized unit testing varies widely according to the settings of certain parameters, such as the relative frequencies with which methods are called. In this paper, we describe Nighthawk, a system which uses a genetic algorithm (GA) to find parameters for randomized unit testing that optimize test coverage. Designing GAs is somewhat of a black art. We therefore use a feature subset selection (FSS) tool to assess the size and content of the representations within the GA. Using that tool, we can reduce the size of the representation substantially while still achieving most of the coverage found using the full representation. Our reduced GA achieves almost the same results as the full system, but in only 10 percent of the time. These results suggest that FSS could significantly optimize metaheuristic search-based software engineering tools.
TL;DR: Evaluated on five open source libraries, the generated parameterized unit tests are more expressive, characterizing general rather than concrete behavior; need fewer computation steps, making them easier to understand; and achieve a higher coverage than regular unit tests.
Abstract: State-of-the-art techniques for automated test generation focus on generating executions that cover program behavior. As they do not generate oracles, it is up to the developer to figure out what a test does and how to check the correctness of the observed behavior. In this paper, we present an approach to generate parameterized unit tests: unit tests containing symbolic pre- and postconditions characterizing test input and test result. Starting from concrete inputs and results, we use test generation and mutation to systematically generalize pre- and postconditions while simplifying the computation steps. Evaluated on five open source libraries, the generated parameterized unit tests (a) are more expressive, characterizing general rather than concrete behavior; (b) need fewer computation steps, making them easier to understand; and (c) achieve a higher coverage than regular unit tests.
TL;DR: JUnit in Action, Second Edition summarizes many related open-source tools, offering a mature view of the unit testing field including strategies for EJB, database, and web applications.
Abstract: HIGHLIGHT Updated and revised edition of a Manning classic and the only in-depth book on JUnit. Explains modern unit testing principles and the latest features in JUnit 4.5. DESCRIPTION Unit testing during software development, done properly, can mean the difference between a project's success and failure. JUnit in Action, Second Edition is an up-to-date guide to unit testing Java and Java EE applications using the popular JUnit framework and its extensions. Revised and updated from the best-selling original, the book provides techniques to help readers exploit JUnit 4.5. JUnit in Action, Second Edition summarizes many related open-source tools, offering a mature view of the unit testing field including strategies for EJB, database, and web applications. With real-world examples throughout, the authors demonstrate how to incorporate open source frameworks with JUnit, and explain test-driven development and other best practices for modern unit testing. KEY POINTS * Strong early demand through Manning's Early Access program (MEAP) * Covers latest JUnit 4.5 features including annotations, exception handling and assertion methods * Concise and developer-centric "In Action" style * Examples with AJAX applications, mock testing, test automation and more
TL;DR: Taking a single failing run, this work records and minimizes the interaction between objects to the set of calls relevant for the failure, resulting in a minimal unit test that faithfully reproduces the failure at will.
Abstract: A program fails. What now? Taking a single failing run, we record and minimize the interaction between objects to the set of calls relevant for the failure. The result is a minimal unit test that faithfully reproduces the failure at will: "Out of these 14,628 calls, only 2 are required". In a study of 17 real-life bugs, our JINSI prototype reduced the search space to 13.7% of the dynamic slice or 0.22% of the source code, with only 1--12 calls left to examine.
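The idea of shrinking a recorded call sequence to the calls that matter can be illustrated with a small sketch. This is not the JINSI algorithm; it is a simple greedy one-call-at-a-time minimizer, and the `Call` record, the `replay_fails` function, and the list-backed stack it replays against are all hypothetical names introduced only for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical record of one intercepted call: (method name, arguments).
Call = Tuple[str, tuple]


def minimize_calls(calls: Sequence[Call],
                   fails: Callable[[List[Call]], bool]) -> List[Call]:
    """Greedily drop calls one at a time while the failure still reproduces."""
    current = list(calls)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if fails(candidate):
                # The failure survives without call i, so call i is irrelevant.
                current = candidate
                changed = True
                break
    return current


def replay_fails(calls: List[Call]) -> bool:
    """Hypothetical replay: a list-backed stack fails when pop() hits an empty stack."""
    stack: list = []
    try:
        for name, args in calls:
            if name == "push":
                stack.append(*args)
            elif name == "pop":
                stack.pop()
            elif name == "clear":
                stack.clear()
    except IndexError:
        return True
    return False


recorded = [("push", (1,)), ("push", (2,)), ("pop", ()), ("clear", ()),
            ("push", (3,)), ("pop", ()), ("pop", ())]
minimal = minimize_calls(recorded, replay_fails)
print(minimal)
```

The result is 1-minimal in the sense that removing any single remaining call makes the failure disappear. Each candidate costs one replay, so a real tool would need a cheaper search strategy for long traces.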
TL;DR: An automatic technique for generating maintainable regression unit tests for programs; in experiments on an industrial system, the generated tests achieved good coverage and mutation kill score, were readable by the product's developers, and required few edits as the system under test evolved.
Abstract: This paper presents an automatic technique for generating maintainable regression unit tests for programs. We found previous test generation techniques inadequate for two main reasons. First, they were designed for and evaluated upon libraries rather than applications. Second, they were designed to find bugs rather than to create maintainable regression test suites: the test suites that they generated were brittle and hard to understand. This paper presents a suite of techniques that address these problems by enhancing an existing unit test generation system. In experiments using an industrial system, the generated tests achieved good coverage and mutation kill score, were readable by the product's developers, and required few edits as the system under test evolved. While our evaluation is in the context of one test generator, we are aware of many research systems that suffer similar limitations, so our approach and observations are more generally relevant.
TL;DR: A novel automated solution to this problem, based on dynamic slicing and conceptual coupling, is presented, and the resulting tool, SCOTCH, identifies traceability links between unit test classes and tested classes with a high accuracy and greater stability than existing techniques.
Abstract: Maintaining traceability links between unit tests and tested classes is an important factor for effectively managing the development and evolution of software systems. Exploiting traceability links helps in program comprehension and maintenance by ensuring consistency between unit tests and tested classes during maintenance activities. Unfortunately, it is often the case that such links are not explicitly maintained and thus they have to be recovered manually during software evolution. A novel automated solution to this problem, based on dynamic slicing and conceptual coupling, is presented. The resulting tool, SCOTCH (Slicing and Coupling based Test to Code trace Hunter), is empirically evaluated on three systems: an open source system and two industrial systems. The results indicate that SCOTCH identifies traceability links between unit test classes and tested classes with a high accuracy and greater stability than existing techniques, highlighting its potential usefulness as a feature within a software development environment.
TL;DR: This paper investigates whether it is possible to improve a program's testability using an automated refactoring approach, and creates a small application that scores poorly using a proven cohesion metric, LSCC.
Abstract: Current software practice places a strong emphasis on unit testing, to the extent that the amount of test code produced on a project can exceed the amount of actual application code required. This illustrates the importance of testability as a feature of software. In this paper we investigate whether it is possible to improve a program's testability using an automated refactoring approach. We conduct a quasi-experiment where we create a small application that scores poorly using a proven cohesion metric, LSCC. Using our automated refactoring platform, Code-Imp, this application is automatically refactored using the LSCC metric to guide the search for better solutions. To evaluate the results, a number of industrial software engineers were asked to write test cases for the application both before and after refactoring and compare the relative difficulty involved. The results were interesting though inconclusive, and suggest that further work is required.
TL;DR: In this article, a generic procedure used in test intensive industries for service simulation testing is outlined and applied to wave tank mooring tests, which can assist marine energy stakeholders in obtaining evidence of component reliability under simulated operational conditions much more rapidly than can be achieved with prototypes under normal service conditions.
TL;DR: This paper reports the experience in applying Blast and CBMC to testing the components of a storage platform software for flash memory, and analyzes the strong and weak points of two different software model checking technologies from the viewpoint of real-world industrial application: counterexample-guided abstraction refinement with predicate abstraction and SAT-based bounded analysis.
Abstract: Conventional testing methods often fail to detect hidden flaws in complex embedded software such as device drivers or file systems. This deficiency incurs significant development and support/maintenance cost for the manufacturers. Model checking techniques have been proposed to compensate for the weaknesses of conventional testing methods through exhaustive analyses. Whereas conventional model checkers require manual effort to create an abstract target model, modern software model checkers remove this overhead by directly analyzing a target C program, and can be utilized as unit testing tools. However, since software model checkers are not fully mature yet, they have limitations according to the underlying technologies and tool implementations, potentially critical issues when applied in industrial projects. This paper reports our experience in applying Blast and CBMC to testing the components of a storage platform software for flash memory. Through this project, we analyzed the strong and weak points of two different software model checking technologies from the viewpoint of real-world industrial application: counterexample-guided abstraction refinement with predicate abstraction and SAT-based bounded analysis.
TL;DR: In this paper, the authors conduct a qualitative (grounded theory) study, in which they interview 25 senior practitioners about how they test plugin applications based on the Eclipse plug-in architecture.
Abstract: Testing plug-in-based systems is challenging due to complex interactions among many different plug-ins, and variations in version and configuration. The objective of this paper is to increase our understanding of what testers and developers think and do when it comes to testing plug-in-based systems. To that end, we conduct a qualitative (grounded theory) study, in which we interview 25 senior practitioners about how they test plug-in applications based on the Eclipse plug-in architecture. The outcome is an overview of the testing practices currently used, a set of identified barriers limiting test adoption, and an explanation of how limited testing is compensated by self-hosting of projects and by involving the community. These results are supported by a structured survey of more than 150 professionals. The study reveals that unit testing plays a key role, whereas plug-in specific integration problems are identified and resolved by the community. Based on our findings, we propose a series of recommendations and areas for future research. Accepted paper for the 34th International Conference on Software Engineering, Zurich, Switzerland, 2-9 June 2012.
TL;DR: This paper conducts an empirical study to investigate whether existing CUTs can be retrofitted as PUTs with feasible effort and achieve the benefits of PUTs in terms of additional fault-detection capability and code coverage, and proposes a methodology, called test generalization, that helps in systematically retrofitting existing CUTs as PUTs.
Abstract: Recent advances in software testing introduced parameterized unit tests (PUT), which accept parameters, unlike conventional unit tests (CUT), which do not accept parameters. PUTs are more beneficial than CUTs with regards to fault-detection capability, since PUTs help describe the behaviors of methods under test for all test arguments. In general, existing applications often include manually written CUTs. With the existence of these CUTs, natural questions that arise are whether these CUTs can be retrofitted as PUTs to leverage the benefits of PUTs, and what are the cost and benefits involved in retrofitting CUTs as PUTs. To address these questions, in this paper, we conduct an empirical study to investigate whether existing CUTs can be retrofitted as PUTs with feasible effort and achieve the benefits of PUTs in terms of additional fault-detection capability and code coverage. We also propose a methodology, called test generalization, that helps in systematically retrofitting existing CUTs as PUTs. Our results on three real-world open-source applications (≈ 4.6 KLOC) show that the retrofitted PUTs detect 19 new defects that are not detected by existing CUTs, and also increase branch coverage by 4% on average (with maximum increase of 52% for one class under test and 10% for one application under analysis) with feasible effort.
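The difference between a conventional unit test (CUT) and a parameterized unit test (PUT) can be shown with a minimal sketch. It is written in Python rather than in the setting used by the study, and it is only an illustration of the concept: the function names are hypothetical, and a seeded random generator stands in for whatever tool would normally supply the PUT's arguments.

```python
import random
from collections import Counter
from typing import List


def test_sort_conventional() -> None:
    """CUT: one fixed input and one hard-coded expected output."""
    assert sorted([3, 1, 2]) == [1, 2, 3]


def put_sort(xs: List[int]) -> None:
    """PUT: properties that must hold for every argument, not just one example."""
    out = sorted(xs)
    # The output is in non-decreasing order.
    assert all(a <= b for a, b in zip(out, out[1:]))
    # The output is a permutation of the input.
    assert Counter(out) == Counter(xs)


def run_put(trials: int = 100, seed: int = 0) -> int:
    """Stand-in argument generator (an assumption): feed random lists to the PUT."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-50, 50) for _ in range(rng.randint(0, 8))]
        put_sort(xs)
    return trials


test_sort_conventional()
print(run_put(), "generated inputs satisfied the parameterized test")
```

Generalizing the CUT meant replacing the concrete list with a parameter and the concrete expected value with assertions that describe the behavior in general.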
TL;DR: An empirical study is designed using data collected from two Java software systems for which JUnit test cases exist to explore empirically the capability of the model to assess testability of classes at the code level.
Abstract: We present, in this paper, a metric based testability model for object-oriented programs. The model is, in fact, an adaptation of a model proposed in literature for assessing the testability of object-oriented design. The study presented in this paper aims at exploring empirically the capability of the model to assess testability of classes at the code level. We investigate testability from the perspective of unit testing and required testing effort. We designed an empirical study using data collected from two Java software systems for which JUnit test cases exist. To capture testability of classes in terms of required testing effort, we used different metrics to quantify the corresponding JUnit test cases. In order to evaluate the capability of the model to predict testability of classes (characteristics of corresponding test classes), we used statistical tests using correlation.
TL;DR: It is found that testing components in isolation will probably reduce the effort required on testing the whole product line compared to testing each product that is delivered to a customer from scratch, to achieve the same defect detection level.
Abstract: We should employ a strategy to test product lines. Testing products individually is redundant for product lines since the products share a considerable amount of code. This document presents a survey of product line testing strategies based on the best available empirics out there. We found that, particularly, a testing strategy backed up with empirics is reusable component testing. Based on the empirics, we found that testing components in isolation will probably reduce the effort required on testing the whole product line compared to testing each product that is delivered to a customer from scratch, to achieve the same defect detection level.
TL;DR: Ciao as discussed by the authors is a multi-paradigm programming system that supports logic programming, including Prolog, and provides the programmer with a large number of useful features from different programming paradigms and styles.
Abstract: We provide an overall description of the Ciao multiparadigm programming system emphasizing some of the novel aspects and motivations behind its design and implementation. An important aspect of Ciao is that, in addition to supporting logic programming (and, in particular, Prolog), it provides the programmer with a large number of useful features from different programming paradigms and styles, and that the use of each of these features (including those of Prolog) can be turned on and off at will for each program module. Thus, a given module may be using, e.g., higher order functions and constraints, while another module may be using assignment, predicates, Prolog meta-programming, and concurrency. Furthermore, the language is designed to be extensible in a simple and modular way. Another important aspect of Ciao is its programming environment, which provides a powerful preprocessor (with an associated assertion language) capable of statically finding non-trivial bugs, verifying that programs comply with specifications, and performing many types of optimizations (including automatic parallelization). Such optimizations produce code that is highly competitive with other dynamic languages or, with the (experimental) optimizing compiler, even that of static languages, all while retaining the flexibility and interactive development of a dynamic language. This compilation architecture supports modularity and separate compilation throughout. The environment also includes a powerful auto-documenter and a unit testing framework, both closely integrated with the assertion system. The paper provides an informal overview of the language and program development environment. It aims at illustrating the design philosophy rather than at being exhaustive, which would be impossible in a single journal paper, pointing instead to previous Ciao literature.
TL;DR: This paper investigates the adaptation of TDD‐like practices for already‐implemented code, in particular legacy systems, and presents a TDM approach that assists software development and testing managers to use the limited resources they have for testing legacy systems efficiently.
TL;DR: In this article, the integration of unit tests into a first semester programming course was discussed and a questionnaire was completed by the student cohort about their use and perceptions of these unit tests.
Abstract: This paper discusses the integration of unit tests into a first semester programming course. The students were supplied with unit tests to support their learning and assessments. A questionnaire was completed by the student cohort about their use and perceptions of these unit tests. Drawing on both the students' experiences and our own, we examine the advantages and disadvantages of introducing unit tests early and make some pedagogical recommendations for the introduction and use of unit tests in first year programming.
TL;DR: A language for specifying and running unit tests on ASP programs is presented; it is implemented in ASPIDE, a comprehensive IDE for ASP, which supports the entire life cycle of ASP development with a collection of user-friendly graphical tools for program composition, testing, debugging, profiling, solver execution configuration, and output handling.
Abstract: Answer Set Programming (ASP) is a declarative logic programming formalism, which is employed nowadays in both academic and industrial real-world applications. Although some tools for supporting the development of ASP programs have been proposed in the last few years, the crucial task of testing ASP programs received less attention, and is an Achilles' heel of the available programming environments.
In this paper we present a language for specifying and running unit tests on ASP programs. The testing language has been implemented in ASPIDE, a comprehensive IDE for ASP, which supports the entire life-cycle of ASP development with a collection of user-friendly graphical tools for program composition, testing, debugging, profiling, solver execution configuration, and output-handling.
TL;DR: The AutoTest unit testing framework as discussed by the authors automates both test oracles and test cases, two activities which are too tedious to be effectively performed by humans, yet for the most part remain manual.
Abstract: Effective testing involves preparing test oracles and test cases, two activities which are too tedious to be effectively performed by humans, yet for the most part remain manual. The AutoTest unit testing framework automates both, by using Eiffel contracts -- already present in the software -- as test oracles, and generating objects and routine arguments to exercise all given classes; manual tests can also be added, and all failed test cases are automatically retained for regression testing, in a "minimized" form retaining only the relevant instructions. AutoTest has already detected numerous hitherto unknown bugs in production software.
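Using contracts as test oracles, with generated arguments as test cases, can be sketched as follows. This is a Python illustration under stated assumptions, not AutoTest and not Eiffel: the pre- and postcondition are plain functions, the argument generator is a seeded random number generator, and `buggy_abs_diff` is a hypothetical routine with a deliberate bug so that the oracle has something to catch.

```python
import random
from typing import Callable, List, Tuple


def buggy_abs_diff(a: int, b: int) -> int:
    """Intended to return |a - b|; the missing abs() is a deliberate bug."""
    return a - b


def precondition(a: int, b: int) -> bool:
    """Contract: any pair of integers is a valid input."""
    return True


def postcondition(a: int, b: int, result: int) -> bool:
    """Contract: the result is non-negative and equals one of the two differences."""
    return result >= 0 and result in (a - b, b - a)


def random_contract_test(func: Callable[[int, int], int],
                         pre: Callable[[int, int], bool],
                         post: Callable[[int, int, int], bool],
                         trials: int = 200,
                         seed: int = 0) -> List[Tuple[int, int, int]]:
    """Generate arguments, skip invalid ones, and let the postcondition judge each call."""
    rng = random.Random(seed)
    failures = []
    for _ in range(trials):
        a, b = rng.randint(-10, 10), rng.randint(-10, 10)
        if not pre(a, b):
            continue  # an input outside the contract is not a test case
        result = func(a, b)
        if not post(a, b, result):
            failures.append((a, b, result))  # candidates to keep for regression
    return failures


failures = random_contract_test(buggy_abs_diff, precondition, postcondition)
print(len(failures), "contract violations found")
```

No expected outputs are written by hand here: the postcondition decides pass or fail for every generated input, which is what lets both halves of the testing activity be automated.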
TL;DR: In this article, a system and a method may generate unit tests for source code concurrently with API documentation, where the system and method may parse the source code file to determine a source code function type corresponding to the unit description, and then copy the description to a unit test stub corresponding to a function type.
Abstract: A system and method may generate unit tests for source code concurrently with API documentation. The system may receive a source code file including several comments sections. Each comments section may include a description of a source code unit such as a class, method, member variable, etc. The description may also correspond to input and output parameters of the source code unit. The system and method may parse the source code file to determine a source code function type corresponding to the unit description and copy the unit description to a unit test stub corresponding to the function type. A developer or another module may then complete the unit test stub to transform each stub into a complete unit test corresponding to the source code unit. Additionally, the system and method may execute the unit test and generate a test result indication for each unit test.
TL;DR: In this paper, the authors present a language for specifying and running unit tests on ASP programs, which is implemented in ASPIDE, a comprehensive IDE for ASP, which supports the entire life cycle of ASP development with a collection of user-friendly graphical tools for program composition, testing, debugging, profiling, solver execution configuration, and output handling.
Abstract: Answer Set Programming (ASP) is a declarative logic programming formalism, which is employed nowadays in both academic and industrial real-world applications. Although some tools for supporting the development of ASP programs have been proposed in the last few years, the crucial task of testing ASP programs received less attention and it is an Achilles' heel of the available programming environments. In this paper we present a language for specifying and running unit tests on ASP programs. The testing language was implemented in ASPIDE, a comprehensive IDE for ASP, which supports the entire life cycle of ASP development with a collection of user-friendly graphical tools for program composition, testing, debugging, profiling, solver execution configuration, and output handling.
TL;DR: Infandango is an open source web-based system for automated grading of Java code submitted by students; students gain near-instant feedback on the correctness of their code, and instructors are able to monitor the progress of students in the class.
Abstract: Infandango is an open source web-based system for automated grading of Java code submitted by students. Uploaded Java files are compiled and run against a set of unit tests on a central server, with results being stored in a database. Students gain near-instant feedback on the correctness of their code, and instructors are able to monitor the progress of students in the class.
TL;DR: A novel framework for data size-dependent, static resource usage verification, which generalizes the checking of assertions to allow preconditions expressing intervals within which the input data size of a program is supposed to lie, and extends the class of resource usage functions that can be checked.
Abstract: In an increasing number of applications (e.g., in embedded, real-time, or mobile systems) it is important or even essential to ensure conformance with respect to a specification expressing the use of some resource, such as execution time, energy, or user-defined resources. In previous work we have presented a novel framework for data size-dependent, static resource usage verification (which can also be combined with run-time tests). Specifications can include both lower and upper bound resource usage functions. In order to statically check such specifications, both upper- and lower-bound resource usage functions (on input data sizes) approximating the actual resource usage of the program are automatically inferred and compared against the specification. The outcome of the static checking of assertions can express intervals for the input data sizes such that a given specification can be proved for some intervals but disproved for others. After an overview of the approach, in this paper we provide a number of novel contributions: we present a more complete formalization and we report on and provide results from an implementation within the Ciao/CiaoPP framework (which provides a general, unified platform for static and run-time verification, as well as unit testing). We also generalize the checking of assertions to allow preconditions expressing intervals within which the input data size of a program is supposed to lie (i.e., intervals for which each assertion is applicable), and we extend the class of resource usage functions that can be checked.
TL;DR: The result shows that using the JUnit framework in unit testing learning can improve student interest and understanding in software engineering courses.
Abstract: In the software engineering discipline, unit testing plays a significant role in the testing procedure to determine whether the source code is fit for use. A unit is the smallest testable part of an application. For basic learning of unit testing, the JUnit framework with interactive GUI techniques is a suitable tool for supporting student learning in higher learning institutes. This paper presents a method to create unit testing code and shows how JUnit can give a good experience in unit testing learning in software engineering. The result shows that using the JUnit framework in unit testing learning can improve student interest and understanding in software engineering courses.
TL;DR: An approach that non-intrusively integrates the use of software testing tools in SE courses is described; it centers on a Web-Based Repository of Software Testing Tools (WReSTT) that contains tutorials on testing concepts and testing tools.
Abstract: One of the main concerns in the software industry continues to be the development of high quality software. This concern will be exacerbated as software systems become more complex. The training of software developers continues to grow in academia since more institutions are offering software engineering (SE) courses. However, the list of topics that are expected to be covered in this course leaves little or no time for topics that focus on developing quality software, such as software testing and the use of testing tools. In this paper we describe an approach that non-intrusively integrates the use of software testing tools in SE courses. The cornerstone of our approach is the interaction students have with a Web-Based Repository of Software Testing Tools (WReSTT) that contains tutorials on testing concepts and testing tools. WReSTT employs both collaborative learning and social networking features that are attractive to students. We present the results of a preliminary study performed in two SE courses that show how using the resources in WReSTT can potentially impact the students' understanding of software testing and the use of testing tools.
TL;DR: In this article, an approach to unit testing of plan-based agent systems, with a focus on automated generation and execution of test cases, is described, where test cases are generated from design artefacts, supplemented with some additional data.
Abstract: This paper describes an approach to unit testing of plan-based agent systems, with a focus on automated generation and execution of test cases. Design artefacts, supplemented with some additional data, provide the basis for specification of a comprehensive suite of test cases. Correctness of execution is evaluated against a design model, and a comprehensive report of errors and warnings is provided to the user. Given that it is impossible to design test suites which execute all possible traces of an agent program, it is extremely important to thoroughly test all units in as wide a variety of situations as possible to ensure acceptable behaviour. We provide details of the information required in design models or related data to enable the automated generation and execution of test cases. We also briefly describe the implemented tool which realises this approach.
TL;DR: In this article, the authors propose a method for unit testing of a software module, which provides for reading, by a computer, target data and discovering of functional aspects of a piece of software code, dividing the target data into chunks, estimating a plurality of decision/condition statements of the software code.
Abstract: In a method and apparatus of performing unit testing of a software module, the method provides for reading, by a computer, target data and discovering of functional aspects of a piece of software code, dividing the target data into chunks, estimating a plurality of decision/condition statements of the software code, estimating an amount of possible test cases based on the program inputs, defining a data set over the plurality of identified decisions/conditions, finding subset relationships between all the defined data sets, defining a plurality of optimal data sets, classifying the condition of the plurality of optimal data sets by category, refining the plurality of optimal data sets, and calculating the best amount of data sets.
TL;DR: The approach first uses dynamic analysis to infer a call sequence model from a sample execution, then uses static analysis to identify method dependence relations based on the fields they may read or write, and guides a random test generator to create legal and behaviorally-diverse tests.
Abstract: In object-oriented programs, a unit test often consists of a sequence of method calls that create and mutate objects. It is challenging to automatically generate sequences that are legal and behaviorally-diverse, that is, reaching as many different program states as possible. This paper proposes a combined static and dynamic test generation approach to address these problems, for code without a formal specification. Our approach first uses dynamic analysis to infer a call sequence model from a sample execution, then uses static analysis to identify method dependence relations based on the fields they may read or write. Finally, both the dynamically-inferred model (which tends to be accurate but incomplete) and the statically-identified dependence information (which tends to be conservative) guide a random test generator to create legal and behaviorally-diverse tests. Our Palus tool implements this approach. We compared it with a pure random approach, a dynamic-random approach (without a static phase), and a static-random approach (without a dynamic phase) on six popular open-source Java programs. Tests generated by Palus achieved 35% higher structural coverage on average. Palus is also internally used in Google, and has found 22 new bugs in four well-tested products.
TL;DR: This work presents an algorithmic procedure to compute optimal test times based on the column generation technique, and illustrates it with numerical examples.
Abstract: We consider the component testing problem of a device that is designed to perform a mission consisting of a random sequence of phases with random durations. Testing is done at the component level to attain desired levels of mission reliability at minimum cost. The components fail exponentially where the failure rate depends on the phase of the mission. The reliability structure of the device involves a series connection of nonidentical components with different failure characteristics. The optimal component testing problem is formulated as a semi-infinite linear program. We present an algorithmic procedure to compute optimal test times based on the column generation technique, and illustrate it with numerical examples.
TL;DR: This thesis addresses the challenges of existing GUI testing methods and provides a unified solution to GUI testing automation; its contributions are the proposed Graphic User Interface Testing Automation Model (GUITAM), the development of GUI Defect Classification, and the proposal of the Long Use Case Closure Envelope Model.
Abstract: A Graphical User Interface (GUI) is the most widely used method whereby information systems interact with users. According to ACM Computing Surveys, on average, more than 45% of software code in a software application is dedicated to the GUI. However, GUI testing is extremely expensive. In unit testing, 10,000 cases can often be automatically tested within a minute whereas, in GUI testing, 10,000 simple GUI test cases need more than 10 hours to complete. This thesis effectively addresses the challenges of existing GUI testing methods and provides a unified solution to GUI testing automation. The three main contributions of this thesis are the proposal of the Graphic User Interface Testing Automation Model (GUITAM), the development of GUI Defect Classification and the proposal of the Long Use Case Closure Envelope Model.