TL;DR: Light is shed as to why practitioners often perform some of these search tasks and why they find some of them to be challenging, and the implications of the findings to future research in several research areas are discussed.
Abstract: Developers commonly make use of a web search engine such as Google to locate online resources to improve their productivity. A better understanding of what developers search for could help us understand their behaviors and the problems that they meet during the software development process. Unfortunately, we have a limited understanding of what developers frequently search for and of the search tasks that they often find challenging. To address this gap, we collected search queries from 60 developers, surveyed 235 software engineers from more than 21 countries across five continents. In particular, we asked our survey participants to rate the frequency and difficulty of 34 search tasks which are grouped along the following seven dimensions: general search, debugging and bug fixing, programming, third party code reuse, tools, database, and testing. We find that searching for explanations for unknown terminologies, explanations for exceptions/error messages (e.g., HTTP 404), reusable code snippets, solutions to common programming bugs, and suitable third-party libraries/services are the most frequent search tasks that developers perform, while searching for solutions to performance bugs, solutions to multi-threading bugs, public datasets to test newly developed algorithms or systems, reusable code snippets, best industrial practices, database optimization solutions, solutions to security bugs, and solutions to software configuration bugs are the most difficult search tasks that developers consider. Our study sheds light as to why practitioners often perform some of these tasks and why they find some of them to be challenging. We also discuss the implications of our findings to future research in several research areas, e.g., code search engines, domain-specific search engines, and automated generation and refinement of search queries.
TL;DR: A dataset named DBGBENCH --- the correct fault locations, bug diagnoses, and software patches of 27 real errors in open-source C projects that were consolidated from hundreds of debugging sessions of professional software engineers are collected.
Abstract: Research has produced many approaches to automatically locate, explain, and repair software bugs. But do these approaches relate to the way practitioners actually locate, understand, and fix bugs? To help answer this question, we have collected a dataset named DBGBENCH --- the correct fault locations, bug diagnoses, and software patches of 27 real errors in open-source C projects that were consolidated from hundreds of debugging sessions of professional software engineers. Moreover, we shed light on the entire debugging process, from constructing a hypothesis to submitting a patch, and how debugging time, difficulty, and strategies vary across practitioners and types of errors. Most notably, DBGBENCH can serve as reality check for novel automated debugging and repair techniques.
TL;DR: A replication study of 21 different Java-based open source projects from three different categories shows that all projects contain logging code, which is actively maintained, however, contrary to the original study, bug reports containing log messages take a longer time to resolve than bug reports without log messages.
Abstract: Log messages, which are generated by the debug statements that developers insert into the code at runtime, contain rich information about the runtime behavior of software systems. Log messages are used widely for system monitoring, problem diagnoses and legal compliances. Yuan et al. performed the first empirical study on the logging practices in open source software systems. They studied the development history of four C/C++ server-side projects and derived ten interesting findings. In this paper, we have performed a replication study in order to assess whether their findings would be applicable to Java projects in Apache Software Foundations. We examined 21 different Java-based open source projects from three different categories: server-side, client-side and supporting-component. Similar to the original study, our results show that all projects contain logging code, which is actively maintained. However, contrary to the original study, bug reports containing log messages take a longer time to resolve than bug reports without log messages. A significantly higher portion of log updates are for enhancing the quality of logs (e.g., formatting & style changes and spelling/grammar fixes) rather than co-changes with feature implementations (e.g., updating variable names).
TL;DR: This paper presents Log20, a tool that determines a near optimal placement of log printing statements under the constraint of adding less than a specified amount of performance overhead, and observed that Log20 is substantially more efficient in code path disambiguation compared to the developers' manually placed log printing Statements.
Abstract: When systems fail in production environments, log data is often the only information available to programmers for postmortem debugging. Consequently, programmers' decision on where to place a log printing statement is of crucial importance, as it directly affects how effective and efficient postmortem debugging can be. This paper presents Log20, a tool that determines a near optimal placement of log printing statements under the constraint of adding less than a specified amount of performance overhead. Log20 does this in an automated way without any human involvement. Guided by information theory, the core of our algorithm measures how effective each log printing statement is in disambiguating code paths. To do so, it uses the frequencies of different execution paths that are collected from a production environment by a low-overhead tracing library. We evaluated Log20 on HDFS, HBase, Cassandra, and ZooKeeper, and observed that Log20 is substantially more efficient in code path disambiguation compared to the developers' manually placed log printing statements. Log20 can also output a curve showing the trade-off between the informativeness of the logs and the performance slowdown, so that a developer can choose the right balance.
TL;DR: This work reviews the reported software failures that have impacted the health, safety, and welfare of the public and acts as guidelines for engineers to improve the quality of their products and avoid similar failures from happening.
TL;DR: It was found that in the authors' programming game, debugging required deeper understanding than writing new codes, and problem solving behaviors were significantly correlated with students’ self-explanation quality, number of code edits, and prior programming experience.
Abstract: Debugging is an over-looked component in K-12 computational thinking education. Few K-12 programming environments are designed to teach debugging, and most debugging research were conducted on coll...
TL;DR: EvoCrash is presented, a post-failure approach which uses a novel Guided Genetic Algorithm (GGA) to cope with the large search space characterizing real-world software programs and outperforming the state-of-the-art in crash replication.
Abstract: To reduce the effort developers have to make for crash debugging, researchers have proposed several solutions for automatic failure reproduction. Recent advances proposed the use of symbolic execution, mutation analysis, and directed model checking as underling techniques for post-failure analysis of crash stack traces. However, existing approaches still cannot reproduce many real-world crashes due to such limitations as environment dependencies, path explosion, and time complexity. To address these challenges, we present EvoCrash, a post-failure approach which uses a novel Guided Genetic Algorithm (GGA) to cope with the large search space characterizing real-world software programs. Our empirical study on three open-source systems shows that EvoCrash can replicate 41 (82%) of real-world crashes, 34 (89%) of which are useful reproductions for debugging purposes, outperforming the state-of-the-art in crash replication.
TL;DR: NINJA is proposed, a transparent malware analysis framework on ARM platform with low artifacts that leverages a hardware-assisted isolated execution environment TrustZone to transparently trace and debug a target application with the help of Performance Monitor Unit and Embedded Trace Macrocell.
Abstract: Existing malware analysis platforms leave detectable fingerprints like uncommon string properties in QEMU, signatures in Android Java virtual machine, and artifacts in Linux kernel profiles. Since these fingerprints provide the malware a chance to split its behavior depending on whether the analysis system is present or not, existing analysis systems are not sufficient to analyze the sophisticated malware. In this paper, we propose NINJA, a transparent malware analysis framework on ARM platform with low artifacts. NINJA leverages a hardware-assisted isolated execution environment TrustZone to transparently trace and debug a target application with the help of Performance Monitor Unit and Embedded Trace Macrocell. NINJA does not modify system software and is OS-agnostic on ARM platform. We implement a prototype of NINJA (i.e., tracing and debugging subsystems), and the experiment results show that NINJA is efficient and transparent for malware analysis.
TL;DR: The design and implementation of NORAX is presented, a practical system that retrofits XOM into stripped COTS binaries on AArch64 platforms and is designed to co-exist with other COTS binary hardening techniques, such as in-place randomization (IPR).
Abstract: Code reuse attacks exploiting memory disclosure vulnerabilities can bypass all deployed mitigations. One promising defense against this class of attacks is to enable execute-only memory (XOM) protection on top of fine-grained address space layout randomization (ASLR). However, recent works implementing XOM, despite their efficacy, only protect programs that have been (re)built with new compiler support, leaving commercial-off-the-shelf (COTS) binaries and source-unavailable programs unprotected. We present the design and implementation of NORAX, a practical system that retrofits XOM into stripped COTS binaries on AArch64 platforms. Unlike previous techniques, NORAX requires neither source code nor debugging symbols. NORAX statically transforms existing binaries so that during runtime their code sections can be loaded into XOM memory pages with embedded data relocated and data references properly updated. NORAX allows transformed binaries to leverage the new hardware-based XOM support—a feature widely available on AArch64 platforms (e.g., recent mobile devices) yet virtually unused due to the incompatibility of existing binaries. Furthermore, NORAX is designed to co-exist with other COTS binary hardening techniques, such as in-place randomization (IPR). We apply NORAX to the commonly used Android system binaries running on SAMSUNG Galaxy S6 and LG Nexus 5X devices. The results show that NORAX on average slows down the execution of transformed binaries by 1.18% and increases their memory footprint by 2.21%, suggesting NORAX is practical for real-world adoption.
TL;DR: A new development environment designed to illuminate the boundary between embedded code and circuits called Bifröst, which automatically instruments and captures the progress of the user's code, variable values, and the electrical and bus activity occurring at the interface between the processor and the circuit it operates in.
Abstract: The Maker movement has encouraged more people to start working with electronics and embedded processors. A key challenge in developing and debugging custom embedded systems is understanding their behavior, particularly at the boundary between hardware and software. Existing tools such as step debuggers and logic analyzers only focus on software or hardware, respectively. This paper presents a new development environment designed to illuminate the boundary between embedded code and circuits. Bifrost automatically instruments and captures the progress of the user's code, variable values, and the electrical and bus activity occurring at the interface between the processor and the circuit it operates in. This data is displayed in a linked visualization that allows navigation through time and program execution, enabling comparisons between variables in code and signals in circuits. Automatic checks can detect low-level hardware configuration and protocol issues, while user-authored checks can test particular application semantics. In an exploratory study with ten participants, we investigated how Bifrost influences debugging workflows.
TL;DR: It is shown how the use of a runtime monitor can greatly speed up the testing phase of a video game under development, by automating the detection of bugs when the game is being played, by successfully increments and efficiently monitoring various temporal properties over their execution.
Abstract: Runtime verification is the process of observing a sequence of events generated by a running system and comparing it to some formal specification for potential violations. We show how the use of a runtime monitor can greatly speed up the testing phase of a video game under development by automating the detection of bugs when the game is being played. We take advantage of the fact that a video game, contrarily to generic software, follows a special structure that contains a “game loop.” This game loop can be used to centralize the instrumentation and generate events based on the game's internal state. We report on experiments made on a sample of six real-world video games of various genres and sizes by successfully instrumenting and efficiently monitoring various temporal properties over their execution, including actual bugs reported in the games' bug tracking database in the course of their development.
TL;DR: The requirements of the SENT communication protocol can be monitored in real time, with a monitor capable of processing 70 million samples per second, and the results of the monitoring can be used for error logging to provide users with extensive debugging information.
Abstract: We show how the requirements of the SENT communication protocol between a magnetic sensor and an electronic control unit (ECU) can be monitored in real time, with a monitor capable of processing 70 million samples per second. We elaborate on a complete flow from formalizing electrical and timing requirements using Signal Temporal Logic (STL) and Timed Regular Expressions (TRE), to implementing runtime monitors in FPGA hardware and evaluating the results in the lab. For a class of asynchronous serial protocols, we define a procedure to obtain monitors that are capable to recover after violations. We elaborate on two different approaches to monitor the requirements of interest: (i) temporal testers with SystemC, STL and High-Level Synthesis; (ii) automata-based approach with TRE in HDL. We also present how the results of the monitoring can be used for error logging to provide users with extensive debugging information. Our approach allows to monitor requirements-specification conformance in real time for long-term tests.
TL;DR: A new non-homogeneous Poisson process (NHPP) software reliability model with a pioneering idea by considering software fault dependency and imperfect fault removal is presented.
TL;DR: The Energy-interference-free Debugger re-creates a familiar debugging environment for intermittent software and augments it with debugging primitives for effective diagnosis of intermittence bugs.
Abstract: Energy-autonomous computing devices have the potential to extend the reach of computing to a scale beyond eitherwired or battery-powered systems. However, these devices pose a unique set of challenges to application developerswho lack both hardware and software support tools. Energy harvesting devices experience power intermittence whichcauses the system to reset and power-cycle unpredictably, tens to hundreds of times per second. This can result incode execution errors that are not possible in continuously-powered systems and cannot be diagnosed with conventionaldebugging tools such as JTAG and/or oscilloscopes. We propose the Energy-interference-free Debugger, ahardware and software platform for monitoring and debugging intermittent systems without adversely effectingtheir energy state. The Energy-interference-free Debugger re-creates a familiar debugging environment for intermittentsoftware and augments it with debugging primitives for effective diagnosis of intermittence bugs. Our evaluationof the Energy-interference-free Debugger quantifies its energy-interference-freedom and shows its value in a set ofdebugging tasks in complex test programs and several real applications, including RFID code and a machine-learning-basedactivity recognition system.
TL;DR: This paper develops and formalises definitions for "why?" and "why not?" questions and associated answers, and illustrates their application using a scenario.
Abstract: Debugging is hard, and debugging cognitive agent programs is particularly hard, since they involve concurrency, a dynamic environment, and a complex execution model that includes failure handling Previous work by Ko & Myers has demonstrated that providing Alice and Java programmers with software that can answer "why?" and "why not?" questions can make a dramatic difference to debugging performance This paper considers how to adapt this approach to cognitive agent programs, specifically AgentSpeak It develops and formalises definitions for "why?" and "why not?" questions and associated answers, and illustrates their application using a scenario
TL;DR: This work investigates a number of different logical errors that novice programmers encounter and the associated debugging behaviors and shows some errors are more difficult than others.
Abstract: Students taking introductory computer science courses often have difficulty with the debugging process. This work investigates a number of different logical errors that novice programmers encounter and the associated debugging behaviors. Data is collected and analyzed data in two different experiments from 142 subjects. The results show some errors are more difficult than others. Different types of bugs and novices' debugging behaviors are identified. Years of experience showed a significant role in the process of debugging in terms of correctness level and time required for debugging
TL;DR: It is found that the programmers in two studies did static change impact analysis before they made changes by using IDE navigational functionalities, and they did dynamic change impactAnalysis after they madeChanges by running the programs.
Abstract: "Change Impact Analysis" is the process of determining the consequences of a modification to software. In theory, change impact analysis should be done during software maintenance, to make sure changes do not introduce new bugs. Many approaches and techniques are proposed to help programmers do change impact analysis automatically. However, it is still an open question whether and how programmers do change impact analysis. In this paper, we conducted two studies, one in-depth study and one breadth study. For the in-depth study, we recorded videos of nine professional programmers repairing two bugs for two hours. For the breadth study, we surveyed 35 professional programmers using an online system. We found that the programmers in our studies did static change impact analysis before they made changes by using IDE navigational functionalities, and they did dynamic change impact analysis after they made changes by running the programs. We also found that they did not use any change impact analysis tools.
TL;DR: A mutation-based fault localization technique that locates real-world multilingual bugs accurately and can provide an effective, efficient, and language semantics agnostic analysis on multilingual code is proposed.
Abstract: Context: The programming language ecosystem has diversified over the last few decades. Non-trivial programs are likely to be written in more than a single language to take advantage of various control/data abstractions and legacy libraries.
Objective: Debugging multilingual bugs is challenging because language interfaces are difficult to use correctly and the scope of fault localization goes beyond language boundaries. To locate the causes of real-world multilingual bugs, this article proposes a mutation-based fault localization technique (MUSEUM).
Method: MUSEUM modifies a buggy program systematically with our new mutation operators as well as conventional mutation operators, observes the dynamic behavioral changes in a test suite, and reports suspicious statements. To reduce the analysis cost, MUSEUM selects a subset of mutated programs and test cases.
Results: Our empirical evaluation shows that MUSEUM is (i) effective: it identifies the buggy statements as the most suspicious statements for both resolved and unresolved non-trivial bugs in real-world multilingual programming projects; and (ii) efficient: it locates the buggy statements in modest amount of time using multiple machines in parallel. Also, by applying selective mutation analysis (i.e., selecting subsets of mutants and test cases to use), MUSEUM achieves significant speedup with marginal accuracy loss compared to the full mutation analysis.
Conclusion: It is concluded that MUSEUM locates real-world multilingual bugs accurately. This result shows that mutation analysis can provide an effective, efficient, and language semantics agnostic analysis on multilingual code. Our light-weight analysis approach would play important roles as programmers write and debug large and complex programs in diverse programming languages.
TL;DR: A general software reliability growth model (SRGM) based on non-homogeneous Poisson process (NHPP) and optimal software release policy with cost and reliability criteria is presented and the effect of changes in various environmental factors on models parameters has been taken into account.
Abstract: This paper presents a general software reliability growth model (SRGM) based on non-homogeneous Poisson process (NHPP) and optimal software release policy with cost and reliability criteria. The main motive of this study is to develop a software release time decision model considering maintenance cost and warranty cost under fuzzy environment. In previous studies, maintenance cost has been defined either in terms of warranty cost or fault debugging cost. In reality, maintenance cost includes the cost of free patches, updates, technical support and future enhancement. Also, it is possible that maintenance process causes the removal of software faults in the operational phase including the faults which occur outside the warranty period or warranty definition. In other words, warranty action may be included the maintenance action, but not the converse. Considering this fact, maintenance cost and warranty cost are defined separately in the proposed study. Initially, an SRGM has been proposed with the revised concept of imperfect debugging phenomenon considering fault removal efficiency (FRE). Furthermore, the effect of changes in various environmental factors on models parameters has been taken into account. Numerical examples based on real software failure data sets have been given to analyze the performance of the proposed models.
TL;DR: This survey shows how the initial ideas have been developed to become a widespread debugging schema fitting many different programming paradigms and with applications out of the program debugging field.
Abstract: Algorithmic debugging is a technique proposed in 1982 by E. Y. Shapiro in the context of logic programming. This survey shows how the initial ideas have been developed to become a widespread debugging schema fitting many different programming paradigms and with applications out of the program debugging field. We describe the general framework and the main issues related to the implementations in different programming paradigms and discuss several proposed improvements and optimizations. We also review the main algorithmic debugger tools that have been implemented so far and compare their features. From this comparison, we elaborate a summary of desirable characteristics that should be considered when implementing future algorithmic debuggers.
TL;DR: CircuitSense is presented, a system that automatically recognizes the wires and electronic components placed on breadboards and uses a combination of passive sensing and active probing to detect and generate the corresponding circuit representation in software in real-time.
Abstract: The rise of Maker communities and open-source electronic prototyping platforms have made electronic circuit projects increasingly popular around the world. Although there are software tools that support the debugging and sharing of circuits, they require users to manually create the virtual circuits in software, which can be time-consuming and error-prone. We present CircuitSense, a system that automatically recognizes the wires and electronic components placed on breadboards. It uses a combination of passive sensing and active probing to detect and generate the corresponding circuit representation in software in real-time. CircuitSense bridges the gap between the physical and virtual representations of circuits. It enables users to interactively construct and experiment with physical circuits while gaining the benefits of using software tools. It also dramatically simplifies the sharing of circuit designs with online communities.
TL;DR: There are still quite a number of aspects that are not sufficiently covered in the field, most notably including exploring correction and fixing bugs in terms of debugging process, and the concurrent, parallel and multicore software community needs broader studies in debugging.
Abstract: Debugging--the process of identifying, localizing and fixing bugs--is a key activity in software development . Due to issues such as non-determinism and difficulties of reproducing failures, debugging concurrent software is significantly more challenging than debugging sequential software. A number of methods, models and tools for debugging concurrent and multicore software have been proposed, but the body of work partially lacks a common terminology and a more recent view of the problems to solve. This suggests the need for a classification, and an up-to-date comprehensive overview of the area. This paper presents the results of a systematic mapping study in the field of debugging of concurrent and multicore software in the last decade (2005---2014). The study is guided by two objectives: (1) to summarize the recent publication trends and (2) to clarify current research gaps in the field. Through a multi-stage selection process, we identified 145 relevant papers. Based on these, we summarize the publication trend in the field by showing distribution of publications with respect to year, publication venues, representation of academia and industry, and active research institutes. We also identify research gaps in the field based on attributes such as types of concurrency bugs, types of debugging processes, types of research and research contributions. The main observations from the study are that during the years 2005---2014: (1) there is no focal conference or venue to publish papers in this area; hence, a large variety of conferences and journal venues (90) are used to publish relevant papers in this area; (2) in terms of publication contribution, academia was more active in this area than industry; (3) most publications in the field address the data race bug; (4) bug identification is the most common stage of debugging addressed by articles in the period; (5) there are six types of research approaches found, with solution proposals being the most common one; and (6) the published papers essentially focus on four different types of contributions, with "methods" being the most common type. We can further conclude that there are still quite a number of aspects that are not sufficiently covered in the field, most notably including (1) exploring correction and fixing bugs in terms of debugging process; (2) order violation, suspension and starvation in terms of concurrency bugs; (3) validation and evaluation research in the matter of research type; (4) metric in terms of research contribution. It is clear that the concurrent, parallel and multicore software community needs broader studies in debugging. This systematic mapping study can help direct such efforts.
TL;DR: A new platform-independent approach to implement model-level debuggers that does not require a program debugger for the code generated from the model and that any changes to, e.g., the code generator, the target language, or the hardware platform leave the debugger completely unaffected.
Abstract: Providing proper support for debugging models at model-level is one of the main barriers to a broader adoption of Model Driven Development (MDD). In this paper, we focus on the use of MDD for the development of real-time embedded systems (RTE). We introduce a new platform-independent approach to implement model-level debuggers. We describe how to realize support for model-level debugging entirely in terms of the modeling language and show how to implement this support in terms of a model-to-model transformation. Key advantages of the approach over existing work are that (1) it does not require a program debugger for the code generated from the model, and that (2) any changes to, e.g., the code generator, the target language, or the hardware platform leave the debugger completely unaffected. We also describe an implementation of the approach in the context of Papyrus-RT, an open source MDD tool based on the modeling language UML-RT. We summarize the results of the use of our model-based debugger on several use cases to determine its overhead in terms of size and performance. Despite being a prototype, the performance overhead is in the order of microseconds, while the size overhead is comparable with that of GDB, the GNU Debugger.
TL;DR: HLScope is proposed, a source-to-source transformation framework based on Vivado HLS for automated performance analysis and an in-FPGA analysis method is proposed that can be natively integrated into the HLS environment.
Abstract: In their quest for further optimization, field-programmable gate array (FPGA) designers often spend considerable time trying to identify the performance bottleneck in a current design. But since FPGAs do not have built-in high-level probes for performance analysis, manual effort is required to insert custom hardware monitors. This, however, is a time-consuming process which calls for automation. Previous work automates the process of inserting hardware monitors into the communication channels or the finite-state machine, but the instrumentation is applied in low-level hardware description languages (HDL) which limits the comprehensibility in identifying the root cause of stalls. Instead, we propose a performance debugging methodology based on high-level synthesis (HLS). High-level analysis allows tracing the cause of stalls on a function or loop level, which provides a more intuitive feedback that can be used to pinpoint the performance bottleneck. In this paper we propose HLScope, a source-to-source transformation framework based on Vivado HLS for automated performance analysis. We present a method for analyzing the information collected from the software simulation to estimate the stall rate and its cause without the need for FPGA bitstream generation. For detailed analysis, an in-FPGA analysis method is proposed that can be natively integrated into the HLS environment. Experiments show that the parameter extraction from the simulation process is orders of magnitude faster than bitstream generation, with a 2.2% cycle difference on average. In-FPGA flow consumes only about 170 LUTs and a BRAM per monitored module and provides cycle-accurate results.
TL;DR: An experiment asks developers to debug programs with and without variability, while recording their eye movements using an eye tracker, and results indicate that debugging time increases for code fragments containing variability.
Abstract: Preprocessor directives (#ifdefs) are often used to implement compile-time variability, despite the critique that they increase complexity, hamper maintainability, and impair code comprehensibility. Previous studies have shown that the time of bug finding increases linearly with variability. However, little is known about the cognitive process of debugging programs with variability. We carry out an experiment to understand how developers debug programs with variability. We ask developers to debug programs with and without variability, while recording their eye movements using an eye tracker. The results indicate that debugging time increases for code fragments containing variability. Interestingly, debugging time also seems to increase for code fragments without variability in the proximity of fragments that do contain variability. The presence of variability correlates with increase in the number of gaze transitions between definitions and usages for fields and methods. Variability also appears to prolong the "initial scan" of the entire program that most developers initiate debugging with.
TL;DR: This work presents Scanalog, a tool built on programmable analog hardware that enables users to rapidly explore different circuit designs using direct manipulation, and receive immediate feedback on the resulting behaviors without manual assembly, calculation, or probing.
Abstract: Analog circuit design is a complex, error-prone task in which the processes of gathering observations, formulating reasonable hypotheses, and manually adjusting the circuit raise significant barriers to an iterative workflow. We present Scanalog, a tool built on programmable analog hardware that enables users to rapidly explore different circuit designs using direct manipulation, and receive immediate feedback on the resulting behaviors without manual assembly, calculation, or probing. Users can interactively tune modular signal transformations on hardware with real inputs, while observing real-time changes at all points in the circuit. They can create custom unit tests and assertions to detect potential issues. We describe three interactive applications demonstrating the expressive potential of Scanalog. In an informal evaluation, users successfully conditioned analog sensors and described Scanalog as both enjoyable and easy to use.
TL;DR: Checkmate as mentioned in this paper provides a plethora of functions to check the type and related properties of the most frequently used R objects and variable types, which can be employed to detect unexpected input during runtime and to signal understandable and traceable errors.
Abstract: Dynamically typed programming languages like R allow programmers to write generic, flexible and concise code and to interact with the language using an interactive Read-eval-print-loop (REPL). However, this flexibility has its price: As the R interpreter has no information about the expected variable type, many base functions automatically convert the input instead of raising an exception. Unfortunately, this frequently leads to runtime errors deeper down the call stack which obfuscates the original problem and renders debugging challenging. Even worse, unwanted conversions can remain undetected and skew or invalidate the results of a statistical analysis. As a resort, assertions can be employed to detect unexpected input during runtime and to signal understandable and traceable errors.
The package "checkmate" provides a plethora of functions to check the type and related properties of the most frequently used R objects and variable types. The package is mostly written in C to avoid any unnecessary performance overhead. Thus, the programmer can conveniently write concise, well-tested assertions which outperforms custom R code for many applications. Furthermore, checkmate simplifies writing unit tests using the framework "testthat" by extending it with plenty of additional expectation functions, and registered C routines are available for package developers to perform assertions on arbitrary SEXPs (internal data structure for R objects implemented as struct in C) in compiled code.
TL;DR: This paper presents an approach for rapidly adjustable embedded trace online monitoring of multi-core systems, called RETOM, which employs a novel online reconstruction technique that makes the program run available outside the SoC and allows for evaluating a specification formulated in the stream-based specification language TeSSLa in real time.
Abstract: This paper presents an approach for rapidly adjustable embedded trace online monitoring of multi-core systems, called RETOM. Today, most commercial multi-core SoCs provide accurate runtime information through an embedded trace unit without affecting program execution. Available debugging solutions can use it to reconstruct the run offline, but usually for up to a few seconds only. RETOM employs a novel online reconstruction technique that makes the program run available outside the SoC and allows for evaluating a specification formulated in the stream-based specification language TeSSLa in real time. The necessary computing performance is provided by an FPGA-based event processing system. In contrast to other hardware-based runtime verification techniques, changing the specification requires no circuit synthesis and thus seconds rather than minutes or hours. Therefore, iterated testing and property adjustment during development and debugging becomes feasible while preserving the option of arbitrarily extending observation time, which may be necessary to detect rarely occurring errors. Experiments show the feasibility of the approach.
TL;DR: This paper proposes an incremental equivalence checking method to address the scalability challenges by solving the verification problem in the increasing order of design's input complexity and is able to generate smaller and compact remainders for large designs.
Abstract: Symbolic algebra is a promising approach to verify large and complex arithmetic circuits. Existing algebraic-based verification methods generate a remainder to indicate buggy implementation. The remainder is beneficial for debugging of the faulty implementation since it can be used for automated test generation, bug localization, and bug correction. However, existing equivalence checking approaches are not scalable and lead to explosion in size of the remainder when the design is faulty. To make the matters worse, the location of the bug can also lead to the explosion in the number of remainder terms. In this paper, we propose an incremental equivalence checking method to address the scalability challenges by solving the verification problem in the increasing order of design's input complexity. Our proposed approach makes two important contributions. It is able to generate smaller and compact remainders for large designs. Our proposed incremental debugging is capable of localizing and correcting hard-to-detect bugs irrespective of their location in the design. Experimental results demonstrate that our approach can efficiently debug most difficult bugs in large arithmetic circuits when the state-of-the-art methods fail.
TL;DR: A framework is proposed to reduce the effort on output verification using a strategy based on the Hamming distance or K-Means clustering to predict results of test executions and suggests that fault localization techniques using execution results verified against the expected outputs can be even more effective or as good as the former.
Abstract: Software has become ubiquitous in our daily lives, and with its increasing functionality and complexity comes a frequently tedious and prolonged debugging process. Of the three activities in program debugging (failure detection, fault localization, and bug fixing), the focus of this paper is on the first, failure detection, under the condition that there is no test oracle that can be used to automatically determine the success or failure of all the executions. More precisely, the outputs for many executions have to be verified manually, or the expected outputs are not even available. We want to determine whether there is a solution to help programmers predict the execution results. How good are these predicted results when they are used to help programmers find the locations of bugs? A framework is proposed to reduce the effort on output verification using a strategy based on the Hamming distance or K-Means clustering to predict results of test executions. Such data and the statement coverage of each test case are used to compute the suspiciousness of each statement according to a fault localization technique and produce a ranking for examination to locate bugs. Case studies using 22 programs and seven fault localization techniques were conducted to evaluate the fault localization effectiveness of the proposed framework on 1203 faulty versions, some of which have a single bug and others with multiple bugs. A discussion on factors that may affect the accuracy of execution result prediction and the resulting fault localization effectiveness is also presented. Our data suggests that, in general, with respect to fault localization techniques using execution results verified against the expected outputs, those using predicted execution results can be even more effective than (by examining a smaller number of statements to locate the first faulty statement) or as good as the former (the verified).