TL;DR: This work studied deep learning applications built on top of Tensor Flow and collected program bugs related to TensorFlow from StackOverflow QA pages and Github projects to examine the root causes and symptoms of coding defects in Tensorflow programs.
Abstract: Deep learning applications become increasingly popular in important domains such as self-driving systems and facial identity systems. Defective deep learning applications may lead to catastrophic consequences. Although recent research efforts were made on testing and debugging deep learning applications, the characteristics of deep learning defects have never been studied. To fill this gap, we studied deep learning applications built on top of TensorFlow and collected program bugs related to TensorFlow from StackOverflow QA pages and Github projects. We extracted information from QA pages, commit messages, pull request messages, and issue discussions to examine the root causes and symptoms of these bugs. We also studied the strategies deployed by TensorFlow users for bug detection and localization. These findings help researchers and TensorFlow users to gain a better understanding of coding defects in TensorFlow programs and point out a new direction for future research.
TL;DR: A new class of approaches, namely program repair techniques, whose key idea is to try to automatically repair software systems by producing an actual fix that can be validated by the testers before it is finally accepted, or that is adapted to properly fit the system.
Abstract: Despite their growing complexity and increasing size, modern software applications must satisfy strict release requirements that impose short bug fixing and maintenance cycles, putting significant pressure on developers who are responsible for timely producing high-quality software. To reduce developers workload, repairing and healing techniques have been extensively investigated as solutions for efficiently repairing and maintaining software in the last few years. In particular, repairing solutions have been able to automatically produce useful fixes for several classes of bugs that might be present in software programs. A range of algorithms, techniques, and heuristics have been integrated, experimented, and studied, producing a heterogeneous and articulated research framework where automatic repair techniques are proliferating. This paper organizes the knowledge in the area by surveying a body of 108 papers about automatic software repair techniques, illustrating the algorithms and the approaches, comparing them on representative examples, and discussing the open challenges and the empirical evidence reported so far.
TL;DR: This work develops coverage-guided fuzzing methods for neural networks that are well-suited to discovering errors which occur only for rare inputs, and describes how fast approximate nearest neighbor algorithms can provide this coverage metric.
Abstract: Machine learning models are notoriously difficult to interpret and debug. This is particularly true of neural networks. In this work, we introduce automated software testing techniques for neural networks that are well-suited to discovering errors which occur only for rare inputs. Specifically, we develop coverage-guided fuzzing (CGF) methods for neural networks. In CGF, random mutations of inputs to a neural network are guided by a coverage metric toward the goal of satisfying user-specified constraints. We describe how fast approximate nearest neighbor algorithms can provide this coverage metric. We then discuss the application of CGF to the following goals: finding numerical errors in trained neural networks, generating disagreements between neural networks and quantized versions of those networks, and surfacing undesirable behavior in character level language models. Finally, we release an open source library called TensorFuzz that implements the described techniques.
TL;DR: This work proposes a novel model debugging technique that works by first conducting model state differential analysis to identify the internal features of the model that are responsible for model bugs and then performing training input selection that is similar to program input selection in regression testing.
Abstract: Artificial intelligence models are becoming an integral part of modern computing systems. Just like software inevitably has bugs, models have bugs too, leading to poor classification/prediction accuracy. Unlike software bugs, model bugs cannot be easily fixed by directly modifying models. Existing solutions work by providing additional training inputs. However, they have limited effectiveness due to the lack of understanding of model misbehaviors and hence the incapability of selecting proper inputs. Inspired by software debugging, we propose a novel model debugging technique that works by first conducting model state differential analysis to identify the internal features of the model that are responsible for model bugs and then performing training input selection that is similar to program input selection in regression testing. Our evaluation results on 29 different models for 6 different applications show that our technique can fix model bugs effectively and efficiently without introducing new bugs. For simple applications (e.g., digit recognition), MODE improves the test accuracy from 75% to 93% on average whereas the state-of-the-art can only improve to 85% with 11 times more training time. For complex applications and models (e.g., object recognition), MODE is able to improve the accuracy from 75% to over 91% in minutes to a few hours, whereas state-of-the-art fails to fix the bug or even degrades the test accuracy.
TL;DR: Bugs.jar is a large-scale dataset for research in automated debugging, patching, and testing of Java programs, comprised of 1,158 bugs and patches, drawn from 8 large, popular opensource Java projects, spanning 8 diverse and prominent application categories.
Abstract: We present Bugs.jar, a large-scale dataset for research in automated debugging, patching, and testing of Java programs. Bugs.jar is comprised of 1,158 bugs and patches, drawn from 8 large, popular open-source Java projects, spanning 8 diverse and prominent application categories. It is an order of magnitude larger than Defects4J, the only other dataset in its class. We discuss the methodology used for constructing Bugs.jar, the representation of the dataset, several use-cases, and an illustration of three of the use-cases through the application of 3 specific tools on Bugs.jar, namely our own tool, Elixir, and two third-party tools, Ekstazi and JaCoCo.
TL;DR: This mapping study identifies and reports on 63 studies, published between 1990 and June 2017, collected and gathered via manual search on digital libraries and databases related to computer science and computer engineering, finding Java language and Unified Modeling Language (UML) representation to be the most used materials.
Abstract: Traditional quantitative research methods of data collection in programming, such as questionnaires and interviews, are the most common approaches for researchers in this field. However, in recent years, eye-tracking has been on the rise as a new method of collecting evidence of visual attention and the cognitive process of programmers. Eye-tracking has been used by researchers in the field of programming to analyze and understand a variety of tasks such as comprehension and debugging. In this article, we will focus on reporting how experiments that used eye-trackers in programming research are conducted, and the information that can be collected from these experiments. In this mapping study, we identify and report on 63 studies, published between 1990 and June 2017, collected and gathered via manual search on digital libraries and databases related to computer science and computer engineering. Among the five main areas of research interest are program comprehension and debugging, which received an increased interest in recent years, non-code comprehension, collaborative programming, and requirements traceability research, which had the fewest number of publications due to possible limitations of the eye-tracking technology in this type of experiments. We find that most of the participants in these studies were students and faculty members from institutions of higher learning, and while they performed programming tasks on a range of programming languages and programming representations, we find Java language and Unified Modeling Language (UML) representation to be the most used materials. We also report on a range of eye-trackers and attention tracking tools that have been utilized, and find Tobii eye-trackers to be the most used devices by researchers.
TL;DR: Helix as discussed by the authors is a human-in-the-loop ML system that accelerates the development of machine learning workflows by intelligently tracking changes and intermediate results over time, enabling rapid iteration, quick responsive feedback, introspection and debugging, and background execution and automation.
Abstract: Development of machine learning (ML) workflows is a tedious process of iterative experimentation: developers repeatedly make changes to workflows until the desired accuracy is attained. We describe our vision for a "human-in-the-loop" ML system that accelerates this process: by intelligently tracking changes and intermediate results over time, such a system can enable rapid iteration, quick responsive feedback, introspection and debugging, and background execution and automation. We finally describe Helix, our preliminary attempt at such a system that has already led to speedups of upto 10x on typical iterative workflows against competing systems.
TL;DR: Vera is a tool that verifies P4 programs using symbolic execution and automatically uncovers a number of common bugs including parsing/deparsing errors, invalid memory accesses, loops and tunneling errors as well as verifying user-specified properties in a novel language the authors call NetCTL.
Abstract: We present Vera, a tool that verifies P4 programs using symbolic execution. Vera automatically uncovers a number of common bugs including parsing/deparsing errors, invalid memory accesses, loops and tunneling errors, among others. Vera can also be used to verify user-specified properties in a novel language we call NetCTL. To enable scalable, exhaustive verification of P4 program snapshots, Vera automatically generates all valid header layouts and uses a novel data-structure for match-action processing optimized for verification. These techniques allow Vera to scale very well: it only takes between 5s-15s to track the execution of a purely symbolic packet in the largest P4 program currently available (6KLOC) and can compute SEFL model updates in milliseconds. Vera can also explore multiple concrete dataplanes at once by allowing the programmer to insert symbolic table entries; the resulting verification highlights possible control plane errors. We have used Vera to analyze many P4 programs including the P4 tutorials, P4 programs in the research literature and the switch code from https://p4.org. Vera has found several bugs in each of them in seconds/minutes.
TL;DR: The results indicate that IDE-provided debuggers are not used as often as expected, as "printf debugging" remains a feasible choice for many programmers, and both knowledge and use of advanced debugging features are low.
Abstract: Debugging is an inevitable activity in most software projects, often difficult and more time-consuming than expected, giving it the nickname the "dirty little secret of computer science." Surprisingly, we have little knowledge on how software engineers debug software problems in the real world, whether they use dedicated debugging tools, and how knowledgeable they are about debugging. This study aims to shed light on these aspects by following a mixed-methods research approach. We conduct an online survey capturing how 176 developers reflect on debugging. We augment this subjective survey data with objective observations on how 458 developers use the debugger included in their integrated development environments (IDEs) by instrumenting the popular Eclipse and IntelliJ IDEs with the purpose-built plugin WatchDog 2.0. To clarify the insights and discrepancies observed in the previous steps, we followed up by conducting interviews with debugging experts and regular debugging users. Our results indicate that IDE-provided debuggers are not used as often as expected, as "printf debugging" remains a feasible choice for many programmers. Furthermore, both knowledge and use of advanced debugging features are low. These results call to strengthen hands-on debugging experience in computer science curricula and have already refined the implementation of modern IDE debuggers.
TL;DR: To unobtrusively reveal runtime behavior during both normal execution and debugging, techniques for visualizing program variables directly within the source code are contributed.
Abstract: Programmers must draw explicit connections between their code and runtime state to properly assess the correctness of their programs. However, debugging tools often decouple the program state from the source code and require explicitly invoked views to bridge the rift between program editing and program understanding. To unobtrusively reveal runtime behavior during both normal execution and debugging, we contribute techniques for visualizing program variables directly within the source code. We describe a design space and placement criteria for embedded visualizations. We evaluate our in situ visualizations in an editor for the Vega visualization grammar. Compared to a baseline development environment, novice Vega users improve their overall task grade by about 2 points when using the in situ visualizations and exhibit significant positive effects on their self-reported speed and accuracy.
TL;DR: The key contribution of SwitchPointer is to efficiently provide network visibility by using switch memory as a “directory service” — each switch, rather than storing the data necessary for monitoring functionalities, stores pointers to end-hosts where relevant telemetry data is stored.
Abstract: Monitoring and debugging large-scale networks remains a challenging problem. Existing solutions operate at one of the two extremes — systems running at end-hosts (more resources but less visibility into the network) or at network switches (more visibility, but limited resources). We present SwitchPointer, a network monitoring and debugging system that integrates the best of the two worlds. SwitchPointer exploits end-host resources and programmability to collect and monitor telemetry data. The key contribution of SwitchPointer is to efficiently provide network visibility by using switch memory as a “directory service” — each switch, rather than storing the data necessary for monitoring functionalities, stores pointers to end-hosts where relevant telemetry data is stored. We demonstrate, via experiments over real-world testbeds, that SwitchPointer can efficiently monitor and debug network problems, many of which were either hard or even infeasible with existing designs.
TL;DR: The types of errors that early childhood preservice teachers commonly made and how they debugged the errors are reported and provided directions for future computer science education research that aims to prepare teachers for programming, computational thinking, and STEM education.
Abstract: In this study, we investigated the debugging process that early childhood preservice teachers used during block-based programing. Its purpose was to provide insights into how to prepare early childhood teachers to integrate computer science into instruction. This study reports the types of errors that early childhood preservice teachers commonly made and how they debugged the errors. Findings are discussed in relation to research and practice that could benefit from debugging instruction. This study provides directions for future computer science education research that aims to prepare teachers for programming, computational thinking, and STEM education. Though this study used robotics as a programming context, findings on early childhood preservice teachers’ debugging processes could be applicable to other contexts involving block-based programming.
TL;DR: A genetic algorithm-based framework which integrates software fault localization techniques and focuses on reusing test specifications and input values whenever feasible is proposed which can be easily reused between different products of the same family and help reduce the overall testing and debugging cost.
Abstract: In response to the highly competitive market and the pressure to cost-effectively release good-quality software, companies have adopted the concept of software product line to reduce development cost. However, testing and debugging of each product, even from the same family, is still done independently. This can be very expensive. To solve this problem, we need to explore how test cases generated for one product can be used for another product. We propose a genetic algorithm-based framework which integrates software fault localization techniques and focuses on reusing test specifications and input values whenever feasible. Case studies using four software product lines and eight fault localization techniques were conducted to demonstrate the effectiveness of our framework. Discussions on factors that may affect the effectiveness of the proposed framework is also presented. Our results indicate that test cases generated in such a way can be easily reused (with appropriate conversion) between different products of the same family and help reduce the overall testing and debugging cost.
TL;DR: This paper shows that unikernels do not actually require a virtual hardware abstraction, but can achieve similar levels of isolation when running as processes by leveraging existing kernel system call whitelisting mechanisms.
Abstract: System virtualization (e.g., the virtual machine abstraction) has been established as the de facto standard form of isolation in multi-tenant clouds. More recently, unikernels have emerged as a way to reuse VM isolation while also being lightweight by eliminating the general purpose OS (e.g., Linux) from the VM. Instead, unikernels directly run the application (linked with a library OS) on the virtual hardware. In this paper, we show that unikernels do not actually require a virtual hardware abstraction, but can achieve similar levels of isolation when running as processes by leveraging existing kernel system call whitelisting mechanisms. Moreover, we show that running unikernels as processes reduces hardware requirements, enables the use of standard process debugging and management tooling, and improves the already impressive performance that unikernels exhibit.
TL;DR: This paper presents an overview of the past, present and future of the OpenMP application programming interface (API), which provides a richer set of directives that capture a wide range of parallelization strategies that are not strictly limited to shared memory.
Abstract: This paper presents an overview of the past, present and future of the OpenMP application programming interface (API). While the API originally specified a small set of directives that guided shared memory fork-join parallelization of loops and program sections, OpenMP now provides a richer set of directives that capture a wide range of parallelization strategies that are not strictly limited to shared memory. As we look toward the future of OpenMP, we immediately see further evolution of the support for that range of parallelization strategies and the addition of direct support for debugging and performance analysis tools. Looking beyond the next major release of the specification of the OpenMP API, we expect the specification eventually to include support for more parallelization strategies and to embrace closer integration into its Fortran, C and, in particular, C++ base languages, which will likely require the API to adopt additional programming abstractions.
TL;DR: ChangeLocator is proposed, a method to automatically locate crash-inducing changes for a given bucket of crash reports and significantly outperforms the existing state-of-the-art approach on characterizing the bug inducing changes for crashing bugs.
Abstract: Software crashes are severe manifestations of software bugs. Debugging crashing bugs is tedious and time-consuming. Understanding software changes that induce a crashing bug can provide useful contextual information for bug fixing and is highly demanded by developers. Locating the bug inducing changes is also useful for automatic program repair, since it narrows down the root causes and reduces the search space of bug fix location. However, currently there are no systematic studies on locating the software changes to a source code repository that induce a crashing bug reflected by a bucket of crash reports. To tackle this problem, we first conducted an empirical study on characterizing the bug inducing changes for crashing bugs (denoted as crash-inducing changes). We also propose ChangeLocator, a method to automatically locate crash-inducing changes for a given bucket of crash reports. We base our approach on a learning model that uses features originated from our empirical study and train the model using the data from the historical fixed crashes. We evaluated ChangeLocator with six release versions of Netbeans project. The results show that it can locate the crash-inducing changes for 44.7%, 68.5%, and 74.5% of the bugs by examining only top 1, 5 and 10 changes in the recommended list, respectively. It significantly outperforms the existing state-of-the-art approach.
TL;DR: Evaluation results revealed that the best techniques, namely Kulcynski2, Mountford, Ochiai, and Zoltar, lead the debugger to inspect a maximum of three rules to locate the bug in around 74% of the cases.
Abstract: Model transformations play a cornerstone role in Model-Driven Engineering (MDE), as they provide the essential mechanisms for manipulating and transforming models. The correctness of software built using MDE techniques greatly relies on the correctness of model transformations. However, it is challenging and error prone to debug them, and the situation gets more critical as the size and complexity of model transformations grow, where manual debugging is no longer possible.Spectrum-Based Fault Localization (SBFL) uses the results of test cases and their corresponding code coverage information to estimate the likelihood of each program component (e.g., statements) of being faulty. In this article we present an approach to apply SBFL for locating the faulty rules in model transformations. We evaluate the feasibility and accuracy of the approach by comparing the effectiveness of 18 different state-of-the-art SBFL techniques at locating faults in model transformations. Evaluation results revealed that the best techniques, namely Kulcynski2, Mountford, Ochiai, and Zoltar, lead the debugger to inspect a maximum of three rules to locate the bug in around 74% of the cases. Furthermore, we compare our approach with a static approach for fault localization in model transformations, observing a clear superiority of the proposed SBFL-based method.
TL;DR: This paper presents CUDAAdvisor, a profiling framework to guide code optimization in modern NVIDIA GPUs that supports GPU profiling across different CUDA versions and architectures, including CUDA 8.0 and Pascal architecture.
Abstract: General-purpose GPUs have been widely utilized to accelerate parallel applications. Given a relatively complex programming model and fast architecture evolution, producing efficient GPU code is nontrivial. A variety of simulation and profiling tools have been developed to aid GPU application optimization and architecture design. However, existing tools are either limited by insufficient insights or lacking in support across different GPU architectures, runtime and driver versions. This paper presents CUDAAdvisor, a profiling framework to guide code optimization in modern NVIDIA GPUs. CUDAAdvisor performs various fine-grained analyses based on the profiling results from GPU kernels, such as memory-level analysis (e.g., reuse distance and memory divergence), control flow analysis (e.g., branch divergence) and code-/data-centric debugging. Unlike prior tools, CUDAAdvisor supports GPU profiling across different CUDA versions and architectures, including CUDA 8.0 and Pascal architecture. We demonstrate several case studies that derive significant insights to guide GPU code optimization for performance improvement.
TL;DR: A semi-automated method is described that reduces the burden of PSP data collection by extracting the required time and size of PSP measurements from IDE interaction logs and can be easily generalized to other developing environment.
Abstract: The Personal Software Process (PSP) is an effective software process improvement method that heavily relies on manual collection of software development data. This paper describes a semi-automated method that reduces the burden of PSP data collection by extracting the required time and size of PSP measurements from IDE interaction logs. The tool mines enriched event data streams so can be easily generalized to other developing environment also. In addition, the proposed method is adaptable to phase definition changes and creates activity visualizations and summarizations that are helpful for software project management. Tools and processed data used for this paper are available on GitHub at: https://github.com/unknowngithubuser1/data.
TL;DR: The key insight behind MemFix is that finding such a set of deallocation statements corresponds to solving an exact cover problem derived from a variant of typestate static analysis.
Abstract: We present MemFix, an automated technique for fixing memory deallocation errors in C programs. MemFix aims to fix memory-leak, double-free, and use-after-free errors, which occur when developers fail to properly deallocate memory objects. MemFix attempts to fix these errors by finding a set of free-statements that correctly deallocate all allocated objects without causing double-frees and use-after-frees. The key insight behind MemFix is that finding such a set of deallocation statements corresponds to solving an exact cover problem derived from a variant of typestate static analysis. We formally present the technique and experimentally show that MemFix is able to fix real errors found in open-source programs. Because MemFix is based on a sound static analysis, the generated patches guarantee to fix the original errors without introducing new errors.
TL;DR: Results show that the debugger can be used with various executable DSLs implemented with different metaprogramming approaches, and is on average six times more efficient in memory, and at least 2.2 faster when exploring past execution states, while only slowing down the execution 1.6 times on average.
TL;DR: A novel fault localization technique based on complex network theory (FLCN) is proposed to improve localization effectiveness in programs with single-fault and multiple-f fault and to aid developers to localize multiple faults simultaneously in a single diagnosis rank list.
Abstract: Effective debugging is necessary for producing high quality and reliable software. Fault localization plays a vital role in the debugging process. However, fault localization is the most tedious and expensive activity in program debugging, as such, effective fault localization techniques that can identify the exact location of faults is eminent. Despite various fault localization techniques proposed, their application in multiple-fault programs is limited. The presence of multiple faults in a program reduces the efficacy of the existing fault localization techniques to locate faults effectively due to fault interference. Moreover, most of these techniques are unable to localize faults simultaneously. This has led researchers to adopt alternative approaches, such as one-fault-at-a-time debugging and parallel debugging. In this paper, we propose a novel fault localization technique based on complex network theory (FLCN) to improve localization effectiveness in programs with single-fault and multiple-fault and to aid developers to localize multiple faults simultaneously in a single diagnosis rank list. The proposed technique ranks faulty statements based on their behavioral abnormalities and distance between faulty statements in both passed and failed test executions. Two graph-based centrality measures are adopted for fault diagnosis, namely, degree centrality and closeness centrality and a new ranking formula is proposed. FLCN is evaluated across 14 subjects with both single-fault and multiple-fault programs. Our experimental results show that FLCN is more effective at locating faults when compared with existing state-of-the-art fault localization techniques with improvement of fault localization effectiveness in both single-fault and multiple-fault programs.
TL;DR: This thesis culminates in the finding that feedback loops are the characterizing criterion of contemporary software development, and its model is flexible enough to accommodate a broad band of modern workflows, despite large variances in how projects use and configure parts of FDD.
Abstract: Software developers today crave for feedback, be it from their peers in the form of code review, static analysis tools like their compiler, or the local or remote execution of their tests in the Continuous Integration (CI) environment. With the advent of social coding sites such as GitHub and tight integration of CI services such as Travis CI, software development practices have fundamentally changed. Despite a highly alternated software engineering landscape, however, we still lack a suitable holistic description of contemporary software development practices. Existing descriptions such as the V-model are either too coarse-grained to describe an individual contributor’s workflow, or only regard a subpart of the development process, like Test-Driven Development (TDD). In addition, most existing models are pre- rather than de-scriptive. By contrast, in this thesis, we perform a series of empirical studies to characterize the individual constituents of Feedback-Driven Development (FDD): we study the prevalence and evolution of Automatic Static Analysis Tools (ASATs), we explain the “Last Line Effect,” a phenomenon at the boundary between ASATs and code review, we observe local testing patterns in the Integrated Development Environment (IDE) of developers, compare them to remote testing on the CI server, and, finally, should these quality assurance techniques have failed, we examine how developers debug faults. We then compile this empirical evidence into a model of how today’s software developers work. Our results show that developers employ the different techniques in FDD to best achieve their current task in the most efficient way, often knowingly taking shortcuts to get the job done. While this is efficient in the short term, it also bears risks, namely that prevention and introspection activities fall short: developers might not configure or combine ASATs to their full benefit, they might have wrong perceptions about the amount of time spent on quality-control, quality-related activities such as testing could become an after-thought, and learning about debugging techniques falls short. A relatively rigid, tool-enforced FDD process could help developers in not committing some of these mistakes. Our thesis culminates in the finding that feedback loops are the characterizing criterion of contemporary software development. Our model is flexible enough to accommodate a broad band of modern workflows, despite large variances in how projects use and configure parts of FDD.
TL;DR: D4 as discussed by the authors is a concurrency analysis framework that detects concurrency bugs interactively in the programming phase by sending code changes to D4, which in turn provides immediate feedback to the developer of the new bugs.
Abstract: We present D4, a fast concurrency analysis framework that detects concurrency bugs (e.g., data races and deadlocks) interactively in the programming phase. As developers add, modify, and remove statements, the code changes are sent to D4 to detect concurrency bugs in real time, which in turn provides immediate feedback to the developer of the new bugs. The cornerstone of D4 includes a novel system design and two novel parallel differential algorithms that embrace both change and parallelization for fundamental static analyses of concurrent programs. Both algorithms react to program changes by memoizing the analysis results and only recomputing the impact of a change in parallel. Our evaluation on an extensive collection of large real-world applications shows that D4 efficiently pinpoints concurrency bugs within 100ms on average after a code change, several orders of magnitude faster than both the exhaustive analysis and the state-of-the-art incremental techniques.
TL;DR: In this paper, a taxonomy of quantum computing bugs is presented. Butler et al. investigate what types of bugs occur and what defenses against those bugs are possible in quantum computing programs.
Abstract: With the advent of small-scale prototype quantum computers, researchers can now code and run quantum algorithms that were previously proposed but not fully implemented. In support of this growing interest in quantum computing experimentation, programmers need new tools and techniques to write and debug QC code. In this work, we implement a range of QC algorithms and programs in order to discover what types of bugs occur and what defenses against those bugs are possible in QC programs. We conduct our study by running small-sized QC programs in QC simulators in order to replicate published results in QC implementations. Where possible, we cross-validate results from programs written in different QC languages for the same problems and inputs. Drawing on this experience, we provide a taxonomy for QC bugs, and we propose QC language features that would aid in writing correct code.
TL;DR: In this paper, the authors propose an approach to identify training set bugs in the training set and suggest appropriate changes to the labels to improve the learning of machine learning models, which is a step toward trustworthy machine learning.
Abstract: Training set bugs are flaws in the data that adversely affect machine learning. The training set is usually too large for manual inspection, but one may have the resources to verify a few trusted items. The set of trusted items may not by itself be adequate for learning, so we propose an algorithm that uses these items to identify bugs in the training set and thus improves learning. Specifically, our approach seeks the smallest set of changes to the training set labels such that the model learned from this corrected training set predicts labels of the trusted items correctly. We flag the items whose labels are changed as potential bugs, whose labels can be checked for veracity by human experts. To find the bugs in this way is a challenging combinatorial bilevel optimization problem, but it can be relaxed into a continuous optimization problem.Experiments on toy and real data demonstrate that our approach can identify training set bugs effectively and suggest appropriate changes to the labels. Our algorithm is a step toward trustworthy machine learning.
TL;DR: C-Coupler2 is ready for use to develop various coupled or nested models and has passed a number of test cases involving model coupling and nesting, and with various MPI process layouts between component models, and has already been used in several real coupled models.
Abstract: The Chinese C-Coupler (Community Coupler) family aims primarily to develop coupled models for weather forecasting and climate simulation and prediction It is targeted to serve various coupled models with flexibility, user-friendliness, and extensive coupling functions C-Coupler2, the latest version, includes a series of new features in addition to those of C-Coupler1 – including a common, flexible, and user-friendly coupling configuration interface that combines a set of application programming interfaces and a set of XML-formatted configuration files; the capability of coupling within one executable or the same subset of MPI (message passing interface) processes; flexible and automatic coupling procedure generation for any subset of component models; dynamic 3-D coupling that enables convenient coupling of fields on 3-D grids with time-evolving vertical coordinate values; non-blocking data transfer; facilitation for model nesting; facilitation for increment coupling; adaptive restart capability; and finally a debugging capability C-Coupler2 is ready for use to develop various coupled or nested models It has passed a number of test cases involving model coupling and nesting, and with various MPI process layouts between component models, and has already been used in several real coupled models
TL;DR: This paper presents the first automated testing technique for interactive debuggers, called DBDB, which generates debugger actions to exercise the debugger and records traces that summarize the debugger's behavior, and finds diverging behavior that points to bugs and other noteworthy differences.
Abstract: To understand, localize, and fix programming errors, developers often rely on interactive debuggers. However, as debuggers are software, they may themselves have bugs, which can make debugging unnecessarily hard or even cause developers to reason about bugs that do not actually exist in their code. This paper presents the first automated testing technique for interactive debuggers. The problem of testing debuggers is fundamentally different from the well-studied problem of testing compilers because debuggers are interactive and because they lack a specification of expected behavior. Our approach, called DBDB, generates debugger actions to exercise the debugger and records traces that summarize the debugger's behavior. By comparing traces of multiple debuggers with each other, we find diverging behavior that points to bugs and other noteworthy differences. We evaluate DBDB on the JavaScript debuggers of Firefox and Chromium, finding 19 previously unreported bugs, eight of which are already fixed by the developers.
TL;DR: This paper describes an online tool, Ladebug, that is designed to scaffold the learning of debugging skills, and finds that students are positive about the tool, and report the exercises to be engaging and helpful.
Abstract: Debugging software is challenging, particularly for novices. Despite the importance of debugging, most novice programmers are not formally taught any debugging skills. This paper describes an online tool, Ladebug, that is designed to scaffold the learning of debugging skills. In this environment, students follow a structured debugging process to find and fix errors in predefined exercises. Overall, we find that students are positive about the tool, and report the exercises to be engaging and helpful.
TL;DR: Guided by witchcraft, the lightweight framework Witch detected several performance problems in important code bases; eliminating these inefficiencies resulted in significant speedups.
Abstract: Inefficiencies abound in complex, layered software. A variety of inefficiencies show up as wasteful memory operations. Many existing tools instrument every load and store instruction to monitor memory, which significantly slows execution and consumes enormously extra memory. Our lightweight framework, Witch, samples consecutive accesses to the same memory location by exploiting two ubiquitous hardware features: the performance monitoring units (PMU) and debug registers. Witch performs no instrumentation. Hence, witchcraft---tools built atop Witch---can detect a variety of software inefficiencies while introducing negligible slowdown and insignificant memory consumption and yet maintaining accuracy comparable to exhaustive instrumentation tools. Witch allowed us to scale our analysis to a large number of code bases. Guided by witchcraft, we detected several performance problems in important code bases; eliminating these inefficiencies resulted in significant speedups.