TL;DR: The results show that the current industrial practices of microservice debugging can be improved by employing proper tracing and visualization techniques and strategies, and suggest that there is a strong need for more intelligent trace analysis and visualization for distributed systems.
Abstract: The complexity and dynamism of microservice systems pose unique challenges to a variety of software engineering tasks such as fault analysis and debugging. In spite of the prevalence and importance of microservices in industry, there is limited research on the fault analysis and debugging of microservice systems. To fill this gap, we conduct an industrial survey to learn typical faults of microservice systems, current practice of debugging, and the challenges faced by developers in practice. We then develop a medium-size benchmark microservice system (being the largest and most complex open source microservice system within our knowledge) and replicate 22 industrial fault cases on it. Based on the benchmark system and the replicated fault cases, we conduct an empirical study to investigate the effectiveness of existing industrial debugging practices and whether they can be further improved by introducing the state-of-the-art tracing and visualization techniques for distributed systems. The results show that the current industrial practices of microservice debugging can be improved by employing proper tracing and visualization techniques and strategies. Our findings also suggest that there is a strong need for more intelligent trace analysis and visualization, e.g., by combining trace visualization and improved fault localization, and employing data-driven and learning-based recommendation for guided visual exploration and comparison of traces.
TL;DR: Sage as mentioned in this paper leverages unsupervised ML models to circumvent the overhead of trace labeling, captures the impact of dependencies between microservices to determine the root cause of unpredictable performance online, and applies corrective actions to recover a cloud service's QoS.
Abstract: Cloud applications are increasingly shifting from large monolithic services to complex graphs of loosely-coupled microservices. Despite the advantages of modularity and elasticity microservices offer, they also complicate cluster management and performance debugging, as dependencies between tiers introduce backpressure and cascading QoS violations. Prior work on performance debugging for cloud services either relies on empirical techniques, or uses supervised learning to diagnose the root causes of performance issues, which requires significant application instrumentation, and is difficult to deploy in practice. We present Sage, a machine learning-driven root cause analysis system for interactive cloud microservices that focuses on practicality and scalability. Sage leverages unsupervised ML models to circumvent the overhead of trace labeling, captures the impact of dependencies between microservices to determine the root cause of unpredictable performance online, and applies corrective actions to recover a cloud service’s QoS. In experiments on both dedicated local clusters and large clusters on Google Compute Engine we show that Sage consistently achieves over 93% accuracy in correctly identifying the root cause of QoS violations, and improves performance predictability.
TL;DR: In this article, the authors present the first systematic study of DL compiler bugs by analyzing 603 bugs arising in three popular DL compilers (i.e., TVM from Apache, Glow from Facebook, and nGraph from Intel).
Abstract: There are increasing uses of deep learning (DL) compilers to generate optimized code, boosting the runtime performance of DL models on specific hardware. Like their traditional counterparts, DL compilers can generate incorrect code, resulting in unexpected model behaviors that may cause catastrophic consequences in mission-critical systems. On the other hand, the DL models processed by DL compilers differ fundamentally from imperative programs in that the program logic in DL models is implicit. As such, various characteristics of the bugs arising from traditional compilers need to be revisited in the context of DL compilers. In this paper, we present the first systematic study of DL compiler bugs by analyzing 603 bugs arising in three popular DL compilers (i.e., TVM from Apache, Glow from Facebook, and nGraph from Intel). We analyzed these bugs according to their root causes, symptoms, and the stages where they occur during compilation. We obtain 12 findings, and provide a series of valuable guidelines for future work on DL compiler bug detection and debugging. For example, a large portion (nearly 20%) of DL compiler bugs are related to types, especially tensor types. The analysis of these bugs helps design new mutation operators (e.g., adding type cast for a tensor to promote implicit type conversion in subsequent tensor computations) to facilitate type-related bug detection. Further, we developed TVMfuzz as a proof-of-concept application of our findings to test the TVM DL compiler. It generates new tests based on TVM's original test suite. They expose 8 TVM bugs that are missed by the original test suite. The result demonstrates the usefulness of our findings.
TL;DR: In this paper, the authors propose an approach and a tool that automatically determines whether the model is buggy or not, and identifies the root causes for DNN errors by analyzing the historic trends in values propagated between layers.
Abstract: Deep neural networks (DNNs) are becoming an integral part of most software systems. Previous work has shown that DNNs have bugs. Unfortunately, existing debugging techniques don't support localizing DNN bugs because of the lack of understanding of model behaviors. The entire DNN model appears as a black box. To address these problems, we propose an approach and a tool that automatically determines whether the model is buggy or not, and identifies the root causes for DNN errors. Our key insight is that historic trends in values propagated between layers can be analyzed to identify faults, and also localize faults. To that end, we first enable dynamic analysis of deep learning applications: by converting it into an imperative representation and alternatively using a callback mechanism. Both mechanisms allows us to insert probes that enable dynamic analysis over the traces produced by the DNN while it is being trained on the training data. We then conduct dynamic analysis over the traces to identify the faulty layer or hyperparameter that causes the error. We propose an algorithm for identifying root causes by capturing any numerical error and monitoring the model during training and finding the relevance of every layer/parameter on the DNN outcome. We have collected a benchmark containing 40 buggy models and patches that contain real errors in deep learning applications from Stack Overflow and GitHub. Our benchmark can be used to evaluate automated debugging tools and repair techniques. We have evaluated our approach using this DNN bug-and-patch benchmark, and the results showed that our approach is much more effective than the existing debugging approach used in the state-of-the-practice Keras library. For 34/40 cases, our approach was able to detect faults whereas the best debugging approach provided by Keras detected 32/40 faults. Our approach was able to localize 21/40 bugs whereas Keras did not localize any faults.
TL;DR: The Kokkos EcoSystem as discussed by the authors is a portable software stack based on the Kokkos Core Programming Model, which provides math libraries, interoperability capabilities with Python and Fortran, and Tools for analyzing, debugging, and optimizing applications.
Abstract: State-of-the-art engineering and science codes have grown in complexity dramatically over the last two decades. Application teams have adopted more sophisticated development strategies, leveraging third party libraries, deploying comprehensive testing, and using advanced debugging and profiling tools. In today’s environment of diverse hardware platforms, these applications also desire performance portability—avoiding the need to duplicate work for various platforms. The Kokkos EcoSystem provides that portable software stack. Based on the Kokkos Core Programming Model, the EcoSystem provides math libraries, interoperability capabilities with Python and Fortran, and Tools for analyzing, debugging, and optimizing applications. In this article, we overview the components, discuss some specific use cases, and highlight how codesigning these components enables a more developer friendly experience.
TL;DR: The QBugs benchmark as mentioned in this paper provides experimental subjects and an experimental infrastructure to ease the evaluation of new research and the reproducibility of previously published results on quantum software engineering, which is a new benchmark for software engineering research.
Abstract: Reproducibility and comparability of empirical results are at the core tenet of the scientific method in any scientific field. To ease reproducibility of empirical studies, several benchmarks in software engineering research, such as Defects4J, have been developed and widely used. For quantum software engineering research, however, no benchmark has been established yet. In this position paper, we propose a new benchmark—named QBugs—which will provide experimental subjects and an experimental infrastructure to ease the evaluation of new research and the reproducibility of previously published results on quantum software engineering.
TL;DR: In this paper, the authors identify and categorize some bug patterns in the quantum programming language Qiskit and briefly discuss how to eliminate or prevent those bug patterns, which are usually caused by the misunderstanding of a programming language's features, the use of erroneous design patterns or simple mistakes sharing common behaviors.
Abstract: Bug patterns are erroneous code idioms or bad coding practices that have been proved to fail time and time again, which are usually caused by the misunderstanding of a programming language's features, the use of erroneous design patterns, or simple mistakes sharing common behaviors. This paper identifies and categorizes some bug patterns in the quantum programming language Qiskit and briefly discusses how to eliminate or prevent those bug patterns. We take this research as the first step to provide an underlying basis for debugging and testing quantum programs.
TL;DR: Zhang et al. as mentioned in this paper propose a deep learning-based approach that can leverage the ordinal nature of log levels to make suggestions on choosing log levels, by using the syntactic context and message features of the logging statements extracted from the source code.
Abstract: Developers write logging statements to generate logs that provide valuable runtime information for debugging and maintenance of software systems. Log level is an important component of a logging statement, which enables developers to control the information to be generated at system runtime. However, due to the complexity of software systems and their runtime behaviors, deciding a proper log level for a logging statement is a challenging task. For example, choosing a higher level (e.g., error) for a trivial event may confuse end users and increase system maintenance overhead, while choosing a lower level (e.g., trace) for a critical event may prevent the important execution information to be conveyed opportunely. In this paper, we tackle the challenge by first conducting a preliminary manual study on the characteristics of log levels. We find that the syntactic context of the logging statement and the message to be logged might be related to the decision of log levels, and log levels that are further apart in order (e.g., trace and error) tend to have more differences in their characteristics. Based on this, we then propose a deep-learning based approach that can leverage the ordinal nature of log levels to make suggestions on choosing log levels, by using the syntactic context and message features of the logging statements extracted from the source code. Through an evaluation on nine large-scale open source projects, we find that: 1) our approach outperforms the state-of-the-art baseline approaches; 2) we can further improve the performance of our approach by enlarging the training data obtained from other systems; 3) our approach also achieves promising results on cross-system suggestions that are even better than the baseline approaches on within-system suggestions. Our study highlights the potentials in suggesting log levels to help developers make informed logging decisions.
TL;DR: An empirical study to investigate the characteristics of optimization bugs in two mainstream compilers, GCC and LLVM, and reveals that Optimizations are the buggiest component in both compilers except for the C++ component.
TL;DR: Even if you don't already know how to program, Hacking: The Art of Exploitation, 2nd Edition will give you a complete picture of programming, machine architecture, network communications, and existing hacking techniques.
Abstract: Hacking is the art of creative problem solving, whether that means finding an unconventional solution to a difficult problem or exploiting holes in sloppy programming. Many people call themselves hackers, but few have the strong technical foundation needed to really push the envelope. Rather than merely showing how to run existing exploits, author Jon Erickson explains how arcane hacking techniques actually work. To share the art and science of hacking in a way that is accessible to everyone, Hacking: The Art of Exploitation, 2nd Edition introduces the fundamentals of C programming from a hacker's perspective.The included LiveCD provides a complete Linux programming and debugging environment-all without modifying your current operating system. Use it to follow along with the book's examples as you fill gaps in your knowledge and explore hacking techniques on your own. Get your hands dirty debugging code, overflowing buffers, hijacking network communications, bypassing protections, exploiting cryptographic weaknesses, and perhaps even inventing new exploits. This book will teach you how to:Program computers using C, assembly language, and shell scripts Corrupt system memory to run arbitrary code using buffer overflows and format strings Inspect processor registers and system memory with a debugger to gain a real understanding of what is happening Outsmart common security measures like nonexecutable stacks and intrusion detection systems Gain access to a remote server using port-binding or connect-back shellcode, and alter a server's logging behavior to hide your presence Redirect network traffic, conceal open ports, and hijack TCP connections Crack encrypted wireless traffic using the FMS attack, and speed up brute-force attacks using a password probability matrixHackers are always pushing the boundaries, investigating the unknown, and evolving their art. Even if you don't already know how to program, Hacking: The Art of Exploitation, 2nd Edition will give you a complete picture of programming, machine architecture, network communications, and existing hacking techniques. Combine this knowledge with the included Linux environment, and all you need is your own creativity.
TL;DR: Zhang et al. as discussed by the authors proposed FLDebugger, which traces the global model's test errors, jointly through the training log and the underlying learning algorithm, back to first identify the clients and subsequently their training samples that are most responsible for the errors.
Abstract: Federated learning (FL) enables large amounts of participants to construct a global learning model, while storing training data privately at each client device. A fundamental issue in this framework is the susceptibility to the erroneous training data. This problem is especially challenging due to the invisibility of clients’ local training data and training process, as well as the resource constraints of a large number of mobile and edge devices. In this paper, we try to tackle this challenging issue by introducing the first FL debugging framework, FLDebugger, for mitigating test error caused by erroneous training data. The pro-posed solution traces the global model’s bugs (test errors), jointly through the training log and the underlying learning algorithm, back to first identify the clients and subsequently their training samples that are most responsible for the errors. In addition, we devise an influence-based participant selection strategy to fix bugs as well as to accelerate the convergence of model retraining. The performance of the identification algorithm is evaluated via extensive experiments on a real AIoT system (50 clients, including 20 edge computers, 20 laptops and 10 desktops) and in larger-scale simulated environments. The evaluation results attest to that our framework achieves accurate and efficient identification of negatively influential clients and samples, and significantly improves the model performance by fixing bugs.
TL;DR: The current status of testing and debugging in serverless-based applications depicted by the experts helped to highlight issues and challenges that need to be deeply investigated.
Abstract: Testing serverless applications plays an important role in software quality assurance. The current status of testing and debugging in serverless-based applications depicted by the experts helped us highlight issues and challenges that need to be deeply investigated.
TL;DR: An IRBL approach, Pathidea, which leverages logs in bug reports to re-construct execution paths and helps improve the results of bug localization and conducts a parameter sensitivity analysis and provides recommendations on setting the parameter values when applying PathideA.
Abstract: To assist developers with debugging and analyzing bug reports, researchers have proposed information retrieval-based bug localization (IRBL) approaches. IRBL approaches leverage the textual information in bug reports as queries to generate a ranked list of potential buggy files that may need further investigation. Although IRBL approaches have shown promising results, most prior research only leverages the textual information that is “visible” in bug reports, such as bug description or title. However, in addition to the textual description of the bug, developers also often attach logs in bug reports. Logs provide important information that can be used to re-construct the system execution paths when an issue happens and assist developers with debugging. In this paper, we propose an IRBL approach, Pathidea, which leverages logs in bug reports to re-construct execution paths and helps improve the results of bug localization. Pathidea uses static analysis to create a file-level call graph, and re-constructs the call paths from the reported logs. We evaluate Pathidea on eight open source systems, with a total of 1,273 bug reports that contain logs. We find that Pathidea achieves a high recall (up to 51.9% for Top@5). On average, Pathidea achieves an improvement that varies from 8% to 21% and 5% to 21% over BRTracer in terms of Mean Average Precision (MAP) and Mean Reciprocal Rank (MRR) across studied systems, respectively. Moreover, we find that the re-constructed execution paths can also complement other IRBL approaches by providing a 10% and 8% improvement in terms of MAP and MRR, respectively. Finally, we conduct a parameter sensitivity analysis and provide recommendations on setting the parameter values when applying Pathidea.
TL;DR: DIY as discussed by the authors is an interactive technique that enables users to assess the responses from a state-of-the-art natural language to SQL (NL2SQL) system for correctness and, if possible, fix errors.
Abstract: Designing natural language interfaces for querying databases remains an important goal pursued by researchers in natural language processing, databases, and HCI. These systems receive natural language as input, translate it into a formal database query, and execute the query to compute a result. Because the responses from these systems are not always correct, it is important to provide people with mechanisms to assess the correctness of the generated query and computed result. However, this assessment can be challenging for people who lack expertise in query languages. We present Debug-It-Yourself (DIY), an interactive technique that enables users to assess the responses from a state-of-the-art natural language to SQL (NL2SQL) system for correctness and, if possible, fix errors. DIY provides users with a sandbox where they can interact with (1) the mappings between the question and the generated query, (2) a small-but-relevant subset of the underlying database, and (3) a multi-modal explanation of the generated query. End-users can then employ a back-of-the-envelope calculation debugging strategy to evaluate the system’s response. Through an exploratory study with 12 users, we investigate how DIY helps users assess the correctness of the system’s answers and detect & fix errors. Our observations reveal the benefits of DIY while providing insights about end-user debugging strategies and underscore opportunities for further improving the user experience.
TL;DR: In this paper, the authors provide a holistic overview of all secure scan chain architectures starting from preliminary methods introduced when cryptography is in place and the adversary threat model is very limited, and then newer and more advanced methods were introduced when logic obfuscation was in place.
Abstract: The availability of access to Integrated Circuits’ scan chain is an inevitable requirement of modern ICs for testability/debugging purposes. However, leaving access to the scan chain OPEN resulted in numerous security threats on ICs. It raises challenging concerns particularly when the secret asset, like secret information, is placed within the chip, such as the keys of cryptographic algorithms, or similarly logic obfuscation key. So, to combat these threats, numerous secure scan chain architectures have been proposed in the literature to prevent any unauthorized access to the scan chain. They also keep the availability of the scan chain for testability/debugging. In this paper, we first show why a secure scan chain architecture is required when security primitives, like logic obfuscation, are in place. Then, we provide a holistic overview of all secure scan chain architectures starting from preliminary methods introduced when cryptography is in place and the adversary threat model is very limited. It is then followed by newer and more advanced methods introduced when logic obfuscation is in place and the adversary threat model is much stronger. Hence, we have more concentration on the architecture proposed more recently on logic obfuscation. We evaluate all secure scan chain architectures in terms of security and resiliency, testability/debugging time and complexity, and area/power/delay overhead.
TL;DR: A novel approach called LogAssist that assists practitioners with log analysis by providing an organized and concise view of logs by first grouping logs into event sequences (i.e., workflows), which better illustrate the system runtime execution paths.
Abstract: Logs contain valuable information about the runtime behaviors of software systems. Thus, practitioners rely on logs for various tasks such as debugging, system comprehension, and anomaly detection. However, due to the unstructured nature and large size of logs, there are several challenges that practitioners face with log analysis. In this paper, we propose a novel approach called LogAssist that tackles these challenges and assists practitioners with log analysis. LogAssist provides an organized and concise view of logs by first grouping logs into event sequences (i.e., workflows), which better illustrate the system runtime execution paths. Then, LogAssist compresses the log events in workflows by hiding consecutive events and applying n-gram modeling to identify common event sequences. We evaluated LogAssist on the logs that are generated by two open-source and one enterprise system. We find that LogAssist can reduce the number of log events that practitioners need to investigate by up to 99%. Through a user study with 19 participants, we also find that LogAssist can assist practitioners by reducing the needed time on log analysis tasks by an average of 40%. The participants also rated LogAssist an average of 4.53 out of 5 for improving their experiences of performing log analysis. Finally, we document our experiences and lessons learned from developing and adopting LogAssist in practice. We believe that LogAssist and our reported experiences may lay the basis for future analysis and interactive exploration on logs.
TL;DR: In a large study of 457 bugs (369 single faults and 88 multiple faults) in 46 open source C programs, the authors compare the effectiveness of statistical fault localization against dynamic slicing.
Abstract: Statistical fault localization is an easily deployed technique for quickly determining candidates for faulty code locations If a human programmer has to search the fault beyond the top candidate locations, though, more traditional techniques of following dependencies along dynamic slices may be better suited In a large study of 457 bugs (369 single faults and 88 multiple faults) in 46 open source C programs, we compare the effectiveness of statistical fault localization against dynamic slicing For single faults, we find that dynamic slicing was eight percentage points more effective than the best performing statistical debugging formula; for 66% of the bugs, dynamic slicing finds the fault earlier than the best performing statistical debugging formula In our evaluation, dynamic slicing is more effective for programs with single fault, but statistical debugging performs better on multiple faults Best results, however, are obtained by a hybrid approach: If programmers first examine at most the top five most suspicious locations from statistical debugging, and then switch to dynamic slices, on average, they will need to examine 15% (30 lines) of the code These findings hold for 18 most effective statistical debugging formulas and our results are independent of the number of faults (ie single or multiple faults) and error type (ie artificial or real errors)
TL;DR: PMDebugger as discussed by the authors uses a hierarchical design composed of PM debugging-specific data structures, operations and bug-detection algorithms (rules) to detect crash consistency bugs in PM programs.
Abstract: Debugging persistent memory (PM) programs faces a fundamental tradeoff between performance overhead and bug coverage (comprehensiveness). Large performance overhead or limited bug coverage makes debugging infeasible or ineffective for PM programs. We present PMDebugger, a debugger that detects crash consistency bugs in PM programs. Unlike prior work, PMDebugger is fast, flexible and comprehensive. The design of PMDebugger is driven by a characterization that shows how three fundamental operations in PM programs (store, cache writeback and fence) typically occur in PM programs. PMDebugger uses a hierarchical design composed of PM debugging-specific data structures, operations and bug-detection algorithms (rules). We generalize nine rules to detect crash-consistency bugs for various PM persistency models. Compared with a state-of-the-art detector (XFDetector) and an industry-quality detector (Pmemcheck), PMDebugger leads to 49.3x and 3.4x speedup on average. Compared with another state-of-the-art detector (PMTest) optimized for high performance, PMDebugger achieves comparable performance (within a factor of 2), without heavily relying on programmer annotations, and detects 38 more bugs on ten applications. PMDebugger also identifies more bugs than XFDetector and Pmemcheck. PMDebugger detects 19 new bugs in a real application (memcached) and two new bugs from Intel PMDK.
TL;DR: In this paper, the authors propose a technique that aims to facilitate the debugging process of trained statistical models, given an ML model and a labeled data set, to produce an interpretable characterization of the data on which the model performs particularly poorly.
Abstract: While machine learning (ML) models play an increasingly prevalent role in many software engineering tasks, their prediction accuracy is often problematic. When these models do mispredict, it can be very difficult to isolate the cause. In this paper, we propose a technique that aims to facilitate the debugging process of trained statistical models. Given an ML model and a labeled data set, our method produces an interpretable characterization of the data on which the model performs particularly poorly. The output of our technique can be useful for understanding limitations of the training data or the model itself; it can also be useful for ensembling if there are multiple models with different strengths. We evaluate our approach through case studies and illustrate how it can be used to improve the accuracy of predictive models used for software engineering tasks within Facebook.
TL;DR: A novel solution that leverages how functions are handled by Ethereum virtual machine (EVM) to automatically recover function signatures and proposes a new approach named type-aware symbolic execution (TASE) that utilizes the semantics of EVM operations on parameters to identify the number and the types of parameters.
Abstract: Millions of smart contracts have been deployed onto Ethereum for providing various services, which can be invoked through their functions. For this purpose, the caller needs to know the function signature of a callee, which includes its function id and parameter types. Such signatures are critical to many applications focusing on smart contracts, e.g., reverse engineering, fuzzing, attack detection, and profiling. Unfortunately, it is challenging to recover the function signatures from contract bytecode, since neither debug information nor type information is present in the bytecode. To address this issue, prior approaches rely on source code, or a collection of known signatures from incomplete databases or incomplete heuristic rules, which, however, are far from adequate and cannot cope with the rapid growth of new contracts. In this paper, we propose a novel solution that leverages how functions are handled by Ethereum virtual machine (EVM) to automatically recover function signatures. In particular, we exploit how smart contracts determine the functions to be invoked to locate and extract function ids, and propose a new approach named type-aware symbolic execution (TASE) that utilizes the semantics of EVM operations on parameters to identify the number and the types of parameters.Moreover, we develop SigRec, a new tool for recovering function signatures from contract bytecode without the need of source code and function signature databases. The extensive experimental results show that SigRec outperforms all existing tools, achieving an unprecedented 98.9% accuracy within 0.07 seconds. We further demonstrate that the recovered function signatures are useful in attack detection, fuzzing and reverse engineering of EVM bytecode.
TL;DR: In a large study of 457 bugs in 46 open source C programs, it is found that dynamic slicing is more effective for programs with single fault, but statistical debugging performs better on multiple faults.
Abstract: Statistical fault localization is an easily deployed technique for quickly determining candidates for faulty code locations. If a human programmer has to search the fault beyond the top candidate locations, though, more traditional techniques of following dependencies along dynamic slices may be better suited. In a large study of 457 bugs (369 single faults and 88 multiple faults) in 46 open source C programs, we compare the effectiveness of statistical fault localization against dynamic slicing. For single faults, we find that dynamic slicing was eight percentage points more effective than the best performing statistical debugging formula; for 66% of the bugs, dynamic slicing finds the fault earlier than the best performing statistical debugging formula. In our evaluation, dynamic slicing is more effective for programs with single fault, but statistical debugging performs better on multiple faults. Best results, however, are obtained by a hybrid approach: If programmers first examine at most the top five most suspicious locations from statistical debugging, and then switch to dynamic slices, on average, they will need to examine 15% (30 lines) of the code. These findings hold for 18 most effective statistical debugging formulas and our results are independent of the number of faults (i.e. single or multiple faults) and error type (i.e. artificial or real errors).
TL;DR: Witcher as mentioned in this paper detects both correctness and performance bugs in NVM-based persistent key-value stores and underlying NVM libraries, without test space explosion and without manual annotations or crash consistency checkers.
Abstract: The advent of non-volatile main memory (NVM) enables the development of crash-consistent software without paying storage stack overhead. However, building a correct crash-consistent program remains very challenging in the presence of a volatile cache. This paper presents Witcher, a systematic crash consistency testing framework, which detects both correctness and performance bugs in NVM-based persistent key-value stores and underlying NVM libraries, without test space explosion and without manual annotations or crash consistency checkers. To detect correctness bugs, Witcher automatically infers likely correctness conditions by analyzing data and control dependencies between NVM accesses. Then Witcher validates if any violation of them is a true crash consistency bug by checking output equivalence between executions with and without a crash. Moreover, Witcher detects performance bugs by analyzing the execution traces. Evaluation with 20 NVM key-value stores based on Intel's PMDK library shows that Witcher discovers 47 (36 new) correctness consistency bugs and 158 (113 new) performance bugs in both applications and PMDK.
TL;DR: Hippocrates as mentioned in this paper automatically performs the complex reasoning behind durability bug fixes, relieving developers of time-consuming bug fixes and is guaranteed to be safe, as they are guaranteed to not introduce new bugs (do no harm).
Abstract: Persistent memory (PM) technologies aim to revolutionize storage systems, providing persistent storage at near-DRAM speeds. Alas, programming PM systems is error-prone, as the misuse or omission of the durability mechanisms (i.e., cache flushes and memory fences) can lead to durability bugs (i.e., unflushed updates in CPU caches that violate crash consistency). PM-specific testing and debugging tools can help developers find these bugs, however even with such tools, fixing durability bugs can be challenging. To determine the reason behind this difficulty, we first study durability bugs and find that although the solution to a durability bug seems simple, the actual reasoning behind the fix can be complicated and time-consuming. Overall, the severity of these bugs coupled with the difficultly of developing fixes for them motivates us to consider automated approaches to fixing durability bugs. We introduce Hippocrates, a system that automatically fixes durability bugs in PM systems. Hippocrates automatically performs the complex reasoning behind durability bug fixes, relieving developers of time-consuming bug fixes. Hippocrates’s fixes are guaranteed to be safe, as they are guaranteed to not introduce new bugs (“do no harm”). We use Hippocrates to automatically fix 23 durability bugs in realworld and research systems. We show that Hippocrates produces fixes that are functionally equivalent to developer fixes. We then show that solely using Hippocrates’s fixes, we can create a PM port of Redis which has performance rivaling and exceeding the performance of a manually-developed PM-port of Redis.
TL;DR: Zilch as discussed by the authors is a framework that accelerates and simplifies the deployment of VC and ZKPK for any application transparently, i.e., without the need of trusted setup.
Abstract: As cloud computing becomes more popular, research has focused on usable solutions to the problem of verifiable computation (VC), where a computationally weak device (Verifier) outsources a program execution to a powerful server (Prover) and receives guarantees that the execution was performed faithfully. A Prover can further demonstrate knowledge of a secret input that causes the Verifier’s program to satisfy certain assertions, without ever revealing which input was used. State-of-the-art Zero-Knowledge Proofs of Knowledge (ZKPK) methods encode a computation using arithmetic circuits and preserve the privacy of Prover’s inputs while attesting the integrity of program execution. Nevertheless, developing, debugging, and optimizing programs as circuits remains a daunting task, as most users are unfamiliar with this programming paradigm. In this work, we present Zilch, a framework that accelerates and simplifies the deployment of VC and ZKPK for any application transparently , i.e., without the need of trusted setup. Zilch uses traditional instruction sequences rather than static arithmetic circuits that would need to be regenerated for each different computation. Towards that end, we have implemented Z MIPS: a MIPS-like processor model that allows verifying each instruction independently and compose a proof for the execution of the target application. To foster usability, Zilch incorporates a novel cross-compiler from an object-oriented Java-like language tailored to ZKPK and optimized our Z MIPS model, as well as a powerful API that enables integration of ZKPK within existing C/C++ programs. In our experiments, we demonstrate the flexibility of Zilch using two real-life applications, and evaluate Prover and Verifier performance on a variety of benchmarks.
TL;DR: In this paper, the authors proposed a new Virtual Commissioning (VC) architecture for IEC 61499-based control applications, which can improve the flexibility, collaboration and efficiency of control software debugging.
Abstract: The need of manufacturing industries to adapt to the demand from customers and the volatile market drives the need for digitalization to realize the so-called smart factory. Smart factory, Compared to traditional factories, smart factory needs to possess high flexibility and stability and often has to employ digital twin. Compared to IEC 61131-3 standard used in traditional industrial production, IEC 61499 standard can bring more advantages to meet the realization of Smart Factory, such as flexibility, distribution and event-driven execution. In this paper, we propose a new Virtual Commissioning (VC) architecture for IEC 61499-based control applications. We build an online collaborative VC platform with the help of cloud computing and containerization technologies. Furthermore, we discuss and propose the utilization of soft-PLC containers from different vendors to meet users’ commissioning requirements for different hardware configurations. Compared with the traditional offline single-person debugging method, our proposed architecture can improve the flexibility, collaboration and efficiency of control software debugging in a more interactive and collaborative style via the online platform we implemented.
TL;DR: DirectDebug as mentioned in this paper is a direct diagnosis approach to the automated testing and debugging of variability models, which helps software engineers by supporting an automated identification of faulty constraints responsible for an unintended behavior of a variability model.
Abstract: Variability models (e.g., feature models) are a common way for the representation of variabilities and commonalities of software artifacts. Such models can be translated to a logical representation and thus allow different operations for quality assurance and other types of model property analysis. Specifically, complex and often large-scale feature models can become faulty, i.e., do not represent the expected variability properties of the underlying software artifact. In this paper, we introduce DirectDebug which is a direct diagnosis approach to the automated testing and debugging of variability models. The algorithm helps software engineers by supporting an automated identification of faulty constraints responsible for an unintended behavior of a variability model. This approach can significantly decrease development and maintenance efforts for such models.
TL;DR: In this article, a modular approach for debugging two router configurations that are intended to be behaviorally equivalent is presented, which handles all router components that affect routing and forwarding, including configuration for BGP, OSPF, static routes, route maps and ACLs.
Abstract: We present a new approach for debugging two router configurations that are intended to be behaviorally equivalent. Existing router verification techniques cannot identify all differences or localize those differences to relevant configuration lines. Our approach addresses these limitations through a _modular_ analysis, which separately analyzes pairs of corresponding configuration components. It handles all router components that affect routing and forwarding, including configuration for BGP, OSPF, static routes, route maps and ACLs. Further, for many configuration components our modular approach enables simple _structural equivalence_ checks to be used without additional loss of precision versus modular semantic checks, aiding both efficiency and error localization. We implemented this approach in the tool Campion and applied it to debugging pairs of backup routers from different manufacturers and validating replacement of critical routers. Campion analyzed 30 proposed router replacements in a production cloud network and proactively detected four configuration bugs, including a route reflector bug that could have caused a severe outage. Campion also found multiple differences between backup routers from different vendors in a university network. These were undetected for three years, and depended on subtle semantic differences that the operators said they were "highly unlikely" to detect by "just eyeballing the configs."
TL;DR: In this paper, a model-driven low-code approach for the configuration and reconfiguration of Digital Twins using language plugins is proposed, which uses modeldriven software engineering and software language engineering methods to derive a configurable digital twin implementation.
Abstract: Digital Twins in smart manufacturing must be highly adaptable for different challenges, environments, and system states. In practice, there is a need for enabling the configuration of Digital Twins by domain experts. Low-code approaches seem to be a meaningful solution for configuration purposes but often lack extension options. We propose a model-driven low-code approach for the configuration and reconfiguration of Digital Twins using language plugins. This approach uses model-driven software engineering and software language engineering methods to derive a configurable digital twin implementation. Moreover, we discuss some remaining challenges such as interoperability, language modularity, evolution, integration of assistive services, collaborative development, and web-based debugging.
TL;DR: In this article, the authors propose Swarmbug, a swarm debugging system that automatically diagnoses and fixes buggy behaviors caused by misconfiguration, which abstracts impacts of environment configurations (e.g., obstacles) to the drones in a swarm via behavior causal analysis.
Abstract: Swarm robotics collectively solve problems that are challenging for individual robots, from environmental monitoring to entertainment. The algorithms enabling swarms allow individual robots of the swarm to plan, share, and coordinate their trajectories and tasks to achieve a common goal. Such algorithms rely on a large number of configurable parameters that can be tailored to target particular scenarios. This large configuration space, the complexity of the algorithms, and the dependencies with the robots’ setup and performance make debugging and fixing swarms configuration bugs extremely challenging. This paper proposes Swarmbug, a swarm debugging system that automatically diagnoses and fixes buggy behaviors caused by misconfiguration. The essence of Swarmbug is the novel concept called the degree of causal contribution (Dcc), which abstracts impacts of environment configurations (e.g., obstacles) to the drones in a swarm via behavior causal analysis. Swarmbug automatically generates, validates, and ranks fixes for configuration bugs. We evaluate Swarmbug on four diverse swarm algorithms. Swarmbug successfully fixes four configuration bugs in the evaluated algorithms, showing that it is generic and effective. We also conduct a real-world experiment with physical drones to show the Swarmbug’s fix is effective in the real-world.
TL;DR: In this paper, the authors propose a semantic reflection mechanism by a direct mapping from program states to RDF knowledge graphs, which enables the programmer to use domain knowledge formalized as an ontology directly in the program's control flow.
Abstract: We propose a novel integration of programming languages with semantic technologies. We create a semantic reflection mechanism by a direct mapping from program states to RDF knowledge graphs. This mechanism enables several promising novel applications including the use of semantic technology, including reasoning, for debugging and validating the sanity of program states, and integration with external knowledge graphs. Additionally, by making the knowledge graph accessible from the program, method implementations can refer to state semantics rather than objects, establishing a deep integration between programs and semantics. This allows the programmer to use domain knowledge formalized as, e.g., an ontology directly in the program’s control flow. We formalize this integration by defining a core object based programming language that incorporates these features. A prototypical interpreter is available for download.