TL;DR: S2E's use in developing practical tools for comprehensive performance profiling, reverse engineering of proprietary software, and bug finding for both kernel-mode and user-mode binaries is demonstrated.
Abstract: This paper presents S2E, a platform for analyzing the properties and behavior of software systems. We demonstrate S2E's use in developing practical tools for comprehensive performance profiling, reverse engineering of proprietary software, and bug finding for both kernel-mode and user-mode binaries. Building these tools on top of S2E took less than 770 LOC and 40 person-hours each.S2E's novelty consists of its ability to scale to large real systems, such as a full Windows stack. S2E is based on two new ideas: selective symbolic execution, a way to automatically minimize the amount of code that has to be executed symbolically given a target analysis, and relaxed execution consistency models, a way to make principled performance/accuracy trade-offs in complex analyses. These techniques give S2E three key abilities: to simultaneously analyze entire families of execution paths, instead of just one execution at a time; to perform the analyses in-vivo within a real software stack--user programs, libraries, kernel, drivers, etc.--instead of using abstract models of these layers; and to operate directly on binaries, thus being able to analyze even proprietary software.Conceptually, S2E is an automated path explorer with modular path analyzers: the explorer drives the target system down all execution paths of interest, while analyzers check properties of each such path (e.g., to look for bugs) or simply collect information (e.g., count page faults). Desired paths can be specified in multiple ways, and S2E users can either combine existing analyzers to build a custom analysis tool, or write new analyzers using the S2E API.
TL;DR: The overall goal of this research is to investigate how developers use and benefit from automated debugging tools through a set of human studies by providing initial evidence that several assumptions made by automated debugging techniques do not hold in practice.
Abstract: Debugging is notoriously difficult and extremely time consuming. Researchers have therefore invested a considerable amount of effort in developing automated techniques and tools for supporting various debugging tasks. Although potentially useful, most of these techniques have yet to demonstrate their practical effectiveness. One common limitation of existing approaches, for instance, is their reliance on a set of strong assumptions on how developers behave when debugging (e.g., the fact that examining a faulty statement in isolation is enough for a developer to understand and fix the corresponding bug). In more general terms, most existing techniques just focus on selecting subsets of potentially faulty statements and ranking them according to some criterion. By doing so, they ignore the fact that understanding the root cause of a failure typically involves complex activities, such as navigating program dependencies and rerunning the program with different inputs. The overall goal of this research is to investigate how developers use and benefit from automated debugging tools through a set of human studies. As a first step in this direction, we perform a preliminary study on a set of developers by providing them with an automated debugging tool and two tasks to be performed with and without the tool. Our results provide initial evidence that several assumptions made by automated debugging techniques do not hold in practice. Through an analysis of the results, we also provide insights on potential directions for future work in the area of automated debugging.
TL;DR: This paper presents Anteater, a tool for checking invariants in the data plane, which translates high-level network invariants into boolean satisfiability problems (SAT), checks them against network state using a SAT solver, and reports counterexamples if violations have been found.
Abstract: Diagnosing problems in networks is a time-consuming and error-prone process. Existing tools to assist operators primarily focus on analyzing control plane configuration. Configuration analysis is limited in that it cannot find bugs in router software, and is harder to generalize across protocols since it must model complex configuration languages and dynamic protocol behavior.This paper studies an alternate approach: diagnosing problems through static analysis of the data plane. This approach can catch bugs that are invisible at the level of configuration files, and simplifies unified analysis of a network across many protocols and implementations. We present Anteater, a tool for checking invariants in the data plane. Anteater translates high-level network invariants into boolean satisfiability problems (SAT), checks them against network state using a SAT solver, and reports counterexamples if violations have been found. Applied to a large university network, Anteater revealed 23 bugs, including forwarding loops and stale ACL rules, with only five false positives. Nine of these faults are being fixed by campus network operators.
TL;DR: The result shows that the proposed kernel-base behavior analysis for android malware inspection can effectively detect malicious behaviors of the unknown applications.
Abstract: The most major threat of Android users is malware infection via Android application markets. In case of the Android Market, as security inspections are not applied for many users have uploaded applications. Therefore, malwares, e.g., Geimini and Droid Dream will attempt to leak personal information, getting root privilege, and abuse functions of the smart phone. An audit framework called log cat is implemented on the Dalvik virtual machine to monitor the application behavior. However, only the limited events are dumped, because an application developers use the log cat for debugging. The behavior monitoring framework that can audit all activities of applications is important for security inspections on the market places. In this paper, we propose a kernel-base behavior analysis for android malware inspection. The system consists of a log collector in the Linux layer and a log analysis application. The log collector records all system calls and filters events with the target application. The log analyzer matches activities with signatures described by regular expressions to detect a malicious activity. Here, signatures of information leakage are automatically generated using the smart phone IDs, e.g., phone number, SIM serial number, and Gmail accounts. We implement a prototype system and evaluate 230 applications in total. The result shows that our system can effectively detect malicious behaviors of the unknown applications.
TL;DR: CUDA Application Design and Development starts with an introduction to parallel computing concepts for readers with no previous parallel experience, and focuses on issues of immediate importance to working software developers: achieving high performance, maintaining competitiveness, analyzing CUDA benefits versus costs, and determining application lifespan.
Abstract: As the computer industry retools to leverage massively parallel graphics processing units (GPUs), this book is designed to meet the needs of working software developers who need to understand GPU programming with CUDA and increase efficiency in their projects. CUDA Application Design and Development starts with an introduction to parallel computing concepts for readers with no previous parallel experience, and focuses on issues of immediate importance to working software developers: achieving high performance, maintaining competitiveness, analyzing CUDA benefits versus costs, and determining application lifespan.
The book then details the thought behind CUDA and teaches how to create, analyze, and debug CUDA applications. Throughout, the focus is on software engineering issues: how to use CUDA in the context of existing application code, with existing compilers, languages, software tools, and industry-standard API libraries.
Using an approach refined in a series of well-received articles at Dr Dobb's Journal, author Rob Farber takes the reader step-by-step from fundamentals to implementation, moving from language theory to practical coding.
Includes multiple examples building from simple to more complex applications in four key areas: machine learning, visualization, vision recognition, and mobile computing
Addresses the foundational issues for CUDA development: multi-threaded programming and the different memory hierarchy
Includes teaching chapters designed to give a full understanding of CUDA tools, techniques and structure.
Presents CUDA techniques in the context of the hardware they are implemented on as well as other styles of programming that will help readers bridge into the new material
Table of Contents
1. First Programs and How to Think in CUDA
2. CUDA for Machine Learning and Optimization
3. The CUDA Tool Suite: Profiling a PCA/NLPCA Functor
4. The CUDA Execution Model
5. CUDA Memory
6. Efficiently Using GPU Memory
7. Techniques to Increase Parallelism
8. CUDA for All GPU and CPU Applications
9. Mixing CUDA and Rendering
10. CUDA in a Cloud and Cluster Environments
11. CUDA for Real Problems: Monte Carlo, Modeling, and More
12. Application Focus on Live Streaming Video
TL;DR: The first undergraduate text to directly address compiling and running parallel programs on the new multi-core and cluster architecture, An Introduction to Parallel Programming explains how to design, debug, and evaluate the performance of distributed and shared-memory programs.
Abstract: Author Peter Pacheco uses a tutorial approach to show students how to develop effective parallel programs with MPI, Pthreads, and OpenMP. The first undergraduate text to directly address compiling and running parallel programs on the new multi-core and cluster architecture, An Introduction to Parallel Programming explains how to design, debug, and evaluate the performance of distributed and shared-memory programs. User-friendly exercises teach studentshow to compile, run and modify example programs. Key features:Takes a tutorial approach, starting with small programming examples and building progressively to more challenging examplesFocuses on designing, debugging and evaluating the performance of distributed and shared-memory programsExplains how to develop parallel programs using MPI, Pthreads, and OpenMP programming models
TL;DR: This paper proposes two new directed symbolic execution strategies that aim to solve the problem of automatically finding program executions that reach a particular target line, and proposes a hybrid strategy, Mix-CCBSE, which alternates CCBSE with another (forward) search strategy.
Abstract: In this paper, we study the problem of automatically finding program executions that reach a particular target line. This problem arises in many debugging scenarios; for example, a developer may want to confirm that a bug reported by a static analysis tool on a particular line is a true positive. We propose two new directed symbolic execution strategies that aim to solve this problem: shortest-distance symbolic execution (SDSE) uses a distance metric in an interprocedural control flow graph to guide symbolic execution toward a particular target; and call-chain-backward symbolic execution (CCBSE) iteratively runs forward symbolic execution, starting in the function containing the target line, and then jumping backward up the call chain until it finds a feasible path from the start of the program. We also propose a hybrid strategy, Mix-CCBSE, which alternates CCBSE with another (forward) search strategy. We compare these three with several existing strategies from the literature on a suite of six GNU Coreutils programs. We find that SDSE performs extremely well in many cases but may fail badly. CCBSE also performs quite well, but imposes additional overhead that sometimes makes it slower than SDSE. Considering all our benchmarks together, Mix-CCBSE performed best on average, combining to good effect the features of its constituent components.
TL;DR: Experimental results show that Dthreads substantially outperforms a state-of-the-art deterministic runtime system, and for a majority of the benchmarks evaluated here, matches and occasionally exceeds the performance of pthreads.
Abstract: Multithreaded programming is notoriously difficult to get right. A key problem is non-determinism, which complicates debugging, testing, and reproducing errors. One way to simplify multithreaded programming is to enforce deterministic execution, but current deterministic systems for C/C++ are incomplete or impractical. These systems require program modification, do not ensure determinism in the presence of data races, do not work with general-purpose multithreaded programs, or run up to 8.4× slower than pthreads. This paper presents Dthreads, an efficient deterministic multithreading system for unmodified C/C++ applications that replaces the pthreads library. Dthreads enforces determinism in the face of data races and deadlocks. Dthreads works by exploding multithreaded applications into multiple processes, with private, copy-on-write mappings to shared memory. It uses standard virtual memory protection to track writes, and deterministically orders updates by each thread. By separating updates from different threads, Dthreads has the additional benefit of eliminating false sharing. Experimental results show that Dthreads substantially outperforms a state-of-the-art deterministic runtime system, and for a majority of the benchmarks evaluated here, matches and occasionally exceeds the performance of pthreads.
TL;DR: It is formally prove that Synoptic always produces a model that satisfies exactly the temporal invariants mined from the log, and it is argued that it does so efficiently.
Abstract: Computer systems are often difficult to debug and understand. A common way of gaining insight into system behavior is to inspect execution logs and documentation. Unfortunately, manual inspection of logs is an arduous process and documentation is often incomplete and out of sync with the implementation.This paper presents Synoptic, a tool that helps developers by inferring a concise and accurate system model. Unlike most related work, Synoptic does not require developer-written scenarios, specifications, negative execution examples, or other complex user input. Synoptic processes the logs most systems already produce and requires developers only to specify a set of regular expressions for parsing the logs.Synoptic has two unique features. First, the model it produces satisfies three kinds of temporal invariants mined from the logs, improving accuracy over related approaches. Second, Synoptic uses refinement and coarsening to explore the space of models. This improves model efficiency and precision, compared to using just one approach.In this paper, we formally prove that Synoptic always produces a model that satisfies exactly the temporal invariants mined from the log, and we argue that it does so efficiently. We empirically evaluate Synoptic through two user experience studies, one with a developer of a large, real-world system and another with 45 students in a distributed systems course. Developers used Synoptic-generated models to verify known bugs, diagnose new bugs, and increase their confidence in the correctness of their systems. None of the developers in our evaluation had a background in formal methods but were able to easily use Synoptic and detect implementation bugs in as little as a few minutes.
TL;DR: This paper proposes a roadmap towards developing a systematic diagnosing framework for debugging ebugs on smartphones and presents a taxonomy of the kinds of ebugs based on mining over 39K posts from 4 online mobile user forum and mobile OS bug repositories.
Abstract: This paper argues that a new class of bugs faced by millions of smartphones, energy bugs or ebugs, have become increasingly prominent that already they have led to significant user frustrations. We take a first look at this emerging important technical challenge faced by the smartphones, ebugs, broadly defined as an error in the system (application, OS, hardware, firmware, external conditions or combination) that causes an unexpected amount of high energy consumption by the system as a whole. We first present a taxonomy of the kinds of ebugs based on mining over 39K posts (1.2M before filtering) from 4 online mobile user forum and mobile OS bug repositories. The taxonomy shows the highly diverse nature of smartphone ebugs. We then propose a roadmap towards developing a systematic diagnosing framework for debugging ebugs on smartphones.
TL;DR: A tool, LogEnhancer, is described that automatically "enhances" existing logging code to aid in future post-failure debugging and can dramatically reduce the set of potential root failure causes that must be considered during diagnosis while imposing negligible overheads.
Abstract: Diagnosing software failures in the field is notoriously difficult, in part due to the fundamental complexity of trouble-shooting any complex software system, but further exacerbated by the paucity of information that is typically available in the production setting. Indeed, for reasons of both overhead and privacy, it is common that only the run-time log generated by a system (e.g., syslog) can be shared with the developers. Unfortunately, the ad-hoc nature of such reports are frequently insufficient for detailed failure diagnosis. This paper seeks to improve this situation within the rubric of existing practice. We describe a tool, LogEnhancer that automatically "enhances" existing logging code to aid in future post-failure debugging. We evaluate LogEnhancer on eight large, real-world applications and demonstrate that it can dramatically reduce the set of potential root failure causes that must be considered during diagnosis while imposing negligible overheads.
TL;DR: YCSB++ is described, a set of extensions to the Yahoo! Cloud Serving Benchmark that includes multi-tester coordination for increased load and eventual consistency measurement, multi-phase workloads to quantify the consequences of work deferment and the benefits of anticipatory configuration optimization, and abstract APIs for explicit incorporation of advanced features in benchmark tests.
Abstract: Inspired by Google's BigTable, a variety of scalable, semi-structured, weak-semantic table stores have been developed and optimized for different priorities such as query speed, ingest speed, availability, and interactivity. As these systems mature, performance benchmarking will advance from measuring the rate of simple workloads to understanding and debugging the performance of advanced features such as ingest speed-up techniques and function shipping filters from client to servers. This paper describes YCSB++, a set of extensions to the Yahoo! Cloud Serving Benchmark (YCSB) to improve performance understanding and debugging of these advanced features. YCSB++ includes multi-tester coordination for increased load and eventual consistency measurement, multi-phase workloads to quantify the consequences of work deferment and the benefits of anticipatory configuration optimization such as B-tree pre-splitting or bulk loading, and abstract APIs for explicit incorporation of advanced features in benchmark tests. To enhance performance debugging, we customized an existing cluster monitoring tool to gather the internal statistics of YCSB++, table stores, system services like HDFS, and operating systems, and to offer easy post-test correlation and reporting of performance behaviors. YCSB++ features are illustrated in case studies of two BigTable-like table stores, Apache HBase and Accumulo, developed to emphasize high ingest rates and finegrained security.
TL;DR: ConSeq's backwards approach, (3)!(2)!(1), provides advantages in bug-detection coverage and accuracy but is challenging to carry out, because phases (2) and (3) usually are short and occur within one thread.
Abstract: Concurrency bugs are caused by non-deterministic interleavings between shared memory accesses. Their effects propagate through data and control dependences until they cause software to crash, hang, produce incorrect output, etc. The lifecycle of a bug thus consists of three phases: (1) triggering, (2) propagation, and (3) failure.Traditional techniques for detecting concurrency bugs mostly focus on phase (1)--i.e., on finding certain structural patterns of interleavings that are common triggers of concurrency bugs, such as data races. This paper explores a consequence-oriented approach to improving the accuracy and coverage of state-space search and bug detection. The proposed approach first statically identifies potential failure sites in a program binary (i.e., it first considers a phase (3) issue). It then uses static slicing to identify critical read instructions that are highly likely to affect potential failure sites through control and data dependences (phase (2)). Finally, it monitors a single (correct) execution of a concurrent program and identifies suspicious interleavings that could cause an incorrect state to arise at a critical read and then lead to a software failure (phase (1)).ConSeq's backwards approach, (3)!(2)!(1), provides advantages in bug-detection coverage and accuracy but is challenging to carry out. ConSeq makes it feasible by exploiting the empirical observationthat phases (2) and (3) usually are short and occur within one thread. Our evaluation on large, real-world C/C++ applications shows that ConSeq detects more bugs than traditional approaches and has a much lower false-positive rate.
TL;DR: An automated approach for generating likely bug fixes using behavioral specifications to replace a faulty statement that has deterministic behavior with one that has nondeterministic behavior, and to use the specification constraints to prune the ensuing nondeterminism and repair the faulty statement.
Abstract: Removing bugs in programs - even when location of faulty statements is known - is tedious and error-prone, particularly because of the increased likelihood of introducing new bugs as a result of fixing known bugs. We present an automated approach for generating likely bug fixes using behavioral specifications. Our key insight is to replace a faulty statement that has deterministic behavior with one that has nondeterministic behavior, and to use the specification constraints to prune the ensuing nondeterminism and repair the faulty statement. As an enabling technology, we use the SAT-based Alloy tool-set to describe specification constraints as well as for solving them. Initial experiments show the effectiveness of our approach in repairing programs that manipulate structurally complex data. We believe specification-based automated debugging using SAT holds much promise.
TL;DR: An algorithm for generating debugging aid information called witnesses, which are concrete thread schedules that can deterministically trigger the data races, is proposed and precisely encodes the sequential consistency semantics using a scalable predictive model to ensure that the reported witness is always feasible.
Abstract: Data race is one of the most dangerous errors in multithreaded programming, and despite intensive studies, it remains a notorious cause of failures in concurrent systems. Detecting data races is already a hard problem, and yet it is even harder for a programmer to decide whether or how a reported data race can appear in the actual program execution. In this paper we propose an algorithm for generating debugging aid information called witnesses, which are concrete thread schedules that can deterministically trigger the data races. More specifically, given a concrete execution trace, e.g. non-erroneous one which may have triggered a warning in Eraser-style data race detectors, we use a symbolic analysis based on SMT solvers to search for a data race witness among alternative interleavings of events of that trace. Our symbolic analysis precisely encodes the sequential consistency semantics using a scalable predictive model to ensure that the reported witness is always feasible.
TL;DR: In this paper, the authors describe a debug system that generates hardware elements from normally non-synthesizable code elements for placement on an FPGA device, called a Behavior Processor.
Abstract: The debug system described in this patent specification provides a system that generates hardware elements from normally non-synthesizable code elements for placement on an FPGA device. This particular FPGA device is called a Behavior Processor. This Behavior Processor executes in hardware those code constructs that were previously executed in software. When some condition is satisfied (e.g., If . . . then . . . else loop) which requires some intervention by the workstation or the software model, the Behavior Processor works with an Xtrigger device to send a callback signal to the workstation for immediate response.
TL;DR: A novel debugging method for imperative software, featuring both automatic error localization and correction, that can handle all sorts of incorrect expressions, not only under a single-fault assumption but also for multiple faults.
Abstract: In this paper, we present a novel debugging method for imperative software, featuring both automatic error localization and correction. The input of our method is an incorrect program and a corresponding specification, which can be given in form of assertions or as a reference implementation. We use symbolic execution for program analysis. This allows for a wide range of different trade-offs between resource requirements and accuracy of results. Our error localization method rests upon model-based diagnosis and SMT-solving. Error correction is done using a template-based approach which ensures that the computed repairs are readable. Our method can handle all sorts of incorrect expressions, not only under a single-fault assumption but also for multiple faults. Finally, we present experimental results, where an implementation for C programs is used to debug mutants of the TCAS case study of the Siemens suite.
TL;DR: This work presents an integrated method for program proving, testing, and debugging using the concept of metamorphic relations that extrapolates from the correctness of a program for tested inputs to the Correctness of the program for related untested inputs.
Abstract: We present an integrated method for program proving, testing, and debugging. Using the concept of metamorphic relations, we select necessary properties for target programs. For programs where global symbolic evaluation can be conducted and the constraint expressions involved can be solved, we can either prove that these necessary conditions for program correctness are satisfied or identify all inputs that violate the conditions. For other programs, our method can be converted into a symbolic-testing approach. Our method extrapolates from the correctness of a program for tested inputs to the correctness of the program for related untested inputs. The method supports automatic debugging through the identification of constraint expressions that reveal failures.
TL;DR: The design choices underlying this general tool, which helps researchers capture usage context in their deployments in an extendible way, are described and the tool's overheads are evaluated in terms of phone resources and data.
Abstract: By deploying several research applications, we found that capturing usage context (e.g., CPU and memory) is highly valuable for debugging and interpreting results, even if that context information appears unrelated to the application. We have developed a general tool called SystemSens to help researchers capture usage context in their deployments in an extendible way. This paper describes and motivates the design choices underlying our tool and evaluates its overheads in terms of phone resources and data.
TL;DR: This paper introduces a specification-based technique called LeakChaser, a three-tier approach that uses varying levels of abstraction to assist programmers with different skill levels and code familiarity to find leaks, and implemented it in Jikes RVM and used it to help diagnose several real-world leaks.
Abstract: In large programs written in managed languages such as Java and C#, holding unnecessary references often results in memory leaks and bloat, degrading significantly their run-time performance and scalability. Despite the existence of many leak detectors for such languages, these detectors often target low-level objects; as a result, their reports contain many false warnings and lack sufficient semantic information to help diagnose problems. This paper introduces a specification-based technique called LeakChaser that can not only capture precisely the unnecessary references leading to leaks, but also explain, with high-level semantics, why these references become unnecessary.At the heart of LeakChaser is a three-tier approach that uses varying levels of abstraction to assist programmers with different skill levels and code familiarity to find leaks. At the highest tier of the approach, the programmer only needs to specify the boundaries of coarse-grained activities, referred to as transactions. The tool automatically infers liveness properties of these transactions, by monitoring the execution, in order to find unnecessary references. Diagnosis at this tier can be performed by any programmer after inspecting the APIs and basic modules of a program, without understanding of the detailed implementation of these APIs. At the middle tier, the programmer can introduce application-specific semantic information by specifying properties for the transactions. At the lowest tier of the approach is a liveness checker that does not rely on higher-level semantic information, but rather allows a programmer to assert lifetime relationships for pairs of objects. This task could only be performed by skillful programmers who have a clear understanding of data structures and algorithms in the program.We have implemented LeakChaser in Jikes RVM and used it to help us diagnose several real-world leaks. The implementation incurs a reasonable overhead for debugging and tuning. Our case studies indicate that the implementation is powerful in guiding programmers with varying code familiarity to find the root causes of several memory leaks---even someone who had not studied a leaking program can quickly find the cause after using LeakChaser's iterative process that infers and checks properties with different levels of semantic information.
TL;DR: In this article, a runtime system implemented in accordance with the present invention provides an application platform for parallel-processing computer systems, which enables users to leverage the computational power of parallel processing computer systems to accelerate/optimize numeric and array-intensive computations in their application programs.
Abstract: A runtime system implemented in accordance with the present invention provides an application platform for parallel-processing computer systems. Such a runtime system enables users to leverage the computational power of parallel-processing computer systems to accelerate/optimize numeric and array-intensive computations in their application programs. This enables greatly increased performance of high-performance computing (HPC) applications.
TL;DR: In this article, the authors propose two query selection strategies: a simple split-in-half strategy and an entropy-based strategy, which allows knowledge about typical user errors to be exploited to minimize the number of queries.
Abstract: Effective debugging of ontologies is an important prerequisite for their broad application, especially in areas that rely on everyday users to create and maintain knowledge bases, such as the Semantic Web. In such systems ontologies capture formalized vocabularies of terms shared by its users. However in many cases users have different local views of the domain, i.e. of the context in which a given term is used. Inappropriate usage of terms together with natural complications when formulating and understanding logical descriptions may result in faulty ontologies. Recent ontology debugging approaches use diagnosis methods to identify causes of the faults. In most debugging scenarios these methods return many alternative diagnoses, thus placing the burden of fault localization on the user. This paper demonstrates how the target diagnosis can be identified by performing a sequence of observations, that is, by querying an oracle about entailments of the target ontology. To identify the best query we propose two query selection strategies: a simple "split-in-half" strategy and an entropy-based strategy. The latter allows knowledge about typical user errors to be exploited to minimize the number of queries. Our evaluation showed that the entropy-based method significantly reduces the number of required queries compared to the "split-in-half" approach. We experimented with different probability distributions of user errors and different qualities of the a-priori probabilities. Our measurements demonstrated the superiority of entropy-based query selection even in cases where all fault probabilities are equal, i.e. where no information about typical user errors is available.
TL;DR: QVM efficiently detects defects by continuously monitoring the execution of the application in a production setting and enables the efficient checking of violations of user-specified correctness properties, that is, typestate safety properties, Java assertions, and heap properties pertaining to ownership.
Abstract: Coping with software defects that occur in the post-deployment stage is a challenging problem: bugs may occur only when the system uses a specific configuration and only under certain usage scenarios. Nevertheless, halting production systems until the bug is tracked and fixed is often impossible. Thus, developers have to try to reproduce the bug in laboratory conditions. Often, the reproduction of the bug takes most of the debugging effort.In this paper we suggest an approach to address this problem by using a specialized runtime environment called Quality Virtual Machine (QVM). QVM efficiently detects defects by continuously monitoring the execution of the application in a production setting. QVM enables the efficient checking of violations of user-specified correctness properties, that is, typestate safety properties, Java assertions, and heap properties pertaining to ownership. QVM is markedly different from existing techniques for continuous monitoring by using a novel overhead manager which enforces a user-specified overhead budget for quality checks. Existing tools for error detection in the field usually disrupt the operation of the deployed system. QVM, on the other hand, provides a balanced trade-off between the cost of the monitoring process and the maintenance of sufficient accuracy for detecting defects. Specifically, the overhead cost of using QVM instead of a standard JVM, is low enough to be acceptable in production environments.We implemented QVM on top of IBM’s J9 Java Virtual Machine and used it to detect and fix various errors in real-world applications.
TL;DR: A more accurate restoration capacity metric is proposed, based on simulation information, and a novel algorithm that overcomes some key shortcomings of previous solutions is presented, showing that the technique provides up to 34% better state restoration compared to all previous techniques while showing a much better trend with increasing trace buffer size.
Abstract: Post-silicon validation has become a crucial part of modern integrated circuit design to capture and eliminate functional bugs that escape pre-silicon verification. The most critical roadblock in post-silicon validation is the limited observability of internal signals of a design, since this aspect hinders the ability to diagnose detected bugs. A solution to address this issue leverage trace buffers: these are register buffers embedded into the design with the goal of recording the value of a small number of state elements, over a time interval, triggered by a user-specified event. Due to the trace buffer's area overhead, only a very small fraction of signals can be traced. Thus, the selection of which signals to trace is of paramount importance in post-silicon debugging and diagnosis. Ideally, we would like to select signals enabling the maximum amount of reconstruction of internal signal values. Several signal selection algorithms for post-silicon debug have been proposed in the literature: they rely on a probability-based state-restoration capacity metric coupled with a greedy algorithm. In this work we propose a more accurate restoration capacity metric, based on simulation information, and present a novel algorithm that overcomes some key shortcomings of previous solutions. We show that our technique provides up to 34% better state restoration compared to all previous techniques while showing a much better trend with increasing trace buffer size.
TL;DR: This paper first analyzes some real data to observe the trends of FRF, and considers FRF to be a time-variable function, and further study how to integrate time- variable FRF into software reliability growth modeling.
TL;DR: This paper proposes RACEZ, a novel race detection mechanism which uses a sampled memory trace collected by the hardware performance monitoring unit rather than invasive instrumentation which catches a set of known bugs with reasonable probability and introduces only a modest overhead.
Abstract: Concurrency bugs, particularly data races, are notoriously difficult to debug and are a significant source of unreliability in multithreaded applications. Many tools to catch data races rely on program instrumentation to obtain memory instruction traces. Unfortunately, this instrumentation introduces significant runtime overhead, is extremely invasive, or has a limited domain of applicability making these tools unsuitable for many production systems. Consequently, these tools are typically used during application testing where many data races go undetected. This paper proposes RACEZ , a novel race detection mechanism which uses a sampled memory trace collected by the hardware performance monitoring unit rather than invasive instrumentation. The approach introduces only a modest overhead making it usable in production environments. We validate RACEZ using two open source server applications and the PARSEC benchmarks. Our experiments show that RACEZ catches a set of known bugs with reasonable probability while introducing only 2.8% runtime slow down on average.
TL;DR: A Concurrency bug detector that automatically identifies when an execution of a program triggers a concurrency bug, and was able to find several concurrency bugs in a stable version of the application, including subtle violations of application semantics, latent bugs, and incorrect error replies.
Abstract: Parallel software is increasingly necessary to take advantage of multi-core architectures, but it is also prone to concurrency bugs which are particularly hard to avoid, find, and fix, since their occurrence depends on specific thread interleavings. In this paper we propose a concurrency bug detector that automatically identifies when an execution of a program triggers a concurrency bug. Unlike previous concurrency bug detectors, we are able to find two particularly hard classes of bugs. The first are bugs that manifest themselves by subtle violation of application semantics, such as returning an incorrect result. The second are latent bugs, which silently corrupt internal data structures, and are especially hard to detect because when these bugs are triggered they do not become immediately visible. Pike detects these concurrency bugs by checking both the output and the internal state of the application for linearizability at the level of user requests. This paper presents this technique for finding concurrency bugs, its application in the context of a testing tool that systematically searches for such problems, and our experience in applying our approach to MySQL, a large-scale complex multi-threaded application. We were able to find several concurrency bugs in a stable version of the application, including subtle violations of application semantics, latent bugs, and incorrect error replies.
TL;DR: Cyberlock as mentioned in this paper provides a technique to analyze malware programs on a network in a secure manner, which allows multiple users to access or monitor the analysis, and is used for debugging and testing potential virus, trojans, and other malware programs.
Abstract: The present invention relates to a technique for debugging and testing potential virus, trojans, and other malware programs. The present invention, named Cyberlock™ provides a technique to analyze malware programs on a network in a secure manner, which allows multiple users to access or monitor the analysis. In the present invention, a virtual machine (VM) may be run on a network, emulating the operation of a Windows, LINUX, or Apple operating system (or other O/S), and the malware or suspected malware may be executed on that virtual machine. The virtual machine is isolated on the network, but accessible to one or more users, in such a manner than the malware or suspected malware may be analyzed.
TL;DR: This paper implements both guest-wide and system-wide profiling for a VMM based on the x86 hardware virtualization extensions and system -wide profilingFor a V MM based on binary translation, demonstrating that these profilers provide good accuracy with only limited overhead.
Abstract: Profilers based on hardware performance counters are indispensable for performance debugging of complex software systems. All modern processors feature hardware performance counters, but current virtual machine monitors (VMMs) do not properly expose them to the guest operating systems. Existing profiling tools require privileged access to the VMM to profile the guest and are only available for VMMs based on paravirtualization. Diagnosing performance problems of software running in a virtualized environment is therefore quite difficult.This paper describes how to extend VMMs to support performance profiling. We present two types of profiling in a virtualized environment: guest-wide profiling and system-wide profiling. Guest-wide profiling shows the runtime behavior of a guest. The profiler runs in the guest and does not require privileged access to the VMM. System-wide profiling exposes the runtime behavior of both the VMM and any number of guests. It requires profilers both in the VMM and in those guests.Not every VMM has the right architecture to support both types of profiling. We determine the requirements for each of them, and explore the possibilities for their implementation in virtual machines using hardware assistance, paravirtualization, and binary translation.We implement both guest-wide and system-wide profiling for a VMM based on the x86 hardware virtualization extensions and system-wide profiling for a VMM based on binary translation. We demonstrate that these profilers provide good accuracy with only limited overhead.
TL;DR: The results of the study suggest that the overall number of design pattern instances is not correlated to defect frequency and debugging effectiveness, however, specific design patterns appear to have a significant impact on the number of reported bugs and debugging rate.
Abstract: In this paper, we investigate the correlation between design pattern application and software defects. In order to achieve this goal we conducted an empirical study on java open source games. More specifically, we examined several successful open source games, identified the number of defects, the debugging rate and performed design pattern related measurements. The results of the study suggest that the overall number of design pattern instances is not correlated to defect frequency and debugging effectiveness. However, specific design patterns appear to have a significant impact on the number of reported bugs and debugging rate.