TL;DR: It was found that, while Copilot did not necessarily improve the task completion time or success rate, most participants preferred to use Copilot in daily programming tasks, since Copilot often provided a useful starting point and saved the effort of searching online.
Abstract: Recent advances in Large Language Models (LLM) have made automatic code generation possible for real-world programming tasks in general-purpose programming languages such as Python. However, there are few human studies on the usability of these tools and how they fit the programming workflow. In this work, we conducted a within-subjects user study with 24 participants to understand how programmers use and perceive Copilot, a LLM-based code generation tool. We found that, while Copilot did not necessarily improve the task completion time or success rate, most participants preferred to use Copilot in daily programming tasks, since Copilot often provided a useful starting point and saved the effort of searching online. However, participants did face difficulties in understanding, editing, and debugging code snippets generated by Copilot, which significantly hindered their task-solving effectiveness. Finally, we highlighted several promising directions for improving the design of Copilot based on our observations and participants’ feedback.
Abstract: Prototyping is notoriously difficult to do with machine learning (ML), but recent advances in large language models may lower the barriers to people prototyping with ML, through the use of natural language prompts. This case study reports on the real-world experiences of industry professionals (e.g. designers, program managers, front-end developers) prototyping new ML-powered feature ideas via prompt-based prototyping. Through interviews with eleven practitioners during a three-week sprint and a workshop, we find that prompt-based prototyping reduced barriers of access by substantially broadening who can prototype with ML, sped up the prototyping process, and grounded communication between collaborators. Yet, it also introduced new challenges, such as the need to reverse-engineer prompt designs, source example data, and debug and evaluate prompt effectiveness. Taken together, this case study provides important implications that lay the groundwork toward a new future of prototyping with ML.
TL;DR: plotgardener as mentioned in this paper is a coordinate-based genomic data visualization package that offers a new paradigm for multi-plot figure generation in R. Plotgardener allows precise, programmatic control over the placement, esthetics and arrangements of plots while maximizing user experience through fast and memory-efficient data access, support for a wide variety of data and file types, and tight integration with the Bioconductor environment.
Abstract: The R programming language is one of the most widely used programming languages for transforming raw genomic datasets into meaningful biological conclusions through analysis and visualization, which has been largely facilitated by infrastructure and tools developed by the Bioconductor project. However, existing plotting packages rely on relative positioning and sizing of plots, which is often sufficient for exploratory analysis but is poorly suited for the creation of publication-quality multi-panel images inherent to scientific manuscript preparation.We present plotgardener, a coordinate-based genomic data visualization package that offers a new paradigm for multi-plot figure generation in R. Plotgardener allows precise, programmatic control over the placement, esthetics and arrangements of plots while maximizing user experience through fast and memory-efficient data access, support for a wide variety of data and file types, and tight integration with the Bioconductor environment. Plotgardener also allows precise placement and sizing of ggplot2 plots, making it an invaluable tool for R users and data scientists from virtually any discipline.Package: https://bioconductor.org/packages/plotgardener, Code: https://github.com/PhanstielLab/plotgardener, Documentation: https://phanstiellab.github.io/plotgardener/.Supplementary data are available at Bioinformatics online.
TL;DR: This paper provides a historical overview of HIL simulations through different engineering challenges, i.e., within automotive, power electronics systems, and different industrial drives, within MATLAB Simulink Real-Time toolboxes and Speedgoat hardware systems.
Abstract: The design of modern industrial products is further improved through the hardware-in-the-loop (HIL) simulation. Realistic simulation is enabled by the closed loop between the hardware under test (HUT) and real-time simulation. Such a system involves a field programmable gate array (FPGA) and digital signal processor (DSP). An HIL model can bypass serious damage to the real object, reduce debugging cost, and, finally, reduce the comprehensive effort during the testing. This paper provides a historical overview of HIL simulations through different engineering challenges, i.e., within automotive, power electronics systems, and different industrial drives. Various platforms, such as National Instruments, dSPACE, Typhoon HIL, or MATLAB Simulink Real-Time toolboxes and Speedgoat hardware systems, offer a powerful tool for efficient and successful investigations in different fields. Therefore, HIL simulation practice must begin already during the university’s education process to prepare the students for professional engagements in the industry, which was also verified experimentally at the end of the paper.
TL;DR: A natural language code synthesis tool, GenLine, backed by a large generative language model and a set of task-specific prompts that create or change code is presented, indicating that while naturallanguage code synthesis can sometimes provide a magical experience, participants still faced challenges.
Abstract: In this paper, we present a natural language code synthesis tool, GenLine, backed by 1) a large generative language model and 2) a set of task-specific prompts that create or change code. To understand the user experience of natural language code synthesis with these new types of models, we conducted a user study in which participants applied GenLine to two programming tasks. Our results indicate that while natural language code synthesis can sometimes provide a magical experience, participants still faced challenges. In particular, participants felt that they needed to learn the model’s “syntax,” despite their input being natural language. Participants also struggled to form an accurate mental model of the types of requests the model can reliably translate and developed a set of strategies to debug model input. From these findings, we discuss design implications for future natural language code synthesis tools built using large generative language models.
TL;DR: TACTO as discussed by the authors is an open-source simulator for vision-based tactile sensors, which allows to render realistic high-resolution touch readings at hundreds of frames per second, and can be easily configured to simulate different vision based tactile sensors.
Abstract: Simulators perform an important role in prototyping, debugging, and benchmarking new advances in robotics and learning for control. Although many physics engines exist, some aspects of the real world are harder than others to simulate. One of the aspects that have so far eluded accurate simulation is touch sensing. To address this gap, we present TACTO - a fast, flexible, and open-source simulator for vision-based tactile sensors. This simulator allows to render realistic high-resolution touch readings at hundreds of frames per second, and can be easily configured to simulate different vision-based tactile sensors, including DIGIT and OmniTact. In this paper, we detail the principles that drove the implementation of TACTO and how they are reflected in its architecture. We demonstrate TACTO on a perceptual task, by learning to predict grasp stability using touch from 1 million grasps, and on a marble manipulation control task. Moreover, we provide a proof-of-concept that TACTO can be successfully used for Sim2Real applications. We believe that TACTO is a step towards the widespread adoption of touch sensing in robotic applications, and to enable machine learning practitioners interested in multi-modal learning and control. TACTO is open-source at https://github.com/facebookresearch/tacto.
TL;DR: In this article, a scalable quantification of artifactual and poisoned classes where the machine learning models under study exhibit Clever Hans behavior is proposed, and several approaches are collectively termed as Class Artifact Compensation, which are able to effectively reduce a model's Clever Hans behaviour.
TL;DR: AdaTest, a process which uses large scale language models in partnership with human feedback to automatically write unit tests highlighting bugs in a target model, makes users 5-10x more effective at finding bugs than current approaches, and helps users effectively fix bugs without adding new bugs.
Abstract: Current approaches to testing and debugging NLP models rely on highly variable human creativity and extensive labor, or only work for a very restrictive class of bugs. We present AdaTest, a process which uses large scale language models (LMs) in partnership with human feedback to automatically write unit tests highlighting bugs in a target model. Such bugs are then addressed through an iterative text-fix-retest loop, inspired by traditional software development. In experiments with expert and non-expert users and commercial / research models for 8 different tasks, AdaTest makes users 5-10x more effective at finding bugs than current approaches, and helps users effectively fix bugs without adding new bugs.
TL;DR: A comprehensive survey of app review analysis can be found in this article , covering 182 papers published between 2012 and 2020, covering mined information and applied data mining techniques, and most importantly, in terms of supported software engineering activities.
Abstract: Abstract App reviews found in app stores can provide critically valuable information to help software engineers understand user requirements and to design, debug, and evolve software products. Over the last ten years, a vast amount of research has been produced to study what useful information might be found in app reviews, and how to mine and organise such information as efficiently as possible. This paper presents a comprehensive survey of this research, covering 182 papers published between 2012 and 2020. This survey classifies app review analysis not only in terms of mined information and applied data mining techniques but also, and most importantly, in terms of supported software engineering activities. The survey also reports on the quality and results of empirical evaluation of existing techniques and identifies important avenues for further research. This survey can be of interest to researchers and commercial organisations developing app review analysis techniques and to software engineers considering to use app review analysis.
TL;DR: X-NeSyL as discussed by the authors proposes to fuse DL representations with expert domain knowledge during the learning process so it serves as a sound basis for explainability, and demonstrate that with their approach, it is possible to improve explainability at the same time as performance.
TL;DR: In this article , the authors provide a comprehensive literature review to characterize the current research state of serverless computing, including performance optimization, programming framework, application migration, multi-cloud development, testing and debugging, etc.
Abstract: Serverless computing is an emerging cloud computing paradigm, being adopted to develop a wide range of software applications. It allows developers to focus on the application logic in the granularity of function, thereby freeing developers from tedious and error-prone infrastructure management. Meanwhile, its unique characteristic poses new challenges to the development and deployment of serverless-based applications. To tackle these challenges, enormous research efforts have been devoted. This paper provides a comprehensive literature review to characterize the current research state of serverless computing. Specifically, this paper covers 164 papers on 17 research directions of serverless computing, including performance optimization, programming framework, application migration, multi-cloud development, testing and debugging, etc. It also derives research trends, focus, and commonly-used platforms for serverless computing, as well as promising research opportunities.
TL;DR: This work conducts the most large-scale study on 800 bugs from four popular and diverse DL frameworks and obtains 14 major findings for the comprehensive understanding of DL framework bugs and the current status of existing DL framework testing and debugging practice.
Abstract: DL frameworks are the basis of constructing all DL programs and models, and thus their bugs could lead to the unexpected behaviors of any DL program or model relying on them. Such a wide effect demonstrates the necessity and importance of guaranteeing DL frameworks’ quality. Understanding the characteristics of DL framework bugs is a fundamental step for this quality assurance task, facilitating designing effective bug detection and debugging approaches. Hence, in this work we conduct the most large-scale study on 1,000 bugs from four popular and diverse DL frameworks (i.e., TensorFlow, PyTorch, MXNet, and DL4J). By analyzing the root causes and symptoms of DL framework bugs associated with 5 components decomposed from DL frameworks, as well as measuring test coverage achieved by three state-of-the-art testing techniques, we obtain 12 major findings for the comprehensive understanding of DL framework bugs and the current status of existing DL framework testing practice, and then provide a series of actionable guidelines for better DL framework bug detection and debugging. Finally, based on the guidelines, we design and implement a prototype DL-framework testing tool, called TenFuzz, which is evaluated to be effective and finds 3 unknown bugs on the latest TensorFlow framework in a preliminary study, indicating the significance of our guidelines.
TL;DR: A conceptual map of research where explanations are combined with interactive capabilities as a mean to learn new models from scratch and to edit and debug existing ones is drawn, highlighting similarities and differences between them.
Abstract: Explanations have gained an increasing level of interest in the AI and Machine Learning (ML) communities in order to improve model transparency and allow users to form a mental model of a trained ML model. However, explanations can go beyond this one way communication as a mechanism to elicit user control, because once users understand, they can then provide feedback. The goal of this paper is to present an overview of research where explanations are combined with interactive capabilities as a mean to learn new models from scratch and to edit and debug existing ones. To this end, we draw a conceptual map of the state-of-the-art, grouping relevant approaches based on their intended purpose and on how they structure the interaction, highlighting similarities and differences between them. We also discuss open research issues and outline possible directions forward, with the hope of spurring further research on this blooming research topic.
TL;DR: A T5-based APR framework equipped with continual learning ability across multiple programming languages is proposed, namely ContInual Repair aCross Programming LanguagEs (CIRCLE), which utilizes a prompting function to narrow the gap between natural language processing (NLP) pre-trained tasks and APR.
Abstract: Automatic Program Repair (APR) aims at fixing buggy source code with less manual debugging efforts, which plays a vital role in improving software reliability and development productivity. Recent APR works have achieved remarkable progress via applying deep learning (DL), particularly neural machine translation (NMT) techniques. However, we observe that existing DL-based APR models suffer from at least two severe drawbacks: (1) Most of them can only generate patches for a single programming language, as a result, to repair multiple languages, we have to build and train many repairing models. (2) Most of them are developed offline. Therefore, they won’t function when there are new-coming requirements. To address the above problems, a T5-based APR framework equipped with continual learning ability across multiple programming languages is proposed, namely ContInual Repair aCross Programming LanguagEs (CIRCLE). Specifically, (1) CIRCLE utilizes a prompting function to narrow the gap between natural language processing (NLP) pre-trained tasks and APR. (2) CIRCLE adopts a difficulty-based rehearsal strategy to achieve lifelong learning for APR without access to the full historical data. (3) An elastic regularization method is employed to strengthen CIRCLE’s continual learning ability further, preventing it from catastrophic forgetting. (4) CIRCLE applies a simple but effective re-repairing method to revise generated errors caused by crossing multiple programming languages. We train CIRCLE for four languages (i.e., C, JAVA, JavaScript, and Python) and evaluate it on five commonly used benchmarks. The experimental results demonstrate that CIRCLE not only effectively and efficiently repairs multiple programming languages in continual learning settings, but also achieves state-of-the-art performance (e.g., fixes 64 Defects4J bugs) with a single repair model.
TL;DR: Scenic as discussed by the authors is a probabilistic programming language for the design and analysis of cyber-physical systems, especially those based on machine learning, which can be used to write environment models, an essential prerequisite to any formal analysis.
Abstract: Abstract We propose a new probabilistic programming language for the design and analysis of cyber-physical systems, especially those based on machine learning. We consider several problems arising in the design process, including training a system to be robust to rare events, testing its performance under different conditions, and debugging failures. We show how a probabilistic programming language can help address these problems by specifying distributions encoding interesting types of inputs, then sampling these to generate specialized training and test data. More generally, such languages can be used to write environment models, an essential prerequisite to any formal analysis. In this paper, we focus on systems such as autonomous cars and robots, whose environment at any point in time is a scene , a configuration of physical objects and agents. We design a domain-specific language, Scenic , for describing scenarios that are distributions over scenes and the behaviors of their agents over time. Scenic combines concise, readable syntax for spatiotemporal relationships with the ability to declaratively impose hard and soft constraints over the scenario. We develop specialized techniques for sampling from the resulting distribution, taking advantage of the structure provided by Scenic ’s domain-specific syntax. Finally, we apply Scenic in multiple case studies for training, testing, and debugging neural networks for perception both as standalone components and within the context of a full cyber-physical system.
TL;DR: In this article , a review of the existing literature on computational thinking activities in primary mathematics education is presented and analyzed, revealing a recent increased focus on the inclusion of Computational Thinking in primary Mathematics classrooms.
Abstract: Abstract Computational thinking (CT) has acquired the status of a necessary 21st-century skill and is currently being introduced in school curricula around the world, despite a lack of consensus about what it entails. The aims of this review are to provide an overview of the existing literature on CT activities in primary mathematics education, and to articulate how it is integrated into the teaching and learning of primary mathematics. This systematic review presents and analyses the findings of 10 empirical studies, revealing a recent increased focus on the inclusion of CT in primary mathematics classrooms, as most studies are published around 2020. Our findings indicate two categories of such activities, one focusing on skills (such as mainly sequencing, looping, conditionals, debugging, decomposition, and abstraction) and one on process-oriented activities (communication, creativity, exploration, and engagement). Furthermore, we found that, while there are studies reporting on mathematics being taught directly through CT activities (full integration), in most studies, the mathematics content was emphasised, with CT built in as a way for students to demonstrate their understanding of mathematics concepts (partial integration). This review identifies current gaps in the field and the need to investigate further such process-oriented activities, the use of these activities in accelerated mathematics, and the need for different methodological approaches in primary mathematics.
TL;DR: A review of machine learning-assisted synthesis of nanoparticles can be found in this paper , where the authors describe the working steps of the machine learning algorithms, the main algorithms, and the main ways to obtain datasets.
Abstract: Abstract Nanoparticles play irreplaceable roles in optoelectronic sensing, medical therapy, material science, and chemistry due to their unique properties. There are many synthetic pathways used for the preparation of nanoparticles, and different synthetic pathways can produce nanoparticles with different properties. Therefore, it is crucial to control the properties of nanoparticles precisely to impart the desired functions. In general, the properties of nanoparticles are influenced by their sizes and morphologies. Current technology for the preparation of nanoparticles on microfluidic chips requires repeated experimental debugging and significant resources to synthesize nanoparticles with precisely the desired properties. Machine learning-assisted synthesis of nanoparticles is a sensible choice for addressing this challenge. In this paper, we review many recent studies on syntheses of nanoparticles assisted by machine learning. Moreover, we describe the working steps of machine learning, the main algorithms, and the main ways to obtain datasets. Finally, we discuss the current problems of this research and provide an outlook.
TL;DR: It is found that the problem-based digital making environment supported the students’ development of critical thinking in the form of critical modeling and critical data handling, creativity in the shape of creative explorations, creative solutions, and creative expressions, and communication and collaboration in the forms of communicative scaffolding and collaborative debugging.
TL;DR: This work proposes a novel approach called TRANSFER, which lever-ages the deep semantic features and transferred knowledge from open-source data to improve fault localization and program repair and outperforms all baselines in fault localization, and is better than existing deep-learning methods in automated program repair.
Abstract: Automatic software debugging mainly includes two tasks of fault lo-calization and automated program repair. Compared with the traditional spectrum-based and mutation-based methods, deep learning-based methods are proposed to achieve better performance for fault localization. However, the existing methods ignore the deep seman-tic features or only consider simple code representations. They do not leverage the existing bug-related knowledge from large-scale open-source projects either. In addition, existing template-based program repair techniques can incorporate project specific information better than deep-learning approaches. However, they are weak in selecting the fix templates for efficient program repair. In this work, we propose a novel approach called TRANSFER, which lever-ages the deep semantic features and transferred knowledge from open-source data to improve fault localization and program repair. First, we build two large-scale open-source bug datasets and design 11 BiLSTM-based binary classifiers and a BiLSTM-based multi-classifier to learn deep semantic features of statements for fault localization and program repair, respectively. Second, we combine semantic-based, spectrum-based and mutation-based features and use an MLP-based model for fault localization. Third, the semantic-based features are leveraged to rank the fix templates for program repair. Our extensive experiments on widely-used benchmark De-fects4J show that TRANSFER outperforms all baselines in fault localization, and is better than existing deep-learning methods in automated program repair. Compared with the typical template-based work TBar, TRANSFER can correctly repair 6 more bugs (47 in total) on Defects4J.
TL;DR: This work co-designed and implemented a prototype system that allowed end-users to see why predictions were made, and then to change weights on features to “debug” fairness issues, and evaluated the use of this prototype system through an online study.
Abstract: Ensuring fairness in artificial intelligence (AI) is important to counteract bias and discrimination in far-reaching applications. Recent work has started to investigate how humans judge fairness and how to support machine learning experts in making their AI models fairer. Drawing inspiration from an Explainable AI approach called explanatory debugging used in interactive machine learning, our work explores designing interpretable and interactive human-in-the-loop interfaces that allow ordinary end-users without any technical or domain background to identify potential fairness issues and possibly fix them in the context of loan decisions. Through workshops with end-users, we co-designed and implemented a prototype system that allowed end-users to see why predictions were made, and then to change weights on features to “debug” fairness issues. We evaluated the use of this prototype system through an online study. To investigate the implications of diverse human values about fairness around the globe, we also explored how cultural dimensions might play a role in using this prototype. Our results contribute to the design of interfaces to allow end-users to be involved in judging and addressing AI fairness through a human-in-the-loop approach.
TL;DR: In this paper , the authors focus on two aspects of computational thinking, such as, algorithmic thinking and debugging skills with the use of scaffolded programming scripts in an educational technology undergraduate course.
TL;DR: In this article , the authors propose a framework to train reproducible deep learning models using a set of general criteria to evaluate the reproducibility of DL models for two different domains, and a unified framework which leverages a record-andreplay technique to mitigate software-related randomness and a profile-and-patch technique to control hardware-related non-determinism.
Abstract: Reproducibility is an increasing concern in Artificial Intelligence (AI), particularly in the area of Deep Learning (DL). Being able to reproduce DL models is crucial for AI-based systems, as it is closely tied to various tasks like training, testing, debugging, and auditing. However, DL models are challenging to be reproduced due to issues like randomness in the software (e.g., DL algorithms) and non-determinism in the hardware (e.g., GPU). There are various practices to mitigate some of the aforementioned issues. However, many of them are either too intrusive or can only work for a specific usage context. In this paper, we propose a systematic approach to training reproducible DL models. Our approach includes three main parts: (1) a set of general criteria to thoroughly evaluate the reproducibility of DL models for two different domains, (2) a unified framework which leverages a record-and-replay technique to mitigate software-related randomness and a profile-and-patch technique to control hardware-related non-determinism, and (3) a reproducibility guideline which explains the rationales and the mitigation strategies on conducting a reproducible training process for DL models. Case study results show our approach can successfully reproduce six open source and one commercial DL models.
TL;DR: In this paper , the learning effect of the debugging process is taken under advisement, and assumes that it is unstable since the process of the error removal is also imperfect, which may cause a fluctuation of errors in the system.
TL;DR: In this paper , the authors propose ConfusionFlow, an interactive, comparative visualization tool that combines the benefits of class confusion matrices with the visualization of performance characteristics over time, which can be used to compare performances for different model types, model architectures, and/or training and test datasets.
Abstract: Classifiers are among the most widely used supervised machine learning algorithms. Many classification models exist, and choosing the right one for a given task is difficult. During model selection and debugging, data scientists need to assess classifiers' performances, evaluate their learning behavior over time, and compare different models. Typically, this analysis is based on single-number performance measures such as accuracy. A more detailed evaluation of classifiers is possible by inspecting class errors. The confusion matrix is an established way for visualizing these class errors, but it was not designed with temporal or comparative analysis in mind. More generally, established performance analysis systems do not allow a combined temporal and comparative analysis of class-level information. To address this issue, we propose ConfusionFlow, an interactive, comparative visualization tool that combines the benefits of class confusion matrices with the visualization of performance characteristics over time. ConfusionFlow is model-agnostic and can be used to compare performances for different model types, model architectures, and/or training and test datasets. We demonstrate the usefulness of ConfusionFlow in a case study on instance selection strategies in active learning. We further assess the scalability of ConfusionFlow and present a use case in the context of neural network pruning.
TL;DR: Experimental results show that LTDD can better improve the fairness of ML software with less or comparable damage to the performance, and LTDD is more actionable for fairness improvement in realistic scenarios.
Abstract: With the widespread application of machine learning (ML) software, especially in high-risk tasks, the concern about their unfairness has been raised towards both developers and users of ML software. The unfairness of ML software indicates the software behavior affected by the sensitive features (e.g., sex), which leads to biased and illegal decisions and has become a worthy problem for the whole software engineering community. According to the “data-driven” programming paradigm of ML software, we consider the root cause of the unfairness as biased features in training data. Inspired by software debugging, we propose a novel method, Linear-regression based Training Data Debugging (LTDD), to debug feature values in training data, i.e., (a) identify which features and which parts of them are biased, and (b) exclude the biased parts of such features to recover as much valuable and unbiased information as possible to build fair ML software. We conduct an extensive study on nine data sets and three classifiers to evaluate the effect of our method LTDD compared with four baseline methods. Experimental results show that (a) LTDD can better improve the fairness of ML software with less or comparable damage to the performance, and (b) LTDD is more actionable for fairness improvement in realistic scenarios.
TL;DR: A comprehensive survey of recent progress in assertion-based hardware verification, outlining how to define assertions using temporal logic to specify expected behaviors in different abstraction levels and describing state-of-the art approaches for automated generation of assertions.
Abstract: Hardware verification of modern electronic systems has been identified as a major bottleneck due to the increasing complexity and time-to-market constraints. One of the major objectives in hardware verification is to drastically reduce the validation and debug time without sacrificing the design quality. Assertion-based verification is a promising avenue for efficient hardware validation and debug. In this article, we provide a comprehensive survey of recent progress in assertion-based hardware verification. Specifically, we outline how to define assertions using temporal logic to specify expected behaviors in different abstraction levels. Next, we describe state-of-the art approaches for automated generation of assertions. We also discuss test generation techniques for activating assertions to ensure that the generated assertions are valid. Finally, we present both pre-silicon and post-silicon assertion-based validation approaches that utilize simulation, formal methods as well as hybrid techniques. We conclude with a discussion on utilizing assertions for verifying both functional and non-functional requirements.
TL;DR: In this article , the authors review existing studies from the following three aspects along with the pipeline highly related to machine learning: data preparation, data discovery, data cleaning and data labeling.
Abstract: Machine learning(ML) has widespread applications and has revolutionized many industries, but suffers from several challenges. First, sufficient high-quality training data is inevitable for producing a well-performed model, but the data is always human expensive to acquire.Second, a large amount of training data and complicated model structures lead to the inefficiency of training and inference. Third, given an ML task, one always needs to train lots of models, which are hard to manage in real applications. Fortunately, database techniques can benefit ML by addressing the above three challenges. In this paper, we review existing studies from the following three aspects along with the pipeline highly related to ML. (1) Data preparation(Pre-ML): it focuses on preparing high-quality training data that can improve the performance of the ML model, where we review data discovery, data cleaning and data labeling. (2) Model training \& inference(In-ML): researchers in ML community focus on improving the model performance during training, while in this survey we mainly study how to accelerate the entire training process, also including feature selection and model selection. (3) Model management(Post-ML): in this part, we survey how to store, query, deploy and debug the models after training. Finally, we provide research challenges and future directions.
TL;DR: Binder as discussed by the authors is a training-free neural-symbolic framework that maps the task input to a program, which allows binding a unified API of language model (LM) functionalities to a programming language (e.g., SQL, Python).
Abstract: Though end-to-end neural approaches have recently been dominating NLP tasks in both performance and ease-of-use, they lack interpretability and robustness. We propose Binder, a training-free neural-symbolic framework that maps the task input to a program, which (1) allows binding a unified API of language model (LM) functionalities to a programming language (e.g., SQL, Python) to extend its grammar coverage and thus tackle more diverse questions, (2) adopts an LM as both the program parser and the underlying model called by the API during execution, and (3) requires only a few in-context exemplar annotations. Specifically, we employ GPT-3 Codex as the LM. In the parsing stage, with only a few in-context exemplars, Codex is able to identify the part of the task input that cannot be answerable by the original programming language, correctly generate API calls to prompt Codex to solve the unanswerable part, and identify where to place the API calls while being compatible with the original grammar. In the execution stage, Codex can perform versatile functionalities (e.g., commonsense QA, information extraction) given proper prompts in the API calls. Binder achieves state-of-the-art results on WikiTableQuestions and TabFact datasets, with explicit output programs that benefit human debugging. Note that previous best systems are all finetuned on tens of thousands of task-specific samples, while Binder only uses dozens of annotations as in-context exemplars without any training. Our code is available at https://github.com/HKUNLP/Binder .
TL;DR: MINJIE as mentioned in this paper integrates a broad set of tools for logic design, functional verification, performance modelling, pre-silicon validation and debugging for better development efficiency of state-of-the-art processor designs.
Abstract: While research has shown that the agile chip design methodology is promising to sustain the scaling of computing performance in a more efficient way, it is still of limited usage in actual applications due to two major obstacles: 1) Lack of tool-chain and developing framework supporting agile chip design, especially for large-scale modern processors. 2) The conventional verification methods are less agile and become a major bottleneck of the entire process. To tackle both issues, we propose MINJIE, an open-source platform supporting agile processor development flow. MINJIE integrates a broad set of tools for logic design, functional verification, performance modelling, pre-silicon validation and debugging for better development efficiency of state-of-the-art processor designs. We demonstrate the usage and effectiveness of MINJIE by building two generations of an open-source superscalar out-of-order RISC-V processor code-named XIANGSHAN using agile methodologies. We quantify the performance of XIANGSHAN using SPEC CPU2006 benchmarks and demonstrate that XIANGSHAN achieves industry-competitive performance.
TL;DR: The Spectrum Based Fault Localization (SBFL) as mentioned in this paper is one of the most prominent techniques in this respect due to its efficiency and effectiveness, however, it requires significant human participation and is difficult to automate its sub-tasks.
Abstract: In software debugging, fault localization is the most difficult, expensive, tedious, and time-consuming task, particularly for large-scale software systems. This is due to the fact that it requires significant human participation and it is difficult to automate its sub-tasks. Therefore, there is a high demand for automatic fault localization techniques that can help software engineers effectively find the locations of faults with minimal human intervention. This has led to the proposal of implementing different types of such techniques. However, Spectrum Based Fault Localization (SBFL) is considered amongst the most prominent techniques in this respect due to its efficiency and effectiveness. In SBFL, the probability of each program element (e.g., statement, block, or function) being faulty is calculated based on the results of executing test cases and their corresponding code coverage information. However, SBFL techniques are not yet widely adopted in the industry. The rationale behind this is that they pose a number of issues and their performance is affected by several influential factors. For example, the characteristics of bugs, target programs, test suites, and supporting tools make their effectiveness differ dramatically from one case to another. There are massive studies on SBFL that cover its usage, formulas, performance, etc. So far, no dedicated survey points out comprehensively the issues of SBFL. In this paper, various SBFL challenges and issues have been identified, categorized, and discussed alongside many directions. Also, the paper raises awareness of the works being achieved to address the identified issues and suggests some potential solutions too.