TL;DR: The goal of this paper is to highlight several machine-learning-specific risk factors and design patterns to be avoided or refactored where possible, including boundary erosion, entanglement, hidden feedback loops, undeclared consumers, data dependencies, changes in the external world, and a variety of system-level anti-patterns.
Abstract: Machine learning offers a fantastically powerful toolkit for building complex systems quickly. This paper argues that it is dangerous to think of these quick wins as coming for free. Using the framework of technical debt, we note that it is remarkably easy to incur massive ongoing maintenance costs at the system level when applying machine learning. The goal of this paper is to highlight several machine-learning-specific risk factors and design patterns to be avoided or refactored where possible. These include boundary erosion, entanglement, hidden feedback loops, undeclared consumers, data dependencies, changes in the external world, and a variety of system-level anti-patterns.
1 Machine Learning and Complex Systems
Real-world software engineers are often faced with the challenge of moving quickly to ship new products or services, which can lead to a dilemma between speed of execution and quality of engineering. The concept of technical debt was first introduced by Ward Cunningham in 1992 as a way to help quantify the cost of such decisions. Like incurring fiscal debt, there are often sound strategic reasons to take on technical debt. Not all debt is necessarily bad, but technical debt does tend to compound. Deferring the work to pay it off results in increasing costs, system brittleness, and reduced rates of innovation. Traditional methods of paying off technical debt include refactoring, increasing coverage of unit tests, deleting dead code, reducing dependencies, tightening APIs, and improving documentation [4]. The goal of these activities is not to add new functionality, but to make future improvements easier to add, make the code cheaper to maintain, and reduce the likelihood of bugs. One of the basic arguments in this paper is that machine learning packages have all the same basic code complexity issues as normal code, but also have a larger system-level complexity that can create hidden debt.
Thus, refactoring these libraries, adding better unit tests, and associated activities are time well spent but do not necessarily address debt at the system level. In this paper, we focus on the system-level interaction between machine learning code and larger systems as an area where hidden technical debt may rapidly accumulate. At the system level, a machine learning model may subtly erode abstraction boundaries. It may be tempting to re-use input signals in ways that create unintended tight coupling of otherwise disjoint systems. Machine learning packages may often be treated as black boxes, resulting in large masses of “glue code” or calibration layers that can lock in assumptions. Changes in the external world may make models or input signals change behavior in unintended ways, ratcheting up maintenance cost and the burden of any debt. Even monitoring that the system as a whole is operating as intended may be difficult without careful design.
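As a rough illustration of how a calibration layer can lock in an assumption, the Python sketch below wraps two hypothetical model versions in the same glue layer. The names `make_calibrated`, `model_v1`, `model_v2`, and `THRESHOLD` are invented for this example and do not come from the paper; the scale, offset, and threshold are assumed to have been tuned against the first model only.

```python
from typing import Callable, Sequence


def make_calibrated(
    score_fn: Callable[[Sequence[float]], float], scale: float, offset: float
) -> Callable[[Sequence[float]], float]:
    """Glue layer that rescales raw scores from a black-box model.

    `scale` and `offset` silently encode the score range of whichever
    model version they were tuned against (an assumption of this sketch).
    """
    def calibrated(features: Sequence[float]) -> float:
        return scale * score_fn(features) + offset
    return calibrated


def model_v1(features: Sequence[float]) -> float:
    # Hypothetical original model: scores fall in [0, 1] for inputs in [0, 1].
    return sum(features) / len(features)


def model_v2(features: Sequence[float]) -> float:
    # Hypothetical retrained model: same ranking, but scores fall in [0, 10].
    return 10.0 * sum(features) / len(features)


THRESHOLD = 0.5          # downstream decision rule tuned for model_v1 scores
x = [0.2, 0.4]

flag_v1 = make_calibrated(model_v1, 1.0, 0.0)(x) > THRESHOLD
flag_v2 = make_calibrated(model_v2, 1.0, 0.0)(x) > THRESHOLD

# The same input is now flagged differently, although no glue code changed.
print(flag_v1, flag_v2)
```

Nothing in the glue layer or the threshold records which score range it expects, so swapping the model changes downstream behavior without any visible code change in the consumer.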
TL;DR: This paper primarily deals with the key concepts of ATaG and with the program syntax and semantics; the end-to-end application development methodology is discussed briefly.
Abstract: The Abstract Task Graph (ATaG) is a data-driven programming model for end-to-end application development on networked sensor systems. An ATaG program is a system-level, architecture-independent specification of the application functionality. The application is modeled as a set of abstract tasks that represent types of information processing functions in the system, and a set of abstract data items that represent types of information exchanged between abstract tasks. Input and output relationships between abstract tasks and data items are explicitly indicated as channels. Each abstract task is associated with user-provided code that implements the actual information processing functions in the system. Appropriate numbers and types of tasks can then be instantiated at compile time or run time to match the actual hardware and network configuration, with each node incorporating the user-provided code, automatically generated glue code, and a runtime engine that manages all coordination and communication in the network. This paper primarily deals with the key concepts of ATaG and the program syntax and semantics. The end-to-end application development methodology is discussed briefly.
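The structure described in this abstract (abstract tasks, abstract data items, and channels between them) can be sketched as a small data model. The Python below is only an illustration under stated assumptions: it is not ATaG syntax, and `AbstractTask`, `TaskGraph`, and the single-process `run` loop are hypothetical stand-ins for what the abstract attributes to generated glue code and a distributed runtime engine.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class AbstractTask:
    """A type of processing function; `code` stands in for user-provided code."""
    name: str
    inputs: Tuple[str, ...]     # input channels: data items consumed
    outputs: Tuple[str, ...]    # output channels: data items produced
    code: Callable[..., tuple]  # returns one value per output channel


class TaskGraph:
    """Hypothetical container for abstract tasks and abstract data items."""

    def __init__(self) -> None:
        self.data_items: Set[str] = set()
        self.tasks: List[AbstractTask] = []

    def add_data(self, name: str) -> None:
        self.data_items.add(name)

    def add_task(self, task: AbstractTask) -> None:
        # Every channel must refer to a declared abstract data item.
        for channel in task.inputs + task.outputs:
            if channel not in self.data_items:
                raise ValueError(f"undeclared data item: {channel}")
        self.tasks.append(task)

    def run(self, pool: Dict[str, object]) -> Dict[str, object]:
        """Data-driven firing: run each task once all its inputs are available."""
        pool = dict(pool)
        pending = list(self.tasks)
        progressed = True
        while pending and progressed:
            progressed = False
            for task in list(pending):
                if all(name in pool for name in task.inputs):
                    results = task.code(*(pool[name] for name in task.inputs))
                    pool.update(zip(task.outputs, results))
                    pending.remove(task)
                    progressed = True
        return pool


graph = TaskGraph()
for item in ("temperature", "alarm"):
    graph.add_data(item)
graph.add_task(
    AbstractTask("Monitor", ("temperature",), ("alarm",), lambda t: (t > 50.0,))
)
result = graph.run({"temperature": 72.5})
print(result)
```

The sketch keeps the specification (tasks, data items, channels) separate from the firing loop, which loosely mirrors the abstract's split between an architecture-independent program and the runtime that coordinates it.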
TL;DR: This book describes the basic principles and the trends in research and practice of CBSE, with an emphasis on dependable systems.
Abstract: This is a book about Component-Based Software Engineering (CBSE). CBSE is the emerging discipline of the development of software components and the development of systems incorporating such components. Component-based systems are built by assembling components developed independently of the systems. To assemble components, proprietary code that connects the components is usually needed. This code is often referred to as "glue code". In an ideal world of components, the assembly process is smooth and simple: the effort required to obtain the glue code is practically negligible; a system incorporating components knows everything about them (their operational interfaces and their non-functional properties), and the components are exactly what the system needs; in short, components can be assembled as easily as Lego blocks. In the real world, the component-based development process is complex and often difficult; systems are built from pre-existing components when appropriate and possible, and by developing new code specific to the particular system. The system may know about the syntax of the operational interfaces of the components, but not necessarily their other properties. Developing the glue code can be costly; it may take longer to develop than the components concerned. Software components are in fact much harder to assemble than Lego blocks. "Constructing software systems from components is more like having a bathtub full of Tinkertoy, Lego, Erector set, Lincoln logs, Block City, and six other incompatible kits, picking out parts that fit specific functions and expecting them to fit together" (Mary Shaw: Architectural Issues in Software Reuse: It's Not Just the Functionality, It's the Packaging, Presentation at the Symposium on Software Reusability SSR'99). CBSE tries to make the real world as close as possible to the ideal world of component-based development. There is a long way to go to achieve this goal.
In spite of many difficulties, the component-based approach has achieved remarkable success in many domains. A majority of the software programs we use every day take advantage of component-based technologies. There are, however, many classes of software in which the utilization of the component-based approach is rudimentary. For these classes of software the specification of "how" is at least as important as the specification of "what". Examples of these classes of systems are reliable systems; safety-, business-, or mission-critical systems (also known as dependable systems); and embedded systems. The general-purpose component technologies currently available cannot cope with the non-functional (or more correctly extra-functional) requirements of such systems. These additional requirements call for new technologies, new methods, and a specific approach to component-based software engineering. This book describes the basic principles and the trends in research and practice of CBSE, with an emphasis on dependable systems.
TL;DR: This paper describes an approach for the dynamic reconfiguration of applications based on CORBA components running in an environment called LuaSpace, which is composed of the dynamically typed language Lua and a set of tools based on Lua.
Abstract: Component-based programming is a current trend in the development of software. The application is created using components and binding their interfaces appropriately at the configuration level. This is especially interesting for applications that, for availability reasons, require dynamic reconfiguration. This paper describes an approach for the dynamic reconfiguration of applications based on CORBA components running in an environment called LuaSpace, which is composed of the dynamically typed language Lua and a set of tools based on Lua. Components, scripts, and glue code are the elements that form an application expressed in Lua. LuaSpace provides support for both programmed and ad hoc reconfiguration. Although our work focuses on the configuration level, LuaSpace also handles component updating.
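The idea of binding component interfaces at a configuration level, and rebinding them while the application runs, can be illustrated with a minimal sketch. The Python below is an assumption-laden analogy only: it uses neither CORBA nor Lua, and `Configuration`, `bind`, `call`, and the two greeter components are hypothetical names that are not part of LuaSpace.

```python
from typing import Any, Dict


class Configuration:
    """Hypothetical configuration level: maps interface names to components."""

    def __init__(self) -> None:
        self._bindings: Dict[str, Any] = {}

    def bind(self, interface: str, component: Any) -> None:
        # Binding an already-bound interface replaces the component,
        # which is the reconfiguration step in this sketch.
        self._bindings[interface] = component

    def call(self, interface: str, method: str, *args: Any) -> Any:
        # Clients address the interface name, never a concrete component.
        return getattr(self._bindings[interface], method)(*args)


class EnglishGreeter:
    def greet(self, name: str) -> str:
        return f"Hello, {name}"


class PortugueseGreeter:
    def greet(self, name: str) -> str:
        return f"Oi, {name}"


config = Configuration()
config.bind("greeter", EnglishGreeter())
before = config.call("greeter", "greet", "Ana")

# Reconfigure while the client keeps using the same interface name.
config.bind("greeter", PortugueseGreeter())
after = config.call("greeter", "greet", "Ana")

print(before)
print(after)
```

Because clients only hold the interface name, the component behind it can be swapped without touching or restarting client code, which is the property the abstract associates with availability.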
TL;DR: The design and implementation of the Glue-Nail database system is described, which is largely complete and has been tested using a suite of representative applications.
Abstract: We describe the design and implementation of the Glue-Nail database system. The Nail language is a purely declarative query language; Glue is a procedural language used for non-query activities. The two languages combined are sufficient to write a complete application. Nail and Glue code both compile into the target language IGlue. The Nail compiler uses variants of the magic sets algorithm, and supports well-founded models. Static optimization is performed by the Glue compiler using techniques that include peephole methods and data flow analysis. The IGlue code is executed by the IGlue interpreter, which features a run-time adaptive optimizer. The three optimizers each deal with separate optimization domains, and experiments indicate that an effective synergism is achieved. The Glue-Nail system is largely complete and has been tested using a suite of representative applications.