TL;DR: This paper defines linearizability, compares it to other correctness conditions, presents and demonstrates a method for proving the correctness of implementations, and shows how to reason about concurrent objects, given they are linearizable.
Abstract: A concurrent object is a data object shared by concurrent processes. Linearizability is a correctness condition for concurrent objects that exploits the semantics of abstract data types. It permits a high degree of concurrency, yet it permits programmers to specify and reason about concurrent objects using known techniques from the sequential domain. Linearizability provides the illusion that each operation applied by concurrent processes takes effect instantaneously at some point between its invocation and its response, implying that the meaning of a concurrent object's operations can be given by pre- and post-conditions. This paper defines linearizability, compares it to other correctness conditions, presents and demonstrates a method for proving the correctness of implementations, and shows how to reason about concurrent objects, given they are linearizable.
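To illustrate the definition in the abstract above, here is a minimal brute-force sketch that checks whether a small history of operations on a single read/write register is linearizable. It is not the proof method of the paper. The names `Op`, `respects_real_time`, `legal_register_run`, and `is_linearizable` are hypothetical, and the sketch assumes every operation in the history has completed and that the register starts at 0. It searches for a sequential order that keeps the real-time order of non-overlapping operations and is legal for a sequential register.

```python
from itertools import permutations
from typing import NamedTuple


class Op(NamedTuple):
    inv: float   # invocation time
    res: float   # response time
    kind: str    # "write" or "read"
    value: int   # value written, or value the read returned


def respects_real_time(order):
    # If b responded before a was invoked, b may not be placed after a.
    for i, a in enumerate(order):
        for b in order[i + 1:]:
            if b.res < a.inv:
                return False
    return True


def legal_register_run(order, initial=0):
    # Sequential specification: a read returns the most recently written value.
    current = initial
    for op in order:
        if op.kind == "write":
            current = op.value
        elif op.value != current:
            return False
    return True


def is_linearizable(history, initial=0):
    # Exhaustive search over all orders; only practical for tiny histories.
    return any(
        respects_real_time(order) and legal_register_run(order, initial)
        for order in permutations(history)
    )


# A read that overlaps a write may return either the old or the new value.
overlapping = [Op(0, 3, "write", 1), Op(1, 2, "read", 0)]
# A read that starts after the write has responded must see the new value.
stale = [Op(0, 1, "write", 1), Op(2, 3, "read", 0)]
print(is_linearizable(overlapping))  # True
print(is_linearizable(stale))        # False
```

The exhaustive search costs factorial time in the number of operations, so it serves only to make the definition concrete.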
TL;DR: A concurrent programming approach to digital VLSI design is proposed: the circuit is first implemented as a concurrent program that fulfills the logical specification of the circuit, and the program is then compiled manually or automatically into a circuit by applying semantic-preserving program transformations.
Abstract: With chip size reaching one million transistors, the complexity of VLSI algorithms (i.e., algorithms implemented as digital VLSI circuits) is approaching that of software algorithms (i.e., algorithms implemented as code for a stored-program computer). Yet design methods for VLSI algorithms lag far behind the potential of the technology. Since a digital circuit is the implementation of a concurrent algorithm, we propose a concurrent programming approach to digital VLSI design. The circuit to be designed is first implemented as a concurrent program that fulfills the logical specification of the circuit. The program is then compiled manually or automatically into a circuit by applying semantic-preserving program transformations. Hence, the circuit obtained is correct by construction. The main obstacle to such a method is finding an interface that provides a good separation of the physical and algorithmic concerns. Among the physical parameters of the implementation, timing is the most difficult to isolate from the logical design, because the timing properties of a circuit are essential not only to its real-time behavior but also to its logical correctness if the usual synchronous techniques are used to implement sequencing. For this reason, delay-insensitive techniques are particularly attractive for VLSI synthesis. A circuit is delay-insensitive when its correct operation is independent of any assumption on delays in operators and wires except that the delays be finite [17]. Such circuits do not use a clock signal or knowledge about delays. Let us clarify a matter of definitions right away: the class of entirely delay-insensitive circuits is very limited. Different asynchronous techniques distinguish themselves in the choice of the compromises about delay-insensitivity. Speed-independent techniques assume that delays in gates are arbitrary, but that there are no delays in wires.
TL;DR: A highly concurrent Toeplitz system solver featuring maximum parallelism and localized communication is developed, and a pipelined processor architecture is proposed that uses only localized interconnections yet retains the maximum parallelism attainable.
Abstract: The design of VLSI parallel processors requires a fundamental understanding of the parallel computing algorithm and an appreciation of the implementation constraints on communication. Based on these considerations, this paper develops a highly concurrent Toeplitz system solver, featuring maximum parallelism and localized communication. More precisely, a highly parallel algorithm is proposed which achieves O(N) computing time with a linear array of O(N) processors. This compares very favorably to the O(N log_2 N) computing time attainable with the traditional Levinson algorithm implemented in parallel. Furthermore, to comply with the communication constraint, a pipelined processor architecture is proposed which uses only localized interconnections and yet retains the maximum parallelism attainable.
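For orientation, the Levinson algorithm named in the abstract above as the baseline can be sketched in its plain sequential form, which takes O(N^2) arithmetic operations. This is not the parallel solver the paper proposes. The function name `levinson_solve` is hypothetical, and the sketch assumes a symmetric Toeplitz matrix T with T[i][j] = r[|i - j|] whose leading principal submatrices are all nonsingular (for example, T positive definite).

```python
def levinson_solve(r, b):
    """Solve T x = b for symmetric Toeplitz T with T[i][j] = r[abs(i - j)]."""
    size = len(b)
    f = [1.0 / r[0]]      # forward vector: T_1 f = e_1
    x = [b[0] / r[0]]     # solution of the 1x1 leading system
    for n in range(1, size):
        # Error introduced in the last row when f is padded with a zero.
        eps = sum(r[n - i] * f[i] for i in range(n))
        denom = 1.0 - eps * eps   # zero only if a leading submatrix is singular
        f_ext = f + [0.0]
        b_ext = [0.0] + f[::-1]   # by symmetry, backward vector = reversed forward
        # Combine the padded vectors so that T_{n+1} f = e_1 again.
        f = [(fe - eps * be) / denom for fe, be in zip(f_ext, b_ext)]
        back = f[::-1]            # satisfies T_{n+1} back = e_{n+1}
        # Correct the padded solution so the new last equation holds too.
        eps_x = sum(r[n - i] * x[i] for i in range(n))
        x = [xe + (b[n] - eps_x) * bk for xe, bk in zip(x + [0.0], back)]
    return x


# T = [[2, 1, 0], [1, 2, 1], [0, 1, 2]] and b = T @ [1, 2, 3] = [4, 8, 8]
print(levinson_solve([2.0, 1.0, 0.0], [4.0, 8.0, 8.0]))  # approximately [1, 2, 3]
```

Each pass of the loop depends on the vectors produced by the previous pass, so the recursion is sequential across system sizes even though the sums and vector updates within a pass can run in parallel.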
TL;DR: This paper discusses the development of a concurrent algorithm for the solution of systems of equations arising in finite element applications, based on a hybrid of a direct elimination method and preconditioned conjugate iteration.
Abstract: This paper discusses the development of a concurrent algorithm for the solution of systems of equations arising in finite element applications. The approach is based on a hybrid of a direct elimination method and preconditioned conjugate iteration. Two different preconditioners are used: diagonal scaling and a concurrent implementation of incomplete LU factorization. First, an automatic procedure is used to partition the finite element mesh into sub-structures. The particular mesh partition is chosen to minimize an estimate of the cost for evaluating the solution using this algorithm on a concurrent computer. These procedures are implemented in a finite element program on the JPL/CalTech MARK III hypercube computer. An overview of the structure of this program is presented. The performance of the solution method is demonstrated with the aid of a number of numerical test runs, and its advantages for concurrent implementations are discussed. Efficiency and speed-up factors over sequential machines for the numerical examples are highlighted.
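The diagonal-scaling preconditioner mentioned in the abstract above can be illustrated with a minimal sequential sketch. It assumes that "preconditioned conjugate iteration" refers to the preconditioned conjugate gradient method and that the matrix is symmetric positive definite. It covers only the iterative part, not the sub-structuring, the direct elimination step, or the hypercube implementation. The function name `pcg_diagonal` and its parameters are hypothetical.

```python
def pcg_diagonal(A, b, tol=1e-10, max_iter=1000):
    """Conjugate gradient on A x = b, preconditioned by the diagonal of A."""
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    inv_diag = [1.0 / A[i][i] for i in range(n)]   # diagonal scaling M^-1
    x = [0.0] * n
    r = list(b)                                    # residual b - A x for x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]     # preconditioned residual
    p = list(z)                                    # first search direction
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:                # stop once the residual is small
            break
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)                    # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


# Small symmetric positive definite system with exact solution [1/11, 7/11].
print(pcg_diagonal([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0]))
```

Applying the diagonal preconditioner is an elementwise multiply with no coupling between unknowns, so on a concurrent machine the communication in each iteration comes from the matrix-vector product and the inner products.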
TL;DR: The results indicate that algorithms with updaters that lock-couple using exclusive locks perform poorly as compared to those that permit more optimistic index descents, and the need for a highly concurrent long-term lock holding strategy to obtain the full benefits of a highly concurrent algorithm for index operations is demonstrated.
Abstract: A number of algorithms have been proposed to access B+-trees concurrently, but they are not well understood. In this article, we study the performance of various B+-tree concurrency control algorithms using a detailed simulation model of B+-tree operations in a centralized DBMS. Our study covers a wide range of data contention situations and resource conditions. In addition, based on the performance of the set of B+-tree concurrency control algorithms, which includes one new algorithm, we make projections regarding the performance of other algorithms in the literature. Our results indicate that algorithms with updaters that lock-couple using exclusive locks perform poorly as compared to those that permit more optimistic index descents. In particular, the B-link algorithms are seen to provide the most concurrency and the best overall performance. Finally, we demonstrate the need for a highly concurrent long-term lock holding strategy to obtain the full benefits of a highly concurrent algorithm for index operations.
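Lock-coupling with exclusive locks, as mentioned in the abstract above, can be sketched generically: during a descent, the lock on the child is acquired before the lock on the parent is released. This is not any specific algorithm from the study. It ignores node splits, safe-node checks, and shared lock modes, and the names `Node` and `lock_coupled_find_leaf` are hypothetical. The `trace` list exists only to make the locking order visible.

```python
import threading


class Node:
    def __init__(self, name, keys, children=None):
        self.name = name
        self.keys = keys                 # separator keys (internal) or stored keys (leaf)
        self.children = children or []   # empty list marks a leaf
        self.lock = threading.Lock()     # exclusive lock


def lock_coupled_find_leaf(root, key, trace):
    """Descend to the leaf responsible for key, holding at most two locks."""
    node = root
    node.lock.acquire()
    trace.append(("lock", node.name))
    while node.children:
        # Child i covers keys below keys[i]; the last child covers the rest.
        index = sum(1 for k in node.keys if key >= k)
        child = node.children[index]
        child.lock.acquire()             # take the child lock first ...
        trace.append(("lock", child.name))
        node.lock.release()              # ... and only then release the parent
        trace.append(("unlock", node.name))
        node = child
    return node                          # returned locked; the caller releases it


left = Node("left", [1, 5])
right = Node("right", [10, 15])
root = Node("root", [10], [left, right])
trace = []
leaf = lock_coupled_find_leaf(root, 15, trace)
leaf.lock.release()
print(leaf.name, trace)
# right [('lock', 'root'), ('lock', 'right'), ('unlock', 'root')]
```

Because every descent begins by taking the exclusive lock on the root, concurrent operations are serialized at the top of the tree, which is consistent with the contention cost the abstract attributes to exclusive lock-coupling.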