TL;DR: Building and prototyping an agricultural electronic marketing system involved experimenting with distributed synchronization, atomic activity, and commit protocols and recovery algorithms.
Abstract: Building and prototyping an agricultural electronic marketing system involved experimenting with distributed synchronization, atomic activity, and commit protocols and recovery algorithms.
TL;DR: It is observed that to obtain this limited factor of 10-fold speed-up, it is necessary to exploit parallelism at a very fine granularity, and it is proposed that a suitable architecture to exploit such fine-grain parallelism is a bus-based shared-memory multiprocessor with 32-64 processors.
Abstract: Rule-based systems, on the surface, appear to be capable of exploiting large amounts of parallelism—it is possible to match each rule to the data memory in parallel. In practice, however, we show that the speed-up from parallelism is quite limited, less than 10-fold. The reasons for the small speed-up are: (1) the small number of rules relevant to each change to data memory; (2) the large variation in the processing required by the relevant rules; and (3) the small number of changes made to data memory between synchronization steps. Furthermore, we observe that to obtain this limited factor of 10-fold speed-up, it is necessary to exploit parallelism at a very fine granularity. We propose that a suitable architecture to exploit such fine-grain parallelism is a bus-based shared-memory multiprocessor with 32-64 processors. Using such a multiprocessor (with individual processors working at 2 MIPS), it is possible to obtain execution speeds of about 3800 rule-firings/sec. This speed is significantly higher than that obtained by other proposed parallel implementations of rule-based systems.
TL;DR: The algorithm is faster than the traditional locked counter approach for two processors, has an attractive log2 N time scaling for larger N, and requires a shared memory bandwidth which grows linearly with N, the number of participating processors.
Abstract: We describe an algorithm for barrier synchronization that requires only read and write to shared store. The algorithm is faster than the traditional locked counter approach for two processors and has an attractive log2 N time scaling for larger N. The algorithm is free of hot spots and critical regions and requires a shared memory bandwidth which grows linearly with N, the number of participating processors. We verify the technique using both a real shared memory multiprocessor, for numbers of processors up to 30, and a shared memory multiprocessor simulator, for numbers of processors up to 256.
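As an illustration of a barrier built from plain reads and writes with log2 N rounds, here is a minimal sketch of a pairwise "butterfly"-style scheme. It is not claimed to be the algorithm of the paper above. The names `make_butterfly_barrier` and `run_demo` are hypothetical, the barrier is single-use, and N is assumed to be a power of two.

```python
import threading
import time


def make_butterfly_barrier(n):
    """Return a single-use barrier function for n threads (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    rounds = n.bit_length() - 1  # log2(n) rounds
    # flags[k][i] is written only by thread i and read only by its round-k partner,
    # so no lock, counter, or read-modify-write instruction is involved.
    flags = [[False] * n for _ in range(rounds)]

    def barrier(i):
        for k in range(rounds):
            partner = i ^ (1 << k)
            flags[k][i] = True            # plain write: "thread i reached round k"
            while not flags[k][partner]:  # plain read: spin until the partner arrives
                time.sleep(0)             # yield so other Python threads can run

    return barrier


def run_demo(n=8):
    """Start n threads; each reports whether all threads had arrived when it left."""
    barrier = make_butterfly_barrier(n)
    arrived = [False] * n
    seen_all = [False] * n

    def worker(i):
        arrived[i] = True
        barrier(i)
        seen_all[i] = all(arrived)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen_all
```

After round k a thread knows that every thread in its block of 2^(k+1) neighbours has arrived, so log2 N rounds cover all N threads. Each flag has exactly one writer, which is what avoids a shared hot spot in this sketch.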
TL;DR: A technique is described that relaxes the usual degree of synchronization, permitting replicated data items to be updated concurrently with other operations, while at the same time ensuring that correctness is not violated, which results in better response time when performing operations on replicated data.
Abstract: Many distributed systems replicate data for fault tolerance or availability. In such systems, a logical update on a data item results in a physical update on a number of copies. The synchronization and communication required to keep the copies of replicated data consistent introduce a delay when operations are performed. In this paper, we describe a technique that relaxes the usual degree of synchronization, permitting replicated data items to be updated concurrently with other operations, while at the same time ensuring that correctness is not violated. The additional concurrency thus obtained results in better response time when performing operations on replicated data. We also discuss how this technique performs in conjunction with a roll-back and a roll-forward failure recovery mechanism.
TL;DR: A way of combining algebraic specifications and Petri nets for specifying parallel systems formally, where the data structure of a system is algebraically specified while its behaviour, and especially the synchronization constraints, are specified by a Petri net-like schema.
Abstract: We present a way of combining algebraic specifications and Petri nets for specifying parallel systems formally. The data structure of a system is algebraically specified while its behaviour, and especially the synchronization constraints, are specified by a Petri net-like schema. The semantics of a specification is defined as a class of coloured Petri nets.
TL;DR: This work describes a protocol currently being explored for cache synchronization in a broadcast system, and analyzes the evolution of options that have been proposed under write-in (or write-back) policy.
Abstract: Many options are possible in a cache synchronization (or consistency) scheme for a broadcast system. We clarify basic concepts, analyze the handling of shared data, and then describe a protocol that we are currently exploring. Finally, we analyze the evolution of options that have been proposed under write-in (or write-back) policy. We show how our protocol extends this evolution with new methods for efficient busy-wait locking, waiting, and unlocking. The lock scheme allows locking and unlocking to occur in zero time, eliminating the need for test-and-set. The scheme also integrates processor atomic read-modify-write instructions and programmer/compiler busy-wait-synchronized operations under the same mechanism. The wait scheme eliminates all unsuccessful retries from the bus, and allows a process to work while waiting.
TL;DR: A synchronizing circuit synchronizes the asynchronous ready signals for two separate microprocessor subsystems that are running synchronously as part of a fault tolerant computer system; duplicated synchronization circuits in a master-slave arrangement are used with the duplicate microprocessors.
Abstract: A synchronizing circuit synchronizes the asynchronous ready signals for two separate microprocessor subsystems that are running synchronously as part of a fault tolerant computer system. Duplicated synchronization circuits, confined in a master-slave arrangement, are utilized with the duplicate microprocessors. Storage and gating circuitry are used to provide the precise timing signals required for such synchronization.
TL;DR: Three parallel algorithms for computing the QR-factorization of a matrix are presented and computational results indicate that the Pipelined Givens method is preferred and that this is primarily due to the number of array references required by the various algorithms.
Abstract: Three parallel algorithms for computing the QR-factorization of a matrix are presented. The discussion is primarily concerned with implementation of these algorithms on a computer that supports tightly coupled parallel processes sharing a large common memory. The three algorithms are a Householder method based upon high-level modules, a Windowed Householder method that avoids fork-join synchronization, and a Pipelined Givens method that is a variant of the data-flow type algorithms offering large enough granularity to mask synchronization costs. Numerical experiments were conducted on the Denelcor HEP computer. The computational results indicate that the Pipelined Givens method is preferred and that this is primarily due to the number of array references required by the various algorithms.
TL;DR: In this article, a method and means for maintaining continuous bit synchronization of data transmitted from a remote unit through a base unit to a landline unit is disclosed, where an input data stream at a first bit rate, such as digitized or encrypted speech, is interleaved with a plurality of signalling words, and transmitted over an RF channel at a second bit rate.
Abstract: A method and means for maintaining continuous bit synchronization of data transmitted from a remote unit through a base unit to a landline unit is disclosed. An input data stream at a first bit rate, such as digitized or encrypted speech, is interleaved with a plurality of signalling words, and transmitted over an RF channel at a second bit rate. The base site recovers the clock of the received data, strips off the signalling word, modifies the bit rate of the received data, and adjusts the recovered clock rate to provide an output data stream which is in bit synchronization with the input data stream and within a predetermined modem specification. The encrypted data is then sent over landlines to a decryption unit which requires bit synchronization.
TL;DR: A survey is given of work performed by the authors in recent years concerning the semantics of imperative concurrency, for which a number of operational and denotational semantic models are developed.
Abstract: A survey is given of work performed by the authors in recent years concerning the semantics of imperative concurrency. Four sample languages are presented for which a number of operational and denotational semantic models are developed. All languages have parallel execution through interleaving, and the last three have as well a form of synchronization. Three languages are uniform, i.e., they have uninterpreted elementary actions; the fourth is nonuniform and has assignment, tests and value-passing communication. The operational models build on Hennessy-Plotkin transition systems; as denotational structures both metric spaces and cpo domains are employed. Two forms of nondeterminacy are distinguished, viz. the local and global variety. As an associated model-theoretic distinction, that of linear time versus branching time is investigated. In the former we use streams, i.e. finite or infinite sequences of actions; in the latter the (metrically based) notion of process is introduced. We furthermore study a model with only finite observations. Ready sets also appear, used as a technical tool to compare various semantics. Altogether, ten models for the four languages are described, and precise statements on (the majority of) their interrelationships are made. The paper supplies no proofs; for these, references to technical papers by the authors are provided.
TL;DR: The robotics group of the Stanford Artificial Intelligence Laboratory is currently developing a new computational system for robotics applications that uses multiple NSC 32016 processors and one MC68010 based processor, sharing a common Intel Multibus.
Abstract: The robotics group of the Stanford Artificial Intelligence Laboratory is currently developing a new computational system for robotics applications. Stanford's NYMPH system uses multiple NSC 32016 processors and one MC68010 based processor, sharing a common Intel Multibus. The 32K processors provide the raw computational power needed for advanced robotics applications, and the 68K provides a pleasant interface with the rest of the world. Software has been developed to provide useful communications and synchronization primitives, without consuming excessive processor resources or bus bandwidth. NYMPH provides both large amounts of computing power and a good programming environment, making it an effective research tool.
TL;DR: A distributed microprocessor-based architecture for flexible control of machine tools in a job shop environment is described, and a two-level machine tool synchronization structure is presented in terms of Petri nets, a graphical-mathematical technique suitable for representing process parallelism and asynchronism.
TL;DR: An object-oriented approach for building distributed systems is presented, using ADA as the target language and exploiting its tasking and structuring mechanisms; the flexibility of the approach and the possibility of using a knowledge-based user interface promote rapid prototyping and reusability.
Abstract: This paper presents an object-oriented approach for building distributed systems. An example from the field of computer integrated manufacturing systems serves as a guideline. According to this approach a system is built up through three steps: control and synchronization aspects for each class of objects are treated first using PROT nets, which are a high-level extension to Petri nets; then data are introduced specifying the internal states of the objects as well as the messages they send each other; finally the connections between the objects are introduced by means of a data flow diagram between classes. The implementation uses ADA as the target language, exploiting its tasking and structuring mechanisms. The flexibility of the approach and the possibility of using a knowledge-based user interface promote rapid prototyping and reusability.
TL;DR: A block synchronization data communication system enables data communication to be properly effected even through a transmission channel under extremely unfavorable conditions, as in mobile radio in automobiles. It comprises the steps of encoding data to be transmitted and blocking the encoded data, adding a block synchronization signal having a plurality of different successive patterns positioned in the prescribed order to the head of said block data, transmitting the block data added with said block synchronization signal, receiving said transmitted data and recognizing one of said patterns constituting the block synchronization signal, estimating the data position based on the block synchronization signal constitution position of the properly recognized pattern, and decoding the block data beginning from said estimated position as received data.
Abstract: A block synchronization data communication system enables data communication to be properly effected even through a transmission channel under extremely unfavorable conditions, as in mobile radio in automobiles. The block synchronization data communication system comprises the steps of encoding data to be transmitted and blocking the encoded data, adding a block synchronization signal having a plurality of different successive patterns positioned in the prescribed order to the head of said block data, transmitting the block data added with said block synchronization signal, receiving said transmitted data and recognizing one of said patterns constituting the block synchronization signal, estimating the data position based on the block synchronization signal constitution position of the properly recognized pattern, and decoding the block data beginning from said estimated position as received data.
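The position-estimation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the three bit patterns in `PATTERNS`, their equal length, and the helper names `frame`, `flip`, and `estimate_data_start` are all hypothetical. The receiver scans for any one intact pattern and uses that pattern's place in the prescribed order to work out where the data begins.

```python
# Hypothetical sync patterns, transmitted in exactly this order before the data.
PATTERNS = ["11110000", "11001100", "10101010"]
PLEN = len(PATTERNS[0])  # assumption: all patterns have the same length


def frame(data_bits):
    """Prefix a data block with every sync pattern in the prescribed order."""
    return "".join(PATTERNS) + data_bits


def flip(bits, index):
    """Simulate a channel error by inverting the bit at `index`."""
    flipped = "1" if bits[index] == "0" else "0"
    return bits[:index] + flipped + bits[index + 1:]


def estimate_data_start(received):
    """Return the estimated index of the first data bit, or None if no pattern is found."""
    for offset in range(len(received) - PLEN + 1):
        window = received[offset:offset + PLEN]
        for j, pattern in enumerate(PATTERNS):
            if window == pattern:
                # Pattern j was recognized, so (number of patterns - j) pattern
                # lengths separate this offset from the start of the data.
                remaining = len(PATTERNS) - j
                return offset + remaining * PLEN
    return None
```

A short usage example: with three idle bits in front of the frame, the data start is still found when the first pattern is damaged, because the second pattern alone pins down the position.

```python
received = flip("000" + frame("0110"), 5)  # damage one bit of the first pattern
print(estimate_data_start(received))
```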
TL;DR: The paper gives a notation for the pattern of rendezvous, a framework for translating a software/hardware system structure into an active-server queueing network model, and an implicit decomposition algorithm for solving for the system performance.
TL;DR: The main functions of the network controller are switch state selection and synchronization and the number of switching elements required is significantly less than the elements required in the universal permutation network, which makes this architecture suitable for VLSI implementation.
Abstract: The (M, L)-algorithm has been widely used in speech and image encoding. Recently, use of (M, L)-like algorithms has been suggested for decoding phase codes. With its ever-increasing use, there arises a need to explore architectures suitable for real-time applications. Toward this end, we present a multiprocessor architecture for the (M, L)-algorithm that employs an SIMD (single instruction-multiple data) machine structure. The considerations involved in interconnection network design are discussed. The main functions of the network controller are switch state selection and synchronization. The number of switching elements required is significantly less than the elements required in the universal permutation network. These features make this architecture suitable for VLSI implementation. The tradeoff between number of processors and encoding time is also discussed.
TL;DR: Delta Prolog is presented, a distributed logic programming language that extends Prolog to include AND-parallelism (in a single processor or across a network of processors), interprocess communication via message passing with two-way pattern matching, interprocess synchronization with simultaneous message passing, and distributed backtracking among a family of processes.
Abstract: We present Delta Prolog, a distributed logic programming language that extends Prolog to include AND-parallelism (in a single processor or across a network of processors), interprocess communication via message passing with two-way pattern matching, interprocess synchronization with simultaneous message passing, and distributed backtracking among a family of processes. The extension is achieved, at the language level, by just two additional types of goals — events and splits. The implementation is written part in Prolog and part in C, with a small number of core primitives, to help portability. It is still experimental and expected to evolve. In this work we present the language's distinguishing features, describe its semantics, exhibit programs and analyse their behaviour, examine the implementation, and mention conclusions, advantages of the approach and the next developments.
TL;DR: Alfalfa is an implementation of a functional language on the Intel iPSC multiprocessor based on a heterogeneous abstract machine model consisting of both graph reduction and stack oriented execution.
Abstract: Alfalfa is an implementation of a functional language on the Intel iPSC multiprocessor. It is based on a heterogeneous abstract machine model consisting of both graph reduction and stack oriented execution. Alfalfa consists of two major components, a compiler and a run-time system. The source language, Alfl, contains no constructs that allow the programmer to specify parallelism or synchronization and thus it is the task of the compiler to detect the exploitable parallelism in a program. The run-time system supports dynamic scheduling, interprocessor communication, and storage management. A number of statistics gathered during execution are presented.
TL;DR: It is shown how the lock managers can be enabled to deal with the so-called buffer invalidation problem that results from the existence of a database buffer in each processor.
TL;DR: A widely applicable measuring system, especially suited for all types of 'voltage-clamp' (including 'patch-clamp') experiments, is presented, mainly based on commercially available hardware and software components.
TL;DR: Word synchronization of a digital message comprised of spaced predetermined synchronizing words is indicated by a first initial detection of a synchronizing word followed by detection of two synchronizing words out of the next four synchronizing words.
Abstract: Word synchronization of a digital message comprised of spaced predetermined synchronizing words is indicated by a first initial detection of a synchronizing word followed by detection of two synchronizing words out of the next four synchronizing words.
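The acquisition rule above can be written as a small check over per-slot detection results. This is a minimal sketch under two assumptions that are not in the source: "two out of the next four" is read as "at least two", and the function name `word_sync_acquired` and the list-of-booleans input format are hypothetical.

```python
def word_sync_acquired(detections):
    """Return the slot index of the initial detection that achieves word sync, or None.

    `detections` holds one boolean per expected synchronizing-word slot:
    True if a synchronizing word was detected in that slot.
    """
    for start, hit in enumerate(detections):
        if not hit:
            continue  # sync can only begin at a detected synchronizing word
        following = detections[start + 1:start + 5]
        # Require a full window of four slots and at least two detections in it.
        if len(following) == 4 and sum(following) >= 2:
            return start
    return None
```

For example, `[True, False, False, False, True, True, False, True, False]` fails at slot 0, where only one of the next four slots is detected, and succeeds at slot 4, where slots 5 and 7 confirm the initial detection.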
TL;DR: In this paper, the authors present a parallel execution model for Horn Clause logic programs based on the generator-consumer approach, which can be implemented efficiently with small run-time overhead.
Abstract: This paper presents a parallel execution model for exploiting AND-parallelism in Horn Clause logic programs. The model is based upon the generator-consumer approach, and can be implemented efficiently with small run-time overhead. Other related models that have been proposed to minimize the run-time overhead are unable to exploit the full parallelism inherent in the generator-consumer approach. Furthermore, our model performs backtracking more intelligently than these models. We also present two implementation schemes to realize our model: one has a coordinator to control the activities of processes solving different literals in the same clause; and the other achieves synchronization by letting processes pass messages to each other in a distributed fashion. Trade-offs between these two schemes are then discussed.
TL;DR: In this paper, a servo system for flexible magnetic disks is described, in which the synchronization pulses do not have to be perfectly aligned, and the disk is encoded such that consecutive servo tracks alternate between having a synchronization and an alternate synchronization mark.
Abstract: A servo system for flexible magnetic disks is described in which the synchronization pulses do not have to be perfectly aligned. The disk is encoded such that consecutive servo tracks alternate between having a synchronization mark and an alternate synchronization mark. Four pulses represent a synchronization mark and two pulses represent an alternate synchronization mark. The alternate synchronization marks are offset from the synchronization marks by a certain distance, such that they do not interfere with each other when a transducer head located between the servo tracks reads them. The disk drive system will read and identify the synchronization marks or alternate synchronization marks and start generating timing signals as appropriate. The servo bursts in the servo sector can then be read.
TL;DR: The reliability of a high-speed digital data traffic channel is determined by establishing end-to-end synchronization by use of confirming error detecting data transmitted on a side channel from an originating end.
Abstract: The reliability of a high-speed digital data traffic channel is determined by establishing end-to-end synchronization by use of confirming error detecting data transmitted on a side channel from an originating end, comparing the traffic data and the error detecting data and from time to time re-confirming the validity of the synchronization.
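One way to picture the comparison described above is to compute a check value locally for each received traffic block and line it up against the check values arriving on the side channel. The sketch below is illustrative only. The use of CRC-32 via `zlib.crc32`, the block granularity, the `max_lag` and `confirm` parameters, and the function names are assumptions rather than details from the source.

```python
import zlib


def checks(blocks):
    """Compute the error-detecting value (here CRC-32, an assumption) for each block."""
    return [zlib.crc32(block) for block in blocks]


def find_alignment(traffic_blocks, side_checks, max_lag=4, confirm=3):
    """Return the lag at which traffic blocks line up with side-channel checks, or None.

    Synchronization is declared at the first lag for which `confirm` consecutive
    locally computed checks equal the corresponding side-channel checks.
    """
    local = checks(traffic_blocks)
    for lag in range(max_lag + 1):
        pairs = list(zip(local[lag:], side_checks))
        if len(pairs) >= confirm and all(a == b for a, b in pairs[:confirm]):
            return lag
    return None


def still_in_sync(traffic_block, side_check):
    """Periodic re-confirmation: the block's check must match the side-channel value."""
    return zlib.crc32(traffic_block) == side_check
```

In this sketch a persistent mismatch in `still_in_sync` would be the cue to run `find_alignment` again, mirroring the "from time to time re-confirming" step.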
TL;DR: This paper presents the integration of a concurrency control mechanism in class-based languages and addresses language and system issues: canonical examples of synchronized objects are provided and an implementation of the mechanism is outlined.
Abstract: This paper presents the integration of a concurrency control mechanism in class-based languages. Synchronization constraints are expressed as separate control clauses and are factorized for a class of objects. Interference of this mechanism with inheritance and transactions is examined and solutions are proposed. This paper addresses language and system issues: canonical examples of synchronized objects are provided and an implementation of the mechanism is outlined.