TL;DR: The principal conclusion is that contention due to synchronization need not be a problem in large-scale shared-memory multiprocessors. The existence of scalable algorithms greatly weakens the case for costly special-purpose hardware support for synchronization, and provides a case against so-called “dance hall” architectures.
Abstract: Busy-wait techniques are heavily used for mutual exclusion and barrier synchronization in shared-memory parallel programs. Unfortunately, typical implementations of busy-waiting tend to produce large amounts of memory and interconnect contention, introducing performance bottlenecks that become markedly more pronounced as applications scale. We argue that this problem is not fundamental, and that one can in fact construct busy-wait synchronization algorithms that induce no memory or interconnect contention. The key to these algorithms is for every processor to spin on separate locally-accessible flag variables, and for some other processor to terminate the spin with a single remote write operation at an appropriate time. Flag variables may be locally-accessible as a result of coherent caching, or by virtue of allocation in the local portion of physically distributed shared memory. We present a new scalable algorithm for spin locks that generates O(1) remote references per lock acquisition, independent of the number of processors attempting to acquire the lock. Our algorithm provides reasonable latency in the absence of contention, requires only a constant amount of space per lock, and requires no hardware support other than a swap-with-memory instruction. We also present a new scalable barrier algorithm that generates O(1) remote references per processor reaching the barrier, and observe that two previously-known barriers can likewise be cast in a form that spins only on locally-accessible flag variables. None of these barrier algorithms requires hardware support beyond the usual atomicity of memory reads and writes. We compare the performance of our scalable algorithms with other software approaches to busy-wait synchronization on both a Sequent Symmetry and a BBN Butterfly. Our principal conclusion is that contention due to synchronization need not be a problem in large-scale shared-memory multiprocessors.
The existence of scalable algorithms greatly weakens the case for costly special-purpose hardware support for synchronization, and provides a case against so-called “dance hall” architectures, in which shared memory locations are equally far from all processors. —From the Authors' Abstract
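The idea of spinning on a separate local flag, with the spin ended by a single write from another processor, can be illustrated with a small queue-based lock. The following is a minimal sketch under stated assumptions, not the authors' published code: `AtomicRef`, `QNode`, `QueueLock`, and `run_workers` are hypothetical names, the atomic swap is emulated with a Python mutex, and the release path uses a compare-and-swap for brevity even though the abstract states that swap alone suffices.

```python
import threading
import time


class AtomicRef:
    """Emulates atomic swap and compare-and-swap on one word (assumption:
    real hardware would provide these as single instructions)."""

    def __init__(self, value=None):
        self._value = value
        self._guard = threading.Lock()

    def swap(self, new):
        with self._guard:
            old, self._value = self._value, new
            return old

    def compare_and_swap(self, expected, new):
        with self._guard:
            if self._value is expected:
                self._value = new
                return True
            return False


class QNode:
    """Per-thread queue record; its owner spins only on its own `locked` flag."""

    def __init__(self):
        self.locked = False
        self.next = None


class QueueLock:
    def __init__(self):
        self.tail = AtomicRef(None)  # last waiter in the queue, or None if free

    def acquire(self, node):
        node.next = None
        node.locked = True
        pred = self.tail.swap(node)      # one swap appends this node to the queue
        if pred is not None:
            pred.next = node             # link behind the predecessor
            while node.locked:           # spin on this thread's own flag
                time.sleep(0)            # yield so other Python threads can run

    def release(self, node):
        if node.next is None:
            # No visible successor: try to mark the lock free.
            if self.tail.compare_and_swap(node, None):
                return
            while node.next is None:     # a successor is still linking itself in
                time.sleep(0)
        node.next.locked = False         # single write that ends the successor's spin


def run_workers(n_threads, n_iters):
    """Hypothetical demo: each thread increments a shared counter under the lock."""
    lock = QueueLock()
    total = [0]

    def worker():
        node = QNode()
        for _ in range(n_iters):
            lock.acquire(node)
            total[0] += 1
            lock.release(node)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]
```

Calling `run_workers(3, 50)` should return 150 if mutual exclusion holds. Python threads do not model remote-memory traffic, so the sketch shows only the structure: each waiter polls its own record, and the releasing thread performs one write to hand the lock over.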
TL;DR: In this article, a lock data structure for concurrent access to a resource object, such as a database object, is proposed. Because the lock manager carries the version number in the lock data structure, optimistic locking can be introduced to a legacy database without burdensome changes to the database table schema.
Abstract: Techniques for concurrent access to a resource object, such as a database object, include generating a lock data structure for a particular resource object. The lock data structure includes data values for a resource object identification, a lock type, and a version number. The version number is related to a number of changes to the resource object since the lock data structure was generated. By carrying a lock version number in a lock data structure managed by a lock manager, improved optimistic locking is provided in a database. In particular, the approach enables introduction of optimistic locking to a legacy database without requiring burdensome changes to a database table schema.
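The lock data structure described above (resource identification, lock type, version number) can be sketched as follows. This is an illustrative sketch only: `LockRecord`, `LockManager`, `StaleVersionError`, and the method names are hypothetical, and a plain dictionary stands in for a legacy table that has no version column.

```python
from dataclasses import dataclass


@dataclass
class LockRecord:
    """Lock data structure held by the lock manager, not by the table."""
    resource_id: str
    lock_type: str
    version: int = 0  # number of changes since this record was generated


class StaleVersionError(Exception):
    """Raised when a writer's snapshot version no longer matches."""


class LockManager:
    def __init__(self):
        self._records = {}

    def read_version(self, resource_id, lock_type="optimistic"):
        # Generate the record on first use, then return its current version.
        record = self._records.setdefault(
            resource_id, LockRecord(resource_id, lock_type)
        )
        return record.version

    def commit(self, resource_id, seen_version, apply_change):
        # Apply the change only if nobody else committed since `seen_version`.
        record = self._records[resource_id]
        if record.version != seen_version:
            raise StaleVersionError(resource_id)
        apply_change()
        record.version += 1
        return record.version


# Usage: the "table" row carries no version column of its own.
table = {"row-1": {"balance": 100}}
manager = LockManager()

v_a = manager.read_version("row-1")   # client A snapshots the version
v_b = manager.read_version("row-1")   # client B snapshots the same version

manager.commit("row-1", v_a, lambda: table["row-1"].update(balance=90))
try:
    manager.commit("row-1", v_b, lambda: table["row-1"].update(balance=50))
    conflict_detected = False
except StaleVersionError:
    conflict_detected = True          # B must re-read and retry
```

Keeping the version inside the lock manager is what lets the table schema stay unchanged in this sketch; how the manager itself is persisted or shared is left open here.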
TL;DR: In this article, a lock/unlock mechanism to control concurrent access to objects in a multi-threaded computer processing system comprises two parts: a thread pointer (or thread identifier), and a one-bit flag called a “Bacon bit”.
Abstract: A lock/unlock mechanism to control concurrent access to objects in a multi-threaded computer processing system comprises two parts: a thread pointer (or thread identifier), and a one-bit flag called a “Bacon bit”. Preferably, when an object is not locked (i.e., no thread has been granted access to the object), the thread identifier and Bacon bit are set to 0. When an object is locked by a particular thread (i.e., the thread has been granted access to the object), the thread identifier is set to a value that identifies the particular thread; if no other threads are waiting to lock the object, the Bacon bit is set to 0; however, if other threads are waiting to lock the object, the Bacon bit is set to ‘1’, which indicates that there is a queue of waiting threads associated with the object. To lock an object, a single CompareAndSwap operation is preferably used, much like with spin-locks; if the lock is already held by another thread, enqueueing is handled in out-of-line code. To unlock an object, in the normal case, a single CompareAndSwap operation may be used. This single operation atomically tests that the current thread owns the lock, and that no other threads are waiting for the object (i.e., the Bacon bit is ‘0’). A global lock is preferably used to change the Bacon bit of the lock. This provides a lock/unlock mechanism which combines many of the desirable features of both spin locking and queued locking, and can be used as the basis for a very fast implementation of the synchronization facilities of the Java language.
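The fast paths described above can be traced with a small single-threaded simulation. This is a sketch under assumptions rather than the patented implementation: the word layout `(thread_id << 1) | bacon_bit`, the nonzero thread identifiers, and all function names are hypothetical, and `compare_and_swap` is an ordinary method standing in for a hardware CompareAndSwap.

```python
import threading

# Stand-in for the global lock that guards changes to the Bacon bit.
_global_lock = threading.Lock()


class LockWord:
    """One lock word per object; 0 means unlocked with no waiters."""

    def __init__(self):
        self.value = 0

    def compare_and_swap(self, expected, new):
        # Simulated CompareAndSwap; not atomic across real threads.
        if self.value == expected:
            self.value = new
            return True
        return False


def pack(thread_id, bacon_bit):
    # Assumed layout: thread identifier in the upper bits, Bacon bit lowest.
    return (thread_id << 1) | bacon_bit


def try_lock(word, thread_id):
    """Fast path: succeeds only if the object is currently unlocked."""
    return word.compare_and_swap(0, pack(thread_id, 0))


def try_unlock(word, thread_id):
    """Fast path: succeeds only if `thread_id` owns the lock and the Bacon
    bit is 0, i.e. no queue of waiting threads exists."""
    return word.compare_and_swap(pack(thread_id, 0), 0)


def set_bacon_bit(word):
    """Slow-path helper: record that waiters are queued on the object."""
    with _global_lock:
        word.value |= 1


# Uncontended case: one CompareAndSwap to lock, one to unlock.
w = LockWord()
locked = try_lock(w, 7)
unlocked = try_unlock(w, 7)

# Contended case: thread 9 fails the fast path and would be enqueued
# out of line; the owner's fast unlock then fails and takes the slow path.
w2 = LockWord()
try_lock(w2, 7)
second_lock = try_lock(w2, 9)
set_bacon_bit(w2)
fast_unlock = try_unlock(w2, 7)
```

The point of the layout is that one comparison checks both ownership and the absence of waiters, so the common uncontended unlock needs no queue inspection.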
TL;DR: In this article, a shared coupling facility contains system lock management (SLM) means for supporting a distributed locking protocol used by a plurality of sharing lock managers each executing on a processor having access to the shared memory and to any other processors in the processor complex.
Abstract: A shared coupling facility contains system lock management (SLM) means for supporting a distributed locking protocol used by a plurality of sharing lock managers each executing on a processor having access to the shared memory and to any other processors in the processor complex. A request to lock a resource shared among the lock managers is first checked against a local hash table and then, if necessary, forwarded to the system lock management means in the shared memory for synchronous or asynchronous processing. List structures are maintained in the shared coupling facility to support the protocol, and are used by the system lock management means to record data recovery status. The sharing lock managers interact with the SLM means to control/manage lock contention, waiter queueing, and compatibility processing.
TL;DR: An object lock management system for use in a parallel data processing system where objects are accessible by processing activities on computing nodes within the parallel system is described in this article, where lock information is selectively reported to a global deadlock detector which performs deadlock detection.
Abstract: An object lock management system for use in a parallel data processing system where objects are accessible by processing activities on computing nodes within the parallel system. The system includes local lock control elements, where each of the local lock control elements coordinates the locking of a predetermined set of objects. In particular, each local lock control element grants locks or queues lock requests in response to lock requests. Lock information is selectively reported to a global deadlock detector which performs deadlock detection. The global deadlock detector instructs the local lock control elements to release selected locks and queued lock requests upon detecting a deadlock. Lock information is reported to the global deadlock detector periodically and only for queued lock requests that have timed-out, whereby message traffic and processing overhead are reduced in the parallel system.
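The reporting scheme described above can be sketched as a wait-for graph that the global detector builds only from timed-out queued requests. This is a minimal sketch under assumptions: the abstract does not say how deadlocks are detected, so cycle search by depth-first traversal is a swapped-in, standard technique, and the function names, tuple layout, and timeout value are hypothetical.

```python
def timed_out_edges(queued, now, timeout):
    """Local element: report only queued requests that have waited at
    least `timeout`. `queued` holds (waiter, holder, enqueue_time) tuples."""
    return [(w, h) for (w, h, t) in queued if now - t >= timeout]


def build_wait_for(edge_reports):
    """Global detector: merge reported (waiter, holder) edges into a graph."""
    graph = {}
    for waiter, holder in edge_reports:
        graph.setdefault(waiter, set()).add(holder)
    return graph


def find_cycle(wait_for):
    """Return a list of activities forming a cycle, or None if none exists."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}
    stack = []

    def visit(node):
        color[node] = GREY
        stack.append(node)
        for nxt in sorted(wait_for.get(node, ())):
            state = color.get(nxt, WHITE)
            if state == GREY:                 # back edge: cycle found
                return stack[stack.index(nxt):]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        stack.pop()
        color[node] = BLACK
        return None

    for start in sorted(wait_for):
        if color.get(start, WHITE) == WHITE:
            cycle = visit(start)
            if cycle:
                return cycle
    return None


# Two local lock control elements, each with its own queue of requests.
node1_queue = [("T1", "T2", 0.0), ("T3", "T1", 9.0)]
node2_queue = [("T2", "T1", 1.0)]

now, timeout = 10.0, 5.0
reports = timed_out_edges(node1_queue, now, timeout)
reports += timed_out_edges(node2_queue, now, timeout)

graph = build_wait_for(reports)
cycle = find_cycle(graph)
```

In this example only the two long-waiting requests are reported, and the detector finds the cycle between `T1` and `T2`; the recent request from `T3` generates no message. Choosing which locks or queued requests to release once a cycle is found is left out of the sketch.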