TL;DR: This article deals with the execution of a simulation program on a parallel computer by decomposing the simulation application into a set of concurrently executing processes and introduces interesting synchronization problems that are at the heart of the PDES problem.
Abstract: Parallel discrete event simulation (PDES), sometimes called distributed simulation, refers to the execution of a single discrete event simulation program on a parallel computer. PDES has attracted a considerable amount of interest in recent years. From a pragmatic standpoint, this interest arises from the fact that large simulations in engineering, computer science, economics, and military applications, to mention a few, consume enormous amounts of time on sequential machines. From an academic point of view, parallel simulation is interesting because it represents a problem domain that often contains substantial amounts of parallelism (e.g., see [59]), yet paradoxically, is surprisingly difficult to parallelize in practice. A sufficiently general solution to the PDES problem may lead to new insights in parallel computation as a whole. Historically, the irregular, data-dependent nature of PDES programs has identified it as an application where vectorization techniques using supercomputer hardware provide little benefit [14]. A discrete event simulation model assumes the system being simulated only changes state at discrete points in simulated time. The simulation model jumps from one state to another upon the occurrence of an event. For example, a simulator of a store-and-forward communication network might include state variables to indicate the length of message queues, the status of communication links (busy or idle), etc. Typical events might include arrival of a message at some node in the network, forwarding a message to another network node, component failures, etc. We are especially concerned with the simulation of asynchronous systems where events are not synchronized by a global clock, but rather, occur at irregular time intervals.
For these systems, few simulator events occur at any single point in simulated time; therefore parallelization techniques based on lock-step execution using a global simulation clock perform poorly or require assumptions in the timing model that may compromise the fidelity of the simulation. Concurrent execution of events at different points in simulated time is required, but as we shall soon see, this introduces interesting synchronization problems that are at the heart of the PDES problem. This article deals with the execution of a simulation program on a parallel computer by decomposing the simulation application into a set of concurrently executing processes. For completeness, we conclude this section by mentioning other approaches to exploiting parallelism in simulation problems. Comfort and Shepard et al. have proposed using dedicated functional units to implement specific sequential simulation functions (e.g., event list manipulation and random number generation [20, 23, 47]). This method can provide only a limited amount of speedup, however. Zhang, Zeigler, and Concepcion use the hierarchical decomposition of the simulation model to allow an event consisting of several subevents to be processed concurrently [21, 98]. A third alternative is to execute independent, sequential simulation programs on different processors [11, 39]. This replicated trials approach is useful if the simulation is largely stochastic and one is performing long simulation runs to reduce variance, or if one is attempting to simulate a specific simulation problem across a large number of different parameter settings. However, one drawback with this approach is that each processor must contain sufficient memory to hold the entire simulation.
Furthermore, this approach is less suitable in a design environment where results of one experiment are used to determine the experiment that should be performed next because one must wait for a sequential execution to be completed before results are obtained.
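The discrete event model described in this abstract (state changes only at event timestamps, with a store-and-forward link as the running example) can be illustrated with a minimal sequential sketch. This is not the article's code: the handler names `on_arrival` and `on_forward_done`, the fixed forwarding time of 2.0 time units, and the shape of the state dictionary are all assumptions made for illustration. A parallel simulator would have to preserve the same timestamp ordering across processes, which is where the synchronization problem arises.

```python
import heapq

FORWARD_TIME = 2.0  # hypothetical time to forward one message over the link


def on_arrival(state):
    """A message arrives: queue it if the link is busy, else start forwarding."""
    if state["link_busy"]:
        state["queue_len"] += 1
        return []
    state["link_busy"] = True
    return [(FORWARD_TIME, "forward_done")]


def on_forward_done(state):
    """Forwarding finished: start on the next queued message or go idle."""
    if state["queue_len"] > 0:
        state["queue_len"] -= 1
        return [(FORWARD_TIME, "forward_done")]
    state["link_busy"] = False
    return []


def simulate(initial_events, handlers, end_time):
    """Process events in nondecreasing timestamp order until end_time."""
    pending = []  # event list: a heap of (timestamp, sequence number, name)
    seq = 0       # sequence number keeps equal-time events in insertion order
    for ts, name in initial_events:
        heapq.heappush(pending, (ts, seq, name))
        seq += 1
    state = {"queue_len": 0, "link_busy": False}
    log = []
    while pending:
        ts, _, name = heapq.heappop(pending)
        if ts > end_time:
            break
        log.append((ts, name))
        # A handler updates the state and returns (delay, event name) pairs.
        for delay, new_name in handlers[name](state):
            heapq.heappush(pending, (ts + delay, seq, new_name))
            seq += 1
    return state, log
```

Simulated time jumps directly from one event timestamp to the next; nothing happens in between, which is why few events share any single point in simulated time.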
TL;DR: A priority-based synchronization protocol that explicitly uses shared-memory primitives is defined and analyzed, the underlying priority considerations for a shared-memory synchronization protocol are studied, and priority assignments to be used by the protocol are derived.
Abstract: A priority-based synchronization protocol that explicitly uses shared-memory primitives is defined and analyzed. A solution that has been proposed for bounding and minimizing synchronization delays in real-time systems is briefly reviewed. The waiting times introduced by synchronization requirements in multiple-processor environments are identified, and a set of goals for priority-based multiprocessor synchronization protocols is derived. The underlying priority considerations for a shared-memory synchronization protocol are studied and priority assignments to be used by the protocol are derived.
TL;DR: In this paper, a performance evaluation of the Symmetry multiprocessor system revealed that the synchronization mechanism did not perform well for highly contested locks, like those found in certain parallel applications.
Abstract: A performance evaluation of the Symmetry multiprocessor system revealed that the synchronization mechanism did not perform well for highly contested locks, like those found in certain parallel applications. Several software synchronization mechanisms, intended to reduce contention for the lock, were developed and evaluated on the Symmetry multiprocessor system using a hardware monitor. The mechanisms remain valuable even when changes are made to the hardware synchronization mechanism to improve support for highly contested locks. The Symmetry architecture is described, and a number of lock algorithms and their use of hardware resources are examined. The performance of each lock is observed from the perspective of both the program itself and the total system performance.
TL;DR: Free partially commutative monoids, complete semi-Thue systems and Möbius functions, and trace replacement systems are presented.
Abstract: Free partially commutative monoids.- Recognizable and rational trace languages.- Petri nets and synchronization.- Complete semi-Thue systems and Möbius functions.- Trace replacement systems.
TL;DR: In this article, a synchronization controller is provided for each processor in a multiprocessor system, which is commonly connected to a synchronization signal bus, and each of the synchronization controllers has a synchronization wait signal transmitting means for receiving a synchronization request signal from a corresponding processor.
Abstract: A synchronization controller is provided for each processor in a multiprocessor system. The synchronization controllers are commonly connected to a synchronization signal bus. Each of the synchronization controllers has a synchronization wait signal transmitting means for receiving a synchronization request signal from a corresponding processor, signal means for transmitting a synchronization wait signal to the synchronization signal bus, a synchronization register for specifying the other processors to be synchronized with the corresponding processor, a comparator means for comparing the signal from the synchronization signal bus with the content of the synchronization register, and a means for transmitting to the corresponding processor a synchronization-acknowledge signal based on the result of comparison by the comparator means.
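As a rough software model of the controller described in this abstract, the register and bus can be treated as bitmasks. The abstract only says the acknowledge signal is "based on the result of comparison", so the rule used here (acknowledge once every processor named in the register is asserting its wait line) is an assumption, as are the class and method names `SyncBus`, `SyncController`, `request_sync`, and `acknowledge`.

```python
class SyncBus:
    """Shared synchronization signal bus; bit i is set while processor i waits."""

    def __init__(self):
        self.lines = 0

    def assert_wait(self, proc_id):
        self.lines |= 1 << proc_id

    def clear_wait(self, proc_id):
        self.lines &= ~(1 << proc_id)


class SyncController:
    """Per-processor controller holding a register of partner processors."""

    def __init__(self, proc_id, partners):
        self.proc_id = proc_id
        self.sync_register = 0
        for partner in partners:
            self.sync_register |= 1 << partner

    def request_sync(self, bus):
        # The processor asks to synchronize: drive our wait line on the bus.
        bus.assert_wait(self.proc_id)

    def acknowledge(self, bus):
        # Assumed comparison rule: every partner named in the register
        # must currently be asserting its wait signal on the bus.
        return (bus.lines & self.sync_register) == self.sync_register
```

Because each controller has its own register, different subsets of processors can synchronize with one another over the same bus.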
TL;DR: In this article, the authors present static analysis methods that can be applied to parallel programs with event variable synchronization, with the focus on how dependencies and synchronization statements inside loops can be used to analyze complete programs with parallel loop and parallel case style parallelism.
Abstract: Understanding synchronization is important for a parallel programming tool that uses dependence analysis as the basis for advising programmers on the correctness of parallel constructs. This paper discusses static analysis methods that can be applied to parallel programs with event variable synchronization. The objective is to be able to predict potential data races in a parallel program. The focus is on how dependencies and synchronization statements inside loops can be used to analyze complete programs with parallel loop and parallel case style parallelism.
TL;DR: The design of an extra performance architecture for Delta-4, which explicitly supports the requirements of real-time systems with respect to throughput and response, is presented and a solution based on message selection and preemption synchronization messages is proposed.
Abstract: The design of an extra performance architecture for Delta-4, which explicitly supports the requirements of real-time systems with respect to throughput and response, is presented. The Delta-4 approach to fault tolerance is based on the replication of software components on distinct host computers using a range of different replication strategies. The problems of replicate divergence are discussed, and a solution based on message selection and preemption synchronization messages is proposed. A description of the ongoing implementation of such a system within the overall Delta-4 framework is included.
TL;DR: In this article, a power saving method and apparatus in a time division multiplexed system capable of providing a synchronous full duplex communication between a telephone network (12) and a plurality of remote communication units (18) is presented.
Abstract: A power saving method and apparatus in a time division multiplexed system (10) capable of providing a synchronous full duplex communication between a telephone network (12) and a plurality of remote communication units (18). A communication resource controller (14) provides system synchronization, by periodically transmitting synchronization messages through one or more remote sites (11). The communication units (18) attempt to acquire synchronization during a synchronization acquisition interval. If synchronization is acquired, the communication units (18) enter a synchronous battery saving mode (515). In the synchronous battery saving mode (515), the communication units (18) can detect a call request either to their own address or to the address of another communication unit. If no call request is detected, the communication units (18) reduce power consumption for a synchronous power saving time interval, and thereafter merely verify synchronization. However, if synchronization is not acquired, the communication units enter an asynchronous power saving mode (525), wherein they reduce power consumption for an asynchronous power saving interval.
TL;DR: In this paper, comparison and synchronization logic is used between N processors in redundant configuration and peripheral devices to ensure that the redundant processors are performing the same read/write operations, and an overall watchdog timer provides for detecting an error condition for non-responsive or lead responding processors.
Abstract: N redundant processors operating in functional lockstep synchronization for maintaining system integrity. Comparison and synchronization logic are connected between N processors in redundant configuration and peripheral devices. The comparison and synchronization logic acts to ensure that the redundant processors are performing the same read/write operations. Calculation or processing not requiring access to peripherals may take place in an asynchronous manner. Processors are halted from performing further operations until all appropriate read or write operations are synchronized. The processors are then allowed to proceed. An overall watchdog timer provides for detecting an error condition for non-responsive or lead responding processors.
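The halt-compare-release behavior in this abstract might be modeled as follows. This is only a sketch: representing an operation as a `(kind, address, value)` tuple, the three result labels, and the tick-based watchdog check are assumptions for illustration, not details taken from the patent.

```python
def compare_operations(ops_by_processor, n_processors):
    """Decide what the comparison logic does with the pending peripheral ops.

    ops_by_processor maps a processor id to the (kind, address, value)
    operation that processor is halted on. Returns ("wait", None) until
    all N processors have posted, ("release", op) if they all agree, and
    ("error", None) if the posted operations differ.
    """
    if len(ops_by_processor) < n_processors:
        return ("wait", None)
    distinct = set(ops_by_processor.values())
    if len(distinct) == 1:
        return ("release", distinct.pop())
    return ("error", None)


def watchdog_expired(first_arrival_tick, now_tick, timeout_ticks):
    """True if some processor has kept the others halted for too long."""
    return now_tick - first_arrival_tick > timeout_ticks
```

Only operations that reach a peripheral go through the comparison, which matches the abstract's point that other processing may run asynchronously.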
TL;DR: A set of functional requirements is identified for a multimedia server considering database management, object synchronization and integration, and multimedia query processing for a distributed system.
TL;DR: The impact of synchronization and granularity on the performance of parallel systems using an execution-driven simulation technique finds that even though there can be a lot of parallelism at the fine grain level, synchronization and scheduling strategies determine the ultimate performance of the system.
Abstract: In this paper, we study the impact of synchronization and granularity on the performance of parallel systems using an execution-driven simulation technique. We find that even though there can be a lot of parallelism at the fine grain level, synchronization and scheduling strategies determine the ultimate performance of the system. Loop-iteration level parallelism seems to be a more appropriate level when those factors are considered. We also study barrier synchronization and data synchronization at the loop iteration level and find that both schemes are needed for better performance.
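For readers unfamiliar with barrier synchronization at the loop-iteration level, a small sketch with Python's standard `threading.Barrier` shows the idea: no worker starts the second phase until every worker has finished the first. The two-phase workload and the function name `run_phases` are assumptions invented for illustration; the paper itself uses execution-driven simulation, not this code.

```python
import threading


def run_phases(n_workers):
    """Run two loop phases across n_workers threads, separated by a barrier."""
    partial = [0] * n_workers
    result = [0] * n_workers
    barrier = threading.Barrier(n_workers)

    def worker(i):
        # Phase 1: each iteration writes only its own slot.
        partial[i] = (i + 1) * 10
        # Barrier: wait until every iteration has finished phase 1.
        barrier.wait()
        # Phase 2: safe to read a neighbour's phase-1 value.
        result[i] = partial[i] + partial[(i + 1) % n_workers]

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result
```

Without the barrier, a fast iteration could read a neighbour's slot before it was written. Data synchronization would instead make iteration `i` wait only for the one value it needs.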
TL;DR: In this paper, the authors propose a network entry synchronization scheme for the synchronization of a frequency hopping transceiver to a network by embedding synchronization codes in the pseudo-random frequency hopping transmission sequence.
Abstract: Synchronization of a frequency hopping transceiver to a network by embedding synchronization codes in the pseudo-random frequency hopping transmission sequence. A receiver is implemented with a frequency detector and a correlator to generate a correlator signal in response to the synchronization codes in the pseudo-random frequency hopping transmission sequence. Detection of a peak in the correlator signal is indicative of synchronization of the receiver with the network. The network entry synchronization scheme is such that, when two transceivers A and B are communicating, a third unnetworked transceiver C extracts the hidden network entry code pattern from the A-B transmission in order to enter the network. As a part of the communication between the two transceivers A and B, transceiver A transmits a known pattern as a hidden part of the communication which allows transceiver C to enter the A-B network. This hidden code pattern permits rapid synchronization and correction of large initial time errors, and permits correction of time drift from then on.
TL;DR: In this article, a battery saving algorithm for supplying power to a selective call communication receiver for enabling the detection of a synchronization codeword in data received in a predetermined signaling format comprises circuits (132, 140, 142), and a circuit for detecting valid data received during a first portion of the first predetermined time interval.
Abstract: A battery saving apparatus (100) for supplying power to a selective call communication receiver for enabling the detection of a synchronization codeword in data received in a predetermined signaling format comprises circuits (132, 140, 142) for supplying power to the receiver, and a circuit (112) for detecting valid data received during a first portion of the first predetermined time interval. Power is maintained to the receiver for the remainder of the first predetermined time interval when valid data is detected in the first portion. A synchronization codeword detector (116) is included for detecting a synchronization codeword. When valid data is subsequently detected following a second portion of the first predetermined time interval and the synchronization codeword is not detected in the first predetermined time interval, power is maintained to the receiver for a second predetermined time interval to further enable detection of the synchronization codeword.
TL;DR: In this paper, the authors proposed a self-correcting synchronization signal for a communication interconnect which connects multiple units, the synchronization method including the steps of: waiting for a predetermined period of time until the next synchronization signal is due (nominal cycle period); waiting a further period for the absence of communications on the interconnect among any of the units, and forming a count of the passage of time during this delay (start delay); forming a synchronization signal including the count of delay time; and sending the synchronization signal on the interconnect.
Abstract: A self-correcting synchronization signal for a communication interconnect which connects multiple units, the synchronization method including the steps of: waiting for a predetermined period of time until the next synchronization signal is due (nominal cycle period); waiting a further period of time for the absence of communications on the interconnect among any of the units, and forming a count of the passage of time during this delay (start delay); forming a synchronization signal including the count of the delay time; and sending the synchronization signal on the interconnect. Also according to this invention, a unit receiving the synchronization signal on the interconnect can update its count of time by the predetermined period of time plus the count of the delay period of time. Also according to this invention, some units can be assigned sequential access numbers, and upon receiving the synchronization signal, units without an assigned access number refrain from using the interconnect until after all units with an assigned access number have sequentially used the interconnect. This provides at least one clear opportunity for each of the units with assigned access numbers to access the interconnect in the period between synchronization signals before the interconnect is available for general access by all units. An apparatus for providing the self-correcting synchronization signal in accordance with this invention includes a cycle master unit for generating cycle start signals and transmitting them on the interconnect, and a receiving unit for receiving cycle sync signals from the interconnect and extracting the timing information to generate a local cycle synch signal.
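The receiver-side update rule stated in this abstract (advance the local count by the nominal cycle period plus the start delay carried in the signal) is simple arithmetic and can be sketched directly. The value of `NOMINAL_CYCLE`, the dictionary layout of the signal, and the function names are assumptions for illustration only.

```python
NOMINAL_CYCLE = 1000  # hypothetical nominal cycle period, in clock ticks


def build_sync_signal(start_delay):
    """Sender side: package the measured start delay into the sync signal."""
    return {"type": "cycle_sync", "start_delay": start_delay}


def advance_local_count(local_count, signal, nominal_cycle=NOMINAL_CYCLE):
    """Receiver side: add the nominal period plus the reported start delay."""
    return local_count + nominal_cycle + signal["start_delay"]


def track_cycles(start_delays, nominal_cycle=NOMINAL_CYCLE):
    """Apply a sequence of sync signals and return the receiver's final count."""
    count = 0
    for delay in start_delays:
        count = advance_local_count(count, build_sync_signal(delay), nominal_cycle)
    return count
```

Carrying the delay inside the signal is what makes it self-correcting: a signal that was sent late because the interconnect was busy still lets the receiver account for the elapsed time.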
TL;DR: In this paper, a microprocessor controlled video monitor is presented, which is able to automatically adjust the values of its parameters to adjust to operation on a number of different computer systems, including control lines (35-39,43,53-60), digital-to-analog converters (3,45), and a control processor (1).
Abstract: A microprocessor controlled video monitor is presented. The video monitor is able to automatically adjust the values of its parameters to adjust to operation on a number of different computer systems. The video monitor includes control lines (35-39,43,53-60), digital-to-analog converters (3,45) and a control processor (1). The control processor (1), through the digital-to-analog converters (3,45), controls the values of the parameters of the video monitor. Stored in a non-volatile memory (2) are entries which contain values of video monitor parameters. The control processor (1) recognizes different computing systems on the basis of the frequency and polarity of horizontal and vertical synchronization signals. When either frequency or polarity of either the horizontal or vertical synchronization signals changes, the control processor (1) will search the non-volatile memory (2) for an entry in which values stored for both the frequency and polarity of both the horizontal and vertical synchronization signals match the currently measured frequency and polarity of the horizontal and vertical synchronization signals. If a match is found the values for the parameters stored in the entry are applied by the control processor (1) through the digital-to-analog converters (3,45) to the control lines (35-39,43,53-60). A user may adjust certain parameters through the use of switches (183,184,185) which are periodically polled by the control processor (1). When the control processor (1) receives instructions from a user through manipulation of the switches (183,184,185) the control processor (1) makes the specified changes to the video monitor parameters and stores the new values in non-volatile memory (2).
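The lookup step in this abstract amounts to a table search keyed on the four measured sync properties. The sketch below uses an exact-match dictionary; a real monitor would presumably allow some frequency tolerance, and the entry values, parameter names, and function names are all hypothetical.

```python
def make_key(h_freq_hz, h_positive, v_freq_hz, v_positive):
    """Key an entry by frequency and polarity of both sync signals."""
    return (h_freq_hz, h_positive, v_freq_hz, v_positive)


def find_parameters(memory, h_freq_hz, h_positive, v_freq_hz, v_positive):
    """Search stored entries for one matching all four measured values."""
    return memory.get(make_key(h_freq_hz, h_positive, v_freq_hz, v_positive))


def store_user_adjustment(memory, key, name, value):
    """Save a user's change so it is applied the next time this mode is seen."""
    memory.setdefault(key, {})[name] = value


# Hypothetical contents of the non-volatile memory.
MEMORY = {
    make_key(31500, False, 60, False): {"h_size": 128, "v_size": 140},
    make_key(31500, False, 70, True): {"h_size": 128, "v_size": 120},
}
```

Both frequency and polarity take part in the key because two systems can share a line rate yet differ in polarity, as the two sample entries do in vertical polarity.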
TL;DR: This paper considers the use of massively parallel architectures to execute discrete-event simulations of what are termed “self-initiating” models, and considers the performance of various synchronization protocols by deriving upper and lower bounds on optimal performance, upper bounds on Time Warp's performance, and lower bounds on the performance of a new conservative protocol.
Abstract: The use is considered of massively parallel architectures to execute discrete-event simulations of what is termed self-initiating models. A logical process in a self-initiating model schedules its own state re-evaluation times, independently of any other logical process, and sends its new state to other logical processes following the re-evaluation. The interest is in the effects of that communication on synchronization. The performance is considered of various synchronization protocols by deriving upper and lower bounds on optimal performance, upper bounds on Time Warp's performance, and lower bounds on the performance of a new conservative protocol. The analysis of Time Warp includes the overhead costs of state-saving and rollback. The analysis points out sufficient conditions for the conservative protocol to outperform Time Warp. The analysis also quantifies the sensitivity of performance to message fan-out, lookahead ability, and the probability distributions underlying the simulation.
TL;DR: In this paper, an event driven journaling mechanism which is not dependent on the timing of execution of processes is implemented, where synchronization events, referred to as synchronization points, mark locations in the journal file of events where previously initiated processing must be completed before initiating the subsequent process.
Abstract: In the system of the present invention, an event driven journaling mechanism which is not dependent on the timing of execution of processes is implemented. Special events, referred to as synchronization events, mark locations, referred to as synchronization points, in the journal file of events where previously initiated processing must be completed before initiating the subsequent process. The synchronization points are located between processes which are exchanging state. The synchronization events are put into the journal file during the recording phase. On playback, the journaling mechanism waits for a synchronization event to occur before proceeding to the next action in the journal file and initiating subsequent execution of the process.
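The playback rule in this abstract (block at each synchronization point until the corresponding event has occurred, rather than relying on timing) can be sketched as a loop over journal entries. The journal layout as `(kind, payload)` tuples and the callback names `perform` and `wait_for_sync` are assumptions; the patent does not describe its file format here.

```python
def playback(journal, perform, wait_for_sync):
    """Replay a journal, pausing at each synchronization point.

    journal is a list of (kind, payload) entries where kind is "action"
    or "sync". perform(payload) replays a recorded action;
    wait_for_sync(payload) must block until the named synchronization
    event has occurred in the running system.
    """
    for kind, payload in journal:
        if kind == "sync":
            # Previously initiated processing must finish before we go on.
            wait_for_sync(payload)
        elif kind == "action":
            perform(payload)
        else:
            raise ValueError(f"unknown journal entry kind: {kind!r}")
```

Because progress depends on the events themselves, the same journal replays correctly on a faster or slower machine.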
TL;DR: As an exercise in synchronization without mutual exclusion, algorithms are developed to implement both a monotonic and a cyclic multiple-word clock that is updated by one process and read by one or more other processes.
Abstract: As an exercise in synchronization without mutual exclusion, algorithms are developed to implement both a monotonic and a cyclic multiple-word clock that is updated by one process and read by one or more other processes.
TL;DR: In this paper, the performance of distributed window systems which employ shared libraries is provided, where synchronization events are preset in a journal file which, during playback of the same, trigger a mechanism which reads the page table of the current applications.
Abstract: In the system of the present invention, the performance of distributed window systems which employ shared libraries is provided. A shared library is a library which is referenced and accessed by multiple processes. Synchronization events are preset in a journal file which, during playback of the same, trigger a mechanism which reads the page table of the current applications. Once the playback is complete, the page table measurements taken during playback are reviewed and information is extracted which is used to determine the working set for shared libraries. Using the working set, the number of pages referenced and the rate of reference can be used to improve performance and predict behavior of other systems.
TL;DR: Epsilon-2 implements a hybrid computation model that combines the fine-grain parallelism of dataflow computing with the sequential efficiency characteristic of von Neumann computing and provides instruction-level synchronization, single-cycle context switches, and RISC-like sequential execution.
TL;DR: It is concluded that the main advantage of the proposed receiver structure is the enormous flexibility which makes it possible to implement and compare many different schemes of code synchronization and data demodulation.
Abstract: A modified RAKE-receiver resolving the multipath components of the channel impulse response (time diversity) is introduced for a microcellular indoor communications system with code-division multiple-access (CDMA) using direct-sequence spread spectrum (DSSS). Data demodulation and pseudonoise (PN) code synchronization are performed by a parallel processing architecture consisting of multiple digital signal processors (DSPs). Algorithms for code acquisition and tracking or scanning are presented which are based on real-time measurements of the channel impulse response. The parameters of these algorithms are adapted to the currently measured signal-to-noise ratio in the data band. It is concluded that the main advantage of the proposed receiver structure is the enormous flexibility which makes it possible to implement and compare many different schemes of code synchronization and data demodulation. Another advantage is that the same hardware is used for synchronization and data demodulation.
TL;DR: In this article, a technique for automatically removing the skew between multiple correlated synchronous data streams provides Vernier Skew compensation, where data streams are marked into data frames under the control of local synchronized transmitter clocks.
Abstract: A technique for automatically removing the skew between multiple correlated synchronous data streams provides Vernier Skew compensation. The data streams are marked into data frames under the control of local synchronized transmitter clocks. The data streams received at a receiver are loaded into FIFO registers under the control of recovered clocks. Data from the FIFO registers are unloaded under the control of a local receiver clock synchronized with the transmitter clocks. The frame marked data in the data streams is checked for a synchronization fault at the receiver. When a synchronization fault is detected in a data stream, the loading and unloading of the FIFO register corresponding to that data stream is inhibited and then the FIFO register is purged. A frame header in the data stream is detected and then the loading of the FIFO register is enabled with the first value to arrive which is marked as a frame header. At the next succeeding time for an expected frame header, normal unloading of the FIFO register is initiated. An alternate embodiment substitutes bi-port register arrays for the FIFO registers providing greater simplicity and flexibility.
TL;DR: In this paper, the synchronization of multi-media events on a computer is discussed, and a way of loading such processes into such memory so as to approximate optimally the desired synchronized production when such processes are played is presented.
Abstract: The invention relates to the synchronization of multi-media events on a computer. A computer of limited core or random access memory makes it difficult to run concurrently various media processes, such as video, music, and titling. The invention provides a way of loading such processes into such memory so as to approximate optimally the desired synchronized production when such processes are played. The invention also provides a way of parameterizing the play of the video process to the play of the music process.
TL;DR: The design and performance analysis of a new, highly efficient synchronization mechanism called “Static Barrier MIMD” or “SBM” is given, designed to facilitate static code scheduling for eliminating some synchronizations.
Abstract: In this paper, we give the design, and performance analysis, of a new, highly efficient, synchronization mechanism called “Static Barrier MIMD” or “SBM.” Unlike traditional barrier synchronization, the proposed barriers are designed to facilitate the use of static (compile-time) code scheduling for eliminating some synchronizations. For this reason, our barrier hardware is more general than most hardware barrier mechanisms, allowing any subset of the processors to participate in each barrier. Since code scheduling typically operates on fine-grain parallelism, it is also vital that barriers be able to execute in a small number of clock ticks. The SBM is actually only one of two new classes of barrier machines proposed to facilitate static code scheduling; the other architecture is the “Dynamic Barrier MIMD,” or “DBM,” which is described in a companion paper. The DBM differs from the SBM in that the DBM employs more complex hardware to make the system less dependent on the precision of the static analysis and code scheduling; for example, an SBM cannot efficiently manage simultaneous execution of independent parallel programs, whereas a DBM can.
TL;DR: In this paper, a distributed synchronization method for a wireless fast packet communication system (100) is disclosed, which provides for the combination of both voice and data in a single switch using a common packet structure.
Abstract: A distributed synchronization method for a wireless fast packet communication system (100) is disclosed. The distributed synchronization method, according to the invention, provides for the combination of both voice and data in a single switch using a common packet structure. It allows for the dynamic synchronization of packets. This includes not only bandwidth within the voice or data areas of the frame, but also between the voice and data portions.
TL;DR: This article presents a system which computes Gröbner bases on a shared memory multiprocessor, with details of an implementation on a 16-processor Encore machine and results of tests performed with well-known examples from the literature.
Abstract: This article presents a system which computes Gröbner bases on a shared memory multiprocessor. The basic idea is that each processor picks an element in the set of unreduced critical pairs, reduces the S-polynomial associated with it and updates the basis and the set of pairs according to the result. The originality of this algorithm lies in the small amount of synchronization it requires among the processes. The details of an implementation on a 16-processor Encore machine are given together with results of tests performed with well-known examples from the literature.
TL;DR: In this paper, a real-time synchronization of radio signal data is achieved by a feedback loop between a clearinghouse generating data and a radio station transmitting data by inserting padding data between certain data packets to shift data packet position within a data stream transmitted at a predetermined transmission rate.
Abstract: Real time synchronization of radio signal data is achieved by a feedback loop between a clearinghouse generating data and a radio station transmitting data. Transmission time is adjusted in response to detected timing errors by selectively inserting padding data between certain data packets to shift data packet position within a data stream transmitted at a predetermined transmission rate. Data flow control is achieved by use of variable length flags delimiting data packets whereby longer flags result in less data flow and shorter flags result in greater data flow.
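The padding technique in this abstract relies on the fact that, at a fixed transmission rate, every inserted padding byte delays all later packets by one byte time. The sketch below shows that arithmetic; the rounding rule, the zero padding byte, and the function names are assumptions, and the variable-length flag mechanism for flow control is not modeled.

```python
def padding_bytes_for_delay(delay_s, rate_bytes_per_s):
    """Padding bytes needed to push later packets back by delay_s seconds."""
    return max(0, round(delay_s * rate_bytes_per_s))


def build_stream(packets, pad_before, pad_byte=b"\x00"):
    """Concatenate packets, inserting padding ahead of selected packets.

    pad_before maps a packet index to the number of padding bytes to
    insert immediately before that packet.
    """
    out = bytearray()
    for index, packet in enumerate(packets):
        out += pad_byte * pad_before.get(index, 0)
        out += packet
    return bytes(out)
```

In the feedback loop described above, the detected timing error would supply `delay_s`, and the clearinghouse would insert the resulting padding ahead of the next packet that needs to shift.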
TL;DR: Three more efficient strong semaphore solutions are proposed in this paper: two are based on the main theorem of the paper, the deferred bus theorem, and the third is an extension of the existing Burns' algorithm.
Abstract: Predictability is of paramount concern for hard real-time systems. In one approach to predictability, every aspect of a real-time system and every primitive provided by the underlying operating system must be bounded and predictable in order to achieve overall predictability. In this paper, we describe several concurrency control synchronization mechanisms developed for a next generation multiprocessor real-time kernel, the Spring Kernel. The important features of these mechanisms include semaphore support for mutual exclusion with linear waiting and bounded resource usage, termed strong semaphores. Three more efficient strong semaphore solutions are proposed in this paper. Two of them are based on the main theorem of the paper, the deferred bus theorem. These two solutions can be implemented either in hardware or in software. The third solution, a pure software solution, is an extension of the existing Burns' algorithm. A performance comparison and a complexity analysis in terms of time, space and bus traffic are presented.
TL;DR: Tango is a software-based multiprocessor simulator that can generate traces of synchronization events and data references and offers flexible and accurate tracing by allowing the user to incorporate various memory and synchronization models.
Abstract: Tango is a software-based multiprocessor simulator that can generate traces of synchronization events and data references. The system runs on a uniprocessor and provides a simulated multiprocessor environment. The user code is augmented during compilation to produce a compiled simulation system with optional logging. Tango offers flexible and accurate tracing by allowing the user to incorporate various memory and synchronization models. Tango achieves high efficiency by running compiled user code, by focusing on information that is of specific interest to multiprocessing studies and by allowing the user to select the most efficient memory simulation that is appropriate for a set of experiments.
TL;DR: The authors propose a technique for formally specifying and modeling the temporal composition of multimedia data based on timed Petri nets and the logic of temporal intervals which accomplishes the specification of synchronization requirements for complex structures of temporally related objects.
Abstract: The authors propose a technique for formally specifying and modeling the temporal composition of multimedia data. The proposed model is based on timed Petri nets and the logic of temporal intervals. A strategy based on the inter media timing relationships established by the proposed modeling tool is presented for constructing a database schema to facilitate data storage and retrieval of data elements. An algorithm which allows the retrieval of media elements from the database in a manner which preserves the temporal requirements of the initial specification is proposed. The proposed model accomplishes the specification of synchronization requirements for complex structures of temporally related objects.