TL;DR: An approach based on a modular state-transition representation of a parallel system called the stochastic automata network (SAN) is developed, which is automatically derived using tensor algebra operators, under a format which involves a very limited storage cost.
Abstract: A methodology for modeling a system composed of parallel activities with synchronization points is proposed. Specifically, an approach based on a modular state-transition representation of a parallel system called the stochastic automata network (SAN) is developed. The state-space explosion is handled by a decomposition technique. The dynamic behavior of the algorithm is analyzed under Markovian assumptions. The transition matrix of the chain is automatically derived using tensor algebra operators, under a format which involves a very limited storage cost.
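The abstract does not give the tensor formulas, so the following is only a minimal sketch under a stated assumption: for automata that evolve independently, the global generator is the Kronecker (tensor) sum of the local generators. The matrices `Q1` and `Q2` and all function names are hypothetical, and the extra tensor-product terms that synchronizing events would contribute are omitted.

```python
def kron(a, b):
    """Kronecker (tensor) product of two matrices given as lists of rows."""
    return [
        [x * y for x in row_a for y in row_b]
        for row_a in a
        for row_b in b
    ]


def identity(n):
    """n-by-n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def add(a, b):
    """Element-wise sum of two matrices of equal shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def kron_sum(a, b):
    """Kronecker sum: (a tensor I) + (I tensor b)."""
    return add(kron(a, identity(len(b))), kron(identity(len(a)), b))


# Hypothetical local generators of two independent 2-state automata.
Q1 = [[-1, 1], [2, -2]]
Q2 = [[-3, 3], [4, -4]]

# Global generator over the 4 product states, built only when needed.
Q = kron_sum(Q1, Q2)
```

Storing only the local matrices (8 entries here) instead of the expanded global matrix (16 entries) illustrates, on a toy scale, why a tensor format can keep storage small as the number of automata grows.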
TL;DR: It is specified that the provision of a synchronization function be performed within a packet switched network, and, accordingly, a two-level communication architecture is presented.
Abstract: Protocols to provide synchronization of data elements with arbitrary temporal relationships of both stream and non-stream broadband traffic types are proposed. It is specified that the provision of a synchronization function be performed within a packet switched network, and, accordingly, a two-level communication architecture is presented. The lower level, called the network synchronization protocol (NSP), provides the ability to establish and maintain individual connections with specified synchronization characteristics. The upper level, the application synchronization protocol (ASP), supports an integrated synchronization service for multimedia applications. The ASP identifies the temporal relationships among an application's data objects and manages the synchronization of arriving data for playout. The proposed NSP and ASP are mapped to the session and application layers of the open-systems-interconnection (OSI) reference model, respectively.
TL;DR: In this paper, the authors discuss the parallel implementation of the auction algorithm for the classical assignment problem and explore computationally the tradeoffs involved in using asynchronism to reduce the synchronization penalty.
Abstract: In this paper we discuss the parallel implementation of the auction algorithm for the classical assignment problem. We show that the algorithm admits a totally asynchronous implementation and we consider several implementations on a shared memory machine, with varying degrees of synchronization. We also discuss and explore computationally the tradeoffs involved in using asynchronism to reduce the synchronization penalty.
TL;DR: This work considers iterative algorithms of the form x := f(x), executed by a parallel or distributed computing system, and studies the communication requirements of synchronous executions of such iterations, as well as issues related to processor synchronization.
TL;DR: Reader-writer locks that similarly exploit locality to achieve scalability are presented, with variants for reader preference, writer preference, and reader-writer fairness.
Abstract: Reader-writer synchronization relaxes the constraints of mutual exclusion to permit more than one process to inspect a shared object concurrently, as long as none of them changes its value. On uniprocessors, mutual exclusion and reader-writer locks are typically designed to de-schedule blocked processes; however, on shared-memory multiprocessors it is often advantageous to have processes busy wait. Unfortunately, implementations of busy-wait locks on shared-memory multiprocessors typically cause memory and network contention that degrades performance. Several researchers have shown how to implement scalable mutual exclusion locks that exploit locality in the memory hierarchies of shared-memory multiprocessors to eliminate contention for memory and for the processor-memory interconnect. In this paper we present reader-writer locks that similarly exploit locality to achieve scalability, with variants for reader preference, writer preference, and reader-writer fairness. Performance results on a BBN TC2000 multiprocessor demonstrate that our algorithms provide low latency and excellent scalability.
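To make the reader-writer semantics concrete, here is a minimal sketch of a reader-preference lock. It is not the scalable busy-wait algorithm of the paper: it is a blocking lock of the de-scheduling kind the abstract attributes to uniprocessors, built on Python's `threading.Condition`. The class name, the `demo` helper, and its parameters are hypothetical.

```python
import threading


class ReaderPreferenceRWLock:
    """Blocking reader-writer lock; readers wait only for an active writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0      # number of readers currently inside
        self._writer = False   # True while a writer holds the lock

    def acquire_read(self):
        with self._cond:
            # Reader preference: ignore waiting writers, wait only for an active one.
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            # A writer needs exclusive access: no readers and no other writer.
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


def demo(num_writers=4, increments=1000):
    """Run writers that increment a counter while one reader samples it."""
    lock = ReaderPreferenceRWLock()
    shared = {"value": 0}
    observed = []

    def writer():
        for _ in range(increments):
            lock.acquire_write()
            try:
                shared["value"] += 1
            finally:
                lock.release_write()

    def reader():
        for _ in range(increments):
            lock.acquire_read()
            try:
                observed.append(shared["value"])
            finally:
                lock.release_read()

    threads = [threading.Thread(target=writer) for _ in range(num_writers)]
    threads.append(threading.Thread(target=reader))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["value"], observed
```

With reader preference, a steady stream of readers can starve writers; the writer-preference and fair variants named in the abstract address that trade-off in different ways.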
TL;DR: Fast, simple algorithms for contention-free mutual exclusion, reader-writer control, and barrier synchronization are presented, based on widely available fetch-and-Φ instructions, that exploit local access to shared memory to avoid contention.
Abstract: Conventional wisdom holds that contention due to busy-wait synchronization is a major obstacle to scalability and acceptable performance in large shared-memory multiprocessors. We argue the contrary, and present fast, simple algorithms for contention-free mutual exclusion, reader-writer control, and barrier synchronization. These algorithms, based on widely available fetch-and-Φ instructions, exploit local access to shared memory to avoid contention. We compare our algorithms to previous approaches in both qualitative and quantitative terms, presenting their performance on the Sequent Symmetry and BBN Butterfly multiprocessors. Our results highlight the importance of local access to shared memory, provide a case against the construction of so-called "dance hall" machines, and suggest that special-purpose hardware support for synchronization is unlikely to be cost effective on machines with sequentially consistent memory.
TL;DR: In this article, a language-based approach to deterministic execution debugging of concurrent Ada programs is presented, where synchronization (SYN)-sequences of a concurrent Ada program in terms of Ada language constructs are defined without the need for system-dependent debugging tools.
Abstract: A language-based approach to deterministic execution debugging of concurrent Ada programs is presented. The approach is to define synchronization (SYN)-sequences of a concurrent Ada program in terms of Ada language constructs and to replay such SYN-sequences without the need for system-dependent debugging tools. It is shown how to define a SYN-sequence of a concurrent Ada program in order to provide sufficient information for deterministic execution. It is also shown how to transform a concurrent Ada program P so that the SYN-sequences of previous executions of P can be replayed. This transformation adds an Ada task to P that controls program execution by synchronizing with the original tasks in P. A brief description is given of the implementation of tools supporting deterministic execution debugging of concurrent Ada programs.
TL;DR: The Explicit Token Store (ETS) is an unusually simple model of dynamic dataflow execution, realized in Monsoon, a large-scale general-purpose dataflow multiprocessor. The ETS architecture achieves the power of previous tagged-token dataflow architectures with a much leaner cycle and much less complexity.
Abstract: Dataflow is one of the major models of parallel computation. Implementation of a General Purpose Dataflow Multiprocessor extends work in this area by introducing an unusually simple model of dynamic dataflow execution, called the Explicit Token Store (ETS) architecture, and its realization in Monsoon, a large-scale dataflow multiprocessor. Monsoon is currently under construction at the Motorola Microcomputer Division. Papadopoulos argues that the underlying sequential architecture of contemporary multiprocessors has not been able to support the synchronization demands of parallel execution and that these systems have largely failed to meet expectations for programmability and performance. He points out that processors must be fundamentally changed to execute a parallel machine language that coordinates parallel activities efficiently as instructions are scheduled. Although dataflow architectures have met this challenge by radically reformulating the basic specification of a machine program, they have suffered from substantial implementation shortcomings, notably the need for large associative memories. The ETS architecture Papadopoulos introduces here achieves the power of previous tagged-token dataflow architectures, but with a much leaner cycle and much less complexity. Gregory Papadopoulos is an Assistant Professor of Electrical Engineering and Computer Science in the Laboratory for Computer Science at MIT. Contents: General Purpose Multiprocessing. The Tagged-Token Dataflow Architecture. The Explicit Token Store. Compiling for an ETS Dataflow Processor. Compiling Imperative Languages for an ETS. Monsoon: An ETS Multiprocessor. A Monsoon Instruction Decoding.
TL;DR: In this article, the authors consider the use of massively parallel architectures to execute discrete-event simulations of self-initiating models and derive upper and lower bounds on optimal performance.
Abstract: This paper considers the use of massively parallel architectures to execute discrete-event simulations of what we term “self-initiating” models. A logical process in a self-initiating model schedules its own state reevaluation times, independently of any other logical process, and sends its new state to other logical processes following the reevaluation. Our interest is in the effects of that communication on synchronization. Using a model that idealizes the communication topology of a simulation, we consider the performance of various synchronization protocols by deriving upper and lower bounds on optimal performance, upper bounds on Time Warp's performance, and lower bounds on the performance of a new conservative protocol. Our analysis of Time Warp includes some of the overhead costs of state saving and rollback; the effects of propagating rollbacks are ignored. The analysis points out sufficient conditions for the conservative protocol to outperform Time Warp. The analysis also quantifies the sensitivity of performance to message fanout, lookahead ability, and the probability distributions underlying the simulation.
TL;DR: The CWI Multimedia Interchange Format (CMIF) is a document structure for describing transportable, dynamic multimedia documents; it is used to describe the temporal and structural relationships that exist in such documents.
Abstract: This paper presents a document structure for describing transportable, dynamic multimedia documents. Multimedia documents consist of a set of discrete data components that are joined together in time and space to present a user (or reader) with a single coordinated whole. Transportable documents are those in which the document structure can be accessed across system environments independently of individual component input or output dependencies; dynamic documents are those in which the synchronization of document components is not statically defined as an integral part of the data definition but is dynamically defined as an attribute of the general document structure. The focus of this paper is the presentation of the basic building blocks of the CWI Multimedia Interchange Format (CMIF). CMIF is used to describe the temporal and structural relationships that exist in multimedia documents. In order to put our work in a concrete context, we start our discussion with a brief description of the portability requirements for documents used within the CWI/Multimedia Pipeline. We then provide a layered description of our document structure format; this format provides a means for expressing a document in terms of synchronization channels, event descriptors, data descriptors, data blocks and synchronization arcs, each element of which contains a set of appropriate descriptive attributes. The paper describes each of these concepts abstractly as well as in the context of a uniform example. The paper concludes with a discussion of our intended future direction in using the various attribute descriptors to control a broad range of activities within the CWI/Multimedia Pipeline.
TL;DR: In this paper, a simple algorithm for phase synchronization is proposed, where a counter variable c is initially 0; c is incremented by 1 whenever a process completes a phase; a process begins its phase (k + 1) only if c ≥ k × N, where N is the number of processes.
Abstract: Assume that the processes communicate through shared variables; contentions for access (read or write) to a shared variable by different processes are resolved arbitrarily but fairly (i.e., any process attempting to read/write a shared variable will do so eventually). Nothing may be assumed about the initial values of the shared variables. In the absence of this requirement, the following simple algorithm suffices: A counter variable c is initially 0; c is incremented by 1 whenever a process completes a phase; a process begins its phase (k + 1) only if c ≥ k × N , where N is the number of processes. One of the applications of phase synchronization is to initialize the variables of a multiprocess system before any variable is read, where different processes initialize different portions of the shared store. Here, initialization may be thought of as the first phase and regular computation as the second phase. In order to solve such problems, we assume nothing about the initial values of shared variables. Phase synchronization arises in a variety of problems (in addition to the shared store initialization problem described above). It is a basic paradigm for constructing synchronous systems out of asynchronous components: A PRAM, for instance, consists of processes that read a common store, compute, and write the common store in one step; steps are synchronized in the sense that no process begins its next step until all processes have completed their current step. Another application of phase synchronization is to abort a computation if a process detects a condition under which the computation should be aborted; it simply does not complete its current phase, thus preventing the remaining processes from starting their next phase. It is easy to take global snapshots [1] or system
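The simple counter algorithm quoted in this entry can be sketched as follows. As the abstract itself notes, it relies on the counter c starting at 0, which is exactly the assumption the paper's actual problem rules out. This is a sketch, not the paper's solution: a condition variable stands in for fair access to the shared variable, and the function name `run_phases` and the event log are hypothetical additions for illustration.

```python
import threading


def run_phases(n, phases):
    """Run n threads through `phases` phases using the shared counter c."""
    c = 0                          # total number of completed phases, initially 0
    cond = threading.Condition()   # guards c and the log
    log = []                       # (phase, pid, "begin" | "end") in global order

    def worker(pid):
        nonlocal c
        for k in range(phases):
            with cond:
                # A process begins phase k + 1 only if c >= k * N.
                while c < k * n:
                    cond.wait()
                log.append((k + 1, pid, "begin"))
            # The work of phase k + 1 would run here, outside the lock.
            with cond:
                log.append((k + 1, pid, "end"))
                c += 1             # c is incremented when a phase completes
                cond.notify_all()

    threads = [threading.Thread(target=worker, args=(pid,)) for pid in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return c, log
```

Because no process can begin phase k + 1 before c reaches k × N, and each process contributes at most k completions before then, c = k × N implies that every process has finished phase k.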
TL;DR: Topics covered are synchronization of ABD networks, assertional verification, distributed infimum approximation, and garbage collection.
Abstract: Synchronization of ABD networks; assertional verification; distributed infimum approximation; garbage collection.
TL;DR: This work introduces a new class of networks called counting networks, i.e., networks that can be used to count, and provides coordination algorithms that avoid the sequential bottlenecks inherent to former solutions and have substantially lower contention.
Abstract: Many fundamental multi-processor coordination problems can be expressed as counting problems: processes must cooperate to assign successive values from a given range, such as addresses in memory or destinations on an interconnection network. Conventional solutions to these problems perform poorly because of synchronization bottlenecks and high memory contention. Motivated by observations on the behavior of sorting networks, we offer a completely new approach to solving such problems. We introduce a new class of networks called counting networks, i.e., networks that can be used to count. We give a counting network construction of depth log² n using n log² n “gates.” Based on this construction, we provide coordination algorithms that avoid the sequential bottlenecks inherent to former solutions, and have substantially lower contention. Finally, to show that counting networks are not merely mathematical creatures, we provide experimental evidence that they outperform conventional synchronization techniques under a variety of circumstances. (Author affiliations: Carnegie Mellon University; Digital Equipment Corporation, Cambridge Research Lab; MIT Lab. for Computer Science. Supported by ONR contract N00014-91-J-1046, NSF grant CCR-8915206, DARPA contract N00014-89-J-1988, and by a Rothschild postdoctoral fellowship. A large part of this work was performed while the author was at IBM’s Almaden Research Center.)
TL;DR: In this article, a hardware configuration definition program (HCD) builds I/O definition files (IODFs), each IODF containing at least one processor I/O configuration definition, and each such definition has a hardware token for identification.
Abstract: A data processing I/O system having a main storage for storing data including a software configuration definition and data processing instructions arranged in programs including an operating system, a storage device for storing I/O definition files including hardware configuration information, a processor controller for containing the hardware configuration information, and a hardware storage area (HSA) connected to the processor controller for storing a hardware configuration definition. A hardware configuration definition program (HCD) builds I/O definition files (IODFs), each IODF containing at least one I/O processor configuration definition. Each processor I/O configuration definition has a hardware token for identification. The hardware configuration information for an I/O processor configuration definition, along with a copy of its hardware token, is transferred to the processor controller by an I/O configuration program (IOCP), and a hardware configuration definition is established in the HSA. The copy of the hardware token may be fetched from the HSA and compared to hardware token of the configuration definition used to establish the software configuration definition in the main storage to determine that the software and hardware configuration definitions are synchronized. If the software and hardware configuration definitions are synchronized, dynamic changes may be made to the hardware configuration definition in the HSA. A program parameter is provided to store recovery information such that if a failure occurs during a dynamic change, the previous hardware I/O configuration may be recovered or subsequent changes can be made from the point of failure.
TL;DR: A triple modular redundancy computing system is described that includes three asynchronously connected processing elements, each having its own memory, and a plurality of arbiters cross connecting the processing elements to enforce synchronization for tasks and voting arbitration on output, without voting for inputs.
Abstract: A triple modular redundancy computing system including three asynchronously connected processing elements, each having its own memory, a plurality of arbiters cross connecting processor elements for enforcing synchronization for tasks and for voting arbitration on output and without voting for inputs.
TL;DR: A technique is described for applying data flow analysis to concurrent programs that use the rendezvous model of inter-task communication, as in Ada, Distributed Processes and CSP, along with how the resulting information can be employed to detect anomalies in concurrent programs.
Abstract: Because of the complex communication patterns supported in concurrent systems, it is extremely difficult for developers to understand and reason about these systems. Thus, it is important that automated analysis techniques be developed to help detect problems and assist in software understanding for these systems. There has been considerable research on various analysis techniques for concurrent systems, including static analysis techniques [ADW89, MR90, McD89, SC88, T080, Tay83b], dynamic analysis techniques [CT91, HL85, RL89, Tai86], and hybrid techniques [Di188, HK88, YT88, YTFB89]. Data flow analysis is a well-recognized, static analysis technique that has been successfully used on sequential systems to support program optimization, static type checking, and anomaly detection. In addition, there has been considerable research on efficient algorithms for implementing intraprocedural and interprocedural data flow analysis techniques. In this paper we describe a technique for applying data flow analysis to concurrent programs that use the rendezvous model of inter-task communication. Such languages include Ada [Ref83], Distributed Processes [BH78] and CSP [Hoa78]. We also show how the resulting information can be employed to detect anomalies in concurrent programs. One of the major benefits of applying data flow analysis for anomaly detection is that it can discover in-
TL;DR: A parallelized Petri-net simulator which has been implemented on an Intel iPSC/2 distributed memory multiprocessor is discussed, and a graphics-based front-end for the simulator, used to build timed Petri-net models, is described.
Abstract: The authors consider the problem of using a parallel computer to execute discrete-event simulation of timed Petri-nets. They first develop synchronization and simulation algorithms for this task, and discuss a parallelized Petri-net simulator which has been implemented on an Intel iPSC/2 distributed memory multiprocessor. A graphics-based front-end for the simulator, used to build timed Petri-net models, is described. Empirical studies of the simulator's performance on a variety of timed Petri-net models are described.
TL;DR: A discussion is presented of the control and coordination functions that must be associated with communication networks supporting multimedia conferencing systems; it is determined that the systems need underlying networks with large numbers of connections available to each user.
Abstract: A discussion is presented of the control and coordination functions that must be associated with communication networks supporting multimedia conferencing systems. The authors have determined that the systems need underlying networks with large numbers of connections available to each user, direct support for message multicasting, and synchronization of transmissions over associated links. Furthermore, each of these capabilities needs associated, user-accessible control functions.
TL;DR: A general synchronization model for the description of presentation sequences of Multimedia Objects is introduced and applied to the Open Document Architecture (ODA) Standard, and ODA Extensions are defined to integrate temporal relationships into ODA.
Abstract: The presentation of Multimedia Objects requires simultaneous and/or sequential presentation of several representation types (text, graphics, images, audio and video sequences). Therefore the presentation has to be structured and the temporal relations of different actions have to be described. The temporal relations are realized by applying synchronization mechanisms. In this paper a general synchronization model for the description of presentation sequences of Multimedia Objects is introduced. This model is applied to the Open Document Architecture (ODA) Standard and ODA Extensions are defined to integrate temporal relationships into ODA.
TL;DR: A communications system receiver (100) is disclosed which receives a transmitted signal over a radio channel and includes a demodulator and a stored replica (207) of the predetermined synchronization sequence.
Abstract: A communications system receiver (100) is disclosed which receives a transmitted signal over a radio channel. The transmitted signal includes data and a predetermined synchronization sequence. The receiver (100) includes a demodulator (215), and a stored replica (207) of the predetermined synchronization sequence. The receiver (100) further includes apparatus (102) for computing (308) a reconstructed signal, by using the channel impulse response characteristic and the stored replica (207) of the synchronization sequence. A feature of the invention is to estimate (310) a phase offset value between an incoming signal and the reconstructed signal, for a plurality of synchronization symbols. This serves to establish (315) a relationship between the phase offset and a synchronization symbol index. The receiver then employs this relationship to derive (317) at least one "previous" phase state (214) for initializing (314) the demodulator (215).
TL;DR: Proteus is a high-level imperative notation based on sets and sequences with a single construct for the parallel composition of processes that allows prototypes to be tested, evolved and finally implemented through refinement techniques targeting specific architectures.
Abstract: This paper presents Proteus, an architecture-independent language suitable for prototyping parallel and distributed programs. Proteus is a high-level imperative notation based on sets and sequences with a single construct for the parallel composition of processes. Although a shared-memory model is the basis for communication between processes, this memory can be partitioned into shared and private variables. Parallel processes operate on individual copies of private variables, which are independently updated and may be merged into the shared state at specifiable barrier synchronization points. Several examples are given to illustrate how the various parallel programming models, such as synchronous data-parallelism and asynchronous control-parallelism, can be expressed in terms of this foundation. This common foundation allows prototypes to be tested, evolved and finally implemented through refinement techniques targeting specific architectures.
TL;DR: In this article, a serial data signal, which includes a frame synchronization code constituted by an M number of bits in one frame, is converted by a serial/parallel converting circuit to a parallel data signal of 2M-1 bits.
Abstract: In a frame synchronization circuit, a serial data signal, which includes a frame synchronization code constituted by an M number of bits in one frame, is converted by a serial/parallel converting circuit to a parallel data signal of a 2M-1 number of bits. An M number of pattern detectors of a first synchronization detecting circuit detect the code pattern of the first block of the frame synchronization code from the parallel data signal. A selection signal generating circuit holds outputs of the pattern detectors, and outputs them as a selection signal designating the bit position allotted to the pattern detector which detects the synchronization code pattern. An output of the serial/parallel converting circuit is delayed by a time required for the above-mentioned processing, and supplied to a selector, which selectively outputs an M-bit data signal corresponding to the bit position designated by the selection signal.
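The detection-and-selection scheme in this entry can be sketched in software. This is a simplified illustration, not the patented circuit: the abstract's detectors match the first block of the synchronization code, whereas this sketch assumes each detector compares the whole M-bit code. The function names and the example code word are hypothetical.

```python
def find_frame_offset(window, sync_code):
    """Return the bit position (0..M-1) where sync_code starts, or None.

    `window` plays the role of the (2M-1)-bit parallel data signal; one
    comparison per bit position stands in for the M pattern detectors.
    """
    m = len(sync_code)
    if len(window) != 2 * m - 1:
        raise ValueError("window must hold exactly 2M-1 bits")
    # Detector k inspects bits k .. k+M-1 of the window.
    hits = [window[k:k + m] == sync_code for k in range(m)]
    return hits.index(True) if True in hits else None


def select_aligned(window, offset, m):
    """Selector: output the M-bit word at the designated bit position."""
    return window[offset:offset + m]


# Hypothetical 4-bit synchronization code (M = 4) and a 7-bit window.
SYNC_CODE = [1, 0, 1, 1]
WINDOW = [0, 0, 1, 0, 1, 1, 0]

offset = find_frame_offset(WINDOW, SYNC_CODE)
aligned = select_aligned(WINDOW, offset, len(SYNC_CODE))
```

A window of 2M-1 bits is the smallest that contains a complete M-bit word at every one of the M possible bit alignments, which is why M detectors suffice to cover all positions.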
TL;DR: Simple and efficient, working C-language routines for the parallel barrier synchronization and reduction computations are presented, along with examples of applications for these routines and results of performance testing on the Sequent Balance 21000 computer.
Abstract: The synchronization barrier is a point in the program where the processing elements (PEs) wait until all the PEs have arrived at this point. In a reduction computation, given a commutative and associative binary operation op, one needs to reduce values a0, ..., aN-1, stored in PEs 0, ..., N-1, to a single value a* = a0 op a1 op ... op aN-1 and then to broadcast the result a* to all PEs. This computation is often followed by a synchronization barrier. Routines to perform these functions are frequently required in parallel programs. Simple and efficient, working C-language routines for the parallel barrier synchronization and reduction computations are presented. The codes are appropriate for a CREW (concurrent-read-exclusive-write) or EREW parallel random access shared memory MIMD computer. They require only shared memory read and write; no locks, semaphores etc. are needed. The running time of each of these routines is O(log N). The amount of shared memory required and the number of shared memory accesses generated are both O(N). These are the asymptotically minimum values for the three parameters. The algorithms employ the obvious computational scheme involving a binary tree. Examples of applications for these routines and results of performance testing on the Sequent Balance 21000 computer are presented.
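The binary-tree scheme mentioned in this entry can be illustrated with a sequential simulation. This is only a sketch of the combining pattern, not the paper's C routines: on a real machine the combines within one round would run in parallel on separate PEs using only shared-memory reads and writes. The function name `tree_reduce` is hypothetical.

```python
import operator


def tree_reduce(values, op):
    """Simulate a binary-tree reduction followed by a broadcast.

    Returns (per-PE results after the broadcast, number of combining rounds).
    """
    vals = list(values)   # vals[i] is the slot owned by PE i
    n = len(vals)
    if n == 0:
        raise ValueError("need at least one value")
    rounds = 0
    stride = 1
    while stride < n:
        # In round r, PE i (a multiple of 2 * stride) absorbs the partial
        # result held by PE i + stride; these combines are independent.
        for i in range(0, n - stride, 2 * stride):
            vals[i] = op(vals[i], vals[i + stride])
        stride *= 2
        rounds += 1
    # Broadcast: every PE receives the final result a*.
    return [vals[0]] * n, rounds


sums, sum_rounds = tree_reduce(range(8), operator.add)
```

The number of rounds is the ceiling of log2 N, matching the O(log N) running time stated in the abstract, while each PE's slot is touched a constant number of times per round it takes part in.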
TL;DR: In this article, the SYNC-NET apparatus synchronizes processor nodes of a parallel system over a multi-stage communication network that normally transmits data between nodes as point-to-point communications, broadcast, multi-cast, or multi-sender transfers.
Abstract: A SYNC-NET apparatus synchronizes processor nodes of a parallel system over a multi-stage communication network that normally transmits data between nodes as point-to-point communications, broadcast, multi-cast, or multi-sender transfers. The apparatus performs priority driven arbitration over the network to resolve conflicts amongst multiple processing nodes simultaneously requesting use of the multi-stage network for performing barrier synchronization over the network in relation to the same or different barriers. The apparatus uses a special capability multi-stage network that can support only one barrier synchronization operation at any given time, and which makes it necessary to perform a priority arbitration to determine which barrier synchronization gets performed first, second, and so on. Any number of processor nodes can arbitrate simultaneously for use of the barrier synchronization facilities, and the arbitration will be resolved quickly and consistently by selecting the highest priority requestor. The highest priority requestor uses the apparatus and network facilities to examine the status of eight barriers simultaneously and to determine whether all processing nodes have reached those barriers or not. The priority resolution and barrier status calculation for eight barriers at a time involve both the joint and simultaneous participation by all processor nodes and the multi-stage network in one common operation. All nodes transmit priority information or barrier status simultaneously, and all nodes simultaneously monitor the result of the priority and barrier calculations.
TL;DR: In this paper, synchronization planning and clock distribution for a network of interconnected digital equipment is achieved by designating a network node at the highest stratum level as the master clock node, forming a group of all unassigned nodes connected to the assigned node or nodes, selecting a subgroup of nodes from the group, limiting the subgroup to the nodes which have a desired characteristic, determining the synchronization performance of each node in the subgroup according to a predetermined criterion, and assigning one node from the subgroup as a clock timing receiver, wherein the one node exhibits the best performance for nodes in the subgroup.
Abstract: Optimized synchronization planning and clock distribution for a network of interconnected digital equipment is achieved by designating a network node at the highest stratum level as the master clock node, forming a group of all unassigned nodes connected to the assigned node or nodes, selecting a subgroup of nodes from the group wherein the subgroup includes all nodes having the highest stratum level of the group, limiting the subgroup to the nodes which have a desired characteristic when such nodes are included in the subgroup, determining the synchronization performance of each node in the subgroup according to a predetermined criterion, assigning one node from the subgroup as a clock timing receiver wherein the one node exhibits the best performance for nodes in the subgroup, and iterating the method at the forming step. In order to obtain an optimum synchronization plan, it is desirable to repeat the entire method described above for the complete set of nodes which are capable of being designated as a master clock node. When more than one node is capable of being considered as a master clock node, the synchronization planning method is then completed by computing the network synchronization performance for each synchronization plan related to a different designated master clock node and choosing the synchronization plan which offers the best network synchronization performance as computed above.
TL;DR: A new synchronization mechanism is proposed, the priority spinlock, that takes into account the priorities of the processes that want to acquire it, and favors high priority processes.
TL;DR: This paper describes an algorithm to size these synchronization queues while permitting the maximum parallelism between the communicating processes (circuits).
Abstract: In synthesizing a circuit from its description in a concurrent programming language, it is necessary to make decisions about how to implement synchronization constructs such as send and receive statements. The semantic model of these constructs is an infinite length FIFO queue that can handle all send events until they are paired up with corresponding receive events. In this paper, we describe an algorithm to size these synchronization queues while permitting the maximum parallelism between the communicating processes (circuits). It is an example of higher level synthesis in that the user does not include an explicit description of the queue in the specification as is necessary in current high level synthesis systems.
TL;DR: A VHDL methodology for the design of synchronous parallel controllers that supports RTL representation and verification, state assignment and decomposition, logic synthesis, and gate-level consistency checking is described.
Abstract: The contribution of this work is a VHDL methodology for the design of synchronous parallel controllers that supports RTL representation and verification, state assignment and decomposition, logic synthesis, and gate-level consistency checking. A simple extension to FSM techniques, based on Petri nets, is used to represent concurrency and check for parallel synchronization errors; the concept of synchronous-safeness is introduced to enable maximum latching of data path units. A synthesizable VHDL template is described in which ASSERTION statements are used to enable the syntactic and semantic correctness of the model to be tested in unison. The method yields more efficient implementations than FSM designs when concurrency forms part of the specification, and in a practical design, a 50% area reduction and 40% speed improvement over the best FSM synthesis were achieved.
TL;DR: The author describes a method for designing communication protocols which can perform several distinct functions, but are limited to the execution of one function at a time.
Abstract: The author describes a method for designing communication protocols which can perform several distinct functions, but are limited to the execution of one function at a time. The construction of such a protocol consists of two steps: (1) developing a component protocol for each function to be included, and (2) integrating the components into the target protocol. The integration involves the resolution of potential component competition and process synchronization problems. A sufficient condition for the safety of the integrated protocol is also discussed. This design method is simple to use and promotes reuse of existing protocols. The construction of two protocols (the call setup phase of a data link control protocol and a portion of the CCITT's X.21 Recommendation) is demonstrated.