TL;DR: The symbolic model checking technique revealed subtle errors in this protocol, resulting from complex execution sequences that would occur with very low probability in random simulation runs, and an alternative method is developed for avoiding the state explosion in the case of asynchronous control circuits.
Abstract: Finite state models of concurrent systems grow exponentially as the number of components of the system increases. This is known widely as the state explosion problem in automatic verification, and has limited finite state verification methods to small systems. To avoid this problem, a method called symbolic model checking is proposed and studied. This method avoids building a state graph by using Boolean formulas to represent sets and relations. A variety of properties characterized by least and greatest fixed points can be verified purely by manipulations of these formulas using Ordered Binary Decision Diagrams.
Theoretically, a structural class of sequential circuits is demonstrated whose transition relations can be represented by polynomial space OBDDs, though the number of states is exponential. This result is born out by experimental results on example circuits and systems. The most complex of these is the cache consistency protocol of a commercial distributed multiprocessor. The symbolic model checking technique revealed subtle errors in this protocol, resulting from complex execution sequences that would occur with very low probability in random simulation runs.
In order to model the cache protocol, a language was developed for describing sequential circuits and protocols at various levels of abstraction. This language has a synchronous dataflow semantics, but allows nondeterminism and supports interleaving processes with shared variables. A system called SMV can automatically verify programs in this language with respect to temporal logic formulas, using the symbolic model checking technique.
A technique for proving properties of inductively generated classes of finite state systems is also developed. The proof is checked automatically, but requires a user supplied process called a process invariant to act as an inductive hypothesis. An invariant is developed for the distributed cache protocol, allowing properties of systems with an arbitrary number of processors to be proved.
Finally, an alternative method is developed for avoiding the state explosion in the case of asynchronous control circuits. This technique is based on the unfolding of Petri nets, and is used to check for hazards in a distributed mutual exclusion circuit.
TL;DR: Cellular IP is proposed, a new lightweight and robust protocol that is optimized to support local mobility but efficiently interworks with Mobile IP to provide wide area mobility support.
Abstract: This paper describes a new approach to Internet host mobility. We argue that by separating local and wide area mobility, the performance of existing mobile host protocols (e.g. Mobile IP) can be significantly improved. We propose Cellular IP, a new lightweight and robust protocol that is optimized to support local mobility but efficiently interworks with Mobile IP to provide wide area mobility support. Cellular IP shows great benefit in comparison to existing host mobility proposals for environments where mobile hosts migrate frequently, which we argue, will be the rule rather than the exception as Internet wireless access becomes ubiquitous. Cellular IP maintains distributed cache for location management and routing purposes. Distributed paging cache coarsely maintains the position of 'idle' mobile hosts in a service area. Cellular IP uses this paging cache to quickly and efficiently pinpoint 'idle' mobile hosts that wish to engage in 'active' communications. This approach is beneficial because it can accommodate a large number of users attached to the network without overloading the location management system. Distributed routing cache maintains the position of active mobile hosts in the service area and dynamically refreshes the routing state in response to the handoff of active mobile hosts. These distributed location management and routing algorithms lend themselves to a simple and low cost implementation of Internet host mobility requiring no new packet formats, encapsulations or address space allocation beyond what is present in IP.
TL;DR: Reactive NUCA (R-NUCA), a distributed cache design which reacts to the class of each cache access and places blocks at the appropriate location in the cache, is proposed.
Abstract: Increases in on-chip communication delay and the large working sets of server and scientific workloads complicate the design of the on-chip last-level cache for multicore processors. The large working sets favor a shared cache design that maximizes the aggregate cache capacity and minimizes off-chip memory requests. At the same time, the growing on-chip communication delay favors core-private caches that replicate data to minimize delays on global wires. Recent hybrid proposals offer lower average latency than conventional designs, but they address the placement requirements of only a subset of the data accessed by the application, require complex lookup and coherence mechanisms that increase latency, or fail to scale to high core counts.In this work, we observe that the cache access patterns of a range of server and scientific workloads can be classified into distinct classes, where each class is amenable to different block placement policies. Based on this observation, we propose Reactive NUCA (R-NUCA), a distributed cache design which reacts to the class of each cache access and places blocks at the appropriate location in the cache. R-NUCA cooperates with the operating system to support intelligent placement, migration, and replication without the overhead of an explicit coherence mechanism for the on-chip last-level cache. In a range of server, scientific, and multiprogrammed workloads, R-NUCA matches the performance of the best cache design for each workload, improving performance by 14% on average over competing designs and by 32% at best, while achieving performance within 5% of an ideal cache design.
TL;DR: In this paper, the authors propose a cache having copies distributed among a plurality of different locations, where the cache stores state information for a session with any of the server devices so that it is accessible to at least one other server device.
Abstract: A network arrangement that employs a cache having copies distributed among a plurality of different locations. The cache stores state information for a session with any of the server devices so that it is accessible to at least one other server device. Using this arrangement, when a client device switches from a connection with a first server device to a connection with a second server device, the second server device can retrieve state information from the cache corresponding to the session between the client device and the first server device. The second server device can then use the retrieved state information to accept a session with the client device.
TL;DR: The design of YouTube video delivery system consists of a “flat” video id space, multiple DNS namespaces reflecting a multi-layered logical organization of video servers, and a 3-tier physical cache hierarchy.
Abstract: We deduce key design features behind the YouTube video delivery system by building a distributed active measurement infrastructure, and collecting and analyzing a large volume of video playback logs, DNS mappings and latency data. We find that the design of YouTube video delivery system consists of three major components: a “flat” video id space, multiple DNS namespaces reflecting a multi-layered logical organization of video servers, and a 3-tier physical cache hierarchy. We also uncover that YouTube employs a set of sophisticated mechanisms to handle video delivery dynamics such as cache misses and load sharing among its distributed cache locations and data centers.