About: High-availability cluster is a research topic. Over its lifetime, 259 publications have been published within this topic, receiving 6511 citations.
TL;DR: In this paper, the authors propose to assign failover priorities to virtual servers in a cluster of two or more autonomous server nodes, where each virtual server has one or more virtual IP addresses and load balancing can be provided by distributing virtual servers from a failed node to multiple different nodes.
Abstract: Systems and methods, including computer program products, providing high availability in server systems. In one implementation, a server system is a cluster of two or more autonomous server nodes, each running one or more virtual servers. When a node fails, its virtual servers are migrated to one or more other nodes. Connectivity between nodes and clients is based on virtual IP addresses, where each virtual server has one or more virtual IP addresses. Virtual servers can be assigned failover priorities, and, in failover, higher-priority virtual servers can be migrated before lower-priority ones. Load balancing can be provided by distributing virtual servers from a failed node to multiple different nodes. When a port within a node fails, the node can reassign virtual IP addresses from the failed port to other ports on the node until no good ports remain, and only then migrate virtual servers to another node or nodes.
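The failover behaviour described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `VirtualServer`, `Node`, `plan_failover`, and `reassign_vips` are hypothetical, "higher number means higher priority" is an assumed convention, and "fewest hosted virtual servers" is an assumed stand-in for whatever load metric a real cluster would use.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualServer:
    name: str
    priority: int        # assumed convention: higher value is migrated earlier
    vips: list[str]      # virtual IP addresses owned by this virtual server


@dataclass
class Node:
    name: str
    healthy: bool = True
    servers: list[VirtualServer] = field(default_factory=list)


def plan_failover(failed: Node, cluster: list[Node]) -> list[tuple[str, str]]:
    """Return (virtual server, target node) pairs in migration order."""
    targets = [n for n in cluster if n.healthy and n is not failed]
    if not targets:
        raise RuntimeError("no healthy node left to take over")
    plan = []
    # Higher-priority virtual servers are migrated before lower-priority ones.
    for vs in sorted(failed.servers, key=lambda v: v.priority, reverse=True):
        # Spread the load: pick the surviving node hosting the fewest servers.
        target = min(targets, key=lambda n: len(n.servers))
        target.servers.append(vs)
        plan.append((vs.name, target.name))
    failed.servers.clear()
    return plan


def reassign_vips(ports: dict[str, bool], vip_port: dict[str, str],
                  failed_port: str) -> bool:
    """Move virtual IPs off a failed port onto the node's remaining good ports.

    Returns False when no good port remains, which is the point at which
    the node would fall back to migrating its virtual servers elsewhere.
    """
    ports[failed_port] = False
    good = [p for p, ok in ports.items() if ok]
    if not good:
        return False
    stranded = sorted(v for v, p in vip_port.items() if p == failed_port)
    for i, vip in enumerate(stranded):
        vip_port[vip] = good[i % len(good)]   # round-robin over good ports
    return True
```

Calling `plan_failover(failed_node, all_nodes)` yields the migration order, and `reassign_vips` models the port-level step that is tried before any node-level migration.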
TL;DR: In this paper, a failover recovery approach encapsulates the knowledge of failover recovery between components within a storage server and between storage server systems, including which components participate in a Failover Set, how they are configured for failover, what the Fail-Stop policy is, and which steps to perform when "failing over" a component.
Abstract: Failover processing in a storage server system utilizes policies for managing fault tolerance (FT) and high availability (HA) configurations. The approach encapsulates the knowledge of failover recovery between components within a storage server and between storage server systems. This knowledge includes information about which components participate in a Failover Set, how they are configured for failover, what the Fail-Stop policy is, and which steps to perform when “failing over” a component.
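One way to picture the encapsulated failover knowledge is as a small policy record that is executed step by step. The sketch below is only an illustration under stated assumptions: `FailoverSet`, `FailStopPolicy`, `fail_over`, the policy values, and the step names are all hypothetical, since the abstract does not specify how the policy is represented.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class FailStopPolicy(Enum):
    # Hypothetical values; the source only states that a Fail-Stop policy exists.
    HALT_COMPONENT = "halt_component"
    HALT_AND_FENCE = "halt_and_fence"


@dataclass(frozen=True)
class FailoverSet:
    name: str
    members: tuple[str, ...]    # components participating in the set
    mode: str                   # how they are configured (assumed label)
    fail_stop: FailStopPolicy   # what to do with the failed component
    steps: tuple[str, ...]      # ordered recovery steps to perform


def fail_over(fs: FailoverSet, failed: str,
              handlers: dict[str, Callable[[str, str], None]]) -> list[str]:
    """Run the set's recovery steps, moving work from `failed` to a survivor."""
    if failed not in fs.members:
        raise ValueError(f"{failed} is not a member of {fs.name}")
    survivors = [m for m in fs.members if m != failed]
    if not survivors:
        raise RuntimeError("no surviving member to fail over to")
    takeover = survivors[0]     # assumed choice: first surviving member
    log = []
    for step in fs.steps:
        handlers[step](failed, takeover)   # each step is a pluggable action
        log.append(f"{step}: {failed} -> {takeover}")
    return log
```

Keeping the membership, configuration, Fail-Stop policy, and steps in one record means the recovery logic can stay generic while each Failover Set supplies its own data.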
TL;DR: The “Wiredville” story illustrates some of the finer points that motivated the work in the Transis project, a large-scale multicast service designed with the following goals:
Abstract: In the local elections system of the municipality of “Wiredville”, several computers were used to establish an electronic town hall. The computers were linked by a network. When an issue was put to a vote, voters could manually feed their votes into any of the computers, which replicated the updates to all of the other computers. Whenever the current tally was desired, any computer could be used to supply an up-to-the-moment count. On the night of an important election, a room with one of the computers became crowded with lobbyists and politicians. Unexpectedly, someone accidentally stepped on the network wire, cutting communication between two parts of the network. The vote counting stopped until the network was repaired, and the entire tally had to be restarted from scratch. This would not have happened if the vote-counting system had been built with partitions in mind. After the unexpected severance, vote counting could have continued at all the computers, and merged appropriately when the network was repaired. The “Wiredville” story illustrates some of the finer points that motivated our work in the Transis project [1], a large-scale multicast service designed with the following goals:
TL;DR: In this paper, an application deployment model for enterprise applications enables such applications to be deployed to and executed from a globally distributed computing platform, such as an edge server in an Internet content delivery network (CDN).
Abstract: An application deployment model for enterprise applications enables such applications to be deployed to and executed from a globally distributed computing platform, such as an edge server in an Internet content delivery network (CDN). In a representative embodiment, a CDN edge server supports application server code that executes a Web tier and/or Enterprise tier component of a given Java-based application. When multiple instances of the application server code are executed, given resources (e.g., memory, CPU, disk and network I/O) are monitored, and the application server instances are terminated or rate-limited to prevent over-utilization by any particular instance. In addition, a given application running in a given application server instance is restricted from taking certain actions, e.g., reading from or writing to a file system, so that it cannot interfere with or access data from another customer's application.
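The resource-monitoring step can be sketched as a simple per-instance decision. This is an assumption-laden illustration, not the CDN's actual mechanism: `Usage`, `Limits`, `decide`, and `enforce` are hypothetical names, only CPU and memory are modelled, and the split into a soft limit (rate-limit) and hard limits (terminate) is an assumed policy that the abstract does not spell out.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    cpu: float          # fraction of CPU currently used by the instance
    memory_mb: float    # resident memory in megabytes


@dataclass
class Limits:
    soft_cpu: float         # above this, the instance is rate-limited
    hard_cpu: float         # above this, the instance is terminated
    hard_memory_mb: float   # above this, the instance is terminated


def decide(usage: Usage, limits: Limits) -> str:
    """Return 'ok', 'rate_limit', or 'terminate' for one instance."""
    if usage.cpu > limits.hard_cpu or usage.memory_mb > limits.hard_memory_mb:
        return "terminate"
    if usage.cpu > limits.soft_cpu:
        return "rate_limit"
    return "ok"


def enforce(instances: dict[str, Usage], limits: Limits) -> dict[str, str]:
    """Apply the same limits to every application server instance."""
    return {name: decide(usage, limits) for name, usage in instances.items()}
```

A monitoring loop would call `enforce` periodically with fresh measurements and act on each returned decision, so that no single instance can starve the others sharing the edge server.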