Topic

Geo-replication

About: Geo-replication is a research topic. Over its lifetime, 39 publications have appeared within this topic, receiving 12,287 citations.

Papers

Journal Article•10.1145/1165389.945450•

The Google file system

Sanjay Ghemawat¹, Howard Gobioff¹, Shun-Tak Albert Leung¹•Institutions (1)

Google¹

19 Oct 2003

TL;DR: This paper presents file system interface extensions designed to support distributed applications, discusses many aspects of the design, and reports measurements from both micro-benchmarks and real world use.

Abstract: We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients. While sharing many of the same goals as previous distributed file systems, our design has been driven by observations of our application workloads and technological environment, both current and anticipated, that reflect a marked departure from some earlier file system assumptions. This has led us to reexamine traditional choices and explore radically different design points. The file system has successfully met our storage needs. It is widely deployed within Google as the storage platform for the generation and processing of data used by our service as well as research and development efforts that require large data sets. The largest cluster to date provides hundreds of terabytes of storage across thousands of disks on over a thousand machines, and it is concurrently accessed by hundreds of clients. In this paper, we present file system interface extensions designed to support distributed applications, discuss many aspects of our design, and report measurements from both micro-benchmarks and real world use.

6,397 citations

Journal Article•10.1145/1773912.1773922•

Cassandra: a decentralized structured storage system

Avinash Lakshman¹, Prashant Malik¹•Institutions (1)

Facebook¹

14 Apr 2010-Operating Systems Review

TL;DR: Cassandra is a distributed storage system for managing very large amounts of structured data spread out across many commodity servers, while providing highly available service with no single point of failure.

Abstract: Cassandra is a distributed storage system for managing very large amounts of structured data spread out across many commodity servers, while providing highly available service with no single point of failure. Cassandra aims to run on top of an infrastructure of hundreds of nodes (possibly spread across different data centers). At this scale, small and large components fail continuously. The way Cassandra manages the persistent state in the face of these failures drives the reliability and scalability of the software systems relying on this service. While in many ways Cassandra resembles a database and shares many design and implementation strategies therewith, Cassandra does not support a full relational data model; instead, it provides clients with a simple data model that supports dynamic control over data layout and format. Cassandra system was designed to run on cheap commodity hardware and handle high write throughput while not sacrificing read efficiency.

3,176 citations

Journal Article•10.14778/1454159.1454167•

PNUTS: Yahoo!'s hosted data serving platform

Brian F. Cooper¹, Raghu Ramakrishnan¹, Utkarsh Srivastava¹, Adam Silberstein¹, Philip Bohannon¹, Hans-Arno Jacobsen¹, Nick Puz¹, Daniel Weaver¹, Ramana Yerneni¹•Institutions (1)

Yahoo!¹

1 Aug 2008

TL;DR: PNUTS provides data storage organized as hashed or ordered tables, low latency for large numbers of concurrent requests including updates and queries, and novel per-record consistency guarantees and utilizes automated load-balancing and failover to reduce operational complexity.

Abstract: We describe PNUTS, a massively parallel and geographically distributed database system for Yahoo!'s web applications. PNUTS provides data storage organized as hashed or ordered tables, low latency for large numbers of concurrent requests including updates and queries, and novel per-record consistency guarantees. It is a hosted, centrally managed, and geographically distributed service, and utilizes automated load-balancing and failover to reduce operational complexity. The first version of the system is currently serving in production. We describe the motivation for PNUTS and the design and implementation of its table storage and replication layers, and then present experimental results.

1,182 citations

Proceedings Article•

Megastore: Providing Scalable, Highly Available Storage for Interactive Services

Jason Baker¹, Christopher N. Bond¹, James C. Corbett¹, J. J. Furman¹, Andrey Khorlin¹, James Larson¹, Jean-Michel Leon¹, Yawei Li¹, Alexander Lloyd¹, Vadim Yushprakh¹•Institutions (1)

Google¹

1 Jan 2011

TL;DR: Megastore provides fully serializable ACID semantics within fine-grained partitions of data, which allows us to synchronously replicate each write across a wide area network with reasonable latency and support seamless failover between datacenters.

Abstract: Megastore is a storage system developed to meet the requirements of today’s interactive online services. Megastore blends the scalability of a NoSQL datastore with the convenience of a traditional RDBMS in a novel way, and provides both strong consistency guarantees and high availability. We provide fully serializable ACID semantics within fine-grained partitions of data. This partitioning allows us to synchronously replicate each write across a wide area network with reasonable latency and support seamless failover between datacenters. This paper describes Megastore’s semantics and replication algorithm. It also describes our experience supporting a wide range of Google production services built with Megastore.

849 citations

Journal Article•

Conflict-free Replicated Data Types

Marc Shapiro, Nuno M. Preguiça, Carlos Baquero, Marek Zawirski

01 Jan 2011-Lecture Notes in Computer Science

TL;DR: This paper formalises two popular approaches (state- and operation-based) and their relevant sufficient conditions and studies a number of useful CRDTs, such as sets with clean semantics, supporting both add and remove operations, and considers in depth the more complex Graph data type.

Abstract: Replicating data under Eventual Consistency (EC) allows any replica to accept updates without remote synchronisation. This ensures performance and scalability in large-scale distributed systems (e.g., clouds). However, published EC approaches are ad-hoc and error-prone. Under a formal Strong Eventual Consistency (SEC) model, we study sufficient conditions for convergence. A data type that satisfies these conditions is called a Conflict-free Replicated Data Type (CRDT). Replicas of any CRDT are guaranteed to converge in a self-stabilising manner, despite any number of failures. This paper formalises two popular approaches (state- and operation-based) and their relevant sufficient conditions. We study a number of useful CRDTs, such as sets with clean semantics, supporting both add and remove operations, and consider in depth the more complex Graph data type. CRDT types can be composed to develop large-scale distributed applications, and have interesting theoretical properties.
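The state-based approach mentioned in this abstract can be illustrated with a grow-only counter. The following is a minimal sketch under our own assumptions, not code from the paper: the `GCounter` class and its method names are hypothetical, and the merge rule shown (element-wise maximum) is one standard way to obtain a commutative, associative, and idempotent merge, which is what lets replicas converge regardless of delivery order or duplication.

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Hypothetical state-based grow-only counter: one slot per replica."""
    replica_id: str
    counts: dict = field(default_factory=dict)

    def increment(self, n: int = 1) -> None:
        # A replica only ever raises its own slot, so slots never decrease.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self) -> int:
        # The counter's value is the sum over all replica slots.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise max: commutative, associative, and idempotent.
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)


# Two replicas accept updates independently, then exchange state.
a, b = GCounter("a"), GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
b.merge(a)  # merging the same state twice changes nothing
assert a.value() == b.value() == 5
```

Neither replica coordinates with the other before accepting an update; agreement comes only from the merge rule.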

752 citations

Performance Metrics

Papers

291

Citations

No. of papers in the topic in previous years
Year	Papers
2021	1
2020	3
2019	2
2018	5
2017	2
2016	5