TL;DR: In this paper, an improved system for accessing data within a distributed data storage network (DDSN) is disclosed, in which traffic is routed to individual slice servers within the DDSN in accordance with objective criteria as well as user-defined policies.
Abstract: An improved system for accessing data within a distributed data storage network (“DDSN”) is disclosed. In a system implementing the disclosed invention, traffic is routed to individual slice servers within the DDSN in accordance with objective criteria as well as user-defined policies. In accordance with one aspect of the disclosed invention, when a data segment is written to a DDSN, the segment is divided into multiple data slices, which are simultaneously transmitted to different slice servers. In accordance with another aspect of the disclosed invention, when a data segment is read from a DDSN, a list of slice servers, each containing a data slice that could be used to reconstruct the requested data segment, is assembled, and sorted in accordance with a preference rating assigned to each of the slice servers. Sufficient data slices to reconstruct the data segment are then read in accordance with the preference ranking of the slice servers.
TL;DR: Segment Routing leverages the source routing paradigm and allows to enforce a flow through any topological path while maintaining per-flow state only at the ingress nodes to the SR domain.
Abstract: Segment Routing (SR) leverages the source routing paradigm. A node
steers a packet through an ordered list of instructions, called
segments. A segment can represent any instruction, topological or
service-based. A segment can have a semantic local to an SR node or
global within an SR domain. SR allows to enforce a flow through any
topological path while maintaining per-flow state only at the ingress
nodes to the SR domain. Segment Routing can be directly applied to the
MPLS architecture with no change on the forwarding plane. A segment is
encoded as an MPLS label. An ordered list of segments is encoded as a
stack of labels. The segment to process is on the top of the stack.
Upon completion of a segment, the related label is popped from the
stack. Segment Routing can be applied to the IPv6 architecture, with a
new type of routing header. A segment is encoded as an IPv6 address.
An ordered list of segments is encoded as an ordered list of IPv6
addresses in the routing header. The active segment is indicated by
the Destination Address of the packet. The next active segment is
indicated by a pointer in the new routing header.
TL;DR: It is shown that a deterministic segmentation is approximately the (asymptotically) optimal solution for compressing mixed data and can be readily applied to segment real imagery and bioinformatic data.
Abstract: In this paper, based on ideas from lossy data coding and compression, we present a simple but effective technique for segmenting multivariate mixed data that are drawn from a mixture of Gaussian distributions, which are allowed to be almost degenerate. The goal is to find the optimal segmentation that minimizes the overall coding length of the segmented data, subject to a given distortion. By analyzing the coding length/rate of mixed data, we formally establish some strong connections of data segmentation to many fundamental concepts in lossy data compression and rate-distortion theory. We show that a deterministic segmentation is approximately the (asymptotically) optimal solution for compressing mixed data. We propose a very simple and effective algorithm that depends on a single parameter, the allowable distortion. At any given distortion, the algorithm automatically determines the corresponding number and dimension of the groups and does not involve any parameter estimation. Simulation results reveal intriguing phase-transition-like behaviors of the number of segments when changing the level of distortion or the amount of outliers. Finally, we demonstrate how this technique can be readily applied to segment real imagery and bioinformatic data.
TL;DR: In this paper, a system and method for providing efficient data storage is described, which determines whether a data segment has been stored previously in a low latency memory and returns an identifier for the previously stored data segment.
Abstract: A system and method are disclosed for providing efficient data storage. A plurality of data segments is received in a data stream. The system determines whether a data segment has been stored previously in a low latency memory. In the event that the data segment is determined to have been stored previously, an identifier for the previously stored data segment is returned.
TL;DR: In this paper, the encoder determines whether the segment is to be a referenced segment or an unreferenced segment, replacing the segment data of each referenced segment with a reference label and storing a reference binding in a persistent segment store.
Abstract: In a coding system, input data within a system is encoded. The input data might include sequences of symbols that repeat in the input data or occur in other input data encoded in the system. The encoding includes determining a target segment size, determining a window size, identifying a fingerprint within a window of symbols at an offset in the input data, determining whether the offset is to be designated as a cut point and segmenting the input data as indicated by the set of cut points. For each segment so identified, the encoder determines whether the segment is to be a referenced segment or an unreferenced segment, replacing the segment data of each referenced segment with a reference label and storing a reference binding in a persistent segment store for each referenced segment, if needed. Hierarchically, the process can be repeated by grouping references into groups, replacing the grouped references with a group label, storing a binding between the grouped references and group label, if one is not already present, and repeating the process. The number of levels of hierarchy can be fixed in advanced or it can be determined from the content encoded.