TL;DR: A parallel implementation of merge sort on a CREW PRAM that uses n processors and $O(\log n)$ time; the constant in the running time is small.
Abstract: We give a parallel implementation of merge sort on a CREW PRAM that uses n processors and $O(\log n)$ time; the constant in the running time is small. We also give a more complex version of the algorithm for the EREW PRAM; it also uses n processors and $O(\log n)$ time. The constant in the running time is still moderate, though not as small.
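The abstract states the bounds without describing the algorithm, so the following is only a minimal sketch of the simpler idea that parallel merge sorts build on: merging by rank. Each element's final position is its index in its own list plus its rank in the other list, so every position can be computed independently (one processor per element, with concurrent reads of the other list, as a CREW PRAM allows). The names `rank_merge` and `merge_sort` are illustrative and not from the paper. With a binary search per element, each merge level costs $O(\log n)$ time, which gives $O(\log^2 n)$ overall; the $O(\log n)$ bound claimed in the abstract needs more than this sketch shows.

```python
from bisect import bisect_left, bisect_right


def rank_merge(a, b):
    """Merge two sorted lists by computing each element's output index directly.

    Every loop iteration is independent of the others, so on a PRAM each
    one could be handled by its own processor (sequential here).
    """
    out = [None] * (len(a) + len(b))
    for i, x in enumerate(a):
        # Elements of b strictly smaller than x go before it.
        out[i + bisect_left(b, x)] = x
    for j, y in enumerate(b):
        # Elements of a smaller than or equal to y go before it (ties: a first).
        out[j + bisect_right(a, y)] = y
    return out


def merge_sort(xs):
    """Plain top-down merge sort that uses the rank-based merge above."""
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    return rank_merge(merge_sort(xs[:mid]), merge_sort(xs[mid:]))
```

Using `bisect_left` for one list and `bisect_right` for the other guarantees that equal keys never collide on the same output slot.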
TL;DR: This article argues in favor of a new type of Merge, Parallel Merge, which combines the properties of External Merge and Internal Merge, and shows that a number of otherwise puzzling properties of across-the-board questions follow naturally from such an account.
Abstract: This article argues in favor of a new type of Merge, Parallel Merge, which combines the properties of External Merge and Internal Merge. Parallel Merge creates symmetric, multidominant structures, which become antisymmetric in the course of the derivation. The main empirical goal of the article is to revive a multidominance approach to across-the-board wh-questions and to show that a number of otherwise puzzling properties of across-the-board questions follow naturally from such an account.
TL;DR: A method and an apparatus for merging a sequence of delta files are described. The delta files together define a series of changes between a base file and an updated file; each delta file expresses its changes as unique tokens identifying original data or as reused tokens identifying data reused from the immediately preceding delta file or the base file.
Abstract: A method of and an apparatus for merging a sequence of delta files is described. The delta files together define a series of changes between a base file and an updated file, each delta file defining one or more changes in terms of one or more unique tokens each identifying original data or of one or more reused tokens identifying data reused from the immediately preceding delta file or the base file. The method comprises creating an initial merge structure from the base file and the first delta file in the sequence. A further merge structure is created from the initial merge structure and the next delta file in the sequence by comparing tokens in the initial merge structures and replacing reused tokens in the further merge structure with tokens in the initial merge structure. The initial merge structure is then replaced with the further merge structure so that the further merge structure becomes the initial merge structure. The operations of creating a further merge structure and replacing the initial merge structure with a further merge structure are repeated for all delta files in sequence order. The thus created merge structure represents all changes between the base file and the updated file. The apparatus, which may comprise a suitably configured computer, comprises means suitable for carrying out the method steps.
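The merge loop described in this abstract can be sketched as follows. This is an illustration under assumptions, not the patented implementation: the token encoding (`("new", data)` for a unique token, `("reuse", index)` for a reused token pointing at an item of the immediately preceding delta file or the base file) and the function names `merge_deltas` and `apply_delta` are hypothetical.

```python
# Hypothetical token encoding (not from the source):
#   ("new", data)    -> unique token carrying original data
#   ("reuse", index) -> reused token pointing at item `index` of the
#                       immediately preceding delta file (or the base file)


def merge_deltas(deltas):
    """Collapse a sequence of delta files into one delta against the base file."""
    # Initial merge structure: the first delta already refers to the base file.
    merged = list(deltas[0])
    for delta in deltas[1:]:
        further = []
        for kind, value in delta:
            if kind == "reuse":
                # Replace the reused token with the token it points at in the
                # current merge structure, so it now refers to the base file
                # or carries its original data.
                further.append(merged[value])
            else:
                further.append((kind, value))
        # The further merge structure becomes the initial one for the next round.
        merged = further
    return merged


def apply_delta(base, delta):
    """Rebuild a file (a list of data items) from a base file and one delta."""
    return [base[value] if kind == "reuse" else value for kind, value in delta]
```

For example, with `base = ["a", "b", "c"]`, a first delta `[("reuse", 0), ("new", "x"), ("reuse", 2)]` and a second delta `[("reuse", 1), ("new", "y"), ("reuse", 0)]`, the merged structure is `[("new", "x"), ("new", "y"), ("reuse", 0)]`, and applying it to the base file yields the same result as applying the two deltas one after the other.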
TL;DR: This paper presents an adaptive segmented sort mechanism on GPUs that shows great improvements over the methods from CUB, CUSP and ModernGPU on NVIDIA K80-Kepler and TitanX-Pascal GPUs. Applied to two applications, i.e., suffix array construction and sparse matrix-matrix multiplication, it obtains obvious gains over state-of-the-art implementations.
Abstract: Segmented sort, as a generalization of classical sort, orders a batch of independent segments in a whole array. Along with the wider adoption of manycore processors for HPC and big data applications, segmented sort plays an increasingly more important role than sort. In this paper, we present an adaptive segmented sort mechanism on GPUs. Our mechanisms include two core techniques: (1) a differentiated method for different segment lengths to eliminate the irregularity caused by various workloads and thread divergence; and (2) a register-based sort method to support N-to-M data-thread binding and in-register data communication. We also implement a shared memory-based merge method to support non-uniform length chunk merge via multiple warps. Our segmented sort mechanism shows great improvements over the methods from CUB, CUSP and ModernGPU on NVIDIA K80-Kepler and TitanX-Pascal GPUs. Furthermore, we apply our mechanism to two applications, i.e., suffix array construction and sparse matrix-matrix multiplication, and obtain obvious gains over state-of-the-art implementations.
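To make the problem statement concrete, here is a small CPU reference for what segmented sort computes, plus a sketch of grouping segments by length in the spirit of technique (1). It is not the GPU mechanism from the paper: the offsets-array layout (as in CSR row pointers), the function names, and the length thresholds in `bounds` are assumptions chosen for illustration.

```python
from bisect import bisect_left


def segmented_sort(values, offsets):
    """Sort each segment values[offsets[i]:offsets[i + 1]] independently."""
    out = list(values)
    for start, end in zip(offsets, offsets[1:]):
        out[start:end] = sorted(out[start:end])
    return out


def bin_segments_by_length(offsets, bounds=(1, 32)):
    """Group segment ids into length classes (thresholds are hypothetical).

    Class k holds the segments whose length is at most bounds[k] and above
    bounds[k - 1]; the last class holds everything longer. A GPU version
    could then pick a different sorting strategy per class.
    """
    bins = {}
    for seg, (start, end) in enumerate(zip(offsets, offsets[1:])):
        bins.setdefault(bisect_left(bounds, end - start), []).append(seg)
    return bins
```

With `values = [5, 3, 9, 1, 8, 2, 7]` and `offsets = [0, 3, 3, 7]`, the three segments `[5, 3, 9]`, `[]` and `[1, 8, 2, 7]` are sorted independently, giving `[3, 5, 9, 1, 2, 7, 8]`; elements never move across a segment boundary.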