Unicode collation algorithm


Papers

Book

PAN localization: a study on collation of languages from developing Asia


Sarmad Hussain, Nadir Durrani

1 Jan 2008

9 citations

i;unicode-casemap - Simple Unicode Collation Algorithm


Mark R. Crispin

1 Oct 2007

TL;DR: This document describes "i;unicode-casemap", a simple case-insensitive collation for Unicode strings that provides equality, substring, and ordering operations.


Abstract: This document describes "i;unicode-casemap", a simple case-insensitive collation for Unicode strings. It provides equality, substring, and ordering operations. [STANDARDS-TRACK]
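The three operations named in the abstract (equality, substring, ordering) can all be derived from a single key function. The following is a minimal sketch, not the algorithm defined by the standard: it assumes that Python's `str.casefold` plus NFKD normalization is an acceptable stand-in for the collation's own case mapping and decomposition rules, and the names `casemap_key`, `equal`, `contains`, and `compare` are hypothetical.

```python
import unicodedata


def casemap_key(text: str) -> bytes:
    """Approximate case-insensitive key (assumption: casefold + NFKD,
    which is NOT the exact mapping specified for i;unicode-casemap)."""
    folded = text.casefold()                          # remove case distinctions
    decomposed = unicodedata.normalize("NFKD", folded)  # unify composed/decomposed forms
    return decomposed.encode("utf-8")                 # compare as octet strings


def equal(a: str, b: str) -> bool:
    """Equality: the two keys are identical."""
    return casemap_key(a) == casemap_key(b)


def contains(haystack: str, needle: str) -> bool:
    """Substring: the needle's key occurs inside the haystack's key."""
    return casemap_key(needle) in casemap_key(haystack)


def compare(a: str, b: str) -> int:
    """Ordering: -1, 0, or 1 according to byte-wise key comparison."""
    ka, kb = casemap_key(a), casemap_key(b)
    return (ka > kb) - (ka < kb)
```

Because every operation goes through the same key, two strings that compare equal also order identically against any third string, which is the property a simple collation of this kind needs.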


5 citations

Creating Order out of Character Chaos: Collation Capabilities of the SAS System


Scott Mebust, Michael Bridgers, Sas Presents

1 Jan 2007

TL;DR: This paper describes the four collation capabilities offered by PROC SORT in SAS and details the applicability, advantages, processing requirements, and processing implications of each approach.


Abstract: Traditionally, data is ordered to facilitate further processing or to enable you to quickly find information in a report or other form of data presentation. The SAS System’s primary means of achieving an alternative collating sequence has been to specify a translation table (TRANTAB), using the PROC SORT SORTSEQ option, with which PROC SORT can reorder individual characters. SAS® 9.2 extends the SORTSEQ option to enable the specification of an arbitrary encoding for non-native binary collation. SAS 9.2 also extends the SORTSEQ option to enable the specification of linguistic collation, which is useful for presenting data because it produces results that are more intuitive and culturally acceptable. The linguistic collation capability is highly compatible with the Unicode Collation Algorithm and adaptable to user preference using various options. In this paper, we describe the four collation capabilities offered by PROC SORT in SAS. We further detail the applicability, the advantages, the processing requirements, and the processing implications of each approach. We conclude with information regarding the future directions of collation and sorting within the SAS System.


2 citations

Journal Article • 10.21105/JOSS.00021

pyuca: a Python implementation of the Unicode Collation Algorithm


James K. Tauber

18 May 2016 - The Journal of Open Source Software

1 citation

Proceedings Article

Implementing Language-Dependent Lexicographic Orders in Scheme


Jean-Michel Hufflen¹

University of Franche-Comté¹

1 Jan 2007

TL;DR: The lexicographical order relations used within dictionaries are language-dependent, and this work explains how such orders were implemented in Scheme using generators of sorting orders.


Abstract: The lexicographical order relations used within dictionaries are language-dependent, and we explain how we implemented such orders in Scheme. We show how our sorting orders are derived from the Unicode collation algorithm. Since the result of a Scheme function can be itself a function, we use generators of sorting orders. Specifying a sorting order for a new natural language has been made as easy as possible and can be done by a programmer who just has basic knowledge of Scheme. We also show how Scheme data structures allow our functions to be programmed efficiently.
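The idea of a "generator of sorting orders", a function whose result is itself an ordering function, can be illustrated with a closure. The sketch below is in Python rather than Scheme and is not the paper's implementation: `make_order` is a hypothetical helper, it handles only a single level of comparison (per-letter rank), and it is not derived from the Unicode collation algorithm's tables.

```python
import unicodedata


def make_order(alphabet: str):
    """Return a sort-key function for one language's alphabet order.

    Hypothetical helper: each call builds a new key function that
    closes over its own rank table.
    """
    alphabet = unicodedata.normalize("NFC", alphabet)
    rank = {ch: i for i, ch in enumerate(alphabet)}
    unknown = len(alphabet)  # characters outside the alphabet sort last

    def key(word: str):
        lowered = unicodedata.normalize("NFC", word).lower()
        # One (rank, code point) pair per character; the code point only
        # breaks ties among characters that are not in the alphabet.
        ranks = [(rank[ch], 0) if ch in rank else (unknown, ord(ch))
                 for ch in lowered]
        return (ranks, word)  # original spelling as a final tie-breaker

    return key


# Two orders produced by the same generator.
spanish = make_order("abcdefghijklmnñopqrstuvwxyz")    # ñ follows n
swedish = make_order("abcdefghijklmnopqrstuvwxyzåäö")  # å, ä, ö follow z

spanish_sorted = sorted(["oso", "ñu", "nube"], key=spanish)
swedish_sorted = sorted(["öl", "ål", "äpple", "zon"], key=swedish)
```

Adding an order for another language then amounts to one more call to the generator with that language's alphabet, which mirrors the ease-of-specification goal stated in the abstract.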

