An Algorithm for Variable-Length Proper-Name Compression
Abstract: Viable on-line search systems require reasonable capabilities to automatically detect (and ideally correct) variations between the request format and the stored format. An important requirement is solving the problem of matching proper names, not only because both input and storage specifications are subject to error, but also because various transliteration schemes exist and can produce variant proper-name forms in the same data base. This paper reviews several proper-name matching schemes and provides an updated version of them that performs well in tests on the proper-name equivalence classes of a suburban telephone book. An appendix lists the corpus of names used to test the algorithm.
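The abstract does not spell out the rules of the updated scheme. As an illustrative stand-in only, the sketch below implements the classic American Soundex code, a well-known fixed-length proper-name matching code; it is not presented as the paper's algorithm. The `length=None` option, which skips Soundex's truncation and zero-padding to yield a variable-length key, is a hypothetical addition meant only to make the contrast with fixed-length codes concrete.

```python
from typing import Optional

# Classic American Soundex digit classes. Vowels (and Y) get no digit;
# H and W are skipped entirely.
_CODES = {
    letter: digit
    for letters, digit in (
        ("BFPV", "1"),
        ("CGJKQSXZ", "2"),
        ("DT", "3"),
        ("L", "4"),
        ("MN", "5"),
        ("R", "6"),
    )
    for letter in letters
}


def soundex(name: str, length: Optional[int] = 4) -> str:
    """Return the Soundex key of `name`.

    length=4 gives the standard fixed-length code; length=None is a
    hypothetical variable-length variant that keeps every digit.
    """
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    key = [first]
    prev = _CODES.get(first, "")  # the first letter's own code suppresses a repeat
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            key.append(code)
        prev = code  # a vowel resets prev, so a repeated code after it is kept
    result = "".join(key)
    if length is None:
        return result
    return (result + "0" * length)[:length]
```

For example, `soundex("Smith")` and `soundex("Smyth")` both give `S530`, and `soundex("Ashcraft")` gives `A261` while `soundex("Ashcraft", length=None)` gives `A2613`. Because the first letter is kept verbatim, variants that differ in their initial letter (such as a leading C versus K) still receive different keys.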
Citations
Under-reporting of motor vehicle traffic crash victims in New Zealand
J. C. Alsop, John Desmond Langley, et al.
TL;DR: Those using these police files for prioritization, resource allocation and evaluation purposes need to be aware of the extent and nature of the biases contained within these databases.
Cited by 206.
A practitioner's guide to data base compression tutorial
TL;DR: A wealth of literature on data compression is reviewed and facts and guidelines are presented which will assist system designers in evaluating the costs and benefits of compression and in selecting techniques appropriate for their needs.
Cited by 48.
Storage Analysis Of A Compression Coding For Document Data Bases
TL;DR: The effect of using an efficient code to compress terms within a document data base is analyzed as a function of the vocabulary length and of certain parameters that describe the structure of the code.
Cited by 29.
Use of record linkage techniques to maintain the Leicestershire diabetes register
J. D. Langley, J. L. Botha, et al.
TL;DR: Successful use is made of record linkage techniques, including phonetic name matching and a heuristic algorithm to improve reporting of results and to ensure the accuracy and completeness of the register, and much manual checking is thus avoided.
Cited by 23.
Title-Only Entries Retrieved by Use of Truncated Search Keys
TL;DR: An experiment testing the utility of truncated search keys as inquiry terms in an on-line system was performed on a file of 16,792 title-only bibliographic entries; the keys yielded eight or fewer entries 99.0% of the time.
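The entry above does not state how its truncated keys were formed. The sketch below assumes, purely for illustration, a hypothetical key made of the first `prefix_len` letters of each of the first `word_count` words of a title, and measures the share of entries whose own key retrieves at most `max_hits` entries, one plausible reading of "eight or fewer entries 99.0% of the time". The function names and default values are assumptions, not the experiment's actual parameters.

```python
import re
from collections import Counter
from typing import List


def truncated_key(title: str, word_count: int = 3, prefix_len: int = 3) -> str:
    """Hypothetical key: truncate each of the first `word_count` title words."""
    words = re.findall(r"[A-Za-z0-9]+", title.upper())
    return ",".join(w[:prefix_len] for w in words[:word_count])


def share_within(titles: List[str], max_hits: int = 8, **key_args) -> float:
    """Fraction of titles whose own key retrieves at most `max_hits` titles."""
    if not titles:
        raise ValueError("titles must be non-empty")
    keys = [truncated_key(t, **key_args) for t in titles]
    hits = Counter(keys)  # how many titles share each key
    return sum(1 for k in keys if hits[k] <= max_hits) / len(titles)
```

Raising `word_count` or `prefix_len` makes keys more selective, at the cost of asking the requester for more of the title.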
References
A program for correcting spelling errors
TL;DR: A program using a simple, heuristic procedure for associating “similar” spellings is able to correct misspelled words using only a vocabulary of properly spelled words.
Cited by 99.
Retrieval of misspelled names in an airlines passenger record system
TL;DR: It is evident that a policy statement regarding publishing of papers on business subjects would be helpful, and it is highly desirable to have papers on business and scientific applications in the same department.
Cited by 91.
A Study of Methods for Systematically Abbreviating English Words and Names
Charles P. Bourne, Donald F. Ford, et al.
TL;DR: This study investigated various techniques for systematically abbreviating English words and names and particular attention was paid to techniques that could process incoming information without prior knowledge of its existence.
Cited by 50.
Bibliographic Retrieval from Bibliographic Input; the Hypothesis and Construction of a Test
TL;DR: A study of problems associated with bibliographic retrieval from unverified input data supplied by requesters, using a code derived by compressing title and author information to four four-character abbreviations each.
Compression word coding techniques for information retrieval
TL;DR: A description and comparison is presented of four compression techniques for word coding having application to information retrieval, with the emphasis on codes useful in creating directories to large data files.
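Several reference entries above mention codes built by compressing words, titles, and author names into short abbreviations (for example, title and author information reduced to four-character abbreviations) but do not give the rules. The sketch below is a hypothetical illustration of that general idea only: the vowel-dropping rule, the stop-word list, the `-` padding, and the choice of one surname abbreviation plus three title-word abbreviations are all assumptions, not the method of any cited work.

```python
import re

# Hypothetical stop-word list; not taken from any cited work.
STOP_WORDS = {"A", "AN", "AND", "BY", "FOR", "FROM", "IN", "OF", "ON", "THE", "TO"}


def compress_word(word: str, width: int = 4) -> str:
    """Hypothetical rule: keep the first letter, drop later vowels and any
    letter equal to the last kept one, then cut or pad with '-' to `width`."""
    letters = re.sub(r"[^A-Z]", "", word.upper())
    if not letters:
        return "-" * width
    kept = [letters[0]]
    for ch in letters[1:]:
        if ch in "AEIOU" or ch == kept[-1]:
            continue
        kept.append(ch)
    return "".join(kept)[:width].ljust(width, "-")


def bibliographic_key(author_surname: str, title: str) -> str:
    """Four abbreviations: the surname plus the first three significant title words."""
    words = [w for w in re.findall(r"[A-Za-z]+", title) if w.upper() not in STOP_WORDS]
    parts = [compress_word(author_surname)] + [compress_word(w) for w in words[:3]]
    parts += ["-" * 4] * (4 - len(parts))  # pad short titles to four abbreviations
    return " ".join(parts)
```

Because later vowels are dropped, a requester's vowel transposition such as "Retreival" still compresses to `RTRV`, the same abbreviation as "Retrieval".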