
Record linkage

About: Record linkage is a research topic, also known as duplicate detection. Over its lifetime, 1,560 publications have been published within this topic, receiving 45,533 citations.


Papers

Journal Article • DOI: 10.1093/IJE/DYS066

Cohort Profile: The Avon Longitudinal Study of Parents and Children: ALSPAC mothers cohort


Abigail Fraser¹, Corrie Macdonald-Wallis¹, Kate Tilling¹, Andy Boyd¹, Jean Golding¹, George Davey Smith¹, John Henderson¹, John Macleod¹, Lynn Molloy¹, Andy R Ness¹, S M Ring¹, Scott M. Nelson¹, Debbie A Lawlor¹

University of Glasgow¹

01 Feb 2013, International Journal of Epidemiology

TL;DR: The Avon Longitudinal Study of Parents and Children (ALSPAC) was established to understand how genetic and environmental characteristics influence health and development in parents and children.


Abstract: The Avon Longitudinal Study of Parents and Children (ALSPAC) was established to understand how genetic and environmental characteristics influence health and development in parents and children. All pregnant women resident in a defined area in the South West of England, with an expected date of delivery between 1st April 1991 and 31st December 1992, were eligible and 13 761 women (contributing 13 867 pregnancies) were recruited. These women have been followed over the last 19–22 years and have completed up to 20 questionnaires, have had detailed data abstracted from their medical records and have information on any cancer diagnoses and deaths through record linkage. A follow-up assessment was completed 17–18 years postnatal at which anthropometry, blood pressure, fat, lean and bone mass and carotid intima media thickness were assessed, and a fasting blood sample taken. The second follow-up clinic, which additionally measures cognitive function, physical capability, physical activity (with accelerometer) and wrist bone architecture, is underway and two further assessments with similar measurements will take place over the next 5 years. There is a detailed biobank that includes DNA, with genome-wide data available on >10 000, stored serum and plasma taken repeatedly since pregnancy and other samples; a wide range of data on completed biospecimen assays are available. Details of how to access these data are provided in this cohort profile.


2,478 citations

Journal Article • DOI: 10.1109/TKDE.2007.250581

Duplicate Record Detection: A Survey


Elmagarmid, Ipeirotis, Verykios

01 Jan 2007, IEEE Transactions on Knowledge and Data Engineering

TL;DR: This paper presents an extensive set of duplicate detection algorithms that can detect approximately duplicate records in a database and covers similarity metrics that are commonly used to detect similar field entries.


Abstract: Often, in the real world, entities have two or more representations in databases. Duplicate records do not share a common key and/or they contain errors that make duplicate matching a difficult task. Errors are introduced as the result of transcription errors, incomplete information, lack of standard formats, or any combination of these factors. In this paper, we present a thorough analysis of the literature on duplicate record detection. We cover similarity metrics that are commonly used to detect similar field entries, and we present an extensive set of duplicate detection algorithms that can detect approximately duplicate records in a database. We also cover multiple techniques for improving the efficiency and scalability of approximate duplicate detection algorithms. We conclude with coverage of existing tools and with a brief discussion of the big open problems in the area.


2,190 citations
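The survey above covers similarity metrics for comparing individual field entries. As one illustration of that family (not code taken from the paper), here is a minimal sketch of Levenshtein edit distance, computed by dynamic programming, plus a length-normalized similarity score. The function names and the normalization to the range 0 to 1 are assumptions chosen for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (dynamic programming, two rows)."""
    # prev[j] is the distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def field_similarity(a: str, b: str) -> float:
    """Hypothetical normalization: 1.0 for identical fields (after trimming
    and lower-casing), 0.0 when every character position differs."""
    a, b = a.strip().lower(), b.strip().lower()
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


print(levenshtein("kitten", "sitting"))                    # 3
print(round(field_similarity("Jonathon", "Jonathan"), 3))  # 0.875
```

Edit distance suits typographical errors; other field-level measures trade off differently, so the choice depends on the kinds of errors expected in the data.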

Journal Article • DOI: 10.1111/J.1467-842X.1999.TB01297.X

Population-based linkage of health records in Western Australia: development of a health services research linked database.


C. D'Arcy J. Holman¹, A. John Bass¹, Ian L. Rouse, Michael Hobbs

University of Western Australia¹

01 Oct 1999, Australian and New Zealand Journal of Public Health

TL;DR: The Western Australian Health Services Research Linked Database is introduced as infrastructure to support aetiologic, utilisation and outcomes research, and its study population, data resources, technical systems and organisational supports are compared with international best practice.


1,088 citations

The State of Record Linkage and Current Research Problems


William E. Winkler

01 Jan 1999

TL;DR: This paper provides an overview of methods and systems developed for record linkage based on the formal mathematical model of Fellegi and Sunter, and highlights the work of Larsen and Rubin.


Abstract: This paper provides an overview of methods and systems developed for record linkage. Modern record linkage begins with the pioneering work of Newcombe and is especially based on the formal mathematical model of Fellegi and Sunter. In their seminal work, Fellegi and Sunter introduced many powerful ideas for estimating record linkage parameters and other ideas that still influence record linkage today. Record linkage research is characterized by its synergism of statistics, computer science, and operations research. Many difficult algorithms have been developed and put in software systems. Record linkage practice is still very limited. Some limits are due to existing software. Other limits are due to the difficulty in automatically estimating matching parameters and error rates, with current research highlighted by the work of Larsen and Rubin.


1,060 citations
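Winkler's overview centers on the Fellegi-Sunter model. As a rough sketch of its core idea (not code from the paper), each compared field contributes a log-likelihood-ratio weight built from two probabilities: m, the chance the field agrees when the pair is a true match, and u, the chance it agrees when the pair is a non-match. The summed weight is then compared with an upper and a lower threshold to decide between link, possible link and non-link. The field names, m and u values and thresholds below are hypothetical; in practice they must be estimated from data, which is exactly the difficulty the abstract points to.

```python
import math
from typing import Dict

# Illustrative per-field probabilities (assumed values, not estimates):
#   m = P(field agrees | the record pair is a true match)
#   u = P(field agrees | the record pair is a non-match)
M_PROBS: Dict[str, float] = {"surname": 0.95, "given_name": 0.90, "birth_year": 0.98}
U_PROBS: Dict[str, float] = {"surname": 0.01, "given_name": 0.05, "birth_year": 0.02}


def match_weight(agreement: Dict[str, bool]) -> float:
    """Sum of per-field log2 likelihood ratios.

    Agreement adds log2(m / u), which is positive when m > u;
    disagreement adds log2((1 - m) / (1 - u)), which is negative when m > u.
    """
    total = 0.0
    for field, agrees in agreement.items():
        m, u = M_PROBS[field], U_PROBS[field]
        total += math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))
    return total


def classify(weight: float, upper: float = 10.0, lower: float = 0.0) -> str:
    """Two-threshold decision rule; the default thresholds are arbitrary."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link (clerical review)"


all_agree = {"surname": True, "given_name": True, "birth_year": True}
surname_differs = {"surname": False, "given_name": True, "birth_year": True}
print(classify(match_weight(all_agree)))        # link
print(classify(match_weight(surname_differs)))  # possible link (clerical review)
```

Pairs that fall between the two thresholds are the ones routed to clerical review; moving the thresholds trades false matches against missed matches.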

Patent

Data storage system with set lists which contain elements associated with parents for defining a logical hierarchy and general record pointers identifying specific data sets


Duncan Charles Mackay, Babak Ahmadi

15 Sep 1992

TL;DR: Building on the principle that every data set is uniquely identifiable, the invention eliminates problems normally associated with referencing the location of data after the data has been moved.


Abstract: In a computer having one or more secondary storage devices attached thereto, a Finite Data Environment Processor (FDEP) manages Data Sets residing on the secondary storage devices and in memory using Set Lists (SLs) and General Record Pointers (GRPs). The Data Sets contain either data or logical organizational information. The Set Lists comprise Data Sets organized into a hierarchy by listing an identifier for each of the data sets with a corresponding identifier for the logical parent of that data set. These set lists are also data sets and can be identified as child or parent in a set list. The General Record Pointers identify information in terms of Data Sets and records within them. Using the principal idea that a Data Set is uniquely identifiable, the present invention eliminates problems normally associated with referencing the location of data after the data has been moved.


984 citations
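The patent abstract describes a concrete data structure. The following is a loose, hypothetical Python model of the idea, not the patented implementation: a set list pairs each data set's identifier with its logical parent's identifier, and a general record pointer names a record by data set identifier and record number. Because a pointer never stores a physical location, moving a data set only updates a location table, and existing pointers keep resolving. The class names (SetList, GeneralRecordPointer, Storage) and the (device, offset) location format are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class GeneralRecordPointer:
    """Hypothetical pointer: names a record by data set id and record
    number, never by where the data physically lives."""
    data_set_id: str
    record_no: int


class SetList:
    """Hypothetical set list: one entry per data set, pairing its id with
    the id of its logical parent (None for a root)."""

    def __init__(self) -> None:
        self.parent_of: Dict[str, Optional[str]] = {}

    def add(self, data_set_id: str, parent_id: Optional[str] = None) -> None:
        self.parent_of[data_set_id] = parent_id

    def path_to_root(self, data_set_id: str) -> List[str]:
        """Follow parent links upward; stop at a root or if a cycle appears."""
        path = [data_set_id]
        while True:
            parent = self.parent_of.get(path[-1])
            if parent is None or parent in path:
                return path
            path.append(parent)


class Storage:
    """Hypothetical storage layer mapping each data set id to its current location."""

    def __init__(self) -> None:
        self.location: Dict[str, Tuple[str, int]] = {}      # id -> (device, offset)
        self.blocks: Dict[Tuple[str, int], List[str]] = {}  # location -> records

    def write(self, data_set_id: str, where: Tuple[str, int], records: List[str]) -> None:
        self.location[data_set_id] = where
        self.blocks[where] = records

    def move(self, data_set_id: str, new_where: Tuple[str, int]) -> None:
        """Relocate a data set; only the location table changes for callers."""
        old = self.location[data_set_id]
        self.blocks[new_where] = self.blocks.pop(old)
        self.location[data_set_id] = new_where

    def resolve(self, ptr: GeneralRecordPointer) -> str:
        return self.blocks[self.location[ptr.data_set_id]][ptr.record_no]


hierarchy = SetList()
hierarchy.add("root")
hierarchy.add("patients", "root")
hierarchy.add("visits", "patients")
print(hierarchy.path_to_root("visits"))  # ['visits', 'patients', 'root']

disk = Storage()
disk.write("visits", ("disk0", 100), ["v1", "v2"])
ptr = GeneralRecordPointer("visits", 1)
disk.move("visits", ("disk1", 0))
print(disk.resolve(ptr))  # v2, still valid after the move
```

The abstract also notes that set lists are themselves data sets; the sketch keeps them as plain objects for brevity.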


Performance Metrics

Papers: 2,061
Citations: 12,275

No. of papers in the topic in previous years
Year	Papers
2026	2
2025	37
2024	98
2023	103
2022	183
2021	74
