Patent
Efficiently finding potential duplicate values in data
Kabra Namit,Saillet Yannick +1 more
- 28 Apr 2020
1
TL;DR: This patent presents a method, system and computer program product for finding groups of potential duplicates in attribute values, in which each attribute value is converted to a respective set of bigrams.
Abstract: A method, system and computer program product for finding groups of potential duplicates in attribute values. Each attribute value of the attribute values is converted to a respective set of bigrams. All bigrams present in the attribute values may be determined. Bigrams present in the attribute values may be represented as bits. This may result in a bitmap representing the presence of the bigrams in the attribute values. The attribute values may be grouped using bitwise operations on the bitmap, where each group includes attribute values that are determined based on pairwise bigram-based similarity scores. The pairwise bigram-based similarity score reflects the number of common bigrams between two attribute values.
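The steps in the abstract (bigram conversion, a bitmap of bigram presence, bitwise comparison, grouping by pairwise score) can be illustrated with a minimal Python sketch. It is not the patented implementation. The lowercasing, the Dice-style normalization of the common-bigram count, the 0.6 threshold, the union-find grouping, and all function names are assumptions chosen for illustration.

```python
from __future__ import annotations

from itertools import combinations


def bigrams(value: str) -> set[str]:
    """Return the set of adjacent character pairs in a lowercased value."""
    text = value.lower()  # case folding is an assumption, not from the abstract
    return {text[i:i + 2] for i in range(len(text) - 1)}


def build_bitmaps(values: list[str]) -> list[int]:
    """Encode each value as an int whose set bits mark the bigrams it contains."""
    sets = [bigrams(v) for v in values]
    # Assign one bit position to every bigram seen in any value.
    position = {bg: i for i, bg in enumerate(sorted(set().union(*sets)))}
    return [sum(1 << position[bg] for bg in s) for s in sets]


def similarity(a: int, b: int) -> float:
    """Dice-style score (an assumption): 2 * common bigrams / total bigrams."""
    total = bin(a).count("1") + bin(b).count("1")
    if total == 0:
        return 0.0
    common = bin(a & b).count("1")  # bitwise AND keeps only shared bigrams
    return 2 * common / total


def group_duplicates(values: list[str], threshold: float = 0.6) -> list[list[str]]:
    """Group values linked by a pairwise score at or above the threshold."""
    bitmaps = build_bitmaps(values)
    parent = list(range(len(values)))  # union-find forest (hypothetical choice)

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(values)), 2):
        if similarity(bitmaps[i], bitmaps[j]) >= threshold:
            parent[find(i)] = find(j)

    groups: dict[int, list[str]] = {}
    for i, v in enumerate(values):
        groups.setdefault(find(i), []).append(v)
    # Only groups with more than one member are potential duplicates.
    return [g for g in groups.values() if len(g) > 1]
```

Calling group_duplicates(["Jonathan Smith", "Jonathon Smith", "Maria Garcia"]) places the two Smith spellings in one group and leaves "Maria Garcia" ungrouped. The sketch compares every pair of values, so it shows the bitwise scoring only and makes no claim about how the patent avoids exhaustive comparison.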
Citations
Patent
Automatic identification of relevant software projects for cross project learning
Ripon K. Saha,Mukul R. Prasad +1 more
- 29 Aug 2019
TL;DR: This patent describes a method that accesses features, including feature information, of one or more candidate target projects and of a subject project, where the candidate target projects and the subject project are software programs.
2
References
Patent
Efficient duplicate detection for machine learning data sets
Leo Parker Dirac,Aleksandr Mikhaylovich Ingerman +1 more
- 12 Dec 2014
TL;DR: In this patent, a machine learning service determines that an analysis is to be performed to detect whether at least a portion of the contents of one or more observation records of a first data set is duplicated in a second set of observation records.
144
Patent
System and method for providing lossless compression of n-gram language models in a real-time decoder
Dimitri Kanevsky,Srinivasa Patibandla Rao +1 more
- 05 Feb 1998
TL;DR: This patent presents a system for compressing n-gram language models for use in real-time decoding, whereby the size of the model is significantly reduced without increasing the decoding time of the recognizer.
111
Patent
Handling Data Sets
Sebastian Nelke,Martin Oberhofer,Yannick Saillet,Jens Seifert +3 more
- 30 Jun 2011
TL;DR: In this patent, a method, system and computer program product provide a first characteristic associated with a first data set and a single data value, and a second characteristic associated with a second data set, and calculate at least one of: 1) the similarity of the first data set with the second data set based on the first and second characteristics; and 2) a confidence indicating how well the first feature reflects the properties of the second feature.
25
Patent
Systems And Methods For Identifying Potential Duplicate Entries In A Database
Brian Carl Rineer
- 18 Nov 2011
TL;DR: This patent describes systems and methods for identifying potential duplicate entries in a database. A matchcode for a record may be generated by: receiving a character string from the record; determining whether the character string includes a non-essential character substring; and, if the non-essential character substring is missing from the character string, generating the matchcode from the character string and adding a wildcard character to the matchcode in place of the missing character substring. The matchcodes for the plurality of records may then be compared to identify matching pairs of matchcodes.
13
Patent
Text input system and method
Shailaja Gummadidala,Prima Dona Kurian,Sandeep Yelubolu,Sumit Goswami,Sunil Motaparti +4 more
- 20 Jan 2014
TL;DR: This patent describes a computer-implemented method for inputting text into an electronic device, in which a virtual keyboard with a plurality of keys is displayed on a display screen and one or more characters are associated with each key.
9
Related Papers (5)
Takashi Suzuki,Katsuhiko Mori,Hiroshi Sato +2 more
- 16 Dec 2010
Felix Beier,Andreas Brodt,Oliver Schiller +2 more
- 09 Jul 2019
Li Zhixu,Yang Qiang,Jun Jiang +2 more
- 11 Nov 2015