Journal Article (DOI: 10.1002/widm.36)
Similarity measures for sequential data
TL;DR: This paper reviews three major classes of similarity measures for strings (edit distances, bag-of-words models, and string kernels), presents each class and its underlying comparison in detail, highlights their advantages and differences, and provides basic algorithms supporting practical applications.
Abstract: Expressive comparison of strings is a prerequisite for the analysis of sequential data in many areas of computer science. However, comparing strings and assessing their similarity is not a trivial task, and there exist several contrasting approaches for defining similarity measures over sequential data. In this paper, we review three major classes of such similarity measures: edit distances, bag-of-words models, and string kernels. Each of these classes originates from a particular application domain and models the similarity of strings differently. We present these classes and their underlying comparisons in detail, highlight their advantages and differences, and provide basic algorithms supporting practical applications. © 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011, 1:296–304. DOI: 10.1002/widm.36
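The three classes named in the abstract can be contrasted in a few lines of code. The following is a minimal illustration, not the paper's own algorithms: it assumes plain Python strings, uses the Levenshtein distance (unit costs) as the edit distance, whitespace tokenization with cosine similarity for the bag-of-words model, and the k-spectrum kernel as one representative string kernel. All function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def bag_of_words(s: str) -> Counter:
    """Embed a string as word counts; whitespace tokenization is an assumption."""
    return Counter(s.split())


def cosine(x: Counter, y: Counter) -> float:
    """Cosine similarity between two count vectors (0.0 if either is empty)."""
    dot = sum(x[w] * y[w] for w in x.keys() & y.keys())
    norm = sqrt(sum(v * v for v in x.values())) * sqrt(sum(v * v for v in y.values()))
    return dot / norm if norm else 0.0


def spectrum_kernel(a: str, b: str, k: int = 3) -> int:
    """k-spectrum kernel: inner product of the k-mer count vectors of a and b."""
    ka = Counter(a[i:i + k] for i in range(len(a) - k + 1))
    kb = Counter(b[i:i + k] for i in range(len(b) - k + 1))
    return sum(ka[s] * kb[s] for s in ka.keys() & kb.keys())
```

For example, `levenshtein("kitten", "sitting")` returns 3. Note the direction of each measure: the edit distance is a dissimilarity (0 for identical strings), while the cosine score and the kernel value grow as the strings become more alike.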
Citations
Eleven quick tips for data cleaning and feature engineering
TL;DR: In this article, the authors propose quick tips on how to carry out data cleaning and feature engineering correctly, avoiding common mistakes and pitfalls; the tips can be applied more generally to any scientific area.
Pair-Activity Analysis From Video Using Qualitative Trajectory Calculus
TL;DR: The paper presents a novel, robust qualitative method that can be used for both the classification and the clustering of pair-activities, together with a comprehensive video data set of fish behaviors collected in lab-based experiments.
Trace-based contextual recommendations
TL;DR: This paper describes how interaction traces enable contextual recommendations built with a Trace-Based Reasoning approach, and validates this approach using a variation of the classical accuracy definition called "acceptance rate".
Similarity Measures to Compare Episodes in Modeled Traces
Raafat Zarka, Amélie Cordier, Elöd Egyed-Zsigmond, Luc Lamontagne, Alain Mille
- 08 Jul 2013
TL;DR: This paper defines a similarity measure for comparing the elements of episodes and combines it with an implementation of the Smith-Waterman algorithm for comparing whole episodes, achieving satisfactory comparison quality and response time.
A weighted string kernel for protein fold recognition.
Saghi Nojoomi, Patrice Koehl
TL;DR: The paper proposes improvements to SeqKernel: it develops a weighted version of the kernel and expands the concept of string kernels into a novel framework for deriving information on amino acids from protein sequences, which also provides a way to extract sequence information from structure.
References
A general method applicable to the search for similarities in the amino acid sequence of two proteins
TL;DR: A computer-adaptable method for finding similarities in the amino acid sequences of two proteins has been developed; with it, one can determine whether significant homology exists between the proteins and trace their possible evolutionary development.
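The dynamic-programming idea behind this global alignment method can be sketched as follows. This is a minimal sketch of the common textbook formulation with a linear gap penalty; the scores `match=1`, `mismatch=-1`, and `gap=-1` are illustrative assumptions rather than the scoring used in the original article, and `needleman_wunsch` is a hypothetical name.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Score of the best global alignment of a and b (linear gap penalty)."""
    n, m = len(a), len(b)
    # F[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        F[0][j] = j * gap  # b[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(
                F[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                F[i - 1][j] + gap,    # a[i-1] against a gap in b
                F[i][j - 1] + gap,    # b[j-1] against a gap in a
            )
    return F[n][m]
```

For instance, `needleman_wunsch("AC", "AGC")` returns 1: two matches and one gap. Because every cell depends only on its three upper-left neighbours, the table can be filled in O(nm) time.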
Book
Introduction to Modern Information Retrieval
Gerard Salton, Michael J. McGill
- 01 Jan 1983
Identification of common molecular subsequences.
TL;DR: This letter extends the heuristic homology algorithm of Needleman & Wunsch (1970) to find a pair of segments, one from each of two long sequences, such that no other pair of segments has greater similarity (homology).
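The local variant differs from global alignment mainly in two ways: every cell is clamped at zero, so an alignment may start anywhere, and the result is the best cell anywhere in the matrix rather than the bottom-right corner. The sketch below assumes a linear gap penalty (the letter allows more general gap weights) and takes the element similarity as a parameter, which is one way an element-level measure, such as the one Zarka et al. define for episodes, could be plugged in. `smith_waterman` and `unit` are hypothetical names.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def smith_waterman(a: Sequence[T], b: Sequence[T],
                   score: Callable[[T, T], float],
                   gap: float = -1.0) -> float:
    """Score of the best local alignment between a segment of a and a segment of b."""
    n, m = len(a), len(b)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]  # first row and column stay 0
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(
                0.0,                                         # start a new local alignment
                H[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # align the two elements
                H[i - 1][j] + gap,                           # gap in b
                H[i][j - 1] + gap,                           # gap in a
            )
            best = max(best, H[i][j])
    return best


def unit(x: object, y: object) -> float:
    """Illustrative element similarity: +2 for a match, -1 otherwise (an assumption)."""
    return 2.0 if x == y else -1.0
```

For example, `smith_waterman("xxABCyy", "zABCz", unit)` returns 6.0, the score of the shared segment "ABC". Since it only requires indexable sequences, the same function also accepts lists of events, e.g. `smith_waterman(["login", "search", "buy"], ["search", "buy"], unit)`.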