Shingling

Papers published on a yearly basis

Papers

Journal Article•10.1007/S13174-010-0003-X•

Web graph similarity for anomaly detection

Panagiotis Papadimitriou¹, Ali Dasdan², Hector Garcia-Molina¹•Institutions (2)

Stanford University¹, Yahoo!²

25 Feb 2010-Journal of Internet Services and Applications

TL;DR: This paper proposes five web graph similarity schemes, three adapted from existing graph similarity measures and two adapted from well-known document and vector similarity methods (namely, the shingling method and the random projection based method), and empirically evaluates and compares them.

Abstract: Web graphs are approximate snapshots of the web, created by search engines. They are essential to monitor the evolution of the web and to compute global properties like PageRank values of web pages. Their continuous monitoring requires a notion of graph similarity to help measure the amount and significance of changes in the evolving web. As a result, these measurements provide means to validate how well search engines acquire content from the web. In this paper, we propose five similarity schemes: three of them we adapted from existing graph similarity measures, and two we adapted from well-known document and vector similarity methods (namely, the shingling method and random projection based method). We empirically evaluate and compare all five schemes using a sequence of web graphs from Yahoo!, and study if the schemes can identify anomalies that may occur due to hardware or other problems.

267 citations

Journal Article•10.1109/TASL.2008.925883•

Analysis of Minimum Distances in High-Dimensional Musical Spaces

Michael A. Casey¹, Christophe Rhodes¹, Malcolm Slaney²•Institutions (2)

University of London¹, Yahoo!²

01 Jul 2008-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: An automatic method for measuring content-based music similarity, enhancing the current generation of music search engines and recommender systems. The method is compatible with locality-sensitive hashing, allowing implementation with retrieval times several orders of magnitude faster than those using exhaustive distance computations.

Abstract: We propose an automatic method for measuring content-based music similarity, enhancing the current generation of music search engines and recommender systems. Many previous approaches to track similarity require brute-force, pair-wise processing between all audio features in a database and therefore are not practical for large collections. However, in an Internet-connected world, where users have access to millions of musical tracks, efficiency is crucial. Our approach uses features extracted from unlabeled audio data and near-neighbor retrieval using a distance threshold, determined by analysis, to solve a range of retrieval tasks. The tasks require temporal features, analogous to the technique of shingling used for text retrieval. To measure similarity, we count pairs of audio shingles, between a query and target track, that are below a distance threshold. The distribution of between-shingle distances is different for each database; therefore, we present an analysis of the distribution of minimum distances between shingles and a method for estimating a distance threshold for optimal retrieval performance. The method is compatible with locality-sensitive hashing (LSH), allowing implementation with retrieval times several orders of magnitude faster than those using exhaustive distance computations. We evaluate the performance of our proposed method on three contrasting music similarity tasks: retrieval of mis-attributed recordings (fingerprint), retrieval of the same work performed by different artists (cover songs), and retrieval of edited and sampled versions of a query track by remix artists (remixes). Our method achieves near-perfect performance in the first two tasks and 75% precision at 70% recall in the third task. Each task was performed on a test database comprising 4.5 million audio shingles.
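
The pair-counting idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `make_shingles` and `count_close_pairs`, the use of Euclidean distance, and the exhaustive pairwise loop are all assumptions made for illustration (the paper estimates its threshold by analysis and points to LSH for speed, neither of which is shown here).

```python
import math

def make_shingles(frames, width):
    """Concatenate `width` consecutive feature frames into one vector per position.

    `frames` is a list of equal-length feature vectors (hypothetical input format).
    Returns an empty list when there are fewer than `width` frames.
    """
    return [
        [value for frame in frames[i:i + width] for value in frame]
        for i in range(len(frames) - width + 1)
    ]

def count_close_pairs(query_shingles, target_shingles, threshold):
    """Count (query, target) shingle pairs whose Euclidean distance is below `threshold`.

    This is the brute-force version; it compares every query shingle
    against every target shingle.
    """
    return sum(
        1
        for q in query_shingles
        for t in target_shingles
        if math.dist(q, t) < threshold
    )

# Toy one-dimensional "features" for a four-frame track.
frames = [[0.0], [1.0], [2.0], [3.0]]
shingles = make_shingles(frames, width=2)
close = count_close_pairs(shingles, shingles, threshold=0.5)
```

With a tight threshold, only identical shingles match; loosening it admits neighbouring windows as well, which is why the choice of threshold matters for retrieval quality.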

120 citations

Journal Article•10.1186/S12920-017-0279-9•

Secure approximation of edit distance on genomic data.

Momin Al Aziz¹, Dima Alhadidi², Noman Mohammed¹•Institutions (2)

University of Manitoba¹, Zayed University²

26 Jul 2017-BMC Medical Genomics

TL;DR: This paper proposes two different approximation methods to securely compute the edit distance among genomic sequences and uses shingling, private set intersection methods, the banded alignment algorithm, and garbled circuits to implement these methods.

Abstract: Edit distance is a well-established metric to quantify how dissimilar two strings are by counting the minimum number of operations required to transform one string into the other. It is utilized in the domain of human genomic sequence similarity as it captures the requirements and leads to a better diagnosis of diseases. However, in addition to the computational complexity due to the large genomic sequence length, the privacy of these sequences is highly important. As these genomic sequences are unique and can identify an individual, they cannot be shared in plaintext. In this paper, we propose two different approximation methods to securely compute the edit distance among genomic sequences. We use shingling, private set intersection methods, the banded alignment algorithm, and garbled circuits to implement these methods. We experimentally evaluate these methods and discuss both advantages and limitations. Experimental results show that our first approximation method is fast and achieves similar accuracy compared to existing techniques. However, for longer genomic sequences, both the existing techniques and our proposed first method are unable to achieve good accuracy. On the other hand, our second approximation method is able to achieve higher accuracy on such datasets. However, the second method is slower than the first proposed method. The proposed algorithms are generally accurate, time-efficient and can be applied individually and jointly as they have complementary properties (runtime vs. accuracy) on different types of datasets.
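
To make the role of shingling concrete, here is a plaintext sketch of comparing two sequences through their shingle sets. It is an illustration only and omits everything that makes the paper's methods secure (private set intersection, garbled circuits) as well as the banded alignment step; the function names, the shingle length `k = 3`, and the use of Jaccard similarity as the comparison are assumptions, not details taken from the paper.

```python
def shingles(sequence, k):
    """Return the set of all length-k substrings (k-shingles) of `sequence`."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|.

    Two empty sets are treated as identical (similarity 1.0).
    """
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Two short toy sequences that differ by a single substitution.
seq_a = "ACGTAC"
seq_b = "ACGTTC"
similarity = jaccard(shingles(seq_a, 3), shingles(seq_b, 3))
```

Sequences with a small edit distance tend to share most of their shingles, so a set comparison gives a cheap proxy for closeness. A set intersection is also the kind of operation that private set intersection protocols are designed to compute without revealing the inputs.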

50 citations

Journal Article•10.1016/J.EGYPRO.2018.09.010•

Shingling Technology For Cell Interconnection: Technological Aspects And Process Integration

Diego Tonini¹, Giorgio Cellere¹, Matteo Bertazzo¹, A. Fecchio¹, L. Cerasti¹, Marco Galiazzo¹ (+2 more)•Institutions (1)

Applied Materials¹

01 Sep 2018-Energy Procedia

TL;DR: This work demonstrates the performance gain obtained with shingling interconnection technology in terms of module output power and efficiency and describes the technological challenges for each step in the shingling assembly process flow.

46 citations

Patent•

Card shingling machine and method

Harlan L Krinke

12 Aug 1964

24 citations

Year	Papers
2022	1
2021	2
2020	1
2019	5
2018	6
2017	1
