Open AccessDissertation
Audio pattern discovery and retrieval
Lei Wang
- 01 Jan 2012
2
TL;DR: This thesis explores unsupervised algorithms for pattern discovery and retrieval in audio and speech data and explores the techniques of searching audio pattern in broadcast audio which consists of diverse content such as speech, music/songs, commercials, sound effects and background noise.
read more
Abstract: This thesis explores unsupervised algorithms for pattern discovery and retrieval in audio and speech data. In this work, audio pattern is defined as repeating audio content such as repeating music segments or words/short phrases in speech recordings. The meanings of “pattern” will be defined separately for different types of data, for example, repeating pattern discovery in music will extract segments with similar melody in music piece; In human speech, the same words/short phrases spoken by single or multiple speakers are also defined as speech patterns; In broadcast audio, repeated commercials/logo music are also considered as patterns. Previous work on audio pattern discovery focuses on either symbolizing the audio signal into token sequences followed by text-based search or using Brute-Force search techniques such as self-similarity matrix and Dynamic Time Warping. Symbolization process that relies on Vector Quantization or other modeling techniques may suffer from misclassification errors, and the exhaustive search requires high computation cost and can also be affected by channel distortion and speaker variation in audio data. Such limitations motivate me to explore more efficient and robust approaches to automatically detect repeating information in audio data. In this thesis, different unsupervised techniques are examined to analyze music and speech separately. For music, an efficient approach which extends Ukkonon’s suffix tree construction algorithm is proposed to detect repeating segments. For speech data, an iterative merging approach which is based on Acoustic Segment Model (ASM) is proposed to discover recurrent phrases/words in speech. This thesis also explores the techniques of searching audio pattern in broadcast audio which consists of diverse content such as speech, music/songs, commercials, sound effects and background noise. Existing audio pattern retrieval techniques focus only on specific
read more
Chat with Paper
AI Agents for this Paper
Find similar papers on Google Scholar, PubMed and Arxiv
Write a critical review of this paper
Analyze citations of this paper to find unaddressed research gaps
Citations
•Book
Information retrieval
C. J. Van Rijsbergen
- 01 Jan 1975
TL;DR: The major change in the second edition of this book is the addition of a new chapter on probabilistic retrieval, which I think is one of the most interesting and active areas of research in information retrieval.
822
References
A tutorial on hidden Markov models and selected applications in speech recognition
Lawrence R. Rabiner
- 01 Feb 1989
TL;DR: In this paper, the authors provide an overview of the basic theory of hidden Markov models (HMMs) as originated by L.E. Baum and T. Petrie (1966) and give practical details on methods of implementation of the theory along with a description of selected applications of HMMs to distinct problems in speech recognition.
Data clustering: a review
TL;DR: An overview of pattern clustering methods from a statistical pattern recognition perspective is presented, with a goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners.
•Book
Modern Information Retrieval
Ricardo Baeza-Yates,Berthier Ribeiro-Neto +1 more
- 15 May 1999
TL;DR: In this article, the authors present a rigorous and complete textbook for a first course on information retrieval from the computer science (as opposed to a user-centred) perspective, which provides an up-to-date student oriented treatment of the subject.
Dynamic programming algorithm optimization for spoken word recognition
TL;DR: This paper reports on an optimum dynamic progxamming (DP) based time-normalization algorithm for spoken word recognition, in which the warping function slope is restricted so as to improve discrimination between words in different categories.
6.7K
Statistical Learning by 8-Month-Old Infants
TL;DR: The present study shows that a fundamental task of language acquisition, segmentation of words from fluent speech, can be accomplished by 8-month-old infants based solely on the statistical relationships between neighboring speech sounds.
5.3K
Related Papers (5)
Wen Li
- 01 Jan 2014
Amir Shahroudy
- 01 Jan 2016
Cong. Geng
- 01 Jan 2012
Junlin Hu
- 01 Jan 2018
Xiaozhou. Yu
- 01 Jan 2007