Audio pattern discovery and retrieval

Open Access Dissertation

- 01 Jan 2012

TL;DR: This thesis explores unsupervised algorithms for pattern discovery and retrieval in audio and speech data, as well as techniques for searching for audio patterns in broadcast audio, which contains diverse content such as speech, music/songs, commercials, sound effects and background noise.

Abstract: This thesis explores unsupervised algorithms for pattern discovery and retrieval in audio and speech data. In this work, an audio pattern is defined as repeating audio content, such as repeated music segments or words/short phrases that recur in speech recordings. What counts as a “pattern” is defined separately for each type of data: in music, repeating pattern discovery extracts segments with similar melody within a piece; in human speech, the same words/short phrases spoken by a single speaker or by multiple speakers are treated as speech patterns; and in broadcast audio, repeated commercials and logo music are also considered patterns.

Previous work on audio pattern discovery either symbolizes the audio signal into token sequences and then applies text-based search, or uses brute-force search techniques such as the self-similarity matrix and Dynamic Time Warping (DTW). Symbolization that relies on Vector Quantization or other modeling techniques may suffer from misclassification errors, while exhaustive search has a high computational cost and can also be affected by channel distortion and speaker variation in the audio data. These limitations motivate me to explore more efficient and robust approaches for automatically detecting repeating information in audio data.

In this thesis, different unsupervised techniques are examined for music and for speech separately. For music, an efficient approach that extends Ukkonen’s suffix tree construction algorithm is proposed to detect repeating segments. For speech, an iterative merging approach based on the Acoustic Segment Model (ASM) is proposed to discover recurrent words/phrases. This thesis also explores techniques for searching for audio patterns in broadcast audio, which contains diverse content such as speech, music/songs, commercials, sound effects and background noise. Existing audio pattern retrieval techniques focus only on specific
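To make the brute-force baseline that the abstract contrasts against more concrete, here is a minimal sketch of classic Dynamic Time Warping in Python. It is an illustration only, not the method proposed in the thesis. It assumes each audio segment has already been turned into a sequence of feature frames (for example MFCC vectors, represented here as plain lists of floats), uses Euclidean local distance, and omits the slope and band constraints of Sakoe and Chiba's formulation. The function name `dtw_distance` and the toy data are hypothetical.

```python
import math
from typing import List, Sequence

# Assumption (not from the thesis): each frame is a pre-extracted feature
# vector, e.g. an MFCC frame, given as a sequence of floats.
Frame = Sequence[float]


def dtw_distance(x: Sequence[Frame], y: Sequence[Frame]) -> float:
    """Return the unconstrained DTW alignment cost between x and y.

    Local cost: Euclidean distance between frames.
    Allowed steps: (i-1, j), (i, j-1), (i-1, j-1), all with weight 1.
    No slope or band constraint is applied (a simplification).
    """
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")
    # acc[i][j] = minimal accumulated cost of aligning x[:i] with y[:j]
    acc: List[List[float]] = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(x[i - 1], y[j - 1])
            acc[i][j] = local + min(acc[i - 1][j],      # advance in x only
                                    acc[i][j - 1],      # advance in y only
                                    acc[i - 1][j - 1])  # advance in both
    return acc[n][m]


# Toy 1-D "feature" sequences (illustrative data, not from the thesis).
a = [[0.0], [1.0], [2.0], [3.0]]
b = [[0.0], [1.0], [1.0], [2.0], [3.0]]  # same contour, one frame stretched
c = [[3.0], [2.0], [1.0], [0.0]]         # reversed contour

print(dtw_distance(a, b))  # 0.0
print(dtw_distance(a, c))  # 8.0
```

Each comparison fills an n × m table, so comparing every pair of candidate segments in a long recording grows expensive quickly; this per-pair quadratic cost is one reason exhaustive search is described above as computationally costly.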

Citations

Book

Information retrieval

C. J. Van Rijsbergen

- 01 Jan 1975

TL;DR: The major change in the second edition of this book is the addition of a new chapter on probabilistic retrieval, which I think is one of the most interesting and active areas of research in information retrieval.

Book Chapter • DOI 10.1142/9789812778222_0004

On-line construction of suffix trees

Maxime Crochemore, +1 more

- 01 Sep 2002

References

Journal Article • DOI 10.1109/5.18626

A tutorial on hidden Markov models and selected applications in speech recognition

Lawrence R. Rabiner

- 01 Feb 1989

TL;DR: In this paper, the authors provide an overview of the basic theory of hidden Markov models (HMMs) as originated by L.E. Baum and T. Petrie (1966) and give practical details on methods of implementation of the theory along with a description of selected applications of HMMs to distinct problems in speech recognition.

Journal Article • DOI 10.1145/331499.331504

Data clustering: a review

Anil K. Jain, +2 more

- 01 Sep 1999

- ACM Computing Surveys

TL;DR: An overview of pattern clustering methods from a statistical pattern recognition perspective is presented, with a goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners.

Book

Modern Information Retrieval

Ricardo Baeza-Yates, +1 more

- 15 May 1999

TL;DR: A rigorous and complete textbook for a first course on information retrieval from the computer science (as opposed to a user-centred) perspective, providing an up-to-date, student-oriented treatment of the subject.

Journal Article • DOI 10.1109/TASSP.1978.1163055

Dynamic programming algorithm optimization for spoken word recognition

H. Sakoe, +1 more

- 01 Feb 1978

- IEEE Transactions on Acoustics, Speech, ...

TL;DR: This paper reports on an optimum dynamic programming (DP)-based time-normalization algorithm for spoken word recognition, in which the warping function slope is restricted so as to improve discrimination between words in different categories.

Journal Article • DOI 10.1126/SCIENCE.274.5294.1926

Statistical Learning by 8-Month-Old Infants

Jenny R. Saffran, +2 more

- 13 Dec 1996

- Science

TL;DR: The present study shows that a fundamental task of language acquisition, segmentation of words from fluent speech, can be accomplished by 8-month-old infants based solely on the statistical relationships between neighboring speech sounds.