Generating Expressive Summaries for Speech and Musical Audio using Self-Similarity Clues
Mustafa Sert, Buyurman Baykal, Adnan Yazici +2 more
- 09 Jul 2006
- pp 941-944
TL;DR: A novel algorithm for structural analysis of audio to detect repetitive patterns that are suitable for content-based audio information retrieval systems, since repetitive patterns can provide valuable information about the content of audio, such as a chorus or a concept.
Abstract: We present a novel algorithm for structural analysis of audio to detect repetitive patterns that are suitable for content-based audio information retrieval systems, since repetitive patterns can provide valuable information about the content of audio, such as a chorus or a concept. The Audio Spectrum Flatness (ASF) feature of the MPEG-7 standard, although it has not been considered as much as other feature types, is utilized and evaluated as the underlying feature set. Expressive summaries are chosen as the longest patterns by the k-means clustering algorithm. The proposed approach is evaluated on a test bed consisting of popular song and speech clips based on the ASF feature. The well-known Mel Frequency Cepstral Coefficients (MFCCs) are also considered in the experiments for the evaluation of features. Experiments show that all the repetitive patterns and their locations are obtained with an accuracy of 93% and 78% for music and speech, respectively.
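A spectral flatness feature of the kind the abstract relies on can be sketched as the ratio of the geometric mean to the arithmetic mean of the power spectrum inside each frequency band. The sketch below is a simplification and not the authors' implementation: it uses a naive DFT, equal-width bands, and a hypothetical `n_bands` parameter, whereas the MPEG-7 descriptor defines its own band layout.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive O(n^2) DFT power spectrum for bins 0 .. n/2 of a real frame."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_flatness(power, eps=1e-12):
    """Geometric mean divided by arithmetic mean; close to 1 for a flat band."""
    vals = [p + eps for p in power]  # eps avoids log(0) on empty bins
    log_mean = sum(math.log(v) for v in vals) / len(vals)
    return math.exp(log_mean) / (sum(vals) / len(vals))


def band_flatness(frame, n_bands=4):
    """Flatness per band; equal-width bands are an assumption of this sketch."""
    power = power_spectrum(frame)[1:]  # drop the DC bin
    size = len(power) // n_bands
    return [spectral_flatness(power[b * size:(b + 1) * size])
            for b in range(n_bands)]
```

Applied frame by frame, `band_flatness` yields one small feature vector per frame: a band holding a single strong tone scores near 0, while a noise-like band scores near 1.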
Citations
A Robust and Time-Efficient Fingerprinting Model for Musical Audio
Mustafa Sert, Buyurman Baykal, Adnan Yazici +2 more
- 11 Sep 2006
TL;DR: This work makes use of the audio spectrum flatness (ASF) and audio signature (AS) features of the MPEG-7 standard, which are new to the audio feature family and have not been considered as much as other feature types.
13
Patent
A system and method for streaming music repair and error concealment
Jonathan Doherty, Kevin Curran, P McKevitt +2 more
- 20 May 2010
TL;DR: A method for analyzing the self-similarity of audio data is presented. It involves obtaining the audio spectrum envelope data of an audio file to be analyzed; performing a clustering operation on the spectrum envelope data to produce a clustered set of data; for a first portion of the clustered data, performing a string-matching operation against at least one other portion of the clustered data; and, based on the results of the string-matching operation, determining the portion of the clustered data most similar to the first portion.
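The string-matching step in this summary can be illustrated with a small sketch. It assumes the clustering has already been run, so each frame is represented by a cluster label, and it uses a plain mismatch count (Hamming distance) as the matching criterion. Both the function name `most_similar_portion` and the choice of distance are assumptions made for illustration, not details taken from the patent.

```python
def most_similar_portion(labels, start, length):
    """Find the window most similar to labels[start:start + length].

    Returns (offset, mismatches) for the best non-overlapping window,
    or None when no such window exists.
    """
    query = labels[start:start + length]
    best = None
    for off in range(len(labels) - length + 1):
        if abs(off - start) < length:
            continue  # skip windows that overlap the query itself
        window = labels[off:off + length]
        mismatches = sum(a != b for a, b in zip(query, window))
        if best is None or mismatches < best[1]:
            best = (off, mismatches)
    return best
```

For example, calling `most_similar_portion(list("abcdxxabcexx"), 0, 4)` compares the first four labels against every later window of the same length and reports the closest one.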
Combining Structural Analysis and Computer Vision Techniques for Automatic Speech Summarization
Mustafa Sert, Buyurman Baykal, Adnan Yazici +2 more
- 15 Dec 2008
TL;DR: This work transforms a 1-D time-domain speech signal into a 2-D image representation, namely a (dis)similarity matrix, and detects possible repetitions within the matrix using computer vision techniques. The approach can be generalized as a speech-to-speech summarization method, in which summarization results are presented as speech instead of text.
8
Audiotory Movie Summarization by Detecting Scene Changes and Sound Events
Tong Lu, Weng Yangbing, Gongyou Wang +2 more
- 24 Aug 2014
TL;DR: A novel movie audio summarization framework is presented, which consists of three processing levels, namely, low-level audio feature extraction, mid-level audio event detection, and high-level auditory movie summarization.
3
Structural and semantic modeling of audio for content-based querying and browsing
Mustafa Sert, Buyurman Baykal, Adnan Yazici +2 more
- 07 Jun 2006
TL;DR: In this article, the authors integrate the three aspects of content-based audio management into a single framework and propose an efficient method for flexible querying and browsing of auditory data, where the clients can express their queries in the form of point, range and k-nearest neighbor, which are particularly significant in the multimedia domain.
2
References
•Journal Article
ISO/IEC JTC 1/SC 29
TL;DR: Information technology: international ordering and comparison of character strings, and description of the common, adaptable ordering template. AMENDMENT 1.
Summarizing popular music via structural similarity analysis
Matthew Cooper, Jonathan Foote +1 more
- 19 Oct 2003
TL;DR: A framework for summarizing digital media based on structural analysis is presented. It characterizes the repetitive structure in popular music and builds summaries by combining segments that represent the clusters most frequently repeated throughout the piece.
A chorus-section detecting method for musical audio signals
Masataka Goto
- 06 Apr 2003
TL;DR: This method, called RefraiD, can detect all the chorus sections in a song and estimate both ends of each section and can also detect modulated chorus sections by introducing a similarity that enables modulated repetition to be judged correctly.
Visualizing music and audio using self-similarity
Jonathan Foote
- 30 Oct 1999
TL;DR: The acoustic similarity between any two instants of an audio recording is displayed in a 2D representation, allowing identification of structural and rhythmic characteristics, as well as tempo and structure extraction.
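The self-similarity representation described here can be sketched as a matrix whose entry (i, j) is the similarity between the feature vectors of frames i and j. The sketch below assumes cosine similarity as the measure and adds a hypothetical helper, `lag_scores`, that averages each off-diagonal of the matrix. A repeated section shows up as a stripe parallel to the main diagonal, so its lag receives a high average.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def self_similarity(features):
    """Square matrix of pairwise frame similarities."""
    return [[cosine(u, v) for v in features] for u in features]


def lag_scores(sim):
    """Mean similarity along each off-diagonal, keyed by lag in frames."""
    n = len(sim)
    return {lag: sum(sim[i][i + lag] for i in range(n - lag)) / (n - lag)
            for lag in range(1, n)}


# Toy sequence of frames with the pattern A B C A B C.
a, b, c = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
scores = lag_scores(self_similarity([a, b, c, a, b, c]))
best_lag = max(scores, key=scores.get)
```

In practice the rows of `features` would be per-frame descriptors such as MFCC or spectral flatness vectors, and the stripes would be noisier than in this toy case.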