Journal Article · DOI: 10.1109/TSA.2005.854103
Audio-based context recognition
Antti Eronen, V.T. Peltonen, J.T. Tuomi, Anssi Klapuri, Seppo Fagerlund, Timo Sorsa, Gaetan Lorho, Jyri Huopaniemi
484
TL;DR: This paper investigates the feasibility of an audio-based context recognition system and compares the system's accuracy with that of human listeners in the same task, with particular emphasis on the computational complexity of the methods.
Abstract: The aim of this paper is to investigate the feasibility of an audio-based context recognition system. Here, context recognition refers to the automatic classification of the context or environment around a device. A system is developed, and its accuracy is compared with that of human listeners in the same task. Particular emphasis is placed on the computational complexity of the methods, since the application is of particular interest for resource-constrained portable devices. Simplistic low-dimensional feature vectors are evaluated against more standard spectral features. Using discriminative training, competitive recognition accuracies are achieved with very low-order hidden Markov models (1-3 Gaussian components). A slight improvement in recognition accuracy is observed when linear data-driven feature transformations are applied to mel-cepstral features. The recognition rate of the system as a function of the test sequence length appears to converge only after about 30 to 60 s, although some degree of accuracy can be achieved even with test sequences shorter than 1 s. The average reaction time of the human listeners was 14 s, i.e., somewhat shorter than, but of the same order as, that of the system. In discriminating among 24 everyday contexts, the system achieved an average recognition accuracy of 58%, against 69% in the listening tests. The accuracies in recognizing six high-level classes were 82% for the system and 88% for the subjects.
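The recognizer idea summarized in the abstract, scoring a test sequence of feature frames against one low-order model per context and picking the best-scoring context, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each context is a single-state model with a diagonal-covariance Gaussian mixture (the paper reports 1-3 components), parameters are set by hand rather than trained (the paper uses discriminative training), frames are treated as independent, and the 2-D feature vectors, context names, and helpers (`DiagGaussian`, `gmm_frame_loglik`, `classify_sequence`) are hypothetical stand-ins for models over mel-cepstral features.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class DiagGaussian:
    """One mixture component with a diagonal covariance (hypothetical helper)."""
    weight: float       # mixture weight; the weights of one model should sum to 1
    mean: List[float]   # per-dimension mean
    var: List[float]    # per-dimension variance (diagonal of the covariance)

    def log_pdf(self, x: Sequence[float]) -> float:
        """Log density of frame x under this component, excluding the weight."""
        total = 0.0
        for xi, mi, vi in zip(x, self.mean, self.var):
            total += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        return total


def gmm_frame_loglik(components: List[DiagGaussian], x: Sequence[float]) -> float:
    """log p(x) under a Gaussian mixture, using log-sum-exp for stability."""
    terms = [math.log(c.weight) + c.log_pdf(x) for c in components]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def classify_sequence(models: Dict[str, List[DiagGaussian]],
                      frames: Sequence[Sequence[float]]) -> str:
    """Return the context whose model gives the highest total log-likelihood.

    Frames are assumed independent, so per-frame log-likelihoods are summed
    and a longer test sequence accumulates more evidence.
    """
    scores = {name: sum(gmm_frame_loglik(comps, x) for x in frames)
              for name, comps in models.items()}
    return max(scores, key=scores.get)


# Toy usage: hand-set 1- and 2-component models over made-up 2-D features.
models = {
    "street": [DiagGaussian(1.0, [0.0, 0.0], [1.0, 1.0])],
    "office": [DiagGaussian(0.5, [3.0, 3.0], [1.0, 1.0]),
               DiagGaussian(0.5, [4.0, 2.0], [0.5, 0.5])],
}
frames = [[3.1, 2.9], [2.8, 3.2], [0.2, -0.1]]
print(classify_sequence(models, frames))       # → office
print(classify_sequence(models, frames[2:]))   # → street
```

Because per-frame log-likelihoods are summed, the decision can change as evidence accumulates: in this toy data the last frame alone favors `street`, while the full three-frame sequence favors `office`. This only loosely mirrors the abstract's observation that accuracy depends on test sequence length; the real system's behavior depends on its trained models and features.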
Citations
Environmental Sound Recognition With Time–Frequency Audio Features
TL;DR: An empirical feature analysis for audio environment characterization is performed, and a matching pursuit algorithm is proposed for obtaining effective time-frequency features that yield higher recognition accuracy for environmental sounds.
TUT database for acoustic scene classification and sound event detection
Annamaria Mesaros, Toni Heittola, Tuomas Virtanen
- 01 Aug 2016
TL;DR: The recording and annotation procedure, the database content, a recommended cross-validation setup, and the performance of a supervised acoustic scene classification system and an event detection baseline system, both using mel-frequency cepstral coefficients and Gaussian mixture models, are presented.
667
Detection and Classification of Acoustic Scenes and Events
TL;DR: Reports on the state of the art in automatically classifying audio scenes and in automatically detecting and classifying audio events.
Ambient Sound Provides Supervision for Visual Learning
Andrew Owens, Jiajun Wu, Josh H. McDermott, William T. Freeman, Antonio Torralba
- 08 Oct 2016
TL;DR: This work trains a convolutional neural network to predict a statistical summary of the sound associated with a video frame, and shows that this representation is comparable to that of other state-of-the-art unsupervised learning methods.
Acoustic Scene Classification: Classifying environments from the sounds they produce
TL;DR: Gives an account of the state of the art in acoustic scene classification (ASC), the task of classifying environments from the sounds they produce, and describes a range of algorithms submitted to a data challenge intended to provide a general and fair benchmark for ASC techniques.
References
Book
Fundamentals of speech recognition
Lawrence R. Rabiner, Biing-Hwang Juang
- 01 Jan 1993
TL;DR: This book presents a meta-modelling framework for speech recognition that automates the labor-intensive, time-consuming, and therefore expensive process of manually modeling speech.
9.4K
Fast and robust fixed-point algorithms for independent component analysis
TL;DR: Using maximum entropy approximations of differential entropy, a family of new contrast (objective) functions for ICA is introduced; these enable both the estimation of the whole decomposition by minimizing mutual information and the estimation of individual independent components as projection pursuit directions.
Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences
S. Davis, Paul Mermelstein
TL;DR: In this article, several parametric representations of the acoustic signal were compared with regard to word recognition performance in a syllable-oriented continuous speech recognition system, and the emphasis was on the ability to retain phonetically significant acoustic information in the face of syntactic and duration variations.
5.3K
A Survey of Context-Aware Mobile Computing Research
Guanling Chen, David Kotz
- 01 Nov 2000
TL;DR: This survey of research on context-aware systems and applications looked in depth at the types of context used and models of context information, at systems that support collecting and disseminating context, and at applications that adapt to the changing context.
Construction and evaluation of a robust multifeature speech/music discriminator
Eric D. Scheirer, Malcolm Slaney
- 21 Apr 1997
TL;DR: A real-time computer system capable of distinguishing speech signals from music signals over a wide range of digital audio input is constructed, and extensive data on system performance, along with the cross-validated training/test setup used to evaluate the system, are provided.