Content-Based Multimedia Indexing

Conference Tools

Papers published on a yearly basis

Papers

Proceedings Article•10.1109/CBMI.2016.7500246•

Experimenting with musically motivated convolutional neural networks

Jordi Pons¹, Thomas Lidy², Xavier Serra¹•Institutions (2)

Pompeu Fabra University¹, Vienna University of Technology²

15 Jun 2016

TL;DR: This article explores various architectural choices of relevance for music signals classification tasks in order to start understanding what the chosen networks are learning and proposes several musically motivated architectures.

Abstract: A common criticism of deep learning relates to the difficulty of understanding the underlying relationships that neural networks learn, which makes them behave like black boxes. In this article we explore various architectural choices of relevance for music signal classification tasks in order to start understanding what the chosen networks are learning. We first discuss how convolutional filters with different shapes can fit specific musical concepts, and based on that we propose several musically motivated architectures. These architectures are then assessed by measuring the accuracy of the deep learning model in predicting various music classes using a known dataset of audio recordings of ballroom music. The classes in this dataset have a strong correlation with tempo, which allows assessing whether the proposed architectures are learning frequency and/or time dependencies. Additionally, a black-box model is proposed as a baseline for comparison. With these experiments we have been able to understand what some deep learning based algorithms can learn from a particular set of data.

177 citations

Proceedings Article•10.1109/CBMI.2018.8516489•

Automatic Detection of Patient with Respiratory Diseases Using Lung Sound Analysis

Gaetan Chambres¹, Pierre Hanna¹, Myriam Desainte-Catherine¹•Institutions (1)

LaBRI¹

4 Sep 2018

TL;DR: A new model is created at the patient level, which decides whether a patient sounds sick or not by taking as input the predicted results of the first classification model; it reaches 85% correct predictions and could be used as a tool to help doctors make better diagnoses.

Abstract: In modern medicine, every cardiac assessment or respiratory check-up includes an auscultation during which the medical specialist listens to sounds from the patient's body with different tools (stethoscope, sonography). This shows how important sound analysis is for heart and lung disease detection. During the ICBHI 2017 challenge, a database of 920 records acquired from 126 subjects was used to find a method that predicts whether or not a respiratory cycle contains adventitious sounds such as crackles, wheezes, or both. The team that submitted the best results reached around 50% correct detection. Using a machine learning approach with a boosted decision tree model and more audio features leads to the same results. A new approach consists of creating a new model at the patient level, which decides whether a patient sounds sick or not by taking as input the predicted results of the first classification model. This new model reaches 85% correct predictions and could be used as a tool to help doctors make better diagnoses.

147 citations

Proceedings Article•10.1109/CBMI.2007.385392•

Large-Scale Study of Chord Estimation Algorithms Based on Chroma Representation and HMM

Hélène Papadopoulos¹, Geoffroy Peeters¹•Institutions (1)

IRCAM¹

25 Jun 2007

TL;DR: This paper deals with the automatic estimation of chord progression over time of an audio file by taking into account music theory, perception of key and presence of higher harmonics of pitch notes.

Abstract: This paper deals with the automatic estimation of the chord progression over time of an audio file. From the audio signal, a set of chroma vectors representing the pitch content of the file over time is extracted. From these observations the chord progression is then estimated using hidden Markov models. Several methods are proposed that take into account music theory, perception of key, and the presence of higher harmonics of pitch notes. The proposed methods are then compared to existing algorithms. A large-scale evaluation on 110 hand-labeled songs by the Beatles shows an improvement over the state of the art.

106 citations

Proceedings Article•10.1109/CBMI.2018.8516556•

Ridiculously Fast Shot Boundary Detection with Fully Convolutional Neural Networks

Michael Gygli¹•Institutions (1)

Google¹

1 Sep 2018

TL;DR: In this article, the authors propose to learn shot boundary detection end-to-end, from pixels to final shot boundaries, using a CNN which is fully convolutional in time.

Abstract: Shot boundary detection (SBD) is an important component of many video analysis tasks, such as action recognition, video indexing, summarization and editing. Previous work typically used a combination of low-level features like color histograms, in conjunction with simple models such as SVMs, to predict shot changes. Instead, we propose to learn shot detection end-to-end, from pixels to final shot boundaries. For training such a model, we rely on our insight that all shot boundaries are generated. Thus, we create a dataset with one million frames and automatically generated transitions such as cuts, dissolves and fades. In order to efficiently analyze hours of video, we propose a Convolutional Neural Network (CNN) which is fully convolutional in time, thus allowing the use of a large temporal context without the need to repeatedly process frames. With this architecture our method obtains state-of-the-art results on the RAI dataset, while running at an unprecedented speed of more than 120x real-time.

96 citations

Proceedings Article•10.1109/CBMI.2012.6269851•

A presentation of the REPERE challenge

Juliette Kahn, Olivier Galibert, Ludovic Quintard, Matthieu Carré, Aude Giraudel¹, Philippe Joly•Institutions (1)

Direction générale de l'armement¹

27 Jun 2012

TL;DR: The REPERE corpus, a French video corpus with multimodal annotation, has been developed and the systems have to answer the following questions: Who is speaking? Who is present in the video? What names are cited?

Abstract: The REPERE Challenge aims to support research on people recognition in multimodal conditions. To assess technological progress, annual evaluation campaigns will be organized from 2012 to 2014. In this context the REPERE corpus, a French video corpus with multimodal annotation, has been developed. The systems have to answer the following questions: Who is speaking? Who is present in the video? What names are cited? What names are displayed? The challenge is to combine the various pieces of information coming from the speech and the images.

94 citations

No. of papers from the Conference in previous years
Year	Papers
2021	44
2019	43
2018	38
2017	38
2016	40
2015	39

Performance Metrics