Conference
Content-Based Multimedia Indexing
About: Content-Based Multimedia Indexing is an academic conference. The conference publishes mainly in the areas of image retrieval and feature extraction. Over its lifetime, the conference has published 654 papers, which have received 5990 citations.
Papers
15 Jun 2016
TL;DR: This article explores various architectural choices of relevance for music signals classification tasks in order to start understanding what the chosen networks are learning and proposes several musically motivated architectures.
Abstract: A common criticism of deep learning relates to the difficulty of understanding the underlying relationships that neural networks learn, which makes them behave like a black box. In this article we explore various architectural choices of relevance for music signal classification tasks in order to start understanding what the chosen networks are learning. We first discuss how convolutional filters with different shapes can fit specific musical concepts, and based on that we propose several musically motivated architectures. These architectures are then assessed by measuring the accuracy of the deep learning model in predicting various music classes, using a known dataset of audio recordings of ballroom music. The classes in this dataset correlate strongly with tempo, which allows assessing whether the proposed architectures are learning frequency and/or time dependencies. Additionally, a black-box model is proposed as a baseline for comparison. With these experiments we have been able to understand what some deep learning based algorithms can learn from a particular set of data.
177 citations
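The filter-shape idea in this abstract can be illustrated with a minimal sketch in pure Python. It assumes a spectrogram stored as a list of rows (frequency bins) by columns (time frames); the function name `conv2d_valid`, the toy data, and the two kernels are hypothetical and are not taken from the paper. A wide 1×n filter spans only time, while a tall m×1 filter spans only frequency.

```python
def conv2d_valid(spec, kernel):
    """Valid 2D cross-correlation of a spectrogram with a kernel (lists of lists)."""
    n_freq, n_time = len(spec), len(spec[0])
    k_freq, k_time = len(kernel), len(kernel[0])
    out = []
    for f in range(n_freq - k_freq + 1):
        row = []
        for t in range(n_time - k_time + 1):
            acc = 0.0
            for i in range(k_freq):
                for j in range(k_time):
                    acc += spec[f + i][t + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


# Toy spectrogram (assumption): 4 frequency bins x 6 time frames,
# with an energy pulse on every other frame in bin 1.
spec = [
    [0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

temporal_filter = [[1, 0, 1]]        # 1 x 3: sees three frames of a single bin
frequency_filter = [[1], [1], [1]]   # 3 x 1: sees three bins of a single frame

temporal_out = conv2d_valid(spec, temporal_filter)    # 4 rows x 4 columns
frequency_out = conv2d_valid(spec, frequency_filter)  # 2 rows x 6 columns
```

In this toy case the temporal filter responds strongly wherever pulses are two frames apart, which is the kind of periodicity a tempo-correlated class would expose, while the frequency filter only sums energy across neighboring bins within one frame.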
4 Sep 2018
TL;DR: A new model is created at the patient level that decides whether a patient sounds sick or not, taking as input the predicted results of the first classification model. It reaches 85% correct predictions and could be used as a tool to help doctors make better diagnoses.
Abstract: In modern medicine, every cardiac assessment or respiratory check-up includes an auscultation during which the medical specialist listens to sounds from the patient's body with different tools (stethoscope, sonography). This shows how important sound analysis is for heart and lung disease detection. During the ICBHI 2017 challenge, a database of 920 records acquired from 126 subjects was used to find a method that predicts whether or not a respiratory cycle contains adventitious sounds such as crackles, wheezes, or both. The team that submitted the best results reached around 50% correct detection. Using a machine learning approach with a boosted decision tree model and more audio features leads to the same results. A new approach consists of creating a new model at the patient level that decides whether a patient sounds sick or not, taking as input the predicted results of the first classification model. This new model reaches 85% correct predictions and could be used as a tool to help doctors make better diagnoses.
147 citations
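The two-stage idea described in this abstract, where per-cycle predictions feed a patient-level decision, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the patient-level model combines the cycle predictions, so the fraction-of-abnormal-cycles feature, the fixed `threshold`, and all names here are hypothetical stand-ins for what is presumably a learned model.

```python
from collections import defaultdict


def patient_features(cycle_predictions):
    """Return, per patient, the fraction of cycles predicted as abnormal.

    cycle_predictions: iterable of (patient_id, predicted_label) pairs, where
    the label is "normal", "crackle", "wheeze" or "both" (assumed label set).
    """
    counts = defaultdict(lambda: [0, 0])  # patient_id -> [abnormal, total]
    for patient_id, label in cycle_predictions:
        counts[patient_id][1] += 1
        if label != "normal":
            counts[patient_id][0] += 1
    return {pid: abnormal / total for pid, (abnormal, total) in counts.items()}


def classify_patients(cycle_predictions, threshold=0.5):
    """Hypothetical patient-level rule: 'sick' if enough cycles look abnormal."""
    features = patient_features(cycle_predictions)
    return {
        pid: ("sick" if fraction >= threshold else "healthy")
        for pid, fraction in features.items()
    }


# Toy first-stage output (made up for illustration, not challenge data).
predictions = [
    ("p1", "crackle"), ("p1", "normal"), ("p1", "wheeze"),
    ("p2", "normal"), ("p2", "normal"), ("p2", "crackle"),
]
decisions = classify_patients(predictions)
```

Aggregating over many cycles makes the patient-level decision more tolerant of individual cycle errors, which is one plausible reason a patient-level score can be much higher than the per-cycle score.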
25 Jun 2007
TL;DR: This paper deals with the automatic estimation of chord progression over time of an audio file by taking into account music theory, perception of key and presence of higher harmonics of pitch notes.
Abstract: This paper deals with the automatic estimation of chord progression over time of an audio file. From the audio signal, a set of chroma vectors representing the pitch content of the file over time is extracted. From these observations the chord progression is then estimated using hidden Markov models. Several methods are proposed that take into account music theory, perception of key and presence of higher harmonics of pitch notes. The proposed methods are then compared to existing algorithms. A large-scale evaluation on 110 hand-labeled songs by the Beatles shows an improvement over the state of the art.
106 citations
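The pipeline in this abstract (chroma vectors decoded into chords with a hidden Markov model) can be sketched with a small Viterbi decoder. This is a simplified illustration, not the paper's method: the binary triad templates, the cosine-similarity emission score, the uniform prior, and the single `p_stay` self-transition probability are all assumptions, and the paper's music-theory, key, and harmonics modeling is not reproduced.

```python
import math

NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def chord_templates():
    """Binary 12-bin templates for the 24 major and minor triads."""
    templates = {}
    for root in range(12):
        for name, third in (("maj", 4), ("min", 3)):
            vec = [0.0] * 12
            for interval in (0, third, 7):
                vec[(root + interval) % 12] = 1.0
            templates[NOTES[root] + ":" + name] = vec
    return templates


def log_emission(chroma, template, eps=1e-6):
    """Log of the cosine similarity between a chroma frame and a template."""
    dot = sum(c * t for c, t in zip(chroma, template))
    norm = math.sqrt(sum(c * c for c in chroma)) * math.sqrt(sum(t * t for t in template))
    return math.log(eps + (dot / norm if norm > 0 else 0.0))


def viterbi_chords(chroma_frames, p_stay=0.9):
    """Most likely chord label per frame under a sticky-transition HMM."""
    templates = chord_templates()
    states = list(templates)
    n = len(states)
    log_stay = math.log(p_stay)
    log_move = math.log((1.0 - p_stay) / (n - 1))

    # Uniform prior over chords for the first frame.
    score = [math.log(1.0 / n) + log_emission(chroma_frames[0], templates[s]) for s in states]
    backpointers = []
    for frame in chroma_frames[1:]:
        new_score, pointers = [], []
        for j, state in enumerate(states):
            best_i = max(range(n), key=lambda i: score[i] + (log_stay if i == j else log_move))
            transition = log_stay if best_i == j else log_move
            new_score.append(score[best_i] + transition + log_emission(frame, templates[state]))
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)

    # Backtrack from the best final state.
    current = max(range(n), key=lambda i: score[i])
    path = [current]
    for pointers in reversed(backpointers):
        current = pointers[current]
        path.append(current)
    path.reverse()
    return [states[i] for i in path]


# Toy input (assumption): 8 frames of an ideal C major chroma, then 8 of G major.
templates = chord_templates()
frames = [templates["C:maj"]] * 8 + [templates["G:maj"]] * 8
chords = viterbi_chords(frames)
```

The sticky transition matrix is what separates HMM decoding from frame-by-frame template matching: a chord change is accepted only when enough consecutive frames support it, so isolated noisy frames do not fragment the progression.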
1 Sep 2018
TL;DR: In this article, the authors propose to learn shot boundary detection end-to-end, from pixels to final shot boundaries, using a CNN which is fully convolutional in time.
Abstract: Shot boundary detection (SBD) is an important component of many video analysis tasks, such as action recognition, video indexing, summarization and editing. Previous work typically used a combination of low-level features like color histograms, in conjunction with simple models such as SVMs to predict shot changes. Instead, we propose to learn shot detection end-to-end, from pixels to final shot boundaries. For training such a model, we rely on our insight that all shot boundaries are generated. Thus, we create a dataset with one million frames and automatically generated transitions such as cuts, dissolves and fades. In order to efficiently analyze hours of video, we propose a Convolutional Neural Network (CNN) which is fully convolutional in time, thus allowing the use of a large temporal context without the need to repeatedly process frames. With this architecture our method obtains state-of-the-art results on the RAI dataset, while running at an unprecedented speed of more than 120x real-time.
96 citations
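The efficiency argument in this abstract, that a network which is fully convolutional in time avoids reprocessing frames, can be illustrated with a toy two-layer temporal network in pure Python. Everything here is an assumption for illustration: the paper works from pixels with a CNN, whereas this sketch uses one scalar feature per frame, hand-picked kernels, and hypothetical function names.

```python
def conv1d(xs, kernel):
    """Valid 1D cross-correlation along the time axis."""
    k = len(kernel)
    return [sum(xs[t + i] * kernel[i] for i in range(k)) for t in range(len(xs) - k + 1)]


def two_layer(xs, k1, k2):
    """Temporal convolution, ReLU, then a second temporal convolution."""
    hidden = [max(0.0, h) for h in conv1d(xs, k1)]
    return conv1d(hidden, k2)


def fully_convolutional(frames, k1, k2):
    """Run the network once over the whole sequence; hidden activations are shared."""
    return two_layer(frames, k1, k2)


def sliding_window(frames, k1, k2):
    """Baseline: cut one window per output and rerun the network on each window."""
    receptive_field = len(k1) + len(k2) - 1
    return [
        two_layer(frames[start:start + receptive_field], k1, k2)[0]
        for start in range(len(frames) - receptive_field + 1)
    ]


# Toy per-frame feature (assumption): a jump at frame 3 stands in for a cut.
frames = [0, 0, 0, 5, 5, 5, 5, 5]
k1 = [-1, 1]      # responds to a change between consecutive frames
k2 = [1, 1, 1]    # pools the change response over three time steps

full = fully_convolutional(frames, k1, k2)
windowed = sliding_window(frames, k1, k2)
```

Both functions return the same scores, but the sliding-window version recomputes each hidden activation up to `len(k2)` times, while the fully convolutional version computes it once. That shared computation is what lets a large temporal context be used cheaply on long videos.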
27 Jun 2012
TL;DR: The REPERE corpus, a French video corpus with multimodal annotation, has been developed. The systems have to answer the following questions: Who is speaking? Who is present in the video? What names are cited?
Abstract: The REPERE Challenge aims to support research on people recognition in multimodal conditions. To assess technological progress, annual evaluation campaigns will be organized from 2012 to 2014. In this context the REPERE corpus, a French video corpus with multimodal annotation, has been developed. The systems have to answer the following questions: Who is speaking? Who is present in the video? What names are cited? What names are displayed? The challenge is to combine the various pieces of information coming from the speech and the images.
94 citations
Performance Metrics
| Year | Papers |
|---|---|
| 2021 | 44 |
| 2019 | 43 |
| 2018 | 38 |
| 2017 | 38 |
| 2016 | 40 |
| 2015 | 39 |