Conference
ACM Multimedia
About: ACM Multimedia is an academic conference. The conference publishes majorly in the area(s): Computer science & Image retrieval. Over the lifetime, 7751 publications have been published by the conference receiving 253778 citations.
Papers published on a yearly basis
Papers
3 Nov 2014
TL;DR: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.
Abstract: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models. The framework is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures. Caffe fits industry and internet-scale media needs by CUDA GPU computation, processing over 40 million images a day on a single K40 or Titan GPU (approx 2 ms per image). By separating model representation from actual implementation, Caffe allows experimentation and seamless switching among platforms for ease of development and deployment from prototyping machines to cloud environments.Caffe is maintained and developed by the Berkeley Vision and Learning Center (BVLC) with the help of an active community of contributors on GitHub. It powers ongoing research projects, large-scale industrial applications, and startup prototypes in vision, speech, and multimedia.
14,915 citations
25 Oct 2010
TL;DR: VLFeat is an open and portable library of computer vision algorithms that includes rigorous implementations of common building blocks such as feature detectors, feature extractors, (hierarchical) k-means clustering, randomized kd-tree matching, and super-pixelization.
Abstract: VLFeat is an open and portable library of computer vision algorithms. It aims at facilitating fast prototyping and reproducible research for computer vision scientists and students. It includes rigorous implementations of common building blocks such as feature detectors, feature extractors, (hierarchical) k-means clustering, randomized kd-tree matching, and super-pixelization. The source code and interfaces are fully documented. The library integrates directly with MATLAB, a popular language for computer vision research.
3,578 citations
13 Oct 2015
TL;DR: MatConvNet exposes the building blocks of CNNs as easy-to-use MATLAB functions, providing routines for computing convolutions with filter banks, feature pooling, normalisation, and much more.
Abstract: MatConvNet is an open source implementation of Convolutional Neural Networks (CNNs) with a deep integration in the MATLAB environment. The toolbox is designed with an emphasis on simplicity and flexibility. It exposes the building blocks of CNNs as easy-to-use MATLAB functions, providing routines for computing convolutions with filter banks, feature pooling, normalisation, and much more. MatConvNet can be easily extended, often using only MATLAB code, allowing fast prototyping of new CNN architectures. At the same time, it supports efficient computation on CPU and GPU, allowing to train complex models on large datasets such as ImageNet ILSVRC containing millions of training examples
3,192 citations
25 Oct 2010
TL;DR: The openSMILE feature extraction toolkit is introduced, which unites feature extraction algorithms from the speech processing and the Music Information Retrieval communities and has a modular, component based architecture which makes extensions via plug-ins easy.
Abstract: We introduce the openSMILE feature extraction toolkit, which unites feature extraction algorithms from the speech processing and the Music Information Retrieval communities. Audio low-level descriptors such as CHROMA and CENS features, loudness, Mel-frequency cepstral coefficients, perceptual linear predictive cepstral coefficients, linear predictive coefficients, line spectral frequencies, fundamental frequency, and formant frequencies are supported. Delta regression and various statistical functionals can be applied to the low-level descriptors. openSMILE is implemented in C++ with no third-party dependencies for the core functionality. It is fast, runs on Unix and Windows platforms, and has a modular, component based architecture which makes extensions via plug-ins easy. It supports on-line incremental processing for all implemented features as well as off-line and batch processing. Numeric compatibility with future versions is ensured by means of unit tests. openSMILE can be downloaded from http://opensmile.sourceforge.net/.
3,006 citations
29 Sep 2007
TL;DR: This paper uses a bag of words approach to represent videos, and presents a method to discover relationships between spatio-temporal words in order to better describe the video data.
Abstract: In this paper we introduce a 3-dimensional (3D) SIFT descriptor for video or 3D imagery such as MRI data. We also show how this new descriptor is able to better represent the 3D nature of video data in the application of action recognition. This paper will show how 3D SIFT is able to outperform previously used description methods in an elegant and efficient manner. We use a bag of words approach to represent videos, and present a method to discover relationships between spatio-temporal words in order to better describe the video data.
1,948 citations
Performance Metrics
| Year | Papers |
|---|---|
| 2022 | 3 |
| 2021 | 760 |
| 2020 | 631 |
| 2019 | 484 |
| 2018 | 317 |
| 2017 | 365 |