Conference

ACM Multimedia

About: ACM Multimedia is an academic conference. It publishes mainly in the areas of computer science and image retrieval. Over its lifetime, the conference has published 7,751 papers, which have received 253,778 citations.

Topics: Computer science, Image retrieval, Video tracking, Feature (computer vision), Deep learning ...

Papers

Proceedings Article • DOI: 10.1145/2647868.2654889

Caffe: Convolutional Architecture for Fast Feature Embedding

[...]

Yangqing Jia¹, Evan Shelhamer², Jeff Donahue², Sergey Karayev², Jonathan Long², Ross Girshick², Sergio Guadarrama², Trevor Darrell² • Institutions (2)

Google¹, University of California, Berkeley²

3 Nov 2014

TL;DR: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.

Abstract: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models. The framework is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures. Caffe fits industry and internet-scale media needs through CUDA GPU computation, processing over 40 million images a day on a single K40 or Titan GPU (approx. 2 ms per image). By separating model representation from actual implementation, Caffe allows experimentation and seamless switching among platforms for ease of development and deployment from prototyping machines to cloud environments. Caffe is maintained and developed by the Berkeley Vision and Learning Center (BVLC) with the help of an active community of contributors on GitHub. It powers ongoing research projects, large-scale industrial applications, and startup prototypes in vision, speech, and multimedia.
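The two throughput figures in the abstract agree: a day has 86,400 seconds, so 40 million images a day leaves roughly 2.16 ms per image. A minimal check, assuming images are processed back to back around the clock (the function name is illustrative and not part of Caffe):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s


def ms_per_image(images_per_day: float) -> float:
    """Average milliseconds available per image at a given daily throughput.

    Assumes continuous, back-to-back processing for the full 24 hours.
    """
    return SECONDS_PER_DAY / images_per_day * 1000.0


print(f"{ms_per_image(40_000_000):.2f} ms per image")  # → 2.16 ms per image
```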

14,915 citations

Proceedings Article • DOI: 10.1145/1873951.1874249

VLFeat: an open and portable library of computer vision algorithms

[...]

Andrea Vedaldi¹, Brian Fulkerson² • Institutions (2)

University of Oxford¹, University of California, Los Angeles²

25 Oct 2010

TL;DR: VLFeat is an open and portable library of computer vision algorithms that includes rigorous implementations of common building blocks such as feature detectors, feature extractors, (hierarchical) k-means clustering, randomized kd-tree matching, and super-pixelization.

Abstract: VLFeat is an open and portable library of computer vision algorithms. It aims at facilitating fast prototyping and reproducible research for computer vision scientists and students. It includes rigorous implementations of common building blocks such as feature detectors, feature extractors, (hierarchical) k-means clustering, randomized kd-tree matching, and super-pixelization. The source code and interfaces are fully documented. The library integrates directly with MATLAB, a popular language for computer vision research.
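One of the building blocks listed above is k-means clustering. The sketch below is plain (flat) Lloyd k-means in pure Python, not VLFeat's hierarchical variant or its MATLAB/C API; initializing the centers with the first k points is a simplifying assumption.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _nearest(p: Sequence[float], centers: Sequence[Point]) -> int:
    """Index of the center closest to p in Euclidean distance."""
    return min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))


def kmeans(points: Sequence[Sequence[float]], k: int,
           iters: int = 100) -> Tuple[List[Point], List[int]]:
    """Plain Lloyd k-means; initial centers are the first k points (simplification)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach every point to its nearest center.
        new_labels = [_nearest(p, centers) for p in points]
        # Update step: move each center to the mean of its assigned points.
        new_centers: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centers.append(centers[c])  # empty cluster keeps its center
        converged = new_labels == labels and new_centers == centers
        labels, centers = new_labels, new_centers
        if converged:
            break
    return centers, labels


centers, labels = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)
print(labels)   # → [0, 0, 1, 1]
print(centers)  # → [(0.0, 0.5), (10.0, 10.5)]
```

A hierarchical version would apply the same routine recursively to the points of each cluster, producing a tree of centers.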

3,578 citations

Proceedings Article • DOI: 10.1145/2733373.2807412

MatConvNet: Convolutional Neural Networks for MATLAB

[...]

Andrea Vedaldi¹, Karel Lenc¹ • Institutions (1)

University of Oxford¹

13 Oct 2015

TL;DR: MatConvNet exposes the building blocks of CNNs as easy-to-use MATLAB functions, providing routines for computing convolutions with filter banks, feature pooling, normalisation, and much more.

Abstract: MatConvNet is an open-source implementation of Convolutional Neural Networks (CNNs) with a deep integration in the MATLAB environment. The toolbox is designed with an emphasis on simplicity and flexibility. It exposes the building blocks of CNNs as easy-to-use MATLAB functions, providing routines for computing convolutions with filter banks, feature pooling, normalisation, and much more. MatConvNet can be easily extended, often using only MATLAB code, allowing fast prototyping of new CNN architectures. At the same time, it supports efficient computation on CPU and GPU, making it possible to train complex models on large datasets such as ImageNet ILSVRC, which contains millions of training examples.
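To make "convolutions with filter banks" concrete, here is a minimal pure-Python sketch: each filter in the bank slides over a single-channel image and yields one output map. It is not MatConvNet's MATLAB API (whose convolution block is `vl_nnconv`); as in most CNN code, it computes cross-correlation without flipping the kernel, uses "valid" padding, and handles only one input channel, all of which are simplifying assumptions.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2-D cross-correlation (no kernel flip), as CNN layers usually compute."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + u][j + v] * kernel[u][v]
                for u in range(kh) for v in range(kw))
            for j in range(ow)
        ]
        for i in range(oh)
    ]


def filter_bank(image: Matrix, kernels: List[Matrix],
                biases: List[float]) -> List[Matrix]:
    """Apply every filter in the bank (plus its bias): one output map per filter."""
    return [
        [[x + b for x in row] for row in conv2d_valid(image, k)]
        for k, b in zip(kernels, biases)
    ]


img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
bank = [[[1, 0], [0, -1]],   # diagonal difference
        [[1, 1], [1, 1]]]    # 2x2 box sum
print(filter_bank(img, bank, [0, 0]))
# → [[[-4, -4], [-4, -4]], [[12, 16], [24, 28]]]
```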

3,192 citations

Proceedings Article • DOI: 10.1145/1873951.1874246

openSMILE: the Munich versatile and fast open-source audio feature extractor

[...]

Florian Eyben¹, Martin Wöllmer¹, Björn Schuller¹ • Institutions (1)

Technische Universität München¹

25 Oct 2010

TL;DR: The openSMILE feature extraction toolkit is introduced, which unites feature extraction algorithms from the speech processing and the Music Information Retrieval communities and has a modular, component-based architecture which makes extensions via plug-ins easy.

Abstract: We introduce the openSMILE feature extraction toolkit, which unites feature extraction algorithms from the speech processing and the Music Information Retrieval communities. Audio low-level descriptors such as CHROMA and CENS features, loudness, Mel-frequency cepstral coefficients, perceptual linear predictive cepstral coefficients, linear predictive coefficients, line spectral frequencies, fundamental frequency, and formant frequencies are supported. Delta regression and various statistical functionals can be applied to the low-level descriptors. openSMILE is implemented in C++ with no third-party dependencies for the core functionality. It is fast, runs on Unix and Windows platforms, and has a modular, component-based architecture which makes extensions via plug-ins easy. It supports on-line incremental processing for all implemented features as well as off-line and batch processing. Numeric compatibility with future versions is ensured by means of unit tests. openSMILE can be downloaded from http://opensmile.sourceforge.net/.
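Delta regression estimates the local slope of a low-level descriptor contour. The sketch below uses the standard regression formula (the HTK-style delta), d_t = Σ_{n=1..W} n(c_{t+n} − c_{t−n}) / (2 Σ_{n=1..W} n²). The window W = 2 and repeating the edge frames at the boundaries are assumptions here, not settings taken from openSMILE's configuration, and the code is plain Python rather than openSMILE's C++ component.

```python
from typing import List, Sequence


def delta_regression(c: Sequence[float], window: int = 2) -> List[float]:
    """Delta coefficients of a 1-D descriptor contour via linear regression.

    d_t = sum_{n=1..W} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..W} n^2),
    with frames beyond either end replaced by the nearest edge frame.
    """
    if not c:
        return []
    last = len(c) - 1
    denom = 2 * sum(n * n for n in range(1, window + 1))

    def at(i: int) -> float:
        # Clamp the index so out-of-range frames repeat the boundary value.
        return c[min(max(i, 0), last)]

    return [
        sum(n * (at(t + n) - at(t - n)) for n in range(1, window + 1)) / denom
        for t in range(len(c))
    ]


print(delta_regression([0, 1, 2, 3, 4]))  # → [0.5, 0.8, 1.0, 0.8, 0.5]
```

On a linear ramp with slope 1, interior frames recover the slope exactly, while frames near the edges are damped by the boundary padding.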

3,006 citations

Proceedings Article • DOI: 10.1145/1291233.1291311

A 3-dimensional SIFT descriptor and its application to action recognition

[...]

Paul Scovanner¹, Saad Ali¹, Mubarak Shah¹ • Institutions (1)

University of Central Florida¹

29 Sep 2007

TL;DR: This paper uses a bag-of-words approach to represent videos, and presents a method to discover relationships between spatio-temporal words in order to better describe the video data.

Abstract: In this paper, we introduce a 3-dimensional (3D) SIFT descriptor for video or 3D imagery such as MRI data. We also show how this new descriptor is able to better represent the 3D nature of video data in the application of action recognition. This paper shows how 3D SIFT is able to outperform previously used description methods in an elegant and efficient manner. We use a bag-of-words approach to represent videos, and present a method to discover relationships between spatio-temporal words in order to better describe the video data.
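The bag-of-words representation mentioned above turns a variable number of local descriptors into one fixed-length vector per video: each descriptor is assigned to its nearest "visual word" in a codebook, and the word counts are normalized into a histogram. A minimal sketch, assuming a codebook is already given (commonly learned by k-means) and using toy 2-D vectors in place of real 3D SIFT descriptors; it does not cover the paper's method for relating spatio-temporal words.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def bag_of_words(descriptors: Sequence[Vector],
                 codebook: Sequence[Vector]) -> List[float]:
    """Normalized histogram of nearest-codeword assignments for one video."""
    counts = [0] * len(codebook)
    for d in descriptors:
        # Quantize the descriptor to its closest visual word (Euclidean distance).
        word = min(range(len(codebook)), key=lambda i: math.dist(d, codebook[i]))
        counts[word] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)


codebook = [(0, 0), (1, 1), (5, 5)]
video = [(0.1, 0), (0.9, 1.2), (1, 0.8), (4, 5)]
print(bag_of_words(video, codebook))  # → [0.25, 0.5, 0.25]
```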

1,948 citations

...

Performance Metrics

Papers: 7,751

Citations: 57,212

Number of papers from the conference in previous years
Year	Papers
2022	3
2021	760
2020	631
2019	484
2018	317
2017	365