Action unit detection with segment-based SVMs

doi:10.1109/CVPR.2010.5539998

Proceedings Article•10.1109/CVPR.2010.5539998

Tomas Simon, +3 more

- 13 Jun 2010

- pp 2737-2744

100

TL;DR: The proposed kSeg-SVM finds the best k-or-fewer segments that maximize the SVM score, and experimental results suggest that it outperforms state-of-the-art static methods for AU detection.

Abstract: Automatic facial action unit (AU) detection from video is a long-standing problem in computer vision. Two main approaches have been pursued: (1) static modeling, typically posed as a discriminative classification problem in which each video frame is evaluated independently; and (2) temporal modeling, in which frames are segmented into sequences and typically modeled with a variant of dynamic Bayesian networks. We propose a segment-based approach, kSeg-SVM, that incorporates the benefits of both approaches and avoids their limitations. kSeg-SVM is a temporal extension of the spatial bag-of-words (BoW) model. It is trained within a structured output SVM framework that formulates AU detection as a problem of detecting temporal events in a time series of visual features. Each segment is modeled by a variant of the BoW representation with soft assignment of the words based on similarity. Our framework has several benefits for AU detection: (1) both dependencies between features and the length of action units are modeled; (2) all possible segments of the video may be used for training; and (3) no assumptions are required about the underlying structure of the action unit events (e.g., i.i.d.). Our algorithm finds the best k-or-fewer segments that maximize the SVM score. Experimental results suggest that the proposed method outperforms state-of-the-art static methods for AU detection.
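The abstract does not spell out how the best k-or-fewer segments are found, so the following is only an illustrative sketch, not the authors' implementation. It assumes that a segment's score is the sum of per-frame scores (which would hold, for example, for a linear scorer applied to an unnormalized histogram) and uses a standard dynamic program over frames. The function name `best_k_segments` and the toy scores are hypothetical.

```python
from typing import List, Sequence, Tuple


def best_k_segments(
    scores: Sequence[float], k: int
) -> Tuple[float, List[Tuple[int, int]]]:
    """Pick at most k disjoint contiguous segments with maximal total score.

    ASSUMPTION (not from the paper): a segment's score is the sum of the
    per-frame scores it covers. Segments are returned as half-open
    (start, stop) frame index pairs.
    """
    n = len(scores)
    neg_inf = float("-inf")
    # best[j][i]: best total using at most j segments within frames [0, i).
    best = [[0.0] * (n + 1) for _ in range(k + 1)]
    # end[j][i]: same, but the last segment must end exactly at frame i - 1.
    end = [[neg_inf] * (n + 1) for _ in range(k + 1)]

    for j in range(1, k + 1):
        for i in range(1, n + 1):
            # Either extend the current segment or open a new one at i - 1.
            end[j][i] = scores[i - 1] + max(end[j][i - 1], best[j - 1][i - 1])
            # Either leave frame i - 1 uncovered or end a segment there.
            best[j][i] = max(best[j][i - 1], end[j][i])

    # Backtrack through the tables to recover the chosen segments.
    segments: List[Tuple[int, int]] = []
    i, j = n, k
    while i > 0 and j > 0:
        if best[j][i] == best[j][i - 1]:
            i -= 1  # frame i - 1 is not covered
            continue
        stop = i
        while True:
            extended = end[j][i - 1] >= best[j - 1][i - 1]
            i -= 1
            if not extended:
                j -= 1  # the segment started at frame i
                break
        segments.append((i, stop))
    segments.reverse()
    return best[k][n], segments


if __name__ == "__main__":
    toy_scores = [1, -2, 3, 4, -1, 2, -5, 6]  # hypothetical per-frame scores
    print(best_k_segments(toy_scores, k=2))
```

Because the tables allow "at most" k segments, a sequence whose frames all score negatively yields no detections at all, which mirrors the k-or-fewer behavior described in the abstract. The sketch runs in O(kn) time for n frames.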

Citations

•Journal Article•10.1073/PNAS.1322355111

Compound facial expressions of emotion.

Shichuan Du, +2 more

- 15 Apr 2014

- Proceedings of the National Academy of S...

TL;DR: A computational model of face perception is used to demonstrate that most of these categories are also visually discriminable from one another, suggesting that a larger number of categories is used by humans.

855

•Journal Article•10.1109/TPAMI.2014.2366127

Automatic Analysis of Facial Affect: A Survey of Registration, Representation, and Recognition

Evangelos Sariyanidi, +2 more

- 01 Jun 2015

- IEEE Transactions on Pattern Analysis an...

TL;DR: This paper provides a comprehensive analysis of facial representations by uncovering their advantages and limitations, and elaborates on the type of information they encode and how they deal with the key challenges of illumination variations, registration errors, head-pose variations, occlusions, and identity bias.

760

•Proceedings Article•10.1109/CVPR.2016.600

EmotioNet: An Accurate, Real-Time Algorithm for the Automatic Annotation of a Million Facial Expressions in the Wild

C. Fabian Benitez-Quiroz, +2 more

- 01 Jun 2016

TL;DR: A novel computer vision algorithm is presented to annotate a large database of one million images of facial expressions of emotion in the wild that can be readily queried using semantic descriptions for applications in computer vision, affective computing, social and cognitive psychology and neuroscience.

704

•Journal Article•10.1109/TPAMI.2015.2430335

Anticipating Human Activities Using Object Affordances for Reactive Robotic Response

Hema Swetha Koppula, +1 more

- 01 Jan 2016

- IEEE Transactions on Pattern Analysis an...

TL;DR: This work represents each possible future using an anticipatory temporal conditional random field (ATCRF) that models the rich spatial-temporal relations through object affordances; each ATCRF is treated as a particle, and the distribution over the potential futures is represented by a set of particles.

704

•Proceedings Article•10.15607/RSS.2013.IX.006

Anticipating Human Activities using Object Affordances for Reactive Robotic Response

Hema Swetha Koppula, +1 more

- 23 Jun 2013

TL;DR: In this article, an anticipatory temporal conditional random field (ATCRF) is proposed to model the spatial-temporal relations through object affordances, where each ATCRF is treated as a particle and the distribution over the potential futures is represented by a set of particles.

558
