Proceedings Article
DOI: 10.1109/CVPR.2010.5539998
Action unit detection with segment-based SVMs
Tomas Simon, Minh Hoai Nguyen, Fernando De la Torre, Jeffrey F. Cohn
- 13 Jun 2010
- pp 2737-2744
TL;DR: The proposed method, kSeg-SVM, finds the best k-or-fewer segments that maximize the SVM score; experimental results suggest that it outperforms state-of-the-art static methods for AU detection.
Abstract: Automatic facial action unit (AU) detection from video is a long-standing problem in computer vision. Two main approaches have been pursued: (1) static modeling, typically posed as a discriminative classification problem in which each video frame is evaluated independently; (2) temporal modeling, in which frames are segmented into sequences and typically modeled with a variant of dynamic Bayesian networks. We propose a segment-based approach, kSeg-SVM, that incorporates benefits of both approaches and avoids their limitations. kSeg-SVM is a temporal extension of the spatial bag-of-words (BoW) representation. kSeg-SVM is trained within a structured output SVM framework that formulates AU detection as a problem of detecting temporal events in a time series of visual features. Each segment is modeled by a variant of the BoW representation with soft assignment of the words based on similarity. Our framework has several benefits for AU detection: (1) both dependencies between features and the length of action units are modeled; (2) all possible segments of the video may be used for training; and (3) no assumptions are required about the underlying structure of the action unit events (e.g., i.i.d.). Our algorithm finds the best k-or-fewer segments that maximize the SVM score. Experimental results suggest that the proposed method outperforms state-of-the-art static methods for AU detection.
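The search for the best k-or-fewer segments can be illustrated with a small dynamic program. The sketch below is not the authors' implementation. It assumes that a segment's SVM score decomposes into a sum of per-frame scores, which the abstract does not state, and the function name `best_k_or_fewer_segments` and the toy scores are hypothetical.

```python
def best_k_or_fewer_segments(scores, k):
    """Pick at most k disjoint contiguous segments with maximum total score.

    ASSUMPTION (not from the abstract): the score of a segment is the sum
    of the per-frame scores it covers. Returns (total, segments), where
    each segment is an inclusive (start, end) pair of 0-based frame indices.
    """
    n = len(scores)
    neg_inf = float("-inf")
    # closed[t][j]: best total over the first t frames using at most j segments.
    # opened[t][j]: same, but the last segment must end exactly at frame t - 1.
    closed = [[0.0] * (k + 1) for _ in range(n + 1)]
    opened = [[neg_inf] * (k + 1) for _ in range(n + 1)]

    for t in range(1, n + 1):
        s = scores[t - 1]
        for j in range(1, k + 1):
            extend = opened[t - 1][j] + s      # grow the current segment
            start = closed[t - 1][j - 1] + s   # begin a new segment here
            opened[t][j] = max(extend, start)
            closed[t][j] = max(closed[t - 1][j], opened[t][j])

    # Backtrack through the tables to recover the chosen segments.
    segments = []
    t, j = n, k
    while t > 0 and j > 0:
        if closed[t][j] == closed[t - 1][j]:
            t -= 1  # frame t - 1 is left uncovered
            continue
        end = t - 1
        # Walk left until we reach the frame where this segment was started.
        while opened[t][j] != closed[t - 1][j - 1] + scores[t - 1]:
            t -= 1
        segments.append((t - 1, end))
        t -= 1
        j -= 1
    segments.reverse()
    return closed[n][k], segments


if __name__ == "__main__":
    # Hypothetical per-frame scores; positive values favour "AU present".
    frame_scores = [1.0, 2.0, -5.0, 3.0, -1.0, 2.0, -4.0, 1.0]
    print(best_k_or_fewer_segments(frame_scores, 2))
    # prints (7.0, [(0, 1), (3, 5)])
```

Because the empty selection scores zero, the routine returns fewer than k segments, or none, when adding another segment would not raise the total. The tables take O(nk) time and memory for n frames.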