Top 3907 papers published in the topic of Support vector machine in 2011

Showing papers on "Support vector machine" published in 2011

Journal Article•10.1145/1961189.1961199•

LIBSVM: A library for support vector machines

Chih-Chung Chang¹, Chih-Jen Lin¹•Institutions (1)

06 May 2011-ACM Transactions on Intelligent Systems and Technology

TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.

Abstract: LIBSVM is a library for Support Vector Machines (SVMs). We have been actively developing this package since the year 2000. The goal is to help users to easily apply SVM to their applications. LIBSVM has gained wide popularity in machine learning and many other areas. In this article, we present all implementation details of LIBSVM. Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.

46,343 citations

Book•

Distributed Optimization and Statistical Learning Via the Alternating Direction Method of Multipliers

Stephen Boyd¹, Neal Parikh¹, Eric Chu¹, Borja Peleato¹, Jonathan Eckstein²•Institutions (2)

Stanford University¹, Rutgers University²

23 May 2011

TL;DR: It is argued that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas.

Abstract: Many problems of recent interest in statistics and machine learning can be posed in the framework of convex optimization. Due to the explosion in size and complexity of modern datasets, it is increasingly important to be able to solve problems with a very large number of features or training examples. As a result, both the decentralized collection or storage of these datasets as well as accompanying distributed solution methods are either necessary or at least highly desirable. In this review, we argue that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas. The method was developed in the 1970s, with roots in the 1950s, and is equivalent or closely related to many other algorithms, such as dual decomposition, the method of multipliers, Douglas–Rachford splitting, Spingarn's method of partial inverses, Dykstra's alternating projections, Bregman iterative algorithms for l1 problems, proximal methods, and others. After briefly surveying the theory and history of the algorithm, we discuss applications to a wide variety of statistical and machine learning problems of recent interest, including the lasso, sparse logistic regression, basis pursuit, covariance selection, support vector machines, and many others. We also discuss general distributed optimization, extensions to the nonconvex setting, and efficient implementation, including some details on distributed MPI and Hadoop MapReduce implementations.
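
As a rough illustration of one application the abstract lists (the lasso), the following is a minimal sketch of the standard ADMM iteration for minimizing 0.5·‖Ax − b‖² + λ‖x‖₁. The function names `admm_lasso` and `soft_threshold`, the fixed iteration count, and the default values of `lam` and `rho` are assumptions made for this example; it is a single-machine sketch, not the distributed implementation discussed in the review.

```python
import numpy as np


def soft_threshold(v, kappa):
    """Elementwise soft-thresholding, the proximal operator of kappa * |.|_1."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)


def admm_lasso(A, b, lam=1.0, rho=1.0, n_iters=200):
    """Sketch of ADMM for the lasso; lam, rho and n_iters are illustrative defaults."""
    n = A.shape[1]
    x = np.zeros(n)
    z = np.zeros(n)
    u = np.zeros(n)  # scaled dual variable
    AtA_rhoI = A.T @ A + rho * np.eye(n)
    Atb = A.T @ b
    for _ in range(n_iters):
        # x-update: a ridge-like least-squares solve
        x = np.linalg.solve(AtA_rhoI, Atb + rho * (z - u))
        # z-update: soft-thresholding enforces sparsity
        z = soft_threshold(x + u, lam / rho)
        # dual update: accumulate the constraint residual x - z
        u = u + x - z
    return z
```

A practical version would replace the fixed iteration count with a stopping rule based on the primal and dual residuals.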

20,585 citations

Journal Article•10.1109/TASL.2010.2064307•

Front-End Factor Analysis for Speaker Verification

Najim Dehak¹, Patrick Kenny, Réda Dehak², Pierre Dumouchel, Pierre Ouellet•Institutions (2)

Massachusetts Institute of Technology¹, École Pour l'Informatique et les Techniques Avancées²

01 May 2011-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: Extending the authors' previous work on a new speaker representation for speaker verification, this paper defines a new low-dimensional speaker- and channel-dependent space using a simple factor analysis; it is named the total variability space because it models both speaker and channel variabilities.

Abstract: This paper presents an extension of our previous work which proposes a new speaker representation for speaker verification. In this modeling, a new low-dimensional speaker- and channel-dependent space is defined using a simple factor analysis. This space is named the total variability space because it models both speaker and channel variabilities. Two speaker verification systems are proposed which use this new representation. The first system is a support vector machine-based system that uses the cosine kernel to estimate the similarity between the input data. The second system directly uses the cosine similarity as the final decision score. We tested three channel compensation techniques in the total variability space, which are within-class covariance normalization (WCCN), linear discriminant analysis (LDA), and nuisance attribute projection (NAP). We found that the best results are obtained when LDA is followed by WCCN. We achieved an equal error rate (EER) of 1.12% and MinDCF of 0.0094 using the cosine distance scoring on the male English trials of the core condition of the NIST 2008 Speaker Recognition Evaluation dataset. We also obtained 4% absolute EER improvement for both-gender trials on the 10 s-10 s condition compared to the classical joint factor analysis scoring.
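
The second system described above scores a trial directly by cosine similarity. A minimal sketch of that scoring step follows, assuming each utterance has already been mapped to a fixed-length vector in the total variability space. The names `cosine_score` and `verify` and the threshold value are hypothetical, and the channel compensation steps (LDA, WCCN) are omitted.

```python
import math


def cosine_score(w_target, w_test):
    """Cosine similarity between two fixed-length speaker vectors."""
    dot = sum(a * b for a, b in zip(w_target, w_test))
    norm_target = math.sqrt(sum(a * a for a in w_target))
    norm_test = math.sqrt(sum(b * b for b in w_test))
    return dot / (norm_target * norm_test)


def verify(w_target, w_test, threshold=0.5):
    """Accept the trial when the score reaches a threshold (value is illustrative)."""
    return cosine_score(w_target, w_test) >= threshold
```

Because the score depends only on the angle between the two vectors, rescaling either vector leaves the decision unchanged.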

4,436 citations

Journal Article•10.1007/S10107-010-0420-4•

Pegasos: primal estimated sub-gradient solver for SVM

Shai Shalev-Shwartz¹, Yoram Singer², Nathan Srebro³, Andrew Cotter³•Institutions (3)

Hebrew University of Jerusalem¹, Google², Toyota Technological Institute at Chicago³

01 Mar 2011-Mathematical Programming

TL;DR: A simple and effective stochastic sub-gradient descent algorithm for solving the optimization problem cast by Support Vector Machines, which is particularly well suited for large text classification problems, and demonstrates an order-of-magnitude speedup over previous SVM learning methods.

Abstract: We describe and analyze a simple and effective stochastic sub-gradient descent algorithm for solving the optimization problem cast by Support Vector Machines (SVM). We prove that the number of iterations required to obtain a solution of accuracy $\epsilon$ is $\tilde{O}(1/\epsilon)$, where each iteration operates on a single training example. In contrast, previous analyses of stochastic gradient descent methods for SVMs require $\Omega(1/\epsilon^2)$ iterations. As in previously devised SVM solvers, the number of iterations also scales linearly with 1/λ, where λ is the regularization parameter of SVM. For a linear kernel, the total run-time of our method is $\tilde{O}(d/(\lambda\epsilon))$, where d is a bound on the number of non-zero features in each example. Since the run-time does not depend directly on the size of the training set, the resulting algorithm is especially suited for learning from large datasets. Our approach also extends to non-linear kernels while working solely on the primal objective function, though in this case the runtime does depend linearly on the training set size. Our algorithm is particularly well suited for large text classification problems, where we demonstrate an order-of-magnitude speedup over previous SVM learning methods.
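
To make the "one training example per iteration" idea concrete, here is a bare-bones sketch of a Pegasos-style stochastic sub-gradient update for a linear SVM with hinge loss and step size 1/(λt). The function name `pegasos`, the default parameter values, and the toy data format (lists of floats, labels in {+1, −1}) are assumptions for illustration; the sketch has no bias term, no mini-batching, and no projection step.

```python
import random


def pegasos(X, y, lam=0.01, n_iters=2000, seed=0):
    """Sketch of a Pegasos-style linear SVM trainer (no bias term)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    for t in range(1, n_iters + 1):
        i = rng.randrange(len(X))   # pick one training example at random
        eta = 1.0 / (lam * t)       # decaying step size
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
        # the regularizer's gradient shrinks w on every step
        w = [(1.0 - eta * lam) * wj for wj in w]
        # the hinge loss contributes only when the margin is violated
        if margin < 1.0:
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w
```

Each iteration costs time proportional to the number of features in the sampled example, which is why the per-iteration cost does not grow with the size of the training set.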

2,430 citations

Journal Article•

Multiple Kernel Learning Algorithms

Mehmet Gönen¹, Ethem Alpaydin¹•Institutions (1)

Boğaziçi University¹

01 Feb 2011-Journal of Machine Learning Research

TL;DR: Overall, using multiple kernels instead of a single one is useful and it is believed that combining kernels in a nonlinear or data-dependent way seems more promising than linear combination in fusing information provided by simple linear kernels, whereas linear methods are more reasonable when combining complex Gaussian kernels.

Abstract: In recent years, several methods have been proposed to combine multiple kernels instead of using a single one. These different kernels may correspond to using different notions of similarity or may be using information coming from multiple sources (different representations or different feature subsets). In trying to organize and highlight the similarities and differences between them, we give a taxonomy of and review several multiple kernel learning algorithms. We perform experiments on real data sets for better illustration and comparison of existing algorithms. We see that though there may not be large differences in terms of accuracy, there is difference between them in complexity as given by the number of stored support vectors, the sparsity of the solution as given by the number of used kernels, and training time complexity. We see that overall, using multiple kernels instead of a single one is useful and believe that combining kernels in a nonlinear or data-dependent way seems more promising than linear combination in fusing information provided by simple linear kernels, whereas linear methods are more reasonable when combining complex Gaussian kernels.

2,039 citations

Book•

Support Vector Data Description

Chandan Srivastava

14 Jun 2011

1,343 citations

Journal Article•10.1109/TGRS.2011.2129595•

Hyperspectral Image Classification Using Dictionary-Based Sparse Representation

Yi Chen¹, Nasser M. Nasrabadi², Trac D. Tran¹•Institutions (2)

Johns Hopkins University¹, United States Army Research Laboratory²

12 May 2011-IEEE Transactions on Geoscience and Remote Sensing

TL;DR: Experimental results show that the proposed sparsity-based algorithm for the classification of hyperspectral imagery outperforms the classical supervised classifier support vector machines in most cases.

Abstract: A new sparsity-based algorithm for the classification of hyperspectral imagery is proposed in this paper. The proposed algorithm relies on the observation that a hyperspectral pixel can be sparsely represented by a linear combination of a few training samples from a structured dictionary. The sparse representation of an unknown pixel is expressed as a sparse vector whose nonzero entries correspond to the weights of the selected training samples. The sparse vector is recovered by solving a sparsity-constrained optimization problem, and it can directly determine the class label of the test sample. Two different approaches are proposed to incorporate the contextual information into the sparse recovery optimization problem in order to improve the classification performance. In the first approach, an explicit smoothing constraint is imposed on the problem formulation by forcing the vector Laplacian of the reconstructed image to become zero. In this approach, the reconstructed pixel of interest has similar spectral characteristics to its four nearest neighbors. The second approach is via a joint sparsity model where hyperspectral pixels in a small neighborhood around the test pixel are simultaneously represented by linear combinations of a few common training samples, which are weighted with a different set of coefficients for each pixel. The proposed sparsity-based algorithm is applied to several real hyperspectral images for classification. Experimental results show that our algorithm outperforms the classical supervised classifier support vector machines in most cases.

1,272 citations

Journal Article•10.5555/1953048.2021036•

Differentially Private Empirical Risk Minimization

Kamalika Chaudhuri¹, Claire Monteleoni, Anand D. Sarwate•Institutions (1)

University of California, San Diego¹

01 Feb 2011-Journal of Machine Learning Research

TL;DR: This work proposes a new method, objective perturbation, for privacy-preserving machine learning algorithm design, and shows that, both theoretically and empirically, this method is superior to the previous state-of-the-art, output perturbation, in managing the inherent tradeoff between privacy and learning performance.

Abstract: Privacy-preserving machine learning algorithms are crucial for the increasingly common setting in which personal data, such as medical or financial records, are analyzed. We provide general techniques to produce privacy-preserving approximations of classifiers learned via (regularized) empirical risk minimization (ERM). These algorithms are private under the ε-differential privacy definition due to Dwork et al. (2006). First we apply the output perturbation ideas of Dwork et al. (2006) to ERM classification. Then we propose a new method, objective perturbation, for privacy-preserving machine learning algorithm design. This method entails perturbing the objective function before optimizing over classifiers. If the loss and regularizer satisfy certain convexity and differentiability criteria, we prove theoretical results showing that our algorithms preserve privacy, and provide generalization bounds for linear and nonlinear kernels. We further present a privacy-preserving technique for tuning the parameters in general machine learning algorithms, thereby providing end-to-end privacy guarantees for the training process. We apply these results to produce privacy-preserving analogues of regularized logistic regression and support vector machines. We obtain encouraging results from evaluating their performance on real demographic and benchmark data sets. Our results show that both theoretically and empirically, objective perturbation is superior to the previous state-of-the-art, output perturbation, in managing the inherent tradeoff between privacy and learning performance.

1,198 citations

Proceedings Article•10.1109/ICCV.2011.6126229•

Ensemble of exemplar-SVMs for object detection and beyond

Tomasz Malisiewicz¹, Abhinav Gupta¹, Alexei A. Efros¹•Institutions (1)

Carnegie Mellon University¹

6 Nov 2011

TL;DR: This paper proposes a conceptually simple but surprisingly powerful method which combines the effectiveness of a discriminative object detector with the explicit correspondence offered by a nearest-neighbor approach.

Abstract: This paper proposes a conceptually simple but surprisingly powerful method which combines the effectiveness of a discriminative object detector with the explicit correspondence offered by a nearest-neighbor approach. The method is based on training a separate linear SVM classifier for every exemplar in the training set. Each of these Exemplar-SVMs is thus defined by a single positive instance and millions of negatives. While each detector is quite specific to its exemplar, we empirically observe that an ensemble of such Exemplar-SVMs offers surprisingly good generalization. Our performance on the PASCAL VOC detection task is on par with the much more complex latent part-based model of Felzenszwalb et al., at only a modest computational cost increase. But the central benefit of our approach is that it creates an explicit association between each detection and a single training exemplar. Because most detections show good alignment to their associated exemplar, it is possible to transfer any available exemplar meta-data (segmentation, geometric structure, 3D model, etc.) directly onto the detections, which can then be used as part of overall scene understanding.

1,146 citations

Proceedings Article•10.1109/CVPR.2011.5995702•

What you saw is not what you get: Domain adaptation using asymmetric kernel transforms

Brian Kulis¹, Kate Saenko¹, Trevor Darrell¹•Institutions (1)

University of California, Berkeley¹

20 Jun 2011

TL;DR: This paper introduces ARC-t, a flexible model for supervised learning of non-linear transformations between domains, based on a novel theoretical result demonstrating that such transformations can be learned in kernel space.

Abstract: In real-world applications, “what you saw” during training is often not “what you get” during deployment: the distribution and even the type and dimensionality of features can change from one dataset to the next. In this paper, we address the problem of visual domain adaptation for transferring object models from one dataset or visual domain to another. We introduce ARC-t, a flexible model for supervised learning of non-linear transformations between domains. Our method is based on a novel theoretical result demonstrating that such transformations can be learned in kernel space. Unlike existing work, our model is not restricted to symmetric transformations, nor to features of the same type and dimensionality, making it applicable to a significantly wider set of adaptation scenarios than previous methods. Furthermore, the method can be applied to categories that were not available during training. We demonstrate the ability of our method to adapt object recognition models under a variety of situations, such as differing imaging conditions, feature types and codebooks.

890 citations

Journal Article•10.1016/J.PATCOG.2011.01.017•

An overview of ensemble methods for binary classifiers in multi-class problems: Experimental study on one-vs-one and one-vs-all schemes

Mikel Galar¹, Alberto Fernández², Edurne Barrenechea¹, Humberto Bustince¹, Francisco Herrera³•Institutions (3)

Universidad Pública de Navarra¹, University of Jaén², University of Granada³

01 Aug 2011-Pattern Recognition

TL;DR: This work develops a double study, using different base classifiers in order to observe the suitability and potential of each combination within each classifier, and compares the performance of these ensemble techniques with that of the classifiers themselves.

Journal Article•10.1109/TNN.2011.2130540•

Improvements on Twin Support Vector Machines

Yuan-Hai Shao, Chunhua Zhang¹, Xiao-Bo Wang², Nai-Yang Deng•Institutions (2)

Renmin University of China¹, Tsinghua University²

01 Jun 2011-IEEE Transactions on Neural Networks

TL;DR: An improved version of TWSVM, named twin bounded support vector machines (TBSVM), is proposed, in which the structural risk minimization principle is implemented by introducing a regularization term.

Abstract: For classification problems, the generalized eigenvalue proximal support vector machine (GEPSVM) and twin support vector machine (TWSVM) are regarded as milestones in the development of the powerful SVMs, as they use the nonparallel hyperplane classifiers. In this brief, we propose an improved version, named twin bounded support vector machines (TBSVM), based on TWSVM. The significant advantage of our TBSVM over TWSVM is that the structural risk minimization principle is implemented by introducing the regularization term. This embodies the marrow of statistical learning theory, so this modification can improve the performance of classification. In addition, the successive overrelaxation technique is used to solve the optimization problems to speed up the training procedure. Experimental results show the effectiveness of our method in both computation time and classification accuracy, and therefore confirm the above conclusion further.

Journal Article•10.1016/J.ENGGEO.2011.09.006•

Landslide susceptibility assessment using SVM machine learning algorithm

Miloš Marjanović¹, Miloš Kovačević², Branislav Bajat², Vít Voženílek¹•Institutions (2)

Palacký University, Olomouc¹, University of Belgrade²

13 Nov 2011-Engineering Geology

TL;DR: This paper introduces the current machine learning approach to solving spatial modeling problems in the domain of landslide susceptibility assessment, and selects Support Vector Machines as the model of choice to be compared with a common knowledge-driven method, the Analytical Hierarchy Process.

Journal Article•10.1016/J.DSS.2010.11.006•

Detection of financial statement fraud and feature selection using data mining techniques

P. Ravisankar, Vadlamani Ravi, G. Raghava Rao, Indranil Bose¹•Institutions (1)

University of Hong Kong¹

1 Jan 2011

TL;DR: Data mining techniques such as Multilayer Feed Forward Neural Network, Support Vector Machines, Genetic Programming, Group Method of Data Handling, Logistic Regression, and Probabilistic Neural Network are used to identify companies that resort to financial statement fraud.

Abstract: Recently, high profile cases of financial statement fraud have been dominating the news. This paper uses data mining techniques such as Multilayer Feed Forward Neural Network (MLFF), Support Vector Machines (SVM), Genetic Programming (GP), Group Method of Data Handling (GMDH), Logistic Regression (LR), and Probabilistic Neural Network (PNN) to identify companies that resort to financial statement fraud. Each of these techniques is tested on a dataset involving 202 Chinese companies and compared with and without feature selection. PNN outperformed all the techniques without feature selection, and GP and PNN outperformed others with feature selection and with marginally equal accuracies.

Journal Article•10.1016/J.ESWA.2010.06.048•

A comparative assessment of ensemble learning for credit scoring

Gang Wang¹, Jin-Xing Hao¹, Jian Ma¹, Hongbing Jiang¹•Institutions (1)

City University of Hong Kong¹

01 Jan 2011-Expert Systems With Applications

TL;DR: Experimental results reveal that the three ensemble methods can substantially improve individual base learners, and in particular, Bagging performs better than Boosting across all credit datasets.

Abstract: Both statistical techniques and Artificial Intelligence (AI) techniques have been explored for credit scoring, an important finance activity. Although there are no consistent conclusions on which ones are better, recent studies suggest that combining multiple classifiers, i.e., ensemble learning, may yield better performance. In this study, we conduct a comparative assessment of the performance of three popular ensemble methods, i.e., Bagging, Boosting, and Stacking, based on four base learners, i.e., Logistic Regression Analysis (LRA), Decision Tree (DT), Artificial Neural Network (ANN) and Support Vector Machine (SVM). Experimental results reveal that the three ensemble methods can substantially improve individual base learners. In particular, Bagging performs better than Boosting across all credit datasets. In our experiments, Stacking and Bagging DT achieve the best performance in terms of average accuracy, type I error and type II error.

Journal Article•10.1186/1471-2105-12-489•

Predicting RNA-Protein Interactions Using Only Sequence Information

Usha Muppirala¹, Vasant Honavar¹, Drena Dobbs¹•Institutions (1)

Iowa State University¹

22 Dec 2011-BMC Bioinformatics

TL;DR: RPISeq offers an inexpensive method for computational construction of RNA-protein interaction networks and should provide useful insights into the function of non-coding RNAs; its performance is competitive with that of a published method that requires information regarding many different features.

Abstract: RNA-protein interactions (RPIs) play important roles in a wide variety of cellular processes, ranging from transcriptional and post-transcriptional regulation of gene expression to host defense against pathogens. High throughput experiments to identify RNA-protein interactions are beginning to provide valuable information about the complexity of RNA-protein interaction networks, but are expensive and time consuming. Hence, there is a need for reliable computational methods for predicting RNA-protein interactions. We propose RPISeq, a family of classifiers for predicting RNA-protein interactions using only sequence information. Given the sequences of an RNA and a protein as input, RPISeq predicts whether or not the RNA-protein pair interact. The RNA sequence is encoded as a normalized vector of its ribonucleotide 4-mer composition, and the protein sequence is encoded as a normalized vector of its 3-mer composition, based on a 7-letter reduced alphabet representation. Two variants of RPISeq are presented: RPISeq-SVM, which uses a Support Vector Machine (SVM) classifier and RPISeq-RF, which uses a Random Forest classifier. On two non-redundant benchmark datasets extracted from the Protein-RNA Interface Database (PRIDB), RPISeq achieved an AUC (Area Under the Receiver Operating Characteristic (ROC) curve) of 0.96 and 0.92. On a third dataset containing only mRNA-protein interactions, the performance of RPISeq was competitive with that of a published method that requires information regarding many different features (e.g., mRNA half-life, GO annotations) of the putative RNA and protein partners. In addition, RPISeq classifiers trained using the PRIDB data correctly predicted the majority (57-99%) of non-coding RNA-protein interactions in NPInter-derived networks from E. coli, S. cerevisiae, D. melanogaster, M. musculus, and H. sapiens.
Our experiments with RPISeq demonstrate that RNA-protein interactions can be reliably predicted using only sequence-derived information. RPISeq offers an inexpensive method for computational construction of RNA-protein interaction networks, and should provide useful insights into the function of non-coding RNAs. RPISeq is freely available as a web-based server at http://pridb.gdcb.iastate.edu/RPISeq/.
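
The RNA encoding described above, a normalized 4-mer composition vector, can be sketched as follows. The function name `kmer_composition` is hypothetical, and the choice to normalize counts so that they sum to 1 is an assumption, since the abstract does not say which normalization is used. The protein encoding is not shown because the 7-letter reduced alphabet grouping is not given here.

```python
from itertools import product


def kmer_composition(seq, k=4, alphabet="ACGU"):
    """Return the k-mer frequency vector of seq over the given alphabet."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:          # skip windows containing other symbols
            counts[kmer] += 1
    total = sum(counts.values())
    # assumed normalization: divide by the number of counted windows
    return [counts[m] / total if total else 0.0 for m in kmers]
```

With the 4-letter ribonucleotide alphabet and k = 4, the vector has 4^4 = 256 entries, one per possible 4-mer, and can be passed to any classifier that accepts fixed-length numeric input.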

Journal Article•10.1109/TIM.2011.2161140•

Real-Time Hand Gesture Detection and Recognition Using Bag-of-Features and Support Vector Machine Techniques

N. H. Dardas¹, Nicolas D. Georganas¹•Institutions (1)

University of Ottawa¹

15 Aug 2011-IEEE Transactions on Instrumentation and Measurement

TL;DR: This system includes detecting and tracking a bare hand in a cluttered background using skin detection and a hand posture contour comparison algorithm after face subtraction, recognizing hand gestures via bag-of-features and a multiclass support vector machine (SVM), and building a grammar that generates gesture commands to control an application.

Abstract: This paper presents a novel, real-time system for interaction with an application or video game via hand gestures. Our system includes detecting and tracking a bare hand in a cluttered background using skin detection and a hand posture contour comparison algorithm after face subtraction, recognizing hand gestures via bag-of-features and a multiclass support vector machine (SVM), and building a grammar that generates gesture commands to control an application. In the training stage, after extracting the keypoints for every training image using the scale-invariant feature transform (SIFT), a vector quantization technique maps the keypoints from every training image into a histogram vector of unified dimension (bag-of-words) after K-means clustering. This histogram is treated as an input vector for a multiclass SVM to build the training classifier. In the testing stage, for every frame captured from a webcam, the hand is detected using our algorithm; the keypoints are then extracted for every small image that contains only the detected hand gesture and fed into the cluster model to map them into a bag-of-words vector, which is finally fed into the multiclass SVM training classifier to recognize the hand gesture.
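
The vector quantization step described above, which turns a variable number of keypoint descriptors into one fixed-length histogram, can be sketched as follows. It assumes the cluster centers have already been obtained (for example by K-means) and that descriptors are plain lists of floats. The name `bag_of_words_histogram` and the normalization of the histogram to sum to 1 are assumptions made for this example.

```python
def bag_of_words_histogram(descriptors, centroids):
    """Assign each descriptor to its nearest centroid and count the assignments."""
    hist = [0] * len(centroids)
    for d in descriptors:
        nearest = min(
            range(len(centroids)),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(d, centroids[j])),
        )
        hist[nearest] += 1
    total = sum(hist)
    # assumed normalization so images with different keypoint counts are comparable
    return [h / total for h in hist] if total else [0.0] * len(hist)
```

The histogram length equals the number of centroids regardless of how many keypoints an image produces, which is what lets it serve as the input vector of a multiclass SVM.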

Journal Article•10.1016/J.ESWA.2010.06.066•

A novel intrusion detection system based on hierarchical clustering and support vector machines

Shi-Jinn Horng¹, Ming-Yang Su², Yuan-Hsin Chen³, Tzong-Wann Kao, Rong-Jian Chen³, Jui-Lin Lai³, Citra Dwi Perkasa¹•Institutions (3)

National Taiwan University of Science and Technology¹, Ming Chuan University², National United University³

01 Jan 2011-Expert Systems With Applications

TL;DR: This study proposed an SVM-based intrusion detection system that combines a hierarchical clustering algorithm, a simple feature selection procedure, and the SVM technique; it showed better performance in the detection of DoS and Probe attacks and the best performance in overall accuracy.

Abstract: This study proposed an SVM-based intrusion detection system, which combines a hierarchical clustering algorithm, a simple feature selection procedure, and the SVM technique. The hierarchical clustering algorithm provided the SVM with fewer, abstracted, and higher-quality training instances that are derived from the KDD Cup 1999 training set. It not only greatly shortened the training time but also improved the performance of the resultant SVM. The simple feature selection procedure was applied to eliminate unimportant features from the training set so the obtained SVM model could classify the network traffic data more accurately. The famous KDD Cup 1999 dataset was used to evaluate the proposed system. Compared with other intrusion detection systems that are based on the same dataset, this system showed better performance in the detection of DoS and Probe attacks, and the best performance in overall accuracy.

Proceedings Article•10.1109/CVPR.2011.5995477•

Large-scale image classification: Fast feature extraction and SVM training

Yuanqing Lin, Fengjun Lv, Shenghuo Zhu, Ming Yang, Timothee Cour, Kai Yu, Liangliang Cao¹, Thomas S. Huang¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

20 Jun 2011

TL;DR: A parallel averaging stochastic gradient descent (ASGD) algorithm is developed for training one-against-all 1000-class SVM classifiers, along with a Hadoop scheme that performs feature extraction in parallel using hundreds of mappers; together they achieve state-of-the-art performance on the ImageNet 1000-class classification.

Abstract: Most research efforts on image classification so far have been focused on medium-scale datasets, which are often defined as datasets that can fit into the memory of a desktop (typically 4G∼48G). There are two main reasons for the limited effort on large-scale image classification. First, until the emergence of the ImageNet dataset, there was almost no publicly available large-scale benchmark data for image classification. This is mostly because class labels are expensive to obtain. Second, large-scale classification is hard because it poses more challenges than its medium-scale counterparts. A key challenge is how to achieve efficiency in both feature extraction and classifier training without compromising performance. This paper shows how we address this challenge using the ImageNet dataset as an example. For feature extraction, we develop a Hadoop scheme that performs feature extraction in parallel using hundreds of mappers. This allows us to extract fairly sophisticated features (with dimensions being hundreds of thousands) on 1.2 million images within one day. For SVM training, we develop a parallel averaging stochastic gradient descent (ASGD) algorithm for training one-against-all 1000-class SVM classifiers. The ASGD algorithm is capable of dealing with terabytes of training data and converges very fast; typically 5 epochs are sufficient. As a result, we achieve state-of-the-art performance on the ImageNet 1000-class classification, i.e., 52.9% in classification accuracy and 71.8% in top 5 hit rate.

Journal Article•10.1016/J.JNCA.2011.01.002•

Mutual information-based feature selection for intrusion detection systems

Fatemeh Amiri¹, Mohammadmahdi R. Yousefi¹, Caro Lucas¹, Azadeh Shakery¹, Nasser Yazdani¹•Institutions (1)

University of Tehran¹

01 Jul 2011-Journal of Network and Computer Applications

TL;DR: This work proposes two feature selection algorithms and investigates their performance compared to a mutual information-based feature selection method, using both a linear and a non-linear measure (the linear correlation coefficient and mutual information) for the feature selection.
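
To illustrate the difference between the two measures named above, the following sketch computes the linear (Pearson) correlation coefficient and the mutual information between one feature and the class label. It assumes small, discrete-valued inputs held in Python lists. The function names are hypothetical, and this is not the selection algorithm proposed in the paper, only the two underlying scores.

```python
import math
from collections import Counter


def pearson_correlation(feature, labels):
    """Linear correlation coefficient between a feature and the label."""
    n = len(feature)
    mx = sum(feature) / n
    my = sum(labels) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(feature, labels))
    sx = math.sqrt(sum((a - mx) ** 2 for a in feature))
    sy = math.sqrt(sum((b - my) ** 2 for b in labels))
    return cov / (sx * sy)


def mutual_information(feature, labels):
    """Mutual information (in nats) estimated from empirical counts."""
    n = len(feature)
    joint = Counter(zip(feature, labels))
    px = Counter(feature)
    py = Counter(labels)
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        # p_xy / (p_x * p_y) written in terms of raw counts
        mi += p_xy * math.log(p_xy * n * n / (px[x] * py[y]))
    return mi
```

For a feature such as `[-1, 0, 1]` with labels `[1, 0, 1]`, the correlation coefficient is zero even though the feature fully determines the label, while the mutual information is positive; this is the kind of non-linear dependence a purely linear measure misses.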

Journal Article•10.1109/JSTSP.2011.2113170•

Sparse Representation for Target Detection in Hyperspectral Imagery

Yi Chen¹, Nasser M. Nasrabadi², Trac D. Tran¹•Institutions (2)

Johns Hopkins University¹, United States Army Research Laboratory²

10 Feb 2011-IEEE Journal of Selected Topics in Signal Processing

TL;DR: This paper proposes a new sparsity-based algorithm for automatic target detection in hyperspectral imagery (HSI) based on the concept that a pixel in HSI lies in a low-dimensional subspace and thus can be represented as a sparse linear combination of the training samples.

Abstract: In this paper, we propose a new sparsity-based algorithm for automatic target detection in hyperspectral imagery (HSI). This algorithm is based on the concept that a pixel in HSI lies in a low-dimensional subspace and thus can be represented as a sparse linear combination of the training samples. The sparse representation (a sparse vector corresponding to the linear combination of a few selected training samples) of a test sample can be recovered by solving an l0-norm minimization problem. With the recent development of compressed sensing theory, such a minimization problem can be recast as a standard linear programming problem or efficiently approximated by greedy pursuit algorithms. Once the sparse vector is obtained, the class of the test sample can be determined by the characteristics of the sparse vector on reconstruction. In addition to the constraints on sparsity and reconstruction accuracy, we also exploit the fact that in HSI the neighboring pixels have similar spectral characteristics (smoothness). In our proposed algorithm, a smoothness constraint is also imposed by forcing the vector Laplacian at each reconstructed pixel to be minimal throughout the minimization process. The proposed sparsity-based algorithm is applied to several hyperspectral images to detect targets of interest. Simulation results show that our algorithm outperforms the classical hyperspectral target detection algorithms, such as the popular spectral matched filters, matched subspace detectors, and adaptive subspace detectors, as well as binary classifiers such as support vector machines.
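
The reconstruction-based decision described above can be sketched with a simple greedy pursuit. This is a simplified illustration under stated assumptions, not the paper's algorithm: it uses plain matching pursuit rather than an l0 solver or orthogonal matching pursuit, it omits the smoothness (Laplacian) constraint, and the function names and the residual-difference score are hypothetical choices.

```python
from math import sqrt


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    n = sqrt(dot(v, v))
    return [x / n for x in v]


def matching_pursuit(atoms, x, n_iter=10):
    """Greedy sparse coding of x over unit-norm atoms."""
    coef = [0.0] * len(atoms)
    residual = list(x)
    for _ in range(n_iter):
        scores = [dot(residual, a) for a in atoms]
        # Pick the atom most correlated with the current residual.
        k = max(range(len(atoms)), key=lambda j: abs(scores[j]))
        coef[k] += scores[k]
        residual = [r - scores[k] * a for r, a in zip(residual, atoms[k])]
    return coef


def class_residual(atoms, coef, x, indices):
    """Reconstruction error of x using only the atoms in `indices`."""
    recon = [0.0] * len(x)
    for j in indices:
        for i, a in enumerate(atoms[j]):
            recon[i] += coef[j] * a
    return sqrt(sum((xi - ri) ** 2 for xi, ri in zip(x, recon)))


def detect(background, target, x, n_iter=10):
    """Positive score: x is better explained by the target atoms."""
    atoms = [normalize(a) for a in background + target]
    coef = matching_pursuit(atoms, x, n_iter)
    b_idx = range(len(background))
    t_idx = range(len(background), len(atoms))
    return (class_residual(atoms, coef, x, b_idx)
            - class_residual(atoms, coef, x, t_idx))
```

Here `background` and `target` are lists of training spectra; a test pixel whose sparse code concentrates on target atoms yields a small target residual and hence a positive score.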

Journal Article•10.1007/S10994-010-5221-8•

Dual coordinate descent methods for logistic regression and maximum entropy models

Hsiang-Fu Yu¹, Fang-Lan Huang¹, Chih-Jen Lin¹•Institutions (1)

National Taiwan University¹

01 Oct 2011-Machine Learning

TL;DR: This paper applies coordinate descent methods to solve the dual form of logistic regression and maximum entropy, and shows that many details are different from the situation in linear SVM.

Abstract: Most optimization methods for logistic regression or maximum entropy solve the primal problem. They include iterative scaling, coordinate descent, quasi-Newton, and truncated Newton methods. Less effort has been made to solve the dual problem. In contrast, for linear support vector machines (SVM), methods have been shown to be very effective for solving the dual problem. In this paper, we apply coordinate descent methods to solve the dual form of logistic regression and maximum entropy. Interestingly, many details are different from the situation in linear SVM. We carefully study the theoretical convergence as well as numerical issues. The proposed method is shown to be faster than most state-of-the-art methods for training logistic regression and maximum entropy.

Journal Article•10.1109/TGRS.2011.2153861•

Hyperspectral Image Classification With Independent Component Discriminant Analysis

A. Villa¹, Jon Atli Benediktsson², Jocelyn Chanussot¹, Christian Jutten•Institutions (2)

Grenoble Institute of Technology¹, University of Iceland²

22 Dec 2011-IEEE Transactions on Geoscience and Remote Sensing

TL;DR: The influence of the algorithm used to enforce independence and of the number of IC retained for the classification of hyperspectral images is studied, proposing an effective method to estimate the most suitable number.

Abstract: In this paper, the use of Independent Component (IC) Discriminant Analysis (ICDA) for remote sensing classification is proposed. ICDA is a nonparametric method for discriminant analysis based on the application of a Bayesian classification rule to a signal composed of ICs. The method uses IC Analysis (ICA) to choose a transform matrix so that the transformed components are as independent as possible. When the data are projected into an independent space, the estimates of their multivariate density function can be computed much more easily as the product of univariate densities. A nonparametric kernel density estimator is used to compute the density function of each IC. Finally, the Bayes rule is applied for the classification assignment. In this paper, we investigate the possibility of using ICDA for the classification of hyperspectral images. We study the influence of the algorithm used to enforce independence and of the number of ICs retained for the classification, proposing an effective method to estimate the most suitable number. The proposed method is applied to several hyperspectral images in order to test different data set conditions (urban/agricultural area, size of the training set, and type of sensor). The results obtained are compared with one of the most commonly used classifiers of hyperspectral images (support vector machines) and show the comparative effectiveness of the proposed method in terms of accuracy.

Book Chapter•10.1007/978-3-642-23626-6_44•

Fully automatic segmentation of brain tumor images using support vector machine classification in combination with hierarchical conditional random field regularization

Stefan Bauer¹, Lutz-P. Nolte¹, Mauricio Reyes¹•Institutions (1)

University of Bern¹

18 Sep 2011

TL;DR: A fully automatic method for brain tissue segmentation, which combines Support Vector Machine classification using multispectral intensities and textures with subsequent hierarchical regularization based on Conditional Random Fields, which is fast and tailored to standard clinical acquisition protocols.

Abstract: Delineating brain tumor boundaries from magnetic resonance images is an essential task for the analysis of brain cancer. We propose a fully automatic method for brain tissue segmentation, which combines Support Vector Machine classification using multispectral intensities and textures with subsequent hierarchical regularization based on Conditional Random Fields. The CRF regularization introduces spatial constraints to the powerful SVM classification, which assumes voxels to be independent of their neighbors. The approach first separates healthy and tumor tissue before both regions are subclassified, in a novel hierarchical way, into cerebrospinal fluid, white matter, and gray matter, and into necrotic, active, and edema regions, respectively. The hierarchical approach adds robustness and speed by allowing different levels of regularization to be applied at different stages. The method is fast and tailored to standard clinical acquisition protocols. It was assessed on 10 multispectral patient datasets, with results outperforming previous methods in terms of segmentation detail and computation times.

Journal Article•10.1587/TRANSINF.E94.D.1854•

A Short Introduction to Learning to Rank

Hang Li¹•Institutions (1)

Microsoft¹

01 Oct 2011-IEICE Transactions on Information and Systems

TL;DR: Several learning to rank methods using SVM techniques are described in detail, and the fundamental problems, existing approaches, and future work of learning to rank are explained.

Abstract: Learning to rank refers to machine learning techniques for training the model in a ranking task. Learning to rank is useful for many applications in Information Retrieval, Natural Language Processing, and Data Mining. Intensive studies have been conducted on the problem and significant progress has been made [1], [2]. This short paper gives an introduction to learning to rank, and it specifically explains the fundamental problems, existing approaches, and future work of learning to rank. Several learning to rank methods using SVM techniques are described in detail.

Journal Article•10.1016/J.ESWA.2011.01.120•

A support vector machine classifier with rough set-based feature selection for breast cancer diagnosis

Huiling Chen¹, Bo Yang¹, Jie Liu¹, Dayou Liu¹•Institutions (1)

Jilin University¹

01 Jul 2011-Expert Systems With Applications

TL;DR: Experimental results demonstrate the proposed rough set based supporting vector machine classifier (RS_SVM) can not only achieve very high classification accuracy but also detect a combination of five informative features, which can give an important clue to the physicians for breast diagnosis.

Abstract: Breast cancer is becoming a leading cause of death among women worldwide. Meanwhile, it has been confirmed that early detection and accurate diagnosis of this disease can ensure long survival of patients. Expert systems and machine learning techniques are gaining popularity in this field because of their effective classification and high diagnostic capability. In this paper, a rough set (RS) based supporting vector machine classifier (RS_SVM) is proposed for breast cancer diagnosis. In the proposed method (RS_SVM), an RS reduction algorithm is employed as a feature selection tool to remove the redundant features and further improve the diagnostic accuracy of the SVM. The effectiveness of RS_SVM is examined on the Wisconsin Breast Cancer Dataset (WBCD) using classification accuracy, sensitivity, specificity, confusion matrix and receiver operating characteristic (ROC) curves. Experimental results demonstrate that the proposed RS_SVM can not only achieve very high classification accuracy but also detect a combination of five informative features, which can give an important clue to physicians for breast diagnosis.

Proceedings Article•

Support Vector Machines Under Adversarial Label Noise

Battista Biggio¹, Blaine Nelson², Pavel Laskov²•Institutions (2)

University of Cagliari¹, University of Tübingen²

17 Nov 2011

TL;DR: This paper assumes that the adversary has control over some training data, and aims to subvert the SVM learning process, and proposes a strategy to improve the robustness of SVMs to training data manipulation based on a simple kernel matrix correction.

Abstract: In adversarial classification tasks like spam filtering and intrusion detection, malicious adversaries may manipulate data to thwart the outcome of an automatic analysis. Thus, besides achieving good classification performance, machine learning algorithms have to be robust against adversarial data manipulation to successfully operate in these tasks. While support vector machines (SVMs) have been shown to be a very successful approach in classification problems, their effectiveness in adversarial classification tasks has not been extensively investigated yet. In this paper we present a preliminary investigation of the robustness of SVMs against adversarial data manipulation. In particular, we assume that the adversary has control over some training data, and aims to subvert the SVM learning process. Within this assumption, we show that this is indeed possible, and propose a strategy to improve the robustness of SVMs to training data manipulation based on a simple kernel matrix correction.

Journal Article•10.1016/J.ENCONMAN.2010.11.007•

Fine tuning support vector machines for short-term wind speed forecasting

Junyi Zhou¹, Jing Shi¹, Gong Li¹•Institutions (1)

North Dakota State University¹

01 Apr 2011-Energy Conversion and Management

TL;DR: For the first time, a systematic study on fine tuning of LS-SVM model parameters for one-step ahead wind speed forecasting is presented and it is found that they can outperform the persistence model in the majority of cases.

Proceedings Article•10.1109/NER.2011.5910636•

EEG-based emotion recognition during watching movies

Dan Nie¹, Xiao-Wei Wang¹, Li-Chen Shi¹, Bao-Liang Lu¹•Institutions (1)

Shanghai Jiao Tong University¹

23 Jun 2011

TL;DR: This study extracted features from original EEG data and used a linear dynamic system approach to smooth these features and a manifold model was applied to find the trajectory of emotion changes.

Abstract: This study aims at finding the relationship between EEG signals and human emotions. EEG signals are used to classify two kinds of emotions, positive and negative. First, we extracted features from original EEG data and used a linear dynamic system approach to smooth these features. An average test accuracy of 87.53% was obtained by using all of the features together with a support vector machine. Next, we reduced the dimension of features through correlation coefficients. The top 100 and top 50 subject-independent features were achieved, with average test accuracies of 89.22% and 84.94%, respectively. Finally, a manifold model was applied to find the trajectory of emotion changes.

Journal Article•10.1016/J.ESWA.2011.03.063•

Intelligent prognostics for battery health monitoring based on sample entropy

Achmad Widodo¹, Min-Chan Shim², Wahyu Caesarendra², Bo-Suk Yang²•Institutions (2)

Diponegoro University¹, Pukyong National University²

01 Sep 2011-Expert Systems With Applications

TL;DR: RVM outperforms SVM-based battery health prognostics, and SampEn and estimated state of health (SOH) are employed as data input and target vector of learning algorithms, respectively.

Abstract: In this paper, an intelligent prognostic for battery health based on the sample entropy (SampEn) feature of discharge voltage is proposed. SampEn can provide a computational means for assessing the predictability of a time series and can also quantify the regularity of a data sequence. Therefore, when it is applied to battery discharge voltage data, it could serve as an indicator of battery health. In this work, the intelligent ability is introduced by utilizing machine learning methods, namely support vector machine (SVM) and relevance vector machine (RVM). SampEn and estimated state of health (SOH) are employed as data input and target vector of learning algorithms, respectively. The results show that the proposed method is plausible due to the good performance of SVM and RVM in SOH prediction. In our study, RVM outperforms SVM-based battery health prognostics.
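
For reference, sample entropy can be computed with a short routine like the one below. This is a minimal sketch of a common textbook definition, not the paper's code: the function name, the use of an absolute tolerance `r` (often set to a fraction of the series' standard deviation), and the inclusive comparison `<= r` are assumptions.

```python
from math import log


def sample_entropy(series, m=2, r=0.2):
    """SampEn = -ln(A / B), where B counts pairs of length-m templates
    within Chebyshev distance r and A counts the same for length m + 1.
    Self-matches are excluded; r is an absolute tolerance here.
    """
    n = len(series)

    def count_matches(length):
        # Use the first n - m start positions for both template lengths
        # so that A and B are counted over the same set of pairs.
        templates = [series[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                dist = max(abs(a - b)
                           for a, b in zip(templates[i], templates[j]))
                if dist <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # undefined when no matches are found
    return -log(a / b)
```

A perfectly regular sequence gives a value of zero, while less predictable sequences give larger values, which is why the measure can track changes in a discharge voltage curve.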
