New mixed integer fractional programming problem and some multi-objective models for sparse optimization

Question

1. What is the focus of the paper in the Introduction section?

2. What is binary classification?

3. What is the objective of feature selection in SVM?

4. What is the objective of sparse SVM?

Accepted Answer

The Introduction focuses on Feature Selection (FS) in the general setting of sparse optimization. The paper tackles FS by minimizing an objective function that includes the l0 pseudo-norm, which counts the number of nonzero components of a vector. It discusses norms used to promote sparsity in the solution, such as the l1-norm and the polyhedral k-norms. It also applies sparse optimization to SVM classification, proposes a novel model based on the polyhedral k-norm, and introduces a multi-objective reformulation of the feature selection model: the SVM classification models are treated as multi-objective optimization problems, which yields a set of Pareto optimal solutions instead of a single solution.

Accepted Answer

Binary classification is the task of assigning an input vector $x$ to one of two classes, represented by two finite sets A and B. The training set consists of labeled points in $\mathbb{R}^n$, with label +1 for points of A and -1 for points of B. A functional dependency $f(x)$ determines the class membership of a given vector $x$. In linear SVM, the separating hyperplane $P = \{x : x^T w = \gamma\}$ defines two open halfspaces, P1 and P2, which contain most of the points belonging to A and B, respectively. The sets are linearly separable when the convex hulls of A and B are disjoint. Feature selection in SVM aims to suppress as many components of $w$ as possible.
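The classification rule described above can be sketched in a few lines. This is a minimal illustration, assuming that points with x^T w greater than the threshold γ are assigned to A (label +1) and all others to B (label -1). The function name `classify` and the sample values of `w` and `gamma` are hypothetical and not taken from the paper.

```python
def classify(x, w, gamma):
    """Return +1 (set A) if x lies in the open halfspace x^T w > gamma, else -1 (set B)."""
    score = sum(xi * wi for xi, wi in zip(x, w))
    return 1 if score > gamma else -1

# Hypothetical hyperplane in R^3; the zero in w means the second feature is ignored.
w = [2.0, 0.0, -1.0]
gamma = 0.5

labels = [classify(p, w, gamma) for p in ([1.0, 5.0, 0.0], [0.0, -3.0, 1.0])]
print(labels)  # [1, -1]
```

Because the second component of `w` is zero, changing the second feature of a point never changes its label, which is the practical meaning of suppressing a component of w.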

Accepted Answer

The objective of feature selection in SVM is to construct a separating plane that performs well on the training set while using a minimum number of problem features. This is achieved by finding a normal vector to the separating hyperplane with the smallest possible number of nonzero components, which is done by adding a sparsity-enforcing term to the objective function. Feature selection is performed primarily to identify the informative features, and it has become an important issue in machine learning. In the standard SVM model, the objective balances minimizing classification error against maximizing the separation margin, with the trade-off set by the parameter C. The LASSO approach, obtained by replacing the l2-norm with the l1-norm, suppresses as many elements of the vector w as possible.
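As a rough sketch of the norm substitution described above, the two models can be written in one common form, using the variables w, γ, y, z and the data points a_i, b_l that appear in the constraints of model (30). The squared l2-norm and the placement of C are assumptions made here for illustration and may differ from the paper's exact formulation.

```latex
% Standard (l2-norm) SVM, one common form
\min_{w,\gamma,y,z}\ \|w\|_2^2 + C\Big(\sum_{i=1}^{m_1} y_i + \sum_{l=1}^{m_2} z_l\Big)
\quad \text{s.t.}\quad
\begin{cases}
-a_i^T w + \gamma + 1 \le y_i, & i = 1,\ldots,m_1,\\
\ \ b_l^T w - \gamma + 1 \le z_l, & l = 1,\ldots,m_2,\\
\ \ y_i,\ z_l \ge 0.
\end{cases}

% l1-norm (LASSO-type) variant: only the norm term changes, the constraints are the same
\min_{w,\gamma,y,z}\ \|w\|_1 + C\Big(\sum_{i=1}^{m_1} y_i + \sum_{l=1}^{m_2} z_l\Big)
```

Since the l1-norm is piecewise linear with kinks where a component of w is zero, minimizers of the second model tend to have components that are exactly zero, which is the sparsity effect the answer refers to.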

Accepted Answer

The objective of sparse SVM is to control the number of nonzero components of the normal vector to the separating hyperplane while maintaining satisfactory classification accuracy. In sparse optimization terms, feature selection minimizes both the classification error on the training data and the number of nonzero elements of the vector w, that is, its l0 pseudo-norm. This can be written as the parametric program (9), in which the step function s marks which components of w are nonzero. Replacing the step function with the l1-norm gives the simpler model (10), whose solutions exhibit good sparsity properties.

Accepted Answer

The k-norm was introduced by Gaudioso et al. (2020) and Hiriart-Urruty (2022) in the context of feature selection. It is defined as the sum of the k largest components (in modulus) of a vector $x$: $\|x\|_{[k]} = |x_{i_1}| + |x_{i_2}| + \ldots + |x_{i_k}|$, where the components are ordered so that $|x_{i_1}| \ge |x_{i_2}| \ge \ldots \ge |x_{i_n}|$. The k-norm is polyhedral and intermediate between the l1-norm (obtained for k = n) and the l-infinity norm (obtained for k = 1), and it has fundamental properties linking it to sparse optimization problems. In feature selection, the k-norm is used to define a new approach based on sparse optimization: it provides a relaxation of the model, and algorithms are introduced for solving the resulting nonlinear model. The optimization problem combines a k-norm term with the error function, so that the most relevant features are selected for a given model.
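The definition above can be checked on a small vector. The sketch below is illustrative only: the function name `k_norm` and the sample vector are hypothetical. It also shows a known property that makes the k-norm useful for sparsity: the gap between the l1-norm and the k-norm is zero exactly when the vector has at most k nonzero components.

```python
def k_norm(x, k):
    """Polyhedral k-norm: sum of the k largest components of x in modulus."""
    if not 1 <= k <= len(x):
        raise ValueError("k must satisfy 1 <= k <= len(x)")
    return sum(sorted((abs(v) for v in x), reverse=True)[:k])

x = [3.0, 0.0, -4.0, 0.0, 1.0]   # three nonzero components
n = len(x)

norms = [k_norm(x, k) for k in range(1, n + 1)]
l1 = sum(abs(v) for v in x)
gaps = [l1 - v for v in norms]   # zero exactly when x has at most k nonzeros

print(norms)  # [4.0, 7.0, 8.0, 8.0, 8.0]
print(gaps)   # [4.0, 1.0, 0.0, 0.0, 0.0]
```

For k = 1 the value equals the largest modulus (the l-infinity norm), and for k = n it equals the l1-norm. The gap first reaches zero at k = 3, the number of nonzero components of `x`.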

Accepted Answer

A multi-objective optimization problem (MOP) is given as follows (Ehrgott 2005):

$$\min\ f(x) = \big(f_1(x), \ldots, f_p(x)\big) \quad \text{subject to } x \in X, \tag{23}$$

where $X \subseteq \mathbb{R}^n$ and the objective functions $f_k : \mathbb{R}^n \to \mathbb{R}$, $k = 1, \ldots, p$, are continuous. The image of the feasible set $X$ under the objective function mapping $f$ is denoted by $Y = f(X)$. Assuming that at least two objective functions in (23) are conflicting, no single $x \in X$ would generally minimize all the $f_k$ simultaneously. Therefore, Pareto optimality (or Pareto efficiency) comes into play (Ehrgott 2005).
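To make Pareto optimality concrete for a finite set of objective vectors, a point is kept if no other point is at least as good in every objective and strictly better in at least one. The following sketch assumes minimization. The function names and the sample values are made up for illustration and are not results from the paper.

```python
def dominates(u, v):
    """True if u is no worse than v in every objective and strictly better in one."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))

def pareto_front(points):
    """Keep the objective vectors that no other vector dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Made-up objective vectors (f1, f2), for illustration only
Y = [(0.0, 5.0), (1.0, 3.0), (2.0, 3.0), (3.0, 1.0), (1.0, 4.0)]
print(pareto_front(Y))  # [(0.0, 5.0), (1.0, 3.0), (3.0, 1.0)]
```

Here (2.0, 3.0) and (1.0, 4.0) are both dominated by (1.0, 3.0), while the three remaining points represent different trade-offs between the two objectives.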

Accepted Answer

The feature selection model is reformulated as a multi-objective optimization problem (MOP). The reformulation (30) reads:

$$
\begin{aligned}
\min\ & \sum_{i=1}^{m_1} y_i + \sum_{l=1}^{m_2} z_l, \\
\min\ & -\frac{1}{\|x\|_1} \sum_{k=1}^{n} \|x\|_{[k]} \\
\text{subject to}\ & -a_i^T w + \gamma + 1 \le y_i, \quad i = 1, \ldots, m_1, \\
& b_l^T w - \gamma + 1 \le z_l, \quad l = 1, \ldots, m_2, \\
& y_i,\ z_l \ge 0.
\end{aligned}
\tag{30}
$$

This reformulation allows the multi-objective optimization problems to be solved with a modified algorithm based on the epsilon-constraint method, introduced in Sect. 4. The methods presented in Jaggi (2013) and Sivri et al. (2018) can also be used for solving these problems.
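The basic idea of the epsilon-constraint method is to keep one objective, bound the other by a value ε, and sweep ε to trace out different trade-offs. The sketch below applies that textbook idea to a finite set of candidate objective vectors. It is not the paper's modified algorithm, which is not reproduced here. The choice of which objective to bound (here the first, read as classification error) is an assumption, and the candidate values are made up.

```python
def epsilon_constraint(candidates, eps_values):
    """For each eps, minimize f2 subject to f1 <= eps over a finite candidate set.

    candidates: list of (f1, f2) pairs, e.g. (classification error, sparsity measure).
    Ties in f2 are broken by the smaller f1 so that the pick is not dominated.
    """
    picks = []
    for eps in eps_values:
        feasible = [c for c in candidates if c[0] <= eps]
        if not feasible:
            continue  # no candidate meets this error bound
        best = min(feasible, key=lambda c: (c[1], c[0]))
        if best not in picks:
            picks.append(best)
    return picks

# Made-up (f1, f2) values, for illustration only
cands = [(0.0, 5.0), (1.0, 3.0), (2.0, 3.0), (3.0, 1.0), (1.0, 4.0)]
print(epsilon_constraint(cands, [0.0, 1.0, 2.0, 3.0]))
# [(0.0, 5.0), (1.0, 3.0), (3.0, 1.0)]
```

In a continuous model such as (30), each inner minimization would be an optimization problem handed to a solver rather than a search over a list, but the sweep over ε is the same.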

Accepted Answer

In this section, numerical experiments are presented to compare the results of the different models. All test problems are solved as single-objective problems, and two of them are also solved as MOP reformulations. The Global Solve solver from the Global Optimization toolbox in MAPLE v.18.01 is used to solve the test problems. The models are run for C=1 and C=10, but the results for C=1 are not reported for Test Problems 1 to 3 because some models give a nonzero error.

Accepted Answer

In MOP testing, treating the model as a multi-objective optimization problem yields a set of Pareto optimal solutions instead of a single optimal solution, which gives a fuller picture of the trade-offs between the objectives. For example, in Test Problem 5, the l1 model, the l2 model, and the paper's proposed model were each considered as two-objective optimization problems. For the first Pareto solution, with an error value equal to zero, a smaller value of $\|w\|_1$ was achieved than in the single-objective results. For the second Pareto solution, the smallest value of $\|w\|_1$ was obtained, but the error value was nonzero. This illustrates how the multi-objective view exposes the balance between classification error and sparsity and offers a broader range of solutions to choose from.

Most frequently asked questions

1. What is the focus of the paper in the Introduction section?

2. What is binary classification?

3. What is the objective of feature selection in SVM?

4. What is the objective of sparse SVM?

5. What is the k-norm and how is it used in feature selection?

6. What is a Multi-objective optimization problem?

7. How is the multi-objective optimization problem reformulated for feature selection?

8. What numerical experiments are presented in this section?

9. What are the benefits of considering the model as a multi-objective optimization problem in MOP testing?
