Revisiting Feature Selection with Data Complexity

doi:10.1109/BIBE50027.2020.00042

Open Access • Proceedings Article • 10.1109/BIBE50027.2020.00042

Ngan Thi Dong, +1 more

- 01 Oct 2020

- pp 211-216

8

TL;DR: This paper presents a comparative study of feature selection methods on 27 publicly available datasets, evaluated across a range of numbers of selected features with classification as the downstream task, and shows that the performance of all studied feature selection methods is highly correlated with the error rate of a nearest-neighbor-based classifier.
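
The summary above refers to the error rate of a nearest-neighbor-based classifier as a data complexity signal. As a rough illustration only, and not the authors' implementation, the following sketch computes a leave-one-out 1-nearest-neighbor error rate, optionally restricted to a subset of feature indices. The function name `loo_1nn_error`, the `features` parameter, the Euclidean distance, and the toy data are all assumptions made for this example.

```python
import math


def loo_1nn_error(X, y, features=None):
    """Leave-one-out error rate of a 1-nearest-neighbor classifier.

    X: list of equal-length numeric tuples; y: list of class labels.
    features: optional list of column indices to keep (hypothetical helper
    argument for comparing feature subsets); None keeps every column.
    """
    if features is not None:
        X = [tuple(row[k] for k in features) for row in X]
    n = len(X)
    errors = 0
    for i in range(n):
        best_j, best_d = None, math.inf
        for j in range(n):
            if j == i:
                continue  # leave the query point out of its own neighborhood
            d = math.dist(X[i], X[j])  # Euclidean distance (an assumption)
            if d < best_d:  # strict '<' keeps the first neighbor on ties
                best_j, best_d = j, d
        if y[best_j] != y[i]:
            errors += 1
    return errors / n


# Toy data: two well-separated groups plus one point labeled 1 near class 0.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (3.0, 0.0)]
y = [0, 0, 0, 1, 1, 1]

full_error = loo_1nn_error(X, y)
subset_error = loo_1nn_error(X, y, features=[1])
print(full_error, subset_error)
```

Comparing the value on the full feature set with the value on a selected subset is one simple way to relate a selection result to this kind of measure; the paper itself may define and use the measure differently.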

Citations

•Journal Article•10.56553/popets-2023-0041

Private Graph Extraction via Feature Explanations

01 Apr 2023

- Proceedings on Privacy Enhancing Technol...

TL;DR: The authors propose graph reconstruction attacks that use post-hoc feature explanations, and investigate how attack performance differs across three classes of explanation methods: gradient-based, perturbation-based, and surrogate-model-based.

4

•Posted Content•10.1101/2020.08.14.251306

Investigation of Capsule-Inspired Neural Network Approaches for DNA Methylation

Joshua J. Levy, +8 more

- 14 Aug 2020

- bioRxiv

TL;DR: Deep-learning software is presented that groups CpGs into user-specified or predefined biologically relevant groupings related to diagnostic and prognostic outcomes, and that offers opportunities to increase the interpretability of disease mechanisms through biologically relevant annotations.

2

•Posted Content•10.1101/2020.08.14.251306

MethylSPWNet and MethylCapsNet: Biologically Motivated Organization of DNAm Neural Network, Inspired by Capsule Networks

Joshua J. Levy, +11 more

- 22 Apr 2021

- bioRxiv

TL;DR: MethylCapsNet and MethylSPWNet group CpGs into biologically relevant capsules, such as gene promoter context, CpG island relationship, or user-defined groupings, and relate them to diagnostic and prognostic outcomes.

2

•Journal Article•10.11591/eei.v11i5.3698

The impact of training data selection on the software defect prediction performance and data complexity

Benyamin Langgu Sinaga, +3 more

- 01 Oct 2022

- Bulletin of Electrical Engineering and I...

TL;DR: This study compared 13 training data selection methods on 61 projects using six classification algorithms and measured the data complexity using six complexity measures, concluding that critically selecting the training data method could improve the performance of the prediction model.

2

References

Journal Article•10.1111/J.2517-6161.1996.TB02080.X

Regression Shrinkage and Selection via the Lasso

Robert Tibshirani

- 01 Jan 1996

- Journal of the royal statistical society...

TL;DR: A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant, is proposed.

45.4K
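
The lasso summary above describes a constrained least-squares problem. One common way to write it, with notation that is assumed here rather than taken from this page ($N$ observations, $p$ predictors $x_{ij}$, response $y_i$, intercept $\beta_0$, coefficients $\beta_j$, and bound $t \ge 0$), is:

```latex
\hat{\beta}^{\text{lasso}}
  = \operatorname*{arg\,min}_{\beta_0,\,\beta}
    \sum_{i=1}^{N} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j \Bigr)^{2}
  \quad \text{subject to} \quad
  \sum_{j=1}^{p} \lvert \beta_j \rvert \le t .
```

The first term is the residual sum of squares and the constraint bounds the sum of the absolute values of the coefficients by the constant $t$; a sufficiently small $t$ drives some coefficients exactly to zero, which is why the method is used for feature selection.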

•Journal Article•10.1111/J.1467-9868.2005.00503.X

Regularization and variable selection via the elastic net

Hui Zou, +1 more

- 01 Apr 2005

- Journal of The Royal Statistical Society...

TL;DR: It is shown that the elastic net often outperforms the lasso while enjoying a similar sparsity of representation, and an algorithm called LARS-EN is proposed for computing elastic net regularization paths efficiently, much like algorithm LARS does for the lasso.

20.2K

•Journal Article•10.1073/PNAS.191367098

Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications

Therese Sørlie, +16 more

- 11 Sep 2001

- Proceedings of the National Academy of S...

TL;DR: Survival analyses on a subcohort of patients with locally advanced breast cancer uniformly treated in a prospective study showed significantly different outcomes for the patients belonging to the various groups, including a poor prognosis for the basal-like subtype and a significant difference in outcome for the two estrogen receptor-positive groups.

11.7K

•Proceedings Article

A Comparative Study on Feature Selection in Text Categorization

Yiming Yang, +1 more

- 08 Jul 1997

TL;DR: This paper finds strong correlations between the DF, IG, and CHI values of a term and suggests that DF thresholding, the simplest method with the lowest computational cost, can be reliably used instead of IG or CHI when computing these measures is too expensive.

5.6K

•Journal Article•10.1073/PNAS.96.12.6745

Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays.

Uri Alon, +7 more

- 08 Jun 1999

- Proceedings of the National Academy of S...

TL;DR: In this paper, a two-way clustering algorithm was applied to both the genes and the tissues, revealing broad coherent patterns that suggest a high degree of organization underlying gene expression in these tissues.

4.5K

Related Papers (5)

Revisiting Feature Selection with Data Complexity

Feature selection strategies for poorly correlated data: correlation coefficient considered harmful

Feature Selection Methods for Cost-Constrained Classification in Random Forests.

Determining appropriate approaches for using data in feature selection

Evaluating the impact of feature selection consistency in software prediction