Variable Selection for Multi-Purpose Multivariate Data Analysis
Myung-Hoe Huh, Yong-Bin Lim, Yong-Goo Lee +2 more
- 01 Jan 2008
- Vol. 21, Iss: 1, pp 141-149
TL;DR: In this paper, the authors propose a method for selecting a subset of variables from a given set of p input variables, using the criterion of minimum trace of the partial variances of the unselected variables that are not explained by the selected variables.
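Written out, the criterion can be stated as follows. The notation here is an assumption for illustration and is not necessarily the paper's own. Partition the p × p covariance matrix Σ into a block for a candidate selected subset S (index 1) and a block for the unselected variables (index 2). The diagonal of the partial covariance matrix Σ₂₂·₁ holds the partial variances of the unselected variables, so its trace is their total variance left unexplained by S. Here k is an assumed fixed subset size.

```latex
\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix},
\qquad
\Sigma_{22\cdot 1} = \Sigma_{22} - \Sigma_{21}\,\Sigma_{11}^{-1}\,\Sigma_{12},
\qquad
S^{*} = \operatorname*{arg\,min}_{S \subset \{1,\dots,p\},\ |S| = k}
        \operatorname{tr}\bigl(\Sigma_{22\cdot 1}(S)\bigr)
```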
Abstract: Multivariate data sets with a very large number of variables are now analyzed frequently. Such data sets may contain variables that are virtually duplicates of one another, even though they are conceptually distinguishable. Duplicated variables can cause problems such as distorted principal axes in principal component analysis and factor analysis, and distorted distances between observations, which are the input to cluster analysis. In supervised learning and regression analysis, duplicated explanatory variables also often make fitted models unstable. Because real data analyses often serve multiple purposes, it is necessary to reduce the number of variables to a parsimonious level. This paper proposes a practical algorithm for selecting a subset of variables from a given set of p input variables, using the criterion of minimum trace of the partial variances of the unselected variables that remain unexplained by the selected variables. The usefulness of the proposed method is demonstrated in three applications: visualizing the relationship between selected and unselected variables, building a predictive model with a very large number of independent variables, and reducing the number of variables and purging or merging categories in categorical data.
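The selection criterion can be sketched in code as a greedy forward search. This is an assumption: the abstract does not say whether the authors' "practical algorithm" is forward, backward, or exchange-based, so this should not be read as the paper's method. The function names `partial_trace` and `greedy_select` are hypothetical, and the 3 × 3 covariance matrix is illustrative only, not data from the paper. The sketch uses a pseudo-inverse so that exact duplicates in the selected set do not cause a singular-matrix error.

```python
import numpy as np


def partial_trace(cov, selected):
    """Trace of the partial covariance of the unselected variables given the
    selected ones: tr(S22 - S21 S11^+ S12), where S11^+ is the Moore-Penrose
    pseudo-inverse (equal to the ordinary inverse when S11 is nonsingular)."""
    cov = np.asarray(cov, dtype=float)
    sel = list(selected)
    rest = [j for j in range(cov.shape[0]) if j not in sel]
    if not rest:
        return 0.0  # every variable is selected, so nothing is left unexplained
    s22 = cov[np.ix_(rest, rest)]
    if not sel:
        return float(np.trace(s22))  # nothing selected: total variance
    s11 = cov[np.ix_(sel, sel)]
    s21 = cov[np.ix_(rest, sel)]
    # pinv instead of inv, so exact duplicates among the selected set do not raise
    resid = s22 - s21 @ np.linalg.pinv(s11) @ s21.T
    return float(np.trace(resid))


def greedy_select(cov, k):
    """Hypothetical forward search: at each step, add the variable whose
    inclusion leaves the smallest unexplained partial-variance trace."""
    p = np.asarray(cov).shape[0]
    selected = []
    for _ in range(k):
        candidates = [j for j in range(p) if j not in selected]
        best = min(candidates, key=lambda j: partial_trace(cov, selected + [j]))
        selected.append(best)
    return selected


if __name__ == "__main__":
    # Illustrative covariance: variables 0 and 1 are near-duplicates (r = 0.99),
    # and variable 2 is uncorrelated with both.
    cov = np.array([[1.0, 0.99, 0.0],
                    [0.99, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
    print(greedy_select(cov, 2))  # → [0, 2]
```

In this toy case the search keeps only one of the two near-duplicates, because adding the second one barely reduces the unexplained variance. Greedy search only approximates the minimum. An exhaustive search over all subsets of size k is exact, but its cost grows combinatorially with p.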
Citations
Stability of symptom clusters and sentinel symptoms during the first two cycles of adjuvant chemotherapy.
TL;DR: A core set of symptoms that form stable symptom clusters during the first and second cycles of adjuvant chemotherapy (CTx) is identified, which could facilitate efficient symptom management.
Multi-purpose SNP Selection by the principal variables for a genetic study
Seung-Hyun Lee, Mira Park +1 more
- 09 Nov 2015
TL;DR: This work proposes an unsupervised SNP selection algorithm based on the principal variable (PV) method, which achieves dimensionality reduction by selecting a subset of the original variables, called PVs, that preserves as much information as possible.
Principal variable approach to multipurpose SNP selection in genetic association studies
Seung-Hyun Lee, Taesung Park, Mira Park +2 more
- 01 Jan 2016
TL;DR: An unsupervised SNP selection algorithm based on the principal variable approach, called the multipurpose SNP selection (MP-SNP) method, is proposed; it performs well in selecting informative SNPs and also yields well-explained cluster structures.
Principal Variable Approach to Multipurpose SNP Selection in Genetic Association Studies
Seunghyun Lee, Taesung Park, Mira Park +2 more
TL;DR: This study proposes the multipurpose SNP selection (MP-SNP) method, an unsupervised algorithm that selects informative SNPs using the principal variable approach, effectively eliminating redundant SNPs and preserving the original variable structure for joint analysis in GWAS.