Supervised dimensionality reduction for big data.
Joshua T. Vogelstein, Eric W. Bridgeford, Minh Tang, Da Zheng, Christopher Douville, Randal Burns, Mauro Maggioni
TL;DR: Linear Optimal Low-Rank Projection (LOL) extends principal component analysis (PCA) by incorporating class-conditional moment estimates into the low-dimensional projection.
Abstract: To solve key biomedical problems, experimentalists now routinely measure millions or billions of features (dimensions) per sample, with the hope that data science techniques will be able to build accurate data-driven inferences. Because sample sizes are typically orders of magnitude smaller than the dimensionality of these data, valid inferences require finding a low-dimensional representation that preserves the discriminating information (e.g., whether the individual suffers from a particular disease). There is a lack of interpretable supervised dimensionality reduction methods that scale to millions of dimensions with strong statistical theoretical guarantees. We introduce an approach to extending principal components analysis by incorporating class-conditional moment estimates into the low-dimensional projection. The simplest version, Linear Optimal Low-Rank Projection, incorporates the class-conditional means. We prove, and substantiate with both synthetic and real data benchmarks, that Linear Optimal Low-Rank Projection and its generalizations lead to improved data representations for subsequent classification, while maintaining computational efficiency and scalability. Using multiple brain imaging datasets consisting of more than 150 million features, and several genomics datasets with more than 500,000 features, Linear Optimal Low-Rank Projection outperforms other scalable linear dimensionality reduction techniques in terms of accuracy, while requiring only a few minutes on a standard desktop computer.
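The abstract describes the simplest version of Linear Optimal Low-Rank Projection as principal components analysis augmented with the class-conditional means. The sketch below is our reading of that idea, not the authors' reference implementation. It assumes that the mean-difference directions are taken relative to the first class, that the remaining d - (K - 1) directions come from PCA of the class-centered data, and that the stacked basis is orthonormalized with a QR decomposition. The names `lol_fit` and `nearest_centroid_predict`, the toy data, and the nearest-centroid classifier (a stand-in for whatever classifier follows the projection) are all hypothetical.

```python
import numpy as np


def lol_fit(X, y, d):
    """Sketch of a LOL-style projection (assumed reading of the abstract):
    class-mean differences plus class-conditional PCA directions, orthonormalized.

    X: (n, p) data, y: (n,) class labels, d: target dimension.
    Returns W with shape (p, d) and orthonormal columns.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    K = len(classes)
    n, p = X.shape
    if not (K - 1 <= d <= min(n, p)):
        raise ValueError("need K - 1 <= d <= min(n, p)")

    # Class-conditional means, one row per class: shape (K, p).
    means = np.stack([X[y == c].mean(axis=0) for c in classes])

    # Mean-difference directions (each class mean minus the first): shape (p, K - 1).
    delta = (means[1:] - means[0]).T

    # Subtract each sample's own class mean, then keep the leading right
    # singular vectors (PCA of the class-centered data) for the remaining slots.
    Xc = X - means[np.searchsorted(classes, y)]
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    U = Vt[: d - (K - 1)].T

    # Stack [delta, U] and orthonormalize with a reduced QR decomposition.
    W, _ = np.linalg.qr(np.hstack([delta, U]))
    return W


def nearest_centroid_predict(Z_train, y_train, Z_test):
    # Hypothetical stand-in classifier: assign each point to the closest class centroid.
    classes = np.unique(y_train)
    centroids = np.stack([Z_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((Z_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]


# Toy data with p >> n; the class signal lives on a single coordinate.
rng = np.random.default_rng(0)
n, p = 200, 1000
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, p))
X[y == 1, 0] += 3.0

W = lol_fit(X, y, d=5)
Z = X @ W  # project the 1000-dimensional samples down to 5 dimensions
acc = (nearest_centroid_predict(Z, y, Z) == y).mean()
print(Z.shape, round(float(acc), 3))
```

The accuracy printed here is measured on the training data, so it only checks that the projection retains the class signal; it is not a held-out estimate. A full SVD is used for clarity; at the dimensionalities quoted in the abstract, a truncated or randomized SVD would presumably be needed instead.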
Citations
2D Materials in Flexible Electronics: Recent Advances and Future Prospectives.
Ajit Kumar Katiyar, Anh Tuan Hoang, Duo Xu, Juyeong Hong, Beom Jin Kim, Seunghyeon Ji, Jong-Hyun Ahn
TL;DR: 2D materials are highly promising for flexible electronics due to their unique properties and compatibility with other materials. They enable the creation of various flexible electronic devices, including wearable electronics and foldable displays.
Artificial intelligence (AI) and machine learning (ML) in precision oncology: a review on enhancing discoverability through multiomics integration.
Lise Wei, D. Niraula, Evan D H Gates, Jie Fu, Yi Luo, Matthew J. Nyflot, Stephen R Bowen, Issam El Naqa, Sunan Cui
TL;DR: The review presents different categories of multiomics data and their roles in diagnosis and therapy, and illustrates AI-based data fusion and modeling methods as well as different validation schemes.
First fully-automated AI/ML virtual screening cascade implemented at a drug discovery centre in Africa
Gemma Turon, Jason Hlozek, John G Woodland, Kelly Chibale, Miquel Duran-Frigola
TL;DR: ZairaChem, an artificial intelligence (AI)- and machine learning (ML)-based tool for training small-molecule activity prediction models, is presented; the authors show that computationally profiling compounds before synthesis and experimental testing can increase the rate of progression by up to 40%.
Large-scale chemoproteomics expedites ligand discovery and predicts ligand behavior in cells
Fabian Offensperger, Gary Tin, Miquel Duran-Frigola, Elisa Hahn, Sarah Dobner, Christopher W. am Ende, Joseph W. Strohbach, Andrea Rukavina, Vincenth Brennsteiner, Kevin Ogilvie, Nara Marella, Katharina Kladnik, Rodolfo Ciuffa, Jaimeen D. Majmudar, S. D. Field, Ariel Bensimon, Luca Ferrari, Evandro Ferrada, Amanda Ng, Zhechun Zhang, Gianluca Degliesposti, A. Boeszoermenyi, Sascha Martens, Robert Stanton, André C. Müller, J. Thomas Hannich, David Hepworth, Giulio Superti-Furga, Stefan Kubicek, Monica Schenone, Georg E. Winter
TL;DR: The proteome-wide binding preferences of more than 400 small-molecule fragments are determined with a chemoproteomics strategy based on treating intact cells, and an ML framework is developed to build models that predict how fragments interact with native proteins on a proteome-wide scale.
A review on Alzheimer’s disease classification from normal controls and mild cognitive impairment using structural MR images
TL;DR: This paper reviews structural MRI-based studies for AD detection and compares the performance of various feature extraction methods, observing that wavelet transform-based feature extraction gives promising results for AD classification.