Supersparse linear integer models for predictive scoring systems

Open Access • Proceedings Article


- 01 Jan 2013

- pp 128-130

7

TL;DR: Supersparse Linear Integer Models (SLIM) produces scoring systems that are accurate and interpretable using a mixed-integer program (MIP) whose objective penalizes the training error, L0-norm and L1-norm of its coefficients.

Abstract: Scoring systems are classification models that make predictions using a sparse linear combination of variables with integer coefficients. Such systems are frequently used in medicine because they are interpretable; that is, they only require users to add, subtract and multiply a few meaningful numbers in order to make a prediction. See, for instance, these commonly used scoring systems: (Gage et al. 2001; Le Gall et al. 1984; Le Gall, Lemeshow, and Saulnier 1993; Knaus et al. 1985). Scoring systems strike a delicate balance between accuracy and interpretability that is difficult to replicate with existing machine learning algorithms. Current linear methods such as the lasso, elastic net and LARS are not designed to create scoring systems, since regularization is primarily used to improve accuracy as opposed to sparsity and interpretability (Tibshirani 1996; Zou and Hastie 2005; Efron et al. 2004). These methods can produce very sparse models through heavy regularization or feature selection methods (Guyon and Elisseeff 2003); however, feature selection often relies on greedy optimization and cannot guarantee an optimal balance between sparsity and accuracy. Moreover, the interpretability of scoring systems requires integer coefficients, which these methods do not produce. Existing approaches to interpretable modeling include decision trees and lists (Ruping 2006; Quinlan 1986; Rivest 1987; Letham et al. 2013). We introduce a formal approach for creating scoring systems, called Supersparse Linear Integer Models (SLIM). SLIM produces scoring systems that are accurate and interpretable using a mixed-integer program (MIP) whose objective penalizes the training error, L0-norm and L1-norm of its coefficients. SLIM can create scoring systems for datasets with thousands of training examples and tens to hundreds of features, larger than the sizes of most studies in medicine, where scoring systems are often used.
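To make the objective described in the abstract concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' MIP formulation: the penalty weights `c0` and `eps`, the coefficient bound, the unpenalized intercept, the function names and the toy data are all hypothetical, and exhaustive enumeration over a tiny integer grid stands in for a real mixed-integer solver.

```python
from itertools import product


def predict(coefs, intercept, x):
    """Scoring-system prediction: sign of an integer-weighted sum of features."""
    score = intercept + sum(c * v for c, v in zip(coefs, x))
    return 1 if score > 0 else -1


def slim_style_objective(coefs, intercept, X, y, c0=0.1, eps=0.001):
    """Training error + c0 * L0-norm + eps * L1-norm of the coefficients.

    c0 and eps are assumed trade-off weights; the intercept is left
    unpenalized here, which is also an assumption of this sketch.
    """
    errors = sum(predict(coefs, intercept, x) != label for x, label in zip(X, y))
    l0 = sum(c != 0 for c in coefs)    # number of nonzero coefficients
    l1 = sum(abs(c) for c in coefs)    # sum of absolute coefficient values
    return errors / len(X) + c0 * l0 + eps * l1


def brute_force_fit(X, y, bound=2, c0=0.1, eps=0.001):
    """Enumerate every integer vector in [-bound, bound]; only viable for toy sizes."""
    values = range(-bound, bound + 1)
    best = None
    for cand in product(values, repeat=len(X[0]) + 1):
        intercept, coefs = cand[0], cand[1:]
        obj = slim_style_objective(coefs, intercept, X, y, c0, eps)
        if best is None or obj < best[0]:
            best = (obj, coefs, intercept)
    return best


# Hypothetical toy data: the label depends only on the first binary feature.
X = [(1, 0, 1), (1, 1, 0), (0, 1, 1), (0, 0, 0), (1, 1, 1), (0, 1, 0)]
y = [1, 1, -1, -1, 1, -1]

best_obj, best_coefs, best_intercept = brute_force_fit(X, y)
print(best_coefs, best_intercept)
```

On this toy data the search selects the coefficients `(1, 0, 0)` with intercept `0`: a single small integer weight classifies every example correctly, so the L0 and L1 penalties rule out any denser or larger-valued alternative. Enumeration grows exponentially with the number of features, which is why a problem of the size mentioned in the abstract needs a MIP solver rather than this kind of search.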


Citations

•Journal Article•10.1007/S10994-015-5529-5

Learning classification models of cognitive conditions from subtle behaviors in the digital Clock Drawing Test

William Souillard-Mandar, +8 more

- 01 Mar 2016

- Machine Learning

TL;DR: In this paper, the Clock Drawing Test (CDT) is described as a screening tool used to differentiate normal individuals from those with cognitive impairment; it has proven useful in helping to diagnose cognitive dysfunction associated with neurological disorders such as Alzheimer's disease, Parkinson's disease and other dementias and conditions.


164

•Posted Content

Learning Optimized Risk Scores

Berk Ustun, +1 more

- 01 Oct 2016

- arXiv: Machine Learning

TL;DR: In this article, the authors formulate the risk score problem as a mixed integer nonlinear program, and present a cutting plane algorithm for non-convex settings to efficiently recover its optimal solution.


18

•Journal Article•10.1527/TJSAI.AI30-I

Piecewise sparse linear classification via factorized asymptotic Bayesian inference

Ryohei Fujimaki, +2 more

- 01 Jan 2016

- Transactions of The Japanese Society for...

TL;DR: A refined version of factorized information criterion is derived which offers a better approximation of Bayesian marginal log-likelihood and an analytic quadratic lower bounding technique is introduced in an EM-like iterative optimization process of FAB/HME, which drastically reduces computational cost.


4

•Book Chapter•10.1287/ICS.2015.0017

The Support Vector Machine and Mixed Integer Linear Programming: Ramp Loss SVM with L1-Norm Regularization

Eric J. Hess, +1 more

- 01 Jan 2015

TL;DR: In this paper, the authors combine the ideas of ramp loss SVM with L1-norm regularization, which results in a mixed integer linear program (MILP) formulation of SVM.


4

•Journal Article•10.1109/TPDS.2019.2892972

An Empirical Study on Distributed Bayesian Approximation Inference of Piecewise Sparse Linear Models

Masato Asahara, +1 more

- 01 Jul 2019

- IEEE Transactions on Parallel and Distri...

TL;DR: An empirical study of a distributed factorized asymptotic Bayesian (FAB) inference algorithm for learning piecewise sparse linear models on distributed memory architectures, derived from the original FAB inference algorithm; it achieves high prediction accuracy and performance scalability on both synthetic and public benchmark data.


4

References

•Journal Article•10.1111/J.2517-6161.1996.TB02080.X

Regression Shrinkage and Selection via the Lasso

Robert Tibshirani

- 01 Jan 1996

- Journal of the royal statistical society...

TL;DR: A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant, is proposed.


45.4K

•Journal Article•10.1111/J.1467-9868.2005.00503.X

Regularization and variable selection via the elastic net

Hui Zou, +1 more

- 01 Apr 2005

- Journal of The Royal Statistical Society...

TL;DR: It is shown that the elastic net often outperforms the lasso, while enjoying a similar sparsity of representation, and an algorithm called LARS‐EN is proposed for computing elastic net regularization paths efficiently, much like algorithm LARS does for the lasso.


20.2K

•Journal Article•10.1023/A:1022643204877

Induction of Decision Trees

J. R. Quinlan

- 25 Mar 1986

- Machine Learning

TL;DR: This paper summarizes an approach to synthesizing decision trees that has been used in a variety of systems, describes one such system, ID3, in detail, and discusses a reported shortcoming of the basic algorithm.


18.8K

•Journal Article•10.1097/00003246-198510000-00009

APACHE II: a severity of disease classification system.

William A. Knaus, +3 more

- 01 Oct 1985

- Critical Care Medicine

TL;DR: The form and validation results of APACHE II, a severity of disease classification system that uses a point score based upon initial values of 12 routine physiologic measurements, age, and previous health status, are presented.


15.9K

•Journal Article•10.1162/153244303322753616

An introduction to variable and feature selection

Isabelle Guyon, +1 more

- 01 Mar 2003

- Journal of Machine Learning Research

TL;DR: The contributions of this special issue cover a wide range of aspects of variable selection: providing a better definition of the objective function, feature construction, feature ranking, multivariate feature selection, efficient search methods, and feature validity assessment methods.


15.5K