Journal Article, DOI: 10.1016/J.ESWA.2007.01.029
Random Forests for multiclass classification: Random MultiNomial Logit
Anita Prinzie, Dirk Van den Poel +1 more
169
TL;DR: This paper proposes the Random MultiNomial Logit (RMNL), i.e. a random forest of MNLs, compares its predictive performance to that of (a) MNL with expert feature selection and (b) Random Forests of classification trees, and reports a substantial increase in model accuracy for RMNL over MNL with expert feature selection.
Abstract: Several supervised learning algorithms are suited to classify instances into a multiclass value space. MultiNomial Logit (MNL) is recognized as a robust classifier and is commonly applied within the CRM (Customer Relationship Management) domain. Unfortunately, to date, it is unable to handle the huge feature spaces typical of CRM applications. Hence, the analyst is forced to immerse himself in feature selection. Surprisingly, in sharp contrast with binary logit, current software packages lack any feature-selection algorithm for MultiNomial Logit. Conversely, Random Forests, another algorithm for learning multiclass problems, is robust like MNL but, unlike MNL, easily handles high-dimensional feature spaces. This paper investigates the potential of applying the Random Forests principles to the MNL framework. We propose the Random MultiNomial Logit (RMNL), i.e. a random forest of MNLs, and compare its predictive performance to that of (a) MNL with expert feature selection and (b) Random Forests of classification trees. We illustrate the Random MultiNomial Logit on a cross-sell CRM problem within the home-appliances industry. The results indicate a substantial increase in model accuracy of the RMNL model relative to that of the MNL model with expert feature selection.
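The idea of "a random forest of MNLs" can be illustrated with a minimal sketch. It assumes the usual Random Forests ingredients (a bootstrap sample plus a random feature subset per base model) and combines the base models by averaging their class probabilities; the abstract does not spell out the aggregation rule or any hyperparameters, so the function names (`fit_mnl`, `fit_rmnl`, `predict_rmnl`), the plain gradient-descent fit, and all default values are illustrative assumptions rather than the authors' implementation.

```python
import math
import random


def predict_proba_mnl(W, x):
    """Softmax class probabilities for one instance; last weight is the intercept."""
    scores = [sum(w * v for w, v in zip(row[:-1], x)) + row[-1] for row in W]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fit_mnl(X, y, n_classes, epochs=100, lr=0.1):
    """Fit one multinomial logit by batch gradient descent (assumed optimiser)."""
    n_feat = len(X[0])
    W = [[0.0] * (n_feat + 1) for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * (n_feat + 1) for _ in range(n_classes)]
        for xi, yi in zip(X, y):
            p = predict_proba_mnl(W, xi)
            for k in range(n_classes):
                err = p[k] - (1.0 if k == yi else 0.0)
                for j in range(n_feat):
                    grad[k][j] += err * xi[j]
                grad[k][n_feat] += err  # intercept gradient
        for k in range(n_classes):
            for j in range(n_feat + 1):
                W[k][j] -= lr * grad[k][j] / len(X)
    return W


def fit_rmnl(X, y, n_classes, n_models=15, n_sub=2, seed=0):
    """Ensemble of MNLs, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    ensemble = []
    for _ in range(n_models):
        rows = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feats = rng.sample(range(d), n_sub)          # features without replacement
        Xb = [[X[i][j] for j in feats] for i in rows]
        yb = [y[i] for i in rows]
        ensemble.append((feats, fit_mnl(Xb, yb, n_classes)))
    return ensemble


def predict_rmnl(ensemble, x, n_classes):
    """Average the base models' probabilities (assumed rule) and pick the argmax."""
    avg = [0.0] * n_classes
    for feats, W in ensemble:
        p = predict_proba_mnl(W, [x[j] for j in feats])
        for k in range(n_classes):
            avg[k] += p[k] / len(ensemble)
    return max(range(n_classes), key=lambda k: avg[k]), avg
```

Because each base MNL sees only `n_sub` features, no single model has to be estimated on the full feature space, which is the property the abstract attributes to Random Forests.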
Citations
Machine learning
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Handling class imbalance in customer churn prediction
Jonathan Burez, D Van den Poel +1 more
TL;DR: It is found that under-sampling can lead to improved prediction accuracy, especially when evaluated with AUC, but that there is no need to under-sample until the training set contains as many churners as non-churners.
593
Artificial Intelligence and Machine Learning in Pathology: The Present Landscape of Supervised Methods.
Hooman H. Rashidi, Nam K. Tran, Elham Vali Betts, Lydia P. Howell, Ralph Green +4 more
- 03 Sep 2019
TL;DR: This review provides definitions and basic knowledge of machine learning categories, introduces the underlying concept of the bias-variance trade-off as an important foundation in supervised machine learning, and discusses approaches to the supervised machine learning study design.
Big Data Analytics for Dynamic Energy Management in Smart Grids
TL;DR: In this paper, the authors highlight the big data issues and challenges faced by the dynamic energy management (DEM) employed in smart grid networks and propose a promising direction for future research in the field.
246
A Bayesian network based framework for real-time crash prediction on the basic freeway segments of urban expressways.
TL;DR: This manuscript investigates the major shortcomings of existing real-time crash prediction models and offers solutions to overcome them with an improved framework and modeling method.
234
References
Random Forests
Leo Breiman
- 01 Oct 2001
TL;DR: Internal estimates monitor error, strength, and correlation; these are used to show the response to increasing the number of features used in the forest, and the ideas are also applicable to regression.
Book
Classification and regression trees
Leo Breiman
- 01 Jan 1983
TL;DR: This monograph focuses on the methodology used to construct tree-structured rules, covering the use of trees as a data analysis method and, in a more mathematical framework, proving some of their fundamental properties.
22.7K
Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach.
TL;DR: A nonparametric approach to the analysis of areas under correlated ROC curves is presented, by using the theory on generalized U-statistics to generate an estimated covariance matrix.
20.5K
Bagging predictors
Leo Breiman
- 01 Aug 1996
TL;DR: Tests on real and simulated data sets using classification and regression trees and subset selection in linear regression show that bagging can give substantial gains in accuracy.