Big Data Regression Using Tree Based Segmentation
Rajiv Sambasivan, Sourish Das +1 more
- 24 Jul 2017
TL;DR: A two-step approach to scaling regression to large datasets: a regression tree (CART) first segments the dataset, and a separate regression model is then fit to each segment. The approach can yield models that have good explanatory power as well as good predictive performance.
Abstract: Scaling regression to large datasets is a common problem in many application areas. We propose a two-step approach to this problem. The first step uses a regression tree (CART) to segment the large dataset. The second step develops a suitable regression model for each segment. Since segment sizes are not very large, sophisticated regression techniques can be applied if required. A useful feature of this two-step approach is that it can yield models that have good explanatory power as well as good predictive performance. Ensemble methods like Gradient Boosted Trees can offer excellent predictive performance but may not provide interpretable models. In the experiments reported in this study, the predictive performance of the proposed approach matched that of Gradient Boosted Trees.
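The two steps described in the abstract can be sketched in Python with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `SegmentedRegressor`, the hyperparameter defaults, the synthetic data, and the choice of ordinary linear regression as the per-segment model are all hypothetical stand-ins for whatever "suitable regression model" a given segment calls for.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor


class SegmentedRegressor:
    """Step 1: a CART tree segments the data. Step 2: one model per leaf."""

    def __init__(self, max_leaf_nodes=8, min_samples_leaf=50):
        # Hypothetical defaults; the listing does not state segment sizes.
        self.tree = DecisionTreeRegressor(
            max_leaf_nodes=max_leaf_nodes,
            min_samples_leaf=min_samples_leaf,
            random_state=0,
        )
        self.models = {}

    def fit(self, X, y):
        self.tree.fit(X, y)
        leaf_ids = self.tree.apply(X)  # leaf (segment) index of each row
        for leaf in np.unique(leaf_ids):
            mask = leaf_ids == leaf
            # Assumption: plain linear regression stands in for the
            # per-segment model; any regressor could be used here.
            self.models[leaf] = LinearRegression().fit(X[mask], y[mask])
        return self

    def predict(self, X):
        leaf_ids = self.tree.apply(X)  # route each row to its segment
        y_hat = np.empty(X.shape[0])
        for leaf in np.unique(leaf_ids):
            mask = leaf_ids == leaf
            y_hat[mask] = self.models[leaf].predict(X[mask])
        return y_hat


# Synthetic piecewise-linear data, made up purely for illustration.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(2000, 1))
y = np.where(X[:, 0] < 0, -3.0 * X[:, 0], 2.0 * X[:, 0] + 1.0)
y = y + rng.normal(0.0, 0.05, size=2000)

model = SegmentedRegressor().fit(X, y)
rmse = float(np.sqrt(np.mean((model.predict(X) - y) ** 2)))
print(f"training RMSE: {rmse:.3f}")
```

Because each segment keeps its own small model, the tree's split rules plus the per-segment coefficients remain inspectable, which is the interpretability argument the abstract makes against ensemble methods.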
Citations
Predictors of Turnover Intention in U.S. Federal Government Workforce: Machine Learning Evidence That Perceived Comprehensive HR Practices Predict Turnover Intention
TL;DR: In this article, the authors identify important predictors of turnover intention and characterize subgroups of U.S. federal employees at high risk for turnover intention using data from the 2018 Fed Employee Turnover Survey.
33
Classification and regression using augmented trees
Rajiv Sambasivan, Sourish Das +1 more
TL;DR: An algorithm for regression and classification tasks on big datasets using augmented tree models that are interpretable while being as accurate as ensemble methods such as random forests or gradient boosted trees is presented.
6
Prognostic techniques for aeroengine health assessment and Remaining Useful Life estimation
A. Caricato, A. Ficarella, L. Spada Chiodo +2 more
- 01 Sep 2021
TL;DR: In this paper, remaining useful life (RUL) is estimated for different turbofan engines, based on historical individual and fleet data made available by the Prognostics Center of Excellence at NASA.
A Bayesian perspective of statistical machine learning for big data
TL;DR: This paper reviews statistical machine learning (SML) from a Bayesian decision-theoretic point of view, arguing that many SML techniques are closely connected to making inference under the so-called Bayesian paradigm.
Machine Learning Models for the Seasonal Forecast of Winter Surface Air Temperature in North America
TL;DR: The results of this study suggest that the ML models may provide improved skill for seasonal forecasts of the winter climate in North America (NA).
References
•Journal Article
Scikit-learn: Machine Learning in Python
Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, Edouard Duchesnay +15 more
TL;DR: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems, focusing on bringing machine learning to non-specialists using a general-purpose high-level language.
XGBoost: A Scalable Tree Boosting System
Tianqi Chen, Carlos Guestrin +1 more
TL;DR: This paper proposes a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning and provides insights on cache access patterns, data compression and sharding to build a scalable tree boosting system called XGBoost.
•Book
The Elements of Statistical Learning
Trevor Hastie, Robert Tibshirani, Jerome H. Friedman +2 more
- 01 Jan 2001
29.4K
•Book
Classification and regression trees
Leo Breiman
- 01 Jan 1983
TL;DR: This monograph focuses on the methodology used to construct tree-structured rules, covering the use of trees as a data analysis method and, in a more mathematical framework, proving some of their fundamental properties.
22.7K