Long-Term-Based Road Blackspot Screening Procedures by Machine Learning Algorithms
Nicholas Fiorentini, Massimo Losa +1 more
TL;DR: A road blackspot screening procedure for two-lane rural roads that relies on five machine learning algorithms (MLAs) and real long-term traffic data; the Random Forest outperforms the other MLAs, with an overall accuracy of 73.53%.
Abstract: Screening procedures for road blackspot detection are essential tools that let road authorities quickly gather insights into the safety level of each road site they manage. This paper proposes a road blackspot screening procedure for two-lane rural roads that relies on five machine learning algorithms (MLAs) and real long-term traffic data. The network analyzed is the one managed by the Tuscany Region Road Administration, which consists mainly of two-lane rural roads. A total of 995 road sites where at least one accident occurred in 2012–2016 have been labeled as “Accident Case”. Accordingly, an equal number of sites where no accident occurred in the same period have been randomly selected and labeled as “Non-Accident Case”. Five MLAs, namely Logistic Regression, Classification and Regression Tree, Random Forest, K-Nearest Neighbor, and Naive Bayes, have been trained and validated. The output response of the MLAs, i.e., crash occurrence susceptibility, is a binary categorical variable. Therefore, the algorithms aim to classify a road site as either likely safe (“Non-Accident Case”) or potentially susceptible to an accident occurrence (“Accident Case”) over five years. Finally, the algorithms have been compared using a set of performance metrics, including precision, recall, F1-score, overall accuracy, the confusion matrix, and the Area Under the Receiver Operating Characteristic curve (AUC). Outcomes show that the Random Forest outperforms the other MLAs, with an overall accuracy of 73.53%. Furthermore, none of the MLAs shows overfitting issues. Road authorities could consider MLAs to draw up a priority list of on-site inspections and maintenance interventions.
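Apart from the AUC, the performance metrics listed in the abstract all follow from the four confusion-matrix counts. The sketch below is a minimal illustration, not the authors' code; it assumes an encoding of 1 for “Accident Case” and 0 for “Non-Accident Case”, and the function name `binary_metrics` is hypothetical.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix counts and derived scores for a binary classifier.

    Assumed encoding (illustrative only): 1 = "Accident Case", 0 = "Non-Accident Case".
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard every ratio against an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true) if len(y_true) else 0.0
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}
```

For example, `binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])` finds two true positives, two true negatives, one false positive and one false negative, so precision, recall, F1-score and overall accuracy all equal 2/3.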
Citations
Surface motion prediction and mapping for road infrastructures management by PS-InSAR measurements and machine learning algorithms
TL;DR: Persistent Scatterer Interferometric Synthetic Aperture Radar measurements, geospatial analyses, and Machine Learning Algorithms (MLAs) are employed for predicting and mapping surface motion beneath road pavement structures caused by environmental factors.
Review on Lane Detection and Tracking Algorithms of Advanced Driver Assistance System
TL;DR: In this article, a comparative qualitative analysis of lane detection and tracking algorithms is performed to identify gaps in knowledge and to evaluate the performance of these algorithms on real-time data sets.
Overfitting Prevention in Accident Prediction Models: Bayesian Regularization of Artificial Neural Networks
TL;DR: Results demonstrate that the BR-ANN markedly outperforms the GD-ANN, which suffers from severe overfitting; road authorities could consider regularized ANNs for performing appropriate safety analyses, especially when dealing with small road sample sizes.
Can Machine Learning and PS-InSAR Reliably Stand in for Road Profilometric Surveys?
TL;DR: In this article, the authors propose a methodology for correlating products derived from Synthetic Aperture Radar (SAR) measurements with laser profilometric road roughness surveys. It builds on two previous studies in which several machine learning algorithms were calibrated to predict the average vertical displacement (in mm/year) of road pavements caused by exogenous phenomena such as subsidence.
Defining machine learning algorithms as accident prediction models for Italian two-lane rural, suburban, and urban roads
TL;DR: The computation of Predictor Importance shows that traffic flow, the density of intersections, driveway density, and type of area are the factors with the greatest impact on crash likelihood.
References
Random Forests
Leo Breiman
01 Oct 2001
TL;DR: Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.
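As a rough illustration of the two sources of randomness in a random forest (a bootstrap sample per tree and a random subset of candidate features), the sketch below builds a forest of depth-one trees (decision stumps) and classifies by majority vote. This is a deliberate simplification, not Breiman's full algorithm: his trees are grown deep and unpruned, and the internal (out-of-bag) error estimates mentioned above are omitted. Because a stump has a single node, drawing features once per tree here coincides with drawing them per node. All names and defaults are hypothetical.

```python
import random
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Row = Sequence[float]
# A stump is (feature index, threshold, label if value <= threshold, label otherwise).
Stump = Tuple[int, float, int, int]


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels: Sequence[int]) -> int:
    """Most frequent label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X: List[Row], y: List[int], features: Sequence[int]) -> Stump:
    """Pick the single split with the lowest weighted Gini impurity."""
    best: Optional[Tuple[float, Stump]] = None
    for f in features:
        for thr in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            if not left or not right:
                continue  # this threshold does not split the sample
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, (f, thr, majority(left), majority(right)))
    if best is None:
        # No candidate feature splits the sample: predict its majority class.
        label = majority(y)
        return (features[0], float("inf"), label, label)
    return best[1]


def predict_stump(stump: Stump, row: Row) -> int:
    f, thr, left_label, right_label = stump
    return left_label if row[f] <= thr else right_label


def fit_forest(X: List[Row], y: List[int], n_trees: int = 25,
               max_features: int = 1, seed: int = 0) -> List[Stump]:
    """Fit each stump on a bootstrap sample using a random subset of features."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest: List[Stump] = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feats = rng.sample(range(d), max_features)  # random candidate features
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_forest(forest: List[Stump], row: Row) -> int:
    """Majority vote over all stumps."""
    return majority([predict_stump(s, row) for s in forest])
```

The stdlib-only loops are slow on large data sets; the point is only to show how bagging and random feature selection combine before the vote.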
The meaning and use of the area under a receiver operating characteristic (ROC) curve.
TL;DR: A representation and interpretation of the area under a receiver operating characteristic (ROC) curve obtained by the "rating" method, or by mathematical predictions based on patient characteristics, is presented, and it is shown that in such a setting the area represents the probability that a randomly chosen diseased subject is (correctly) rated or ranked with greater suspicion than a randomly chosen non-diseased subject.
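The probabilistic reading of the AUC given above can be computed directly by comparing every positive example with every negative one. A minimal sketch, assuming label 1 marks the positive class and counting tied scores as one half (the usual Mann-Whitney convention); the function name is hypothetical, and the pairwise loop is O(P·N), so it suits small samples only.

```python
from typing import Sequence


def auc_by_ranking(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of a random positive > score of a random negative).

    Positives are label 1, negatives label 0; tied scores count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # pair ranked correctly
            elif p == n:
                wins += 0.5  # tie: half credit
    return wins / (len(pos) * len(neg))
```

With scores `[0.9, 0.8, 0.4, 0.7, 0.3, 0.2]` and labels `[1, 1, 1, 0, 0, 0]`, eight of the nine positive-negative pairs are ordered correctly, so the function returns 8/9 (about 0.889).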
The WEKA data mining software: an update
TL;DR: This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.
Nearest neighbor pattern classification
Thomas M. Cover, Peter E. Hart +1 more
TL;DR: The nearest neighbor decision rule assigns to an unclassified sample point the classification of the nearest of a set of previously classified points, so it may be said that half the classification information in an infinite sample set is contained in the nearest neighbor.
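The nearest neighbor rule is short enough to state in code. The sketch below assumes Euclidean distance and generalizes the rule to a majority vote among the k closest points, which is the K-Nearest Neighbor classifier listed in the abstract; with `k=1` it reduces to the rule described above. The function name, the label strings and the distance choice are illustrative assumptions.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# (feature vector, class label); the label strings are illustrative.
Example = Tuple[Sequence[float], str]


def knn_classify(train: List[Example], x: Sequence[float], k: int = 1) -> str:
    """Label x by majority vote among its k nearest training points (Euclidean)."""
    if not train:
        raise ValueError("training set is empty")
    if k < 1:
        raise ValueError("k must be at least 1")
    # sorted() is stable, so equidistant points keep their training-set order.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in neighbours)
    # most_common(1) returns the first label to reach the top count, so ties go
    # to the label whose nearest member comes first in distance order.
    return votes.most_common(1)[0][0]
```

`math.dist` requires Python 3.8 or later; any other metric could be swapped in through the `key` function.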
Proceedings Article
A study of cross-validation and bootstrap for accuracy estimation and model selection
Ron Kohavi
20 Aug 1995
TL;DR: The results indicate that, for real-world datasets similar to the authors', the best method to use for model selection is ten-fold stratified cross-validation, even if computation power allows using more folds.
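Stratification means that each fold keeps roughly the class proportions of the full data set. One simple way to build such folds is sketched below: shuffle each class separately and deal its indices round-robin across the folds. This is an illustrative assumption, not necessarily Kohavi's exact procedure or the one used in the paper above, and the function name is hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def stratified_k_fold(labels: Sequence[int], k: int = 10, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds whose class proportions match the whole set."""
    by_class: Dict[int, List[int]] = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0
    for cls in sorted(by_class):
        idx = by_class[cls]
        rng.shuffle(idx)
        # Deal indices round-robin so every fold gets about len(idx) / k of this
        # class; the running offset spreads leftovers of different classes apart.
        for j, i in enumerate(idx):
            folds[(offset + j) % k].append(i)
        offset += len(idx)
    return folds
```

For the balanced sample described in the abstract (995 “Accident Case” and 995 “Non-Accident Case” sites), `stratified_k_fold([1] * 995 + [0] * 995)` produces ten folds of 199 sites each, with 99 or 100 sites of each class per fold. Each fold then serves once as the validation set while the other nine are used for training.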