1. What is event data in the context of this study?
Event data in this study refers to a collection of data items, each containing at least a timestamp and a failure/event code. It is typically collected to record a change at a point in time and can be gathered from multiple sources, sensors, or devices. The on-board diagnostic system in the DMU trains monitors various vehicle systems and logs an event when specific criteria are met, such as a threshold being crossed or a change being detected. The event data used in this study is sparse: events are not logged continuously for each sensor, which adds complexity to the analysis. The dataset contains 14,483,278 records and 379 features related to the operating functions of a DMU passenger train, and the study focuses on engine failure prediction.
2. How was bias removed from engine failure data?
To remove bias from engine failure data, genuine ESBT events were identified and false ESBT events were removed. An ESBT event was classified as false if it occurred at the servicing depot, if the train was not in motion, or if it was one of several ESBT events recorded in quick succession. A geofencing polygon was constructed around the servicing depot to identify events logged there, and the haversine distance was used to determine whether the train was in motion during an ESBT event. Additionally, values outside the sensor range were replaced with 'null'. This preprocessing removed bias from the data before training; left in place, that bias could have resulted in a high incidence of false positives.
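The three exclusion rules can be sketched in a few lines of Python. This is a minimal illustration, not the study's implementation: the event layout (`time`, `lat`, `lon`), the function names, and the thresholds `min_move_km` and `min_gap_s` are all assumptions, since the summary does not state the values used.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_polygon(lat, lon, polygon):
    """Ray-casting test; polygon is a list of (lat, lon) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        lat_a, lon_a = polygon[i]
        lat_b, lon_b = polygon[(i + 1) % n]
        # Does this edge straddle the point's longitude?
        if (lon_a > lon) != (lon_b > lon):
            lat_cross = lat_a + (lon - lon_a) * (lat_b - lat_a) / (lon_b - lon_a)
            if lat < lat_cross:
                inside = not inside
    return inside


def is_false_esbt(event, prev_fix, depot_polygon, prev_esbt_time=None,
                  min_move_km=0.05, min_gap_s=600):
    """Apply the three exclusion rules; both thresholds are hypothetical."""
    lat, lon = event["lat"], event["lon"]
    if in_polygon(lat, lon, depot_polygon):
        return True  # logged inside the depot geofence
    if haversine_km(prev_fix[0], prev_fix[1], lat, lon) < min_move_km:
        return True  # position unchanged since the previous fix: not in motion
    if prev_esbt_time is not None and event["time"] - prev_esbt_time < min_gap_s:
        return True  # repeat of an ESBT event logged shortly before
    return False
```

Here `prev_fix` is the train's previous recorded position, so a near-zero haversine distance is taken to mean the train was stationary. A small geofence is treated as planar in latitude and longitude, which is a reasonable approximation at depot scale.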
3. How are positive examples extracted from ESBT events?
Positive examples are extracted by identifying genuine ESBT events after removing known false positives. For each unique DMU engine, the first ESBT event is found and labelled T 0h. Data from the 3 h before it (T -3h to T 0h) is filtered and divided into 15 min intervals. If no other ESBT event is found between T -3h and T 0h, the mean and standard deviation of the features in each interval are calculated. If a second ESBT event is found within the 3 h window, only the intervals between the two events are used. The resulting instances of mean and standard deviation values are labelled as positive examples, each representing an ESBT event.
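A minimal sketch of this windowing follows, under several assumptions: timestamps are integer seconds, each record is a `(timestamp, {feature: value})` pair, null readings are skipped, and the population standard deviation is used (the summary does not say which variant the study chose). The function names and the reading of the second-event rule (keep only whole intervals that start at or after the other event) are illustrative.

```python
from statistics import mean, pstdev

WINDOW_S = 3 * 3600    # 3 h look-back before the ESBT event
INTERVAL_S = 15 * 60   # 15 min intervals


def summarise(chunk):
    """Per-feature mean and standard deviation, ignoring null readings."""
    row = {}
    for name in sorted({name for feats in chunk for name in feats}):
        vals = [f[name] for f in chunk if f.get(name) is not None]
        if vals:
            row[name + "_mean"] = mean(vals)
            row[name + "_std"] = pstdev(vals)
    return row


def positive_examples(records, esbt_times, t0):
    """Build labelled rows for the 3 h before the ESBT event at time t0."""
    # If another ESBT event falls inside the window, start from that event.
    earlier = [t for t in esbt_times if t0 - WINDOW_S <= t < t0]
    cutoff = max(earlier) if earlier else t0 - WINDOW_S
    rows = []
    for start in range(t0 - WINDOW_S, t0, INTERVAL_S):
        if start < cutoff:
            continue
        chunk = [f for ts, f in records if start <= ts < start + INTERVAL_S]
        row = summarise(chunk)
        if row:  # sparse data: skip intervals with no usable readings
            row.update({"start": start, "label": 1})
            rows.append(row)
    return rows
```

With one reading per minute and no other event in the window, this yields twelve rows per ESBT event; intervals with no logged data produce no row, which matters because the event data is sparse.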
4. How are negative examples created in the study?
Negative examples are created by selecting a random point in the timeline of data and checking for ESBT events within a 6-hour window. If no ESBT event is found, the data between -3 hours and 0 hours is divided into 15-minute intervals, and the mean and standard deviation are calculated for each interval. This is repeated for an arbitrarily chosen 3 iterations per unique unit before moving on to the next unit. The resulting instances are labelled as negative examples, representing no occurrence of an ESBT event.
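The sampling step might look like the sketch below. It assumes integer-second timestamps, takes the 6-hour window to be centred on the random point (3 h either side), and reads "3 iterations" as three random draws per unit, so a unit can contribute fewer than three negatives. None of these details is confirmed by the summary, and the function name is hypothetical.

```python
import random

HALF_WINDOW_S = 3 * 3600  # assumed: 6 h check window centred on the sampled point


def sample_negative_anchors(t_min, t_max, esbt_times, n_iterations=3, rng=None):
    """Draw random reference times for one unit and keep the ESBT-free ones."""
    rng = rng or random.Random()
    lo = t_min + HALF_WINDOW_S  # leave room for the 3 h look-back
    if lo > t_max:
        return []
    anchors = []
    for _ in range(n_iterations):
        t0 = rng.randint(lo, t_max)
        # Reject the draw if any ESBT event lies within 3 h of it.
        if any(abs(t - t0) <= HALF_WINDOW_S for t in esbt_times):
            continue
        anchors.append(t0)
    return anchors
```

Each returned time plays the role of the 0-hour point: the 3 hours before it would then be summarised into 15-minute mean and standard deviation rows and labelled as negative.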