A novel higher performance nomogram based on explainable machine learning for predicting mortality risk in stroke patients within 30 days based on clinical features on the first day ICU admission

Question

1. What is the purpose of combining nomogram and machine learning?

2. What database was used for stroke patient data?

3. How does LightGBM contribute to predicting mortality in ICU stroke patients?

4. How were the top 10 variables selected for nomogram development?

Accepted Answer

The purpose of combining a nomogram with machine learning is to develop a clinical prediction nomogram that performs better and is easier to use. The combination leverages the strengths of both approaches: a nomogram gives a visual representation of a complex mathematical formula, while machine learning captures the non-linear relationships found in real-world data. The resulting nomogram predicts the risk of death in stroke patients within 30 days from clinical data available on the first day of ICU admission. The approach aims to improve the accuracy and usability of clinical prediction models and, ultimately, patient care and prognosis.

Accepted Answer

The Medical Information Mart for Intensive Care (MIMIC)-IV database was used for stroke patient data. MIMIC-IV is a contemporary electronic health record dataset that provides clinical data on intensive-care patients admitted between 2008 and 2019. The data were de-identified, and informed consent was waived by the Institutional Review Board at the Beth Israel Deaconess Medical Center. The researchers extracted clinical data of stroke patients from MIMIC-IV using codes from the ninth and tenth editions of the International Classification of Diseases. The inclusion criteria were an age between 18 and 89 years, only one stay_id, and an ICU length of stay of less than 30 days. The extracted data covered age, gender, ethnicity, laboratory measurements, comorbidities, vital signs, and disease severity assessments within the first day of ICU admission. The type of stroke diagnosis was also included as an important prognostic feature. In total, 64 relevant features were extracted.
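The inclusion criteria above can be sketched as a simple filter. This is only an illustration: the `IcuStay` record, its field names, and the example rows are hypothetical, and "only one stay_id" is read here as "the patient has exactly one ICU stay", which is an assumption about the study's intent.

```python
from dataclasses import dataclass


@dataclass
class IcuStay:
    # Hypothetical, simplified record; not the actual MIMIC-IV schema.
    subject_id: int
    stay_id: int
    age: float
    los_days: float  # ICU length of stay in days


def select_cohort(stays):
    """Keep patients aged 18-89 with exactly one ICU stay shorter than 30 days."""
    stays_per_patient = {}
    for s in stays:
        stays_per_patient[s.subject_id] = stays_per_patient.get(s.subject_id, 0) + 1
    return [
        s for s in stays
        if 18 <= s.age <= 89
        and stays_per_patient[s.subject_id] == 1
        and s.los_days < 30
    ]


# Made-up rows, for illustration only.
example = [
    IcuStay(1, 101, 67, 4.2),   # kept
    IcuStay(2, 102, 91, 3.0),   # excluded: age above 89
    IcuStay(3, 103, 55, 2.0),   # excluded: patient has two stays
    IcuStay(3, 104, 55, 6.5),   # excluded: patient has two stays
    IcuStay(4, 105, 40, 45.0),  # excluded: stay of 30 days or more
]
cohort = select_cohort(example)
```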

Accepted Answer

LightGBM, a tree-based ensemble learning algorithm, was used in this study to predict the risk of mortality within 30 days for ICU stroke patients. It offers fast training, high predictive accuracy, and reduced memory usage through Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB). The dataset was divided into training (80%) and testing (20%) sets. The best-performing LightGBM parameters were determined using Bayesian optimization, aiming to maximize the AUC in the test datasets. The optimized model's quality was assessed using 5-fold cross-validation. Additionally, SHapley Additive exPlanations (SHAP) were applied to interpret the model's output. SHAP, derived from coalitional game theory, quantifies the impact of each variable on the model output through SHAP values. SHAP summary plots were used to rank feature importance and select suitable variables, while SHAP partial dependence plots (PDPs) were used to determine the cut-off point for each selected variable. Together, LightGBM and SHAP provide an explainable machine learning approach to predicting mortality risk for ICU stroke patients.
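To make the game-theory idea behind SHAP values concrete, here is a minimal sketch that computes exact Shapley values by enumerating every coalition of features. It is not the algorithm the SHAP library uses for tree models, and it is not the study's code. The toy model, the patient vector, and the all-zero baseline are assumptions chosen so the result can be checked by hand.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values for a coalition value function over n features."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley formula.
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def make_value_fn(model, x, baseline):
    """Features in the coalition take the patient's value, the rest the baseline."""
    def value_fn(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(len(x))]
        return model(z)
    return value_fn


def toy_model(z):
    # Hypothetical model with one interaction term (z[0] * z[2]).
    return 2.0 * z[0] + 1.0 * z[1] + z[0] * z[2]


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(make_value_fn(toy_model, x, baseline), 3)
```

The values sum to the difference between the model output for the patient and for the baseline, and the interaction term is split evenly between the two features that share it.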

Accepted Answer

The top 10 variables were selected based on the SHAP summary plot, prioritizing variables with the highest impact on the model. This selection aimed to simplify the nomogram development process and enhance its clinical applicability. By focusing on these key variables, researchers can create a more streamlined and effective tool for assessing patient outcomes and guiding treatment decisions. The SHAP (SHapley Additive exPlanations) method provides a comprehensive understanding of the contribution of each variable to the model's predictions, allowing for a data-driven selection process. This approach ensures that the most influential factors are considered, leading to a more accurate and reliable nomogram for evaluating stroke patients' risk of mortality.
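A selection of this kind can be sketched by ranking features on their mean absolute SHAP value, which is a common way to order a SHAP summary plot. Whether the study used exactly this ranking rule is an assumption, and the SHAP values below are invented for illustration.

```python
def top_k_features(shap_matrix, feature_names, k):
    """Rank features by mean absolute SHAP value and return the top k names."""
    n_rows = len(shap_matrix)
    importance = {
        name: sum(abs(row[j]) for row in shap_matrix) / n_rows
        for j, name in enumerate(feature_names)
    }
    return sorted(importance, key=importance.get, reverse=True)[:k]


# Made-up SHAP values: one row per patient, one column per feature.
names = ["sofa", "age", "sex"]
shap_matrix = [
    [0.40, -0.10, 0.01],
    [-0.30, 0.20, -0.02],
    [0.50, 0.05, 0.00],
]
selected = top_k_features(shap_matrix, names, k=2)
```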

Accepted Answer

Nomograms for predicting ICU death risk in stroke patients are constructed from logistic regression models. The dependent variable is the 30-day survival status after ICU admission. The nomograms are built from 10 variables, entered either as continuous variables or as dichotomized continuous variables. Discrimination, calibration, and clinical applicability are evaluated using the AUC, the net reclassification improvement (NRI), calibration curves, the Brier score, and decision curve analysis (DCA). The nomograms are compared using the DeLong test, with bootstrap resampling to reduce overfitting.
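As a rough illustration of two pieces mentioned above, the sketch below converts a logistic regression linear predictor into a risk and scores predictions with the Brier score. The intercept, coefficients, and outcomes are hypothetical and are not taken from the study.

```python
from math import exp


def logistic_risk(intercept, coefs, x):
    """Predicted probability from a logistic regression linear predictor."""
    linear_predictor = intercept + sum(b * v for b, v in zip(coefs, x))
    return 1.0 / (1.0 + exp(-linear_predictor))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical model with two dichotomized predictors (1 = risk factor present).
intercept, coefs = -2.0, [1.5, 0.8]
patients = [[1, 1], [0, 0], [1, 0]]
outcomes = [1, 0, 0]  # 1 = died within 30 days (made-up labels)
risks = [logistic_risk(intercept, coefs, x) for x in patients]
score = brier_score(outcomes, risks)
```

A Brier score of 0 means perfect predictions; a model that always predicts 0.5 scores 0.25.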

Accepted Answer

In the study, all variables except sex, mbp_max, ptt_min, ptt_max, potassium_min, chlor_min, and sodium_min differed significantly between stroke patients who survived and those who died. The average length of stay (los) was longer for survivors than for patients who died. Figure S2 showed no strong correlation among the 10 selected variables, and the variance inflation factor (VIF) of each was less than 4, indicating no multicollinearity. Therefore, all 10 selected variables were included in the nomogram construction.

Accepted Answer

The most important variable for predicting stroke patient death risk is 'sofa' (the Sepsis-related Organ Failure Assessment score), as shown in the SHAP summary plot. It ranked highest in importance, and a sofa value greater than 4 had a marked negative impact on survival. The other top variables are 'glucose_min', 'sodium_max', 'age', 'spo2_mean', 'temperature_max', 'heart_rate_max', 'bun_min', 'wbc_min', and 'charlson_comorbidity_index'.

Accepted Answer

The categorical variables differed significantly between stroke survivors and non-survivors, as shown by the chi-square (χ²) test. Kaplan-Meier survival plots showed a significantly lower 30-day overall survival rate for stroke patients in the high-risk subgroup than in the low-risk subgroup. Cox proportional hazards models further confirmed that the low-risk subgroup had lower 30-day mortality than the high-risk subgroup. This indicates that the categorical variables play a crucial role in determining the survival of stroke patients.
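For readers unfamiliar with Kaplan-Meier curves, the following minimal sketch shows how the survival estimate is built up at each observed death time. The follow-up times and event flags are invented, and the function is a plain product-limit estimator rather than the study's analysis code.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time where a death occurred.

    times  : follow-up time per patient (days)
    events : 1 if the patient died at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


# Made-up subgroup: two deaths, three patients censored.
curve = kaplan_meier(times=[5, 10, 10, 30, 30], events=[1, 1, 0, 0, 0])
```

Running the estimator separately on a high-risk and a low-risk subgroup gives the two curves that such a plot compares.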

Accepted Answer

EML-N and UC-N have similar AUC values, and the DeLong test confirmed that the difference between them is not significant. However, EML-N showed a statistically significant improvement over UC-N in predicting 30-day mortality, with a net reclassification improvement (NRI) of 6.37%. This suggests that EML-N classifies stroke patient outcomes somewhat better than UC-N even though the AUCs are similar.
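The NRI can be illustrated with a short sketch. The version below is the category-free (continuous) NRI, which counts any upward or downward change in predicted risk. Which NRI variant the study used is not stated here, so that choice is an assumption, and the risks are made up.

```python
def net_reclassification_improvement(y_true, risk_old, risk_new):
    """Category-free NRI of a new model's risks relative to an old model's."""
    up_event = down_event = up_nonevent = down_nonevent = 0
    for y, old, new in zip(y_true, risk_old, risk_new):
        if new > old:
            if y == 1:
                up_event += 1
            else:
                up_nonevent += 1
        elif new < old:
            if y == 1:
                down_event += 1
            else:
                down_nonevent += 1
    n_events = sum(y_true)
    n_nonevents = len(y_true) - n_events
    # Events should move up in risk, non-events should move down.
    return ((up_event - down_event) / n_events
            + (down_nonevent - up_nonevent) / n_nonevents)


# Made-up example: 1 = died within 30 days.
y = [1, 1, 0, 0]
old = [0.4, 0.6, 0.5, 0.3]
new = [0.7, 0.5, 0.2, 0.3]
nri = net_reclassification_improvement(y, old, new)
```

A positive value means the new model moves predicted risks in the right direction more often than the wrong one.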

Accepted Answer

The difference in risk prediction between UC-N and EML-N for the example ICU patient is large. In the UC-N, the patient's total score was 155, corresponding to an 18.5% risk of death within 30 days. In the EML-N, the patient's total score was 381, corresponding to a 94.6% risk of ICU death within 30 days. This discrepancy is attributed to the inconsistent score for the 'temperature_max' feature. The UC-N was developed from a logistic regression that assumed a linear relationship between 'temperature_max' and stroke mortality, ignoring the risk of death associated with a lower 'temperature_max'. The EML-N, by contrast, makes it easier to define the score of each individual feature, leading to a more accurate prediction of the patient's risk of death.
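The effect described above can be sketched by scoring the same temperature in two ways. The point scales, the linear range, and the inclusive boundaries are hypothetical; only the 36.5 to 37.8 degree band comes from the study's reported finding, and the real nomograms assign their own point values.

```python
def linear_points(temp_c, low=35.0, high=40.0, max_points=100.0):
    """Points rise linearly with temperature, as a linear logistic term would imply."""
    clipped = min(max(temp_c, low), high)
    return max_points * (clipped - low) / (high - low)


def dichotomized_points(temp_c, safe_low=36.5, safe_high=37.8, risk_points=100.0):
    """Points are added whenever temperature falls outside the lower-risk band."""
    return 0.0 if safe_low <= temp_c <= safe_high else risk_points


# A patient with a low maximum temperature (hypothetical value).
low_temp = 35.0
linear_score = linear_points(low_temp)
dichotomized_score = dichotomized_points(low_temp)
```

Under the linear scale the hypothermic patient receives almost no points, while the dichotomized scale flags the same value as high risk.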

Accepted Answer

The study identified several key risk factors for death in stroke patients, including the 'sofa' score, 'temperature_max', 'age', 'sodium', 'bun' (blood urea nitrogen), and 'heart rate'. The 'sofa' score, which assesses dysfunction in six organ systems, had a significant impact on the risk of death: patients with a 'sofa' score greater than 4 had a higher risk. A maximum body temperature between 36.5 and 37.8 °C on the first day in the ICU was associated with a reduced risk of death within 30 days. Age, sodium levels, blood urea nitrogen, and heart rate have also been identified as risk factors in other studies based on the MIMIC datasets. The study concluded that these risk factors, together with the nomogram based on EML, can improve patient treatment and care by accurately assessing the short-term risk of death for stroke patients on the first day of ICU admission.

Accepted Answer

In the SHAP feature analysis plot, red dots represent higher feature values that have a positive influence on death risk, while blue dots represent lower feature values that have a negative effect on death risk. The plot ranks features in descending order based on their importance, with each dot representing the SHAP value of a patient at a specific feature value. This visualization helps researchers understand the impact of different features on the outcome, in this case, death risk.

Most frequently asked questions

1. What is the purpose of combining nomogram and machine learning?

2. What database was used for stroke patient data?

3. How does LightGBM contribute to predicting mortality in ICU stroke patients?

4. How were the top 10 variables selected for nomogram development?

5. How are nomograms constructed for predicting ICU death risk in stroke patients?

6. What variables significantly differ between survived and dead stroke patients?

7. What is the most important variable for predicting stroke patient death risk?

8. How do categorical variables affect stroke survivors' death rates?

9. How does EML-N compare to UC-N in discriminative power?

10. What is the difference in risk prediction between UC-N and EML-N for the patient in ICU?

11. What are the key risk factors for death in stroke patients identified in the study?

12. What do red and blue dots in the SHAP feature analysis plot represent?
