1. What is a polygenic risk score?
A polygenic risk score (PRS) quantifies an individual's genetic risk for a complex disease based on multiple genetic variants across the genome. PRS can be estimated with statistical methods such as BLUP and LDpred, or with machine learning approaches. Conventional PRS estimation assumes that risk variants have linear and independent effects on the phenotype. Cross-trait analyses reveal shared genetic determinants among diseases, which multi-task learning (MTL) approaches can exploit to improve PRS estimation.
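The linear, independent-effects assumption can be illustrated with a minimal sketch in which the score is a weighted sum of risk-allele counts. The function name, effect sizes, and dosages below are hypothetical values chosen for illustration; they do not come from BLUP, LDpred, or the study.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Additive PRS: sum over variants of effect size times risk-allele dosage."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have the same length")
    return sum(beta * dosage for beta, dosage in zip(effect_sizes, dosages))


# Hypothetical per-variant effect sizes (for example, log odds ratios) and one
# individual's risk-allele counts (0, 1, or 2) at three illustrative variants.
effect_sizes = [0.5, -0.25, 1.0]
dosages = [2, 1, 0]

score = polygenic_risk_score(dosages, effect_sizes)
print(score)
```

Each variant contributes to the score on its own, with no interaction terms, which is what "linear and independent" means in this setting.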
2. How many diseases were included in the pan-disease MTL model?
The pan-disease MTL model included 77 diseases: 17 types of cancer and 60 non-cancer diseases, each with a prevalence higher than 0.5% in the UK Biobank (UKB) cohort. The model was constructed to predict all of these diseases from the UKB dataset, which contained 805,426 SNPs genotyped in the cohort. The dataset was divided into training, validation, and test sets, used for model training, hyperparameter optimization, and performance benchmarking, respectively.
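A three-way split of this kind can be sketched as follows. The split fractions, sample count, and function name are assumptions for illustration only; the text above does not state the proportions actually used.

```python
import random


def split_indices(n_samples, val_fraction, test_fraction, seed=0):
    """Shuffle sample indices and partition them into train/validation/test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for a reproducible shuffle
    n_test = int(n_samples * test_fraction)
    n_val = int(n_samples * val_fraction)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


# Hypothetical cohort of 1000 individuals with 10% validation and 10% test.
train, val, test = split_indices(1000, val_fraction=0.1, test_fraction=0.1)
print(len(train), len(val), len(test))
```

Keeping the test set disjoint from the data used for training and hyperparameter tuning is what makes the benchmark an unbiased estimate of performance.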
3. How do pan-cancer MTL and pan-disease MTL models compare to STL in predicting malignant melanoma PRS?
Both the pan-cancer and pan-disease MTL models predicted malignant melanoma PRS better than the single-task learning (STL) model. The pan-cancer MTL model achieved a 9.2% higher AUC than the STL model, and the pan-disease MTL model achieved an 8.1% higher AUC. Both MTL models also achieved higher precision at the same recall level. Relative to the STL model, the pan-cancer MTL model showed a 141% relative increase in ROC AUC and a 96% relative increase in PR AUC, while the pan-disease MTL model showed a 153% relative increase in ROC AUC and an 83% relative increase in PR AUC. These results suggest that MTL offers substantial improvements over STL in predicting malignant melanoma PRS.
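As a rough illustration of how two models can be compared by ROC AUC, the sketch below computes the AUC as the fraction of case/control pairs that a model ranks correctly, then expresses the difference as a percentage of the baseline. The labels and scores are made up, and the `relative_increase` formula is one common definition assumed here; the text above does not say how its relative increases were computed.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random case outscores a random
    control, counting ties as half a win."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def relative_increase(new, baseline):
    """Percentage increase of `new` over `baseline` (assumed definition)."""
    return (new - baseline) / baseline * 100.0


# Hypothetical disease labels (1 = case) and risk scores from two models.
labels = [0, 0, 1, 1]
stl_scores = [0.1, 0.4, 0.35, 0.8]
mtl_scores = [0.1, 0.3, 0.35, 0.8]

stl_auc = roc_auc(labels, stl_scores)
mtl_auc = roc_auc(labels, mtl_scores)
gain = relative_increase(mtl_auc, stl_auc)
print(stl_auc, mtl_auc, gain)
```

The pairwise loop is quadratic in the number of samples, so it suits small illustrations; a rank-based computation is preferable for cohort-sized data.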
4. What algorithm identified important SNPs for MTL?
The first-order model-wise LINA interpretation algorithm [26] was used to identify the important SNPs that the MTL model relied on to predict each disease. A pan-cancer MTL model was trained and interpreted using a whole-genome vector containing both real SNPs and decoy SNPs. The algorithm identified 48 real SNPs as important for predicting malignant melanoma at an estimated FDR level of 0.1%. Many of these SNPs have been reported as melanoma risk variants in previous GWAS. The algorithm also identified important SNPs for the 17 prevalent cancers at FDR levels of 0.1% and 5%.
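The role of the decoy SNPs can be sketched with a generic decoy-based FDR estimate: decoys carry no true signal, so any decoy that passes an importance threshold approximates a false positive among the real SNPs. This is not the LINA algorithm itself. The importance scores are invented, and the estimator assumes equal numbers of real and decoy SNPs, which the text above does not specify.

```python
def estimated_fdr(real_scores, decoy_scores, threshold):
    """Estimate FDR at a threshold as (# decoys passing) / (# real SNPs passing).

    Assumes the decoy set is the same size as the real set, so the number of
    passing decoys approximates the number of false positives among real SNPs.
    """
    n_real = sum(1 for s in real_scores if s >= threshold)
    n_decoy = sum(1 for s in decoy_scores if s >= threshold)
    if n_real == 0:
        return 0.0  # nothing selected, so no false discoveries
    return min(1.0, n_decoy / n_real)


# Hypothetical importance scores for five real and five decoy SNPs.
real_scores = [0.9, 0.8, 0.7, 0.2, 0.1]
decoy_scores = [0.75, 0.15, 0.1, 0.05, 0.02]

fdr_loose = estimated_fdr(real_scores, decoy_scores, threshold=0.7)
fdr_strict = estimated_fdr(real_scores, decoy_scores, threshold=0.8)
print(fdr_loose, fdr_strict)
```

Raising the threshold selects fewer SNPs but lowers the estimated FDR, which is the trade-off behind reporting results at more than one FDR level.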