OUP accepted manuscript
131
TL;DR: Wang et al. proposed ToxIBTL, a novel deep learning framework that uses the information bottleneck principle and transfer learning to predict the toxicity of peptides as well as proteins; it achieved higher prediction performance than state-of-the-art methods on the peptide dataset.
Abstract:
Motivation: Recently, peptides have emerged as a promising class of pharmaceuticals for the treatment of various diseases, positioned between traditional small-molecule drugs and therapeutic proteins. However, one of the key bottlenecks preventing their use as therapeutics is their toxicity toward human cells, and few available toxicity prediction algorithms are specifically designed for short peptides.
Results: We present ToxIBTL, a novel deep learning framework that uses the information bottleneck principle and transfer learning to predict the toxicity of peptides as well as proteins. Specifically, we use evolutionary information and physicochemical properties of peptide sequences and integrate the information bottleneck principle into a feature representation learning scheme, so that relevant information is retained and redundant information is minimized in the learned features. Moreover, transfer learning is introduced to transfer the common knowledge contained in proteins to peptides, with the aim of improving the feature representation capability. Extensive experimental results demonstrate that ToxIBTL not only achieves higher prediction performance than state-of-the-art methods on the peptide dataset but also performs competitively on the protein dataset. Furthermore, a user-friendly online web server implementing ToxIBTL has been established.
Availability and implementation: ToxIBTL and the data are freely accessible at http://server.wei-group.net/ToxIBTL. Our source code is available at https://github.com/WLYLab/ToxIBTL.
Supplementary information: Supplementary data are available at Bioinformatics online.
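For orientation only, the information bottleneck principle mentioned in the abstract is commonly written as the objective below. This is the generic textbook form, not necessarily the exact loss used in ToxIBTL. The symbols are assumptions introduced here: X denotes the input sequence features, Y the toxicity label, Z the learned representation, I(·;·) mutual information, and β a trade-off weight.

```latex
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y)
```

Under this reading, lowering I(X; Z) corresponds to discarding redundant information from the input, while raising I(Z; Y) corresponds to retaining the information relevant to the label.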
Citations
OUP accepted manuscript
21 May 2022
TL;DR: Raghava et al. developed ToxinPred2, a web-based tool that predicts the toxicity of proteins using similarity, motif-based similarity, and prediction models.
129
ToxinPred2: an improved method for predicting toxicity of proteins
TL;DR: A general method is developed for predicting the toxicity of proteins regardless of their source of origin; a hybrid method combining all three approaches achieved a maximum area under the receiver operating characteristic curve of around 0.99.
51
Machine learning for antimicrobial peptide identification and design
Fangping Wan, Felix Wong, James J Collins, C. de la Fuente-Nunez
49
THRONE: A New Approach for Accurate Prediction of Human RNA N7-Methylguanosine Sites.
TL;DR: THRONE feeds a wide range of sequence-based features into several ML classifiers and combines these models through ensemble learning to identify m7G sites in the human genome.
43
ToxinPred 3.0: An improved method for predicting the toxicity of peptides
Anand Singh Rathore, Akanksha Arora, Shubham Choudhury, Purva Tijare, Gajendra P. S. Raghava
TL;DR: A refined variant of ToxinPred is proposed that shows improved reliability and accuracy in predicting peptide toxicity; hybrid or ensemble methods that combine two or more models are also developed to enhance performance.
37