Exploiting sequence-based features for predicting enhancer-promoter interactions.
TL;DR: This work demonstrates that sequence‐based features alone can reliably predict enhancer‐promoter interactions genome‐wide, which could potentially facilitate the discovery of important sequence determinants for long‐range gene regulation.
Abstract:
Motivation: A large number of distal enhancers and proximal promoters form enhancer-promoter interactions to regulate target genes in the human genome. Although recent high-throughput genome-wide mapping approaches have allowed us to more comprehensively recognize potential enhancer-promoter interactions, it is still largely unknown whether sequence-based features alone are sufficient to predict such interactions.
Results: Here, we develop a new computational method (named PEP) to predict enhancer-promoter interactions based on sequence-based features only, when the locations of putative enhancers and promoters in a particular cell type are given. The two modules in PEP (PEP-Motif and PEP-Word) use different but complementary feature extraction strategies to exploit sequence-based information. The results across six different cell types demonstrate that our method is effective in predicting enhancer-promoter interactions as compared to the state-of-the-art methods that use functional genomic signals. Our work demonstrates that sequence-based features alone can reliably predict enhancer-promoter interactions genome-wide, which could potentially facilitate the discovery of important sequence determinants for long-range gene regulation.
Availability and implementation: The source code of PEP is available at https://github.com/ma-compbio/PEP
Contact: jianma@cs.cmu.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
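As a rough illustration of what "sequence-based features only" can mean for an enhancer-promoter pair, the sketch below turns each sequence into a normalized k-mer frequency vector and concatenates the two. This is an assumption made for illustration, not the PEP-Motif or PEP-Word procedure, whose details the abstract does not give. The function names `kmer_counts` and `pair_features` and the choice of k are hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k=3):
    """Return a normalized k-mer frequency vector over the A/C/G/T alphabet.

    Illustrative only: k-mers containing other symbols (e.g. N) are ignored.
    """
    seq = seq.upper()
    # Fixed, ordered vocabulary so every sequence maps to the same dimensions.
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[w] for w in vocab)
    return [counts[w] / total if total else 0.0 for w in vocab]


def pair_features(enhancer_seq, promoter_seq, k=3):
    """Concatenate enhancer and promoter k-mer vectors into one feature vector."""
    return kmer_counts(enhancer_seq, k) + kmer_counts(promoter_seq, k)


if __name__ == "__main__":
    # Toy sequences, not real genomic data.
    feats = pair_features("ACGTACGTAC", "TTGACGGCTA", k=2)
    print(len(feats))  # 2 * 4**2 dimensions
```

A vector like this could then be passed to any standard classifier trained on labeled interacting and non-interacting pairs. Which classifier and which features PEP actually uses should be checked against the paper and its source code.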