Exploiting sequence-based features for predicting enhancer-promoter interactions.

doi:10.1093/BIOINFORMATICS/BTX257

Open Access•Journal Article

Yang Yang, +3 more

- 15 Jul 2017

- Bioinformatics

- Vol. 33, Iss: 14

109

TL;DR: This work demonstrates that sequence‐based features alone can reliably predict enhancer‐promoter interactions genome‐wide, which could potentially facilitate the discovery of important sequence determinants for long‐range gene regulation.

Abstract:

Motivation: A large number of distal enhancers and proximal promoters form enhancer-promoter interactions to regulate target genes in the human genome. Although recent high-throughput genome-wide mapping approaches have allowed us to more comprehensively recognize potential enhancer-promoter interactions, it is still largely unknown whether sequence-based features alone are sufficient to predict such interactions.

Results: Here, we develop a new computational method (named PEP) to predict enhancer-promoter interactions based on sequence-based features only, when the locations of putative enhancers and promoters in a particular cell type are given. The two modules in PEP (PEP-Motif and PEP-Word) use different but complementary feature extraction strategies to exploit sequence-based information. The results across six different cell types demonstrate that our method is effective in predicting enhancer-promoter interactions as compared to the state-of-the-art methods that use functional genomic signals. Our work demonstrates that sequence-based features alone can reliably predict enhancer-promoter interactions genome-wide, which could potentially facilitate the discovery of important sequence determinants for long-range gene regulation.

Availability and implementation: The source code of PEP is available at https://github.com/ma-compbio/PEP.

Contact: jianma@cs.cmu.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
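The abstract says that PEP predicts interactions from sequence-based features only, but it does not spell out how either module turns sequence into features. The following is therefore only a minimal sketch of one generic sequence-only representation, not PEP's actual implementation: each enhancer and promoter is split into overlapping k-mer "words", the word frequencies are computed, and the two frequency vectors are concatenated into one feature vector per candidate pair. The function names, the choice of k, and the example sequences are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def kmer_words(seq, k=3):
    """Split a DNA sequence into overlapping k-mers, skipping any k-mer
    that contains a character other than A, C, G or T (for example N)."""
    seq = seq.upper()
    valid = set("ACGT")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)
            if set(seq[i:i + k]) <= valid]


def kmer_frequencies(seq, k=3):
    """Return the relative frequency of every possible k-mer, in a fixed
    lexicographic order, so vectors from different sequences line up."""
    vocabulary = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(kmer_words(seq, k))
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in vocabulary]


def pair_features(enhancer_seq, promoter_seq, k=3):
    """Concatenate enhancer and promoter k-mer frequencies into a single
    feature vector for one candidate enhancer-promoter pair."""
    return kmer_frequencies(enhancer_seq, k) + kmer_frequencies(promoter_seq, k)


# Hypothetical toy sequences; real enhancers and promoters are far longer.
features = pair_features("ACGTACGTGGCA", "TTGACGCGTATA", k=2)
print(len(features))
```

A vector like this could be passed, together with interaction labels, to any standard classifier. Which features and which classifier PEP actually uses should be checked in the paper and the linked source code.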

Citations

•Posted Content

Big Bird: Transformers for Longer Sequences

Manzil Zaheer, +10 more

- 28 Jul 2020

- arXiv: Learning

TL;DR: It is shown that BigBird is a universal approximator of sequence functions and is Turing complete, thereby preserving these properties of the quadratic, full attention model.

1.4K

•Journal Article•10.3389/FGENE.2019.00286

DeePromoter: Robust Promoter Predictor Using Deep Learning.

Mhaned Oubounyt, +3 more

- 05 Apr 2019

- Frontiers in Genetics

TL;DR: A robust deep learning model called DeePromoter is proposed to analyze the characteristics of short eukaryotic promoter sequences and accurately recognize human and mouse promoter sequences; the work also derives a more challenging negative set from the promoter sequences.

168

•Journal Article•10.1093/BIOINFORMATICS/BTZ694

Identifying enhancer–promoter interactions with neural network based on pre-trained DNA vectors and attention mechanism

Zengyan Hong, +3 more

- 06 Sep 2019

- Bioinformatics

TL;DR: This article proposes a new deep learning method, namely EPIVAN, that enables predicting long-range EPIs using only genomic sequences and builds a general model, which has transfer ability and can be used to predict EPIs in various cell lines.

161

•Journal Article•10.1007/S40484-019-0154-0

Predicting enhancer-promoter interaction from genomic sequence with deep neural networks

Shashank Singh, +3 more

- 01 Jun 2019

TL;DR: A new computational method using deep learning models to predict enhancer-promoter interactions based on sequence-based features only, when the locations of putative enhancers and promoters in a particular cell type are given, is reported.

159

References

•Proceedings Article•10.1145/2939672.2939785

XGBoost: A Scalable Tree Boosting System

Tianqi Chen, +1 more

- 09 Mar 2016

- arXiv: Learning

TL;DR: This paper proposes a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning and provides insights on cache access patterns, data compression and sharding to build a scalable tree boosting system called XGBoost.

32.8K

•Journal Article•10.1214/AOS/1013203451

Greedy function approximation: A gradient boosting machine.

Jerome H. Friedman

- 01 Oct 2001

- Annals of Statistics

TL;DR: A general gradient descent boosting paradigm is developed for additive expansions based on any fitting criterion, and specific algorithms are presented for least-squares, least absolute deviation, and Huber-M loss functions for regression, and multiclass logistic likelihood for classification.

26.4K

•Proceedings Article

Distributed Representations of Words and Phrases and their Compositionality

Tomas Mikolov, +4 more

- 05 Dec 2013

TL;DR: This paper presents a simple method for finding phrases in text, and shows that learning good vector representations for millions of phrases is possible and describes a simple alternative to the hierarchical softmax called negative sampling.

24.1K

•Posted Content

Distributed Representations of Words and Phrases and their Compositionality

Tomas Mikolov, +4 more

- 16 Oct 2013

- arXiv: Computation and Language

TL;DR: In this paper, the Skip-gram model is used to learn high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships and improve both the quality of the vectors and the training speed.

22.9K

•Journal Article•10.1038/NATURE11247

An integrated encyclopedia of DNA elements in the human genome

Principal investigators, +3 more

- 06 Sep 2012

- Nature

TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of the authors' genes and genome, and is an expansive resource of functional annotations for biomedical research.

17.5K

Related Papers (5)

A 3D Map of the Human Genome at Kilobase Resolution Reveals Principles of Chromatin Looping

Suhas S.P. Rao, +15 more

- 18 Dec 2014

- Cell

Topological domains in mammalian genomes identified by analysis of chromatin interactions

Jesse R. Dixon, +10 more

- 17 May 2012

- Nature

Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning

Babak Alipanahi, +3 more

- 01 Aug 2015

- Nature Biotechnology
