Reverse-complement parameter sharing improves deep learning models for genomics

Question

1. What are the contributions mentioned in the paper "Reverse-complement parameter sharing improves deep learning models for genomics" ?

2. What is the number of parameters in a convolutional layer?

3. Why are deep learning models particularly enticing for this problem?

4. How can the authors model the natural RC property of inputs such as DNA sequence?

Accepted Answer

Here, the authors show that conventional deep learning models that do not explicitly model this property can produce substantially different predictions on forward and reverse-complement versions of the same DNA sequence. The authors present four new convolutional neural network layers that leverage the reverse-complement property of genomic DNA sequence by sharing parameters between forward and reverse-complement representations in the model. Using experiments on simulated and in vivo transcription factor binding data, the authors show that their proposed architectures lead to improved performance, faster learning and cleaner internal representations compared to conventional architectures trained on the same data.

Accepted Answer

For a standard batch normalization layer following a convolutional layer, the number of parameters is (2 × number_of_input_channels), with one set of γ and one set of β parameters per channel.
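The count above is simple arithmetic, and a minimal sketch makes it concrete. The function name and the channel count of 16 are hypothetical and chosen only for illustration; the sketch counts trainable parameters only (one γ and one β per channel).

```python
def batchnorm_trainable_params(num_channels: int) -> int:
    """Trainable parameters of a standard batch normalization layer.

    One gamma (scale) and one beta (shift) is learned per channel.
    """
    return 2 * num_channels


# Hypothetical example: a convolutional layer with 16 output channels.
params_for_16_channels = batchnorm_trainable_params(16)
print(params_for_16_channels)
```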

Accepted Answer

Deep learning models are particularly enticing for this problem because they are capable of inducing hierarchical, predictive patterns of increasing complexity from raw input DNA sequences without relying on explicit featurization (such as featurization into k-mers).

Accepted Answer

By introducing four new layers that share weights between forward and RC representations, the authors can enable neural networks to model the natural RC property of inputs such as DNA sequence.
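As a rough illustration of what sharing weights between forward and RC representations buys, the sketch below scans a one-hot sequence with a filter and with that filter's reverse complement. It is not the authors' implementation: the function names, the A, C, G, T column order, and the toy filter are assumptions made for this example. Under those assumptions, feeding in the reverse-complemented sequence simply swaps the two output channels and reverses them along the length axis.

```python
def reverse_complement(matrix):
    """Reverse a (length x 4) matrix along both axes.

    With columns ordered A, C, G, T (an assumption of this sketch),
    reversing the columns swaps A with T and C with G.
    """
    return [row[::-1] for row in matrix[::-1]]


def one_hot(seq):
    """One-hot encode a DNA string with columns ordered A, C, G, T."""
    return [[1 if base == b else 0 for b in "ACGT"] for base in seq]


def scan(x, w):
    """Cross-correlate sequence x with filter w (no padding, stride 1)."""
    k = len(w)
    return [
        sum(x[i + j][b] * w[j][b] for j in range(k) for b in range(4))
        for i in range(len(x) - k + 1)
    ]


def rc_shared_conv(x, w):
    """Return (forward channel, RC channel) computed from one shared filter."""
    return scan(x, w), scan(x, reverse_complement(w))


# Toy filter of width 3 and a toy sequence (both hypothetical).
w = [[2, -1, 0, 1], [0, 3, -2, 1], [1, 0, 0, -3]]
x = one_hot("ACGTTGCAAG")

fwd, rc = rc_shared_conv(x, w)
fwd_on_rc, rc_on_rc = rc_shared_conv(reverse_complement(x), w)

# The RC input yields the same activations, with channels swapped and reversed.
print(fwd_on_rc == rc[::-1], rc_on_rc == fwd[::-1])
```

Because only one filter is stored, the RC channel costs no additional parameters.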

Accepted Answer

The authors find that the model with RC weight sharing displays consistently higher auROC and auPRC, and that its performance appears to drop off more slowly with decreasing training set size compared to the other two models.

Accepted Answer

To calculate the number of parameters going into a Dense layer or Weighted Sum layer, it is necessary to calculate the length of the output of the maxpooling layer.
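The length bookkeeping described above can be sketched as follows. The formulas assume unpadded ("valid") convolution and pooling, and every number (sequence length 200, kernel width 15, 16 channels, pool width 40, pool stride 20, one dense unit) is a hypothetical value picked for the example, not a setting taken from the paper.

```python
def conv1d_valid_length(input_length: int, kernel_width: int, stride: int = 1) -> int:
    """Output length of an unpadded 1D convolution."""
    return (input_length - kernel_width) // stride + 1


def maxpool1d_valid_length(input_length: int, pool_width: int, stride: int) -> int:
    """Output length of an unpadded 1D max-pooling layer."""
    return (input_length - pool_width) // stride + 1


def dense_param_count(pooled_length: int, num_channels: int, num_units: int) -> int:
    """Weights plus biases of a dense layer applied to the flattened pooled output."""
    return pooled_length * num_channels * num_units + num_units


# Hypothetical architecture, for illustration only.
seq_len, kernel_width, num_channels = 200, 15, 16
conv_len = conv1d_valid_length(seq_len, kernel_width)
pooled_len = maxpool1d_valid_length(conv_len, pool_width=40, stride=20)
dense_params = dense_param_count(pooled_len, num_channels, num_units=1)
print(conv_len, pooled_len, dense_params)
```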

Accepted Answer

When the weight matrix is two-dimensional (as is the case for dense layers), fan_in is the length of the first dimension of the matrix (which corresponds to the number of input neurons per output neuron) and fan_out is the length of the second dimension (which corresponds to the number of output neurons).

Accepted Answer

The authors can compute the reverse complement of this filter by first reversing the weights in the length dimension and then exchanging the weights of A with T and C with G and vice versa.
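The two steps can be written out explicitly. This is a minimal sketch that assumes the filter is stored as a list of positions, each holding four weights in A, C, G, T order; the function name and the toy filter are hypothetical.

```python
# Index of each base's complement, assuming columns are ordered A, C, G, T.
COMPLEMENT_INDEX = {0: 3, 1: 2, 2: 1, 3: 0}  # A<->T, C<->G


def reverse_complement_filter(filt):
    """Return the reverse complement of a (length x 4) filter."""
    # Step 1: reverse the weights along the length dimension.
    reversed_length = filt[::-1]
    # Step 2: exchange the weights of A with T and of C with G.
    return [[row[COMPLEMENT_INDEX[b]] for b in range(4)] for row in reversed_length]


# Toy filter of width 2 (hypothetical values).
toy_filter = [[1, 2, 3, 4], [5, 6, 7, 8]]
rc_filter = reverse_complement_filter(toy_filter)
print(rc_filter)
```

Applying the operation twice returns the original filter, which is a quick sanity check on any implementation.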

Accepted Answer

glorot_uniform initializes the weights by drawing them from a uniform distribution with a minimum of −s and a maximum of s, where s is computed as:

$$s_{\text{glorot}} = \sqrt{\frac{6}{\text{fan\_in} + \text{fan\_out}}} \qquad (2)$$

In Keras, fan_in and fan_out are computed according to the shape of the weight matrix.
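For a two-dimensional (dense) weight matrix, the limit s and the sampling step can be sketched with the standard library alone. This is an illustrative re-implementation, not Keras code; the function names and the seed argument are assumptions of this sketch.

```python
import math
import random


def glorot_limit(fan_in: int, fan_out: int) -> float:
    """Limit s of the uniform distribution used by Glorot uniform initialization."""
    return math.sqrt(6.0 / (fan_in + fan_out))


def glorot_uniform_dense(fan_in: int, fan_out: int, seed: int = 0):
    """Sample a (fan_in x fan_out) dense weight matrix from U(-s, s)."""
    rng = random.Random(seed)
    s = glorot_limit(fan_in, fan_out)
    return [[rng.uniform(-s, s) for _ in range(fan_out)] for _ in range(fan_in)]


# Hypothetical dense layer with 3 inputs and 3 outputs: s = sqrt(6 / 6).
limit = glorot_limit(3, 3)
weights = glorot_uniform_dense(3, 3)
print(limit)
```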

Accepted Answer

The authors can leverage this implementation detail to compute batch statistics for all neurons in both the forward and RC of a channel as follows: the authors first rearrange the incoming tensor so that the activations of the RC channels are concatenated along the length dimension to their forward counterparts (as a result, the authors halve the size of the tensor along the channel dimension and double it along the length dimension).
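The rearrangement can be sketched on nested lists shaped [batch][length][channels]. The pairing of forward channel i with RC channel 2c−1−i (a mirrored channel order) is an assumption of this sketch, as are the function names; the text above only states that RC activations are concatenated to their forward counterparts along the length dimension.

```python
def merge_rc_channels(x):
    """Rearrange [batch][length][2c] activations into [batch][2*length][c].

    Assumes forward channel i is paired with RC channel (2c - 1 - i).
    """
    num_channels = len(x[0][0])
    c = num_channels // 2
    merged = []
    for sample in x:
        fwd_part = [[pos[i] for i in range(c)] for pos in sample]
        rc_part = [[pos[num_channels - 1 - i] for i in range(c)] for pos in sample]
        # Concatenate along the length dimension.
        merged.append(fwd_part + rc_part)
    return merged


def channel_means(x):
    """Per-channel mean over the batch and length dimensions."""
    count = len(x) * len(x[0])
    return [
        sum(pos[i] for sample in x for pos in sample) / count
        for i in range(len(x[0][0]))
    ]


# Toy tensor: batch of 1, length 2, 4 channels (2 forward + 2 RC).
acts = [[[1, 2, 3, 4], [5, 6, 7, 8]]]
merged = merge_rc_channels(acts)
shared_means = channel_means(merged)
print(merged, shared_means)
```

After the rearrangement, an ordinary per-channel batch normalization sees the forward and RC activations of a channel as one population, so they share statistics and parameters.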

Accepted Answer

The authors found that the model with RC weight sharing was able to learn a filter that strongly resembles the canonical CTCF motif, while the model lacking weight sharing showed a tendency to collapse the forward and RC versions of the CTCF motif into “palindromic” representations [Fig. 5].

Accepted Answer

This is because although the output size of the convolution is doubled, the number of parameters in subsequent layers is halved, leading them to approximately cancel out except for a small difference introduced by the Weighted Sum layer.

Accepted Answer

The weight matrix of the Weighted Sum layer has dimensions l × c/2, because it learns the weights at each position for each channel, and only learns the weights for the forward channels (the weights for the reverse channels are found by reverse complementation; hence the 2 in the denominator).
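A small sketch of how the full set of weights might be derived from the learned l × c/2 matrix. Two things here are assumptions rather than details from the text: the mirrored channel order (forward channel i paired with RC channel c−1−i) and the layer producing a single scalar output. All names and values are hypothetical.

```python
def reverse_complement(matrix):
    """Reverse a (length x channels) matrix along both axes."""
    return [row[::-1] for row in matrix[::-1]]


def full_weighted_sum_weights(forward_weights):
    """Expand learned [l][c/2] weights into the full [l][c] weight matrix."""
    rc_weights = reverse_complement(forward_weights)
    return [fwd_row + rc_row for fwd_row, rc_row in zip(forward_weights, rc_weights)]


def weighted_sum(x, full_weights):
    """Sum of activations times weights over all positions and channels."""
    return sum(
        a * w
        for x_row, w_row in zip(x, full_weights)
        for a, w in zip(x_row, w_row)
    )


# l = 3 positions, c/2 = 2 forward channels: only 6 learned parameters.
forward_weights = [[1, 2], [3, 4], [5, 6]]
full = full_weighted_sum_weights(forward_weights)

# Toy activations with c = 4 channels, and their reverse complement.
x = [[1, 0, 2, 3], [4, 5, 0, 1], [2, 2, 1, 0]]
out_fwd = weighted_sum(x, full)
out_rc = weighted_sum(reverse_complement(x), full)
print(full, out_fwd, out_rc)
```

Under these assumptions the output is identical for the activations and for their reverse complement, while only half of the weights are stored.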

Accepted Answer

One extension would be to initialize the weights to concord with a genomic prior, such as having higher weights towards the middle of the sequence and lower weights towards the flanks.

Accepted Answer

Popular CNN architectures derived from other application domains such as computer vision do not take advantage of properties of data modalities specific to genomics.

Accepted Answer

Consistent with the results on simulated data, the authors found that models using RC weight sharing gave superior performance on the validation set compared to traditional models with the same or double the number of filters per layer [Fig. 7].

Accepted Answer

Note however that this does not necessarily translate into a reduction in wall-clock time, as the RC architecture has a larger number of output channels (and therefore performs more computations) than the equivalent traditional architecture with the same number of filters.

Accepted Answer

Contribution scores in the convolutional layer were computed by taking the gradient of the logit (the input to the sigmoid) w.r.t. a neuron in the convolutional layer and multiplying that gradient by the neuron's activation.
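The gradient-times-activation rule can be illustrated on a toy model. The sketch below is an assumption-laden stand-in: it uses a central finite difference in place of the automatic differentiation a deep learning framework would provide, and the linear "head" mapping convolutional activations to the logit is hypothetical.

```python
def numerical_gradient(f, acts, eps=1e-6):
    """Central-difference gradient of a scalar function f at the point acts."""
    grads = []
    for i in range(len(acts)):
        up = list(acts)
        down = list(acts)
        up[i] += eps
        down[i] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads


def gradient_times_activation(f, acts):
    """Contribution score of each neuron: d(logit)/d(activation) * activation."""
    return [g * a for g, a in zip(numerical_gradient(f, acts), acts)]


def toy_logit(acts):
    """Hypothetical pre-sigmoid output computed from three conv activations."""
    head_weights = [0.5, -1.0, 2.0]
    bias = 0.1
    return sum(w * a for w, a in zip(head_weights, acts)) + bias


conv_activations = [1.0, 2.0, 0.0]
scores = gradient_times_activation(toy_logit, conv_activations)
print(scores)
```

Note that a neuron with zero activation receives a zero score regardless of its gradient, which is a known property of this scoring rule.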

Most frequently asked questions

1. What are the contributions mentioned in the paper "Reverse-complement parameter sharing improves deep learning models for genomics" ?

2. What is the number of parameters in a convolutional layer?

3. Why are deep learning models particularly enticing for this problem?

4. How can the authors model the natural RC property of inputs such as DNA sequence?

5. What is the effect of weight sharing on the performance of the model?

6. How many parameters are needed to calculate the number of input channels?

7. What is the weight matrix for dense layers?

8. How can the authors compute the reverse complement of a filter?

9. What is the default initialization mode for glorot_uniform?

10. How can the authors use this implementation detail to compute batch statistics for all neurons in a channel?

11. What is the effect of weight sharing on the model?

12. Why is the output size of the convolution doubled?

13. What is the weight matrix of a weighted sum layer?

14. What is the way to initialize the weights?

15. What are the main features of the CNN architecture?

16. What is the effect of weight sharing on the validation set?

17. What does the RC architecture have to do with the TF?

18. How did the gradient of the sigmoid affect the contribution scores in the convolutional layer?
