About: Dot plot (statistics) is a research topic. Over its lifetime, 26 publications have been published within this topic, receiving 555 citations. The topic is also known as: dot diagram.
TL;DR: D-GENIES is a standalone and web application that performs large genome alignments using the minimap2 software package and generates interactive dot plots. Users can sort query sequences along the reference, zoom in on the plot, and download several image, alignment or sequence files.
Abstract: Dot plots are widely used to quickly compare sequence sets. They provide a synthetic similarity overview, highlighting repetitions, breaks and inversions. Different tools have been developed to easily generate genomic alignment dot plots, but they are often limited in the input sequence size. D-GENIES is a standalone and web application that performs large genome alignments using the minimap2 software package and generates interactive dot plots. It enables users to sort query sequences along the reference, zoom in on the plot and download several image, alignment or sequence files. D-GENIES is an easy-to-install, open-source software package (GPL) developed in Python and JavaScript. The source code is available at https://github.com/genotoul-bioinfo/dgenies and it can be tested at http://dgenies.toulouse.inra.fr/.
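To make the idea of a sequence dot plot concrete, here is a toy sketch in Python. It is not how D-GENIES works: D-GENIES plots minimap2 alignments, whereas this sketch marks exact shared words of length k, which is only practical for short sequences. The function names (`revcomp`, `kmer_dots`, `render`) and the word length are assumptions made for illustration. Forward matches trace a descending diagonal, and reverse-complement matches trace the opposite diagonal, which is how an inversion shows up in a plot.

```python
from collections import defaultdict

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def kmer_dots(ref, query, k=3):
    """Return (forward, reverse) sets of (ref_pos, query_pos) word matches."""
    # Index every word of length k in the reference by its start position.
    index = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)

    forward, reverse = set(), set()
    for j in range(len(query) - k + 1):
        word = query[j:j + k]
        for i in index.get(word, []):
            forward.add((i, j))            # same-strand match
        for i in index.get(revcomp(word), []):
            reverse.add((i, j))            # opposite-strand match
    return forward, reverse


def render(ref, query, k=3):
    """Text dot plot: reference on x, query on y; 'x' forward, 'o' reverse."""
    forward, reverse = kmer_dots(ref, query, k)
    rows = []
    for j in range(len(query) - k + 1):
        row = ""
        for i in range(len(ref) - k + 1):
            if (i, j) in forward:
                row += "x"
            elif (i, j) in reverse:
                row += "o"
            else:
                row += "."
        rows.append(row)
    return "\n".join(rows)


if __name__ == "__main__":
    ref = "ACCTGA"
    print(render(ref, ref))           # identical sequences: main diagonal
    print()
    print(render(ref, revcomp(ref)))  # inverted copy: opposite diagonal
```

Exact word matching needs memory and time that grow with the number of shared words, which is one reason tools aimed at whole genomes plot alignments instead.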
TL;DR: The analyses show that measuring the agreement between predicted and comparative secondary structure models underestimates the reliability of structural prediction by mfold, and that predicted domains correspond closely with structural domains found by the comparative method in the same RNAs.
Abstract: Recent structural analyses of genomic RNAs from RNA coliphages suggest that both well-determined base-paired helices and well-determined structural domains that are identified by "energy dot plot" analysis using the RNA folding package mfold are likely to be predicted correctly. To test these observations with another group of large RNAs, we have analyzed 15 ribosomal RNAs. Published secondary structure models that were derived by comparative sequence analysis were used to evaluate the predicted structures. Both the optimal predicted fold and the predicted "energy dot plot" of each sequence were examined. Each prediction was obtained from a single computer run on an entire ribosomal RNA sequence. All predicted base pairs in optimal foldings were examined for agreement with proven base pairs in the comparative models. Our analyses show that the overall correspondence between the predicted and comparative models varied for different RNAs, ranging from a low of 27% to a high of 70%, with a mean value of 49%. The correspondence improves to a mean value of 81% when the analysis is limited to well-determined helices. In addition to well-determined helices, large well-determined structural domains can be observed in "energy dot plots" of some 16S ribosomal RNAs. The predicted domains correspond closely with structural domains that are found by the comparative method in the same RNAs. Our analyses also show that measuring the agreement between predicted and comparative secondary structure models underestimates the reliability of structural prediction by mfold.
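The correspondence percentages quoted above can be illustrated with a small base-pair comparison. The abstract does not give the exact formula, so this sketch assumes the score is the share of base pairs in the comparative model that also appear in the predicted fold. The dot-bracket input strings and the function names are likewise assumptions chosen for brevity, not the paper's data format.

```python
def pairs_from_dot_bracket(structure):
    """Parse a dot-bracket string into a set of (i, j) base-pair positions."""
    stack, pairs = [], set()
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % pos)
            pairs.add((stack.pop(), pos))
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return pairs


def percent_agreement(predicted, comparative):
    """Percentage of comparative-model pairs recovered by the prediction.

    Assumed definition: shared pairs divided by pairs in the comparative model.
    """
    predicted_pairs = pairs_from_dot_bracket(predicted)
    comparative_pairs = pairs_from_dot_bracket(comparative)
    if not comparative_pairs:
        raise ValueError("comparative model contains no base pairs")
    shared = predicted_pairs & comparative_pairs
    return 100.0 * len(shared) / len(comparative_pairs)


if __name__ == "__main__":
    comparative = "((..))((..))"   # hypothetical reference model, 4 pairs
    predicted = "((..))......"     # hypothetical prediction, 2 pairs
    print(percent_agreement(predicted, comparative))  # 50.0
```

Restricting both sets to pairs inside well-determined helices before the comparison would give the kind of filtered score the abstract reports separately.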
TL;DR: This work presents a technique for converting a basic D3 chart into a reusable style template, and demonstrates the effectiveness of this approach by applying a diverse set of style templates to a variety of source datasets.
Abstract: We present a technique for converting a basic D3 chart into a reusable style template. Then, given a new data source we can apply the style template to generate a chart that depicts the new data, but in the style of the template. To construct the style template we first deconstruct the input D3 chart to recover its underlying structure: the data, the marks and the mappings that describe how the marks encode the data. We then rank the perceptual effectiveness of the deconstructed mappings. To apply the resulting style template to a new data source we first obtain importance ranks for each new data field. We then adjust the template mappings to depict the source data by matching the most important data fields to the most perceptually effective mappings. We show how the style templates can be applied to source data in the form of either a data table or another D3 chart. While our implementation focuses on generating templates for basic chart types (e.g., variants of bar charts, line charts, dot plots, scatterplots, etc.), these are the most commonly used chart types today. Users can easily find such basic D3 charts on the Web, turn them into templates, and immediately see how their own data would look in the visual style (e.g., colors, shapes, fonts, etc.) of the templates. We demonstrate the effectiveness of our approach by applying a diverse set of style templates to a variety of source datasets.
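The matching step described above, pairing the most important data fields with the most perceptually effective mappings, can be sketched as a simple rank pairing. The abstract does not say how the ranks are computed or how ties and leftover fields are handled, so the scores, the field and channel names, and the `assign_fields` function below are all hypothetical.

```python
def assign_fields(field_importance, mapping_effectiveness):
    """Pair fields with visual mappings by rank.

    Both arguments map a name to a score where higher means better.
    Returns {mapping_name: field_name}. If the two lists differ in
    length, the lowest-ranked leftovers are dropped (an assumption).
    """
    fields = sorted(field_importance, key=field_importance.get, reverse=True)
    mappings = sorted(
        mapping_effectiveness, key=mapping_effectiveness.get, reverse=True
    )
    return dict(zip(mappings, fields))


if __name__ == "__main__":
    # Hypothetical importance ranks for the fields of a new data table.
    importance = {"year": 0.2, "revenue": 0.9, "region": 0.5}
    # Hypothetical effectiveness scores for the template's mappings.
    effectiveness = {"x-position": 1.0, "color": 0.3, "y-position": 0.8}
    print(assign_fields(importance, effectiveness))
```

With these made-up scores, `revenue` lands on `x-position`, `region` on `y-position`, and `year` on `color`.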
TL;DR: The authors present an introduction to descriptive statistics, covering measures of central tendency (mean, mode and median); measures of spread (range, interquartile range, variance and standard deviation); and graphical displays (dot plot, histogram, bar chart, pie chart, and box-and-whisker plot).
Abstract: This chapter presents an introduction to descriptive statistics. It discusses such topics as measures of central tendency, which include the mean, mode and median; measures of spread, which include the range, interquartile range, variance and standard deviation; and graphical displays, which include the dot plot, histogram, bar chart, pie chart, and box-and-whisker plot. Finally, the chapter discusses shapes of frequency distributions, including skewness and peakedness.
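As a minimal sketch of the measures listed above, the following uses Python's standard `statistics` module on a made-up sample. The sample values and the helper names (`describe`, `text_dot_plot`) are assumptions for illustration. The variance and standard deviation are the sample versions (divisor n - 1), and the quartiles use the module's default "exclusive" method, so other textbook conventions may give a slightly different interquartile range.

```python
import statistics
from collections import Counter


def describe(values):
    """Return the central-tendency and spread measures for a sample."""
    data = sorted(values)
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _q2, q3 = statistics.quantiles(data, n=4)
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "range": data[-1] - data[0],
        "iqr": q3 - q1,
        "variance": statistics.variance(data),  # sample variance (n - 1)
        "stdev": statistics.stdev(data),        # sample standard deviation
    }


def text_dot_plot(values):
    """One row per observed value, one '*' per observation.

    Simplification: values that never occur get no row, so gaps on the
    axis are not drawn to scale.
    """
    counts = Counter(values)
    return "\n".join(f"{v:>3} | {'*' * counts[v]}" for v in sorted(counts))


if __name__ == "__main__":
    sample = [2, 4, 4, 5, 7, 9, 11]  # made-up data
    for name, value in describe(sample).items():
        print(name, value)
    print(text_dot_plot(sample))
```

For this sample the mean is 6, the median 5, the mode 4, the range 9, the interquartile range 5 and the sample variance 10.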
TL;DR: A stand-alone, platform-independent graphic alignment tool for comparative sequence analysis that uses the NCBI-BLASTN program and extensive post-processing to exhaustively align two DNA sequences, providing researchers with a fine-grained alignment and visualization tool suited to pairwise analysis of non-coding sequences of 0–200 kb.
Abstract: Several problems exist with current methods used to align DNA sequences for comparative sequence analysis. Most dynamic programming algorithms assume that conserved sequence elements are collinear. This assumption appears valid when comparing orthologous protein coding sequences. Functional constraints on proteins provide strong selective pressure against sequence inversions, and minimize sequence duplications and feature shuffling. For non-coding sequences this collinearity assumption is often invalid. For example, enhancers contain clusters of transcription factor binding sites that change in number, orientation, and spacing during evolution yet the enhancer retains its activity. Dot plot analysis is often used to estimate non-coding sequence relatedness. Yet dot plots do not actually align sequences and thus cannot account well for base insertions or deletions. Moreover, they lack an adequate statistical framework for comparing sequence relatedness and are limited to pairwise comparisons. Lastly, dot plots and dynamic programming text outputs fail to provide an intuitive means for visualizing DNA alignments.