SNP annotation

Papers published on a yearly basis

Papers

Journal Article•10.1093/NAR/GKQ603•

ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data

Kai Wang¹, Mingyao Li¹, Hakon Hakonarson¹•Institutions (1)

Children's Hospital of Philadelphia¹

01 Sep 2010-Nucleic Acids Research

TL;DR: The authors developed the ANNOVAR tool to annotate single nucleotide variants and insertions/deletions, for example by examining their functional consequence on genes, inferring cytogenetic bands, reporting functional importance scores, finding variants in conserved regions, or identifying variants reported in the 1000 Genomes Project and dbSNP.

Abstract: High-throughput sequencing platforms are generating massive amounts of genetic variation data for diverse genomes, but it remains a challenge to pinpoint a small subset of functionally important variants. To fill these unmet needs, we developed the ANNOVAR tool to annotate single nucleotide variants (SNVs) and insertions/deletions, such as examining their functional consequence on genes, inferring cytogenetic bands, reporting functional importance scores, finding variants in conserved regions, or identifying variants reported in the 1000 Genomes Project and dbSNP. ANNOVAR can utilize annotation databases from the UCSC Genome Browser or any annotation data set conforming to Generic Feature Format version 3 (GFF3). We also illustrate a 'variants reduction' protocol on 4.7 million SNVs and indels from a human genome, including two causal mutations for Miller syndrome, a rare recessive disease. Through a stepwise procedure, we excluded variants that are unlikely to be causal, and identified 20 candidate genes including the causal gene. Using a desktop computer, ANNOVAR requires ∼4 min to perform gene-based annotation and ∼15 min to perform variants reduction on 4.7 million variants, making it practical to handle hundreds of human genomes in a day. ANNOVAR is freely available at http://www.openbioinformatics.org/annovar/.
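
The abstract mentions annotation against any data set in GFF3 format. The following is a minimal sketch of that general idea, not ANNOVAR's own implementation: it parses GFF3 text (nine tab-separated columns, 1-based inclusive coordinates, as the GFF3 format defines) and reports which features overlap a variant position. The function names, the `Feature` type, and the example records are hypothetical and exist only for illustration.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    seqid: str
    ftype: str
    start: int  # 1-based, inclusive (GFF3 convention)
    end: int    # 1-based, inclusive
    attributes: str


def parse_gff3(text: str) -> List[Feature]:
    """Parse GFF3 text into features, skipping comments and malformed lines."""
    features = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # a valid GFF3 record has exactly nine columns
        features.append(
            Feature(cols[0], cols[2], int(cols[3]), int(cols[4]), cols[8])
        )
    return features


def overlapping(features: List[Feature], chrom: str, pos: int) -> List[Feature]:
    """Return features on `chrom` whose interval contains `pos` (linear scan)."""
    return [f for f in features if f.seqid == chrom and f.start <= pos <= f.end]


# Hypothetical records, invented for this example only.
GFF3_EXAMPLE = (
    "##gff-version 3\n"
    "chr1\texample\tgene\t100\t500\t.\t+\t.\tID=geneA\n"
    "chr1\texample\texon\t100\t200\t.\t+\t.\tID=exonA1;Parent=geneA\n"
    "chr2\texample\tgene\t50\t80\t.\t-\t.\tID=geneB\n"
)

features = parse_gff3(GFF3_EXAMPLE)
for hit in overlapping(features, "chr1", 150):
    print(hit.ftype, hit.attributes)
```

A linear scan is enough for a handful of features; for millions of variants, a sorted or binned index would be the more practical choice.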

14,196 citations

Journal Article•10.1101/GR.266932.120•

Accessing NCBI data using the NCBI Sequence Viewer and Genome Data Viewer (GDV)

Sanjida H. Rangwala¹, Anatoliy Kuznetsov¹, Victor Ananiev¹, Andrea Asztalos¹, Evgeny Borodin¹, Vladislav Evgeniev¹, Victor Joukov¹, Vadim Lotov¹, Ravinder Pannu¹, Dmitry Rudnev¹, Andrew Shkeda¹, Eric M. Weitz¹, Valerie A. Schneider¹•Institutions (1)

National Institutes of Health¹

01 Jan 2021-Genome Research

TL;DR: This work describes how members of the biomedical research community can use GDV and the related NCBI Sequence Viewer (SV) to access, analyze, and disseminate NCBI and custom biomedical sequence data and reports how users can add SV to their own web pages to create a custom graphical sequence display without the need for infrastructure investments or back-end deployments.

Abstract: The National Center for Biotechnology Information (NCBI) is an archive providing free access to a wide range and large volume of biological sequence data and literature. Staff scientists at NCBI analyze user-submitted data in the archive, producing gene and SNP annotation and generating sequence alignment tools. NCBI's flagship genome browser, Genome Data Viewer (GDV), displays our in-house RefSeq annotation; is integrated with other NCBI resources such as Gene, dbGaP, and BLAST; and provides a platform for customized analysis and visualization. Here, we describe how members of the biomedical research community can use GDV and the related NCBI Sequence Viewer (SV) to access, analyze, and disseminate NCBI and custom biomedical sequence data. In addition, we report how users can add SV to their own web pages to create a custom graphical sequence display without the need for infrastructure investments or back-end deployments.

174 citations

Journal Article•10.1093/MOLBEV/MSW102•

The Role of Deleterious Substitutions in Crop Genomes

Thomas J. Y. Kono¹, Fengli Fu¹, Mohsen Mohammadi¹, Paul J. Hoffman¹, Chaochih Liu¹, Robert M. Stupar¹, Kevin P. Smith¹, Peter Tiffin¹, Justin C. Fay², Peter L. Morrell¹•Institutions (2)

University of Minnesota¹, University of Washington²

14 Jun 2016-Molecular Biology and Evolution

TL;DR: The study concludes that individual cultivars carry hundreds of deleterious SNPs on average, and that nonsense variants make up a minority of deleterious SNPs in the protein-coding regions of the genomes of two crops.

Abstract: Populations continually incur new mutations with fitness effects ranging from lethal to adaptive. While the distribution of fitness effects of new mutations is not directly observable, many mutations likely either have no effect on organismal fitness or are deleterious. Historically, it has been hypothesized that a population may carry many mildly deleterious variants as segregating variation, which reduces the mean absolute fitness of the population. Recent advances in sequencing technology and sequence conservation-based metrics for inferring the functional effect of a variant permit examination of the persistence of deleterious variants in populations. The issue of segregating deleterious variation is particularly important for crop improvement, because the demographic history of domestication and breeding allows deleterious variants to persist and reach moderate frequency, potentially reducing crop productivity. In this study, we use exome resequencing of 15 barley accessions and genome resequencing of 8 soybean accessions to investigate the prevalence of deleterious single nucleotide polymorphisms (SNPs) in the protein-coding regions of the genomes of two crops. We conclude that individual cultivars carry hundreds of deleterious SNPs on average, and that nonsense variants make up a minority of deleterious SNPs. Our approach identifies known phenotype-altering variants as deleterious more frequently than the genome-wide average, suggesting that putatively deleterious variants are likely to affect phenotypic variation. We also report the implementation of a SNP annotation tool BAD_Mutations that makes use of a likelihood ratio test based on alignment of all currently publicly available Angiosperm genomes.

92 citations

Journal Article•10.1186/1471-2105-12-S4-S2•

Genome-wide prediction of splice-modifying SNPs in human genes using a new analysis pipeline called AASsites

Kirsten Faber¹, Karl-Heinz Glatting¹, Phillip J Mueller¹, Angela Risch¹, Agnes Hotz-Wagenblatt¹•Institutions (1)

German Cancer Research Center¹

05 Jul 2011-BMC Bioinformatics

TL;DR: A new tool called AASsites searches for SNPs that modify splicing; it identified 301 SNPs classified as “likely” and 985 classified as “probable” to have such characteristics.

Abstract: Background: Some single nucleotide polymorphisms (SNPs) are known to modify the risk of developing certain diseases or the reaction to drugs. Owing to next-generation sequencing methods, the number of known human SNPs has grown. Not all SNPs lead to a modified protein, which may be the origin of a disease. Therefore, the recognition of functional SNPs is needed. Because most SNP annotation tools look for SNPs that lead to an amino acid exchange or a premature stop, we designed a new tool called AASsites, which searches for SNPs that modify splicing.

71 citations

Journal Article•10.3389/FGENE.2021.655707•

RNA-Seq Data for Reliable SNP Detection and Genotype Calling: Interest for Coding Variant Characterization and Cis-Regulation Analysis by Allele-Specific Expression in Livestock Species

Frédéric Jehl, Fabien Degalez, Maria Bernard, Frédéric Lecerf, Laetitia Lagoutte, Colette Désert, Manon Coulée, Olivier Bouchez, Sophie Leroux¹, Behnam Abasht², Michèle Tixier-Boichard³, Bertrand Bed'Hom³, Thierry Burlot, David Gourichon, Philippe Bardou, Hervé Acloque³, Sylvain Foissac¹, Sarah Djebali¹, Elisabetta Giuffra³, Tatiana Zerjal³, Frédérique Pitel¹, Christophe Klopp, Sandrine Lagarrigue•Institutions (3)

University of Toulouse¹, University of Delaware², Université Paris-Saclay³

28 Jun 2021-Frontiers in Genetics

TL;DR: In this article, the authors compared SNP calling results using GATK suggested filters, on two chicken populations for which both RNA-seq and DNA-seq data were available for the same samples of the same tissue.

Abstract: In addition to their common use in studying gene expression, RNA-seq data accumulated over the last 10 years are a yet-unexploited resource of SNPs in numerous individuals from different populations. SNP detection by RNA-seq is particularly interesting for livestock species since whole genome sequencing is expensive and exome sequencing tools are unavailable. These SNPs detected in expressed regions can be used to characterize variants affecting protein functions, and to study cis-regulated genes by analyzing allele-specific expression (ASE) in the tissue of interest. However, gene expression can be highly variable, and filters for SNP detection using the popular GATK toolkit are not yet standardized, making SNP detection and genotype calling by RNA-seq a challenging endeavor. We compared SNP calling results using GATK suggested filters, on two chicken populations for which both RNA-seq and DNA-seq data were available for the same samples of the same tissue. We showed, in expressed regions, an RNA-seq precision of 91% (SNPs detected by RNA-seq and shared by DNA-seq) and we characterized the remaining 9% of SNPs. We then studied the genotype (GT) obtained by RNA-seq and the impact of two factors (GT call-rate and read number per GT) on the concordance of GT with DNA-seq; we proposed thresholds for them leading to a 95% concordance. Applying these thresholds to 767 multi-tissue RNA-seq of 382 birds of 11 chicken populations, we found 9.5 M SNPs in total, of which ∼550,000 SNPs per tissue and population with a reliable GT (call rate ≥ 50%) and among them, ∼340,000 with a MAF ≥ 10%. We showed that such RNA-seq data from one tissue can be used to (i) detect SNPs with a strong predicted impact on proteins, despite their scarcity in each population (16,307 SIFT deleterious missenses and 590 stop-gained), (ii) study, on a large scale, cis-regulations of gene expression, with ∼81% of protein-coding and 68% of long non-coding genes (TPM ≥ 1) that can be analyzed for ASE, and with ∼29% of them that were cis-regulated, and (iii) analyze population genetics using such SNPs located in expressed regions. This work shows that RNA-seq data can be used with good confidence to detect SNPs and associated GT within various populations and to use them for different analyses, as in GTEx studies.
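
The two per-SNP thresholds quoted in the abstract (call rate ≥ 50% and MAF ≥ 10%) can be illustrated with a short sketch. This is not the authors' pipeline: it assumes biallelic SNPs in diploid individuals, encodes each genotype as an alternate-allele count (0, 1, or 2) with `None` for a missing call, and uses hypothetical function names and made-up genotype vectors.

```python
from typing import List, Optional

# Assumed encoding: alternate-allele count per individual; None = missing call.
Genotype = Optional[int]


def call_rate(gts: List[Genotype]) -> float:
    """Fraction of individuals with a non-missing genotype call."""
    return sum(g is not None for g in gts) / len(gts)


def minor_allele_frequency(gts: List[Genotype]) -> float:
    """MAF over called genotypes, assuming a biallelic diploid SNP."""
    called = [g for g in gts if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def passes_filters(
    gts: List[Genotype], min_call_rate: float = 0.5, min_maf: float = 0.10
) -> bool:
    """Apply the call-rate and MAF thresholds quoted in the abstract."""
    return (
        call_rate(gts) >= min_call_rate
        and minor_allele_frequency(gts) >= min_maf
    )


# Made-up genotype vectors, for illustration only.
snp_a = [0, 1, 2, None, 1, 0, None, 1]               # well called, common
snp_b = [0, None, None, None, None, 1, None, None]   # mostly missing
snp_c = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]               # fully called, rare

for name, gts in [("snp_a", snp_a), ("snp_b", snp_b), ("snp_c", snp_c)]:
    print(name, passes_filters(gts))
```

Here only `snp_a` passes: `snp_b` fails the call-rate threshold and `snp_c` fails the MAF threshold.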

67 citations

Year	Papers
2021	5
2020	5
2019	4
2018	3
2017	1
2016	6
