Sequence Read Archive

Papers published on a yearly basis

Papers

Journal Article • DOI: 10.1093/NAR/GKQ1019

The sequence read archive.

Rasko Leinonen¹, Hideaki Sugawara¹, Martin Shumway¹

National Institute of Genetics¹

01 Jan 2011-Nucleic Acids Research

TL;DR: Presents the content and structure of the SRA, details support for sequencing platforms, gives recommended data submission levels and formats, and outlines the response to the challenge of data growth.

Abstract: The combination of significantly lower cost and increased speed of sequencing has resulted in an explosive growth of data submitted into the primary next-generation sequence data archive, the Sequence Read Archive (SRA). The preservation of experimental data is an important part of the scientific record, and increasing numbers of journals and funding agencies require that next-generation sequence data are deposited into the SRA. The SRA was established as a public repository for the next-generation sequence data and is operated by the International Nucleotide Sequence Database Collaboration (INSDC). INSDC partners include the National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EBI) and the DNA Data Bank of Japan (DDBJ). The SRA is accessible at http://www.ncbi.nlm.nih.gov/Traces/sra from NCBI, at http://www.ebi.ac.uk/ena from EBI and at http://trace.ddbj.nig.ac.jp from DDBJ. In this article, we present the content and structure of the SRA, detail our support for sequencing platforms and provide recommended data submission levels and formats. We also briefly outline our response to the challenge of data growth.

2,855 citations

Journal Article • DOI: 10.1093/BIOINFORMATICS/BTR708

ART: a next-generation sequencing read simulator

Weichun Huang¹, Leping Li¹, Jason R. Myers¹, Gabor T. Marth¹

National Institutes of Health¹

15 Feb 2012-Bioinformatics

TL;DR: ART is a set of simulation tools that generate synthetic next-generation sequencing reads, which are essential for testing and benchmarking tools for next-generation sequencing data analysis, including read alignment, de novo assembly and genetic variation discovery.

Abstract: Summary: ART is a set of simulation tools that generate synthetic next-generation sequencing reads. This functionality is essential for testing and benchmarking tools for next-generation sequencing data analysis including read alignment, de novo assembly and genetic variation discovery. ART generates simulated sequencing reads by emulating the sequencing process with built-in, technology-specific read error models and base quality value profiles parameterized empirically in large sequencing datasets. We currently support all three major commercial next-generation sequencing platforms: Roche’s 454, Illumina’s Solexa and Applied Biosystems’ SOLiD. ART also allows the flexibility to use customized read error model parameters and quality profiles. Availability: Both source and binary software packages are available at http://www.niehs.nih.gov/research/resources/software/art

1,657 citations

Journal Article • DOI: 10.1093/NAR/28.1.19

The EMBL Nucleotide Sequence Database

Tamara Kulikova¹, Philippe Aldebert¹, Nicola Althorpe¹, Wendy Baker¹, Kirsty Bates¹, Paul Browne¹, Alexandra van den Broek¹, Guy Cochrane¹, Karyn Duggan¹, Ruth Y. Eberhardt¹, Nadeem Faruque¹, Maria Garcia-Pastor¹, Nicola Harte¹, Carola Kanz¹, Rasko Leinonen¹, Quan Lin¹, Vincent Lombard¹, Rodrigo Lopez¹, Renato Mancuso¹, Michelle McHale¹, Francesco Nardone¹, Ville Silventoinen¹, Peter Stoehr¹, Guenter Stoesser¹, Mary Ann Tuli¹, Katerina Tzouvara¹, Robert Vaughan¹, Dan Wu¹, Weimin Zhu¹, Rolf Apweiler¹

European Bioinformatics Institute¹

01 Jan 2004-Nucleic Acids Research

TL;DR: Changes over the past year include the removal of the sequence length limit, the launch of the EMBLCDSs dataset, extension of the Sequence Version Archive functionality and the revision of quality rules for TPA data.

Abstract: The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl.html) constitutes Europe's primary nucleotide sequence resource. Main sources for DNA and RNA sequences are direct submissions from individual researchers, genome sequencing projects and patent applications. While automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO), the preferred submission tool for individual submitters is Webin (WWW). Through all stages, dataflow is monitored by EBI biologists communicating with the sequencing groups. In collaboration with DDBJ and GenBank the database is produced, maintained and distributed at the European Bioinformatics Institute (EBI). Database releases are produced quarterly and are distributed on CD-ROM. Network services allow access to the most up-to-date data collection via Internet and World Wide Web interface. EBI's Sequence Retrieval System (SRS) is a Network Browser for Databanks in Molecular Biology, integrating and linking the main nucleotide and protein databases, plus many specialised databases. For sequence similarity searching a variety of tools (e.g. Blitz, Fasta, Blast etc) are available for external users to compare their own sequences against the most currently available data in the EMBL Nucleotide Sequence Database and SWISS-PROT.

1,312 citations

Journal Article • DOI: 10.1093/NAR/GKR854

The sequence read archive: explosive growth of sequencing data

Yuichi Kodama¹, Martin Shumway¹, Rasko Leinonen¹

National Institute of Genetics¹

01 Jan 2012-Nucleic Acids Research

TL;DR: Presents the content and structure of the SRA, reports on updated metadata structures, submission file formats and supported sequencing platforms, and outlines various responses to the challenge of explosive data growth.

Abstract: New generation sequencing platforms are producing data with significantly higher throughput and lower cost. A portion of this capacity is devoted to individual and community scientific projects. As these projects reach publication, raw sequencing datasets are submitted into the primary next-generation sequence data archive, the Sequence Read Archive (SRA). Archiving experimental data is the key to the progress of reproducible science. The SRA was established as a public repository for next-generation sequence data as a part of the International Nucleotide Sequence Database Collaboration (INSDC). INSDC is composed of the National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EBI) and the DNA Data Bank of Japan (DDBJ). The SRA is accessible at www.ncbi.nlm.nih.gov/sra from NCBI, at www.ebi.ac.uk/ena from EBI and at trace.ddbj.nig.ac.jp from DDBJ. In this article, we present the content and structure of the SRA and report on updated metadata structures, submission file formats and supported sequencing platforms. We also briefly outline our various responses to the challenge of explosive data growth.

984 citations

Journal Article • DOI: 10.1093/NAR/GKQ967

The European Nucleotide Archive

Rasko Leinonen¹, Ruth Akhtar¹, Ewan Birney¹, Lawrence Bower¹, Ana Cerdeño-Tárraga¹, Ying Cheng¹, Iain Cleland¹, Nadeem Faruque¹, Neil Goodgame¹, Richard Gibson¹, Gemma Hoad¹, Mikyung Jang¹, Nima Pakseresht¹, Sheila Plaister¹, Rajesh Radhakrishnan¹, Kethi Reddy¹, Siamak Sobhany¹, Petra ten Hoopen¹, Robert Vaughan¹, Vadim Zalunin¹, Guy Cochrane¹

European Bioinformatics Institute¹

01 Jan 2011-Nucleic Acids Research

TL;DR: This article outlines the ENA's data submission, archive, search and download services and describes major changes and improvements introduced during 2010, including extended EMBL-Bank and SRA data submission services, extended ENA Browser functionality, support for submitting data to the European Genome-phenome Archive (EGA) through SRA, and the launch of a new sequence similarity search service.

Abstract: The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena) is Europe’s primary nucleotide-sequence repository. The ENA consists of three main databases: the Sequence Read Archive (SRA), the Trace Archive and EMBL-Bank. The objective of ENA is to support and promote the use of nucleotide sequencing as an experimental research platform by providing data submission, archive, search and download services. In this article, we outline these services and describe major changes and improvements introduced during 2010. These include extended EMBL-Bank and SRA-data submission services, extended ENA Browser functionality, support for submitting data to the European Genome-phenome Archive (EGA) through SRA, and the launch of a new sequence similarity search service.

806 citations

Year	Papers
2021	1
2020	1
2019	3
2018	3
2017	4
2016	5
