About: Minimum information about a microarray experiment (also known as MIAME) is a research topic. Over its lifetime, 38 publications have been published on this topic, receiving 9,939 citations.
TL;DR: The ultimate goal of this work is to establish a standard for recording and reporting microarray-based gene expression data, which will in turn facilitate the establishment of databases and public repositories and enable the development of data analysis tools.
Abstract: Microarray analysis has become a widely used tool for the generation of gene expression data on a genomic scale. Although many significant results have been derived from microarray studies, one limitation has been the lack of standards for presenting and exchanging such data. Here we present a proposal, the Minimum Information About a Microarray Experiment (MIAME), that describes the minimum information required to ensure that microarray data can be easily interpreted and that results derived from its analysis can be independently verified. The ultimate goal of this work is to establish a standard for recording and reporting microarray-based gene expression data, which will in turn facilitate the establishment of databases and public repositories and enable the development of data analysis tools. With respect to MIAME, we concentrate on defining the content and structure of the necessary information rather than the technical format for capturing it.
TL;DR: A summary of the GEO database structure and user facilities is provided, and recent enhancements to database design, performance, submission format options, data query and retrieval utilities are described.
Abstract: The Gene Expression Omnibus (GEO) repository at the National Center for Biotechnology Information (NCBI) archives and freely disseminates microarray and other forms of high-throughput data generated by the scientific community. The database has a minimum information about a microarray experiment (MIAME)-compliant infrastructure that captures fully annotated raw and processed data. Several data deposit options and formats are supported, including web forms, spreadsheets, XML and Simple Omnibus Format in Text (SOFT). In addition to data storage, a collection of user-friendly web-based interfaces and applications are available to help users effectively explore, visualize and download the thousands of experiments and tens of millions of gene expression patterns stored in GEO. This paper provides a summary of the GEO database structure and user facilities, and describes recent enhancements to database design, performance, submission format options, data query and retrieval utilities. GEO is accessible at http://www.ncbi.nlm.nih.gov/geo/
TL;DR: The Gene Expression Omnibus at the National Center for Biotechnology Information (NCBI) is the largest public repository for high-throughput gene expression data and offers many tools and features that allow users to effectively explore, analyze and download expression data from both gene-centric and experiment-centric perspectives.
Abstract: The Gene Expression Omnibus (GEO) at the National Center for Biotechnology Information (NCBI) is the largest public repository for high-throughput gene expression data. Additionally, GEO hosts other categories of high-throughput functional genomic data, including those that examine genome copy number variations, chromatin structure, methylation status and transcription factor binding. These data are generated by the research community using high-throughput technologies like microarrays and, more recently, next-generation sequencing. The database has a flexible infrastructure that can capture fully annotated raw and processed data, enabling compliance with major community-derived scientific reporting standards such as ‘Minimum Information About a Microarray Experiment’ (MIAME). In addition to serving as a centralized data storage hub, GEO offers many tools and features that allow users to effectively explore, analyze and download expression data from both gene-centric and experiment-centric perspectives. This article summarizes the GEO repository structure, content and operating procedures, as well as recently introduced data mining features. GEO is freely accessible at http://www.ncbi.nlm.nih.gov/geo/.
TL;DR: MAGE will help microarray data producers and users to exchange information by providing a common platform for data exchange, and MAGE-STK will make the adoption of MAGE easier.
Abstract: Background. Meaningful exchange of microarray data is currently difficult because it is rare that published data provide sufficient information depth or are even in the same format from one publication to another. Only when data can be easily exchanged will the entire biological community be able to derive the full benefit from such microarray studies.
TL;DR: The Minimum Information About a Microarray Experiment (MIAME) guidelines are a data content document developed by the Microarray Gene Expression Data (MGED) Society that outlines the information that should be provided when describing a microarray experiment; because database curators cannot fully verify compliance, the role of reviewers and editors becomes important.
Abstract: The Minimum Information About a Microarray Experiment (MIAME) guidelines are a data content document developed by the Microarray Gene Expression Data (MGED) Society that outlines the information that should be provided when describing a microarray experiment [1]. Many journals and funding agencies have adopted the guidelines, with the aim of facilitating access to the elements of a study that would enable independent evaluation of results. However, the MIAME requirements have been criticized recently [2, 3]. The criticism stems, in part, from different interpretations of the level of detail required to adequately report a microarray experiment, and debates as to whether there is a genuine benefit to making microarray data public.
The Gene Expression Omnibus (GEO) database at the National Center for Biotechnology Information (NCBI) [4] and ArrayExpress at the European Bioinformatics Institute (EBI) [5] are the two major public databases of microarray data. Although they have different designs, both databases support capture of all data elements defined by MIAME. Figure 1 presents a timeline of major landmarks in the evolution of the GEO database, together with concomitant growth in submissions. GEO was launched in 2000, more than a year before the MIAME guidelines were proposed. Because there was not yet a consensus on reporting standards for microarray data, or even an obligation to make microarray data public, GEO initially allowed a minimal level of experimental detail to be supplied. Over the ensuing years we continually monitored the needs and requests of end-users, and gauged the level of effort submitters were realistically willing to invest in making their data public. We responded with incremental improvements to database design and curation standards, and we developed easy-to-generate batch deposit formats that significantly reduced the burden of submission and allowed contributors to focus on the content submitted rather than the mechanism of submission.
Figure 1. Timeline of GEO growth and major landmarks in the evolution of the GEO database, and a screenshot of GEO tools that allow users to query, analyze, and visualize the data in GEO.
In June 2005, we released major database revisions that included specific provisions for all MIAME data elements. In 2006, mechanisms for provision of raw data were further streamlined, and several MIAME elements that were previously optional became mandatory. Yet, even with these advances, it is still possible for a submitter to supply data that do not strictly adhere to the MIAME requirements. The difficulty lies in the fact that MIAME is a subjective set of guidelines where the level of detail to report is open to interpretation and, thus, cannot be unequivocally validated or enforced by computational means.
All data submitted to GEO are syntactically validated for correct document structure, organization, and provision of basic elements. Each submission is then inspected by curators for content integrity. GEO curators take a pragmatic approach: we aim to ensure that sufficient information has been supplied to allow general interpretation of the experiment. Although we encourage it, we have been less dogmatic with regard to provision of all-inclusive experimental protocols that might permit practical replication of the entire experiment. Our reasoning is that requiring granular experimental detail places a significant burden on the submitter, for (arguably) minimal real benefit to most end-users, who are usually less concerned with this level of detail. When content or format problems are identified, curators work with the submitter until the issue is resolved. Submissions lacking critical descriptive elements necessary for overall interpretation of the experiment are not approved for public release. However, given the large diversity of biological themes, technologies, and statistical transformations applied to microarray data, it is impractical for curators to determine conclusively the accuracy and validity of the data, or to assess whether all relevant information has been supplied. This is where the role of reviewers and editors becomes important.
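The two-stage review described above can be sketched in code: automated syntactic checks first, then human inspection. This is a minimal illustration under stated assumptions, not GEO's actual validator.

It assumes a simplified SOFT-like layout:

- lines starting with `^` open an entity;
- lines starting with `!` carry `key = value` attributes;
- a sample's data table sits between `!..._table_begin` and `!..._table_end` marker lines.

The names `REQUIRED_SAMPLE_ATTRIBUTES`, `parse_soft_like`, `syntactic_problems` and the accession `GSM_EXAMPLE_1` are hypothetical. The example shows the limit of computational checking. A record can pass because the fields are present and well formed, even if their content is too vague to be useful.

```python
from dataclasses import dataclass, field

# Hypothetical list of attributes a sample must carry. GEO's real
# requirements are defined in its submission documentation, not here.
REQUIRED_SAMPLE_ATTRIBUTES = (
    "Sample_title",
    "Sample_source_name",
    "Sample_organism",
    "Sample_data_processing",
)


@dataclass
class Entity:
    kind: str                  # entity type, e.g. "SAMPLE"
    accession: str             # identifier written after "="
    attributes: dict = field(default_factory=dict)
    table_rows: list = field(default_factory=list)


def parse_soft_like(text):
    """Parse a simplified SOFT-like document (assumed layout) into entities."""
    entities, current, in_table = [], None, False
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        lowered = line.lower()
        if in_table:
            if lowered.startswith("!") and lowered.endswith("_table_end"):
                in_table = False
            else:
                current.table_rows.append(line.split("\t"))
            continue
        if line.startswith("^"):
            kind, _, accession = line[1:].partition("=")
            current = Entity(kind.strip().upper(), accession.strip())
            entities.append(current)
        elif current is None:
            raise ValueError(f"line {lineno}: content before any '^' entity line")
        elif lowered.startswith("!") and lowered.endswith("_table_begin"):
            in_table = True
        elif line.startswith("!"):
            key, sep, value = line[1:].partition("=")
            if not sep:
                raise ValueError(f"line {lineno}: attribute without '='")
            current.attributes.setdefault(key.strip(), []).append(value.strip())
        # Other lines (e.g. "#" column descriptions) are ignored in this sketch.
    if in_table:
        raise ValueError("data table was opened but never closed")
    return entities


def syntactic_problems(entities):
    """Report structural gaps only; whether the content is adequate is a curator's call."""
    problems = []
    for ent in entities:
        label = ent.accession or ent.kind
        if not ent.accession:
            problems.append(f"{ent.kind}: missing identifier after '='")
        if ent.kind == "SAMPLE":
            for key in REQUIRED_SAMPLE_ATTRIBUTES:
                if not any(ent.attributes.get(key, [])):
                    problems.append(f"{label}: missing {key}")
            if len(ent.table_rows) < 2:  # header row plus at least one data row
                problems.append(f"{label}: empty data table")
    return problems


# Hypothetical record. A vague value such as "standard" for data processing
# would also pass, because adequacy of detail is a human judgment.
record = """\
^SAMPLE = GSM_EXAMPLE_1
!Sample_title = Liver, control, replicate 1
!Sample_source_name = liver
!Sample_organism = Mus musculus
!Sample_data_processing = RMA normalization
!Sample_table_begin
ID_REF\tVALUE
probe_1\t7.42
!Sample_table_end
"""
print(syntactic_problems(parse_soft_like(record)))  # → []
```

The checker deliberately returns a list of problems rather than stopping at the first one. Returning the full list mirrors the curator workflow described above, where the submitter is told about every gap at once and can fix them together.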
The GEO database has had mechanisms for anonymous reviewer access to prepublication data since 2003. Over the last several years, authors have occasionally requested curator comment regarding the level of MIAME compliance of their submissions, and we have been happy to offer feedback on areas that could be improved. GEO staff are similarly available to support reviewers and editors by providing tailored inspections of the MIAME compliance of specific submissions at the request of the journal, as ArrayExpress is proposing to do [6]. If a reviewer determines that insufficient information has been supplied, the GEO database is designed such that authors can quickly respond by updating their records accordingly.
It has been challenging to find the optimal balance between submitter effort and the appropriate level of metadata detail to request, all within a rapidly evolving technological and social environment [7]. However, the relative simplicity of the GEO database structure, together with common-sense curation policies that focus on gathering germane MIAME elements, has made it possible for us to develop an extensive suite of utilities that make the volumes of complex data archived at GEO accessible and easy for the research community at large to use [8]. Ultimately, the value of a database is reflected by how it is used by the community it serves. In the past month, GEO received approximately one million query hits and over 200,000 file transfer downloads amounting to over 2.5 terabytes of compressed data. Furthermore, researchers are clearly applying these data to their own work, as evidenced by over 100 recent publications citing data found in GEO to support or otherwise complement their studies [9]. We view this as a testament that the effort involved in making expression data public via GEO is fully justified.