TL;DR: This paper describes the evolution and development of Darwin Core, a data standard for publishing and integrating biodiversity information, focusing on the categories of terms that define the standard, the differences between simple and relational Darwin Core, how the standard has been implemented, and the community processes that are essential for its maintenance and growth.
Abstract: Biodiversity data derive from myriad sources stored in various formats on many distinct hardware and software platforms. An essential step towards understanding global patterns of biodiversity is to provide a standardized view of these heterogeneous data sources to improve interoperability. Fundamental to this advance are definitions of common terms. This paper describes the evolution and development of Darwin Core, a data standard for publishing and integrating biodiversity information. We focus on the categories of terms that define the standard, differences between simple and relational Darwin Core, how the standard has been implemented, and the community processes that are essential for maintenance and growth of the standard. We present case-study extensions of the Darwin Core into new research communities, including metagenomics and genetic resources. We close by showing how Darwin Core records are integrated to create new knowledge products documenting species distributions and changes due to environmental perturbations.
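Simple Darwin Core is a flat table with one row per record and one column per term. The sketch below reads such a table. The column names (occurrenceID, basisOfRecord, scientificName, eventDate, decimalLatitude, decimalLongitude, countryCode) are Darwin Core terms. Several parts are illustrative assumptions and are not defined by the standard: the two made-up records, the hypothetical `read_simple_dwc` helper, and the choice of which terms to require.

```python
import csv
import io

# Made-up Simple Darwin Core table: one row per occurrence record,
# one column per Darwin Core term (term names are real; values are illustrative).
SIMPLE_DWC_CSV = """occurrenceID,basisOfRecord,scientificName,eventDate,decimalLatitude,decimalLongitude,countryCode
urn:example:occ:1,HumanObservation,Puma concolor,2012-05-14,-22.95,-43.21,BR
urn:example:occ:2,PreservedSpecimen,Quercus robur,1998-09-02,51.50,-0.12,GB
"""

# Terms this sketch insists on; an assumption for illustration, not a rule of the standard.
REQUIRED_TERMS = ("occurrenceID", "basisOfRecord", "scientificName")


def read_simple_dwc(text):
    """Parse a flat Darwin Core CSV into dicts, converting coordinate terms to floats."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [t for t in REQUIRED_TERMS if t not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing Darwin Core terms: {missing}")
    records = []
    for row in reader:
        for term in ("decimalLatitude", "decimalLongitude"):
            if row.get(term):
                row[term] = float(row[term])  # decimal degrees as numbers
        records.append(row)
    return records


records = read_simple_dwc(SIMPLE_DWC_CSV)
for r in records:
    print(r["scientificName"], r["decimalLatitude"], r["decimalLongitude"])
```

Relational Darwin Core would instead split such data across linked tables, for example a core table plus extension rows keyed on an identifier. This flat reader does not cover that case.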
TL;DR: In biomedical journals, Standard Error of the Mean (SEM) and Standard Deviation (SD) are used interchangeably to express variability, though they measure different quantities.
Abstract: Statistics plays a vital role in biomedical research. It helps present data precisely and draw meaningful conclusions. When presenting data, one should take care to use appropriate statistical measures. In biomedical journals, Standard Error of the Mean (SEM) and Standard Deviation (SD) are used interchangeably to express variability, though they measure different quantities. SEM quantifies the uncertainty in the estimate of the mean, whereas SD indicates the dispersion of the data around the mean. As readers are generally interested in the variability within the sample, descriptive data should be summarized with SD. Use of SEM should be limited to computing confidence intervals (CIs), which measure the precision of the population estimate. Journals can avoid such errors by requiring authors to adhere to their guidelines.
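The standard relation SEM = SD / √n makes the distinction concrete. SD describes how spread out the observations are. SEM describes how precisely the mean is estimated, so it shrinks as n grows. The sketch below uses Python's `statistics` module. The `describe` helper is hypothetical. Its interval uses a normal (z) approximation, which is narrower than the t-based interval that small samples call for.

```python
import math
import statistics


def describe(sample, confidence=0.95):
    """Return n, mean, SD, SEM and a normal-approximation CI for the mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)    # sample SD (n - 1 denominator): spread of the data
    sem = sd / math.sqrt(n)          # SEM: uncertainty of the sample mean
    # z critical value (about 1.96 for 95%); a t quantile would be more accurate for small n.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return {"n": n, "mean": mean, "sd": sd, "sem": sem,
            "ci": (mean - z * sem, mean + z * sem)}


small = describe([4, 5, 6, 7, 8])
large = describe([4, 5, 6, 7, 8] * 4)  # same values, four times as many observations
print(small["sd"], small["sem"])
print(large["sd"], large["sem"])
```

Here SD stays close to 1.5 in both cases (about 1.58 and 1.45). SEM falls from about 0.71 to about 0.32. This is why reporting SEM as if it described variability makes the data look less scattered than they are.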
TL;DR: The Data Convergence Language (DCLE) is a data conversion system and method that converts data between different software and hardware platforms and allows multiple database conversions to be created easily and efficiently.
Abstract: A data conversion system and method which converts data between different software and hardware platforms. The DCLE of the present invention converts data from any number of different types or formats from any of various platforms to a single common data standard having a pre-defined generic data type, and the data is then converted from this generic type to a new desired format or type and stored on an existing or new destination platform. Thus, the system and method of the present invention allows for multiple database conversions to be created easily and efficiently. The data conversion process begins by first defining a complete data map of the input and output data environments, as well as zero or more intermediate environments. Data objects referred to as data bridges and streams are created to logically connect or associate the input and output environments as well as the tables in the input and output data environments. In response to user input, the data conversion system and method creates an association between fields or parts in the tables (units) in the input environment and the fields in the output environment. This essentially involves creating user specified mappings between fields in the input data environment and fields in the output data environment. When an execute command is received, the data conversion system and method accesses data from the first input environment, i.e., accesses data from the storage medium storing the data to be converted, and converts the data from the first input data environment to data having a pre-defined generic data type. Converting the data first to a pre-defined generic data type greatly simplifies the conversion process, since conversion code is only required to and from the generic data type and is not required between every possible data format. Thus, the development of conversion code is much simpler and more efficient. 
Once the data have been converted to the generic data object, the associations are executed to convert the data from the pre-defined generic data type into the output data in the second data format.
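The saving from routing every conversion through one generic type is easy to quantify. With N formats, direct pairwise conversion needs N(N − 1) one-way converters. A hub needs only 2N: one reader into the generic type and one writer out of it per format. For 10 formats, that is 90 converters versus 20. The sketch below illustrates this hub-and-spoke idea. Several elements are assumptions chosen for illustration and do not describe the patent's actual design:

- the list-of-dicts "generic type";
- the CSV and JSON formats;
- the `convert` function;
- the `field_map` stand-in for user-specified field mappings.

```python
import csv
import io
import json

# Generic intermediate type (assumption): a list of records, each a dict of field -> string.


def _read_csv(text):
    return list(csv.DictReader(io.StringIO(text)))


def _read_json(text):
    return [{k: str(v) for k, v in rec.items()} for rec in json.loads(text)]


def _write_csv(records):
    if not records:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def _write_json(records):
    return json.dumps(records)


# One reader and one writer per format: 2N converters instead of N(N - 1).
READERS = {"csv": _read_csv, "json": _read_json}
WRITERS = {"csv": _write_csv, "json": _write_json}


def convert(text, src, dst, field_map=None):
    """Convert src -> generic -> dst, optionally renaming fields on the way."""
    records = READERS[src](text)  # source format -> generic type
    if field_map:  # hypothetical user-specified input-to-output field mapping
        records = [{field_map.get(k, k): v for k, v in rec.items()} for rec in records]
    return WRITERS[dst](records)  # generic type -> destination format


print(convert('[{"id": 1, "name": "a"}]', "json", "csv", {"name": "label"}))
```

Adding a third format would require only one new reader and one new writer. Neither existing converter would need to change.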
TL;DR: A standardized vocabulary is proposed that can be used for storing and sharing ecological trait data and may ease data integration and use of trait data for a broader ecological research community and enable global syntheses across a wide range of taxa and ecosystems.
Abstract: Trait-based approaches are widespread throughout ecological research as they offer great potential to achieve a general understanding of a wide range of ecological and evolutionary mechanisms. Accordingly, a wealth of trait data is available for many organism groups, but this data is underexploited due to a lack of standardization and heterogeneity in data formats and definitions. We review current initiatives and structures developed for standardizing trait data and discuss the importance of standardization for trait data hosted in distributed open-access repositories. In order to facilitate the standardization and harmonization of distributed trait datasets by data providers and data users, we propose a standardized vocabulary that can be used for storing and sharing ecological trait data. We discuss potential incentives and challenges for the wide adoption of such a standard by data providers. The use of a standard vocabulary allows for trait datasets from heterogeneous sources to be aggregated more easily into compilations and facilitates the creation of interfaces between software tools for trait-data handling and analysis. By aiding decentralized trait-data standardization, our vocabulary may ease data integration and use of trait data for a broader ecological research community and enable global syntheses across a wide range of taxa and ecosystems.
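A shared vocabulary eases aggregation because it lets differently formatted tables be combined. The minimal sketch below renames the columns of two differently formatted trait tables to a common set of terms so that they can be concatenated. The term names, synonym lists, trait values, and the `harmonize` helper are all hypothetical placeholders. They are not necessarily the terms of the vocabulary proposed in the paper. A real vocabulary would also give each term a definition and a persistent identifier.

```python
# Hypothetical standard terms with accepted column-name synonyms (illustrative only).
STANDARD_TERMS = {
    "scientificName": {"species", "taxon", "scientific_name"},
    "traitName": {"trait", "trait_name"},
    "traitValue": {"value", "measurement"},
    "traitUnit": {"unit", "units"},
}

# Case-insensitive lookup from every accepted column name to its standard term.
LOOKUP = {name.lower(): term
          for term, synonyms in STANDARD_TERMS.items()
          for name in synonyms | {term}}


def harmonize(rows):
    """Rename each row's columns to standard terms; fail loudly on unmapped columns."""
    out = []
    for row in rows:
        std = {}
        for column, value in row.items():
            term = LOOKUP.get(column.strip().lower())
            if term is None:
                raise KeyError(f"column {column!r} has no mapping to the vocabulary")
            std[term] = value
        out.append(std)
    return out


# Two made-up datasets that describe the same kind of data with different headers.
dataset_a = [{"Species": "Bombus terrestris", "Trait": "body length", "Value": "15", "Unit": "mm"}]
dataset_b = [{"taxon": "Apis mellifera", "trait_name": "body length", "measurement": "12", "units": "mm"}]
combined = harmonize(dataset_a) + harmonize(dataset_b)
print(combined)
```

Raising on unmapped columns is a deliberate choice in this sketch. Silently dropping unknown columns would hide exactly the heterogeneity that a standard is meant to expose.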
TL;DR: FCS 3.1 is a minor revision based on suggested improvements from the community that allows files created by one type of acquisition hardware and software to be analyzed by any other type.
Abstract: The flow cytometry data file standard provides the specifications needed to completely describe flow cytometry data sets within the confines of the file containing the experimental data. In 1984, the first Flow Cytometry Standard format for data files was adopted as FCS 1.0. This standard was modified in 1990 as FCS 2.0 and again in 1997 as FCS 3.0. We report here on the next generation flow cytometry standard data file format. FCS 3.1 is a minor revision based on suggested improvements from the community. The unchanged goal of the standard is to provide a uniform file format that allows files created by one type of acquisition hardware and software to be analyzed by any other type. The FCS 3.1 standard retains the basic FCS file structure and most features of previous versions of the standard. Changes included in FCS 3.1 address potential ambiguities in the previous versions and provide a more robust standard. The major changes include simplified support for international characters and improved support for storing compensation. The major additions are support for preferred display scale, a standardized way of capturing the sample volume, information about originality of the data file, and support for plate and well identification in high throughput, plate based experiments. Please see the normative version of the FCS 3.1 specification in Supporting Information for this manuscript (or at http://www.isac-net.org/ in the Current standards section) for a complete list of changes.
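The fixed HEADER segment at the start of an FCS file can be read in a few lines. Bytes 0 to 5 hold the version identifier (e.g. FCS3.1) and bytes 6 to 9 are spaces. Six 8-byte ASCII fields then give the start and end byte offsets of the TEXT, DATA, and ANALYSIS segments. The TEXT segment is a series of keyword/value pairs separated by a delimiter character, which is the segment's first byte. A doubled delimiter stands for a literal one.

The sketch below parses only these two parts. Decoding TEXT as UTF-8 is an assumption based on FCS 3.1's international-character support. The helper names and the tiny demo file are hypothetical, and the demo omits most required keywords. For everything else, consult the normative specification, including the DATA segment, supplemental TEXT, and the CRC.

```python
def parse_text_segment(segment: str) -> dict:
    """Split an FCS TEXT segment into a keyword -> value dict."""
    delim = segment[0]  # the first character defines the delimiter
    tokens, current, i = [], [], 1
    while i < len(segment):
        ch = segment[i]
        if ch == delim:
            if i + 1 < len(segment) and segment[i + 1] == delim:
                current.append(delim)  # doubled delimiter = literal delimiter character
                i += 2
                continue
            tokens.append("".join(current))  # a single delimiter ends a keyword or value
            current = []
        else:
            current.append(ch)
        i += 1
    # Tokens alternate keyword, value; keywords are case-insensitive, so normalize them.
    return {tokens[k].upper(): tokens[k + 1] for k in range(0, len(tokens) - 1, 2)}


def read_fcs_metadata(data: bytes) -> dict:
    """Read the HEADER offsets, then parse the TEXT segment they point to."""
    version = data[0:6].decode("ascii")
    fields = [data[10 + 8 * i:18 + 8 * i].decode("ascii").strip() for i in range(6)]
    text_start, text_end = int(fields[0]), int(fields[1])  # end offset is inclusive
    text = data[text_start:text_end + 1].decode("utf-8")
    return {"version": version, **parse_text_segment(text)}


def build_demo_fcs() -> bytes:
    """Hypothetical, incomplete file: HEADER plus a short TEXT segment, no DATA."""
    text = "/$PAR/2/$TOT/0/$DATATYPE/F/$BYTEORD/1,2,3,4/$P1N/FSC-A/$P2N/SSC-A/$P2S/CD3//CD4/"
    text_bytes = text.encode("ascii")
    start = 58  # the HEADER is 6 + 4 + 6 * 8 = 58 bytes long
    end = start + len(text_bytes) - 1
    offsets = b"".join(f"{v:>8}".encode("ascii") for v in (start, end, 0, 0, 0, 0))
    return b"FCS3.1" + b" " * 4 + offsets + text_bytes


meta = read_fcs_metadata(build_demo_fcs())
print(meta["version"], meta["$PAR"], meta["$P2S"])
```

The `$P2S` value shows the escaping rule: `CD3//CD4` in the file is read back as `CD3/CD4`.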