Top 138 GigaScience papers published in 2017

Journal Article•10.1093/GIGASCIENCE/GIW015•

Estimating the total number of phosphoproteins and phosphorylation sites in eukaryotic proteomes.

Panayotis Vlastaridis¹, Pelagia Kyriakidou¹, Anargyros Chaliotis¹, Yves Van de Peer²,³, Stephen G. Oliver⁴, Grigoris D. Amoutzias¹•Institutions (4)

University of Thessaly¹, Ghent University², University of Pretoria³, University of Cambridge⁴

01 Feb 2017-GigaScience

TL;DR: GDA acknowledges financial support from the “ARISTEIA II” Action, which is co-funded by the European Social Fund and National Resources (code 4288 to GDA).

Abstract: GDA acknowledges financial support from the “ARISTEIA II” Action of the “OPERATIONAL PROGRAMME EDUCATION AND LIFELONG LEARNING” that is co-funded by the European Social Fund and National Resources (code 4288 to GDA). GDA acknowledges additional support by research grants from the Postgraduate Programme ‘Toxicology’ of the Dept. of Biochemistry and Biotechnology, School of Health Sciences, University of Thessaly, Greece. YVdP acknowledges the Multidisciplinary Research Partnership “Bioinformatics: from nucleotides to networks” Project (no. 01MR0310W) of Ghent University. SGO acknowledges the University of Cambridge for granting him Sabbatical Leave to permit him to work with GDA in the University of Thessaly, Greece.

745 citations

Journal Article•10.1093/GIGASCIENCE/GIX018•

Genome sequencing of the sweetpotato whitefly Bemisia tabaci MED/Q

Wen Xie, Chunhai Chen, Zezhong Yang, Litao Guo, Xin Yang, Dan Wang, Ming Chen, Jinqun Huang, Yanan Wen, Yang Zeng, Yating Liu, Jixing Xia, Lixia Tian, Hongying Cui, Qingjun Wu, Shaoli Wang, Baoyun Xu, Xianchun Li¹, Xinqiu Tan, Murad Ghanim, Bao-Li Qiu², Huipeng Pan², Dong Chu³, Helene Delatte, Midatharahally N. Maruthi⁴, Feng Ge⁵, Xueping Zhou, Xiao-Wei Wang⁶, Fang-Hao Wan, Yuzhou Du⁷, Chen Luo, Fengming Yan⁸, Evan L. Preisser⁹, Xiaoguo Jiao¹⁰, Brad S. Coates¹¹, Jinyang Zhao, Qiang Gao, Jinquan Xia, Ye Yin, Yong Liu, Judith K. Brown¹, Xuguo “Joe” Zhou¹², Youjun Zhang•Institutions (12)

University of Arizona¹, South China Agricultural University², Qingdao Agricultural University³, University of Greenwich⁴, Chinese Academy of Sciences⁵, Institute of Insect Sciences, Zhejiang University⁶, Yangzhou University⁷, Henan Agricultural University⁸, University of Rhode Island⁹, Hubei University¹⁰, United States Department of Agriculture¹¹, University of Kentucky¹²

01 May 2017-GigaScience

TL;DR: These MED/Q genomic resources lay a foundation for future ‘pan-genomic’ comparisons of invasive vs. noninvasive, invasive vs. …

Abstract: National Natural Science Foundation of China [31420103919, 31672032]; Chinese Academy of Agricultural Sciences (CAAS-ASTIP-IVFCAAS); the China Agriculture Research System [CARS-26-10]; Beijing Training Project for the Leading Talents in S&T [LJRC201412]; Beijing Key Laboratory for Pest Control and Sustainable Cultivation of Vegetables; Beijing Nova Program [Z171100001117039]

502 citations

Journal Article•10.1093/GIGASCIENCE/GIX083•

Deep Machine Learning provides state-of-the-art performance in image-based plant phenotyping

Michael P. Pound¹, Jonathan A. Atkinson¹, Alexandra J. Burgess¹, Michael Wilson², Marcus Griffiths¹, Aaron S. Jackson¹, Adrian Bulat¹, Georgios Tzimiropoulos¹, Darren M. Wells¹, Erik H. Murchie¹, Tony P. Pridmore¹, Andrew P. French¹•Institutions (2)

University of Nottingham¹, University of Leeds²

01 Oct 2017-GigaScience

TL;DR: Deep learning–based phenotyping is shown to have very good detection and localization accuracy in validation and testing image sets and to derive meaningful biological traits, which in turn can be used in quantitative trait loci discovery pipelines.

Abstract: Deep learning is an emerging field that promises unparalleled results on many data analysis problems. We show the success offered by such techniques when applied to the challenging problem of image-based plant phenotyping, and demonstrate state-of-the-art results for root and shoot feature identification and localisation. We predict a paradigm shift in image-based phenotyping thanks to deep learning approaches.

356 citations

Journal Article•10.1093/GIGASCIENCE/GIX049•

Comparative performance of the BGISEQ-500 vs Illumina HiSeq2500 sequencing platforms for palaeogenomic sequencing

Sarah Siu Tze Mak¹, Shyam Gopalakrishnan¹, Christian Carøe¹,², Chunyu Geng, Shanlin Liu¹, Mikkel-Holger S. Sinding¹,³,⁴, Lukas F. K. Kuderna⁵, Wenwei Zhang, Fu Shujin, Filipe G. Vieira¹, Mietje Germonpré⁶, Hervé Bocherens⁷, Sergey Fedorov⁸, Bent O. Petersen², Thomas Sicheritz-Pontén², Tomas Marques-Bonet⁵,⁹, Guojie Zhang¹, Hui Jiang, M. Thomas P. Gilbert¹,¹⁰,¹¹•Institutions (11)

University of Copenhagen¹, Technical University of Denmark², University of Greenland³, American Museum of Natural History⁴, Spanish National Research Council⁵, Royal Belgian Institute of Natural Sciences⁶, University of Tübingen⁷, North-Eastern Federal University⁸, Catalan Institution for Research and Advanced Studies⁹, Norwegian University of Science and Technology¹⁰, Curtin University¹¹

01 Aug 2017-GigaScience

TL;DR: The observations suggest that the BGISEQ-500 holds the potential to represent a valid and potentially valuable alternative platform for palaeogenomic data generation that is worthy of future exploration by those interested in the sequencing and analysis of degraded DNA.

Abstract: Ancient DNA research has been revolutionized following development of next-generation sequencing platforms. Although a number of such platforms have been applied to ancient DNA samples, the Illumina series are the dominant choice today, mainly because of high production capacities and short read production. Recently a potentially attractive alternative platform for palaeogenomic data generation has been developed, the BGISEQ-500, whose sequence output is comparable with that of the Illumina series. In this study, we modified the standard BGISEQ-500 library preparation specifically for use on degraded DNA, then directly compared the sequencing performance and data quality of the BGISEQ-500 to the Illumina HiSeq2500 platform on DNA extracted from 8 historic and ancient dog and wolf samples. The data generated were largely comparable between sequencing platforms, with no statistically significant difference observed for parameters including level (P = 0.371) and average sequence length (P = 0.718) of endogenous nuclear DNA, sequence GC content (P = 0.311), double-stranded DNA damage rate (P = 0.309), and sequence clonality (P = 0.093). Small significant differences were found in single-strand DNA damage rate (δS; slightly lower for the BGISEQ-500, P = 0.011) and the background rate of difference from the reference genome (θ; slightly higher for BGISEQ-500, P = 0.012). This may result from the differences in amplification cycles used to polymerase chain reaction-amplify the libraries. A significant difference was also observed in the mitochondrial DNA percentages recovered (P = 0.018), although we believe this is likely a stochastic effect relating to the extremely low levels of mitochondria that were sequenced from 3 of the samples with overall very low levels of endogenous DNA. Although we acknowledge that our analyses were limited to animal material, our observations suggest that the BGISEQ-500 holds the potential to represent a valid and potentially valuable alternative platform for palaeogenomic data generation that is worthy of future exploration by those interested in the sequencing and analysis of degraded DNA.

320 citations

Journal Article•10.1093/GIGASCIENCE/GIX024•

A reference human genome dataset of the BGISEQ-500 sequencer

Jie Huang, Xinming Liang, Yuankai Xuan¹, Chunyu Geng, Yuxiang Li, Haorong Lu, Shoufang Qu, Xianglin Mei¹, Hongbo Chen, Ting Yu, Nan Sun, Junhua Rao, Jiahao Wang, Wenwei Zhang, Ying Chen, Sha Liao, Hui Jiang, Xin Liu, Zhaopeng Yang, Feng Mu, Shangxian Gao•Institutions (1)

Food and Drug Administration¹

01 May 2017-GigaScience

TL;DR: The first human whole-genome sequencing dataset of BGISEQ-500, generated by sequencing the widely used cell line HG001, can serve as the reference dataset, providing basic information not just for future development, but also for all research and applications based on the new sequencing platform.

Abstract: Background: BGISEQ-500 is a new desktop sequencer developed by BGI. Using DNA nanoball and combinational probe anchor synthesis developed from Complete Genomics™ sequencing technologies, it generates short reads at a large scale. Here, we present the first human whole-genome sequencing dataset of BGISEQ-500. The dataset was generated by sequencing the widely used cell line HG001 (NA12878) in two sequencing runs of paired-end 50 bp (PE50) and two sequencing runs of paired-end 100 bp (PE100). We also include examples of the raw images from the sequencer for reference. Finally, we identified variations using this dataset, estimated the accuracy of the variations, and compared it to that of the variations identified from similar amounts of publicly available HiSeq2500 data. We found similar single nucleotide polymorphism (SNP) detection accuracy for the BGISEQ-500 PE100 data (false positive rate [FPR] = 0.00020%, sensitivity = 96.20%) compared to the PE150 HiSeq2500 data (FPR = 0.00017%, sensitivity = 96.60%), and better SNP detection accuracy than the PE50 data (FPR = 0.0006%, sensitivity = 94.15%). But for insertions and deletions (indels), we found lower accuracy for BGISEQ-500 data (FPR = 0.00069% and 0.00067% for PE100 and PE50 respectively, sensitivity = 88.52% and 70.93%) than the HiSeq2500 data (FPR = 0.00032%, sensitivity = 96.28%). Our dataset can serve as the reference dataset, providing basic information not just for future development, but also for all research and applications based on the new sequencing platform.

311 citations

Journal Article•10.1093/GIGASCIENCE/GIX034•

EEG datasets for motor imagery brain–computer interface

Hohyun Cho¹, Minkyu Ahn², Sangtae Ahn³, Moonyoung Kwon¹, Sung Chan Jun¹•Institutions (3)

Gwangju Institute of Science and Technology¹, Handong Global University², University of North Carolina at Chapel Hill³

01 Jul 2017-GigaScience

TL;DR: The authors' EEG datasets for MI BCI may provide researchers with opportunities to investigate human factors related to MI BCI performance variation, and may also achieve subject-to-subject transfer by using metadata, including a questionnaire, EEG coordinates, and EEGs for non-task-related states.

Abstract: Background: Most investigators of brain-computer interface (BCI) research believe that BCI can be achieved through induced neuronal activity from the cortex, but not by evoked neuronal activity. Motor imagery (MI)-based BCI is one of the standard concepts of BCI, in that the user can generate induced activity by imagining motor movements. However, variations in performance over sessions and subjects are too severe to overcome easily; therefore, a basic understanding and investigation of BCI performance variation is necessary to find critical evidence of performance variation. Here we present not only EEG datasets for MI BCI from 52 subjects, but also the results of a psychological and physiological questionnaire, EMG datasets, the locations of 3D EEG electrodes, and EEGs for non-task-related states. Findings: We validated our EEG datasets by using the percentage of bad trials, event-related desynchronization/synchronization (ERD/ERS) analysis, and classification analysis. After conventional rejection of bad trials, we showed contralateral ERD and ipsilateral ERS in the somatosensory area, which are well-known patterns of MI. Finally, we showed that 73.08% of datasets (38 subjects) included reasonably discriminative information. Conclusions: Our EEG datasets included the information necessary to determine statistical significance; they consisted of well-discriminated datasets (38 subjects) and less-discriminative datasets. These may provide researchers with opportunities to investigate human factors related to MI BCI performance variation, and may also achieve subject-to-subject transfer by using metadata, including a questionnaire, EEG coordinates, and EEGs for non-task-related states.

297 citations

Journal Article•10.1093/GIGASCIENCE/GIX089•

Taxonomic structure and functional association of foxtail millet root microbiome.

Tao Jin, Yayu Wang, Yueying Huang, Jin Xu¹, Pengfan Zhang², Nian Wang¹, Xin Liu, Haiyan Chu², Guang Liu, Honggang Jiang, Yuzhen Li, Jing Xu, Karsten Kristiansen³, Liang Xiao, Yunzeng Zhang¹, Gengyun Zhang, Guohua Du, Houbao Zhang, Hongfeng Zou, Zhang Haifeng, Zhuye Jie, Suisha Liang, Huijue Jia, Jingwang Wan, Dechun Lin, Jinying Li, Guangyi Fan, Huanming Yang, Jian Wang, Yang Bai², Xun Xu•Institutions (3)

University of Florida¹, Chinese Academy of Sciences², University of Copenhagen³

01 Oct 2017-GigaScience

TL;DR: The results demonstrate that host plants enrich specific bacteria and functions in the rhizoplane of foxtail millet, and the potentially beneficial bacteria identified may serve as a valuable knowledge foundation for bio-fertilizer development in agriculture.

Abstract: Root microbes play pivotal roles in plant productivity, nutrient uptake, and disease resistance. The root microbial community structure has been extensively investigated by 16S/18S/ITS amplicons and metagenomic sequencing in crops and model plants. However, the functional associations between root microbes and host plant growth are poorly understood. This work investigates the root bacterial community of foxtail millet (Setaria italica) and its potential effects on host plant productivity. We determined the bacterial composition of 2882 samples from foxtail millet rhizoplane, rhizosphere and corresponding bulk soils from 2 well-separated geographic locations by 16S rRNA gene amplicon sequencing. We identified 16 109 operational taxonomic units (OTUs), and defined 187 OTUs as shared rhizoplane core OTUs. The β-diversity analysis revealed that microhabitat was the major factor shaping the foxtail millet root bacterial community, followed by geographic location. Large-scale association analysis identified potentially beneficial bacteria correlated with high plant productivity. In addition, the functional prediction revealed specific pathways enriched in the foxtail millet rhizoplane bacterial community. We systematically described the root bacterial community structure of foxtail millet and found its core rhizoplane bacterial members. Our results demonstrated that host plants enrich specific bacteria and functions in the rhizoplane. The potentially beneficial bacteria may serve as a valuable knowledge foundation for bio-fertilizer development in agriculture.

292 citations

Journal Article•10.1093/GIGASCIENCE/GIX097•

The first near-complete assembly of the hexaploid bread wheat genome, Triticum aestivum

Aleksey V. Zimin¹,², Daniela Puiu¹, Richard Hall³, Sarah B. Kingan³, Bernardo J. Clavijo⁴, Steven L. Salzberg¹,⁵•Institutions (5)

Johns Hopkins University School of Medicine¹, University of Maryland, College Park², Pacific Biosciences³, Norwich Research Park⁴, Johns Hopkins University⁵

01 Nov 2017-GigaScience

TL;DR: This work reports the first near-complete assembly of T. aestivum, using deep sequencing coverage from a combination of short Illumina reads and very long Pacific Biosciences reads, providing a strong foundation for future genetic studies of this important food crop.

Abstract: Common bread wheat, Triticum aestivum, has one of the most complex genomes known to science, with 6 copies of each chromosome, enormous numbers of near-identical sequences scattered throughout, and an overall haploid size of more than 15 billion bases. Multiple past attempts to assemble the genome have produced assemblies that were well short of the estimated genome size. Here we report the first near-complete assembly of T. aestivum, using deep sequencing coverage from a combination of short Illumina reads and very long Pacific Biosciences reads. The final assembly contains 15 344 693 583 bases and has a weighted average (N50) contig size of 232 659 bases. This represents by far the most complete and contiguous assembly of the wheat genome to date, providing a strong foundation for future genetic studies of this important food crop. We also report how we used the recently published genome of Aegilops tauschii, the diploid ancestor of the wheat D genome, to identify 4 179 762 575 bp of T. aestivum that correspond to its D genome components.

292 citations
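
The wheat assembly abstract above quotes a weighted average (N50) contig size. As a rough illustration of what that statistic measures, the sketch below computes N50 for a list of contig lengths: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembly. The function name `n50` and the toy lengths are assumptions made for this example and are not taken from the paper or its pipeline.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths.

    N50 is the largest length L such that contigs of length >= L
    together cover at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("need at least one contig with positive length")

    running = 0
    for length in ordered:
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for a non-empty list")


if __name__ == "__main__":
    # Hypothetical toy assembly: 30 bases in six contigs.
    toy_contigs = [2, 3, 4, 5, 6, 10]
    print(n50(toy_contigs))  # 10 + 6 = 16 >= 15, so this prints 6
```

Because long contigs dominate the running total, N50 rewards contiguity rather than contig count, which is why it is commonly reported alongside the total assembly size.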

Journal Article•10.1093/GIGASCIENCE/GIX058•

Connections between the human gut microbiome and gestational diabetes mellitus.

Ya-Shu Kuang¹, Jinhua Lu¹, Shenghui Li¹, Junhua Li, Ming-Yang Yuan¹, Jian-Rong He¹, Nian-Nian Chen¹, Wan-Qing Xiao¹, Songying Shen¹, Lan Qiu¹, Ying-Fang Wu¹, Cui-Yue Hu¹, Yan-Yan Wu¹, Weidong Li¹, Qiao-Zhu Chen¹, Hong-Wen Deng¹,², Christopher J. Papasian³, Huimin Xia¹, Xiu Qiu¹•Institutions (3)

Guangzhou Medical University¹, Tulane University², University of Missouri–Kansas City³

01 Aug 2017-GigaScience

TL;DR: Novel relationships between the gut microbiome and GDM status are discovered and it is suggested that changes in microbial composition may potentially be used to identify individuals at risk for GDM.

Abstract: The human gut microbiome can modulate metabolic health and affect insulin resistance, and it may play an important role in the etiology of gestational diabetes mellitus (GDM). Here, we compared the gut microbial composition of 43 GDM patients and 81 healthy pregnant women via whole-metagenome shotgun sequencing of their fecal samples, collected at 21-29 weeks, to explore associations between GDM and the composition of microbial taxonomic units and functional genes. A metagenome-wide association study identified 154 837 genes, which clustered into 129 metagenome linkage groups (MLGs) for species description, with significant relative abundance differences between the 2 cohorts. Parabacteroides distasonis, Klebsiella variicola, etc., were enriched in GDM patients, whereas Methanobrevibacter smithii, Alistipes spp., Bifidobacterium spp., and Eubacterium spp. were enriched in controls. The ratios of the gross abundances of GDM-enriched MLGs to control-enriched MLGs were positively correlated with blood glucose levels. A random forest model shows that fecal MLGs have excellent discriminatory power to predict GDM status. Our study discovered novel relationships between the gut microbiome and GDM status and suggests that changes in microbial composition may potentially be used to identify individuals at risk for GDM.

267 citations

Journal Article•10.1093/GIGASCIENCE/GIX019•

The need to approximate the use-case in clinical machine learning.

Sohrab Saeb¹, Luca Lonini¹, Arun Jayaraman¹, David C. Mohr¹, Konrad P. Kording¹•Institutions (1)

Northwestern University¹

01 May 2017-GigaScience

TL;DR: It is found that record-wise CV often massively overestimates the prediction accuracy of the algorithms, and this overly optimistic method was used by almost half of the retrieved studies that used accelerometers, wearable sensors, or smartphones to predict clinical outcomes.

Abstract: The availability of smartphone and wearable sensor technology is leading to a rapid accumulation of human subject data, and machine learning is emerging as a technique to map those data into clinical predictions. As machine learning algorithms are increasingly used to support clinical decision making, it is vital to reliably quantify their prediction accuracy. Cross-validation (CV) is the standard approach where the accuracy of such algorithms is evaluated on part of the data the algorithm has not seen during training. However, for this procedure to be meaningful, the relationship between the training and the validation set should mimic the relationship between the training set and the dataset expected for the clinical use. Here we compared two popular CV methods: record-wise and subject-wise. While the subject-wise method mirrors the clinically relevant use-case scenario of diagnosis in newly recruited subjects, the record-wise strategy has no such interpretation. Using both a publicly available dataset and a simulation, we found that record-wise CV often massively overestimates the prediction accuracy of the algorithms. We also conducted a systematic review of the relevant literature, and found that this overly optimistic method was used by almost half of the retrieved studies that used accelerometers, wearable sensors, or smartphones to predict clinical outcomes. As we move towards an era of machine learning-based diagnosis and treatment, using proper methods to evaluate their accuracy is crucial, as inaccurate results can mislead both clinicians and data scientists.

256 citations
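
The distinction between record-wise and subject-wise cross-validation drawn in the abstract above can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' code: the function names (`record_wise_folds`, `subject_wise_folds`, `shared_subjects`) and the toy subject labels are assumptions made for this example. Record-wise splitting shuffles individual records, so the same subject can land in both the training and the test set; subject-wise splitting holds out whole subjects.

```python
import random
from typing import Hashable, List, Sequence, Set, Tuple

Fold = Tuple[List[int], List[int]]  # (train indices, test indices)


def record_wise_folds(n_records: int, k: int, seed: int = 0) -> List[Fold]:
    """Split shuffled record indices into k folds, ignoring subject identity."""
    idx = list(range(n_records))
    random.Random(seed).shuffle(idx)
    folds = []
    for i in range(k):
        test = sorted(idx[i::k])
        test_set = set(test)
        train = [j for j in range(n_records) if j not in test_set]
        folds.append((train, test))
    return folds


def subject_wise_folds(subjects: Sequence[Hashable], k: int, seed: int = 0) -> List[Fold]:
    """Split so that all records of a subject fall in the same fold."""
    unique = sorted(set(subjects), key=str)
    random.Random(seed).shuffle(unique)
    folds = []
    for i in range(k):
        held_out = set(unique[i::k])
        test = [j for j, s in enumerate(subjects) if s in held_out]
        train = [j for j, s in enumerate(subjects) if s not in held_out]
        folds.append((train, test))
    return folds


def shared_subjects(subjects: Sequence[Hashable], fold: Fold) -> Set[Hashable]:
    """Subjects that appear in both the training and the test part of a fold."""
    train, test = fold
    return {subjects[j] for j in train} & {subjects[j] for j in test}


if __name__ == "__main__":
    # Hypothetical toy data: 3 subjects with 5 records each.
    subjects = [s for s in ["A", "B", "C"] for _ in range(5)]
    for name, folds in [
        ("record-wise", record_wise_folds(len(subjects), k=2)),
        ("subject-wise", subject_wise_folds(subjects, k=2)),
    ]:
        leaks = [len(shared_subjects(subjects, f)) for f in folds]
        print(name, "subjects shared between train and test per fold:", leaks)
```

In the subject-wise folds no subject is ever shared between training and test data, which mirrors the use case of diagnosing newly recruited subjects. In the record-wise folds a model can partly learn to recognise individuals, which is one way the overestimated accuracy described in the abstract can arise.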

Journal Article•10.1093/GIGASCIENCE/GIX027•

Laboratory x-ray micro-computed tomography: a user guideline for biological samples.

Anton du Plessis¹, Chris Broeckhoven¹, Anina Guelpa¹, Stephan G. le Roux¹•Institutions (1)

Stellenbosch University¹

01 Jun 2017-GigaScience

TL;DR: This paper provides an easily operated “how to” guide for new potential users and describes the various steps required for successful planning of research projects that involve micro-CT, a fast-growing method in scientific research applications that allows for non-destructive imaging of morphological structures.

Abstract: Laboratory x-ray micro-computed tomography (micro-CT) is a fast-growing method in scientific research applications that allows for non-destructive imaging of morphological structures. This paper provides an easily operated "how to" guide for new potential users and describes the various steps required for successful planning of research projects that involve micro-CT. Background information on micro-CT is provided, followed by relevant setup, scanning, reconstructing, and visualization methods and considerations. Throughout the guide, a Jackson's chameleon specimen, which was scanned at different settings, is used as an interactive example. The ultimate aim of this paper is to make new users familiar with the concepts and applications of micro-CT in an attempt to promote its use in future scientific studies.

Journal Article•10.1093/GIGASCIENCE/GIX077•

The sponge microbiome project

Lucas Moitinho-Silva¹, Shaun Nielsen¹, Amnon Amir², Antonio Gonzalez², Gail Ackermann², Carlo Cerrano, Carmen Astudillo-García³, Cole G. Easson⁴, Detmer Sipkema⁵, Fang Liu⁶, Georg Steinert⁵, Giorgos Kotoulas, Grace P. McCormack⁷, Guofang Feng⁶, James J. Bell⁸, Jan Vicente, Johannes R. Björk⁹, José M. Montoya¹⁰, Julie B. Olson¹¹, Julie Reveillaud¹², Laura Steindler¹³, Mari Carmen Pineda¹⁴, Maria V. Marra⁷, Micha Ilan¹⁵, Michael W. Taylor³, Paraskevi N. Polymenakou, Patrick M. Erwin¹⁶, Peter J. Schupp¹⁷, Rachel L. Simister¹⁸, Rob Knight², Robert W. Thacker¹⁹, Rodrigo Costa²⁰, Russell T. Hill²¹, Susanna López-Legentil¹⁶, Thanos Dailianis, Timothy Ravasi²², Ute Hentschel²³, Zhiyong Li⁶, Nicole S. Webster¹⁴,²⁴, Torsten Thomas¹•Institutions (24)

01 Oct 2017-GigaScience

TL;DR: This dataset represents a comprehensive resource of sponge-associated microbial communities based on 16S rRNA gene sequences that can be used to address overarching hypotheses regarding host-associated prokaryotes, including host specificity, convergent evolution, environmental drivers of microbiome structure, and the sponge-associated rare biosphere.

Abstract: Marine sponges (phylum Porifera) are a diverse, phylogenetically deep-branching clade known for forming intimate partnerships with complex communities of microorganisms. To date, 16S rRNA gene sequencing studies have largely utilised different extraction and amplification methodologies to target the microbial communities of a limited number of sponge species, severely limiting comparative analyses of sponge microbial diversity and structure. Here, we provide an extensive and standardised dataset that will facilitate sponge microbiome comparisons across large spatial, temporal, and environmental scales. Samples from marine sponges (n = 3569 specimens), seawater (n = 370), marine sediments (n = 65) and other environments (n = 29) were collected from different locations across the globe. This dataset incorporates at least 268 different sponge species, including several yet unidentified taxa. The V4 region of the 16S rRNA gene was amplified and sequenced from extracted DNA using standardised procedures. Raw sequences (total of 1.1 billion sequences) were processed and clustered with (i) a standard protocol using QIIME closed-reference picking resulting in 39 543 operational taxonomic units (OTU) at 97% sequence identity, (ii) a de novo clustering using Mothur resulting in 518 246 OTUs, and (iii) a new high-resolution Deblur protocol resulting in 83 908 unique bacterial sequences. Abundance tables, representative sequences, taxonomic classifications, and metadata are provided. This dataset represents a comprehensive resource of sponge-associated microbial communities based on 16S rRNA gene sequences that can be used to address overarching hypotheses regarding host-associated prokaryotes, including host specificity, convergent evolution, environmental drivers of microbiome structure, and the sponge-associated rare biosphere.

Journal Article•10.1093/GIGASCIENCE/GIX087•

Lipidomics profiling reveals the role of glycerophospholipid metabolism in psoriasis.

Chunwei Zeng, Bo Wen, Guixue Hou, Li Lei¹, Zhanlong Mei, Xuekun Jia¹, Xiaomin Chen, Wu Zhu¹, Jie Li¹, Yehong Kuang¹, Weiqi Zeng¹, Juan Su¹, Siqi Liu, Cong Peng¹, Xiang Chen¹•Institutions (1)

Central South University¹

01 Oct 2017-GigaScience

TL;DR: It was found that elements of glycerophospholipid metabolism were significantly altered in the plasma of psoriatic patients and provides novel insight into the role of lipids in psoriasis.

Abstract: Psoriasis is a common and chronic inflammatory skin disease that is complicated by gene-environment interactions. Although genomic, transcriptomic, and proteomic analyses have been performed to investigate the pathogenesis of psoriasis, the role of metabolites in psoriasis, particularly of lipids, remains unclear. Lipids not only comprise the bulk of the cellular membrane bilayers but also regulate a variety of biological processes such as cell proliferation, apoptosis, immunity, angiogenesis, and inflammation. In this study, an untargeted lipidomics approach was used to study the lipid profiles in psoriasis and to identify lipid metabolite signatures for psoriasis through ultra-performance liquid chromatography-tandem quadrupole mass spectrometry. Plasma samples from 90 participants (45 healthy and 45 psoriasis patients) were collected and analyzed. Statistical analysis was applied to find different metabolites between the disease and healthy groups. In addition, enzyme-linked immunosorbent assay was performed to validate differentially expressed lipids in psoriatic patient plasma. Finally, we identified differential expression of several lipids including lysophosphatidic acid (LPA), lysophosphatidylcholine (LysoPC), phosphatidylinositol (PI), phosphatidylcholine (PC), and phosphatidic acid (PA); among these metabolites, LPA, LysoPC, and PA were significantly increased, while PC and PI were down-regulated in psoriasis patients. We found that elements of glycerophospholipid metabolism such as LPA, LysoPC, PA, PI, and PC were significantly altered in the plasma of psoriatic patients; this study characterizes the circulating lipids in psoriatic patients and provides novel insight into the role of lipids in psoriasis.

Journal Article•10.1093/GIGASCIENCE/GIX085•

De novo PacBio long-read and phased avian genome assemblies correct and add to reference genes generated with intermediate and short reads.

Jonas Korlach¹, Gregory Gedman², Sarah B. Kingan¹, Chen-Shan Chin¹, Jason T. Howard², Jean-Nicolas Audet²,³, Lindsey J. Cantin², Erich D. Jarvis²,⁴•Institutions (4)

Pacific Biosciences¹, Rockefeller University², McGill University³, Howard Hughes Medical Institute⁴

01 Oct 2017-GigaScience

TL;DR: The impact of long reads, sequencing of previously difficult-to-sequence regions, and phasing of haplotypes on generating the high-quality assemblies necessary for understanding gene structure, function, and evolution is demonstrated.

Abstract: Reference-quality genomes are expected to provide a resource for studying gene structure, function, and evolution. However, often genes of interest are not completely or accurately assembled, leading to unknown errors in analyses or additional cloning efforts for the correct sequences. A promising solution is long-read sequencing. Here we tested PacBio-based long-read sequencing and diploid assembly for potential improvements to the Sanger-based intermediate-read zebra finch reference and Illumina-based short-read Anna's hummingbird reference, 2 vocal learning avian species widely studied in neuroscience and genomics. With DNA of the same individuals used to generate the reference genomes, we generated diploid assemblies with the FALCON-Unzip assembler, resulting in contigs with no gaps in the megabase range, representing 150-fold and 200-fold improvements over the current zebra finch and hummingbird references, respectively. These long-read and phased assemblies corrected and resolved what we discovered to be numerous misassemblies in the references, including missing sequences in gaps, erroneous sequences flanking gaps, base call errors in difficult-to-sequence regions, complex repeat structure errors, and allelic differences between the 2 haplotypes. These improvements were validated by single long-genome and transcriptome reads and resulted for the first time in completely resolved protein-coding genes widely studied in neuroscience and specialized in vocal learning species. These findings demonstrate the impact of long reads, sequencing of previously difficult-to-sequence regions, and phasing of haplotypes on generating the high-quality assemblies necessary for understanding gene structure, function, and evolution.

Journal Article•10.1093/GIGASCIENCE/GIX093•

Panax ginseng genome examination for ginsenoside biosynthesis.

Jiang Xu¹, Yang Chu¹, Baosheng Liao¹, Shuiming Xiao¹, Qinggang Yin¹, Rui Bai¹, He Su¹, Linlin Dong¹, Xiwen Li¹, Jun Qian¹, Jingjing Zhang¹, Yujun Zhang¹, Xiaoyan Zhang¹, Mingli Wu¹, Jie Zhang¹, Guozheng Li¹, Lei Zhang¹, Zhenzhan Chang², Yuebin Zhang³, Zhengwei Jia⁴, Zhixiang Liu¹, Daniel Afreh, Ruth Nahurira, Lianjuan Zhang¹, Ruiyang Cheng¹, Yingjie Zhu¹, Guangwei Zhu¹, Wei Rao⁴, Chao Zhou⁴, Lirui Qiao⁴, Zhihai Huang, Yung-Chi Cheng⁵, Shilin Chen¹•Institutions (5)

Peking Union Medical College¹, Peking University², Dalian Institute of Chemical Physics³, Waters Corporation⁴, Yale University⁵

01 Nov 2017-GigaScience

TL;DR: The ginseng genome represents a valuable resource for understanding and improving the breeding, cultivation, and synthesis biology of this key herb.

Abstract: Ginseng, which contains ginsenosides as bioactive compounds, has been regarded as an important traditional medicine for several millennia. However, the genetic background of ginseng remains poorly understood, partly because of the plant's large and complex genome composition. We report the entire genome sequence of Panax ginseng using next-generation sequencing. The 3.5-Gb nucleotide sequence contains more than 60% repeats and encodes 42 006 predicted genes. Twenty-two transcriptome datasets and mass spectrometry images of ginseng roots were adopted to precisely quantify the functional genes. Thirty-one genes were identified to be involved in the mevalonic acid pathway. Eight of these genes were annotated as 3-hydroxy-3-methylglutaryl-CoA reductases, which displayed diverse structures and expression characteristics. A total of 225 UDP-glycosyltransferases (UGTs) were identified, and these UGTs accounted for one of the largest gene families of ginseng. Tandem repeats contributed to the duplication and divergence of UGTs. Molecular modeling of UGTs in the 71st, 74th, and 94th families revealed a regiospecific conserved motif located at the N-terminus. Molecular docking predicted that this motif captures ginsenoside precursors. The ginseng genome represents a valuable resource for understanding and improving the breeding, cultivation, and synthesis biology of this key herb.

Journal Article•10.1093/GIGASCIENCE/GIX004•

Multilayer modeling and analysis of human brain networks.

Manlio De Domenico

01 May 2017-GigaScience

TL;DR: This work reviews the main achievements obtained from interdisciplinary research based on magnetic resonance imaging and establishes, de facto, the birth of multilayer network analysis and modeling of the human brain.

Abstract: Understanding how the human brain is structured, and how its architecture is related to function, is of paramount importance for a variety of applications, including but not limited to new ways to prevent, deal with, and cure brain diseases, such as Alzheimer's or Parkinson's, and psychiatric disorders, such as schizophrenia. The recent advances in structural and functional neuroimaging, together with the increasing attitude toward interdisciplinary approaches involving computer science, mathematics, and physics, are fostering interesting results from computational neuroscience that are quite often based on the analysis of complex network representation of the human brain. In recent years, this representation experienced a theoretical and computational revolution that is breaching neuroscience, allowing us to cope with the increasing complexity of the human brain across multiple scales and in multiple dimensions and to model structural and functional connectivity from new perspectives, often combined with each other. In this work, we will review the main achievements obtained from interdisciplinary research based on magnetic resonance imaging and establish de facto, the birth of multilayer network analysis and modeling of the human brain.

Journal Article•10.1093/GIGASCIENCE/GIX059•

The pearl oyster Pinctada fucata martensii genome and multi-omic analyses provide insights into biomineralization.

Xiaodong Du¹, Guangyi Fan, Yu Jiao¹, He Zhang, Ximing Guo², Ronglian Huang¹, Zhe Zheng¹, Chao Bian, Yuewen Deng¹, Qingheng Wang¹, Zhongduo Wang¹, Xinming Liang, Haiying Liang¹, Chengcheng Shi, Xiaoxia Zhao¹, Fengming Sun, Ruijuan Hao¹, Jie Bai, Jialiang Liu¹, Wenbin Chen, Jinlian Liang¹, Weiqing Liu, Zhe Xu³, Qiong Shi, Xun Xu, Guofan Zhang⁴, Xin Liu - Show less +23 more•Institutions (4)

Guangdong Ocean University¹, Rutgers University², Atlantic Cape Community College³, Chinese Academy of Sciences⁴

01 Aug 2017-GigaScience

TL;DR: The highly polymorphic genome of the pearl oyster is sequenced, and a large set of novel proteins participating in matrix-framework formation is identified, including components similar to those found in vertebrate bones, such as collagen-related VWA-containing proteins, chondroitin sulfotransferases, and regulatory elements.

Abstract: Nacre, the iridescent material found in pearls and shells of molluscs, is formed through an extraordinary process of matrix-assisted biomineralization. Despite recent advances, many aspects of the biomineralization process and its evolutionary origin remain unknown. The pearl oyster Pinctada fucata martensii is a well-known master of biomineralization, but the molecular mechanisms that underlie its production of shells and pearls are not fully understood. We sequenced the highly polymorphic genome of the pearl oyster and conducted multi-omic and biochemical studies to probe nacre formation. We identified a large set of novel proteins participating in matrix-framework formation, many in expanded families, including components similar to those found in vertebrate bones, such as collagen-related VWA-containing proteins, chondroitin sulfotransferases, and regulatory elements. Considering that there are only collagen-based matrices in vertebrate bones and chitin-based matrices in most invertebrate skeletons, the presence of both chitin and elements of collagen-based matrices in nacre suggests that elements of chitin- and collagen-based matrices have deep roots and might be part of an ancient biomineralizing matrix. Our results expand the current shell matrix-framework model and provide new insights into the evolution of diverse biomineralization systems.

Journal Article•10.1093/GIGASCIENCE/GIW014•

A dataset of images and morphological profiles of 30 000 small-molecule treatments using the Cell Painting assay

01 Dec 2017-GigaScience

TL;DR: This microscopy dataset includes 919 265 five-channel fields of view, representing 30 616 tested compounds, available at “The Cell Image Library” (CIL) repository, and includes data files containing morphological features derived from each cell in each image, both at the single-cell level and population-averaged level.

Abstract: Background Large-scale image sets acquired by automated microscopy of perturbed samples enable a detailed comparison of cell states induced by each perturbation, such as a small molecule from a diverse library. Highly multiplexed measurements of cellular morphology can be extracted from each image and subsequently mined for a number of applications. Findings This microscopy dataset includes 919 265 five-channel fields of view, representing 30 616 tested compounds, available at "The Cell Image Library" (CIL) repository. It also includes data files containing morphological features derived from each cell in each image, both at the single-cell level and population-averaged (i.e., per-well) level; the image analysis workflows that generated the morphological features are also provided. Quality-control metrics are provided as metadata, indicating fields of view that are out-of-focus or containing highly fluorescent material or debris. Lastly, chemical annotations are supplied for the compound treatments applied. Conclusions Because computational algorithms and methods for handling single-cell morphological measurements are not yet routine, the dataset serves as a useful resource for the wider scientific community applying morphological (image-based) profiling. The dataset can be mined for many purposes, including small-molecule library enrichment and chemical mechanism-of-action studies, such as target identification. Integration with genetically perturbed datasets could enable identification of small-molecule mimetics of particular disease- or gene-related phenotypes that could be useful as probes or potential starting points for development of future therapeutics.

Journal Article•10.1093/GIGASCIENCE/GIX020•

Using and understanding cross-validation strategies. Perspectives on Saeb et al.

Max A. Little, Gaël Varoquaux¹, Sohrab Saeb², Luca Lonini³, Arun Jayaraman³, David C. Mohr², Konrad P. Kording³, Konrad P. Kording² - Show less +4 more•Institutions (3)

French Institute for Research in Computer Science and Automation¹, Northwestern University², Rehabilitation Institute of Chicago³

01 May 2017-GigaScience

TL;DR: A detailed look at the complexities of cross-validation, fostered by the peer review of Saeb et al.

Abstract: This three-part review takes a detailed look at the complexities of cross-validation, fostered by the peer review of Saeb et al.'s paper entitled "The need to approximate the use-case in clinical machine learning." It contains perspectives by reviewers and by the original authors that touch upon cross-validation: the suitability of different strategies and their interpretation.
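One strategy choice that this kind of discussion typically turns on is whether records from the same subject may appear in both the training and the test fold. The following is a minimal sketch of subject-wise fold construction using only the Python standard library. The function name `subject_wise_folds` and the toy subject labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def subject_wise_folds(subject_ids, n_folds):
    """Yield (train_idx, test_idx) pairs so that no subject spans both sets.

    subject_ids: one subject label per record, in record order.
    """
    by_subject = defaultdict(list)
    for index, subject in enumerate(subject_ids):
        by_subject[subject].append(index)

    subjects = sorted(by_subject)
    for fold in range(n_folds):
        # Every n_folds-th subject is held out in this fold.
        held_out = set(subjects[fold::n_folds])
        test_idx = [i for s in subjects if s in held_out for i in by_subject[s]]
        train_idx = [i for s in subjects if s not in held_out for i in by_subject[s]]
        yield train_idx, test_idx


# Hypothetical toy data: seven records from four subjects.
records = ["s1", "s1", "s2", "s2", "s3", "s3", "s4"]
for train_idx, test_idx in subject_wise_folds(records, n_folds=2):
    train_subjects = {records[i] for i in train_idx}
    test_subjects = {records[i] for i in test_idx}
    print(sorted(train_subjects), sorted(test_subjects))
```

A record-wise split would instead shuffle the seven records directly, which can place records from one subject on both sides of the split.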

Journal Article•10.1093/GIGASCIENCE/GIX101•

LAGOS-NE: A multi-scaled geospatial and temporal database of lake ecological context and water quality for thousands of US lakes

Patricia A. Soranno¹, Linda C. Bacon, Michael Beauchene, Karen E. Bednar, Edward G. Bissell¹, Claire K. Boudreau¹, Marvin G. Boyer², Mary T. Bremigan¹, Stephen R. Carpenter³, J. Carr⁴, Kendra Spence Cheruvelil¹, Samuel T. Christel³, Matt Claucherty, Sarah M. Collins³, Joseph D. Conroy⁵, John A. Downing⁶, Jed Dukett, C. Emi Fergus⁷, Christopher T. Filstrup⁶, Clara Funk⁷, María J. González⁸, Linda Green⁹, Corinna Gries³, John D. Halfman¹⁰, Stephen K. Hamilton¹, Paul C. Hanson³, Emily Norton Henry¹¹, Elizabeth Herron⁹, Celeste Hockings¹², James R. Jackson¹³, Kari Jacobson-Hedin¹⁴, Lorraine L. Janus¹⁵, William W. Jones¹⁶, John R. Jones¹⁷, Caroline M. Keson, Katelyn B. S. King¹, Scott A. Kishbaugh¹⁸, Jean-François Lapierre¹⁹, Barbara Lathrop²⁰, Jo A. Latimore¹, Yuehlin Lee⁴, Noah R. Lottig³, Jason A. Lynch⁷, Leslie J. Matthews, William H. McDowell²¹, Karen Moore¹⁵, Brian P. Neff²², Sarah J. Nelson²³, Samantha K. Oliver³, Michael L. Pace²⁴, Donald C. Pierson²⁵, Autumn C. Poisson¹, Amina I. Pollard⁷, David M. Post²⁶, Paul O. Reyes⁴, Donald O. Rosenberry²², Karen M. Roy¹⁸, Lars G. Rudstam¹³, Orlando Sarnelle¹, Nancy J. Schuldt, Caren E. Scott, Nicholas K. Skaff¹, Nicole J. Smith¹, Nick R. Spinelli, Joseph Stachelek¹, Emily H. Stanley³, John L. Stoddard⁷, Scott B. Stopyak²⁷, Craig A. Stow²⁸, Jason Tallant²⁹, Pang-Ning Tan¹, Anthony P. Thorpe¹⁷, Michael J. Vanni⁸, Tyler Wagner²², Gretchen Watkins, Kathleen C. Weathers³⁰, Katherine E. Webster³¹, Jeffrey D. White³², Marcy K. Wilmes, Shuai Yuan¹ - Show less +76 more•Institutions (32)

Michigan State University¹, United States Army Corps of Engineers², University of Wisconsin-Madison³, Massachusetts Department of Conservation and Recreation⁴, Ohio Department of Natural Resources⁵, University of Minnesota⁶, United States Environmental Protection Agency⁷, Miami University⁸, University of Rhode Island⁹, Hobart and William Smith Colleges¹⁰, Oregon State University¹¹, Lac du Flambeau Band of Lake Superior Chippewa Indians¹², Cornell University¹³, Fond du Lac Reservation¹⁴, New York City Department of Environmental Protection¹⁵, Indiana University¹⁶, University of Missouri¹⁷, New York State Department of Environmental Conservation¹⁸, Université de Montréal¹⁹, Pennsylvania Department of Environmental Protection²⁰, University of New Hampshire²¹, United States Geological Survey²², University of Maine²³, University of Virginia²⁴, Uppsala University²⁵, University of Connecticut²⁶, Eaton Corporation²⁷, Great Lakes Institute of Management²⁸, University of Michigan²⁹, Institute of Ecosystem Studies³⁰, Trinity College, Dublin³¹, Framingham State University³²

01 Dec 2017-GigaScience

TL;DR: This database is one of the largest and most comprehensive databases of its type because it includes both in situ measurements and ecological context data and can be used as the foundation for other studies of freshwaters at broad spatial and ecological scales.

Abstract: Understanding the factors that affect water quality and the ecological services provided by freshwater ecosystems is an urgent global environmental issue. Predicting how water quality will respond ...

Journal Article•10.1093/gigascience/gix023•

Genome-wide sequencing of longan (Dimocarpus longan Lour.) provides insights into molecular basis of its polyphenol-rich characteristics

Yu-Ming Lin¹, Jiumeng Min², Ruilian Lai, Zhangyan Wu, Yukun Chen³, Lili Yu, Chunzhen Cheng³, Yuanchun Jin, Qilin Tian, Qingfeng Liu, Weihua Liu, Chengguang Zhang, Lixia Lin³, Yan Hu, Dongmin Zhang, Minkyaw Thu³, Zihao Zhang³, Shengcai Liu⁴, Chunshui Zhong, Xiaodong Fang², Jian Wang⁵, Huanming Yang², Rajeev K. Varshney⁶, Ye Yin², Zhongxiong Lai³ - Show less +21 more•Institutions (6)

Massachusetts Institute of Technology¹, Beijing Genomics Institute², Fujian Agriculture and Forestry University³, Southern University of Science and Technology⁴, Guangzhou Medical University⁵, International Crops Research Institute for the Semi-Arid Tropics⁶

28 Mar 2017-GigaScience

TL;DR: Comparative transcriptome studies combined with genome-wide analysis revealed polyphenol-rich and pathogen resistance characteristics of longan fruit and suggested a genomic basis for resistance to insects, fungus, and bacteria in this fruit tree.

Abstract: Longan (Dimocarpus longan Lour.), an important subtropical fruit in the family Sapindaceae, is grown in more than 10 countries. Longan is an edible drupe fruit and a source of traditional medicine with polyphenol-rich traits. Tree size, alternate bearing, and witches' broom disease still pose serious problems. To gain insights into the genomic basis of longan traits, a draft genome sequence was assembled. The draft genome (about 471.88 Mb) of a Chinese longan cultivar, “Honghezi,” was estimated to contain 31 007 genes and 261.88 Mb of repetitive sequences. No recent whole-genome-wide duplication event was detected in the genome. Whole-genome resequencing and analysis of 13 cultivated D. longan accessions revealed the extent of genetic diversity. Comparative transcriptome studies combined with genome-wide analysis revealed polyphenol-rich and pathogen resistance characteristics. Genes involved in secondary metabolism, especially those from significantly expanded (DHS, SDH, F3′H, ANR, and UFGT) and contracted (PAL, CHS, and F3′5′H) gene families with tissue-specific expression, may be important contributors to the high accumulation levels of polyphenolic compounds observed in longan fruit. The high number of genes encoding nucleotide-binding site leucine-rich repeat (NBS-LRR) and leucine-rich repeat receptor-like kinase proteins, as well as the recent expansion and contraction of the NBS-LRR family, suggested a genomic basis for resistance to insects, fungus, and bacteria in this fruit tree. These data provide insights into the evolution and diversity of the longan genome. The comparative genomic and transcriptome analyses provided information about longan-specific traits, particularly genes involved in its polyphenol-rich and pathogen resistance characteristics.

Journal Article•10.1093/GIGASCIENCE/GIX007•

MinION™ nanopore sequencing of environmental metagenomes: a synthetic approach.

Bonnie L. Brown¹, Michael Watson², Samuel S. Minot, Maria C. Rivera¹, Rima B. Franklin¹ - Show less +1 more•Institutions (2)

Virginia Commonwealth University¹, University of Edinburgh²

01 Mar 2017-GigaScience

TL;DR: The ability of sequence data produced by MinION to correctly assign taxonomy in single bacterial species runs and in three types of low-complexity synthetic communities was tested, suggesting the platform has the potential to provide rapid and accurate metagenomic analysis where the consortium is comprised of a limited number of taxa.

Abstract: Environmental metagenomic analysis is typically accomplished by assigning taxonomy and/or function from whole genome sequencing or 16S amplicon sequences. Both of these approaches are limited, however, by read length, among other technical and biological factors. A nanopore-based sequencing platform, MinION™, produces reads that are ≥1 × 10⁴ bp in length, potentially providing for more precise assignment, thereby alleviating some of the limitations inherent in determining metagenome composition from short reads. We tested the ability of sequence data produced by MinION (R7.3 flow cells) to correctly assign taxonomy in single bacterial species runs and in three types of low-complexity synthetic communities: a mixture of DNA using equal mass from four species, a community with one relatively rare (1%) and three abundant (33% each) components, and a mixture of genomic DNA from 20 bacterial strains of staggered representation. Taxonomic composition of the low-complexity communities was assessed by analyzing the MinION sequence data with three different bioinformatic approaches: Kraken, MG-RAST, and One Codex. Results: Long read sequences generated from libraries prepared from single strains using the version 5 kit and chemistry, run on the original MinION device, yielded as few as 224 to as many as 3497 bidirectional high-quality (2D) reads with an average overall study length of 6000 bp. For the single-strain analyses, assignment of reads to the correct genus by different methods ranged from 53.1% to 99.5%, assignment to the correct species ranged from 23.9% to 99.5%, and the majority of misassigned reads were to closely related organisms. A synthetic metagenome sequenced with the same setup yielded 714 high quality 2D reads of approximately 5500 bp that were up to 98% correctly assigned to the species level.
Synthetic metagenome MinION libraries generated using version 6 kit and chemistry yielded from 899 to 3497 2D reads with lengths averaging 5700 bp with up to 98% assignment accuracy at the species level. The observed community proportions for “equal” and “rare” synthetic libraries were close to the known proportions, deviating from 0.1% to 10% across all tests. For a 20-species mock community with staggered contributions, a sequencing run detected all but 3 species (each included at 99% of reads were assigned to the correct family. Conclusions: At the current level of output and sequence quality (just under 4 × 10³ 2D reads for a synthetic metagenome), MinION sequencing followed by Kraken or One Codex analysis has the potential to provide rapid and accurate metagenomic analysis where the consortium is comprised of a limited number of taxa. Important considerations noted in this study included high sensitivity of the MinION platform to the quality of input DNA, high variability of sequencing results across libraries and flow cells, and relatively small numbers of 2D reads per analysis. Together, these limited detection of very rare components of the microbial consortia, and would likely limit the utility of MinION for the sequencing of high-complexity metagenomic communities where thousands of taxa are expected. Furthermore, the limitations of the currently available data analysis tools suggest there is considerable room for improvement in the analytical approaches for the characterization of microbial communities using long reads. Nevertheless, the fact that the accurate taxonomic assignment of high-quality reads generated by MinION is approaching 99.5% and, in most cases, the inferred community structure mirrors the known proportions of a synthetic mixture warrants further exploration of practical application to environmental metagenomics as the platform continues to develop and improve.
With further improvement in sequence throughput and error rate reduction, this platform shows great promise for precise real-time analysis of the composition and structure of more complex microbial communities.

Journal Article•10.1093/GIGASCIENCE/GIX115•

CNVcaller: highly efficient and widely applicable software for detecting copy number variations in large populations

Xihong Wang¹, Zhuqing Zheng¹, Yu-Dong Cai¹, Ting Chen¹, Chao Li¹, Weiwei Fu¹, Yu Jiang¹ - Show less +3 more•Institutions (1)

Northwest A&F University¹

01 Dec 2017-GigaScience

TL;DR: The fast generalized detection algorithms included in CNVcaller overcome prior computational barriers for detecting CNVs in large-scale sequencing data with complex genomic structures and promote population genetic analyses of functional CNVs in more species.

Abstract: Background The increasing amount of sequencing data available for a wide variety of species can be theoretically used for detecting copy number variations (CNVs) at the population level. However, the growing sample sizes and the divergent complexity of nonhuman genomes challenge the efficiency and robustness of current human-oriented CNV detection methods. Results Here, we present CNVcaller, a read-depth method for discovering CNVs in population sequencing data. The computational speed of CNVcaller was 1-2 orders of magnitude faster than CNVnator and Genome STRiP for complex genomes with thousands of unmapped scaffolds. CNV detection of 232 goats required only 1.4 days on a single compute node. Additionally, the Mendelian consistency of sheep trios indicated that CNVcaller mitigated the influence of high proportions of gaps and misassembled duplications in the nonhuman reference genome assembly. Furthermore, multiple evaluations using real sheep and human data indicated that CNVcaller achieved the best accuracy and sensitivity for detecting duplications. Conclusions The fast generalized detection algorithms included in CNVcaller overcome prior computational barriers for detecting CNVs in large-scale sequencing data with complex genomic structures. Therefore, CNVcaller promotes population genetic analyses of functional CNVs in more species.
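The general idea behind a read-depth approach can be illustrated with a small sketch: per-window depth is normalized against a baseline and scaled to the expected ploidy. This is a generic illustration only, not the algorithm implemented in CNVcaller. The function name, the median baseline, and the toy depths are all assumptions made for the example.

```python
from statistics import median


def copy_number_from_depth(window_depths, ploidy=2):
    """Estimate an integer copy number per window from read depth.

    Assumes (hypothetically) that the median window depth corresponds to
    the normal diploid state; real tools also correct for GC content,
    mappability, and population-level signals.
    """
    baseline = median(window_depths)
    if baseline <= 0:
        raise ValueError("median depth must be positive")
    return [round(ploidy * depth / baseline) for depth in window_depths]


# Hypothetical mean depths for seven consecutive genomic windows.
toy_depths = [30, 31, 29, 60, 15, 30, 0]
print(copy_number_from_depth(toy_depths))
```

In this toy input, the window at depth 60 would be read as a duplication and the windows at depth 15 and 0 as heterozygous and homozygous deletions, respectively.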

Journal Article•10.1093/GIGASCIENCE/GIX095•

The genome draft of coconut (Cocos nucifera).

Yong Xiao¹, Pengwei Xu², Haikuo Fan¹, Luc Baudouin³, Wei Xia¹, Stéphanie Bocs³, Junyang Xu², Qiong Li, Anping Guo, Lixia Zhou¹, Jing Li¹, Yi Wu¹, Zilong Ma, Alix Armero³, Alix Armero⁴, Auguste Emmanuel Issali, Na Liu², Ming Peng, Yaodong Yang¹ - Show less +15 more•Institutions (4)

Chinese Academy of Tropical Agricultural Sciences¹, Beijing Genomics Institute², University of Montpellier³, SupAgro⁴

01 Nov 2017-GigaScience

TL;DR: Coconut palm (Cocos nucifera), a member of genus Cocos and family Arecaceae (Palmaceae), is an important tropical fruit and oil crop; this report presents a draft genome of the species.

Abstract: Coconut palm (Cocos nucifera, 2n = 32), a member of genus Cocos and family Arecaceae (Palmaceae), is an important tropical fruit and oil crop. Currently, coconut palm is cultivated in 93 countries, including Central and South America, East and West Africa, Southeast Asia and the Pacific Islands, with a total growth area of more than 12 million hectares [1]. Coconut palm is generally classified into 2 main categories: "Tall" (flowering 8-10 years after planting) and "Dwarf" (flowering 4-6 years after planting), based on morphological characteristics and breeding habits. This Palmae species has a long growth period before reproductive years, which hinders conventional breeding progress. In spite of initial successes, improvements made by conventional breeding have been very slow. In the present study, we obtained de novo sequences of the Cocos nucifera genome: a major genomic resource that could be used to facilitate molecular breeding in Cocos nucifera and accelerate the breeding process in this important crop. A total of 419.67 gigabases (Gb) of raw reads were generated by the Illumina HiSeq 2000 platform using a series of paired-end and mate-pair libraries, covering the predicted Cocos nucifera genome length (2.42 Gb, variety "Hainan Tall") to an estimated 173.32× read depth. A total scaffold length of 2.20 Gb was generated (N50 = 418 Kb), representing 90.91% of the genome. The coconut genome was predicted to harbor 28 039 protein-coding genes, which is fewer than in Phoenix dactylifera (PDK30: 28 889), Phoenix dactylifera (DPV01: 41 660), and Elaeis guineensis (EG5: 34 802). BUSCO evaluation demonstrated that the obtained scaffold sequences covered 90.8% of the coconut genome and that the genome annotation was 74.1% complete. Genome annotation results revealed that 72.75% of the coconut genome consisted of transposable elements, of which long terminal repeat retrotransposon elements (LTRs) accounted for the largest proportion (92.23%).
Comparative analysis of the antiporter gene family and ion channel gene families between C. nucifera and Arabidopsis thaliana indicated that significant gene expansion may have occurred in the coconut involving Na+/H+ antiporter, carnitine/acylcarnitine translocase, potassium-dependent sodium-calcium exchanger, and potassium channel genes. Despite its agronomic importance, C. nucifera is still under-studied. In this report, we present a draft genome of C. nucifera and provide genomic information that will facilitate future functional genomics and molecular-assisted breeding in this crop species.
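The depth and completeness figures quoted in this abstract follow from simple ratios, which can be checked with a short sketch. The variable and function names below are hypothetical. Dividing the rounded values from the abstract gives a depth of roughly 173.4×, slightly different from the reported 173.32×, presumably because the authors used an unrounded genome size estimate.

```python
# Values as quoted in the abstract (rounded).
raw_bases_gb = 419.67      # total raw read bases
genome_size_gb = 2.42      # predicted genome length
scaffold_total_gb = 2.20   # total scaffold length


def read_depth(raw_bases, genome_size):
    """Average read depth = sequenced bases / genome size."""
    return raw_bases / genome_size


def assembled_fraction(assembled, genome_size):
    """Share of the predicted genome represented by the assembly."""
    return assembled / genome_size


depth = read_depth(raw_bases_gb, genome_size_gb)
fraction = assembled_fraction(scaffold_total_gb, genome_size_gb)
print(f"depth ~ {depth:.1f}x, assembled ~ {fraction:.2%}")
```

The second ratio reproduces the 90.91% figure stated for the 2.20 Gb scaffold total.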

Journal Article•10.1093/GIGASCIENCE/GIX050•

Two distinct metacommunities characterize the gut microbiota in Crohn's disease patients.

Qing He¹, Yuan Gao, Zhuye Jie, Xinlei Yu, Janne Marie Laursen², Liang Xiao, Ying Li¹, Lingling Li¹, Faming Zhang³, Qiang Feng, Xiaoping Li, Yu Jinghong, Liu Chuan, Ping Lan¹, Ting Yan¹, Xin Liu, Xun Xu, Huanming Yang, Jian Wang, Lise Madsen⁴, Lise Madsen⁵, Susanne Brix², Jianping Wang¹, Karsten Kristiansen⁵, Huijue Jia - Show less +21 more•Institutions (5)

Sun Yat-sen University¹, Technical University of Denmark², Nanjing Medical University³, National Institute of Nutrition, Hyderabad⁴, University of Copenhagen⁵

01 Jul 2017-GigaScience

TL;DR: Metagenomic shotgun sequencing was employed to provide a detailed characterization of the compositional and functional features of the CD microbiota, comprising also unannotated bacteria, and investigated its modulation by exclusive enteral nutrition.

Abstract: The inflammatory intestinal disorder Crohn's disease (CD) has become a health challenge worldwide. The gut microbiota closely interacts with the host immune system, but its functional impact in CD is unclear. Except for studies on a small number of CD patients, analyses of the gut microbiota in CD have used 16S rDNA amplicon sequencing. Here we employed metagenomic shotgun sequencing to provide a detailed characterization of the compositional and functional features of the CD microbiota, comprising also unannotated bacteria, and investigated its modulation by exclusive enteral nutrition. Based on signature taxa, CD microbiotas clustered into 2 distinct metacommunities, indicating individual variability in CD microbiome structure. Metacommunity-specific functional shifts in CD showed enrichment in producers of the pro-inflammatory hexa-acylated lipopolysaccharide variant and a reduction in the potential to synthesize short-chain fatty acids. Disruption of ecological networks was evident in CD, coupled with reduction in growth rates of many bacterial species. Short-term exclusive enteral nutrition elicited limited impact on the overall composition of the CD microbiota, although functional changes occurred following treatment. The microbiotas in CD patients can be stratified into 2 distinct metacommunities, with the most severely perturbed metacommunity exhibiting functional potentials that deviate markedly from those of healthy individuals, with possible implications for CD pathogenesis.

Journal Article•10.1093/GIGASCIENCE/GIW016•

An improved assembly of the loblolly pine mega-genome using long-read single-molecule sequencing.

Aleksey V. Zimin¹, Aleksey V. Zimin², Kristian Stevens³, Marc W. Crepeau³, Daniela Puiu¹, Jill L. Wegrzyn⁴, James A. Yorke², Charles H. Langley³, David B. Neale³, Steven L. Salzberg¹ - Show less +6 more•Institutions (4)

Johns Hopkins University¹, University of Maryland, College Park², University of California, Davis³, University of Connecticut⁴

01 Jan 2017-GigaScience

TL;DR: The 22-gigabase genome of loblolly pine was reassembled by combining approximately 12-fold long-read coverage, generated with the Single Molecule Real Time sequencing technology developed at Pacific Biosciences, with the existing short Illumina reads, using the MaSuRCA mega-reads assembly algorithm.

Abstract: The 22-gigabase genome of loblolly pine (Pinus taeda) is one of the largest ever sequenced. The draft assembly published in 2014 was built entirely from short Illumina reads, with lengths ranging from 100 to 250 base pairs (bp). The assembly was quite fragmented, containing over 11 million contigs whose weighted average (N50) size was 8206 bp. To improve this result, we generated approximately 12-fold coverage in long reads using the Single Molecule Real Time sequencing technology developed at Pacific Biosciences. We assembled the long and short reads together using the MaSuRCA mega-reads assembly algorithm, which produced a substantially better assembly, P. taeda version 2.0. The new assembly has an N50 contig size of 25 361 bp, more than three times as large as that achieved in the original assembly, and an N50 scaffold size of 107 821 bp, 61% larger than that of the previous assembly.
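Because the N50 statistic carries the main comparison in this abstract, a minimal sketch of how it is commonly computed may help: sort the sequence lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name and the toy lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Hypothetical contig lengths in bp (total 300, so the halfway point is 150).
toy_contigs = [100, 80, 60, 40, 20]
print(n50(toy_contigs))  # prints 80
```

With this definition, a few very long contigs raise N50 much more than many short ones, which is why it is described as a weighted average.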

Journal Article•10.1093/GIGASCIENCE/GIX109•

Genome sequence of the small brown planthopper, Laodelphax striatellus.

Junjie Zhu¹, Feng Jiang¹, Xianhui Wang¹, Pengcheng Yang¹, Yanyuan Bao², Wan Zhao¹, Wei Wang¹, Hong Lu¹, Qianshuo Wang¹, Na Cui¹, Jing Li¹, Xiaofang Chen¹, Lan Luo¹, Jinting Yu¹, Le Kang¹, Feng Cui¹ - Show less +12 more•Institutions (2)

Chinese Academy of Sciences¹, Institute of Insect Sciences, Zhejiang University²

01 Dec 2017-GigaScience

TL;DR: Gene family expansion and transcriptomic analyses provided hints to the genomic basis of the differences in important traits such as host range, migratory habit, and plant virus transmission between L. striatellus and the other 2 planthoppers.

Abstract: Background Laodelphax striatellus Fallen (Hemiptera: Delphacidae) is one of the most destructive rice pests. L. striatellus differs from the 2 other rice planthoppers with a released genome sequence, Sogatella furcifera and Nilaparvata lugens, in many biological characteristics, such as host range, dispersal capacity, and the vectoring of plant viruses. Deciphering the genome of L. striatellus will further the understanding of the genetic basis of the biological differences among the 3 rice planthoppers. Findings A total of 190 Gb of Illumina data and 32.4 Gb of PacBio data were generated and used to assemble a high-quality L. striatellus genome sequence, which is 541 Mb in length and has a contig N50 of 118 Kb and a scaffold N50 of 1.08 Mb. Annotated repetitive elements account for 25.7% of the genome. A total of 17 736 protein-coding genes were annotated, capturing 97.6% and 98% of the BUSCO eukaryote and arthropoda genes, respectively. Compared with N. lugens and S. furcifera, L. striatellus has the smallest genome and the lowest gene number. Gene family expansion and transcriptomic analyses provided hints to the genomic basis of the differences in important traits such as host range, migratory habit, and plant virus transmission between L. striatellus and the other 2 planthoppers. Conclusions We report a high-quality genome assembly of L. striatellus, which is an important genomic resource not only for the study of the biology of L. striatellus and its interactions with plant hosts and plant viruses, but also for comparison with other planthoppers.

Journal Article•10.1093/GIGASCIENCE/GIW011•

The Healthy Brain Network Serial Scanning Initiative: a resource for evaluating inter-individual differences and their reliabilities across scan conditions and sessions.

David H. O’Connor¹, David H. O’Connor², Natan Vega Potler², Meagan Kovacs², Ting Xu², Lei Ai², John Pellman², John Pellman¹, Tamara Vanderwal³, Lucas C. Parra⁴, Samantha Cohen⁵, Satrajit S. Ghosh⁶, Jasmine Escalera², Natalie Grant-Villegas², Yael Osman², Anastasia Bui², R. Cameron Craddock², R. Cameron Craddock¹, Michael P. Milham¹, Michael P. Milham²•Institutions (6)

Nathan Kline Institute for Psychiatric Research¹, MIND Institute², Yale University³, City College of New York⁴, City University of New York⁵, Massachusetts Institute of Technology⁶

01 Feb 2017-GigaScience

TL;DR: This resource provides a test-bed for quantifying the reliability of connectivity indices across subjects, conditions and time and can be used to compare and optimize different frameworks for measuring connectivity and data collection parameters such as scan length.

Abstract: Background: Although typically measured during the resting state, a growing literature is illustrating the ability to map intrinsic connectivity with functional MRI during task and naturalistic viewing conditions. These paradigms are drawing excitement due to their greater tolerability in clinical and developing populations and because they enable a wider range of analyses (e.g., inter-subject correlations). To be clinically useful, the test-retest reliability of connectivity measured during these paradigms needs to be established. This resource provides data for evaluating test-retest reliability for full-brain connectivity patterns detected during each of four scan conditions that differ with respect to level of engagement (rest, abstract animations, movie clips, flanker task). Data are provided for 13 participants, each scanned in 12 sessions with 10 minutes for each scan of the four conditions. Diffusion kurtosis imaging data were also obtained at each session. Findings: Technical validation and demonstrative reliability analyses were carried out at the connection-level using the Intraclass Correlation Coefficient and at network-level representations of the data using the Image Intraclass Correlation Coefficient. Variation in intrinsic functional connectivity across sessions was generally found to be greater than that attributable to scan condition. Between-condition reliability was generally high, particularly for the frontoparietal and default networks. Between-session reliabilities obtained separately for the different scan conditions were comparable, though notably lower than between-condition reliabilities. Conclusions: This resource provides a test-bed for quantifying the reliability of connectivity indices across subjects, conditions and time. The resource can be used to compare and optimize different frameworks for measuring connectivity and data collection parameters such as scan length. Additionally, investigators can explore the unique perspectives of the brain's functional architecture offered by each of the scan conditions.

Journal Article•10.1093/GIGASCIENCE/GIX043•

Multi-locus and long amplicon sequencing approach to study microbial diversity at species level using the MinION™ portable nanopore sequencer.

Alfonso Benítez-Páez¹, Yolanda Sanz¹•Institutions (1)

Spanish National Research Council¹

01 Jul 2017-GigaScience

TL;DR: The data obtained during sequencing of the long amplicon in the MinION™ device using R9 and R9.4 chemistries were sufficient to study 2 mock microbial communities in a multiplex manner and to almost completely reconstruct the microbial diversity contained in the HM782D and D6305 mock communities.

Abstract: The miniaturized and portable DNA sequencer MinION™ has demonstrated great potential in different analyses such as genome-wide sequencing, pathogen outbreak detection and surveillance, human genome variability, and microbial diversity. In this study, we tested the ability of the MinION™ platform to perform long amplicon sequencing in order to design new approaches to study microbial diversity using a multi-locus approach. After compiling a robust database by parsing and extracting the rrn bacterial region from more than 67000 complete or draft bacterial genomes, we demonstrated that the data obtained during sequencing of the long amplicon in the MinION™ device using R9 and R9.4 chemistries were sufficient to study 2 mock microbial communities in a multiplex manner and to almost completely reconstruct the microbial diversity contained in the HM782D and D6305 mock communities. Although nanopore-based sequencing produces reads with lower per-base accuracy compared with other platforms, we presented a novel approach consisting of multi-locus and long amplicon sequencing using the MinION™ MkIb DNA sequencer and R9 and R9.4 chemistries that helps to overcome the main disadvantage of this portable sequencing platform. Furthermore, the nanopore sequencing library, constructed with the latest releases of pore chemistry (R9.4) and sequencing kit (SQK-LSK108), permitted retrieval of a higher level of 1D read accuracy, sufficient to characterize the microbial species present in each mock community analysed. Improvements in nanopore chemistry, such as minimizing base-calling errors and new library protocols able to produce rapid 1D libraries, will provide more reliable information in the near future. Such data will be useful for more comprehensive and faster specific detection of microbial species and strains in complex ecosystems.

Journal Article•10.1093/GIGASCIENCE/GIX023•

Genome-wide sequencing of longan (Dimocarpus longan Lour.) provides insights into molecular basis of its polyphenol-rich characteristics.

Yuling Lin¹, Jiumeng Min, Ruilian Lai¹, Zhangyan Wu, Yukun Chen¹, Lili Yu, Chunzhen Cheng¹, Yuanchun Jin, Qilin Tian¹, Qingfeng Liu, Weihua Liu¹, Chengguang Zhang, Lixia Lin¹, Yan Hu, Dongmin Zhang¹, Minkyaw Thu¹, Zihao Zhang¹, Liu Shengcai¹, Chunshui Zhong¹, Xiaodong Fang, Jian Wang, Huanming Yang, Rajeev K. Varshney², Rajeev K. Varshney³, Ye Yin, Zhongxiong Lai¹•Institutions (3)

Fujian Agriculture and Forestry University¹, International Crops Research Institute for the Semi-Arid Tropics², University of Western Australia³

01 May 2017-GigaScience

Abstract: Longan (Dimocarpus longan Lour.), an important subtropical fruit in the family Sapindaceae, is grown in more than 10 countries. Longan is an edible drupe fruit and a source of traditional medicine with polyphenol-rich traits. Tree size, alternate bearing, and witches' broom disease still pose serious problems. To gain insights into the genomic basis of longan traits, a draft genome sequence was assembled. The draft genome (about 471.88 Mb) of a Chinese longan cultivar, “Honghezi,” was estimated to contain 31 007 genes and 261.88 Mb of repetitive sequences. No recent whole-genome-wide duplication event was detected in the genome. Whole-genome resequencing and analysis of 13 cultivated D. longan accessions revealed the extent of genetic diversity. Comparative transcriptome studies combined with genome-wide analysis revealed polyphenol-rich and pathogen resistance characteristics. Genes involved in secondary metabolism, especially those from significantly expanded (DHS, SDH, F3′H, ANR, and UFGT) and contracted (PAL, CHS, and F3′5′H) gene families with tissue-specific expression, may be important contributors to the high accumulation levels of polyphenolic compounds observed in longan fruit. The high number of genes encoding nucleotide-binding site leucine-rich repeat (NBS-LRR) and leucine-rich repeat receptor-like kinase proteins, as well as the recent expansion and contraction of the NBS-LRR family, suggested a genomic basis for resistance to insects, fungus, and bacteria in this fruit tree. These data provide insights into the evolution and diversity of the longan genome. The comparative genomic and transcriptome analyses provided information about longan-specific traits, particularly genes involved in its polyphenol-rich and pathogen resistance characteristics.
