TL;DR: Entrez Gene is a step forward from NCBI's LocusLink, with both a major increase in taxonomic scope and improved access through the many tools associated with NCBI Entrez.
Abstract: Entrez Gene (www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene) is NCBI's database for gene-specific information. Entrez Gene includes records from genomes that have been completely sequenced, that have an active research community to contribute gene-specific information or that are scheduled for intense sequence analysis. The content of Entrez Gene represents the result of both curation and automated integration of data from NCBI's Reference Sequence project (RefSeq), from collaborating model organism databases and from other databases within NCBI. Records in Entrez Gene are assigned unique, stable and tracked integers as identifiers. The content (nomenclature, map location, gene products and their attributes, markers, phenotypes and links to citations, sequences, variation details, maps, expression, homologs, protein domains and external databases) is provided via interactive browsing through NCBI's Entrez system, via NCBI's Entrez programming utilities (E-Utilities), and for bulk transfer by FTP.
TL;DR: The plant TF database PlantTFDB is updated to version 3.0, with more annotations for identified TFs, including expression, regulation, interaction, conserved elements, phenotype information, expert-curated descriptions derived from UniProt, TAIR and NCBI GeneRIF, as well as references to provide clues for functional studies of TFs.
Abstract: To provide a resource for the functional and evolutionary study of plant transcription factors (TFs), we updated the plant TF database PlantTFDB to version 3.0 (http://planttfdb.cbi.pku.edu.cn). After refining the TF classification pipeline, we systematically identified 129 288 TFs from 83 species, of which 67 species have genome sequences, covering the main lineages of green plants. Besides the abundant annotation provided in the previous version, we generated more annotations for identified TFs, including expression, regulation, interaction, conserved elements, phenotype information, expert-curated descriptions derived from UniProt, TAIR and NCBI GeneRIF, as well as references to provide clues for functional studies of TFs. To help identify evolutionary relationships among the identified TFs, we assigned 69 450 TFs to 3924 orthologous groups, and constructed 9217 phylogenetic trees for TFs within the same families or the same orthologous groups. In addition, we set up a TF prediction server in this version so that users can identify TFs from their own sequences.
TL;DR: The National Center for Biotechnology Information's (NCBI) Gene database integrates gene-specific information from multiple data sources and represents the integration of curation and automated processing from RefSeq, collaborating model organism databases, consortia such as Gene Ontology, and other databases within NCBI.
Abstract: The National Center for Biotechnology Information's (NCBI) Gene database (www.ncbi.nlm.nih.gov/gene) integrates gene-specific information from multiple data sources. NCBI Reference Sequence (RefSeq) genomes for viruses, prokaryotes and eukaryotes are the primary foundation for Gene records in that they form the critical association between sequence and a tracked gene upon which additional functional and descriptive content is anchored. Additional content is integrated based on the genomic location and RefSeq transcript and protein sequence data. The content of a Gene record represents the integration of curation and automated processing from RefSeq, collaborating model organism databases, consortia such as Gene Ontology, and other databases within NCBI. Records in Gene are assigned unique, tracked integers as identifiers. The content (citations, nomenclature, genomic location, gene products and their attributes, phenotypes, sequences, interactions, variation details, maps, expression, homologs, protein domains and external databases) is available via interactive browsing through NCBI's Entrez system, via NCBI's Entrez programming utilities (E-Utilities and Entrez Direct) and for bulk transfer by FTP.
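As a minimal sketch of the programmatic access route mentioned above, the snippet below builds an E-Utilities ESummary request URL for records in the Gene database. The helper name `build_gene_esummary_url` and the two GeneIDs are illustrative assumptions, not part of the abstract; the sketch only constructs the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_gene_esummary_url(gene_ids, retmode="json"):
    """Return an ESummary URL for one or more GeneIDs (hypothetical helper)."""
    # Gene records are identified by integers, so reject anything else early.
    ids = [int(g) for g in gene_ids]
    if not ids or any(i <= 0 for i in ids):
        raise ValueError("expected one or more positive integer GeneIDs")
    params = {
        "db": "gene",                          # query the Gene database
        "id": ",".join(str(i) for i in ids),   # comma-separated identifier list
        "retmode": retmode,                    # response format, e.g. "json" or "xml"
    }
    # safe="," keeps the comma-separated ID list readable in the URL.
    return f"{EUTILS_BASE}/esummary.fcgi?{urlencode(params, safe=',')}"


# Example GeneIDs chosen purely for illustration.
print(build_gene_esummary_url([672, 7157]))
```

The resulting URL can be fetched with any HTTP client; for bulk retrieval, the FTP files mentioned in the abstract are likely the better fit.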
TL;DR: A new graphical viewer in Entrez will allow the user to view the genomic landscape from different vantage points and make connections to the sequences, structures, and abstracts relevant to specific chromosomal regions.
Abstract: Entrez is a biomedical information resource that has been designed to facilitate the discovery process by providing connections among biological sequences, molecular structures, and abstracts. Because it must be anticipated that the amount of data will continue to grow at phenomenal rates, the Internet would seem to be the most practical medium for the future dissemination of this information. However, besides these quantitative changes, several trends promise to alter qualitatively the nature of the nucleotide sequence database. It is necessary to be prepared for the expected volume of data, but changes to the Entrez user interface may also be needed to make effective use of it. For example, imagine finding a sequence of interest and asking for its sequence neighbors only to be presented with a complete chromosome sequence. A new graphical viewer in Entrez will allow the user to view the genomic landscape from different vantage points and make connections to the sequences, structures, and abstracts relevant to specific chromosomal regions.
TL;DR: Major advances for the BioCreative II gene normalization task include broader participation (20 versus 8 teams) and a pooled system performance comparable to that of human experts, at over 90% agreement; such systems show promise as tools to link the literature with biological databases.
Abstract: Background: The goal of the gene normalization task is to link genes or gene products mentioned in the literature to biological databases. This is a key step in an accurate search of the biological literature. It is a challenging task, even for the human expert; genes are often described rather than referred to by gene symbol and, confusingly, one gene name may refer to different genes (often from different organisms). For BioCreative II, the task was to list the Entrez Gene identifiers for human genes or gene products mentioned in PubMed/MEDLINE abstracts. We selected abstracts associated with articles previously curated for human genes. We provided 281 expert-annotated