TL;DR: The definition and use of family-specific, manually curated gathering thresholds are explained, and some features of domains of unknown function (DUFs), which constitute a rapidly growing class of families within Pfam, are discussed.
Abstract: Pfam is a widely used database of protein families and domains. This article describes a set of major updates that we have implemented in the latest release (version 24.0). The most important change is that we now use HMMER3, the latest version of the popular profile hidden Markov model package. This software is approximately 100 times faster than HMMER2 and is more sensitive due to the routine use of the forward algorithm. The move to HMMER3 has necessitated numerous changes to Pfam that are described in detail. Pfam release 24.0 contains 11,912 families, of which a large number have been significantly updated during the past two years. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/).
TL;DR: Pfam is a widely used database of protein families, containing 14 831 manually curated entries in the current release, version 27.0. Since the 2012 update, 1182 new families have been added, and the features offered beyond the basic family data have been comprehensively reviewed.
Abstract: Pfam, available via servers in the UK (http://pfam.sanger.ac.uk/) and the USA (http://pfam.janelia.org/), is a widely used database of protein families, containing 14 831 manually curated entries in the current release, version 27.0. Since the last update article 2 years ago, we have generated 1182 new families and maintained sequence coverage of the UniProt Knowledgebase (UniProtKB) at nearly 80%, despite a 50% increase in the size of the underlying sequence database. Since our 2012 article describing Pfam, we have also undertaken a comprehensive review of the features that are provided by Pfam over and above the basic family data. For each feature, we determined the relevance, computational burden, usage statistics and the functionality of the feature in a website context. As a consequence of this review, we have removed some features, enhanced others and developed new ones to meet the changing demands of computational biology. Here, we describe the changes to Pfam content. Notably, we now provide family alignments based on four different representative proteome sequence data sets and a new interactive DNA search interface. We also discuss the mapping between Pfam and known 3D structures.
TL;DR: The miRNA Registry provides a service for assigning miRNA gene names prior to publication, and a comprehensive, searchable database of published miRNA sequences is accessible via a web interface.
Abstract: The miRNA Registry provides a service for the assignment of miRNA gene names prior to publication. A comprehensive and searchable database of published miRNA sequences is accessible via a web interface (http://www.sanger.ac.uk/Software/Rfam/mirna/), and all sequence and annotation data are freely available for download. Release 2.0 of the database contains 506 miRNA entries from six organisms.
TL;DR: Improvements to the range of Pfam web tools are presented, along with the first set of Pfam web services, which allow programmatic access to the database and associated tools.
Abstract: Pfam is a database of protein families that currently contains 7973 entries (release 18.0). A recent development in Pfam has enabled the grouping of related families into clans. Pfam clans are described in detail, together with the new associated web pages. Improvements to the range of Pfam web tools and the first set of Pfam web services that allow programmatic access to the database and associated tools are also presented. Pfam is available on the web in the UK (http://www.sanger.ac.uk/Software/Pfam/), the USA (http://pfam.wustl.edu/), France (http://pfam.jouy.inra.fr/) and Sweden (http://pfam.cgb.ki.se/).
TL;DR: Guidelines are presented for the identification and annotation of new miRNAs from diverse organisms, particularly so that miRNAs can be reliably distinguished from other RNAs such as small interfering RNAs.
Abstract: MicroRNAs (miRNAs) are small noncoding RNA gene products about 22 nt long that are processed by Dicer from precursors with a characteristic hairpin secondary structure. Guidelines are presented for the identification and annotation of new miRNAs from diverse organisms, particularly so that miRNAs can be reliably distinguished from other RNAs such as small interfering RNAs. We describe specific criteria for the experimental verification of miRNAs, and conventions for naming miRNAs and miRNA genes. Finally, an online clearinghouse for miRNA gene name assignments is provided by the Rfam database of RNA families.