InParanoid 7: new algorithms and tools for eukaryotic orthology analysis

doi:10.1093/NAR/GKP931

Open Access • Journal Article

Gabriel Östlund, +7 more

- 01 Jan 2010

- Nucleic Acids Research

- Vol. 38, Iss: 1, pp 196-203

702

TL;DR: A two-pass BLAST approach to homology assignment was developed that uses high-precision compositional score matrix adjustment while avoiding the alignment truncation that sometimes accompanies it.

Abstract: The InParanoid project gathers proteomes of completely sequenced eukaryotic species plus Escherichia coli and calculates pairwise ortholog relationships among them. The new release 7.0 of the database has grown by an order of magnitude over the previous version and now includes 100 species and their collective 1.3 million proteins organized into 42.7 million pairwise ortholog groups. The InParanoid algorithm itself has been revised and is now both more specific and sensitive. Based on results from our recent benchmarking of low-complexity filters in homology assignment, a two-pass BLAST approach was developed that makes use of high-precision compositional score matrix adjustment, but avoids the alignment truncation that sometimes follows. We have also updated the InParanoid web site (http://InParanoid.sbc.su.se). Several features have been added, the response times have been improved and the site now sports a new, clearer look. As the number of ortholog databases has grown, it has become difficult to compare among these resources due to a lack of standardized source data and incompatible representations of ortholog relationships. To facilitate data exchange and comparisons among ortholog databases, we have developed and are making available two XML schemas: SeqXML for the input sequences and OrthoXML for the output ortholog clusters.
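The two-pass idea can be illustrated with a minimal sketch. The abstract does not say how the two passes are combined, so this assumes one plausible reading: the pass run with compositional adjustment decides which protein pairs count as homologs, and the pass run without adjustment supplies the untruncated alignments for those pairs. The `Hit` record, the `two_pass_merge` function, and the `min_bits` cutoff are hypothetical names and values chosen for illustration, not part of InParanoid.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One BLAST hit (hypothetical record for this sketch)."""
    query: str
    subject: str
    bits: float
    aligned_len: int


def two_pass_merge(adjusted_hits, unadjusted_hits, min_bits=40.0):
    """Combine two BLAST passes (assumed scheme, not the published code).

    Pass 1 (with compositional adjustment) decides which pairs are accepted.
    Pass 2 (without adjustment) provides the alignment kept for each pair.
    """
    # Pairs accepted on the basis of the adjusted, higher-precision scores.
    accepted = {
        (h.query, h.subject) for h in adjusted_hits if h.bits >= min_bits
    }

    # For accepted pairs only, keep the best-scoring unadjusted alignment.
    best = {}
    for h in unadjusted_hits:
        key = (h.query, h.subject)
        if key in accepted and (key not in best or h.bits > best[key].bits):
            best[key] = h
    return best


# Toy data: pair (a, c) scores well only without adjustment, so it is rejected.
adjusted = [Hit("a", "b", 55.0, 80), Hit("a", "c", 20.0, 30)]
unadjusted = [Hit("a", "b", 60.0, 200), Hit("a", "c", 45.0, 150)]
result = two_pass_merge(adjusted, unadjusted)
```

In this toy run, only the pair accepted under adjusted scoring survives, but the alignment retained for it is the longer one from the unadjusted pass.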

Citations

Journal Article•10.2174/1574893617666220304201507

A Novel Method for Predicting Essential Proteins by Integrating Multidimensional Biological Attribute Information and Topological Properties

Hanyu Lu, +5 more

- 04 Mar 2022

- Current Bioinformatics

TL;DR: Experimental results show that the proposed EOP (Edge clustering coefficient-Orthologous-Protein) method can achieve satisfactory prediction results, which may provide references for future research.

3

A phylogenomic pipeline and its use inferring lateral gene transfer in the eukaryotic parasite Entamoeba histolytica

Jessica Rain Grant

- 01 Jan 2013

3

Regulation of Tissue-Specific Expression in the C. Elegans Embryo

Joshua Tom Burdick

- 01 Jan 2015

TL;DR: This work asked how well it could “deconvolute” the expression genome-wide in each individual cell, based on expression measurements in overlapping sets of cells, and found that it could estimate the possible range of expression throughout the embryo, using far fewer measurements than there are cells.

3

potential role in vascular networks

Guoxiong Xu, +8 more

- 01 Jan 2013

3

Journal Article•10.3389/fgene.2022.1087294

A disease-related essential protein prediction model based on the transfer neural network

Sisi Chen, +3 more

- 04 Jan 2023

- Frontiers in Genetics

TL;DR: An improved Transfer Neural Network (TNN) is first designed to extract raw features from multiple sources of biological information about proteins; based on this newly constructed network, a novel computational model called TNNM is then designed to infer essential proteins.

2

References

Journal Article•10.1093/NAR/GKL842

NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins

Kim D. Pruitt, +2 more

- 17 Dec 2004

- Nucleic Acids Research

TL;DR: The National Center for Biotechnology Information Reference Sequence (RefSeq) database provides a non-redundant collection of sequences representing genomic data, transcripts and proteins that pragmatically includes sequence data that are currently publicly available in the archival databases.

4.8K

Journal Article•10.2307/2412448

Distinguishing Homologous From Analogous Proteins

Walter M. Fitch

- 01 Jun 1970

- Systematic Biology

TL;DR: This work provides a means by which it is possible to determine whether two groups of related proteins have a common ancestor or are of independent origin, and how many nucleotide positions must differ in the genes encoding the two presumptively homologous proteins.

1.6K

Journal Article•10.1093/NAR/GKL976

The TIGR Rice Genome Annotation Resource: Improvements and New Features

Shu Ouyang, +13 more

- 01 Jan 2007

- Nucleic Acids Research

TL;DR: Through incorporation of multiple transcript and proteomic expression data sets, the Institute for Genomic Research has been able to annotate 24 799 genes (31 739 gene models), representing ∼50% of the total gene models, as expressed in the rice genome.

1.3K

Journal Article•10.1111/J.1742-4658.2005.04945.X

Protein database searches using compositionally adjusted substitution matrices

Stephen F. Altschul, +6 more

- 01 Oct 2005

- FEBS Journal

TL;DR: This work has recently developed a general procedure for transforming a standard matrix into one appropriate for the comparison of two sequences with arbitrary, and possibly differing, compositions.

1.1K