Top 77 PLOS Computational Biology papers published in 2006

Showing papers in "PLOS Computational Biology in 2006"

Journal Article•10.1371/JOURNAL.PCBI.0020140•

Cooperation prevails when individuals adjust their social ties.

Francisco C. Santos¹, Jorge M. Pacheco²,³, Tom Lenaerts⁴•Institutions (4)

Université libre de Bruxelles¹, Harvard University², University of Lisbon³, Vrije Universiteit Brussel⁴

20 Oct 2006-PLOS Computational Biology

TL;DR: In this paper, a computational model is constructed in which individuals are able to self-organize both their strategy and their social ties throughout evolution, based exclusively on their self-interest.

Abstract: Conventional evolutionary game theory predicts that natural selection favours the selfish and strong even though cooperative interactions thrive at all levels of organization in living systems. Recent investigations demonstrated that a limiting factor for the evolution of cooperative interactions is the way in which they are organized, cooperators becoming evolutionarily competitive whenever individuals are constrained to interact with few others along the edges of networks with low average connectivity. Despite this insight, the conundrum of cooperation remains since recent empirical data shows that real networks exhibit typically high average connectivity and associated single-to-broad–scale heterogeneity. Here, a computational model is constructed in which individuals are able to self-organize both their strategy and their social ties throughout evolution, based exclusively on their self-interest. We show that the entangled evolution of individual strategy and network structure constitutes a key mechanism for the sustainability of cooperation in social networks. For a given average connectivity of the population, there is a critical value for the ratio W between the time scales associated with the evolution of strategy and of structure above which cooperators wipe out defectors. Moreover, the emerging social networks exhibit an overall heterogeneity that accounts very well for the diversity of patterns recently found in acquired data on social networks. Finally, heterogeneity is found to become maximal when W reaches its critical value. These results show that simple topological dynamics reflecting the individual capacity for self-organization of social ties can produce realistic networks of high average connectivity with associated single-to-broad–scale heterogeneity. 
On the other hand, they show that cooperation cannot evolve as a result of “social viscosity” alone in heterogeneous networks with high average connectivity, requiring the additional mechanism of topological co-evolution to ensure the survival of cooperative behaviour.

573 citations
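
The entangled dynamics described in the abstract can be illustrated with a minimal, simplified sketch. It assumes a weak prisoner's dilemma (R = 1, T = b, S = P = 0), a Fermi imitation rule, and a rewiring rule in which an individual tied to a defector may redirect that tie to one of the defector's neighbours. With probability 1/(1 + W) a strategy update occurs, otherwise a structural one, so larger W means faster network dynamics. All function names and parameter values are hypothetical; this is not the authors' exact update scheme.

```python
import math
import random

def pd_payoff(s_self, s_other, b):
    """Payoff to s_self against s_other (1 = cooperate, 0 = defect).

    Assumed weak prisoner's dilemma: R = 1, T = b, S = P = 0.
    """
    if s_self == 1:
        return 1.0 if s_other == 1 else 0.0
    return b if s_other == 1 else 0.0

def fitness(i, adj, strat, b):
    """Accumulated payoff of individual i over all of its current social ties."""
    return sum(pd_payoff(strat[i], strat[j], b) for j in adj[i])

def fermi(delta, beta):
    """Fermi function 1 / (1 + exp(-beta * delta)), clamped to avoid overflow."""
    x = -beta * delta
    if x > 700:
        return 0.0
    return 1.0 / (1.0 + math.exp(x))

def random_graph(n, n_edges, rng):
    """Random simple graph with exactly n_edges links, stored as adjacency sets."""
    adj = {i: set() for i in range(n)}
    made = 0
    while made < n_edges:
        a, c = rng.sample(range(n), 2)
        if c not in adj[a]:
            adj[a].add(c)
            adj[c].add(a)
            made += 1
    return adj

def step(adj, strat, W, b, beta, rng):
    """One asynchronous update: strategy update with probability 1/(1+W), else structural."""
    a = rng.randrange(len(strat))
    if not adj[a]:
        return
    c = rng.choice(sorted(adj[a]))
    fa, fc = fitness(a, adj, strat, b), fitness(c, adj, strat, b)
    if rng.random() < 1.0 / (1.0 + W):
        # Strategy update: a imitates c with a probability that grows with c's payoff advantage.
        if rng.random() < fermi(fc - fa, beta):
            strat[a] = strat[c]
    elif strat[c] == 0:
        # Structural update (simplified): a is unhappy with defector c and may redirect
        # the tie to a random neighbour of c; the total number of links is conserved.
        options = sorted(x for x in adj[c] if x != a and x not in adj[a])
        if options and rng.random() < fermi(fa - fc, beta):
            new = rng.choice(options)
            adj[a].discard(c)
            adj[c].discard(a)
            adj[a].add(new)
            adj[new].add(a)

def simulate(n=200, avg_degree=10, W=2.0, b=1.5, beta=1.0, steps=20000, seed=0):
    """Run the co-evolution from random strategies on a random graph."""
    rng = random.Random(seed)
    adj = random_graph(n, n * avg_degree // 2, rng)
    strat = [rng.randint(0, 1) for _ in range(n)]
    for _ in range(steps):
        step(adj, strat, W, b, beta, rng)
    return adj, strat
```

For example, `adj, strat = simulate(W=4.0)` followed by `sum(strat) / len(strat)` gives the final fraction of cooperators; sweeping W is one way to look for a threshold of the kind the abstract reports, though the numbers from this toy model should not be read as the paper's results.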

Journal Article•10.1371/JOURNAL.PCBI.0020144•

Mapping Information Flow in Sensorimotor Networks

Max Lungarella¹, Olaf Sporns²•Institutions (2)

University of Tokyo¹, Indiana University²

27 Oct 2006-PLOS Computational Biology

TL;DR: The results suggest a fundamental link between physical embeddedness and information, highlighting the effects of embodied interactions on internal (neural) information processing, and illuminating the role of various system components on the generation of behavior.

Abstract: Biological organisms continuously select and sample information used by their neural structures for perception and action, and for creating coherent cognitive states guiding their autonomous behavior. Information processing, however, is not solely an internal function of the nervous system. Here we show, instead, how sensorimotor interaction and body morphology can induce statistical regularities and information structure in sensory inputs and within the neural control architecture, and how the flow of information between sensors, neural units, and effectors is actively shaped by the interaction with the environment. We analyze sensory and motor data collected from real and simulated robots and reveal the presence of information structure and directed information flow induced by dynamically coupled sensorimotor activity, including effects of motor outputs on sensory inputs. We find that information structure and information flow in sensorimotor networks (a) are spatially and temporally specific; (b) can be affected by learning; and (c) can be affected by changes in body morphology. Our results suggest a fundamental link between physical embeddedness and information, highlighting the effects of embodied interactions on internal (neural) information processing, and illuminating the role of various system components on the generation of behavior.

347 citations
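
Directed information flow between sensors, neural units and effectors is commonly quantified with transfer entropy, which measures how much the past of a source series reduces uncertainty about the next value of a target beyond what the target's own past explains. The sketch below is a minimal plug-in estimator for discrete series with a history length of 1; the history length, the binary toy signals and the function name are assumptions for illustration, not the paper's analysis settings.

```python
import math
import random
from collections import Counter

def transfer_entropy(source, target):
    """Plug-in estimate of TE(source -> target) in bits, history length 1, discrete symbols."""
    if len(source) != len(target):
        raise ValueError("series must have equal length")
    n = len(target) - 1
    triples = Counter(zip(target[1:], target[:-1], source[:-1]))  # (y_{t+1}, y_t, x_t)
    yy = Counter(zip(target[1:], target[:-1]))                     # (y_{t+1}, y_t)
    yx = Counter(zip(target[:-1], source[:-1]))                    # (y_t, x_t)
    y_counts = Counter(target[:-1])                                # y_t
    te = 0.0
    for (y1, y0, x0), count in triples.items():
        p_full = count / yx[(y0, x0)]             # p(y_{t+1} | y_t, x_t)
        p_self = yy[(y1, y0)] / y_counts[y0]      # p(y_{t+1} | y_t)
        te += (count / n) * math.log2(p_full / p_self)
    return te

if __name__ == "__main__":
    rng = random.Random(1)
    x = [rng.randint(0, 1) for _ in range(20000)]
    y = [0] + x[:-1]  # y copies x with a one-step delay
    print(transfer_entropy(x, y), transfer_entropy(y, x))
```

Because y simply copies x one step later, TE(x→y) should come out close to 1 bit and TE(y→x) close to 0, which is the kind of asymmetry used to infer the direction of information flow.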

Journal Article•10.1371/JOURNAL.PCBI.0020013•

Structure modeling of all identified G protein-coupled receptors in the human genome.

Yang Zhang¹, Mark E. DeVries¹, Jeffrey Skolnick¹•Institutions (1)

University at Buffalo¹

17 Feb 2006-PLOS Computational Biology

TL;DR: Structural clustering of the predicted models shows that GPCRs with similar structures tend to belong to a similar functional class even when their sequences are diverse, which demonstrates the usefulness and robustness of the in silico models for GPCR functional analysis.

Abstract: G protein–coupled receptors (GPCRs), encoded by about 5% of human genes, comprise the largest family of integral membrane proteins and act as cell surface receptors responsible for the transduction of endogenous signal into a cellular response. Although tertiary structural information is crucial for function annotation and drug design, there are few experimentally determined GPCR structures. To address this issue, we employ the recently developed threading assembly refinement (TASSER) method to generate structure predictions for all 907 putative GPCRs in the human genome. Unlike traditional homology modeling approaches, TASSER modeling does not require solved homologous template structures; moreover, it often refines the structures closer to native. These features are essential for the comprehensive modeling of all human GPCRs when close homologous templates are absent. Based on a benchmarked confidence score, approximately 820 predicted models should have the correct folds. The majority of GPCR models share the characteristic seven-transmembrane helix topology, but 45 ORFs are predicted to have different structures. This is due to GPCR fragments that are predominantly from extracellular or intracellular domains as well as database annotation errors. Our preliminary validation includes the automated modeling of bovine rhodopsin, the only solved GPCR in the Protein Data Bank. With homologous templates excluded, the final model built by TASSER has a global Cα root-mean-squared deviation from native of 4.6 Å, with a root-mean-squared deviation in the transmembrane helix region of 2.1 Å. Models of several representative GPCRs are compared with mutagenesis and affinity labeling data, and consistent agreement is demonstrated. Structure clustering of the predicted models shows that GPCRs with similar structures tend to belong to a similar functional class even when their sequences are diverse. 
These results demonstrate the usefulness and robustness of the in silico models for GPCR functional analysis. All predicted GPCR models are freely available for noncommercial users on our Web site (http://www.bioinformatics.buffalo.edu/GPCR).

260 citations
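
The validation figures above (a global Cα RMSD of 4.6 Å to native) are root-mean-squared deviations, which are normally reported after optimally superposing the model onto the reference. As a minimal sketch, assuming the model and native Cα coordinates are already paired residue by residue as N x 3 arrays, the Kabsch superposition followed by the RMSD can be written as follows. This is a generic implementation, not TASSER's own code.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between paired coordinate sets P and Q (N x 3) after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both be N x 3 arrays")
    # Remove translation by centring both sets on their centroids.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # Kabsch: the SVD of the covariance matrix yields the optimal rotation.
    H = P.T @ Q
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against an improper rotation (reflection)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

A rotated and translated copy of a structure gives an RMSD of essentially zero, so any remaining deviation reflects genuine differences in conformation rather than placement in space.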

Journal Article•10.1371/JOURNAL.PCBI.0020143•

The Evolution of Two-Component Systems in Bacteria Reveals Different Strategies for Niche Adaptation

Eric J. Alm, Katherine H. Huang¹, Adam P. Arkin¹,²•Institutions (2)

Lawrence Berkeley National Laboratory¹, University of California, Berkeley²

03 Nov 2006-PLOS Computational Biology

TL;DR: This article analyzed the phylogenetic distribution of nearly 5,000 histidine protein kinases from 207 sequenced prokaryotic genomes and found that many genomes carry a large repertoire of recently evolved signaling genes, which may reflect selective pressure to adapt to new environmental conditions.

Abstract: Two-component systems including histidine protein kinases represent the primary signal transduction paradigm in prokaryotic organisms. To understand how these systems adapt to allow organisms to detect niche-specific signals, we analyzed the phylogenetic distribution of nearly 5,000 histidine protein kinases from 207 sequenced prokaryotic genomes. We found that many genomes carry a large repertoire of recently evolved signaling genes, which may reflect selective pressure to adapt to new environmental conditions. Both lineage-specific gene family expansion and horizontal gene transfer play major roles in the introduction of new histidine kinases into genomes; however, there are differences in how these two evolutionary forces act. Genes imported via horizontal transfer are more likely to retain their original functionality as inferred from a similar complement of signaling domains, while gene family expansion accompanied by domain shuffling appears to be a major source of novel genetic diversity. Family expansion is the dominant source of new histidine kinase genes in the genomes most enriched in signaling proteins, and detailed analysis reveals that divergence in domain structure and changes in expression patterns are hallmarks of recent expansions. Finally, while these two modes of gene acquisition are widespread across bacterial taxa, there are clear species-specific preferences for which mode is used.

227 citations

Journal Article•10.1371/JOURNAL.PCBI.0020036•

Practical Strategies for Discovering Regulatory DNA Sequence Motifs

Kenzie D MacIsaac, Ernest Fraenkel

28 Apr 2006-PLOS Computational Biology

TL;DR: This tutorial reviews computational techniques, termed “motif discovery,” to learn representations of regulatory motifs from sequence data, and discusses the main challenges associated with motif discovery in detail.

Abstract: Many functionally important regions of the genome can be recognized by searching for sequence patterns, or “motifs.” Aside from the genes themselves, examples include CpG islands, often present in promoter regions, and splice sites that denote intron/exon boundaries. Other motifs of great interest correspond to sites bound by regulatory proteins. Differential expression of genes in response to environmental and developmental cues depends on the action of these proteins, which are also known as transcription factors. Identifying the regulatory motifs bound by transcription factors can provide crucial insight into the mechanisms of transcriptional regulation. However, the search for these sites is challenging because a single regulatory protein will often recognize a variety of similar sequences. In this tutorial, we review computational techniques, termed “motif discovery,” to learn representations of regulatory motifs from sequence data. In Figure 1, we present an overview of the basic workflow in a motif discovery analysis and some practical strategies for successfully mining sequence data for biologically important regulatory motifs. In the remainder of this tutorial, we discuss the main challenges associated with motif discovery in detail, and we review recent developments for addressing these challenges. Figure 1 Motif Discovery Workflow

196 citations
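
Because a single transcription factor recognizes a variety of similar sequences, motifs are often represented not as one consensus string but as a position weight matrix (PWM) of per-position log-odds scores. The sketch below builds such a matrix from equal-length aligned sites and scans a sequence with it. The pseudocount of 0.5, the uniform background and the example sites are assumptions chosen for illustration, not values from the tutorial.

```python
import math

ALPHABET = "ACGT"

def build_pwm(sites, pseudocount=0.5, background=None):
    """Log-odds position weight matrix (bits) from equal-length aligned binding sites."""
    background = background or {base: 0.25 for base in ALPHABET}
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for pos in range(width):
        column = [s[pos] for s in sites]
        total = len(column) + pseudocount * len(ALPHABET)
        pwm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background[base])
            for base in ALPHABET
        })
    return pwm

def scan(sequence, pwm):
    """Score every window of the sequence; higher scores look more like the motif."""
    width = len(pwm)
    return [
        sum(pwm[i][sequence[start + i]] for i in range(width))
        for start in range(len(sequence) - width + 1)
    ]
```

For example, a matrix built from `["TGACTCA", "TGACTCA", "TGAGTCA", "TGACTAA"]` scores the window starting at position 4 of `"GGGGTGACTCAGGGG"` highest, while still giving partial credit to the variant sites it was trained on.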

Journal Article•10.1371/JOURNAL.PCBI.0020045•

Statistics of Knots, Geometry of Conformations, and Evolution of Proteins

Rhonald C. Lua¹, Alexander Y. Grosberg¹•Institutions (1)

University of Minnesota¹

19 May 2006-PLOS Computational Biology

TL;DR: It is shown that native conformations of proteins have statistically fewer knots than random compact loops, and that the local geometrical properties, such as the crumpled character of the conformations at a certain range of scales, are consistent with the rarity of knots.

Abstract: Like shoelaces, the backbones of proteins may get entangled and form knots. However, only a few knots in native proteins have been identified so far. To more quantitatively assess the rarity of knots in proteins, we make an explicit comparison between the knotting probabilities in native proteins and in random compact loops. We identify knots in proteins statistically, applying the mathematics of knot invariants to the loops obtained by complementing the protein backbone with an ensemble of random closures, and assigning a certain knot type to a given protein if and only if this knot dominates the closure statistics (which tells us that the knot is determined by the protein and not by a particular method of closure). We also examine the local fractal or geometrical properties of proteins via computational measurements of the end-to-end distance and the degree of interpenetration of its subchains. Although we did identify some rather complex knots, we show that native conformations of proteins have statistically fewer knots than random compact loops, and that the local geometrical properties, such as the crumpled character of the conformations at a certain range of scales, are consistent with the rarity of knots. From these, we may conclude that the known “protein universe” (set of native conformations) avoids knots. However, the precise reason for this is unknown—for instance, if knots were removed by evolution due to their unfavorable effect on protein folding or function or due to some other unidentified property of protein evolution.

185 citations
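
The abstract probes local geometry through the end-to-end distance of subchains. As a minimal sketch, assuming backbone coordinates are given as a list of (x, y, z) tuples, one can compute the mean distance R(s) between residues s apart and fit the slope of log R(s) against log s. For orientation, a rigid rod gives a slope of 1, an ideal random walk a slope near 1/2, and compact crumpled conformations are expected to approach 1/3. The choice of separations and the plain least-squares fit are illustrative assumptions, not the authors' procedure.

```python
import math

def mean_subchain_distance(coords, s):
    """Mean end-to-end distance |r_{i+s} - r_i| over all subchains spanning s bonds."""
    if not 0 < s < len(coords):
        raise ValueError("separation must be between 1 and len(coords) - 1")
    dists = [math.dist(coords[i], coords[i + s]) for i in range(len(coords) - s)]
    return sum(dists) / len(dists)

def scaling_exponent(coords, separations):
    """Least-squares slope of log R(s) against log s over the given separations."""
    xs = [math.log(s) for s in separations]
    ys = [math.log(mean_subchain_distance(coords, s)) for s in separations]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

A straight chain such as `[(float(i), 0.0, 0.0) for i in range(50)]` yields an exponent of 1, which is a convenient sanity check before applying the function to real Cα traces.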

Journal Article•10.1371/JOURNAL.PCBI.0020133•

Phylogenetic reconstruction of orthology, paralogy, and conserved synteny for dog and human.

Leo Goodstadt¹, Chris P. Ponting¹•Institutions (1)

University of Oxford¹

29 Sep 2006-PLOS Computational Biology

TL;DR: PhyOP is a fast and robust approach to orthology prediction that will be applicable to whole genomes from multiple closely related species, and will be particularly useful in predicting orthology for mammalian genomes that have been incompletely sequenced, and for large families of rapidly duplicating genes.

Abstract: Accurate predictions of orthology and paralogy relationships are necessary to infer human molecular function from experiments in model organisms. Previous genome-scale approaches to predicting these relationships have been limited by their use of protein similarity and their failure to take into account multiple splicing events and gene prediction errors. We have developed PhyOP, a new phylogenetic orthology prediction pipeline based on synonymous rate estimates, which accurately predicts orthology and paralogy relationships for transcripts, genes, exons, or genomic segments between closely related genomes. We were able to identify orthologue relationships to human genes for 93% of all dog genes from Ensembl. Among 1:1 orthologues, the alignments covered a median of 97.4% of protein sequences, and 92% of orthologues shared essentially identical gene structures. PhyOP accurately recapitulated genomic maps of conserved synteny. Benchmarking against predictions from Ensembl and Inparanoid showed that PhyOP is more accurate, especially in its predictions of paralogy. Nearly half (46%) of PhyOP paralogy predictions are unique. Using PhyOP to investigate orthologues and paralogues in the human and dog genomes, we found that the human assembly contains 3-fold more gene duplications than the dog. Species-specific duplicate genes, or “in-paralogues,” are generally shorter and have fewer exons than 1:1 orthologues, which is consistent with selective constraints and mutation biases based on the sizes of duplicated genes. In-paralogues have experienced elevated amino acid and synonymous nucleotide substitution rates. Duplicates possess similar biological functions for either the dog or human lineages. Having accounted for 2,954 likely pseudogenes and gene fragments, and after separating 346 erroneously merged genes, we estimated that the human genome encodes a minimum of 19,700 protein-coding genes, similar to the gene count of nematode worms. 
PhyOP is a fast and robust approach to orthology prediction that will be applicable to whole genomes from multiple closely related species. PhyOP will be particularly useful in predicting orthology for mammalian genomes that have been incompletely sequenced, and for large families of rapidly duplicating genes.

176 citations
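
PhyOP builds phylogenies from synonymous-rate (Ks) estimates. The sketch below is a much cruder stand-in, a reciprocal-best-hit rule applied to a precomputed Ks table, shown only to illustrate why a synonymous distance between closely related genomes is a useful orthology signal. The table layout, the gene names and the function name are hypothetical, and this is not the PhyOP pipeline.

```python
def reciprocal_best_pairs(ks):
    """Pairs (a, b) in which each gene is the other's lowest-Ks partner.

    ks maps (gene in species A, gene in species B) -> synonymous distance Ks.
    """
    best_a, best_b = {}, {}
    for (a, b), d in ks.items():
        if a not in best_a or d < ks[(a, best_a[a])]:
            best_a[a] = b
        if b not in best_b or d < ks[(best_b[b], b)]:
            best_b[b] = a
    return sorted((a, b) for a, b in best_a.items() if best_b.get(b) == a)
```

With `{("h1", "d1"): 0.1, ("h1", "d2"): 0.3, ("h2", "d1"): 0.2, ("h2", "d2"): 0.25}` only `("h1", "d1")` is returned: h2's closest partner d1 is already taken by h1, so h2 is left as a candidate in-paralogue rather than being forced into a 1:1 pair.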

Journal Article•10.1371/JOURNAL.PCBI.0020061•

Prioritizing Genomic Drug Targets in Pathogens: Application to Mycobacterium tuberculosis

Samiul Hasan¹, Sabine Daugelat¹, Srinivasa P. S. Rao¹, Mark Schreiber¹•Institutions (1)

Novartis Institute for Tropical Diseases¹

09 Jun 2006-PLOS Computational Biology

TL;DR: A software program that weights and integrates specific properties on the genes in a pathogen so that they may be ranked as drug targets is developed and it is shown that targets can be prioritized by using evolutionary programming to optimize the weights of each desired property.

Abstract: We have developed a software program that weights and integrates specific properties on the genes in a pathogen so that they may be ranked as drug targets. We applied this software to produce three prioritized drug target lists for Mycobacterium tuberculosis, the causative agent of tuberculosis, a disease for which a new drug is desperately needed. Each list is based on an individual criterion. The first list prioritizes metabolic drug targets by the uniqueness of their roles in the M. tuberculosis metabolome (“metabolic chokepoints”) and their similarity to known “druggable” protein classes (i.e., classes whose activity has previously been shown to be modulated by binding a small molecule). The second list prioritizes targets that would specifically impair M. tuberculosis, by weighting heavily those that are closely conserved within the Actinobacteria class but lack close homology to the host and gut flora. M. tuberculosis can survive asymptomatically in its host for many years by adapting to a dormant state referred to as “persistence.” The final list aims to prioritize potential targets involved in maintaining persistence in M. tuberculosis. The rankings of current, candidate, and proposed drug targets are highlighted with respect to these lists. Some features were found to be more accurate than others in prioritizing studied targets. It can also be shown that targets can be prioritized by using evolutionary programming to optimize the weights of each desired property. We demonstrate this approach in prioritizing persistence targets.

160 citations
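
The ranking idea (weight each gene property, sum, and sort) and the idea of tuning the weights so that known targets rise in the list can be sketched as follows. The abstract uses evolutionary programming; the optimizer below is a simpler (1+1) evolution strategy, a related but not identical technique. The property values, gene names and objective (mean rank of known targets) are hypothetical.

```python
import random

def rank_targets(properties, weights):
    """Genes sorted by descending weighted score; ties broken by gene name."""
    score = {g: sum(w * v for w, v in zip(weights, vals)) for g, vals in properties.items()}
    return sorted(score, key=lambda g: (-score[g], g))

def mean_rank(properties, weights, known):
    """Average 0-based rank of the known targets (lower is better)."""
    order = rank_targets(properties, weights)
    return sum(order.index(g) for g in known) / len(known)

def evolve_weights(properties, known, generations=200, sigma=0.1, seed=0):
    """Toy (1+1) evolution strategy: keep a mutated weight vector if known targets rank no worse."""
    rng = random.Random(seed)
    n_props = len(next(iter(properties.values())))
    best = [1.0] * n_props
    best_fit = mean_rank(properties, best, known)
    for _ in range(generations):
        # Gaussian mutation, with weights kept non-negative.
        child = [max(0.0, w + rng.gauss(0.0, sigma)) for w in best]
        fit = mean_rank(properties, child, known)
        if fit <= best_fit:
            best, best_fit = child, fit
    return best, best_fit
```

With properties such as `{"g1": [0.9, 0.1], "g2": [0.2, 0.8], "g3": [0.5, 0.5], "g4": [0.1, 0.2]}` and `"g2"` as the known target, the search can only keep weight vectors that rank g2 at least as well as before, so it drifts toward emphasizing the second property.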

Journal Article•10.1371/JOURNAL.PCBI.0020054•

The G protein-coupled receptor subset of the chicken genome.

Malin C. Lagerström¹, Anders R. Hellström¹, David E. Gloriam¹, Thomas P. Larsson¹, Helgi B. Schiöth¹, Robert Fredriksson¹•Institutions (1)

Uppsala University¹

02 Jun 2006-PLOS Computational Biology

TL;DR: This dataset of chicken GPCRs is the largest curated dataset from a single gene family from a non-mammalian vertebrate, and has high proportions of orthologous pairs, although the percentage of amino acid identity varies.

Abstract: G protein-coupled receptors (GPCRs) are one of the largest families of proteins, and here we scan the recently sequenced chicken genome for GPCRs. We use a homology-based approach, utilizing comparisons with all human GPCRs, to detect and verify chicken GPCRs from translated genomic alignments and Genscan predictions. We present 557 manually curated sequences for GPCRs from the chicken genome, of which 455 were previously not annotated. More than 60% of the chicken Genscan gene predictions with a human ortholog needed curation, which drastically changed the average percentage identity between the human-chicken orthologous pairs (from 56.3% to 72.9%). Of the non-olfactory chicken GPCRs, 79% had a one-to-one orthologous relationship to a human GPCR. The Frizzled, Secretin, and subgroups of the Rhodopsin families have high proportions of orthologous pairs, although the percentage of amino acid identity varies. Other groups show large differences, such as the Adhesion family and GPCRs that bind exogenous ligands. The chicken has only three bitter Taste 2 receptors, and it also lacks an ortholog to human TAS1R2 (one of three GPCRs in the human genome in the Taste 1 receptor family [TAS1R]), implying that the chicken's ability and mode of detecting both bitter and sweet taste may differ from the human's. The chicken genome contains at least 229 olfactory receptors, and the majority of these (218) originate from a chicken-specific expansion. To our knowledge, this dataset of chicken GPCRs is the largest curated dataset from a single gene family from a non-mammalian vertebrate. Both the updated human GPCR dataset and the chicken GPCR dataset are available for download.

125 citations
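
The curation step is summarized by the change in average percentage identity between human-chicken orthologous pairs (from 56.3% to 72.9%). Percent identity can be defined in several ways; the sketch below assumes it is counted over alignment columns in which neither sequence has a gap, and that the average is a simple mean over pairs. Both conventions, and the toy sequences, are assumptions rather than the authors' stated definition.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not columns:
        raise ValueError("no gap-free columns")
    identical = sum(x == y for x, y in columns)
    return 100.0 * identical / len(columns)

def mean_identity(pairs):
    """Average percent identity over a list of (aligned_a, aligned_b) ortholog pairs."""
    return sum(percent_identity(a, b) for a, b in pairs) / len(pairs)
```

For instance, `"MKT-LL"` against `"MRTALL"` has five gap-free columns, four of them identical, giving 80%. A poorly predicted gene model misaligns many columns, which is one reason curation can raise the average so sharply.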

Journal Article•10.1371/JOURNAL.PCBI.0020145•

Identification of the Proliferation/Differentiation Switch in the Cellular Network of Multicellular Organisms

Kai Xia¹, Huiling Xue¹, Dong Dong¹, Shanshan Zhu¹, Jiamu Wang¹, Qingpeng Zhang¹, Lei Hou¹, Hua Chen², Ran Tao², Zheng Huang¹, Zheng Fu¹, Ye-Guang Chen², Jing-Dong J. Han¹•Institutions (2)

Chinese Academy of Sciences¹, Tsinghua University²

24 Nov 2006-PLOS Computational Biology

TL;DR: The results indicate that even at the tissue and organism levels, proliferation and differentiation modules may correspond to two alternative states of the molecular network and may reflect a universal symbiotic relationship in a multicellular organism.

Abstract: The protein–protein interaction networks, or interactome networks, have been shown to have dynamic modular structures, yet the functional connections between and among the modules are less well understood. Here, using a new pipeline to integrate the interactome and the transcriptome, we identified a pair of transcriptionally anticorrelated modules, each consisting of hundreds of genes in multicellular interactome networks across different individuals and populations. The two modules are associated with cellular proliferation and differentiation, respectively. The proliferation module is conserved among eukaryotic organisms, whereas the differentiation module is specific to multicellular organisms. Upon differentiation of various tissues and cell lines from different organisms, the expression of the proliferation module is more uniformly suppressed, while the differentiation module is upregulated in a tissue- and species-specific manner. Our results indicate that even at the tissue and organism levels, proliferation and differentiation modules may correspond to two alternative states of the molecular network and may reflect a universal symbiotic relationship in a multicellular organism. Our analyses further predict that the proteins mediating the interactions between these modules may serve as modulators at the proliferation/differentiation switch.

107 citations
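
The pipeline integrates the interactome with the transcriptome; the sketch below shows only the transcriptome half. Given two candidate gene modules, it averages each module's expression across its genes and computes the Pearson correlation between the two profiles over samples. A strongly negative value indicates transcriptionally anticorrelated modules. The data layout (gene mapped to a list of per-sample values) and the toy genes are hypothetical.

```python
import math
from statistics import fmean

def module_profile(expr, genes):
    """Mean expression of a module in each sample; expr maps gene -> per-sample values."""
    n_samples = len(expr[genes[0]])
    return [fmean(expr[g][k] for g in genes) for k in range(n_samples)]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

For example, with a "proliferation" module whose genes fall from 5.5 to 2.5 across four differentiating samples and a "differentiation" module rising from 0.5 to 3.5, `pearson` of the two module profiles is -1, the idealized version of the switch-like pattern the abstract describes.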

Journal Article•10.1371/JOURNAL.PCBI.0020027•

Folding Very Short Peptides Using Molecular Dynamics

Bosco K. Ho¹, Ken A. Dill¹•Institutions (1)

University of California, San Francisco¹

14 Apr 2006-PLOS Computational Biology

TL;DR: This work simulated 133 peptide 8-mer fragments from six different proteins, sampled by replica-exchange molecular dynamics using Amber7 with a GB/SA (generalized-Born/solvent-accessible electrostatic approximation to water) implicit solvent, and found that 85 of the peptides have no preferred structure, while 48 of them converge to a preferred structure.

Abstract: Peptides often have conformational preferences. We simulated 133 peptide 8-mer fragments from six different proteins, sampled by replica-exchange molecular dynamics using Amber7 with a GB/SA (generalized-Born/solvent-accessible electrostatic approximation to water) implicit solvent. We found that 85 of the peptides have no preferred structure, while 48 of them converge to a preferred structure. In 85% of the converged cases (41 peptides), the structures found by the simulations bear some resemblance to their native structures, based on a coarse-grained backbone description. In particular, all seven of the β hairpins in the native structures contain a fragment in the turn that is highly structured. In the eight cases where the bioinformatics-based I-sites library picks out native-like structures, the present simulations are largely in agreement. Such physics-based modeling may be useful for identifying early nuclei in folding kinetics and for assisting in protein-structure prediction methods that utilize the assembly of peptide fragments.
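
In replica-exchange molecular dynamics, copies of the system run at different temperatures and periodically attempt to exchange configurations; the standard Metropolis rule accepts a swap between replicas i and j with probability min(1, exp[(β_i − β_j)(E_i − E_j)]), where β = 1/(k_B T). The sketch below implements only that acceptance step. It assumes potential energies in kcal/mol, and the temperatures and energies in the example are hypothetical rather than the study's settings.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def swap_probability(e_i, t_i, e_j, t_j):
    """Metropolis acceptance probability for exchanging configurations of replicas i and j."""
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0 else math.exp(delta)

def attempt_swap(energies, temps, i, rng):
    """Try to swap the configurations held at temperatures i and i + 1; return True if accepted."""
    j = i + 1
    if rng.random() < swap_probability(energies[i], temps[i], energies[j], temps[j]):
        # Temperatures stay fixed; the configurations (tracked here by their energies) move.
        energies[i], energies[j] = energies[j], energies[i]
        return True
    return False
```

If the colder replica currently holds the higher-energy configuration, the swap is always accepted, which is what lets low-energy structures found at high temperature migrate down to the temperature of interest.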

Journal Article•10.1371/JOURNAL.PCBI.0020174•

Modularity and dynamics of cellular networks.

Yuan Qi¹, Hui Ge•Institutions (1)

Massachusetts Institute of Technology¹

29 Dec 2006-PLOS Computational Biology

TL;DR: Recent progress on questions about the architecture and dynamics of cellular networks is surveyed, and mammalian cell signaling is used as a case study to discuss how computational analyses of networks shed light on specific biological processes.

Abstract: Understanding how the phenotypes and behaviors of cells are controlled is one of the major challenges in biological research. Traditionally, focus has been given to the characterization of individual genes/proteins or individual interactions during cellular events. However, many phenotypes and behaviors cannot be attributed to isolated components. Rather, they arise from characteristics of cellular networks, which represent connections between molecules in cells. We review the recent progress on analyzing the architecture and dynamics of cellular networks. We also summarize how computational modeling yields insight about cell signaling pathways. The responses of cells to genetic perturbations or environmental cues are controlled by complex networks, including interconnected signaling pathways and cascades of transcriptional programs. The advance of genome technologies has made it possible to analyze cellular events on a global scale. A number of high-throughput techniques, such as DNA microarrays, chromatin immunoprecipitations, and yeast two-hybrid and mass-spectrometry analyses have been applied to cellular systems [1–10]. These experiments have provided first-draft catalogs of essential components, transcriptional regulatory diagrams, and molecular interaction maps for a number of organisms. In addition to providing a candidate list of biomolecules involved in biological processes, the high-throughput technologies offer unprecedented opportunities to derive underlying principles of how complex cellular networks are built and how network architectures contribute to phenotypes. A series of important questions in this area have been addressed recently (Figure 1). For example, what are the characteristics of cellular network structures that distinguish them from randomly generated networks? Are the network structures relevant for biological functions? If so, are they evolutionarily conserved and how do they evolve? 
Are some topological patterns preferred at certain times or conditions? These questions are analogous to those asked in the field of genome sequence analysis, such as identifying biologically relevant sequence motifs and domains, investigating the evolutionary conservation between sequences from different species, and understanding temporal or spatial specificities of regulatory sites. In this paper, we survey recent progress on addressing these questions and use mammalian cell signaling as a case study to discuss how computational analyses of networks shed light on specific biological processes. Figure 1 An Overview of Biological Network Analyses Based on “Omic” Data
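
One standard way to ask what distinguishes a cellular network from randomly generated networks is to compare a statistic of the real network with the same statistic on randomized networks that keep every node's degree. The sketch below does this for the triangle count, using repeated double-edge swaps. The choice of statistic, the number of swaps and the toy ring lattice are assumptions for illustration, not a method from the review.

```python
import random
from collections import Counter

def degree_sequence(edges):
    """Degree of every node in an undirected edge list."""
    return Counter(node for edge in edges for node in edge)

def triangles(edges):
    """Number of triangles in an undirected simple graph."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    # Each triangle is seen once from each of its three edges.
    return sum(len(adj[a] & adj[b]) for a, b in edges) // 3

def degree_preserving_shuffle(edges, n_swaps, seed=0):
    """Randomize a simple graph by double-edge swaps that keep every node's degree."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        # Propose rewiring a-b, c-d into a-d, c-b; reject self-loops and duplicate links.
        if len({a, b, c, d}) < 4 or frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
    return edges
```

A ring lattice in which each of 30 nodes links to its two nearest neighbours on each side contains 30 triangles; comparing that count with the average over many shuffled copies indicates whether the clustering is more than the degree sequence alone would produce.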

Journal Article•10.1371/JOURNAL.PCBI.0020149•

Correction: Emergence of Protein Fold Families through Rational Design

Feng Ding, Nikolay V. Dokholyan

27 Oct 2006-PLOS Computational Biology

TL;DR: The sequence identities of the redesigned proteins using the flexible-backbone design simulation are presented as a function of the backbone-RMSD from the reference protein.

Abstract: In PLoS Computational Biology, volume 2, issue 7: DOI: 10.1371/journal.pcbi.0020085. The references to the figure parts in the legend of Figure 3 were incorrect. The correct caption is as follows: Figure 3. The Sequence Identity for the Constructed Homologous Structures. Three different protein folds are studied: HPR domain (A,D), Rossmann fold (B,E), and SH3 domain (C,F). (A,B,C) The sequence identities of the redesigned proteins using the flexible-backbone design simulation are presented as a function of the backbone-RMSD from the reference protein. (D,E,F) The sequence identity of the core is also plotted against the overall sequence identity. The “twilight zone” of sequence identity (20%–30%) corresponds to regions between horizontal (A,B,C) or vertical (D,E,F) lines.

Journal Article•10.1371/JOURNAL.PCBI.0020053•

Expression-guided in silico evaluation of candidate cis regulatory codes for Drosophila muscle founder cells.

Anthony A. Philippakis¹, Brian W. Busser¹, Stephen S. Gisselbrecht¹, Fangxue Sherry He¹,², Beatriz Estrada¹, Alan D. Michelson¹, Martha L. Bulyk•Institutions (2)

Brigham and Women's Hospital¹, Massachusetts Institute of Technology²

26 May 2006-PLOS Computational Biology

TL;DR: The present analyses suggest that a modified form of this cis regulatory code applies to only a subset of founder cell genes, those whose gene expression responds to specific genetic perturbations in a similar manner to the gene on which the original model was based.

Abstract: While combinatorial models of transcriptional regulation can be inferred for metazoan systems from a priori biological knowledge, validation requires extensive and time-consuming experimental work. Thus, there is a need for computational methods that can evaluate hypothesized cis regulatory codes before the difficult task of experimental verification is undertaken. We have developed a novel computational framework (termed "CodeFinder") that integrates transcription factor binding site and gene expression information to evaluate whether a hypothesized transcriptional regulatory model (TRM; i.e., a set of co-regulating transcription factors) is likely to target a given set of co-expressed genes. Our basic approach is to simultaneously predict cis regulatory modules (CRMs) associated with a given gene set and quantify the enrichment for combinatorial subsets of transcription factor binding site motifs comprising the hypothesized TRM within these predicted CRMs. As a model system, we have examined a TRM experimentally demonstrated to drive the expression of two genes in a sub-population of cells in the developing Drosophila mesoderm, the somatic muscle founder cells. This TRM was previously hypothesized to be a general mode of regulation for genes expressed in this cell population. In contrast, the present analyses suggest that a modified form of this cis regulatory code applies to only a subset of founder cell genes, those whose gene expression responds to specific genetic perturbations in a similar manner to the gene on which the original model was based. We have confirmed this hypothesis by experimentally discovering six (out of 12 tested) new CRMs driving expression in the embryonic mesoderm, four of which drive expression in founder cells.
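
CodeFinder quantifies the enrichment of motif combinations within predicted CRMs, but the abstract does not specify the statistic. A generic and widely used way to ask whether a set of genes is over-represented among predictions is the hypergeometric upper-tail probability, sketched below. This is not necessarily CodeFinder's scoring function; the function name and the toy numbers are hypothetical.

```python
from math import comb

def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    Example reading: population = genes scanned, successes = co-expressed genes,
    draws = genes with a predicted CRM, k = overlap between the two sets.
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible overlaps contribute nothing to the sum.
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / total
```

For example, if 3 of 10 genes are co-expressed and the 3 genes with predicted modules are exactly those 3, the tail probability is 1/120; in practice such p-values would also need correction when many motif combinations are tested.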

Journal Article•10.1371/JOURNAL.PCBI.0020077•

Functional classification using phylogenomic inference.

Duncan Brown¹, Kimmen Sjölander•Institutions (1)

University of California, Berkeley¹

30 Jun 2006-PLOS Computational Biology

TL;DR: Phylogenomic inference of protein (or gene) function attempts to address the question, “What function does this protein perform?” in an evolutionary context by using annotated subfamily groupings to infer function.

Abstract: Phylogenomic inference of protein (or gene) function attempts to address the question, “What function does this protein perform?” in an evolutionary context. As originally outlined by Jonathan Eisen [1–3], phylogenomic inference of protein function is a multistep process involving selection of homologs, multiple sequence alignment (MSA), and phylogenetic tree construction; overlaying annotations on the tree topology; discriminating between orthologs and paralogs; and, finally, inferring the function of a protein based on the orthologs identified by this process and the annotations retrieved. Figure 1 shows an example of using annotated subfamily groupings to infer function, in a manner similar to [1]. One of us, while at Celera Genomics, separately came up with a similar approach for the functional classification of the human genome [4], based on the automated identification of functional subfamilies using the SCI-PHY algorithm and the use of subfamily hidden Markov models (HMMs) to classify novel sequences [5,6]. Our experiences over the past several years in developing computational pipelines for automating phylogenomic inference at the genome scale [7], and the challenges we have faced in this effort, motivate this paper.

Figure 1. Phylogenomic Analysis of Protein Function Using Subfamily Annotation

In practice, phylogenomic inference of gene function is not often used. Far from it. The majority of novel sequences are assigned a putative function through the use of annotation transfer from the top hits in a database search. In our analysis of over 300,000 proteins in the UniProt database, only 3% of proteins with informative annotations (i.e., those not labelled as “hypothetical” or “unknown”) had experimental support for their annotations; 97% were annotated using electronic evidence alone. These annotations are uploaded to GenBank, where they persist even if they are eventually determined to be in error. 
The systematic errors associated with this annotation protocol have been pointed out by numerous investigators over the years [8–10]. The root causes of these errors are as follows:
Gene duplication. This enables protein superfamilies to innovate novel functions on the same structural template, so that the top database hit may have a function distinct from that of the query.
Domain shuffling. Domain fusion and fission events add a further layer of complexity, as a query and a database hit may share only a local region of homology and thus have entirely different molecular functions and structures.
Propagation of existing errors in database annotations. This is particularly pernicious, as existing annotation errors are seldom detected and, even if detected, are not necessarily corrected.
Evolutionary distance. Two proteins can share a common ancestor and domain structure yet have very different functions simply because they occur in very distantly related species.
Phylogenomic analysis, properly applied, avoids these errors and provides a mechanism for detecting existing database annotation errors [3,7]. Why, then, is phylogenomic inference not used more widely? We believe there are four reasons. First, the actual frequency of annotation error is not known, so the gravity of the situation is not recognized. Second, phylogenomic inference is a much more complicated endeavor than a simple database search and requires significantly more expertise and computing resources; it is therefore not easily applied at the genome scale. Third, millions of dollars and years of effort have been poured into developing computational annotation systems that depend on annotation transfer from top database hits, perhaps overlaid with domain prediction methods such as PFAM or the NCBI CDD [11,12]. Fourth, phylogenomic approaches to protein function prediction have arisen only in the last few years, while database search methods have been available for much longer. 
Revolutions do not normally take place overnight. These four reasons result in phylogenomic inference being applied on a one-off basis, for a few protein superfamilies here and there. This may be about to change. A variety of software tools and algorithms enabling phylogenomic inference have been developed in recent years (see Table 1). Some of these methods have based annotation transfer on the identification of orthologs [13–15] or of functional subfamilies [6,16–21]. Other groups have used whole-tree analyses [22–24]. Still other groups employ expert knowledge to define functional subtypes and then develop statistical models to allow users to classify novel sequences [25,26]; these expert system-based approaches are unfortunately limited by the scarcity of experimental data for most protein families.

Table 1. Resources for Phylogenomic Analysis

It is worth examining the assumptions underlying these phylogenomic resources, and phylogenomic inference as a whole.
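
A toy example can illustrate the gene-duplication problem described in the abstract. The sketch below compares two ways of assigning a function to a query. The first simply transfers the annotation of the top database hit. The second takes a majority vote among experimentally supported annotations in the query's subfamily, loosely in the spirit of the subfamily approach shown in Figure 1. The sketch assumes that the query has already been placed in a subfamily, for instance by scoring it against subfamily HMMs. The protein identifiers, scores, functions and function names are all invented for illustration.

```python
from collections import Counter


def top_hit_transfer(hit_scores, annotations):
    """Copy the annotation of the single best-scoring database hit."""
    best = max(hit_scores, key=hit_scores.get)
    return annotations.get(best)


def subfamily_transfer(query_subfamily, subfamily_of, annotations, experimental):
    """Majority function among experimentally supported members of the query's subfamily."""
    votes = Counter(
        annotations[protein]
        for protein, subfamily in subfamily_of.items()
        if subfamily == query_subfamily and protein in experimental and protein in annotations
    )
    return votes.most_common(1)[0][0] if votes else None


# Toy superfamily: a duplication produced two subfamilies with different functions.
annotations = {"P1": "function X", "P2": "function X", "P3": "function Y", "P4": "function Y"}
subfamily_of = {"P1": "SF1", "P2": "SF1", "P3": "SF2", "P4": "SF2"}
experimental = {"P1", "P3"}  # P2 and P4 carry electronic-only annotations
hit_scores = {"P3": 250.0, "P1": 240.0, "P2": 230.0}  # a paralog (P3) is the top hit

print(top_hit_transfer(hit_scores, annotations))                            # function Y
print(subfamily_transfer("SF1", subfamily_of, annotations, experimental))  # function X
```

Here the query belongs to subfamily SF1. The top-hit rule nevertheless inherits the paralog's function, while the subfamily vote does not.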

Journal Article•10.1371/JOURNAL.PCBI.0020127•

Computational model of vascular endothelial growth factor spatial distribution in muscle and pro-angiogenic cell therapy.

Feilim Mac Gabhann¹, James W. Ji¹, Aleksander S. Popel•Institutions (1)

Johns Hopkins University¹

22 Sep 2006-PLOS Computational Biology

TL;DR: A biophysically and molecularly detailed computational model is constructed to study microenvironmental transport of two isoforms of VEGF in rat extensor digitorum longus skeletal muscle under in vivo conditions and results in a platform for the design and evaluation of therapeutic approaches.

Abstract: Members of the vascular endothelial growth factor (VEGF) family of proteins are critical regulators of angiogenesis. VEGF concentration gradients are important for activation and chemotactic guidance of capillary sprouting, but measurement of these gradients in vivo is not currently possible. We have constructed a biophysically and molecularly detailed computational model to study microenvironmental transport of two isoforms of VEGF in rat extensor digitorum longus skeletal muscle under in vivo conditions. Using parameters based on experimental measurements, the model includes: VEGF secretion from muscle fibers; binding to the extracellular matrix; binding to and activation of endothelial cell surface VEGF receptors; and internalization. For 2-D cross sections of tissue, we analyzed predicted VEGF distributions, gradients, and receptor binding. Significant VEGF gradients (up to 12% change in VEGF concentration over 10 µm) were predicted in resting skeletal muscle with uniform VEGF secretion, owing to non-uniform capillary distribution. These relative VEGF gradients were not sensitive to extracellular matrix composition or to the overall VEGF expression level, but were dependent on VEGF receptor density and affinity and on internalization rate parameters. VEGF upregulation in a subset of fibers, simulating transplantation of proangiogenic myoblasts (a possible therapy for ischemic diseases), increased VEGF gradients. The number and relative position of overexpressing fibers determined the VEGF gradients and the distribution of VEGF receptor activation. With total VEGF expression in the tissue unchanged, concentrating overexpression into a small number of adjacent fibers can increase the number of capillaries activated. The VEGF concentration gradients predicted for resting muscle (on average, a 3% change in VEGF per 10 µm) are sufficient for cellular sensing; the tip cell of a vessel sprout is approximately 50 µm long. 
The VEGF gradients also result in heterogeneity in the activation of blood vessel VEGF receptors. This first model of VEGF tissue transport and heterogeneity provides a platform for the design and evaluation of therapeutic approaches.
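
The sensing argument in the abstract depends on a short calculation: a 50 µm tip cell spans five 10 µm intervals. For illustration only, assume the average gradient of 3% per 10 µm holds uniformly along the whole cell; the model itself predicts spatially varying gradients. Under that assumption, the front-to-rear difference the cell experiences is roughly 15 to 16%, depending on whether the percentage is treated as additive or compounding. The function name below is hypothetical.

```python
def change_across_cell(cell_length_um, pct_per_10um, compounded=False):
    """Fractional VEGF difference across a cell, assuming one constant gradient."""
    intervals = cell_length_um / 10.0
    if compounded:
        # constant relative gradient: the same percentage change in every 10 µm step
        return (1 + pct_per_10um / 100) ** intervals - 1
    # linear approximation: the percentages simply add up over the intervals
    return pct_per_10um / 100 * intervals


print(round(change_across_cell(50, 3), 3))                   # 0.15
print(round(change_across_cell(50, 3, compounded=True), 3))  # 0.159
```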

Journal Article•10.1371/JOURNAL.PCBI.0020006•

Seriation in Paleontological Data Using Markov Chain Monte Carlo Methods

Kai Puolamäki¹, Mikael Fortelius², Heikki Mannila¹•Institutions (2)

Helsinki University of Technology¹, University of Helsinki²

10 Feb 2006-PLOS Computational Biology

TL;DR: A full probabilistic model for fossil data that can be used to answer many different questions about the data, including seriation (finding the best ordering of the sites) and outlier detection is described.

Abstract: Given a collection of fossil sites with data about the taxa that occur in each site, the task in biochronology is to find good estimates for the ages or ordering of sites. We describe a full probabilistic model for fossil data. The parameters of the model are natural: the ordering of the sites, the origination and extinction times for each taxon, and the probabilities of different types of errors. We show that the posterior distributions of these parameters can be estimated reliably by using Markov chain Monte Carlo techniques. The posterior distributions of the model parameters can be used to answer many different questions about the data, including seriation (finding the best ordering of the sites) and outlier detection. We demonstrate the usefulness of the model and estimation method on synthetic data and on real data on large late Cenozoic mammals. As an example, for the sites with a large number of occurrences of common genera, our methods give orderings whose correlation with geochronologic ages is 0.95.
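
The paper's model has explicit parameters for site ordering, taxon origination and extinction times, and error probabilities. What follows is a much simpler sketch of the same MCMC idea, not the authors' likelihood. It assumes a binary site-by-taxon occurrence matrix. As the energy it uses the number of gaps inside each taxon's first-to-last occurrence range, a classic seriation criterion. A Metropolis sampler then explores site orderings by swapping pairs of sites. The data and function names are hypothetical.

```python
import math
import random


def holes(matrix, order):
    """Count absences inside each taxon's first-to-last occurrence range."""
    total = 0
    for taxon in range(len(matrix[0])):
        present = [pos for pos, site in enumerate(order) if matrix[site][taxon]]
        if present:
            total += (present[-1] - present[0] + 1) - len(present)
    return total


def metropolis_seriation(matrix, n_iter=20000, beta=2.0, seed=0):
    """Sample orderings with probability proportional to exp(-beta * holes); keep the best."""
    rng = random.Random(seed)
    order = list(range(len(matrix)))
    rng.shuffle(order)
    energy = holes(matrix, order)
    best_order, best_energy = order[:], energy
    for _ in range(n_iter):
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]  # symmetric proposal: swap two sites
        new_energy = holes(matrix, order)
        if new_energy <= energy or rng.random() < math.exp(-beta * (new_energy - energy)):
            energy = new_energy
            if energy < best_energy:
                best_order, best_energy = order[:], energy
        else:
            order[i], order[j] = order[j], order[i]  # reject: undo the swap
    return best_order, best_energy


# Rows are sites (listed out of order), columns are taxa; 1 = taxon recorded at the site.
sites = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
]
order, gaps = metropolis_seriation(sites)
print(order, gaps)
```

An ordering and its reverse receive the same score, so this toy energy alone cannot tell older sites from younger ones.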

Journal Article•10.1371/JOURNAL.PCBI.0020099•

A biocurator perspective: annotation at the Research Collaboratory for Structural Bioinformatics Protein Data Bank.

Kyle Burkhardt, Bohdan Schneider, Jeramia Ory

27 Oct 2006-PLOS Computational Biology

TL;DR: Light is shed on the daily challenges faced by annotators at the RCSB and the reader is given a glimpse at the juggling act that defines the job of a biocurator.

Abstract: Like most scientists, annotators at the Research Collaboratory for Structural Bioinformatics (RCSB) (http://www.pdb.org) dread the immortal cocktail party question “So, what do you do?” Unlike with some jobs, however, their answer can leave other scientists at the party at a loss for a response. Even within the structural biology community, our job is not well understood. Throughout this perspective, we will shed light on the daily challenges faced by annotators at the RCSB and give the reader a glimpse at the juggling act that defines the job of a biocurator.

Journal Article•10.1371/JOURNAL.PCBI.0020012•

Ten Simple Rules for Getting Grants

Philip E. Bourne, Leo M. Chalupa

24 Feb 2006-PLOS Computational Biology

TL;DR: This piece follows an earlier Editorial, “Ten Simple Rules for Getting Published”, and the authors believe the rules presented here are generic, transcending funding institutions and national boundaries.

Abstract: This piece follows an earlier Editorial, “Ten Simple Rules for Getting Published” [1], which has generated significant interest, is well read, and continues to generate a variety of positive comments. That Editorial was aimed at students in the early stages of a life of scientific paper writing. This interest has prompted us to try to help scientists in making the next academic career step: becoming a young principal investigator. Leo Chalupa has joined us in putting together ten simple rules for getting grants, based on our many collective years of writing both successful and unsuccessful grants. While our grant writing efforts have been aimed mainly at United States government funding agencies, we believe the rules presented here are generic, transcending funding institutions and national boundaries. At the present time, US funding rates are frequently below 10% for a given grant program. Today, more than ever, we need all the help we can get in writing successful grant proposals. We hope you find these rules useful in reaching your research career goals.

Journal Article•10.1371/JOURNAL.PCBI.0020141•

From bad to good: Fitness reversals and the ascent of deleterious mutations.

Matthew C. Cowperthwaite¹, James J. Bull¹, Lauren Ancel Meyers¹˒²•Institutions (2)

University of Texas at Austin¹, Santa Fe Institute²

20 Oct 2006-PLOS Computational Biology

TL;DR: In a simulation model of an evolving population of asexually replicating RNA molecules, initially deleterious mutations accumulated at rates nearly equal to those of initially beneficial mutations, without impeding evolutionary progress.

Abstract: Deleterious mutations are considered a major impediment to adaptation, and there are straightforward expectations for the rate at which they accumulate as a function of population size and mutation rate. In a simulation model of an evolving population of asexually replicating RNA molecules, initially deleterious mutations accumulated at rates nearly equal to those of initially beneficial mutations, without impeding evolutionary progress. As the mutation rate was increased within a moderate range, deleterious mutation accumulation and mean fitness improvement both increased. The fixation rates were higher than predicted by many population-genetic models. This seemingly paradoxical result was resolved in part by the observation that, during the time to fixation, the selection coefficient (s) of initially deleterious mutations reversed to confer a selective advantage. Significantly, more than half of the fixations of initially deleterious mutations involved fitness reversals. These fitness reversals had a substantial effect on the total fitness of the genome and thus contributed to its success in the population. Despite the relative importance of fitness reversals, however, the probabilities of fixation for both initially beneficial and initially deleterious mutations were exceedingly small (on the order of 10⁻⁵ of all mutations).
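
The abstract compares observed fixation rates with population-genetic expectations. One standard benchmark is Kimura's diffusion approximation, though it is not necessarily the model the authors compared against. It gives the fixation probability of a single new mutant with selection coefficient s in a haploid Wright-Fisher population of size N as u = (1 - e^(-2s)) / (1 - e^(-2Ns)). The sketch below evaluates this formula in a form that avoids overflow for strongly deleterious mutations. The function name and the example values of N and s are arbitrary.

```python
import math


def fixation_probability(s: float, n: int) -> float:
    """Kimura's approximation for one new mutant in a haploid population of size n."""
    if s == 0:
        return 1.0 / n  # neutral limit
    if s > 0:
        # (1 - e^{-2s}) / (1 - e^{-2ns}); numerator and denominator are both negative here
        return math.expm1(-2 * s) / math.expm1(-2 * n * s)
    a = -s  # deleterious case, rewritten so no exponential can overflow
    return math.exp(-2 * n * a) * math.expm1(2 * a) / -math.expm1(-2 * n * a)


for s in (0.01, 0.0, -0.001, -0.01):
    print(f"s = {s:+.3f}: u = {fixation_probability(s, 1000):.3e}")
```

Even a mildly deleterious mutation (s = -0.01 with N = 1000) has a fixation probability many orders of magnitude below the neutral value 1/N. This benchmark is the kind of expectation that fitness reversals can circumvent.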

Journal Article•10.1371/JOURNAL.PCBI.0020021•

Spatiotemporal Expression Control Correlates with Intragenic Scaffold Matrix Attachment Regions (S/MARs) in Arabidopsis thaliana

Igor V. Tetko, Georg Haberer, Stephen Rudd, Blake C. Meyers¹, Hans-Werner Mewes², Klaus F. X. Mayer•Institutions (2)

Delaware Biotechnology Institute¹, Technische Universität München²

31 Mar 2006-PLOS Computational Biology

TL;DR: It is demonstrated that genes containing intragenic S/MARs are prone to pronounced spatiotemporal expression regulation and this characteristic is found to be even more pronounced for transcription factor genes.

Abstract: Scaffold/matrix attachment regions (S/MARs) are essential for the structural organization of chromatin within the nucleus and serve as anchors of chromatin loop domains. A significant fraction of genes in Arabidopsis thaliana contain intragenic S/MAR elements, and a significant correlation between S/MAR presence and overall expression strength has been demonstrated. In this study, we undertook a genome-scale analysis of expression level and spatiotemporal expression differences in correlation with the presence or absence of genic S/MAR elements. We demonstrate that genes containing intragenic S/MARs are prone to pronounced spatiotemporal expression regulation. This characteristic is found to be even more pronounced for transcription factor genes. Our observations illustrate the importance of S/MARs in transcriptional regulation and the role of chromatin structural characteristics for gene regulation. Our findings open new perspectives for the understanding of tissue- and organ-specific regulation of gene expression.

Journal Article•10.1371/JOURNAL.PCBI.0020158•

Meta-Analysis of Differentiating Mouse Embryonic Stem Cell Gene Expression Kinetics Reveals Early Change of a Small Gene Set

Clive H. Glover¹, Michael Marin¹, Connie J. Eaves¹˒², Cheryl D. Helgason¹˒², James M. Piret¹, Jennifer Bryan¹•Institutions (2)

University of British Columbia¹, BC Cancer Agency²

24 Nov 2006-PLOS Computational Biology

TL;DR: A novel meta-analysis methodology applied to multiple gene expression datasets from three mouse embryonic stem cell lines obtained at specific time points during the course of their differentiation into various lineages identifies a small set of genes whose expression is useful for identifying changes in stem cell frequencies in cultures of mouse ESC.

Abstract: Stem cell differentiation involves critical changes in gene expression. Identification of these should provide endpoints useful for optimizing stem cell propagation as well as potential clues about mechanisms governing stem cell maintenance. Here we describe the results of a new meta-analysis methodology applied to multiple gene expression datasets from three mouse embryonic stem cell (ESC) lines obtained at specific time points during the course of their differentiation into various lineages. We developed methods to identify genes with expression changes that correlated with the altered frequency of functionally defined, undifferentiated ESC in culture. In each dataset, we computed a novel statistical confidence measure for every gene that captured the certainty that a particular gene exhibited an expression pattern of interest within that dataset. This permitted a joint analysis of the datasets, despite the different experimental designs. Using a ranking scheme that favored genes exhibiting patterns of interest, we focused on the top 88 genes whose expression was consistently changed when ESC were induced to differentiate. Seven of these (103728_at, 8430410A17Rik, Klf2, Nr0b1, Sox2, Tcl1, and Zfp42) showed a rapid decrease in expression concurrent with a decrease in the frequency of undifferentiated cells and remained predictive when evaluated in additional maintenance and differentiating protocols. Through a novel meta-analysis, this study identifies a small set of genes whose expression is useful for identifying changes in stem cell frequencies in cultures of mouse ESC. The methods and findings have broader applicability to understanding the regulation of self-renewal of other stem cell types.

Journal Article•10.1371/JOURNAL.PCBI.0020125•

The biocurator: connecting and enhancing scientific data.

Nima Salimi, Randi Vita

27 Oct 2006-PLOS Computational Biology

TL;DR: The present emphasis on expanding computational resources, capable of managing and analyzing complex biological data, presents an ever-growing demand for biocurators capable of interpreting the increasingly complex scientific literature and extracting relevant data in an efficient, yet consistent, manner.

Abstract: From Impressionism and Pop Art to phosphorylation sites and interacting atom pairs, the realm of curation has expanded. The recent growth of bioinformatics, driven by exponentially growing data, advanced computing techniques, and increased funding from private and governmental organizations, has created the need for novel strategies to adequately capture, store, and analyze the multitude of data present in the scientific literature. To meet this challenge, the number and scope of scientific databases have soared in recent years, creating a new profession: the biocurator. Indeed, the present emphasis on expanding computational resources capable of managing and analyzing complex biological data creates an ever-growing demand for biocurators capable of interpreting the increasingly complex scientific literature and extracting relevant data in an efficient, yet consistent, manner.

Journal Article•10.1371/JOURNAL.PCBI.0020142•

Biocurators: contributors to the world of science.

Philip E. Bourne, Johanna McEntyre

27 Oct 2006-PLOS Computational Biology

TL;DR: This October issue pays homage to biocurators through two Perspectives written by biocurators working with different types of biological data: those of the Research Collaboratory for Structural Bioinformatics Protein Data Bank (PDB) and those of the Immune Epitope Database and Analysis Resource (IEDB), a new resource detailing known epitopes and their immunological outcomes.

Abstract: Computational biology is a discipline built upon data (mostly free access), found in biological databases, and knowledge (mostly not free access), found in the literature. So important are these online sources of data that the discipline, and indeed this Journal, simply would not exist without them. Whether we are using the data in “browse mode” (doing a PubMed search or looking up a reaction in an enzymatic pathway) or in “compute mode” (analyzing a large dataset), we usually visit Web sites and download information without a second thought. Since our discipline is so dependent on the availability, extent, and quality of biological data, it is worth taking some time to think about the processes of data accessibility, annotation, and validation. These processes depend very much on biocurators: trained staff who ensure the information you are receiving is as complete and accurate as possible. Biocurators can be considered the museum catalogers of the Internet age: they turn inert and unidentifiable objects (now virtual) into a powerful exhibit at which we can all marvel and from which we can all learn. That would be a decent enough contribution to the world of science, but the task of the biocurator is even more extensive. Computational biologists do not expect to merely walk through the door, cast a casual eye over the exhibit, and exit wiser (although we frequently do); we also want to add our own data to the exhibit, plus pick and choose pieces of it to take home and create new exhibits of our own. Oh, and we would like to do all these things with minimal effort, please. We can be a pretty exacting bunch of customers, and it takes skills over and above a knowledge of biology to juggle the different needs of data submitters, information seekers, and power players. 
In this October issue, we pay homage to these special individuals who are dedicated to making our research endeavors a success. We do so through two Perspectives written by biocurators working with different types of biological data. The first is by biocurators from the Research Collaboratory for Structural Bioinformatics Protein Data Bank (PDB), a well-established biological resource of macromolecular structure data used by more than 10,000 individual scientists per day, and the second by biocurators of the Immune Epitope Database and Analysis Resource (IEDB), a new resource detailing known epitopes and their immunological outcomes. The PDB validates the quality and consistency of primary data submitted by structural biologists as a prerequisite to publication. The IEDB curates the published literature, extracting relevant facts about the epitopes discussed therein. As you read these two Perspectives, similarities and differences between the approaches will emerge. But more than anything, we hope you are struck by the level of professionalism and dedication that goes into helping to make the quality research articles that you read in this Journal and elsewhere. Both articles are told from the perspective of the biocurators themselves. These are only two perspectives; we encourage you to send eLetters with your own perspective on biocuration, whether as a curator of a different type of information, as a person whose information has been curated, or as a consumer of information that has been curated. If you are not moved to comment, at least give a thought to the person upon whose efforts your research may well depend.

Journal Article•10.1371/JOURNAL.PCBI.0020056•

Moving Forward Moving Backward: Directional Sorting of Chemotactic Cells due to Size and Adhesion Differences

Jos Käfer¹, Paulien Hogeweg¹, Athanasius F. M. Marée¹•Institutions (1)

Utrecht University¹

09 Jun 2006-PLOS Computational Biology

TL;DR: A computational study of cell sorting caused by a combination of cell adhesion and chemotaxis, where all cells respond equally to the chemotactic signal is presented, and the occurrence of “absolute negative mobility” is demonstrated.

Abstract: Differential movement of individual cells within tissues is an important yet poorly understood process in biological development. Here we present a computational study of cell sorting caused by a combination of cell adhesion and chemotaxis, where we assume that all cells respond equally to the chemotactic signal. To capture in our model mesoscopic properties of biological cells, such as their size and deformability, we use the Cellular Potts Model, a multiscale, cell-based Monte Carlo model. We demonstrate a rich array of cell-sorting phenomena, which depend on a combination of mesoscopic cell properties and tissue-level constraints. Under the conditions studied, cell sorting is a fast process, which scales linearly with tissue size. We demonstrate the occurrence of “absolute negative mobility”, which means that cells may move in the direction opposite to the applied force (here chemotaxis). Moreover, during the sorting, cells may even reverse the direction of motion. Another interesting phenomenon is “minority sorting”, where the direction of movement does not depend on cell type, but on the frequency of the cell type in the tissue. A special case is the cAMP-wave-driven chemotaxis of Dictyostelium cells, which generates pressure waves that guide the sorting. The mechanisms we describe can easily be overlooked in studies of differential cell movement; hence, certain experimental observations may be misinterpreted.
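
The abstract names the Cellular Potts Model (CPM) as its modeling framework. Below is a minimal sketch of the CPM update rule, built on two assumptions. The first is the textbook form of the energy: an adhesion term J summed over unlike neighboring lattice sites, plus an area constraint λ(a - A)² for each cell. The second is a common form of chemotaxis term, which lowers the energy of copies that extend a cell toward higher concentration. The parameter values, cell types, lattice and function names are hypothetical. The energy is recomputed in full at each step for clarity rather than speed. This sketch does not reproduce the paper's sorting results.

```python
import math
import random

NEIGHBORS = ((0, 1), (1, 0), (0, -1), (-1, 0))


def hamiltonian(lattice, cell_type, J, lam, target_area):
    """Adhesion energy over unlike neighbor pairs plus an area constraint per cell."""
    rows, cols = len(lattice), len(lattice[0])
    energy, areas = 0.0, {}
    for r in range(rows):
        for c in range(cols):
            s = lattice[r][c]
            areas[s] = areas.get(s, 0) + 1
            for dr, dc in ((0, 1), (1, 0)):  # each bond counted once, no wrap-around
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols and lattice[rr][cc] != s:
                    key = tuple(sorted((cell_type[s], cell_type[lattice[rr][cc]])))
                    energy += J[key]
    for s in cell_type:
        if s != 0:  # id 0 is the medium, which has no area constraint
            energy += lam * (areas.get(s, 0) - target_area) ** 2
    return energy


def copy_attempt(lattice, cell_type, J, lam, target_area, conc, mu, temperature, rng):
    """One Metropolis copy attempt; returns True if the lattice changed."""
    rows, cols = len(lattice), len(lattice[0])
    r, c = rng.randrange(rows), rng.randrange(cols)
    dr, dc = rng.choice(NEIGHBORS)
    rs, cs = r + dr, c + dc
    if not (0 <= rs < rows and 0 <= cs < cols):
        return False
    source, target = lattice[rs][cs], lattice[r][c]
    if source == target:
        return False
    before = hamiltonian(lattice, cell_type, J, lam, target_area)
    lattice[r][c] = source
    delta = hamiltonian(lattice, cell_type, J, lam, target_area) - before
    if source != 0:
        # chemotaxis bias: extending a cell toward higher concentration lowers the energy
        delta -= mu * (conc[r][c] - conc[rs][cs])
    if delta <= 0 or rng.random() < math.exp(-delta / temperature):
        return True
    lattice[r][c] = target  # reject: restore the original owner
    return False


J = {("A", "A"): 1.0, ("A", "M"): 2.0, ("M", "M"): 0.0}
cell_type = {0: "M", 1: "A"}
lattice = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
conc = [[float(c) for c in range(4)] for _ in range(4)]  # concentration rises to the right
rng = random.Random(1)
for _ in range(500):
    copy_attempt(lattice, cell_type, J, lam=1.0, target_area=4, conc=conc,
                 mu=1.0, temperature=1.0, rng=rng)
print(lattice)
```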

Journal Article•10.1371/JOURNAL.PCBI.0020028•

Correction: New Maximum Likelihood Estimators for Eukaryotic Intron Evolution

Hung D. Nguyen, Maki Yoshihama, Naoya Kenmochi

01 Mar 2006-PLOS Computational Biology

TL;DR: The URL provided for the GPCR model database in the published article is no longer active; the database is now located at http://cssb.biology.gatech.edu/skolnick/files/gpcr/gpcr.html.

Abstract: Correction: Structure Modeling of All Identified G Protein–Coupled Receptors in the Human Genome Yang Zhang, Mark E. DeVries, Jeffrey Skolnick DOI: 10.1371/journal.pcbi.0020013 In PLoS Computational Biology, volume 2, issue 2: The URL provided for the GPCR model database in the published article is no longer active. The database is now located at http://cssb.biology.gatech.edu/skolnick/files/gpcr/gpcr.html.

Journal Article•10.1371/JOURNAL.PCBI.0020091•

The Ion Channel Inverse Problem: Neuroinformatics Meets Biophysics

Robert C. Cannon, Giampaolo D'Alessandro

25 Aug 2006-PLOS Computational Biology

TL;DR: The current state of channel modeling is reviewed and the developments needed for its conclusions to be integrated into whole-cell modeling are explored.

Abstract: Ion channels are the building blocks of the information processing capability of neurons: any realistic computational model of a neuron must include reliable and effective ion channel components. Sophisticated statistical and computational tools have been developed to study the ion channel structure–function relationship, but this work is rarely incorporated into the models used for single neurons or small networks. The disjunction is partly a matter of convention. Structure–function studies typically use a single Markov model for the whole channel, whereas until recently whole-cell modeling software has focused on serial, independent, two-state subunits that can be represented by the Hodgkin–Huxley equations. More fundamentally, there is a difference in purpose that prevents models from being easily reused. Biophysical models are typically developed to study one particular aspect of channel gating in detail, whereas neural modelers require broad coverage of the entire range of channel behavior, which is often best achieved with approximate representations that omit structural features that cannot be adequately constrained. Bridging the gap so that more recent channel data can be used in neural models requires new computational infrastructure for bringing together diverse sources of data to arrive at best-fit models for whole-cell modeling. We review the current state of channel modeling and explore the developments needed for its conclusions to be integrated into whole-cell modeling.
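
The contrast between Hodgkin–Huxley style subunits and whole-channel Markov models can be made concrete. Assume n identical, independent two-state subunits with opening rate α and closing rate β at a fixed voltage; the values below are arbitrary. The Hodgkin–Huxley gating variable then relaxes to p∞ = α/(α + β), and the channel is open with probability p∞ⁿ. The equivalent (n+1)-state Markov scheme, whose states count open subunits, gives the same occupancy for the fully open state. A structure–function Markov model is free to drop the independence assumption, and that is where the two kinds of model can diverge. The function names are hypothetical.

```python
import math


def gate_value(p0, alpha, beta, t):
    """Exact solution of dp/dt = alpha * (1 - p) - beta * p for constant rates."""
    p_inf = alpha / (alpha + beta)
    return p_inf + (p0 - p_inf) * math.exp(-(alpha + beta) * t)


def hh_open_probability(alpha, beta, n):
    """Steady-state open probability for n independent two-state subunits."""
    return (alpha / (alpha + beta)) ** n


def markov_steady_state(alpha, beta, n):
    """Steady state of the (n+1)-state scheme; state k = number of open subunits.

    Transitions k -> k+1 occur at rate (n - k) * alpha and k+1 -> k at rate
    (k + 1) * beta, so detailed balance gives each occupancy from the previous one.
    """
    weights = [1.0]
    for k in range(n):
        weights.append(weights[-1] * (n - k) * alpha / ((k + 1) * beta))
    total = sum(weights)
    return [w / total for w in weights]


alpha, beta, n = 3.0, 1.0, 4
print(hh_open_probability(alpha, beta, n))      # 0.31640625
print(markov_steady_state(alpha, beta, n)[-1])  # same value
print(gate_value(0.0, alpha, beta, 10.0))       # close to p_inf = 0.75
```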

Journal Article•10.1371/JOURNAL.PCBI.0020042•

Designing a Nanotube Using Naturally Occurring Protein Building Blocks

Chung-Jung Tsai¹, Jie Zheng¹, Ruth Nussinov¹˒²•Institutions (2)

Science Applications International Corporation¹, Tel Aviv University²

28 Apr 2006-PLOS Computational Biology

TL;DR: This work proposes a strategy for the very first step in protein nanotube design: map the candidate building blocks onto a planar sheet and wrap the sheet around a cylinder with the target dimensions.

Abstract: Here our goal is to carry out nanotube design using naturally occurring protein building blocks. Inspection of the protein structural database reveals the richness of the conformations of proteins, their parts, and their chemistry. Given a target functional protein nanotube geometry, our strategy involves scanning a library of candidate building blocks, combinatorially assembling them into the shape, and testing its stability. Since self-assembly takes place on time scales beyond the reach of computation, here we propose a strategy for the very first step in protein nanotube design: we map the candidate building blocks onto a planar sheet and wrap the sheet around a cylinder with the target dimensions. We provide atomistic models of three nanotubes (two peptide and one protein) for which experimental data exist. The nanotube models can be used to verify a nanostructure observed by low-resolution experiments and to study the mechanism of tube formation.
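
The wrapping step can be pictured as a coordinate transform. The abstract does not detail the authors' procedure, so this is only a minimal geometric sketch. Assume building-block atoms have been laid out on a planar sheet with coordinates (x, y, h): x runs around the future circumference, y runs along the tube axis, and h is the height above the sheet. Wrapping onto a cylinder of circumference C turns x into an angle x/R, with R = C/(2π), and turns h into a radial offset. The function name and this coordinate convention are hypothetical.

```python
import math


def wrap_sheet(points, circumference):
    """Wrap planar sheet coordinates (x, y, h) onto a cylinder whose axis is the y axis."""
    radius = circumference / (2 * math.pi)
    wrapped = []
    for x, y, h in points:
        theta = x / radius  # arc length x along the circumference -> angle
        r = radius + h      # height above the sheet -> radial offset
        wrapped.append((r * math.cos(theta), y, r * math.sin(theta)))
    return wrapped


sheet = [(0.0, 1.0, 0.0), (5.0, 2.0, 0.0), (10.0, 3.0, 1.0)]
for point in wrap_sheet(sheet, circumference=10.0):
    print(tuple(round(v, 3) for v in point))
```

Because x is treated as arc length, distances around the circumference are preserved for atoms at h = 0, and x = C maps back onto x = 0, which closes the tube. Atoms with h ≠ 0 are slightly stretched or compressed. A wrapped model would therefore presumably still need refinement before the stability testing the abstract mentions.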

Journal Article•10.1371/JOURNAL.PCBI.0020121•

Ten Simple Rules for Selecting a Postdoctoral Position

Philip E. Bourne, Iddo Friedberg

24 Nov 2006-PLOS Computational Biology

TL;DR: Here are ten simple rules to help you make the best decisions on a research project and the laboratory in which to carry it out.

Abstract: You are a PhD candidate and your thesis defense is already in sight. You have decided you would like to continue with a postdoctoral position rather than moving into industry as the next step in your career (that decision should be the subject of another “Ten Simple Rules”). Further, you already have ideas for the type of research you wish to pursue and perhaps some ideas for specific projects. Here are ten simple rules to help you make the best decisions on a research project and the laboratory in which to carry it out.

Journal Article•10.1371/JOURNAL.PCBI.0020047•

Dynamic changes in subgraph preference profiles of crucial transcription factors.

Zhihua Zhang¹, Changning Liu¹, Geir Skogerbø¹, Xiaopeng Zhu¹, Hongchao Lu¹, Lan Chen¹, Baochen Shi¹, Yong Zhang¹, Jie Wang¹, Tao Wu¹, Runsheng Chen¹•Institutions (1)

Chinese Academy of Sciences¹

12 May 2006-PLOS Computational Biology

TL;DR: Profiling the normalized abundances of basic regulatory patterns of individual THubs in the yeast Saccharomyces cerevisiae transcriptional regulation network under five different cellular states and environmental conditions reveals a switching of regulatory pattern preferences, which suggests that a change in conditions elicits not only a change in the response of the regulatory network but also a change in the mechanisms by which the response is mediated.

Abstract: Transcription factors with a large number of target genes (transcription hubs, or THubs) are usually crucial components of the regulatory system of a cell, and the different patterns through which they transfer the transcriptional signal to downstream cascades are of great interest. By profiling normalized abundances (AN) of basic regulatory patterns of individual THubs in the yeast Saccharomyces cerevisiae transcriptional regulation network under five different cellular states and environmental conditions, we have investigated their preferences for different basic regulatory patterns. Subgraph-normalized abundances downstream of individual THubs often differ significantly from those of the network as a whole, and conversely, certain over-represented subgraphs are not preferred by any THub. The THub preferences changed substantially when the cellular or environmental conditions changed. This switching of regulatory pattern preferences suggests that a change in conditions elicits not only a change in the response of the regulatory network but also a change in the mechanisms by which the response is mediated. The THub subgraph preference profile thus provides a novel tool for describing the structure and organization between the large-scale exponents and local regulatory patterns.
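
The sketch below makes "basic regulatory patterns downstream of a THub" concrete. It counts two three-node patterns headed by a hub in a directed regulatory network. The first is the feed-forward loop (hub → X, hub → Y, X → Y). The second is the two-step cascade (hub → X → Y, where the hub does not regulate Y directly). It then divides each count by their total. This fraction is a hypothetical stand-in for the paper's normalized abundance (AN), whose exact definition the abstract does not give. The edge lists and the function name are invented.

```python
from collections import defaultdict


def downstream_profile(edges, hub):
    """Count FFLs and cascades whose top regulator is `hub`, plus their fractions."""
    targets = defaultdict(set)
    for regulator, target in edges:
        targets[regulator].add(target)
    direct = targets[hub] - {hub}
    counts = {"FFL": 0, "cascade": 0}
    for x in direct:
        for y in targets[x]:
            if y in (hub, x):
                continue  # ignore feedback to the hub and self-regulation
            counts["FFL" if y in direct else "cascade"] += 1
    total = sum(counts.values())
    fractions = {k: (v / total if total else 0.0) for k, v in counts.items()}
    return counts, fractions


# Hypothetical active edges (regulator, target) under two conditions.
condition_1 = [("H", "A"), ("H", "B"), ("H", "D"), ("A", "B"), ("A", "C"), ("B", "D")]
condition_2 = [("H", "A"), ("A", "B"), ("A", "C"), ("B", "D")]
print(downstream_profile(condition_1, "H"))
print(downstream_profile(condition_2, "H"))
```

In this toy case, the hub's preferred pattern switches from feed-forward loops to cascades when its direct edges to B and D are inactive. This is a miniature analogue of the condition-dependent switching the abstract reports.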
