Top 14 papers published in the topic of Biological data in 1995

Journal Article•10.1089/CMB.1995.2.557•

Challenges in integrating biological data sources.

Susan B. Davidson¹, G. Christian Overton, Peter Buneman•Institutions (1)

01 Jan 1995-Journal of Computational Biology

TL;DR: The paper surveys the technical challenges to integration, classifies the approaches, and critiques the available tools and methodologies for countering the increasing dispersion and heterogeneity of data.

Abstract: Scientific data of importance to biologists reside in a number of different data sources, such as GenBank, GSDB, SWISS-PROT, EMBL, and OMIM, among many others. Some of these data sources are conventional databases implemented using database management systems (DBMSs) and others are structured files maintained in a number of different formats (e.g., ASN.1 and ACE). In addition, software packages such as sequence analysis packages (e.g., BLAST and FASTA) produce data and can therefore be viewed as data sources. To counter the increasing dispersion and heterogeneity of data, different approaches to integrating these data sources are appearing throughout the bioinformatics community. This paper surveys the technical challenges to integration, classifies the approaches, and critiques the available tools and methodologies.

218 citations

Other•10.1002/9783527615452.CH4•

Multivariate Data Analysis of Chemical and Biological Data

Rainer Franke, Andreas Gruska, James Devillers, Daniel Chessel, William J. Dunn, Svante Wold, Paul J. Lewi, Martyn G. Ford, David W. Salt, Han van de Waterbeemd, James W. McFarland, Daniel J. Gans

9 Feb 1995

42 citations

Proceedings Article•

DNA sequence assembly and genetic algorithms: new results and puzzling insights.

Rebecca Parsons¹, Mark E. Johnson¹•Institutions (1)

University of Central Florida¹

1 Jan 1995

TL;DR: Significantly improved results in terms of performance, quality of results, and the scaling of applicability have been realized through non-standard and even counter-intuitive parameter settings.

Abstract: Applying genetic algorithms to DNA sequence assembly is not a straightforward process. Significantly improved results in terms of performance, quality of results, and the scaling of applicability have been realized through non-standard and even counter-intuitive parameter settings. Specifically, the solution time for a 10kb data set was reduced by an order of magnitude, and a 20kb data set that was previously unsolved by the genetic algorithm was solved in a time that represents only a linear increase from the 10kb data set. Additionally, significant progress has been made on a 35kb data set representing real biological data. A single contig solution was found for a 752 fragment subset of the data set, and a 15 contig solution was found for the full data set. This paper discusses the new results, the modifications to the previous genetic algorithm used in this study, the experimental design process by which the new results were obtained, the questions raised by these results, and some preliminary attempts to explain these results.

25 citations
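
The abstract does not spell out the genetic algorithm's representation, operators, or the non-standard parameter settings it refers to. Purely as an illustration of the general idea, the Python sketch below assumes a common textbook setup: a candidate layout is a permutation of fragment indices, fitness is the total suffix-prefix overlap between adjacent fragments, and offspring come from order crossover, swap mutation, binary tournament selection, and elitism. All function names, parameter values, and the toy reads are hypothetical and are not taken from the paper.

```python
import random
from typing import List, Sequence, Tuple


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def fitness(order: Sequence[int], frags: Sequence[str]) -> int:
    """Total overlap between adjacent fragments in a candidate layout."""
    return sum(overlap(frags[i], frags[j]) for i, j in zip(order, order[1:]))


def order_crossover(p1: List[int], p2: List[int], rng: random.Random) -> List[int]:
    """OX: copy a random slice of p1, then fill the gaps in p2's order."""
    n = len(p1)
    lo, hi = sorted(rng.sample(range(n), 2))
    child: List[int] = [-1] * n
    child[lo:hi + 1] = p1[lo:hi + 1]
    taken = set(child[lo:hi + 1])
    fill = iter([g for g in p2 if g not in taken])
    for idx in range(n):
        if child[idx] == -1:
            child[idx] = next(fill)
    return child


def swap_mutation(order: List[int], rate: float, rng: random.Random) -> List[int]:
    """With probability `rate`, swap two randomly chosen positions."""
    order = list(order)
    if rng.random() < rate:
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
    return order


def tournament(pop: List[List[int]], frags: Sequence[str], rng: random.Random) -> List[int]:
    """Binary tournament: return the fitter of two random layouts."""
    a, b = rng.sample(pop, 2)
    return a if fitness(a, frags) >= fitness(b, frags) else b


def assemble(frags: Sequence[str], pop_size: int = 30, generations: int = 200,
             mutation_rate: float = 0.2, seed: int = 0) -> Tuple[List[int], int]:
    """Evolve fragment orderings; return the best layout and its total overlap."""
    rng = random.Random(seed)
    n = len(frags)
    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        best = max(pop, key=lambda o: fitness(o, frags))
        nxt = [best]  # elitism: never lose the best layout found so far
        while len(nxt) < pop_size:
            child = order_crossover(tournament(pop, frags, rng),
                                    tournament(pop, frags, rng), rng)
            nxt.append(swap_mutation(child, mutation_rate, rng))
        pop = nxt
    best = max(pop, key=lambda o: fitness(o, frags))
    return best, fitness(best, frags)


def merge(order: Sequence[int], frags: Sequence[str]) -> str:
    """Collapse a layout into a single contig string using the overlaps."""
    contig = frags[order[0]]
    for i, j in zip(order, order[1:]):
        contig += frags[j][overlap(frags[i], frags[j]):]
    return contig


if __name__ == "__main__":
    # Toy reads cut, with overlaps, from the made-up sequence ATGCGTACGTTAGC.
    reads = ["CGTTAGC", "ATGCGTA", "GTACGTT"]
    layout, score = assemble(reads)
    print(layout, score, merge(layout, reads))
```

Exact suffix-prefix matching is a simplification chosen to keep the sketch short; real fragment data contain sequencing errors, so practical overlap scores are usually alignment-based.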

Journal Article•10.1016/0378-1119(95)00636-K•

Analysis of a Bacillus subtilis genome fragment using a co-operative computer system prototype

Claudine Médigue¹, Ivan Moszer¹, Alain Viari², Antoine Danchin¹•Institutions (2)

Pasteur Institute¹, Curie Institute²

01 Jan 1995-Gene

TL;DR: The analysis of a B. subtilis genome fragment made it possible to combine the results of several methods for predicting coding sequences and to characterize the fragment as comprising a cryptic phage, the skin element; it also indicated that local features of the nucleotide sequence could discriminate between phage and non-phage DNA sequences.

23 citations

Journal Article•10.1006/IJHC.1995.1026•

A probabilistic approach to determining biological structure: integrating uncertain data sources

Russ B. Altman¹•Institutions (1)

Stanford University¹

01 Jun 1995-International Journal of Human-computer Studies / International Journal of Man-machine Studies

TL;DR: A Bayesian approach for determining the coordinates of atoms in three-dimensional space is developed, and initial work on extending the algorithm to non-Gaussian constraints is described.

Abstract: Modeling the structure of biological molecules is critical for understanding how these structures perform their function, and for designing compounds to modify or enhance this function (for medicinal or industrial purposes). The determination of molecular structure involves defining three-dimensional positions for each of the constituent atoms using a variety of experimental, theoretical and empirical data sources. Unfortunately, each of these data sources can be noisy or not available in sufficient abundance to determine the precise position of each atom. Instead, some atomic positions are precisely defined by the data, and others are poorly defined. An understanding of structural uncertainty is critical for properly interpreting structural models. We have developed a Bayesian approach for determining the coordinates of atoms in a three-dimensional space. Our algorithm takes as input a set of probabilistic constraints on the coordinates of the atoms, and an a priori distribution for each atom location. The output is a maximum a posteriori (MAP) estimate of the location of each atom. We introduce constraints as updates to the prior distributions. In this paper, we describe the algorithm and show its performance on three data sets. The first data set is synthetic and illustrates the convergence properties of the method. The other data sets comprise real biological data for a protein (the trp repressor molecule) and a nucleic acid (the transfer RNA fold). Finally, we describe how we have begun to extend the algorithm to make it suitable for non-Gaussian constraints.

16 citations
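
The abstract describes constraints introduced as updates to prior distributions, with a MAP estimate as output. In the special case of a Gaussian prior and a single linear constraint with Gaussian noise, that update has a closed form (the same algebra as a Kalman-filter measurement update), and the posterior mean is the MAP estimate. The Python sketch below shows only this special case along one coordinate axis; it is not presented as the paper's algorithm, and the function name, the separation constraint, and the numbers are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def apply_linear_constraint(mean: Vector, cov: Matrix, h: Vector,
                            z: float, r: float) -> Tuple[Vector, Matrix]:
    """Condition the Gaussian prior N(mean, cov) on one noisy linear constraint
    z = h . x + e with e ~ N(0, r). Returns the posterior mean and covariance;
    for a Gaussian posterior the mean is also the MAP estimate."""
    n = len(mean)
    ph = [sum(cov[i][k] * h[k] for k in range(n)) for i in range(n)]  # P h
    s = sum(h[i] * ph[i] for i in range(n)) + r  # variance of the residual
    gain = [p / s for p in ph]  # K = P h / s
    residual = z - sum(h[i] * mean[i] for i in range(n))
    new_mean = [mean[i] + gain[i] * residual for i in range(n)]
    # Posterior covariance P - K (P h)^T; relies on P being symmetric.
    new_cov = [[cov[i][j] - gain[i] * ph[j] for j in range(n)] for i in range(n)]
    return new_mean, new_cov


if __name__ == "__main__":
    # Hypothetical example: x-coordinates of two atoms, independent unit-variance
    # priors centred at 0, and a measured separation x_B - x_A = 3.0 (noise var 1).
    mean, cov = [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]
    mean, cov = apply_linear_constraint(mean, cov, h=[-1.0, 1.0], z=3.0, r=1.0)
    print(mean)  # atoms pushed apart symmetrically
    print(cov)   # variances shrink and the coordinates become correlated
```

In the example the posterior means come out at -1.0 and 1.0, so the estimated separation is 2.0 rather than the measured 3.0: the prior and the noisy constraint are weighted against each other. The posterior variances drop from 1 to 2/3 and the two coordinates acquire a covariance of 1/3, one way of representing how well each position is determined.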

Proceedings Article•

Cooperative computer system for genome sequence analysis

Claudine Médigue¹, Thierry Vermat, Gilles Bisson, Alain Viari, Antoine Danchin•Institutions (1)

Pasteur Institute¹

1 Jan 1995

TL;DR: The paper presents the prototype of a software system that provides an environment for the analysis of large-scale sequence data, together with an overview of the knowledge-based models used to build this integrated system.

Abstract: Analysis of the huge volumes of data generated by large-scale sequencing projects clearly requires the construction of new, sophisticated computer systems. These systems should be able to handle the biological data as well as the results of analyzing these data. They should also help the user choose the most appropriate method for a simple task and string together the methods needed to solve a global analysis task. In this paper we present the prototype of a software system that provides an environment for the analysis of large-scale sequence data. As a first step, this environment has been put to the test within the B. subtilis sequencing project. This system integrates both descriptive knowledge of the entities involved (genes, regulatory signals, etc.) and methodological knowledge concerning an extendable set of analytical methods (i.e. how to solve sequence analysis problems through task decomposition and method selection). A knowledge representation based on two existing object-oriented models, named Shirka and SCARP, is used to implement this integrated system. In addition, the present prototype provides a suitable user interface both for displaying the results generated by several methods and for interacting with the objects. In this paper we present an overview of the knowledge-based models used to build this integrated system, and a description of the way in which biological entities and sequence analysis tasks are represented. We give illustrations of the co-operation between user and system during the problem-solving process. Such a system constitutes a computer workbench for molecular biologists studying the genetic programs of living organisms.

12 citations

Biological assessment methods: controlling the quality of biological data. Package 1: The variability of data used for assessing the biological condition of rivers

Mike T. Furse, R.T. Clarke, J.M. Winder, K.L. Symes, J.H. Blackburn, N.J. Grieve, R.J.M. Gunn

1 Apr 1995

7 citations

Journal Article•10.1093/BIOINFORMATICS/11.4.339•

Issues in incorporating semantic integrity in molecular biological object-oriented databases

Sabine Schweigert, Patrick V.G. Herde¹, Peter R. Sibbald¹•Institutions (1)

European Bioinformatics Institute¹

01 Aug 1995-Bioinformatics

TL;DR: It is concluded that object-oriented technology will support semantic checking even in a complex domain like biology, and 10 guidelines are proposed for future work, including ways of treating exceptional cases and 'positioning' of constraints in a schema.

Abstract: Issues critical to ensuring semantic integrity in molecular biological data collections have been identified and include complexity, exceptions, missing data, changing models, holism and integration, delocalized data, interoperability and nomenclature. This combination is peculiar to biology and presents some interesting problems as a result. Little is known about semantic checking in object-oriented databases in general, but because such technology appears highly suitable for modeling biological data, it is appropriate to examine the ways in which object-oriented technology can support this functionality. It is concluded that object-oriented technology will support semantic checking even in a complex domain like biology. We propose 10 guidelines for future work including ways of treating exceptional cases and 'positioning' of constraints in a schema.

6 citations

Book Chapter•10.1007/3-540-60321-2_15•

Parallel Processing in DNA Analysis

Charles R. Cantor¹, Takeshi Sano¹, Natalie E. Broude¹, Cassandra L. Smith¹•Institutions (1)

Boston University¹

4 Sep 1995

TL;DR: The reader is introduced to the general properties of DNA and the methods currently used to manipulate and study DNA molecules, with emphasis on how DNA properties allow parallel strategies to be used in direct biological experiments and speculation on possible future applications that go beyond purely biological ones.

Abstract: Potential analogies exist between the way computers process and analyze data and the way data is handled in biological systems. Biological data is predominantly in the form of DNA. DNA data is at least two-fold redundant, and it is often multi-fold redundant. Thus it is relatively error resistant. A key aspect of DNA data is that the sequence of DNA bases which is the information stored in DNA also provides a way for the specific purification of DNA subsets. Thus, DNA, in principle, can be handled as very complex mixtures of species with the ability to sort things out afterwards. As a result, DNA-based manipulations can sometimes be formulated into highly parallel strategies. In this paper we will introduce the reader to the general properties of DNA and the currently used methods for manipulating and studying DNA molecules. We will emphasize how DNA properties allow parallel strategies to be used for direct biological experiments, and we will also speculate on possible future applications of such strategies which go beyond pure biological applications.

5 citations

Journal Article•10.1007/BF01074396•

Biological and social determinants of body size across the life span: a model for the integration of population genetics and demography.

Toni P. Miles¹, Christine L. Himes¹•Institutions (1)

Pennsylvania State University¹

01 Sep 1995-Population Research and Policy Review

TL;DR: The biology of adult body size, its behavior as a variable in statistical analyses, and strategies for the incorporation of this variable into demographic models of population aging in the United States are discussed.

Abstract: The accuracy of demographic models designed to project future trends of population-level health and disease can be improved by incorporating biological data. One barrier to this process lies in the quantitative characteristics of the data themselves. Biological data are characteristically time-dependent phenomena that behave in a nonlinear fashion. To develop accurate projections of the morbidity, disability, and mortality experience among future cohorts in late life, research needs to focus on development of models that create the opportunity to distinguish all-or-none, boundaries, and latency aspects of biological factors driving demographic phenomena, development of methods to identify time-dependent effects, and development of genetically informative samples. This presentation focuses on the biology of adult body size, its behavior as a variable in statistical analyses, and strategies for the incorporation of this variable into demographic models of population aging in the United States. First, several examples of generally observed quantitative characteristics of biological variables are reviewed. To illustrate the nonlinear character of biological data, three general patterns of change with aging are presented. Next, issues concerning the measurement of body size are discussed. Scenarios of body size over the adult life span are then described. By the end of this process, recommendations for starting a dialogue between researchers interested in biological endpoints (individual weight change, disease risk) and those interested in demographic outcomes (population-level disease and disability issues) using body size will be presented.

3 citations

Journal Article•10.1007/BF02228814•

Biological systems: Stochastic, deterministic or both

Ferenc Czegledy¹, Jose Katz¹•Institutions (1)

Columbia University¹

01 Jun 1995-Open Systems & Information Dynamics

TL;DR: Many systems in nature, including biological systems, have very complex dynamics that generate random-looking time series, and it is often of interest to determine whether such dynamics arise from deterministic subsystems (e.g. chaotic systems), stochastic subsystems, or both.

Abstract: Many systems in nature, including biological systems, have very complex dynamics which generate random-looking time series. To better understand a particular dynamical system, it is often of interest to determine whether its dynamics arise from deterministic subsystems (e.g. chaotic systems), stochastic subsystems, or both. Although there are now several different approaches to determining this from time series data (e.g. correlation dimension and Lyapunov exponent calculations), these methods often require large amounts of stationary data (biological data are frequently nonstationary over long time scales), can misidentify certain systems, and can be subject to other technical problems. Alternatively, one can use methods that measure the complexity of a system; these seldom make assumptions about the system, such as the presence of stationarity. Additionally, mathematical and computational modeling techniques can be used to test different hypotheses about the dynamics of biological systems.
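
The abstract names correlation-dimension estimates among the standard time-series approaches and notes their appetite for stationary data. As a hedged illustration of that technique only (the authors' own methods are not detailed here), the Python sketch below computes the Grassberger-Procaccia correlation sum C(r) of a delay-embedded series and a log-log slope between two radii, which is the quantity usually read off as a dimension estimate. The embedding dimension, delay, radii, and test signal are arbitrary choices.

```python
import math
from typing import List, Sequence

Point = List[float]


def delay_embed(series: Sequence[float], dim: int, tau: int) -> List[Point]:
    """Delay vectors (x[t], x[t+tau], ..., x[t+(dim-1)*tau])."""
    count = len(series) - (dim - 1) * tau
    return [[series[t + k * tau] for k in range(dim)] for t in range(count)]


def correlation_sum(points: List[Point], r: float) -> float:
    """C(r): fraction of distinct point pairs closer than r (O(n^2) pairs)."""
    n = len(points)
    close = sum(1 for i in range(n) for j in range(i + 1, n)
                if math.dist(points[i], points[j]) < r)
    return 2.0 * close / (n * (n - 1))


def local_slope(points: List[Point], r1: float, r2: float) -> float:
    """Slope of log C(r) against log r between radii r1 < r2; needs C(r1) > 0."""
    c1, c2 = correlation_sum(points, r1), correlation_sum(points, r2)
    return (math.log(c2) - math.log(c1)) / (math.log(r2) - math.log(r1))


if __name__ == "__main__":
    # Illustrative only: a sampled sine wave embedded in 2-D traces a closed
    # curve, so a slope near 1 is expected at radii that are small relative to
    # the curve but large relative to the spacing between samples.
    xs = [math.sin(0.05 * t) for t in range(1000)]
    pts = delay_embed(xs, dim=2, tau=30)
    print(local_slope(pts, 0.1, 0.2))
```

The pairwise loop is quadratic in the number of embedded points, which is one practical reason these estimates are demanding on data; in practice the slope is inspected over a range of radii rather than a single pair.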

Proceedings Article•10.5555/832271.833831•

Case study: using spatial access methods to support the visualization of environmental data

Charles Falkenberg¹, Ravi Kulkarni¹•Institutions (1)

University of Maryland, College Park¹

29 Oct 1995

TL;DR: The spatial indexing features of the Illustra(tm) object-relational database management system are linked with the visualization capabilities of AVS to create an interactive environment for analysis of SEA data.

Abstract: As part of a large effort evaluating the effect of the Exxon Valdez oil spill, we are using the spatial selection features of an object-relational database management system to support the visualization of the ecological data. The effort, called the Sound Ecosystem Assessment project (SEA), is collecting and analyzing oceanographic and biological data from Prince William Sound in Alaska. To support visualization of the SEA data we are building a data management system which includes a spatial index over a bounding polygon for all of the datasets which are collected. In addition to other selection criteria the prototype provides several methods for selecting data within an arbitrary region. This case study presents the requirements and the implementation for the application prototype which combines visualization and database technology. The spatial indexing features of the Illustra(tm) object-relational database management system are linked with the visualization capabilities of AVS to create an interactive environment for analysis of SEA data.
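
The case study selects data within an arbitrary region with help from the DBMS's spatial index. The Python sketch below is a minimal stand-in for that two-step idea, assuming point-located samples: a bounding-box comparison acts as a cheap pre-filter (the job a spatial index does at scale), followed by an exact ray-casting point-in-polygon test. It does not use Illustra or AVS, and the record layout, coordinates, and function names are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def bounding_box(polygon: List[Point]) -> Tuple[float, float, float, float]:
    """Axis-aligned bounding box (min_x, min_y, max_x, max_y) of a polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def point_in_polygon(p: Point, polygon: List[Point]) -> bool:
    """Ray casting: toggle on each edge crossed by a ray from p towards +x."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def select_in_region(samples: List[Dict[str, float]],
                     polygon: List[Point]) -> List[Dict[str, float]]:
    """Cheap bounding-box filter first, exact polygon test second."""
    min_x, min_y, max_x, max_y = bounding_box(polygon)
    candidates = [s for s in samples
                  if min_x <= s["x"] <= max_x and min_y <= s["y"] <= max_y]
    return [s for s in candidates if point_in_polygon((s["x"], s["y"]), polygon)]


if __name__ == "__main__":
    region = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]  # hypothetical triangular region
    stations = [{"id": 1, "x": 1.0, "y": 1.0},   # inside
                {"id": 2, "x": 3.0, "y": 3.0},   # inside the box, outside the triangle
                {"id": 3, "x": 5.0, "y": 5.0}]   # outside the box
    print([s["id"] for s in select_in_region(stations, region)])  # [1]
```

Points lying exactly on the polygon boundary may be classified either way by ray casting; a production system would need an explicit boundary rule.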

A model for the integration of population genetics and demography

Toni P. Miles, Christine L. Himes

1 Jan 1995

Abstract: The accuracy of demographic models designed to project future trends of population-level health and disease can be improved by incorporating biological data. One barrier to this process lies in the quantitative characteristics of the data themselves. Biological data are characteristically time-dependent phenomena that behave in a nonlinear fashion. To develop accurate projections of the morbidity, disability, and mortality experience among future cohorts in late life, research needs to focus on development of models that create the opportunity to distinguish all-or-none, boundaries, and latency aspects of biological factors driving demographic phenomena, development of methods to identify time-dependent effects, and development of genetically informative samples. This presentation focuses on the biology of adult body size, its behavior as a variable in statistical analyses, and strategies for the incorporation of this variable into demographic models of population aging in the United States. First, several examples of generally observed quantitative characteristics of biological variables are reviewed. To illustrate the nonlinear character of biological data, three general patterns of change with aging are presented. Next, issues concerning the measurement of body size are discussed. Scenarios of body size over the adult life span are then described. By the end of this process, recommendations for starting a dialogue between researchers interested in biological endpoints (individual weight change, disease risk) and those interested in demographic outcomes (population-level disease and disability issues) using body size will be presented.

Proceedings Article•

A Data Transformation System for Biological Data Sources

Peter Buneman, Susan B. Davidson¹, Kyle D. Hart¹, G. Christian Overton¹, Limsoon Wong²•Institutions (2)

University of Pennsylvania¹, National University of Singapore²

11 Sep 1995

TL;DR: Techniques for querying and transforming scientific data in structured files maintained in a number of different formats are presented, and their use is illustrated in a prototype system developed in conjunction with the Human Genome Center for Chromosome 22.

Abstract: Scientific data of importance to biologists in the Human Genome Project resides not only in conventional databases, but in structured files maintained in a number of different formats (e.g. ASN.1 and ACE) as well as sequence analysis packages (e.g. BLAST and FASTA). These formats and packages contain a number of data types not found in conventional databases, such as lists and variants, and may be deeply nested. We present in this paper techniques for querying and transforming such data, and illustrate their use in a prototype system developed in conjunction with the Human Genome Center for Chromosome 22. We also describe optimizations performed by the system, a crucial issue for bulk data.
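
The abstract notes that formats such as ASN.1 and ACE contain lists and variants and may be deeply nested, which conventional databases do not model directly. The prototype's own query language is not shown in this listing; as a hedged sketch of one transformation such a system needs, the Python code below flattens a list-valued field into one row per element and dispatches on a tagged variant. The record layout, field names, and accession are invented for illustration and do not follow any real GenBank, ASN.1, or ACE schema.

```python
from typing import Any, Dict, Iterator, List

Record = Dict[str, Any]


def unnest(records: List[Record], field: str) -> Iterator[Record]:
    """Yield one flat row per element of the list-valued `field`, copying the
    parent's other fields into each row (elements are assumed to be dicts)."""
    for rec in records:
        parent = {k: v for k, v in rec.items() if k != field}
        for item in rec.get(field, []):
            yield {**parent, **item}


def render_location(loc: Record) -> str:
    """Dispatch on a tagged variant encoded as {"tag": ..., "value": ...}."""
    tag, value = loc["tag"], loc["value"]
    if tag == "interval":
        return f"{value['start']}..{value['end']}"
    if tag == "point":
        return str(value)
    raise ValueError(f"unknown variant tag: {tag!r}")


# Hypothetical nested entry: a list of features whose locations are variants.
SAMPLE_ENTRIES: List[Record] = [
    {"accession": "X1", "features": [
        {"kind": "gene", "loc": {"tag": "interval", "value": {"start": 10, "end": 90}}},
        {"kind": "site", "loc": {"tag": "point", "value": 42}},
    ]},
]

if __name__ == "__main__":
    for row in unnest(SAMPLE_ENTRIES, "features"):
        print(row["accession"], row["kind"], render_location(row["loc"]))
```

Run as a script, this prints one line per feature ("X1 gene 10..90", then "X1 site 42"). Writing `unnest` as a generator keeps memory flat when the input is large, which loosely echoes the abstract's point that efficiency matters for bulk data.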
