TL;DR: A real-time biological data processing PC card is very lightweight, cost effective, and portable, and it is capable of converting a host personal computer system (27) into a powerful diagnostic instrument.
Abstract: A real-time biological data processing PC card is very lightweight, cost effective, and portable. The card can convert a host personal computer system (27) into a powerful diagnostic instrument. Each real-time biological data processing PC card is adapted to input and process biological data from one or more biological sensors (21), and is interchangeable with other real-time biological data processing PC cards. A practitioner with three different biological data collection devices effectively carries three full-sized, powerful diagnostic instruments. By inserting one of the real-time biological data processing PC cards, the full resources of a host personal computer (27) can be used to turn it into a powerful diagnostic instrument for each biological data collection device.
TL;DR: This paper reviews how the stochastic nature, effective size, and the compartmentalization of genetic networks as well as the information content of gene expression matrices will influence the ability to perform successful reverse engineering.
Abstract: Complementary DNA microarrays and high-density oligonucleotide arrays have opened the opportunity for massively parallel biological data acquisition. Application of these technologies will shift the emphasis in biological research from primary data generation to complex quantitative data analysis. Reverse engineering of time-dependent gene-expression matrices is amongst the first complex tools to be developed. The success of reverse engineering will depend on the quantitative features of the genetic networks and the quality of information we can obtain from biological systems. This paper reviews how the (1) stochastic nature, (2) the effective size, and (3) the compartmentalization of genetic networks as well as (4) the information content of gene expression matrices will influence our ability to perform successful reverse engineering.
TL;DR: A Self-Organising Map (SOM) can be used to search available databases for new leads, to explore new structural domains for a given biological activity, and to study the overlap of two databases.
Abstract: The Self-Organising Map (SOM), also known as the Kohonen Neural Network, is tested as an unsupervised procedure for comparing molecular databases, with each chemical compound represented by a point in the hyperspace of molecular descriptors. The SOM was used to project the multidimensional hyperspace onto a two-dimensional (2D) map that preserves the order of distances between the points, though in a non-linear way. The aim of this work was to apply the SOM to the study of the overlap of two databases in order to obtain information about the extent of their differences in molecular diversity. First, the ability of the SOM to discriminate between two virtual databases was investigated. The positions of these two virtual databases were varied from non-overlapping to overlapping. In all cases considered, all the individuals of the two databases were processed simultaneously to give one SOM. From this map it is possible to analyse and understand the structure of the original data. Second, two chemical databases were compared. The first deals with commercially available organophosphorus pesticides (OPC); the second deals with more than two thousand OPC tested as potent pesticides. Given the biological data known for each compound, the second database was shown to bring an interesting supplement to the structural information nested in the first database, taken as a reference. Furthermore, the results indicate that the SOM can be used to search available databases for new leads and to explore new structural domains for a given biological activity.
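The SOM procedure described above (project descriptor vectors onto a 2D grid, then look at which map cells each database occupies) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the grid size, the linearly decaying learning rate and Gaussian neighbourhood, and the two synthetic "virtual databases" (`db_a`, `db_b`) are all hypothetical choices made for the example.

```python
import math
import random


def best_matching_unit(x, weights):
    """Return the grid node whose weight vector is closest (Euclidean) to x."""
    return min(weights, key=lambda node: sum((a - b) ** 2 for a, b in zip(x, weights[node])))


def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols Kohonen map; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Initialise every node with a copy of a randomly chosen input vector.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            # Learning rate and neighbourhood radius shrink linearly over training.
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)
            sigma = max(sigma0 * (1.0 - frac), 0.5)
            bmu = best_matching_unit(x, weights)
            for node, w in weights.items():
                # Gaussian neighbourhood measured on the 2D grid, not in descriptor space.
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


def occupancy(weights, labelled):
    """Map each (label, vector) pair to its best-matching unit; return {node: set of labels}."""
    cells = {}
    for label, x in labelled:
        cells.setdefault(best_matching_unit(x, weights), set()).add(label)
    return cells


# Two hypothetical, well-separated "virtual databases" in a 2D descriptor space,
# processed together to give one map, as in the first experiment above.
gen = random.Random(1)
db_a = [("A", [gen.gauss(0.0, 0.1), gen.gauss(0.0, 0.1)]) for _ in range(30)]
db_b = [("B", [gen.gauss(3.0, 0.1), gen.gauss(3.0, 0.1)]) for _ in range(30)]
som = train_som([x for _, x in db_a + db_b], rows=4, cols=4)
cells = occupancy(som, db_a + db_b)
shared = [node for node, labels in cells.items() if len(labels) > 1]
print(len(shared), "map cells are shared by both databases")
```

Cells that receive compounds from both databases indicate overlapping regions of chemical space; moving the synthetic clusters closer together should increase the number of shared cells.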
TL;DR: The dimensionality-reduction capabilities of Kohonen Neural Networks (KNN) are presented and compared with those of Principal Component Analysis (PCA) and Hierarchical Cluster Analysis (HCA).
Abstract: Automated data classification is an indispensable tool in drug design. It allows one to select homogeneous training sets or to distinguish compounds with required biological properties. Kohonen Neural Networks (KNN) offer new means of classifying biologically interesting compounds. In this paper, the capabilities of KNN in data dimensionality reduction are first presented and compared with those of Principal Component Analysis (PCA) and Hierarchical Cluster Analysis (HCA). The advantages of KNN become evident as data dimensionality and training-set size increase. Then, new methods are suggested to evaluate the quality of KNN models. Finally, a case study on chemical and biological data is presented. The database studied includes more than 2000 potent organophosphorus pesticides. The Kohonen maps obtained allow compounds with different biological behaviour to be distinguished.
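The abstract does not say which quality criteria the new methods use. As an assumption, the sketch below shows two widely used SOM quality measures instead: quantization error (how closely the map's nodes fit the data) and topographic error (how well the map preserves neighbourhoods). The map is represented as a hypothetical `{(row, col): weight vector}` dictionary.

```python
import math


def _dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantization_error(data, weights):
    """Mean Euclidean distance from each sample to its best-matching unit (lower = tighter fit)."""
    return sum(min(math.sqrt(_dist2(x, w)) for w in weights.values()) for x in data) / len(data)


def topographic_error(data, weights):
    """Fraction of samples whose two closest units are not adjacent on the grid (lower = better ordered)."""
    errors = 0
    for x in data:
        first, second = sorted(weights, key=lambda node: _dist2(x, weights[node]))[:2]
        # 4-neighbourhood on a rectangular grid: adjacent units differ by 1 in exactly one coordinate.
        if abs(first[0] - second[0]) + abs(first[1] - second[1]) != 1:
            errors += 1
    return errors / len(data)


# A 1x3 map that is correctly ordered along the data, and one whose last two nodes are swapped.
ordered = {(0, 0): [0.0], (0, 1): [1.0], (0, 2): [2.0]}
twisted = {(0, 0): [0.0], (0, 1): [2.0], (0, 2): [1.0]}
samples = [[0.1], [1.1], [1.9]]
print(quantization_error(samples, ordered))
print(topographic_error(samples, ordered), topographic_error(samples, twisted))
```

Both maps fit the samples equally well, so quantization error alone cannot tell them apart; the topographic error exposes the twisted map.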
TL;DR: In this article, the authors propose a system that continuously manages the health condition of a person to be examined, without requiring the person's particular attention in daily life, by comparing currently measured biological data with stored past biological data, judging the person's health condition from the comparison, and presenting the result.
Abstract: PROBLEM TO BE SOLVED: To continuously manage the health condition of a person to be examined, without requiring the person's particular attention in daily life, by providing presentation means that judge the person's health condition from the result of comparing currently measured biological data with biological data stored in the past, and that present the judged result. SOLUTION: When a person 11 to be examined appears in front of a washstand 12, a load measuring part 14 detects the load applied to the floor; the presence of the person 11 is thereby detected and reported to a measurement start/end discriminating part 18, which outputs an operation start instruction to the respective components. The load and biological data measuring parts 14 and 15 then start measuring the biological data of the person 11, and the measured biological data are collected in a data collecting part 1 and stored in a data storage part 20. The present and past biological data output from the data collecting part are then compared, a data discriminating part 21 judges the health condition of the person, and a discriminated result output part 22 sends the result to image and sound presenting parts 23 and 24, which present it to the person 11. COPYRIGHT: (C)1999,JPO
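The comparison step (present measurements against the person's stored past measurements, then a judged result for presentation) can be sketched as follows. The patent abstract does not give the comparison rule; the "more than `z_limit` standard deviations from the personal baseline" rule, the measurement names, and the sample values are all hypothetical.

```python
from statistics import mean, stdev


def discriminate(current, history, z_limit=2.0):
    """Judge each current measurement against the same person's stored past values.

    current: {measurement name: today's value}
    history: {measurement name: list of past values}
    Hypothetical rule: flag a value lying more than z_limit sample standard
    deviations from the personal baseline.
    """
    judged = {}
    for name, value in current.items():
        past = history.get(name, [])
        if len(past) < 2:  # stdev needs at least two past values
            judged[name] = "insufficient data"
            continue
        baseline, spread = mean(past), stdev(past)
        deviation = value - baseline
        if abs(deviation) <= z_limit * spread:
            judged[name] = "normal"
        else:
            judged[name] = "high" if deviation > 0 else "low"
    return judged


def present(judged):
    """Stand-in for the image/sound presenting parts: one readable line per measurement."""
    return [f"{name}: {status}" for name, status in sorted(judged.items())]


history = {"weight_kg": [60.0, 60.5, 59.5, 60.0], "pulse_bpm": [68, 72, 70, 70], "temp_c": [36.5]}
today = {"weight_kg": 62.0, "pulse_bpm": 70, "temp_c": 36.6}
for line in present(discriminate(today, history)):
    print(line)
```

Judging against the person's own history, rather than a population norm, matches the abstract's emphasis on comparing present and past data for the same individual.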
TL;DR: MUSCA is a two-stage approach to the alignment problem that identifies two relatively simpler sub-problems whose solutions are used to obtain the alignment of the sequences; it introduces the notion of an alignment number K (2 ≤ K ≤ N), a user-controlled parameter that lends useful flexibility to the aligning program.
Abstract: Given a set of N sequences, the Multiple Sequence Alignment problem is to align these N sequences, possibly with gaps, in a way that brings out their best commonality. MUSCA is a two-stage approach to the alignment problem that identifies two relatively simpler sub-problems whose solutions are used to obtain the alignment of the sequences. We first discover motifs in the N sequences and then extract an appropriate subset of compatible motifs to obtain a good alignment. The motifs of interest to us are the irredundant motifs, whose number is only polynomial in the input size; in practice, however, the number is much smaller (sub-linear). This step aids in a direct N-wise alignment, as opposed to composing the alignment from lower-order (say, pairwise) alignments, and the solution is also independent of the order of the input sequences; hence the algorithm works very well when dealing with a large number of sequences. The second part of the problem, obtaining a good alignment, is solved using a graph-theoretic approach that computes an induced subgraph satisfying certain simple constraints. We reduce a version of this problem to an instance of the set covering problem, and thus offer the best possible approximate solution to the problem (provided P ≠ NP). Our experimental results, while preliminary, indicate that this approach is efficient, particularly on large numbers of long sequences, and gives good alignments when tested on biological data such as DNA and protein sequences. We introduce the notion of an alignment number K (2 ≤ K ≤ N), a user-controlled parameter that lends useful flexibility to the aligning program: it constrains the alignment to have at least K sequences agree on a character, whenever possible.
The usefulness of the alignment number is corroborated by users, who view it as a natural constraint when dealing with a large number of sequences.
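The abstract reduces part of the problem to set cover. The standard greedy rule for set cover (repeatedly pick the subset covering the most still-uncovered elements) achieves a logarithmic approximation ratio, which is essentially the best possible in polynomial time unless P = NP. The sketch below shows that generic greedy rule only. The toy instance (motifs `m1` to `m4` "covering" sequences `s1` to `s5`) is hypothetical and is not MUSCA's actual reduction, which the abstract does not spell out.

```python
def greedy_set_cover(universe, subsets):
    """Greedy set cover: repeatedly choose the subset covering the most uncovered elements.

    universe: set of elements that must be covered
    subsets:  {name: set of elements}
    Returns the chosen subset names in selection order; ties go to the earliest name.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(subsets, key=lambda name: len(subsets[name] & uncovered))
        gain = subsets[best] & uncovered
        if not gain:
            raise ValueError("the universe cannot be covered by the given subsets")
        chosen.append(best)
        uncovered -= gain
    return chosen


# Hypothetical toy instance: which motifs together touch every sequence?
sequences = {"s1", "s2", "s3", "s4", "s5"}
motif_hits = {"m1": {"s1", "s2", "s3"}, "m2": {"s2", "s4"}, "m3": {"s4", "s5"}, "m4": {"s3"}}
print(greedy_set_cover(sequences, motif_hits))
```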
TL;DR: The new wealth of biological data generated by ongoing genome projects is being used to develop database tools for biologists; these data can then be interpreted from many viewpoints, from molecular interactions to interactions among organisms.
TL;DR: This work defines a Data Learning Process (DLP), a formalization aimed at facilitating knowledge discovery in biological databases, which comprises a series of steps for comprehension of biological data within the bioinformatics framework.
Abstract: The four most important data-related considerations for the bioinformatic analysis of biological systems are understanding of: the complexity and hierarchical nature of processes that generate biological data, fuzziness of biological data, biases and potential misconceptions in data, and the effects of noise and errors. We discuss these issues and summarize our findings by defining a Data Learning Process (DLP). DLP comprises a series of steps for comprehension of biological data within the bioinformatics framework. DLP is a formalization aimed at facilitating knowledge discovery in biological databases.
TL;DR: This work shows how prior domain knowledge can be used in a system for mining databases of biological data, and demonstrates that the patterns derived by this fully automated system compete well with semi-manually constructed patterns.
Abstract: We show how prior domain knowledge can be used in a system for mining databases of biological data. Our system performs automated discovery of diagnostic patterns from a database of protein sequences. Such patterns are used for classification of new sequences, and identification of biologically interesting positions in the proteins. The patterns have a simple syntax and can be translated into regular expressions, which can be used for rapid scanning of databases. Current pattern libraries are built semi-manually, since the correctness of the pattern depends on the incorporation of domain knowledge. Due to the dramatic growth of the databases it is desirable to automate this process. Our results show that the patterns derived by our fully automated system compete well with the semi-manually constructed patterns.
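The abstract notes that the patterns "can be translated into regular expressions" for rapid database scanning. Assuming a PROSITE-like pattern syntax (the widely used format for protein sequence patterns; the abstract itself says only that the syntax is simple), a minimal translator might look like the following. It covers single residues, `x`, `[...]`, `{...}`, repeat counts, and the `<` and `>` terminal anchors on whole elements; it does not handle `>` inside brackets.

```python
import re

# One pattern element: a core (residue, x, [..] or {..}) with an optional (n) or (n,m) repeat.
_ELEMENT = re.compile(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]   # anchored at the N-terminus
        if element.endswith(">"):
            suffix, element = "$", element[:-1]  # anchored at the C-terminus
        core, low, high = _ELEMENT.fullmatch(element).groups()
        if core == "x":
            regex = "."                          # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"      # {P} means "anything except P"
        else:
            regex = core                         # a residue letter, or [..] which is already a class
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + regex + suffix)
    return "".join(parts)


rx = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC].")
print(rx)
print(re.search(rx, "AACAACGGGLAA") is not None)
```

Once compiled, the same expression can be run over every sequence in a database with `re.search`, which is the "rapid scanning" use the abstract describes.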
TL;DR: This chapter describes the use of graphical probability models (Bayes' networks) for modeling biological data, such as proteins and DNA sequences, which allow biologists to postulate generative processes that describe biological structures.
Abstract: This chapter describes the use of graphical probability models (Bayes' networks) for modeling biological data, such as proteins and DNA sequences. Bayes' networks (or probabilistic networks) are graphical models of probability distributions that provide a convenient medium for scientists to experiment with different empirical models and obtain potentially important insights into the problems being studied. Generally, probabilistic graphical models support two important capabilities: (1) learning, with a framework for specifying the important probabilistic dependencies to capture the data, and (2) inference, with a computational framework for combining these conditional probabilities using algorithms that take advantage of the graphical properties of the model to simplify and speed up computation. Models generated by probabilistic methods have precise, quantitative semantics that are easy to interpret and translate into biological rules. The causal connections in probabilistic network models allow biologists to postulate generative processes that describe biological structures, such as helical regions in proteins or coding regions in DNA.
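To make the "inference" capability concrete, the sketch below performs exact inference by enumeration on a tiny discrete Bayes' network: the joint distribution factorizes as a product of each variable's probability given its parents, and a posterior is obtained by summing out unobserved variables and normalizing. The network (`region` → `gc_rich`, `region` → `long_orf`) and all probabilities are invented for illustration; they are not from the chapter.

```python
from itertools import product

# Hypothetical toy network: region -> gc_rich, region -> long_orf.
# Each entry: variable -> (parent names, CPT mapping parent values -> {value: probability}).
NETWORK = {
    "region": ((), {(): {"coding": 0.3, "noncoding": 0.7}}),
    "gc_rich": (("region",), {("coding",): {True: 0.8, False: 0.2},
                              ("noncoding",): {True: 0.4, False: 0.6}}),
    "long_orf": (("region",), {("coding",): {True: 0.9, False: 0.1},
                               ("noncoding",): {True: 0.1, False: 0.9}}),
}


def joint_probability(network, assignment):
    """Product of each variable's conditional probability given its parents."""
    p = 1.0
    for var, (parents, cpt) in network.items():
        p *= cpt[tuple(assignment[q] for q in parents)][assignment[var]]
    return p


def query(network, target, evidence):
    """P(target | evidence) by enumeration: sum the factored joint over unobserved variables."""
    domains = {var: list(next(iter(cpt.values()))) for var, (_, cpt) in network.items()}
    hidden = [v for v in network if v != target and v not in evidence]
    scores = {}
    for value in domains[target]:
        scores[value] = sum(
            joint_probability(network, {**evidence, target: value, **dict(zip(hidden, combo))})
            for combo in product(*(domains[h] for h in hidden))
        )
    norm = sum(scores.values())
    return {value: s / norm for value, s in scores.items()}


print(query(NETWORK, "region", {"gc_rich": True}))
print(query(NETWORK, "region", {"gc_rich": True, "long_orf": True}))
```

Enumeration is exponential in the number of hidden variables; the graph-aware algorithms the chapter alludes to exist precisely to avoid that cost on larger networks.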
TL;DR: The objective of this paper is to find a method complementary to existing ones that will improve the accuracy of phylogenetic tree inference when the methods are applied together.
Abstract: Numerous methods have been invented to reconstruct phylogenetic trees. However, the criteria they use may not necessarily be suitable for actual biological data. On the other hand, if methods based on different criteria yield the same result, the inference can be considered robust and reliable. Therefore, a method complementary to existing ones should improve the accuracy of phylogenetic tree inference when they are applied together. The objectives in this paper are as follows:
TL;DR: In this article, an input biological data buffer 241 temporarily stores the input collating organismic data and then passes them to an input data evaluation part 244; if the evaluation result of part 244 is smaller than a prescribed threshold, the organismic data are judged ineffective, whereas if it is larger than the threshold, the data are output to a data collation part 245 as effective data.
Abstract: PROBLEM TO BE SOLVED: To allow registration and collation even for a user who might otherwise be rejected, by changing the collation method according to the evaluation value of the dictionary data contained in the registered data read out of a storage means when the dictionary data are collated with the collating organismic data. SOLUTION: An input biological data buffer 241 temporarily stores the input collating organismic data and then passes them to an input data evaluation part 244. If the evaluation result of part 244 is smaller than a prescribed threshold, the organismic data are judged ineffective; if it is larger than the prescribed threshold, the organismic data are output to a data collation part 245 as effective data. Part 245 collates the input organismic data with the dictionary data stored in a dictionary data buffer 243, and a collation result decision part 246 compares the numerical collation result with the prescribed threshold. Based on the comparison result from part 246, the number of collation attempts is changed.
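The flow above (evaluate input quality, discard ineffective data, collate effective data against the dictionary, decide by threshold, and adapt the number of attempts) can be sketched as below. Everything concrete here is hypothetical: the abstract does not give the thresholds, the similarity measure, or how the number of attempts changes, so the sketch assumes that poor-quality dictionary (enrolled) data doubles the allowed attempts.

```python
def allowed_attempts(dictionary_quality, base_attempts=3, low_quality=0.5):
    """Hypothetical rule: allow more collation attempts when the enrolled data scored poorly."""
    return base_attempts * 2 if dictionary_quality < low_quality else base_attempts


def verify(captures, dictionary, dictionary_quality, similarity,
           input_threshold=0.5, match_threshold=0.8):
    """captures: list of (data, evaluation value) in capture order; True once one is accepted."""
    attempts = allowed_attempts(dictionary_quality)
    for data, evaluation in captures[:attempts]:  # rejected captures still use up an attempt
        if evaluation < input_threshold:
            continue  # input data evaluation: ineffective data, not collated
        if similarity(data, dictionary) >= match_threshold:
            return True  # collation result decision: accepted
    return False


def bit_agreement(a, b):
    """Hypothetical similarity: fraction of positions where two feature vectors agree."""
    return sum(x == y for x, y in zip(a, b)) / len(b)


template = [1, 0, 1, 1]
captures = [([1, 0, 1, 1], 0.3), ([0, 0, 0, 1], 0.9), ([0, 1, 0, 1], 0.9), ([1, 0, 1, 1], 0.9)]
print(verify(captures, template, 0.9, bit_agreement))  # good dictionary data: 3 attempts
print(verify(captures, template, 0.3, bit_agreement))  # poor dictionary data: 6 attempts
```

With good dictionary data the matching capture arrives too late and the user is rejected; with poor dictionary data the extra attempts let the same user through, which is the kind of adaptation the problem statement describes.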
TL;DR: An integrated method was developed to assess the ecological integrity of an urban watershed and to evaluate the cumulative impacts of physical habitat degradation and metal-contaminated habitats upon its fish and macroinvertebrates.
Abstract: An integrated method was developed to assess the ecological integrity of an urban watershed, and to evaluate the cumulative impacts of physical habitat degradation and metal-contaminated habitats upon its fish and macroinvertebrates. Sixteen sites in this watershed (the Aberjona watershed, in eastern Massachusetts) and 4 ecoregional reference sites (sites chosen to represent minimally-impaired conditions) were assessed. An innovative approach was used to evaluate metal contamination of macroinvertebrate epifaunal habitats (i.e., submerged vegetation, vegetation-covered rocks, overhanging bank vegetation, and undercut bank roots). Higher concentrations of As, Cr, Cu, Pb and Zn were measured in epifaunal habitats than in fine-grained sediments traditionally collected for risk assessments, and the ranking of sites by contamination level depended upon which sample type was used. As predicted, the biological condition of fish and macroinvertebrate assemblages from minimally-contaminated sites varied from good to poor as habitat condition varied from good to poor. Biological degradation beyond that attributable to habitat degradation alone was observed, as predicted, at sites contaminated in excess of MacDonald's (1992) Expected Risk Median values. When the number of native fish species caught at a site was used as the measure of biological condition, and habitat condition was a simple function of stream depth and instream cover score, this predicted relationship was observed. Contamination of epifaunal habitats by As and Cr was reflected in elevated concentrations in whole white suckers. Twelve benthic metrics were used to characterize the biological condition of macroinvertebrate assemblages and to relate biological impairment to chemical and physical degradation (using linear regression and t-tests comparing site categories).
Similar levels of biological impairment were observed at sites with severe physical or chemical degradation, or with moderate chemical and physical degradation. Individual benthic metrics were not diagnostic of impairment type. An aggregate macroinvertebrate index was developed that was more sensitive to chemical and physical degradation than any individual metric, illustrating the strength of a multimetric index approach for detecting the cumulative impacts of physical habitat degradation and chemical contamination. The method developed for the dissertation has a variety of applications for environmental protection programs and for ecological risk assessments.
Thesis Supervisor: Harold F. Hemond, William E. Leonhard Professor of Civil and Environmental Engineering
Thesis Supervisor: David H. Marks, James Mason Crafts Professor of Civil and Environmental Engineering
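A multimetric index combines several individual metrics into one score. The sketch below uses a common generic scheme, not the index developed in the thesis (whose scoring the abstract does not describe): each metric is scored 0 to 100 relative to the median of the reference sites, with the direction of impairment taken into account, and the scores are averaged. The metric names and values are hypothetical.

```python
from statistics import median


def multimetric_index(site, reference_sites, direction):
    """Average per-metric score (0-100) of a site relative to the reference-site median.

    direction[m] = +1 if metric m decreases with impairment (e.g. taxa richness),
                   -1 if it increases with impairment (e.g. % tolerant individuals).
    """
    scores = []
    for metric, sign in direction.items():
        ref = median(r[metric] for r in reference_sites)
        value = site[metric]
        if sign > 0:
            score = 100.0 * value / ref if ref else 100.0
        else:
            score = 100.0 * ref / value if value else 100.0
        scores.append(min(score, 100.0))  # doing better than the reference still scores 100
    return sum(scores) / len(scores)


reference = [{"taxa_richness": 20, "pct_tolerant": 10},
             {"taxa_richness": 24, "pct_tolerant": 14},
             {"taxa_richness": 22, "pct_tolerant": 12}]
direction = {"taxa_richness": +1, "pct_tolerant": -1}
print(multimetric_index({"taxa_richness": 11, "pct_tolerant": 24}, reference, direction))
```

Because each metric responds to different stressors, averaging them lets moderate physical and chemical degradation accumulate into one lower score, which is the advantage the abstract attributes to the multimetric approach.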
TL;DR: In this article, a system for remote data acquisition with central processing and storage, named the Data Treasury System, is disclosed; it supports the processing of electronic documents and data associated with other devices, including marketing, business, banking and general consumer transactions globally.
Abstract: A system for remote data acquisition with central processing and storage is disclosed under the name Data Treasury System. The Data Treasury System supports the processing of electronic documents and data associated with other devices, including marketing, business, banking and general consumer transactions globally. The system collects transaction data, such as electronic credit card receipts or paper forms, at one or more remote locations, encrypts the data, and sends the encrypted data to a central location, where the data are converted into a useful form, signature data and biological data are identified, and the generated information and reports are sent back to a remote location. The Data Treasury System has a number of features that work together to provide high performance, security, reliability, low error rates, and low cost. First, the network architecture facilitates secure communication between the remote locations and the central processing equipment. Second, a dynamic address assignment algorithm balances the load on the server system for speed and ease of use. Finally, a segmentation method improves the error correction process.
TL;DR: This work illustrates a user-friendly protocol using a common question frequently faced by a wet-lab bench biologist: "Now that I have a DNA or protein sequence, what can I do with it using a computer?"
Abstract: Until the recent advent of high-throughput experimental data acquisition in biology, the computational analysis of biological data was predominantly ad hoc, i.e., the application of a given piece of software to the biological data depended on the need of the moment. This "functional approach" often resulted in piecemeal computational analysis with a large amount of intervening "dead time". The present high-throughput availability of experimental biological data requires a more streamlined and integrated "protocol approach". In this work, we illustrate such a user-friendly protocol using a question frequently faced by wet-lab bench biologists: "Now that I have a DNA or protein sequence, what can I do with it using a computer?" As phrased, this question is steeped in the functional approach. In contrast, the protocol approach would rephrase the same question as "Now that I have a DNA or protein sequence, what can a computer do for me?" Our integrating tool can start with a sequence and build a substantial, continually updated custom data warehouse of computationally derived sequence information, structure information and relevant published literature.
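The difference between the "functional" and "protocol" approaches is that in the latter, a predefined set of analyses runs automatically for every incoming sequence and the results are accumulated in one place. The sketch below shows only that orchestration pattern with a few local steps (type detection, length, GC content, reverse complement). The step names, the crude DNA/protein heuristic, and the in-memory `warehouse` list are hypothetical; the real tool also gathers structure information and literature, which requires external services not shown here.

```python
DNA_ALPHABET = set("ACGTN")


def detect_type(seq):
    """Crude heuristic: only nucleotide letters means DNA (short peptides of A/C/G/T would be misread)."""
    return "DNA" if set(seq) <= DNA_ALPHABET else "protein"


def gc_content(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq):
    return seq[::-1].translate(str.maketrans("ACGTN", "TGCAN"))


# Hypothetical protocol: the steps run automatically for each sequence type.
PROTOCOL = {
    "DNA": [("length", len), ("gc_content", gc_content), ("reverse_complement", reverse_complement)],
    "protein": [("length", len)],
}


def run_protocol(seq, protocol=PROTOCOL):
    """Apply every step for the detected sequence type and collect the results in one record."""
    seq = seq.upper()
    kind = detect_type(seq)
    record = {"sequence": seq, "type": kind}
    for name, step in protocol[kind]:
        record[name] = step(seq)
    return record


warehouse = [run_protocol(s) for s in ["atgcgc", "MKVLH"]]
for entry in warehouse:
    print(entry)
```

Adding an analysis means adding one entry to `PROTOCOL`; every sequence processed afterwards gets it without the user having to ask.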
TL;DR: A data model, based on existing software used by culture collections that includes links to networked data resources, is presented, which will serve not only individual laboratories but, importantly, the global biohydrogen community.
Abstract: Access to computer technology is so widespread that few biologists do not make use of automated systems for managing laboratory data. The growth of the Internet and the development of software tools that facilitate management and dissemination of data in a networked environment have resulted in the proliferation of biological data resources that are now available to researchers throughout the world. The biohydrogen community is no exception, in that data relevant to hydrogen-producing microorganisms are available through numerous resources, many of which may be accessed via the World Wide Web. From genomics and sequence databases to culture collections and specialized data, the hydrogen producers are well-represented. The challenge is to develop a platform on which these data can be merged into a working tool that will meet the various requirements of a broad range of potential users. Environmental conditions, media/nutrient requirements, biochemical characteristics, metabolic pathways, genomics, and industrial process modeling are all aspects of a knowledge base that will be required to track the growing body of information about these organisms and the distinguishing characteristics that may result in the production of energy sources to improve the environment. Development of a working data management tool to serve the biohydrogen community will require adherence to accepted technical and semantic standards. The interdependence of information in disparate databases must be recognized and support for integrated queries involving multiple databases must be maintained. The disparity in data resources relevant to hydrogen producers may be more acute than for other microorganisms because several distinct taxa are represented and will be studied from a functional, rather than a taxonomic, viewpoint. This requires an interdisciplinary approach to data management problems and will require cooperation from a social and technical standpoint. 
A data model for the biohydrogen community, based on existing software used by culture collections that includes links to networked data resources, is presented. Such a system will serve not only individual laboratories but, importantly, the global biohydrogen community.
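The abstract stresses functional (rather than taxonomic) queries and links to networked resources. A minimal sketch of such a record structure follows; it is not the data model presented in the paper, which the abstract does not detail. The class and field names, database names (`GenomeDB`, `PathwayDB`), accession numbers, and taxa are all hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class CrossReference:
    database: str    # name of an external networked resource (hypothetical here)
    identifier: str  # record identifier within that resource


@dataclass
class StrainRecord:
    accession: str                                   # culture-collection accession number
    taxon: str
    medium: str = ""                                 # media/nutrient requirements
    conditions: dict = field(default_factory=dict)   # environmental conditions, e.g. {"temp_C": 35}
    traits: set = field(default_factory=set)         # functional traits, e.g. {"H2-producer"}
    xrefs: list = field(default_factory=list)        # links to genome, sequence, pathway records


def by_trait(records, trait):
    """Functional query: every strain carrying a trait, regardless of taxon."""
    return [r.accession for r in records if trait in r.traits]


def linked_ids(records, database):
    """Identifiers to send to one external resource as part of an integrated query."""
    return [x.identifier for r in records for x in r.xrefs if x.database == database]


records = [
    StrainRecord("CC-0001", "taxon-A", traits={"H2-producer", "fermentative"},
                 xrefs=[CrossReference("GenomeDB", "G-17")]),
    StrainRecord("CC-0002", "taxon-B", traits={"H2-producer", "photosynthetic"},
                 xrefs=[CrossReference("GenomeDB", "G-42"), CrossReference("PathwayDB", "P-7")]),
    StrainRecord("CC-0003", "taxon-C", traits={"fermentative"}),
]
print(by_trait(records, "H2-producer"))
print(linked_ids(records, "GenomeDB"))
```

Keeping traits as a set independent of `taxon` is what allows hydrogen producers from several distinct taxa to be retrieved by one functional query.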