TL;DR: This work has used extensive and realistic computer simulations to show that the topological accuracy of this new method is at least as high as that of the existing maximum-likelihood programs and much higher than the performance of distance-based and parsimony approaches.
Abstract: The increase in the number of large data sets and the complexity of current probabilistic sequence evolution models necessitates fast and reliable phylogeny reconstruction methods. We describe a new approach, based on the maximum- likelihood principle, which clearly satisfies these requirements. The core of this method is a simple hill-climbing algorithm that adjusts tree topology and branch lengths simultaneously. This algorithm starts from an initial tree built by a fast distance-based method and modifies this tree to improve its likelihood at each iteration. Due to this simultaneous adjustment of the topology and branch lengths, only a few iterations are sufficient to reach an optimum. We used extensive and realistic computer simulations to show that the topological accuracy of this new method is at least as high as that of the existing maximum-likelihood programs and much higher than the performance of distance-based and parsimony approaches. The reduction of computing time is dramatic in comparison with other maximum-likelihood packages, while the likelihood maximization ability tends to be higher. For example, only 12 min were required on a standard personal computer to analyze a data set consisting of 500 rbcL sequences with 1,428 base pairs from plant plastids, thus reaching a speed of the same order as some popular distance-based and parsimony algorithms. This new method is implemented in the PHYML program, which is freely available on our web page: http://www.lirmm.fr/w3ifa/MAAS/. (Algorithm; computer simulations; maximum likelihood; phylogeny; rbcL; RDPII project.) The size of homologous sequence data sets has in- creased dramatically in recent years, and many of these data sets now involve several hundreds of taxa. More- over, current probabilistic sequence evolution models (Swofford et al., 1996 ; Page and Holmes, 1998 ), notably those including rate variation among sites (Uzzell and Corbin, 1971 ; Jin and Nei, 1990 ; Yang, 1996 ), require an increasing number of calculations. Therefore, the speed of phylogeny reconstruction methods is becoming a sig- nificant requirement and good compromises between speed and accuracy must be found. The maximum likelihood (ML) approach is especially accurate for building molecular phylogenies. Felsenstein (1981) brought this framework to nucleotide-based phy- logenetic inference, and it was later also applied to amino acid sequences (Kishino et al., 1990). Several vari- ants were proposed, most notably the Bayesian meth- ods (Rannala and Yang 1996; and see below), and the discrete Fourier analysis of Hendy et al. (1994), for ex- ample. Numerous computer studies (Huelsenbeck and Hillis, 1993; Kuhner and Felsenstein, 1994; Huelsenbeck, 1995; Rosenberg and Kumar, 2001; Ranwez and Gascuel, 2002) have shown that ML programs can recover the cor- rect tree from simulated data sets more frequently than other methods can. Another important advantage of the ML approach is the ability to compare different trees and evolutionary models within a statistical framework (see Whelan et al., 2001, for a review). However, like all optimality criterion-based phylogenetic reconstruction approaches, ML is hampered by computational difficul- ties, making it impossible to obtain the optimal tree with certainty from even moderate data sets (Swofford et al., 1996). Therefore, all practical methods rely on heuristics that obtain near-optimal trees in reasonable computing time. Moreover, the computation problem is especially difficult with ML, because the tree likelihood not only depends on the tree topology but also on numerical pa- rameters, including branch lengths. Even computing the optimal values of these parameters on a single tree is not an easy task, particularly because of possible local optima (Chor et al., 2000). The usual heuristic method, implemented in the pop- ular PHYLIP (Felsenstein, 1993 ) and PAUP ∗ (Swofford, 1999 ) packages, is based on hill climbing. It combines stepwise insertion of taxa in a growing tree and topolog- ical rearrangement. For each possible insertion position and rearrangement, the branch lengths of the resulting tree are optimized and the tree likelihood is computed. When the rearrangement improves the current tree or when the position insertion is the best among all pos- sible positions, the corresponding tree becomes the new current tree. Simple rearrangements are used during tree growing, namely "nearest neighbor interchanges" (see below), while more intense rearrangements can be used once all taxa have been inserted. The procedure stops when no rearrangement improves the current best tree. Despite significant decreases in computing times, no- tably in fastDNAml (Olsen et al., 1994 ), this heuristic becomes impracticable with several hundreds of taxa. This is mainly due to the two-level strategy, which sepa- rates branch lengths and tree topology optimization. In- deed, most calculations are done to optimize the branch lengths and evaluate the likelihood of trees that are finally rejected. New methods have thus been proposed. Strimmer and von Haeseler (1996) and others have assembled four- taxon (quartet) trees inferred by ML, in order to recon- struct a complete tree. However, the results of this ap- proach have not been very satisfactory to date (Ranwez and Gascuel, 2001 ). Ota and Li (2000, 2001) described
TL;DR: This paper reviews the literature on Bayesian experimental design, both for linear and nonlinear models, and presents a uniied view of the topic by putting experimental design in a decision theoretic framework.
Abstract: This paper reviews the literature on Bayesian experimental design. A unified view of this topic is presented, based on a decision-theoretic approach. This framework casts criteria from the Bayesian literature of design as part of a single coherent approach. The decision-theoretic structure incorporates both linear and nonlinear design problems and it suggests possible new directions to the experimental design problem, motivated by the use of new utility functions. We show that, in some special cases of linear design problems, Bayesian solutions change in a sensible way when the prior distribution and the utility function are modified to allow for the specific structure of the experiment. The decision-theoretic approach also gives a mathematical justification for selecting the appropriate optimality criterion.
TL;DR: This work shows how the same principles may be used to select data for two alternative, statistically-based learning architectures: mixtures of Gaussians and locally weighted regression.
Abstract: For many types of machine learning algorithms, one can compute the statistically `optimal' way to select training data. In this paper, we review how optimal data selection techniques have been used with feedforward neural networks. We then show how the same principles may be used to select data for two alternative, statistically-based learning architectures: mixtures of Gaussians and locally weighted regression. While the techniques for neural networks are computationally expensive and approximate, the techniques for mixtures of Gaussians and locally weighted regression are both efficient and accurate. Empirically, we observe that the optimality criterion sharply decreases the number of training examples the learner needs in order to achieve good performance.
TL;DR: A unified theory of neighborhood filters and reliable criteria to compare them to other filter classes are presented and it will be demonstrated that computing trajectories and restricting the neighborhood to them is harmful for denoising purposes and that space-time NL-means preserves more movie details.
Abstract: Neighborhood filters are nonlocal image and movie filters which reduce the noise by averaging similar pixels. The first object of the paper is to present a unified theory of these filters and reliable criteria to compare them to other filter classes. A CCD noise model will be presented justifying the involvement of neighborhood filters. A classification of neighborhood filters will be proposed, including classical image and movie denoising methods and discussing further a recently introduced neighborhood filter, NL-means. In order to compare denoising methods three principles will be discussed. The first principle, "method noise", specifies that only noise must be removed from an image. A second principle will be introduced, "noise to noise", according to which a denoising method must transform a white noise into a white noise. Contrarily to "method noise", this principle, which characterizes artifact-free methods, eliminates any subjectivity and can be checked by mathematical arguments and Fourier analysis. "Noise to noise" will be proven to rule out most denoising methods, with the exception of neighborhood filters. This is why a third and new comparison principle, the "statistical optimality", is needed and will be introduced to compare the performance of all neighborhood filters.
The three principles will be applied to compare ten different image and movie denoising methods. It will be first shown that only wavelet thresholding methods and NL-means give an acceptable method noise. Second, that neighborhood filters are the only ones to satisfy the "noise to noise" principle. Third, that among them NL-means is closest to statistical optimality. A particular attention will be paid to the application of the statistical optimality criterion for movie denoising methods. It will be pointed out that current movie denoising methods are motion compensated neighborhood filters. This amounts to say that they are neighborhood filters and that the ideal neighborhood of a pixel is its trajectory. Unfortunately the aperture problem makes it impossible to estimate ground true trajectories. It will be demonstrated that computing trajectories and restricting the neighborhood to them is harmful for denoising purposes and that space-time NL-means preserves more movie details.
TL;DR: In this article, the robust maximum correntropy criterion (MCC) was adopted as the optimality criterion instead of using the minimum mean square error (MMSE) criterion, which is optimal under Gaussian assumption.