TL;DR: This work considers a small-scale version of conditional computation, where sparse stochastic units form a distributed representation of gaters that can turn off large chunks of the computation performed in the rest of the neural network in combinatorially many ways.
Abstract: Stochastic neurons and hard non-linearities can be useful for a number of reasons in deep learning models, but in many cases they pose a challenging problem: how to estimate the gradient of a loss function with respect to the input of such stochastic or non-smooth neurons? That is, can we "back-propagate" through these stochastic neurons? We examine this question and existing approaches, and compare four families of solutions, applicable in different settings. One of them is the minimum variance unbiased gradient estimator for stochastic binary neurons (a special case of the REINFORCE algorithm). A second approach, introduced here, decomposes the operation of a binary stochastic neuron into a stochastic binary part and a smooth differentiable part, which approximates the expected effect of the pure stochastic binary neuron to first order. A third approach involves the injection of additive or multiplicative noise in a computational graph that is otherwise differentiable. A fourth approach heuristically copies the gradient with respect to the stochastic output directly as an estimator of the gradient with respect to the sigmoid argument (we call this the straight-through estimator). To explore a context where these estimators are useful, we consider a small-scale version of conditional computation, where sparse stochastic units form a distributed representation of gaters that can turn off large chunks of the computation performed in the rest of the neural network in combinatorially many ways. In this case, it is important that the gating units produce an actual 0 most of the time. The resulting sparsity can potentially be exploited to greatly reduce the computational cost of large deep networks for which conditional computation would be useful.
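A minimal NumPy sketch of two of the estimators named above, for a single layer of binary stochastic units that fire with probability sigmoid(a). The helper names (`stochastic_binary_forward`, `straight_through_grad`, `reinforce_grad`) are hypothetical and not taken from the paper. The straight-through estimator simply reuses dL/dh as the estimate of dL/da; the REINFORCE-style estimator uses (h - sigmoid(a)) * (L - b) with a constant baseline b, which is unbiased because d log P(h | a) / da = h - sigmoid(a) for a Bernoulli unit.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def stochastic_binary_forward(a, rng):
    """Sample binary outputs h ~ Bernoulli(sigmoid(a)); also return the firing probabilities."""
    p = sigmoid(a)
    h = (rng.random(a.shape) < p).astype(a.dtype)
    return h, p

def straight_through_grad(grad_h):
    """Straight-through estimator: reuse dL/dh unchanged as the estimate of dL/da."""
    return grad_h

def reinforce_grad(h, p, loss, baseline=0.0):
    """REINFORCE-style estimate of dE[L]/da for a Bernoulli(sigmoid(a)) unit.

    d log P(h | a) / da = h - p, so (h - p) * (L - baseline) is unbiased
    for any baseline that does not depend on h.
    """
    return (h - p) * (loss - baseline)

# Usage: with L(h) = h, the exact gradient is dE[h]/da = p * (1 - p).
rng = np.random.default_rng(0)
a = np.full(200_000, 0.3)
h, p = stochastic_binary_forward(a, rng)
mc_estimate = reinforce_grad(h, p, loss=h).mean()
exact = p[0] * (1.0 - p[0])
print(mc_estimate, exact)  # the two values should be close
```

The straight-through estimator is biased but has low variance; the REINFORCE-style estimator is unbiased but its variance grows with the spread of the loss, which is why a baseline is usually subtracted.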
TL;DR: The proposed adaptive fuzzy tracking controller guarantees that all signals in the closed-loop system are bounded in probability and the system output eventually converges to a small neighborhood of the desired reference signal in the sense of mean quartic value.
Abstract: This paper is concerned with the problem of adaptive fuzzy tracking control for a class of pure-feedback stochastic nonlinear systems with input saturation. To overcome the design difficulty from the nondifferentiable saturation nonlinearity, a smooth nonlinear function of the control input signal is first introduced to approximate the saturation function; then, an adaptive fuzzy tracking controller based on the mean-value theorem is constructed by using the backstepping technique. The proposed adaptive fuzzy controller guarantees that all signals in the closed-loop system are bounded in probability and the system output eventually converges to a small neighborhood of the desired reference signal in the sense of mean quartic value. Simulation results further illustrate the effectiveness of the proposed control scheme.
TL;DR: It is found that when f is locally Lipschitz and semi-algebraic with bounded sublevel sets, the BFGS method with the inexact line search almost always generates sequences whose cluster points are Clarke stationary and with function values converging R-linearly to a Clarke stationary value.
Abstract: We investigate the behavior of quasi-Newton algorithms applied to minimize a nonsmooth function f, not necessarily convex. We introduce an inexact line search that generates a sequence of nested intervals containing a set of points of nonzero measure that satisfy the Armijo and Wolfe conditions if f is absolutely continuous along the line. Furthermore, the line search is guaranteed to terminate if f is semi-algebraic. It seems quite difficult to establish a convergence theorem for quasi-Newton methods applied to such general classes of functions, so we give a careful analysis of a special but illuminating case, the Euclidean norm, in one variable using the inexact line search and in two variables assuming that the line search is exact. In practice, we find that when f is locally Lipschitz and semi-algebraic with bounded sublevel sets, the BFGS (Broyden-Fletcher-Goldfarb-Shanno) method with the inexact line search almost always generates sequences whose cluster points are Clarke stationary and with function values converging R-linearly to a Clarke stationary value. We give references documenting the successful use of BFGS in a variety of nonsmooth applications, particularly the design of low-order controllers for linear dynamical systems. We conclude with a challenging open question.
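The nested-interval line search described above can be sketched as a bracketing procedure: shrink the upper end when the Armijo (sufficient decrease) condition fails, raise the lower end when the weak Wolfe (curvature) condition fails, and bisect or double accordingly. This is a minimal sketch under assumed names and constants (`armijo_wolfe_bisection`, c1 = 1e-4, c2 = 0.9 are illustrative choices, not values from the paper); it assumes `grad` is only called at points where f is differentiable.

```python
import numpy as np

def armijo_wolfe_bisection(f, grad, x, d, c1=1e-4, c2=0.9, max_iter=60):
    """Bracketing line search for the Armijo and weak Wolfe conditions.

    Returns a step t satisfying both conditions, or None if max_iter is reached.
    """
    f0 = f(x)
    s0 = grad(x) @ d          # directional derivative at t = 0; must be negative
    lo, hi = 0.0, np.inf      # current bracket [lo, hi] for acceptable steps
    t = 1.0
    for _ in range(max_iter):
        x_t = x + t * d
        if f(x_t) > f0 + c1 * t * s0:     # Armijo fails: step too long
            hi = t
        elif grad(x_t) @ d < c2 * s0:     # weak Wolfe fails: step too short
            lo = t
        else:
            return t
        # Bisect once an upper bound exists; otherwise keep doubling.
        t = 0.5 * (lo + hi) if np.isfinite(hi) else 2.0 * lo
    return None

# Usage on the Euclidean norm, which is nonsmooth only at the origin.
f = lambda z: float(np.linalg.norm(z))
grad = lambda z: z / np.linalg.norm(z)
x0 = np.array([3.0, 4.0])
d0 = -grad(x0)
t = armijo_wolfe_bisection(f, grad, x0, d0)
print(t)
```

Using the weak (rather than strong) Wolfe condition matters for nonsmooth f: near a kink the directional derivative jumps in sign, so a strong Wolfe condition on its absolute value may never be met.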
TL;DR: In this article, the authors consider the stochastic approximation problem where a convex function has to be minimized, given only the knowledge of unbiased estimates of its gradients at certain points, a framework which includes machine learning methods based on the minimization of the empirical risk.
Abstract: We consider the stochastic approximation problem where a convex function has to be minimized, given only the knowledge of unbiased estimates of its gradients at certain points, a framework which includes machine learning methods based on the minimization of the empirical risk. We focus on problems without strong convexity, for which all previously known algorithms achieve a convergence rate for function values of O(1/√n) after n iterations. We consider and analyze two algorithms that achieve a rate of O(1/n) for classical supervised learning problems. For least-squares regression, we show that averaged stochastic gradient descent with constant step-size achieves the desired rate. For logistic regression, this is achieved by a simple novel stochastic gradient algorithm that (a) constructs successive local quadratic approximations of the loss functions, while (b) preserving the same running-time complexity as stochastic gradient descent. For these algorithms, we provide a non-asymptotic analysis of the generalization error (in expectation, and also in high probability for least-squares), and run extensive experiments showing that they often outperform existing approaches.
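For the least-squares case above, averaged constant-step SGD takes one stochastic gradient step per sample and returns the running average of the iterates. The sketch below is a minimal single-pass illustration; the function name is hypothetical, and the step size 1/(4R²) with R² = max_i ||x_i||² is a conservative choice assumed here rather than a tuned value.

```python
import numpy as np

def averaged_sgd_least_squares(X, y, step):
    """Single pass of constant-step SGD on 0.5 * (x . theta - y)^2, returning the averaged iterate."""
    n, d = X.shape
    theta = np.zeros(d)      # current SGD iterate
    theta_bar = np.zeros(d)  # running (Polyak-Ruppert) average of the iterates
    for k in range(n):
        residual = X[k] @ theta - y[k]
        theta = theta - step * residual * X[k]       # stochastic gradient step, constant step size
        theta_bar += (theta - theta_bar) / (k + 1)   # incremental mean of theta_1, ..., theta_{k+1}
    return theta_bar

# Usage on synthetic data with a known parameter vector.
rng = np.random.default_rng(1)
n, d = 40_000, 5
X = rng.standard_normal((n, d))
theta_star = np.array([1.0, -2.0, 0.5, 0.0, 1.5])
y = X @ theta_star + 0.1 * rng.standard_normal(n)
R2 = np.max(np.sum(X ** 2, axis=1))   # assumed bound on ||x||^2
theta_bar = averaged_sgd_least_squares(X, y, step=1.0 / (4.0 * R2))
print(np.round(theta_bar, 3))
```

The individual iterates keep fluctuating because the step size never decays; it is the averaging step that removes this fluctuation for least squares.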
TL;DR: It is shown that the auto-encoder captures the score (derivative of the log-density with respect to the input), a result that contradicts previous interpretations of reconstruction error as an energy function.
Abstract: What do auto-encoders learn about the underlying data generating distribution? Recent work suggests that some auto-encoder variants do a good job of capturing the local manifold structure of data. This paper clarifies some of these previous observations by showing that minimizing a particular form of regularized reconstruction error yields a reconstruction function that locally characterizes the shape of the data generating density. We show that the auto-encoder captures the score (derivative of the log-density with respect to the input). This contradicts previous interpretations of reconstruction error as an energy function. Unlike previous results, the theorems provided here are completely generic and do not depend on the parametrization of the auto-encoder: they show what the auto-encoder would tend to if given enough capacity and examples. These results are for a contractive training criterion we show to be similar to the denoising auto-encoder training criterion with small corruption noise, but with contraction applied on the whole reconstruction function rather than just the encoder. Similarly to score matching, one can consider the proposed training criterion as a convenient alternative to maximum likelihood because it does not involve a partition function. Finally, we show how an approximate Metropolis-Hastings MCMC can be set up to recover samples from the estimated distribution, and this is confirmed in sampling experiments.
TL;DR: Focusing on nonasymptotic bounds on convergence rates, it is shown that if pairs of function values are available, algorithms for d-dimensional optimization that use gradient estimates based on random perturbations suffer a factor of at most √d in convergence rate over traditional stochastic gradient methods.
Abstract: We consider derivative-free algorithms for stochastic and non-stochastic convex optimization problems that use only function values rather than gradients. Focusing on non-asymptotic bounds on convergence rates, we show that if pairs of function values are available, algorithms for $d$-dimensional optimization that use gradient estimates based on random perturbations suffer a factor of at most $\sqrt{d}$ in convergence rate over traditional stochastic gradient methods. We establish such results for both smooth and non-smooth cases, sharpening previous analyses that suggested a worse dimension dependence, and extend our results to the case of multiple ($m \ge 2$) evaluations. We complement our algorithmic development with information-theoretic lower bounds on the minimax convergence rate of such problems, establishing the sharpness of our achievable results up to constant (sometimes logarithmic) factors.
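A random-perturbation gradient estimate built from a pair of function values can be sketched as g = (f(x + δu) - f(x)) / δ · u. In this minimal sketch the Gaussian direction u, the fixed smoothing parameter δ, the step size, and the function names are all illustrative assumptions; it shows the estimator and a plain descent loop, not the paper's algorithms or rates.

```python
import numpy as np

def two_point_gradient(f, x, delta, rng):
    """Random-direction gradient estimate from two function values: (f(x + delta*u) - f(x)) / delta * u."""
    u = rng.standard_normal(x.shape)  # Gaussian perturbation direction (one common choice)
    return (f(x + delta * u) - f(x)) / delta * u

def zeroth_order_descent(f, x0, step, delta, iters, rng):
    """Plain descent that accesses f only through function values."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - step * two_point_gradient(f, x, delta, rng)
    return x

rng = np.random.default_rng(2)
f = lambda x: 0.5 * np.sum(x ** 2)    # the gradient of f at x is x itself
x = np.array([1.0, -2.0, 0.5])
avg_estimate = np.mean([two_point_gradient(f, x, 1e-3, rng) for _ in range(20_000)], axis=0)
x_final = zeroth_order_descent(f, [3.0, -1.0, 2.0], step=0.05, delta=1e-3, iters=500, rng=rng)
print(avg_estimate, f(x_final))
```

Averaging many estimates recovers the true gradient, but a single estimate is noisy with variance growing roughly linearly in the dimension, which is why zeroth-order methods need smaller steps than their first-order counterparts.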
TL;DR: This work provides a criterion on m that describes the needed amount of regularization to ensure that the least squares method is stable and that its accuracy, measured in L2(X,ρX), is comparable to the best approximation error of f by elements from Vm.
Abstract: We consider the problem of reconstructing an unknown function f on a domain X from samples of f at n randomly chosen points with respect to a given measure ρX. Given a sequence of linear spaces (Vm)m>0 with dim(Vm)=m≤n, we study the least squares approximations from the spaces Vm. It is well known that such approximations can be inaccurate when m is too close to n, even when the samples are noiseless. Our main result provides a criterion on m that describes the needed amount of regularization to ensure that the least squares method is stable and that its accuracy, measured in L2(X,ρX), is comparable to the best approximation error of f by elements from Vm. We illustrate this criterion for various approximation schemes, such as trigonometric polynomials, with ρX being the uniform measure, and algebraic polynomials, with ρX being either the uniform or Chebyshev measure. For such examples we also prove similar stability results using deterministic samples that are equispaced with respect to these measures.
TL;DR: This work considers the stochastic approximation problem where a convex function has to be minimized, given only the knowledge of unbiased estimates of its gradients at certain points, a framework which includes machine learning methods based on the minimization of the empirical risk.
Abstract: We consider the stochastic approximation problem where a convex function has to be minimized, given only the knowledge of unbiased estimates of its gradients at certain points, a framework which includes machine learning methods based on the minimization of the empirical risk. We focus on problems without strong convexity, for which all previously known algorithms achieve a convergence rate for function values of O(1/n^{1/2}). We consider and analyze two algorithms that achieve a rate of O(1/n) for classical supervised learning problems. For least-squares regression, we show that averaged stochastic gradient descent with constant step-size achieves the desired rate. For logistic regression, this is achieved by a simple novel stochastic gradient algorithm that (a) constructs successive local quadratic approximations of the loss functions, while (b) preserving the same running time complexity as stochastic gradient descent. For these algorithms, we provide a non-asymptotic analysis of the generalization error (in expectation, and also in high probability for least-squares), and run extensive experiments on standard machine learning benchmarks showing that they often outperform existing approaches.
TL;DR: This work proposes a convergence analysis of accelerated forward-backward splitting methods for composite function minimization when the proximity operator is not available in closed form and can only be computed up to a certain precision.
Abstract: We propose a convergence analysis of accelerated forward-backward splitting methods for composite function minimization, when the proximity operator is not available in closed form, and can only be computed up to a certain precision. We prove that the $1/k^2$ convergence rate for the function values can be achieved if the admissible errors are of a certain type and satisfy a sufficiently fast decay condition. Our analysis is based on the machinery of estimate sequences first introduced by Nesterov for the study of accelerated gradient descent algorithms. Furthermore, we give a global complexity analysis, taking into account the cost of computing admissible approximations of the proximal point. An experimental analysis is also presented.
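As a concrete instance of accelerated forward-backward splitting, the sketch below applies a FISTA-style iteration to the lasso problem 0.5·||Ax - b||² + λ||x||₁. The lasso example, the function names, and the iteration count are assumptions for illustration; note that here the proximity operator (soft-thresholding) is exact, whereas the paper studies the case where it can only be computed approximately.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximity operator of tau * ||.||_1, available in closed form."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def fista_lasso(A, b, lam, iters=500):
    """Accelerated forward-backward splitting for 0.5 * ||A x - b||^2 + lam * ||x||_1."""
    L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the gradient of the smooth part
    x = np.zeros(A.shape[1])
    y = x.copy()
    t = 1.0
    for _ in range(iters):
        grad = A.T @ (A @ y - b)                       # forward (gradient) step on the smooth part
        x_new = soft_threshold(y - grad / L, lam / L)  # backward (proximal) step on the l1 part
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)  # Nesterov extrapolation
        x, t = x_new, t_new
    return x

rng = np.random.default_rng(3)
A = rng.standard_normal((50, 10))
b = rng.standard_normal(50)
x_hat = fista_lasso(A, b, lam=2.0, iters=2000)
# A minimizer is a fixed point of the forward-backward map.
L = np.linalg.norm(A, 2) ** 2
residual = np.linalg.norm(x_hat - soft_threshold(x_hat - A.T @ (A @ x_hat - b) / L, 2.0 / L))
print(residual)
```

In an inexact variant, `soft_threshold` would be replaced by an iterative subroutine whose error must decay fast enough across iterations to preserve the 1/k² rate.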
TL;DR: The authors present the three-loop remainder function, which describes the scattering of six gluons in the maximally-helicity-violating configuration in planar N = 4 super-Yang-Mills theory, as a function of the three dual conformal cross ratios.
Abstract: We present the three-loop remainder function, which describes the scattering of six gluons in the maximally-helicity-violating configuration in planar N = 4 super-Yang-Mills theory, as a function of the three dual conformal cross ratios. The result can be expressed in terms of multiple Goncharov polylogarithms. We also employ a more restricted class of hexagon functions which have the correct branch cuts and certain other restrictions on their symbols. We classify all the hexagon functions through transcendental weight five, using the coproduct for their Hopf algebra iteratively, which amounts to a set of first-order differential equations. The three-loop remainder function is a particular weight-six hexagon function, whose symbol was determined previously. The differential equations can be integrated numerically for generic values of the cross ratios, or analytically in certain kinematic limits, including the near-collinear and multi-Regge limits. These limits allow us to impose constraints from the operator product expansion and multi-Regge factorization directly at the function level, and thereby to fix uniquely a set of Riemann-ζ-valued constants that could not be fixed at the level of the symbol. The near-collinear limits agree precisely with recent predictions by Basso, Sever and Vieira based on integrability. The multi-Regge limits agree with the factorization formula of Fadin and Lipatov, and determine three constants entering the impact factor at this order. We plot the three-loop remainder function for various slices of the Euclidean region of positive cross ratios, and compare it to the two-loop one. For large ranges of the cross ratios, the ratio of the three-loop to the two-loop remainder function is relatively constant, and close to −7.
TL;DR: The Gaussian Process Upper Confidence Bound and Pure Exploration algorithm (GP-UCB-PE) is introduced, which combines the UCB strategy and Pure Exploration in the same batch of evaluations along the parallel iterations, and theoretical upper bounds on the regret with batches of size K are proved for this procedure.
Abstract: In this paper, we consider the challenge of maximizing an unknown function f for which evaluations are noisy and are acquired with high cost. An iterative procedure uses the previous measures to actively select the next estimation of f which is predicted to be the most useful. We focus on the case where the function can be evaluated in parallel with batches of fixed size and analyze the benefit compared to the purely sequential procedure in terms of cumulative regret. We introduce the Gaussian Process Upper Confidence Bound and Pure Exploration algorithm (GP-UCB-PE) which combines the UCB strategy and Pure Exploration in the same batch of evaluations along the parallel iterations. We prove theoretical upper bounds on the regret with batches of size K for this procedure which show the improvement of the order of $\sqrt{K}$ for fixed iteration cost over purely sequential versions. Moreover, the multiplicative constants involved have the property of being dimension-free. We also confirm empirically the efficiency of GP-UCB-PE on real and synthetic problems compared to state-of-the-art competitors.
TL;DR: This article considers the problem of constructing nonparametric tolerance/prediction sets by starting from the general conformal prediction approach, and uses a kernel density estimator as a measure of agreement between a sample point and the underlying distribution.
Abstract: This article introduces a new approach to prediction by bringing together two different nonparametric ideas: distribution-free inference and nonparametric smoothing. Specifically, we consider the problem of constructing nonparametric tolerance/prediction sets. We start from the general conformal prediction approach, and we use a kernel density estimator as a measure of agreement between a sample point and the underlying distribution. The resulting prediction set is shown to be closely related to plug-in density level sets with carefully chosen cutoff values. Under standard smoothness conditions, we get an asymptotic efficiency result that is near optimal for a wide range of function classes. But the coverage is guaranteed whether or not the smoothness conditions hold and regardless of the sample size. The performance of our method is investigated through simulation studies and illustrated in a real data example.
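The idea of using a kernel density estimate as the conformity score can be sketched in one dimension. Note the swap: the article builds on full conformal prediction, while this minimal sketch uses the simpler split-conformal variant (fit the KDE on one half of the data, calibrate the cutoff on the other half), which keeps the finite-sample coverage guarantee at some cost in efficiency. The bandwidth, α, and function names are illustrative assumptions.

```python
import numpy as np

def gaussian_kde(sample, bandwidth):
    """Return a 1-D Gaussian kernel density estimate built from `sample`."""
    def density(points):
        points = np.atleast_1d(points)
        z = (points[:, None] - sample[None, :]) / bandwidth
        return np.exp(-0.5 * z ** 2).sum(axis=1) / (len(sample) * bandwidth * np.sqrt(2.0 * np.pi))
    return density

def split_conformal_density_set(data, alpha, bandwidth, rng):
    """Split-conformal prediction set {y : density(y) >= threshold} with coverage >= 1 - alpha."""
    data = rng.permutation(data)
    half = len(data) // 2
    density = gaussian_kde(data[:half], bandwidth)   # fit the KDE on the first half
    scores = np.sort(density(data[half:]))           # conformity scores on the second half
    k = int(np.floor(alpha * (len(scores) + 1)))     # rank of the cutoff among calibration scores
    threshold = scores[k - 1] if k >= 1 else 0.0     # k = 0: keep every point
    return density, threshold

rng = np.random.default_rng(4)
density, threshold = split_conformal_density_set(rng.standard_normal(2000), alpha=0.1, bandwidth=0.3, rng=rng)
fresh = rng.standard_normal(5000)
coverage = np.mean(density(fresh) >= threshold)   # fraction of new points inside the set
print(coverage)
```

The resulting set is exactly a plug-in density level set {y : p̂(y) ≥ threshold}, with the cutoff chosen by the calibration ranks rather than by smoothness assumptions, which mirrors the connection described in the abstract.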
TL;DR: In this article, a normal approximation for the functional sample mean is developed and used to asymptotically justify testing procedures for the equality of means in two functional samples exhibiting temporal dependence.
Abstract: The paper is concerned with inference based on the mean function of a functional time series. We develop a normal approximation for the functional sample mean and then focus on the estimation of the asymptotic variance kernel. Using these results, we develop and asymptotically justify testing procedures for the equality of means in two functional samples exhibiting temporal dependence. Evaluated by means of a simulation study and application to a real data set, these two-sample procedures enjoy good size and power in finite samples.
TL;DR: In this paper, a Bayesian approach is adopted to the inverse problem of estimating an unknown function u from noisy measurements y of a known, possibly nonlinear, map applied to u. The prior measure is specified as a Gaussian random field μ0.
Abstract: We consider the inverse problem of estimating an unknown function u from noisy measurements y of a known, possibly nonlinear, map $\mathcal {G}$ applied to u. We adopt a Bayesian approach to the problem and work in a setting where the prior measure is specified as a Gaussian random field μ0. We work under a natural set of conditions on the likelihood which implies the existence of a well-posed posterior measure, μy. Under these conditions, we show that the maximum a posteriori (MAP) estimator is well defined as the minimizer of an Onsager–Machlup functional defined on the Cameron–Martin space of the prior; thus, we link a problem in probability with a problem in the calculus of variations. We then consider the case where the observational noise vanishes and establish a form of Bayesian posterior consistency for the MAP estimator. We also prove a similar result for the case where the observation of $\mathcal {G}(u)$ can be repeated as many times as desired with independent identically distributed noise. The theory is illustrated with examples from an inverse problem for the Navier–Stokes equation, motivated by problems arising in weather forecasting, and from the theory of conditioned diffusions, motivated by problems arising in molecular dynamics.
TL;DR: It is shown that, for certain classes of admissible inputs, the existence of an ISS-Lyapunov function implies the ISS of a system, and a linearization principle is proved that allows the construction of a local ISS-Lyapunov function for a system whose linear approximation is ISS.
Abstract: We develop tools for investigation of input-to-state stability (ISS) of infinite-dimensional control systems. We show that for certain classes of admissible inputs, the existence of an ISS-Lyapunov function implies the ISS of a system. Then for the case of systems described by abstract equations in Banach spaces, we develop two methods of construction of local and global ISS-Lyapunov functions. We prove a linearization principle that allows a construction of a local ISS-Lyapunov function for a system, the linear approximation of which is ISS. In order to study the interconnections of nonlinear infinite-dimensional systems, we generalize the small-gain theorem to the case of infinite-dimensional systems and provide a way to construct an ISS-Lyapunov function for an entire interconnection, if ISS-Lyapunov functions for subsystems are known and the small-gain condition is satisfied. We illustrate the theory on examples of linear and semilinear reaction-diffusion equations.
TL;DR: The numerical experiments show that SO-MI reaches significantly better results than the other algorithms when the number of function evaluations is very restricted (200-300 evaluations), and the algorithm converges to the global optimum almost surely.
TL;DR: In this paper, when the short-term interest rate is considered as a random variable, there is an unknown function λ(r, t), called the market price of risk, in the governing equation.
Abstract: As pointed out in Sect. 2.3, when the short-term interest rate is considered as a random variable, there is an unknown function λ(r, t), called the market price of risk, in the governing equation.
TL;DR: This work proposes LSE, an algorithm that guides both sampling and classification based on GP-derived confidence bounds, and extends LSE and its theory to two more natural settings: (1) where the threshold level is implicitly defined as a percentage of the (unknown) maximum of the target function and (2) where samples are selected in batches.
Abstract: Many information gathering problems require determining the set of points, for which an unknown function takes value above or below some given threshold level. We formalize this task as a classification problem with sequential measurements, where the unknown function is modeled as a sample from a Gaussian process (GP). We propose LSE, an algorithm that guides both sampling and classification based on GP-derived confidence bounds, and provide theoretical guarantees about its sample complexity. Furthermore, we extend LSE and its theory to two more natural settings: (1) where the threshold level is implicitly defined as a percentage of the (unknown) maximum of the target function and (2) where samples are selected in batches. We evaluate the effectiveness of our proposed methods on two problems of practical interest, namely autonomous monitoring of algal populations in a lake environment and geolocating network latency.
TL;DR: In this paper, bounds on the Taylor-Maclaurin coefficients |a2| and |a3| of the function f are obtained when f is in the following subclasses: SΣ(λ,γ;ϕ), HSΣ(α), RΣ(η,γ;ϕ) and BΣ(μ;ϕ).
Abstract: In this paper, we introduce and investigate each of the following subclasses: SΣ(λ,γ;ϕ), HSΣ(α), RΣ(η,γ;ϕ) and BΣ(μ;ϕ) (0 λ 1; γ ∈ ℂ∖{0}; α ∈ ℂ; 0 η 0, and ϕ(D) is symmetric with respect to the real axis. We obtain coefficient bounds involving the Taylor-Maclaurin coefficients |a2| and |a3| of the function f when f is in these classes. The various results, which are presented in this paper, would generalize and improve those in related works of several earlier authors.
TL;DR: The Gaussian Process Upper Confidence Bound and Pure Exploration (GP-UCB-PE) algorithm combines the UCB strategy and pure exploration in the same batch of evaluations along the parallel iterations.
Abstract: In this paper, we consider the challenge of maximizing an unknown function f for which evaluations are noisy and are acquired with high cost. An iterative procedure uses the previous measures to actively select the next estimation of f which is predicted to be the most useful. We focus on the case where the function can be evaluated in parallel with batches of fixed size and analyze the benefit compared to the purely sequential procedure in terms of cumulative regret. We introduce the Gaussian Process Upper Confidence Bound and Pure Exploration algorithm (GP-UCB-PE) which combines the UCB strategy and Pure Exploration in the same batch of evaluations along the parallel iterations. We prove theoretical upper bounds on the regret with batches of size K for this procedure which show the improvement of the order of $\sqrt{K}$ for fixed iteration cost over purely sequential versions. Moreover, the multiplicative constants involved have the property of being dimension-free. We also confirm empirically the efficiency of GP-UCB-PE on real and synthetic problems compared to state-of-the-art competitors.
TL;DR: The authors present the three-loop remainder function, which describes the scattering of six gluons in the maximally-helicity-violating configuration in planar N=4 super-Yang-Mills theory, as a function of the three dual conformal cross ratios.
Abstract: We present the three-loop remainder function, which describes the scattering of six gluons in the maximally-helicity-violating configuration in planar N=4 super-Yang-Mills theory, as a function of the three dual conformal cross ratios. The result can be expressed in terms of multiple Goncharov polylogarithms. We also employ a more restricted class of "hexagon functions" which have the correct branch cuts and certain other restrictions on their symbols. We classify all the hexagon functions through transcendental weight five, using the coproduct for their Hopf algebra iteratively, which amounts to a set of first-order differential equations. The three-loop remainder function is a particular weight-six hexagon function, whose symbol was determined previously. The differential equations can be integrated numerically for generic values of the cross ratios, or analytically in certain kinematic limits, including the near-collinear and multi-Regge limits. These limits allow us to impose constraints from the operator product expansion and multi-Regge factorization directly at the function level, and thereby to fix uniquely a set of Riemann-zeta-valued constants that could not be fixed at the level of the symbol. The near-collinear limits agree precisely with recent predictions by Basso, Sever and Vieira based on integrability. The multi-Regge limits agree with the factorization formula of Fadin and Lipatov, and determine three constants entering the impact factor at this order. We plot the three-loop remainder function for various slices of the Euclidean region of positive cross ratios, and compare it to the two-loop one. For large ranges of the cross ratios, the ratio of the three-loop to the two-loop remainder function is relatively constant, and close to -7.
TL;DR: In this article, the authors propose a sampling approach to estimate the distribution of the extreme value of the stochastic process; the extreme value then replaces the corresponding stochastic process, so that the time-dependent reliability analysis is converted into its time-invariant counterpart.
Abstract: Maintaining high accuracy and efficiency is a challenging issue in time-dependent reliability analysis. In this work, an accurate and efficient method is proposed for limit-state functions with the following features: The limit-state function is implicit with respect to time. There is only one stochastic process in the input to the limit-state function. The stochastic process could be either a general strength or a general stress variable so that the limit-state function is monotonic in the stochastic process. The new method employs a sampling approach to estimate the distributions of the extreme value of the stochastic process. The extreme value is then used to replace the corresponding stochastic process. Consequently, the time-dependent reliability analysis is converted into its time-invariant counterpart. The commonly used time-invariant reliability method, the first-order reliability method, is then applied to calculate the probability of failure over a given period of time. The results show that the proposed method significantly improves the accuracy and efficiency of time-dependent reliability analysis. [DOI: 10.1115/1.4023925]
TL;DR: This Part I of two articles introduces the new time-correlation function and derives its t → 0+ limit; Part II will show that the RPMD-TST rate is equal to the exact quantum rate in the absence of recrossing.
Abstract: Surprisingly, there exists a quantum flux-side time-correlation function which has a non-zero short-time (t → 0+) limit, and thus yields a rigorous quantum generalization of classical transition-state theory (TST). In this Part I of two articles, we introduce the new time-correlation function, and derive its short-time limit. The new ingredient is a generalized Kubo transform which allows the flux and side dividing surfaces to be the same function of path-integral space. Choosing this common dividing surface to be a single point gives a short-time limit which is identical to an expression introduced on heuristic grounds by Wigner in 1932, but which does not give positive-definite quantum statistics, causing it to fail while still in the shallow-tunnelling regime. Choosing the dividing surface to be invariant to imaginary-time translation gives, uniquely, a short-time limit that gives the correct positive-definite quantum statistics at all temperatures, and which is identical to ring-polymer molecular dynamics (RPMD) TST. We find that the RPMD-TST rate is not a strict upper bound to the exact quantum rate, but a good approximation to one if real-time coherence effects are small. Part II will show that the RPMD-TST rate is equal to the exact quantum rate in the absence of recrossing.
TL;DR: Gentry's bootstrapping technique constructs a fully homomorphic encryption (FHE) scheme from a "somewhat homomorphic" one that is powerful enough to evaluate its own decryption function; the current state of the art for bootstrapping is due to Gentry, Halevi, and Smart.
Abstract: Gentry’s “bootstrapping” technique (STOC 2009) constructs a fully homomorphic encryption (FHE) scheme from a “somewhat homomorphic” one that is powerful enough to evaluate its own decryption function. To date, it remains the only known way of obtaining unbounded FHE. Unfortunately, bootstrapping is computationally very expensive, despite the great deal of effort that has been spent on improving its efficiency. The current state of the art, due to Gentry, Halevi, and Smart (PKC 2012), is able to bootstrap “packed” ciphertexts (which encrypt up to a linear number of bits) in time only quasilinear Õ(λ) = λ · log^{O(1)} λ in the security parameter. While this performance is asymptotically optimal up to logarithmic factors, the practical import is less clear: the procedure composes multiple layers of expensive and complex operations, to the point where it appears very difficult to implement, and its concrete runtime appears worse than those of prior methods (all of which have quadratic or larger asymptotic runtimes).
TL;DR: This paper model the robust loop-closure pose-graph SLAM problem as a Bayesian network and shows that it can be solved with the Classification Expectation-Maximization (EM) algorithm, and shows proofs of the conceptual similarity between the EM algorithm and the M-Estimator.
Abstract: In this paper, we model the robust loop-closure pose-graph SLAM problem as a Bayesian network and show that it can be solved with the Classification Expectation-Maximization (EM) algorithm. In particular, we express our robust pose-graph SLAM as a Bayesian network where the robot poses and constraints are latent and observed variables. An additional set of latent variables is introduced as weights for the loop-constraints. We show that the weights can be chosen as the Cauchy function, which are iteratively computed from the errors between the predicted robot poses and observed loop-closure constraints in the Expectation step, and used to weigh the cost functions from the pose-graph loop-closure constraints in the Maximization step. As a result, outlier loop-closure constraints are assigned low weights and exert less influences in the pose-graph optimization within the EM iterations. To prevent the EM algorithm from getting stuck at local minima, we perform the EM algorithm multiple times where the loop constraints with very low weights are removed after each EM process. This is repeated until there are no more changes to the weights. We show proofs of the conceptual similarity between our EM algorithm and the M-Estimator. Specifically, we show that the weight function in our EM algorithm is equivalent to the robust residual function in the M-Estimator. We verify our proposed algorithm with experimental results from multiple simulated and real-world datasets, and comparisons with other existing works.
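The alternation between Cauchy weights computed from current errors and a weighted least-squares update can be illustrated on a toy problem. This is a minimal sketch that swaps the pose graph for a single scalar estimated from measurements containing two gross outliers; the scale c = 1, the iteration count, and the function names are assumptions, and the paper's outer loop (removing very-low-weight constraints and re-running EM) is not shown.

```python
import numpy as np

def cauchy_weights(residuals, c):
    """Cauchy weight w(r) = 1 / (1 + (r / c)^2): large residuals receive small weights."""
    return 1.0 / (1.0 + (residuals / c) ** 2)

def em_style_robust_estimate(measurements, c, iters=50):
    """Alternate weight computation and weighted least squares for a scalar estimate."""
    x = float(np.mean(measurements))  # deliberately non-robust starting point
    weights = np.ones_like(measurements)
    for _ in range(iters):
        weights = cauchy_weights(measurements - x, c)                # E-step analogue: weights from current errors
        x = float(np.sum(weights * measurements) / np.sum(weights))  # M-step analogue: weighted least squares
    return x, weights

inliers = 5.0 + np.array([-0.2, -0.1, 0.0, 0.1, 0.2] * 2)
measurements = np.concatenate([inliers, [50.0, 60.0]])  # two gross outliers
x_hat, weights = em_style_robust_estimate(measurements, c=1.0)
print(x_hat, np.round(weights, 4))
```

The same update is iteratively reweighted least squares for the Cauchy M-estimator, which reflects the equivalence between the EM weights and the robust M-Estimator noted in the abstract.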
TL;DR: In this paper, computing the trace distance discord of a general two-qubit state is reduced to the minimization of an explicit two-variable function, which is then minimized analytically for several relevant classes, including quantum-classical and X states.
Abstract: It is known that a reliable geometric quantifier of discord-like correlations can be built by employing the so-called trace distance. This is used to measure how far the state under investigation is from the closest "classical-quantum" one. To date, the explicit calculation of this indicator for two qubits was accomplished only for states such that the reduced density matrix of the measured party is maximally mixed, a class that includes Bell-diagonal states. Here, we first reduce the required optimization for a general two-qubit state to the minimization of an explicit two-variable function. Using this framework, we show next that the minimum can be analytically worked out in a number of relevant cases including quantum-classical and X states. This provides an explicit and compact expression for the trace distance discord of an arbitrary state belonging to either of these important classes of density matrices.
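The quantity being minimized is the trace distance between the state and a classical-quantum state. The sketch below, assuming NumPy and the convention $D(\rho,\sigma) = \tfrac12\lVert\rho-\sigma\rVert_1$ (papers differ on whether the factor $\tfrac12$ is included), computes that distance and evaluates it for one particular classical-quantum state: the Bell state $|\Phi^+\rangle$ after dephasing qubit A in the computational basis. Because it uses a single candidate rather than the closest one, it only gives an upper bound on the discord; the helper names are hypothetical.

```python
import numpy as np


def trace_distance(rho, sigma):
    """Trace distance (1/2) * ||rho - sigma||_1 between Hermitian density matrices."""
    # For a Hermitian matrix the trace norm is the sum of |eigenvalues|.
    eigenvalues = np.linalg.eigvalsh(rho - sigma)
    return 0.5 * float(np.sum(np.abs(eigenvalues)))


def dephase_first_qubit(rho):
    """Classical-quantum state obtained by measuring qubit A in the computational basis."""
    out = np.zeros_like(rho)
    for i in range(2):
        proj = np.zeros((2, 2))
        proj[i, i] = 1.0
        p = np.kron(proj, np.eye(2))  # |i><i| on qubit A, identity on qubit B
        out += p @ rho @ p
    return out


# Bell state |Phi+> = (|00> + |11>) / sqrt(2).
phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
rho_bell = np.outer(phi, phi.conj())
# Distance to one classical-quantum state: an upper bound on the discord.
upper_bound = trace_distance(rho_bell, dephase_first_qubit(rho_bell))
```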
TL;DR: In this paper, the known formulations for steady-state hydraulics within looped water distribution networks are rederived in terms of linear and nonlinear transformations of the original set of partly linear and partly nonlinear equations that express conservation of mass and energy.
Abstract: The known formulations for steady-state hydraulics within looped water distribution networks are rederived in terms of linear and nonlinear transformations of the original set of partly linear and partly nonlinear equations that express conservation of mass and energy. All of these formulations lead to a system of nonlinear equations that can be linearized as a function of the chosen unknowns using either the Newton-Raphson (NR) or the linear theory (LT) approach. This produces a number of different algorithms, some of which are already known in the literature, whereas others were developed originally in this work. For the sake of clarity, all the different algorithms were rederived using the same analytical approach and a unified notation. They were all applied to the same test-case network with randomly perturbed demands to compare their convergence characteristics. The results show that all of the linearly transformed formulations have exactly the same convergence rate, whose value d...
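The difference between the NR and LT linearizations can be seen on the smallest looped network: two parallel pipes carrying a fixed total flow. This is a hypothetical toy, not the paper's test network or its algorithms; it assumes a quadratic head-loss law $h = r\,Q\lvert Q\rvert$ with constant resistances and a positive total flow. Mass conservation fixes $Q_2 = Q - Q_1$, and the loop energy equation $r_1 Q_1\lvert Q_1\rvert - r_2 Q_2\lvert Q_2\rvert = 0$ is solved for $Q_1$. NR uses the derivative of that equation; LT freezes $r\lvert Q\rvert$ at the previous iterate and solves the resulting linear equation, with the successive iterates averaged.

```python
def head_loss(r, q):
    """Quadratic resistance law h = r * q * |q|; the sign follows the flow direction."""
    return r * q * abs(q)


def split_newton(r1, r2, q_total, tol=1e-12, max_iter=50):
    """Newton-Raphson on the loop energy equation for two parallel pipes."""
    q1 = 0.5 * q_total  # initial guess: equal split
    for _ in range(max_iter):
        q2 = q_total - q1  # mass conservation at the junctions
        f = head_loss(r1, q1) - head_loss(r2, q2)  # energy balance around the loop
        df = 2 * r1 * abs(q1) + 2 * r2 * abs(q2)  # derivative of f with respect to q1
        step = f / df
        q1 -= step
        if abs(step) < tol:
            break
    return q1


def split_linear_theory(r1, r2, q_total, tol=1e-12, max_iter=200):
    """Linear theory: freeze r*|q| at the previous iterate and solve the linear equation."""
    q1 = 0.5 * q_total
    for _ in range(max_iter):
        q2 = q_total - q1
        k1, k2 = r1 * abs(q1), r2 * abs(q2)  # linearized pipe resistances
        # Solve k1 * q1_new = k2 * (q_total - q1_new) for q1_new.
        q1_new = k2 * q_total / (k1 + k2)
        # Averaging successive iterates damps the oscillation plain LT can show.
        q1_next = 0.5 * (q1 + q1_new)
        if abs(q1_next - q1) < tol:
            return q1_next
        q1 = q1_next
    return q1


q1_nr = split_newton(1.0, 4.0, 3.0)
q1_lt = split_linear_theory(1.0, 4.0, 3.0)
```

With $r_2 = 4 r_1$, the energy balance gives $Q_1 = 2 Q_2$, so both iterations should approach $Q_1 = 2$ for $Q = 3$. For this toy case, plain LT without averaging alternates between two values, which is why the averaged update is used here.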
TL;DR: In this article, the authors study the thermal partition function of level $k$ U(N) Chern-Simons theories on $S^2$ interacting with matter in the fundamental representation, in the 't Hooft limit.
Abstract: We study the thermal partition function of level $k$ U(N) Chern-Simons theories on $S^2$ interacting with matter in the fundamental representation. We work in the 't Hooft limit, $N,k\to\infty$, with $\lambda = N/k$ and $\frac{T^2 V_{2}}{N}$ held fixed where $T$ is the temperature and $V_{2}$ the volume of the sphere. An effective action proposed in arXiv:1211.4843 relates the partition function to the expectation value of a `potential' function of the $S^1$ holonomy in pure Chern-Simons theory; in several examples we compute the holonomy potential as a function of $\lambda$. We use level-rank duality of pure Chern-Simons theory to demonstrate the equality of thermal partition functions of previously conjectured dual pairs of theories as a function of the temperature. We reduce the partition function to a matrix integral over holonomies. The summation over flux sectors quantizes the eigenvalues of this matrix in units of ${2\pi \over k}$ and the eigenvalue density of the holonomy matrix is bounded from above by $\frac{1}{2 \pi \lambda}$. The corresponding matrix integrals generically undergo two phase transitions as a function of temperature. For several Chern-Simons-matter theories we are able to exactly solve the relevant matrix models in the low temperature phase, and determine the phase transition temperature as a function of $\lambda$. At low temperatures our partition function smoothly matches onto the $N$ and $\lambda$ independent free energy of a gas of non-renormalized multi-trace operators. We also find an exact solution to a simple toy matrix model: the large $N$ Gross-Witten-Wadia matrix integral subject to an upper bound on eigenvalue density.
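The upper bound on the eigenvalue density follows from a short counting argument, sketched here under two assumptions: the density is normalized as $\int \rho(\theta)\,d\theta = 1$, and no two of the $N$ eigenvalues occupy the same allowed site (the Vandermonde-type measure vanishes when eigenvalues coincide). An interval of width $\Delta\theta$ then contains $\Delta\theta/(2\pi/k)$ sites, each holding at most one eigenvalue:

```latex
\theta_i \in \frac{2\pi}{k}\,\mathbb{Z}, \qquad
\rho(\theta)\,\Delta\theta \;\le\; \frac{1}{N}\cdot\frac{\Delta\theta}{2\pi/k}
\;\;\Longrightarrow\;\;
\rho(\theta) \;\le\; \frac{k}{2\pi N} \;=\; \frac{1}{2\pi\lambda},
\qquad \lambda = \frac{N}{k}.
```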
TL;DR: The algorithm, StoSOO, follows an optimistic strategy to iteratively construct upper confidence bounds over the hierarchical partitions of the function domain to decide which point to sample next, and a finite-time analysis shows that it performs almost as well as the best specifically-tuned algorithms even though the local smoothness of the function is not known.
Abstract: We study the problem of global maximization of a function f given a finite number of evaluations perturbed by noise. We consider a very weak assumption on the function, namely that it is locally smooth (in some precise sense) with respect to some semi-metric, around one of its global maxima. Compared to previous works on bandits in general spaces (Kleinberg et al., 2008; Bubeck et al., 2011a) our algorithm does not require the knowledge of this semi-metric. Our algorithm, StoSOO, follows an optimistic strategy to iteratively construct upper confidence bounds over the hierarchical partitions of the function domain to decide which point to sample next. A finite-time analysis of StoSOO shows that it performs almost as well as the best specifically-tuned algorithms even though the local smoothness of the function is not known.
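The optimistic sweep over a hierarchical partition can be sketched in one dimension. This is a simplified illustration in the spirit of StoSOO, not the authors' reference implementation: the domain is assumed to be $[0, 1]$ split into three equal children per cell, each cell center is sampled `k` times before the cell may be expanded, and the defaults for `k`, `delta`, and the depth limit `h_max` are illustrative assumptions. At each depth, the leaf with the largest upper confidence bound is either sampled again or expanded, but only if its bound is at least as large as that of the cell expanded at a shallower depth during the same sweep. No knowledge of the function's smoothness enters the procedure.

```python
import math


class Node:
    """A cell [lo, hi] of the ternary hierarchical partition of [0, 1]."""

    def __init__(self, depth, lo, hi):
        self.depth, self.lo, self.hi = depth, lo, hi
        self.center = 0.5 * (lo + hi)
        self.total = 0.0  # sum of (possibly noisy) evaluations at the center
        self.count = 0  # number of evaluations at the center

    def mean(self):
        return self.total / self.count


def stosoo(f, n_evals, k=3, delta=None):
    """Simplified StoSOO-style maximizer of a possibly noisy f on [0, 1]."""
    delta = delta if delta is not None else 1.0 / math.sqrt(n_evals)
    h_max = int(math.sqrt(n_evals / k))
    log_term = math.log(n_evals * k / delta)

    def ucb(node):
        # Unsampled cells get an infinite bound so that they are sampled first.
        if node.count == 0:
            return math.inf
        return node.mean() + math.sqrt(log_term / (2 * node.count))

    leaves = [Node(0, 0.0, 1.0)]
    created = list(leaves)  # every cell ever created, expanded or not
    t = 0
    while t < n_evals:
        progress = False
        v_max = -math.inf
        for h in range(h_max + 1):
            if t >= n_evals:
                break
            at_depth = [n for n in leaves if n.depth == h]
            if not at_depth:
                continue
            node = max(at_depth, key=ucb)
            b = ucb(node)
            if b < v_max:
                continue  # a shallower cell looked more promising in this sweep
            if node.count < k:
                node.total += f(node.center)
                node.count += 1
                t += 1
                progress = True
            elif h < h_max:
                # Expand: replace the cell by its three equal-width children.
                w = (node.hi - node.lo) / 3
                children = [Node(h + 1, node.lo + i * w, node.lo + (i + 1) * w) for i in range(3)]
                leaves.remove(node)
                leaves.extend(children)
                created.extend(children)
                v_max = b
                progress = True
        if not progress:
            break  # all reachable cells are fully sampled at the depth limit
    # Recommend the center of the fully sampled cell with the best empirical mean.
    done = [n for n in created if n.count == k]
    return max(done, key=lambda n: n.mean()).center


best_x = stosoo(lambda x: -(x - 0.7) ** 2, n_evals=300)
```

The usage line uses a noise-free objective for reproducibility; a noisy `f` (for example one that adds random noise to each evaluation) can be passed in the same way, and the `k` repeated samples per cell are what average that noise out.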
TL;DR: The algorithm of Raessi and Pitsch is modified from a staggered grid method to a collocated grid method, and their treatment of the nonlinear terms is combined with the variable-density, collocated, pressure-projection algorithm developed by Kwatra et al.
Abstract: A coupled level set and moment of fluid method (CLSMOF) is described for computing solutions to incompressible two-phase flows. The local piecewise linear interface reconstruction (the CLSMOF reconstruction) uses information from the level set function, volume of fluid function, and reference centroid, in order to produce a slope and an intercept for the local reconstruction. The level set function is coupled to the volume-of-fluid function and reference centroid by being maintained as the signed distance to the CLSMOF piecewise linear reconstructed interface.
The nonlinear terms in the momentum equations are solved using the sharp interface approach recently developed by Raessi and Pitsch (Annual Research Brief, 2009). We have modified the algorithm of Raessi and Pitsch from a staggered grid method to a collocated grid method, and we combine their treatment of the nonlinear terms with the variable-density, collocated, pressure-projection algorithm developed by Kwatra et al. (J. Comput. Phys. 228:4146-4161, 2009). A collocated grid method makes it convenient to use block-structured adaptive mesh refinement (AMR) grids. Many 2D and 3D numerical simulations of bubbles, jets, drops, and waves on a block-structured adaptive grid are presented to demonstrate the capabilities of our new method.
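One ingredient of a piecewise linear interface reconstruction is finding the intercept of the line that reproduces a cell's volume fraction, together with the centroid of the resulting fluid region. The sketch below covers only that step, for a unit square cell and a given normal; it is not the authors' CLSMOF method. A moment-of-fluid reconstruction would additionally choose the normal so that the computed centroid matches the reference centroid, and CLSMOF uses the level set function as well. The function names, the Sutherland-Hodgman clipping, and the bisection on the intercept are assumptions chosen for illustration.

```python
def clip_half_plane(poly, n, b):
    """Sutherland-Hodgman clip of a polygon (list of (x, y)) to the half-plane n.x <= b."""
    out = []
    m = len(poly)
    for i in range(m):
        p, q = poly[i], poly[(i + 1) % m]
        dp = n[0] * p[0] + n[1] * p[1] - b
        dq = n[0] * q[0] + n[1] * q[1] - b
        if dp <= 0:
            out.append(p)  # p lies inside the fluid region
        if dp * dq < 0:
            s = dp / (dp - dq)  # the edge crosses the interface line
            out.append((p[0] + s * (q[0] - p[0]), p[1] + s * (q[1] - p[1])))
    return out


def area_centroid(poly):
    """Area and centroid of a simple polygon via the shoelace formula."""
    if len(poly) < 3:
        return 0.0, (0.0, 0.0)
    a = cx = cy = 0.0
    for i in range(len(poly)):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % len(poly)]
        cross = x0 * y1 - x1 * y0
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5
    if a == 0.0:
        return 0.0, (0.0, 0.0)
    return a, (cx / (6 * a), cy / (6 * a))


UNIT_SQUARE = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]


def reconstruct_intercept(n, fraction, iters=100):
    """Bisect on b so that the region n.x <= b covers `fraction` of the unit cell."""
    values = [n[0] * x + n[1] * y for x, y in UNIT_SQUARE]
    lo, hi = min(values), max(values)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        area, _ = area_centroid(clip_half_plane(UNIT_SQUARE, n, mid))
        if area < fraction:
            lo = mid
        else:
            hi = mid
    b = 0.5 * (lo + hi)
    _, centroid = area_centroid(clip_half_plane(UNIT_SQUARE, n, b))
    return b, centroid


b, centroid = reconstruct_intercept((1.0, 0.0), 0.3)
```

For a vertical interface with normal $(1, 0)$ and volume fraction 0.3, the intercept should be $b = 0.3$ and the fluid centroid $(0.15, 0.5)$; in a moment-of-fluid setting, that centroid is what would be compared against the reference centroid.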