TL;DR: In this article, an incremental majorization-minimization scheme for minimizing a large sum of continuous functions is proposed, where the upper bounds approximate the objective up to a smooth error; such upper bounds are called first-order surrogate functions.
Abstract: Majorization-minimization algorithms consist of successively minimizing a sequence of upper bounds of the objective function. These upper bounds are tight at the current estimate, and each iteration monotonically drives the objective function downhill. Such a simple principle is widely applicable and has been very popular in various scientific fields, especially in signal processing and statistics. We propose an incremental majorization-minimization scheme for minimizing a large sum of continuous functions, a problem of utmost importance in machine learning. We present convergence guarantees for nonconvex and convex optimization when the upper bounds approximate the objective up to a smooth error; we call such upper bounds “first-order surrogate functions.” More precisely, we study asymptotic stationary point guarantees for nonconvex problems, and for convex ones, we provide convergence rates for the expected objective function value. We apply our scheme to composite optimization and obtain a new incremental proximal gradient algorithm with linear convergence rate for strongly convex functions. Our experiments show that our method is competitive with the state of the art for solving machine learning problems such as logistic regression when the number of training samples is large enough, and we demonstrate its usefulness for sparse estimation with nonconvex penalties.
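A minimal sketch of the majorization-minimization principle described above, in its plain batch form rather than the incremental scheme the abstract proposes. It assumes an l1-regularized least-squares objective and a quadratic upper bound built from an over-estimate of the gradient's Lipschitz constant; the function names and the toy data are illustrative and do not come from the paper.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |.| applied to a scalar."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def objective(A, b, x, lam):
    """0.5 * ||Ax - b||^2 + lam * ||x||_1 for list-of-lists A."""
    residual = [sum(a * xj for a, xj in zip(row, x)) - bi for row, bi in zip(A, b)]
    return 0.5 * sum(r * r for r in residual) + lam * sum(abs(xj) for xj in x)


def mm_lasso(A, b, lam, n_iter=200):
    """Batch MM: repeatedly minimize a quadratic upper bound tight at the current x."""
    n = len(A[0])
    # The squared Frobenius norm upper-bounds the Lipschitz constant of the
    # gradient of the smooth part, so the surrogate below is a valid majorizer.
    L = sum(a * a for row in A for a in row)
    x = [0.0] * n
    history = [objective(A, b, x, lam)]
    for _ in range(n_iter):
        residual = [sum(a * xj for a, xj in zip(row, x)) - bi for row, bi in zip(A, b)]
        grad = [sum(A[i][j] * residual[i] for i in range(len(A))) for j in range(n)]
        # Closed-form minimizer of the surrogate, i.e. a proximal gradient step.
        x = [soft_threshold(xj - g / L, lam / L) for xj, g in zip(x, grad)]
        history.append(objective(A, b, x, lam))
    return x, history


# Hypothetical toy problem: 3 observations, 2 unknowns.
A = [[1.0, 0.5], [0.2, 1.5], [1.1, -0.3]]
b = [1.0, 2.0, 0.5]
x_hat, history = mm_lasso(A, b, lam=0.1)
```

Because each surrogate lies above the objective and touches it at the current iterate, the recorded objective values in `history` should never increase, which is the monotone descent property the abstract refers to.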
TL;DR: In this paper, the authors propose block-coordinate fixed point algorithms with applications to nonlinear analysis and optimization in Hilbert spaces, based on a notion of stochastic quasi-Fejér monotonicity.
Abstract: This work proposes block-coordinate fixed point algorithms with applications to nonlinear analysis and optimization in Hilbert spaces. The asymptotic analysis relies on a notion of stochastic quasi-Fejer monotonicity, which is thoroughly investigated. The iterative methods under consideration feature random sweeping rules to select arbitrarily the blocks of variables that are activated over the course of the iterations and they allow for stochastic errors in the evaluation of the operators. Algorithms using quasi-nonexpansive operators or compositions of averaged nonexpansive operators are constructed, and weak and strong convergence results are established for the sequences they generate. As a by-product, novel block-coordinate operator splitting methods are obtained for solving structured monotone inclusion and convex minimization problems. In particular, the proposed framework leads to random block-coordinate versions of the Douglas--Rachford and forward-backward algorithms and of some of their variant...
TL;DR: The results demonstrate that variable-order fractional derivatives can be used not only to model the physics of anomalous transport with spatiotemporal variability but also as new effective numerical tools that can deal with the long-standing issues of outflow boundary conditions and monotonicity of integer-order PDEs.
TL;DR: A novel generalized policy iteration algorithm for solving optimal control problems for discrete-time nonlinear systems by using an iterative adaptive dynamic programming algorithm to obtain iterative control laws which make the iterative value functions converge to the optimum.
Abstract: This paper is concerned with a novel generalized policy iteration algorithm for solving optimal control problems for discrete-time nonlinear systems. The idea is to use an iterative adaptive dynamic programming algorithm to obtain iterative control laws which make the iterative value functions converge to the optimum. Initialized by an admissible control law, it is shown that the iterative value functions are monotonically nonincreasing and converge to the optimal solution of Hamilton–Jacobi–Bellman equation, under the assumption that a perfect function approximation is employed. The admissibility property is analyzed, which shows that any of the iterative control laws can stabilize the nonlinear system. Neural networks are utilized to implement the generalized policy iteration algorithm, by approximating the iterative value function and computing the iterative control law, respectively, to achieve approximate optimal control. Finally, numerical examples are presented to verify the effectiveness of the present generalized policy iteration algorithm.
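The monotone nonincreasing behaviour of the iterative value functions can be illustrated with classical tabular policy iteration on a toy finite system. This is a sketch under stated assumptions, not the paper's neural-network implementation: the six-state chain, the action set, and the quadratic stage cost are all hypothetical.

```python
ACTIONS = (-2, -1, 0)
N_STATES = 6  # states 0..5; state 0 is the zero-cost equilibrium


def step(s, a):
    """Deterministic dynamics: move left by |a|, clipped at state 0."""
    return max(s + a, 0)


def cost(s, a):
    """Stage cost penalizing both the state and the control effort."""
    return s * s + a * a


def evaluate(policy):
    """Value of a deterministic policy; N_STATES sweeps are exact once it reaches 0."""
    V = [0] * N_STATES
    for _ in range(N_STATES):
        V = [cost(s, policy[s]) + V[step(s, policy[s])] for s in range(N_STATES)]
    return V


def improve(V):
    """Greedy control law with respect to the current value function."""
    return [min(ACTIONS, key=lambda a: cost(s, a) + V[step(s, a)])
            for s in range(N_STATES)]


def policy_iteration(policy):
    """Alternate evaluation and improvement until the control law stops changing."""
    values = [evaluate(policy)]
    while True:
        new_policy = improve(values[-1])
        if new_policy == policy:
            return policy, values
        policy = new_policy
        values.append(evaluate(policy))


# Admissible initial control law: always move one step toward the origin.
initial_policy = [0, -1, -1, -1, -1, -1]
final_policy, values = policy_iteration(initial_policy)
```

Starting from the admissible law, each entry of `values` is elementwise no larger than the previous one, and the loop stops once the greedy law reproduces itself.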
TL;DR: In this article, the Lyapunov stability and boundedness of motions of infinite-dimensional dynamical systems determined by differential equations defined on Banach spaces and by semigroups with an emphasis on the qualitative properties of equilibria were studied.
Abstract: We address the Lyapunov stability and the boundedness of motions (Lagrange stability) of infinite-dimensional dynamical systems determined by differential equations defined on Banach spaces and by semigroups with an emphasis on the qualitative properties of equilibria. We consider continuous as well as discontinuous dynamical systems (DDS). Most of the results involve monotonic Lyapunov functions. However, some of the stability results for DDS involve non-monotonic Lyapunov functions as well.
TL;DR: This work extends the structural risk minimization framework of lattice regression to monotonic functions by adding linear inequality constraints, and proposes jointly learning interpretable calibrations of each feature to normalize continuous features and handle categorical or missing data.
Abstract: Real-world machine learning applications may require functions that are fast-to-evaluate and interpretable. In particular, guaranteed monotonicity of the learned function can be critical to user trust. We propose meeting these goals for low-dimensional machine learning problems by learning flexible, monotonic functions using calibrated interpolated look-up tables. We extend the structural risk minimization framework of lattice regression to train monotonic look-up tables by solving a convex problem with appropriate linear inequality constraints. In addition, we propose jointly learning interpretable calibrations of each feature to normalize continuous features and handle categorical or missing data, at the cost of making the objective non-convex. We address large-scale learning through parallelization, mini-batching, and propose random sampling of additive regularizer terms. Case studies with real-world problems with five to sixteen features and thousands to millions of training samples demonstrate the proposed monotonic functions can achieve state-of-the-art accuracy on practical problems while providing greater transparency to users.
TL;DR: In this article, two new monotonicity concepts for a nonnegative or nonpositive valued function defined on a discrete domain were introduced, and the mean value theorem on discrete fractional calculus was proved.
Abstract: In this paper, we introduce two new monotonicity concepts for a nonnegative or nonpositive valued function defined on a discrete domain. We give examples to illustrate connections between these new monotonicity concepts and the traditional ones. We then prove some monotonicity criteria based on the sign of the fractional difference operator of a function $f$, $\Delta^{\nu} f$ with $0 < \nu < 1$. As an application, we state and prove the mean value theorem on discrete fractional calculus.
TL;DR: A novel iterative Q-learning algorithm is developed to solve the optimal control problems for discrete-time deterministic nonlinear systems by using an iterative adaptive dynamic programming (ADP) technique to construct the iterative control law which optimizes the iterative Q function.
Abstract: In this chapter, a novel iterative Q-learning algorithm, called “policy iteration-based deterministic Q-learning algorithm,” is developed to solve the optimal control problems for discrete-time deterministic nonlinear systems. The idea is to use an iterative adaptive dynamic programming (ADP) technique to construct the iterative control law which optimizes the iterative Q function. When the optimal Q function is obtained, the optimal control law can be achieved by directly minimizing the optimal Q function, where the mathematical model of the system is not necessary. Convergence property is analyzed to show that the iterative Q function is monotonically nonincreasing and converges to the solution of the optimality equation. It is also proven that any of the iterative control laws is a stable control law. Neural networks are used to implement the policy iteration-based deterministic Q-learning algorithm, by approximating the iterative Q function and the iterative control law, respectively. Finally, two simulation examples are presented to illustrate the performance of the developed algorithm.
TL;DR: The objective of this work is to develop a method of fusing monotonic decision trees and establish a fusing principle based on maximal probability through combining the base classifiers, which is used to further improve the generalization ability of the learning system.
Abstract: Ordinal classification with a monotonicity constraint is a kind of classification task in which objects with better attribute values should not be assigned to a worse decision class. Several learning algorithms have been proposed to handle this kind of task in recent years. The rank entropy-based monotonic decision tree is very representative thanks to its better robustness and generalization. Ensemble learning is an effective strategy to significantly improve the generalization ability of machine learning systems. The objective of this work is to develop a method of fusing monotonic decision trees. In order to achieve this goal, we take two factors into account: attribute reduction and fusing principle. By introducing variable dominance rough sets, we first propose an attribute reduction approach with rank-preservation for learning base classifiers, which can effectively avoid overfitting and improve classification performance. Then, we establish a fusing principle based on maximal probability through combining the base classifiers, which is used to further improve the generalization ability of the learning system. The experimental analysis shows that the proposed fusing method can significantly improve the classification performance of the learning system constructed by monotonic decision trees.
TL;DR: Experimental results show that by using the relative decision entropy-based feature significance as heuristic information, FSMRDE is efficient for feature selection and is able to achieve good scalability for large data sets.
TL;DR: This paper proposes a general framework for tensor block coordinate ascent methods for hypergraph matching and proposes two algorithms which both come along with the guarantee of monotonic ascent in the matching score on the set of discrete assignment matrices.
Abstract: The estimation of correspondences between two images or point sets is a core problem in computer vision. One way to formulate the problem is graph matching, leading to the quadratic assignment problem, which is NP-hard. Several so-called second order methods have been proposed to solve this problem. In recent years hypergraph matching, leading to a third order problem, became popular as it allows for better integration of geometric information. For most of these third order algorithms no theoretical guarantees are known. In this paper we propose a general framework for tensor block coordinate ascent methods for hypergraph matching. We propose two algorithms which both come with the guarantee of monotonic ascent in the matching score on the set of discrete assignment matrices. In the experiments we show that our new algorithms outperform previous work both in terms of achieving better matching scores and matching accuracy. This holds in particular for very challenging settings where one has a high number of outliers and other forms of noise.
TL;DR: In this paper, the authors study a dynamic principal-agent model in which the agent's types are serially correlated and explore the conditions under which this approach is valid and can be used to characterize the profit maximizing contract.
Abstract: We study a dynamic principal-agent model in which the agent's types are serially correlated. In these models, the standard approach consists of first solving a relaxed version in which only local incentive compatibility constraints are considered, and then in proving that the local constraints are sufficient for implementability. We explore the conditions under which this approach is valid and can be used to characterize the profit maximizing contract. We show that the approach works when the optimal allocation in the relaxed problem is monotonic in the types, a condition that is satisfied in most solved examples. Contrary to the static model, however, monotonicity is generally violated in many interesting economic environments. Moreover, when the time horizon is long enough and serial correlation is sufficiently high, global incentive compatibility constraints are generically binding. By fully characterizing a simple two period example, we uncover a number of interesting features of the optimal contract that cannot be observed in spatial environments in which the standard approach works. Finally, we show that even in complex environments, approximately optimal allocations can be easily characterized by focusing on a particular class of contracts in which the allocation is forced to be monotonic.
TL;DR: It is shown that a real z is ML random if and only if every computable function of bounded variation is differentiable at z, and similarly for absolutely continuous functions.
Abstract: We characterize some major algorithmic randomness notions via differentiability of effective functions. (1) We show that a real number z ∈ [0, 1] is computably random if and only if every nondecreasing computable function [0, 1] → R is differentiable at z. (2) We prove that a real number z ∈ [0, 1] is weakly 2-random if and only if every almost everywhere differentiable computable function [0, 1] → R is differentiable at z. (3) Recasting results of the constructivist Demuth from 1975 in classical language, we show that a real z is Martin-Löf random if and only if every computable function of bounded variation is differentiable at z, and similarly for absolutely continuous functions. We also use the analytic methods to show that computable randomness of a real is base invariant, and to derive preservation results for randomness notions.
TL;DR: In this article, existence results on nonlinear first order and doubly nonlinear second order evolution equations involving the fractional p-Laplacian are presented, which do not exploit any monotonicity assumption but rely on a compactness argument in combination with regularity of the Galerkin scheme and the nonlocal character of the operator.
Abstract: In this work existence results on nonlinear first order as well as doubly nonlinear second order evolution equations involving the fractional p-Laplacian are presented. The proofs do not exploit any monotonicity assumption but rely on a compactness argument in combination with regularity of the Galerkin scheme and the nonlocal character of the operator.
TL;DR: It is deduced that the trees produced by the Random Forest also hold the monotonicity restriction but achieve a slightly better predictive performance than standard algorithms.
Abstract: In classification problems, it is very common to have ordinal data in the variables, in both explanatory and class variables. When the class variable should increase according to a subset of explanatory variables, the problem must satisfy the monotonicity constraint. It is well known that standard classification tree algorithms, such as CART or C4.5, are not guaranteed to produce monotonic trees, even if the data set is completely monotone. Recently, some classifiers have been designed to handle these kinds of problems. In decision trees, growing and pruning mechanisms have been updated to improve the monotonicity of the trees. In this paper, we study the suitability of using these mechanisms in the generation of Random Forests. For this, we propose a simple ensemble pruning mechanism based on the degree of monotonicity of the resulting trees. The performance of several decision trees is evaluated through experimental studies on monotonic data sets. We deduce that the trees produced by the Random Forest also hold the monotonicity restriction but achieve a slightly better predictive performance than standard algorithms.
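The monotonicity constraint above can be made concrete with a small check over labeled examples. The sketch below assumes one simple definition of a "degree of monotonicity", namely the fraction of comparable pairs whose labels respect the attribute ordering; the paper's own measure may differ, and the function names and data are hypothetical.

```python
from itertools import combinations


def dominates(x, y):
    """True if x is componentwise less than or equal to y."""
    return all(a <= b for a, b in zip(x, y))


def monotonicity_degree(X, labels):
    """Fraction of comparable pairs whose labels follow the attribute order.

    Assumed definition for illustration: a pair is comparable if one attribute
    vector dominates the other, and consistent if the labels are ordered the
    same way (equal vectors must carry equal labels).
    """
    comparable = 0
    consistent = 0
    for (xi, yi), (xj, yj) in combinations(zip(X, labels), 2):
        le, ge = dominates(xi, xj), dominates(xj, xi)
        if not (le or ge):
            continue  # incomparable pairs impose no constraint
        comparable += 1
        if (not le or yi <= yj) and (not ge or yj <= yi):
            consistent += 1
    return consistent / comparable if comparable else 1.0


# Hypothetical ordinal data: the third example breaks monotonicity.
X = [(1, 1), (2, 1), (2, 3), (3, 3)]
labels = [0, 1, 0, 2]
degree = monotonicity_degree(X, labels)
```

Applied to the predictions of a single tree rather than to the raw labels, a score of this kind could serve as the ranking criterion for pruning an ensemble.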
TL;DR: In this article, an infinite-dimensional variational inequality (VI) formulation that is equivalent to the conditions defining a continuous-time E-DUE problem is established by applying a fixed-point existence theorem (Browder, 1968) in an extended Hilbert space.
Abstract: This paper is concerned with dynamic user equilibrium with elastic travel demand (E-DUE) when the trip demand matrix is determined endogenously. We present an infinite-dimensional variational inequality (VI) formulation that is equivalent to the conditions defining a continuous-time E-DUE problem. An existence result for this VI is established by applying a fixed-point existence theorem (Browder, 1968) in an extended Hilbert space. We present three computational algorithms based on the aforementioned VI and its re-expression as a differential variational inequality (DVI): a projection method, a self-adaptive projection method, and a proximal point method. Rigorous convergence results are provided for these methods, which rely on increasingly relaxed notions of generalized monotonicity, namely mixed strongly-weakly monotonicity for the projection method, pseudomonotonicity for the self-adaptive projection method, and quasimonotonicity for the proximal point method. These three algorithms are tested and their solution quality, convergence, and computational efficiency are compared. Our convergence results, which transcend the transportation applications studied here, apply to a broad family of VIs and DVIs, and are the weakest reported to date.
TL;DR: In this article, the surface tension of multi-component mixtures with the gradient theory of fluid interfaces is considered, and a linear transformation is constructed to reduce the Euler-Lagrange equations and a path function is introduced.
TL;DR: It is shown that PCDM is monotonic in expectation, which was not confirmed in [Richtárik and Takáč, Parallel coordinate descent methods for big data optimization, Math. Ser. A (2015), pp. 1–52], and the first high probability iteration complexity result where the initial levelset is unbounded is derived.
Abstract: In this work we study the parallel coordinate descent method (PCDM) proposed by Richtárik and Takáč [26] for minimizing a regularized convex function. We adopt elements from the work of Xiao and Lu [39], and combine them with several new insights, to obtain sharper iteration complexity results for PCDM than those presented in [26]. Moreover, we show that PCDM is monotonic in expectation, which was not confirmed in [26], and we also derive the first high probability iteration complexity result where the initial levelset is unbounded.
TL;DR: In this paper, the existence and uniqueness of positive solutions to a nonlinear arbitrary order fractional differential equation was studied using a mixed monotone operator method, and the authors provided an example to illustrate the main result.
Abstract: In this paper, using a mixed monotone operator method, we study the existence and uniqueness of positive solutions to a nonlinear arbitrary order fractional differential equation. An example is provided to illustrate the main result.
TL;DR: In this paper, the authors established inequalities, monotonicity, convexity, and unimodality for functions concerning the modified Bessel functions of the first kind and computed the completely monotonic degrees of differences between the exponential and trigamma functions.
Abstract: In the paper, the author establishes inequalities, monotonicity, convexity, and unimodality for functions concerning the modified Bessel functions of the first kind and computes the completely monotonic degrees of differences between the exponential and trigamma functions. Mathematics subject classification (2010): Primary 26A12; Secondary 26A48, 26A51, 26D15, 30D10, 30E20, 33B10, 33B15, 33C20, 42B10, 44A10.
TL;DR: In this article, a new hierarchy over monotone set functions, referred to as MPH (Maximum over Positive Hypergraphs), was introduced, and the authors showed that the maximum welfare problem can be approximated within a ratio of κ + 1 if all players hold valuation functions in MPH-κ.
Abstract: We introduce a new hierarchy over monotone set functions, that we refer to as MPH (Maximum over Positive Hypergraphs). Levels of the hierarchy correspond to the degree of complementarity in a given function. The highest level of the hierarchy, MPH-m (where m is the total number of items) captures all monotone functions. The lowest level, MPH-1, captures all monotone submodular functions, and more generally, the class of functions known as XOS. Every monotone function that has a positive hypergraph representation of rank κ (in the sense defined by Abraham, Babaioff, Dughmi and Roughgarden [EC 2012]) is in MPH-κ. Every monotone function that has supermodular degree κ (in the sense defined by Feige and Izsak [ITCS 2013]) is in MPH-(κ+1). In both cases, the converse direction does not hold, even in an approximate sense. We present additional results that demonstrate the expressive power of MPH-κ.
One can obtain good approximation ratios for some natural optimization problems, provided that functions are required to lie in low levels of the MPH hierarchy. We present two such applications. One shows that the maximum welfare problem can be approximated within a ratio of κ + 1 if all players hold valuation functions in MPH-κ. The other is an upper bound of 2κ on the price of anarchy of simultaneous first price auctions.
TL;DR: In this paper, the problem of finding the zeros of the sum of a maximally monotone operator and a Lipschitz continuous one in a real Hilbert space via an implicit forward-backward-forward dynamical system with nonconstant relaxation parameters and stepsizes of the resolvents was studied.
Abstract: In this paper, we approach the problem of finding the zeros of the sum of a maximally monotone operator and a monotone and Lipschitz continuous one in a real Hilbert space via an implicit forward-backward-forward dynamical system with nonconstant relaxation parameters and stepsizes of the resolvents. Besides proving existence and uniqueness of strong global solutions for the differential equation under consideration, we show weak convergence of the generated trajectories and, under strong monotonicity assumptions, strong convergence with exponential rate. In the particular setting of minimizing the sum of a proper, convex and lower semicontinuous function with a smooth convex one, we provide a rate for the convergence of the objective function along the ergodic trajectory to its minimum value.
TL;DR: The proposed wavelet thresholding function based on hyperbolic tangent function can achieve better denoising effect than the classical hard and soft thresholding functions under different signal types and noise intensities.
Abstract: The thresholding function is an important part of the wavelet threshold denoising method and can significantly influence the denoising effect. However, existing methods have some defects, such as function discontinuity, fixed bias, and parameters determined by trial and error. To solve these problems, a new wavelet thresholding function based on the hyperbolic tangent function is proposed in this paper. First, the basic properties of the hyperbolic tangent function are analyzed. Then, a new thresholding function with a shape parameter is presented based on the hyperbolic tangent function. The continuity, monotonicity, and high-order differentiability of the new function are theoretically proven. Finally, to determine the final form of the new function, a shape parameter optimization strategy based on the artificial fish swarm algorithm is given. Mean square error is adopted to construct the objective function, and the optimal shape parameter is obtained by iterative search. At the end of the paper, a simulation experiment is provided to verify the effectiveness of the new function. In the experiment, two benchmark signals are used as test signals. Simulation results show that the proposed function can achieve a better denoising effect than the classical hard and soft thresholding functions under different signal types and noise intensities.
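For reference, the two classical baselines named in the abstract can be written in a few lines. This sketch covers only the standard hard and soft thresholding rules; the proposed hyperbolic-tangent function is not reproduced because its formula is not given in the text above.

```python
def hard_threshold(w, t):
    """Keep coefficients whose magnitude exceeds t, zero the rest.

    The output jumps from 0 to +/-t at |w| = t, which is the discontinuity
    defect mentioned in the abstract.
    """
    return w if abs(w) > t else 0.0


def soft_threshold(w, t):
    """Shrink coefficients toward zero by t.

    The function is continuous, but every surviving coefficient is reduced by
    exactly t, which is the fixed-bias defect mentioned in the abstract.
    """
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


# Illustrative wavelet coefficients and threshold (hypothetical values).
coefficients = [-2.0, -0.5, 0.25, 1.5]
hard = [hard_threshold(w, 1.0) for w in coefficients]
soft = [soft_threshold(w, 1.0) for w in coefficients]
```

Both rules are monotone nondecreasing in `w`; a smooth alternative such as the one the paper proposes aims to keep that property while removing the jump and the constant bias.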
TL;DR: In this paper, the authors concisely survey and review some functions involving the gamma function and its various ratios, and find necessary and sufficient conditions for a new function involving the ratio of two gamma functions and originating from the coding gain to be logarithmically completely monotonic.
Abstract: In the paper, the authors concisely survey and review some functions involving the gamma function and its various ratios, simply state their logarithmically complete monotonicity and related results, and find necessary and sufficient conditions for a new function involving the ratio of two gamma functions and originating from the coding gain to be logarithmically completely monotonic.
TL;DR: In this paper, the authors established an integral representation of the Catalan numbers and connected the Catalan number with the (logarithmically) complete monotonicity of a function involving ratio of gamma functions.
Abstract: In the paper, the authors establish an integral representation of the Catalan numbers, connect the Catalan numbers with the (logarithmically) complete monotonicity, and pose an open problem on the logarithmically complete monotonicity of a function involving ratio of gamma functions.
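The link between Catalan numbers and ratios of gamma functions rests on the standard identity $C_n = \frac{4^n\,\Gamma(n + 1/2)}{\sqrt{\pi}\,\Gamma(n + 2)}$. The short check below compares it numerically with the binomial formula; it illustrates the identity only and is not the integral representation established in the paper.

```python
import math


def catalan_binomial(n):
    """Catalan number from the closed form C_n = binom(2n, n) / (n + 1)."""
    return math.comb(2 * n, n) // (n + 1)


def catalan_gamma(n):
    """Catalan number written as a ratio of gamma functions."""
    return 4.0 ** n * math.gamma(n + 0.5) / (math.sqrt(math.pi) * math.gamma(n + 2))


# Compare the two expressions for the first few indices.
pairs = [(catalan_binomial(n), catalan_gamma(n)) for n in range(10)]
```

Writing $C_n$ this way lets $n$ be replaced by a real variable, which is what makes questions of (logarithmically) complete monotonicity meaningful.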
TL;DR: In this paper, a scalarization result and a density theorem concerned with the sets of weakly efficient and efficient approximate solutions to a generalized vector equilibrium problem are given, respectively.
Abstract: In this paper, a scalarization result and a density theorem concerned with the sets of weakly efficient and efficient approximate solutions to a generalized vector equilibrium problem are given, respectively. By using the scalarization result and the density theorem, the connectedness of the sets of weakly efficient and efficient approximate solutions to the generalized vector equilibrium problem is established without the assumptions of monotonicity and compactness. The lower semicontinuity of weakly efficient and efficient approximate solution mappings to the parametric generalized vector equilibrium problem, under perturbation of both the objective mapping and the feasible region, is obtained without the assumptions of monotonicity and compactness. Furthermore, the upper semicontinuity of the weakly efficient approximate solution mapping and the Hausdorff upper semicontinuity of the efficient approximate solution mapping to the parametric generalized vector equilibrium problem, under perturbation of both the objective mapping and the feasible region, are also given under some suitable conditions.
TL;DR: In the steady-state model, the necessary conditions are shown to be satisfied, provided that the route cost vector is a continuously differentiable monotone function of the route flow vector, but continuous differentiability of the cost function is shown not to hold in the dynamic queueing model.
Abstract: The traffic assignment problem aims to calculate an equilibrium route flow vector, generally by seeking a zero of an appropriate objective function. If a continuous dynamical system follows a descent direction for this objective function at each nonequilibrium route flow vector, the system converges to equilibrium. It is shown that when this dynamical system is discretized with a fixed step length, the system eventually approaches close to equilibrium provided that the objective function is continuously differentiable and that the rate of descent is bounded below. The method of successive averages is widely used in traffic assignment; it has a decreasing step size at each iteration. With the same conditions as above, it is shown that the resulting dynamical system converges to equilibrium. In the steady-state model, the necessary conditions are shown to be satisfied, provided that the route cost vector is a continuously differentiable monotone function of the route flow vector. However, continuous differentiability of the cost function is shown not to hold in the dynamic queueing model.
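The method of successive averages mentioned above can be sketched on a toy network. The example assumes two parallel routes with linear, increasing (hence monotone) cost functions and a fixed demand; the network, the cost coefficients, and the function names are hypothetical and chosen only so that the equilibrium is easy to verify by hand.

```python
def route_costs(f1, demand):
    """Costs of two parallel routes as increasing functions of their own flow."""
    f2 = demand - f1
    return 1.0 + f1, 2.0 + 0.5 * f2


def method_of_successive_averages(demand=3.0, n_iter=2000):
    """Average all-or-nothing assignments with step size 1/k."""
    f1 = 0.0
    for k in range(1, n_iter + 1):
        c1, c2 = route_costs(f1, demand)
        # All-or-nothing target: send the whole demand to the cheaper route.
        y1 = demand if c1 <= c2 else 0.0
        # Decreasing step length 1/k moves the flow toward the target.
        f1 += (y1 - f1) / k
    return f1


demand = 3.0
f1 = method_of_successive_averages(demand)
c1, c2 = route_costs(f1, demand)
```

At equilibrium both used routes have equal cost, which here gives a flow of 5/3 on the first route; the iterates oscillate around that value with an amplitude that shrinks with the step size.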
TL;DR: The monotonicity of similarity measures between intuitionistic fuzzy sets is investigated by analyzing the geometrical relation between intuitionistic fuzzy sets, and three types of monotonicity properties are defined.
TL;DR: In this paper, the authors find necessary conditions and sufficient conditions for a function involving the gamma function and originating from investigation of properties of the Catalan numbers and function in combinatorics to be logarithmically completely monotonic.
Abstract: In the paper, the authors find necessary conditions and sufficient conditions for a function involving the gamma function and originating from investigation of properties of the Catalan numbers and function in combinatorics to be logarithmically completely monotonic.
TL;DR: In this paper, the authors consider the problem of inference on a regression function at a point when the entire function satisfies a sign or shape restriction under the null and propose a test that achieves the optimal minimax rate adaptively over a range of Hölder classes, up to a $\log\log n$ term, which is necessary for adaptation.
Abstract: We consider the problem of inference on a regression function at a point when the entire function satisfies a sign or shape restriction under the null. We propose a test that achieves the optimal minimax rate adaptively over a range of Hölder classes, up to a $\log\log n$ term, which we show to be necessary for adaptation. We apply the results to adaptive one-sided tests for the regression discontinuity parameter under a monotonicity restriction, the value of a monotone regression function at the boundary and the proportion of true null hypotheses in a multiple testing problem.