Scispace (Formerly Typeset)
Showing papers on "Function (mathematics)" published in 2016
Book Chapter•10.1007/978-3-319-46493-0_20•
Generative Image Modeling Using Style and Structure Adversarial Networks

Xiaolong Wang1, Abhinav Gupta1•
Carnegie Mellon University1
8 Oct 2016
TL;DR: This paper factorizes the image generation process and proposes the Style and Structure Generative Adversarial Network, a model that is interpretable, generates more realistic images, and can be used to learn unsupervised RGBD representations.
Abstract: Current generative frameworks use end-to-end learning and generate images by sampling from a uniform noise distribution. However, these approaches ignore the most basic principle of image formation: images are a product of (a) Structure: the underlying 3D model; and (b) Style: the texture mapped onto the structure. In this paper, we factorize the image generation process and propose the Style and Structure Generative Adversarial Network (\(\text{S}^2\)-GAN). Our \(\text{S}^2\)-GAN has two components: the Structure-GAN generates a surface normal map; the Style-GAN takes the surface normal map as input and generates the 2D image. Apart from a real vs. generated loss function, we use an additional loss with computed surface normals from generated images. The two GANs are first trained independently, and then merged together via joint learning. We show our \(\text{S}^2\)-GAN model is interpretable, generates more realistic images, and can be used to learn unsupervised RGBD representations.

872 citations

Proceedings Article•
Unsupervised Cross-Domain Image Generation

Yaniv Taigman1, Adam Polyak2, Lior Wolf2•
Facebook1, Tel Aviv University2
4 Nov 2016
TL;DR: The Domain Transfer Network (DTN) is presented, which employs a compound loss function that includes a multiclass GAN loss, an f-constancy component, and a regularizing component that encourages G to map samples from T to themselves.
Abstract: We study the problem of transferring a sample in one domain to an analog sample in another domain. Given two related domains, S and T, we would like to learn a generative function G that maps an input sample from S to the domain T, such that the output of a given function f, which accepts inputs in either domain, would remain unchanged. Other than the function f, the training data is unsupervised and consists of a set of samples from each domain. The Domain Transfer Network (DTN) we present employs a compound loss function that includes a multiclass GAN loss, an f-constancy component, and a regularizing component that encourages G to map samples from T to themselves. We apply our method to visual domains including digits and face images and demonstrate its ability to generate convincing novel images of previously unseen entities, while preserving their identity.

752 citations

Posted Content•
Third-Degree Stochastic Dominance

G. A. Whitmore
01 Jan 2016-The American Economic Review
TL;DR: This paper introduces the third-degree stochastic dominance condition and notes that the set of probability distributions that can be ordered by means of second-degree stochastic dominance is, in general, larger than that which can be ordered by first-degree stochastic dominance.
Abstract: Here $F(x)$ and $G(x)$ are less-than cumulative probability distributions, where $x$ is a continuous or discrete random variable representing the outcome of a prospect. The closed interval $[a, b]$ is the sample space of both prospects. The integral shown in Rule 2 and those shown throughout the paper are Stieltjes integrals. Recall that the Stieltjes integral $\int_a^b f(x)\,dg(x)$ exists if one of the functions $f$ and $g$ is continuous and the other has finite variation in $[a, b]$. Let $D_1$, $D_2$, and $D_3$ be three sets of utility functions $\phi(x)$. $D_1$ is the set containing all utility functions with $\phi(x)$ and $\phi_1(x)$ continuous, and $\phi_1(x) > 0$ for all $x \in [a, b]$. $D_2$ is the set with $\phi(x)$, $\phi_1(x)$, $\phi_2(x)$ continuous, and $\phi_1(x) > 0$, $\phi_2(x) \le 0$ for all $x \in [a, b]$. $D_3$ is the set with $\phi(x)$, $\phi_1(x)$, $\phi_2(x)$, $\phi_3(x)$ continuous, and $\phi_1(x) > 0$, $\phi_2(x) \le 0$, $\phi_3(x) \ge 0$ for all $x \in [a, b]$. Here $\phi_i(x)$ denotes the $i$th derivative of $\phi(x)$. Hadar and Russell proved that Rule 1 is valid for all $\phi \in D_1$ and Rule 2 is valid for all $\phi \in D_2$. The authors point out that the set of probability distributions that can be ordered by means of second-degree stochastic dominance is, in general, larger than that which can be ordered by means of first-degree stochastic dominance. Note that in Rule 2, they assume that $\phi(x)$ is not only an increasing function of $x$ but also exhibits weak global risk aversion, a condition guaranteed by requiring the second derivative of $\phi(x)$ to be nonpositive. In this paper, a condition which will be called third-degree stochastic dominance is considered. It is based on the following assumption about the form of the utility function $\phi(x)$. From a normative point of view, one expects the risk premium associated with an uncertain prospect to become smaller the greater is the individual's wealth. The plausibility and implications of this assumption have been explored by John Pratt, as well as others.
The risk premium of an uncertain prospect is that amount by which the certainty equivalent of the prospect differs from its expected value. In mathematical terms, given the prospect $F(x)$ with expected value $\mu$, the corresponding risk premium $\pi$ is obtained by solving the following equation.

585 citations
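
The ordering by first-, second-, and third-degree stochastic dominance described in the Whitmore entry above can be illustrated numerically. The listing does not reproduce Rule 1 or Rule 2, so the sketch below assumes the usual textbook conditions: F dominates G at first degree when F(x) <= G(x) everywhere, at second degree when the running integral of G - F is nonnegative, and at third degree when the twice-iterated integral is nonnegative and the mean of F is at least the mean of G. The function names, the uniform grid, and the left Riemann sums are illustrative choices, so the result is a numerical approximation rather than an exact test.

```python
from itertools import accumulate

def cdf_on_grid(outcomes, probs, grid):
    """Cumulative distribution of a discrete prospect, evaluated at each grid point."""
    return [sum(p for x, p in zip(outcomes, probs) if x <= g + 1e-12) for g in grid]

def running_integral(values, step):
    """Left Riemann sum approximation of the running integral on a uniform grid."""
    return [0.0] + [step * s for s in accumulate(values[:-1])]

def dominance_degree(out_f, p_f, out_g, p_g, a, b, n=1000, tol=1e-9):
    """Return the lowest degree (1, 2 or 3) at which F dominates G on [a, b], else None."""
    step = (b - a) / n
    grid = [a + k * step for k in range(n + 1)]
    cdf_f = cdf_on_grid(out_f, p_f, grid)
    cdf_g = cdf_on_grid(out_g, p_g, grid)
    d1 = [g - f for f, g in zip(cdf_f, cdf_g)]   # G(x) - F(x)
    if all(d >= -tol for d in d1):
        return 1
    d2 = running_integral(d1, step)              # integral of G - F
    if all(d >= -tol for d in d2):
        return 2
    d3 = running_integral(d2, step)              # iterated integral of G - F
    mean_f = sum(x * p for x, p in zip(out_f, p_f))
    mean_g = sum(x * p for x, p in zip(out_g, p_g))
    if mean_f >= mean_g - tol and all(d >= -tol for d in d3):
        return 3
    return None

# A sure outcome of 2 versus a fair gamble between 1 and 3:
# not ordered at first degree, but ordered at second degree.
print(dominance_degree([2], [1.0], [1, 3], [0.5, 0.5], a=0, b=4, n=8))
```

In this toy setting, each higher degree orders pairs of prospects that the lower degrees leave unordered, which mirrors the nesting of the utility classes in the abstract.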

Posted Content•
Unsupervised Cross-Domain Image Generation

Yaniv Taigman1, Adam Polyak2, Lior Wolf2•
Facebook1, Tel Aviv University2
07 Nov 2016-arXiv: Computer Vision and Pattern Recognition
TL;DR: The Domain Transfer Network (DTN) employs a compound loss function that includes a multiclass GAN loss, an f-constancy component, and a regularizing component that encourages G to map samples from T to themselves.
Abstract: We study the problem of transferring a sample in one domain to an analog sample in another domain. Given two related domains, S and T, we would like to learn a generative function G that maps an input sample from S to the domain T, such that the output of a given function f, which accepts inputs in either domain, would remain unchanged. Other than the function f, the training data is unsupervised and consists of a set of samples from each domain. The Domain Transfer Network (DTN) we present employs a compound loss function that includes a multiclass GAN loss, an f-constancy component, and a regularizing component that encourages G to map samples from T to themselves. We apply our method to visual domains including digits and face images and demonstrate its ability to generate convincing novel images of previously unseen entities, while preserving their identity.

465 citations

Journal Article•10.1109/TCYB.2015.2492242•
Value Iteration Adaptive Dynamic Programming for Optimal Control of Discrete-Time Nonlinear Systems

Qinglai Wei1, Derong Liu2, Hanquan Lin1•
Chinese Academy of Sciences1, University of Science and Technology Beijing2
01 Mar 2016-IEEE Transactions on Systems, Man, and Cybernetics
TL;DR: In this paper, for the first time, the admissibility properties of the iterative control laws are developed for value iteration algorithms, and new termination criteria are established to guarantee the effectiveness of the iterative control laws.
Abstract: In this paper, a value iteration adaptive dynamic programming (ADP) algorithm is developed to solve infinite horizon undiscounted optimal control problems for discrete-time nonlinear systems. The present value iteration ADP algorithm permits an arbitrary positive semi-definite function to initialize the algorithm. A novel convergence analysis is developed to guarantee that the iterative value function converges to the optimal performance index function. Initialized by different initial functions, it is proven that the iterative value function will be monotonically nonincreasing, monotonically nondecreasing, or nonmonotonic and will converge to the optimum. In this paper, for the first time, the admissibility properties of the iterative control laws are developed for value iteration algorithms. It is emphasized that new termination criteria are established to guarantee the effectiveness of the iterative control laws. Neural networks are used to approximate the iterative value function and compute the iterative control law, respectively, for facilitating the implementation of the iterative ADP algorithm. Finally, two simulation examples are given to illustrate the performance of the present method.

443 citations
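
The monotonicity behavior described in the value iteration entry above can be seen in a small special case. The sketch below is not the paper's neural-network ADP algorithm: it assumes a scalar linear system x' = a*x + b*u with stage cost q*x^2 + r*u^2, for which the iterative value function stays quadratic, V_i(x) = p_i * x^2, and the value iteration update reduces to a scalar Riccati recursion. All names and numbers are hypothetical.

```python
def lq_value_iteration(a, b, q, r, p0, iters):
    """Value iteration for x' = a*x + b*u with cost q*x^2 + r*u^2.

    With V_i(x) = p_i * x^2, minimizing q*x^2 + r*u^2 + p_i*(a*x + b*u)^2
    over u gives p_{i+1} = q + a^2*p_i - (a*b*p_i)^2 / (r + b^2*p_i).
    Returns the whole sequence p_0, p_1, ..., p_iters.
    """
    ps = [p0]
    for _ in range(iters):
        p = ps[-1]
        ps.append(q + a * a * p - (a * b * p) ** 2 / (r + b * b * p))
    return ps

# Two different positive semi-definite initial functions, p0 = 0 and p0 = 10.
from_zero = lq_value_iteration(1.0, 1.0, 1.0, 1.0, p0=0.0, iters=50)
from_high = lq_value_iteration(1.0, 1.0, 1.0, 1.0, p0=10.0, iters=50)
print(from_zero[:4], from_high[:4])
```

Started from zero the sequence increases toward the optimum, and started from a large value it decreases toward the same limit, matching the nondecreasing and nonincreasing cases mentioned in the abstract.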

Proceedings Article•10.1109/CDC.2016.7798263•
Harnessing smoothness to accelerate distributed optimization

Guannan Qu1, Na Li1•
Harvard University1
1 Dec 2016
TL;DR: This paper proposes a distributed algorithm that, despite using the same amount of communication per iteration as DGD, can effectively harness the function smoothness and converge to the optimum at a rate of O(1/t) if the objective function is convex and smooth, and at a linear rate if it is also strongly convex.
Abstract: There has been a growing effort in studying the distributed optimization problem over a network. The objective is to optimize a global function formed by a sum of local functions, using only local computation and communication. The literature has developed consensus-based distributed (sub)gradient descent (DGD) methods and has shown that they have the same convergence rate O(log t/√t) as the centralized (sub)gradient methods (CGD) when the function is convex but possibly nonsmooth. However, when the function is convex and smooth, under the framework of DGD, it is unclear how to harness the smoothness to obtain a faster convergence rate comparable to CGD's convergence rate. In this paper, we propose a distributed algorithm that, despite using the same amount of communication per iteration as DGD, can effectively harness the function smoothness and converges to the optimum with a rate of O(1/t). If the objective function is further strongly convex, our algorithm has a linear convergence rate. Both rates match the convergence rate of CGD. The key step in our algorithm is a novel gradient estimation scheme that uses history information to achieve fast and accurate estimation of the average gradient. To motivate the necessity of history information, we also show that it is impossible for a class of distributed algorithms like DGD to achieve a linear convergence rate without using history information even if the objective function is strongly convex and smooth.

432 citations
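
The idea of estimating the average gradient from history information, as described in the distributed optimization entry above, can be sketched as follows. The abstract does not give the update rule, so this is an assumption: a commonly used gradient-tracking form in which each agent mixes its neighbors' iterates, steps along a local estimate s_i of the average gradient, and corrects s_i with the change in its own gradient. The weights, step size, and quadratic local costs are hypothetical.

```python
def gradient_tracking(grads, weights, x0, step, iters):
    """Distributed minimization of sum_i f_i(x) over scalar x.

    grads[i] is the gradient of agent i's local cost; weights is a doubly
    stochastic mixing matrix; s[i] tracks the network-average gradient.
    """
    n = len(grads)
    x = list(x0)
    g = [grads[i](x[i]) for i in range(n)]
    s = list(g)  # initialize the gradient estimate with the local gradient
    for _ in range(iters):
        x_new = [sum(weights[i][j] * x[j] for j in range(n)) - step * s[i]
                 for i in range(n)]
        g_new = [grads[i](x_new[i]) for i in range(n)]
        # history information: add the change in the local gradient
        s = [sum(weights[i][j] * s[j] for j in range(n)) + g_new[i] - g[i]
             for i in range(n)]
        x, g = x_new, g_new
    return x

# Four agents on a ring, local costs f_i(x) = 0.5 * a_i * (x - b_i)^2.
a = [1.0, 2.0, 3.0, 4.0]
b = [4.0, 3.0, 2.0, 1.0]
grads = [lambda x, ai=ai, bi=bi: ai * (x - bi) for ai, bi in zip(a, b)]
ring = [[0.5, 0.25, 0.0, 0.25],
        [0.25, 0.5, 0.25, 0.0],
        [0.0, 0.25, 0.5, 0.25],
        [0.25, 0.0, 0.25, 0.5]]
result = gradient_tracking(grads, ring, x0=[0.0] * 4, step=0.02, iters=2000)
print(result)  # every agent should end near sum(a_i * b_i) / sum(a_i) = 2.0
```

Each iteration uses one round of neighbor communication for x and one for s, with a constant step size rather than a diminishing one.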

Journal Article•10.1109/TWC.2015.2467386•
Performance Analysis of Free-Space Optical Links Over Málaga ($\mathcal{M}$) Turbulence Channels With Pointing Errors

Imran Shafique Ansari1, Ferkan Yilmaz, Mohamed-Slim Alouini1•
King Abdullah University of Science and Technology1
01 Jan 2016-IEEE Transactions on Wireless Communications
TL;DR: In this article, a unified performance analysis of a single-link free-space optical (FSO) link that accounts for pointing errors and both types of detection techniques is presented.
Abstract: In this work, we present a unified performance analysis of a free-space optical (FSO) link that accounts for pointing errors and both types of detection techniques [i.e., intensity modulation/direct detection (IM/DD) and heterodyne detection]. More specifically, we present unified exact closed-form expressions for the cumulative distribution function, the probability density function, the moment generating function, and the moments of the end-to-end signal-to-noise ratio (SNR) of a single-link FSO transmission system, all in terms of the Meijer's G function except for the moments, which are in terms of simple elementary functions. We then capitalize on these unified results to offer unified exact closed-form expressions for various performance metrics of FSO link transmission systems, such as the outage probability, the scintillation index (SI), the average error rate for binary and $M$-ary modulation schemes, and the ergodic capacity (except for the IM/DD technique, where we present closed-form lower bound results), all in terms of Meijer's G functions except for the SI, which is in terms of simple elementary functions. Additionally, we derive asymptotic results in the high SNR regime, in terms of simple elementary functions, for all the expressions derived earlier in terms of the Meijer's G function, via an asymptotic expansion of the Meijer's G function. We also derive new asymptotic expressions for the ergodic capacity in the low as well as high SNR regimes in terms of simple elementary functions by utilizing moments. All the presented results are verified via computer-based Monte-Carlo simulations.

384 citations

Proceedings Article•
High-Dimensional Continuous Control Using Generalized Advantage Estimation

John Schulman1, Philipp Moritz1, Sergey Levine1, Michael I. Jordan1, Pieter Abbeel1 •
University of California, Berkeley1
1 Jan 2016
TL;DR: This work addresses the large number of samples typically required and the difficulty of obtaining stable and steady improvement despite the nonstationarity of the incoming data by using value functions to substantially reduce the variance of policy gradient estimates at the cost of some bias.
Abstract: Policy gradient methods are an appealing approach in reinforcement learning because they directly optimize the cumulative reward and can straightforwardly be used with nonlinear function approximators such as neural networks. The two main challenges are the large number of samples typically required, and the difficulty of obtaining stable and steady improvement despite the nonstationarity of the incoming data. We address the first challenge by using value functions to substantially reduce the variance of policy gradient estimates at the cost of some bias, with an exponentially-weighted estimator of the advantage function that is analogous to TD(lambda). We address the second challenge by using a trust region optimization procedure for both the policy and the value function, which are represented by neural networks. Our approach yields strong empirical results on highly challenging 3D locomotion tasks, learning running gaits for bipedal and quadrupedal simulated robots, and learning a policy for getting the biped to stand up from starting out lying on the ground. In contrast to a body of prior work that uses hand-crafted policy representations, our neural network policies map directly from raw kinematics to joint torques. Our algorithm is fully model-free, and the amount of simulated experience required for the learning tasks on 3D bipeds corresponds to 1-2 weeks of real time.

367 citations
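
The exponentially-weighted advantage estimator mentioned in the entry above can be sketched in a few lines. The abstract does not give the formula, so the code assumes the usual formulation: with TD residuals delta_t = r_t + gamma * V(s_{t+1}) - V(s_t), the advantage at step t is the sum over l of (gamma * lam)^l * delta_{t+l}, computed here by a backward recursion over one trajectory. The argument names are illustrative.

```python
def gae_advantages(rewards, values, gamma, lam):
    """Generalized advantage estimates for one trajectory.

    values must have len(rewards) + 1 entries; the last one is the value
    estimate of the state reached after the final reward (0.0 if terminal).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD residual
        running = delta + gamma * lam * running                 # discounted sum
        advantages[t] = running
    return advantages

# lam = 0 gives one-step TD residuals (lower variance, more bias);
# lam = 1 gives discounted returns minus the value baseline.
print(gae_advantages([1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], gamma=0.5, lam=1.0))
```

The parameter lam is the knob that trades variance against bias, which is the trade-off the abstract refers to.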

Journal Article•10.1109/JSTSP.2015.2505682•
Mini-Batch Semi-Stochastic Gradient Descent in the Proximal Setting

Jakub Konecny1, Jie Liu2, Peter Richtárik1, Martin Takáč2•
University of Edinburgh1, Lehigh University2
01 Mar 2016-IEEE Journal of Selected Topics in Signal Processing
TL;DR: It is proved that, as long as the mini-batch size b is below a certain threshold, the method can reach any predefined accuracy with less overall work than without mini-batching; the mini-batching scheme is also suitable for further acceleration by parallelization.
Abstract: We propose mS2GD: a method incorporating a mini-batching scheme for improving the theoretical complexity and practical performance of semi-stochastic gradient descent (S2GD). We consider the problem of minimizing a strongly convex function represented as the sum of an average of a large number of smooth convex functions, and a simple nonsmooth convex regularizer. Our method first performs a deterministic step (computation of the gradient of the objective function at the starting point), followed by a large number of stochastic steps. The process is repeated a few times with the last iterate becoming the new starting point. The novelty of our method is in the introduction of mini-batching into the computation of stochastic steps. In each step, instead of choosing a single function, we sample $b$ functions, compute their gradients, and compute the direction based on this. We analyze the complexity of the method and show that it benefits from two speedup effects. First, we prove that as long as $b$ is below a certain threshold, we can reach any predefined accuracy with less overall work than without mini-batching. Second, our mini-batching scheme admits a simple parallel implementation, and hence is suitable for further acceleration by parallelization.

363 citations
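
The loop structure described in the mS2GD entry above (one deterministic full-gradient step, then many mini-batch stochastic steps, repeated from the last iterate) can be sketched as below. This is a simplified scalar illustration under stated assumptions: the regularizer is reg * |x|, handled by soft-thresholding; the inner loop has a fixed length; and the mini-batch is drawn without replacement. None of these choices is confirmed by the abstract, and the function names are hypothetical.

```python
import math
import random

def soft_threshold(v, t):
    """Proximal operator of t * |x| for a scalar."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def ms2gd(grad_i, n, x0, step, reg, batch, inner_steps, epochs, seed=0):
    """Minimize (1/n) * sum_i f_i(x) + reg * |x| for scalar x."""
    rng = random.Random(seed)
    y = x0
    for _ in range(epochs):
        # deterministic step: full gradient at the current starting point
        full = sum(grad_i(i, y) for i in range(n)) / n
        x = y
        for _ in range(inner_steps):
            idx = rng.sample(range(n), batch)  # mini-batch of b functions
            correction = sum(grad_i(i, x) - grad_i(i, y) for i in idx) / batch
            x = soft_threshold(x - step * (full + correction), step * reg)
        y = x  # the last iterate becomes the new starting point
    return y

# f_i(x) = 0.5 * (x - c_i)^2, so the smooth part is minimized at mean(c) = 3.
c = [1.0, 2.0, 3.0, 6.0]
solution = ms2gd(lambda i, x: x - c[i], n=len(c), x0=0.0, step=0.5,
                 reg=0.5, batch=2, inner_steps=10, epochs=5)
print(solution)  # the regularizer shifts the minimizer from 3.0 toward 2.5
```

The gradients inside each mini-batch are independent of one another, which is what makes the parallel implementation mentioned in the abstract straightforward.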

Proceedings Article•10.1145/2976749.2978429•
Function Secret Sharing: Improvements and Extensions

Elette Boyle1, Niv Gilboa2, Yuval Ishai3•
Interdisciplinary Center Herzliya1, Ben-Gurion University of the Negev2, Technion – Israel Institute of Technology3
24 Oct 2016
TL;DR: This paper introduces a tensoring operation for Function Secret Sharing (FSS) that gives a conceptually simpler derivation of previous constructions and supports new ones; FSS schemes are useful for applications that involve privately reading from or writing to distributed databases while minimizing the amount of communication.
Abstract: Function Secret Sharing (FSS), introduced by Boyle et al. (Eurocrypt 2015), provides a way for additively secret-sharing a function from a given function family F. More concretely, an m-party FSS scheme splits a function $f : \{0, 1\}^n \to G$, for some abelian group G, into functions $f_1, \ldots, f_m$, described by keys $k_1, \ldots, k_m$, such that $f = f_1 + \cdots + f_m$ and every strict subset of the keys hides f. A Distributed Point Function (DPF) is a special case where F is the family of point functions, namely functions $f_{a,b}$ that evaluate to b on the input a and to 0 on all other inputs. FSS schemes are useful for applications that involve privately reading from or writing to distributed databases while minimizing the amount of communication. These include different flavors of private information retrieval (PIR), as well as a recent application of DPF for large-scale anonymous messaging. We improve and extend previous results in several ways:
* Simplified FSS constructions. We introduce a tensoring operation for FSS which is used to obtain a conceptually simpler derivation of previous constructions and present our new constructions.
* Improved 2-party DPF. We reduce the key size of the PRG-based DPF scheme of Boyle et al. roughly by a factor of 4 and optimize its computational cost. The optimized DPF significantly improves the concrete costs of 2-server PIR and related primitives.
* FSS for new function families. We present an efficient PRG-based 2-party FSS scheme for the family of decision trees, leaking only the topology of the tree and the internal node labels. We apply this towards FSS for multi-dimensional intervals. We also present a general technique for extending FSS schemes by increasing the number of parties.
* Verifiable FSS. We present efficient protocols for verifying that keys $(k_1^*, \ldots, k_m^*)$, obtained from a potentially malicious user, are consistent with some f in F. Such a verification may be critical for applications that involve private writing or voting by many users.

353 citations
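
The defining property in the Function Secret Sharing entry above, f = f_1 + ... + f_m with each key looking random on its own, can be illustrated with a deliberately naive scheme. The sketch below additively shares the full truth table of a point function over the integers modulo a chosen modulus. This is not the compact PRG-based construction the paper improves: the keys here are as large as the domain, and the function names are hypothetical.

```python
import secrets

def share_point_function(a, b, domain_size, modulus, parties=2):
    """Split the point function f_{a,b} into additive shares of its truth table."""
    table = [b % modulus if x == a else 0 for x in range(domain_size)]
    # all keys but the last are uniformly random tables
    keys = [[secrets.randbelow(modulus) for _ in range(domain_size)]
            for _ in range(parties - 1)]
    # the last key makes the shares sum to the truth table at every input
    last = [(table[x] - sum(k[x] for k in keys)) % modulus
            for x in range(domain_size)]
    return keys + [last]

def evaluate_share(key, x):
    """Each party evaluates its own share locally, without the other keys."""
    return key[x]

keys = share_point_function(a=5, b=7, domain_size=16, modulus=101, parties=3)
total = sum(evaluate_share(k, 5) for k in keys) % 101
print(total)  # the shares recombine to b at the special input a
```

A compact scheme replaces these domain-sized tables with short keys that are expanded using a pseudorandom generator, and that key size is what the abstract reports reducing.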

Muscles Testing And Function

Andrea Faber
1 Jan 2016
TL;DR: Muscles Testing and Function is available in a digital library with public online access, so it can be obtained instantly and read on any device.
Abstract: Thank you very much for downloading Muscles Testing and Function. As you may know, people have searched numerous times for their chosen books like this one, but end up with harmful downloads. Rather than enjoying a good book with a cup of coffee in the afternoon, they instead face harmful bugs on their laptops. Muscles Testing and Function is available in our digital library; online access to it is set as public, so you can get it instantly. Our book collection is stored in multiple locations, allowing you the lowest latency when downloading any of our books like this one. In short, Muscles Testing and Function is universally compatible with any device.
Journal Article•10.1021/ACS.ACCOUNTS.6B00356•
Quantum embedding theories

Qiming Sun1, Garnet Kin-Lic Chan1•
California Institute of Technology1
08 Dec 2016-arXiv: Chemical Physics
TL;DR: In this article, the authors present a unified presentation of the different formalisms of density functional embedding, Green's function embedding and density matrix embeddings, and introduce the basic equations of all three formulations in such a way as to highlight their many common intellectual strands.
Abstract: In complex systems, it is often the case that the region of interest forms only one part of a much larger system. The idea of joining two different quantum simulations - a high level calculation on the active region of interest, and a low level calculation on its environment - formally defines a quantum embedding. While any combination of techniques constitutes an embedding, several rigorous formalisms have emerged that provide for exact feedback between the embedded system and its environment. These three formulations (density functional embedding, Green's function embedding, and density matrix embedding) respectively use the single-particle density, single-particle Green's function, and single-particle density matrix as the quantum variables of interest. Many excellent reviews exist covering these methods individually. However, a unified presentation of the different formalisms is so far lacking. Indeed, the various languages commonly used (functional equations for density functional embedding, diagrammatics for Green's function embedding, and entanglement arguments for density matrix embedding) make the three formulations appear vastly different. In this account, we introduce the basic equations of all three formulations in such a way as to highlight their many common intellectual strands. While we focus primarily on a straightforward theoretical perspective, we also give a brief overview of recent applications, and possible future developments.
Journal Article•10.1016/J.APENERGY.2016.05.064•
A parameter extraction technique exploiting intrinsic properties of solar cells

Nhan Thanh Tong1, Wanchalerm Pora1•
Chulalongkorn University1
15 Aug 2016-Applied Energy
TL;DR: In this paper, a parameter extraction technique for the five-parameter solar-cell model is presented, which requires a priori knowledge of only three load points: the open-circuit, short-circuit, and maximum power points.
Journal Article•10.1109/TFUZZ.2015.2453020•
Preaggregation Functions: Construction and an Application

Giancarlo Lucca1, José Antonio Sanz1, Graçaliz Pereira Dimuro2, Benjamin Bedregal3, Radko Mesiar4, Anna Kolesárová4, Humberto Bustince1 •
Universidad Pública de Navarra1, Universidade Federal do Rio Grande do Sul2, Federal University of Rio Grande do Norte3, Slovak University of Technology in Bratislava4
01 Apr 2016-IEEE Transactions on Fuzzy Systems
TL;DR: This paper proposes three different methods to build preaggregation functions and experimentally shows that in fuzzy rule-based classification systems, the results obtained when applying the fuzzy reasoning methods obtained using two classical averaging operators such as the maximum and the Choquet integral are improved.
Abstract: In this paper, we introduce the notion of preaggregation function. Such a function satisfies the same boundary conditions as an aggregation function, but, instead of requiring monotonicity, only monotonicity along some fixed direction (directional monotonicity) is required. We present some examples of such functions. We propose three different methods to build preaggregation functions. We experimentally show that in fuzzy rule-based classification systems, when we use one of these methods, namely, the one based on the use of the Choquet integral replacing the product by other aggregation functions, if we consider the minimum or the Hamacher product t-norms for such construction, we improve the results obtained when applying the fuzzy reasoning methods obtained using two classical averaging operators such as the maximum and the Choquet integral.
Proceedings Article•10.1145/2933057.2933105•
Fault-Tolerant Multi-Agent Optimization: Optimal Iterative Distributed Algorithms

Lili Su1, Nitin H. Vaidya1•
University of Illinois at Urbana–Champaign1
25 Jul 2016
TL;DR: This paper presents an iterative distributed algorithm that achieves optimal fault-tolerance and ensures that at least $|N|-f$ agents have weights that are bounded away from 0 (in particular, lower bounded by $1/(2(|N|-f))$).
Abstract: This paper addresses the problem of distributed multi-agent optimization in which each agent $i$ has a local cost function $h_i(x)$, and the goal is to optimize a global cost function that aggregates the local cost functions. Such optimization problems are of interest in many contexts, including distributed machine learning, distributed resource allocation, and distributed robotics. We consider the distributed optimization problem in the presence of faulty agents. We focus primarily on Byzantine failures, but also briefly discuss some results for crash failures. For the Byzantine fault-tolerant optimization problem, the ideal goal is to optimize the average of local cost functions of the non-faulty agents. However, this goal cannot be achieved. Therefore, we consider a relaxed version of the fault-tolerant optimization problem. The goal for the relaxed problem is to generate an output that is an optimum of a global cost function formed as a convex combination of local cost functions of the non-faulty agents. More precisely, there must exist weights $\alpha_i$ for $i \in N$ such that $\alpha_i \ge 0$ and $\sum_{i \in N} \alpha_i = 1$, and the output is an optimum of the cost function $\sum_{i \in N} \alpha_i h_i(x)$. Ideally, we would like $\alpha_i = 1/|N|$ for all $i \in N$; however, this cannot be guaranteed due to the presence of faulty agents. In fact, the maximum number of nonzero weights ($\alpha_i$'s) that can be guaranteed is $|N|-f$, where $f$ is the maximum number of Byzantine faulty agents. We present an iterative distributed algorithm that achieves optimal fault-tolerance. Specifically, it ensures that at least $|N|-f$ agents have weights that are bounded away from 0 (in particular, lower bounded by $1/(2(|N|-f))$). The proposed distributed algorithm has a simple iterative structure, with each agent maintaining only a small amount of local state. We show that the iterative algorithm ensures two properties as time goes to $\infty$: consensus (i.e., output of non-faulty agents becomes identical in the time limit), and optimality (in the sense that the output is the optimum of a suitably defined global cost function).
Journal Article•10.1016/J.INS.2015.09.021•
Approximation with random bases

Alexander N. Gorban1, Ivan Tyukin2, Danil V. Prokhorov3, Konstantin Sofeikov1•
University of Leicester1, Saint Petersburg State University2, Toyota3
10 Oct 2016-Information Sciences
TL;DR: This work considers and analyzes published procedures, both randomized and deterministic, for selecting elements from families of parameterized elementary functions that have been shown to ensure a rate of convergence in the L2 norm of order O(1/N), where N is the number of elements.
Book•
Riemann–Hilbert Problems, their Numerical Solution, and the Computation of Nonlinear Special Functions

Thomas Trogdon, Sheehan Olver
17 Oct 2016
TL;DR: In this article, the applied theory of Riemann–Hilbert problems, using both Hölder and Lebesgue spaces, is reviewed, and the numerical solution of RHPs is discussed.
Abstract: The computation of special functions has important implications throughout engineering and the physical sciences. Nonlinear special functions include the solutions of integrable partial differential equations and the Painlevé transcendents. Many problems in water wave theory, nonlinear optics and statistical mechanics are reduced to the study of a nonlinear special function in particular limits. The universal object that these functions share is a Riemann–Hilbert representation: the nonlinear special function can be recovered from the solution of a Riemann–Hilbert problem (RHP). A RHP consists of finding a piecewise-analytic function in the complex plane when the behavior of its discontinuities is specified. In this dissertation, the applied theory of Riemann–Hilbert problems, using both Hölder and Lebesgue spaces, is reviewed. The numerical solution of RHPs is discussed. Furthermore, the uniform approximation theory for the numerical solution of RHPs is presented, proving that in certain cases the convergence of the numerical method is uniform with respect to a parameter. This theory shares close relation to the method of nonlinear steepest descent for RHPs. The inverse scattering transform for the Korteweg–de Vries and nonlinear Schrödinger equations is made effective by solving the associated RHPs numerically. This technique is extended to solve the Painlevé II equation numerically. Similar Riemann–Hilbert techniques are used to compute the so-called finite-genus solutions of the Korteweg–de Vries equation. This involves ideas from Riemann surface theory. Finally, the methodology is applied to compute orthogonal polynomials with exponential weights. This allows for the computation of statistical quantities stemming from random matrix ensembles.
Proceedings Article•10.1145/2988450.2988456•
Hybrid Recommender System based on Autoencoders

Florian Strub1, Romaric Gaudel1, Jérémie Mary1•
University of Lille1
24 Jun 2016-arXiv: Learning
TL;DR: This paper enhances an autoencoder-based architecture for Recommender Systems by using a loss function adapted to input data with missing values and by incorporating side information, demonstrating that while side information only slightly improves the test error averaged over all users/items, it has more impact on cold users/items.
Abstract: A standard model for Recommender Systems is the Matrix Completion setting: given a partially known matrix of ratings given by users (rows) to items (columns), infer the unknown ratings. In the last decades, few attempts were made to handle that objective with Neural Networks, but recently an architecture based on Autoencoders proved to be a promising approach. In the current paper, we enhance that architecture (i) by using a loss function adapted to input data with missing values, and (ii) by incorporating side information. The experiments demonstrate that while side information only slightly improves the test error averaged over all users/items, it has more impact on cold users/items.
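One simple way to adapt a reconstruction loss to inputs with missing values, as the entry above describes, is to score the autoencoder output only on the ratings that were actually observed. The sketch below is a minimal illustration of that idea and is not claimed to be the paper's exact loss; representing a missing rating as `None` and the function name are assumptions.

```python
def masked_squared_error(predicted, ratings):
    """Mean squared error over observed ratings only.

    ratings uses None for entries the user never rated, so those
    positions contribute nothing to the loss or to its gradient.
    """
    observed = [(p, r) for p, r in zip(predicted, ratings) if r is not None]
    if not observed:
        return 0.0
    return sum((p - r) ** 2 for p, r in observed) / len(observed)

# One user's row: two known ratings and two unknown ones.
print(masked_squared_error([4.0, 2.5, 1.0, 3.0], [5.0, None, None, 3.0]))
```

Without the mask, the network would be pushed to reproduce placeholder values for unrated items instead of inferring them.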
Proceedings Article•
Diverse Neural Network Learns True Target Functions

Bo Xie1, Yingyu Liang2, Le Song1•
Georgia Institute of Technology1, Princeton University2
9 Nov 2016
TL;DR: In this article, the authors show that one-hidden-layer neural networks with ReLU activation and diverse units have no spurious local minima, and that the loss can be made arbitrarily small if the minimum singular value of the "extended feature matrix" is large enough.
Abstract: Neural networks are a powerful class of functions that can be trained with simple gradient descent to achieve state-of-the-art performance on a variety of applications. Despite their practical success, there is a paucity of results that provide theoretical guarantees on why they are so effective. Lying in the center of the problem is the difficulty of analyzing the non-convex loss function with potentially numerous local minima and saddle points. Can neural networks corresponding to the stationary points of the loss function learn the true target function? If yes, what are the key factors contributing to such nice optimization properties? In this paper, we answer these questions by analyzing one-hidden-layer neural networks with ReLU activation, and show that despite the non-convexity, neural networks with diverse units have no spurious local minima. We bypass the non-convexity issue by directly analyzing the first order optimality condition, and show that the loss can be made arbitrarily small if the minimum singular value of the "extended feature matrix" is large enough. We make novel use of techniques from kernel methods and geometric discrepancy, and identify a new relation linking the smallest singular value to the spectrum of a kernel function associated with the activation function and to the diversity of the units. Our results also suggest a novel regularization function to promote unit diversity for potentially better generalization.
Journal Article•10.1016/J.CMA.2016.06.027•
Feature-driven topology optimization method with signed distance function

Ying Zhou1, Weihong Zhang1, Jihong Zhu1, Zhao Xu1•
Northwestern Polytechnical University1
01 Oct 2016-Computer Methods in Applied Mechanics and Engineering
TL;DR: This is the first study on layout design of multiple engineering features using level-set functions (LSFs) and Boolean operations; numerical examples demonstrate the validity and merits of the proposed feature-driven topology optimization for complicated design problems.
Proceedings Article•
Dropping Convexity for Faster Semi-definite Optimization

Srinadh Bhojanapalli1, Anastasios Kyrillidis2, Sujay Sanghavi2•
Toyota Technological Institute at Chicago1, University of Texas at Austin2
6 Jun 2016
TL;DR: This paper analyzes Factored Gradient Descent (FGD) and, to the best of the authors' knowledge, is the first to provide precise convergence rate guarantees for general convex functions under standard convex assumptions; it also provides a procedure to initialize FGD for (restricted) strongly convex objectives when one only has access to f via a first-order oracle.
Abstract: We study the minimization of a convex function $f(X)$ over the set of $n\times n$ positive semi-definite matrices, but when the problem is recast as $\min_U g(U) := f(UU^\top)$, with $U \in \mathbb{R}^{n \times r}$ and $r \leq n$. We study the performance of gradient descent on $g$---which we refer to as Factored Gradient Descent (FGD)---under standard assumptions on the original function $f$. We provide a rule for selecting the step size and, with this choice, show that the local convergence rate of FGD mirrors that of standard gradient descent on the original $f$: i.e., after $k$ steps, the error is $O(1/k)$ for smooth $f$, and exponentially small in $k$ when $f$ is (restricted) strongly convex. In addition, we provide a procedure to initialize FGD for (restricted) strongly convex objectives and when one only has access to $f$ via a first-order oracle; for several problem instances, such proper initialization leads to global convergence guarantees. FGD and similar procedures are widely used in practice for problems that can be posed as matrix factorization. To the best of our knowledge, this is the first paper to provide precise convergence rate guarantees for general convex functions under standard convex assumptions.
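The recasting $g(U) = f(UU^\top)$ described above can be sketched in a few lines. This is an illustrative sketch only: the toy objective $f(X) = \tfrac12\|X - M\|_F^2$, the fixed step size, and the initialization near the solution are assumptions for the example, not the step-size rule or initialization procedure from the paper.

```python
import numpy as np

def factored_gradient_descent(grad_f, U0, step_size, n_steps):
    """Run plain gradient descent on g(U) = f(U U^T) and return the final U."""
    U = U0.copy()
    for _ in range(n_steps):
        G = grad_f(U @ U.T)
        # Chain rule: grad g(U) = (grad f(X) + grad f(X)^T) U, with X = U U^T.
        U = U - step_size * (G + G.T) @ U
    return U

# Toy problem (assumption): recover a rank-2 PSD matrix M.
rng = np.random.default_rng(0)
n, r = 6, 2
A = rng.standard_normal((n, r))
M = A @ A.T

def f(X):
    return 0.5 * np.linalg.norm(X - M, "fro") ** 2

def grad_f(X):
    return X - M

# Hypothetical choices: start near a solution, use a conservative fixed step.
U0 = A + 0.1 * rng.standard_normal((n, r))
step_size = 1.0 / (8.0 * np.linalg.norm(M, 2))
U = factored_gradient_descent(grad_f, U0, step_size, n_steps=5000)

initial_loss = f(U0 @ U0.T)
final_loss = f(U @ U.T)
```

Working with the $n \times r$ factor avoids projecting onto the positive semi-definite cone at every step, since $UU^\top$ is positive semi-definite by construction.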
Posted Content•
Parallel Bayesian Global Optimization of Expensive Functions

Jialei Wang, Scott Clark, Eric Liu, Peter I. Frazier1•
Cornell University1
16 Feb 2016-arXiv: Machine Learning
TL;DR: This work considers parallel global optimization of derivative-free expensive-to-evaluate functions, and proposes an efficient method based on stochastic approximation for implementing a conceptual Bayesian optimization algorithm proposed by Ginsbourger et al. (2007).
Abstract: We consider parallel global optimization of derivative-free expensive-to-evaluate functions, and propose an efficient method based on stochastic approximation for implementing a conceptual Bayesian optimization algorithm proposed by Ginsbourger et al. (2007). At the heart of this algorithm is maximizing the information criterion called the "multi-points expected improvement", or the q-EI. To accomplish this, we use infinitesimal perturbation analysis (IPA) to construct a stochastic gradient estimator and show that this estimator is unbiased. We also show that the stochastic gradient ascent algorithm using the constructed gradient estimator converges to a stationary point of the q-EI surface, and therefore, as the number of multiple starts of the gradient ascent algorithm and the number of steps for each start grow large, the one-step Bayes optimal set of points is recovered. We show in numerical experiments that our method for maximizing the q-EI is faster than methods based on closed-form evaluation using high-dimensional integration, when considering many parallel function evaluations, and is comparable in speed when considering few. We also show that the resulting one-step Bayes optimal algorithm for parallel global optimization finds high-quality solutions with fewer evaluations than a heuristic based on approximately maximizing the q-EI. A high-quality open source implementation of this algorithm is available in the open source Metrics Optimization Engine (MOE).
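To make the q-EI quantity concrete, the sketch below estimates its value by plain Monte Carlo for a batch of q points with a joint Gaussian posterior. It only estimates the value, not the IPA gradient estimator the abstract describes, and it assumes a minimization convention; the function names are hypothetical and unrelated to MOE's API.

```python
import math
import numpy as np

def monte_carlo_qei(mean, cov, best_value, n_samples=200_000, seed=0):
    """Estimate q-EI = E[max(best_value - min_i Y_i, 0)] for Y ~ N(mean, cov)."""
    rng = np.random.default_rng(seed)
    chol = np.linalg.cholesky(cov)
    z = rng.standard_normal((n_samples, len(mean)))
    samples = mean + z @ chol.T  # each row is one joint draw of the q values
    improvement = np.maximum(best_value - samples.min(axis=1), 0.0)
    return float(improvement.mean())

def closed_form_ei(mu, sigma, best_value):
    """Classical single-point expected improvement (minimization convention)."""
    z = (best_value - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best_value - mu) * cdf + sigma * pdf

# q = 1 should agree with the closed form up to Monte Carlo error.
single = monte_carlo_qei(np.array([0.0]), np.array([[1.0]]), best_value=0.0)
exact = closed_form_ei(0.0, 1.0, 0.0)
# q = 2 with independent points: evaluating more points cannot reduce the improvement.
pair = monte_carlo_qei(np.zeros(2), np.eye(2), best_value=0.0)
```

For q = 1 the estimate can be checked against the closed-form expected improvement; for larger q no such simple formula is available, which is the situation where sampling-based approaches become attractive.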
Journal Article•10.1016/J.JFRANKLIN.2016.02.013•
The recursive least squares identification algorithm for a class of Wiener nonlinear systems

Feng Ding1, Ximei Liu1, Manman Liu1•
Qingdao University of Science and Technology1
01 May 2016-Journal of the Franklin Institute-Engineering and Applied Mathematics
TL;DR: This work is concerned with the identification of Wiener systems whose output nonlinear function is assumed to be continuous and invertible, and a recursive least squares algorithm is presented based on the auxiliary model identification idea.
Abstract: Many physical systems can be modeled by a Wiener nonlinear model, which consists of a linear dynamic system followed by a nonlinear static function. This work is concerned with the identification of Wiener systems whose output nonlinear function is assumed to be continuous and invertible. A recursive least squares algorithm is presented based on the auxiliary model identification idea. To solve the difficulty of the information vector including the unmeasurable variables, the unknown terms in the information vector are replaced with their estimates, which are computed through the preceding parameter estimates. Finally, an example is given to support the proposed method.
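As background for the recursive least squares idea, here is a minimal sketch of the textbook RLS update for a model that is linear in its parameters, $y = \varphi^\top \theta + \text{noise}$. It does not implement the auxiliary-model variant for Wiener systems described in the abstract; the toy data, initial covariance, and variable names are assumptions for illustration.

```python
import numpy as np

def rls_update(theta, P, phi, y):
    """One textbook recursive least squares step for y = phi^T theta + noise."""
    P_phi = P @ phi
    gain = P_phi / (1.0 + phi @ P_phi)        # gain vector
    theta = theta + gain * (y - phi @ theta)  # correct estimate by the prediction error
    P = P - np.outer(gain, P_phi)             # covariance update (P is symmetric)
    return theta, P

# Toy noise-free data generated from known parameters (assumption).
rng = np.random.default_rng(1)
true_theta = np.array([1.5, -0.7])
theta = np.zeros(2)
P = 1e6 * np.eye(2)  # large initial covariance: little trust in the initial guess

for _ in range(50):
    phi = rng.standard_normal(2)
    y = phi @ true_theta
    theta, P = rls_update(theta, P, phi, y)
```

In the setting of the abstract, some entries of the information vector are unmeasurable and would be replaced by estimates computed from the preceding parameter estimates before each update.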
Journal Article•10.1088/0953-4075/49/22/224001•
Efficient non-parametric fitting of potential energy surfaces for polyatomic molecules with Gaussian processes

Jie Cui, Roman V. Krems
24 Oct 2016-Journal of Physics B
TL;DR: In this article, Gaussian Process (GP) regression is used to construct multi-dimensional potential energy surfaces (PESs) for polyatomic molecules, using an example of the molecule N4.
Abstract: We explore the efficiency of a statistical learning technique based on Gaussian process (GP) regression as an efficient non-parametric method for constructing multi-dimensional potential energy surfaces (PESs) for polyatomic molecules. Using an example of the molecule N4, we show that a realistic GP model of the six-dimensional PES can be constructed with only 240 potential energy points. We construct a series of the GP models and illustrate the accuracy of the resulting surfaces as a function of the number of ab initio points. We show that the GP model based on ~1500 potential energy points achieves the same level of accuracy as the conventional regression fits based on 16 421 points. The GP model of the PES requires no fitting of ab initio data with analytical functions and can be readily extended to surfaces of higher dimensions.
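The non-parametric character of GP regression can be seen in a short sketch: the prediction is built directly from the training points and a kernel, with no analytical fitting function. The squared-exponential kernel, its length scale, the jitter value, and the one-dimensional toy data below are assumptions for illustration; they are not the kernel or the N4 data used by the authors.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq_dists = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dists / length_scale**2)

def gp_posterior_mean(X_train, y_train, X_test, length_scale=1.0, jitter=1e-8):
    """Posterior mean of a zero-mean GP; jitter stabilizes the linear solve."""
    K = rbf_kernel(X_train, X_train, length_scale) + jitter * np.eye(len(X_train))
    K_star = rbf_kernel(X_test, X_train, length_scale)
    return K_star @ np.linalg.solve(K, y_train)

# Toy one-dimensional "surface": eight samples of sin(x) stand in for energy points.
X_train = np.linspace(0.0, 2.0 * np.pi, 8)[:, None]
y_train = np.sin(X_train[:, 0])
X_test = np.array([[1.0]])
prediction = gp_posterior_mean(X_train, y_train, X_test)
```

Extending the sketch to a six-dimensional surface only changes the number of columns in `X_train` and `X_test`, which is one reason the approach carries over to higher dimensions.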
Journal Article•10.1017/S0963548315000103•
On the Method of Typical Bounded Differences

Lutz Warnke1•
University of Cambridge1
01 Mar 2016-Combinatorics, Probability & Computing
TL;DR: A variant of the bounded differences inequality which can be used to establish concentration of functions f(X) where (i) the typical changes are small, although (ii) the worst case changes might be very large, is proved.
Abstract: Concentration inequalities are fundamental tools in probabilistic combinatorics and theoretical computer science for proving that functions of random variables are typically near their means. Of particular importance is the case where f(X) is a function of independent random variables X = (X_1, ..., X_n). Here the well-known bounded differences inequality (also called McDiarmid's inequality or the Hoeffding–Azuma inequality) establishes sharp concentration if the function f does not depend too much on any of the variables. One attractive feature is that it relies on a very simple Lipschitz condition (L): it suffices to show that |f(X) − f(X′)| ⩽ c_k whenever X, X′ differ only in X_k. While this is easy to check, the main disadvantage is that it considers worst-case changes c_k, which often makes the resulting bounds too weak to be useful. In this paper we prove a variant of the bounded differences inequality which can be used to establish concentration of functions f(X) where (i) the typical changes are small, although (ii) the worst case changes might be very large. One key aspect of this inequality is that it relies on a simple condition that (a) is easy to check and (b) coincides with heuristic considerations as to why concentration should hold. Indeed, given an event Γ that holds with very high probability, we essentially relax the Lipschitz condition (L) to situations where Γ occurs. The point is that the resulting typical changes c_k are often much smaller than the worst case ones. To illustrate its application we consider the reverse H-free process, where H is 2-balanced. We prove that the final number of edges in this process is concentrated, and also determine its likely value up to constant factors. This answers a question of Bollobás and Erdős.
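For reference, the classical bounded differences inequality that the abstract contrasts against is usually stated as follows. This is the standard textbook form, not the paper's new variant: if the variables are independent and f satisfies the Lipschitz condition (L) with constants $c_1, \dots, c_n$, then for every $t > 0$,

$$
\Pr\bigl(\lvert f(X) - \mathbb{E} f(X) \rvert \ge t\bigr) \;\le\; 2\exp\!\left(-\frac{2t^2}{\sum_{k=1}^{n} c_k^2}\right).
$$

The bound depends on the constants only through $\sum_k c_k^2$, so a few very large worst-case changes can make it vacuous even when typical changes are tiny, which is the weakness the typical bounded differences approach addresses.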
Journal Article•10.1371/JOURNAL.PONE.0159148•
Measuring Spatial Accessibility of Health Care Providers - Introduction of a Variable Distance Decay Function within the Floating Catchment Area (FCA) Method.

Jan Michael Bauer1, David A. Groneberg1•
Goethe University Frankfurt1
08 Jul 2016-PLOS ONE
TL;DR: This work introduced, for the first time, a variable distance decay function within an integrated FCA method that inherits effective variable catchment sizes and therefore obviates the need for determining variable catchment sizes separately.
Abstract: We integrated recent improvements within the floating catchment area (FCA) method family into an integrated 'iFCA' method. Within this method we focused on the distance decay function and its parameter. So far only distance decay functions with constant parameters have been applied. Therefore, we developed a variable distance decay function to be used within the FCA method. We were able to replace the impedance coefficient β by readily available distribution parameters (i.e., the median and standard deviation (SD)) within a logistic-based distance decay function. Hence, the function is shaped individually for every single population location by the median and SD of all population-to-provider distances within a global catchment size. Theoretical application of the variable distance decay function showed conceptually sound results. Furthermore, the existence of effective variable catchment sizes defined by the asymptotic approach to zero of the distance decay function was revealed, satisfying the need for variable catchment sizes. The application of the iFCA method within an urban case study in Berlin (Germany) confirmed the theoretical fit of the suggested method. In summary, we introduced, for the first time, a variable distance decay function within an integrated FCA method. This function accounts for individual travel behaviors determined by the distribution of providers. Additionally, the function inherits effective variable catchment sizes and therefore obviates the need for determining variable catchment sizes separately.
Journal Article•10.3847/0004-637X/824/1/10•
The Next Generation Virgo Cluster Survey (NGVS). XIII. The Luminosity and Mass Function of Galaxies in the Core of the Virgo Cluster and the Contribution from Disrupted Satellites

Laura Ferrarese1, Patrick Côté1, Rubén Sánchez-Janssen1, Joel Roediger1, Alan W. McConnachie1, Patrick R. Durrell2, Lauren A. MacArthur3, Lauren A. MacArthur1, John P. Blakeslee1, Pierre-Alain Duc4, Samuel Boissier5, Alessandro Boselli5, Stéphane Courteau6, Jean-Charles Cuillandre4, Eric Emsellem7, Stephen Gwyn1, Puragra Guhathakurta8, Andrés Jordán9, Ariane Lançon10, Chengze Liu11, Simona Mei12, Simona Mei13, J. Christopher Mihos14, Julio F. Navarro15, Eric W. Peng16, Thomas H. Puzia9, James E. Taylor17, Elisa Toloba18, Elisa Toloba8, Hongxin Zhang9, Hongxin Zhang16 •
National Research Council1, Youngstown State University2, Princeton University3, Paris Diderot University4, Aix-Marseille University5, Queen's University6, European Southern Observatory7, University of California, Santa Cruz8, Pontifical Catholic University of Chile9, University of Strasbourg10, Shanghai Jiao Tong University11, Centre national de la recherche scientifique12, University of Paris13, Case Western Reserve University14, University of Victoria15, Peking University16, University of Waterloo17, Texas Tech University18
03 Jun 2016-The Astrophysical Journal
TL;DR: In this paper, the authors present measurements of the galaxy luminosity and stellar mass function in a 3.71 deg$^2$ area in the core of the Virgo cluster, based on data from the Next Generation Virgo Cluster Survey (NGVS).
Abstract: We present measurements of the galaxy luminosity and stellar mass function in a 3.71 deg$^2$ (0.3 Mpc$^2$) area in the core of the Virgo cluster, based on $ugriz$ data from the Next Generation Virgo Cluster Survey (NGVS). The galaxy sample consists of 352 objects brighter than $M_g=-9.13$ mag, the 50% completeness limit of the survey. Using a Bayesian analysis, we find a best-fit faint end slope of $\alpha=-1.33 \pm 0.02$ for the g-band luminosity function; consistent results are found for the stellar mass function as well as the luminosity function in the other four NGVS bandpasses. We discuss the implications for the faint-end slope of adding 92 ultra compact dwarf galaxies (UCDs) -- previously compiled by the NGVS in this region -- to the galaxy sample, assuming that UCDs are the stripped remnants of nucleated dwarf galaxies. Under this assumption, the slope of the luminosity function (down to the UCD faint magnitude limit, $M_g = -9.6$ mag) increases dramatically, up to $\alpha = -1.60 \pm 0.06$ when correcting for the expected number of disrupted non-nucleated galaxies. We also calculate the total number of UCDs and globular clusters that may have been deposited in the core of Virgo due to the disruption of satellites, both nucleated and non-nucleated. We estimate that ~150 objects with $M_g\lesssim-9.6$ mag that are currently classified as globular clusters might, in fact, be the nuclei of disrupted galaxies. We further estimate that as many as 40% of the (mostly blue) globular clusters in the core of Virgo might once have belonged to such satellites; these same disrupted satellites might have contributed ~40% of the total luminosity in galaxies observed in the core region today. Finally, we use an updated Local Group galaxy catalog to provide a new measurement of the luminosity function of Local Group satellites, $\alpha=-1.21\pm0.05$.
Proceedings Article•
l1-regularized Neural Networks are Improperly Learnable in Polynomial Time

Yuchen Zhang1, Jason D. Lee1, Michael I. Jordan1•
University of California, Berkeley1
19 Jun 2016
TL;DR: A kernel-based method is presented that, with probability at least 1 - δ, learns a predictor whose generalization error is at most ε worse than that of the neural network; the result implies that any sufficiently sparse neural network is learnable in polynomial time.
Abstract: We study the improper learning of multi-layer neural networks. Suppose that the neural network to be learned has k hidden layers and that the l1-norm of the incoming weights of any neuron is bounded by L. We present a kernel-based method, such that with probability at least 1 - δ, it learns a predictor whose generalization error is at most ε worse than that of the neural network. The sample complexity and the time complexity of the presented method are polynomial in the input dimension and in (1/ε, log(1/δ), F(k, L)), where F(k, L) is a function depending on (k, L) and on the activation function, independent of the number of neurons. The algorithm applies to both sigmoid-like activation functions and ReLU-like activation functions. It implies that any sufficiently sparse neural network is learnable in polynomial time.
Journal Article•10.1140/EPJC/S10052-016-3895-1•
Renormalization group equation and scaling solutions for f(R) gravity in exponential parametrization

Nobuyoshi Ohta1, Roberto Percacci2, Gian Paolo Vacca•
Kindai University1, International School for Advanced Studies2
27 Jan 2016-European Physical Journal C
TL;DR: In this article, the authors employ the exponential parametrization of the metric and a physical gauge fixing procedure to write a functional flow equation for the gravitational effective average action in an f(R) truncation.
Abstract: We employ the exponential parametrization of the metric and a “physical” gauge fixing procedure to write a functional flow equation for the gravitational effective average action in an f(R) truncation. The background metric is a four-sphere and the coarse-graining procedure contains three free parameters. We look for scaling solutions, i.e. non-Gaussian fixed points for the function f. For a discrete set of values of the parameters, we find simple global solutions of quadratic polynomial form. For other values, global solutions can be found numerically. Such solutions can be extended in certain regions of parameter space and have two relevant directions. We discuss the merits and the shortcomings of this procedure.
Journal Article•10.1109/TNNLS.2015.2471262•
A Unified Approach to Adaptive Neural Control for Nonlinear Discrete-Time Systems With Nonlinear Dead-Zone Input

Yan-Jun Liu1, Ying Gao1, Shaocheng Tong1, C. L. Philip Chen2•
Liaoning University of Technology1, University of Macau2
01 Jan 2016-IEEE Transactions on Neural Networks and Learning Systems
TL;DR: An effective adaptive control approach is constructed to stabilize a class of nonlinear discrete-time systems, which contain unknown functions, unknown dead-zone input, and unknown control direction, and the neural networks are used to approximate the unknown function.
Abstract: In this paper, an effective adaptive control approach is constructed to stabilize a class of nonlinear discrete-time systems, which contain unknown functions, unknown dead-zone input, and unknown control direction. Unlike a linear dead zone, the dead zone considered in this paper is nonlinear. To overcome the noncausal problem, which renders the control scheme infeasible, the systems are transformed into an $m$-step-ahead predictor. Due to the nonlinear dead zone, the transformed predictor still contains a nonaffine function. In addition, it is assumed that the gain function of the dead-zone input and the control direction are unknown. These conditions complicate the controller design. Thus, the implicit function theorem is applied to deal with the nonaffine dead zone, the problem caused by the unknown control direction is resolved by applying the discrete Nussbaum gain, and neural networks are used to approximate the unknown function. Based on the Lyapunov theory, all the signals of the resulting closed-loop system are proved to be semiglobally uniformly ultimately bounded. Moreover, the tracking error is proved to be regulated to a small neighborhood around zero. The feasibility of the proposed approach is demonstrated by a simulation example.