TL;DR: It is proven that the fourth measure, called relative neighborhood self-information, is better for feature selection than the other measures, because not only does it consider both the lower and the upper approximations but also the change of its magnitude is largest with the variation of feature subsets.
Abstract: The concept of dependency in a neighborhood rough set model is an important evaluation function for feature selection. This function considers only the classification information contained in the lower approximation of the decision while ignoring the upper approximation. In this paper, we construct a class of uncertainty measures, called decision self-information, for feature selection. These measures take into account the uncertainty information in both the lower and the upper approximations. The relationships between these measures and their properties are discussed in detail. It is proven that the fourth measure, called relative neighborhood self-information, is better for feature selection than the other measures, because it not only considers both the lower and the upper approximations but also changes most in magnitude as the feature subset varies. This facilitates the selection of optimal feature subsets. Finally, a greedy algorithm for feature selection is designed, and a series of numerical experiments is carried out to verify the effectiveness of the proposed algorithm. The experimental results show that the proposed algorithm often chooses fewer features and improves the classification accuracy in most cases.
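The abstract above does not state formulas, so the following is only a minimal sketch of the quantities it refers to, assuming the commonly used neighborhood rough set definitions: a sample belongs to the lower approximation of a class when every sample in its neighborhood carries that class label, and to the upper approximation when at least one does. Dependency is then the fraction of samples in the lower approximation. The function names, the Euclidean distance, the radius `delta` and the toy data are illustrative assumptions, and the self-information measures themselves are not reproduced here.

```python
import math

def neighborhood(samples, i, features, delta):
    """Indices of samples within Euclidean distance delta of sample i,
    measured only on the selected feature indices (assumed metric)."""
    xi = samples[i]
    return [
        j for j, xj in enumerate(samples)
        if math.sqrt(sum((xi[f] - xj[f]) ** 2 for f in features)) <= delta
    ]

def approximation_sizes(samples, labels, features, delta):
    """Return (lower, upper): sizes of the lower and upper approximations
    of the decision, summed over all decision classes."""
    lower = 0
    upper = 0
    for i in range(len(samples)):
        neigh_labels = {labels[j] for j in neighborhood(samples, i, features, delta)}
        # Pure neighborhood: sample i is in the lower approximation of its class.
        if len(neigh_labels) == 1:
            lower += 1
        # Sample i is in the upper approximation of every class it touches.
        upper += len(neigh_labels)
    return lower, upper

def dependency(samples, labels, features, delta):
    """Classical dependency: uses the lower approximation only."""
    lower, _ = approximation_sizes(samples, labels, features, delta)
    return lower / len(samples)

# Hypothetical toy data: five samples, two features, binary decision.
samples = [(0.1, 0.9), (0.2, 0.8), (0.8, 0.2), (0.9, 0.1), (0.5, 0.5)]
labels = [0, 0, 1, 1, 0]

print(approximation_sizes(samples, labels, [0], 0.35))     # (3, 7)
print(approximation_sizes(samples, labels, [0, 1], 0.35))  # (5, 5)
```

With the first feature alone, two samples have mixed neighborhoods, so the lower approximation shrinks while the summed upper approximation grows. Dependency registers only the first effect, which is the limitation the abstract points to.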
TL;DR: In this article, the authors proposed a measure of redundancy which measures the common change in surprisal shared between variables at the local or pointwise level, and used this measure to derive constraints which are used to obtain a maximum entropy distribution.
Abstract: The problem of how to properly quantify redundant information is an open question that has been the subject of much recent research. Redundant information refers to information about a target variable S that is common to two or more predictor variables Xi. It can be thought of as quantifying overlapping information content or similarities in the representation of S between the Xi. We present a new measure of redundancy which measures the common change in surprisal shared between variables at the local or pointwise level. We provide a game-theoretic operational definition of unique information, and use this to derive constraints which are used to obtain a maximum entropy distribution. Redundancy is then calculated from this maximum entropy distribution by counting only those local co-information terms which admit an unambiguous interpretation as redundant information. We show how this redundancy measure can be used within the framework of the Partial Information Decomposition (PID) to give an intuitive decomposition of the multivariate mutual information into redundant, unique and synergistic contributions. We compare our new measure to existing approaches over a range of example systems, including continuous Gaussian variables. Matlab code for the measure is provided, including all considered examples.
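To make the "local or pointwise" quantities in the abstract above concrete, the sketch below computes, for each outcome of a small discrete system, the local information terms and the local co-information. It assumes the sign convention c = i(s;x1) + i(s;x2) - i(s;x1,x2), under which positive values indicate overlap. It covers only this one ingredient: the maximum entropy step and the rule for selecting which terms count as redundant are not reproduced, and all function names are illustrative.

```python
import math
from collections import defaultdict

def marginal(joint, keep):
    """Marginalise a joint distribution onto the positions listed in keep."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[k] for k in keep)] += p
    return out

def local_terms(joint):
    """joint maps (x1, x2, s) -> probability.
    Returns {outcome: (i1, i2, i12, c)} in bits, where each i is a local
    mutual information (change in surprisal of s) and c is the local
    co-information under the assumed sign convention."""
    p_s = marginal(joint, (2,))
    p_x1 = marginal(joint, (0,))
    p_x2 = marginal(joint, (1,))
    p_x1s = marginal(joint, (0, 2))
    p_x2s = marginal(joint, (1, 2))
    p_x12 = marginal(joint, (0, 1))
    terms = {}
    for (x1, x2, s), p in joint.items():
        if p == 0.0:
            continue  # outcomes that never occur carry no local terms
        i1 = math.log2(p_x1s[(x1, s)] / (p_x1[(x1,)] * p_s[(s,)]))
        i2 = math.log2(p_x2s[(x2, s)] / (p_x2[(x2,)] * p_s[(s,)]))
        i12 = math.log2(p / (p_x12[(x1, x2)] * p_s[(s,)]))
        terms[(x1, x2, s)] = (i1, i2, i12, i1 + i2 - i12)
    return terms

# Fully redundant system: both predictors are copies of a uniform bit S.
copy = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
# Synergistic system: S = X1 XOR X2 with independent uniform inputs.
xor = {(0, 0, 0): 0.25, (0, 1, 1): 0.25, (1, 0, 1): 0.25, (1, 1, 0): 0.25}

print(local_terms(copy)[(0, 0, 0)])  # (1.0, 1.0, 1.0, 1.0)
print(local_terms(xor)[(0, 1, 1)])   # (0.0, 0.0, 1.0, -1.0)
```

In the copy system every local term is one bit and the co-information is positive, whereas in the XOR system neither predictor alone changes the surprisal of S and the co-information is negative.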
TL;DR: The major goal of this research is to develop general nonparametric methods for the estimation of entropy and mutual information, giving a unifying point of view for their use in signal processing and neural computation.
Abstract: The major goal of this research is to develop general nonparametric methods for the estimation of entropy and mutual information, giving a unifying point of view for their use in signal processing and neural computation. In many real world problems, the information is carried solely by data samples without any other a priori knowledge. The central issue of “learning from examples” is to estimate energy, entropy or mutual information of a variable only from its samples and adapt the system parameters by optimizing a criterion based on the estimation.
By using alternative entropy measures such as Rényi's quadratic entropy, coupled with the Parzen window estimation of the probability density function for data samples, we developed an “information potential” method for entropy estimation. In this method, data samples are treated as physical particles and the entropy turns out to be related to the potential energy of these “information particles.” The entropy maximization or minimization is then equivalent to the minimization or the maximization of the “information potential.” Based on the Cauchy-Schwarz inequality and the Euclidean distance metric, we further proposed the quadratic mutual information as an alternative to Shannon's mutual information. There is also a “cross information potential” implementation for the quadratic mutual information that measures the correlation between the “marginal information potentials” at several levels. “Learning from examples” at the output of a mapper by the “information potential” or the “cross information potential” is implemented by propagating the “information force” or the “cross information force” back to the system parameters. Since the criteria are decoupled from the structure of learning machines, they are general learning schemes. The “information potential” and the “cross information potential” provide a microscopic expression for the macroscopic measure of the entropy and mutual information at the data sample level. The algorithms examine the relative position of each data pair and thus have a computational complexity of O(N^2).
An on-line local algorithm for learning is also discussed, where the energy field is related to the famous biological Hebbian and anti-Hebbian learning rules. Based on this understanding, an on-line local algorithm for the generalized eigendecomposition is proposed.
The information potential methods have been successfully applied to various problems such as aspect angle estimation in synthetic aperture radar (SAR) imagery, target recognition in SAR imagery, layer-by-layer training of multilayer neural networks and blind source separation. The good performance of the methods on various problems confirms the validity and efficiency of the information potential methods.
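The “information potential” described in this abstract can be illustrated with a short sketch. It assumes one-dimensional samples and a Gaussian Parzen kernel of width `sigma` (both assumptions, as the abstract fixes neither). Under those assumptions, the integral of the squared density estimate reduces to an average of Gaussians of variance 2*sigma^2 evaluated at every pairwise difference, and Rényi's quadratic entropy is the negative logarithm of that average. Function names are illustrative.

```python
import math

def gaussian(u, variance):
    """One-dimensional Gaussian density with zero mean, evaluated at u."""
    return math.exp(-u * u / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

def information_potential(samples, sigma):
    """Average pairwise kernel interaction between samples.
    Convolving two Gaussian Parzen kernels of variance sigma^2 gives a
    Gaussian of variance 2*sigma^2, hence the doubled variance below."""
    n = len(samples)
    total = 0.0
    for xi in samples:          # the double loop over all pairs is the
        for xj in samples:      # source of the O(N^2) cost
            total += gaussian(xi - xj, 2.0 * sigma ** 2)
    return total / (n * n)

def renyi_quadratic_entropy(samples, sigma):
    """Estimate of Renyi's quadratic entropy: -log of the potential."""
    return -math.log(information_potential(samples, sigma))

# Hypothetical data: a tight cluster and a spread-out set of "particles".
tight = [0.0, 0.01, -0.01, 0.02]
spread = [0.0, 1.0, -1.0, 2.0]

print(information_potential(tight, 0.5) > information_potential(spread, 0.5))      # True
print(renyi_quadratic_entropy(tight, 0.5) < renyi_quadratic_entropy(spread, 0.5))  # True
```

Closely packed samples interact strongly, giving a high potential and a low entropy estimate, so maximizing entropy corresponds to minimizing the potential, as the abstract states.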
TL;DR: In this article, the number of entropy data bits needed to satisfy a predetermined security strength of the cryptographic operation is estimated based on the entropy strength of a string of entropy bits, which is a measure of randomness.
Abstract: A seed for use in a cryptographic operation for an electronic device is determined by estimating the number of entropy data bits needed to satisfy a predetermined security strength of the cryptographic operation. The estimation is based on an entropy strength of a string of entropy data bits. Entropy strength is a measure of randomness. Furthermore, the determination of the seed may be guided differently according to the estimated number of entropy data bits.
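The estimation step in this abstract can be sketched with simple arithmetic. The sketch assumes that entropy strength is expressed as an average amount of entropy per collected data bit, a value between 0 and 1; the abstract does not specify the representation, and the function and parameter names are hypothetical. Under that assumption, the number of data bits needed is the required security strength divided by the entropy per bit, rounded up.

```python
import math

def entropy_bits_needed(security_strength_bits, entropy_per_bit):
    """Estimate how many raw entropy data bits must be collected so that
    their total entropy reaches the required security strength.
    entropy_per_bit is an assumed model of 'entropy strength' in (0, 1]."""
    if security_strength_bits <= 0:
        raise ValueError("security strength must be positive")
    if not 0.0 < entropy_per_bit <= 1.0:
        raise ValueError("entropy per bit must lie in (0, 1]")
    return math.ceil(security_strength_bits / entropy_per_bit)

print(entropy_bits_needed(128, 1.0))  # 128: ideal source, no extra bits
print(entropy_bits_needed(256, 0.5))  # 512: half a bit of entropy per data bit
```

A weaker source therefore requires a longer string of entropy data bits to reach the same security strength.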