TL;DR: This study aims to investigate the impact of fourteen data normalization methods on classification performance considering full feature set, feature selection, and feature weighting and suggests a set of the best and the worst methods combining the normalization procedure and empirical analysis of results.
TL;DR: In this paper, the authors show that layer normalization is crucial to the performance of pre-LN Transformers and remove the warm-up stage for the training of Pre-LNs.
Abstract: The Transformer is widely used in natural language processing tasks. To train a Transformer however, one usually needs a carefully designed learning rate warm-up stage, which is shown to be crucial to the final performance but will slow down the optimization and bring more hyper-parameter tunings. In this paper, we first study theoretically why the learning rate warm-up stage is essential and show that the location of layer normalization matters. Specifically, we prove with mean field theory that at initialization, for the original-designed Post-LN Transformer, which places the layer normalization between the residual blocks, the expected gradients of the parameters near the output layer are large. Therefore, using a large learning rate on those gradients makes the training unstable. The warm-up stage is practically helpful for avoiding this problem. On the other hand, our theory also shows that if the layer normalization is put inside the residual blocks (recently proposed as Pre-LN Transformer), the gradients are well-behaved at initialization. This motivates us to remove the warm-up stage for the training of Pre-LN Transformers. We show in our experiments that Pre-LN Transformers without the warm-up stage can reach comparable results with baselines while requiring significantly less training time and hyper-parameter tuning on a wide range of applications.
TL;DR: Extended experiments show that BNNeck can boost the baseline, and the baseline can improve the performance of existing state-of-the-art methods.
Abstract: This study proposes a simple but strong baseline for deep person re-identification (ReID). Deep person ReID has achieved great progress and high performance in recent years. However, many state-of-the-art methods design complex network structures and concatenate multi-branch features. In the literature, some effective training tricks briefly appear in several papers or source codes. The present study collects and evaluates these effective training tricks in person ReID. By combining these tricks, the model achieves 94.5% rank-1 and 85.9% mean average precision on Market1501 with only using the global features of ResNet50. The performance surpasses all existing global- and part-based baselines in person ReID. We propose a novel neck structure named as batch normalization neck (BNNeck). BNNeck adds a batch normalization layer after global pooling layer to separate metric and classification losses into two different feature spaces because we observe they are inconsistent in one embedding space. Extended experiments show that BNNeck can boost the baseline, and our baseline can improve the performance of existing state-of-the-art methods. Our codes and models are available at: https://github.com/michuanhaohao/reid-strong-baseline
TL;DR: Semantic Region Adaptive Normalization (SEAN) as mentioned in this paper is a simple but effective building block for Generative Adversarial Networks conditioned on segmentation masks that describe the semantic regions in the desired output image.
Abstract: We propose semantic region-adaptive normalization (SEAN), a simple but effective building block for Generative Adversarial Networks conditioned on segmentation masks that describe the semantic regions in the desired output image. Using SEAN normalization, we can build a network architecture that can control the style of each semantic region individually, e.g., we can specify one style reference image per region. SEAN is better suited to encode, transfer, and synthesize style than the best previous method in terms of reconstruction quality, variability, and visual quality. We evaluate SEAN on multiple datasets and report better quantitative metrics (e.g. FID, PSNR) than the current state of the art. SEAN also pushes the frontier of interactive image editing. We can interactively edit images by changing segmentation masks or the style for any given region. We can also interpolate styles from two reference images per region.
TL;DR: Ordered Quantile (ORQ) normalization is introduced, a one-to-one transformation that is designed to consistently and effectively transform a vector of arbitrary distribution into a vector that follows a normal (Gaussian) distribution.
Abstract: Normalization transformations have recently experienced a resurgence in popularity in the era of machine learning, particularly in data preprocessing. However, the classical methods that can be ada...
TL;DR: The aim of this paper is to design a generalizable person ReID framework which trains a model on source domains yet is able to generalize/perform well on target domains, and to enforce a dual causal loss constraint in SNR to encourage the separation of identity-relevant features and identity-irrelevant features.
Abstract: Existing fully-supervised person re-identification (ReID) methods usually suffer from poor generalization capability caused by domain gaps. The key to solving this problem lies in filtering out identity-irrelevant interference and learning domain-invariant person representations. In this paper, we aim to design a generalizable person ReID framework which trains a model on source domains yet is able to generalize/perform well on target domains. To achieve this goal, we propose a simple yet effective Style Normalization and Restitution (SNR) module. Specifically, we filter out style variations (e.g., illumination, color contrast) by Instance Normalization (IN). However, such a process inevitably removes discriminative information. We propose to distill identity-relevant feature from the removed information and restitute it to the network to ensure high discrimination. For better disentanglement, we enforce a dual causal loss constraint in SNR to encourage the separation of identity-relevant features and identity-irrelevant features. Extensive experiments demonstrate the strong generalization capability of our framework. Our models empowered by the SNR modules significantly outperform the state-of-the-art domain generalization approaches on multiple widely-used person ReID benchmarks, and also show superiority on unsupervised domain adaptation.
TL;DR: The empirical study quantified the increase in training time when dropout and batch normalization are used, as well as the increaseIn prediction time (important for constrained environments, such as smartphones and low-powered IoT devices) and showed that a non-adaptive optimizer can outperform adaptive optimizers, but only at the cost of a significant amount of training times to perform hyperparameter tuning.
Abstract: Overfitting and long training time are two fundamental challenges in multilayered neural network learning and deep learning in particular. Dropout and batch normalization are two well-recognized approaches to tackle these challenges. While both approaches share overlapping design principles, numerous research results have shown that they have unique strengths to improve deep learning. Many tools simplify these two approaches as a simple function call, allowing flexible stacking to form deep learning architectures. Although their usage guidelines are available, unfortunately no well-defined set of rules or comprehensive studies to investigate them concerning data input, network configurations, learning efficiency, and accuracy. It is not clear when users should consider using dropout and/or batch normalization, and how they should be combined (or used alternatively) to achieve optimized deep learning outcomes. In this paper we conduct an empirical study to investigate the effect of dropout and batch normalization on training deep learning models. We use multilayered dense neural networks and convolutional neural networks (CNN) as the deep learning models, and mix dropout and batch normalization to design different architectures and subsequently observe their performance in terms of training and test CPU time, number of parameters in the model (as a proxy for model size), and classification accuracy. The interplay between network structures, dropout, and batch normalization, allow us to conclude when and how dropout and batch normalization should be considered in deep learning. The empirical study quantified the increase in training time when dropout and batch normalization are used, as well as the increase in prediction time (important for constrained environments, such as smartphones and low-powered IoT devices). It showed that a non-adaptive optimizer (e.g. SGD) can outperform adaptive optimizers, but only at the cost of a significant amount of training times to perform hyperparameter tuning, while an adaptive optimizer (e.g. RMSProp) performs well without much tuning. Finally, it showed that dropout and batch normalization should be used in CNNs only with caution and experimentation (when in doubt and short on time to experiment, use only batch normalization).
TL;DR: SkewScout is presented, a system-level approach that adapts the communication frequency of decentralized learning algorithms to the (skew-induced) accuracy loss between data partitions and it is shown that group normalization can recover much of the accuracy loss of batch normalization.
Abstract: Many large-scale machine learning (ML) applications need to perform decentralized learning over datasets generated at different devices and locations. Such datasets pose a significant challenge to decentralized learning because their different contexts result in significant data distribution skew across devices/locations. In this paper, we take a step toward better understanding this challenge by presenting a detailed experimental study of decentralized DNN training on a common type of data skew: skewed distribution of data labels across devices/locations. Our study shows that: (i) skewed data labels are a fundamental and pervasive problem for decentralized learning, causing significant accuracy loss across many ML applications, DNN models, training datasets, and decentralized learning algorithms; (ii) the problem is particularly challenging for DNN models with batch normalization; and (iii) the degree of data skew is a key determinant of the difficulty of the problem. Based on these findings, we present SkewScout, a system-level approach that adapts the communication frequency of decentralized learning algorithms to the (skew-induced) accuracy loss between data partitions. We also show that group normalization can recover much of the accuracy loss of batch normalization.
TL;DR: Overall, methods that use scRNA-seq data have comparable performance to the best performing bulk methods whereas semi-supervised approaches show higher error values.
Abstract: Many computational methods have been developed to infer cell type proportions from bulk transcriptomics data. However, an evaluation of the impact of data transformation, pre-processing, marker selection, cell type composition and choice of methodology on the deconvolution results is still lacking. Using five single-cell RNA-sequencing (scRNA-seq) datasets, we generate pseudo-bulk mixtures to evaluate the combined impact of these factors. Both bulk deconvolution methodologies and those that use scRNA-seq data as reference perform best when applied to data in linear scale and the choice of normalization has a dramatic impact on some, but not all methods. Overall, methods that use scRNA-seq data have comparable performance to the best performing bulk methods whereas semi-supervised approaches show higher error values. Moreover, failure to include cell types in the reference that are present in a mixture leads to substantially worse results, regardless of the previous choices. Altogether, we evaluate the combined impact of factors affecting the deconvolution task across different datasets and propose general guidelines to maximize its performance. Inferring cell type proportions from transcriptomics data is affected by data transformation, normalization, choice of method and the markers used. Here, the authors use single-cell RNAseq datasets to evaluate the impact of these factors and propose guidelines to maximise deconvolution performance.
TL;DR: Experiments on benchmark datasets demonstrate that the family of new loss functions created by the APL framework can consistently outperform state-of-the-art methods by large margins, especially under large noise rates such as 60% or 80% incorrect labels.
Abstract: Robust loss functions are essential for training accurate deep neural networks (DNNs) in the presence of noisy (incorrect) labels It has been shown that the commonly used Cross Entropy (CE) loss is not robust to noisy labels Whilst new loss functions have been designed, they are only partially robust In this paper, we theoretically show by applying a simple normalization that: any loss can be made robust to noisy labels However, in practice, simply being robust is not sufficient for a loss function to train accurate DNNs By investigating several robust loss functions, we find that they suffer from a problem of underfitting To address this, we propose a framework to build robust loss functions called Active Passive Loss (APL) APL combines two robust loss functions that mutually boost each other Experiments on benchmark datasets demonstrate that the family of new loss functions created by our APL framework can consistently outperform state-of-the-art methods by large margins, especially under large noise rates such as 60% or 80% incorrect labels
TL;DR: A generally applicable transformation unit for visual recognition with deep convolutional neural networks that explicitly models channel relationships with explainable control variables and is applicable to operator-level without much increase of additional parameters.
Abstract: In this work, we propose a generally applicable transformation unit for visual recognition with deep convolutional neural networks. This transformation explicitly models channel relationships with explainable control variables. These variables determine the neuron behaviors of competition or cooperation, and they are jointly optimized with the convolutional weight towards more accurate recognition. In Squeeze-and-Excitation (SE) Networks, the channel relationships are implicitly learned by fully connected layers, and the SE block is integrated at the block-level. We instead introduce a channel normalization layer to reduce the number of parameters and computational complexity. This lightweight layer incorporates a simple l2 normalization, enabling our transformation unit applicable to operator-level without much increase of additional parameters. Extensive experiments demonstrate the effectiveness of our unit with clear margins on many vision tasks, i.e., image classification on ImageNet, object detection and instance segmentation on COCO, video classification on Kinetics.
TL;DR: This paper depicts the improvement in predictive accuracies with the help of normalization techniques and various criteria needed to achieve such data normalization are described.
Abstract: Recent developments in analytical technologies helped in developing applications for real-time problems faced by industries. These applications are often found to consume more time in the training phase. To reduce this pre-treatment of data in the training phase is pointed out to be an appropriate methodology. Normalization is the best technique for pre-processing data before the training phase of application. Normalization using metrics with criteria is found to be very important to attain good predictive results with less amount of time. This paper depicts the improvement in predictive accuracies with the help of normalization techniques. Various criteria needed to achieve such data normalization are also described. In this paper, will be having a glance on three different machine learning classifier i.e., Radial SVM, KNN, Sigmoid SVM and seven different standardization techniques i.e., StandardScaler, Scale, RobustScaler, QuantileTransform, PowerTransform, MinMaxS caler and MaxAbsS caler.
TL;DR: The proposed Dynamic Instance Normalization (DIN) provides flexible support for state-of-the-art convolutional operations, and thus triggers novel functionalities, such as uniform-stroke placement for non-natural images and automatic spatial-stroke control.
Abstract: Prior normalization methods rely on affine transformations to produce arbitrary image style transfers, of which the parameters are computed in a pre-defined way. Such manually-defined nature eventually results in the high-cost and shared encoders for both style and content encoding, making style transfer systems cumbersome to be deployed in resource-constrained environments like on the mobile-terminal side. In this paper, we propose a new and generalized normalization module, termed as Dynamic Instance Normalization (DIN), that allows for flexible and more efficient arbitrary style transfers. Comprising an instance normalization and a dynamic convolution, DIN encodes a style image into learnable convolution parameters, upon which the content image is stylized. Unlike conventional methods that use shared complex encoders to encode content and style, the proposed DIN introduces a sophisticated style encoder, yet comes with a compact and lightweight content encoder for fast inference. Experimental results demonstrate that the proposed approach yields very encouraging results on challenging style patterns and, to our best knowledge, for the first time enables an arbitrary style transfer using MobileNet-based lightweight architecture, leading to a reduction factor of more than twenty in computational cost as compared to existing approaches. Furthermore, the proposed DIN provides flexible support for state-of-the-art convolutional operations, and thus triggers novel functionalities, such as uniform-stroke placement for non-natural images and automatic spatial-stroke control.
TL;DR: This matrix, developed by the Consensus for Experimental Design in Electromyography (CEDE) project, presents six approaches to EMG normalization and general considerations for normalization, features that should be reported, definitions, and "pros and cons" of each normalization approach are presented.
TL;DR: It is shown that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness and combining the two further improves performance, and has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
Abstract: Covariate shift has been shown to sharply degrade both predictive accuracy and the calibration of uncertainty estimates for deep learning models This is worrying, because covariate shift is prevalent in a wide range of real world deployment settings However, in this paper, we note that frequently there exists the potential to access small unlabeled batches of the shifted data just before prediction time This interesting observation enables a simple but surprisingly effective method which we call prediction-time batch normalization, which significantly improves model accuracy and calibration under covariate shift Using this one line code change, we achieve state-of-the-art on recent covariate shift benchmarks and an mCE of 6028\% on the challenging ImageNet-C dataset; to our knowledge, this is the best result for any model that does not incorporate additional data augmentation or modification of the training pipeline We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness (eg deep ensembles) and combining the two further improves performance Our findings are supported by detailed measurements of the effect of this strategy on model behavior across rigorous ablations on various dataset modalities However, the method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift, and is therefore worthy of additional study We include links to the data in our figures to improve reproducibility, including a Python notebooks that can be run to easily modify our analysis at this https URL
TL;DR: In this article, Zhang et al. proposed a Geometry-aware self-attention (GSA) to explicitly and efficiently consider the relative geometry relations between the objects in the image.
Abstract: Self-attention (SA) network has shown profound value in image captioning. In this paper, we improve SA from two aspects to promote the performance of image captioning. First, we propose Normalized Self-Attention (NSA), a reparameterization of SA that brings the benefits of normalization inside SA. While normalization is previously only applied outside SA, we introduce a novel normalization method and demonstrate that it is both possible and beneficial to perform it on the hidden activations inside SA. Second, to compensate for the major limit of Transformer that it fails to model the geometry structure of the input objects, we propose a class of Geometry-aware Self-Attention (GSA) that extends SA to explicitly and efficiently consider the relative geometry relations between the objects in the image. To construct our image captioning model, we combine the two modules and apply it to the vanilla self-attention network. We extensively evaluate our proposals on MS-COCO image captioning dataset and superior results are achieved when comparing to state-of-the-art approaches. Further experiments on three challenging tasks, i.e. video captioning, machine translation, and visual question answering, show the generality of our methods.
TL;DR: It is shown that the mean and variance shifts caused by full-spatial FN limit the image inpainting network training and a spatial region-wise normalization named Region Normalization (RN) is proposed to overcome the limitation.
Abstract: Feature Normalization (FN) is an important technique to help neural network training, which typically normalizes features across spatial dimensions. Most previous image inpainting methods apply FN in their networks without considering the impact of the corrupted regions of the input image on normalization, e.g. mean and variance shifts. In this work, we show that the mean and variance shifts caused by full-spatial FN limit the image inpainting network training and we propose a spatial region-wise normalization named Region Normalization (RN) to overcome the limitation. RN divides spatial pixels into different regions according to the input mask, and computes the mean and variance in each region for normalization. We develop two kinds of RN for our image inpainting network: (1) Basic RN (RN-B), which normalizes pixels from the corrupted and uncorrupted regions separately based on the original inpainting mask to solve the mean and variance shift problem; (2) Learnable RN (RN-L), which automatically detects potentially corrupted and uncorrupted regions for separate normalization, and performs global affine transformation to enhance their fusion. We apply RN-B in the early layers and RN-L in the latter layers of the network respectively. Experiments show that our method outperforms current state-of-the-art methods quantitatively and qualitatively. We further generalize RN to other inpainting networks and achieve consistent performance improvements.
TL;DR: In this paper, the authors proposed a multi-source domain generalization technique based on deep neural networks by incorporating optimized normalization layers that are specific to individual domains, which can enhance the generalizability of the learned model.
Abstract: We propose a simple but effective multi-source domain generalization technique based on deep neural networks by incorporating optimized normalization layers that are specific to individual domains. Our approach employs multiple normalization methods while learning separate affine parameters per domain. For each domain, the activations are normalized by a weighted average of multiple normalization statistics. The normalization statistics are kept track of separately for each normalization type if necessary. Specifically, we employ batch and instance normalizations in our implementation to identify the best combination of these two normalization methods in each domain. The optimized normalization layers are effective to enhance the generalizability of the learned model. We demonstrate the state-of-the-art accuracy of our algorithm in the standard domain generalization benchmarks, as well as viability to further tasks such as multi-source domain adaptation and domain generalization in the presence of label noise.
TL;DR: A temporal Batch Normalization Through Time (BNTT) technique is proposed and it is found that varying the BN parameters at every time-step allows the model to learn the time-varying input distribution better.
Abstract: Spiking Neural Networks (SNNs) have recently emerged as an alternative to deep learning owing to sparse, asynchronous and binary event (or spike) driven processing, that can yield huge energy efficiency benefits on neuromorphic hardware. However, training high-accuracy and low-latency SNNs from scratch suffers from non-differentiable nature of a spiking neuron. To address this training issue in SNNs, we revisit batch normalization and propose a temporal Batch Normalization Through Time (BNTT) technique. Most prior SNN works till now have disregarded batch normalization deeming it ineffective for training temporal SNNs. Different from previous works, our proposed BNTT decouples the parameters in a BNTT layer along the time axis to capture the temporal dynamics of spikes. The temporally evolving learnable parameters in BNTT allow a neuron to control its spike rate through different time-steps, enabling low-latency and low-energy training from scratch. We conduct experiments on CIFAR-10, CIFAR-100, Tiny-ImageNet and event-driven DVS-CIFAR10 datasets. BNTT allows us to train deep SNN architectures from scratch, for the first time, on complex datasets with just few 25-30 time-steps. We also propose an early exit algorithm using the distribution of parameters in BNTT to reduce the latency at inference, that further improves the energy-efficiency.
TL;DR: In this article, the authors adopt an adaptive instance normalization to build a denoiser, which can regularize the feature map and prevent the network from overfitting to the training set.
Abstract: Real-noise denoising is a challenging task because the statistics of real-noise do not follow the normal distribution, and they are also spatially and temporally changing. In order to cope with various and complex real-noise, we propose a well-generalized denoising architecture and a transfer learning scheme. Specifically, we adopt an adaptive instance normalization to build a denoiser, which can regularize the feature map and prevent the network from overfitting to the training set. We also introduce a transfer learning scheme that transfers knowledge learned from synthetic-noise data to the real-noise denoiser. From the proposed transfer learning, the synthetic-noise denoiser can learn general features from various synthetic-noise data, and the real-noise denoiser can learn the real-noise characteristics from real data. From the experiments, we find that the proposed denoising method has great generalization ability, such that our network trained with synthetic-noise achieves the best performance for Darmstadt Noise Dataset (DND) among the methods from published papers. We can also see that the proposed transfer learning scheme robustly works for real-noise images through the learning with a very small number of labeled data.
TL;DR: NOREVA 2.0 is distinguished for its capability in identifying well-performing normalization method(s) for time-course and multi-class metabolomics, which makes it an indispensable complement to other available tools.
Abstract: Biological processes (like microbial growth & physiological response) are usually dynamic and require the monitoring of metabolic variation at different time-points. Moreover, there is clear shift from case-control (N=2) study to multi-class (N>2) problem in current metabolomics, which is crucial for revealing the mechanisms underlying certain physiological process, disease metastasis, etc. These time-course and multi-class metabolomics have attracted great attention, and data normalization is essential for removing unwanted biological/experimental variations in these studies. However, no tool (including NOREVA 1.0 focusing only on case-control studies) is available for effectively assessing the performance of normalization method on time-course/multi-class metabolomic data. Thus, NOREVA was updated to version 2.0 by (i) realizing normalization and evaluation of both time-course and multi-class metabolomic data, (ii) integrating 144 normalization methods of a recently proposed combination strategy and (iii) identifying the well-performing methods by comprehensively assessing the largest set of normalizations (168 in total, significantly larger than those 24 in NOREVA 1.0). The significance of this update was extensively validated by case studies on benchmark datasets. All in all, NOREVA 2.0 is distinguished for its capability in identifying well-performing normalization method(s) for time-course and multi-class metabolomics, which makes it an indispensable complement to other available tools. NOREVA can be accessed at https://idrblab.org/noreva/.
TL;DR: This work proposes CytoNorm, a normalization algorithm to ensure internal consistency between clinical samples based on shared controls across various study batches, which compared favorably to standard normalization procedures.
TL;DR: A unified picture of the main motivation behind different approaches from the perspective of optimization is provided, and a taxonomy for understanding the similarities and differences between them is presented.
Abstract: Normalization techniques are essential for accelerating the training and improving the generalization of deep neural networks (DNNs), and have successfully been used in various applications. This paper reviews and comments on the past, present and future of normalization methods in the context of DNN training. We provide a unified picture of the main motivation behind different approaches from the perspective of optimization, and present a taxonomy for understanding the similarities and differences between them. Specifically, we decompose the pipeline of the most representative normalizing activation methods into three components: the normalization area partitioning, normalization operation and normalization representation recovery. In doing so, we provide insight for designing new normalization technique. Finally, we discuss the current progress in understanding normalization methods, and provide a comprehensive review of the applications of normalization for particular tasks, in which it can effectively solve the key issues.
TL;DR: The first time such a rate schedule has been successfully used, let alone for highly successful architectures, is suggested, and as expected, such training rapidly blows up network weights, but the net stays well-behaved due to normalization.
Abstract: Intriguing empirical evidence exists that deep learning can work well with exotic schedules for varying the learning rate. This paper suggests that the phenomenon may be due to Batch Normalization or BN(Ioffe & Szegedy, 2015), which is ubiq- uitous and provides benefits in optimization and generalization across all standard architectures. The following new results are shown about BN with weight decay and momentum (in other words, the typical use case which was not considered in earlier theoretical analyses of stand-alone BN (Ioffe & Szegedy, 2015; Santurkar et al., 2018; Arora et al., 2018)
• Training can be done using SGD with momentum and an exponentially in- creasing learning rate schedule, i.e., learning rate increases by some (1 + α) factor in every epoch for some α > 0. (Precise statement in the paper.) To the best of our knowledge this is the first time such a rate schedule has been successfully used, let alone for highly successful architectures. As ex- pected, such training rapidly blows up network weights, but the net stays well-behaved due to normalization.
• Mathematical explanation of the success of the above rate schedule: a rigor- ous proof that it is equivalent to the standard setting of BN + SGD + Standard Rate Tuning + Weight Decay + Momentum. This equivalence holds for other normalization layers as well, Group Normalization(Wu & He, 2018), Layer Normalization(Ba et al., 2016), Instance Norm(Ulyanov et al., 2016), etc.
• A worked-out toy example illustrating the above linkage of hyper- parameters. Using either weight decay or BN alone reaches global minimum, but convergence fails when both are used.
TL;DR: A comprehensive algorithm for multi-expert multi-criteria decision making problems considering quantitative and qualitative criteria in forms of benefit, cost or target types is developed, which focuses on using probabilistic linguistic term sets to express the qualitative evaluations.
Abstract: This paper develops a comprehensive algorithm for multi-expert multi-criteria decision making problems considering quantitative and qualitative criteria in forms of benefit, cost or target types. We focus on using probabilistic linguistic term sets to express the qualitative evaluations due to their excellence in expressing complex individual and collective linguistic assessments. Firstly, we develop a target-based linear normalization technique and a target-based vector normalization technique. A weight adjustment method is proposed to achieve the tradeoff between criteria after normalization. Given that the two target-based normalization techniques have different advantages, we then propose a ranking method, which consists three subordinate models, based on these two target-based normalization approaches and three aggregation techniques. Reliable results of a multi-expert multi-criteria decision making problem are determined by integrating the subordinate utility values and the ranks of alternatives. The proposed method is implemented to solve the green enterprise ranking problems and the excavation scheme selection problem for shallow buried tunnels, respectively. The advantages of the proposed method are emphasized through comparative analyses with other ranking methods.
TL;DR: DGN is introduced, which normalizes nodes within the same group independently to increase their smoothness, and separates node distributions among different groups to significantly alleviate the over-smoothing issue.
Abstract: Graph neural networks (GNNs), which learn the representation of a node by aggregating its neighbors, have become an effective computational tool in downstream applications. Over-smoothing is one of the key issues which limit the performance of GNNs as the number of layers increases. It is because the stacked aggregators would make node representations converge to indistinguishable vectors. Several attempts have been made to tackle the issue by bringing linked node pairs close and unlinked pairs distinct. However, they often ignore the intrinsic community structures and would result in sub-optimal performance. The representations of nodes within the same community/class need be similar to facilitate the classification, while different classes are expected to be separated in embedding space. To bridge the gap, we introduce two over-smoothing metrics and a novel technique, i.e., differentiable group normalization (DGN). It normalizes nodes within the same group independently to increase their smoothness, and separates node distributions among different groups to significantly alleviate the over-smoothing issue. Experiments on real-world datasets demonstrate that DGN makes GNN models more robust to over-smoothing and achieves better performance with deeper GNNs.
TL;DR: In this article, the authors propose an attention-based normalization of the feature maps of a permutation-equivariant network to find the essential data points in high-dimensional space to solve a given task.
Abstract: Many problems in computer vision require dealing with sparse, unordered data in the form of point clouds. Permutation-equivariant networks have become a popular solution – they operate on individual data points with simple perceptrons and extract contextual information with global pooling. This can be achieved with a simple normalization of the feature maps, a global operation that is unaffected by the order. In this paper, we propose Attentive Context Normalization (ACN), a simple yet effective technique to build permutation-equivariant networks robust to outliers. Specifically, we show how to normalize the feature maps with weights that are estimated within the network, excluding outliers from this normalization. We use this mechanism to leverage two types of attention: local and global – by combining them, our method is able to find the essential data points in high-dimensional space in order to solve a given task. We demonstrate through extensive experiments that our approach, which we call Attentive Context Networks (ACNe), provides a significant leap in performance compared to the state-of-the-art on camera pose estimation, robust fitting, and point cloud classification under noise and outliers. Source code: https://github.com/vcg-uvic/acne.
TL;DR: A principled normalization method, Graph Normalization (GraphNorm), where the key idea is to normalize the feature values across all nodes for each individual graph with a learnable shift, which improves generalization of GNNs, achieving better performance on graph classification benchmarks.
Abstract: Normalization is known to help the optimization of deep neural networks. Curiously, different architectures require specialized normalization methods. In this paper, we study what normalization is effective for Graph Neural Networks (GNNs). First, we adapt and evaluate the existing methods from other domains to GNNs. Faster convergence is achieved with InstanceNorm compared to BatchNorm and LayerNorm. We provide an explanation by showing that InstanceNorm serves as a preconditioner for GNNs, but such preconditioning effect is weaker with BatchNorm due to the heavy batch noise in graph datasets. Second, we show that the shift operation in InstanceNorm results in an expressiveness degradation of GNNs for highly regular graphs. We address this issue by proposing GraphNorm with a learnable shift. Empirically, GNNs with GraphNorm converge faster compared to GNNs using other normalization. GraphNorm also improves the generalization of GNNs, achieving better performance on graph classification benchmarks.
TL;DR: A new approach in modifying the CRiteria Importance Through Intercreteria Correlation (CRITIC) method, which falls under objective methods for determining criteria weight coefficients, to achieve smaller deviations between normalized elements, which in turn causes lower values of standard deviation.
Abstract: Determining criteria weight coefficients is a crucial step in multi-criteria decision making models. Therefore, this problem is given great attention in literature. This paper presents a new approach in modifying the CRiteria Importance Through Intercreteria Correlation (CRITIC) method, which falls under objective methods for determining criteria weight coefficients. Modifying the CRITIC method (CRITIC-M) entails changing the element normalization process of the initial decision matrix and changing data aggregation from the normalized decision matrix. By introducing a new normalization process, we achieve smaller deviations between normalized elements, which in turn causes lower values of standard deviation. Thus, the relationships between data in the initial decision matrix are presented in a more objective way. By introducing a new process of aggregation of weight coefficient values in the CRITIC-M method, a more comprehensive understanding of data in the initial decision matrix is made possible, leading to more objective values of weight coefficients. The presented CRITIC-M method has been tested in two examples, followed by a discussion of results via comparison to the classic CRITIC method.