Top 846 papers published in the topic of Normalization (statistics) in 2020

Showing papers on "Normalization (statistics)" published in 2020

10.5281/ZENODO.5138159•

Advanced Normalization Tools (ANTs)

Brian B. Avants, Nicholas J. Tustison, Hans J. Johnson

20 Dec 2020

1,389 citations

Journal Article•10.1016/J.ASOC.2019.105524•

Investigating the impact of data normalization on classification performance

Dalwinder Singh¹, Birmohan Singh¹•Institutions (1)

Sant Longowal Institute of Engineering and Technology¹

01 Dec 2020-Applied Soft Computing

TL;DR: This study aims to investigate the impact of fourteen data normalization methods on classification performance considering full feature set, feature selection, and feature weighting and suggests a set of the best and the worst methods combining the normalization procedure and empirical analysis of results.

1,172 citations
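
The study compares fourteen normalization methods. As a minimal illustration (not the paper's code), two of the most common ones, z-score standardization and min-max scaling, can be sketched in plain Python. The helper names `fit_zscore`, `zscore`, `fit_minmax`, and `minmax` are hypothetical. The sketch assumes that statistics are estimated on the training split only and then reused on test data, which is why a test value outside the training range maps outside [0, 1].

```python
import statistics


def fit_zscore(train):
    """Return (mean, population std) estimated from the training column only."""
    return statistics.fmean(train), statistics.pstdev(train)


def zscore(values, mean, std):
    # Guard against constant features, where std == 0.
    return [(v - mean) / std if std > 0 else 0.0 for v in values]


def fit_minmax(train):
    """Return (min, max) estimated from the training column only."""
    return min(train), max(train)


def minmax(values, lo, hi):
    span = hi - lo
    return [(v - lo) / span if span > 0 else 0.0 for v in values]


train = [2.0, 4.0, 6.0, 8.0]
test = [5.0, 10.0]

mean, std = fit_zscore(train)
lo, hi = fit_minmax(train)
print(zscore(test, mean, std))
print(minmax(test, lo, hi))  # 10.0 maps past 1.0 because it exceeds the training max
```

Fitting on the training split and reusing the statistics avoids leaking test information into preprocessing; the price is that shifted test data is no longer guaranteed to land in the nominal range.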

Posted Content•

On Layer Normalization in the Transformer Architecture

Ruibin Xiong¹, Yunchang Yang², Di He², Kai Zheng², Shuxin Zheng³, Chen Xing⁴, Huishuai Zhang³, Yanyan Lan, Liwei Wang², Tie-Yan Liu³•Institutions (4)

Chinese Academy of Sciences¹, Peking University², Microsoft³, Nankai University⁴

12 Feb 2020-arXiv: Learning

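TL;DR: In this paper, the authors show that the location of layer normalization matters: Post-LN Transformers need a learning-rate warm-up stage because the expected gradients near the output layer are large at initialization, whereas Pre-LN Transformers have well-behaved gradients and can be trained without the warm-up stage.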

Abstract: The Transformer is widely used in natural language processing tasks. To train a Transformer, however, one usually needs a carefully designed learning rate warm-up stage, which is shown to be crucial to the final performance but slows down the optimization and requires more hyper-parameter tuning. In this paper, we first study theoretically why the learning rate warm-up stage is essential and show that the location of layer normalization matters. Specifically, we prove with mean field theory that at initialization, for the originally designed Post-LN Transformer, which places the layer normalization between the residual blocks, the expected gradients of the parameters near the output layer are large. Therefore, using a large learning rate on those gradients makes the training unstable. The warm-up stage is practically helpful for avoiding this problem. On the other hand, our theory also shows that if the layer normalization is put inside the residual blocks (recently proposed as the Pre-LN Transformer), the gradients are well-behaved at initialization. This motivates us to remove the warm-up stage for the training of Pre-LN Transformers. We show in our experiments that Pre-LN Transformers without the warm-up stage can reach results comparable with baselines while requiring significantly less training time and hyper-parameter tuning on a wide range of applications.

761 citations
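
The two layer-normalization placements the abstract contrasts can be shown with a toy, pure-Python sketch of one residual block: Post-LN normalizes after the residual addition, while Pre-LN normalizes only the sublayer input. `toy_sublayer` is a hypothetical stand-in for attention or the feed-forward network, and the learned gain and bias of layer normalization are omitted for brevity.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize one token vector to zero mean and unit variance (no learned gain/bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def post_ln_block(x, sublayer):
    # Post-LN: add the residual first, then normalize the sum.
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])


def pre_ln_block(x, sublayer):
    # Pre-LN: normalize only the sublayer input; the residual path is left untouched.
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]


def toy_sublayer(x):
    # Hypothetical stand-in for attention or a feed-forward layer.
    return [0.5 * v for v in x]


x = [1.0, 2.0, 3.0, 4.0]
print(post_ln_block(x, toy_sublayer))
print(pre_ln_block(x, toy_sublayer))
```

With a sublayer that outputs zeros, the Pre-LN block returns its input unchanged, illustrating the clean identity path through the residual connection, whereas the Post-LN block still renormalizes it.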

Journal Article•10.1109/TMM.2019.2958756•

A Strong Baseline and Batch Normalization Neck for Deep Person Re-Identification

Hao Luo¹, Wei Jiang¹, Youzhi Gu¹, Fuxu Liu, Xingyu Liao², Shenqi Lai³, Jianyang Gu¹•Institutions (3)

Zhejiang University¹, Chinese Academy of Sciences², Xi'an Jiaotong University³

01 Oct 2020-IEEE Transactions on Multimedia

TL;DR: Extended experiments show that BNNeck can boost the baseline, and the baseline can improve the performance of existing state-of-the-art methods.

Abstract: This study proposes a simple but strong baseline for deep person re-identification (ReID). Deep person ReID has achieved great progress and high performance in recent years. However, many state-of-the-art methods design complex network structures and concatenate multi-branch features. In the literature, some effective training tricks briefly appear in several papers or source code. The present study collects and evaluates these effective training tricks in person ReID. By combining these tricks, the model achieves 94.5% rank-1 and 85.9% mean average precision on Market1501 using only the global features of ResNet50. The performance surpasses all existing global- and part-based baselines in person ReID. We propose a novel neck structure named the batch normalization neck (BNNeck). BNNeck adds a batch normalization layer after the global pooling layer to separate metric and classification losses into two different feature spaces, because we observe they are inconsistent in one embedding space. Extended experiments show that BNNeck can boost the baseline, and our baseline can improve the performance of existing state-of-the-art methods. Our code and models are available at: https://github.com/michuanhaohao/reid-strong-baseline

625 citations

Proceedings Article•10.1109/CVPR42600.2020.00515•

SEAN: Image Synthesis With Semantic Region-Adaptive Normalization

Peihao Zhu¹, Rameen Abdal¹, Yipeng Qin, Peter Wonka¹•Institutions (1)

King Abdullah University of Science and Technology¹

14 Jun 2020

TL;DR: Semantic Region-Adaptive Normalization (SEAN) is a simple but effective building block for Generative Adversarial Networks conditioned on segmentation masks that describe the semantic regions in the desired output image.

Abstract: We propose semantic region-adaptive normalization (SEAN), a simple but effective building block for Generative Adversarial Networks conditioned on segmentation masks that describe the semantic regions in the desired output image. Using SEAN normalization, we can build a network architecture that can control the style of each semantic region individually, e.g., we can specify one style reference image per region. SEAN is better suited to encode, transfer, and synthesize style than the best previous method in terms of reconstruction quality, variability, and visual quality. We evaluate SEAN on multiple datasets and report better quantitative metrics (e.g. FID, PSNR) than the current state of the art. SEAN also pushes the frontier of interactive image editing. We can interactively edit images by changing segmentation masks or the style for any given region. We can also interpolate styles from two reference images per region.

588 citations

Journal Article•10.1080/02664763.2019.1630372•

Ordered quantile normalization: a semiparametric transformation built for the cross-validation era

Ryan A. Peterson¹,², Joseph E. Cavanaugh²•Institutions (2)

Anschutz Medical Campus¹, University of Iowa²

17 Nov 2020-Journal of Applied Statistics

TL;DR: Ordered Quantile (ORQ) normalization is introduced, a one-to-one transformation that is designed to consistently and effectively transform a vector of arbitrary distribution into a vector that follows a normal (Gaussian) distribution.

Abstract: Normalization transformations have recently experienced a resurgence in popularity in the era of machine learning, particularly in data preprocessing. However, the classical methods that can be ada...

523 citations
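
A common way to map a vector of arbitrary distribution onto a normal distribution is a rank-based inverse normal transform. The sketch below shows only that general idea, using an assumed (rank − 0.5)/n offset and average ranks for ties; it should not be read as the full ORQ algorithm, whose details (including how values unseen during fitting are handled) the truncated abstract does not give. `rank_inverse_normal` is a hypothetical helper built on Python's `statistics.NormalDist`.

```python
from statistics import NormalDist


def rank_inverse_normal(values):
    """Map values to standard-normal quantiles via their ranks (ties get average ranks)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    # The 0.5 offset keeps every probability strictly inside (0, 1),
    # where inv_cdf is defined.
    return [std_normal.inv_cdf((r - 0.5) / n) for r in ranks]


skewed = [0.1, 0.3, 0.2, 5.0, 40.0, 1.2]
print(rank_inverse_normal(skewed))
```

Because only ranks are used, the heavy right tail (40.0) no longer dominates, and the ordering of the original values is preserved.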

Proceedings Article•10.1109/CVPR42600.2020.00321•

Style Normalization and Restitution for Generalizable Person Re-Identification

Xin Jin¹, Cuiling Lan², Wenjun Zeng², Zhibo Chen¹, Li Zhang³•Institutions (3)

University of Science and Technology of China¹, Microsoft², University of Oxford³

14 Jun 2020

TL;DR: The aim of this paper is to design a generalizable person ReID framework which trains a model on source domains yet is able to generalize/perform well on target domains, and to enforce a dual causal loss constraint in SNR to encourage the separation of identity-relevant features and identity-irrelevant features.

Abstract: Existing fully-supervised person re-identification (ReID) methods usually suffer from poor generalization capability caused by domain gaps. The key to solving this problem lies in filtering out identity-irrelevant interference and learning domain-invariant person representations. In this paper, we aim to design a generalizable person ReID framework which trains a model on source domains yet is able to generalize/perform well on target domains. To achieve this goal, we propose a simple yet effective Style Normalization and Restitution (SNR) module. Specifically, we filter out style variations (e.g., illumination, color contrast) by Instance Normalization (IN). However, such a process inevitably removes discriminative information. We propose to distill identity-relevant features from the removed information and restitute them to the network to ensure high discrimination. For better disentanglement, we enforce a dual causal loss constraint in SNR to encourage the separation of identity-relevant features and identity-irrelevant features. Extensive experiments demonstrate the strong generalization capability of our framework. Our models empowered by the SNR modules significantly outperform the state-of-the-art domain generalization approaches on multiple widely-used person ReID benchmarks, and also show superiority on unsupervised domain adaptation.

482 citations
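
The first step of SNR, as the abstract describes it, filters style with Instance Normalization; the information IN removes is then the candidate pool for restitution. The sketch below shows only that step for one sample: per-channel normalization over spatial positions and the residual `R = F - F_tilde`. How SNR splits `R` into identity-relevant and identity-irrelevant parts, and its dual causal loss, are not shown; the data layout and names are hypothetical, and IN's learned affine parameters are omitted.

```python
import math


def instance_norm(feature_map, eps=1e-5):
    """Normalize each channel of one sample over its spatial positions.

    feature_map: list of channels, each a flat list of spatial activations.
    """
    out = []
    for channel in feature_map:
        mean = sum(channel) / len(channel)
        var = sum((v - mean) ** 2 for v in channel) / len(channel)
        out.append([(v - mean) / math.sqrt(var + eps) for v in channel])
    return out


F = [[1.0, 3.0, 5.0, 7.0], [10.0, 10.5, 11.0, 11.5]]  # 2 channels, 4 pixels each
F_tilde = instance_norm(F)
# Residual: the per-channel mean and scale information removed by instance normalization.
R = [[f - g for f, g in zip(fc, gc)] for fc, gc in zip(F, F_tilde)]
print(R)
```

Restituting all of `R` would simply undo the normalization, which is why the method described in the abstract distills only the identity-relevant part of it.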

Journal Article•10.1007/S11042-019-08453-9•

Dropout vs. batch normalization: an empirical study of their impact to deep learning

Christian Garbin¹, Xingquan Zhu¹, Oge Marques¹•Institutions (1)

Florida Atlantic University¹

01 May 2020-Multimedia Tools and Applications

TL;DR: The empirical study quantified the increase in training time when dropout and batch normalization are used, as well as the increase in prediction time (important for constrained environments, such as smartphones and low-powered IoT devices), and showed that a non-adaptive optimizer can outperform adaptive optimizers, but only at the cost of a significant amount of training time spent on hyperparameter tuning.

Abstract: Overfitting and long training time are two fundamental challenges in multilayered neural network learning and deep learning in particular. Dropout and batch normalization are two well-recognized approaches to tackle these challenges. While both approaches share overlapping design principles, numerous research results have shown that they have unique strengths to improve deep learning. Many tools simplify these two approaches as a simple function call, allowing flexible stacking to form deep learning architectures. Although their usage guidelines are available, unfortunately there is no well-defined set of rules or comprehensive study investigating them with respect to data input, network configurations, learning efficiency, and accuracy. It is not clear when users should consider using dropout and/or batch normalization, and how they should be combined (or used alternatively) to achieve optimized deep learning outcomes. In this paper we conduct an empirical study to investigate the effect of dropout and batch normalization on training deep learning models. We use multilayered dense neural networks and convolutional neural networks (CNN) as the deep learning models, and mix dropout and batch normalization to design different architectures and subsequently observe their performance in terms of training and test CPU time, number of parameters in the model (as a proxy for model size), and classification accuracy. The interplay between network structures, dropout, and batch normalization allows us to conclude when and how dropout and batch normalization should be considered in deep learning. The empirical study quantified the increase in training time when dropout and batch normalization are used, as well as the increase in prediction time (important for constrained environments, such as smartphones and low-powered IoT devices). It showed that a non-adaptive optimizer (e.g. SGD) can outperform adaptive optimizers, but only at the cost of a significant amount of training time to perform hyperparameter tuning, while an adaptive optimizer (e.g. RMSProp) performs well without much tuning. Finally, it showed that dropout and batch normalization should be used in CNNs only with caution and experimentation (when in doubt and short on time to experiment, use only batch normalization).

452 citations

Proceedings Article•

The Non-IID Data Quagmire of Decentralized Machine Learning

Kevin Hsieh¹, Amar Phanishayee¹, Onur Mutlu², Phillip B. Gibbons³•Institutions (3)

Microsoft¹, ETH Zurich², Carnegie Mellon University³

12 Jul 2020

TL;DR: SkewScout is presented, a system-level approach that adapts the communication frequency of decentralized learning algorithms to the (skew-induced) accuracy loss between data partitions and it is shown that group normalization can recover much of the accuracy loss of batch normalization.

Abstract: Many large-scale machine learning (ML) applications need to perform decentralized learning over datasets generated at different devices and locations. Such datasets pose a significant challenge to decentralized learning because their different contexts result in significant data distribution skew across devices/locations. In this paper, we take a step toward better understanding this challenge by presenting a detailed experimental study of decentralized DNN training on a common type of data skew: skewed distribution of data labels across devices/locations. Our study shows that: (i) skewed data labels are a fundamental and pervasive problem for decentralized learning, causing significant accuracy loss across many ML applications, DNN models, training datasets, and decentralized learning algorithms; (ii) the problem is particularly challenging for DNN models with batch normalization; and (iii) the degree of data skew is a key determinant of the difficulty of the problem. Based on these findings, we present SkewScout, a system-level approach that adapts the communication frequency of decentralized learning algorithms to the (skew-induced) accuracy loss between data partitions. We also show that group normalization can recover much of the accuracy loss of batch normalization.

414 citations
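
The abstract reports that batch normalization is particularly affected by label skew and that group normalization recovers much of the lost accuracy. One intuition, stated here as an assumption rather than as the paper's analysis, is that group normalization computes its statistics from a single sample, so they do not depend on how labels are distributed in a device's mini-batches. The sketch below implements group normalization for one sample in plain Python; the data layout is hypothetical and the learned per-channel scale and shift are omitted.

```python
import math


def group_norm(sample, num_groups, eps=1e-5):
    """Group normalization for ONE sample.

    sample: list of C channels, each a flat list of spatial activations.
    Channels are split into num_groups contiguous groups; each group is
    normalized with its own mean and variance, so no batch statistics are used.
    """
    c = len(sample)
    if c % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")
    size = c // num_groups
    out = []
    for g in range(num_groups):
        group = sample[g * size:(g + 1) * size]
        values = [v for ch in group for v in ch]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        scale = 1.0 / math.sqrt(var + eps)
        out.extend([[(v - mean) * scale for v in ch] for ch in group])
    return out


sample = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]]  # 4 channels, 2 pixels
print(group_norm(sample, num_groups=2))
```

Setting `num_groups` to 1 or to the number of channels gives layer-normalization-style and instance-normalization-style statistics, respectively.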

Journal Article•10.1038/S41467-020-19015-1•

Benchmarking of cell type deconvolution pipelines for transcriptomics data.

Francisco Avila Cobos¹,², José Alquicira-Hernandez¹,³, Joseph E. Powell¹,³, Pieter Mestdagh², Katleen De Preter²•Institutions (3)

Garvan Institute of Medical Research¹, Ghent University², University of Queensland³

06 Nov 2020-Nature Communications

TL;DR: Overall, methods that use scRNA-seq data have comparable performance to the best performing bulk methods whereas semi-supervised approaches show higher error values.

Abstract: Many computational methods have been developed to infer cell type proportions from bulk transcriptomics data. However, an evaluation of the impact of data transformation, pre-processing, marker selection, cell type composition and choice of methodology on the deconvolution results is still lacking. Using five single-cell RNA-sequencing (scRNA-seq) datasets, we generate pseudo-bulk mixtures to evaluate the combined impact of these factors. Both bulk deconvolution methodologies and those that use scRNA-seq data as reference perform best when applied to data in linear scale and the choice of normalization has a dramatic impact on some, but not all methods. Overall, methods that use scRNA-seq data have comparable performance to the best performing bulk methods whereas semi-supervised approaches show higher error values. Moreover, failure to include cell types in the reference that are present in a mixture leads to substantially worse results, regardless of the previous choices. Altogether, we evaluate the combined impact of factors affecting the deconvolution task across different datasets and propose general guidelines to maximize its performance. Inferring cell type proportions from transcriptomics data is affected by data transformation, normalization, choice of method and the markers used. Here, the authors use single-cell RNAseq datasets to evaluate the impact of these factors and propose guidelines to maximise deconvolution performance.

396 citations

Posted Content•

Normalized Loss Functions for Deep Learning with Noisy Labels

Xingjun Ma¹, Hanxun Huang¹, Yisen Wang², Simone Romano¹, Sarah M. Erfani¹, James Bailey¹•Institutions (2)

University of Melbourne¹, Shanghai Jiao Tong University²

24 Jun 2020-arXiv: Learning

TL;DR: Experiments on benchmark datasets demonstrate that the family of new loss functions created by the APL framework can consistently outperform state-of-the-art methods by large margins, especially under large noise rates such as 60% or 80% incorrect labels.

Abstract: Robust loss functions are essential for training accurate deep neural networks (DNNs) in the presence of noisy (incorrect) labels. It has been shown that the commonly used Cross Entropy (CE) loss is not robust to noisy labels. Whilst new loss functions have been designed, they are only partially robust. In this paper, we theoretically show that, by applying a simple normalization, any loss can be made robust to noisy labels. However, in practice, simply being robust is not sufficient for a loss function to train accurate DNNs. By investigating several robust loss functions, we find that they suffer from a problem of underfitting. To address this, we propose a framework to build robust loss functions called Active Passive Loss (APL). APL combines two robust loss functions that mutually boost each other. Experiments on benchmark datasets demonstrate that the family of new loss functions created by our APL framework can consistently outperform state-of-the-art methods by large margins, especially under large noise rates such as 60% or 80% incorrect labels.
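
The "simple normalization" in the abstract can be illustrated for cross entropy: divide the loss for the true label by the sum of the losses over all possible labels. The abstract does not say which two losses APL pairs, so the sketch below assumes normalized cross entropy as the "active" term and reverse cross entropy as the "passive" term, with log(0) clamped to an assumed constant of -4 and both weights set to 1. All function names are hypothetical.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_ce(probs, label):
    """CE for the true label divided by the sum of CE over all labels (always in [0, 1])."""
    return math.log(probs[label]) / sum(math.log(p) for p in probs)


def reverse_ce(probs, label, log_zero=-4.0):
    """Reverse cross entropy: -sum_k p(k) * log q(k), with q the one-hot label.

    log(0) is undefined, so it is clamped to the constant log_zero (an assumed value).
    """
    return -sum(p * (0.0 if k == label else log_zero) for k, p in enumerate(probs))


def apl_loss(logits, label, alpha=1.0, beta=1.0):
    # Assumed pairing: normalized CE ("active") plus reverse CE ("passive").
    probs = softmax(logits)
    return alpha * normalized_ce(probs, label) + beta * reverse_ce(probs, label)


print(apl_loss([5.0, 0.0, 0.0], label=0))  # confident and correct: small loss
print(apl_loss([5.0, 0.0, 0.0], label=1))  # confident and wrong: larger loss
```

For a uniform prediction over K classes, the normalized term equals exactly 1/K regardless of the label, which reflects the bounded, label-symmetric behavior that the normalization is meant to provide.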

Proceedings Article•10.1109/CVPR42600.2020.01181•

Gated Channel Transformation for Visual Recognition

Zongxin Yang¹, Linchao Zhu¹, Yu Wu¹, Yi Yang¹•Institutions (1)

University of Technology, Sydney¹

14 Jun 2020

TL;DR: A generally applicable transformation unit for visual recognition with deep convolutional neural networks that explicitly models channel relationships with explainable control variables and can be applied at the operator level without adding many parameters.

Abstract: In this work, we propose a generally applicable transformation unit for visual recognition with deep convolutional neural networks. This transformation explicitly models channel relationships with explainable control variables. These variables determine the neuron behaviors of competition or cooperation, and they are jointly optimized with the convolutional weights towards more accurate recognition. In Squeeze-and-Excitation (SE) Networks, the channel relationships are implicitly learned by fully connected layers, and the SE block is integrated at the block level. We instead introduce a channel normalization layer to reduce the number of parameters and computational complexity. This lightweight layer incorporates a simple l2 normalization, which makes our transformation unit applicable at the operator level without many additional parameters. Extensive experiments demonstrate the effectiveness of our unit with clear margins on many vision tasks, i.e., image classification on ImageNet, object detection and instance segmentation on COCO, and video classification on Kinetics.
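
The abstract states only that the channel normalization layer uses a simple l2 normalization. The sketch below illustrates one way such a gate could work: each channel is embedded by its scaled l2 norm, the embeddings are l2-normalized across channels, and the result drives a per-channel gate. The embedding step, the sqrt(C) rescaling, the `1 + tanh(...)` gate form, and the parameter names `alpha`, `gamma`, `beta` are assumptions modeled on one reading of the method, not a verified reproduction.

```python
import math


def channel_l2_gate(feature_map, alpha, gamma, beta, eps=1e-5):
    """Gate each channel using an l2-based channel normalization.

    feature_map: list of C channels, each a flat list of spatial activations.
    alpha, gamma, beta: per-channel parameters (lists of length C).
    """
    c = len(feature_map)
    # Embedding: scaled l2 norm of every channel.
    s = [a * math.sqrt(sum(v * v for v in ch) + eps) for a, ch in zip(alpha, feature_map)]
    # Channel normalization: l2-normalize across channels, rescaled by sqrt(C).
    denom = math.sqrt(sum(x * x for x in s) + eps)
    s_hat = [math.sqrt(c) * x / denom for x in s]
    # Gate: 1 + tanh(...), which equals exactly 1 (identity) when gamma = beta = 0.
    gates = [1.0 + math.tanh(g * x + b) for g, x, b in zip(gamma, s_hat, beta)]
    return [[v * gt for v in ch] for gt, ch in zip(gates, feature_map)]


fm = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]]  # 3 channels, 2 pixels each
print(channel_l2_gate(fm, alpha=[1.0] * 3, gamma=[1.0] * 3, beta=[0.0] * 3))
```

Because the gate reduces to the identity when `gamma` and `beta` are zero, such a unit can be inserted in front of an existing convolution without changing its initial behavior, which fits the abstract's claim that the unit works at the operator level with few extra parameters.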

Proceedings Article•10.1109/ICSSIT48917.2020.9214160•

Study the Influence of Normalization/Transformation process on the Accuracy of Supervised Classification

V N Ganapathi Raju¹, K. Prasanna Lakshmi¹, Vinod Mahesh Jain¹, Archana Kalidindi¹, V Padma¹•Institutions (1)

Gokaraju Rangaraju Institute of Engineering and Technology¹

1 Aug 2020

TL;DR: This paper shows how normalization techniques improve predictive accuracy and describes the criteria needed to achieve such data normalization.

Abstract: Recent developments in analytical technologies have helped in developing applications for real-time problems faced by industries. These applications are often found to consume a lot of time in the training phase. To reduce this, pre-treatment of data before the training phase is pointed out to be an appropriate methodology. Normalization is the best technique for pre-processing data before the training phase of an application. Normalization using metrics with criteria is found to be very important to attain good predictive results in less time. This paper shows the improvement in predictive accuracy obtained with normalization techniques. Various criteria needed to achieve such data normalization are also described. In this paper, we look at three different machine learning classifiers, i.e., Radial SVM, KNN, and Sigmoid SVM, and seven different standardization techniques, i.e., StandardScaler, Scale, RobustScaler, QuantileTransform, PowerTransform, MinMaxScaler, and MaxAbsScaler.
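
RobustScaler and MaxAbsScaler, two of the listed techniques, are scikit-learn preprocessing classes. The pure-Python functions below are hypothetical re-implementations of their core arithmetic under default settings (assumed here: centering on the median and dividing by the interquartile range for the former, dividing by the maximum absolute value for the latter). They are written without scikit-learn so the arithmetic is visible and are not the scikit-learn API.

```python
import statistics


def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [(v - median) / iqr if iqr > 0 else 0.0 for v in values]


def max_abs_scale(values):
    """Divide by the largest absolute value so results lie in [-1, 1]."""
    m = max(abs(v) for v in values)
    return [v / m if m > 0 else 0.0 for v in values]


data = [1.0, 2.0, 3.0, 4.0, 100.0]  # one large outlier
print(robust_scale(data))
print(max_abs_scale(data))
```

For this data, robust scaling keeps the four inliers between -1 and 0.5 and maps the outlier to 48.5, while max-abs scaling squeezes the inliers into [0.01, 0.04]. This illustrates why the choice of scaler can change classifier accuracy when features contain outliers.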

Journal Article•10.1609/AAAI.V34I04.5862•

Dynamic Instance Normalization for Arbitrary Style Transfer

Yongcheng Jing¹, Xiao Liu², Ding Yukang², Xinchao Wang³, Errui Ding², Mingli Song¹, Shilei Wen²•Institutions (3)

Zhejiang University¹, Baidu², Stevens Institute of Technology³

3 Apr 2020

TL;DR: The proposed Dynamic Instance Normalization (DIN) provides flexible support for state-of-the-art convolutional operations, and thus triggers novel functionalities, such as uniform-stroke placement for non-natural images and automatic spatial-stroke control.

Abstract: Prior normalization methods rely on affine transformations to produce arbitrary image style transfers, of which the parameters are computed in a pre-defined way. This manually defined nature eventually results in high-cost encoders shared between style and content encoding, making style transfer systems cumbersome to deploy in resource-constrained environments such as mobile terminals. In this paper, we propose a new and generalized normalization module, termed Dynamic Instance Normalization (DIN), that allows for flexible and more efficient arbitrary style transfers. Comprising an instance normalization and a dynamic convolution, DIN encodes a style image into learnable convolution parameters, upon which the content image is stylized. Unlike conventional methods that use shared complex encoders to encode content and style, the proposed DIN introduces a sophisticated style encoder, yet comes with a compact and lightweight content encoder for fast inference. Experimental results demonstrate that the proposed approach yields very encouraging results on challenging style patterns and, to our best knowledge, for the first time enables an arbitrary style transfer using MobileNet-based lightweight architecture, leading to a reduction factor of more than twenty in computational cost as compared to existing approaches. Furthermore, the proposed DIN provides flexible support for state-of-the-art convolutional operations, and thus triggers novel functionalities, such as uniform-stroke placement for non-natural images and automatic spatial-stroke control.

Journal Article•10.1016/J.JELEKIN.2020.102438•

Consensus for experimental design in electromyography (CEDE) project: Amplitude normalization matrix.

Manuela Besomi¹, Paul W. Hodges¹, Edward A. Clancy², Jaap H. van Dieën³, François Hug¹, Madeleine M. Lowery⁴, Roberto Merletti⁵, Karen Søgaard⁶, Tim V. Wrigley⁷, Thor F. Besier⁸, Richard G. Carson¹, Catherine Disselhorst-Klug⁹, Roger M. Enoka¹⁰, Deborah Falla¹¹, Dario Farina¹², Simon C. Gandevia¹³, Ales Holobar¹⁴, Matthew C. Kiernan¹⁵, Kevin C. McGill¹⁶, Eric J. Perreault¹⁷, John C. Rothwell¹⁸, Kylie Tucker¹•Institutions (18)

University of Queensland¹, Worcester Polytechnic Institute², VU University Amsterdam³, University College Dublin⁴, Polytechnic University of Turin⁵, University of Southern Denmark⁶, University of Melbourne⁷, University of Auckland⁸, RWTH Aachen University⁹, University of Colorado Boulder¹⁰, University of Birmingham¹¹, Imperial College London¹², University of New South Wales¹³, University of Maribor¹⁴, Royal Prince Alfred Hospital¹⁵, United States Department of Veterans Affairs¹⁶, Rehabilitation Institute of Chicago¹⁷, UCL Institute of Neurology¹⁸

01 Aug 2020-Journal of Electromyography and Kinesiology

TL;DR: This matrix, developed by the Consensus for Experimental Design in Electromyography (CEDE) project, presents six approaches to EMG amplitude normalization, together with general considerations for normalization, features that should be reported, definitions, and the "pros and cons" of each approach.

Posted Content•

Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift

Zachary Nado¹, Shreyas Padhy, D. Sculley, Alexander D'Amour, Balaji Lakshminarayanan, Jasper Snoek•Institutions (1)

Google¹

19 Jun 2020-arXiv: Learning

TL;DR: It is shown that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness, and that combining the two further improves performance; however, the method has mixed results when used alongside pre-training and does not seem to perform as well under more natural types of dataset shift.

Abstract: Covariate shift has been shown to sharply degrade both predictive accuracy and the calibration of uncertainty estimates for deep learning models. This is worrying, because covariate shift is prevalent in a wide range of real-world deployment settings. However, in this paper, we note that frequently there exists the potential to access small unlabeled batches of the shifted data just before prediction time. This interesting observation enables a simple but surprisingly effective method which we call prediction-time batch normalization, which significantly improves model accuracy and calibration under covariate shift. Using this one-line code change, we achieve state-of-the-art on recent covariate shift benchmarks and an mCE of 60.28% on the challenging ImageNet-C dataset; to our knowledge, this is the best result for any model that does not incorporate additional data augmentation or modification of the training pipeline. We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness (e.g., deep ensembles) and combining the two further improves performance. Our findings are supported by detailed measurements of the effect of this strategy on model behavior across rigorous ablations on various dataset modalities. However, the method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift, and is therefore worthy of additional study. We include links to the data in our figures to improve reproducibility, including Python notebooks that can be run to easily modify our analysis at this https URL
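
The abstract describes the method as a one-line change but does not give framework-specific code. The sketch below shows the underlying difference for a single scalar feature: standard inference normalizes with running statistics stored during training, while prediction-time batch normalization recomputes the mean and variance from the unlabeled test batch. The function `batch_norm` and the numbers are hypothetical, and the learned scale and shift are omitted.

```python
import math


def batch_norm(batch, running_mean, running_var, use_batch_stats, eps=1e-5):
    """Normalize one scalar feature for a batch of examples.

    use_batch_stats=False: standard inference, uses statistics stored during training.
    use_batch_stats=True: prediction-time BN, recomputes statistics on the test batch.
    """
    if use_batch_stats:
        mean = sum(batch) / len(batch)
        var = sum((x - mean) ** 2 for x in batch) / len(batch)
    else:
        mean, var = running_mean, running_var
    return [(x - mean) / math.sqrt(var + eps) for x in batch]


# Hypothetical training statistics and a covariate-shifted test batch.
running_mean, running_var = 0.0, 1.0
shifted = [3.0, 4.0, 5.0, 6.0]
print(batch_norm(shifted, running_mean, running_var, use_batch_stats=False))
print(batch_norm(shifted, running_mean, running_var, use_batch_stats=True))
```

With stored statistics, the shifted batch reaches later layers far from the distribution they were trained on; recomputing the statistics re-centers it. This also shows the method's dependency: it needs a batch of test inputs drawn from the same shifted distribution, not single examples.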

Proceedings Article•10.1109/CVPR42600.2020.01034•

Normalized and Geometry-Aware Self-Attention Network for Image Captioning

Longteng Guo¹, Jing Liu¹, Xinxin Zhu¹, Peng Yao², Shichen Lu³, Hanqing Lu¹•Institutions (3)

Chinese Academy of Sciences¹, University of Science and Technology Beijing², Wuhan University³

14 Jun 2020

TL;DR: In this article, the authors propose Geometry-aware Self-Attention (GSA), which explicitly and efficiently considers the relative geometry relations between the objects in the image.

Abstract: Self-attention (SA) network has shown profound value in image captioning. In this paper, we improve SA from two aspects to promote the performance of image captioning. First, we propose Normalized Self-Attention (NSA), a reparameterization of SA that brings the benefits of normalization inside SA. While normalization was previously applied only outside SA, we introduce a novel normalization method and demonstrate that it is both possible and beneficial to perform it on the hidden activations inside SA. Second, to compensate for the major limitation of the Transformer that it fails to model the geometry structure of the input objects, we propose a class of Geometry-aware Self-Attention (GSA) that extends SA to explicitly and efficiently consider the relative geometry relations between the objects in the image. To construct our image captioning model, we combine the two modules and apply them to the vanilla self-attention network. We extensively evaluate our proposals on the MS-COCO image captioning dataset and achieve superior results compared with state-of-the-art approaches. Further experiments on three challenging tasks, i.e. video captioning, machine translation, and visual question answering, show the generality of our methods.

Journal Article•10.1609/AAAI.V34I07.6967•

Region Normalization for Image Inpainting

Tao Yu¹, Zongyu Guo¹, Xin Jin¹, Shilin Wu¹, Zhibo Chen¹, Weiping Li¹, Zhizheng Zhang¹, Sen Liu¹•Institutions (1)

University of Science and Technology of China¹

3 Apr 2020

TL;DR: It is shown that the mean and variance shifts caused by full-spatial FN limit the image inpainting network training and a spatial region-wise normalization named Region Normalization (RN) is proposed to overcome the limitation.

Abstract: Feature Normalization (FN) is an important technique to help neural network training, which typically normalizes features across spatial dimensions. Most previous image inpainting methods apply FN in their networks without considering the impact of the corrupted regions of the input image on normalization, e.g. mean and variance shifts. In this work, we show that the mean and variance shifts caused by full-spatial FN limit the image inpainting network training and we propose a spatial region-wise normalization named Region Normalization (RN) to overcome the limitation. RN divides spatial pixels into different regions according to the input mask, and computes the mean and variance in each region for normalization. We develop two kinds of RN for our image inpainting network: (1) Basic RN (RN-B), which normalizes pixels from the corrupted and uncorrupted regions separately based on the original inpainting mask to solve the mean and variance shift problem; (2) Learnable RN (RN-L), which automatically detects potentially corrupted and uncorrupted regions for separate normalization, and performs global affine transformation to enhance their fusion. We apply RN-B in the early layers and RN-L in the later layers of the network, respectively. Experiments show that our method outperforms current state-of-the-art methods quantitatively and qualitatively. We further generalize RN to other inpainting networks and achieve consistent performance improvements.
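
The basic variant (RN-B), as described above, normalizes corrupted and uncorrupted pixels separately using the inpainting mask. The sketch below shows that idea for a single channel; the learned affine parameters and the learnable RN-L variant are omitted, and the function name and data layout are hypothetical.

```python
import math


def region_norm_basic(values, mask, eps=1e-5):
    """Normalize corrupted and uncorrupted pixels separately.

    values: flat list of activations for one channel.
    mask: list of booleans, True where the input pixel is corrupted (hole).
    """
    out = [0.0] * len(values)
    for region in (True, False):
        idx = [i for i, m in enumerate(mask) if m == region]
        if not idx:
            continue  # this region is absent from the image
        mean = sum(values[i] for i in idx) / len(idx)
        var = sum((values[i] - mean) ** 2 for i in idx) / len(idx)
        for i in idx:
            out[i] = (values[i] - mean) / math.sqrt(var + eps)
    return out


vals = [0.0, 0.1, 0.0, 5.0, 6.0, 7.0]
mask = [True, True, True, False, False, False]
print(region_norm_basic(vals, mask))
```

Because each region uses its own statistics, the near-zero values inside the hole no longer pull down the mean applied to valid pixels, which is the mean-shift problem the abstract attributes to full-spatial normalization.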

Book Chapter•10.1007/978-3-030-58542-6_5•

Learning to Optimize Domain Specific Normalization for Domain Generalization

Seonguk Seo¹, Yumin Suh², Dongwan Kim¹, Geeho Kim¹, Jong-Woo Han³, Bohyung Han¹•Institutions (3)

Seoul National University¹, Princeton University², LG Electronics³

23 Aug 2020

TL;DR: In this paper, the authors proposed a multi-source domain generalization technique based on deep neural networks by incorporating optimized normalization layers that are specific to individual domains, which can enhance the generalizability of the learned model.

Abstract: We propose a simple but effective multi-source domain generalization technique based on deep neural networks by incorporating optimized normalization layers that are specific to individual domains. Our approach employs multiple normalization methods while learning separate affine parameters per domain. For each domain, the activations are normalized by a weighted average of multiple normalization statistics. The normalization statistics are kept track of separately for each normalization type if necessary. Specifically, we employ batch and instance normalizations in our implementation to identify the best combination of these two normalization methods in each domain. The optimized normalization layers are effective to enhance the generalizability of the learned model. We demonstrate the state-of-the-art accuracy of our algorithm in the standard domain generalization benchmarks, as well as viability to further tasks such as multi-source domain adaptation and domain generalization in the presence of label noise.
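
The abstract describes normalizing activations with a weighted average of batch and instance normalization statistics. The sketch below shows that mixing for one channel with a single fixed weight `w_bn`; in the described method the weights and affine parameters are learned separately per domain, which is not shown. The function name, data layout, and the choice to mix variances linearly are assumptions for illustration.

```python
import math


def mixed_norm(batch, w_bn, eps=1e-5):
    """Normalize with a weighted average of batch and instance statistics.

    batch: list of samples, each a flat list of activations for ONE channel.
    w_bn: weight in [0, 1] on batch statistics; 1 - w_bn goes to instance statistics.
    """
    all_values = [v for sample in batch for v in sample]
    mu_bn = sum(all_values) / len(all_values)
    var_bn = sum((v - mu_bn) ** 2 for v in all_values) / len(all_values)
    out = []
    for sample in batch:
        mu_in = sum(sample) / len(sample)
        var_in = sum((v - mu_in) ** 2 for v in sample) / len(sample)
        mu = w_bn * mu_bn + (1.0 - w_bn) * mu_in
        var = w_bn * var_bn + (1.0 - w_bn) * var_in
        out.append([(v - mu) / math.sqrt(var + eps) for v in sample])
    return out


batch = [[1.0, 2.0, 3.0], [10.0, 12.0, 14.0]]  # two samples from different "styles"
print(mixed_norm(batch, w_bn=0.5))
```

At `w_bn=1` the function reduces to batch normalization and at `w_bn=0` to instance normalization, so learning the weight per domain lets each domain choose how much sample-specific style information to remove.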

Posted Content•

Revisiting Batch Normalization for Training Low-latency Deep Spiking Neural Networks from Scratch

Youngeun Kim¹, Priyadarshini Panda¹•Institutions (1)

Yale University¹

05 Oct 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: The authors propose a temporal Batch Normalization Through Time (BNTT) technique and find that varying the BN parameters at every time step allows the model to better learn the time-varying input distribution.

Abstract: Spiking Neural Networks (SNNs) have recently emerged as an alternative to deep learning owing to their sparse, asynchronous, binary event (or spike) driven processing, which can yield large energy-efficiency benefits on neuromorphic hardware. However, training high-accuracy, low-latency SNNs from scratch is hindered by the non-differentiable nature of a spiking neuron. To address this training issue in SNNs, we revisit batch normalization and propose a temporal Batch Normalization Through Time (BNTT) technique. Most prior SNN works have disregarded batch normalization, deeming it ineffective for training temporal SNNs. Unlike previous works, our proposed BNTT decouples the parameters in a BNTT layer along the time axis to capture the temporal dynamics of spikes. The temporally evolving learnable parameters in BNTT allow a neuron to control its spike rate through different time steps, enabling low-latency and low-energy training from scratch. We conduct experiments on the CIFAR-10, CIFAR-100, Tiny-ImageNet and event-driven DVS-CIFAR10 datasets. BNTT allows us to train deep SNN architectures from scratch, for the first time, on complex datasets with just 25-30 time steps. We also propose an early exit algorithm that uses the distribution of parameters in BNTT to reduce latency at inference, which further improves energy efficiency.
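
The core of BNTT, batch statistics and a learnable scale kept separately for each time step, can be sketched for one neuron as below. The leaky integrate-and-fire neuron (`leak`, `threshold`, hard reset), the omission of a shift term, and the function names are assumptions made for illustration, not details taken from the paper.

```python
import math


def bntt(inputs, gammas, eps=1e-5):
    """Batch Normalization Through Time for a single neuron/channel (sketch).

    inputs: inputs[t][b] is the pre-activation at time step t for sample b.
    gammas: one learnable scale per time step (gammas[t]).
    Statistics are computed over the batch separately at each time step.
    """
    if len(inputs) != len(gammas):
        raise ValueError("need one gamma per time step")
    out = []
    for x_t, g_t in zip(inputs, gammas):
        mu = sum(x_t) / len(x_t)
        var = sum((x - mu) ** 2 for x in x_t) / len(x_t)
        out.append([g_t * (x - mu) / math.sqrt(var + eps) for x in x_t])
    return out


def lif_run(normalized, leak=0.9, threshold=1.0):
    """Feed BNTT outputs into a leaky integrate-and-fire neuron; return spikes[t][b]."""
    mem = [0.0] * len(normalized[0])
    spikes = []
    for x_t in normalized:
        row = []
        for b, x in enumerate(x_t):
            mem[b] = leak * mem[b] + x
            fired = mem[b] >= threshold
            row.append(1 if fired else 0)
            if fired:
                mem[b] = 0.0  # hard reset after a spike
        spikes.append(row)
    return spikes


x = [[0.2, 1.5, 0.9], [0.4, 1.8, 1.0], [0.1, 2.2, 0.7]]  # 3 time steps, batch of 3
gammas = [1.0, 1.2, 1.5]  # hypothetical per-time-step scales
print(lif_run(bntt(x, gammas)))
```

Because `gammas[t]` differs across time steps, the same normalized input can drive the neuron harder at some steps than at others, which is how per-step parameters can shape the spike rate over time.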

Proceedings Article•10.1109/CVPR42600.2020.00354•

Transfer Learning From Synthetic to Real-Noise Denoising With Adaptive Instance Normalization

Yoonsik Kim¹, Jae Woong Soh¹, Gu Yong Park¹, Nam Ik Cho¹•Institutions (1)

Seoul National University¹

14 Jun 2020

TL;DR: In this article, the authors adopt adaptive instance normalization to build a denoiser, which can regularize the feature map and prevent the network from overfitting to the training set.

Abstract: Real-noise denoising is a challenging task because the statistics of real noise do not follow the normal distribution, and they also vary spatially and temporally. To cope with diverse and complex real noise, we propose a well-generalized denoising architecture and a transfer learning scheme. Specifically, we adopt adaptive instance normalization to build a denoiser, which can regularize the feature map and prevent the network from overfitting to the training set. We also introduce a transfer learning scheme that transfers knowledge learned from synthetic-noise data to the real-noise denoiser. Through the proposed transfer learning, the synthetic-noise denoiser can learn general features from various synthetic-noise data, and the real-noise denoiser can learn the real-noise characteristics from real data. Experiments show that the proposed denoising method has great generalization ability: our network trained with synthetic noise achieves the best performance on the Darmstadt Noise Dataset (DND) among the methods from published papers. The proposed transfer learning scheme also works robustly for real-noise images when trained with a very small amount of labeled data.
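
Adaptive instance normalization in its general form normalizes each channel of each image with its own statistics and then applies a scale and shift supplied from outside the feature path. The sketch below assumes position-wise scale and shift produced by a hypothetical `conditioning_from_noise_map` function that stands in for a learned branch; how the paper's denoiser actually derives these parameters is not reproduced here.

```python
import math


def adaptive_instance_norm(x, scale, shift, eps=1e-5):
    """Instance-normalize one channel, then apply position-wise scale and shift.

    x:     flat list of activations for one channel of one image.
    scale: per-position scale (same length as x), supplied by a conditioning branch.
    shift: per-position shift (same length as x).
    """
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    std = math.sqrt(var + eps)
    return [s * (v - mu) / std + b for v, s, b in zip(x, scale, shift)]


def conditioning_from_noise_map(noise_map):
    """Hypothetical stand-in for a learned branch mapping a noise-level map to (scale, shift)."""
    scale = [1.0 + n for n in noise_map]
    shift = [0.0 for _ in noise_map]
    return scale, shift


noise_map = [0.1, 0.1, 0.4, 0.4]  # made-up per-pixel noise levels
scale, shift = conditioning_from_noise_map(noise_map)
print(adaptive_instance_norm([0.5, 0.7, 0.2, 0.9], scale, shift))
```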

Journal Article•10.1093/NAR/GKAA258•

NOREVA: enhanced normalization and evaluation of time-course and multi-class metabolomic data

Qingxia Yang¹ ², Yunxia Wang¹, Ying Zhang¹, Fengcheng Li¹, Weiqi Xia¹, Ying Zhou¹, Yunqing Qiu¹, Honglin Li³, Feng Zhu¹ ²•Institutions (3)

Zhejiang University¹, Chongqing University², East China University of Science and Technology³

02 Jul 2020-Nucleic Acids Research

TL;DR: NOREVA 2.0 is distinguished for its capability in identifying well-performing normalization method(s) for time-course and multi-class metabolomics, which makes it an indispensable complement to other available tools.

Abstract: Biological processes (such as microbial growth and physiological responses) are usually dynamic and require monitoring of metabolic variation at different time points. Moreover, current metabolomics shows a clear shift from case-control (N=2) studies to multi-class (N>2) problems, which is crucial for revealing the mechanisms underlying physiological processes, disease metastasis, etc. These time-course and multi-class metabolomics studies have attracted great attention, and data normalization is essential for removing unwanted biological/experimental variations in them. However, no tool (including NOREVA 1.0, which focuses only on case-control studies) is available for effectively assessing the performance of normalization methods on time-course/multi-class metabolomic data. Thus, NOREVA was updated to version 2.0 by (i) realizing normalization and evaluation of both time-course and multi-class metabolomic data, (ii) integrating 144 normalization methods of a recently proposed combination strategy and (iii) identifying the well-performing methods by comprehensively assessing the largest set of normalizations (168 in total, significantly more than the 24 in NOREVA 1.0). The significance of this update was extensively validated by case studies on benchmark datasets. In summary, NOREVA 2.0 is distinguished by its capability to identify well-performing normalization method(s) for time-course and multi-class metabolomics, which makes it an indispensable complement to other available tools. NOREVA can be accessed at https://idrblab.org/noreva/.

Journal Article•10.1002/CYTO.A.23904•

CytoNorm: A Normalization Algorithm for Cytometry Data

Sofie Van Gassen¹, Brice Gaudilliere², Martin S. Angst², Yvan Saeys¹, Nima Aghaeepour²•Institutions (2)

Ghent University¹, Stanford University²

01 Mar 2020-Cytometry Part A

TL;DR: This work proposes CytoNorm, a normalization algorithm to ensure internal consistency between clinical samples based on shared controls across various study batches, which compared favorably to standard normalization procedures.

Abstract: High-dimensional flow cytometry has matured to a level that enables deep phenotyping of cellular systems at a clinical scale. The resulting high-content data sets allow the human immune system to be characterized at unprecedented single-cell resolution. However, the results are highly dependent on sample preparation, and measurements might drift over time. While various controls exist for assessing and improving data quality in a single sample, cross-sample normalization attempts have been limited to aligning marker distributions across subjects. These approaches, inspired by bulk genomics and proteomics assays, ignore the single-cell nature of the data and risk removing biologically relevant signals. This work proposes CytoNorm, a normalization algorithm that ensures internal consistency between clinical samples based on shared controls across various study batches. Data from the shared controls are used to learn the appropriate transformations for each batch (e.g., each analysis day). Importantly, some sources of technical variation are strongly influenced by the amount of protein expressed on specific cell types, requiring several population-specific transformations to normalize cells from a heterogeneous sample. To address this, our approach first identifies the overall cellular distribution using a clustering step, and then calculates subset-specific transformations on the control samples by computing their quantile distributions and aligning them with splines. These transformations are then applied to all other clinical samples in the batch to remove the batch-specific variations. We evaluated the algorithm on a customized data set with two shared controls across batches. One control sample was used to calculate the normalization transformations, and the second control was used as a blinded test set and evaluated with the Earth Mover's distance. Additional results are provided using two real-world clinical data sets.
Overall, our method compared favorably to standard normalization procedures. The algorithm is implemented in the R package "CytoNorm" and is available at www.github.com/saeyslab/CytoNorm. © 2019 The Authors. Cytometry Part A published by Wiley Periodicals, Inc. on behalf of International Society for Advancement of Cytometry.
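
The batch-alignment step can be sketched as follows. The sketch computes quantiles of a shared control in each batch, takes the mean of those quantiles across batches as the reference (an assumed choice of goal distribution), and maps each batch onto that reference. For simplicity it replaces the paper's splines with piecewise-linear interpolation and skips the clustering step; in the full method the same fitting would be repeated within each cell cluster. All function names are hypothetical.

```python
import bisect


def quantiles(values, probs):
    """Linear-interpolation quantiles of a list of values."""
    s = sorted(values)
    n = len(s)
    out = []
    for p in probs:
        pos = p * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        out.append(s[lo] + (s[hi] - s[lo]) * (pos - lo))
    return out


def make_transform(src_q, ref_q):
    """Piecewise-linear map sending src_q[i] to ref_q[i] (needs >= 2 quantiles).

    Values outside the control range are extrapolated with the end segments.
    """
    def transform(x):
        i = bisect.bisect_right(src_q, x) - 1
        i = max(0, min(i, len(src_q) - 2))
        x0, x1 = src_q[i], src_q[i + 1]
        y0, y1 = ref_q[i], ref_q[i + 1]
        if x1 == x0:
            return y0  # tied quantiles: flat segment
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return transform


def fit_batch_transforms(controls_by_batch, probs):
    """Learn one transform per batch from that batch's shared control sample."""
    per_batch_q = {b: quantiles(v, probs) for b, v in controls_by_batch.items()}
    # Reference: mean of the control quantiles across batches (assumed goal distribution).
    ref_q = [sum(q[i] for q in per_batch_q.values()) / len(per_batch_q)
             for i in range(len(probs))]
    return {b: make_transform(q, ref_q) for b, q in per_batch_q.items()}


probs = [0.0, 0.25, 0.5, 0.75, 1.0]
controls = {"day1": [0, 1, 2, 3, 4], "day2": [10, 11, 12, 13, 14]}  # made-up marker values
transforms = fit_batch_transforms(controls, probs)
print([transforms["day2"](x) for x in [11.5, 12.0, 20.0]])
```

Any sample acquired in a batch, not only the control, is then passed through that batch's transform.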

Posted Content•

Normalization Techniques in Training DNNs: Methodology, Analysis and Application

Lei Huang, Jie Qin, Yi Zhou, Fan Zhu, Li Liu, Ling Shao

27 Sep 2020-arXiv: Learning

TL;DR: A unified picture of the main motivation behind different approaches from the perspective of optimization is provided, and a taxonomy for understanding the similarities and differences between them is presented.

Abstract: Normalization techniques are essential for accelerating the training and improving the generalization of deep neural networks (DNNs), and have been used successfully in various applications. This paper reviews and comments on the past, present and future of normalization methods in the context of DNN training. We provide a unified picture of the main motivation behind different approaches from the perspective of optimization, and present a taxonomy for understanding the similarities and differences between them. Specifically, we decompose the pipeline of the most representative methods for normalizing activations into three components: normalization area partitioning, normalization operation and normalization representation recovery. In doing so, we provide insight for designing new normalization techniques. Finally, we discuss the current progress in understanding normalization methods, and provide a comprehensive review of the applications of normalization to particular tasks, in which it can effectively solve key issues.
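
The three-component decomposition can be illustrated with a toy normalizer in which only the partitioning function changes between batch and layer normalization. In this sketch, standardization serves as the normalization operation and a per-channel affine map as the representation recovery. The function names are hypothetical, and the example is restricted to a 2-D (batch x channel) activation.

```python
import math


def normalize(x, partition, gamma, beta, eps=1e-5):
    """Generic activation normalization on x[n][c] (batch x channels).

    partition: function (n, c) -> group key; positions sharing a key share statistics.
    gamma, beta: per-channel lists used for representation recovery.
    """
    # 1) Normalization area partitioning: collect values per group.
    groups = {}
    for n, row in enumerate(x):
        for c, v in enumerate(row):
            groups.setdefault(partition(n, c), []).append(v)
    # 2) Normalization operation: standardization within each group.
    stats = {}
    for key, vals in groups.items():
        mu = sum(vals) / len(vals)
        var = sum((v - mu) ** 2 for v in vals) / len(vals)
        stats[key] = (mu, math.sqrt(var + eps))
    # 3) Normalization representation recovery: per-channel affine transform.
    out = []
    for n, row in enumerate(x):
        new_row = []
        for c, v in enumerate(row):
            mu, sd = stats[partition(n, c)]
            new_row.append(gamma[c] * (v - mu) / sd + beta[c])
        out.append(new_row)
    return out


def batch_norm_partition(n, c):
    return c  # one group per channel, spanning the batch


def layer_norm_partition(n, c):
    return n  # one group per sample, spanning the channels


x = [[1.0, 2.0, 6.0], [3.0, 8.0, 4.0]]
print(normalize(x, batch_norm_partition, [1.0] * 3, [0.0] * 3))
print(normalize(x, layer_norm_partition, [1.0] * 3, [0.0] * 3))
```

Other partitions, for example groups of channels within each sample, plug into the same function unchanged.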

Proceedings Article•

An Exponential Learning Rate Schedule for Deep Learning

Zhiyuan Li¹, Sanjeev Arora¹•Institutions (1)

Princeton University¹

01 Apr 2020

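TL;DR: Networks with batch normalization can be trained with SGD, momentum and an exponentially increasing learning rate schedule, which, to the authors' knowledge, is the first time such a rate schedule has been used successfully, let alone for highly successful architectures; as expected, such training rapidly blows up the network weights, but the net stays well-behaved due to normalization.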

Abstract: Intriguing empirical evidence exists that deep learning can work well with exotic schedules for varying the learning rate. This paper suggests that the phenomenon may be due to Batch Normalization, or BN (Ioffe & Szegedy, 2015), which is ubiquitous and provides benefits in optimization and generalization across all standard architectures. The following new results are shown about BN with weight decay and momentum (in other words, the typical use case, which was not considered in earlier theoretical analyses of stand-alone BN (Ioffe & Szegedy, 2015; Santurkar et al., 2018; Arora et al., 2018)):
• Training can be done using SGD with momentum and an exponentially increasing learning rate schedule, i.e., the learning rate increases by some (1 + α) factor in every epoch for some α > 0. (Precise statement in the paper.) To the best of our knowledge this is the first time such a rate schedule has been successfully used, let alone for highly successful architectures. As expected, such training rapidly blows up network weights, but the net stays well-behaved due to normalization.
• Mathematical explanation of the success of the above rate schedule: a rigorous proof that it is equivalent to the standard setting of BN + SGD + Standard Rate Tuning + Weight Decay + Momentum. This equivalence holds for other normalization layers as well: Group Normalization (Wu & He, 2018), Layer Normalization (Ba et al., 2016), Instance Norm (Ulyanov et al., 2016), etc.
• A worked-out toy example illustrating the above linkage of hyperparameters. Using either weight decay or BN alone reaches a global minimum, but convergence fails when both are used.
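
The momentum-free special case of this equivalence can be checked numerically. For a scale-invariant loss, grad L(c*w) = grad L(w) / c for c > 0, so substituting w~_t = (1 - eta*lam)^(-t) * w_t into the weight-decay update w_{t+1} = (1 - eta*lam) * w_t - eta * grad L(w_t) gives plain gradient descent w~_{t+1} = w~_t - eta * (1 - eta*lam)^(-2t-1) * grad L(w~_t), i.e. an exponentially growing learning rate. The sketch below runs both updates on a hypothetical 2-D scale-invariant loss. The loss, the direction `U` and the hyperparameters are assumptions. The momentum case treated in the paper, which uses a different growth factor, is not reproduced.

```python
import math

U = (0.6, 0.8)  # hypothetical target direction (unit vector)


def loss_grad(w):
    """Gradient of the scale-invariant toy loss L(w) = -(U . w) / ||w||."""
    norm = math.hypot(w[0], w[1])
    dot = U[0] * w[0] + U[1] * w[1]
    return tuple(-(U[i] / norm - dot * w[i] / norm ** 3) for i in range(2))


def gd_with_weight_decay(w, eta, lam, steps):
    """Constant learning rate eta plus weight decay lam (no momentum)."""
    for _ in range(steps):
        g = loss_grad(w)
        w = tuple((1 - eta * lam) * w[i] - eta * g[i] for i in range(2))
    return w


def gd_with_exponential_lr(w, eta, lam, steps):
    """No weight decay; learning rate grows as eta * (1 - eta*lam) ** (-2t - 1)."""
    for t in range(steps):
        g = loss_grad(w)
        lr = eta * (1 - eta * lam) ** (-2 * t - 1)
        w = tuple(w[i] - lr * g[i] for i in range(2))
    return w


eta, lam, steps = 0.1, 0.5, 20
a = gd_with_weight_decay((1.0, 0.0), eta, lam, steps)
b = gd_with_exponential_lr((1.0, 0.0), eta, lam, steps)
shrink = (1 - eta * lam) ** steps
print(a, tuple(shrink * v for v in b))  # the two should agree up to rounding
```

The weights in the exponential-rate run keep growing, but because the loss only depends on the direction of `w`, the two runs describe the same network.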

Journal Article•10.1016/J.OMEGA.2019.04.001•

DNMA: A double normalization-based multiple aggregation method for multi-expert multi-criteria decision making

Huchang Liao¹ ², Xingli Wu²•Institutions (2)

King Abdulaziz University¹, Sichuan University²

01 Jul 2020-Omega-international Journal of Management Science

TL;DR: A comprehensive algorithm for multi-expert multi-criteria decision making problems considering quantitative and qualitative criteria in forms of benefit, cost or target types is developed, which focuses on using probabilistic linguistic term sets to express the qualitative evaluations.

Abstract: This paper develops a comprehensive algorithm for multi-expert multi-criteria decision making problems considering quantitative and qualitative criteria in the form of benefit, cost or target types. We focus on using probabilistic linguistic term sets to express the qualitative evaluations because of their strength in expressing complex individual and collective linguistic assessments. First, we develop a target-based linear normalization technique and a target-based vector normalization technique. A weight adjustment method is proposed to achieve a trade-off between criteria after normalization. Given that the two target-based normalization techniques have different advantages, we then propose a ranking method consisting of three subordinate models, based on these two target-based normalization approaches and three aggregation techniques. Reliable results of a multi-expert multi-criteria decision making problem are determined by integrating the subordinate utility values and the ranks of alternatives. The proposed method is applied to green enterprise ranking problems and to the selection of an excavation scheme for shallow buried tunnels. The advantages of the proposed method are emphasized through comparative analyses with other ranking methods.
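
Target-based normalization scores each value by its distance to a target value rather than by "larger is better". The abstract does not give DNMA's exact formulas, so the sketch below assumes common target-based linear and vector forms; the probabilistic linguistic term sets, weight adjustment and aggregation steps are omitted, and the function names are hypothetical.

```python
import math


def target_linear_normalize(values, target):
    """Target-based linear normalization: 1 at the target, 0 at the farthest value."""
    max_dev = max(abs(v - target) for v in values)
    if max_dev == 0:
        return [1.0 for _ in values]  # every value already hits the target
    return [1 - abs(v - target) / max_dev for v in values]


def target_vector_normalize(values, target):
    """Target-based vector normalization: deviations scaled by their Euclidean norm."""
    norm = math.sqrt(sum((v - target) ** 2 for v in values))
    if norm == 0:
        return [1.0 for _ in values]
    return [1 - abs(v - target) / norm for v in values]


scores = [2.0, 5.0, 8.0]  # made-up criterion values; 5.0 is the ideal target
print(target_linear_normalize(scores, 5.0))
print(target_vector_normalize(scores, 5.0))
```

Choosing the column maximum as the target reduces the linear form to the familiar (v - min) / (max - min) benefit normalization, and choosing the minimum gives the cost form (max - v) / (max - min).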

Posted Content•

Towards Deeper Graph Neural Networks with Differentiable Group Normalization

Kaixiong Zhou¹, Xiao Huang², Yuening Li¹, Daochen Zha¹, Rui Chen³, Xia Hu¹•Institutions (3)

Texas A&M University¹, Hong Kong Polytechnic University², Samsung³

12 Jun 2020-arXiv: Learning

TL;DR: DGN is introduced, which normalizes nodes within the same group independently to increase their smoothness, and separates node distributions among different groups to significantly alleviate the over-smoothing issue.

Abstract: Graph neural networks (GNNs), which learn the representation of a node by aggregating its neighbors, have become an effective computational tool in downstream applications. Over-smoothing is one of the key issues that limit the performance of GNNs as the number of layers increases: stacked aggregators make node representations converge to indistinguishable vectors. Several attempts have been made to tackle the issue by bringing linked node pairs close together and keeping unlinked pairs distinct. However, they often ignore the intrinsic community structures, which can result in sub-optimal performance. The representations of nodes within the same community/class need to be similar to facilitate classification, while different classes are expected to be separated in the embedding space. To bridge the gap, we introduce two over-smoothing metrics and a novel technique, differentiable group normalization (DGN). It normalizes nodes within the same group independently to increase their smoothness, and separates node distributions among different groups to significantly alleviate the over-smoothing issue. Experiments on real-world datasets demonstrate that DGN makes GNN models more robust to over-smoothing and achieves better performance with deeper GNNs.
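
A rough sketch of differentiable group normalization is shown below. It assumes the following form: soft group assignments S = softmax(H U), normalization of each group-weighted embedding S_g * H over the nodes, and a residual sum H + lam * sum_g (gamma_g * normalized + beta_g). The scalar per-group `gammas` and `betas`, the fixed assignment matrix `u`, and the function names are simplifications for illustration, not the authors' code.

```python
import math


def softmax(row):
    """Numerically stable softmax of a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def dgn(h, u, gammas, betas, lam, eps=1e-5):
    """Differentiable group normalization (sketch).

    h:      node embeddings, h[node][feature].
    u:      assignment weights, u[feature][group] (learnable in practice).
    gammas, betas: per-group affine parameters (scalars here for brevity).
    lam:    weight of the normalized term added back to h.
    """
    n_nodes, n_feat, n_groups = len(h), len(h[0]), len(u[0])
    # Soft group assignment: one probability vector per node.
    s = []
    for i in range(n_nodes):
        logits = [sum(h[i][f] * u[f][g] for f in range(n_feat)) for g in range(n_groups)]
        s.append(softmax(logits))
    out = [row[:] for row in h]  # residual path keeps the original embeddings
    for g in range(n_groups):
        for f in range(n_feat):
            # Feature f of every node, weighted by its membership in group g.
            col = [s[i][g] * h[i][f] for i in range(n_nodes)]
            mu = sum(col) / n_nodes
            var = sum((v - mu) ** 2 for v in col) / n_nodes
            for i in range(n_nodes):
                normed = (col[i] - mu) / math.sqrt(var + eps)
                out[i][f] += lam * (gammas[g] * normed + betas[g])
    return out


h = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
u = [[2.0, -2.0], [-2.0, 2.0]]  # hypothetical: feature 0 -> group 0, feature 1 -> group 1
print(dgn(h, u, gammas=[1.0, 1.0], betas=[0.0, 0.0], lam=0.5))
```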

Proceedings Article•10.1109/CVPR42600.2020.01130•

ACNe: Attentive Context Normalization for Robust Permutation-Equivariant Learning

Weiwei Sun¹, Wei Jiang¹, Eduard Trulls², Andrea Tagliasacchi², Kwang Moo Yi¹•Institutions (2)

University of Victoria¹, Google²

14 Jun 2020

TL;DR: In this article, the authors propose an attention-based normalization of the feature maps of a permutation-equivariant network to find the essential data points in high-dimensional space to solve a given task.

Abstract: Many problems in computer vision require dealing with sparse, unordered data in the form of point clouds. Permutation-equivariant networks have become a popular solution: they operate on individual data points with simple perceptrons and extract contextual information with global pooling. This can be achieved with a simple normalization of the feature maps, a global operation that is unaffected by the order. In this paper, we propose Attentive Context Normalization (ACN), a simple yet effective technique to build permutation-equivariant networks robust to outliers. Specifically, we show how to normalize the feature maps with weights that are estimated within the network, excluding outliers from this normalization. We use this mechanism to leverage two types of attention, local and global; by combining them, our method is able to find the essential data points in high-dimensional space to solve a given task. We demonstrate through extensive experiments that our approach, which we call Attentive Context Networks (ACNe), provides a significant leap in performance compared to the state of the art on camera pose estimation, robust fitting, and point cloud classification under noise and outliers. Source code: https://github.com/vcg-uvic/acne.
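
Attentive context normalization replaces the uniform mean and variance over points with attention-weighted ones, so that points the network marks as outliers barely affect the statistics. The sketch assumes three things: local weights come from a sigmoid of per-point scores, global weights come from a softmax over points, and their product is renormalized to sum to 1. In the real network these scores are predicted by learned layers; here they are passed in directly, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def attentive_context_norm(features, local_logits, global_logits, eps=1e-5):
    """Normalize per-point features with attention-weighted statistics.

    features:      features[p][c] for point p and channel c.
    local_logits:  per-point scores; sigmoid gives an independent inlier weight.
    global_logits: per-point scores; softmax over points gives a relative weight.
    """
    m = max(global_logits)
    exps = [math.exp(z - m) for z in global_logits]
    total = sum(exps)
    raw = [sigmoid(a) * e / total for a, e in zip(local_logits, exps)]
    norm = sum(raw)
    weights = [r / norm for r in raw]  # combined weights sum to 1
    n_ch = len(features[0])
    out = [[0.0] * n_ch for _ in features]
    for c in range(n_ch):
        mu = sum(w * f[c] for w, f in zip(weights, features))
        var = sum(w * (f[c] - mu) ** 2 for w, f in zip(weights, features))
        for p, f in enumerate(features):
            out[p][c] = (f[c] - mu) / math.sqrt(var + eps)
    return out


# Two inliers and one gross outlier that the (assumed) local scores reject.
points = [[1.0], [3.0], [1000.0]]
print(attentive_context_norm(points, [10.0, 10.0, -30.0], [0.0, 0.0, 0.0]))
```

Every step is either a weighted sum over points or a per-point operation, so reordering the input points simply reorders the output rows. This is the permutation-equivariance property the method relies on.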

Posted Content•

GraphNorm: A Principled Approach to Accelerating Graph Neural Network Training

Tianle Cai¹, Shengjie Luo², Keyulu Xu³, Di He⁴, Tie-Yan Liu⁴, Liwei Wang²•Institutions (4)

Princeton University¹, Peking University², Massachusetts Institute of Technology³, Microsoft⁴

07 Sep 2020-arXiv: Learning

TL;DR: The authors propose Graph Normalization (GraphNorm), a principled normalization method whose key idea is to normalize the feature values across all nodes of each individual graph with a learnable shift; it improves the generalization of GNNs, achieving better performance on graph classification benchmarks.

Abstract: Normalization is known to help the optimization of deep neural networks. Curiously, different architectures require specialized normalization methods. In this paper, we study what normalization is effective for Graph Neural Networks (GNNs). First, we adapt and evaluate the existing methods from other domains to GNNs. Faster convergence is achieved with InstanceNorm compared to BatchNorm and LayerNorm. We provide an explanation by showing that InstanceNorm serves as a preconditioner for GNNs, but such preconditioning effect is weaker with BatchNorm due to the heavy batch noise in graph datasets. Second, we show that the shift operation in InstanceNorm results in an expressiveness degradation of GNNs for highly regular graphs. We address this issue by proposing GraphNorm with a learnable shift. Empirically, GNNs with GraphNorm converge faster than GNNs using other normalization methods. GraphNorm also improves the generalization of GNNs, achieving better performance on graph classification benchmarks.
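
GraphNorm can be sketched for a single graph under the following assumed form. Subtract a learnable fraction `alpha` of the per-graph feature mean, divide by the standard deviation of the shifted features, then apply `gamma` and `beta`. The function name and the use of plain per-feature lists are illustrative.

```python
import math


def graph_norm(h, alpha, gamma, beta, eps=1e-5):
    """GraphNorm over the nodes of ONE graph (sketch).

    h: h[node][feature]; alpha, gamma, beta: per-feature lists (learnable in practice).
    alpha controls how much of the per-graph mean is subtracted (the learnable shift).
    """
    n, d = len(h), len(h[0])
    out = [[0.0] * d for _ in range(n)]
    for f in range(d):
        mu = sum(h[i][f] for i in range(n)) / n
        shifted = [h[i][f] - alpha[f] * mu for i in range(n)]
        sigma = math.sqrt(sum(v * v for v in shifted) / n + eps)
        for i in range(n):
            out[i][f] = gamma[f] * shifted[i] / sigma + beta[f]
    return out


# Feature 1 is identical on every node, as can happen on highly regular graphs.
h = [[1.0, 2.0], [3.0, 2.0], [5.0, 2.0]]
print(graph_norm(h, alpha=[1.0, 1.0], gamma=[1.0, 1.0], beta=[0.0, 0.0]))
print(graph_norm(h, alpha=[1.0, 0.5], gamma=[1.0, 1.0], beta=[0.0, 0.0]))
```

With `alpha = 1` the layer behaves like InstanceNorm applied per graph, and the constant feature collapses to 0 on every node; with `alpha = 0.5` it keeps a nonzero value. When a mini-batch contains several graphs, the function would be called once per graph.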

Journal Article•10.31181/DMAME2003149Z•

Objective methods for determining criteria weight coefficients: A modification of the CRITIC method

Mališa Žižović¹, Boža D. Miljković², Dragan Marinković³•Institutions (3)

University of Kragujevac¹, University of Novi Sad², Technical University of Berlin³

01 Oct 2020

TL;DR: A new approach to modifying the CRiteria Importance Through Intercriteria Correlation (CRITIC) method, one of the objective methods for determining criteria weight coefficients, whose new normalization achieves smaller deviations between normalized elements and hence lower standard deviation values.

Abstract: Determining criteria weight coefficients is a crucial step in multi-criteria decision making models, and this problem therefore receives great attention in the literature. This paper presents a new approach to modifying the CRiteria Importance Through Intercriteria Correlation (CRITIC) method, which is one of the objective methods for determining criteria weight coefficients. The modified CRITIC method (CRITIC-M) changes the normalization of the elements of the initial decision matrix and the aggregation of data from the normalized decision matrix. The new normalization process achieves smaller deviations between normalized elements, which in turn yields lower standard deviation values. Thus, the relationships between data in the initial decision matrix are presented in a more objective way. The new process for aggregating weight coefficient values in CRITIC-M enables a more comprehensive understanding of the data in the initial decision matrix, leading to more objective weight coefficients. The CRITIC-M method has been tested on two examples, followed by a discussion of the results in comparison with the classic CRITIC method.
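
For reference, the classic CRITIC procedure that the paper modifies can be sketched as below. CRITIC-M's new normalization and aggregation steps are not specified in the abstract and are not reproduced. The use of the population standard deviation and the function name are assumptions.

```python
import math


def critic_weights(matrix, benefit):
    """Classic CRITIC criteria weights (sketch).

    matrix:  matrix[i][j] is the score of alternative i on criterion j.
    benefit: benefit[j] is True if larger is better for criterion j, False for a cost criterion.
    """
    m, n = len(matrix), len(matrix[0])
    # 1) Min-max normalization of each criterion (column).
    cols = []
    for j in range(n):
        col = [row[j] for row in matrix]
        lo, hi = min(col), max(col)
        span = hi - lo
        if span == 0:
            cols.append([0.0] * m)  # constant criterion carries no information
        elif benefit[j]:
            cols.append([(v - lo) / span for v in col])
        else:
            cols.append([(hi - v) / span for v in col])
    # 2) Contrast intensity: population standard deviation of each column (assumed).
    means = [sum(c) / m for c in cols]
    stds = [math.sqrt(sum((v - mu) ** 2 for v in c) / m) for c, mu in zip(cols, means)]

    # 3) Conflict between criteria: Pearson correlation of normalized columns.
    def corr(j, k):
        if stds[j] == 0 or stds[k] == 0:
            return 0.0
        cov = sum((a - means[j]) * (b - means[k]) for a, b in zip(cols[j], cols[k])) / m
        return cov / (stds[j] * stds[k])

    # 4) Information content C_j = sigma_j * sum_k (1 - r_jk), normalized to weights.
    info = [stds[j] * sum(1 - corr(j, k) for k in range(n)) for j in range(n)]
    total = sum(info)
    if total == 0:
        raise ValueError("all criteria are constant; weights are undefined")
    return [c / total for c in info]


# Made-up decision matrix: 4 alternatives, criteria = (benefit, cost, benefit).
print(critic_weights([[7, 250, 3], [9, 300, 5], [6, 220, 4], [8, 280, 2]], [True, False, True]))
```

Criteria with a larger spread and a lower correlation with the other criteria receive larger weights, which is why the choice of normalization in step 1 directly affects the result.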
