Scispace (Formerly Typeset)
Showing papers on "Normalization (statistics)" published in 2020
10.5281/ZENODO.5138159•
Advanced Normalization Tools (ANTs)

Brian B. Avants, Nicholas J. Tustison, Hans J. Johnson
20 Dec 2020

1,389 citations

Journal Article•10.1016/J.ASOC.2019.105524•
Investigating the impact of data normalization on classification performance

Dalwinder Singh1, Birmohan Singh1•
Sant Longowal Institute of Engineering and Technology1
01 Dec 2020-Applied Soft Computing
TL;DR: This study aims to investigate the impact of fourteen data normalization methods on classification performance considering full feature set, feature selection, and feature weighting and suggests a set of the best and the worst methods combining the normalization procedure and empirical analysis of results.

1,172 citations

Posted Content•
On Layer Normalization in the Transformer Architecture

Ruibin Xiong1, Yunchang Yang2, Di He2, Kai Zheng2, Shuxin Zheng3, Chen Xing4, Huishuai Zhang3, Yanyan Lan, Liwei Wang2, Tie-Yan Liu3 •
Chinese Academy of Sciences1, Peking University2, Microsoft3, Nankai University4
12 Feb 2020-arXiv: Learning
TL;DR: The authors show that the placement of layer normalization matters: in Pre-LN Transformers the gradients are well-behaved at initialization, which motivates removing the learning rate warm-up stage when training them.
Abstract: The Transformer is widely used in natural language processing tasks. To train a Transformer however, one usually needs a carefully designed learning rate warm-up stage, which is shown to be crucial to the final performance but will slow down the optimization and bring more hyper-parameter tunings. In this paper, we first study theoretically why the learning rate warm-up stage is essential and show that the location of layer normalization matters. Specifically, we prove with mean field theory that at initialization, for the original-designed Post-LN Transformer, which places the layer normalization between the residual blocks, the expected gradients of the parameters near the output layer are large. Therefore, using a large learning rate on those gradients makes the training unstable. The warm-up stage is practically helpful for avoiding this problem. On the other hand, our theory also shows that if the layer normalization is put inside the residual blocks (recently proposed as Pre-LN Transformer), the gradients are well-behaved at initialization. This motivates us to remove the warm-up stage for the training of Pre-LN Transformers. We show in our experiments that Pre-LN Transformers without the warm-up stage can reach comparable results with baselines while requiring significantly less training time and hyper-parameter tuning on a wide range of applications.

761 citations
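
The Post-LN versus Pre-LN distinction described in the abstract above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code: `toy_sublayer` is a hypothetical stand-in for an attention or feed-forward sublayer, and the layer normalization here omits the learned gain and bias.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned gain/bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def post_ln_block(x, sublayer):
    """Post-LN: layer normalization is applied after the residual addition."""
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])

def pre_ln_block(x, sublayer):
    """Pre-LN: layer normalization is applied inside the residual branch."""
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]

def toy_sublayer(x):
    # Hypothetical stand-in for attention or a feed-forward network.
    return [2.0 * v for v in x]

x = [1.0, 2.0, 3.0, 4.0]
post_out = post_ln_block(x, toy_sublayer)
pre_out = pre_ln_block(x, toy_sublayer)
```

In the Pre-LN variant the input `x` reaches the output through an unnormalized identity path, whereas in the Post-LN variant every block output is renormalized.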

Journal Article•10.1109/TMM.2019.2958756•
A Strong Baseline and Batch Normalization Neck for Deep Person Re-Identification

Hao Luo1, Wei Jiang1, Youzhi Gu1, Fuxu Liu, Xingyu Liao2, Shenqi Lai3, Jianyang Gu1 •
Zhejiang University1, Chinese Academy of Sciences2, Xi'an Jiaotong University3
01 Oct 2020-IEEE Transactions on Multimedia
TL;DR: Extended experiments show that BNNeck can boost the baseline, and the baseline can improve the performance of existing state-of-the-art methods.
Abstract: This study proposes a simple but strong baseline for deep person re-identification (ReID). Deep person ReID has achieved great progress and high performance in recent years. However, many state-of-the-art methods design complex network structures and concatenate multi-branch features. In the literature, some effective training tricks briefly appear in several papers or source codes. The present study collects and evaluates these effective training tricks in person ReID. By combining these tricks, the model achieves 94.5% rank-1 and 85.9% mean average precision on Market1501 with only using the global features of ResNet50. The performance surpasses all existing global- and part-based baselines in person ReID. We propose a novel neck structure named as batch normalization neck (BNNeck). BNNeck adds a batch normalization layer after global pooling layer to separate metric and classification losses into two different feature spaces because we observe they are inconsistent in one embedding space. Extended experiments show that BNNeck can boost the baseline, and our baseline can improve the performance of existing state-of-the-art methods. Our codes and models are available at: https://github.com/michuanhaohao/reid-strong-baseline

625 citations

Proceedings Article•10.1109/CVPR42600.2020.00515•
SEAN: Image Synthesis With Semantic Region-Adaptive Normalization

Peihao Zhu1, Rameen Abdal1, Yipeng Qin, Peter Wonka1•
King Abdullah University of Science and Technology1
14 Jun 2020
TL;DR: Semantic region-adaptive normalization (SEAN) is a simple but effective building block for Generative Adversarial Networks conditioned on segmentation masks that describe the semantic regions in the desired output image.
Abstract: We propose semantic region-adaptive normalization (SEAN), a simple but effective building block for Generative Adversarial Networks conditioned on segmentation masks that describe the semantic regions in the desired output image. Using SEAN normalization, we can build a network architecture that can control the style of each semantic region individually, e.g., we can specify one style reference image per region. SEAN is better suited to encode, transfer, and synthesize style than the best previous method in terms of reconstruction quality, variability, and visual quality. We evaluate SEAN on multiple datasets and report better quantitative metrics (e.g. FID, PSNR) than the current state of the art. SEAN also pushes the frontier of interactive image editing. We can interactively edit images by changing segmentation masks or the style for any given region. We can also interpolate styles from two reference images per region.

588 citations

Journal Article•10.1080/02664763.2019.1630372•
Ordered quantile normalization: a semiparametric transformation built for the cross-validation era

Ryan A. Peterson1, Ryan A. Peterson2, Joseph E. Cavanaugh2•
Anschutz Medical Campus1, University of Iowa2
17 Nov 2020-Journal of Applied Statistics
TL;DR: Ordered Quantile (ORQ) normalization is introduced, a one-to-one transformation that is designed to consistently and effectively transform a vector of arbitrary distribution into a vector that follows a normal (Gaussian) distribution.
Abstract: Normalization transformations have recently experienced a resurgence in popularity in the era of machine learning, particularly in data preprocessing. However, the classical methods that can be ada...

523 citations
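
A rank-based transformation of the kind the ORQ entry above describes can be sketched as below. The sketch assumes the mapping sends each observation to the standard normal quantile of `(rank - 1/2) / n`; the function names are hypothetical, ties are given average ranks as a simplification, and applying the transform to new, unseen data is not covered.

```python
from statistics import NormalDist

def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def orq_transform(values):
    """Map each value to the standard normal quantile of (rank - 1/2) / n."""
    n = len(values)
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - 0.5) / n) for r in average_ranks(values)]

skewed = [0.1, 0.2, 0.4, 1.5, 30.0]
transformed = orq_transform(skewed)
```

Because only ranks enter the formula, the extreme value `30.0` is mapped to the same quantile it would get if it were `2.0`, which is what makes the result roughly Gaussian regardless of the input distribution.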

Proceedings Article•10.1109/CVPR42600.2020.00321•
Style Normalization and Restitution for Generalizable Person Re-Identification

Xin Jin1, Cuiling Lan2, Wenjun Zeng2, Zhibo Chen1, Li Zhang3 •
University of Science and Technology of China1, Microsoft2, University of Oxford3
14 Jun 2020
TL;DR: The aim of this paper is to design a generalizable person ReID framework which trains a model on source domains yet is able to generalize/perform well on target domains, and to enforce a dual causal loss constraint in SNR to encourage the separation of identity-relevant features and identity-irrelevant features.
Abstract: Existing fully-supervised person re-identification (ReID) methods usually suffer from poor generalization capability caused by domain gaps. The key to solving this problem lies in filtering out identity-irrelevant interference and learning domain-invariant person representations. In this paper, we aim to design a generalizable person ReID framework which trains a model on source domains yet is able to generalize/perform well on target domains. To achieve this goal, we propose a simple yet effective Style Normalization and Restitution (SNR) module. Specifically, we filter out style variations (e.g., illumination, color contrast) by Instance Normalization (IN). However, such a process inevitably removes discriminative information. We propose to distill identity-relevant feature from the removed information and restitute it to the network to ensure high discrimination. For better disentanglement, we enforce a dual causal loss constraint in SNR to encourage the separation of identity-relevant features and identity-irrelevant features. Extensive experiments demonstrate the strong generalization capability of our framework. Our models empowered by the SNR modules significantly outperform the state-of-the-art domain generalization approaches on multiple widely-used person ReID benchmarks, and also show superiority on unsupervised domain adaptation.

482 citations

Journal Article•10.1007/S11042-019-08453-9•
Dropout vs. batch normalization: an empirical study of their impact to deep learning

Christian Garbin1, Xingquan Zhu1, Oge Marques1•
Florida Atlantic University1
01 May 2020-Multimedia Tools and Applications
TL;DR: The empirical study quantifies the increase in training time when dropout and batch normalization are used, as well as the increase in prediction time (important for constrained environments, such as smartphones and low-powered IoT devices), and shows that a non-adaptive optimizer can outperform adaptive optimizers, but only at the cost of a significant amount of training time spent on hyperparameter tuning.
Abstract: Overfitting and long training time are two fundamental challenges in multilayered neural network learning and deep learning in particular. Dropout and batch normalization are two well-recognized approaches to tackle these challenges. While both approaches share overlapping design principles, numerous research results have shown that they have unique strengths to improve deep learning. Many tools simplify these two approaches as a simple function call, allowing flexible stacking to form deep learning architectures. Although usage guidelines are available, there is unfortunately no well-defined set of rules or comprehensive study investigating them with respect to data input, network configurations, learning efficiency, and accuracy. It is not clear when users should consider using dropout and/or batch normalization, and how they should be combined (or used alternatively) to achieve optimized deep learning outcomes. In this paper we conduct an empirical study to investigate the effect of dropout and batch normalization on training deep learning models. We use multilayered dense neural networks and convolutional neural networks (CNN) as the deep learning models, and mix dropout and batch normalization to design different architectures and subsequently observe their performance in terms of training and test CPU time, number of parameters in the model (as a proxy for model size), and classification accuracy. The interplay between network structures, dropout, and batch normalization allows us to conclude when and how dropout and batch normalization should be considered in deep learning. The empirical study quantified the increase in training time when dropout and batch normalization are used, as well as the increase in prediction time (important for constrained environments, such as smartphones and low-powered IoT devices). It showed that a non-adaptive optimizer (e.g. SGD) can outperform adaptive optimizers, but only at the cost of a significant amount of training time to perform hyperparameter tuning, while an adaptive optimizer (e.g. RMSProp) performs well without much tuning. Finally, it showed that dropout and batch normalization should be used in CNNs only with caution and experimentation (when in doubt and short on time to experiment, use only batch normalization).

452 citations

Proceedings Article•
The Non-IID Data Quagmire of Decentralized Machine Learning

Kevin Hsieh1, Amar Phanishayee1, Onur Mutlu2, Phillip B. Gibbons3•
Microsoft1, ETH Zurich2, Carnegie Mellon University3
12 Jul 2020
TL;DR: SkewScout is presented, a system-level approach that adapts the communication frequency of decentralized learning algorithms to the (skew-induced) accuracy loss between data partitions and it is shown that group normalization can recover much of the accuracy loss of batch normalization.
Abstract: Many large-scale machine learning (ML) applications need to perform decentralized learning over datasets generated at different devices and locations. Such datasets pose a significant challenge to decentralized learning because their different contexts result in significant data distribution skew across devices/locations. In this paper, we take a step toward better understanding this challenge by presenting a detailed experimental study of decentralized DNN training on a common type of data skew: skewed distribution of data labels across devices/locations. Our study shows that: (i) skewed data labels are a fundamental and pervasive problem for decentralized learning, causing significant accuracy loss across many ML applications, DNN models, training datasets, and decentralized learning algorithms; (ii) the problem is particularly challenging for DNN models with batch normalization; and (iii) the degree of data skew is a key determinant of the difficulty of the problem. Based on these findings, we present SkewScout, a system-level approach that adapts the communication frequency of decentralized learning algorithms to the (skew-induced) accuracy loss between data partitions. We also show that group normalization can recover much of the accuracy loss of batch normalization.

414 citations

Journal Article•10.1038/S41467-020-19015-1•
Benchmarking of cell type deconvolution pipelines for transcriptomics data.

Francisco Avila Cobos1, Francisco Avila Cobos2, José Alquicira-Hernandez3, José Alquicira-Hernandez1, Joseph E. Powell1, Joseph E. Powell3, Pieter Mestdagh2, Katleen De Preter2 •
Garvan Institute of Medical Research1, Ghent University2, University of Queensland3
06 Nov 2020-Nature Communications
TL;DR: Overall, methods that use scRNA-seq data have comparable performance to the best performing bulk methods whereas semi-supervised approaches show higher error values.
Abstract: Many computational methods have been developed to infer cell type proportions from bulk transcriptomics data. However, an evaluation of the impact of data transformation, pre-processing, marker selection, cell type composition and choice of methodology on the deconvolution results is still lacking. Using five single-cell RNA-sequencing (scRNA-seq) datasets, we generate pseudo-bulk mixtures to evaluate the combined impact of these factors. Both bulk deconvolution methodologies and those that use scRNA-seq data as reference perform best when applied to data in linear scale and the choice of normalization has a dramatic impact on some, but not all methods. Overall, methods that use scRNA-seq data have comparable performance to the best performing bulk methods whereas semi-supervised approaches show higher error values. Moreover, failure to include cell types in the reference that are present in a mixture leads to substantially worse results, regardless of the previous choices. Altogether, we evaluate the combined impact of factors affecting the deconvolution task across different datasets and propose general guidelines to maximize its performance. Inferring cell type proportions from transcriptomics data is affected by data transformation, normalization, choice of method and the markers used. Here, the authors use single-cell RNAseq datasets to evaluate the impact of these factors and propose guidelines to maximise deconvolution performance.

396 citations

Posted Content•
Normalized Loss Functions for Deep Learning with Noisy Labels

Xingjun Ma1, Hanxun Huang1, Yisen Wang2, Simone Romano1, Sarah M. Erfani1, James Bailey1 •
University of Melbourne1, Shanghai Jiao Tong University2
24 Jun 2020-arXiv: Learning
TL;DR: Experiments on benchmark datasets demonstrate that the family of new loss functions created by the APL framework can consistently outperform state-of-the-art methods by large margins, especially under large noise rates such as 60% or 80% incorrect labels.
Abstract: Robust loss functions are essential for training accurate deep neural networks (DNNs) in the presence of noisy (incorrect) labels. It has been shown that the commonly used Cross Entropy (CE) loss is not robust to noisy labels. Whilst new loss functions have been designed, they are only partially robust. In this paper, we theoretically show by applying a simple normalization that any loss can be made robust to noisy labels. However, in practice, simply being robust is not sufficient for a loss function to train accurate DNNs. By investigating several robust loss functions, we find that they suffer from a problem of underfitting. To address this, we propose a framework to build robust loss functions called Active Passive Loss (APL). APL combines two robust loss functions that mutually boost each other. Experiments on benchmark datasets demonstrate that the family of new loss functions created by our APL framework can consistently outperform state-of-the-art methods by large margins, especially under large noise rates such as 60% or 80% incorrect labels.
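
The "simple normalization" mentioned in this abstract can be illustrated with cross entropy. The sketch below assumes the normalization divides the loss for the given label by the sum of the same loss over all possible labels; the function names are hypothetical, and the predicted probabilities are assumed to lie strictly between 0 and 1.

```python
import math

def cross_entropy(probs, label):
    """Standard cross entropy for one sample: -log p(label)."""
    return -math.log(probs[label])

def normalized_cross_entropy(probs, label):
    """Cross entropy for `label` divided by its sum over every possible label."""
    total = sum(cross_entropy(probs, k) for k in range(len(probs)))
    return cross_entropy(probs, label) / total

probs = [0.7, 0.2, 0.1]  # predicted class probabilities for one sample
nce = [normalized_cross_entropy(probs, k) for k in range(len(probs))]
```

Under this assumption the normalized loss is bounded between 0 and 1, and its values across all candidate labels sum to 1 for any prediction.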
Proceedings Article•10.1109/CVPR42600.2020.01181•
Gated Channel Transformation for Visual Recognition

Zongxin Yang1, Linchao Zhu1, Yu Wu1, Yi Yang1•
University of Technology, Sydney1
14 Jun 2020
TL;DR: A generally applicable transformation unit for visual recognition with deep convolutional neural networks that explicitly models channel relationships with explainable control variables and is applicable to operator-level without much increase of additional parameters.
Abstract: In this work, we propose a generally applicable transformation unit for visual recognition with deep convolutional neural networks. This transformation explicitly models channel relationships with explainable control variables. These variables determine the neuron behaviors of competition or cooperation, and they are jointly optimized with the convolutional weight towards more accurate recognition. In Squeeze-and-Excitation (SE) Networks, the channel relationships are implicitly learned by fully connected layers, and the SE block is integrated at the block-level. We instead introduce a channel normalization layer to reduce the number of parameters and computational complexity. This lightweight layer incorporates a simple l2 normalization, enabling our transformation unit applicable to operator-level without much increase of additional parameters. Extensive experiments demonstrate the effectiveness of our unit with clear margins on many vision tasks, i.e., image classification on ImageNet, object detection and instance segmentation on COCO, video classification on Kinetics.
Proceedings Article•10.1109/ICSSIT48917.2020.9214160•
Study the Influence of Normalization/Transformation process on the Accuracy of Supervised Classification

V N Ganapathi Raju1, K. Prasanna Lakshmi1, Vinod Mahesh Jain1, Archana Kalidindi1, V Padma1 •
Gokaraju Rangaraju Institute of Engineering and Technology1
1 Aug 2020
TL;DR: This paper depicts the improvement in predictive accuracies with the help of normalization techniques and various criteria needed to achieve such data normalization are described.
Abstract: Recent developments in analytical technologies have helped in developing applications for real-time problems faced by industries. These applications are often found to consume more time in the training phase. To reduce this time, pre-treatment of data before the training phase is suggested as an appropriate methodology. Normalization is the best technique for pre-processing data before the training phase of an application. Normalization using metrics with criteria is found to be very important to attain good predictive results in less time. This paper depicts the improvement in predictive accuracies with the help of normalization techniques. Various criteria needed to achieve such data normalization are also described. The paper examines three machine learning classifiers (Radial SVM, KNN, and Sigmoid SVM) and seven standardization techniques (StandardScaler, Scale, RobustScaler, QuantileTransform, PowerTransform, MinMaxScaler, and MaxAbsScaler).
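
Three of the techniques named in this abstract reduce to short formulas on a single feature column. The functions below are hand-written illustrations, not the library classes the paper refers to; their names are hypothetical, and a constant feature (zero spread) would need special handling that is omitted here.

```python
import math

def standard_scale(xs):
    """Subtract the mean and divide by the (population) standard deviation."""
    mean = sum(xs) / len(xs)
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
    return [(x - mean) / std for x in xs]

def min_max_scale(xs):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def max_abs_scale(xs):
    """Divide by the largest absolute value, keeping the sign and zero intact."""
    m = max(abs(x) for x in xs)
    return [x / m for x in xs]

feature = [-4.0, 0.0, 2.0, 6.0]
standardized = standard_scale(feature)
min_maxed = min_max_scale(feature)
max_absed = max_abs_scale(feature)
```

Distance-based classifiers such as KNN and kernel SVMs are sensitive to feature scale, which is one reason the choice among these transforms can change accuracy.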
Journal Article•10.1609/AAAI.V34I04.5862•
Dynamic Instance Normalization for Arbitrary Style Transfer

Yongcheng Jing1, Xiao Liu2, Ding Yukang2, Xinchao Wang3, Errui Ding2, Mingli Song1, Shilei Wen2 •
Zhejiang University1, Baidu2, Stevens Institute of Technology3
3 Apr 2020
TL;DR: The proposed Dynamic Instance Normalization (DIN) provides flexible support for state-of-the-art convolutional operations, and thus triggers novel functionalities, such as uniform-stroke placement for non-natural images and automatic spatial-stroke control.
Abstract: Prior normalization methods rely on affine transformations to produce arbitrary image style transfers, of which the parameters are computed in a pre-defined way. Such manually-defined nature eventually results in the high-cost and shared encoders for both style and content encoding, making style transfer systems cumbersome to be deployed in resource-constrained environments like on the mobile-terminal side. In this paper, we propose a new and generalized normalization module, termed as Dynamic Instance Normalization (DIN), that allows for flexible and more efficient arbitrary style transfers. Comprising an instance normalization and a dynamic convolution, DIN encodes a style image into learnable convolution parameters, upon which the content image is stylized. Unlike conventional methods that use shared complex encoders to encode content and style, the proposed DIN introduces a sophisticated style encoder, yet comes with a compact and lightweight content encoder for fast inference. Experimental results demonstrate that the proposed approach yields very encouraging results on challenging style patterns and, to our best knowledge, for the first time enables an arbitrary style transfer using MobileNet-based lightweight architecture, leading to a reduction factor of more than twenty in computational cost as compared to existing approaches. Furthermore, the proposed DIN provides flexible support for state-of-the-art convolutional operations, and thus triggers novel functionalities, such as uniform-stroke placement for non-natural images and automatic spatial-stroke control.
Journal Article•10.1016/J.JELEKIN.2020.102438•
Consensus for experimental design in electromyography (CEDE) project: Amplitude normalization matrix.

Manuela Besomi1, Paul W. Hodges1, Edward A. Clancy2, Jaap H. van Dieën3, François Hug1, Madeleine M. Lowery4, Roberto Merletti5, Karen Søgaard6, Tim V. Wrigley7, Thor F. Besier8, Richard G. Carson1, Catherine Disselhorst-Klug9, Roger M. Enoka10, Deborah Falla11, Dario Farina12, Simon C. Gandevia13, Ales Holobar14, Matthew C. Kiernan15, Kevin C. McGill16, Eric J. Perreault17, John C. Rothwell18, Kylie Tucker1 •
University of Queensland1, Worcester Polytechnic Institute2, VU University Amsterdam3, University College Dublin4, Polytechnic University of Turin5, University of Southern Denmark6, University of Melbourne7, University of Auckland8, RWTH Aachen University9, University of Colorado Boulder10, University of Birmingham11, Imperial College London12, University of New South Wales13, University of Maribor14, Royal Prince Alfred Hospital15, United States Department of Veterans Affairs16, Rehabilitation Institute of Chicago17, UCL Institute of Neurology18
01 Aug 2020-Journal of Electromyography and Kinesiology
TL;DR: This matrix, developed by the Consensus for Experimental Design in Electromyography (CEDE) project, presents six approaches to EMG normalization, together with general considerations for normalization, features that should be reported, definitions, and the "pros and cons" of each approach.
Posted Content•
Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift

Zachary Nado1, Shreyas Padhy, D. Sculley, Alexander D'Amour, Balaji Lakshminarayanan, Jasper Snoek •
Google1
19 Jun 2020-arXiv: Learning
TL;DR: It is shown that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness and combining the two further improves performance, and has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
Abstract: Covariate shift has been shown to sharply degrade both predictive accuracy and the calibration of uncertainty estimates for deep learning models. This is worrying, because covariate shift is prevalent in a wide range of real world deployment settings. However, in this paper, we note that frequently there exists the potential to access small unlabeled batches of the shifted data just before prediction time. This interesting observation enables a simple but surprisingly effective method which we call prediction-time batch normalization, which significantly improves model accuracy and calibration under covariate shift. Using this one line code change, we achieve state-of-the-art on recent covariate shift benchmarks and an mCE of 60.28% on the challenging ImageNet-C dataset; to our knowledge, this is the best result for any model that does not incorporate additional data augmentation or modification of the training pipeline. We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness (e.g. deep ensembles) and combining the two further improves performance. Our findings are supported by detailed measurements of the effect of this strategy on model behavior across rigorous ablations on various dataset modalities. However, the method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift, and is therefore worthy of additional study. We include links to the data in our figures to improve reproducibility, including Python notebooks that can be run to easily modify our analysis at this https URL
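
The idea in this abstract, normalizing with statistics from the unlabeled test batch instead of those stored during training, can be sketched on a single feature. The running statistics and the shifted batch below are made-up illustrative values, and the learned scale and offset of a real batch normalization layer are left out.

```python
import math

def batch_stats(batch):
    """Mean and (population) variance of a batch of scalar activations."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return mean, var

def batch_norm(batch, mean, var, eps=1e-5):
    """Normalize activations with the supplied statistics."""
    return [(x - mean) / math.sqrt(var + eps) for x in batch]

# Hypothetical running statistics accumulated during training.
train_mean, train_var = 0.0, 1.0
# Hypothetical unlabeled test batch whose inputs have shifted.
shifted_batch = [4.0, 5.0, 6.0, 5.0]

# Conventional inference: reuse the training statistics.
standard_out = batch_norm(shifted_batch, train_mean, train_var)
# Prediction-time variant: recompute statistics on the test batch itself.
pred_time_out = batch_norm(shifted_batch, *batch_stats(shifted_batch))
```

With the training statistics the activations stay far from zero mean, while recomputing the statistics recenters them, which is the intuition behind the reported robustness gain.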
Proceedings Article•10.1109/CVPR42600.2020.01034•
Normalized and Geometry-Aware Self-Attention Network for Image Captioning

Longteng Guo1, Jing Liu1, Xinxin Zhu1, Peng Yao2, Shichen Lu3, Hanqing Lu1 •
Chinese Academy of Sciences1, University of Science and Technology Beijing2, Wuhan University3
14 Jun 2020
TL;DR: The authors propose Normalized Self-Attention (NSA), which brings normalization inside self-attention, and Geometry-aware Self-Attention (GSA), which explicitly and efficiently considers the relative geometry relations between the objects in the image.
Abstract: Self-attention (SA) network has shown profound value in image captioning. In this paper, we improve SA from two aspects to promote the performance of image captioning. First, we propose Normalized Self-Attention (NSA), a reparameterization of SA that brings the benefits of normalization inside SA. While normalization is previously only applied outside SA, we introduce a novel normalization method and demonstrate that it is both possible and beneficial to perform it on the hidden activations inside SA. Second, to compensate for the major limit of Transformer that it fails to model the geometry structure of the input objects, we propose a class of Geometry-aware Self-Attention (GSA) that extends SA to explicitly and efficiently consider the relative geometry relations between the objects in the image. To construct our image captioning model, we combine the two modules and apply it to the vanilla self-attention network. We extensively evaluate our proposals on MS-COCO image captioning dataset and superior results are achieved when comparing to state-of-the-art approaches. Further experiments on three challenging tasks, i.e. video captioning, machine translation, and visual question answering, show the generality of our methods.
Journal Article•10.1609/AAAI.V34I07.6967•
Region Normalization for Image Inpainting

Tao Yu1, Zongyu Guo1, Xin Jin1, Shilin Wu1, Zhibo Chen1, Weiping Li1, Zhizheng Zhang1, Sen Liu1 •
University of Science and Technology of China1
3 Apr 2020
TL;DR: It is shown that the mean and variance shifts caused by full-spatial FN limit the image inpainting network training and a spatial region-wise normalization named Region Normalization (RN) is proposed to overcome the limitation.
Abstract: Feature Normalization (FN) is an important technique to help neural network training, which typically normalizes features across spatial dimensions. Most previous image inpainting methods apply FN in their networks without considering the impact of the corrupted regions of the input image on normalization, e.g. mean and variance shifts. In this work, we show that the mean and variance shifts caused by full-spatial FN limit the image inpainting network training and we propose a spatial region-wise normalization named Region Normalization (RN) to overcome the limitation. RN divides spatial pixels into different regions according to the input mask, and computes the mean and variance in each region for normalization. We develop two kinds of RN for our image inpainting network: (1) Basic RN (RN-B), which normalizes pixels from the corrupted and uncorrupted regions separately based on the original inpainting mask to solve the mean and variance shift problem; (2) Learnable RN (RN-L), which automatically detects potentially corrupted and uncorrupted regions for separate normalization, and performs global affine transformation to enhance their fusion. We apply RN-B in the early layers and RN-L in the latter layers of the network respectively. Experiments show that our method outperforms current state-of-the-art methods quantitatively and qualitatively. We further generalize RN to other inpainting networks and achieve consistent performance improvements.
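
The basic variant described in this abstract (RN-B) computes the mean and variance separately for the corrupted and uncorrupted regions given by the mask. The sketch below works on a flattened single-channel feature map with a binary mask; it is an illustration under those assumptions rather than the authors' implementation, and it omits any learned affine parameters.

```python
import math

def region_normalize(pixels, mask, eps=1e-5):
    """Normalize the pixels of each mask region with that region's own statistics."""
    out = [0.0] * len(pixels)
    for region in (0, 1):
        idx = [i for i, m in enumerate(mask) if m == region]
        if not idx:
            continue  # region absent from this mask
        mean = sum(pixels[i] for i in idx) / len(idx)
        var = sum((pixels[i] - mean) ** 2 for i in idx) / len(idx)
        for i in idx:
            out[i] = (pixels[i] - mean) / math.sqrt(var + eps)
    return out

# Hypothetical flattened feature map: the first three positions are a hole filled with zeros.
pixels = [0.0, 0.0, 0.0, 0.9, 1.1, 1.0]
mask = [0, 0, 0, 1, 1, 1]  # 0 = corrupted region, 1 = uncorrupted region
normalized = region_normalize(pixels, mask)
```

Normalizing all six positions together would pull the mean toward the zero-filled hole, which is the mean and variance shift the abstract refers to.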
Book Chapter•10.1007/978-3-030-58542-6_5•
Learning to Optimize Domain Specific Normalization for Domain Generalization

Seonguk Seo1, Yumin Suh2, Dongwan Kim1, Geeho Kim1, Jong-Woo Han3, Bohyung Han1 •
Seoul National University1, Princeton University2, LG Electronics3
23 Aug 2020
TL;DR: In this paper, the authors proposed a multi-source domain generalization technique based on deep neural networks by incorporating optimized normalization layers that are specific to individual domains, which can enhance the generalizability of the learned model.
Abstract: We propose a simple but effective multi-source domain generalization technique based on deep neural networks by incorporating optimized normalization layers that are specific to individual domains. Our approach employs multiple normalization methods while learning separate affine parameters per domain. For each domain, the activations are normalized by a weighted average of multiple normalization statistics. The normalization statistics are kept track of separately for each normalization type if necessary. Specifically, we employ batch and instance normalizations in our implementation to identify the best combination of these two normalization methods in each domain. The optimized normalization layers are effective to enhance the generalizability of the learned model. We demonstrate the state-of-the-art accuracy of our algorithm in the standard domain generalization benchmarks, as well as viability to further tasks such as multi-source domain adaptation and domain generalization in the presence of label noise.
Posted Content•
Revisiting Batch Normalization for Training Low-latency Deep Spiking Neural Networks from Scratch.

Youngeun Kim1, Priyadarshini Panda1•
Yale University1
05 Oct 2020-arXiv: Computer Vision and Pattern Recognition
TL;DR: A temporal Batch Normalization Through Time (BNTT) technique is proposed and it is found that varying the BN parameters at every time-step allows the model to learn the time-varying input distribution better.
Abstract: Spiking Neural Networks (SNNs) have recently emerged as an alternative to deep learning owing to sparse, asynchronous and binary event (or spike) driven processing, which can yield huge energy efficiency benefits on neuromorphic hardware. However, training high-accuracy and low-latency SNNs from scratch suffers from the non-differentiable nature of a spiking neuron. To address this training issue in SNNs, we revisit batch normalization and propose a temporal Batch Normalization Through Time (BNTT) technique. Most prior SNN works have disregarded batch normalization, deeming it ineffective for training temporal SNNs. Different from previous works, our proposed BNTT decouples the parameters in a BNTT layer along the time axis to capture the temporal dynamics of spikes. The temporally evolving learnable parameters in BNTT allow a neuron to control its spike rate through different time-steps, enabling low-latency and low-energy training from scratch. We conduct experiments on CIFAR-10, CIFAR-100, Tiny-ImageNet and event-driven DVS-CIFAR10 datasets. BNTT allows us to train deep SNN architectures from scratch, for the first time, on complex datasets with just 25-30 time-steps. We also propose an early exit algorithm using the distribution of parameters in BNTT to reduce the latency at inference, which further improves energy efficiency.
Proceedings Article•10.1109/CVPR42600.2020.00354•
Transfer Learning From Synthetic to Real-Noise Denoising With Adaptive Instance Normalization

Yoonsik Kim1, Jae Woong Soh1, Gu Yong Park1, Nam Ik Cho1•
Seoul National University1
14 Jun 2020
TL;DR: In this article, the authors adopt an adaptive instance normalization to build a denoiser, which can regularize the feature map and prevent the network from overfitting to the training set.
Abstract: Real-noise denoising is a challenging task because the statistics of real-noise do not follow the normal distribution, and they are also spatially and temporally changing. In order to cope with various and complex real-noise, we propose a well-generalized denoising architecture and a transfer learning scheme. Specifically, we adopt an adaptive instance normalization to build a denoiser, which can regularize the feature map and prevent the network from overfitting to the training set. We also introduce a transfer learning scheme that transfers knowledge learned from synthetic-noise data to the real-noise denoiser. From the proposed transfer learning, the synthetic-noise denoiser can learn general features from various synthetic-noise data, and the real-noise denoiser can learn the real-noise characteristics from real data. From the experiments, we find that the proposed denoising method has great generalization ability, such that our network trained with synthetic-noise achieves the best performance for Darmstadt Noise Dataset (DND) among the methods from published papers. We can also see that the proposed transfer learning scheme robustly works for real-noise images through the learning with a very small number of labeled data.
Journal Article•10.1093/NAR/GKAA258•
NOREVA: enhanced normalization and evaluation of time-course and multi-class metabolomic data.

Qingxia Yang1,2, Yunxia Wang1, Ying Zhang1, Fengcheng Li1, Weiqi Xia1, Ying Zhou1, Yunqing Qiu1, Honglin Li3, Feng Zhu1,2 •
Zhejiang University1, Chongqing University2, East China University of Science and Technology3
02 Jul 2020-Nucleic Acids Research
TL;DR: NOREVA 2.0 is distinguished for its capability in identifying well-performing normalization method(s) for time-course and multi-class metabolomics, which makes it an indispensable complement to other available tools.
Abstract: Biological processes (such as microbial growth and physiological response) are usually dynamic and require the monitoring of metabolic variation at different time-points. Moreover, there is a clear shift from case-control (N=2) studies to multi-class (N>2) problems in current metabolomics, which is crucial for revealing the mechanisms underlying certain physiological processes, disease metastasis, etc. These time-course and multi-class metabolomics have attracted great attention, and data normalization is essential for removing unwanted biological/experimental variations in these studies. However, no tool (including NOREVA 1.0, which focuses only on case-control studies) is available for effectively assessing the performance of normalization methods on time-course/multi-class metabolomic data. Thus, NOREVA was updated to version 2.0 by (i) realizing normalization and evaluation of both time-course and multi-class metabolomic data, (ii) integrating 144 normalization methods of a recently proposed combination strategy and (iii) identifying the well-performing methods by comprehensively assessing the largest set of normalizations (168 in total, significantly larger than those 24 in NOREVA 1.0). The significance of this update was extensively validated by case studies on benchmark datasets. All in all, NOREVA 2.0 is distinguished for its capability in identifying well-performing normalization method(s) for time-course and multi-class metabolomics, which makes it an indispensable complement to other available tools. NOREVA can be accessed at https://idrblab.org/noreva/.
Journal Article•10.1002/CYTO.A.23904•
CytoNorm: A Normalization Algorithm for Cytometry Data.

Sofie Van Gassen1, Brice Gaudilliere2, Martin S. Angst2, Yvan Saeys1, Nima Aghaeepour2 •
Ghent University1, Stanford University2
01 Mar 2020-Cytometry Part A
TL;DR: This work proposes CytoNorm, a normalization algorithm to ensure internal consistency between clinical samples based on shared controls across various study batches, which compared favorably to standard normalization procedures.
Abstract: High-dimensional flow cytometry has matured to a level that enables deep phenotyping of cellular systems at a clinical scale. The resulting high-content data sets allow characterizing the human immune system at unprecedented single cell resolution. However, the results are highly dependent on sample preparation and measurements might drift over time. While various controls exist for assessment and improvement of data quality in a single sample, the challenges of cross-sample normalization attempts have been limited to aligning marker distributions across subjects. These approaches, inspired by bulk genomics and proteomics assays, ignore the single-cell nature of the data and risk the removal of biologically relevant signals. This work proposes CytoNorm, a normalization algorithm to ensure internal consistency between clinical samples based on shared controls across various study batches. Data from the shared controls is used to learn the appropriate transformations for each batch (e.g., each analysis day). Importantly, some sources of technical variation are strongly influenced by the amount of protein expressed on specific cell types, requiring several population-specific transformations to normalize cells from a heterogeneous sample. To address this, our approach first identifies the overall cellular distribution using a clustering step, and calculates subset-specific transformations on the control samples by computing their quantile distributions and aligning them with splines. These transformations are then applied to all other clinical samples in the batch to remove the batch-specific variations. We evaluated the algorithm on a customized data set with two shared controls across batches. One control sample was used for calculation of the normalization transformations and the second control was used as a blinded test set and evaluated with Earth Mover's distance. Additional results are provided using two real-world clinical data sets. 
Overall, our method compared favorably to standard normalization procedures. The algorithm is implemented in the R package "CytoNorm" and available via the following link: www.github.com/saeyslab/CytoNorm © 2019 The Authors. Cytometry Part A published by Wiley Periodicals, Inc. on behalf of International Society for Advancement of Cytometry.
Posted Content•
Normalization Techniques in Training DNNs: Methodology, Analysis and Application

Lei Huang, Jie Qin, Yi Zhou, Fan Zhu, Li Liu, Ling Shao 
27 Sep 2020-arXiv: Learning
TL;DR: A unified picture of the main motivation behind different approaches from the perspective of optimization is provided, and a taxonomy for understanding the similarities and differences between them is presented.
Abstract: Normalization techniques are essential for accelerating the training and improving the generalization of deep neural networks (DNNs), and have successfully been used in various applications. This paper reviews and comments on the past, present and future of normalization methods in the context of DNN training. We provide a unified picture of the main motivation behind different approaches from the perspective of optimization, and present a taxonomy for understanding the similarities and differences between them. Specifically, we decompose the pipeline of the most representative normalizing activation methods into three components: the normalization area partitioning, normalization operation and normalization representation recovery. In doing so, we provide insight for designing new normalization techniques. Finally, we discuss the current progress in understanding normalization methods, and provide a comprehensive review of the applications of normalization for particular tasks, in which it can effectively solve the key issues.
Proceedings Article•
An Exponential Learning Rate Schedule for Deep Learning

Zhiyuan Li1, Sanjeev Arora1•
Princeton University1
01 Apr 2020
TL;DR: This paper shows that networks with Batch Normalization can be trained using SGD with momentum and an exponentially increasing learning rate schedule; as expected, such training rapidly blows up network weights, but the net stays well-behaved due to normalization.
Abstract: Intriguing empirical evidence exists that deep learning can work well with exotic schedules for varying the learning rate. This paper suggests that the phenomenon may be due to Batch Normalization or BN (Ioffe & Szegedy, 2015), which is ubiquitous and provides benefits in optimization and generalization across all standard architectures. The following new results are shown about BN with weight decay and momentum (in other words, the typical use case which was not considered in earlier theoretical analyses of stand-alone BN (Ioffe & Szegedy, 2015; Santurkar et al., 2018; Arora et al., 2018)):
• Training can be done using SGD with momentum and an exponentially increasing learning rate schedule, i.e., learning rate increases by some (1 + α) factor in every epoch for some α > 0. (Precise statement in the paper.) To the best of our knowledge this is the first time such a rate schedule has been successfully used, let alone for highly successful architectures. As expected, such training rapidly blows up network weights, but the net stays well-behaved due to normalization.
• Mathematical explanation of the success of the above rate schedule: a rigorous proof that it is equivalent to the standard setting of BN + SGD + Standard Rate Tuning + Weight Decay + Momentum. This equivalence holds for other normalization layers as well, Group Normalization (Wu & He, 2018), Layer Normalization (Ba et al., 2016), Instance Norm (Ulyanov et al., 2016), etc.
• A worked-out toy example illustrating the above linkage of hyperparameters. Using either weight decay or BN alone reaches global minimum, but convergence fails when both are used.
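The schedule named in this abstract, in which the learning rate grows by a factor of (1 + α) every epoch, can be written out as a short sketch. The function name `exponential_lr` and the example values of the initial rate and α are illustrative assumptions; the paper's precise statement, including how α relates to weight decay and momentum, is not reproduced here.

```python
def exponential_lr(lr0, alpha, num_epochs):
    """Return the per-epoch learning rates lr0 * (1 + alpha) ** t for t = 0..num_epochs-1.

    lr0:   initial learning rate (illustrative value chosen by the caller).
    alpha: growth rate per epoch; the abstract requires alpha > 0.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive for an increasing schedule")
    return [lr0 * (1 + alpha) ** t for t in range(num_epochs)]

# Example: the rate is multiplied by 1.5 at every epoch.
schedule = exponential_lr(lr0=0.1, alpha=0.5, num_epochs=4)
for epoch, lr in enumerate(schedule):
    print(f"epoch {epoch}: lr = {lr:.4f}")
```

The ratio between consecutive entries is constant at (1 + α), which is what distinguishes this schedule from the usual decaying ones.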
Journal Article•10.1016/J.OMEGA.2019.04.001•
DNMA: A double normalization-based multiple aggregation method for multi-expert multi-criteria decision making

Huchang Liao1,2, Xingli Wu2•
King Abdulaziz University1, Sichuan University2
01 Jul 2020-Omega-international Journal of Management Science
TL;DR: A comprehensive algorithm for multi-expert multi-criteria decision making problems considering quantitative and qualitative criteria in forms of benefit, cost or target types is developed, which focuses on using probabilistic linguistic term sets to express the qualitative evaluations.
Abstract: This paper develops a comprehensive algorithm for multi-expert multi-criteria decision making problems considering quantitative and qualitative criteria in the form of benefit, cost or target types. We focus on using probabilistic linguistic term sets to express the qualitative evaluations due to their excellence in expressing complex individual and collective linguistic assessments. Firstly, we develop a target-based linear normalization technique and a target-based vector normalization technique. A weight adjustment method is proposed to achieve the tradeoff between criteria after normalization. Given that the two target-based normalization techniques have different advantages, we then propose a ranking method, which consists of three subordinate models, based on these two target-based normalization approaches and three aggregation techniques. Reliable results of a multi-expert multi-criteria decision making problem are determined by integrating the subordinate utility values and the ranks of alternatives. The proposed method is implemented to solve the green enterprise ranking problems and the excavation scheme selection problem for shallow buried tunnels, respectively. The advantages of the proposed method are emphasized through comparative analyses with other ranking methods.
Posted Content•
Towards Deeper Graph Neural Networks with Differentiable Group Normalization

Kaixiong Zhou1, Xiao Huang2, Yuening Li1, Daochen Zha1, Rui Chen3, Xia Hu1 •
Texas A&M University1, Hong Kong Polytechnic University2, Samsung3
12 Jun 2020-arXiv: Learning
TL;DR: DGN is introduced, which normalizes nodes within the same group independently to increase their smoothness, and separates node distributions among different groups to significantly alleviate the over-smoothing issue.
Abstract: Graph neural networks (GNNs), which learn the representation of a node by aggregating its neighbors, have become an effective computational tool in downstream applications. Over-smoothing is one of the key issues which limit the performance of GNNs as the number of layers increases. This is because the stacked aggregators would make node representations converge to indistinguishable vectors. Several attempts have been made to tackle the issue by bringing linked node pairs close and unlinked pairs distinct. However, they often ignore the intrinsic community structures and would result in sub-optimal performance. The representations of nodes within the same community/class need to be similar to facilitate the classification, while different classes are expected to be separated in embedding space. To bridge the gap, we introduce two over-smoothing metrics and a novel technique, i.e., differentiable group normalization (DGN). It normalizes nodes within the same group independently to increase their smoothness, and separates node distributions among different groups to significantly alleviate the over-smoothing issue. Experiments on real-world datasets demonstrate that DGN makes GNN models more robust to over-smoothing and achieves better performance with deeper GNNs.
Proceedings Article•10.1109/CVPR42600.2020.01130•
ACNe: Attentive Context Normalization for Robust Permutation-Equivariant Learning

Weiwei Sun1, Wei Jiang1, Eduard Trulls2, Andrea Tagliasacchi2, Kwang Moo Yi1 •
University of Victoria1, Google2
14 Jun 2020
TL;DR: In this article, the authors propose an attention-based normalization of the feature maps of a permutation-equivariant network to find the essential data points in high-dimensional space to solve a given task.
Abstract: Many problems in computer vision require dealing with sparse, unordered data in the form of point clouds. Permutation-equivariant networks have become a popular solution – they operate on individual data points with simple perceptrons and extract contextual information with global pooling. This can be achieved with a simple normalization of the feature maps, a global operation that is unaffected by the order. In this paper, we propose Attentive Context Normalization (ACN), a simple yet effective technique to build permutation-equivariant networks robust to outliers. Specifically, we show how to normalize the feature maps with weights that are estimated within the network, excluding outliers from this normalization. We use this mechanism to leverage two types of attention: local and global – by combining them, our method is able to find the essential data points in high-dimensional space in order to solve a given task. We demonstrate through extensive experiments that our approach, which we call Attentive Context Networks (ACNe), provides a significant leap in performance compared to the state-of-the-art on camera pose estimation, robust fitting, and point cloud classification under noise and outliers. Source code: https://github.com/vcg-uvic/acne.
Posted Content•
GraphNorm: A Principled Approach to Accelerating Graph Neural Network Training

Tianle Cai1, Shengjie Luo2, Keyulu Xu3, Di He4, Tie-Yan Liu4, Liwei Wang2 •
Princeton University1, Peking University2, Massachusetts Institute of Technology3, Microsoft4
07 Sep 2020-arXiv: Learning
TL;DR: This paper proposes Graph Normalization (GraphNorm), a principled normalization method whose key idea is to normalize the feature values across all nodes of each individual graph with a learnable shift; it improves the generalization of GNNs, achieving better performance on graph classification benchmarks.
Abstract: Normalization is known to help the optimization of deep neural networks. Curiously, different architectures require specialized normalization methods. In this paper, we study what normalization is effective for Graph Neural Networks (GNNs). First, we adapt and evaluate the existing methods from other domains to GNNs. Faster convergence is achieved with InstanceNorm compared to BatchNorm and LayerNorm. We provide an explanation by showing that InstanceNorm serves as a preconditioner for GNNs, but such preconditioning effect is weaker with BatchNorm due to the heavy batch noise in graph datasets. Second, we show that the shift operation in InstanceNorm results in an expressiveness degradation of GNNs for highly regular graphs. We address this issue by proposing GraphNorm with a learnable shift. Empirically, GNNs with GraphNorm converge faster compared to GNNs using other normalization. GraphNorm also improves the generalization of GNNs, achieving better performance on graph classification benchmarks.
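The idea of normalizing across the nodes of a single graph with a learnable shift can be sketched as follows. This is an illustration under stated assumptions rather than the paper's code: the function name `graph_norm` is hypothetical, the learnable shift is assumed to be a scalar `alpha` that scales the subtracted mean, and only one feature dimension of one graph is handled.

```python
from statistics import fmean

def graph_norm(feature, alpha, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature dimension across all nodes of a single graph.

    feature: list of floats, one value per node of the graph.
    alpha:   assumed learnable shift weight; alpha = 1 subtracts the full mean
             (as in instance normalization), alpha = 0 subtracts nothing.
    gamma, beta: affine scale and bias.
    """
    mu = fmean(feature)
    shifted = [h - alpha * mu for h in feature]
    var = fmean([s * s for s in shifted])  # second moment of the shifted values
    return [gamma * s / (var + eps) ** 0.5 + beta for s in shifted]

# On a feature that is identical at every node (as can occur on highly regular
# graphs), a full mean shift wipes the signal out, while a smaller shift keeps it.
constant_feature = [2.0, 2.0, 2.0, 2.0]
full_shift = graph_norm(constant_feature, alpha=1.0)
no_shift = graph_norm(constant_feature, alpha=0.0)
```

In this sketch the full shift maps the constant feature to zeros, which is one way to read the expressiveness degradation mentioned in the abstract; letting `alpha` be learned allows the model to decide how much of the mean to remove.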
Journal Article•10.31181/DMAME2003149Z•
Objective methods for determining criteria weight coefficients: A modification of the CRITIC method

Mališa Žižović1, Boža D. Miljković2, Dragan Marinković3•
University of Kragujevac1, University of Novi Sad2, Technical University of Berlin3
01 Oct 2020
TL;DR: This paper presents a modification of the CRiteria Importance Through Intercriteria Correlation (CRITIC) method, an objective method for determining criteria weight coefficients; a new normalization process yields smaller deviations between normalized elements, which in turn lowers the standard deviation values.
Abstract: Determining criteria weight coefficients is a crucial step in multi-criteria decision making models. Therefore, this problem is given great attention in literature. This paper presents a new approach in modifying the CRiteria Importance Through Intercriteria Correlation (CRITIC) method, which falls under objective methods for determining criteria weight coefficients. Modifying the CRITIC method (CRITIC-M) entails changing the element normalization process of the initial decision matrix and changing data aggregation from the normalized decision matrix. By introducing a new normalization process, we achieve smaller deviations between normalized elements, which in turn causes lower values of standard deviation. Thus, the relationships between data in the initial decision matrix are presented in a more objective way. By introducing a new process of aggregation of weight coefficient values in the CRITIC-M method, a more comprehensive understanding of data in the initial decision matrix is made possible, leading to more objective values of weight coefficients. The presented CRITIC-M method has been tested in two examples, followed by a discussion of results via comparison to the classic CRITIC method.