Scispace (Formerly Typeset)
Showing papers on "Vector quantization" published in 2020
Proceedings Article•10.1109/ICASSP40776.2020.9054168•
Federated Learning with Quantization Constraints

Nir Shlezinger1, Mingzhe Chen2, Yonina C. Eldar1, H. Vincent Poor2, Shuguang Cui3 •
Weizmann Institute of Science1, Princeton University2, The Chinese University of Hong Kong3
4 May 2020
TL;DR: This work identifies the unique characteristics associated with conveying trained models over rate-constrained channels, characterizes a suitable quantization scheme for such setups, and shows that combining universal vector quantization methods with FL yields a decentralized training system that is both efficient and feasible.
Abstract: Traditional deep learning models are trained on centralized servers using labeled sample data collected from edge devices. This data often includes private information, which the users may not be willing to share. Federated learning (FL) is an emerging approach to train such learning models without requiring the users to share their possibly private labeled data. In FL, each user trains its copy of the learning model locally. The server then collects the individual updates and aggregates them into a global model. A major challenge that arises in this method is the need for each user to efficiently transmit its learned model over the throughput-limited uplink channel. In this work, we tackle this challenge using tools from quantization theory. In particular, we identify the unique characteristics associated with conveying trained models over rate-constrained channels, and characterize a suitable quantization scheme for such setups. We show that combining universal vector quantization methods with FL yields a decentralized training system, which is both efficient and feasible. We also derive theoretical performance guarantees of the system. Our numerical results illustrate the substantial performance gains of our scheme over FL with previously proposed quantization approaches.

136 citations

Proceedings Article•10.21437/INTERSPEECH.2020-1443•
VQVC+: One-shot voice conversion by vector quantization and U-Net architecture

Da-Yi Wu1, Yen-Hao Chen1, Hung-yi Lee1•
National Taiwan University1
7 Jun 2020
TL;DR: To further improve audio quality, this work uses the U-Net architecture within an auto-encoder-based VC system and finds that the strong information bottleneck U-Net requires can be provided by a VQ-based method that quantizes the latent vectors.
Abstract: Voice conversion (VC) is a task that transforms the source speaker's timbre, accent, and tones in audio into another one's while preserving the linguistic content. It remains a challenging task, especially in a one-shot setting. Auto-encoder-based VC methods disentangle the speaker and the content in input speech without being given the speaker's identity, so these methods can further generalize to unseen speakers. The disentanglement capability is achieved by vector quantization (VQ), adversarial training, or instance normalization (IN). However, imperfect disentanglement may harm the quality of output speech. In this work, to further improve audio quality, we use the U-Net architecture within an auto-encoder-based VC system. We find that to leverage the U-Net architecture, a strong information bottleneck is necessary. The VQ-based method, which quantizes the latent vectors, can serve the purpose. The objective and the subjective evaluations show that the proposed method performs well in both audio naturalness and speaker similarity.

116 citations

Proceedings Article•10.1109/ICASSP40776.2020.9053854•
One-Shot Voice Conversion by Vector Quantization

Da-Yi Wu1, Hung-yi Lee1•
National Taiwan University1
4 May 2020
TL;DR: This paper proposes a vector quantization (VQ) based one-shot voice conversion (VC) approach without any supervision on speaker labels. The approach has a strong ability to disentangle content and speaker information with reconstruction loss only, and one-shot VC is thus achieved.
Abstract: In this paper, we propose a vector quantization (VQ) based one-shot voice conversion (VC) approach without any supervision on speaker label. We model the content embedding as a series of discrete codes and take the difference between quantize-before and quantize-after vector as the speaker embedding. We show that this approach has a strong ability to disentangle the content and speaker information with reconstruction loss only, and one-shot VC is thus achieved.
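
The content/speaker split described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy codebook, the frame values, and the choice to average the residual over time to get a single speaker vector are all assumptions made for illustration.

```python
def nearest_code(vec, codebook):
    """Return the index of the codebook entry closest to vec (squared L2)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(vec, codebook[k]))


def split_content_speaker(frames, codebook):
    """Quantize every frame of encoder output.

    Content  = sequence of discrete code indices.
    Speaker  = difference between the vectors before and after quantization,
               averaged over frames (the averaging is an assumption here).
    """
    codes = [nearest_code(f, codebook) for f in frames]
    residuals = [
        [x - c for x, c in zip(frame, codebook[k])]
        for frame, k in zip(frames, codes)
    ]
    dim = len(frames[0])
    speaker = [sum(r[d] for r in residuals) / len(residuals) for d in range(dim)]
    return codes, speaker


# Hypothetical toy data: two codewords, three frames of 2-D encoder output.
codebook = [[0.0, 0.0], [1.0, 1.0]]
frames = [[0.25, 0.0], [1.25, 1.0], [0.25, 0.0]]
codes, speaker = split_content_speaker(frames, codebook)
print(codes, speaker)
```

In this toy case every frame sits at the same offset from its codeword, so the offset ends up in the speaker vector while the code sequence carries the frame-to-frame variation.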

108 citations

Proceedings Article•
Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech

David Harwath1, Wei-Ning Hsu1, James Glass2•
Massachusetts Institute of Technology1, Qatar Foundation2
30 Apr 2020
TL;DR: This paper presents a method for learning discrete linguistic units by incorporating vector quantization layers into neural models of visually grounded speech and shows that this method is capable of capturing both word-level and sub-word units, depending on how it is configured.
Abstract: In this paper, we present a method for learning discrete linguistic units by incorporating vector quantization layers into neural models of visually grounded speech. We show that our method is capable of capturing both word-level and sub-word units, depending on how it is configured. What differentiates this paper from prior work on speech unit learning is the choice of training objective. Rather than using a reconstruction-based loss, we use a discriminative, multimodal grounding objective which forces the learned units to be useful for semantic image retrieval. We evaluate the sub-word units on the ZeroSpeech 2019 challenge, achieving a 27.3% reduction in ABX error rate over the top-performing submission, while keeping the bitrate approximately the same. We also present experiments demonstrating the noise robustness of these units. Finally, we show that a model with multiple quantizers can simultaneously learn phone-like detectors at a lower layer and word-like detectors at a higher layer. We show that these detectors are highly accurate, discovering 279 words with an F1 score of greater than 0.5.

76 citations

Posted Content•
A Memory Efficient Baseline for Open Domain Question Answering.

Gautier Izacard1, Fabio Petroni1, Lucas Hosseini1, Nicola De Cao1, Sebastian Riedel1, Edouard Grave1 •
Facebook1
30 Dec 2020-arXiv: Computation and Language
TL;DR: This paper considers three strategies to reduce the index size of dense retriever-reader systems: dimension reduction, vector quantization and passage filtering, and shows that it is possible to get competitive systems using less than 6Gb of memory.
Abstract: Recently, retrieval systems based on dense representations have led to important improvements in open-domain question answering, and related tasks. While very effective, this approach is also memory intensive, as the dense vectors for the whole knowledge source need to be kept in memory. In this paper, we study how the memory footprint of dense retriever-reader systems can be reduced. We consider three strategies to reduce the index size: dimension reduction, vector quantization and passage filtering. We evaluate our approach on two question answering benchmarks: TriviaQA and NaturalQuestions, showing that it is possible to get competitive systems using less than 6Gb of memory.

51 citations

Journal Article•10.1109/TSP.2020.2983166•
High-Dimensional Stochastic Gradient Quantization for Communication-Efficient Edge Learning

Yuqing Du1, Sheng Yang2, Kaibin Huang1•
University of Hong Kong1, University of Paris2
30 Mar 2020-IEEE Transactions on Signal Processing
TL;DR: A novel framework of hierarchical gradient quantization that is proved to guarantee model convergence by analyzing the convergence rate as a function of quantization bits, and that substantially reduces the communication overhead compared with the state-of-the-art signSGD scheme.
Abstract: Edge machine learning involves the deployment of learning algorithms at the wireless network edge so as to leverage massive mobile data for enabling intelligent applications. The mainstream edge learning approach, federated learning, has been developed based on distributed gradient descent. Based on the approach, stochastic gradients are computed at edge devices and then transmitted to an edge server for updating a global AI model. Since each stochastic gradient is typically high-dimensional, communication overhead becomes a bottleneck for edge learning. To address this issue, we propose a novel framework of hierarchical gradient quantization and study its effect on the learning performance. First, the framework features a practical hierarchical architecture for decomposing the stochastic gradient into its norm and normalized block gradients, and efficiently quantizes them using a uniform quantizer and a low-dimensional Grassmannian codebook, respectively. Subsequently, the quantized normalized block gradients are scaled and cascaded to yield the quantized normalized stochastic gradient using a so-called hinge vector, which is compressed using another low-dimensional Grassmannian quantizer designed under the criterion of minimum distortion. The other feature of the framework is a bit-allocation scheme for reducing the distortion, which divides the total quantization bits to determine the resolutions of low-dimensional quantizers. The framework is proved to guarantee model convergence by analyzing the convergence rate as a function of quantization bits. Furthermore, by simulation, our design is shown to substantially reduce the communication overhead compared with the state-of-the-art signSGD scheme, while achieving similar learning accuracies.
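
A rough sketch of the norm/block decomposition in this abstract follows. It makes several simplifying assumptions that are not from the paper: the codebook is a set of evenly spaced unit vectors on a circle (a stand-in, not a Grassmannian design), the hinge vector is read as each block's share of the gradient norm and is left unquantized, and bit allocation is omitted. The names `circle_codebook` and `quantize_gradient` are hypothetical.

```python
import math


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def circle_codebook(size):
    """Evenly spaced 2-D unit vectors (illustrative stand-in codebook)."""
    return [
        [math.cos(2 * math.pi * k / size), math.sin(2 * math.pi * k / size)]
        for k in range(size)
    ]


def quantize_gradient(grad, codebook, step):
    """Quantize a nonzero gradient as (norm) x (hinge entry) x (block codeword)."""
    block_size = len(codebook[0])
    g_norm = norm(grad)
    q_norm = step * round(g_norm / step)  # uniform scalar quantizer on the norm
    blocks = [grad[i:i + block_size] for i in range(0, len(grad), block_size)]
    out = []
    for block in blocks:
        b_norm = norm(block)
        if b_norm == 0.0:
            out.extend([0.0] * block_size)
            continue
        unit = [x / b_norm for x in block]
        # Nearest codeword = largest inner product with the normalized block.
        code = max(codebook, key=lambda c: sum(a * b for a, b in zip(unit, c)))
        hinge = b_norm / g_norm  # this block's hinge entry, kept unquantized here
        out.extend(q_norm * hinge * c for c in code)
    return out


grad = [3.0, 4.0, -1.0, 2.0, 0.5, -0.5]
approx = quantize_gradient(grad, circle_codebook(16), step=0.25)
cosine = sum(a * b for a, b in zip(grad, approx)) / (norm(grad) * norm(approx))
print(approx, cosine)
```

With 16 codewords per 2-D block the reconstructed direction stays within a few degrees of the true gradient; a smaller codebook trades direction accuracy for fewer bits.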

44 citations

Proceedings Article•
And the Bit Goes Down: Revisiting the Quantization of Neural Networks

Pierre Stock1, Armand Joulin1, Rémi Gribonval, Benjamin Graham1, Hervé Jégou1 •
Facebook1
30 Apr 2020
TL;DR: In this article, a vector quantization method was proposed to reduce the memory footprint of convolutional network architectures by preserving the quality of the reconstruction of the network outputs rather than its weights.
Abstract: In this paper, we address the problem of reducing the memory footprint of convolutional network architectures. We introduce a vector quantization method that aims at preserving the quality of the reconstruction of the network outputs rather than its weights. The principle of our approach is that it minimizes the loss reconstruction error for in-domain inputs. Our method only requires a set of unlabelled data at quantization time and allows for efficient inference on CPU by using byte-aligned codebooks to store the compressed weights. We validate our approach by quantizing a high performing ResNet-50 model to a memory size of 5MB (20x compression factor) while preserving a top-1 accuracy of 76.1% on ImageNet object classification and by compressing a Mask R-CNN with a 26x factor.

43 citations

Journal Article•10.1109/JSTSP.2020.2975903•
Universal Deep Neural Network Compression

Yoojin Choi1, Mostafa El-Khamy1, Jungwon Lee1•
Samsung1
24 Feb 2020-IEEE Journal of Selected Topics in Signal Processing
TL;DR: This paper compresses deep neural networks (DNNs) for memory-efficient deployment using universal vector quantization and universal source coding. The universal lattice quantizer can perform near-optimally on any source without knowledge of the source distribution.
Abstract: We consider compression of deep neural networks (DNNs) by weight quantization and lossless source coding for memory-efficient deployment. Whereas the previous work addressed non-universal scalar quantization and entropy source coding, we for the first time introduce universal DNN compression by universal vector quantization and universal source coding. In particular, the proposed scheme utilizes universal lattice quantization, which randomizes the source by uniform random dithering before lattice quantization and can perform near-optimally on any source without relying on knowledge of the source distribution. Moreover, we present a method of fine-tuning vector quantized DNNs to recover any accuracy loss due to quantization. From our experiments, we show that the proposed scheme compresses the MobileNet and ShuffleNet models trained on ImageNet with the state-of-the-art compression ratios of 10.7 and 8.8, respectively.
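
The "uniform random dithering before lattice quantization" step can be sketched as below. This is only an illustration under stated assumptions: it uses the one-dimensional lattice of integer multiples of `step` rather than a higher-dimensional lattice, and it assumes encoder and decoder regenerate the same dither from a shared seed. Function names are hypothetical.

```python
import random


def dithered_quantize(weights, step, seed):
    """Add uniform dither in [-step/2, step/2], then round to the lattice step*Z."""
    rng = random.Random(seed)
    dither = [rng.uniform(-step / 2, step / 2) for _ in weights]
    return [round((w + d) / step) for w, d in zip(weights, dither)]


def dequantize(indices, step, seed):
    """Regenerate the same dither from the shared seed and subtract it."""
    rng = random.Random(seed)
    dither = [rng.uniform(-step / 2, step / 2) for _ in indices]
    return [k * step - d for k, d in zip(indices, dither)]


weights = [0.031, -0.42, 0.77, 0.005, -0.13]
step = 0.1
indices = dithered_quantize(weights, step, seed=0)
restored = dequantize(indices, step, seed=0)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(indices, max_error)
```

The integer indices are what a lossless source coder would then compress; the reconstruction error of each weight is bounded by half the step size regardless of how the weights are distributed.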

31 citations

Journal Article•10.1016/J.MEASUREMENT.2019.107369•
Child emotion recognition using probabilistic neural network with effective features

Mihir Narayan Mohanty1, Hemanta Kumar Palo1•
Siksha O Anusandhan University1
01 Feb 2020-Measurement
TL;DR: This work proposes a feature reduction mechanism that combines Vector Quantization (VQ) with eigenvalue decomposition for effective feature utility. The lower-order eigen components are more informative than the principal components and are used to analyze children's speech emotions.

30 citations

Journal Article•10.1155/2020/8821868•
Multipose Face Recognition-Based Combined Adaptive Deep Learning Vector Quantization

Shahenda Sarhan1,2, Aida A. Nasr3, Mahmoud Shams3•
Mansoura University1, King Abdulaziz University2, Kafrelsheikh University3
24 Sep 2020-Computational Intelligence and Neuroscience
TL;DR: The proposed classifier addresses the weaknesses of adaptive deep learning vector quantization classifiers by using the majority voting algorithm with the speeded-up robust feature extractor, and provides promising results in terms of sensitivity, specificity, precision, and accuracy compared to recent approaches in deep learning, statistical, and classical neural networks.
Abstract: Multipose face recognition is one of the recent challenges faced by researchers interested in security applications. Several studies have addressed improving the accuracy of multipose face recognition by enhancing the face detector, such as Viola-Jones, Real Adaboost, and Cascade Object Detector, while others have concentrated on recognition systems such as support vector machines and deep convolutional neural networks. In this paper, a combined adaptive deep learning vector quantization (CADLVQ) classifier is proposed. The proposed classifier addresses the weaknesses of adaptive deep learning vector quantization classifiers by using the majority voting algorithm with the speeded-up robust feature extractor. Experimental results indicate that the proposed classifier provides promising results in terms of sensitivity, specificity, precision, and accuracy compared to recent approaches in deep learning, statistical, and classical neural networks. Finally, the comparison is empirically performed using a confusion matrix to ensure the reliability and robustness of the proposed system compared to the state of the art.

29 citations

Journal Article•10.1609/AAAI.V34I04.6108•
Vector Quantization-Based Regularization for Autoencoders

Hanwei Wu1, Markus Flierl1•
Royal Institute of Technology1
3 Apr 2020
TL;DR: This paper introduces a quantization-based regularizer in the bottleneck stage of autoencoder models to learn meaningful latent representations and shows that the proposed regularization method results in improved latent representations for both supervised learning and clustering downstream tasks when compared to autoencoders using other bottleneck structures.
Abstract: Autoencoders and their variations provide unsupervised models for learning low-dimensional representations for downstream tasks. Without proper regularization, autoencoder models are susceptible to the overfitting problem and the so-called posterior collapse phenomenon. In this paper, we introduce a quantization-based regularizer in the bottleneck stage of autoencoder models to learn meaningful latent representations. We combine both perspectives of Vector Quantized-Variational AutoEncoders (VQ-VAE) and classical denoising regularization methods of neural networks. We interpret quantizers as regularizers that constrain latent representations while fostering a similarity-preserving mapping at the encoder. Before quantization, we impose noise on the latent codes and use a Bayesian estimator to optimize the quantizer-based representation. The introduced bottleneck Bayesian estimator outputs the posterior mean of the centroids to the decoder, and thus, is performing soft quantization of the noisy latent codes. We show that our proposed regularization method results in improved latent representations for both supervised learning and clustering downstream tasks when compared to autoencoders using other bottleneck structures.
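
The soft quantization described here (passing the posterior mean of the centroids to the decoder) can be sketched as follows. The sketch assumes a uniform prior over centroids and isotropic Gaussian noise with standard deviation `sigma`; these modeling choices and the name `soft_quantize` are illustrative assumptions, not details confirmed by the abstract.

```python
import math


def soft_quantize(z_noisy, centroids, sigma):
    """Posterior mean of the centroids given a noisy latent code.

    Under a uniform prior and Gaussian noise, the posterior weight of each
    centroid is a softmax over negative squared distances / (2 * sigma^2).
    """
    logits = [
        -sum((z - c) ** 2 for z, c in zip(z_noisy, centroid)) / (2 * sigma ** 2)
        for centroid in centroids
    ]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(z_noisy)
    return [
        sum(w * centroid[d] for w, centroid in zip(weights, centroids))
        for d in range(dim)
    ]


centroids = [[0.0, 0.0], [2.0, 0.0]]
midpoint = soft_quantize([1.0, 0.0], centroids, sigma=1.0)
near_first = soft_quantize([0.1, 0.0], centroids, sigma=0.1)
print(midpoint, near_first)
```

A latent code equidistant from both centroids maps to their average, while a code close to one centroid with small noise is mapped almost exactly onto that centroid, which recovers hard quantization in the low-noise limit.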
Journal Article•10.1007/S11042-018-6358-X•
Virtual home assistant for voice based controlling and scheduling with short speech speaker identification

Varun Tiwari1, Mohammad Farukh Hashmi, Avinash G. Keskar1, N. C. Shivaprakash2•
Visvesvaraya National Institute of Technology1, Indian Institute of Science2
01 Feb 2020-Multimedia Tools and Applications
TL;DR: A cloud-connected voice based home assistant that accepts voice commands to control or monitor devices in a home through a simple voice based approach and is designed to identify the speakers.
Abstract: With the advancement of interface technologies in smart devices, voice-controlled assistants have quickly gained popularity. These assistants are designed to use voice commands to achieve a more human-friendly interaction. Along these lines, we propose a cloud-connected voice-based home assistant in this paper. It accepts voice commands to control or monitor devices in a home. It can understand and schedule device operations based on time or sensor data through a simple voice-based approach. To enhance its capability, it is designed to identify the speakers. Mel-Frequency Cepstrum Coefficients (MFCC) in combination with other speech features are used as the feature vector. We use Vector Quantization (VQ) and Principal Component Analysis (PCA) for dimensionality reduction of the feature vector, followed by a Gaussian Mixture Model (GMM) for classification. The validation of the short speech speaker identification is carried out on a set of Indian speakers in an uncontrolled indoor environment. An accuracy greater than 92% is achieved for speech samples as short as 1 second. A database of more than 50 different commands per speaker is also created for validation of the proposed virtual assistant. IBM’s Bluemix and Google’s cloud service are used for speech-to-text conversion.
Journal Article•10.1109/TIP.2020.2984357•
Gaussian Lifting for Fast Bilateral and Nonlocal Means Filtering

Sean I. Young1, Bernd Girod1, David Taubman2•
Stanford University1, University of New South Wales2
13 Apr 2020-IEEE Transactions on Image Processing
TL;DR: This work proposes the Gaussian lifting framework for efficient and accurate bilateral and nonlocal means filtering, appealing to the similarities between separable wavelet transforms and Gaussian pyramids and shows that it filters images more accurately and efficiently across many filter scales.
Abstract: Recently, many fast implementations of the bilateral and the nonlocal filters were proposed based on lattice and vector quantization, e.g. clustering, in higher dimensions. However, these approaches can still be inefficient owing to the complexities in the resampling process or in filtering the high-dimensional resampled signal. In contrast, simply scalar resampling the high-dimensional signal after decorrelation presents the opportunity to filter signals using multi-rate signal processing techniques. This work proposes the Gaussian lifting framework for efficient and accurate bilateral and nonlocal means filtering, appealing to the similarities between separable wavelet transforms and Gaussian pyramids. Accurately implementing the filter is important not only for image processing applications, but also for a number of recently proposed bilateral-regularized inverse problems, where the accuracy of the solutions depends ultimately on accurate filter implementations. We show that our Gaussian lifting approach filters images more accurately and efficiently across many filter scales. Adaptive lifting schemes for bilateral and nonlocal means filtering are also explored.
Journal Article•10.1016/J.INS.2019.06.038•
Smoothed self-organizing map for robust clustering

Pierpaolo D'Urso1, Livia De Giovanni2, Riccardo Massari1•
Sapienza University of Rome1, Libera Università Internazionale degli Studi Sociali Guido Carli2
01 Feb 2020-Information Sciences
TL;DR: S-SOM improves the properties of input density mapping, vector quantization, and clustering of the standard SOM in the presence of outliers by upgrading the learning rule in order to smooth the representation of outlying input vectors onto the map.
Journal Article•10.1109/TPAMI.2019.2906207•
Learning of Gaussian Processes in Distributed and Communication Limited Systems

Mostafa Tavassolipour1, Seyed Abolfazl Motahari1, Mohammad Taghi Manzuri Shalmani1•
Sharif University of Technology1
01 Aug 2020-IEEE Transactions on Pattern Analysis and Machine Intelligence
TL;DR: In this article, the authors consider learning of Gaussian Processes in distributed systems and propose a vector quantization scheme to estimate the inner products of some Gaussian vectors across distributed machines.
Abstract: It is of fundamental importance to find algorithms obtaining optimal performance for learning of statistical models in distributed and communication-limited systems. Aiming at characterizing the optimal strategies, we consider learning of Gaussian Processes (GP) in distributed systems as a pivotal example. We first address a very basic problem: how many bits are required to estimate the inner products of some Gaussian vectors across distributed machines? Using information-theoretic bounds, we obtain an optimal solution for the problem which is based on vector quantization. Two suboptimal and more practical schemes are also presented as substitutes for the vector quantization scheme. In particular, it is shown that the performance of one of the practical schemes, called per-symbol quantization, is very close to the optimal one. Schemes provided for the inner-product calculations are incorporated into our proposed distributed learning methods for GPs. Experimental results show that, by spending only a few bits per symbol in our communication scheme, our proposed methods outperform previous zero-rate distributed GP learning schemes such as Bayesian Committee Model (BCM) and Product of Experts (PoE).
Journal Article•10.1109/TIP.2019.2936097•
A Framework of Reversible Color-to-Grayscale Conversion With Watermarking Feature

Yuk-Hee Chan1, Zi-Xin Xu1, Daniel P. K. Lun1•
Hong Kong Polytechnic University1
01 Jan 2020-IEEE Transactions on Image Processing
TL;DR: This paper develops an information-embedding framework based on a recently proposed vector quantization-based (VQ-based) RCGC algorithm, and proposes a palette generation algorithm to support the information embedding process such that the visual quality of the color-embedded grayscale images and the reconstructed color images can be significantly improved.
Abstract: Reversible color-to-grayscale conversion (RCGC) is a method that embeds the chromatic information of a full color image into its grayscale version such that the original color image can be reconstructed in the future when necessary. In practical applications, it is required to provide a means to authenticate an information-embedded image such that its integrity can be guaranteed. However, none of the current RCGC algorithms take this factor into account. In this paper, to address this issue, we develop an information-embedding framework based on a vector quantization-based (VQ-based) RCGC algorithm recently proposed by us. Under this framework, we propose an RCGC algorithm that can embed both chromatic information and a fragile watermark simultaneously into a grayscale image with the same technique to reduce the complexity and improve the efficiency. Like other VQ-based RCGC algorithms, the performance of the proposed RCGC algorithm highly relies on the palette it uses. We also propose a palette generation algorithm in this paper to support the information embedding process such that the visual quality of the color-embedded grayscale images and the reconstructed color images can be significantly improved.
Journal Article•10.1109/LSP.2019.2961610•
Fast Steganalysis Method for VoIP Streams

Hao Yang1, Zhongliang Yang1, YongJian Bao1, Sheng Liu1, Yongfeng Huang1 •
Tsinghua University1
01 Jan 2020-IEEE Signal Processing Letters
TL;DR: This letter presents a fast steganalysis method for voice over IP (VoIP) streams, driven by the need for quick and accurate detection of possible steganography in VoIP streams.
Abstract: In this letter, we present a novel and extremely fast steganalysis method for voice over IP (VoIP) streams, driven by the need for a quick and accurate detection of possible steganography in VoIP streams. We first analyzed the correlations in carriers. To better exploit the correlations in code-words, we mapped vector quantization code-words into a semantic space. In order to achieve high detection efficiency, only one hidden layer was utilized to extract the correlations between these code-words. Finally, based on the extracted correlation features, we used the softmax classifier to categorize the input stream carriers. To boost the performance of this proposed model, we incorporate a simple knowledge distillation framework into the training process. Experimental results show that the proposed method achieves state-of-the-art performance both in detection accuracy and efficiency. In particular, the processing time of this method on average is only about 0.05% when sample length is as short as 0.1 s, attaching strong practical value to online serving of steganography monitor.
Journal Article•10.1016/J.IFACOL.2020.12.006•
Convergence of Stochastic Vector Quantization and Learning Vector Quantization with Bregman Divergences

Christos N. Mavridis1, John S. Baras1•
University of Maryland, College Park1
01 Jan 2020-IFAC-PapersOnLine
TL;DR: The theory of stochastic approximation is employed to study the conditions on the initialization and the Bregman divergence generating functions under which the algorithms converge to desired configurations, formally supporting the use of Bregman divergences, such as the Kullback-Leibler divergence, in vector quantization algorithms.
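
As background for the claim that the Kullback-Leibler divergence is a Bregman divergence, the short sketch below evaluates the general definition d(x, y) = phi(x) - phi(y) - <grad phi(y), x - y> with the negative-entropy generating function phi(x) = sum_i x_i log x_i and checks that it matches the KL divergence on probability vectors. It illustrates the definition only and is not the paper's algorithm; the function names are hypothetical.

```python
import math


def neg_entropy(x):
    """Generating function phi(x) = sum_i x_i * log(x_i), for positive x."""
    return sum(v * math.log(v) for v in x)


def neg_entropy_grad(x):
    """Gradient of phi: d/dx_i = log(x_i) + 1."""
    return [math.log(v) + 1.0 for v in x]


def bregman(x, y, phi, grad_phi):
    """Bregman divergence d(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    inner = sum(g * (a - b) for g, a, b in zip(grad_phi(y), x, y))
    return phi(x) - phi(y) - inner


def kl(p, q):
    """Kullback-Leibler divergence between two probability vectors."""
    return sum(a * math.log(a / b) for a, b in zip(p, q))


p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]
d_bregman = bregman(p, q, neg_entropy, neg_entropy_grad)
d_kl = kl(p, q)
print(d_bregman, d_kl)
```

Swapping in phi(x) = sum_i x_i^2 with gradient 2x would give the squared Euclidean distance used by standard vector quantization, which is why one framework can cover both cases.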
Proceedings Article•10.1109/ICISCT50599.2020.9351483•
Optimization of identification of micro-objects based on the use of characteristics of images and properties of models

Jumanov Isroil Ibragimovich1, Djumanov Olimjon Isroilovich1, Safarov Rustam Abdullayevich1•
Samarkand State University1
4 Nov 2020
TL;DR: In this paper, a methodology has been developed for optimizing the identification of micro-objects based on the use of dynamic models, neural networks (NN) of various topologies, synthesis of mechanisms for extracting statistical, dynamic, specific characteristics of images, selecting and segmenting a contour, selecting reference points, reducing redundant points, and setting variables.
Abstract: A methodology has been developed for optimizing the identification of micro-objects based on the use of dynamic models, neural networks (NN) of various topologies, synthesis of mechanisms for extracting statistical, dynamic, specific characteristics of images, selecting and segmenting a contour, selecting reference points, reducing redundant points, and setting variables. Identification mechanisms based on statistical relationships, many points and dynamics of change, formalization of the coordinate matrices of distorted points, and approximations during the deformation of a sequence of segments of stationary contour sections are proposed. A comparative analysis of the effectiveness of tools for preliminary image processing, recognition, and classification on the examples of pollen grains is carried out. Modified component circuits and adaptive learning algorithms of the Kohonen NN are presented. A software package for visualization, recognition, and classification of images of pollen grains was developed, and a hybrid identification model was implemented with non-linear effects of factors and conditions of a priori insufficiency and uncertainty of parameters. The implemented software package is based on a three-layer NN with forward and backward propagation, learning algorithms with and without a teacher, a modified Kohonen network with vector quantization, clustering, segmentation, and the formation of a “sliding window”. The mechanisms of image identification in the presence of “noise”, error filtering, and neural network approximation of the contour curve of micro-object images are investigated.
Journal Article•10.12928/TELKOMNIKA.V18I5.13717•
Gender voice classification with huge accuracy rate

Mustafa Sahib Shareef, Thulfiqar Abd, Yaqeen S. Mezaal
01 Oct 2020-TELKOMNIKA Telecommunication Computing Electronics and Control
TL;DR: This study investigates speech signals to devise a gender classifier by speech analysis to forecast the gender of the speaker by investigating diverse parameters of the voice sample through Mel frequency cepstrum coefficient, vector quantization, and machine learning algorithm.
Abstract: Gender voice recognition is an important research field in acoustics and speech processing, as the human voice shows very remarkable aspects. This study investigates speech signals to devise a gender classifier by speech analysis to forecast the gender of the speaker by investigating diverse parameters of the voice sample. The database has 2270 voice samples of celebrities, both male and female. Through Mel frequency cepstrum coefficient (MFCC), vector quantization (VQ), and a machine learning algorithm (J 48), an accuracy of about 100% is achieved by the proposed classification technique based on data mining and Java script.
Proceedings Article•10.21437/INTERSPEECH.2020-1785•
Unsupervised Acoustic Unit Representation Learning for Voice Conversion Using WaveNet Auto-Encoders.

Mingjie Chen1, Thomas Hain1•
University of Sheffield1
25 Oct 2020
TL;DR: In this article, a WaveNet auto-encoder is used to generate waveform data directly from the latent representation, and the low complexity of latent representations is improved with two alternative disentanglement learning methods, namely instance normalization and sliced vector quantization.
Abstract: Unsupervised representation learning of speech has been of keen interest in recent years, as is evident, for example, in the wide interest in the ZeroSpeech challenges. This work presents a new method for learning frame-level representations based on WaveNet auto-encoders. Of particular interest in the ZeroSpeech Challenge 2019 were models with discrete latent variables such as the Vector Quantized Variational Auto-Encoder (VQVAE). However, these models generate speech with relatively poor quality. In this work we aim to address this with two approaches: first, WaveNet is used as the decoder to generate waveform data directly from the latent representation; second, the low complexity of latent representations is improved with two alternative disentanglement learning methods, namely instance normalization and sliced vector quantization. The method was developed and tested in the context of the recent ZeroSpeech challenge 2020. The system output submitted to the challenge obtained the top position for naturalness (Mean Opinion Score 4.06), top position for intelligibility (Character Error Rate 0.15), and third position for the quality of the representation (ABX test score 12.5). These results and further analysis in this paper illustrate that the quality of the converted speech and the acoustic unit representation can be well balanced.
Journal Article•10.1016/J.PATREC.2019.11.011•
A bag of constrained informative deep visual words for image retrieval

Anindita Mukherjee, Jaya Sil1, Abhimanyu Sahu2, Ananda S. Chowdhury2•
Indian Institute of Engineering Science and Technology, Shibpur1, Jadavpur University2
01 Jan 2020-Pattern Recognition Letters
TL;DR: A bag (histogram) of constrained informative visual words is developed for image retrieval using the Linear-time Constrained Vector Quantization Error (LCVQE), a fast yet accurate constrained K-means algorithm.
Journal Article•10.1109/TPAMI.2019.2925347•
Asymmetric Mapping Quantization for Nearest Neighbor Search

Weixiang Hong1, Xueyan Tang2, Jingjing Meng3, Junsong Yuan3•
National University of Singapore1, Nanyang Technological University2, University at Buffalo3
01 Jul 2020-IEEE Transactions on Pattern Analysis and Machine Intelligence
TL;DR: This paper proposes a novel addition-based vector quantization algorithm, Asymmetric Mapping Quantization (AMQ), to efficiently conduct ANN search, and Distributed Asymmetric Mapping Quantization (DAMQ), which enables AMQ to work on very large datasets through distributed learning.
Abstract: Nearest neighbor search is a fundamental problem in computer vision and machine learning. The straightforward solution, linear scan, is both computationally and memory intensive in large-scale, high-dimensional cases, and hence is not preferable in practice. Therefore, there has been a lot of interest in algorithms that perform approximate nearest neighbor (ANN) search. In this paper, we propose a novel addition-based vector quantization algorithm, Asymmetric Mapping Quantization (AMQ), to efficiently conduct ANN search. Unlike existing addition-based quantization methods, which struggle with the problem caused by the norm of the database vector, we map the query vector and database vector using different mapping functions to transform the computation of L-2 distance into inner product similarity, and thus do not need to evaluate the norm of the database vector. Moreover, we further propose Distributed Asymmetric Mapping Quantization (DAMQ) to enable AMQ to work on very large datasets by distributed learning. Extensive experiments on approximate nearest neighbor search and image retrieval validate the merits of the proposed AMQ and DAMQ.
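Note: the general idea of mapping queries and database vectors differently, so that an L-2 ranking becomes an inner-product ranking, can be illustrated with a textbook identity. The sketch below is not the AMQ mapping from the paper (which the abstract does not specify); the function names are hypothetical, and this simple variant still stores the database norm as an extra coordinate.

```python
import numpy as np

def map_database(x):
    # Hypothetical database-side mapping: append the squared norm.
    return np.concatenate([x, [x @ x]])

def map_query(q):
    # Hypothetical query-side mapping: scale by -2 and append a constant 1.
    return np.concatenate([-2.0 * q, [1.0]])

def inner_product_scores(query, database):
    # <map_query(q), map_database(x)> = ||q - x||^2 - ||q||^2,
    # so sorting by this score sorts by L-2 distance to the query.
    mapped_db = np.stack([map_database(x) for x in database])
    return mapped_db @ map_query(query)
```

Because the score differs from the squared distance only by the query-dependent constant ||q||^2, the nearest neighbor is the database vector with the smallest score.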
Journal Article•10.1109/TNNLS.2019.2935502•
Probability Density Rank-Based Quantization for Convex Universal Learning Machines

Zhengda Qin1, Badong Chen1, Yuantao Gu2, Nanning Zheng1, Jose C. Principe3 •
Xi'an Jiaotong University1, Tsinghua University2, University of Florida3
01 Aug 2020-IEEE Transactions on Neural Networks
TL;DR: An efficient quantization method called Probability density Rank-based Quantization (PRQ) is proposed to decrease the computational complexity of CULMs; it keeps the data distribution of the code book similar to that of the data set and reduces the computational cost by using a kd-tree.
Abstract: The distributions of input data are very important for learning machines, such as the convex universal learning machines (CULMs). The CULMs are a family of universal learning machines with convex optimization. However, the computational complexity is a crucial problem in CULMs, because the dimension of the nonlinear mapping layer (the hidden layer) of the CULMs is usually rather large in complex system modeling. In this article, we propose an efficient quantization method called Probability density Rank-based Quantization (PRQ) to decrease the computational complexity of CULMs. The PRQ ranks the data according to the estimated probability densities and then selects a subset whose elements are equally spaced in the ranked data sequence. We apply the PRQ to kernel ridge regression (KRR) and random Fourier feature recursive least squares (RFF-RLS), which are two typical algorithms of CULMs. The proposed method not only keeps the similarity of data distribution between the code book and data set but also reduces the computational cost by using the kd-tree. Meanwhile, for a given data set, the method yields deterministic quantization results, and it can also exclude the outliers and avoid too many borders in the code book. This brings great convenience to practical applications of the CULMs. The proposed PRQ is evaluated on several real-world benchmark data sets. Experimental results show satisfactory performance of PRQ compared with some state-of-the-art methods.
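Note: the rank-and-select step described in the abstract can be sketched roughly as follows. This is a minimal illustration only, assuming a brute-force Gaussian kernel density estimate and a hypothetical `bandwidth` parameter; it omits the kd-tree speed-up and the outlier handling mentioned in the abstract, and the function names are not from the paper.

```python
import numpy as np

def estimate_density(data, bandwidth=0.5):
    # Brute-force Gaussian kernel density estimate at every data point
    # (unnormalized; only the ranking of the values matters here).
    sq_dists = ((data[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2)).mean(axis=1)

def rank_based_codebook(data, codebook_size, bandwidth=0.5):
    # Rank the samples by estimated density, then pick samples that are
    # equally spaced in the ranked sequence.
    density = estimate_density(data, bandwidth)
    order = np.argsort(density, kind="stable")
    positions = np.linspace(0, len(data) - 1, codebook_size).round().astype(int)
    return data[order[positions]]
```

For a fixed data set the selection involves no random initialization, so repeated calls return the same code book, which matches the deterministic behavior the abstract describes.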
Posted Content•
Vector-quantized neural networks for acoustic unit discovery in the ZeroSpeech 2020 challenge

Benjamin van Niekerk1, Leanne Nortje, Herman Kamper1•
Stellenbosch University1
19 May 2020-arXiv: Audio and Speech Processing
TL;DR: In this article, vector quantization is used to map continuous features to a finite set of codes for acoustic unit discovery, which can separate phonetic content from speaker-specific details.
Abstract: In this paper, we explore vector quantization for acoustic unit discovery. Leveraging unlabelled data, we aim to learn discrete representations of speech that separate phonetic content from speaker-specific details. We propose two neural models to tackle this challenge - both use vector quantization to map continuous features to a finite set of codes. The first model is a type of vector-quantized variational autoencoder (VQ-VAE). The VQ-VAE encodes speech into a sequence of discrete units before reconstructing the audio waveform. Our second model combines vector quantization with contrastive predictive coding (VQ-CPC). The idea is to learn a representation of speech by predicting future acoustic units. We evaluate the models on English and Indonesian data for the ZeroSpeech 2020 challenge. In ABX phone discrimination tests, both models outperform all submissions to the 2019 and 2020 challenges, with a relative improvement of more than 30%. The models also perform competitively on a downstream voice conversion task. Of the two, VQ-CPC performs slightly better in general and is simpler and faster to train. Finally, probing experiments show that vector quantization is an effective bottleneck, forcing the models to discard speaker information.
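Note: the basic operation shared by both models, mapping continuous features to a finite set of codes, amounts to a nearest-neighbor lookup in a codebook. The sketch below shows only that lookup, assuming a fixed codebook and squared Euclidean distance; the function name is hypothetical, and the training of the encoder and codebook in VQ-VAE or VQ-CPC is not covered.

```python
import numpy as np

def quantize(features, codebook):
    # features: (T, D) continuous frame-level features
    # codebook: (K, D) code vectors
    # Squared Euclidean distance from every frame to every code vector.
    sq_dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    codes = sq_dists.argmin(axis=1)   # (T,) discrete unit indices
    return codes, codebook[codes]     # indices and the quantized features
```

The returned index sequence is the discrete unit representation; the quantized features are what a decoder would receive in place of the continuous ones.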
Journal Article•10.1007/S00034-019-01152-8•
Lossless Compression of CT Images by an Improved Prediction Scheme Using Least Square Algorithm

Subbiahpillai Neelakantapillai Kumar1, A. Lenin Fred, H. Ajay Kumar, P. Sebastin Varghese•
Sathyabama University1
01 Feb 2020-Circuits Systems and Signal Processing
TL;DR: A prediction-based lossless compression algorithm using least square approach is proposed for the compression of CT images and was found to be efficient and tested on DICOM abdomen CT datasets.
Abstract: The storage and transmission of medical data such as CT/MR DICOM images are an essential part of the telemedicine application. In this paper, a prediction-based lossless compression algorithm using least square approach is proposed for the compression of CT images. Prior to compression, the preprocessing was performed by neutrosophic median filter. The gradient adjusted prediction scheme was employed for the determination of prediction coefficients, and polynomial least square fitting approach was used for optimal selection of prediction coefficients. The selected prediction coefficients are finally encoded by Huffman coder for transmission. The quality of the reconstructed image was validated by performance metrics and compared with other compression techniques like JPEG, contextual vector quantization and vector quantization using bat optimization (BAT-VQ). The proposed neutrosophic set-based least square compression algorithm was found to be efficient and tested on DICOM abdomen CT datasets. The hardware implementation was done by Raspberry Pi processor using Java platform for transferring the data through cloud network for telemedicine application.
Posted Content•
A Modular Neural Network Based Deep Learning Approach for MIMO Signal Detection.

Songyan Xue, Yi Ma, Na Yi, Terence E. Dodgson
01 Apr 2020-arXiv: Signal Processing
TL;DR: It is revealed that artificial neural network (ANN) assisted multiple-input multiple-output (MIMO) signal detection can be modeled as ANN-assisted lossy vector quantization (VQ), named MIMO-VQ, which is basically a joint statistical channel quantization and signal quantization procedure.
Abstract: In this paper, we reveal that artificial neural network (ANN) assisted multiple-input multiple-output (MIMO) signal detection can be modeled as ANN-assisted lossy vector quantization (VQ), named MIMO-VQ, which is basically a joint statistical channel quantization and signal quantization procedure. It is found that the quantization loss increases linearly with the number of transmit antennas, and thus MIMO-VQ scales poorly with the size of MIMO. Motivated by this finding, we propose a novel modular neural network based approach, termed MNNet, where the whole network is formed by a set of pre-defined ANN modules. The key to the ANN module design lies in the integration of parallel interference cancellation in the MNNet, which linearly reduces the interference (or equivalently the number of transmit antennas) along the feed-forward propagation, and hence the quantization loss. Our simulation results show that the MNNet approach largely improves the deep-learning capacity with near-optimal performance in various cases. Provided that MNNet is well modularized, the learning procedure does not need to be applied to the entire network as a whole, but rather at the modular level. For this reason, MNNet has the advantage of much lower learning complexity than other deep-learning based MIMO detection approaches.
Journal Article•10.1109/TPWRD.2020.2964431•
Improved Transient Data Compression Algorithm Based on Wavelet Spectral Quantization Models

Francisco Assis de Oliveira Nascimento1, Raimundo G. Saraiva1, Jorge Cormane1•
University of Brasília1
06 Jan 2020-IEEE Transactions on Power Delivery
TL;DR: A dedicated encoding algorithm is presented for current and voltage transient signals digitalized from electrical networks and a parametric configuration for optimization algorithm is proposed for the fixed spectral profile models in order to improve performance.
Abstract: In this work a dedicated encoding algorithm is presented for current and voltage transient signals digitalized from electrical networks. New models for dynamic bit allocation based on adaptive and fixed spectral envelope estimation models for transformed coefficients are proposed. The fixed profiles adjust the number of bits to be used by the wavelet transform decomposition level, which is used in the transformed vector quantization. The models proposed to estimate the adaptive spectral envelope have the transformed spectrum segmented by subbands or by decomposition levels of the wavelet transform. The dynamic bit allocation is implemented according to the signal spectral behavior. The quantized coefficient vector is encoded using an entropy coder, and it is asynchronously packaged in a compressed file. A parametric configuration for optimization algorithm is also proposed for the fixed spectral profile models in order to improve performance. Simulation results are presented using a signal data bank with a set of reported events of electric power networks. Performance comparisons with other work are also presented.
Book Chapter•10.1007/978-3-030-61616-8_16•
Hopfield Networks for Vector Quantization

Christian Bauckhage, Rajkumar Ramamurthy, Rafet Sifa
15 Sep 2020
TL;DR: This work considers the problem of finding representative prototypes within a set of data and solves it using Hopfield networks that minimize the mean discrepancy between kernel density estimates of the distributions of data points and prototypes; the result suggests that vector quantization can be accomplished via adiabatic quantum computing.
Abstract: We consider the problem of finding representative prototypes within a set of data and solve it using Hopfield networks. Our key idea is to minimize the mean discrepancy between kernel density estimates of the distributions of data points and prototypes. We show that this objective can be cast as a quadratic unconstrained binary optimization problem which is equivalent to a Hopfield energy minimization problem. This result is of current interest as it suggests that vector quantization can be accomplished via adiabatic quantum computing.
Proceedings Article•10.1109/RUSAUTOCON49822.2020.9208164•
Optimization of Identification of Images of Micro-Objects Taking Into Account Systematic Error Based on Neural Networks

Isroil I. Jumanov1, Olimjon I. Djumanov1, Rustam A. Safarov1•
Samarkand State University1
1 Sep 2020
TL;DR: The results of image identification in the presence of "noise", optimization based on filtering systematic error and NN extrapolation of the trend of the contour curve of the images of pollen grains were obtained.
Abstract: A methodology has been developed for optimizing the identification of micro-objects based on the use of neural networks (NN) of various topologies, synthesis of image processing mechanisms, extracting statistical, dynamic, specific characteristics, selecting and segmenting a contour, selecting reference points and reducing redundant points, taking into account systematic error factors, choosing an adequate model, setting variables and optimization. Methods and algorithms for determined and multivariate analysis, obtaining the coefficients of influence and elasticity of factors, and approximating the contours represented by time series are proposed. Component schemes of the NN and training algorithms were modified, a software package (SP) for visualization, recognition, and classification of images of pollen grains was developed, and a hybrid identification model was implemented that takes into account the non-linearity of the effects of factors under the condition of a priori insufficiency and uncertainty of parameters. The efficiency of the SP was studied on the basis of a three-layer NN of forward and backward propagation of errors, learning algorithms with and without a teacher, and a Kohonen network with procedures for vector quantization, clustering and segmentation and the formation of a "sliding window". The results of image identification in the presence of "noise", optimization based on filtering systematic error and NN extrapolation of the trend of the contour curve of the images of pollen grains were obtained.