Top 163 papers published in the topic of Vector quantization in 2020

Showing papers on "Vector quantization" published in 2020

Proceedings Article•10.1109/ICASSP40776.2020.9054168•

Federated Learning with Quantization Constraints

Nir Shlezinger¹, Mingzhe Chen², Yonina C. Eldar¹, H. Vincent Poor², Shuguang Cui³•Institutions (3)

Weizmann Institute of Science¹, Princeton University², The Chinese University of Hong Kong³

4 May 2020

TL;DR: This work identifies the unique characteristics associated with conveying trained models over rate-constrained channels, characterizes a suitable quantization scheme for such setups, and shows that combining universal vector quantization methods with FL yields a decentralized training system that is both efficient and feasible.

Abstract: Traditional deep learning models are trained on centralized servers using labeled sample data collected from edge devices. This data often includes private information, which the users may not be willing to share. Federated learning (FL) is an emerging approach to train such learning models without requiring the users to share their possibly private labeled data. In FL, each user trains its copy of the learning model locally. The server then collects the individual updates and aggregates them into a global model. A major challenge that arises in this method is the need for each user to efficiently transmit its learned model over the throughput-limited uplink channel. In this work, we tackle this challenge using tools from quantization theory. In particular, we identify the unique characteristics associated with conveying trained models over rate-constrained channels, and characterize a suitable quantization scheme for such setups. We show that combining universal vector quantization methods with FL yields a decentralized training system, which is both efficient and feasible. We also derive theoretical performance guarantees of the system. Our numerical results illustrate the substantial performance gains of our scheme over FL with previously proposed quantization approaches.
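
Illustrative sketch (not taken from the paper): one common form of universal quantization is subtractive dithered quantization, shown here on the simplest lattice, the scaled integers. The function names, the step size, and the idea of regenerating the dither at the server from a shared seed are assumptions made for this example only.

```python
import random

def dithered_quantize(update, step, seed):
    """User side: add uniform dither, then round to the nearest lattice index."""
    rng = random.Random(seed)  # seed assumed to be shared with the server
    dither = [rng.uniform(-step / 2, step / 2) for _ in update]
    return [round((u + d) / step) for u, d in zip(update, dither)]

def dithered_reconstruct(indices, step, seed):
    """Server side: regenerate the same dither and subtract it."""
    rng = random.Random(seed)
    dither = [rng.uniform(-step / 2, step / 2) for _ in indices]
    return [i * step - d for i, d in zip(indices, dither)]

def aggregate(user_updates, step):
    """Average the decoded updates of all users (hypothetical aggregation rule)."""
    decoded = []
    for user_id, update in enumerate(user_updates):
        indices = dithered_quantize(update, step, seed=user_id)
        decoded.append(dithered_reconstruct(indices, step, seed=user_id))
    n = len(decoded)
    return [sum(column) / n for column in zip(*decoded)]
```

Each user transmits only the integer indices; the per-entry reconstruction error is bounded by half the step size regardless of the distribution of the update.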

136 citations

Proceedings Article•10.21437/INTERSPEECH.2020-1443•

VQVC+: One-shot voice conversion by vector quantization and U-Net architecture

Da-Yi Wu¹, Yen-Hao Chen¹, Hung-yi Lee¹•Institutions (1)

National Taiwan University¹

7 Jun 2020

TL;DR: To further improve audio quality, the U-Net architecture is used within an auto-encoder-based VC system; the VQ-based method, which quantizes the latent vectors, provides the strong information bottleneck that the U-Net architecture requires.

Abstract: Voice conversion (VC) is a task that transforms the source speaker's timbre, accent, and tones in audio into another one's while preserving the linguistic content. It is still a challenging task, especially in a one-shot setting. Auto-encoder-based VC methods disentangle the speaker and the content in input speech without being given the speaker's identity, so these methods can further generalize to unseen speakers. The disentanglement capability is achieved by vector quantization (VQ), adversarial training, or instance normalization (IN). However, the imperfect disentanglement may harm the quality of output speech. In this work, to further improve audio quality, we use the U-Net architecture within an auto-encoder-based VC system. We find that to leverage the U-Net architecture, a strong information bottleneck is necessary. The VQ-based method, which quantizes the latent vectors, can serve the purpose. The objective and the subjective evaluations show that the proposed method performs well in both audio naturalness and speaker similarity.

116 citations

Proceedings Article•10.1109/ICASSP40776.2020.9053854•

One-Shot Voice Conversion by Vector Quantization

Da-Yi Wu¹, Hung-yi Lee¹•Institutions (1)

National Taiwan University¹

4 May 2020

TL;DR: This paper proposes a vector quantization (VQ) based one-shot voice conversion (VC) approach without any supervision on speaker label that has a strong ability to disentangle the content and speaker information with reconstruction loss only, and one-shot VC is thus achieved.

Abstract: In this paper, we propose a vector quantization (VQ) based one-shot voice conversion (VC) approach without any supervision on speaker label. We model the content embedding as a series of discrete codes and take the difference between quantize-before and quantize-after vector as the speaker embedding. We show that this approach has a strong ability to disentangle the content and speaker information with reconstruction loss only, and one-shot VC is thus achieved.
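
Illustrative sketch (not the authors' code): the split described in the abstract can be pictured as a nearest-codeword lookup for the content codes, with the pre-quantization minus post-quantization residual serving as speaker information. The function names are hypothetical, and averaging the residual over frames to obtain a single speaker vector is an assumption of this example.

```python
def nearest_codeword(vec, codebook):
    """Return the codebook entry closest to vec in squared Euclidean distance."""
    return min(codebook, key=lambda c: sum((v - x) ** 2 for v, x in zip(vec, c)))

def split_content_speaker(frames, codebook):
    """Quantize each frame (content) and pool the residuals (speaker)."""
    content = [nearest_codeword(f, codebook) for f in frames]
    residuals = [[fi - ci for fi, ci in zip(f, c)] for f, c in zip(frames, content)]
    # Assumption: one speaker vector per utterance, taken as the mean residual.
    n = len(residuals)
    speaker = [sum(column) / n for column in zip(*residuals)]
    return content, speaker
```

Under this reading, conversion would amount to combining the content codes of one utterance with the speaker vector of another before decoding.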

108 citations

Proceedings Article•

Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech

David Harwath¹, Wei-Ning Hsu¹, James Glass²•Institutions (2)

Massachusetts Institute of Technology¹, Qatar Foundation²

30 Apr 2020

TL;DR: This paper presents a method for learning discrete linguistic units by incorporating vector quantization layers into neural models of visually grounded speech and shows that this method is capable of capturing both word-level and sub-word units, depending on how it is configured.

Abstract: In this paper, we present a method for learning discrete linguistic units by incorporating vector quantization layers into neural models of visually grounded speech. We show that our method is capable of capturing both word-level and sub-word units, depending on how it is configured. What differentiates this paper from prior work on speech unit learning is the choice of training objective. Rather than using a reconstruction-based loss, we use a discriminative, multimodal grounding objective which forces the learned units to be useful for semantic image retrieval. We evaluate the sub-word units on the ZeroSpeech 2019 challenge, achieving a 27.3% reduction in ABX error rate over the top-performing submission, while keeping the bitrate approximately the same. We also present experiments demonstrating the noise robustness of these units. Finally, we show that a model with multiple quantizers can simultaneously learn phone-like detectors at a lower layer and word-like detectors at a higher layer. We show that these detectors are highly accurate, discovering 279 words with an F1 score of greater than 0.5.

76 citations

Posted Content•

A Memory Efficient Baseline for Open Domain Question Answering.

Gautier Izacard¹, Fabio Petroni¹, Lucas Hosseini¹, Nicola De Cao¹, Sebastian Riedel¹, Edouard Grave¹•Institutions (1)

Facebook¹

30 Dec 2020-arXiv: Computation and Language

TL;DR: This paper considers three strategies to reduce the index size of dense retriever-reader systems: dimension reduction, vector quantization and passage filtering, and shows that it is possible to get competitive systems using less than 6Gb of memory.

Abstract: Recently, retrieval systems based on dense representations have led to important improvements in open-domain question answering, and related tasks. While very effective, this approach is also memory intensive, as the dense vectors for the whole knowledge source need to be kept in memory. In this paper, we study how the memory footprint of dense retriever-reader systems can be reduced. We consider three strategies to reduce the index size: dimension reduction, vector quantization and passage filtering. We evaluate our approach on two question answering benchmarks: TriviaQA and NaturalQuestions, showing that it is possible to get competitive systems using less than 6Gb of memory.
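
Illustrative arithmetic (all figures hypothetical, none taken from the paper): the three strategies act on different factors of the index size. Dimension reduction shrinks the vector length, vector quantization shrinks the bytes stored per vector, and passage filtering shrinks the number of vectors. The sketch assumes a product-quantization style code of one byte per sub-vector and ignores codebook storage.

```python
def flat_index_bytes(num_passages, dim, bytes_per_value=4):
    """Size of an uncompressed index of float vectors."""
    return num_passages * dim * bytes_per_value

def quantized_index_bytes(num_passages, num_subvectors, bits_per_code=8):
    """Size of an index that stores one code per sub-vector."""
    return num_passages * num_subvectors * bits_per_code // 8

# Hypothetical corpus: one million passages with 768-dimensional embeddings.
baseline = flat_index_bytes(1_000_000, 768)
reduced = flat_index_bytes(1_000_000, 256)           # dimension reduction
quantized = quantized_index_bytes(1_000_000, 64)     # vector quantization
filtered = quantized_index_bytes(500_000, 64)        # passage filtering
```

With these made-up numbers the index shrinks from about 3.07 GB to 32 MB, which shows why the three strategies compound.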

51 citations

Journal Article•10.1109/TSP.2020.2983166•

High-Dimensional Stochastic Gradient Quantization for Communication-Efficient Edge Learning

Yuqing Du¹, Sheng Yang², Kaibin Huang¹•Institutions (2)

University of Hong Kong¹, University of Paris²

30 Mar 2020-IEEE Transactions on Signal Processing

TL;DR: A novel framework of hierarchical gradient quantization that is proved to guarantee model convergence by analyzing the convergence rate as a function of quantization bits and to substantially reduce the communication overhead compared with the state-of-the-art signSGD scheme.

Abstract: Edge machine learning involves the deployment of learning algorithms at the wireless network edge so as to leverage massive mobile data for enabling intelligent applications. The mainstream edge learning approach, federated learning, has been developed based on distributed gradient descent. Based on the approach, stochastic gradients are computed at edge devices and then transmitted to an edge server for updating a global AI model. Since each stochastic gradient is typically high-dimensional, communication overhead becomes a bottleneck for edge learning. To address this issue, we propose a novel framework of hierarchical gradient quantization and study its effect on the learning performance. First, the framework features a practical hierarchical architecture for decomposing the stochastic gradient into its norm and normalized block gradients, and efficiently quantizes them using a uniform quantizer and a low-dimensional Grassmannian codebook, respectively. Subsequently, the quantized normalized block gradients are scaled and cascaded to yield the quantized normalized stochastic gradient using a so-called hinge vector, which is compressed using another low-dimensional Grassmannian quantizer designed under the criterion of minimum distortion. The other feature of the framework is a bit-allocation scheme for reducing the distortion, which divides the total quantization bits to determine the resolutions of low-dimensional quantizers. The framework is proved to guarantee model convergence by analyzing the convergence rate as a function of quantization bits. Furthermore, by simulation, our design is shown to substantially reduce the communication overhead compared with the state-of-the-art signSGD scheme, while achieving similar learning accuracies.
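
Illustrative sketch (an interpretation, not the paper's algorithm): the hierarchical decomposition can be pictured as splitting a gradient into its norm, unit-norm block directions, and a vector of relative block norms that scales the blocks when they are cascaded back together. Treating that scaling vector as the hinge vector is an assumption here, the Grassmannian codebooks are omitted, and only the uniform quantizer for the norm is shown. All names are hypothetical, and every block is assumed to be nonzero.

```python
import math

def decompose(gradient, block_size):
    """Split a gradient into (norm, hinge vector, unit-norm block directions)."""
    norm = math.sqrt(sum(g * g for g in gradient))
    blocks = [gradient[i:i + block_size] for i in range(0, len(gradient), block_size)]
    block_norms = [math.sqrt(sum(g * g for g in b)) for b in blocks]
    hinge = [bn / norm for bn in block_norms]  # unit-norm vector of scalings
    directions = [[g / bn for g in b] for b, bn in zip(blocks, block_norms)]
    return norm, hinge, directions

def uniform_quantize(value, step):
    """Scalar uniform quantizer, used here only for the gradient norm."""
    return round(value / step) * step

def recompose(norm, hinge, directions):
    """Scale each block direction by its hinge entry and cascade the blocks."""
    return [norm * h * d for h, block in zip(hinge, directions) for d in block]
```

Without quantization the decomposition is lossless, so any distortion in a full scheme would come only from the quantizers applied to the three parts.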

44 citations

Proceedings Article•

And the Bit Goes Down: Revisiting the Quantization of Neural Networks

Pierre Stock¹, Armand Joulin¹, Rémi Gribonval, Benjamin Graham¹, Hervé Jégou¹•Institutions (1)

Facebook¹

30 Apr 2020

TL;DR: In this article, a vector quantization method was proposed to reduce the memory footprint of convolutional network architectures by preserving the quality of the reconstruction of the network outputs rather than its weights.

Abstract: In this paper, we address the problem of reducing the memory footprint of convolutional network architectures. We introduce a vector quantization method that aims at preserving the quality of the reconstruction of the network outputs rather than its weights. The principle of our approach is that it minimizes the loss reconstruction error for in-domain inputs. Our method only requires a set of unlabelled data at quantization time and allows for efficient inference on CPU by using byte-aligned codebooks to store the compressed weights. We validate our approach by quantizing a high performing ResNet-50 model to a memory size of 5MB (20x compression factor) while preserving a top-1 accuracy of 76.1% on ImageNet object classification and by compressing a Mask R-CNN with a 26x factor.
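
Illustrative sketch (a simplification, not the paper's method): the role of a byte-aligned codebook can be shown with plain k-means over weight blocks, where at most 256 centroids let every block be stored as a single byte. Note the swap: this sketch minimizes the reconstruction error of the weights themselves, whereas the abstract describes minimizing the error of the network outputs. Function names and the evenly spaced initialization are assumptions, the weight count must be a multiple of the block size, and k must not exceed the number of blocks.

```python
def nearest(block, codebook):
    """Index of the centroid closest to block in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda j: sum((b - c) ** 2 for b, c in zip(block, codebook[j])))

def learn_codebook(blocks, k, iters=10):
    """Plain Lloyd iterations with a deterministic, evenly spaced initialization."""
    codebook = [list(blocks[i * len(blocks) // k]) for i in range(k)]
    for _ in range(iters):
        assign = [nearest(b, codebook) for b in blocks]
        for j in range(k):
            members = [b for b, a in zip(blocks, assign) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                codebook[j] = [sum(col) / len(members) for col in zip(*members)]
    return codebook

def compress(weights, block_size, k=256):
    """Return (codebook, codes); codes holds one byte per weight block."""
    assert k <= 256, "one byte can address at most 256 centroids"
    blocks = [weights[i:i + block_size] for i in range(0, len(weights), block_size)]
    codebook = learn_codebook(blocks, k)
    return codebook, bytes(nearest(b, codebook) for b in blocks)

def decompress(codebook, codes):
    """Rebuild the approximate weights from the byte codes."""
    return [w for c in codes for w in codebook[c]]
```

With 32-bit floats and a block of d weights, each block shrinks from 4d bytes to one byte plus a shared codebook, which is where large compression factors come from.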

43 citations

Journal Article•10.1109/JSTSP.2020.2975903•

Universal Deep Neural Network Compression

Yoojin Choi¹, Mostafa El-Khamy¹, Jungwon Lee¹•Institutions (1)

Samsung¹

24 Feb 2020-IEEE Journal of Selected Topics in Signal Processing

TL;DR: This paper compresses deep neural networks (DNNs) for memory-efficient deployment by combining universal vector quantization with universal lossless source coding; the universal lattice quantizer can perform near-optimally on any source without knowledge of the source distribution.

Abstract: We consider compression of deep neural networks (DNNs) by weight quantization and lossless source coding for memory-efficient deployment. Whereas the previous work addressed non-universal scalar quantization and entropy source coding, we for the first time introduce universal DNN compression by universal vector quantization and universal source coding. In particular, the proposed scheme utilizes universal lattice quantization, which randomizes the source by uniform random dithering before lattice quantization and can perform near-optimally on any source without relying on knowledge of the source distribution. Moreover, we present a method of fine-tuning vector quantized DNNs to recover any accuracy loss due to quantization. From our experiments, we show that the proposed scheme compresses the MobileNet and ShuffleNet models trained on ImageNet with the state-of-the-art compression ratios of 10.7 and 8.8, respectively.

31 citations

Journal Article•10.1016/J.MEASUREMENT.2019.107369•

Child emotion recognition using probabilistic neural network with effective features

Mihir Narayan Mohanty¹, Hemanta Kumar Palo¹•Institutions (1)

Siksha O Anusandhan University¹

01 Feb 2020-Measurement

TL;DR: A feature reduction mechanism that combines Vector Quantization (VQ) with eigenvalue decomposition for effective feature utility; the lower-order eigen components are found to be more informative than the principal components and are used in this work to analyze children's speech emotions.

30 citations

Journal Article•10.1155/2020/8821868•

Multipose Face Recognition-Based Combined Adaptive Deep Learning Vector Quantization

Shahenda Sarhan¹,², Aida A. Nasr³, Mahmoud Shams³•Institutions (3)

Mansoura University¹, King Abdulaziz University², Kafrelsheikh University³

24 Sep 2020-Computational Intelligence and Neuroscience

TL;DR: The proposed classifier addresses the weakness of adaptive deep learning vector quantization classifiers by using the majority voting algorithm with the speeded-up robust feature extractor, and provides promising results in terms of sensitivity, specificity, precision, and accuracy compared to recent approaches in deep learning, statistical, and classical neural networks.

Abstract: Multipose face recognition is one of the recent challenges faced by researchers interested in security applications. Several studies have addressed the accuracy of multipose face recognition by enhancing the face detector, such as Viola-Jones, Real Adaboost, and Cascade Object Detector, while others concentrated on the recognition systems, such as support vector machines and deep convolutional neural networks. In this paper, a combined adaptive deep learning vector quantization (CADLVQ) classifier is proposed. The proposed classifier addresses the weakness of the adaptive deep learning vector quantization classifiers by using the majority voting algorithm with the speeded-up robust feature extractor. Experimental results indicate that the proposed classifier provided promising results in terms of sensitivity, specificity, precision, and accuracy compared to recent approaches in deep learning, statistical, and classical neural networks. Finally, the comparison is empirically performed using a confusion matrix to ensure the reliability and robustness of the proposed system compared to the state of the art.

29 citations

Journal Article•10.1609/AAAI.V34I04.6108•

Vector Quantization-Based Regularization for Autoencoders

Hanwei Wu¹, Markus Flierl¹•Institutions (1)

Royal Institute of Technology¹

3 Apr 2020

TL;DR: This paper introduces a quantization-based regularizer in the bottleneck stage of autoencoder models to learn meaningful latent representations and shows that the proposed regularization method results in improved latent representations for both supervised learning and clustering downstream tasks when compared to autoencoders using other bottleneck structures.

Abstract: Autoencoders and their variations provide unsupervised models for learning low-dimensional representations for downstream tasks. Without proper regularization, autoencoder models are susceptible to the overfitting problem and the so-called posterior collapse phenomenon. In this paper, we introduce a quantization-based regularizer in the bottleneck stage of autoencoder models to learn meaningful latent representations. We combine both perspectives of Vector Quantized-Variational AutoEncoders (VQ-VAE) and classical denoising regularization methods of neural networks. We interpret quantizers as regularizers that constrain latent representations while fostering a similarity-preserving mapping at the encoder. Before quantization, we impose noise on the latent codes and use a Bayesian estimator to optimize the quantizer-based representation. The introduced bottleneck Bayesian estimator outputs the posterior mean of the centroids to the decoder, and thus, is performing soft quantization of the noisy latent codes. We show that our proposed regularization method results in improved latent representations for both supervised learning and clustering downstream tasks when compared to autoencoders using other bottleneck structures.
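
Illustrative sketch (assumptions stated, not the authors' implementation): if the noise added to the latent code is isotropic Gaussian and all centroids are equally likely a priori, the posterior mean of the centroids is a softmax-weighted average of the codebook. The function name, the uniform prior, and the noise model are assumptions made for this example.

```python
import math

def soft_quantize(noisy_latent, centroids, noise_std):
    """Posterior mean of the centroids given a noisy latent vector."""
    sq_dists = [sum((z - c) ** 2 for z, c in zip(noisy_latent, cen))
                for cen in centroids]
    # Subtract the smallest distance before exponentiating for numerical stability.
    smallest = min(sq_dists)
    weights = [math.exp(-(d - smallest) / (2 * noise_std ** 2)) for d in sq_dists]
    total = sum(weights)
    posterior = [w / total for w in weights]
    dim = len(noisy_latent)
    return [sum(p * cen[i] for p, cen in zip(posterior, centroids))
            for i in range(dim)]
```

As the noise level shrinks the output approaches hard nearest-centroid quantization, while larger noise blends neighboring centroids.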

Journal Article•10.1007/S11042-018-6358-X•

Virtual home assistant for voice based controlling and scheduling with short speech speaker identification

Varun Tiwari¹, Mohammad Farukh Hashmi, Avinash G. Keskar¹, N. C. Shivaprakash²•Institutions (2)

Visvesvaraya National Institute of Technology¹, Indian Institute of Science²

01 Feb 2020-Multimedia Tools and Applications

TL;DR: A cloud-connected voice based home assistant that accepts voice commands to control or monitor devices in a home through a simple voice based approach and is designed to identify the speakers.

Abstract: With the advancement of interface technologies in smart devices, voice-controlled assistants have quickly gained popularity. These assistants are designed to use voice commands to achieve a more human-friendly interaction. On these lines, we propose a cloud-connected voice based home assistant in this paper. It accepts voice commands to control or monitor devices in a home. It can understand and schedule device operations based on time or sensor data through a simple voice based approach. To enhance its capability, it is designed to identify the speakers. Mel-Frequency Cepstrum Coefficients (MFCC) in combination with other speech features are used as the feature vector. We use Vector Quantization (VQ) and Principal Component Analysis (PCA) for dimensionality reduction of the feature vector, followed by a Gaussian Mixture Model (GMM) for classification. The validation of the short speech speaker identification is carried out on a set of Indian speakers in an uncontrolled indoor environment. An accuracy greater than 92% is achieved for speech samples as small as 1 second. A database of more than 50 different commands per speaker is also created for validation of the proposed virtual assistant. IBM’s Bluemix and Google’s cloud service are used for speech to text conversion.

Journal Article•10.1109/TIP.2020.2984357•

Gaussian Lifting for Fast Bilateral and Nonlocal Means Filtering

Sean I. Young¹, Bernd Girod¹, David Taubman²•Institutions (2)

Stanford University¹, University of New South Wales²

13 Apr 2020-IEEE Transactions on Image Processing

TL;DR: This work proposes the Gaussian lifting framework for efficient and accurate bilateral and nonlocal means filtering, appealing to the similarities between separable wavelet transforms and Gaussian pyramids and shows that it filters images more accurately and efficiently across many filter scales.

Abstract: Recently, many fast implementations of the bilateral and the nonlocal filters were proposed based on lattice and vector quantization, e.g. clustering, in higher dimensions. However, these approaches can still be inefficient owing to the complexities in the resampling process or in filtering the high-dimensional resampled signal. In contrast, simply scalar resampling the high-dimensional signal after decorrelation presents the opportunity to filter signals using multi-rate signal processing techniques. This work proposes the Gaussian lifting framework for efficient and accurate bilateral and nonlocal means filtering, appealing to the similarities between separable wavelet transforms and Gaussian pyramids. Accurately implementing the filter is important not only for image processing applications, but also for a number of recently proposed bilateral-regularized inverse problems, where the accuracy of the solutions depends ultimately on accurate filter implementations. We show that our Gaussian lifting approach filters images more accurately and efficiently across many filter scales. Adaptive lifting schemes for bilateral and nonlocal means filtering are also explored.

Journal Article•10.1016/J.INS.2019.06.038•

Smoothed self-organizing map for robust clustering

Pierpaolo D'Urso¹, Livia De Giovanni², Riccardo Massari¹•Institutions (2)

Sapienza University of Rome¹, Libera Università Internazionale degli Studi Sociali Guido Carli²

01 Feb 2020-Information Sciences

TL;DR: S-SOM improves the properties of input density mapping, vector quantization, and clustering of the standard SOM in the presence of outliers by upgrading the learning rule in order to smooth the representation of outlying input vectors onto the map.

Journal Article•10.1109/TPAMI.2019.2906207•

Learning of Gaussian Processes in Distributed and Communication Limited Systems

Mostafa Tavassolipour¹, Seyed Abolfazl Motahari¹, Mohammad Taghi Manzuri Shalmani¹•Institutions (1)

Sharif University of Technology¹

01 Aug 2020-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: In this article, the authors consider learning of Gaussian Processes in distributed systems and propose a vector quantization scheme to estimate the inner products of some Gaussian vectors across distributed machines.

Abstract: It is of fundamental importance to find algorithms obtaining optimal performance for learning of statistical models in distributed and communication limited systems. Aiming at characterizing the optimal strategies, we consider learning of Gaussian Processes (GP) in distributed systems as a pivotal example. We first address a very basic problem: how many bits are required to estimate the inner-products of some Gaussian vectors across distributed machines? Using information theoretic bounds, we obtain an optimal solution for the problem which is based on vector quantization. Two suboptimal and more practical schemes are also presented as substitutes for the vector quantization scheme. In particular, it is shown that the performance of one of the practical schemes which is called per-symbol quantization is very close to the optimal one. Schemes provided for the inner-product calculations are incorporated into our proposed distributed learning methods for GPs. Experimental results show that with spending few bits per symbol in our communication scheme, our proposed methods outperform previous zero rate distributed GP learning schemes such as Bayesian Committee Model (BCM) and Product of experts (PoE).

Journal Article•10.1109/TIP.2019.2936097•

A Framework of Reversible Color-to-Grayscale Conversion With Watermarking Feature

Yuk-Hee Chan¹, Zi-Xin Xu¹, Daniel P. K. Lun¹•Institutions (1)

Hong Kong Polytechnic University¹

01 Jan 2020-IEEE Transactions on Image Processing

TL;DR: An information-embedding framework based on a vector quantization-based (VQ-based) RCGC algorithm recently proposed is developed and a palette generation algorithm is proposed to support the information embedding process such that the visual quality of the color-embedded grayscale images and the reconstructed color images can be significantly improved.

Abstract: Reversible color-to-grayscale conversion (RCGC) is a method that embeds the chromatic information of a full color image into its grayscale version such that the original color image can be reconstructed in the future when necessary. In practical applications, it is required to provide a means to authenticate an information-embedded image such that its integrity can be guaranteed. However, none of the current RCGC algorithms take this factor into account. In this paper, to address this issue, we develop an information-embedding framework based on a vector quantization-based (VQ-based) RCGC algorithm recently proposed by us. Under this framework, we propose a RCGC algorithm that can embed both chromatic information and fragile watermark simultaneously into a grayscale image with the same technique to reduce the complexity and improve the efficiency. Like other VQ-based RCGC algorithms, the performance of the proposed RCGC algorithm highly relies on the palette it uses. We also propose a palette generation algorithm in this paper to support the information embedding process such that the visual quality of the color-embedded grayscale images and the reconstructed color images can be significantly improved.

Journal Article•10.1109/LSP.2019.2961610•

Fast Steganalysis Method for VoIP Streams

Hao Yang¹, Zhongliang Yang¹, YongJian Bao¹, Sheng Liu¹, Yongfeng Huang¹•Institutions (1)

Tsinghua University¹

01 Jan 2020-IEEE Signal Processing Letters

TL;DR: This letter presents a fast steganalysis method for voice over IP (VoIP) streams, driven by the need for quick and accurate detection of possible steganography in VoIP streams.

Abstract: In this letter, we present a novel and extremely fast steganalysis method for voice over IP (VoIP) streams, driven by the need for a quick and accurate detection of possible steganography in VoIP streams. We first analyzed the correlations in carriers. To better exploit the correlations in code-words, we mapped vector quantization code-words into a semantic space. In order to achieve high detection efficiency, only one hidden layer was utilized to extract the correlations between these code-words. Finally, based on the extracted correlation features, we used the softmax classifier to categorize the input stream carriers. To boost the performance of this proposed model, we incorporate a simple knowledge distillation framework into the training process. Experimental results show that the proposed method achieves state-of-the-art performance both in detection accuracy and efficiency. In particular, the processing time of this method on average is only about 0.05% when sample length is as short as 0.1 s, attaching strong practical value to online serving of steganography monitor.

Journal Article•10.1016/J.IFACOL.2020.12.006•

Convergence of Stochastic Vector Quantization and Learning Vector Quantization with Bregman Divergences

Christos N. Mavridis¹, John S. Baras¹•Institutions (1)

University of Maryland, College Park¹

01 Jan 2020-IFAC-PapersOnLine

TL;DR: The theory of stochastic approximation is employed to study the conditions on the initialization and the Bregman divergence generating functions under which the algorithms converge to desired configurations, and to formally support the use of Bregman divergences, such as the Kullback-Leibler divergence, in vector quantization algorithms.
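
Illustrative sketch (a generic online VQ loop, not the paper's algorithm or its convergence conditions): the codeword that wins under a Bregman divergence is nudged toward each incoming sample. Moving toward the sample is the natural update because, for any Bregman divergence taken with the codeword as second argument, the expected divergence is minimized by the mean of the assigned samples. The function names and the 1/(n+1) step-size schedule are assumptions.

```python
import math

def generalized_kl(x, mu):
    """Generalized Kullback-Leibler divergence, a Bregman divergence on positive vectors."""
    return sum(xi * math.log(xi / mi) - xi + mi for xi, mi in zip(x, mu))

def stochastic_vq(samples, codebook, divergence,
                  step_size=lambda n: 1.0 / (n + 1)):
    """Online vector quantization with a pluggable divergence."""
    codebook = [list(c) for c in codebook]  # work on a copy
    counts = [0] * len(codebook)
    for x in samples:
        # Winner: codeword with the smallest divergence from the sample.
        k = min(range(len(codebook)), key=lambda j: divergence(x, codebook[j]))
        counts[k] += 1
        a = step_size(counts[k])
        # Convex combination keeps positive codewords positive.
        codebook[k] = [m + a * (xi - m) for m, xi in zip(codebook[k], x)]
    return codebook
```

Passing a squared Euclidean distance instead of `generalized_kl` would recover the familiar online k-means update with the same loop.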

Proceedings Article•10.1109/ICISCT50599.2020.9351483•

Optimization of identification of micro-objects based on the use of characteristics of images and properties of models

Jumanov Isroil Ibragimovich¹, Djumanov Olimjon Isroilovich¹, Safarov Rustam Abdullayevich¹•Institutions (1)

Samarkand State University¹

4 Nov 2020

TL;DR: In this paper, a methodology has been developed for optimizing the identification of micro-objects based on the use of dynamic models, neural networks (NN) of various topologies, synthesis of mechanisms for extracting statistical, dynamic, specific characteristics of images, selecting and segmenting a contour, selecting reference points, reducing redundant points, and setting variables.

Abstract: A methodology has been developed for optimizing the identification of micro-objects based on the use of dynamic models, neural networks (NN) of various topologies, synthesis of mechanisms for extracting statistical, dynamic, specific characteristics of images, selecting and segmenting a contour, selecting reference points, reducing redundant points, and setting variables. Identification mechanisms based on statistical relationships, many points and dynamics of change, formalization of the coordinate matrices of distorted points, approximations during the deformation of a sequence of segments of stationary contour sections are proposed. A comparative analysis of the effectiveness of tools for preliminary image processing, recognition, and classification on the examples of pollen grains is carried out. Modified component circuits, adaptive learning algorithms of the Kohonen NN. A software package for visualization, recognition, classification of images of pollen grains was developed, a hybrid identification model was implemented with non-linear effects of factors and conditions of a priori insufficiency and uncertainty of parameters. The implemented software package is based on a three-layer NN of forwarding and backward propagation, learning algorithms with and without a teacher, a modified Kohonen network with vector quantization, clustering, segmentation, and the formation of a “sliding window”. The mechanisms of image identification in the presence of “noise”, error filtering, and neural network approximation of the contour curve of micro-objects images are investigated.

Journal Article•10.12928/TELKOMNIKA.V18I5.13717•

Gender voice classification with huge accuracy rate

Mustafa Sahib Shareef, Thulfiqar Abd, Yaqeen S. Mezaal

01 Oct 2020-TELKOMNIKA Telecommunication Computing Electronics and Control

TL;DR: This study investigates speech signals to devise a gender classifier by speech analysis to forecast the gender of the speaker by investigating diverse parameters of the voice sample through Mel frequency cepstrum coefficient, vector quantization, and machine learning algorithm.

Abstract: Gender voice recognition stands for an imperative research field in acoustics and speech processing as human voice shows very remarkable aspects. This study investigates speech signals to devise a gender classifier by speech analysis to forecast the gender of the speaker by investigating diverse parameters of the voice sample. A database has 2270 voice samples of celebrities, both male and female. Through Mel frequency cepstrum coefficient (MFCC), vector quantization (VQ), and machine learning algorithm (J 48), an accuracy of about 100% is achieved by the proposed classification technique based on data mining and Java script.

Proceedings Article•10.21437/INTERSPEECH.2020-1785•

Unsupervised Acoustic Unit Representation Learning for Voice Conversion Using WaveNet Auto-Encoders.

Mingjie Chen¹, Thomas Hain¹•Institutions (1)

University of Sheffield¹

25 Oct 2020

TL;DR: In this article, a WaveNet auto-encoder is used to generate waveform data directly from the latent representation, and the low complexity of latent representations is improved with two alternative disentanglement learning methods, namely instance normalization and sliced vector quantization.

Abstract: Unsupervised representation learning of speech has been of keen interest in recent years, as is evident, for example, in the wide interest in the ZeroSpeech challenges. This work presents a new method for learning frame-level representations based on WaveNet auto-encoders. Of particular interest in the ZeroSpeech Challenge 2019 were models with discrete latent variables such as the Vector Quantized Variational Auto-Encoder (VQVAE). However, these models generate speech with relatively poor quality. In this work we aim to address this with two approaches: first, WaveNet is used as the decoder to generate waveform data directly from the latent representation; second, the low complexity of latent representations is improved with two alternative disentanglement learning methods, namely instance normalization and sliced vector quantization. The method was developed and tested in the context of the recent ZeroSpeech challenge 2020. The system output submitted to the challenge obtained the top position for naturalness (Mean Opinion Score 4.06), top position for intelligibility (Character Error Rate 0.15), and third position for the quality of the representation (ABX test score 12.5). These and further analyses in this paper illustrate that the quality of the converted speech and the acoustic unit representation can be well balanced.

Journal Article•10.1016/J.PATREC.2019.11.011•

A bag of constrained informative deep visual words for image retrieval

Anindita Mukherjee, Jaya Sil¹, Abhimanyu Sahu², Ananda S. Chowdhury²•Institutions (2)

Indian Institute of Engineering Science and Technology, Shibpur¹, Jadavpur University²

01 Jan 2020-Pattern Recognition Letters

TL;DR: A bag (histogram) of constrained informative visual words is developed for image retrieval using the Linear-time Constrained Vector Quantization Error (LCVQE), a fast yet accurate constrained K-means algorithm.

Journal Article•10.1109/TPAMI.2019.2925347•

Asymmetric Mapping Quantization for Nearest Neighbor Search

Weixiang Hong¹, Xueyan Tang², Jingjing Meng³, Junsong Yuan³•Institutions (3)

National University of Singapore¹, Nanyang Technological University², University at Buffalo³

01 Jul 2020-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This paper proposes a novel addition-based vector quantization algorithm, Asymmetric Mapping Quantization (AMQ), to efficiently conduct ANN search, and Distributed Asymmetric Mapping Quantization (DAMQ), which enables AMQ to work on very large datasets through distributed learning.

Abstract: Nearest neighbor search is a fundamental problem in computer vision and machine learning. The straightforward solution, linear scan, is both computationally and memory intensive in large-scale high-dimensional cases, and hence is not preferable in practice. Therefore, there has been a lot of interest in algorithms that perform approximate nearest neighbor (ANN) search. In this paper, we propose a novel addition-based vector quantization algorithm, Asymmetric Mapping Quantization (AMQ), to efficiently conduct ANN search. Unlike existing addition-based quantization methods, which struggle with the problem caused by the norm of the database vector, we map the query vector and the database vector using different mapping functions to transform the computation of L-2 distance into inner product similarity, and thus do not need to evaluate the norm of the database vector. Moreover, we propose Distributed Asymmetric Mapping Quantization (DAMQ) to enable AMQ to work on very large datasets through distributed learning. Extensive experiments on approximate nearest neighbor search and image retrieval validate the merits of the proposed AMQ and DAMQ.
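
The idea of mapping query and database vectors differently so that an L-2 nearest neighbor search becomes an inner product search can be illustrated with a textbook augmentation trick. The sketch below is not the mapping defined in the AMQ paper, and the helper names (`map_database`, `map_query`) and the toy data are assumptions made for illustration. It appends the squared norm to each database vector and -1/2 to the query, so that the inner product equals q·x - ||x||²/2, which is largest exactly where ||q - x||² is smallest.

```python
# Illustrative sketch only: a standard augmentation trick, not the AMQ mappings.
# All function names and data below are hypothetical.

def map_database(x):
    """Append the squared norm so it travels inside the mapped vector."""
    return list(x) + [sum(v * v for v in x)]

def map_query(q):
    """Append -1/2 so that <map_query(q), map_database(x)> = q.x - ||x||^2 / 2."""
    return list(q) + [-0.5]

def inner(a, b):
    return sum(u * v for u, v in zip(a, b))

def l2_sq(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

database = [[0.0, 0.0], [1.0, 2.0], [3.0, 1.0], [-2.0, 0.5]]
query = [2.5, 1.5]

mapped_db = [map_database(x) for x in database]
mapped_q = map_query(query)

# Index of the nearest neighbor under squared L2 distance.
nearest_by_l2 = min(range(len(database)), key=lambda i: l2_sq(query, database[i]))
# Index of the best match under inner product in the mapped space.
nearest_by_ip = max(range(len(database)), key=lambda i: inner(mapped_q, mapped_db[i]))
```

Both searches return the same index because ||q - x||² = ||q||² - 2(q·x - ||x||²/2), and ||q||² is constant for a fixed query.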

Journal Article•10.1109/TNNLS.2019.2935502•

Probability Density Rank-Based Quantization for Convex Universal Learning Machines

Zhengda Qin¹, Badong Chen¹, Yuantao Gu², Nanning Zheng¹, Jose C. Principe³•Institutions (3)

Xi'an Jiaotong University¹, Tsinghua University², University of Florida³

01 Aug 2020-IEEE Transactions on Neural Networks and Learning Systems

TL;DR: An efficient quantization method called Probability density Rank-based Quantization (PRQ) is proposed to decrease the computational complexity of CULMs; it not only keeps the similarity of data distribution between the code book and data set but also reduces the computational cost by using the kd-tree.

Abstract: The distributions of input data are very important for learning machines, such as the convex universal learning machines (CULMs). The CULMs are a family of universal learning machines with convex optimization. However, the computational complexity is a crucial problem in CULMs, because the dimension of the nonlinear mapping layer (the hidden layer) of the CULMs is usually rather large in complex system modeling. In this article, we propose an efficient quantization method called Probability density Rank-based Quantization (PRQ) to decrease the computational complexity of CULMs. The PRQ ranks the data according to the estimated probability densities and then selects a subset whose elements are equally spaced in the ranked data sequence. We apply the PRQ to kernel ridge regression (KRR) and random Fourier feature recursive least squares (RFF-RLS), which are two typical algorithms of CULMs. The proposed method not only keeps the similarity of data distribution between the code book and data set but also reduces the computational cost by using the kd-tree. Meanwhile, for a given data set, the method yields deterministic quantization results, and it can also exclude the outliers and avoid too many borders in the code book. This brings great convenience to practical applications of the CULMs. The proposed PRQ is evaluated on several real-world benchmark data sets. Experimental results show satisfactory performance of PRQ compared with some state-of-the-art methods.
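
The selection rule described in the abstract (rank the data by estimated probability density, then keep elements equally spaced in the ranked sequence) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a brute-force 1-D Gaussian kernel density estimate instead of a kd-tree, and the function names, the bandwidth, and the choice of taking the midpoint of each rank interval are all hypothetical.

```python
import math

def gaussian_kde_density(points, bandwidth):
    """Estimate the density at each 1-D point with a Gaussian kernel (brute force)."""
    n = len(points)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - y) / bandwidth) ** 2) for y in points)
        for x in points
    ]

def rank_based_codebook(points, size, bandwidth=0.5):
    """Rank points by estimated density and keep `size` equally spaced ranks."""
    density = gaussian_kde_density(points, bandwidth)
    # Sort indices by density; ties are broken by index, so the result is deterministic.
    ranked = sorted(range(len(points)), key=lambda i: (density[i], i))
    step = len(ranked) / size
    # Assumption: pick the middle rank of each of `size` equal-length rank intervals.
    picks = [ranked[int((k + 0.5) * step)] for k in range(size)]
    return [points[i] for i in picks]

data = [0.1, 0.2, 0.25, 0.3, 0.35, 0.4, 1.0, 1.1, 1.2, 5.0]
codebook = rank_based_codebook(data, size=3)
```

In this toy data set the isolated point 5.0 has the lowest estimated density, so it sits at the very start of the ranking and is skipped by the midpoint rule, which mirrors the abstract's remark that the method can exclude outliers. Running the function twice on the same data gives the same code book.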

Posted Content•

Vector-quantized neural networks for acoustic unit discovery in the ZeroSpeech 2020 challenge

Benjamin van Niekerk¹, Leanne Nortje, Herman Kamper¹•Institutions (1)

Stellenbosch University¹

19 May 2020-arXiv: Audio and Speech Processing

TL;DR: In this article, vector quantization is used to map continuous features to a finite set of codes for acoustic unit discovery, which can separate phonetic content from speaker-specific details.

Abstract: In this paper, we explore vector quantization for acoustic unit discovery. Leveraging unlabelled data, we aim to learn discrete representations of speech that separate phonetic content from speaker-specific details. We propose two neural models to tackle this challenge - both use vector quantization to map continuous features to a finite set of codes. The first model is a type of vector-quantized variational autoencoder (VQ-VAE). The VQ-VAE encodes speech into a sequence of discrete units before reconstructing the audio waveform. Our second model combines vector quantization with contrastive predictive coding (VQ-CPC). The idea is to learn a representation of speech by predicting future acoustic units. We evaluate the models on English and Indonesian data for the ZeroSpeech 2020 challenge. In ABX phone discrimination tests, both models outperform all submissions to the 2019 and 2020 challenges, with a relative improvement of more than 30%. The models also perform competitively on a downstream voice conversion task. Of the two, VQ-CPC performs slightly better in general and is simpler and faster to train. Finally, probing experiments show that vector quantization is an effective bottleneck, forcing the models to discard speaker information.
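
The quantization step both models share, mapping continuous features to a finite set of codes, amounts to a nearest-neighbor lookup in a codebook. The sketch below shows only that lookup on toy values. In the models described above the encoder and the codebook are learned, which is not shown here, and the names `quantize`, `codebook`, and `frames` are assumptions for illustration.

```python
def quantize(frames, codebook):
    """Return, for each frame, the index of the nearest codebook vector."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [
        min(range(len(codebook)), key=lambda k: sq_dist(frame, codebook[k]))
        for frame in frames
    ]

# Toy codebook with three 2-D codes and four continuous feature frames.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
frames = [[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7], [0.4, 0.3]]

codes = quantize(frames, codebook)             # discrete unit sequence
reconstructed = [codebook[k] for k in codes]   # what a decoder would receive
```

Because only the code indices pass through, any detail of a frame that is not captured by its nearest code is discarded, which is the bottleneck effect the abstract refers to.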

Journal Article•10.1007/S00034-019-01152-8•

Lossless Compression of CT Images by an Improved Prediction Scheme Using Least Square Algorithm

Subbiahpillai Neelakantapillai Kumar¹, A. Lenin Fred, H. Ajay Kumar, P. Sebastin Varghese•Institutions (1)

Sathyabama University¹

01 Feb 2020-Circuits Systems and Signal Processing

TL;DR: A prediction-based lossless compression algorithm using a least-squares approach is proposed for the compression of CT images; tested on DICOM abdomen CT datasets, it was found to be efficient.

Abstract: The storage and transmission of medical data such as CT/MR DICOM images are an essential part of telemedicine applications. In this paper, a prediction-based lossless compression algorithm using a least-squares approach is proposed for the compression of CT images. Prior to compression, preprocessing was performed with a neutrosophic median filter. The gradient adjusted prediction scheme was employed for the determination of prediction coefficients, and a polynomial least-squares fitting approach was used for optimal selection of prediction coefficients. The selected prediction coefficients are finally encoded by a Huffman coder for transmission. The quality of the reconstructed image was validated by performance metrics and compared with other compression techniques such as JPEG, contextual vector quantization, and vector quantization using bat optimization (BAT-VQ). The proposed neutrosophic set-based least-squares compression algorithm was tested on DICOM abdomen CT datasets and found to be efficient. The hardware implementation was done on a Raspberry Pi processor using the Java platform, transferring the data through a cloud network for telemedicine applications.

Posted Content•

A Modular Neural Network Based Deep Learning Approach for MIMO Signal Detection.

Songyan Xue, Yi Ma, Na Yi, Terence E. Dodgson

01 Apr 2020-arXiv: Signal Processing

TL;DR: It is revealed that artificial neural network (ANN) assisted multiple-input multiple-output (MIMO) signal detection can be modeled as ANN-assisted lossy vector quantization (VQ), named MIMO-VQ, which is basically a joint statistical channel quantization and signal quantization procedure.

Abstract: In this paper, we reveal that artificial neural network (ANN) assisted multiple-input multiple-output (MIMO) signal detection can be modeled as ANN-assisted lossy vector quantization (VQ), named MIMO-VQ, which is basically a joint statistical channel quantization and signal quantization procedure. It is found that the quantization loss increases linearly with the number of transmit antennas, and thus MIMO-VQ scales poorly with the size of MIMO. Motivated by this finding, we propose a novel modular neural network based approach, termed MNNet, where the whole network is formed by a set of pre-defined ANN modules. The key to the ANN module design lies in the integration of parallel interference cancellation in the MNNet, which linearly reduces the interference (or equivalently the number of transmit antennas) along the feed-forward propagation, and with it the quantization loss. Our simulation results show that the MNNet approach largely improves the deep-learning capacity with near-optimal performance in various cases. Provided that MNNet is well modularized, the learning procedure does not need to be applied to the entire network as a whole, but rather at the modular level. For this reason, MNNet has the advantage of much lower learning complexity than other deep-learning based MIMO detection approaches.

Journal Article•10.1109/TPWRD.2020.2964431•

Improved Transient Data Compression Algorithm Based on Wavelet Spectral Quantization Models

Francisco Assis de Oliveira Nascimento¹, Raimundo G. Saraiva¹, Jorge Cormane¹•Institutions (1)

University of Brasília¹

06 Jan 2020-IEEE Transactions on Power Delivery

TL;DR: A dedicated encoding algorithm is presented for current and voltage transient signals digitized from electrical networks, and a parametric configuration for the optimization algorithm is proposed for the fixed spectral profile models in order to improve performance.

Abstract: In this work, a dedicated encoding algorithm is presented for current and voltage transient signals digitized from electrical networks. New models for dynamic bit allocation based on adaptive and fixed spectral envelope estimation models for transformed coefficients are proposed. The fixed profiles adjust the number of bits to be used by the wavelet transform decomposition level, which is used in the transformed vector quantization. The models proposed to estimate the adaptive spectral envelope have the transformed spectrum segmented by subbands or by decomposition levels of the wavelet transform. The dynamic bit allocation is implemented according to the signal spectral behavior. The quantized coefficient vector is encoded using an entropy coder, and it is asynchronously packaged in a compressed file. A parametric configuration for the optimization algorithm is also proposed for the fixed spectral profile models in order to improve performance. Simulation results are presented using a signal data bank with a set of reported events of electric power networks. Performance comparisons with other work are also presented.

Book Chapter•10.1007/978-3-030-61616-8_16•

Hopfield Networks for Vector Quantization

Christian Bauckhage, Rajkumar Ramamurthy, Rafet Sifa

15 Sep 2020

TL;DR: This work considers the problem of finding representative prototypes within a set of data and solves it using Hopfield networks by minimizing the mean discrepancy between kernel density estimates of the distributions of data points and prototypes; the result suggests that vector quantization can be accomplished via adiabatic quantum computing.

Abstract: We consider the problem of finding representative prototypes within a set of data and solve it using Hopfield networks. Our key idea is to minimize the mean discrepancy between kernel density estimates of the distributions of data points and prototypes. We show that this objective can be cast as a quadratic unconstrained binary optimization problem which is equivalent to a Hopfield energy minimization problem. This result is of current interest as it suggests that vector quantization can be accomplished via adiabatic quantum computing.
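
One way to see how such an objective becomes a quadratic unconstrained binary optimization problem is the following sketch. It rests on assumptions that are not stated in the abstract: the discrepancy is taken to be the squared distance between kernel mean embeddings, prototypes are chosen among the n data points through an indicator vector z in {0,1}^n with k ones, K is the n-by-n kernel matrix, and the cardinality constraint is enforced by a penalty with weight lambda. The paper's exact formulation may differ.

```latex
% Assumed discrepancy: squared distance between the kernel mean of the data
% and the kernel mean of the k selected prototypes.
\left\| \frac{1}{n}\sum_{i=1}^{n}\phi(x_i) - \frac{1}{k}\sum_{i=1}^{n} z_i\,\phi(x_i) \right\|^2
  = \frac{1}{n^2}\,\mathbf{1}^\top K\,\mathbf{1}
  - \frac{2}{nk}\,\mathbf{1}^\top K z
  + \frac{1}{k^2}\, z^\top K z

% The first term does not depend on z. Adding the penalty
% \lambda(\mathbf{1}^\top z - k)^2 and dropping constants gives a QUBO:
\min_{z \in \{0,1\}^n} \; z^\top Q\, z + q^\top z,
\qquad
Q = \frac{1}{k^2}K + \lambda\,\mathbf{1}\mathbf{1}^\top,
\qquad
q = -\frac{2}{nk}\,K\mathbf{1} - 2\lambda k\,\mathbf{1}
```

Substituting bipolar variables s = 2z - 1 turns this quadratic form in z into a quadratic form in s, which is the shape of a Hopfield energy function.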

Proceedings Article•10.1109/RUSAUTOCON49822.2020.9208164•

Optimization of Identification of Images of Micro-Objects Taking Into Account Systematic Error Based on Neural Networks

Isroil I. Jumanov¹, Olimjon I. Djumanov¹, Rustam A. Safarov¹•Institutions (1)

Samarkand State University¹

01 Sep 2020

TL;DR: The results of image identification in the presence of "noise", optimization based on filtering systematic error and NN extrapolation of the trend of the contour curve of the images of pollen grains were obtained.

Abstract: A methodology has been developed for optimizing the identification of micro-objects based on the use of neural networks (NN) of various topologies, synthesis of image processing mechanisms, extracting statistical, dynamic, and specific characteristics, selecting and segmenting a contour, selecting reference points and reducing redundant points, taking into account systematic error factors, choosing an adequate model, setting variables, and optimization. Methods and algorithms are proposed for determined and multivariate analysis, obtaining the coefficients of influence and elasticity of factors, and approximating the contours represented by time series. Component schemes and training algorithms of the NN were modified, a software package (SP) for visualization, recognition, and classification of images of pollen grains was developed, and a hybrid identification model was implemented that takes into account the non-linearity of the effects of factors under the condition of a priori insufficiency and uncertainty of parameters. The efficiency of the SP was studied on the basis of a three-layer NN with forward and backward propagation of errors, learning algorithms with and without a teacher, and a Kohonen network with procedures for vector quantization, clustering and segmentation, and the formation of a "sliding window". The results of image identification in the presence of "noise", optimization based on filtering systematic error, and NN extrapolation of the trend of the contour curve of the images of pollen grains were obtained.
