TL;DR: Residual-Quantized VAE (RQ-VAE) and RQ-Transformer are proposed to generate high-resolution images with a fixed codebook size.
Abstract: For autoregressive (AR) modeling of high-resolution images, vector quantization (VQ) represents an image as a sequence of discrete codes. A short sequence length is important for an AR model to reduce its computational costs to consider long-range interactions of codes. However, we postulate that previous VQ cannot shorten the code sequence and generate high-fidelity images together in terms of the rate-distortion trade-off. In this study, we propose the two-stage framework, which consists of Residual-Quantized VAE (RQ-VAE) and RQ-Transformer, to effectively generate high-resolution images. Given a fixed codebook size, RQ-VAE can precisely approximate a feature map of an image and represent the image as a stacked map of discrete codes. Then, RQ-Transformer learns to predict the quantized feature vector at the next position by predicting the next stack of codes. Thanks to the precise approximation of RQ-VAE, we can represent a $256\times 256$ image as $8\times 8$ resolution of the feature map, and RQ-Transformer can efficiently reduce the computational costs. Consequently, our framework out-performs the existing AR models on various benchmarks of unconditional and conditional image generation. Our approach also has a significantly faster sampling speed than previous AR models to generate high-quality images.
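The idea of representing each feature vector as a stack of codes can be illustrated with plain residual quantization. The following is a minimal sketch, not the RQ-VAE implementation: the function names `nearest` and `residual_quantize` are hypothetical, the encoder, decoder, and training losses are omitted, and the codebook is assumed to contain the zero vector so that the approximation error cannot grow with depth.

```python
import random


def nearest(codebook, v):
    """Return the index of the codeword closest to v (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((c - x) ** 2 for c, x in zip(codebook[k], v)),
    )


def residual_quantize(v, codebook, depth):
    """Quantize v in `depth` rounds, reusing one shared codebook.

    Each round quantizes what the previous rounds left over (the residual).
    Returns the stack of code indices and the summed approximation of v.
    """
    residual = list(v)
    approx = [0.0] * len(v)
    codes = []
    for _ in range(depth):
        k = nearest(codebook, residual)
        codes.append(k)
        approx = [a + c for a, c in zip(approx, codebook[k])]
        residual = [r - c for r, c in zip(residual, codebook[k])]
    return codes, approx


if __name__ == "__main__":
    random.seed(0)
    # Assumption: a zero codeword is included so a round can leave the residual unchanged.
    codebook = [[0.0, 0.0]] + [
        [random.gauss(0, 1), random.gauss(0, 1)] for _ in range(15)
    ]
    codes, approx = residual_quantize([0.7, -1.3], codebook, depth=4)
    print(codes, approx)
```

With a codebook of size K and depth D, a position can take up to K^D distinct code stacks while the codebook itself stays at K entries, which is the sense in which the codebook size is kept fixed.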
TL;DR: A novel one-shot voice conversion framework based on vector quantization voice conversion (VQVC) and AutoVC is proposed, called AVQVC, and a new training method is applied to VQVC to separate content and timbre information from speech more effectively.
Abstract: Voice conversion (VC) refers to changing the timbre of speech while retaining its linguistic content. Recently, many works have focused on disentanglement-based learning techniques to separate the timbre and the linguistic content information in a speech signal. Once this separation succeeds, voice conversion becomes feasible and straightforward. This paper proposes a novel one-shot voice conversion framework based on vector quantization voice conversion (VQVC) and AutoVC, called AVQVC. A new training method is applied to VQVC to separate content and timbre information from speech more effectively. The results show that this approach separates content and timbre better than VQVC and thereby improves the sound quality of the generated speech.
TL;DR: VQFR, a vector-quantization-based face restoration method, takes advantage of high-quality low-level feature banks extracted from high-quality faces to help recover realistic facial details.
Abstract: Although generative facial prior and geometric prior have recently demonstrated high-quality results for blind face restoration, producing fine-grained facial details faithful to inputs remains a challenging problem. Motivated by the classical dictionary-based methods and the recent vector quantization (VQ) technique, we propose a VQ-based face restoration method – VQFR. VQFR takes advantage of high-quality low-level feature banks extracted from high-quality faces and can thus help recover realistic facial details. However, the simple application of the VQ codebook cannot achieve good results with faithful details and identity preservation. Therefore, we further introduce two special network designs. 1). We first investigate the compression patch size in the VQ codebook and find that the VQ codebook designed with a proper compression patch size is crucial to balance the quality and fidelity. 2). To further fuse low-level features from inputs while not “contaminating” the realistic details generated from the VQ codebook, we proposed a parallel decoder consisting of a texture decoder and a main decoder. Those two decoders then interact with a texture warping module with deformable convolution. Equipped with the VQ codebook as a facial detail dictionary and the parallel decoder design, the proposed VQFR can largely enhance the restored quality of facial details while keeping the fidelity to previous methods.
TL;DR: A vectorized prior for variational image compression is proposed, built on a multivariate Gaussian mixture and probabilistic vector quantization with multi-codebooks, which improves rate-distortion performance and speeds up compression.
Abstract: Modeling latent variables with priors and hyperpriors is an essential problem in variational image compression. Formally, trade-off between rate and distortion is handled well if priors and hyperpriors precisely describe latent variables. Current practices only adopt univariate priors and process each variable individually. However, we find inter-correlations and intra-correlations exist when observing latent variables in a vectorized perspective. These findings reveal visual redundancies to improve rate-distortion performance and parallel processing ability to speed up compression. This encourages us to propose a novel vectorized prior. Specifically, a multivariate Gaussian mixture is proposed with means and covariances to be estimated. Then, a novel probabilistic vector quantization is utilized to effectively approximate means, and remaining covariances are further induced to a unified mixture and solved by cascaded estimation without context models involved. Furthermore, code books involved in quantization are extended to multi-codebooks for complexity reduction, which formulates an efficient compression procedure. Extensive experiments on benchmark datasets against state-of-the-art indicate our model has better rate-distortion performance and an impressive 3.18x compression speed up, giving us the ability to perform real-time, high-quality variational image compression in practice. Our source code is publicly available at https://github.com/xiaosu-zhu/McQuic.
TL;DR: The authors propose a single-loss quantization objective that minimizes a distributional distance between the learned continuous codes and a discrete uniform distribution, leveraging the discrete property of hash functions; it can be integrated into any existing supervised hashing method to improve code balance and quantization error.
Abstract: Image hashing is a principled approximate nearest neighbor approach to find similar items to a query in a large collection of images. Hashing aims to learn a binary-output function that maps an image to a binary vector. For optimal retrieval performance, producing balanced hash codes with low-quantization error to bridge the gap between the learning stage's continuous relaxation and the inference stage's discrete quantization is important. However, in the existing deep supervised hashing methods, coding balance and low-quantization error are difficult to achieve and involve several losses. We argue that this is because the existing quantization approaches in these methods are heuristically constructed and not effective to achieve these objectives. This paper considers an alternative approach to learning the quantization constraints. The task of learning balanced codes with low quantization error is re-formulated as matching the learned distribution of the continuous codes to a pre-defined discrete, uniform distribution. This is equivalent to minimizing the distance between two distributions. We then propose a computationally efficient distributional distance by leveraging the discrete property of the hash functions. This distributional distance is a valid distance and enjoys lower time and sample complexities. The proposed single-loss quantization objective can be integrated into any existing supervised hashing method to improve code balance and quantization error. Experiments confirm that the proposed approach substantially improves the performance of several representative hashing methods.
TL;DR: Distill-VQ is proposed, which unifies the learning of IVF and PQ within a knowledge distillation framework and is able to derive substantial training signals from the massive unlabeled data, which significantly contributes to the retrieval quality.
Abstract: Vector quantization (VQ) based ANN indexes, such as Inverted File System (IVF) and Product Quantization (PQ), have been widely applied to embedding based document retrieval thanks to the competitive time and memory efficiency. Originally, VQ is learned to minimize the reconstruction loss, i.e., the distortions between the original dense embeddings and the reconstructed embeddings after quantization. Unfortunately, such an objective is inconsistent with the goal of selecting ground-truth documents for the input query, which may cause severe loss of retrieval quality. Recent works identify such a defect, and propose to minimize the retrieval loss through contrastive learning. However, these methods intensively rely on queries with ground-truth documents, whose performance is limited by the insufficiency of labeled data. In this paper, we propose Distill-VQ, which unifies the learning of IVF and PQ within a knowledge distillation framework. In Distill-VQ, the dense embeddings are leveraged as "teachers'', which predict the query's relevance to the sampled documents. The VQ modules are treated as the "students'', which are learned to reproduce the predicted relevance, such that the reconstructed embeddings may fully preserve the retrieval result of the dense embeddings. By doing so, Distill-VQ is able to derive substantial training signals from the massive unlabeled data, which significantly contributes to the retrieval quality. We perform comprehensive explorations for the optimal conduct of knowledge distillation, which may provide useful insights for the learning of VQ based ANN index. We also experimentally show that the labeled data is no longer a necessity for high-quality vector quantization, which indicates Distill-VQ's strong applicability in practice. The evaluations are performed on MS MARCO and Natural Questions benchmarks, where Distill-VQ notably outperforms the SOTA VQ methods in Recall and MRR. 
Our code is available at https://github.com/staoxiao/LibVQ.
TL;DR: This paper shows that for vehicle identification, combining morphological feature extraction with the LVQ algorithm produces a model that identifies vehicles based on their shape and classifies them through competitive layers supervised by a single-layer network architecture, which makes the computation faster and less burdensome.
Abstract: The number of vehicles increases every year, which results in traffic jams. It is therefore necessary to identify the type of each vehicle so that vehicles can be arranged according to their path. This study aims to develop a system that can identify the type of vehicle using the Learning Vector Quantization (LVQ) algorithm. For LVQ to identify vehicles well, it needs information about the characteristics of the object. For this reason, the LVQ algorithm is combined with morphological feature extraction using the parameters of area, circumference, eccentricity, major axis length, and minor axis length to obtain shape features. Tests using a confusion matrix give a precision of 85%, a recall of 82%, and an accuracy of 83%. This paper shows that for vehicle identification, combining morphological feature extraction with the LVQ algorithm produces a model that identifies vehicles based on their shape and classifies them through competitive layers supervised by a single-layer network architecture, which makes the computation faster and less burdensome.
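The abstract does not say which LVQ variant is used, so the following sketch assumes the basic LVQ1 rule: the prototype nearest to a training sample moves toward the sample when their class labels agree and away from it otherwise. The function name `lvq1_step`, the learning rate, and the two-dimensional toy features and labels are illustrative assumptions, not values from the study.

```python
def lvq1_step(prototypes, labels, x, y, lr=0.1):
    """Apply one LVQ1 update in place and return the index of the winning prototype.

    prototypes: list of feature vectors, one per prototype
    labels:     class label of each prototype
    x, y:       training sample and its class label
    """
    # Competitive step: find the prototype closest to the sample.
    k = min(
        range(len(prototypes)),
        key=lambda i: sum((p - xi) ** 2 for p, xi in zip(prototypes[i], x)),
    )
    # Supervised step: attract on a label match, repel on a mismatch.
    sign = 1.0 if labels[k] == y else -1.0
    prototypes[k] = [p + sign * lr * (xi - p) for p, xi in zip(prototypes[k], x)]
    return k


if __name__ == "__main__":
    # Hypothetical shape features (for example, normalized area and eccentricity).
    protos = [[0.0, 0.0], [1.0, 1.0]]
    proto_labels = ["car", "truck"]
    winner = lvq1_step(protos, proto_labels, x=[0.2, 0.0], y="car")
    print(winner, protos)
```

At inference time a sample is simply assigned the label of its nearest prototype, which is why classification with LVQ is cheap.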
TL;DR: A conditional entropy model inspired by Masked Image Modeling (MIM) is proposed to improve entropy coding by modeling the co-dependencies of the quantized latent codes.
Abstract: Recent neural compression methods have been based on the popular hyperprior framework. It relies on Scalar Quantization and offers a very strong compression performance. This contrasts with recent advances in image generation and representation learning, where Vector Quantization is more commonly employed. In this work, we attempt to bring these lines of research closer by revisiting vector quantization for image compression. We build upon the VQ-VAE framework and introduce several modifications. First, we replace the vanilla vector quantizer by a product quantizer. This intermediate solution between vector and scalar quantization allows for a much wider set of rate-distortion points: it implicitly defines high-quality quantizers that would otherwise require intractably large codebooks. Second, inspired by the success of Masked Image Modeling (MIM) in the context of self-supervised learning and generative image models, we propose a novel conditional entropy model which improves entropy coding by modelling the co-dependencies of the quantized latent codes. The resulting PQ-MIM model is surprisingly effective: its compression performance is on par with recent hyperprior methods. It also outperforms HiFiC in terms of FID and KID metrics when optimized with perceptual losses (e.g. adversarial). Finally, since PQ-MIM is compatible with image generation frameworks, we show qualitatively that it can operate under a hybrid mode between compression and generation, with no further training or finetuning. As a result, we explore the extreme compression regime where an image is compressed into 200 bytes, i.e., less than a tweet.
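To make the "intermediate solution between vector and scalar quantization" concrete, here is a minimal product quantizer sketch. It is not the PQ-MIM code: the names `product_quantize` and `reconstruct` and the toy sub-codebooks are assumptions for illustration. The vector is split into equal chunks, and each chunk is quantized independently with its own small codebook.

```python
def product_quantize(v, sub_codebooks):
    """Return one code index per sub-vector of v.

    sub_codebooks[j] is the list of codewords used for the j-th chunk of v;
    len(v) is assumed to be divisible by len(sub_codebooks).
    """
    m = len(sub_codebooks)
    d = len(v) // m  # dimension of each chunk
    codes = []
    for j, codebook in enumerate(sub_codebooks):
        chunk = v[j * d:(j + 1) * d]
        codes.append(
            min(
                range(len(codebook)),
                key=lambda k: sum((c - x) ** 2 for c, x in zip(codebook[k], chunk)),
            )
        )
    return codes


def reconstruct(codes, sub_codebooks):
    """Concatenate the selected codewords to rebuild the approximation."""
    out = []
    for k, codebook in zip(codes, sub_codebooks):
        out.extend(codebook[k])
    return out


if __name__ == "__main__":
    sub_codebooks = [
        [[0.0, 0.0], [1.0, 1.0]],  # codebook for the first chunk
        [[0.0, 0.0], [2.0, 2.0]],  # codebook for the second chunk
    ]
    codes = product_quantize([0.9, 1.2, 0.1, -0.2], sub_codebooks)
    print(codes, reconstruct(codes, sub_codebooks))
```

With M sub-codebooks of K codewords each, the quantizer can represent K^M distinct vectors while storing only M·K codewords, which is how it avoids the intractably large codebooks mentioned above.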
TL;DR: MeCoQ learns unsupervised binary descriptors by contrastive learning, which can better capture discriminative visual semantics, and shows that codeword diversity regularization is critical to prevent contrastive-learning-based quantization from model degeneration.
Abstract: The high efficiency in computation and storage makes hashing (including binary hashing and quantization) a common strategy in large-scale retrieval systems. To alleviate the reliance on expensive annotations, unsupervised deep hashing becomes an important research problem. This paper provides a novel solution to unsupervised deep quantization, namely Contrastive Quantization with Code Memory (MeCoQ). Different from existing reconstruction-based strategies, we learn unsupervised binary descriptors by contrastive learning, which can better capture discriminative visual semantics. Besides, we uncover that codeword diversity regularization is critical to prevent contrastive learning-based quantization from model degeneration. Moreover, we introduce a novel quantization code memory module that boosts contrastive learning with lower feature drift than conventional feature memories. Extensive experiments on benchmark datasets show that MeCoQ outperforms state-of-the-art methods. Code and configurations are publicly released.
TL;DR: The tabu search algorithm is employed to rearrange codewords by fully exploiting their neighboring correlations, yielding more compressible rearranged indices; by combining more highly correlated indices of a to-be-predicted index into the prediction, an improved linear regression method is then applied to achieve a sharper prediction error histogram and require less additional information.
TL;DR: This work proposes a new tensor quantization (TQ) framework which does not need to reduce the dimensionality of the original image data and destroy the original two-dimensional spatial relationship among data; these two serious drawbacks of vector quantization are well known.
Abstract: Quantization is an important technique to transform the input sample values from a large set (or a continuous range) into the output sample values in a small set (or a finite set). It has been applied broadly for lossy-data compression, pattern recognition, probability density estimation, and clustering. Vector quantization (VQ) is a prevalent image-compression technique, which treats image matrices as stretched vectors and then finds the representative stretched vectors accordingly for a given image data set. One can use tensor data representation to directly characterize the original two-dimensional image data rather than stretch the image matrix into a long vector so as to destroy the original two-dimensional data structure. In this work, we propose a new tensor quantization (TQ) framework which does not need to reduce the dimensionality of the original image data and destroy the original two-dimensional spatial relationship among data; these two serious drawbacks of vector quantization are well known. We first present tensor calculus and then propose a new parallel tensor-inversion algorithm for TQ thereupon. We also establish the pertinent theoretical proof to justify that our proposed new TQ approach is superior to the existing VQ approach especially as the image dimension becomes large. Finally, numerical experiments to evaluate the image-compression performances of VQ and TQ are demonstrated and their corresponding computational-complexities are also compared.
TL;DR: RepCONC jointly trains dual-encoders and the Product Quantization (PQ) method to learn discrete document representations and enables fast approximate nearest neighbor search with compact indexes.
Abstract: Dense Retrieval (DR) has achieved state-of-the-art first-stage ranking effectiveness. However, the efficiency of most existing DR models is limited by the large memory cost of storing dense vectors and the time-consuming nearest neighbor search (NNS) in vector space. Therefore, we present RepCONC, a novel retrieval model that learns discrete Representations via CONstrained Clustering. RepCONC jointly trains dual-encoders and the Product Quantization (PQ) method to learn discrete document representations and enables fast approximate NNS with compact indexes. It models quantization as a constrained clustering process, which requires the document embeddings to be uniformly clustered around the quantization centroids and supports end-to-end optimization of the quantization method and dual-encoders. We theoretically demonstrate the importance of the uniform clustering constraint in RepCONC and derive an efficient approximate solution for constrained clustering by reducing it to an instance of the optimal transport problem. Besides constrained clustering, RepCONC further adopts a vector-based inverted file system (IVF) to support highly efficient vector search on CPUs. Extensive experiments on two popular ad-hoc retrieval benchmarks show that RepCONC achieves better ranking effectiveness than competitive vector quantization baselines under different compression ratio settings. It also substantially outperforms a wide range of existing retrieval models in terms of retrieval effectiveness, memory efficiency, and time efficiency.
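A common way to search a PQ index quickly is asymmetric distance computation with a per-query lookup table: inner products between each query chunk and every codeword are computed once, and each document is then scored by summing table entries selected by its codes. The sketch below illustrates that general technique under the assumption of inner-product scoring; it is not taken from RepCONC, and the names `build_lookup`, `adc_score`, and `search` are hypothetical.

```python
def build_lookup(query, sub_codebooks):
    """table[j][k] = inner product of the j-th query chunk with codeword k of sub-codebook j."""
    m = len(sub_codebooks)
    d = len(query) // m  # dimension of each chunk
    return [
        [sum(q * c for q, c in zip(query[j * d:(j + 1) * d], codeword))
         for codeword in codebook]
        for j, codebook in enumerate(sub_codebooks)
    ]


def adc_score(codes, table):
    """Approximate query-document inner product from the document's PQ codes."""
    return sum(table[j][k] for j, k in enumerate(codes))


def search(query, doc_codes, sub_codebooks, top_k=1):
    """Return the indices of the top_k documents by approximate inner product."""
    table = build_lookup(query, sub_codebooks)
    ranked = sorted(
        range(len(doc_codes)),
        key=lambda i: adc_score(doc_codes[i], table),
        reverse=True,
    )
    return ranked[:top_k]


if __name__ == "__main__":
    sub_codebooks = [
        [[1.0, 0.0], [0.0, 1.0]],
        [[1.0, 0.0], [0.0, 1.0]],
    ]
    doc_codes = [[0, 0], [1, 1], [0, 1]]  # each document stored as two small integers
    print(search([1.0, 0.0, 0.0, 1.0], doc_codes, sub_codebooks, top_k=1))
```

Each document costs only M integer codes to store and M table lookups to score, which is where the memory and time savings of a compact index come from.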
TL;DR: The authors propose a digital control scheme based on space vector pulse density modulation for a two-level five-phase induction motor drive, which combines principles of digital signal processing techniques, such as sigma-delta modulation and vector quantization, with space vector modulation to form computationally efficient motor drives.
Abstract: Variable speed drives incorporating multiphase motors have started gaining attention in recent years due to their benefits over three-phase counterparts. In this article, we propose a digital control scheme based on space vector pulse density modulation for a two-level five-phase induction motor drive. The scheme combines principles of digital signal processing techniques, such as sigma-delta modulation and vector quantization, with space vector modulation to form computationally efficient motor drives. The vector space is divided into ten nonoverlapping regions, with each region partitioned into three Voronoi regions. The reference vector is vector quantized to the nearest switching vector. The absence of dwell time calculations, lower memory requirements, and reduced computational complexity, along with reduced acoustic noise and electromagnetic interference in the drive, are the core advantages of the scheme. The work has been experimentally implemented with a 2 HP five-phase induction motor drive, and the results are compared with a space vector pulsewidth modulation scheme for validation.
TL;DR: This work proposes to use counterfactual explanations for explaining rejects and investigates how to efficiently compute counterfactual explanations of different reject options for an important class of models, namely prototype-based classifiers such as learning vector quantization models.
Abstract: While machine learning models are usually assumed to always output a prediction, there also exist extensions in the form of reject options which allow the model to reject inputs where only a prediction with an unacceptably low certainty would be possible. With the ongoing rise of eXplainable AI, a lot of methods for explaining model predictions have been developed. However, understanding why a given input was rejected, instead of being classified by the model, is also of interest. Surprisingly, explanations of rejects have not been considered so far. We propose to use counterfactual explanations for explaining rejects and investigate how to efficiently compute counterfactual explanations of different reject options for an important class of models, namely prototype-based classifiers such as learning vector quantization models.
TL;DR: Spatially conditional normalization is incorporated to modulate the quantized vectors so as to insert spatially variant information into the embedded index maps, encouraging the decoder to generate more photorealistic images.
Abstract: Although two-stage Vector Quantized (VQ) generative models allow for synthesizing high-fidelity and high-resolution images, their quantization operator encodes similar patches within an image into the same index, resulting in a repeated artifact for similar adjacent regions using existing decoder architectures. To address this issue, we propose to incorporate the spatially conditional normalization to modulate the quantized vectors so as to insert spatially variant information to the embedded index maps, encouraging the decoder to generate more photorealistic images. Moreover, we use multichannel quantization to increase the recombination capability of the discrete codes without increasing the cost of model and codebook. Additionally, to generate discrete tokens at the second stage, we adopt a Masked Generative Image Transformer (MaskGIT) to learn an underlying prior distribution in the compressed latent space, which is much faster than the conventional autoregressive model. Experiments on two benchmark datasets demonstrate that our proposed modulated VQGAN is able to greatly improve the reconstructed image quality as well as provide high-fidelity image generation.
TL;DR: A quantization method with finite data rate, based on a spherical polar coordinate quantizer, is proposed for handling nested quantization and achieving convergence to the origin for a system with input and output quantization.
Abstract: This paper considers a control problem of discrete-time MIMO linear systems involving input quantization (for the output of the controller) and output quantization (for the output of the plant). Based on a spherical polar coordinate quantizer, a quantization method with finite data rate is proposed for handling the nested quantization and achieving the convergence to the origin of the system with input and output quantization. Further, an optimization problem is developed for each quantizer, of which the optimal solution set contains a virtual augmented vector. The quantizers quantize their respective virtual augmented vectors to obtain the estimates of the outputs of the plant and the controller. We present the analytical expression of the optimal solution set for each quantizer. Finally, a method is presented to obtain the parameters of the quantizers and the controller.
TL;DR: A k-means clustering algorithm-based quantizer, named KMQBlock, is proposed to find appropriate quantization levels according to the distribution of feedback codewords, using scalar and vector quantization.
Abstract: Deep learning-based channel state information (CSI) feedback can provide high downlink throughput for massive multiple-input multiple-output (MIMO) system in frequency division duplex mode. The compression and quantization of CSI greatly reduce the feedback overhead. However, existing quantization methods design quantization levels without matching the distribution of feedback codewords, which impairs the reconstruction accuracy of CSI. In this letter, we propose a k-means clustering algorithm-based quantizer, named KMQBlock, which can find the appropriate quantization levels according to the distribution of feedback codewords by using scalar and vector quantization. To illustrate the efficacy of KMQBlock, we firstly propose a deep learning-based CSI feedback network MRNet to improve the CSI reconstruction accuracy. Then, we apply KMQBlock to MRNet to quantize the feedback codewords. Experimental results show that KMQBlock can provide at most 8.20 dB performance improvement for MRNet compared with conventional quantization methods.
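Fitting quantization levels to the distribution of the values being quantized can be sketched with one-dimensional k-means (Lloyd's algorithm). This is only an illustration of the scalar case under simple assumptions; it is not the KMQBlock design, and the function names, quantile-based initialization, and iteration count are hypothetical choices.

```python
def kmeans_levels(samples, n_levels, iters=20):
    """Fit n_levels scalar quantization levels to samples with 1-D k-means."""
    ordered = sorted(samples)
    # Initialize the levels at evenly spaced quantiles of the data.
    levels = [
        ordered[(2 * i + 1) * len(ordered) // (2 * n_levels)]
        for i in range(n_levels)
    ]
    for _ in range(iters):
        # Assignment step: each sample goes to its nearest level.
        buckets = [[] for _ in levels]
        for x in samples:
            k = min(range(n_levels), key=lambda i: abs(levels[i] - x))
            buckets[k].append(x)
        # Update step: each level moves to the mean of its bucket (kept if empty).
        levels = [
            sum(b) / len(b) if b else level
            for b, level in zip(buckets, levels)
        ]
    return sorted(levels)


def quantize(x, levels):
    """Return the index of the level nearest to x."""
    return min(range(len(levels)), key=lambda i: abs(levels[i] - x))


if __name__ == "__main__":
    codeword_values = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
    levels = kmeans_levels(codeword_values, n_levels=2)
    print(levels, quantize(4.0, levels))
```

A uniform quantizer over the same range would place its levels without regard to where the samples actually cluster, which is the mismatch the abstract describes.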
TL;DR: The vector quantization technique is introduced into the image-to-image translation framework, where the vector-quantized content representation facilitates not only translation but also the unconditional distribution shared among different domains.
Abstract: Current image-to-image translation methods formulate the task with conditional generation models, leading to learning only the recolorization or regional changes as being constrained by the rich structural information provided by the conditional contexts. In this work, we propose introducing the vector quantization technique into the image-to-image translation framework. The vector quantized content representation can facilitate not only the translation, but also the unconditional distribution shared among different domains. Meanwhile, along with the disentangled style representation, the proposed method further enables the capability of image extension with flexibility in both intra- and inter-domains. Qualitative and quantitative experiments demonstrate that our framework achieves comparable performance to the state-of-the-art image-to-image translation and image extension methods. Compared to methods for individual tasks, the proposed method, as a unified framework, unleashes applications combining image-to-image translation, unconditional generation, and image extension altogether. For example, it provides style variability for image generation and extension, and equips image-to-image translation with further extension capabilities.
TL;DR: Experimental results show that the proposed reversible data-hiding scheme in encrypted, vector quantization (VQ) encoded images can achieve high hiding capacity and satisfactory directly decrypted image quality and guarantee security and reversibility simultaneously.
Abstract: In this paper, a reversible data-hiding scheme in encrypted, vector quantization (VQ) encoded images is proposed. During image encryption, VQ-encoded image, including codebook and index table, is encrypted by content owner with stream-cipher and permutation to protect the privacy of image contents. As for additional-data embedding, a baseline method is first proposed and its corresponding optimized method is then given. By grouping one high-occurrence index with one or multiple low-occurrence indices, a series of index groups are constructed. Thus, by modifying the high-occurrence index to the corresponding index within the same group according to the current to-be-embedded bits, data embedding can be realized. The optimal hiding capacity is obtained by optimizing the coefficient vector for different types of index groups. Separable operations of data extraction, image decryption, and recovery can be achieved on the receiver side based on the availability of the encryption and data-hiding keys. Experimental results show that our scheme can achieve high hiding capacity and satisfactory directly decrypted image quality and guarantee security and reversibility simultaneously.
TL;DR: A codebook-softened product quantization (CSPQ) method is presented that achieves more quantization levels by softening codebooks and can be combined with other non-exhaustive frameworks to achieve fast search.
TL;DR: The vector quantization error is replaced by the product of the original error and a normalized noise vector, the samples of which are drawn from a zero-mean, unit-variance normal distribution.
Abstract: Machine learning algorithms have been shown to be highly effective in solving optimization problems in a wide range of applications. Such algorithms typically use gradient descent with backpropagation and the chain rule. Hence, the backpropagation fails if intermediate gradients are zero for some functions in the computational graph, because it causes the gradients to collapse when multiplying with zero. Vector quantization is one of those challenging functions for machine learning algorithms, since it is a piece-wise constant function and its gradient is zero almost everywhere. A typical solution is to apply the straight through estimator which simply copies the gradients over the vector quantization function in the backpropagation. Other solutions are based on smooth or stochastic approximation. This study proposes a vector quantization technique called NSVQ, which approximates the vector quantization behavior by substituting a multiplicative noise so that it can be used for machine learning problems. Specifically, the vector quantization error is replaced by product of the original error and a normalized noise vector, the samples of which are drawn from a zero-mean, unit-variance normal distribution. We test our proposed NSVQ in three scenarios with various types of applications. Based on the experiments, the proposed NSVQ achieves more accuracy and faster convergence in comparison to the straight through estimator, exponential moving averages, and the MiniBatchKmeans approaches.
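The noise substitution described above can be sketched as a forward pass. This is a reading of the abstract rather than the authors' code: it assumes that "the original error" enters through its Euclidean norm and that the noise vector is normalized to unit length, so the substituted error has the same magnitude as the true quantization error but a random direction. The function name `nsvq_forward` is hypothetical, and gradient handling, which an autodiff framework would provide, is not shown.

```python
import math
import random


def nsvq_forward(x, codebook, rng):
    """Return x plus random noise scaled to the norm of the true quantization error."""
    # Hard quantization is used only to measure the size of the error.
    q = min(codebook, key=lambda c: sum((ci - xi) ** 2 for ci, xi in zip(c, x)))
    err_norm = math.sqrt(sum((xi - qi) ** 2 for xi, qi in zip(x, q)))
    # Noise drawn from a zero-mean, unit-variance normal, then normalized to unit length.
    noise = [rng.gauss(0.0, 1.0) for _ in x]
    noise_norm = math.sqrt(sum(n * n for n in noise))
    return [xi + err_norm * n / noise_norm for xi, n in zip(x, noise)]


if __name__ == "__main__":
    rng = random.Random(0)
    codebook = [[0.0, 0.0], [1.0, 1.0]]
    print(nsvq_forward([0.2, 0.1], codebook, rng))
```

Because the output is a smooth function of the input and of the error norm rather than a piecewise-constant lookup, it avoids the zero-gradient problem without copying gradients across the quantizer as the straight-through estimator does.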
TL;DR: This work proposes learning to dynamically select discretization tightness conditioned on inputs, based on the hypothesis that data naturally contains variations in complexity that call for different levels of representational coarseness.
Abstract: Vector Quantization (VQ) is a method for discretizing latent representations and has become a major part of the deep learning toolkit. It has been theoretically and empirically shown that discretization of representations leads to improved generalization, including in reinforcement learning where discretization can be used to bottleneck multi-agent communication to promote agent specialization and robustness. The discretization tightness of most VQ-based methods is defined by the number of discrete codes in the representation vector and the codebook size, which are fixed as hyperparameters. In this work, we propose learning to dynamically select discretization tightness conditioned on inputs, based on the hypothesis that data naturally contains variations in complexity that call for different levels of representational coarseness which is observed in many heterogeneous data sets. We show that dynamically varying tightness in communication bottlenecks can improve model performance on visual reasoning and reinforcement learning tasks with heterogeneity in representations.
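TL;DR: MixerNet, a deep-learning approach to eigenvector-based channel state information (CSI) feedback, compresses the joint eigenvector of multiple subbands at the transmitter and recovers it at the receiver; vector quantization of the compressed information outperforms uniform quantization in reconstruction quality.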
Abstract: Deep learning (DL) methods have been widely used for channel state information (CSI) feedback to reduce the feedback overhead. CSI feedback mainly includes full CSI feedback and eigenvector-based CSI feedback. This paper focuses on eigenvector-based CSI feedback and designs a DL-based approach, referred to as MixerNet, where the joint eigenvector composed of multiple subbands is first compressed by an encoder at the transmitter and then recovered by a decoder at the receiver. Because the compressed information must be quantized before being transmitted to the decoder, uniform quantization (UQ) and vector quantization (VQ) are each studied to improve the system performance. Experimental results indicate that the designed MixerNet can recover CSI with high reconstruction quality while having fewer trainable parameters and lower computational complexity than existing DL-based methods. Moreover, the VQ method in MixerNet outperforms the UQ method in terms of CSI reconstruction quality.
TL;DR: In this paper, a new method for designing matched digital filters with discrete-valued coefficients is presented, in which fuzzy particle swarm optimization vector quantization (FPSOVQ) is applied to obtain the optimum codebook in the design of a matched wavelet function.
TL;DR: Zhang et al. propose angular deep supervised vector quantization (ADSVQ) for image retrieval, which simultaneously learns the discriminative feature representation and the updatable codebook, both lying on a hypersphere.
Abstract: Most deep quantization methods adopt unsupervised approaches, and the quantization process usually occurs in Euclidean space, between the deep feature and its approximate value. When this approach is applied to retrieval tasks, the inner product space of the retrieval process differs from the Euclidean space of quantization, so minimizing the quantization error (QE) does not necessarily lead to good performance on maximum inner product search (MIPS). To solve these problems, we treat Softmax classification as vector quantization (VQ) with angular decision boundaries and propose angular deep supervised VQ (ADSVQ) for image retrieval. Our approach can simultaneously learn the discriminative feature representation and the updatable codebook, both lying on a hypersphere. To reduce the QE between centroids and deep features, two regularization terms are proposed as supervision signals to encourage intra-class compactness and inter-class balance, respectively. ADSVQ explicitly reformulates the asymmetric distance computation in MIPS to transform the image retrieval process into a two-stage classification process. Moreover, we discuss the extension to multi-label cases from the perspective of quantization with binary classification. Extensive experiments demonstrate that the proposed ADSVQ has excellent performance on four well-known image data sets when compared with state-of-the-art hashing methods.
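One way to see why placing features and centroids on a hypersphere reconciles the two spaces mentioned in this abstract: for unit vectors, $\|f - c\|^2 = 2 - 2\, f^\top c$, so the nearest centroid in Euclidean distance is also the one with the largest inner product. The sketch below checks this numerically. It is not the ADSVQ training procedure, and `l2_normalize` and `angular_assign` are hypothetical helper names.

```python
import numpy as np

def l2_normalize(v):
    """Project each row onto the unit hypersphere."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def angular_assign(features, centroids):
    """Assign each feature to the centroid with the largest cosine similarity."""
    f = l2_normalize(features)
    c = l2_normalize(centroids)
    return (f @ c.T).argmax(axis=1)

rng = np.random.default_rng(0)
features = rng.normal(size=(100, 16))   # stand-in for deep features
centroids = rng.normal(size=(10, 16))   # stand-in for a learned codebook

by_angle = angular_assign(features, centroids)

# Nearest centroid by Euclidean distance after normalization.
f, c = l2_normalize(features), l2_normalize(centroids)
d2 = ((f[:, None, :] - c[None, :, :]) ** 2).sum(axis=-1)
by_distance = d2.argmin(axis=1)

print("assignments agree:", bool((by_angle == by_distance).all()))
```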
TL;DR: A reversible data hiding scheme for encrypted images that utilizes an all-permutation technique to embed data into encrypted images to provide a high embedding rate and reduce the hardware burden on the receiver is proposed.
Abstract: Due to its applications in cloud computing, research on reversible data hiding in encrypted images (RDHEI) is becoming more and more important. This paper proposes a reversible data hiding scheme for encrypted images that utilizes an all-permutation technique to embed data into encrypted images. The proposed scheme follows a block-wise data hiding process. Message extraction and image restoration are performed by the receiver using the trained vector quantization (VQ) codebook. This scheme can provide a high embedding rate and reduce the hardware burden on the receiver.
TL;DR: In this article, a two-level nested U-structure, called U2-VC, is developed for one-shot voice conversion; it can convert the timbre of speech from a source speaker to a target speaker, even for speakers unseen in the training dataset.
TL;DR: In this article, the authors provide a general framework for incorporating recurrent structures in an LVQ network and derive two classification models as variants of Recurrent Learning Vector Quantization, namely RecLVQ and LVQRNN.
TL;DR: In this paper, a cross-scale scalable vector quantization scheme (CSVQ) is proposed, in which multi-scale features are encoded progressively with stepwise feature fusion and refinement.
Abstract: Bitrate scalability is a desirable feature for audio coding in real-time communications. Existing neural audio codecs usually enforce a specific bitrate during training, so a different model needs to be trained for each target bitrate, which increases the memory footprint at both the sender and the receiver side, and transcoding is often needed to support multiple receivers. In this paper, we introduce a cross-scale scalable vector quantization scheme (CSVQ), in which multi-scale features are encoded progressively with stepwise feature fusion and refinement. In this way, a coarse-level signal is reconstructed if only a portion of the bitstream is received, and the quality progressively improves as more bits become available. The proposed CSVQ scheme can be flexibly applied to any neural audio coding network with a mirrored auto-encoder structure to achieve bitrate scalability. Subjective results show that the proposed scheme outperforms the classical residual VQ (RVQ) with scalability. Moreover, the proposed CSVQ at 3 kbps outperforms Opus at 9 kbps and Lyra at 3 kbps, and it provides a graceful quality boost as the bitrate increases.
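For context, the classical residual VQ (RVQ) baseline named in this abstract can be sketched as below: each stage quantizes what the previous stages left over, so decoding only the first few indices yields a coarser reconstruction. This is a generic illustration, not CSVQ and not any codec's implementation. The codebooks are random rather than trained, and row 0 of each codebook is set to the zero vector purely as a convenience of this sketch, so that an extra stage can never increase the error.

```python
import numpy as np

def rvq_encode(x, codebooks):
    """Quantize x stage by stage; each stage encodes the remaining residual."""
    residual = x.copy()
    indices = []
    for cb in codebooks:
        idx = int(((cb - residual) ** 2).sum(axis=1).argmin())
        indices.append(idx)
        residual = residual - cb[idx]
    return indices

def rvq_decode(indices, codebooks):
    """Sum the codewords of however many stages were received."""
    out = np.zeros(codebooks[0].shape[1])
    for idx, cb in zip(indices, codebooks):
        out = out + cb[idx]
    return out

# Hypothetical setup: 4 stages, 32 codes each, codes shrinking per stage.
rng = np.random.default_rng(0)
dim, stages, size = 8, 4, 32
codebooks = []
for s in range(stages):
    cb = rng.normal(scale=0.5 ** s, size=(size, dim))
    cb[0] = 0.0                      # sketch-only assumption, see lead-in
    codebooks.append(cb)

x = rng.normal(size=dim)
indices = rvq_encode(x, codebooks)

# Reconstruction error when only the first n stages of the bitstream arrive.
errors = [float(np.linalg.norm(x - rvq_decode(indices[:n], codebooks)))
          for n in range(stages + 1)]
print(errors)
```

Each stage here costs `log2(32) = 5` bits per vector, so truncating the index list is what lowers the bitrate.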
TL;DR: In this article, the authors propose a method for joint privacy enhancement and quantization (JoPEQ), which unifies lossy compression and privacy enhancement for federated learning, and demonstrate that JoPEQ reduces the overall distortion compared to applying local differential privacy (LDP) and compression individually, which translates into improved trained models.
Abstract: Federated learning (FL) is an emerging paradigm for training machine learning models using possibly private data available at edge devices. Among the key challenges associated with FL are first the need to preserve the privacy of the local data sets, and second the communication load due to the repeated exchange of updated models; both are often tackled individually with methods whose operation distorts the updated models, e.g., local differential privacy (LDP) mechanisms and lossy compression, respectively. In this work we propose a method for joint privacy enhancement and quantization (JoPEQ), unifying lossy compression and privacy enhancement for FL. JoPEQ utilizes universal vector quantization, where distortion is statistically equivalent to additive noise, and augments the compression distortion with dedicated privacy preserving noise to simultaneously achieve compression and a desired privacy level. We numerically demonstrate that JoPEQ reduces the overall distortion compared to individual LDP and compression, which is translated into improved trained models.
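The property that quantization distortion behaves like additive noise can be illustrated with subtractive dithered quantization, shown here in its scalar form. This is a simplified sketch, not JoPEQ itself: it uses a one-dimensional lattice instead of a vector lattice, omits the privacy-preserving noise, and the function name, step size, and test signal are hypothetical.

```python
import numpy as np

def dithered_quantize(x, step, rng):
    """Subtractive dithered uniform quantization.

    The dither is assumed to be shared randomness known to both encoder and
    decoder (for example, generated from a common seed).
    """
    dither = rng.uniform(-step / 2, step / 2, size=x.shape)
    q = step * np.round((x + dither) / step)   # lattice point that is transmitted
    return q - dither                          # decoder subtracts the same dither

rng = np.random.default_rng(0)
step = 0.5
x = np.sin(np.linspace(0.0, 20.0, 100_000))    # deliberately non-random input
err = dithered_quantize(x, step, rng) - x

print("max |error|   :", float(np.abs(err).max()))
print("mean error    :", float(err.mean()))
print("error variance:", float(err.var()), "vs step**2 / 12 =", step ** 2 / 12)
```

With the dither subtracted at the decoder, the error is uniformly distributed on `[-step/2, step/2]` whatever the input, which is why its variance approaches `step**2 / 12`.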