Monaural Music Source Separation Using Convolutional Sparse Coding

doi:10.1109/TASLP.2016.2598323

Open Access•Journal Article•10.1109/TASLP.2016.2598323

Ping-Keng Jao, +3 more

- 01 Nov 2016

- IEEE Transactions on Audio, Speech, and ...

- Vol. 24, Iss: 11, pp 2158-2170

18

TL;DR: The results show that the proposed approach remains effective with a larger dictionary and compares favorably with the state-of-the-art nonnegative matrix factorization approach. However, in the absence of the score and with a small dictionary, the approach may not be better.

Abstract: We present a comprehensive performance study of a new time-domain approach for estimating the components of an observed monaural audio mixture. Unlike existing time-frequency approaches that use the product of a set of spectral templates and their corresponding activation patterns to approximate the spectrogram of the mixture, the proposed approach uses the sum of a set of convolutions of estimated activations with prelearned dictionary filters to approximate the audio mixture directly in the time domain. The approximation problem can be solved by an efficient convolutional sparse coding algorithm. The effectiveness of this approach for source separation of musical audio has been demonstrated in our prior work, but only under rather restricted and controlled conditions: the musical score of the mixture had to be known a priori, and there had to be little mismatch between the dictionary filters and the source signals. In this paper, we report an evaluation that considers wider, and more practical, experimental settings. These include the use of an audio-based multipitch estimation algorithm to replace the musical score, and an external dataset of audio single notes to construct the dictionary filters. Our results show that the proposed approach remains effective with a larger dictionary, and compares favorably with the state-of-the-art nonnegative matrix factorization approach. However, in the absence of the score and in the case of a small dictionary, our approach may not be better.
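The time-domain model described in the abstract, a mixture approximated by a sum of convolutions of sparse activations with dictionary filters, can be illustrated with a small sketch. The code below is not the authors' algorithm: it swaps in plain ISTA (proximal gradient with soft thresholding) as a simple stand-in solver, and the function names, the toy decaying-sinusoid filters, the penalty weight `lam`, and the iteration count are all assumptions made for illustration. It also assumes every filter has the same length.

```python
import numpy as np

def synthesize(filters, activations):
    """Time-domain model: x_hat = sum_k (d_k convolved with a_k)."""
    return sum(np.convolve(d, a) for d, a in zip(filters, activations))

def objective(x, filters, activations, lam):
    """0.5 * ||x - x_hat||^2 + lam * sum_k ||a_k||_1."""
    residual = x - synthesize(filters, activations)
    l1 = sum(np.abs(a).sum() for a in activations)
    return 0.5 * residual @ residual + lam * l1

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def csc_ista(x, filters, lam=0.01, n_iter=300):
    """Estimate sparse activations with ISTA (illustrative stand-in solver)."""
    filt_len = len(filters[0])            # assumption: equal-length filters
    n_act = len(x) - filt_len + 1         # full convolution restores len(x)
    acts = [np.zeros(n_act) for _ in filters]
    # Conservative step size: sum_k ||d_k||_1^2 bounds the squared operator norm.
    step = 1.0 / sum(np.abs(d).sum() ** 2 for d in filters)
    for _ in range(n_iter):
        residual = synthesize(filters, acts) - x
        # The adjoint of full convolution with d is 'valid' correlation with d.
        acts = [
            soft_threshold(a - step * np.correlate(residual, d, mode="valid"),
                           step * lam)
            for d, a in zip(filters, acts)
        ]
    return acts

# Toy demo: two hypothetical "note" filters and a synthetic mixture.
t = np.arange(64)
filters = [np.exp(-t / 20.0) * np.sin(2 * np.pi * f * t) for f in (0.05, 0.12)]
filters = [d / np.linalg.norm(d) for d in filters]
true_acts = [np.zeros(200), np.zeros(200)]
true_acts[0][[10, 120]] = 1.0
true_acts[1][60] = 0.8
x = synthesize(filters, true_acts)

est_acts = csc_ista(x, filters)
# Each separated component is one filter convolved with its own activation.
sources = [np.convolve(d, a) for d, a in zip(filters, est_acts)]
```

In this sketch each separated component is the convolution of one filter with its estimated activation, so the components sum to the reconstruction of the mixture. A time-frequency method such as nonnegative matrix factorization would instead factor the magnitude spectrogram into spectral templates and activations.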

Citations

•Journal Article•10.1109/ACCESS.2019.2950423

Noncontact Heart Rate Measurement Based on an Improved Convolutional Sparse Coding Method Using IR-UWB Radar

Pengfei Wang, +7 more

- 30 Oct 2019

- IEEE Access

TL;DR: Results obtained indicate that the proposed approach is effective for the extraction of low-amplitude heartbeat signals from the respiration signal, and that it significantly improves the accuracy of heart rate evaluation.

38

Journal Article•10.1109/TNNLS.2019.2906074

Joint and Direct Optimization for Dictionary Learning in Convolutional Sparse Representation

Guan-Ju Peng

- 01 Feb 2020

- IEEE Transactions on Neural Networks and Learning Systems

TL;DR: This paper deals with the nonconvex, nonsmooth constraints of the original CDL directly using the modified forward–backward splitting approach, in which the coefficients and dictionary are simultaneously updated in each iteration, and proposes a novel parameter adaption scheme to increase the speed of the algorithm used to obtain a usable dictionary.

22

•Journal Article•10.1109/lsp.2021.3135196

Efficient ADMM-Based Algorithms for Convolutional Sparse Coding

- 01 Jan 2022

- IEEE Signal Processing Letters

TL;DR: In this paper, a convolutional least squares fitting subproblem is proposed to improve the efficiency of state-of-the-art sparse coding methods. However, the subproblem does not consider the global shift-invariant model.

11

•Proceedings Article•10.1109/MMSP48831.2020.9287146

Mixing-Specific Data Augmentation Techniques for Improved Blind Violin/Piano Source Separation

Ching-Yu Chiu, +4 more

- 21 Sep 2020

TL;DR: In this paper, the authors extend data augmentation methods to consider the more sophisticated mixing settings employed in modern music production, the relationship between the tracks to be combined, and factors of silence.

9

Journal Article•10.1109/TSP.2021.3064181

Null Space Component Analysis of One-Shot Single-Channel Source Separation Problem

Wen-Liang Hwang, +1 more

- 08 Mar 2021

- IEEE Transactions on Signal Processing

TL;DR: This paper characterizes the ambiguity of solutions to the source separation problem, and proposes a novel adaptive-operator-based approach to deriving solutions based on a combination of separation operators and domain-specific knowledge related to sources.

9

References

•Book

Distributed Optimization and Statistical Learning Via the Alternating Direction Method of Multipliers

Stephen Boyd, +4 more

- 23 May 2011

TL;DR: It is argued that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas.

20.5K

Journal Article•10.1038/44565

Learning the parts of objects by non-negative matrix factorization

Daniel D. Lee, +2 more

- 21 Oct 1999

- Nature

TL;DR: An algorithm for non-negative matrix factorization is demonstrated that is able to learn parts of faces and semantic features of text and is in contrast to other methods that learn holistic, not parts-based, representations.

14.2K

Journal Article•10.1137/S1064827596304010

Atomic Decomposition by Basis Pursuit

Scott Chen, +2 more

- 11 Dec 1998

- SIAM Journal on Scientific Computing

TL;DR: Basis Pursuit (BP) is a principle for decomposing a signal into an "optimal" superposition of dictionary elements, where optimal means having the smallest l1 norm of coefficients among all such decompositions.

11.3K

Learning parts of objects by non-negative matrix factorization

D. D. Lee

- 01 Jan 1999

TL;DR: In this article, non-negative matrix factorization is used to learn parts of faces and semantic features of text, which is in contrast to principal components analysis and vector quantization that learn holistic, not parts-based, representations.

9.6K

Journal Article•10.1111/J.1467-9868.2005.00532.X

Model selection and estimation in regression with grouped variables

Ming Yuan, +1 more

- 01 Feb 2006

- Journal of The Royal Statistical Society...

TL;DR: In this paper, instead of selecting factors by stepwise backward elimination, the authors focus on the accuracy of estimation and consider extensions of the lasso, the LARS algorithm and the non-negative garrotte for factor selection.

8.8K