Conditional Drums Generation using Compound Word Representations

doi:10.1007/978-3-031-03789-4_12

Open Access • Book Chapter

Dimos Makris, +3 more

- 09 Feb 2022

pp 179-194

11

TL;DR: This paper proposes a sequence-to-sequence architecture for conditional drums generation using a novel data encoding scheme inspired by the Compound Word representation, a tokenization process for sequential data. A bidirectional long short-term memory (BiLSTM) encoder receives information about the conditioning parameters (i.e., accompanying tracks and musical attributes), while a Transformer-based decoder with relative global attention produces the generated drum sequences.

Abstract: The field of automatic music composition has seen great progress in recent years, particularly with the introduction of Transformer-based architectures. When using any deep learning model that treats music as a sequence of events with multiple complex dependencies, the selection of a proper data representation is crucial. In this paper, we tackle the task of conditional drums generation using a novel data encoding scheme inspired by the Compound Word representation, a tokenization process for sequential data. We present a sequence-to-sequence architecture in which a Bidirectional Long Short-Term Memory (BiLSTM) Encoder receives information about the conditioning parameters (i.e., accompanying tracks and musical attributes), while a Transformer-based Decoder with relative global attention produces the generated drum sequences. We conducted experiments to thoroughly compare the effectiveness of our method to several baselines. Quantitative evaluation shows that our model is able to generate drum sequences whose statistical distributions and characteristics are similar to those of the training corpus. These characteristics include syncopation, compression ratio, and symmetry, among others. We also verified, through a listening test, that the generated drum sequences sound pleasant, natural, and coherent while they “groove” with the given accompaniment.
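The abstract describes the Compound Word representation only as a tokenization process for sequential data. As a rough illustration of the general idea (several attributes of one musical event packed into a single compound token instead of one token per attribute), the sketch below groups drum hits that share an onset. The field names, the 16-step grid, and the `to_compound_words` helper are assumptions made for this example and are not the encoding used in the chapter.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple, Tuple


class CompoundWord(NamedTuple):
    """One compound token: all attributes of a single onset (hypothetical fields)."""
    bar: int                 # index of the bar containing the onset
    position: int            # onset inside the bar, in grid steps (assumed 16 per bar)
    drums: Tuple[str, ...]   # drum instruments sounding at this onset, sorted


def to_compound_words(
    events: Iterable[Tuple[int, str]], steps_per_bar: int = 16
) -> List[CompoundWord]:
    """Group (absolute_step, drum_name) events into one compound word per onset."""
    grouped = defaultdict(set)
    for step, drum in events:
        grouped[step].add(drum)
    return [
        CompoundWord(step // steps_per_bar, step % steps_per_bar, tuple(sorted(drums)))
        for step, drums in sorted(grouped.items())
    ]


# Five single-attribute events collapse into three compound words.
events = [(0, "kick"), (0, "hihat"), (4, "snare"), (4, "hihat"), (16, "kick")]
words = to_compound_words(events)
for word in words:
    print(word)
```

Grouping in this way shortens the sequence a model has to process, because simultaneous attributes no longer occupy separate time steps. Which attributes are grouped, and how the conditioning tracks are encoded, is specific to the chapter and is not reproduced here.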

Citations

Journal Article•10.1109/ACCESS.2023.3244620

Music Deep Learning: Deep Learning Methods for Music Signal Processing—A Review of the State-of-the-Art

Lazaros Moysis, +9 more

- 01 Jan 2023

- IEEE Access

TL;DR: This paper reviews the most recent developments of deep learning in music signal processing. The two main applications discussed are music information retrieval and music generation, which can fit a range of musical styles.

15

Proceedings Article•10.48550/arXiv.2210.06007

JukeDrummer: Conditional Beat-aware Audio-domain Drum Accompaniment Generation via Transformer VQ-VAE

Yueh-Kao Wu, +2 more

- 12 Oct 2022

TL;DR: This paper proposes a model that generates a drum track in the audio domain to play along with a user-provided drum-free recording, and demonstrates that the model with beat information generates drum accompaniment that is rhythmically and stylistically consistent with the input audio.

7

Journal Article•10.48550/arXiv.2210.15638

LyricJam Sonic: A Generative System for Real-Time Composition and Musical Improvisation

Olga Vechtomova, +1 more

- 27 Oct 2022

TL;DR: In this article, a bi-modal AI-driven approach uses generated lyric lines to find matching audio clips from the artist's past studio recordings, and uses those clips to generate new lyric lines, which in turn are used to find other clips, thus creating a continuous and evolving stream of music and lyrics.

3

Journal Article•10.1007/978-3-031-29956-8_19

LyricJam Sonic: A Generative System for Real-Time Composition and Musical Improvisation

Thomas Clavier

- 01 Jan 2023

- Lecture Notes in Computer Science

TL;DR: In this paper, a bi-modal AI-driven approach uses generated lyric lines to find compatible audio clips from the artist's past studio recordings, and uses those clips to generate new lyric lines, which in turn are used to find other clips, thus creating a continuous and evolving stream of music and lyrics.

2

References

•Proceedings Article

Attention is All you Need

Ashish Vaswani, +7 more

- 12 Jun 2017

TL;DR: This paper proposes a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-French translation.

94.2K

•Proceedings Article•10.3115/V1/D14-1179

Learning Phrase Representations using RNN Encoder--Decoder for Statistical Machine Translation

Kyunghyun Cho, +8 more

- 01 Jan 2014

TL;DR: In this paper, the encoder and decoder of the RNN Encoder-Decoder model are jointly trained to maximize the conditional probability of a target sequence given a source sequence.

28.6K

Journal Article•10.1016/J.NEUNET.2005.06.042

2005 Special Issue: Framewise phoneme classification with bidirectional LSTM and other neural network architectures

Alex Graves, +1 more

- 01 Jun 2005

- Neural Networks

TL;DR: In this article, a modified, full gradient version of the LSTM learning algorithm was used for framewise phoneme classification, using the TIMIT database, and the results support the view that contextual information is crucial to speech processing, and suggest that bidirectional networks outperform unidirectional ones.

4.5K

•Posted Content

MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment

Hao-Wen Dong, +3 more

- 19 Sep 2017

- arXiv: Audio and Speech Processing

TL;DR: In this article, three models for symbolic multi-track music generation under the framework of generative adversarial networks (GANs) are proposed, referred to as the jamming model, the composer model and the hybrid model.

581

...
