Conditional Drums Generation using Compound Word Representations
Dimos Makris, Guo Zixun, Maximos A. Kaliakatsos-Papakostas, Dorien Herremans
- 09 Feb 2022
pp 179-194
TL;DR: This paper proposes a sequence-to-sequence architecture for conditional drums generation using a novel data encoding scheme inspired by the Compound Word representation, a tokenization process for sequential data. A bidirectional long short-term memory (BiLSTM) encoder receives the conditioning parameters (i.e., accompanying tracks and musical attributes), while a Transformer-based decoder with relative global attention produces the generated drum sequences.
Abstract: The field of automatic music composition has seen great progress in recent years, specifically with the invention of Transformer-based architectures. When using any deep learning model that treats music as a sequence of events with multiple complex dependencies, the selection of a proper data representation is crucial. In this paper, we tackle the task of conditional drums generation using a novel data encoding scheme inspired by the Compound Word representation, a tokenization process for sequential data. To this end, we present a sequence-to-sequence architecture where a Bidirectional Long Short-Term Memory (BiLSTM) encoder receives information about the conditioning parameters (i.e., accompanying tracks and musical attributes), while a Transformer-based decoder with relative global attention produces the generated drum sequences. We conducted experiments to thoroughly compare the effectiveness of our method to several baselines. Quantitative evaluation shows that our model is able to generate drum sequences that have similar statistical distributions and characteristics to the training corpus. These features include syncopation, compression ratio, and symmetry, among others. We also verified, through a listening test, that the generated drum sequences sound pleasant, natural, and coherent while they “groove” with the given accompaniment.
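The idea behind a compound-word style encoding can be illustrated with a small sketch. This is not the paper's actual encoding: the `DrumHit` and `CompoundWord` types, the 16th-note grid, and the drum names are all hypothetical and chosen only for illustration. The sketch shows the general principle of grouping several attributes of one musical moment into a single token instead of emitting them as separate tokens in a flat sequence.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple, Tuple


class DrumHit(NamedTuple):
    """One flat drum event (hypothetical format, assumed for this sketch)."""
    bar: int    # zero-based bar index
    step: int   # position inside the bar on an assumed 16th-note grid
    drum: str   # hypothetical drum class name, e.g. "kick"


class CompoundWord(NamedTuple):
    """All attributes of one onset grouped into a single token."""
    bar: int
    step: int
    drums: Tuple[str, ...]  # every drum struck at this onset, sorted


def to_compound_words(hits: Iterable[DrumHit]) -> List[CompoundWord]:
    """Group simultaneous hits so each onset becomes one compound token."""
    grouped = defaultdict(set)
    for hit in hits:
        grouped[(hit.bar, hit.step)].add(hit.drum)
    # Sort by (bar, step) so the output follows musical time.
    return [
        CompoundWord(bar, step, tuple(sorted(drums)))
        for (bar, step), drums in sorted(grouped.items())
    ]


hits = [
    DrumHit(0, 0, "kick"),
    DrumHit(0, 0, "hihat"),
    DrumHit(0, 4, "snare"),
    DrumHit(0, 4, "hihat"),
    DrumHit(1, 0, "kick"),
]
words = to_compound_words(hits)
for word in words:
    print(word)
```

In this toy example, five flat hits collapse into three compound tokens, which shortens the sequence a decoder would have to model. The fields used in the paper's own scheme may differ.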
Citations
Music Deep Learning: Deep Learning Methods for Music Signal Processing—A Review of the State-of-the-Art
Lazaros Moysis, L. Iliadis, Sotirios P. Sotiroudis, Achilles D. Boursianis, Maria Papadopoulou, Konstantinos-Iraklis D. Kokkinidis, Christos Volos, Panagiotis Sarigiannidis, Spiridon Nikolaidis, Sotirios K. Goudos
- 01 Jan 2023
TL;DR: This paper reviews the most recent developments of deep learning in music signal processing. The two main applications discussed are music information retrieval and music generation, which can fit a range of musical styles.
15
JukeDrummer: Conditional Beat-aware Audio-domain Drum Accompaniment Generation via Transformer VQ-VAE
Yueh-Kao Wu, Ching-Yu Chiu, Yi-Hsuan Yang
- 12 Oct 2022
TL;DR: This paper proposes a model that generates a drum track in the audio domain to play along with a user-provided drum-free recording, and demonstrates that the model with beat information generates drum accompaniment that is rhythmically and stylistically consistent with the input audio.
LyricJam Sonic: A Generative System for Real-Time Composition and Musical Improvisation
Olga Vechtomova, Gaurav Sahu
- 27 Oct 2022
TL;DR: A bi-modal AI-driven approach uses generated lyric lines to find matching audio clips from the artist's past studio recordings and uses those clips to generate new lyric lines, which in turn are used to find other clips, creating a continuous and evolving stream of music and lyrics.
3