MAID: A Conditional Diffusion Model for Long Music Audio Inpainting

Kaiyang Liu¹, Wendong Gan², Chenchen Yuan¹
¹ Sichuan University, College of Computer Science, Chengdu, China
² Wiz Holdings Pte Ltd, Singapore


1. Abstract

Recent work on long music audio inpainting has focused on unconditionally generating new segments to fill gaps in corrupted audio. However, the content of these generated segments may differ significantly from that of the original audio. To address this problem, we propose MAID (Music Audio Inpainting DDPM), a music audio inpainting model based on the DDPM (Denoising Diffusion Probabilistic Model). The model supports both unconditional and conditional inpainting of music audio: (a) in the unconditional task, MAID can inpaint gaps from 200 ms to 1600 ms long; (b) in the conditional task, it generates new segments whose content resembles the original segments, guided by the piano rolls corresponding to the gaps. Experiments show that MAID outperforms the baseline, GACELA. The source code is available at https://github.com/FlyToYourMooN/DDPM-Midi2Performance-Model
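The abstract does not say how MAID feeds the audio surrounding a gap into the reverse diffusion process. Purely as an illustration of DDPM-based gap filling, the sketch below assumes a generic replacement scheme (a simplified RePaint without its resampling step): at every reverse step the known samples are overwritten with a forward-noised copy of the original audio, so the noise predictor only has to fill the gap. The names `make_gap_mask`, `inpaint` and `eps_model`, the 16 kHz sample rate, the linear beta schedule and the `cond` argument (standing in for a piano-roll condition) are all hypothetical and are not taken from the MAID code.

```python
import numpy as np

SR = 16_000  # hypothetical sample rate; not taken from the MAID paper


def make_gap_mask(num_samples: int, sr: int, gap_start_s: float, gap_ms: float) -> np.ndarray:
    """Hypothetical helper: 1.0 where audio is known, 0.0 inside the gap."""
    mask = np.ones(num_samples, dtype=np.float32)
    start = int(round(gap_start_s * sr))
    length = int(round(gap_ms * sr / 1000.0))  # e.g. 200 ms at 16 kHz -> 3200 samples
    mask[start:start + length] = 0.0
    return mask


def inpaint(x_known, mask, eps_model, cond=None, T=50, seed=0):
    """Generic replacement-based DDPM inpainting (assumed scheme, not MAID's).

    eps_model(x_t, t, cond) is a hypothetical noise predictor; `cond` stands in
    for a conditioning signal such as a piano roll (None = unconditional).
    """
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, T)  # assumed linear noise schedule
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)

    x = rng.standard_normal(x_known.shape)  # start from pure Gaussian noise
    for t in reversed(range(T)):
        # Forward-noise the known audio to level t: sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps
        noised_known = (np.sqrt(alpha_bar[t]) * x_known
                        + np.sqrt(1.0 - alpha_bar[t]) * rng.standard_normal(x_known.shape))
        # Keep the noised context outside the gap; the model only shapes the gap
        x = mask * noised_known + (1.0 - mask) * x
        eps = eps_model(x, t, cond)
        # DDPM reverse-step mean: (x_t - beta_t / sqrt(1 - abar_t) * eps) / sqrt(alpha_t)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])
        # Add noise with variance beta_t on every step except the last
        x = mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape) if t > 0 else mean
    # Paste the clean context back so that only the gap is generated
    return mask * x_known + (1.0 - mask) * x


if __name__ == "__main__":
    audio = np.sin(2 * np.pi * 440 * np.arange(4 * SR) / SR)  # 4 s test tone
    gap_mask = make_gap_mask(audio.size, SR, gap_start_s=1.5, gap_ms=800)
    dummy_eps = lambda x, t, cond: np.zeros_like(x)  # placeholder for a trained network
    restored = inpaint(audio, gap_mask, dummy_eps, T=10)
    print(restored.shape, int((gap_mask == 0).sum()))
```

With a trained network, passing a piano roll as `cond` would loosely mirror the conditional setting described in the abstract and `cond=None` the unconditional one; the zero-valued placeholder predictor here only checks that the shapes and the context-preserving composite behave as intended.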

2. Comparison with the baseline

Index | Piano solo: Ground Truth, GACELA, MAID-uncond, MAID-cond | Wind quintet: Ground Truth, GACELA, MAID-uncond, MAID-cond | String quartet: Ground Truth, GACELA, MAID-uncond, MAID-cond | Violin solo: Ground Truth, GACELA, MAID-uncond, MAID-cond
1
2
3
4
5

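3. Inpainted samples with different gap lengths

In Tables 1–4, "Original" is the uncorrupted audio, and each gap length has three columns: gap (the audio with the gap), uncond (MAID unconditional inpainting) and cond (MAID inpainting conditioned on the piano roll of the gap).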

3.1 Piano solo


Table 1. Piano solo

Index | Original | 1600 ms: gap, uncond, cond | 1400 ms: gap, uncond, cond | 1200 ms: gap, uncond, cond | 1000 ms: gap, uncond, cond | 800 ms: gap, uncond, cond | 600 ms: gap, uncond, cond | 400 ms: gap, uncond, cond | 200 ms: gap, uncond, cond
1
2
3
4
5

3.2 Wind quintet


Table 2. Wind quintet

Index | Original | 1600 ms: gap, uncond, cond | 1400 ms: gap, uncond, cond | 1200 ms: gap, uncond, cond | 1000 ms: gap, uncond, cond | 800 ms: gap, uncond, cond | 600 ms: gap, uncond, cond | 400 ms: gap, uncond, cond | 200 ms: gap, uncond, cond
1
2
3
4
5

3.3 String quartet


Table 3. String quartet

Index | Original | 1600 ms: gap, uncond, cond | 1400 ms: gap, uncond, cond | 1200 ms: gap, uncond, cond | 1000 ms: gap, uncond, cond | 800 ms: gap, uncond, cond | 600 ms: gap, uncond, cond | 400 ms: gap, uncond, cond | 200 ms: gap, uncond, cond
1
2
3
4
5

3.4 Violin solo


Table 4. Violin solo

Index | Original | 1600 ms: gap, uncond, cond | 1400 ms: gap, uncond, cond | 1200 ms: gap, uncond, cond | 1000 ms: gap, uncond, cond | 800 ms: gap, uncond, cond | 600 ms: gap, uncond, cond | 400 ms: gap, uncond, cond | 200 ms: gap, uncond, cond
1
2
3
4
5