MAID: A Conditional Diffusion Model for Long Music Audio Inpainting
Contents
1. Abstract
Recent work on long music audio inpainting has focused on unconditionally generating new segments to fill corrupted audio segments. However, the content of the generated segments may differ significantly from the original. To solve this problem, we propose MAID (Music Audio Inpainting DDPM), a model for music audio inpainting based on the DDPM (Denoising Diffusion Probabilistic Model). The model is capable of both unconditional and conditional inpainting of music audio: (a) in the unconditional inpainting task, MAID can inpaint gaps with lengths from 200 ms to 1600 ms; (b) in the conditional inpainting task, the model can generate new segments with content similar to the original segments, based on the piano-rolls corresponding to the gaps. Experiments show that MAID performs better than the baseline. The source code is available at https://github.com/FlyToYourMooN/DDPM-Midi2Performance-Model.
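To make the task setup concrete, the following is a minimal sketch of how a gap of a given length could be cut from a waveform and how the piano-roll frames covering that gap could be selected as the conditioning signal. It involves no model and is not taken from the MAID source code: the function names, the 16 kHz sample rate, the 100 frames-per-second piano-roll resolution, and the gap position are all assumptions chosen for illustration.

```python
import numpy as np

def make_gap(audio, sample_rate, gap_ms, gap_start_ms):
    """Zero out gap_ms milliseconds of audio starting at gap_start_ms.

    Returns the corrupted copy and a boolean mask that is True inside the gap.
    (Hypothetical helper, not part of the MAID code base.)
    """
    start = sample_rate * gap_start_ms // 1000
    end = min(start + sample_rate * gap_ms // 1000, len(audio))
    mask = np.zeros(len(audio), dtype=bool)
    mask[start:end] = True
    corrupted = audio.copy()
    corrupted[mask] = 0.0
    return corrupted, mask

def piano_roll_for_gap(piano_roll, frames_per_second, gap_ms, gap_start_ms):
    """Slice the piano-roll frames (pitches x frames) that overlap the gap.

    (Hypothetical helper; integer arithmetic avoids float rounding.)
    """
    end_ms = gap_start_ms + gap_ms
    first = gap_start_ms * frames_per_second // 1000        # floor
    last = -((-end_ms * frames_per_second) // 1000)         # ceiling
    return piano_roll[:, first:last]

# Assumed settings, for illustration only.
SAMPLE_RATE = 16000
FRAMES_PER_SECOND = 100
GAP_START_MS = 1000

# Four seconds of a 440 Hz tone and a matching piano-roll with one held note.
t = np.arange(4 * SAMPLE_RATE) / SAMPLE_RATE
audio = np.sin(2 * np.pi * 440 * t).astype(np.float32)
piano_roll = np.zeros((128, 4 * FRAMES_PER_SECOND), dtype=np.uint8)
piano_roll[69, :] = 1  # MIDI pitch 69 (A4)

# The gap lengths listed on this page: 200 ms to 1600 ms in 200 ms steps.
examples = {}
for gap_ms in range(200, 1601, 200):
    corrupted, mask = make_gap(audio, SAMPLE_RATE, gap_ms, GAP_START_MS)
    condition = piano_roll_for_gap(piano_roll, FRAMES_PER_SECOND, gap_ms, GAP_START_MS)
    examples[gap_ms] = (corrupted, mask, condition)
```

In the unconditional task only the corrupted audio and the mask would be available to a model; in the conditional task the piano-roll slice would be supplied as well.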
2. Comparison with baseline
| | Piano solo | | | | Wind quintet | | | | String quartet | | | | Violin solo | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Index | Ground Truth | GACELA | MAID-uncond | MAID-cond | Ground Truth | GACELA | MAID-uncond | MAID-cond | Ground Truth | GACELA | MAID-uncond | MAID-cond | Ground Truth | GACELA | MAID-uncond | MAID-cond |
| 1 | ||||||||||||||||
| 2 | ||||||||||||||||
| 3 | ||||||||||||||||
| 4 | ||||||||||||||||
| 5 | ||||||||||||||||
3. Inpainted samples of different lengths
3.1 Piano solo
Table 1. Piano solo
| Piano solo | | 1600 ms | | | 1400 ms | | | 1200 ms | | | 1000 ms | | | 800 ms | | | 600 ms | | | 400 ms | | | 200 ms | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Index | Original | gap | uncond | cond | gap | uncond | cond | gap | uncond | cond | gap | uncond | cond | gap | uncond | cond | gap | uncond | cond | gap | uncond | cond | gap | uncond | cond |
| 1 | |||||||||||||||||||||||||
| 2 | |||||||||||||||||||||||||
| 3 | |||||||||||||||||||||||||
| 4 | |||||||||||||||||||||||||
| 5 | |||||||||||||||||||||||||
3.2 Wind quintet
Table 2. Wind quintet
| Wind quintet | | 1600 ms | | | 1400 ms | | | 1200 ms | | | 1000 ms | | | 800 ms | | | 600 ms | | | 400 ms | | | 200 ms | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Index | Original | gap | uncond | cond | gap | uncond | cond | gap | uncond | cond | gap | uncond | cond | gap | uncond | cond | gap | uncond | cond | gap | uncond | cond | gap | uncond | cond |
| 1 | |||||||||||||||||||||||||
| 2 | |||||||||||||||||||||||||
| 3 | |||||||||||||||||||||||||
| 4 | |||||||||||||||||||||||||
| 5 | |||||||||||||||||||||||||
3.3 String quartet
Table 3. String quartet
| String quartet | | 1600 ms | | | 1400 ms | | | 1200 ms | | | 1000 ms | | | 800 ms | | | 600 ms | | | 400 ms | | | 200 ms | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Index | Original | gap | uncond | cond | gap | uncond | cond | gap | uncond | cond | gap | uncond | cond | gap | uncond | cond | gap | uncond | cond | gap | uncond | cond | gap | uncond | cond |
| 1 | |||||||||||||||||||||||||
| 2 | |||||||||||||||||||||||||
| 3 | |||||||||||||||||||||||||
| 4 | |||||||||||||||||||||||||
| 5 | |||||||||||||||||||||||||
3.4 Violin solo
Table 4. Violin solo
| Violin solo | | 1600 ms | | | 1400 ms | | | 1200 ms | | | 1000 ms | | | 800 ms | | | 600 ms | | | 400 ms | | | 200 ms | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Index | Original | gap | uncond | cond | gap | uncond | cond | gap | uncond | cond | gap | uncond | cond | gap | uncond | cond | gap | uncond | cond | gap | uncond | cond | gap | uncond | cond |
| 1 | |||||||||||||||||||||||||
| 2 | |||||||||||||||||||||||||
| 3 | |||||||||||||||||||||||||
| 4 | |||||||||||||||||||||||||
| 5 | |||||||||||||||||||||||||