A Survey of Data Augmentation for Audio Classification

Lucas Ferreira-Paiva; Elizabeth Alfaro-Espinoza; Vinicius M. Almeida; Leonardo B. Felix; Rodolpho V. A. Neves

doi:10.20906/CBA2022/3469

Lucas Ferreira-Paiva: Núcleo Interdisciplinar de Análise de Sinais (NIAS), Universidade Federal de Viçosa, MG
Elizabeth Alfaro-Espinoza: Programa de Pós-Graduação em Bioinformática, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, MG
Vinicius M. Almeida: Centro de Ciências Exatas e Tecnológicas - Engenharia de Computação, Centro Universitário de Viçosa, MG
Leonardo B. Felix: Núcleo Interdisciplinar de Análise de Sinais (NIAS), Universidade Federal de Viçosa, MG
Rodolpho V. A. Neves: Núcleo Interdisciplinar de Análise de Sinais (NIAS), Universidade Federal de Viçosa, MG

Keywords: Sound systems, Acoustic noise, Data processing, Artificial neural networks, Multimedia systems

Abstract

Data augmentation is one of the most effective methods for reducing overfitting in deep learning models for audio classification. However, the wide range of available techniques, combined with a limited understanding of which are most efficient, can lead to substantial costs in time and processing power. This survey covers numerous techniques, tools, and datasets for offline data augmentation, to assist in selecting and implementing augmentation strategies that improve audio classification models in Environmental Sound Classification, Music Information Retrieval, and Automatic Speech Recognition. Finally, we present a short review of papers that apply data augmentation in Environmental Sound Classification. The review indicates that spectrogram and audio augmentation have considerable potential for improving the performance of convolutional models, especially on small datasets, with accuracy increases of up to 30%. Depending on the application, however, the accuracy gains may be insufficient to justify the additional computational burden. Furthermore, the use of image data augmentation techniques is unsuitable for audio data.