Application of a Neural Model for Speech Recognition in Audio with Radio Communication Characteristics

  • Lucas Grigoleto Scart, Universidade Federal do Espírito Santo, ES
  • Raquel Frizera Vassallo, Universidade Federal do Espírito Santo, ES
  • Jorge Leonid Aching Samatelo, Universidade Federal do Espírito Santo, ES
Keywords: neural networks, automatic speech recognition, regularization, dataset construction, radio communication

Abstract

Automatic speech recognition is essential for machines to understand the content of words and sentences in a spoken language. Deep neural networks are the focus of current research in artificial intelligence: they obtain superior results compared with classical models and can learn features from unlabeled data. Despite significant advances in applying these models to languages with a low volume of labeled data, the domain mismatch between training and inference data remains a barrier to the practical use of speech recognition models. This article proposes a methodology for simulating radio communication characteristics, enabling the construction of datasets oriented to the robust training of neural models. The simulation was carried out through a software implementation of a narrowband FM transmitter and receiver, together with a noisy communication channel. A state-of-the-art speech recognition architecture is also implemented and trained using advanced regularization techniques. When training with the simulated data, a relative reduction of 51.7% in the character error rate was observed at the most challenging noise level (SNR of 0 dB), with a similar decrease at all noise levels. We expect the methodology developed in this work to open the way for more robust speech recognition models with future applications in radio communication.
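To make the simulation idea concrete, the following is a minimal sketch of a software FM link with a noisy channel, written in Python with NumPy. It is not the authors' implementation, which the abstract does not detail. It assumes a complex-baseband representation, additive white Gaussian noise as the channel model, and a peak frequency deviation of 2.5 kHz (a value often associated with narrowband FM, chosen here only for illustration). It omits the filtering, pre-emphasis, and other stages a real transmitter and receiver would include. All function names and parameters are hypothetical.

```python
import numpy as np


def fm_modulate(audio, fs, max_dev_hz):
    """Complex-baseband FM: the phase is the running integral of the message."""
    phase = 2.0 * np.pi * max_dev_hz * np.cumsum(audio) / fs
    return np.exp(1j * phase)


def add_awgn(iq, snr_db, rng):
    """Add complex white Gaussian noise so the channel SNR equals snr_db."""
    signal_power = np.mean(np.abs(iq) ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    # Split the noise power equally between the real and imaginary parts.
    scale = np.sqrt(noise_power / 2.0)
    noise = scale * (rng.standard_normal(iq.shape)
                     + 1j * rng.standard_normal(iq.shape))
    return iq + noise


def fm_demodulate(iq, fs, max_dev_hz):
    """Recover the message from the phase step between consecutive samples."""
    dphi = np.angle(iq[1:] * np.conj(iq[:-1]))
    return dphi * fs / (2.0 * np.pi * max_dev_hz)


def simulate_radio(audio, fs, snr_db, max_dev_hz=2500.0, seed=0):
    """Pass audio (scaled to [-1, 1]) through the simulated FM link.

    The output is one sample shorter than the input because the
    demodulator works on sample-to-sample phase differences.
    """
    rng = np.random.default_rng(seed)
    iq = fm_modulate(audio, fs, max_dev_hz)
    noisy = add_awgn(iq, snr_db, rng)
    return fm_demodulate(noisy, fs, max_dev_hz)
```

Under these assumptions, calling `simulate_radio(audio, fs=16000, snr_db=0.0)` on a clean utterance would yield a degraded copy that could be paired with the original transcript, which is the kind of training example the proposed methodology aims to produce.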
Published
2022-10-19
Section
Articles