Prototype for Correlating Lip-Reading Events with Temporal Measurements and Subtitles

Authors

  • João Marcelo S. Souza, Centro de Supercomputação SENAI CIMATEC, Salvador, Bahia, Brazil; Programa de Pós-Graduação em Engenharia Elétrica, UFBA, Salvador, Bahia, Brazil
  • Caroline da Silva M. Alves, Centro de Supercomputação SENAI CIMATEC, Salvador, Bahia, Brazil; Programa de Pós-Graduação em Engenharia Elétrica, UFBA, Salvador, Bahia, Brazil
  • Orlando Mota Pires, Centro de Supercomputação SENAI CIMATEC, Salvador, Bahia, Brazil
  • Jés de Jesus F. Cerqueira, Programa de Pós-Graduação em Engenharia Elétrica, UFBA, Salvador, Bahia, Brazil
  • Wagner Luiz A. de Oliveira, Programa de Pós-Graduação em Engenharia Elétrica, UFBA, Salvador, Bahia, Brazil

Keywords:

lip reading, visual extraction, standardized measurements, subtitles, labeling

Abstract

Computer vision has become a prominent scientific field in recent years because of the many applications enabled by artificial neural networks. Training these networks, however, requires high-quality labeled data, which is harder to obtain for some applications, especially those that evaluate the temporal evolution of movements. This research therefore proposes a prototype methodology that correlates the temporal behavior of mouth movements, captured as measures extracted visually from video streams, with speech and lip-reading events labeled multimodally from video subtitles. Using data extracted from public sources, the experiments gave indications that a word repeated within the same video can be represented as a time series. They also exposed new challenges concerning the quality of synchronization between video frames and the start and end times of words in the subtitles. In addition, the same word may be pronounced with different durations, which makes samples harder to compare. The results support the future development of time-series datasets that represent words visually through standardized measures, using captions from diverse videos to widen the range of sample classes. Such datasets would in turn contribute to applications trained on time series.
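
As an illustration of the idea summarized above, the following minimal sketch cuts a per-frame mouth measure into word-level segments using subtitle start and end times, then resamples each segment to a common length so that occurrences of different durations can be compared. It is not the authors' implementation: the function names, the constant frame rate, the single scalar measure per frame, and the use of linear interpolation are all assumptions made for this example.

```python
from typing import List


def slice_by_word(measure: List[float], fps: float,
                  start_s: float, end_s: float) -> List[float]:
    """Return the per-frame samples between a word's subtitle start and end.

    Assumes (hypothetically) a constant frame rate and one measure per frame.
    """
    first = max(0, int(round(start_s * fps)))
    last = min(len(measure), int(round(end_s * fps)))
    return measure[first:last]


def resample(series: List[float], n: int) -> List[float]:
    """Linearly interpolate a series to exactly n points.

    Length normalization is an assumed choice for comparing the same word
    pronounced with different durations.
    """
    if not series or n < 1:
        raise ValueError("need a non-empty series and n >= 1")
    if len(series) == 1 or n == 1:
        return [series[0]] * n
    scale = (len(series) - 1) / (n - 1)
    out = []
    for i in range(n):
        pos = i * scale
        lo = int(pos)
        hi = min(lo + 1, len(series) - 1)
        frac = pos - lo
        out.append(series[lo] * (1 - frac) + series[hi] * frac)
    return out


# Synthetic per-frame measure (illustrative values only, not real data).
mouth_opening = [float(i) for i in range(20)]
fps = 10.0

# Two hypothetical occurrences of the same word with different durations.
occurrences = [(0.2, 0.6), (1.0, 1.8)]
samples = [resample(slice_by_word(mouth_opening, fps, s, e), 5)
           for s, e in occurrences]
```

Both entries of `samples` have five points, so they can be compared point by point even though the second occurrence lasts twice as long. The sketch takes the subtitle timestamps as given, so any synchronization error between subtitles and frames carries straight into the segments.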

Published

2024-10-18

Section

Articles