Markov Model-Based Reinforcement Learning for Resource Allocation in a Multicarrier System with Millimeter Waves

  • Daniel P. Q. Carneiro, EMC, Universidade Federal de Goiás
  • Álisson A. Cardoso, EMC, Universidade Federal de Goiás
  • Flávio Henrique T. Vieira, EMC, Universidade Federal de Goiás
Keywords: Markov Decision Process, OFDM, Learning, varying channels, channel model


This article presents a resource allocation algorithm based on reinforcement learning for a multicarrier communication system with multiple users, accounting for fading and multipath effects in millimeter-wave transmission. To this end, the communication system is described by a Markovian model whose states are given by the queue states of the buffers and the states of the channels. For the resource allocation algorithm, we introduce a new reward function for the Q-learning reinforcement learning algorithm. Simulation results show that the proposed resource scheduling algorithm generally improves the performance of the considered communication system, increasing its throughput and reducing the number of lost packets. Comparisons with variations of the Q-learning algorithm from the literature also show that the proposed reward function makes user scheduling and resource sharing more efficient.
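
To make the modelling idea concrete, the following is a minimal sketch of tabular Q-learning over a state formed by a buffer occupancy and a channel state. It is not the algorithm or reward function proposed in the article: the single-user setting, the two-state channel, the arrival probability, the buffer size, the actions, and the reward (packets sent minus packets lost) are all illustrative assumptions, and the names `step` and `q_learning` are hypothetical.

```python
import random

# All constants below are illustrative assumptions, not values from the article.
N_QUEUE = 4       # buffer occupancy levels 0..3
N_CHANNEL = 2     # channel states: 0 = bad, 1 = good
ACTIONS = (0, 1)  # 0 = stay idle, 1 = try to transmit one packet


def step(state, action, rng):
    """Toy transition for a (queue, channel) state; dynamics are assumed."""
    queue, channel = state
    # A packet is delivered only if we transmit, have data, and the channel is good.
    sent = 1 if (action == 1 and queue > 0 and channel == 1) else 0
    queue -= sent
    # One packet arrives with probability 0.5; it is lost if the buffer is full.
    arrival = 1 if rng.random() < 0.5 else 0
    lost = 1 if (arrival and queue == N_QUEUE - 1) else 0
    queue = min(queue + arrival, N_QUEUE - 1)
    channel = rng.randrange(N_CHANNEL)  # memoryless channel, for simplicity
    reward = sent - lost                # placeholder reward, not the article's
    return (queue, channel), reward


def q_learning(episodes=200, horizon=50, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Standard epsilon-greedy tabular Q-learning on the toy model above."""
    rng = random.Random(seed)
    q = {((b, c), a): 0.0
         for b in range(N_QUEUE) for c in range(N_CHANNEL) for a in ACTIONS}
    for _ in range(episodes):
        state = (0, rng.randrange(N_CHANNEL))
        for _ in range(horizon):
            if rng.random() < eps:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            nxt, reward = step(state, action, rng)
            best_next = max(q[(nxt, a)] for a in ACTIONS)
            # Q-learning update toward the one-step bootstrapped target.
            q[(state, action)] += alpha * (reward + gamma * best_next
                                           - q[(state, action)])
            state = nxt
    return q


if __name__ == "__main__":
    table = q_learning()
    for (state, action), value in sorted(table.items()):
        print(state, action, round(value, 3))
```

In a multiuser multicarrier setting the state would combine the buffer and channel states of all users and the action would assign subcarriers to users, so the table grows quickly; the sketch only shows how a reward defined on throughput and packet loss enters the update.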