Application of Reinforcement Learning for Orientation and Position Control of a 6-Degree-of-Freedom Robotic Manipulator

  • Felipe R. Campos Programa de Pós-Graduação em Instrumentação, Controle e Automação de Processos de Mineração, Universidade Federal de Ouro Preto, MG; Instituto Tecnológico Vale, Ouro Preto, MG
  • Aline X. Fidêncio Faculty of Electrical Engineering and Information Technology, Ruhr-University Bochum
  • Gustavo Pessin Instituto Tecnológico Vale, Ouro Preto, MG
  • Gustavo M. Freitas Departamento de Engenharia Elétrica, Universidade Federal de Minas Gerais, Belo Horizonte, MG
Keywords: Robotics, Machine Learning, Reinforcement Learning, DDPG, PPO


Applications with autonomous robots play an important role in industry and in everyday life. Among them, the tasks of manipulating and moving objects stand out due to the wide variety of possible applications. In static, known environments these tasks can be implemented through logic planned by the developer, but this approach is not feasible in dynamic environments. Machine learning techniques, such as Reinforcement Learning (RL) algorithms, seek to replace pre-defined programming by teaching the robot how to act. This paper presents the implementation of two RL algorithms, Deep Deterministic Policy Gradient (DDPG) and Proximal Policy Optimization (PPO), for orientation and position control of a 6-degree-of-freedom (6-DoF) robotic manipulator. The results showed that DDPG converged faster in simpler tasks, but as the complexity of the problem increases it may fail to reach satisfactory behavior. PPO, on the other hand, can solve more complex problems; however, it limits the rate of convergence toward the best result in order to avoid learning instability.
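The stability-versus-speed trade-off mentioned for PPO comes from its clipped surrogate objective, which caps how much a single update can change the policy. As a conceptual illustration (not code from the paper, and independent of any specific RL library), the sketch below implements the standard PPO clipping rule for one state-action sample; `ratio` is the probability ratio between the new and old policies, and `eps` is the clipping parameter:

```python
def ppo_clip_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Clipped surrogate objective for a single sample, as in the PPO paper.

    ratio     -- pi_new(a|s) / pi_old(a|s)
    advantage -- estimated advantage A(s, a)
    eps       -- clipping range (0.2 is a common default)
    """
    # Clamp the probability ratio to [1 - eps, 1 + eps].
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    # Taking the minimum removes the incentive to move the policy
    # further than the clipping range allows in a single update.
    return min(ratio * advantage, clipped_ratio * advantage)


# With a positive advantage, a ratio of 1.5 is clipped to 1.2,
# so the objective (and hence the gradient incentive) is capped.
print(ppo_clip_objective(1.5, 1.0))   # capped at 1.2 * 1.0
print(ppo_clip_objective(0.5, -1.0))  # capped at 0.8 * -1.0
```

This capping is what slows PPO's convergence toward the best result while protecting it from destructive policy updates; DDPG, which directly follows the gradient of a deterministic policy, has no such brake.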