Comparing Action Aggregation Strategies in Deep Reinforcement Learning with Continuous Action

Renata Garcia Oliveira; Wouter Caarls

doi:10.48011/asba.v2i1.1547

Renata Garcia Oliveira, Pontifical Catholic University of Rio de Janeiro
Wouter Caarls, Pontifical Catholic University of Rio de Janeiro


Keywords: Machine learning, Reinforcement learning, Deep reinforcement learning, Hyperparameter optimization, Ensemble algorithms

Abstract

Deep Reinforcement Learning has shown great promise for learning continuous control policies. For complex tasks, however, Reinforcement Learning with minimal human intervention remains a challenge. This article studies how ensemble learning methods can improve performance and stabilize the learning curve. Multiple agents learn a combined parameterized action function in a single environment, with the aim of learning well regardless of the quality of the parameterization. The action ensemble methods were applied to three environments: pendulum swing-up, cart-pole, and half-cheetah. The results show that action ensembles can improve performance relative to the grid search technique. As a further contribution, the article compares the effectiveness of the aggregation techniques, considering the use of either the separate policies or the combined policy during training. The combined policy yields better learning results when used with the data center aggregation strategy.
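The abstract does not say how the aggregation rules combine the agents' actions. The following is a minimal sketch, assuming each ensemble member proposes one continuous action vector per step. It assumes, as a labeled hypothesis rather than the authors' definition, that "data center" selects the proposed action with the smallest summed Euclidean distance to all other proposals (a medoid). The function names are hypothetical; plain averaging is shown for contrast.

```python
import math
from typing import List, Sequence

Action = List[float]


def mean_aggregation(actions: Sequence[Sequence[float]]) -> Action:
    """Average every action dimension over all ensemble members."""
    if not actions:
        raise ValueError("need at least one action")
    n, dim = len(actions), len(actions[0])
    return [sum(a[d] for a in actions) / n for d in range(dim)]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two action vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def data_center_aggregation(actions: Sequence[Sequence[float]]) -> Action:
    """ASSUMED 'data center' rule (hypothetical, not confirmed by the source):
    return the member action whose summed distance to all proposals is
    smallest, i.e. the medoid of the proposed actions."""
    if not actions:
        raise ValueError("need at least one action")
    best = min(actions, key=lambda a: sum(euclidean(a, b) for b in actions))
    return list(best)


if __name__ == "__main__":
    # Five agents propose a 1-D torque; one proposal is an outlier.
    proposals = [[0.1], [0.2], [0.3], [0.4], [2.0]]
    print(mean_aggregation(proposals))         # pulled toward the outlier
    print(data_center_aggregation(proposals))  # one of the clustered proposals
```

In this sketch, the mean is pulled toward the outlying proposal, while the medoid-style rule always returns an action that some agent actually proposed. This is one plausible reason a center-based strategy could behave more stably, though the sketch makes no claim about the paper's measured results.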