Hierarchical DDPG for manipulator motion planning in dynamic environments

Date

2022-08-03

Authors

Um, Dugan
Nethala, Prasad
Shin, Hocheol

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

In this paper, a hierarchical reinforcement learning (HRL) architecture, termed the “Hierarchical Deep Deterministic Policy Gradient (HDDPG)”, is proposed and studied. The HDDPG uses a manager-and-worker formation similar to other HRL structures. Unlike others, however, the HDDPG lets the workers and the manager share an identical environment and state, while each Deep Deterministic Policy Gradient (DDPG) agent requires its own reward system. As a result, the HDDPG allows easy structural expansion, with the manager selecting a worker’s action probabilistically. Owing to this innate structural advantage, the HDDPG is well suited to building a general AI for complex time-horizon tasks with various conflicting sub-goals. Experimental results demonstrate its usefulness on a manipulator motion planning problem in a dynamic environment, where path planning and collision avoidance conflict with each other. The proposed HDDPG is compared with a hierarchical abstract machine (HAM) and a single DDPG for performance evaluation. The results show that the HDDPG achieved more than a 40% reward gain and more than twice the reward improvement rate. Another important feature of the proposed HDDPG is its biased manager training capability: by adding a preference factor to each worker, the manager can be trained to prefer a certain worker so as to achieve a better success rate for a specific objective when needed.

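The manager/worker formation described above can be illustrated with a short sketch: every agent observes the same shared state, each worker keeps its own reward function, and the manager chooses a worker probabilistically, optionally biased by a preference factor. The class names, the stub actor, the softmax-style selection, and the reward functions below are hypothetical placeholders for illustration only; they are not the authors' trained DDPG networks.

import numpy as np

# Minimal sketch of the HDDPG manager/worker layout (hypothetical placeholders).

class Worker:
    """One DDPG-style agent with its own reward function; the actor is a stub."""
    def __init__(self, name, reward_fn):
        self.name = name
        self.reward_fn = reward_fn  # unique reward system per worker

    def act(self, state):
        # Placeholder for the DDPG actor's continuous action output,
        # e.g. a joint-velocity command for the manipulator.
        return -0.1 * state

class Manager:
    """Selects a worker from the shared state, optionally biased by a preference factor."""
    def __init__(self, workers, preference=None):
        self.workers = workers
        self.preference = np.ones(len(workers)) if preference is None else np.asarray(preference, dtype=float)

    def select(self, state):
        # Placeholder scoring; a real manager would learn its own critic.
        scores = np.array([w.reward_fn(state) for w in self.workers]) * self.preference
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        return int(np.random.choice(len(self.workers), p=probs))

# Shared state, e.g. joint angles plus obstacle distances (hypothetical values).
state = np.array([0.4, -0.2, 0.1])

workers = [
    Worker("path_planning", lambda s: -np.linalg.norm(s)),             # drive toward the goal
    Worker("collision_avoidance", lambda s: float(np.min(np.abs(s)))),  # keep obstacle clearance
]
manager = Manager(workers, preference=[1.0, 1.5])  # preference factor biases worker selection

chosen = manager.select(state)
action = workers[chosen].act(state)
print(workers[chosen].name, action)

In this toy form the preference factor simply scales the manager's selection scores; in the paper it is described as biasing the manager's training toward a preferred worker to improve the success rate for a specific objective.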

Description

Keywords

RL (reinforcement learning), HRL (hierarchical reinforcement learning), DDPG (deep deterministic policy gradient), HDDPG (hierarchical deep deterministic policy gradient), HAM (hierarchical abstract machines)

Sponsorship

Rights

Attribution 4.0 International

Citation

Um, D., Nethala, P., & Shin, H. (2022). Hierarchical DDPG for Manipulator Motion Planning in Dynamic Environments. AI, 3(3), 645–658. MDPI AG. Retrieved from http://dx.doi.org/10.3390/ai3030037