Title: Hierarchical DDPG for Manipulator Motion Planning in Dynamic Environments
Authors: Um, Dugan; Nethala, Prasad; Shin, Hocheol
Date issued: 2022-08-03
Date available: 2022-09-07
Citation: Um, D., Nethala, P., & Shin, H. (2022). Hierarchical DDPG for Manipulator Motion Planning in Dynamic Environments. AI, 3(3), 645–658. MDPI AG. Retrieved from http://dx.doi.org/10.3390/ai3030037
Handle: https://hdl.handle.net/1969.6/93950
DOI: https://doi.org/10.3390/ai3030037
Type: Article
Language: en-US
License: Attribution 4.0 International (http://creativecommons.org/licenses/by/4.0/)
Keywords: reinforcement learning (RL); hierarchical reinforcement learning (HRL); deep deterministic policy gradient (DDPG); hierarchical deep deterministic policy gradient (HDDPG); hierarchical abstract machines (HAM)

Abstract: In this paper, a hierarchical reinforcement learning (HRL) architecture, namely a "Hierarchical Deep Deterministic Policy Gradient (HDDPG)", is proposed and studied. The HDDPG uses a manager–worker formation similar to other HRL structures. Unlike others, however, the HDDPG shares an identical environment and state among the workers and the manager, while a unique reward system is required for each Deep Deterministic Policy Gradient (DDPG) agent. The HDDPG therefore allows easy structural expansion, with the manager probabilistically selecting a worker's action. Owing to this structural advantage, the HDDPG has merit for building a general AI that can handle complex time-horizon tasks with various conflicting sub-goals. The experimental results demonstrate its usefulness on a manipulator motion planning problem in a dynamic environment, where path planning and collision avoidance conflict with each other. The proposed HDDPG is compared with a HAM and a single DDPG for performance evaluation. The results show that the HDDPG achieves a reward gain of more than 40% and more than twice the reward improvement rate. Another important feature of the proposed HDDPG is its biased manager training capability: by adding a preference factor to each worker, the manager can be trained to prefer a certain worker to achieve a better success rate for a specific objective if needed.
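The abstract gives no implementation details, but the manager–worker selection it describes can be illustrated with a minimal sketch. The Python snippet below is a hypothetical illustration, not the authors' code: the worker names, the `preference` factors, and the softmax-style selection are assumptions, and in the paper's setting the manager's scores would come from a trained DDPG-style network evaluated on the shared state rather than from random values.

```python
import numpy as np

# Hypothetical sketch of the manager-worker selection described in the abstract:
# the manager scores each worker (e.g., a path-planning policy and a
# collision-avoidance policy), applies a per-worker preference factor to bias
# training toward a specific objective, and picks a worker probabilistically.

WORKERS = ["path_planning", "collision_avoidance"]  # assumed worker policies
preference = np.array([1.0, 1.3])                   # assumed preference factors

def select_worker(manager_scores: np.ndarray, rng: np.random.Generator) -> int:
    """Probabilistic worker selection with preference biasing (softmax)."""
    biased = manager_scores * preference
    probs = np.exp(biased - biased.max())
    probs /= probs.sum()
    return int(rng.choice(len(WORKERS), p=probs))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in scores; in practice these would be the manager network's
    # outputs for the current shared state.
    scores = rng.normal(size=len(WORKERS))
    idx = select_worker(scores, rng)
    print(f"Manager selected worker: {WORKERS[idx]}")
```

Under this reading, raising a worker's preference factor makes the manager more likely to dispatch that worker, which is one plausible way the "biased manager training" toward a specific objective could be realized.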