Hierarchical DDPG for manipulator motion planning in dynamic environments

dc.contributor.authorUm, Dugan
dc.contributor.authorNethala, Prasad
dc.contributor.authorShin, Hocheol
dc.creator.orcidhttps://orcid.org/0000-0002-7489-2991
dc.date.accessioned2022-09-07T21:35:29Z
dc.date.available2022-09-07T21:35:29Z
dc.date.issued2022-08-03
dc.description.abstractIn this paper, a hierarchical reinforcement learning (HRL) architecture, namely a “Hierarchical Deep Deterministic Policy Gradient (HDDPG)”, is proposed and studied. An HDDPG utilizes a manager and worker formation similar to other HRL structures. Unlike others, however, the HDDPG enables workers and the manager to share an identical environment and state, while a unique reward system is required for each Deep Deterministic Policy Gradient (DDPG) agent. The HDDPG therefore allows easy structural expansion, with the manager probabilistically selecting a worker's action. Due to this innate structural advantage, the HDDPG has merit for building a general AI that deals with complex time-horizon tasks involving various conflicting sub-goals. The experimental results demonstrate its usefulness in a manipulator motion planning problem in a dynamic environment, where path planning and collision avoidance conflict with each other. The proposed HDDPG is compared with a hierarchical abstract machine (HAM) and a single DDPG for performance evaluation. The results show that the HDDPG achieved more than a 40% reward gain and more than twice the reward improvement rate. Another important feature of the proposed HDDPG is its biased manager training capability: by adding a preference factor to each worker, the manager can be trained to prefer a certain worker, achieving a better success rate for a specific objective when needed.
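To make the architecture described in the abstract concrete, below is a minimal Python sketch of the manager/worker formation. This is an illustrative assumption, not the authors' implementation: the class names (WorkerDDPG, Manager), the softmax-style probabilistic worker selection, and the preference-factor weighting are hypothetical; real workers would be full DDPG agents with actor/critic networks, each trained on its own reward function over the shared environment state.

import numpy as np

# Illustrative sketch only (hypothetical names); not the paper's code.
class WorkerDDPG:
    """Stand-in for a DDPG agent; actor/critic networks omitted."""
    def __init__(self, name, reward_fn):
        self.name = name
        self.reward_fn = reward_fn  # unique reward per worker, per the abstract

    def act(self, state):
        # A real worker would query its trained actor network here.
        return np.zeros(6)  # placeholder joint-velocity command

class Manager:
    """Shares the workers' state and probabilistically samples one worker to act."""
    def __init__(self, workers, preference=None):
        self.workers = workers
        # Preference factors bias selection toward chosen workers (assumed form).
        self.preference = preference if preference is not None else np.ones(len(workers))

    def select(self, state, scores):
        # scores: the manager's learned per-worker value estimates for this state.
        logits = np.asarray(scores) * self.preference
        probs = np.exp(logits) / np.exp(logits).sum()  # softmax over workers
        return self.workers[np.random.choice(len(self.workers), p=probs)]

# Usage with the paper's two conflicting sub-goals (reward functions are toy stand-ins).
workers = [
    WorkerDDPG("path_planning", reward_fn=lambda s: -np.linalg.norm(s)),
    WorkerDDPG("collision_avoidance", reward_fn=lambda s: float(s.min() > 0.1)),
]
manager = Manager(workers, preference=np.array([1.0, 1.2]))  # slight bias to avoidance
state = np.random.rand(6)                                    # shared environment state
action = manager.select(state, scores=[0.4, 0.7]).act(state)

With a preference factor above 1.0 on a worker, the manager's selection probabilities shift toward that worker, which corresponds to the biased manager training the abstract mentions.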
dc.identifier.citationUm, D., Nethala, P., & Shin, H. (2022). Hierarchical DDPG for Manipulator Motion Planning in Dynamic Environments. AI, 3(3), 645–658. MDPI AG. Retrieved from http://dx.doi.org/10.3390/ai3030037
dc.identifier.doihttps://doi.org/10.3390/ai3030037
dc.identifier.urihttps://hdl.handle.net/1969.6/93950
dc.language.isoen_US
dc.rightsAttribution 4.0 International
dc.rights.urihttp://creativecommons.org/licenses/by/4.0/
dc.subjectreinforcement learning (RL)
dc.subjecthierarchical reinforcement learning (HRL)
dc.subjectdeep deterministic policy gradient (DDPG)
dc.subjecthierarchical deep deterministic policy gradient (HDDPG)
dc.subjecthierarchical abstract machines (HAM)
dc.titleHierarchical DDPG for manipulator motion planning in dynamic environments
dc.typeArticle

Files

Original bundle

Name: Hierarchical DDPG for Manipulator Motion Planning in Dynamic Environments.pdf
Size: 6.13 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.72 KB
Format: Item-specific license agreed upon to submission