Autonomous harvesting via hierarchical reinforcement learning in dynamic environments

Date

2023-05

Authors

Nethala, Prasad

Journal Title

Journal ISSN

Volume Title

Publisher

DOI

Abstract

Smart farming not only requires geospatial navigation but also uses various microprocessors and sensors to perform functions such as controlling temperature and irrigation systems. Advanced phenotyping modalities such as IoT and digital twin technologies have improved agricultural productivity to an unprecedented extent. However, crop cultivation and harvesting technology has yet to advance far enough to take full advantage of data-driven crop production. Farming areas are often unstructured, with dynamic objects such as human workers and farm machinery, so a smart harvesting robot requires autonomous navigation and obstacle avoidance. Because goal-reaching and obstacle-avoidance are conflicting objectives, especially in a dynamic environment, harvesting is a challenging task for a robotic system. In this thesis, a novel Hierarchical Reinforcement Learning architecture is proposed: a robust, multitask-capable AI model that enables an autonomous mobile manipulator to achieve terrain coverage while avoiding collisions with dynamic objects. The manipulator is assumed to be equipped with sensitive skin for omnidirectional sensing. The proposed Hierarchical Reinforcement Learning architecture is instantiated with both the Deep Deterministic Policy Gradient (DDPG) algorithm and the Proximal Policy Optimization (PPO) algorithm. As a result, two hierarchical variants are developed, Hierarchical Deep Deterministic Policy Gradient (HDDPG) and Hierarchical Proximal Policy Optimization (HPPO), each of which autonomously manages two separate agents dedicated to the goal-reaching and obstacle-avoidance objectives. Transfer learning is adopted both to assess whether the trained models were overfit or underfit and to learn a generalized policy. After being trained in a simple environment with few constraints, the algorithms were evaluated in a simulated task of collecting fallen fruit in a crowded orchard environment populated with a variety of dynamic obstacles. The evaluation metrics include harvest percentage, number of goal touches, number of obstacle touches, navigation distance, and navigation time. HDDPG outperformed the remaining algorithms by 70% in terms of total average reward and minimum pixel distance traveled, whereas HPPO achieved the highest number of fruit collections; DDPG and PPO were unable to complete the test environment because they became trapped in local minima. Both hierarchical architectures, HDDPG and HPPO, successfully generalized to new situations beyond the training environments with robust performance.
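
The thesis itself details the HDDPG and HPPO training procedures; as a rough orientation only, the sketch below illustrates the two-agent hierarchical decomposition the abstract describes, with a high-level policy arbitrating between a goal-reaching agent and an obstacle-avoidance agent. The state layout, the function names, and the distance-threshold gating are illustrative assumptions, not the thesis's implementation; in the proposed architecture the gating and the low-level behaviors are learned with DDPG or PPO rather than hand-coded.

```python
import numpy as np

# Illustrative sketch (not the thesis's code): a high-level policy switches
# control between two low-level agents, one for goal-reaching and one for
# obstacle-avoidance. All thresholds and state fields are assumed here.

GOAL, AVOID = 0, 1

def goal_reaching_policy(state):
    """Low-level agent 1: unit velocity command toward the goal."""
    direction = state["goal_pos"] - state["robot_pos"]
    norm = np.linalg.norm(direction)
    return direction / norm if norm > 1e-8 else np.zeros(2)

def obstacle_avoidance_policy(state):
    """Low-level agent 2: unit velocity command away from the nearest obstacle."""
    away = state["robot_pos"] - state["nearest_obstacle"]
    norm = np.linalg.norm(away)
    return away / norm if norm > 1e-8 else np.zeros(2)

def high_level_policy(state, safety_radius=1.0):
    """High-level agent: pick which low-level agent acts this step.

    In the thesis this selection would itself be learned (via DDPG or PPO);
    here a simple distance threshold stands in for the learned gating policy.
    """
    dist = np.linalg.norm(state["robot_pos"] - state["nearest_obstacle"])
    return AVOID if dist < safety_radius else GOAL

def step(state, dt=0.1):
    """Run one hierarchical control step and integrate the robot position."""
    option = high_level_policy(state)
    action = (obstacle_avoidance_policy(state) if option == AVOID
              else goal_reaching_policy(state))
    state["robot_pos"] = state["robot_pos"] + dt * action
    return state, option

if __name__ == "__main__":
    state = {
        "robot_pos": np.array([0.0, 0.0]),
        "goal_pos": np.array([5.0, 5.0]),
        "nearest_obstacle": np.array([2.0, 3.0]),  # static stand-in for a dynamic obstacle
    }
    counts = {GOAL: 0, AVOID: 0}
    for _ in range(120):
        state, option = step(state)
        counts[option] += 1
    print("final position:", state["robot_pos"])
    print("steps under goal-reaching / obstacle-avoidance:",
          counts[GOAL], "/", counts[AVOID])
```

Running the sketch shows the robot sliding around the obstacle's safety radius as the high-level policy alternates between the two agents, which is the arbitration pattern the hierarchical architecture formalizes; it also hints at why flat DDPG and PPO can stall in local minima when a single policy must trade the two objectives off directly.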

Description

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Geospatial Surveying Engineering from Texas A&M University-Corpus Christi.

Keywords

HDDPG, hierarchical reinforcement learning, HPPO, reinforcement learning

Sponsorship

Rights

This material is made available for use in research, teaching, and private study, pursuant to U.S. Copyright law. The user assumes full responsibility for any use of the materials, including but not limited to, infringement of copyright and publication rights of reproduced materials. Any materials used should be fully credited with its source. All rights are reserved and retained regardless of current or future development or laws that may apply to fair use standards. Permission for publication of this material, in part or in full, must be secured with the author and/or publisher.

Citation

Collections