Regression analysis of UAV collected cotton crop data for yield prediction

Date

2022-05

Authors

Lopez, Bianca

Abstract

Predicting cotton yield can help farmers make better planning, budgeting, and intervention decisions. The objective of this thesis was to assess the performance of principal component regression (PCR), partial least squares regression (PLSR), Ridge regression, and least absolute shrinkage and selection operator (LASSO) regression for predicting cotton yield. During the 2016 growing season, excess greenness index (ExG), normalized difference vegetation index (NDVI), canopy height (CH), and canopy volume (CV) were calculated weekly from RGB (red, green, blue) and multispectral images collected by unmanned aerial systems (UAS) over an experimental cotton field at the Texas A&M AgriLife Research Center in Corpus Christi, Texas, USA (27°46′57.08″ N, 97°33′40.94″ W). Irrigation was treated as a categorical variable, with the field split into two approximately equal sections of dry and irrigated plots. For all models, the data were split into 80 percent training data and 20 percent testing data, and 10-fold cross-validation was performed to find the optimal number of principal components for PCR, the number of latent variables for PLSR, and the hyperparameters of the LASSO and Ridge regressions. All models were trained on the weekly time series variables ExG, NDVI, CH, and CV together with the categorical irrigation variable. Each model was also trained on a subset consisting of the same time series variables up to 67 days after planting, together with the irrigation variable. The models trained on the entire season produced the following test set mean squared error (MSE) values and R-squared scores, respectively: PCR, ∼2.83 and ∼0.48; PLSR, ∼1.00 and ∼0.80; LASSO regression, ∼0.94 and ∼0.81; and Ridge regression, ∼1.32 and ∼0.73. The models trained on the first 67 days subset produced the following MSE values and R-squared scores, respectively: PCR, ∼2.88 and ∼0.47; PLSR, ∼1.54 and ∼0.70; LASSO regression, ∼1.60 and ∼0.67; and Ridge regression, ∼1.61 and ∼0.67.
Of the four regressions fitted to the entire season's data, LASSO regression performed best, with the highest R-squared value and the lowest mean squared error (MSE). This model could be useful for decision-making in preparation for future growing seasons. Among the models trained on the first 67 days after planting, PLSR had the lowest MSE (∼1.54) and the highest R-squared (∼0.70). Intervention decisions could therefore be made with reasonable accuracy at 67 days after planting based on the PLSR model.

Keywords

crop yield prediction, LASSO regression, Partial Least Squares Regression, Principal Component Regression, Ridge Regression, UAV

Rights

Attribution 4.0 International
