Overcoming data limitation challenges in predicting tropical storm surge with interpretable machine learning methods

dc.contributor.advisor: King, Scott
dc.contributor.author: Stanton, Carly
dc.contributor.committeeMember: Tissot, Philippe
dc.contributor.committeeMember: Wang, Wenlu
dc.creator.orcid: https://orcid.org/0009-0003-5257-4655
dc.date.accessioned: 2023-10-24T20:51:18Z
dc.date.available: 2023-10-24T20:51:18Z
dc.date.issued: 2023-08
dc.description: A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science [en_US]
dc.description.abstract: The impacts of climate change have increased the risk of storm surge flooding in coastal areas. Tropical islands are especially vulnerable to the effects of sea level rise and to the increase in frequency and intensity of tropical cyclones (TCs). Typically, storm surge prediction is performed using a combination of numerical forecasting models, synoptic forecasting, and statistical methods. Machine learning techniques, particularly convolutional neural networks (CNNs), have shown promise in accurately predicting storm surge levels in the short term. However, deep learning methods are computationally expensive and require large amounts of training data, so researchers must often train neural network models on synthetic data generated by numerical models. The goal of this work is to study the effectiveness of simpler, interpretable models, including random forest (RF) regression, multiple linear regression (MLR), and support vector machine regression (SVR), at predicting storm surge in San Juan Bay, Puerto Rico, using limited local meteorological and tidal data and hurricane reanalysis data from actual storm events over the last few decades. These algorithms were used to predict surge at five different lead times, from one hour to 24 hours, and were trained on three different feature sets with two different types of training data windows. Models were trained using a leave-one-out cross-validation (LOOCV) approach, in which the data for one TC was held out from each model as a validation dataset. The performance of the models and the different training methods was compared in terms of root mean square error (RMSE), normalized RMSE, and error at peak surge.
It was found that an RF model trained on data from only eight TCs was able to predict the peak surge of Hurricane Irma to within about 0.03 m, and the time of peak surge to within three hours, at lead times up to 12 hours, as long as one extreme TC event, in this case Hurricane Maria, was included in the training data. However, all models failed to accurately predict surge for Hurricane Maria, even when other high-surge storms were included in the training data. Other training methods achieved lower RMSE when validated against a peak surge window spanning 12 hours before to 12 hours after peak surge, but could not approach the accuracy of the RF model at predicting the time of peak surge. [en_US]
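The evaluation scheme described in the abstract (hold out one TC at a time, then score by RMSE, normalized RMSE, and error at peak surge) can be sketched roughly as follows. This is an illustrative sketch only, not the thesis code: the function names are hypothetical, the normalization of RMSE by the observed range is an assumption (the abstract does not define it), and the peak timing offset is counted in samples rather than hours.

```python
import math
from typing import Iterator, List, Sequence, Tuple


def leave_one_storm_out(
    storm_ids: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_storm, train_indices, validation_indices), one fold per storm."""
    for held_out in sorted(set(storm_ids)):
        train = [i for i, s in enumerate(storm_ids) if s != held_out]
        valid = [i for i, s in enumerate(storm_ids) if s == held_out]
        yield held_out, train, valid


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between two equal-length series."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def normalized_rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """RMSE divided by the observed range (assumed normalization, see lead-in)."""
    return rmse(observed, predicted) / (max(observed) - min(observed))


def peak_errors(
    observed: Sequence[float], predicted: Sequence[float]
) -> Tuple[float, int]:
    """Return (predicted peak minus observed peak, timing offset in samples)."""
    obs_idx = max(range(len(observed)), key=lambda i: observed[i])
    pred_idx = max(range(len(predicted)), key=lambda i: predicted[i])
    return predicted[pred_idx] - observed[obs_idx], pred_idx - obs_idx
```

In use, each fold from `leave_one_storm_out` would train a model on the `train` rows and score its predictions for the held-out storm with the three metric functions; a positive timing offset means the predicted peak arrives later than the observed one.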
dc.description.college: College of Engineering and Computer Science [en_US]
dc.description.department: Computer Science [en_US]
dc.format.extent: 90 pages [en_US]
dc.identifier.uri: https://hdl.handle.net/1969.6/97611
dc.language.iso: en_US [en_US]
dc.rights.uri: https://creativecommons.org/licenses/by-nd/4.0/deed.en [*]
dc.subject: machine learning [en_US]
dc.subject: predictive analytics [en_US]
dc.subject: random forests [en_US]
dc.subject: storm surge [en_US]
dc.subject: tropical cyclone [en_US]
dc.title: Overcoming data limitation challenges in predicting tropical storm surge with interpretable machine learning methods [en_US]
dc.type: Text [en_US]
dc.type.genre: Thesis [en_US]
thesis.degree.discipline: Computer Science [en_US]
thesis.degree.grantor: Texas A & M University--Corpus Christi [en_US]
thesis.degree.level: Masters [en_US]
thesis.degree.name: Master of Science [en_US]

Files

Original bundle

Name: Stanton_Carly_Thesis.pdf
Size: 12.54 MB
Format: Adobe Portable Document Format
