Real-time object detection for autonomous driving based on deep learning
Optical vision is an essential component for autonomous cars. Accurate detection of vehicles, street buildings, pedestrians, and road signs could help self-driving cars drive as safely as humans. However, object detection has been a challenging task for decades, since images of objects in real-world environments are affected by illumination, rotation, scale, and occlusion. In recent years, many Convolutional Neural Network (CNN) based classification-after-localization methods have improved detection results in various conditions. However, the slow recognition speed of these two-stage methods limits their usage in real-time situations. Recently, a unified object detection model, You Only Look Once (YOLO), was proposed, which directly regresses from the input image to object class scores and positions. Its single network structure processes images at 45 fps on the PASCAL VOC 2007 dataset and has higher detection accuracy than other current real-time methods. However, when applied to object detection tasks in autonomous driving, this model still has limitations. It processes images individually, even though an object's position changes continuously in a driving scene. Thus, the model ignores a lot of important information between consecutive frames. In this research, we applied YOLO to three different datasets to test its general applicability. We analyzed its performance from various aspects on the KITTI dataset, which is specialized for autonomous driving. We proposed a novel technique called memory map, which considers inter-frame information, to strengthen YOLO's detection ability in driving scenes. We broadened the model's scope of applicability by applying it to a new orientation estimation task. KITTI is our main dataset. Additionally, the ImageNet dataset was used for pre-training, and three other datasets, Pascal VOC 2007/2012, Road Sign, and the Face Detection Dataset and Benchmark (FDDB), were used for other class domains.
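To make the point about inter-frame information concrete, the sketch below shows one simple way that detections from two consecutive frames could be related: pairing boxes by intersection over union (IoU). This is not the memory map technique proposed in the thesis, whose details are not given here; the function names `iou` and `match_frames`, the `(x1, y1, x2, y2)` box format, and the 0.5 threshold are all assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_frames(prev_boxes, curr_boxes, threshold=0.5):
    """Greedily pair each current box with the best unmatched previous box.

    Returns a list of (prev_index, curr_index) pairs whose IoU is at least
    `threshold` (a hypothetical value, not taken from the thesis).
    """
    matches = []
    used = set()
    for j, curr in enumerate(curr_boxes):
        best_i, best_score = None, threshold
        for i, prev in enumerate(prev_boxes):
            if i in used:
                continue
            score = iou(prev, curr)
            if score >= best_score:
                best_i, best_score = i, score
        if best_i is not None:
            used.add(best_i)
            matches.append((best_i, j))
    return matches


# Illustrative boxes: one object shifts slightly, one disappears, one appears.
prev_frame = [(0, 0, 10, 10), (50, 50, 60, 60)]
curr_frame = [(1, 0, 11, 10), (100, 100, 110, 110)]
pairs = match_frames(prev_frame, curr_frame)
```

A detector that treats each frame independently has no access to pairs like these; a temporal method can use them, for example, to keep an object that is briefly missed in one frame.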
A thesis Submitted in Partial Fulfillment of the Requirements for the Degree of MASTER OF SCIENCE in COMPUTER SCIENCE from Texas A&M University-Corpus Christi in Corpus Christi, Texas.
Rights: This material is made available for use in research, teaching, and private study, pursuant to U.S. Copyright law. The user assumes full responsibility for any use of the materials, including but not limited to, infringement of copyright and publication rights of reproduced materials. Any materials used should be fully credited with its source. All rights are reserved and retained regardless of current or future development or laws that may apply to fair use standards. Permission for publication of this material, in part or in full, must be secured with the author and/or publisher.