UCSD CSE 190 - Approaches to Autonomous Target Recognition
Approaches to Autonomous Target Recognition

Shane Grant
Department of Computer Science
University of California, San Diego
La Jolla, CA
[email protected]

Anderson
Department of Computer Science
University of California, San Diego
La Jolla, CA
[email protected]

Abstract

Unmanned aerial vehicles are becoming the de facto method of aerial surveillance for many applications. There is an ever increasing desire to fully automate these systems, from aerial navigation (autopilots) to sophisticated sensor systems that allow the vehicle to perform without a human operator. In this paper, we address the issue of target recognition - using visual data to acquire a list of potential targets and then identify targets of interest and recognize their characteristics. When detailing recognition, we discuss two approaches and their implementations, as well as present a comparison of their results.

1. Introduction

The inspiration for this paper stems from the task of target recognition from an unmanned aerial vehicle (UAV). UAVs pose several additional difficulties for any conventional vision system. Some of these difficulties include noise involved in signal transmission, hardware constraints due to weight and limited payload capacity, and the fact that the platform is an airplane - giving the frame of reference many degrees of freedom.

We briefly discuss the processing needed to correct for many of these inherent difficulties before addressing the issue of recognition. Our task specifically deals with recognizing alphanumeric targets that fall within a fairly constrained set of parameters, as seen in Figure 1. Though our objects of interest are themselves constrained, the ideas proposed can easily be generalized to many object recognition problems.

For performing target recognition, we focus on two approaches. The first is based upon so-called "Hu moments" [8], while the second is a more recent technique rooted in signal processing that employs a range of transforms to achieve similar results [2].
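To make the first approach concrete, the leading Hu invariants can be computed directly from image moments. The snippet below is an illustrative NumPy computation of only the first two invariants (phi1, phi2); it is a sketch, not the authors' implementation:

```python
import numpy as np

def hu_first_two(binary):
    """First two Hu moment invariants of a binary shape image.

    Hu moments are polynomial combinations of normalized central
    moments that are invariant to translation, scale, and rotation,
    which makes them compact shape descriptors.
    """
    ys, xs = np.nonzero(binary)
    m00 = float(len(xs))                   # zeroth moment (area)
    dx, dy = xs - xs.mean(), ys - ys.mean()

    # normalized central moment: eta_pq = mu_pq / mu_00^(1 + (p+q)/2)
    def eta(p, q):
        return (dx ** p * dy ** q).sum() / m00 ** (1 + (p + q) / 2)

    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return phi1, phi2

# a toy rectangular blob and a translated copy of it
shape = np.zeros((64, 64), dtype=bool)
shape[10:30, 10:20] = True
moved = np.roll(shape, (15, 20), axis=(0, 1))
```

Because phi1 and phi2 depend only on normalized central moments, translating or uniformly scaling the blob leaves them unchanged, which is what makes such descriptors attractive for matching cropped targets.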
Figure 1. An example of an alphanumeric target we aim to recognize (a green R against a yellow square). Each target consists of a solid colored alphanumeric against a solid colored background of arbitrary shape.

2. Data Acquisition

Before object recognition may be performed, candidate regions must be found, which themselves must be extracted from images acquired by the UAV. We briefly describe this process since it is relevant but not the focus of this paper.

2.1. Image Acquisition

Our platform is the Falco UAV [1], developed by students at UCSD for performing aerial surveillance. The plane is capable of supporting a large payload volume and weight, which we utilize by equipping the plane with a Sony FCB-H11 high definition block camera and digital transmission system.

While the camera is capable of supporting high definition resolutions, we currently acquire data in NTSC format due to restrictions on the transmission system. This video data is acquired on a ground station computer via a capture card and then fed into programs to analyze it. NTSC data is interlaced and uses the YUV color space - both of which are undesirable for our purposes. The first step in acquiring our data is to deinterlace the video input. There are many methods for deinterlacing video [6], but we opt for a very simple and efficient method that discards half of the field lines and doubles the remaining lines. This results in a less distorted image than the interlaced version, but loses clarity compared to a more sophisticated approach.

Figure 2. An example of the input/output of the saliency algorithm, before thresholding and bounding box generation: (a) original image; (b) saliency map with LAB.

2.2. Image Rectification

Once an image has been collected, it must be transformed to correct for the roll, pitch, and heading of the camera when the image was first captured. In this paper we assume that the input images to our recognition pipeline have been rectified and cropped to an appropriate bounding box.
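The field-discarding deinterlacer described in Sec. 2.1 amounts to two array operations. A minimal NumPy sketch (which field is kept and the handling of color planes are simplifying assumptions):

```python
import numpy as np

def deinterlace(frame):
    """Deinterlace by keeping one field and line-doubling it.

    Keeps the even scan lines (one field), discards the odd ones, and
    repeats each kept line to restore the original height. Cheap enough
    for real-time use, but halves vertical detail compared to a
    motion-adaptive deinterlacer.
    """
    field = frame[::2]                    # keep every other scan line
    return np.repeat(field, 2, axis=0)    # double each remaining line

# hypothetical 8x4 grayscale frame whose pixel values equal the row index
frame = np.arange(8)[:, None] * np.ones((1, 4), dtype=int)
doubled = deinterlace(frame)
```

The output has the input's shape, with each even row duplicated in place of the discarded odd row below it, matching the "discard half the field lines and double the rest" description above.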
The interested reader can refer to Koppel et al. [10].

2.3. Candidate Region Selection

Candidate target regions must then be extracted from the rectified image and cropped appropriately. In our implementation, we utilize a saliency map to select candidate regions, as seen in Figure 2, though the method used here could easily be another approach; the importance lies in extracting fairly tightly cropped regions of interest.

The final step before target recognition can be performed is candidate region selection. This step involves taking a fully pre-processed image and using some metric to determine which portions of the image might contain target-like objects. In our case, we use saliency as a means to partition relevant objects from the background. Saliency performs no discrimination on the regions it finds noteworthy - they simply stand out from their surroundings [5]. The regions selected may be targets or may be distractors that pop out in an image but are nevertheless not what we are searching for - it is not the function of this step to perform that final discrimination.

To match the video frame rate, our saliency algorithm is run on a graphics processing unit (GPU) using the NVIDIA CUDA programming environment [12]. This allows us to achieve sufficient frame rates for real-time analysis of streaming video. The particular method we use is an adaptation of previous work by the author [5] with modifications inspired by the SUN framework [13]. See Figure 2 for an example of the input/output of this algorithm. As can be seen, the brighter regions have been selected as more important. At this stage we perform thresholding and find bounding boxes for each connected component in the final binary image.

3. Segmentation

The approaches we consider for object recognition work on binary images.
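The thresholding and bounding-box step that closes Sec. 2.3 can be sketched on the CPU as follows. The paper's pipeline runs saliency on the GPU; this pure-NumPy version, with an assumed threshold value and 4-connectivity, is only for illustration:

```python
import numpy as np
from collections import deque

def bounding_boxes(saliency, thresh=0.5):
    """Threshold a saliency map and return one bounding box per
    connected component (4-connectivity), as (top, left, bottom, right)
    with bottom/right exclusive."""
    mask = saliency > thresh
    seen = np.zeros_like(mask, dtype=bool)
    h, w = mask.shape
    boxes = []
    for sy, sx in zip(*np.nonzero(mask)):
        if seen[sy, sx]:
            continue
        # BFS flood fill to collect one connected component
        q = deque([(sy, sx)])
        seen[sy, sx] = True
        y0 = y1 = sy
        x0 = x1 = sx
        while q:
            y, x = q.popleft()
            y0, y1 = min(y0, y), max(y1, y)
            x0, x1 = min(x0, x), max(x1, x)
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    q.append((ny, nx))
        boxes.append((y0, x0, y1 + 1, x1 + 1))
    return boxes

# toy saliency map with two bright blobs
smap = np.zeros((10, 10))
smap[1:3, 1:4] = 0.9
smap[6:9, 6:8] = 0.8
```

Each returned box is a candidate crop that would then be handed to the segmentation and recognition stages.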
It is therefore necessary to develop a sufficient segmentation process to isolate the objects of interest from any background data.

The first step in our segmentation process is to isolate the shape from the background. Initial efforts attempted to first isolate the character, but this proved too unreliable. The input image is first reduced to a smaller size, in our case 64x64, which serves as a first step in removing some of the color information from the image and smoothing it slightly. We utilize k-means clustering as our primary means of isolating the shape. Given the restricted domain of our targets, we know that, assuming an appropriately cropped bounding region, there will be approximately three
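The downscale-then-cluster step described above can be sketched with a small Lloyd's-algorithm k-means over pixel colors. This is a simplified stand-in (the evenly spaced initialization and plain RGB distance are our assumptions; k = 3 follows the text's expectation of roughly three dominant colors in a cropped target):

```python
import numpy as np

def kmeans_pixels(image, k=3, iters=10):
    """Cluster the pixel colors of an (H, W, 3) image into k groups.

    Returns an (H, W) label map. For a tightly cropped target, the
    clusters roughly correspond to background, target shape, and
    alphanumeric character.
    """
    pixels = image.reshape(-1, 3).astype(float)
    # simplistic deterministic init: k evenly spaced pixels as centers
    centers = pixels[np.linspace(0, len(pixels) - 1, k, dtype=int)].copy()
    for _ in range(iters):
        # assign every pixel to its nearest center (Euclidean in RGB)
        dists = np.linalg.norm(pixels[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        # move each center to the mean of its assigned pixels
        for j in range(k):
            if (labels == j).any():
                centers[j] = pixels[labels == j].mean(axis=0)
    return labels.reshape(image.shape[:2])

# synthetic 64x64 crop with three solid color bands
img = np.zeros((64, 64, 3))
img[:21] = (255, 255, 0)      # yellow "background"
img[21:43] = (0, 128, 0)      # green "shape"
img[43:] = (255, 255, 255)    # white band
labels = kmeans_pixels(img)
```

Which cluster is the character versus the background still has to be decided afterwards, for example by cluster size or position, which is where a pipeline like the one described here would continue.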