Duke CPS 296.3 - Distortion Estimation Techniques in Solving Visual CAPTCHAs


Distortion Estimation Techniques in Solving Visual CAPTCHAs

Gabriel Moy, Nathan Jones, Curt Harkless, and Randall Potter
Areté Associates, Sherman Oaks, CA 91403
Email: {moy,jones,harkless,potter}@arete.com

Abstract— This paper describes two distortion estimation techniques for object recognition that solve EZ-Gimpy and Gimpy-r, two of the visual CAPTCHAs ("Completely Automated Public Turing test to tell Computers and Humans Apart"), with high degrees of success. A CAPTCHA is a program that generates and grades tests that most humans can pass but current computer programs cannot. We have developed a correlation algorithm that correctly identifies the word in an EZ-Gimpy challenge image 99% of the time and a direct distortion estimation algorithm that correctly identifies the four letters in a Gimpy-r challenge image 78% of the time.

I. INTRODUCTION

Many computer vision applications rely on accurate object recognition for success. While there is no unifying object recognition technique, it is important to advance strategies that address specific problems, such as accounting for noise from background clutter and distortions. Visual CAPTCHAs like EZ-Gimpy (Fig. 1) and Gimpy-r (Fig. 2) are good examples of simple objects with background clutter and distortions. Manuel Blum's group at Carnegie Mellon University describes different CAPTCHAs based on visual or audio information at http://www.captcha.net [1].

The visual CAPTCHAs include the Gimpy family of tests, Bongo, and Pix. Gimpy involves identifying three of approximately seven distinct words in an image. EZ-Gimpy is a simpler version that uses only one word, while Gimpy-r uses four random letters. The audio CAPTCHA, Sounds, is an audio version of Gimpy: a sequence of letters or words is rendered, distorted, then played, and the test is to determine the contents of the sound clip.

In both EZ-Gimpy and Gimpy-r, the user is presented with a 290 pixel × 80 pixel JPEG image and prompted to enter a "guess" as to what word or sequence of letters is shown. The letters in EZ-Gimpy come from one font, and the background clutter can consist of white noise, a grid, or other patterns such as swirls. The letters in Gimpy-r come from two fonts, and the background clutter consists mostly of differently colored distortion patterns such as boxes, waves, and ripples.

Fig. 1. Three EZ-Gimpy challenge images.

Visual CAPTCHAs are used to prevent spammers from using automated techniques to acquire free email accounts from sites such as Yahoo and to stop automated ticket purchases from Ticketmaster. Furthermore, any program that passes the tests generated by a CAPTCHA can be used to solve a hard unsolved AI problem [2]. We take on the problem of EZ-Gimpy and Gimpy-r not only to show that they are ineffective deterrents but also to advance progress on AI problems.

The EZ-Gimpy test uses a dictionary of 561 words, while the Gimpy-r test uses a set of four random letters from a dictionary of 19 letters. The approaches we use to solve EZ-Gimpy at a 99% level and Gimpy-r at a 78% level add to our collection of object recognition tools. The strategies consider the specific problems of cluttered backgrounds and distorted letters, but do not take into account other issues such as sight angle, lighting effects, context, and camouflage. In the case of EZ-Gimpy, we use a whole-object recognition approach against each object in the dictionary, since the dictionary is relatively small. Our approach is very different from the bigram approach of Mori and Malik [4] or the chamfer matching approach of Thayananthan et al. [7], each of which achieves a 93% success rate. With Gimpy-r, the dictionary has 19^4 = 130,321 entries. A comparison against each entry would be too time consuming, so we break the problem down into four individual letter recognition problems. Neither the holistic nor the individual-letter approach to word recognition is new: Madhvanath and Govindaraju [3] have used a holistic approach in handwriting recognition, while Plamondon and Srihari [5] present a survey of holistic and segmented approaches to handwriting recognition.

Fig. 2. Three Gimpy-r challenge images.

Section II describes our correlation algorithm for solving EZ-Gimpy. Section III describes our distortion estimation algorithm for solving Gimpy-r. We discuss our conclusions in Section IV.

II. MATCHING WHOLE OBJECTS BY CORRELATION

Given a small set of template images, such as those in EZ-Gimpy, we are able to test the challenge image against each of the template images. Instead of trying to deduce which individual letters are in the challenge image, we find the best-correlated template image. Our distortion estimation with correlation approach uses what we call a "core" and "minipatch" framework. In each template image, we identify 3 cores and 24 minipatches. We use variations of the cores and minipatches to estimate distortions and then find which distorted template image best correlates to the challenge image.
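As a rough illustration of this whole-template matching loop (a sketch, not the authors' implementation), the Python code below correlates a challenge image against every distorted dictionary template and returns the best match. The normalized_correlation helper and the estimate_distortion callback are hypothetical stand-ins: the former assumes zero-mean normalized cross-correlation as the correlation metric, and the latter stands in for the core/minipatch distortion estimate developed in the rest of this section.

```python
import numpy as np

def normalized_correlation(a: np.ndarray, b: np.ndarray) -> float:
    """Zero-mean normalized cross-correlation between two equal-size images."""
    a = a.astype(float) - a.mean()
    b = b.astype(float) - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def identify_word(challenge, templates, estimate_distortion):
    """Return the dictionary word whose distorted template best matches.

    `templates` maps each of the ~561 EZ-Gimpy words to a rendered 290x80
    template image; `estimate_distortion` is an assumed callback that warps a
    template toward the challenge image (the paper does this with the
    core/minipatch framework) and returns an image of the same 290x80 size.
    """
    best_word, best_score = None, -np.inf
    for word, template in templates.items():
        warped = estimate_distortion(template, challenge)  # distorted template
        score = normalized_correlation(warped, challenge)  # correlation metric
        if score > best_score:
            best_word, best_score = word, score
    return best_word
```

With only 561 dictionary words, an exhaustive loop of this kind is feasible; it is exactly this exhaustive comparison that becomes impractical for Gimpy-r's 130,321 letter sequences.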
A. Cores

We define a core as an area that is most distinct, i.e., least correlated with the rest of the image. Given an image that is 290 pixels × 80 pixels, we choose a circular core with a 16 pixel diameter. We compare each possible 16 pixel disc with every other 16 pixel disc and find the three least correlated sections that do not overlap. These cores represent the most distinctive features of the word (Fig. 3) and will be the anchor points between the template and challenge image.

Fig. 3. The three cores of two template images.

B. Minipatches

We next split the template word into 24 small overlapping sections, which we call minipatches (Fig. 4). Starting from the original minipatch, we create five rotated versions. With the five rotations, we independently shrink or stretch the minipatch in the X and Y directions, giving 45 variations per minipatch. We also keep track of the core positions with respect to the minipatch positions. The variations on the minipatches represent the types of distortions encountered. If larger ranges of distortions need to be estimated, we would use a larger set of minipatches with more variations, with the side effect of increased execution time.

Fig. 4. A template image split into 24 minipatches.

C. Matching

Given a challenge image, we go through multiple steps to arrive at a metric of correlation to the template images. The steps are as follows:
• Background removal
•
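The 45 minipatch variations of Section II-B (five rotations, each independently shrunk or stretched in X and Y) could be generated along the lines of the sketch below. This is only an illustration under assumed parameters: the preview does not give the actual rotation angles or scale factors, so the ANGLES and SCALES values are placeholders, SciPy's ndimage routines stand in for whatever warping the authors used, and the patch is assumed to be a 2-D grayscale array.

```python
import numpy as np
from scipy.ndimage import rotate, zoom

# Assumed values; the paper does not list the angles or scale factors used.
ANGLES = (-10.0, -5.0, 0.0, 5.0, 10.0)   # five rotations, in degrees
SCALES = (0.9, 1.0, 1.1)                 # shrink, keep, stretch

def minipatch_variations(patch: np.ndarray) -> list:
    """Generate 5 rotations x 3 Y-scales x 3 X-scales = 45 variants of a patch."""
    variants = []
    for angle in ANGLES:
        rotated = rotate(patch, angle, reshape=False, order=1)
        for sy in SCALES:            # independent shrink/stretch in Y (rows)
            for sx in SCALES:        # independent shrink/stretch in X (columns)
                variants.append(zoom(rotated, (sy, sx), order=1))
    return variants
```

Five rotations combined with three scale choices in each direction give the 5 × 3 × 3 = 45 variations per minipatch quoted in Section II-B.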

