CS294-6 (Spring 2004): Recognizing People, Objects and Actions
Lecture: February 10, 2004
Classifiers, Transformations and Shape Representation
Lecturer: Jitendra Malik
Scribe: Parvez Ahammad

What we discussed earlier:
• LeNet (Convolutional Neural Nets)
• Tangent Distance approach (Nearest Neighbor Classifier methodology)

Today's topics:
• Idea of an augmented data set
• Broad categories of classifiers
• Shape
• Transformations
• Idea of deformable templates; in particular, the shape context method

Idea of an Augmented Dataset:
(This was in response to a question about the augmented dataset used in the LeNet paper.) Let's say we have 6,000 examples for training. For each example, we generate ten copies by applying small transformations. Effectively, we now have 60,000 examples with some built-in invariance (loosely speaking) to the transformations that we used. This augmented dataset can now be used for training.
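As a concrete illustration of this idea, here is a minimal sketch of such an augmentation loop, assuming grayscale digit images stored as NumPy arrays. The function name and the transformation ranges are our own illustrative choices, not values from the lecture.

```python
import numpy as np
from scipy import ndimage

def augment(image, copies=10, rng=None):
    """Generate `copies` perturbed versions of `image` by applying
    small random rotations and translations (illustrative ranges)."""
    if rng is None:
        rng = np.random.default_rng()
    out = []
    for _ in range(copies):
        angle = rng.uniform(-10, 10)          # small rotation, in degrees
        dx, dy = rng.uniform(-2, 2, size=2)   # small shift, in pixels
        img = ndimage.rotate(image, angle, reshape=False, mode="nearest")
        img = ndimage.shift(img, (dy, dx), mode="nearest")
        out.append(img)
    return out

# Each of the 6,000 training images yields ten perturbed copies,
# giving the 60,000-example augmented training set described above.
```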
A couple of observations in relation to data augmentation:
• Problems related to local minima still remain.
• If we know that there ought to be certain invariances, why not build them into our recognition scheme?

For example, the convolutional nets used in the LeNet paper are shift-invariant, and we would like scale invariance to be built in as well. In other words, suppose we are given a detector which takes a 32x32 patch as input and finds digits at that scale. The question is: how do we build in scale invariance?

Strategy: Scanning Strategies using Pyramids
The idea is to build multiple instances of the image at varying scales using decimation (down-sampling) schemes. Let's say we decimate by a factor of 2 (when decimation is done by 2, the scales are considered to be one octave apart). One thing to note while building such pyramids is not to simply average 4 pixels to produce the next-level pixel, since that causes aliasing. It is better to first apply a low-pass filter to the image (for example, a Gaussian) and then down-sample. Once the pyramid is built, run the detector at the different scales (the different levels of the pyramid) and claim success if the object is detected at any of the scales. There will be a problem if the jump from one scale to the next is too large: the detector might miss the scale at which it can recognize the object. To avoid this, we need to ensure that reasonably spaced scales are present; by using steps of √2 between successive standard deviations of the Gaussian kernel, we can create half-octave spacing in the scales. Standard practice in the computer vision community (especially in face recognition applications) is to use scales that are one-quarter octave apart. (A code sketch of this scanning scheme appears at the end of these notes.)

The downside of this approach is that these schemes are computationally expensive. For naive implementations, the computational cost increases linearly with the number of objects (assuming that location and scale are unknown throughout). Note that all scanning strategies are top-down approaches.

Critique regarding scanning strategies: computation is wasted in areas that couldn't possibly contain any instances of the objects of interest. So, what is the alternative? Cascade classifiers!

Broad categories of classifiers:
• Cascade Classification Schemes: A sequence of tests placed in increasing order of computational cost, such that the simpler (computationally cheap) tests are carried out first and the more expensive tests are carried out only on the images that passed the simpler tests. The justification for sequential testing schemes is that the average computational cost across the whole dataset is much lower. Clearly, a problem with this approach is that decisions are made sequentially: if a bad decision is made early, it cannot be rectified at later stages.
• Attention/Segmentation-based Classification Schemes: These schemes operate by performing some low-level processing that is independent of the detector mechanism to find regions of interest (ROIs), and then running the detector in these ROIs. Potentially, one could also figure out the scale of optimal detection by looking at the size of the ROIs. An example of this type of scheme is a corner detector that usually precedes certain algorithms (for example, structure from motion). A milder version of these schemes is over-segmentation (assuming that all necessary edges or regions are preserved, along with some junk).

The majority of today's classification algorithms in computer vision follow the cascade approach.

Shape:
(Slide 1:) Biological Shape: D'Arcy Thompson (1917) studied transformations between the shapes of organisms. The key insight is that describing a shape is hard, but describing the transformation between shapes is easier! The two steps involved in doing this are:
• The correspondence problem (in nature, correspondences are clear for landmarks and fiducial points)
• The transformation problem

Transformations:
From R² to R², here are the kinds of transformations we come across:

• Translation:
\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix}  (1)

• Rotation:
\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}  (2)

• Linear transformations:
\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}  (3)

• Affine transformations:
\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix}  (4)

• Euclidean transformations:
\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix}  (5)

As we can see, all of the above transformations are special cases of affine transformations.

In general, we can classify transformations into two broad categories:
• Continuous transformation: an equivalence relation and one-to-one correspondence between points in two geometric figures or topological spaces which is continuous in both directions, also called a homeomorphism. A homeomorphism which also preserves distances is called an isometry. Affine transformations are another common type of geometric homeomorphism.
• Smooth transformation: a map between manifolds which is differentiable and has a differentiable inverse, also called a diffeomorphism.

Linear transformations are quite useful because working with them reduces to solving a set of linear equations. Although a bit unmanageable in general, most smooth non-linear transformations can be treated as piecewise linear in the differential sense: if we keep only the first two terms of the Taylor series expansion of a function, it looks like an affine transformation (of the form αx + β). So, any smooth non-linear transformation can be approximated locally by an affine transformation.
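Because the affine model of equation (4) is linear in its six unknowns, recovering a transformation from known point correspondences reduces to linear least squares, which ties together the correspondence and transformation steps mentioned in the Shape section. The following sketch is not from the lecture; the function name fit_affine and the example values are our own.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares fit of an affine map dst ~ A @ src + t, given
    corresponding 2D points. src, dst: arrays of shape (n, 2).
    Equation (4) is linear in the six unknowns a11, a12, a21, a22,
    dx, dy, so ordinary least squares applies."""
    n = src.shape[0]
    # Each correspondence contributes two rows to the system M p = b.
    M = np.zeros((2 * n, 6))
    M[0::2, 0:2] = src      # rows for x' = a11*x + a12*y + dx
    M[0::2, 4] = 1.0
    M[1::2, 2:4] = src      # rows for y' = a21*x + a22*y + dy
    M[1::2, 5] = 1.0
    b = dst.reshape(-1)
    p, *_ = np.linalg.lstsq(M, b, rcond=None)
    return p[:4].reshape(2, 2), p[4:]

# Example: recover a known rotation plus translation (a Euclidean
# transform, equation (5)) from noiseless correspondences.
theta = 0.3
A_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
t_true = np.array([2.0, -1.0])
src = np.random.default_rng(0).uniform(-1, 1, size=(10, 2))
dst = src @ A_true.T + t_true
A, t = fit_affine(src, dst)   # A matches A_true, t matches t_true
```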
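Finally, here is the code sketch promised in the scanning-strategies discussion above: a Gaussian pyramid with a fixed-scale detector scanned across its levels. It is a minimal illustration assuming grayscale NumPy images; detect_32x32 is a hypothetical placeholder for the digit detector from the lecture, and the stride of 4 is an arbitrary choice. The part the lecture emphasizes is the order of operations: low-pass filter first, then decimate, to avoid aliasing.

```python
import numpy as np
from scipy import ndimage

def gaussian_pyramid(image, levels=4, sigma=1.0):
    """Build an octave-spaced pyramid: low-pass filter with a Gaussian
    (to avoid aliasing), then decimate by a factor of 2 at each level."""
    pyramid = [image]
    for _ in range(levels - 1):
        smoothed = ndimage.gaussian_filter(pyramid[-1], sigma)
        pyramid.append(smoothed[::2, ::2])  # keep every other pixel
    return pyramid

def scan_pyramid(image, detect_32x32, levels=4):
    """Run a fixed-scale (32x32) detector at every pyramid level and
    claim success if it fires at any scale. `detect_32x32` is a
    hypothetical stand-in for the detector described in the lecture."""
    for level, img in enumerate(gaussian_pyramid(image, levels)):
        for y in range(0, img.shape[0] - 31, 4):
            for x in range(0, img.shape[1] - 31, 4):
                if detect_32x32(img[y:y + 32, x:x + 32]):
                    return level, (y, x)  # scale and location of the hit
    return None
```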

