CALTECH EE 148A - A Coarse-to-Fine Strategy for Multi-Class Shape Detection

A Coarse-to-Fine Strategy for Multi-Class Shape Detection

Yali Amit, Donald Geman and Xiaodong Fan

Yali Amit is with the Department of Statistics and the Department of Computer Science, University of Chicago, Chicago, IL, 60637. Email: [email protected]. Supported in part by NSF ITR DMS-0219016.

Donald Geman is with the Department of Applied Mathematics and Statistics, and the Whitaker Biomedical Engineering Institute, The Johns Hopkins University, Baltimore, MD 21218. Email: [email protected]. Supported in part by ONR under contract N000120210053, ARO under grant DAAD19-02-1-0337, and NSF ITR DMS-0219016.

Xiaodong Fan is with the Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, MD 21218. Supported in part by ONR under contract N000120210053. Email: [email protected]

September 2, 2004 DRAFT

Abstract

Multi-class shape detection, in the sense of recognizing and localizing instances from multiple shape classes, is formulated as a two-step process in which local indexing primes global interpretation. During indexing a list of instantiations (shape identities and poses) is compiled constrained only by no missed detections at the expense of false positives. Global information, such as expected relationships among poses, is incorporated afterward to remove ambiguities. This division is motivated by computational efficiency. In addition, indexing itself is organized as a coarse-to-fine search simultaneously in class and pose. This search can be interpreted as successive approximations to likelihood ratio tests arising from a simple (“naive Bayes”) statistical model for the edge maps extracted from the original images. The key to constructing efficient “hypothesis tests” for multiple classes and poses is local OR’ing; in particular, spread edges provide imprecise but common and locally invariant features. Natural tradeoffs then emerge between discrimination and the pattern of spreading.
These are analyzed mathematically within the model-based framework and the whole procedure is illustrated by experiments in reading license plates.

Key words: Shape detection, multiple classes, statistical model, spread edges, coarse-to-fine search, online competition.

I. INTRODUCTION

We consider detecting and localizing shapes in cluttered grey level images when the shapes may appear in many poses and there are many classes of interest. In many applications, a mere list of shape instantiations, where each item indicates the generic class and approximate pose, provides a useful global description of the image. (Richer descriptions, involving higher-level labels, occlusion patterns, etc. are sometimes desired.) The set of feasible lists may be restricted by global, structural constraints involving the joint configuration of poses; this is the situation in our application to reading license plates.

In this paper, indexing will refer to non-contextual detection in the sense of compiling a list of shape instantiations independently of any global constraints; interpretation will refer to incorporating any such constraints, i.e., relationships among instantiations. In our approach indexing primes interpretation.

Indexing

Ideally, we expect to detect all instances from all the classes of interest under a wide range of geometric presentations and imaging conditions (resolution, lighting, background, etc.). This can be difficult even for one generic class without accepting false positives.
For instance, all approaches to face detection (e.g., [8], [31], [36]) must confront the expected variations in the position, scale and tilt of a face, varying angles of illumination and the presence of complex backgrounds; despite considerable activity, and marked advances in speed and learning, no approach achieves a negligible false positive rate on complex scenes without missing faces. With multiple shape classes an additional level of complexity is introduced and subtle confusions between classes must be resolved in addition to false positives due to background clutter.

Invariant indexing, or simply invariance, will mean a null false negative rate during indexing, i.e., the list of reported instantiations is certain to contain the actual ones. Discrimination will refer to false positive error – the extent to which we fantasize in our zeal to find everything. We regard invariance as a hard constraint. Generally, parameters of an algorithm can be adjusted to achieve near-invariance at the expense of discrimination. The important tradeoff is then discrimination vs computation.

Hypothetically, one could achieve invariance and high discrimination by looking separately for every class at every possible pose (“templates for everything”). Needless to say, with a large number of possible class/pose pairs, this would be extremely costly, and massive parallelism is not the answer. Somehow we need to look for many things at once, which seems at odds with achieving high discrimination.

Such observations lead naturally to organizing multi-class shape detection as a coarse-to-fine (CTF) computational process.
Begin by efficiently eliminating entire subsets of class/pose pairs simultaneously (always maintaining invariance) at the expense of relatively low discrimination. From the point of view of computation, rejecting many explanations at once with a single, relatively inexpensive “test” is clearly efficient; after all, given an arbitrary subimage, the most likely hypothesis by far is “no shape of interest” or “background,” and initially testing against this allows for early average termination of the search. If “background” is not declared, proceed to smaller class/pose subsets at higher levels of discrimination, and finally entertain highly discriminating procedures but dedicated to specific classes and poses. Accumulated false positives are eventually removed by more intense, but focused, processing. In this way, the issue of computation strongly influences the very development of the algorithms, rather than being an afterthought.

A natural control parameter for balancing discrimination and computation is the degree of invariance of local features, not in the sense of fine shape attributes, such as geometric singularities of curves and surfaces, but rather coarse, generic features which are common in a set of class/pose pairs. “Spread features” ([1], [3], [8]) provide a simple example: a local feature is said to be detected at a given location if the response of the feature
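
The coarse-to-fine indexing described in the Introduction can be illustrated with a small sketch. It is a toy under stated assumptions, not the procedure developed in the paper: each class/pose pair is reduced to a hypothetical set of named binary features (`TEMPLATES`), a subimage is reduced to the set of features observed in it, and the test attached to a subset of class/pose pairs simply checks that every feature common to the whole subset is present. The names `Node`, `build` and `ctf_search` and the halving of the hypothesis list are illustrative choices. Because a coarse test only uses features shared by all of its hypotheses, a hypothesis whose own features are all present can never be rejected higher up, which is the invariance constraint in this toy setting.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Set, Tuple

Hypothesis = Tuple[str, int]  # (class label, pose index): a toy encoding


@dataclass
class Node:
    hypotheses: List[Hypothesis]   # class/pose pairs covered by this node
    required: FrozenSet[str]       # features shared by every pair in the subset
    children: List["Node"] = field(default_factory=list)


def build(hyps: List[Hypothesis],
          templates: Dict[Hypothesis, Set[str]]) -> Node:
    """Recursively halve the hypothesis list (an assumed, arbitrary split).

    Each node's test uses only the features common to its whole subset.
    """
    common = frozenset.intersection(*(frozenset(templates[h]) for h in hyps))
    node = Node(list(hyps), common)
    if len(hyps) > 1:
        mid = len(hyps) // 2
        node.children = [build(hyps[:mid], templates),
                         build(hyps[mid:], templates)]
    return node


def ctf_search(node: Node, observed: Set[str],
               stats: Dict[str, int]) -> List[Hypothesis]:
    """Return the class/pose pairs that survive every test from root to leaf."""
    stats["tests"] += 1
    if not node.required <= observed:
        return []                  # one test rejects the entire subset
    if not node.children:
        return list(node.hypotheses)
    found: List[Hypothesis] = []
    for child in node.children:
        found += ctf_search(child, observed, stats)
    return found


# Hypothetical templates: two classes, two poses each.
TEMPLATES: Dict[Hypothesis, Set[str]] = {
    ("A", 0): {"e1", "e2", "e3"},
    ("A", 1): {"e1", "e2", "e4"},
    ("B", 0): {"e1", "e5", "e6"},
    ("B", 1): {"e1", "e5", "e7"},
}

if __name__ == "__main__":
    root = build(list(TEMPLATES), TEMPLATES)
    for observed in ({"e9"}, {"e1", "e2", "e3", "e9"}):
        stats = {"tests": 0}
        result = ctf_search(root, observed, stats)
        print(sorted(observed), "->", result, "tests used:", stats["tests"])
```

In this sketch a background subimage is dismissed by the single root test, while a subimage containing a shape costs a few more tests than it would save. Since background is by far the most likely hypothesis for an arbitrary subimage, the average cost is what benefits.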

