Object Recognition

Lighting affects appearance

The “Margaret Thatcher Illusion” by Peter Thompson

Recognition Problems
• What is it?
  – Object detection
• Who is it?
  – Recognizing identity
• Object recognition
• Category recognition
• What are they doing?
  – Activity recognition
• All of these are classification problems
  – Choose one class from a list of possible candidates

Face Detection

Face Recognition
P. Sinha and T. Poggio, Last but not least, Perception 31, 2002, 133.

What Makes Recognition Hard?
• Intrinsic variability within each class
• Pose variability
• Illumination variability
• Background variability
• Segmentation problem
  – What region within an image contains the object?
• Feature selection problem
  – What features describe shape and appearance?

Approximate Invariants
• Over a limited range of viewpoint variation
• Parallelism
• Collinearity
• Angle between a pair of lines
• Co-termination

Pose consistency
• Correspondences between image features and model features are not independent
• A small number of correspondences yields a camera; the others must be consistent with this
• Strategy:
  – Generate hypotheses using small numbers of correspondences (e.g. triples of points for a calibrated perspective camera, etc.)
  – Backproject and verify
• Notice that the main issue here is camera calibration
• Appropriate groups are “frame groups”

Voting on Pose
• Each model leads to many correct sets of correspondences, each of which has the same pose
  – Vote on pose, in an accumulator array
  – This is a Hough transform, with all its issues

Figure from “The evolution and testing of a model-based object recognition system”, J.L. Mundy and A. Heller, Proc. Int. Conf. Computer Vision, 1990, copyright 1990 IEEE

Invariants
• There are geometric properties that are invariant to camera transformations
• Easiest case: view a plane object in weak perspective
• Assume we have three base points P_i on the object
  – then any other point on the object can be written as
    $$P_k = P_1 + \mu_{ka}\,(P_2 - P_1) + \mu_{kb}\,(P_3 - P_1)$$
• Now image points are obtained by multiplying by a plane affine transformation, so
    $$p_k = A P_k = A\left(P_1 + \mu_{ka}\,(P_2 - P_1) + \mu_{kb}\,(P_3 - P_1)\right) = p_1 + \mu_{ka}\,(p_2 - p_1) + \mu_{kb}\,(p_3 - p_1)$$

Invariants
• This means that, if I know the base points in the image, I can read off the µ values for the object
  – they’re the same in object and in image, i.e. invariant
• Suggests a strategy rather like the Hough transform
  – search correspondences, form µ’s and vote

Geometric Hashing
• Vote on identity and correspondence using invariants
  – Take hypotheses with large enough votes
• Fill up a table, indexed by µ’s, with
  – the base points and fourth point that yield those µ’s
  – the object identity

Recognizing Human Actions
• Movement and posture change
  – run, walk, crawl, jump, hop, swim, skate, sit, stand, kneel, lie, dance (various), …
• Object manipulation
  – pick, carry, hold, lift, throw, catch, push, pull, write, type, touch, hit, press, stroke, shake, stir, turn, eat, drink, cut, stab, kick, point, drive, bike, insert, extract, juggle, play musical instrument (various)…
• Conversational gesture
  – point, …
• Sign Language

Activities and Situation Assessment
• Example: Withdrawing money from an ATM
• Activities constructed by composing actions. Partial order plans may be a good model.
• Activities may involve multiple agents
• Detecting unusual situations or activity patterns is facilitated by the video activity transform

Objects in Space
• Segment/Region-of-interest
• Features (points, curves, wavelet coefficients..)
• Correspondence and deform into alignment
• Recover parameters of generative model
• Discriminative classifier

Actions in Space-Time
• Segment/volume-of-interest
• Features (points, curves, wavelets, motion vectors..)
• Correspondence and deform into alignment
• Recover parameters of generative model
• Discriminative classifier

Key Cues for Action Recognition
• “Morpho-kinesics” of action (shape and movement of the body)
• Identity of the object/s
• Activity context

Image/Video Stick figure Action
• Stick figures can be specified in a variety of ways or at various resolutions (degrees of freedom)
  – 2D joint positions
  – 3D joint positions
  – Joint angles
• Complete representation
• Evidence that it is effectively computable

Human Body Configurations

Mathematical Challenges
• Modeling shape variation
• Nearest neighbor search in high dimensions
• Combining statistical optimality with computational efficiency
• Reconstruction algorithms for novel sensing
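
Worked sketch for the Invariants and Geometric Hashing slides. The Python below is a minimal illustration, not the system from the cited figures: it solves for the µ values of a point relative to three base points, fills a table indexed by quantized µ’s with (object identity, base points, fourth point), and then votes. The names affine_coords, build_table, vote, the bin width BIN, the two toy models "L" and "T", and the affine map used to make the image are all assumptions introduced for the example.

```python
from collections import Counter, defaultdict
from itertools import permutations

BIN = 0.25  # assumed quantization step for the mu-indexed table


def affine_coords(p1, p2, p3, pk):
    """Solve pk = p1 + mu_a*(p2 - p1) + mu_b*(p3 - p1) by Cramer's rule.

    Returns (mu_a, mu_b), or None when the base points are collinear.
    The exact zero test suits the integer toy data used here; noisy
    real-valued points would need a tolerance instead.
    """
    ax, ay = p2[0] - p1[0], p2[1] - p1[1]
    bx, by = p3[0] - p1[0], p3[1] - p1[1]
    dx, dy = pk[0] - p1[0], pk[1] - p1[1]
    det = ax * by - ay * bx
    if det == 0:
        return None
    return ((dx * by - dy * bx) / det, (ax * dy - ay * dx) / det)


def quantize(mu):
    """Map a (mu_a, mu_b) pair to an integer table key."""
    return (round(mu[0] / BIN), round(mu[1] / BIN))


def build_table(models):
    """Fill the table: key = quantized mu's, value = set of
    (object identity, base point indices, fourth point index)."""
    table = defaultdict(set)
    for name, pts in models.items():
        for basis in permutations(range(len(pts)), 3):
            i, j, k = basis
            for n, p in enumerate(pts):
                if n in basis:
                    continue
                mu = affine_coords(pts[i], pts[j], pts[k], p)
                if mu is not None:
                    table[quantize(mu)].add((name, basis, n))
    return table


def vote(table, image_pts, basis=(0, 1, 2)):
    """Use three image points as base points and let every other image
    point vote once for each (object, model basis) found in its bin."""
    i, j, k = basis
    votes = Counter()
    for n, p in enumerate(image_pts):
        if n in basis:
            continue
        mu = affine_coords(image_pts[i], image_pts[j], image_pts[k], p)
        if mu is not None:
            entries = table.get(quantize(mu), ())
            votes.update({(name, b) for name, b, _ in entries})
    return votes


def apply_affine(m, t, p):
    """Plane affine map p -> m*p + t (stands in for the camera)."""
    return (m[0][0] * p[0] + m[0][1] * p[1] + t[0],
            m[1][0] * p[0] + m[1][1] * p[1] + t[1])


# Two hypothetical planar models, given as lists of 2D points.
MODELS = {
    "L": [(0, 0), (4, 0), (0, 4), (4, 4), (2, 1)],
    "T": [(0, 0), (3, 0), (0, 3), (1, 1), (5, 2)],
}
TABLE = build_table(MODELS)

# Image of model "L" under an arbitrary affine transformation.
IMAGE = [apply_affine(((2, 1), (-1, 3)), (5, -2), p) for p in MODELS["L"]]
VOTES = vote(TABLE, IMAGE)
```

In this toy setup the image points keep the model's ordering, so the hypothesis ("L", (0, 1, 2)) receives one vote from each of the two non-basis image points. A fuller version would try several image bases and keep the hypotheses with large enough votes, then backproject and verify, as the Pose consistency slide suggests.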