MIT 16 412J - Context-based Object Recognition - D1358255

School name Massachusetts Institute of Technology

Course 16 412j- Cognitive Robotics

Pages 42

This preview shows page 1-2-3-20-21-40-41-42 out of 42 pages.

Using the Forest to See the Trees: Context-based Object Recognition
Bill Freeman
Computer Science and Artificial Intelligence Laboratory, MIT
Joint work with Antonio Torralba and Kevin Murphy

A computer vision goal
• Recognize many different objects under many viewing conditions in unconstrained settings.
• There has been progress on restricted cases:
  – one object and one pose (frontal view faces)
  – isolated objects on uniform backgrounds.
• But the general problem is difficult and unsolved.

How we hope to make progress on this hard problem
• Various technical improvements
• Exploit scene context:
  – “if this is a forest, these must be trees”.

Local (bottom-up) approach to object detection
• Classify image patches/features at each location and scale
[Diagram: local features VL, classifier p( car | VL ), output "No car"]

Problem 1: Local features can be ambiguous
Solution 1: Context can disambiguate local features

Effect of context on object detection
• car, pedestrian: identical local image features!
Images by Antonio Torralba

Even high-resolution images can be locally ambiguous
[Figures: isolated object; object in context] (Courtesy of Fredo Durand and William Freeman. Used with permission.)

Problem 2: search space is HUGE
“Like finding needles in a haystack”
• Need to search over x,y locations and scales s
  - Slow (many patches to examine)
  - Error prone (classifier must have very low false positive rate)
• 10,000 patches/object/image x 1,000,000 images/day
• Plus, we want to do this for ~ 1000 objects

Solution 2: context can provide a prior on what to look for, and where to look for it
• People most likely here
• Computers/desks unlikely outdoors
[Figure: bar chart on a 0.0 to 1.0 scale over cars, pedestrian, computer, desk]
Torralba, IJCV 2003

Talk outline
• Context-based vision
• Feature-based object detection
• Graphical model to combine both sources

Context-based vision
• Measure overall scene context or “gist”
• Use that scene context for:
  – Location identification
  – Location categorization
  – Top-down info for object recognition
• Combine with bottom-up object detection
• Future focus: training set acquisition.

Contextual machine-vision system
• Low-dimensional representation of overall scene:
  – Gabor-filter outputs at multiple scales, orientations, locations
  – Dimensionality reduction via PCA

The “Visual Gist” System
• Feature vector for an image: the “gist” of the scene
  – Compute 12 x 30 = 360 dim. feature vector
  – Or use steerable filter bank, 6 orientations, 4 scales, averaged over 4x4 regions = 384 dim. feature vector
  – Reduce to ~ 80 dimensions using PCA
Oliva & Torralba, IJCV 2001

Low-dimensional representation for image context
[Figure: images and their 80-dimensional representations]

Hardware set-up
• Computer: Sony laptop
• Wearable system
  – Gives immediate feedback to the user
  – Must handle general camera view
  – Capable of wireless link for audience display
Designed for utility, not fashion…

Our mobile rig, version 1: Kevin Murphy (Courtesy of Kevin Murphy. Used with permission.)
Our mobile rig, version 2: Antonio Torralba (Courtesy of Antonio Torralba. Used with permission.)

Experiments
• Train:
  – Rooms and halls on 9th floor of 200 Tech. Square
  – Outdoors
• Test:
  – Interior of 200 Tech. Square, 9th floor (seen in training)
  – Interior of 400 Tech. Square (unseen)
  – Outdoors (unseen places)
• Goals:
  – Identify previously seen locations
  – Identify category of previously unseen locations

Location recognition for mobile vision system
[Figure: ground truth vs. system estimate for specific location, location category, indoor/outdoor]

Classifying isolated scenes can be hard
[Figure: corridors and offices, each with correct recognitions and misses]

Scene recognition over time
[Figure: graphical model with nodes Ct-1, Ct, Ok, Pk1…Pkn, Os, Ps1…Psn and observations vG, vk1…vkn, vs1…vsn]
• P(Ct|Ct-1) is a transition matrix, P(vG|C) is a mixture of Gaussians
• Cf. topological localization in robotics
Torralba, Murphy, Freeman, Rubin, ICCV 2003

Benefit of using temporal integration

Place recognition demo
• p( qt | vGt ): instantaneous detection
• P( qt | vG1:t ): using HMM over time

Categorization of new places
[Figure: per-frame estimates of specific location, location category, indoor/outdoor]

Top-down information for object detection

Talk outline
• Context-based vision
• Feature-based object detection
• Graphical model to combine both sources

Bottom-up object recognition
• Use labelled training set
• Use local features to categorize each object (each view of an object)

Training data
• Hand-annotated 1200 frames of video from a wearable webcam
• Trained detectors for 9 types of objects: bookshelf, desk, screen (frontal), steps, building facade, etc.
• 100-200 positive patches, > 10,000 negative patches

Feature vector for a patch: step 1
• Convolve with bank of 12 filters: Gaussian derivatives, Laplacian, corner, long edges

Feature vector for a patch: step 2
• Exponentiate: γ = 2 (variance) or 4 (4th moment)
• Kurtosis: useful for texture analysis

Feature vector for a patch: step 3
• Multiply (.*) by a dictionary of 30 spatial masks
• Mask characterizes shape of filter response

Feature vector for a patch: step 4
• Average response (example value shown: 57.3)
[Diagram: bank of 12 filters, exponent γk = 2 (variance) or 4 (4th moment), .* dictionary of 30 spatial masks]

Summary: Features
• image → 12 x 30 x 2 = 720 features.
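
The four feature steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes each feature is the mask-weighted average of |patch filtered by g|^γ, and it uses a toy bank of 2 filters and 2 masks (so 2 x 2 x 2 = 8 features instead of 12 x 30 x 2 = 720). The names correlate_valid and patch_features, and the particular filters, masks, and patch, are hypothetical.

```python
from itertools import product


def correlate_valid(image, kernel):
    """'Valid' 2-D cross-correlation of two lists of lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def patch_features(patch, filters, masks, gammas=(2, 4)):
    """One feature per (filter, mask, gamma) combination."""
    features = []
    for g, w, gamma in product(filters, masks, gammas):
        response = correlate_valid(patch, g)                 # step 1: filter
        energy = [[abs(v) ** gamma for v in row]             # step 2: exponentiate
                  for row in response]
        masked = [[e * m for e, m in zip(e_row, m_row)]      # step 3: apply mask
                  for e_row, m_row in zip(energy, w)]
        total = sum(sum(row) for row in masked)
        weight = sum(sum(row) for row in w)
        features.append(total / weight)                      # step 4: average (assumed mask-weighted)
    return features


# Toy stand-ins for the filter bank and mask dictionary (assumptions).
filters = [
    [[-1, 1], [-1, 1]],   # horizontal derivative
    [[-1, -1], [1, 1]],   # vertical derivative
]
masks = [
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],   # whole patch
    [[0, 1, 0], [0, 1, 0], [0, 1, 0]],   # centre column only
]
# A 4x4 patch containing a vertical edge.
patch = [[0, 0, 1, 1] for _ in range(4)]

feats = patch_features(patch, filters, masks)
```

Features are ordered filter-major, then mask, then γ. For this patch only the horizontal-derivative filter responds, and the centre-column mask gives a larger average than the whole-patch mask, which is the sense in which a mask captures where in the patch the response occurs.
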
Special cases include:
  - gk = delta function, wk = Haar wavelets (Viola & Jones, Poggio et al.). Rectangular masks support integral image trick for fast computation
  - fi(γ=4) / fi(γ=2) gives kurtosis for texture analysis
  - wk mask to capture spatial arrangement of parts
[Equation figure not preserved; its labels: k’th feature of i’th patch, i’th patch, dictionary of 12 filters, dictionary of 30 masks]

Classifier: boosted features
• Output is [equation not preserved], where
  – f = feature vector for patch
  – ht(f) = output of weak classifier at round t
  – αt = weight assigned by boosting
• Weak learners are single features: ht(f) picks best feature and threshold
• ~500 rounds of boosting
• ~200 positive patches, ~ 10,000 negative patches
• No cascade (yet)
Viola & Jones, IJCV 2001
Boosting demo

Examples of learned features

Example detections: desk, screen
Example detections: desk, screen, bookshelf

Bottom-up detection: ROC curves

Talk outline
• Context-based vision
• Feature-based object detection
• Graphical model to combine both sources

Probabilistic models: graphical models
• Build up complex models from simple components describing conditional independence assumptions.
• combine evidence from different parts of the
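
A hidden Markov model like the one used for scene recognition over time is a small example of such a graphical model. The sketch below shows forward filtering, which turns per-frame likelihoods into P(Ct | v1:t) using a transition matrix P(Ct|Ct-1). It is an illustration under assumptions: two made-up places, hand-picked numbers, and per-frame likelihoods supplied directly instead of being computed from a mixture of Gaussians over gist features. The function name forward_filter is hypothetical.

```python
def forward_filter(prior, transition, likelihoods):
    """Return the filtered posterior P(C_t | v_1:t) for every frame t.

    prior[i]          = P(C_1 = i)
    transition[i][j]  = P(C_t = j | C_t-1 = i)
    likelihoods[t][j] = p(v_t | C_t = j)
    """
    n = len(prior)
    belief = list(prior)
    posteriors = []
    for t, lik in enumerate(likelihoods):
        if t > 0:
            # Predict: push the previous posterior through the transition matrix.
            belief = [sum(belief[i] * transition[i][j] for i in range(n))
                      for j in range(n)]
        # Update: weight by the current frame's likelihood, then normalise.
        unnorm = [belief[j] * lik[j] for j in range(n)]
        z = sum(unnorm)
        belief = [u / z for u in unnorm]
        posteriors.append(belief)
    return posteriors


# Two hypothetical places: index 0 = corridor, index 1 = office.
prior = [0.5, 0.5]
transition = [[0.9, 0.1],
              [0.1, 0.9]]          # places tend to persist between frames
likelihoods = [[0.8, 0.2],         # frame 1 looks like a corridor
               [0.4, 0.6],         # frame 2 is ambiguous, leaning office
               [0.8, 0.2]]         # frame 3 looks like a corridor again

posteriors = forward_filter(prior, transition, likelihoods)
```

In this toy run the second frame on its own favours the office, but the filtered posterior still favours the corridor because the transition matrix makes a one-frame jump unlikely. That is the benefit of temporal integration in miniature.
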

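The boosted classifier described under "Classifier: boosted features" can be illustrated with a small sketch. The slides do not say which boosting variant was used, so this example assumes discrete AdaBoost with single-feature threshold stumps as weak learners, and it scores a patch as the α-weighted sum of the weak outputs. The names stump_predict, train_stump, adaboost, and score are hypothetical, and the four-point data set is a toy.

```python
import math


def stump_predict(f, k, thr, pol):
    """Weak classifier: threshold a single feature, output +1 or -1."""
    return pol if f[k] > thr else -pol


def train_stump(X, y, w):
    """Pick the (feature, threshold, polarity) with lowest weighted error."""
    best = None
    for k in range(len(X[0])):
        for thr in sorted({x[k] for x in X}):
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(xi, k, thr, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, k, thr, pol)
    return best


def adaboost(X, y, rounds):
    """Discrete AdaBoost; returns a list of (alpha, feature, threshold, polarity)."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, k, thr, pol = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)      # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)    # weight assigned by boosting
        ensemble.append((alpha, k, thr, pol))
        # Increase the weight of misclassified examples, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, k, thr, pol))
             for xi, yi, wi in zip(X, y, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble


def score(ensemble, f):
    """Strong classifier output: weighted sum of weak classifier outputs."""
    return sum(alpha * stump_predict(f, k, thr, pol)
               for alpha, k, thr, pol in ensemble)


# Toy data: a patch is positive only if both features are large.
X = [(1, 1), (1, 3), (3, 1), (3, 3)]
y = [-1, -1, -1, 1]

ensemble = adaboost(X, y, rounds=3)
predictions = [1 if score(ensemble, f) > 0 else -1 for f in X]
```

No single stump separates this toy set, but three rounds do. The detectors in the slides use far more data and about 500 rounds, which the exhaustive stump search here would handle slowly.
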