SIFT SLAM Vision Details
MIT 16.412J, Spring 2004
Vikash K. Mansinghka

Outline
• Lightning Summary
• Black Box Model of SIFT SLAM Vision System
• Challenges in Computer Vision
• What these challenges mean for visual SLAM
• How SIFT extracts candidate landmarks
• How landmarks are tracked in SIFT SLAM
• Alternative vision-based SLAM systems
• Open questions

Lightning Summary
• Motivation: SLAM without modifying the environment
• Landmark candidates are extracted by the SIFT process
• Candidates are matched between cameras to get 3D positions
• Candidates are pruned according to consistency with the robot's expectations
• Survivors are sent off for statistical processing

Review of Robot Specifications
• Triclops 3-camera "stereo" vision system
• Odometry system which produces [p, q, θ]
• The center camera is the "reference"

Black Box Model of Vision System
• For now, based on black magic (SIFT). Produces landmarks.
• Assume landmarks are globally indexed by i.
• Per-frame inputs:
  – [p, q, θ] - odometry input (x, z, and bearing deltas)
  – List of (i, x_i) - new landmark positions (from SLAM)
• Per-frame output is a list of (i, x'_i, x_i, r_i, c_i) for each visible landmark, where:
  – x'_i is its measured 3D position (w.r.t. the camera position)
  – x_i is its map 3D position (w.r.t. the initial robot position), if it isn't new
  – (r_i, c_i) is its pixel coordinates in the center camera

Challenges in Computer Vision
• Intuitively appealing ≠ computationally realizable
• Stable feature extraction is hard; results are rarely general
• Extracted features are sparse
• Matching requires exponential time
• Matches are often wrong

Implications for Visual SLAM
• Hard to reliably find landmarks
• Really Hard to reliably find landmarks
• Really Really Hard to reliably find landmarks
• Data association is slow and unreliable
• False matches introduce substantial errors
• Accurate probabilistic models are unavailable

Remarks on the SIFT Approach
• For visual SLAM, landmarks must be identifiable across:
  – Large changes in distance
  – Small changes in view direction
  – (Bonus) Changes in illumination
• Solution:
  – Produce a "scale-invariant" image representation
  – Extract points with associated scale information
  – Use a matcher empirically capable of handling small displacements

The Scale-Invariant Feature Transform
• Described in Lowe, IJCV 2004 (preprint; use Google)
• Four stages:
  – Scale-space extrema extraction
  – Keypoint pruning and localization (not used in SLAM)
  – Orientation assignment
  – Keypoint descriptor (not used in SLAM)

Lightning Introduction to Scale Space
• Motivation:
  – Objects can be recognized at many levels of detail
  – Large distances correspond to a low level of detail
  – Different kinds of information are available at each level

Lightning Introduction to Scale Space
• Idea: extract the information content of an image at each level of detail
• Detail reduction is typically done by Gaussian blurring
• Long history in both machine and human vision:
  – Marr in the late 1970s
  – Henkel in 2000
• Analogous concepts are used in speech processing

Scale Space in SIFT
• I(x, y) is the input image. L(x, y, σ) is its representation at scale σ.
• G(x, y, σ) is the 2D Gaussian with variance σ².
• L(x, y, σ) = G(x, y, σ) ∗ I(x, y) (the "only" choice; see Koenderink 1984)
• D(x, y, σ) = L(x, y, kσ) − L(x, y, σ)
• D approximates σ²∇²G ∗ I (see Mikolajczyk 2002 for the significance of this)
• D is also edge-detector-like; the newest SIFT "corrects" for this
• Details of the discretization (e.g. resampling, choice of k) are unimportant
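A minimal sketch may help make the scale-space construction above concrete. The code below builds the blurred stack L and the difference-of-Gaussian stack D, and tests whether a sample of D is an extremum of its 3x3x3 neighborhood in space and scale, which is the test SIFT uses to propose keypoints. This is an illustration only, not the code behind these slides: the base scale, the factor k, the number of scales, and the function names are assumptions.

```python
# Sketch of the DoG scale space described above (illustrative parameters).
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_stack(image, sigma0=1.6, k=np.sqrt(2), num_scales=5):
    """Return (sigmas, D) with D[s] = L(x, y, k*sigma_s) - L(x, y, sigma_s)."""
    sigmas = [sigma0 * k**s for s in range(num_scales + 1)]
    L = [gaussian_filter(image, sigma=s) for s in sigmas]       # L = G * I
    D = np.stack([L[s + 1] - L[s] for s in range(num_scales)])  # DoG stack
    return sigmas[:num_scales], D

def is_local_extremum(D, s, y, x):
    """True if D[s, y, x] is the max or min of its 3x3x3 neighborhood.
    Assumes 0 < s < D.shape[0]-1 and (y, x) away from the image border;
    ties and strictness are ignored for brevity."""
    patch = D[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
    center = D[s, y, x]
    return center == patch.max() or center == patch.min()
```

Each surviving (x, y, σ) triple then gets the orientation assignment and stereo matching described next.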
Scale Space in SIFT
• Compute local extrema of D as above
• Each such (x, y, σ) is a feature
• The (x, y) part "should" be scale and planar-rotation invariant

SIFT Orientation Assignment
• For each feature (x, y, σ):
  – Find a fixed-pixel-area patch in L(x, y, σ) around (x, y)
  – Compute the gradient orientation histogram; call its bins b_i
  – For each b_i within 80% of the maximum, make a feature (x, y, σ, b_i)
• Enables matching by including illumination-invariant feature content (Sinha 2000)

SIFT Stereopsis
• Apply SIFT to the image from each camera.
• Match a center feature (x, y, σ, θ) and a right feature (x', y', σ', θ') if:
  1. |y − y'| ≤ 1
  2. 0 < |x' − x| ≤ 20
  3. |θ − θ'| ≤ 20 degrees
  4. 2/3 ≤ σ'/σ ≤ 3/2
  5. No other matches consistent with the above exist
• Match similarly for left and top; discard all features not matched twice
• Compute 3D positions (trig) as the average from the horizontal and vertical pairs

Recapitulation (in G Minor)
• Procedure so far:
  1. For each image:
     (a) Produce the scale-space representation
     (b) Find extrema
     (c) Compute gradient orientation histograms
  2. Match features from center to right and center to top
  3. Compute relative 3D positions for the survivors
• This gives us potential features from a given frame
• How do we use them?

Landmark Tracking
• Predict where landmarks should appear (reliability, speed)
• Note: the robot moves in the xz plane
• Given odometry [p, q, θ] and an old relative position [X, Y, Z], the expected position [X', Y', Z'] is:
  X' = (X − p)cos(θ) − (Z − q)sin(θ)
  Y' = Y
  Z' = (X − p)sin(θ) + (Z − q)cos(θ)
• By the pinhole camera model ((u0, v0) image-center coordinates, I the interocular distance, f the focal length):
  r' = v0 − f·Y'/Z'
  c' = u0 + f·X'/Z'
  d' = f·I/Z'
  σ' = σ·Z/Z'

Landmark Tracking
• V is the camera field-of-view angle (60 degrees)
• A landmark is expected to be in view if:
  Z' > 0
  tan⁻¹(|X'|/Z') < V/2
  tan⁻¹(|Y'|/Z') < V/2
• An expected landmark matches an observed landmark if:
  – Observed center is within a 10x10 region around the expected center
  – Observed scale is within 20% of expected
  – Observed orientation is within 20 degrees of expected
  – Observed disparity is within 20% of expected

Landmark Tracking
• A SIFT view is: (SIFT feature, relative 3D position, absolute view direction)
• Each landmark is: (3D position, list of views, misses)
• Algorithm:
  For each frame, find expected landmarks with odometry
  For each observed view v:
    If v matches an expected landmark l:
      Set l.misses = 0
      Add v to the view list for l
    Else add v to the DB as a new landmark
  For each expected but unobserved landmark l:
    If one view direction is within 20 degrees of the current one:
      l.misses++
    If
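The prediction and matching rules above translate directly into a short routine. The sketch below applies the odometry transform, the pinhole projection, the field-of-view test, and the match tolerances from these slides; it is a reading aid under stated assumptions, not the authors' implementation. The tolerances (10x10 pixel window read as +/-5 pixels, 20% scale and disparity bounds, 20 degrees of orientation, V = 60 degrees) come from the slides, while the function names, the dict-based view layout, and the unit conventions are made up for the example.

```python
# Sketch of the landmark prediction and matching tests described above.
# Tolerances follow the slides; names, units, and data layout are assumptions.
import math

def predict(old_pos, odom, u0, v0, f, I, sigma):
    """Transform an old camera-relative position [X, Y, Z] by odometry
    [p, q, theta], then project with the pinhole model. Returns the expected
    3D position, image position (r', c'), disparity d', and scale sigma'."""
    X, Y, Z = old_pos
    p, q, theta = odom
    Xp = (X - p) * math.cos(theta) - (Z - q) * math.sin(theta)
    Yp = Y
    Zp = (X - p) * math.sin(theta) + (Z - q) * math.cos(theta)
    r = v0 - f * Yp / Zp          # expected row
    c = u0 + f * Xp / Zp          # expected column
    d = f * I / Zp                # expected disparity
    sigma_p = sigma * Z / Zp      # expected SIFT scale
    return (Xp, Yp, Zp), (r, c), d, sigma_p

def in_view(Xp, Yp, Zp, V=math.radians(60)):
    """Landmark is expected to be visible: in front of the camera and within
    half the field-of-view angle both horizontally and vertically."""
    return (Zp > 0
            and math.atan(abs(Xp) / Zp) < V / 2
            and math.atan(abs(Yp) / Zp) < V / 2)

def matches(expected, observed):
    """Expected vs. observed view, per the slides: center within a 10x10 pixel
    region (read here as +/-5 pixels), scale and disparity within 20%,
    orientation within 20 degrees. Both arguments are dicts with keys
    'r', 'c', 'sigma', 'theta' (degrees), and 'd' - an assumed layout."""
    return (abs(observed['r'] - expected['r']) <= 5
            and abs(observed['c'] - expected['c']) <= 5
            and abs(observed['sigma'] - expected['sigma']) <= 0.2 * expected['sigma']
            and abs(observed['theta'] - expected['theta']) <= 20.0
            and abs(observed['d'] - expected['d']) <= 0.2 * expected['d'])
```

A per-frame tracker would then run in_view over the map landmarks, test each observed SIFT view with matches, reset the miss count on a hit, add unmatched views as new landmarks, and increment misses for expected landmarks that went unseen, as in the algorithm above.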

