SIFT SLAM Vision Details
MIT 16.412J, Spring 2004
Vikash K. Mansinghka

Outline
• Lightning Summary
• Black Box Model of SIFT SLAM Vision System
• Challenges in Computer Vision
• What these challenges mean for visual SLAM
• How SIFT extracts candidate landmarks
• How landmarks are tracked in SIFT SLAM
• Alternative vision-based SLAM systems
• Open questions

Lightning Summary
• Motivation: SLAM without modifying the environment
• Landmark candidates are extracted by the SIFT process
• Candidates are matched between cameras to get 3D positions
• Candidates are pruned according to consistency with the robot's expectations
• Survivors are sent off for statistical processing

Review of Robot Specifications
• Triclops 3-camera "stereo" vision system
• Odometry system which produces [p, q, θ]
• The center camera is the "reference"

Black Box Model of Vision System
• For now, based on black magic (SIFT). Produces landmarks.
• Assume landmarks are globally indexed by i.
• Per-frame inputs:
  – [p, q, θ] - odometry input (x, z, and bearing deltas)
  – List of (i, x_i) - new landmark positions (from SLAM)
• Per-frame output is a list of (i, x'_i, x_i, r_i, c_i) for each visible landmark, where:
  – x'_i is its measured 3D position (w.r.t. the camera position)
  – x_i is its map 3D position (w.r.t. the initial robot position), if it isn't new
  – (r_i, c_i) is its pixel coordinates in the center camera

Challenges in Computer Vision
• Intuitively appealing ≠ computationally realizable
• Stable feature extraction is hard; results are rarely general
• Extracted features are sparse
• Matching requires exponential time
• Matches are often wrong

Implications for Visual SLAM
• Hard to reliably find landmarks
• Really Hard to reliably find landmarks
• Really Really Hard to reliably find landmarks
• Data association is slow and unreliable
• False matches introduce substantial errors
• Accurate probabilistic models are unavailable

Remarks on the SIFT Approach
• For visual SLAM, landmarks must be identifiable across:
  – Large changes in distance
  – Small changes in view direction
  – (Bonus) Changes in illumination
• Solution:
  – Produce a "scale-invariant" image representation
  – Extract points with associated scale information
  – Use a matcher empirically capable of handling small displacements

The Scale-Invariant Feature Transform
• Described in Lowe, IJCV 2004 (preprint; use Google)
• Four stages:
  – Scale-space extrema extraction
  – Keypoint pruning and localization (not used in SLAM)
  – Orientation assignment
  – Keypoint descriptor (not used in SLAM)

Lightning Introduction to Scale Space
• Motivation:
  – Objects can be recognized at many levels of detail
  – Large distances correspond to a low level of detail
  – Different kinds of information are available at each level

Lightning Introduction to Scale Space
• Idea: extract the information content of an image at each level of detail
• Detail reduction is typically done by Gaussian blurring
• Long history in both machine and human vision:
  – Marr in the late 1970s
  – Henkel in 2000
• Analogous concepts are used in speech processing

Scale Space in SIFT
• I(x, y) is the input image. L(x, y, σ) is its representation at scale σ.
• G(x, y, σ) is the 2D Gaussian with variance σ².
• L(x, y, σ) = G(x, y, σ) ∗ I(x, y) (the "only" choice; see Koenderink 1984)
• D(x, y, σ) = L(x, y, kσ) − L(x, y, σ)
• D approximates σ²∇²G ∗ I (see Mikolajczyk 2002 for the significance of this)
• D is also edge-detector-like; the newest SIFT "corrects" for this
• Details of the discretization (e.g. resampling, choice of k) are unimportant
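A minimal sketch may help make the scale-space construction above concrete. The code below builds the blurred stack L and the difference-of-Gaussian stack D, and tests whether a sample of D is an extremum of its 3x3x3 neighborhood in space and scale, which is the test SIFT uses to propose keypoints. This is an illustration only, not the code behind these slides: the base scale, the factor k, the number of scales, and the function names are assumptions.

```python
# Sketch of the DoG scale space described above (illustrative parameters).
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_stack(image, sigma0=1.6, k=np.sqrt(2), num_scales=5):
    """Return (sigmas, D) with D[s] = L(x, y, k*sigma_s) - L(x, y, sigma_s)."""
    sigmas = [sigma0 * k**s for s in range(num_scales + 1)]
    L = [gaussian_filter(image, sigma=s) for s in sigmas]       # L = G * I
    D = np.stack([L[s + 1] - L[s] for s in range(num_scales)])  # DoG stack
    return sigmas[:num_scales], D

def is_local_extremum(D, s, y, x):
    """True if D[s, y, x] is the max or min of its 3x3x3 neighborhood.
    Assumes 0 < s < D.shape[0]-1 and (y, x) away from the image border;
    ties and strictness are ignored for brevity."""
    patch = D[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
    center = D[s, y, x]
    return center == patch.max() or center == patch.min()
```

Each surviving (x, y, σ) triple then gets the orientation assignment and stereo matching described next.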
Scale Space in SIFT
• Compute local extrema of D as above
• Each such (x, y, σ) is a feature
• The (x, y) part "should" be scale and planar-rotation invariant

SIFT Orientation Assignment
• For each feature (x, y, σ):
  – Find a fixed-pixel-area patch in L(x, y, σ) around (x, y)
  – Compute the gradient orientation histogram; call its bins b_i
  – For each b_i within 80% of the maximum, make a feature (x, y, σ, b_i)
• Enables matching by including illumination-invariant feature content (Sinha 2000)

SIFT Stereopsis
• Apply SIFT to the image from each camera.
• Match a center feature (x, y, σ, θ) and a right feature (x', y', σ', θ') if:
  1. |y − y'| ≤ 1
  2. 0 < |x' − x| ≤ 20
  3. |θ − θ'| ≤ 20 degrees
  4. 2/3 ≤ σ'/σ ≤ 3/2
  5. No other matches consistent with the above exist
• Match similarly for left and top; discard all features not matched twice
• Compute 3D positions (trig) as the average from the horizontal and vertical pairs

Recapitulation (in G Minor)
• Procedure so far:
  1. For each image:
     (a) Produce the scale-space representation
     (b) Find extrema
     (c) Compute gradient orientation histograms
  2. Match features from center to right and center to top
  3. Compute relative 3D positions for the survivors
• This gives us potential features from a given frame
• How do we use them?

Landmark Tracking
• Predict where landmarks should appear (reliability, speed)
• Note: the robot moves in the xz plane
• Given odometry [p, q, θ] and an old relative position [X, Y, Z], the expected position [X', Y', Z'] is:
  X' = (X − p)cos(θ) − (Z − q)sin(θ)
  Y' = Y
  Z' = (X − p)sin(θ) + (Z − q)cos(θ)
• By the pinhole camera model ((u0, v0) image-center coordinates, I the interocular distance, f the focal length):
  r' = v0 − f·Y'/Z'
  c' = u0 + f·X'/Z'
  d' = f·I/Z'
  σ' = σ·Z/Z'

Landmark Tracking
• V is the camera field-of-view angle (60 degrees)
• A landmark is expected to be in view if:
  Z' > 0
  tan⁻¹(|X'|/Z') < V/2
  tan⁻¹(|Y'|/Z') < V/2
• An expected landmark matches an observed landmark if:
  – Observed center is within a 10x10 region around the expected center
  – Observed scale is within 20% of expected
  – Observed orientation is within 20 degrees of expected
  – Observed disparity is within 20% of expected

Landmark Tracking
• A SIFT view is: (SIFT feature, relative 3D position, absolute view direction)
• Each landmark is: (3D position, list of views, misses)
• Algorithm:
  For each frame, find expected landmarks with odometry
  For each observed view v:
    If v matches an expected landmark l:
      Set l.misses = 0
      Add v to the view list for l
    Else add v to the DB as a new landmark
  For each expected but unobserved landmark l:
    If one view direction is within 20 degrees of the current one:
      l.misses++
    If
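The prediction and matching rules above translate directly into a short routine. The sketch below applies the odometry transform, the pinhole projection, the field-of-view test, and the match tolerances from these slides; it is a reading aid under stated assumptions, not the authors' implementation. The tolerances (10x10 pixel window read as +/-5 pixels, 20% scale and disparity bounds, 20 degrees of orientation, V = 60 degrees) come from the slides, while the function names, the dict-based view layout, and the unit conventions are made up for the example.

```python
# Sketch of the landmark prediction and matching tests described above.
# Tolerances follow the slides; names, units, and data layout are assumptions.
import math

def predict(old_pos, odom, u0, v0, f, I, sigma):
    """Transform an old camera-relative position [X, Y, Z] by odometry
    [p, q, theta], then project with the pinhole model. Returns the expected
    3D position, image position (r', c'), disparity d', and scale sigma'."""
    X, Y, Z = old_pos
    p, q, theta = odom
    Xp = (X - p) * math.cos(theta) - (Z - q) * math.sin(theta)
    Yp = Y
    Zp = (X - p) * math.sin(theta) + (Z - q) * math.cos(theta)
    r = v0 - f * Yp / Zp          # expected row
    c = u0 + f * Xp / Zp          # expected column
    d = f * I / Zp                # expected disparity
    sigma_p = sigma * Z / Zp      # expected SIFT scale
    return (Xp, Yp, Zp), (r, c), d, sigma_p

def in_view(Xp, Yp, Zp, V=math.radians(60)):
    """Landmark is expected to be visible: in front of the camera and within
    half the field-of-view angle both horizontally and vertically."""
    return (Zp > 0
            and math.atan(abs(Xp) / Zp) < V / 2
            and math.atan(abs(Yp) / Zp) < V / 2)

def matches(expected, observed):
    """Expected vs. observed view, per the slides: center within a 10x10 pixel
    region (read here as +/-5 pixels), scale and disparity within 20%,
    orientation within 20 degrees. Both arguments are dicts with keys
    'r', 'c', 'sigma', 'theta' (degrees), and 'd' - an assumed layout."""
    return (abs(observed['r'] - expected['r']) <= 5
            and abs(observed['c'] - expected['c']) <= 5
            and abs(observed['sigma'] - expected['sigma']) <= 0.2 * expected['sigma']
            and abs(observed['theta'] - expected['theta']) <= 20.0
            and abs(observed['d'] - expected['d']) <= 0.2 * expected['d'])
```

A per-frame tracker would then run in_view over the map landmarks, test each observed SIFT view with matches, reset the miss count on a hit, add unmatched views as new landmarks, and increment misses for expected landmarks that went unseen, as in the algorithm above.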

