Berkeley COMPSCI 294 - Shape and Motion from Image Streams under Orthography - D2202944


School name University of California, Berkeley

Course Compsci 294- Special Topics

Pages 18

International Journal of Computer Vision, 9:2, 137-154 (1992)
© 1992 Kluwer Academic Publishers, Manufactured in The Netherlands.

Shape and Motion from Image Streams under Orthography: a Factorization Method

CARLO TOMASI
Department of Computer Science, Cornell University, Ithaca, NY 14850

TAKEO KANADE
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213

Received

Abstract

Inferring scene geometry and camera motion from a stream of images is possible in principle, but is an ill-conditioned problem when the objects are distant with respect to their size. We have developed a factorization method that can overcome this difficulty by recovering shape and motion under orthography without computing depth as an intermediate step. An image stream can be represented by the 2F × P measurement matrix of the image coordinates of P points tracked through F frames. We show that under orthographic projection this matrix is of rank 3. Based on this observation, the factorization method uses the singular-value decomposition technique to factor the measurement matrix into two matrices which represent object shape and camera rotation respectively. Two of the three translation components are computed in a preprocessing stage. The method can also handle and obtain a full solution from a partially filled-in measurement matrix that may result from occlusions or tracking failures. The method gives accurate results, and does not introduce smoothing in either shape or motion. We demonstrate this with a series of experiments on laboratory and outdoor image streams, with and without occlusions.

1 Introduction

The structure-from-motion problem, recovering scene geometry and camera motion from a sequence of images, has attracted much of the attention of the vision community over the last decade. Yet it is common knowledge that existing solutions work well for perfect images, but are very sensitive to noise.
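The rank-3 property stated in the abstract can be checked numerically. The following is a minimal sketch on synthetic, noise-free data, assuming NumPy; it is not the implementation used for the experiments. The function names synthetic_measurements and factor_rank3 are hypothetical, the stacking of horizontal coordinates above vertical ones is an assumed layout for the 2F × P matrix, and subtracting each row's mean is an assumption standing in for the preprocessing stage that removes the image-plane translation.

```python
import numpy as np


def synthetic_measurements(F=20, P=30, seed=0):
    """Build a 2F x P measurement matrix from an orthographic camera.

    Assumed layout: rows 0..F-1 hold horizontal image coordinates,
    rows F..2F-1 hold vertical image coordinates.
    """
    rng = np.random.default_rng(seed)
    points = rng.normal(size=(3, P))       # 3-D feature points
    horiz = np.empty((F, P))
    vert = np.empty((F, P))
    for f in range(F):
        # A random orthogonal matrix; its first two rows serve as the
        # orthonormal image-plane axes of frame f.
        Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
        a_f, b_f = rng.normal(size=2)      # image-plane translation
        horiz[f] = Q[0] @ points + a_f
        vert[f] = Q[1] @ points + b_f
    return np.vstack([horiz, vert])


def factor_rank3(W):
    """Register W by its row means and factor it with a truncated SVD."""
    W_reg = W - W.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(W_reg, full_matrices=False)
    root = np.sqrt(s[:3])
    R_hat = U[:, :3] * root                # 2F x 3 factor
    S_hat = root[:, None] * Vt[:3]         # 3 x P factor
    return W_reg, s, R_hat, S_hat


if __name__ == "__main__":
    W = synthetic_measurements()
    W_reg, s, R_hat, S_hat = factor_rank3(W)
    print("leading singular values:", s[:5])
    print("max reconstruction error:", np.abs(R_hat @ S_hat - W_reg).max())
```

On noise-free data only three singular values should be significantly different from zero. The two factors returned here are determined only up to an invertible 3 × 3 linear transformation, so they are not yet a rotation and a shape in the metric sense; resolving that ambiguity is outside the scope of this sketch.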

We present a new method called the factorization method which can robustly recover shape and motion from a sequence of images under orthographic projection. The effects of camera translation along the optical axis are not accounted for by orthography. Consequently, this component of motion cannot be recovered by our method and must be small relative to the scene distance. However, this restriction to shallow motion dramatically improves the quality of the computed shape and of the remaining five motion parameters. We demonstrate this with a series of experiments on laboratory and outdoor sequences, with and without occlusions.

In the factorization method, we represent an image sequence as a 2F × P measurement matrix W, which is made up of the horizontal and vertical coordinates of P points tracked through F frames. If image coordinates are measured with respect to their centroid, we prove the rank theorem: under orthography, the measurement matrix is of rank 3. As a consequence of this theorem, we show that the measurement matrix can be factored into the product of two matrices R and S. Here, R is a 2F × 3 matrix that represents camera rotation, and S is a 3 × P matrix that represents shape in a coordinate system attached to the object centroid. The two components of the camera translation along the image plane are computed as averages of the rows of W. When features appear and disappear in the image sequence because of occlusions or tracking failures, the resulting measurement matrix W is only partially filled in.
The factorization method can handle this situation by growing a partial solution obtained from an initial full submatrix into a complete solution with an iterative procedure.

The rank theorem captures precisely the nature of the redundancy that exists in an image sequence, and permits a large number of points and frames to be processed in a conceptually simple and computationally efficient way to reduce the effects of noise. The resulting algorithm is based on the singular-value decomposition, which is numerically well behaved and stable. The robustness of the recovery algorithm in turn enables us to use an image sequence with a very short interval between frames (an image stream), which makes feature tracking relatively simple and the assumption of orthography easier to approximate.

2 Relation to Previous Work

In Ullman's original proof of existence of a solution (Ullman 1979) for the structure-from-motion problem, the coordinates of feature points in the world are expressed in a world-centered system of reference and an orthographic projection model is assumed. Since then, however, most computer vision researchers opted for perspective projection and a camera-centered representation of shape (Prazdny 1980; Bruss & Horn 1983; Tsai & Huang 1984; Adiv 1985; Waxman & Wohn 1985; Bolles et al. 1987; Horn et al. 1988; Heeger & Jepson 1989; Heel 1989; Matthies et al. 1989; Spetsakis & Aloimonos 1989; Broida et al. 1990). With this representation, the position of feature points is specified by their image coordinates and by their depths, defined as the distance between the camera center and the feature points, measured along the optical axis. Unfortunately, although a camera-centered representation simplifies the equations for perspective projection, it makes shape estimation difficult, unstable, and noise sensitive.

There are two fundamental reasons for this.
First, when camera motion is small, effects of camera rotation and translation can be confused with each other: for example, a small rotation about the vertical axis and a small translation along the horizontal axis can generate very similar changes in an image. Any attempt to recover or differentiate between these two motions, though possible mathematically, is naturally noise sensitive. Second, the computation of shape as relative depth, for example, the height of a building as the difference of depths between the top and the bottom, is very sensitive to noise, since it is a small difference between large values. These difficulties are especially magnified when the objects are distant from the camera relative to their sizes, which is often the case for interesting applications such as site modeling.

The factorization method we present here takes advantage of the fact that both difficulties disappear when the problem is reformulated in world-centered coordinates and under
