##
This **preview** shows page *1-2*
out of 6 **pages**.

*View Full Document*

End of preview. Want to read all 6 pages?

Upload your study docs or become a GradeBuddy member to access this document.

View Full Document**Unformatted text preview:**

1 Abstract—This paper presents a straight forward implementation of the Lucas Kanade feature tracker and modified RANSAC model estimation to model and group objects with different linear velocities in video scenes. The objects in the scenes are assumed to be moving with roughly linear motion and able to be characterized with affine models. The data set acquisition function is changed for the traditional RANSAC algorithm and results in a fairly accurate segmentation result. Both projective and affine models are used, with more accurate grouping and segmentation results coming from the latter. The models were run initially on simple videos for proof of concept, and also proved fairly successful on a real world scene. When first looking at a video sequence, it may not seem apparent how many different motions or planes of motion are present in the sequence. However, a simple urban or college scene may contain a dozen or more different static and dynamic objects. When trying to grasp how to get a computer to separate these into coherent groups, the task seems daunting. One helpful assumption to can make is that “trajectories of neighboring features typically exhibit a stron g correlation because they follow the motion of the same surface in the world” [1]. In the realm of feature identification and tracking, the Lucas-Kanade algorithm has seen widespread use for over twenty five years [5]. Due to its relative simplicity in implementation, it has seen frequent use in trying to solve the problem of estimating and track motion of objects in scenes [1, 4]. Lucas-Kanade is simply the foundation for this previous work, and the actual method of grouping feature points into recognizable groups varies from paper to paper. In several of the reviewed papers, the usage of the affine model to categorize motion came up again and again [1,2,4]. The affine Motion Identification and Grouping using Lucas Kanade and Geometric Models Matthew E. Pepper, Member, IEEE, Justin P. Mattimore, Member, IEEE I. INTRODUCTION2 model for motion “provides greater flexib ility in modeling local motion, being able to represent rotation, dilation, and shear as well as translation.. [wh ich leads to] more robust estimation” [2]. How to calculate such a model le d to the investigation of the RANdom SAmple Consensus (RANSAC) method developed in [6]. This process is based however on the fact that all the data that needs to be modeled is linked to only one object. When applied to a scene with multiple models, its tradition implementation would not be sufficient. Relying on the statement above regarding trajectories of neighboring features, it was determined that the RANSAC algorithm could best fit data if it took samples from a region around a random point, rather than multiple random feature throughout the image. This would improve the probability of all features being from the same object, having the same trajectories, and fitting the same model, which is one of the foundation assumptions associated with RANSAC. The random determination of data sets from which to draw models is felt to be important to preserving the robustness of the algorithm. II. METHOD The algorithm for this paper consists of three parts, the first uses the first frame of the video sequence to find a list of feature points, the second part uses the Lucas-Kanade algorithm to track these points across all subsequent frames, and the last looks at the initial set of points along with those in later frames to group points by either an affine or projective model. A. Feature Identification In order to identify good features in an image, the neighboring region of each pixel must be considered, as well as each pixel individually. Since Lucas-Kanade is at its heart, a minimization algorithm using gradient descent, there needs to be a way to look at the intensity gradients around each feature. Therefore to look at this neighborhood, the image is separated into two images by convolving with a Gaussian and smoothing with its derivative. Each convolution is performed with a normalized kernel of length five in the vertical direction to give the Gradient X image or in the horizontal direction to give Gradient Y. These two images indicate the behavior of the pixel intensities in the X or Y direction. It is these gradient images which will be used to provide the neighborhood data for each pixel value. This neighborhood data is compiled into a gradient covariant matrix, also known as the Hessian, and two eigenvalues are found. The ratio of these two eigenvalues will be used to gauge each particular point’s usefulness for tracking. If both are sufficiently s ma ll, the area doesn’t contain enough texture to track. If one is large and the other an order of magnitude or more smaller, that could indicate the point is near an edge, which could lead to future inaccuracy when try to extrapolate its next position. They must both be of sufficient magnitude and yet one not more than five or less times the other, to be declared ideal for tracking. If this is the case, the smaller of the two is assigned to the x y value of the point. Once all points have been analyzed, they are selected based on fitness and then sorted in descending order. The desired n features is chosen from the top per some threshold and used for motion analysis.3 B. Feature Tracking The tracking algorithm used in this paper is also standard Lucas-Kanade [5]. Two frames are taken and the next frame in the sequence is subtracted from the original to create a difference image. Since temporal persistence is assumed, a window around each feature point in this later frame should match that of the earlier frame but be in a different position. There is assumed then to be some set of unknowns that when multiplied with the intensity of the x y value from the first frame should yield the intensity of the second. These parameters are unknown, but can be solved by using the gradient information from both images. Theoretically, the gradient window of the first image should be large enough to contain the pixel value of where the feature actually is in the following frame. Through an iterative process, the pixel locations and intensity values are compared between the two different images, a difference is created by interpolation, added to the old point, and run again until the difference is negligible. This process is done for all points through all frames. C.

View Full Document