Stereo Matching with Nonparametric Smoothness Priors in Feature Space

Brandon M. Smith, University of Wisconsin-Madison, bmsmith@cs.wisc.edu
Li Zhang, University of Wisconsin-Madison, lizhang@cs.wisc.edu
Hailin Jin, Adobe Systems Inc., hljin@adobe.com

Abstract

We propose a novel formulation of stereo matching that considers each pixel as a feature vector. Under this view, matching two or more images can be cast as matching point clouds in feature space. We build a nonparametric depth smoothness model in this space that correlates the image features and depth values. This model induces a sparse graph that links pixels with similar features, thereby converting each point cloud into a connected network. This network defines a neighborhood system that captures pixel grouping hierarchies without resorting to image segmentation. We formulate global stereo matching over this neighborhood system and use graph cuts to match pixels between two or more such networks. We show that our stereo formulation is able to recover surfaces with different orders of smoothness, such as those with high-curvature details and sharp discontinuities. Furthermore, compared to other single-frame stereo methods, our method produces more temporally stable results from videos of dynamic scenes, even when applied to each frame independently.

[Figure 1 panels: Left Cloth3 image [17], annotated with smoothness types (flat plane, discontinuity, high curvature); our depth map (2.01% bad pixels); Woodford et al. [31] (6.33% bad pixels).]

Figure 1. Different image regions correspond to 3D surfaces with different types of smoothness, as shown on the left. Such smoothness properties are often highly correlated with local image features, such as intensity gradients and shading. We propose a nonparametric smoothness prior for global stereo matching that models the correlation between image features and depth values. Depth maps estimated using this model preserve both high-curvature surfaces and sharp discontinuities at object boundaries, as shown in the middle. Our method compares favorably to an existing state-of-the-art method [31] that uses a fixed 2nd-order smoothness prior, shown on the right. Our method also has the advantage of being able to generate stable depth maps for videos of dynamic scenes. Bad pixels (black) are those whose absolute depth errors are greater than one. Best viewed in color.

1. Introduction

Stereo matching has been one of the core challenges in computer vision for decades. Two categories of solutions have been proposed: local methods and global methods. Local methods use a larger neighborhood (7 × 7, for example) around each pixel; they have the flexibility to model parametric surfaces, such as a quadratic patch, within the neighborhood, but have difficulties in handling occlusion, which is a global property of the scene. Global methods use a smaller neighborhood (often a pair of pixels) to impose surface smoothness; they are good at reasoning about occlusion but are often limited to modeling piecewise planar scenes.

In this work, we seek to combine the advantages of both approaches by designing a global stereo matching method that uses a large neighborhood to define a depth smoothness prior. Using a large neighborhood gives us the opportunity to model complex local shapes; however, it is also challenging. Take the image in Figure 1 as an example. Different patches have different types of smoothness: flat planes, discontinuous segments, and high-curvature folds. Assuming a single parametric surface type would not be a robust solution in all cases; we argue that a nonparametric smoothness model should be used for a large neighborhood. Furthermore, it is well accepted that image features, such as intensity edges [4] and color segments [25], provide important cues for depth estimation. We therefore hope that this nonparametric model will also be able to represent the correlation between image features and depth values in a large neighborhood.

Toward this end, we build a nonparametric depth smoothness prior model that correlates the image features and depth values. Our key idea is to consider each pixel as a feature vector and view each image as a point cloud in this feature space. In general, the feature vector for each pixel can include its position, shading, texture, filter bank coefficients, etc., which provide cues that are often correlated with surface continuity, curvatures, etc. Under this view, matching two images can be cast as matching two point clouds in feature space. In this space, we introduce a nonparametric model that correlates feature vectors and depth values. For each image, this model induces a dense graph with weighted edges that connect pixels. Given a pair of such graphs that represent two images, we match pixels between them using the graph cuts method [5].

Our work makes the following three major contributions.

Nonparametric Smoothness in a Large Neighborhood. We propose to use kernel density estimation in a large neighborhood to correlate image features with depth values. Using this correlation prior in a global matching framework, our method is able to preserve both high-curvature shape details and sharp discontinuities at object boundaries, as shown in Figure 1.

Sparse Graph Approximation. Our large-neighborhood smoothness prior yields an energy function defined over a dense graph that is challenging to minimize. We propose novel techniques to simplify the energy function and approximate the dense graph with a sparse one that contains its dominant edges. Applying graph cuts for stereo matching over such sparse graphs has the same computational complexity as matching over regular image grids.

Stereo Matching with Implicit Segmentation. Our sparse graph differs from the original image grid in that it connects pixels with similar feature vectors. Such a graph encodes an image segmentation hierarchy. In practice, matching pixels over such graphs preserves the discontinuity boundaries well, but without requiring segmentation as a preprocessing step. Segmentation is often temporally inconsistent when applied to videos; by avoiding it as a preprocessing step, our method recovers a more temporally stable depth estimate for dynamic scenes, even when applied to each frame independently.

Our method is simple to implement: by replacing rectangular image grids with our sparse graphs, one can use our idea in most of the global stereo matching methods. We use it in classic graph cuts stereo [5] in this paper. We show that our stereo formulation clearly improves upon existing methods on both the Middlebury benchmark dataset [17, 19] and on a range of real world
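
To make the idea of a sparse graph that connects pixels with similar feature vectors more concrete, the following is a minimal sketch in Python. It is not the construction used in the paper, which derives its sparse graph by keeping the dominant edges of a dense-graph energy. As a stand-in, this sketch links each pixel to its k nearest neighbours in feature space and weights each edge with a Gaussian kernel. The feature choice (scaled position plus intensity), the function names `pixel_features` and `sparse_feature_graph`, and the parameters `spatial_weight`, `k`, and `sigma` are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical feature vector: (scaled x, scaled y, intensity).
Feature = Tuple[float, float, float]


def pixel_features(image: List[List[float]],
                   spatial_weight: float = 0.1) -> List[Feature]:
    """Map every pixel to a point in feature space, in row-major order."""
    return [
        (spatial_weight * x, spatial_weight * y, value)
        for y, row in enumerate(image)
        for x, value in enumerate(row)
    ]


def sparse_feature_graph(features: List[Feature], k: int = 3,
                         sigma: float = 0.5) -> Dict[Tuple[int, int], float]:
    """Link each point to its k nearest neighbours in feature space.

    Returns an undirected graph as {(i, j): weight} with i < j. The weight
    is a Gaussian kernel of the feature-space distance (an assumed choice,
    not the paper's edge weights).
    """
    edges: Dict[Tuple[int, int], float] = {}
    for i, fi in enumerate(features):
        # Brute-force neighbour search; fine for a toy example only.
        dists = sorted(
            (math.dist(fi, fj), j) for j, fj in enumerate(features) if j != i
        )
        for d, j in dists[:k]:
            edges[(min(i, j), max(i, j))] = math.exp(-(d * d) / (2.0 * sigma ** 2))
    return edges


# Toy 2 x 4 image: a dark region (left) next to a bright region (right).
image = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
graph = sparse_feature_graph(pixel_features(image), k=3)
```

In this toy case every edge stays inside either the dark or the bright region, because the intensity gap dominates the feature-space distance. That behaviour mirrors, in a very simplified way, the implicit segmentation described above: the neighborhood system follows feature similarity rather than the rectangular image grid, so a smoothness term defined over these edges would not penalize a depth jump across the intensity boundary.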

