Probability models for Visual Search
Literature Survey

Umesh Rajashekar
EE381K - Multidimensional Digital Signal Processing
Fall 2000
The University of Texas at Austin

Abstract

The development of efficient artificial machine vision systems depends on the ability to mimic aspects of the human visual system. Humans scan the world using a high-resolution central region called the fovea and a low-resolution surround to guide the search. A direct consequence of this non-uniform sampling is the active manner in which the human visual system gathers data in the real world, using fixations and saccades. In this report, we look at a few techniques that attempt to mimic the visual search strategies of the human visual system.

1. INTRODUCTION

The human visual system uses a dynamic process of actively scanning the visual environment. The active nature of scanning is reflected in the eye's scanpath. These sequences of fixations and saccades (constituting the scanpaths) are attributed to the distribution of photoreceptors on the retina. The photoreceptors are packed densely at the point of focus on the retina (the fovea), and the sampling rate drops almost exponentially with distance from the fovea. Fig. 1 shows a typical retinal sampling grid. As a result, humans see with very high resolution at the fixation point, and the resolution falls off away from it. Fig. 2 shows a typical image on the retina, with the fixation point at the middle of the image. To build a detailed representation of the scene, the eye scans it with a series of fixations and jumps (saccades) to new fixation points. Information is gathered by the eye during fixations, while no information is gathered during the saccades. A typical fixation lasts about 200 ms. Fig. 3 shows a typical scanpath of the human eye while looking at the image [1].

The active nature of looking has advantages in terms of speed and reduced storage requirements (due to the non-uniform resolution across the image) when building artificial vision systems. It also has significant applications in video compression, where the region around the fixation point in the video sequence is transmitted at high resolution and regions away from the fixation point are blurred. The development of foveation-based artificial vision systems and video compression schemes depends on the ability to determine the fixation points, or area-of-interest regions, in the image. In general, however, we cannot predict a person's scanpath while viewing a scene in a realistic way. One common solution for determining the eye's scanpath is the use of eye trackers. An alternative is to develop models of the fixation process. Since a deterministic solution to the fixation-point prediction problem is impossible (different people look at the same image with different scanpaths, depending on their motive), I propose to investigate the possibility of building a probabilistic model for eye fixations in a visual search environment.

Besides the applications already mentioned, such a fixation model has significant uses in computer vision applications such as pictorial image database query and image understanding.
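To make the foveated representation described above concrete, the following is a minimal sketch (not part of the original report) of how non-uniform resolution around a fixation point can be approximated in software: progressively Gaussian-blurred copies of an image are blended per pixel according to eccentricity, i.e., distance from the fixation point. The function name foveate, the number of blur levels, and the blur schedule are illustrative assumptions, not a published foveation algorithm or measured retinal parameters.

```python
# Minimal foveation sketch (illustrative only): each pixel is taken from a
# Gaussian-blurred copy of the image whose blur grows with eccentricity
# (distance from the fixation point). Blur levels are arbitrary assumptions.
import numpy as np
from scipy.ndimage import gaussian_filter

def foveate(image, fixation, num_levels=5, sigma_step=2.0):
    """Return a foveated copy of a 2-D float `image` centered at `fixation` (row, col)."""
    rows, cols = np.indices(image.shape)
    eccentricity = np.hypot(rows - fixation[0], cols - fixation[1])

    # Pre-compute progressively blurred copies; level 0 is the original image.
    pyramid = [image] + [gaussian_filter(image, sigma=sigma_step * k)
                         for k in range(1, num_levels)]

    # Map eccentricity to a blur level: near the fovea use level 0,
    # toward the image border use the coarsest level.
    level = np.minimum((eccentricity / eccentricity.max() * num_levels).astype(int),
                       num_levels - 1)

    out = np.empty_like(image, dtype=float)
    for k in range(num_levels):
        mask = (level == k)
        out[mask] = pyramid[k][mask]
    return out

if __name__ == "__main__":
    img = np.random.rand(256, 256)           # stand-in for a grayscale image
    foveated = foveate(img, fixation=(128, 128))
```

In a foveated video coder the per-pixel blur selection would typically be replaced by foveated quantization or subsampling, but the sketch captures the resolution fall-off with eccentricity described above.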
2. Previous models for fixation point selection

The primary goal of many machine vision systems has been the development of algorithms that interpret visual data from cameras to help computers see. Most active vision systems are developed for a specific task and hence perform well only in constrained scenarios. In this section we briefly go over three such techniques.

1. Image features and fixations

Privitera and Stark [2] propose a computational model for human scanpaths based on intelligent image processing of digital images. The basic idea is to define algorithmic regions of interest (aROIs) generated by image processing algorithms and compare them with human regions of interest (hROIs). The comparison of the aROIs and hROIs is carried out by analyzing their spatial/structural binding (location similarity) and temporal/sequential binding (order of fixations). Based on their experiments, the aROIs generated by a wavelet decomposition of the image (which is inherently multi-resolution) seemed to match the hROIs well. Symmetry and contrast also appeared to be strongly correlated with fixation. Their results further indicate that fixation-point prediction can be no better than 50% accurate, i.e., only half of the predictions made are correct. While the results of this paper are definitely promising, the techniques used to determine fixation points do not account for the fact that the selection of the next fixation point depends on the current fixation point. Further, a weighted combination of multiple image processing algorithms might produce better predictions of aROIs.

2. Probability models

Klarquist and Bovik [3] propose an alternative technique for fixation point selection in 3D space. The fixation point selection was developed for FOVEA, "an active vision system platform with capabilities similar to sophisticated biological vision systems" [3]. FOVEA uses a probabilistic approach to fixation point selection, which makes the selection less rigid and also contingent on the features around the current fixation point. The probability model is built from a number of criteria, and the fixation-point selection process is independent of those criteria, creating a clear dichotomy between the selection criterion and the selection process. The selection criterion is based on the local information content (gradient information), the proximity of the candidate fixation point to the current fixation point, and the surface map in the vicinity of the current fixation point. However, no comparison of the system's performance against human scanpaths is provided.

3. Saliency models for image understanding

Henderson [4] proposes a more robust method for fixation point selection in images. The model incorporates the cognitive factors involved in fixation point selection. An initial fixation map is derived by analyzing low-level features (contrast, edges) in the image. Based on the task at hand (e.g., search for a target), the model is trained to "understand" the image. Incorporating cognition into a model is a difficult task, since cognition is task specific. However, the proposed model facilitates the prediction of both the fixation points and the duration of fixations.

3. Probability models for Visual Search

For my project, I propose to investigate probability models in a constrained visual search
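To illustrate the kind of probability model surveyed in Section 2 and proposed for investigation here, the sketch below (not drawn from [3] or from the original report) implements a simple stochastic next-fixation rule: candidate locations are scored by local gradient magnitude as a stand-in for information content, weighted by a Gaussian proximity term around the current fixation, and the next fixation is sampled from the normalized distribution. The weighting scheme, the proximity width, and the function name next_fixation are illustrative assumptions.

```python
# Illustrative probabilistic next-fixation selection: score candidates by local
# gradient magnitude, weight by proximity to the current fixation, normalize
# into a probability distribution, and sample the next fixation from it.
import numpy as np

def next_fixation(image, current, rng, proximity_sigma=40.0):
    """Sample the next fixation (row, col) for a 2-D float `image`."""
    gy, gx = np.gradient(image.astype(float))
    saliency = np.hypot(gx, gy)                      # local information content

    rows, cols = np.indices(image.shape)
    dist2 = (rows - current[0]) ** 2 + (cols - current[1]) ** 2
    proximity = np.exp(-dist2 / (2.0 * proximity_sigma ** 2))

    weights = saliency * proximity
    p = weights.ravel() / weights.sum()
    idx = rng.choice(p.size, p=p)                    # stochastic, not winner-take-all
    return np.unravel_index(idx, image.shape)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((128, 128))                     # stand-in for a grayscale image
    fix = (64, 64)
    scanpath = [fix]
    for _ in range(5):
        fix = next_fixation(img, fix, rng)
        scanpath.append(fix)
    print(scanpath)
```

Because the selection is stochastic rather than winner-take-all, repeated runs produce different scanpaths for the same image, which mirrors the observer-to-observer variability noted in the Introduction.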

