Biologically Inspired Bottom-Up Visual Attention Model
Laurent Itti, et al., 1998
Presented by Mike Onorato, COS 598B

Overview
- Task: detect the most salient regions of an image.
- Goal: decrease the dimensionality of the visual input before further processing.
Source: [Laurent Itti, USC iLab: http://ilab.usc.edu/bu/] (same for all images unless otherwise specified)

Part 1: Model

Overview
- A purely bottom-up method: saliency is computed from the image alone, with no task-specific knowledge.

Step 1: Linear Filtering
- Build a Gaussian pyramid with 9 scales, from 1:1 down to 1:256:
  G0 256 x 256, G1 128 x 128, G2 64 x 64, G3 32 x 32, ...
Source: [www.singularsys.com/research/courses/616/funk-project-pres.ppt]

Step 2: Extract Feature Maps
- Center-surround operations at multiple scales.

Center-Surround Operations
- Difference between the value of a pixel at two different scales:
- c: center pixel
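The pyramid construction of Step 1 and the across-scale center-surround difference of Step 2 can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: a 2x2 box average stands in for the Gaussian low-pass filter, and nearest-neighbour repetition stands in for interpolation when bringing the surround map up to the center scale.

```python
import numpy as np

def pyramid(img, levels=9):
    """Dyadic pyramid: level 0 is the input, each level halves the resolution.
    A 2x2 box average approximates the paper's Gaussian filtering."""
    pyr = [img.astype(float)]
    for _ in range(levels - 1):
        a = pyr[-1]
        h, w = a.shape[0] // 2 * 2, a.shape[1] // 2 * 2  # crop to even size
        a = a[:h, :w]
        pyr.append((a[0::2, 0::2] + a[1::2, 0::2]
                    + a[0::2, 1::2] + a[1::2, 1::2]) / 4.0)
    return pyr

def center_surround(pyr, c, s):
    """|I(c) - I(s)|: upsample the coarse map (nearest-neighbour) to the
    resolution of scale c, then take the point-wise absolute difference."""
    factor = 2 ** (s - c)
    up = np.repeat(np.repeat(pyr[s], factor, axis=0), factor, axis=1)
    fine = pyr[c]
    return np.abs(fine - up[:fine.shape[0], :fine.shape[1]])

# the paper's 6 scale combinations: c in {2,3,4}, s = c + delta, delta in {3,4}
img = np.random.rand(256, 256)
pyr = pyramid(img)
maps = [center_surround(pyr, c, c + d) for c in (2, 3, 4) for d in (3, 4)]
```

For a 256 x 256 input the 9 levels run from 256 x 256 down to 1 x 1, and each of the 6 feature maps has the resolution of its center scale c.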
scale
- s: surround pixel scale

Center-Surround Operations
- c ∈ {2, 3, 4}; s = c + δ, where δ ∈ {3, 4}
- 6 different scale combinations:
  c = 2: s = 5 (δ = 3), s = 6 (δ = 4)
  c = 3: s = 6 (δ = 3), s = 7 (δ = 4)
  c = 4: s = 7 (δ = 3), s = 8 (δ = 4)

Step 2a: Extract Intensity Maps
- I(x): intensity map at scale x; ⊖: across-scale pixel difference (the coarser map is interpolated to the finer scale before subtracting)
- I(c,s) = |I(c) ⊖ I(s)|

Step 2b: Extract Color Maps
- Four broadly tuned color channels at each scale: red R(x), green G(x), blue B(x), yellow Y(x).
[Figure: tuning curves of the R, G, B, and Y channels against intensity. Source: www.singularsys.com/research/courses/616/funk-project-pres.ppt]
- Create a red-green and a blue-yellow opponency map:
  RG(c,s) = |( R(c) - G(c) ) ⊖ ( G(s) - R(s) )|
  BY(c,s) = |( B(c) - Y(c) ) ⊖ ( Y(s) - B(s) )|

Step 2c: Extract Orientation Maps
- Gabor filtering: convolve the intensity image with Gabor filters at 4 orientations, θ ∈ {0°, 45°, 90°, 135°}
Source: [http://www.cs.rug.nl/~imaging/simplecell.html]
- O(x,θ): orientation map at scale x and orientation θ
- O(c,s,θ) = |O(c,θ) ⊖ O(s,θ)|, e.g. θ = 90°, c = 2, s = c + 3 = 5

Step 3: Combine Feature Maps Into Conspicuity Maps
- Normalization operator N(·): normalize map values to the range [0..1]; let m = average of the local maxima; multiply the map by (1 - m)²
- This promotes maps with one strong peak and suppresses maps with many comparable peaks.
- Conspicuity maps: sum the normalized intensity, color, and orientation feature maps across scales into one map per feature type.

Step 4: Combine Conspicuity Maps Into Saliency Map
- The saliency map is the average of the three normalized conspicuity maps.

Step 5: Process Regions in Order of Saliency
- Integrate-and-fire neurons
- Winner-take-all
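The normalization and combination of Steps 3 and 4 can be sketched as follows. This follows the slide's summary of N(·) (rescale to [0,1], multiply by (1 - m)²); treating the peak of each non-overlapping 16x16 patch as a "local maximum" is a simplification of the paper's procedure, and the sketch assumes all feature maps have already been resized to a common scale.

```python
import numpy as np

def normalize_map(m, patch=16):
    """N(.) per the slide: rescale to [0,1], find the average m_bar of the
    local maxima other than the global one, and multiply by (1 - m_bar)^2."""
    m = m - m.min()
    if m.max() > 0:
        m = m / m.max()
    h, w = m.shape
    # simplification: one "local maximum" per non-overlapping patch
    peaks = [m[i:i + patch, j:j + patch].max()
             for i in range(0, h, patch) for j in range(0, w, patch)]
    peaks = [p for p in peaks if 0.0 < p < 1.0]  # drop global max and flat patches
    m_bar = np.mean(peaks) if peaks else 0.0
    return m * (1.0 - m_bar) ** 2

def saliency(intensity_maps, color_maps, orientation_maps):
    """Sum normalized feature maps into three conspicuity maps (assumed to be
    at a common scale already), normalize again, and average."""
    conspicuity = [sum(normalize_map(m) for m in group)
                   for group in (intensity_maps, color_maps, orientation_maps)]
    return sum(normalize_map(c) for c in conspicuity) / 3.0

# usage sketch with random stand-ins for the 6 intensity, 12 color,
# and 24 orientation feature maps
rng = np.random.default_rng(0)
s_map = saliency([rng.random((64, 64)) for _ in range(6)],
                 [rng.random((64, 64)) for _ in range(12)],
                 [rng.random((64, 64)) for _ in range(24)])
```

A map with a single peak passes through N(·) unchanged (m_bar = 0), while a map with several near-equal peaks is strongly suppressed, which is exactly the behavior Step 3 relies on.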
neurons
- Inhibition-of-return
- Focus of attention (FOA)

Step 5: Process Regions in Order of Saliency
- The WTA network selects the most salient location and the FOA shifts there (a shift takes 30-70 ms of simulated time).
- Inhibition-of-return then suppresses the attended location for 500-900 ms, so attention moves on to the next most salient region.

Part 2: Results

Experiments
- Pop-out: a target with the same shape as the distractors but a different contrast, orientation, or color is found immediately.

Noise Sensitivity Experiment
- Criticisms: only one image was used, and the number of trials per noise density is not stated.

Spatial Frequency Content Models
- An eye-tracking study shows certain locations are attended to more than others [Reinagel and Zador].
- Spatial frequency content (SFC) is measured by: at each image location, extract a 16x16 patch of I(2), R(2), G(2), B(2), and Y(2), then apply a 2D Fast Fourier Transform.

SFC Comparison Experiment
- Dataset:
  - Natural scenes with traffic signs (90 images)
  - Red soda can (104 images)
  - Vehicle's emergency triangle (64 images)

Results
[Figure: spatial frequency content maps (red) vs. saliency maps (yellow)]
- 1st attended location: SFC 2.5 ± 0.05 times the average SFC
- ...
- 8th attended location: SFC 1.6 ± 0.05 times the average SFC

Military Vehicle Experiment
- Measure the time taken to attend to a military vehicle in each scene.
- Compare to 62 human observers.
- Result: Itti's model finds the target in fewer attentional shifts in 75% of trials.

Natural Images

Why the Model is Effective
- Fast
- Parallel processing
- No top-down knowledge required
- Similar to the primate visual
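The attention loop of Step 5 can be approximated without simulating integrate-and-fire dynamics: an argmax stands in for the winner-take-all network, and zeroing a disk around each winner stands in for inhibition-of-return. This sketch reproduces only the resulting scanpath, not the 30-70 ms shift timing.

```python
import numpy as np

def attend(saliency_map, n_shifts=3, radius=8):
    """Simplified WTA + inhibition-of-return: repeatedly pick the most
    salient location, then suppress a disk around it so the focus of
    attention moves on to the next most salient region."""
    s = saliency_map.astype(float).copy()
    h, w = s.shape
    yy, xx = np.mgrid[0:h, 0:w]
    fixations = []
    for _ in range(n_shifts):
        y, x = np.unravel_index(np.argmax(s), s.shape)  # winner-take-all
        fixations.append((y, x))
        # inhibition-of-return: zero out a disk around the winner
        s[(yy - y) ** 2 + (xx - x) ** 2 <= radius ** 2] = 0.0
    return fixations

# usage sketch: two peaks are visited in decreasing order of saliency
s = np.zeros((64, 64)); s[10, 10] = 5.0; s[40, 40] = 3.0
path = attend(s, n_shifts=2)
```

Because the suppression is permanent here (rather than decaying after 500-900 ms as in the paper), this version never revisits a location.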
system

Why it Models the Primate Visual System Closely
- Parallel, bottom-up feature maps
- Maps of orientation, intensity, and color
- Linear filtering
- Center-surround operations
- Winner-take-all network
- Slow, sequential attention shifting

Criticisms of the Model
- Cannot detect junctions of features
- Cannot detect features other than color, intensity, and orientation
- No contour completion or closure
- Does not include