25-1 Image Annotation and Feature Extraction: Guest Lecture
Lei Wang (Microsoft Corporation)
Latifur Khan, Bhavani Thuraisingham (UTD)
October 8, 2008
Digital Forensics

25-2 Outline
- How do we retrieve images?
- Motivation
- Annotation
- Correspondence: models
- Enhancement
- Future work
- Results
- References

25-3 How do we retrieve images?
- Use Google image search!
- Google uses filenames and surrounding text, and ignores the contents of the images.

25-4 Motivation
How do we retrieve images/videos?
- CBIR is based on similarity search over visual features:
  - it does not support textual queries;
  - it does not capture "semantics".
- Instead: automatically annotate images, then retrieve them based on the textual annotations.
- Example annotations: tiger, grass.

25-5 Motivation
- There is a gap between the perceptual and the conceptual.
- Semantic gap: it is hard to represent semantic meaning using low-level image features such as color, texture, and shape.
- It is possible to answer the query "red ball" with a red rose.
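The CBIR similarity search described in the motivation slides can be illustrated with a toy sketch: rank database images by the similarity of their color histograms to the query. This is a minimal illustration under assumed details, not the lecture's system; the function names (`color_histogram`, `cbir_rank`) are invented here, and histogram intersection is one standard choice of similarity measure.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Quantize each RGB channel into `bins` levels, count pixels per
    (r, g, b) bin, and normalize the counts to a distribution."""
    quantized = (image.astype(np.int64) * bins) // 256  # per-channel bin index
    flat = (quantized[..., 0] * bins + quantized[..., 1]) * bins + quantized[..., 2]
    hist = np.bincount(flat.ravel(), minlength=bins ** 3).astype(float)
    return hist / hist.sum()

def cbir_rank(query, database):
    """Rank database images by histogram-intersection similarity to the query."""
    q = color_histogram(query)
    sims = [np.minimum(q, color_histogram(img)).sum() for img in database]
    return np.argsort(sims)[::-1]  # indices of most similar images first
```

Note that this sketch reproduces the limitation named on the slide: a red rose and a red ball have nearly identical color histograms, so a purely visual ranking cannot tell them apart.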
(Figure for slide 25-5: the query image and the image retrieved by CBIR.)

25-6 Motivation
Most current automatic image annotation and retrieval approaches consider:
- keywords;
- low-level image features for visual tokens/regions/objects;
- the correspondence between keywords and visual tokens.
Our goal is to develop automated image annotation techniques with better accuracy.

25-7 Annotation

25-8 Annotation
Major steps:
1. Segmentation into regions.
2. Clustering to construct blob-tokens.
3. Analyzing the correspondence between keywords and blob-tokens.
4. Auto annotation.

25-9 Annotation: Segmentation & Clustering
Images, segments, blob-tokens.

25-10 Annotation: Correspondence/Linking
Our purpose is to find the correspondence between words and blob-tokens, e.g., P(tiger | V1), P(V2 | grass), ...

25-11 Auto Annotation
Tiger? Grass? Lion? ...

25-12 Segmentation: Image Vocabulary
- Can we represent all images with a finite set of symbols?
- Text documents consist of words; images consist of visual terms.
- Example: V123 V89 V988 V4552 V12336 V2 V765 V9887
(copyright © R. Manmatha)

25-13 Construction of Visual Terms
- Segment the images (e.g., with Blobworld or the normalized-cuts algorithm).
- Cluster the segments; each cluster is a visual term (blob-token).
Images, segments, visterms/blob-tokens: V1 V2 V3 V4 V1 V5 V6

25-14 Discrete Visual Terms
- A rectangular partition works better!
- Partition the keyframe, then cluster across images.
- The segmentation problem can be avoided to some extent.
(copyright © R. Manmatha)

25-15 Visual Terms
- Or partition using a rectangular grid and cluster.
- This actually works better.

25-16 Grid vs. Segmentation
- Segmentation vs. rectangular partition: the rectangular partition gives better results than segmentation!
- The model is learned over many images, whereas segmentation operates on one image at a time.

25-17 Feature Extraction & Clustering
Feature extraction:
- color
- texture
- shape
K-means clustering generates a finite set of visual terms; each cluster's centroid represents one visual term.

25-18 Co-Occurrence Models
Mori et al.
1999. Key points:
- Create the co-occurrence table using a training set of annotated images.
- Tends to annotate with high-frequency words.
- Context is ignored.
- Joint probability models are needed.

Co-occurrence table (visual terms V1-V4 by words w1-w4):

        w1   w2   w3   w4
  V1    12    2    0    1
  V2    32   40   13   32
  V3    13   12    0    0
  V4    65   43   12    0

P(w1 | V1) = 12 / (12 + 2 + 0 + 1) = 0.8
P(V3 | w2) = 12 / (2 + 40 + 12 + 43) ≈ 0.12

25-19 Correspondence: Translation Model (TM)
Pr(f | e) = ∑_a Pr(f, a | e)
Pr(w | v) = ∑_a Pr(w, a | v)

25-20 Translation Models
Duygulu et al. 2002: use the classical IBM machine translation models to translate visterms into words.
The IBM machine translation models need a bilingual corpus for training, e.g.:
  "Mary did not slap the green witch" / "Mary no daba una bofetada a la bruja verde"
Analogously, each image pairs visterms with annotation words:
  V2 V4 V6 / "Maui people dance"
  V1 V34 V321 V21 / "tiger grass sky"

25-21 Correspondence (TM)
(Figure: the word-blob assignment matrix for the translation model.)

25-22 Correspondence (TM)
(Figure: N_W words W_i paired with N_B blob-tokens B_j.)

25-23 Results
Dataset:
- Corel Stock Photo CDs: 600 CDs, each consisting of 100 images on the same topic.
- We select 5,000 images (4,500 for training, 500 for testing); each image has a manual annotation.
- 374 words and 500 blobs.
Example annotations: "sun city sky mountain"; "grizzly bear meadow water".

25-24 Results
Experimental context:
- 3,000 training objects.
- 300 images for testing.
- Each object is represented by a 30-dimensional vector of color, texture, and shape features.

25-25 Results
Each image object (blob-token) has 30 features:
- Size: the portion of the image covered by the region.
- Position: the coordinates of the region's center of mass, normalized by the image dimensions.
- Color: the average and standard deviation of (R, G, B) and (L, a, b) over the region.
- Texture: the average and variance of 16 filter responses (four difference-of-Gaussian filters with different sigmas, plus 12 oriented filters aligned in 30-degree increments).
- Shape: six features (area, x, y, boundary, convexity, and moment of inertia).
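The example probabilities on slide 25-18 can be reproduced by normalizing the co-occurrence table: along rows to condition on the visual term, along columns to condition on the word. A minimal sketch (the counts are taken from the slide; the variable names are illustrative):

```python
import numpy as np

# Co-occurrence counts from the slide's example table:
# rows are visual terms V1..V4, columns are words w1..w4.
counts = np.array([
    [12,  2,  0,  1],   # V1
    [32, 40, 13, 32],   # V2
    [13, 12,  0,  0],   # V3
    [65, 43, 12,  0],   # V4
], dtype=float)

# P(w | v): normalize each row (condition on the visual term).
p_w_given_v = counts / counts.sum(axis=1, keepdims=True)

# P(v | w): normalize each column (condition on the word).
p_v_given_w = counts / counts.sum(axis=0, keepdims=True)
```

Here `p_w_given_v[0, 0]` recovers P(w1 | V1) = 0.8 and `p_v_given_w[2, 1]` recovers P(V3 | w2) ≈ 0.12, matching the slide.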
25-26 Results
Examples of automatic annotation.

25-27 Results
The number of segments annotated correctly, among 299 testing segments, for the different models.

25-28 Results
- PTK: correspondence based on K-means.
- PTS: correspondence based on weighted feature selection.
- With GDR, the dimensionality of each image object is first reduced (say from 30 to 20), and then K-means and the rest of the pipeline are applied.

25-29 Results
Precision p = NumCorrect / NumRetrieved
Recall r = NumCorrect / NumExist
- NumCorrect: the number of retrieved images that contain the query keyword in their original annotation.
- NumRetrieved: the number of retrieved images.
- NumExist: the total number of images in the test set that contain the query keyword in their annotation.
The common E-measure: E = 1 - 2 / (1/p + 1/r)

25-30 Results: Precision, Recall and E-measure
Precision of retrieval for the different models.

25-31 Results: Precision, Recall and E-measure
Recall of retrieval for the different models.

25-32 Results: Precision, Recall and E-measure
E-measure of retrieval for the different models.
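The precision, recall, and E-measure definitions from slide 25-29 are straightforward to express in code. A small sketch (the function names and the example counts are illustrative, not from the lecture's experiments):

```python
def precision(num_correct, num_retrieved):
    """p = NumCorrect / NumRetrieved."""
    return num_correct / num_retrieved

def recall(num_correct, num_exist):
    """r = NumCorrect / NumExist."""
    return num_correct / num_exist

def e_measure(p, r):
    """E = 1 - 2 / (1/p + 1/r); lower is better, and E = 0 when p = r = 1."""
    return 1 - 2 / (1 / p + 1 / r)
```

For example, if a query retrieves 10 images of which 8 carry the keyword, and 16 such images exist in the test set, then p = 0.8 and r = 0.5, and the E-measure penalizes the imbalance between them.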
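The translation-model training behind slides 25-19 and 25-20 can be illustrated with a minimal EM loop in the style of IBM Model 1, estimating t(word | blob-token) from images paired with their annotation words. This toy implementation and its data are illustrative sketches, not the lecture's code:

```python
from collections import defaultdict

def ibm_model1(pairs, iterations=10):
    """EM training of IBM Model 1 translation probabilities t(word | blob).
    `pairs` is a list of (blob_tokens, annotation_words) per training image."""
    words = {w for _, ws in pairs for w in ws}
    # Uniform initialization of t(w | b).
    t = defaultdict(lambda: 1.0 / len(words))
    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c(w, b)
        total = defaultdict(float)   # expected counts c(b)
        # E-step: distribute each word's probability mass over the blobs
        # in its image, proportionally to the current t(w | b).
        for blobs, ws in pairs:
            for w in ws:
                norm = sum(t[(w, b)] for b in blobs)
                for b in blobs:
                    frac = t[(w, b)] / norm
                    count[(w, b)] += frac
                    total[b] += frac
        # M-step: re-estimate t(w | b) from the expected counts.
        for (w, b) in count:
            t[(w, b)] = count[(w, b)] / total[b]
    return t

# Hypothetical training data: V1 always co-occurs with "tiger",
# V2 always with "grass".
pairs = [(["V1", "V2"], ["tiger", "grass"]),
         (["V1"], ["tiger"]),
         (["V2"], ["grass"])]
t = ibm_model1(pairs)
```

After a few EM iterations, t("tiger" | V1) and t("grass" | V2) approach 1, which is exactly the word-blob correspondence the annotation step needs.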