25-1 Image Annotation and Feature Extraction: Guest Lecture
Lei Wang (Microsoft Corporation)
Latifur Khan, Bhavani Thuraisingham (UTD)
October 8, 2008
Digital Forensics

25-2 Outline
- How do we retrieve images?
- Motivation
- Annotation
- Correspondence: models
- Enhancement
- Future work
- Results
- References

25-3 How do we retrieve images?
- Use Google image search!
- Google uses filenames and surrounding text, and ignores the contents of the images.

25-4 Motivation
How do we retrieve images/videos?
- CBIR is based on similarity search over visual features:
  - it does not support textual queries;
  - it does not capture "semantics".
- Instead: automatically annotate images, then retrieve them based on the textual annotations.
- Example annotations: tiger, grass.

25-5 Motivation
- There is a gap between the perceptual and the conceptual.
- Semantic gap: it is hard to represent semantic meaning using low-level image features such as color, texture, and shape.
- It is possible to answer the query "red ball" with a red rose.
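The CBIR similarity search described in the motivation slides can be illustrated with a toy sketch: rank database images by the similarity of their color histograms to the query. This is a minimal illustration under assumed details, not the lecture's system; the function names (`color_histogram`, `cbir_rank`) are invented here, and histogram intersection is one standard choice of similarity measure.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Quantize each RGB channel into `bins` levels, count pixels per
    (r, g, b) bin, and normalize the counts to a distribution."""
    quantized = (image.astype(np.int64) * bins) // 256  # per-channel bin index
    flat = (quantized[..., 0] * bins + quantized[..., 1]) * bins + quantized[..., 2]
    hist = np.bincount(flat.ravel(), minlength=bins ** 3).astype(float)
    return hist / hist.sum()

def cbir_rank(query, database):
    """Rank database images by histogram-intersection similarity to the query."""
    q = color_histogram(query)
    sims = [np.minimum(q, color_histogram(img)).sum() for img in database]
    return np.argsort(sims)[::-1]  # indices of most similar images first
```

Note that this sketch reproduces the limitation named on the slide: a red rose and a red ball have nearly identical color histograms, so a purely visual ranking cannot tell them apart.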
(Figure for slide 25-5: the query image and the image retrieved by CBIR.)

25-6 Motivation
Most current automatic image annotation and retrieval approaches consider:
- keywords;
- low-level image features for visual tokens/regions/objects;
- the correspondence between keywords and visual tokens.
Our goal is to develop automated image annotation techniques with better accuracy.

25-7 Annotation

25-8 Annotation
Major steps:
1. Segmentation into regions.
2. Clustering to construct blob-tokens.
3. Analyzing the correspondence between keywords and blob-tokens.
4. Auto annotation.

25-9 Annotation: Segmentation & Clustering
Images, segments, blob-tokens.

25-10 Annotation: Correspondence/Linking
Our purpose is to find the correspondence between words and blob-tokens, e.g., P(tiger | V1), P(V2 | grass), ...

25-11 Auto Annotation
Tiger? Grass? Lion? ...

25-12 Segmentation: Image Vocabulary
- Can we represent all images with a finite set of symbols?
- Text documents consist of words; images consist of visual terms.
- Example: V123 V89 V988 V4552 V12336 V2 V765 V9887
(copyright © R. Manmatha)

25-13 Construction of Visual Terms
- Segment the images (e.g., with Blobworld or the normalized-cuts algorithm).
- Cluster the segments; each cluster is a visual term (blob-token).
Images, segments, visterms/blob-tokens: V1 V2 V3 V4 V1 V5 V6

25-14 Discrete Visual Terms
- A rectangular partition works better!
- Partition the keyframe, then cluster across images.
- The segmentation problem can be avoided to some extent.
(copyright © R. Manmatha)

25-15 Visual Terms
- Or partition using a rectangular grid and cluster.
- This actually works better.

25-16 Grid vs. Segmentation
- Segmentation vs. rectangular partition: the rectangular partition gives better results than segmentation!
- The model is learned over many images, whereas segmentation operates on one image at a time.

25-17 Feature Extraction & Clustering
Feature extraction:
- color
- texture
- shape
K-means clustering generates a finite set of visual terms; each cluster's centroid represents one visual term.

25-18 Co-Occurrence Models
Mori et al.
1999. Key points:
- Create the co-occurrence table using a training set of annotated images.
- Tends to annotate with high-frequency words.
- Context is ignored.
- Joint probability models are needed.

Co-occurrence table (visual terms V1-V4 by words w1-w4):

        w1   w2   w3   w4
  V1    12    2    0    1
  V2    32   40   13   32
  V3    13   12    0    0
  V4    65   43   12    0

P(w1 | V1) = 12 / (12 + 2 + 0 + 1) = 0.8
P(V3 | w2) = 12 / (2 + 40 + 12 + 43) ≈ 0.12

25-19 Correspondence: Translation Model (TM)
Pr(f | e) = ∑_a Pr(f, a | e)
Pr(w | v) = ∑_a Pr(w, a | v)

25-20 Translation Models
Duygulu et al. 2002: use the classical IBM machine translation models to translate visterms into words.
The IBM machine translation models need a bilingual corpus for training, e.g.:
  "Mary did not slap the green witch" / "Mary no daba una bofetada a la bruja verde"
Analogously, each image pairs visterms with annotation words:
  V2 V4 V6 / "Maui people dance"
  V1 V34 V321 V21 / "tiger grass sky"

25-21 Correspondence (TM)
(Figure: the word-blob assignment matrix for the translation model.)

25-22 Correspondence (TM)
(Figure: N_W words W_i paired with N_B blob-tokens B_j.)

25-23 Results
Dataset:
- Corel Stock Photo CDs: 600 CDs, each consisting of 100 images on the same topic.
- We select 5,000 images (4,500 for training, 500 for testing); each image has a manual annotation.
- 374 words and 500 blobs.
Example annotations: "sun city sky mountain"; "grizzly bear meadow water".

25-24 Results
Experimental context:
- 3,000 training objects.
- 300 images for testing.
- Each object is represented by a 30-dimensional vector of color, texture, and shape features.

25-25 Results
Each image object (blob-token) has 30 features:
- Size: the portion of the image covered by the region.
- Position: the coordinates of the region's center of mass, normalized by the image dimensions.
- Color: the average and standard deviation of (R, G, B) and (L, a, b) over the region.
- Texture: the average and variance of 16 filter responses (four difference-of-Gaussian filters with different sigmas, plus 12 oriented filters aligned in 30-degree increments).
- Shape: six features (area, x, y, boundary, convexity, and moment of inertia).
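The example probabilities on slide 25-18 can be reproduced by normalizing the co-occurrence table: along rows to condition on the visual term, along columns to condition on the word. A minimal sketch (the counts are taken from the slide; the variable names are illustrative):

```python
import numpy as np

# Co-occurrence counts from the slide's example table:
# rows are visual terms V1..V4, columns are words w1..w4.
counts = np.array([
    [12,  2,  0,  1],   # V1
    [32, 40, 13, 32],   # V2
    [13, 12,  0,  0],   # V3
    [65, 43, 12,  0],   # V4
], dtype=float)

# P(w | v): normalize each row (condition on the visual term).
p_w_given_v = counts / counts.sum(axis=1, keepdims=True)

# P(v | w): normalize each column (condition on the word).
p_v_given_w = counts / counts.sum(axis=0, keepdims=True)
```

Here `p_w_given_v[0, 0]` recovers P(w1 | V1) = 0.8 and `p_v_given_w[2, 1]` recovers P(V3 | w2) ≈ 0.12, matching the slide.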
25-26 Results
Examples of automatic annotation.

25-27 Results
The number of segments annotated correctly, among 299 testing segments, for the different models.

25-28 Results
- PTK: correspondence based on K-means.
- PTS: correspondence based on weighted feature selection.
- With GDR, the dimensionality of each image object is first reduced (say from 30 to 20), and then K-means and the rest of the pipeline are applied.

25-29 Results
Precision p = NumCorrect / NumRetrieved
Recall r = NumCorrect / NumExist
- NumCorrect: the number of retrieved images that contain the query keyword in their original annotation.
- NumRetrieved: the number of retrieved images.
- NumExist: the total number of images in the test set that contain the query keyword in their annotation.
The common E-measure: E = 1 - 2 / (1/p + 1/r)

25-30 Results: Precision, Recall and E-measure
Precision of retrieval for the different models.

25-31 Results: Precision, Recall and E-measure
Recall of retrieval for the different models.

25-32 Results: Precision, Recall and E-measure
E-measure of retrieval for the different models.
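The precision, recall, and E-measure definitions from slide 25-29 are straightforward to express in code. A small sketch (the function names and the example counts are illustrative, not from the lecture's experiments):

```python
def precision(num_correct, num_retrieved):
    """p = NumCorrect / NumRetrieved."""
    return num_correct / num_retrieved

def recall(num_correct, num_exist):
    """r = NumCorrect / NumExist."""
    return num_correct / num_exist

def e_measure(p, r):
    """E = 1 - 2 / (1/p + 1/r); lower is better, and E = 0 when p = r = 1."""
    return 1 - 2 / (1 / p + 1 / r)
```

For example, if a query retrieves 10 images of which 8 carry the keyword, and 16 such images exist in the test set, then p = 0.8 and r = 0.5, and the E-measure penalizes the imbalance between them.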
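The translation-model training behind slides 25-19 and 25-20 can be illustrated with a minimal EM loop in the style of IBM Model 1, estimating t(word | blob-token) from images paired with their annotation words. This toy implementation and its data are illustrative sketches, not the lecture's code:

```python
from collections import defaultdict

def ibm_model1(pairs, iterations=10):
    """EM training of IBM Model 1 translation probabilities t(word | blob).
    `pairs` is a list of (blob_tokens, annotation_words) per training image."""
    words = {w for _, ws in pairs for w in ws}
    # Uniform initialization of t(w | b).
    t = defaultdict(lambda: 1.0 / len(words))
    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c(w, b)
        total = defaultdict(float)   # expected counts c(b)
        # E-step: distribute each word's probability mass over the blobs
        # in its image, proportionally to the current t(w | b).
        for blobs, ws in pairs:
            for w in ws:
                norm = sum(t[(w, b)] for b in blobs)
                for b in blobs:
                    frac = t[(w, b)] / norm
                    count[(w, b)] += frac
                    total[b] += frac
        # M-step: re-estimate t(w | b) from the expected counts.
        for (w, b) in count:
            t[(w, b)] = count[(w, b)] / total[b]
    return t

# Hypothetical training data: V1 always co-occurs with "tiger",
# V2 always with "grass".
pairs = [(["V1", "V2"], ["tiger", "grass"]),
         (["V1"], ["tiger"]),
         (["V2"], ["grass"])]
t = ibm_model1(pairs)
```

After a few EM iterations, t("tiger" | V1) and t("grass" | V2) approach 1, which is exactly the word-blob correspondence the annotation step needs.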