UTD CS 4398 - Image Annotation and Feature Extraction
Image Annotation and Feature Extraction (Guest Lecture)
Lei Wang (Microsoft Corporation)
Latifur Khan, Bhavani Thuraisingham (UTD)
October 8, 2008
Digital Forensics

Outline
- How do we retrieve images?
- Motivation
- Annotation
- Correspondence: Models
- Enhancement
- Future Work
- Results
- References

How do we retrieve images?
- Use Google image search!
- Google uses filenames and surrounding text, and ignores the contents of the images.

Motivation
- How do we retrieve images/videos?
- CBIR (content-based image retrieval) is based on similarity search over visual features:
  - it does not support textual queries,
  - it does not capture "semantics".
- Instead, automatically annotate images and then retrieve them based on the textual annotations.
- Example annotations: tiger, grass.

Motivation (continued)
- There is a gap between the perceptual and the conceptual level.
- Semantic gap: it is hard to represent semantic meaning using low-level image features such as color, texture and shape.
- For example, CBIR may answer the query "red ball" with a red rose.
- [Figure: query image and the image retrieved by CBIR.]

Motivation (continued)
- Most current automatic image annotation and retrieval approaches consider:
  - keywords,
  - low-level image features for each visual token/region/object,
  - the correspondence between keywords and visual tokens.
- Our goal is to develop automated image annotation techniques with better accuracy.

Annotation
Major steps:
- segmentation into regions,
- clustering to construct blob-tokens,
- analyzing the correspondence between keywords and blob-tokens,
- auto-annotation.

Annotation: Segmentation & Clustering
- [Figure: images -> segments -> blob-tokens.]

Annotation: Correspondence/Linking
- Our purpose is to find the correspondence between words and blob-tokens, e.g. P(Tiger | V1), P(V2 | grass), ...

Auto Annotation
- [Figure: an unlabeled image to be annotated with words such as tiger, grass, lion, ...]

Segmentation: Image Vocabulary
- Can we represent all images with a finite set of symbols?
- Text documents consist of words; images consist of visual terms.
- [Figure: images tiled with visual-term labels such as V123, V89, V988, V4552, ...] (copyright © R. Manmatha)

Construction of Visual Terms
- Segment images (e.g., with Blobworld or the normalized-cuts algorithm).
- Cluster the segments; each cluster is a visual term / blob-token.
- [Figure: images -> segments -> visterms/blob-tokens V1 ... V6.]

Discrete Visual Terms
- A rectangular partition works better!
- Partition the keyframe and cluster across images.
- The segmentation problem can be avoided to some extent. (copyright © R. Manmatha)

Visual Terms
- Or partition using a rectangular grid and cluster; this actually works better.

Grid vs Segmentation
- Segmentation vs rectangular partition: the rectangular partition gives better results than segmentation.
- The model is learned over many images, whereas segmentation operates on one image at a time.

Feature Extraction & Clustering
- Feature extraction: color, texture, shape.
- K-means clustering to generate a finite set of visual terms; each cluster's centroid represents one visual term (a sketch of this step follows below).
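The slides describe this clustering step only in words; the following is a minimal sketch of it, assuming each segmented region has already been reduced to a 30-dimensional feature vector and using scikit-learn's KMeans. The file name region_features.npy and the variable names are illustrative, not from the lecture.

    import numpy as np
    from sklearn.cluster import KMeans

    # One 30-dimensional feature vector (color, texture, shape) per segmented region,
    # pooled across all training images; shape = (num_segments, 30).
    region_features = np.load("region_features.npy")   # hypothetical input file

    # Cluster the segments; each cluster centroid stands for one visual term / blob-token.
    num_blob_tokens = 500                               # the experiments use a 500-blob vocabulary
    kmeans = KMeans(n_clusters=num_blob_tokens, random_state=0).fit(region_features)

    # Every training segment is mapped to the id of its nearest centroid (V1 ... V500).
    blob_token_ids = kmeans.labels_

    # Segments from new images are assigned to blob-tokens the same way.
    print(kmeans.predict(region_features[:1]))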
Co-Occurrence Models
- Mori et al., 1999.
- Create a co-occurrence table from a training set of annotated images.
- Tends to annotate with high-frequency words; context is ignored; needs joint probability models.

Example co-occurrence counts:

          w1   w2   w3   w4
    V1    12    2    0    1
    V2    32   40   13   32
    V3    13   12    0    0
    V4    65   43   12    0

    P(w1 | V1) = 12 / (12 + 2 + 0 + 1) = 0.8
    P(V3 | w2) = 12 / (2 + 40 + 12 + 43) = 0.12

Correspondence: Translation Model (TM)
- The translation probability sums over all possible alignments a between the two sides:

    Pr(f | e) = Σ_a Pr(f, a | e)
    Pr(w | v) = Σ_a Pr(w, a | v)

Translation Models
- Duygulu et al., 2002.
- Use classical IBM machine translation models to translate visterms into words.
- The IBM machine translation models need a bilingual corpus to train on.
- [Figure: analogy between an aligned sentence pair ("Mary did not slap the green witch" / "Mary no daba una bofetada a la bruja verde") and image-annotation pairs, e.g. blob-tokens V2 V4 V6 with the words "Maui people dance" and V1 V34 V321 V21 with "tiger grass sky".]

Correspondence (TM)
- [Figure: the word-blob correspondence laid out as an N_W x N_B matrix relating words W_i to blob-tokens B_j.]
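The lecture names the IBM translation models but does not show a training procedure; the sketch below illustrates the general idea with an IBM Model 1-style EM loop over (blob-tokens, words) pairs. The toy corpus, the iteration count and all variable names are assumptions made for illustration, not the authors' implementation.

    from collections import defaultdict

    # Each training image is a (blob-tokens, words) pair; the two-image corpus below is
    # a toy stand-in for the 4,500 annotated training images.
    corpus = [(["V1", "V34", "V321"], ["tiger", "grass", "sky"]),
              (["V2", "V4", "V6"], ["maui", "people", "dance"])]

    # t[b][w] approximates Pr(w | b); start from a uniform table over the word vocabulary.
    vocab = {w for _, words in corpus for w in words}
    t = {b: {w: 1.0 / len(vocab) for w in vocab} for blobs, _ in corpus for b in blobs}

    for _ in range(20):                              # EM iterations
        count = defaultdict(lambda: defaultdict(float))
        total = defaultdict(float)
        for blobs, words in corpus:
            for w in words:                          # E-step: spread each word over the image's blobs
                norm = sum(t[b][w] for b in blobs)
                for b in blobs:
                    c = t[b][w] / norm
                    count[b][w] += c
                    total[b] += c
        for b in count:                              # M-step: re-normalize Pr(w | b)
            t[b] = {w: count[b][w] / total[b] for w in count[b]}

    # Annotate a new image by ranking words under its blob-tokens.
    print(sorted(t["V1"].items(), key=lambda kv: -kv[1])[:3])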
Results: Dataset
- Corel Stock Photo CDs: 600 CDs, each consisting of 100 images on the same topic.
- We select 5,000 images (4,500 for training, 500 for testing); each image has a manual annotation.
- 374 words and 500 blobs.
- [Figure: example annotations such as "sun city sky mountain" and "grizzly bear meadow water".]

Results: Experimental Context
- 3,000 training objects and 300 images for testing.
- Each object is represented by a 30-dimensional vector of color, texture and shape features.

Results: Features
Each image object/blob-token has 30 features:
- Size: portion of the image covered by the region.
- Position: coordinates of the region's center of mass, normalized by the image dimensions.
- Color: average and standard deviation of (R, G, B) and (L, a, b) over the region.
- Texture: average and variance of 16 filter responses (four differences of Gaussians with different sigmas, and 12 oriented filters aligned in 30-degree increments).
- Shape: six features (area, x, y, boundary, convexity, and moment of inertia).

Results
- [Figure: examples of automatic annotation.]
- [Figure: the number of segments annotated correctly, among 299 testing segments, for the different models.]

Results
- Correspondence based on K-means: PTK.
- Correspondence based on weighted feature selection: PTS.
- With GDR, the dimensionality of each image object is first reduced (say from 30 to 20), and then K-means and the later steps are applied.

Results: Precision, Recall and E-measure
- Precision p = NumCorrect / NumRetrieved
- Recall r = NumCorrect / NumExist
- NumCorrect is the number of retrieved images that contain the query keyword in their original annotation.
- NumRetrieved is the number of retrieved images.
- NumExist is the total number of images in the test set containing the query keyword in their annotation.
- The common E-measure is E = 1 - 2 / (1/p + 1/r).
- [Figures: precision, recall and E-measure of retrieval for the different models.]
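As a small illustration of these definitions (not part of the slides), the helper below computes p, r and E for one query keyword; the function and argument names are made up.

    def retrieval_scores(retrieved, relevant):
        """Precision, recall and E-measure for one query keyword.

        retrieved: ids of the images returned for the keyword (assumed non-empty).
        relevant:  ids of the test images whose original annotation contains the keyword.
        """
        num_correct = len(set(retrieved) & set(relevant))        # NumCorrect
        p = num_correct / len(retrieved)                         # NumCorrect / NumRetrieved
        r = num_correct / len(relevant)                          # NumCorrect / NumExist
        e = 1 - 2 / (1 / p + 1 / r) if num_correct else 1.0      # E = 1 - 2/(1/p + 1/r)
        return p, r, e

    # Example: 3 images retrieved, 2 of them among the 4 relevant ones.
    print(retrieval_scores(["img1", "img2", "img3"], ["img2", "img3", "img4", "img5"]))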
