Improved Image Annotation and Labelling through Multi-label Boosting

Matthew Johnson and Roberto Cipolla
Department of Engineering, University of Cambridge
[email protected], [email protected]

Abstract

The majority of machine learning systems for object recognition are limited by their requirement of single-labelled images for training, which are difficult to create or obtain in quantity. It is therefore impractical to use methods or techniques which require such data to build object recognizers for more than a relatively small subset of object classes. Instead, far more abundant multi-label data provides a ready means to create object recognition systems which are able to deal with large numbers of classes. In this paper we present a new object recognition system named MLBoost which learns from multi-label data through boosting and improves on state-of-the-art multi-label annotation and labelling systems. The system is trained on images with accompanying text and at no time is told which parts of each image correspond to which words; as such, the process is unsupervised. Once trained, it is able to give segment labels and a list of descriptive words (an annotation) for any novel image.

1 Introduction

The majority of machine learning systems for object recognition are limited by their requirement of single-labelled images for training, which are difficult to create or obtain in quantity. This restriction is a major barrier to building large-scale multi-class object recognition systems with these techniques, as every class to be learned by the system requires its own set of hand-labelled and sometimes hand-segmented training images. In order to move forward in the field of multi-class object recognition, new techniques must begin to utilize more abundant and easily acquirable types of data. One such type is multi-label data, in the form of images with accompanying text. Among corpora like the Corel image database, newspaper photograph archives with captions, stock advertising photographs and a bevy of other sources, there is more than enough data to build the next generation of multi-class object recognizers.

In this paper we present a system called MLBoost which is able to learn enough from 1500 or so annotated images of the form shown in Figure 1 to perform labelling and annotation on novel images with better results than a state-of-the-art recognizer of the same type [1]. It achieves this by learning the correlation between image segments and the accompanying text in a set of training images. Having learnt this, when given any new image it is able to translate it into words, giving both a labelling for the segments and an annotation for the image as a whole, as shown in Figure 2.

Figure 1: Typical examples of the training data for MLBoost. All images were taken from the Corel database and had anywhere from 1 to 5 accompanying keywords which described the image contents. The images were automatically segmented using Normalized Cuts [9] and a feature vector including color, texture and other cues was extracted from each segment for use in the learning process. There was no correspondence given between segments and text.

Figure 2: Output from the MLBoost algorithm. The word shown is the most probable label for that area of the image; however, a full distribution over the vocabulary is produced for each segment. The words below the image are the annotation produced by the algorithm based on the segment labels.
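As a concrete, heavily simplified illustration of the data preparation described in the Figure 1 caption, the sketch below computes per-segment color statistics from an image and a precomputed segment label map. This is not the authors' code: the function name, the restriction to RGB and chromaticity statistics, and the assumption that a segmenter such as Normalized Cuts has already produced the label map are all assumptions; the paper's full feature set (Lab color, Gabor texture responses, shape, position, size) is detailed in Section 2.

```python
import numpy as np

def segment_color_features(image, segment_labels):
    """Per-segment color features in the spirit of the paper's "blobs".

    image          -- H x W x 3 float array of RGB values
    segment_labels -- H x W int array from an automatic segmenter
    Returns a dict mapping segment id -> feature vector holding the mean
    and variance of R, G, B and of the normalized chromaticities
    R/(R+G+B) and G/(R+G+B).  (Lab color, Gabor texture and shape cues
    from the paper are omitted here for brevity.)
    """
    features = {}
    rgb_sum = image.sum(axis=2, keepdims=True) + 1e-8  # avoid divide-by-zero
    chroma = image[..., :2] / rgb_sum                  # r and g chromaticity
    channels = np.concatenate([image, chroma], axis=2) # H x W x 5
    for seg_id in np.unique(segment_labels):
        pixels = channels[segment_labels == seg_id]    # N x 5 pixel stack
        features[seg_id] = np.concatenate([pixels.mean(axis=0),
                                           pixels.var(axis=0)])
    return features
```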
2 Matching Words and Pictures

Visualize a system which sees various images that contain a large segment of solid blue and that always have the word "sky" as one of their labels. Over time, it would be able to learn that the features of those segments and the word "sky" are both different expressions of the same underlying concept, and it is then able to translate between them. The algorithms and techniques which pertain to this kind of learning come largely from the machine translation community, and have been adapted for use in computer vision by Barnard et al. [1, 4], who show them to be quite effective at a variety of vision tasks. The initial model they used was inspired by [7], though a full range of models is evaluated in [1], with recent work focusing on latent Dirichlet allocation and probabilistic latent semantic analysis [10, 2].

Barnard et al.'s method in [1] can best be understood through one of the more straightforward models presented there, I-2, and it is that model that we describe here for comparison and, later, improvement. Their method is designed to work with feature vectors, or "blobs", representing the segments of an image. These are related through annotations to associated words which describe the image as a whole. The accuracy of the segmentation method is not vital provided that it is consistent, allowing the use of automatic segmentation techniques such as Normalized Cuts [9] to prepare the data, regardless of their lack of accuracy. Once an image has been segmented, a feature vector is extracted for each segment containing color (mean and variance of RGB, Lab, and normalized r/(R+G+B) and g/(R+G+B)), texture (mean and variance of 16 Gabor-like filter responses) and other cues (shape, position, size, etc.) as described in [6].

To statistically link blobs with words, it is assumed that there are hidden factors (concepts) which are responsible for generating both the words and the blobs associated with that factor. By generating both words and blobs, the concepts can then be used to link the two, learning how they relate. The observations (image and associated text) are assumed to be generated from multiple draws from the hidden factors, as otherwise all possible combinations of words and blobs would need to be modelled. The joint probability of a particular blob b and a word w is modelled as

P(w, b) = \sum_c P(w|c) P(b|c) P(c)    (1)

where c indexes over the concepts, P(c) is the concept prior, P(w|c) is a frequency table, and P(b|c) is a normal distribution over features. The normal distribution over features is assumed to have diagonal covariance to simplify calculation and avoid overfitting. The probability of the observed data W \cup B given the model is then

P(W \cup B) = \left( \prod_{b \in B} \sum_c P(b|c) P(c) \right) \left( \prod_{w \in W} \sum_c P(w|c) P(c|B) \right)    (2)

where W is the set of all annotation words, B is the set of blobs and P(c|B) \propto \sum_b P(c|b), normally limited to the N largest blobs (typically 8 or 10). This model is fit using the Expectation-Maximization technique [3], treating the concepts as hidden variables.
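To make equations (1) and (2) concrete, here is a minimal sketch of how a fitted I-2 model's probabilities could be evaluated. All names and array layouts are assumptions: word_given_concept is the C x V frequency table P(w|c), prior is the length-C vector P(c), and the per-concept Gaussians P(b|c) use diagonal covariance, exactly as stated above.

```python
import numpy as np

def blob_likelihoods(blobs, means, variances):
    """P(b|c) for each blob under diagonal-covariance Gaussians, one per
    concept.  blobs: N x D, means/variances: C x D.  Returns N x C."""
    diff = blobs[:, None, :] - means[None, :, :]            # N x C x D
    log_p = -0.5 * (np.log(2 * np.pi * variances)[None]
                    + diff ** 2 / variances[None]).sum(axis=2)
    return np.exp(log_p)                                    # N x C

def joint_word_blob(w, b_density, word_given_concept, prior):
    """Equation (1): P(w, b) = sum_c P(w|c) P(b|c) P(c).
    b_density is the length-C vector of P(b|c) for this one blob."""
    return np.sum(word_given_concept[:, w] * b_density * prior)

def document_probability(word_ids, blobs, word_given_concept, prior,
                         means, variances):
    """Equation (2).  In the paper P(c|B) is normally limited to the
    N largest blobs (typically 8 or 10); here all blobs are used."""
    p_b_given_c = blob_likelihoods(blobs, means, variances)  # N x C
    # P(c|b) proportional to P(b|c) P(c); P(c|B) proportional to sum_b P(c|b)
    p_c_given_b = p_b_given_c * prior
    p_c_given_b /= p_c_given_b.sum(axis=1, keepdims=True)
    p_c_given_B = p_c_given_b.sum(axis=0)
    p_c_given_B /= p_c_given_B.sum()
    blob_term = np.prod((p_b_given_c * prior).sum(axis=1))
    word_term = np.prod([(word_given_concept[:, w] * p_c_given_B).sum()
                         for w in word_ids])
    return blob_term * word_term
```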
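Finally, a sketch of the EM fit mentioned in the last sentence, reusing blob_likelihoods from the block above. One loud simplification: this version treats each training observation as an already-paired (word, blob) draw from the mixture of equation (1), whereas the actual model is fit on images whose words and blobs have no given correspondence. It illustrates the E-step/M-step mechanics rather than the paper's exact procedure.

```python
import numpy as np

def em_step(pairs_w, pairs_b, word_given_concept, prior, means, variances):
    """One EM iteration for a pair-level version of the equation (1) mixture.

    pairs_w: length-N int array of word ids
    pairs_b: N x D array of blob feature vectors
    Returns updated (word_given_concept, prior, means, variances).
    """
    C, V = word_given_concept.shape
    # E-step: responsibilities r[n, c] proportional to P(w_n|c) P(b_n|c) P(c)
    r = (word_given_concept[:, pairs_w].T
         * blob_likelihoods(pairs_b, means, variances) * prior)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate the concept prior...
    new_prior = r.mean(axis=0)
    # ...the word frequency table P(w|c), by accumulating soft counts...
    new_wgc = np.zeros((C, V))
    np.add.at(new_wgc.T, pairs_w, r)
    new_wgc /= new_wgc.sum(axis=1, keepdims=True)
    # ...and the diagonal Gaussians, via responsibility-weighted moments.
    weights = r / r.sum(axis=0, keepdims=True)       # columns sum to 1
    new_means = weights.T @ pairs_b                  # C x D
    new_vars = weights.T @ (pairs_b ** 2) - new_means ** 2 + 1e-6
    return new_wgc, new_prior, new_means, new_vars
```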

