CS395T - Visual Recognition and Search
Gautam S. Muralidhar

Today's Theme
• Unsupervised discovery of images
• Main motivation behind unsupervised discovery is that supervision is expensive
• Common tasks include:
  – Detecting objects and their locations
  – Segmentation
  – Activity recognition
  – Irregularities in images and videos

Detecting Objects and Segmentation
From Sivic et al.

Action Class Recognition
From Wang et al.

Detecting Irregularities
From Boiman and Irani

Recipes
• Usually:
  – Process images and detect interest points
  – Extract low level features / descriptors (e.g., SIFT)
  – Cluster the image based on the descriptors
  – Learn statistical models to infer object categories / activity classes
• An alternative:
  – Make use of an existing database as evidence for a task, e.g., detecting irregularities

Detecting objects and their locations in images
- Sivic et al.

Analogy between text documents and images
• Text documents are composed of words; images are composed of visual words
• Both can be represented by a bag of words approach
• Associated with each (visual) word is an (object) topic category
• Text documents are mixtures of topics; images are mixtures of object categories

pLSA
• The joint probability $P(w_i, d_j, z_k)$ is assumed to have the following graphical model: [Figure: pLSA graphical model]
• Goal of pLSA: find the topic specific word distribution $P(w|z)$, the document specific mixing proportions $P(z|d)$ and, from these, the document specific word distribution $P(w|d)$
From Sivic et al.

pLSA model
• Fitting the model involves determining the topics, which are common to all documents, and the mixture coefficients, which are specific to each document
• Maximizing the objective function $L = \prod_{i=1}^{M} \prod_{j=1}^{N} P(w_i|d_j)^{n(w_i, d_j)}$, where $n(w_i, d_j)$ is the number of times word $w_i$ occurs in document $d_j$, yields the maximum likelihood estimate of the parameters of the model that gives high probability to words that appear in a corpus
From Sivic et al.

Obtaining Visual Words
• SIFT descriptors extracted from elliptical shape adaptation about an interest point and maximally stable extremal regions
• SIFT has all the nice properties
• The SIFT descriptors are then vector quantized (k-means) into visual words
• Total vocabulary size = 2237 words

Doublets of Visual Words
• The black ellipse represents the visual word whose doublets we want to estimate; the red and green ellipses are candidate neighbors
• The large red ellipse significantly overlaps with the black ellipse and is discarded
• Likewise, the smaller red ellipse is 'too small' compared to the black ellipse and is discarded
• The green ellipses are returned as doublets for the black ellipse
From Sivic et al.

Model Learning and Baseline Method
• EM algorithm for pLSA: converges in 40-100K iterations
• For the baseline method, k-means was applied to the same features, the word frequency vectors of each image

Experiments and Datasets
• Three experiments:
  1. Topic discovery: categories are discovered by pLSA clustering on all available images
  2. Classification of unseen images: topics learnt on one set of images are used to determine the topics in another set
  3. Object detection to determine the location and approximate segmentation of the objects
• Dataset: Caltech 101 (5 categories) and MIT

Topic Discovery Experiment
• Case 1: images of 4 object categories with cluttered background
• When the number of topics K = 4, the 4 different categories are discovered with 98% accuracy
• K = 5 splits the car dataset into two subtopics, as the data consists of sets of many repeated images of the same car
• K = 6 splits the motorbike data into sets with plain and cluttered background
• K = 7 and 8 discover two more sub-groups of the car data, again containing other repeated images of the same/similar cars
From Sivic et al.

Most probable visual words
• Visual words with high topic specific probability $P(w_i|z_k)$
From Sivic et al.

Topic Discovery Experiment - Case 2, with Background Topics
From Sivic et al.

Classifying New Images Experiment
• The topic specific distributions $P(w|z)$ are learned from a separate set of training images
• When observing a new, previously unseen test image, the document specific mixing coefficients $P(z|test)$ are computed
• This is achieved by EM with only the coefficients $P(z|test)$ updated in each M-step, while the learned $P(w|z)$ are kept fixed

Classification Results
From Sivic et al.

Segmentation Results from the Posteriors
Mixing coefficients
From Sivic et al.

Segmentation Results - with Doublets
From Sivic et al.

MIT dataset results from Sivic et al.

Conclusion
• Visual object categories can be discovered using an unsupervised approach
• Tasks such as segmentation can be performed using a simple bag of features combined with statistical models
• However, bag of features does not take into account semantic context. Can models from the statistical text literature handle context?
• Are models from text really appropriate? Unlike text, images have a strong spatial structure

Moving on…
Unsupervised Discovery of Action Classes
By Wang et al.

Basic Idea
• Cluster images that depict similar actions together and label these clusters (action classes)
• Assign a new image to an action class based on its distance from the centroids of the clusters

Approach
• Human shape as a cue to determine the action
• The similarity measure for clustering has to take into account the deformations when comparing two images of different people performing different actions

Similarity Measure
• Requirement: yield a high value on a pair of images when similar poses are depicted and a low value on dissimilar poses
• Spectral Clustering: affinity matrix $W$ (n x n), where $W_{ij}$ is the affinity between images $i$ and $j$, $W_{ij} = \exp\left(-(d_{ij}^2 + d_{ji}^2)/2\right)$

Deformable Template Matching
• Algorithm to match actions in images by measuring affinity (similarity)
• Posed as an Integer Linear Programming problem
• Computationally not feasible when n x n affinity measures must be computed (all of them are required, as the affinity measure is not symmetric)
• A fast pruning algorithm based on shape contexts is used to address this issue

Fast Pruning using Representative Shape Contexts
Slide from Grauman, original source from Belongie, Malik and
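
As a rough illustration of the Similarity Measure and Basic Idea slides, the sketch below builds the symmetric affinity matrix from a table of asymmetric matching costs and assigns a new point to its nearest cluster centroid. It assumes that the flattened "dij2" on the slide means the squared cost d_ij^2, that the costs are already available as a square list of lists, and that "distance from the centroids" is Euclidean distance in some feature space; the function names `affinity_matrix` and `nearest_centroid` are hypothetical and do not come from Wang et al.

```python
import math


def affinity_matrix(d):
    """Build W with W[i][j] = exp(-(d[i][j]**2 + d[j][i]**2) / 2).

    `d` is a square list of lists of matching costs and may be asymmetric
    (d[i][j] != d[j][i]); summing both directions makes W symmetric.
    Reading the slide's "dij2" as a squared cost is an assumption.
    """
    n = len(d)
    if any(len(row) != n for row in d):
        raise ValueError("d must be a square matrix")
    return [
        [math.exp(-(d[i][j] ** 2 + d[j][i] ** 2) / 2.0) for j in range(n)]
        for i in range(n)
    ]


def nearest_centroid(x, centroids):
    """Return the index of the centroid closest to x.

    Squared Euclidean distance is used here as an assumed metric.
    """
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return min(range(len(centroids)), key=lambda k: sq_dist(x, centroids[k]))


if __name__ == "__main__":
    # Toy asymmetric costs between two images.
    costs = [[0.0, 1.0],
             [3.0, 0.0]]
    W = affinity_matrix(costs)
    print(W[0][1], W[1][0])  # both entries are equal because W is symmetric

    # Assign a new 2-D point to one of two toy centroids.
    print(nearest_centroid([0.9, 0.1], [[0.0, 0.0], [1.0, 0.0]]))
```

A matrix built this way could then be passed to any spectral clustering routine that accepts a precomputed affinity; the clustering step itself is left out of the sketch.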