Transfer Learning for Image Classification

Outline:
- Transfer Learning Approaches
- Sharing Features: efficient boosting procedures for multiclass object detection
- Uncovering Shared Structures in Multiclass Classification
- Transfer Learning for Image Classification via Sparse Joint Regularization (semi-supervised learning; Steps I-III of the method; experiments, dataset, baseline representation, results; conclusion, future work)
- Joint Sparse Approximation as a Constrained Convex Optimization Problem (Euclidean projection onto the L1-∞ ball; an efficient algorithm for finding μ; complexity; synthetic, news-story, and image classification experiments; Lagrangian)
- Conclusion and Future Work: Online Multitask Classification [Cavallanti et al. 2008]

Transfer Learning Approaches

Leverage data from related tasks to improve performance:
- Improve generalization.
- Reduce the run-time of evaluating a set of classifiers.
Two Main Approaches:
- Learning shared hidden representations.
- Sharing features.

Sharing Features: efficient boosting procedures for multiclass object detection
Antonio Torralba, Kevin Murphy, William Freeman

Snapshot of the idea

Goal:
- Reduce the computational cost of multiclass object recognition.
- Improve generalization performance.
Approach: make boosted classifiers share weak learners.

Some notation: a training set T = {(x_1, y_1), ..., (x_n, y_n)} with x ∈ R^d; each label vector y_i = [y_i^1, ..., y_i^m] has one entry per class, with y_i^k ∈ {-1, +1}.

Training a single boosted classifier

Consider training a single boosted classifier:
- Candidate weak learners: weighted stumps.
- Fit an additive model H(x) = Σ_t h_t(x).
- Greedy approach: minimize the exponential loss (Gentle Boosting).

Standard Multiclass Case: No Sharing
- An additive model H_k(x) for each class k.
- Minimize the sum of exponential losses over classes.
- Each class has its own set of weak learners.

Multiclass Case: Sharing Features
- Each additive classifier corresponds to a subset of classes; the classifier for class k is the sum of the additive classifiers of the subsets containing k.
- At iteration t, add one weak learner to one of the additive models.
- Minimize the sum of exponential losses.
- Naive approach: search over all subsets of classes (exponential in the number of classes m).
- Greedy heuristic: best-first search over subsets.

Sharing features for multiclass object detection
Torralba, Murphy, Freeman.
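The single-classifier training loop described above (weighted regression stumps, Gentle Boosting on the exponential loss) can be sketched in a few lines. This is a minimal illustration of the technique, not the authors' implementation; all function names are hypothetical:

```python
import numpy as np

def fit_stump(X, y, w):
    """Weighted least-squares regression stump: h(x) = a if x[f] > t else b."""
    best = None
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            m = X[:, f] > t
            wp, wn = w[m].sum(), w[~m].sum()
            a = (w[m] * y[m]).sum() / wp if wp > 0 else 0.0
            b = (w[~m] * y[~m]).sum() / wn if wn > 0 else 0.0
            err = (w * (y - np.where(m, a, b)) ** 2).sum()
            if best is None or err < best[0]:
                best = (err, f, t, a, b)
    return best[1:]

def gentle_boost(X, y, rounds=20):
    """Fit an additive model H(x) = sum_t h_t(x) for labels y in {-1, +1}."""
    w = np.ones(len(y)) / len(y)
    model = []
    for _ in range(rounds):
        f, t, a, b = fit_stump(X, y, w)
        h = np.where(X[:, f] > t, a, b)
        w = w * np.exp(-y * h)   # exponential-loss reweighting
        w = w / w.sum()
        model.append((f, t, a, b))
    return model

def predict(model, X):
    F = np.zeros(len(X))
    for f, t, a, b in model:
        F = F + np.where(X[:, f] > t, a, b)
    return np.sign(F)
```

In the shared-feature (multiclass) version, each round would instead pick one stump and one subset of classes jointly; the sketch above covers only the single-classifier case the slides start from.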
CVPR 2004.

Learning efficiency

Sharing features shows sub-linear scaling of the number of features with the number of objects (for area under the ROC curve = 0.9). Red: shared features; blue: independent features.

How the features are shared across objects

Basic features: filter responses computed at different locations of the image.

Uncovering Shared Structures in Multiclass Classification
Yonatan Amit, Michael Fink, Nathan Srebro, Shimon Ullman

Structure Learning Framework
- Class parameters: linear classifiers.
- Structural parameters: linear transformations.
- Find the optimal parameters:

  argmin_W  Σ_{i=1..n} Loss(W, x_i, y_i) + c·R(W)

Multiclass Loss Function

Hinge loss: max(0, 1 - y·w^T x).
Maximal hinge loss:

  Loss(W, x, y) = max_{j,p : y^j = 1, y^p = -1} max(0, 1 - w_j^T x + w_p^T x)

Snapshot of the idea

Main idea: enforce sharing by finding a low-rank parameter matrix W.
- Consider the m-by-d parameter matrix W. It can be factored as W = A·Θ, where the rows of Θ form a shared basis: Θ acts as a transformation on x, and A as a transformation on the per-class weights w.

Low Rank Regularization Penalty
- The rank of a d-by-m matrix is the smallest z such that it can be written as a product of d-by-z and z-by-m matrices.
- A regularization penalty designed to minimize the rank of W would tend to produce solutions where a few basis vectors are shared by all classes.
- Minimizing the rank directly would lead to a hard combinatorial problem.
- Instead, use a trace norm penalty: ||W||_Σ = Σ_i γ_i, where γ_i is the i-th singular value of W.

Putting it all together

  argmin_W  Σ_{i=1..n} Loss(W, x_i, y_i) + c·||W||_Σ

The rank no longer appears in the objective. For optimization they use a gradient-based method that minimizes a smooth approximation of the objective.

Mammals Dataset: Results

Transfer Learning for Image Classification via Sparse Joint Regularization
Ariadna Quattoni, Michael Collins, Trevor Darrell

Training visual classifiers when a few examples are available

Problem:
- Image classification from a few examples can be hard.
- A good representation of images is crucial.
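Before turning to the solution, the low-rank regularization idea from the previous section has a simple numerical illustration: the trace norm (sum of singular values) is a convex surrogate for rank, and a parameter matrix whose class rows all reuse a single shared basis row has rank 1. A small sketch with hypothetical numbers, not taken from the paper:

```python
import numpy as np

def trace_norm(W):
    # Sum of singular values: a convex surrogate for rank(W).
    return np.linalg.svd(W, compute_uv=False).sum()

# Illustrative values: three classes (rows) sharing one basis direction.
theta = np.array([[1.0, 2.0, 3.0]])      # shared basis, 1 x d
alpha = np.array([[1.0], [2.0], [0.5]])  # per-class coefficients, m x 1
W = alpha @ theta                        # m x d parameter matrix, rank 1
```

For a rank-1 matrix the trace norm equals the single nonzero singular value, i.e. the Frobenius norm; penalizing ||W||_Σ therefore rewards exactly this kind of shared-basis structure without the combinatorial rank constraint.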
Solution: learn a good image representation using unlabeled data + labeled data from related problems.

Snapshot of the idea:
- Use an unlabeled dataset + a kernel function to compute a new representation: complex features in a high-dimensional space.
- Some of them will be very discriminative (hopefully); most will be irrelevant.
- If we knew the relevant features, we could learn from fewer examples.
- Related problems may share relevant features; we can use data from related problems to discover them!

Semi-supervised learning
- Large dataset of unlabeled data; small training set of labeled images.
- Step 1: Learn a visual representation F: I → R^h by unsupervised learning on the unlabeled data.
- Step 2: Compute the h-dimensional training set and train the classifier.

Semi-supervised learning:
- Raina et al. [ICML 2007] proposed an approach that learns a sparse set of high-level features (i.e., linear combinations of the original features) from unlabeled data using a sparse-coding technique.
- Balcan et al. [ML 2004] proposed a representation based on computing kernel distances to unlabeled data points.

Learning visual representations using unlabeled data only
- Unsupervised learning in data space.
- Good thing: a lower-dimensional representation preserves relevant statistics of the data sample.
- Bad thing: the representation might still contain irrelevant features.
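Step 1 of the pipeline above, in the spirit of the Balcan et al. kernel-distance representation, can be sketched as follows. The RBF kernel, the data sizes, and all names are illustrative assumptions, not the paper's exact choices:

```python
import numpy as np

def rbf_kernel(x, u, gamma=1.0):
    # Illustrative kernel choice; any valid kernel would do.
    return np.exp(-gamma * np.sum((x - u) ** 2))

def kernel_representation(X, U, gamma=1.0):
    """Map each row of X to F(x) = [k(x, u_1), ..., k(x, u_h)], h = len(U)."""
    return np.array([[rbf_kernel(x, u, gamma) for u in U] for x in X])

U = np.random.RandomState(0).randn(50, 8)  # "unlabeled" pool (hypothetical)
X = np.random.RandomState(1).randn(5, 8)   # a few labeled examples
Z = kernel_representation(X, U)            # 5 examples -> 50-dim features
```

Each labeled example is now described by its kernel distances to the h unlabeled points; this is the high-dimensional representation in which the later joint-sparsity step selects the few relevant coordinates.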