Introduction to Predictive Learning
Electrical and Computer Engineering
LECTURE SET 9: Nonstandard Learning Approaches

OUTLINE
•Motivation for non-standard approaches
- Learning with sparse high-dimensional data
- Formalizing application requirements
- Philosophical motivation
•New learning settings
- Transduction
- Universum Learning
- Learning Using Privileged Information
- Multi-Task Learning
•Summary

Sparse High-Dimensional Data
•Recall the standard inductive learning setting
•High-dimensional, low sample size (HDLSS) data arise in:
- gene microarray analysis
- medical imaging (e.g., sMRI, fMRI)
- object and face recognition
- text categorization and retrieval
- web search
•Sample size is smaller than the dimensionality of the input space: d ~ 10K–100K, n ~ 100s
•Standard learning methods usually fail for such HDLSS data

Sparse High-Dimensional Data (cont'd)
•HDLSS data look like a porcupine: the fraction of a d-dimensional cube's volume occupied by its inscribed sphere shrinks toward zero as d grows, so almost all of the volume sits in the corners
•A sample point is typically closer to an edge of the cube than to another sample point
•Pairwise distances between sample points become nearly the same (distance concentration)

How to improve generalization for HDLSS?
•Conventional approaches: standard inductive learning + a priori knowledge
- preprocessing and feature selection (preceding learning)
- model parameterization (~ selection of good kernels)
- informative prior distributions (in statistical methods)
•Non-standard learning formulations
- seek new generic formulations (not methods!) that better reflect application requirements
- a priori knowledge + additional data are used to derive new problem formulations

Formalizing Application Requirements
•Classical statistics: a parametric model is given (by experts)
•Modern applications: formalization is a complex iterative process, and a non-standard (alternative) formulation may be better!
(Diagram: APPLICATION NEEDS — loss function; input, output, and other variables; training/test data; admissible models — feed into a FORMAL PROBLEM STATEMENT, which connects to LEARNING THEORY)

Philosophical Motivation
•Philosophical view 1 (Realism): learning ~ search for the truth (estimation of the true dependency from available data)
•System identification ~ inductive learning, where a priori knowledge is about the true model

Philosophical Motivation (cont'd)
•Philosophical view 2 (Instrumentalism): learning ~ search for instrumental knowledge (estimation of a useful dependency from available data)
•VC-theoretical approach ~ focus on the learning formulation

VC-theoretical approach
•Focus on the learning setting (formulation), not on a learning method
•A learning formulation depends on:
(1) available data
(2) application requirements
(3) a priori knowledge (assumptions)
•Factors (1)–(3), combined using Vapnik's Keep-It-Direct (KID) principle, yield a learning formulation

Contrast these two approaches
•Conventional (statistics, data mining): a priori knowledge typically reflects properties of a true (good) model, i.e., a priori knowledge ~ model parameterization
•But why should a priori knowledge be about the true model?
•VC-theoretical approach: a priori knowledge ~ how to use/incorporate available data into the problem formulation
•Often a priori knowledge ~ available data samples of a different type → new learning settings

Examples of Nonstandard Settings
•Standard inductive setting, e.g., digits 5 vs. 8
- finite training set
- predictive model f(x, w) derived using only the training data
- prediction for all possible test inputs
•Possible modifications
- Transduction: predict only for the given test points
- Universum Learning: available labeled data ~
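The "porcupine" geometry claimed in the Sparse High-Dimensional Data slides can be checked numerically. The sketch below (an illustration added here, not part of the original slides; it assumes NumPy is available and uses the unit cube [0,1]^d with inscribed sphere of radius 1/2) computes the sphere-to-cube volume fraction for growing d and the relative spread of pairwise distances between uniform random points:

```python
# Numerical illustration of HDLSS ("porcupine") geometry.
# Assumptions: unit cube [0, 1]^d, inscribed sphere of radius 1/2.
import math
import numpy as np

def sphere_to_cube_volume_ratio(d: int) -> float:
    """Fraction of the unit d-cube's volume occupied by its inscribed sphere."""
    # V_sphere = pi^(d/2) * r^d / Gamma(d/2 + 1) with r = 1/2; V_cube = 1.
    return math.pi ** (d / 2) * 0.5 ** d / math.gamma(d / 2 + 1)

def pairwise_distance_spread(n: int = 100, d: int = 1000, seed: int = 0) -> float:
    """Relative spread (std/mean) of pairwise distances of n uniform points in [0,1]^d."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(size=(n, d))
    diff = x[:, None, :] - x[None, :, :]          # all pairwise differences
    dist = np.sqrt((diff ** 2).sum(axis=-1))      # Euclidean distance matrix
    vals = dist[np.triu_indices(n, k=1)]          # upper triangle: each pair once
    return vals.std() / vals.mean()

# The sphere occupies a vanishing fraction of the cube as d grows:
for d in (2, 10, 100):
    print(f"d={d:3d}  sphere/cube volume ratio = {sphere_to_cube_volume_ratio(d):.3e}")

# Pairwise distances concentrate: std/mean is small for large d.
print("relative spread of pairwise distances (d=1000):",
      pairwise_distance_spread())
```

Running this shows the volume ratio dropping from about 0.785 at d=2 to an astronomically small number at d=100, while the relative spread of pairwise distances in d=1000 stays close to zero, which is exactly why nearest-neighbor-style reasoning degrades for HDLSS data.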
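The standard inductive setting in the last slide — estimate a predictive model f(x, w) from a finite training set only, then apply it to any possible test input — can be sketched minimally as follows. The linear model, the regularized least-squares estimate, and the synthetic two-class Gaussian data are illustrative assumptions added here, not the method discussed in the slides:

```python
# Minimal sketch of the standard inductive setting:
# fit f(x, w) on a finite training set, then predict for ANY test input.
import numpy as np

rng = np.random.default_rng(0)

# Finite training set: two Gaussian classes with labels y in {-1, +1}.
n, d = 50, 5
x_train = np.vstack([rng.normal(-1.0, 1.0, size=(n, d)),
                     rng.normal(+1.0, 1.0, size=(n, d))])
y_train = np.hstack([-np.ones(n), np.ones(n)])

# f(x, w) = sign([x, 1] @ w); w estimated by regularized least squares
# using ONLY the training data (the inductive step).
x_aug = np.hstack([x_train, np.ones((2 * n, 1))])   # append a bias column
w = np.linalg.solve(x_aug.T @ x_aug + 0.1 * np.eye(d + 1),
                    x_aug.T @ y_train)

def f(x: np.ndarray) -> np.ndarray:
    """Predict labels for arbitrary, previously unseen inputs x of shape (m, d)."""
    return np.sign(np.hstack([x, np.ones((len(x), 1))]) @ w)

# The inductive model answers for any test input, not just a fixed test set:
print(f(np.array([[-1.0] * d])))   # a point near the first class mean
print(f(np.array([[+1.0] * d])))   # a point near the second class mean
```

Transduction, by contrast, would skip estimating a globally valid f(x, w) and produce labels only for a given, fixed set of test points.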