
Introduction to Predictive Learning
Electrical and Computer Engineering
LECTURE SET 9: Nonstandard Learning Approaches

OUTLINE
• Motivation for non-standard approaches
  - Learning with sparse high-dimensional data
  - Formalizing application requirements
  - Philosophical motivation
• New Learning Settings
  - Transduction
  - Universum Learning
  - Learning Using Privileged Information
  - Multi-Task Learning
• Summary

Sparse High-Dimensional Data
• Recall the standard inductive learning setting.
• High-dimensional, low sample size (HDLSS) data arise in:
  - gene microarray analysis
  - medical imaging (e.g., sMRI, fMRI)
  - object and face recognition
  - text categorization and retrieval
  - web search
• The sample size is much smaller than the dimensionality of the input space: d ~ 10K–100K, n ~ 100s.
• Standard learning methods usually fail for such HDLSS data.

Sparse High-Dimensional Data (cont'd)
• HDLSS data look like a porcupine: the volume of a sphere inscribed in a d-dimensional cube becomes a vanishing fraction of the cube's volume as d grows.
• A point is closer to an edge of the cube than to another point.
• Pairwise distances between points become nearly the same.
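To make the "porcupine" geometry concrete, here is a minimal numerical sketch (not from the lecture; plain NumPy and the standard library, with n = 100 points chosen arbitrarily). It computes the ratio of the inscribed unit-ball volume, V_d = pi^(d/2) / Gamma(d/2 + 1), to the cube volume 2^d, and the relative spread of pairwise distances among uniform random points, for growing d:

import math
import numpy as np

rng = np.random.default_rng(0)

print("  d   ball/cube volume ratio   std/mean of pairwise distances")
for d in (2, 10, 100, 1000):
    # Log-space volume ratio of the unit ball inscribed in [-1, 1]^d:
    # V_d / 2^d, with V_d = pi^(d/2) / Gamma(d/2 + 1).
    log_ratio = 0.5 * d * math.log(math.pi) - math.lgamma(d / 2 + 1) - d * math.log(2.0)

    # Pairwise distances among n = 100 points drawn uniformly in the unit cube.
    X = rng.uniform(size=(100, d))
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    D = D[np.triu_indices(100, k=1)]        # keep each pair once
    print(f"{d:4d}   {math.exp(log_ratio):22.3e}   {D.std() / D.mean():26.3f}")

The volume ratio collapses (already about 2.5e-3 at d = 10), and the coefficient of variation of the distances shrinks toward zero, i.e., all pairwise distances become nearly equal, as the slide states.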
How to improve generalization for HDLSS?
Conventional approaches: standard inductive learning + a priori knowledge, i.e.
• preprocessing and feature selection (preceding learning)
• model parameterization (~ selection of good kernels)
• informative prior distributions (in statistical methods)
Non-standard learning formulations:
• Seek new generic formulations (not methods!) that better reflect application requirements.
• A priori knowledge + additional data are used to derive new problem formulations.

Formalizing Application Requirements
• Classical statistics: a parametric model is given (by experts).
• Modern applications: a complex iterative process, where a non-standard (alternative) formulation may be better!
[Diagram: APPLICATION NEEDS → loss function; input, output, and other variables; training/test data; admissible models → FORMAL PROBLEM STATEMENT → LEARNING THEORY]

Philosophical Motivation
• Philosophical view 1 (Realism): learning ~ search for the truth (estimation of the true dependency from available data).
• System identification ~ inductive learning, where a priori knowledge is about the true model.

Philosophical Motivation (cont'd)
• Philosophical view 2 (Instrumentalism): learning ~ search for instrumental knowledge (estimation of a useful dependency from available data).
• VC-theoretical approach ~ focus on the learning formulation.

VC-theoretical approach
• Focus on the learning setting (formulation), not on a learning method.
• The learning formulation depends on:
  (1) available data
  (2) application requirements
  (3) a priori knowledge (assumptions)
• Factors (1)-(3), combined using Vapnik's Keep-It-Direct (KID) principle, yield a learning formulation.

Contrast these two approaches
• Conventional (statistics, data mining): a priori knowledge typically reflects properties of a true (good) model, i.e., a priori knowledge ~ parameterization of the model f(x, w). But why should a priori knowledge be about the true model?
• VC-theoretical approach: a priori knowledge ~ how to use/incorporate available data into the problem formulation. Often, a priori knowledge ~ available data samples of a different type → new learning settings.

Examples of Nonstandard Settings
• Standard inductive setting, e.g., digits 5 vs. 8:
  - finite training set
  - predictive model derived using only training data
  - prediction for all possible test inputs
• Possible modifications (see the sketch below):
  - Transduction: predict only for the given test points.
  - Universum Learning: available labeled data ~ …
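The induction/transduction contrast above can be sketched in a few lines. A hedged illustration, not from the lecture: scikit-learn has no T-SVM, so LabelSpreading stands in as the transductive method; the two-moons data, the gamma values, and the 10-label split are arbitrary illustrative choices.

import numpy as np
from sklearn.datasets import make_moons
from sklearn.semi_supervised import LabelSpreading
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.1, random_state=0)
labeled = np.zeros(len(y), dtype=bool)
labeled[np.where(y == 0)[0][:5]] = True   # 5 labeled points per class
labeled[np.where(y == 1)[0][:5]] = True

# Induction: fit on the 10 labeled points only; the resulting model
# must predict for all possible test inputs.
svc = SVC(kernel="rbf", gamma=2.0).fit(X[labeled], y[labeled])
print("inductive SVM:", (svc.predict(X[~labeled]) == y[~labeled]).mean())

# Transduction: the given (unlabeled) test points, marked -1, enter the
# problem formulation, and labels are produced only for those points.
y_partial = np.where(labeled, y, -1)
ls = LabelSpreading(kernel="rbf", gamma=2.0).fit(X, y_partial)
print("transductive :", (ls.transduction_[~labeled] == y[~labeled]).mean())

The design point is the formulation, not the method: the transductive model never claims to predict at arbitrary new inputs, only at the test points supplied with the training data.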