U of I CS 466 - Lecture

Support Vector Machines and Gene Function Prediction
Brown et al. 2000, PNAS
CS 466
Saurabh Sinha

Outline
• A method of functionally classifying genes by using gene expression data
• Support vector machines (SVMs) used for this task
• Tests show that the SVM performs better than other classification methods
• Predict functions of some unannotated yeast genes

Motivation for SVMs

Unsupervised Learning
• Several ways to learn functional classification of genes in an unsupervised fashion
• "Unsupervised learning" => learning in the absence of a teacher
• Define similarity between genes (in terms of their expression patterns)
• Group genes together using a clustering algorithm, such as hierarchical clustering

Supervised Learning
• Support vector machines belong to the class of "supervised learning" methods
• Begin with a set of genes that have a common function (the "positive set") …
• … and a separate set of genes known not to be members of that functional class (the "negative set")
• The positive and negative sets form the "training data"
  – Training data can be assembled from the literature on gene functions

Supervised learning
• The SVM (or any other supervised learner) will learn to discriminate between the genes in the two classes, based on their expression profiles
• Once learning is done, the SVM may be presented with previously unseen genes ("test data")
• It should be able to recognize these genes as members or non-members of the functional class

SVM versus clustering
• Both use the notion of "similarity" or "distance" between pairs of genes
• SVMs can use a larger variety of such distance functions
  – Distance functions in very high-dimensional space
• SVMs are a supervised learning technique, and can use prior knowledge to good effect

Data sets analyzed

Data sets
• DNA microarray data
• Each data point in a microarray experiment is a ratio:
  – expression level of the gene in the condition of the experiment
  – expression level of the gene in some reference condition
• If considering m microarray experiments, each gene's expression profile is an m-dimensional vector
• Thus, the complete data set is an n x m matrix (n = number of genes)

Data sets
• "Normalization" of the data
• First, take logarithms of all values in the matrix (positive for over-expressed genes, negative for repressed genes)
• Then, transform each gene's expression profile into a unit-length vector
• How?
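A minimal sketch of this normalization step in Python/NumPy, for illustration only: the array name `ratios` is hypothetical, and the slides do not specify the logarithm base or any missing-value handling, so the natural log and the absence of such handling below are assumptions.

```python
import numpy as np

def normalize_expression(ratios):
    """Log-transform expression ratios and scale each gene's profile to unit length.

    ratios : (n, m) array of expression ratios, one row per gene and
             one column per microarray experiment; values assumed positive.
    """
    # Log of each ratio: positive for over-expressed genes, negative for repressed ones.
    # (Natural log is an assumption; the slides do not state the base.)
    logged = np.log(ratios)

    # Scale each row (gene profile) to unit Euclidean length, so the dot product
    # of two profiles equals the cosine of the angle between them.
    norms = np.linalg.norm(logged, axis=1, keepdims=True)
    return logged / norms

# Tiny usage example with made-up numbers (3 genes, 4 conditions):
if __name__ == "__main__":
    ratios = np.array([[2.0, 0.5, 1.0, 4.0],
                       [0.25, 1.0, 2.0, 0.5],
                       [1.0, 1.0, 0.5, 2.0]])
    profiles = normalize_expression(ratios)
    print(np.linalg.norm(profiles, axis=1))  # each row now has length ~1.0
```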
Data sets
• n = 2467, m = 79
• Data from an earlier paper on gene expression measurement, by Eisen et al.
• What were the 79 conditions?
  – Related to the cell cycle, "sporulation", temperature shocks, etc.

Data sets
• Training sets have to include functional labeling (annotation) of genes
• Six functional classes taken from a gene annotation database for yeast
• One of these six was a "negative control"
  – No reason to believe that members of this class will have similar expression profiles
  – This class should not be "learnable"

Support Vector Machine

SVM intro
• Each vector (row) X in the gene expression matrix is a point in an m-dimensional "expression space"
• To build a binary classifier, the simplest thing to do is to construct a "hyperplane" separating class members from non-members
  – What is a hyperplane?
  – A line in 2-D, a plane in 3-D, … a hyperplane in any number of dimensions

Inseparability
• Real-world problems: there may not exist a hyperplane that separates cleanly
• Solution to this "inseparability" problem: map the data to a higher-dimensional space
  – Example discussed in class (mapping from 2-D data to 3-D)
  – Called the "feature space", as opposed to the original "input space"
  – An inseparable training set can be made separable with a proper choice of feature space

Going to high-d spaces
• Going to higher-dimensional spaces incurs costs
• Firstly, computational costs
• Secondly, the risk of "overfitting"
• The SVM handles both costs

SVM handles high-D problems
• Overfitting is avoided by choosing the "maximum margin" separating hyperplane
• The distance from the hyperplane to the nearest data point is maximized

SVM handles high-D problems
• Computational costs are avoided because the SVM never works explicitly in the higher-dimensional feature space
  – The "kernel trick"

The kernel trick in SVM
• Recall that the SVM works with some distance function between any two points (gene expression vectors)
• To do its job, the SVM only needs "dot products" between points
• A possible dot product between input vectors X and Y is Σ_i X_i Y_i
  – It represents the cosine of the angle between the two vectors (if each is of unit length)

The kernel trick in SVM
• The "dot product" may be taken in a higher-dimensional space (feature space) and the SVM algorithm is still happy
  – It does NOT NEED the actual vectors in the higher-dimensional (feature) space, just their dot product
• The dot product in feature space is some function of the original vectors X and Y, and is called the "kernel function"

Kernel functions
• A simple kernel function is the dot product in the input space
• The feature space = …
• … the input space
• Another kernel: K(X, Y) = (Σ_i X_i Y_i)^2
  – A quadratic separating surface in the input space (a separating hyperplane in some higher-dimensional feature space)

Soft margins
• For some data sets, the SVM may not find a separating hyperplane even in the higher-dimensional feature space
• Perhaps the kernel function is not properly chosen, or the data contains mislabeled examples
• Use a "soft margin": allow some training examples to fall on the "wrong" side of the separating hyperplane
• Have some penalty for such wrongly placed examples

Data analysis

Data recap
• 79 conditions (m), ~2500 genes (n), six functional classes
• Test how well each of the functional classes can be learned and predicted
• Test each class separately; treat it as positive, everything else as negative

Three-way cross-validation
• Divide all positive examples into three equal sets, and do the same for all negative examples
• Take two sets of positives and two sets of negatives, and train the classifier
• Present the remaining (one set each of) positives and negatives as "test data" and count how often the classifications are correct

Measures of accuracy
• False positives (FP), true positives (TP)
• False negatives (FN), true negatives (TN)
• This paper uses the cost function C(M) of learning method M: C(M) = FP(M) + 2*FN(M)
  – An ad hoc choice
• This paper defines the "cost savings" of method M as S(M) = C(N) - C(M), where N is the "null learning procedure" (call everything negative)
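To make the classification and evaluation procedure above concrete, here is a minimal sketch of a soft-margin SVM with the quadratic kernel K(X, Y) = (Σ_i X_i Y_i)^2, evaluated with three-fold cross-validation and the cost/savings measures defined on the last slide. The use of scikit-learn, the variable names (`profiles`, `labels`), and the soft-margin penalty C=1.0 are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

def quadratic_kernel(A, B):
    # K(X, Y) = (sum_i X_i Y_i)^2 : the dot product in input space, squared.
    return (A @ B.T) ** 2

def cost(fp, fn):
    # C(M) = FP(M) + 2*FN(M), the (ad hoc) cost function used in the paper.
    return fp + 2 * fn

def cross_validated_savings(profiles, labels, n_splits=3):
    """Three-fold cross-validation of a soft-margin SVM with a quadratic kernel.

    profiles : (n, m) array of normalized expression profiles
    labels   : (n,) array of 0/1 class membership (1 = member of the functional class)
    Returns S(M) = C(N) - C(M), where the null procedure N calls every gene a
    non-member, so FP = 0 and FN = number of positives.
    """
    total_fp = total_fn = 0
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    for train_idx, test_idx in skf.split(profiles, labels):
        clf = SVC(kernel=quadratic_kernel, C=1.0)  # C=1.0 is an assumed soft-margin penalty
        clf.fit(profiles[train_idx], labels[train_idx])
        pred = clf.predict(profiles[test_idx])
        true = labels[test_idx]
        total_fp += int(np.sum((pred == 1) & (true == 0)))
        total_fn += int(np.sum((pred == 0) & (true == 1)))

    null_cost = cost(fp=0, fn=int(np.sum(labels == 1)))  # null procedure: call everything negative
    return null_cost - cost(total_fp, total_fn)
```

Run per functional class (that class as positives, all other genes as negatives); a class that is genuinely learnable should give positive savings, while the "negative control" class should yield savings near zero or below.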

