UMD CMSC 423 - Epitope Prediction with Sequence and Structure Features

Learning from Diversity:
Epitope Prediction with Sequence and Structure Features using an Ensemble of Support Vector Machines

Rob Patro and Carl Kingsford
Center for Bioinformatics and Computational Biology, University of Maryland
Nov. 16, 2010

Overview

- Challenge: epitope-antibody recognition
- Solution: an ensemble of support vector machines
  - trained with a probabilistic extension
  - over a variety of feature classes: physicochemical properties, string kernels, structure
- Performance of the individual methods and of the ensemble

Problem Overview

The challenge: binding with linear epitopes, i.e. a "simpler" sequence → affinity relation.

The details:
- Measure the binding affinity aff(p_i) ∈ [0, 65536]
- C+ = { p_i | aff(p_i) ∈ [10000, 65536] }: 6,841 binders
- C- = { p_i | aff(p_i) ∈ [0, 1000] }: 20,437 non-binders
- Learn a function f : P → [0, 1] to predict binding:
  f(p_i) ≥ 0.5 ⟹ p_i ∈ C+
  f(p_i) < 0.5 ⟹ p_i ∈ C-

System Overview

- Individual classifiers f_0, f_1, ..., f_M are trained on various features. Many learners could fill this role (decision trees; boosted, bagged, or random forests; naive Bayes; logistic regression; maximum-entropy classification; (balanced) Winnow classifiers; etc.); here, support vector machines (SVMs) are used.
- The classifiers' scores are aggregated into a single score in [0, 1], which yields the predicted binding class: scores near 0 indicate an unlikely binder, scores near 1 a likely binder.

Probabilistic SVMs

- Ideally we want a confidence in each prediction (Platt, 1999)
- For each prediction, we obtain a posterior probability
- This allows ranking of predictions by posterior
- It also aids in classifier combination

Combining Predictions

- Probabilistic SVMs are trained on various features, using various kernels, with various parameters
- Their predictions are combined by weighted voting

Choosing Features

To train SVMs, we translate each peptide p_i into a feature vector x_i. Good features are essential.

Good features should:
- be discriminative
- lead to class separability
- be efficient to compute

Real features:
- capture partial information
- separate data subsets
- are often complementary

Considering many useful features → predictive power.

Which Features?

- Sequence features: k-spectrum kernel, mismatch kernel, substitution kernel, string subsequence kernel, sparse spatial sample kernel
- Physico-/biochemical features: BLOSUM encoding, AAIndex encoding, local composition
- Structure features: peptide/structure shape complementarity

Sequence Features (String Kernels)

String kernels assign a similarity to a pair of strings.

K-spectrum kernel:
- Consider all K k-mers that occur in the training set
- Encode each peptide as a vector v ∈ R^K, where
  v_j = 1 if p contains the j-th k-mer, and 0 otherwise,
  or v_j can encode the frequency of the j-th k-mer in the peptide.

Other string kernels: mismatch kernel, substitution kernel, restricted gappy kernel, string subsequence kernel, sparse spatial sample (SSS) kernel.

Compositional Features

- Consider physicochemical properties of each peptide sequence: hydropathy, antigenicity, structure preference, etc.
- Average the property over the entire peptide, mapping each peptide to a scalar v ∈ R.

Amino-acid hydropathy:
  A  1.8   R -4.5   N -3.5   D -3.5   C  2.5
  Q -3.5   E -3.5   G -0.4   H -3.2   I  4.5
  L  3.8   K -3.9   M  1.9   F  2.8   P -1.6
  S -0.8   T -0.7   W -0.9   Y -1.3   V  4.2

(Figure: example of averaging hydropathy over a peptide.)

Local Compositional Features

- Physicochemical features can be useful, but they are global, and the epitope is only a subset of the peptide.
- Instead, consider a sliding window of a given length w
- Move the window along the peptide from left to right
- Average the values over each window
- Concatenate the outputs to represent the peptide

Orthogonal Encoding

- Orthonormal representation proposed by Qian (1988)
- Map each amino acid a_j ∈ p_i to a 20-long bit-vector v_j
- x_i = v_0 v_1 ... v_{k-1} for a peptide of length k

Property Encoding

- Orthogonality is not actually important in our application
- Replace the indicator vector with something more informative, e.g. a row from a BLOSUM or PAM matrix

AAIndex Encoding

- The Amino Acid Index (AAIndex) (Kawashima, 2008) compiles a growing list of different physicochemical and biochemical properties of amino acids: 544 to date!
- Is it possible to make use of all this information?
- Use a non-linear factor matrix of the AAIndex (Nanni, 2010)

Structural Features

Consider how well IgG and a peptide "fit" together.

Results

  Features                   AUROC   AUPR    ΔAUROC   ΔAUPR   (Δ vs. ensemble)
  k-spectrum                 0.85    0.70    -0.043   -0.072
  Sparse Spatial Sample      0.87    0.73    -0.023   -0.042
  Nonlinear Fisher Mat.      0.86    0.69    -0.024   -0.082
  Statistical Analysis Mat.  0.85    0.67    -0.025   -0.102
  BLOSUM Encoding            0.86    0.70    -0.024   -0.072
  Local Composition*         0.88    0.74    -0.013   -0.032
  Structure                  0.74    0.53    -0.153   -0.242
  ensemble                   0.893   0.772
  2nd place                  0.892   0.766   -0.001   -0.006
  3rd place                  0.864   0.691   -0.029   -0.081
  4th place                  0.855   0.689   -0.038   -0.083
  * using various physicochemical features

(Figures: ROC and precision/recall performance curves.)

Conclusions

- Many good features exist, and they capture some non-overlapping information
- Ensemble solutions, used properly, are effective
- Structure features are hard to compute; there is much room for improvement here
- Simple features should not be discounted:
  - the local composition feature was the best single classifier
  - we didn't encounter anyone using it in the literature!

Thanks

Funding: NIH grant 1R21AI085376 and NSF grant 0849899 to C.K.

For many interesting and useful conversations: Geet Duggal, Darya Filippova, Justin Malin, Guillaume Marçais, Saket Navlakha, Emre
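The Platt (1999) extension mentioned above fits a sigmoid that maps a raw SVM decision value to a posterior probability. A minimal sketch, where the function name and the default A and B are illustrative only (in practice A and B are fit by maximum likelihood on held-out decision values), not the authors' implementation:

```python
import math

def platt_posterior(decision_value, A=-1.0, B=0.0):
    """Map a raw SVM decision value f to a posterior P(binder | f)
    via Platt's sigmoid: 1 / (1 + exp(A*f + B)).
    A and B here are illustrative defaults, not fitted values."""
    return 1.0 / (1.0 + math.exp(A * decision_value + B))
```

With these defaults a decision value of 0 maps to a posterior of 0.5, and larger decision values map monotonically toward 1, which is what makes the posteriors usable for ranking predictions.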
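The weighted-voting combination of probabilistic SVMs can be sketched as a weighted average of per-classifier posteriors, thresholded at 0.5 as in the problem setup. The names `ensemble_score` and `predict` are hypothetical, and the slides do not specify how the weights are chosen:

```python
def ensemble_score(posteriors, weights=None):
    """Weighted vote over per-classifier posteriors: a weighted
    average in [0, 1]. Uniform weights if none are given."""
    if weights is None:
        weights = [1.0] * len(posteriors)
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, posteriors)) / total

def predict(posteriors, weights=None):
    """Apply the 0.5 decision threshold from the problem overview."""
    return "C+" if ensemble_score(posteriors, weights) >= 0.5 else "C-"
```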
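The k-spectrum encoding above can be sketched in a few lines. For self-containment this sketch enumerates all k-mers over the 20-letter amino-acid alphabet rather than only those occurring in a training set, and `k_spectrum` is an illustrative name, not the authors' code:

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def k_spectrum(peptide, k=2, counts=False):
    """Encode a peptide as a vector over all k-mers of the alphabet:
    entry j is 1 if the j-th k-mer occurs in the peptide (its
    frequency when counts=True), and 0 otherwise."""
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    v = [0] * len(kmers)
    for i in range(len(peptide) - k + 1):
        j = index[peptide[i:i + k]]
        v[j] = v[j] + 1 if counts else 1
    return v
```

For k = 2 the vector has 20^2 = 400 entries; the frequency variant simply replaces the indicator with a k-mer count, as the slide notes.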
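The sliding-window local composition can be sketched with the Kyte-Doolittle hydropathy scale from the compositional-features slide; the deck does not specify this exact implementation, so the function below is one plausible reading of "average over a window, then concatenate":

```python
# Kyte-Doolittle hydropathy values (the standard published scale)
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def local_composition(peptide, w=3, scale=HYDROPATHY):
    """Slide a window of width w along the peptide, average the
    per-residue property over each window, and concatenate the
    window averages into one feature vector."""
    return [sum(scale[a] for a in peptide[i:i + w]) / w
            for i in range(len(peptide) - w + 1)]
```

Setting w to the peptide length recovers the global compositional feature (a single scalar), so the local version strictly generalizes it.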
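The orthogonal (one-hot) encoding is simple enough to sketch directly; as the property-encoding slide observes, the indicator vector could equally be swapped for a BLOSUM or PAM row. The function name is illustrative:

```python
AA = "ACDEFGHIKLMNPQRSTVWY"

def orthogonal_encoding(peptide):
    """One-hot (orthonormal) encoding: each residue becomes a
    20-long indicator bit-vector, and the peptide's feature vector
    is their concatenation (length 20 * len(peptide)).
    Replacing `bits` with a substitution-matrix row would give the
    property encoding instead."""
    vec = []
    for a in peptide:
        bits = [0] * len(AA)
        bits[AA.index(a)] = 1
        vec.extend(bits)
    return vec
```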

