U of I CS 466 - Neural Networks for Protein Structure Prediction


Neural Networks for Protein Structure Prediction
Brown, JMB 1999
CS 466
Saurabh Sinha

Outline
• Goal is to predict “secondary structure” of a protein from its sequence
• Artificial Neural Network used for this task
• Evaluation of prediction accuracy

What is Protein Structure?
Figure: http://academic.brooklyn.cuny.edu/biology/bio4fv/page/3d_prot.htm
Figure: http://matcmadison.edu/biotech/resources/proteins/labManual/images/220_04_114.png

Protein Structure
• An amino acid sequence “folds” into a complex 3-D structure
• Finding out this 3-D structure is a crucial and challenging task
• Experimental methods (e.g., X-ray crystallography) are very tedious
• Computational predictions are a possibility, but very difficult

What is “secondary structure”?
Figure: http://www.wiley.com/college/pratt/0471393878/student/structure/secondary_structure/secondary_structure.gif (labels: “Strand”, “Helix”)
Figure: http://www.npaci.edu/features/00/Mar/protein.jpg (labels: “Strand”, “Helix”)

Secondary structure prediction
• Well, the whole 3-D “tertiary” protein structure may be hard to predict from sequence
• But can we at least predict the secondary structural elements such as “strand”, “helix” or “coil”?
• This is what this paper does
• ... and so do many other papers (it is a hard problem!)

A survey of structure prediction
• The most reliable technique is “comparative modeling”
– Find a protein P whose amino acid sequence is very similar to your “target” protein T
– Hope that this other protein P does have a known structure
– Predict a structure similar to that of P, after carefully considering how the sequences of P and T differ

A survey of structure prediction
• Comparative modeling fails if we don’t have a suitable homologous “template” protein P for our protein T
• “Ab initio” tertiary methods attempt to predict the structure without using a template protein structure
– Incorporate basic physical and chemical principles into the structure calculation
– Gets very hairy, and highly computationally intensive
• The other option is prediction of secondary structure only (i.e., making the goal more modest)
– These predictions may be used to provide constraints for tertiary structure prediction

Secondary structure prediction
• Early methods were based on stereochemical principles
• Later methods realized that we can do better if we use not only the one sequence T (our sequence), but also a family of “related sequences”
• Search for sequences similar to T, build a multiple alignment of these, and predict secondary structure from the multiple alignment of sequences

What’s multiple alignment doing here?
• Most conserved regions of a protein sequence are either functionally important or buried in the protein “core”
• More variable regions are usually on the surface of the protein
– There are few constraints on what type of amino acids have to be here (apart from a bias towards hydrophilic residues)
• Multiple alignment tells us which portions are conserved and which are not
Figure: http://bio.nagaokaut.ac.jp/~mbp-lab/img/hpc.png (label: hydrophobic core)

What’s multiple alignment doing here?
• Therefore, by looking at the multiple alignment, we could predict which residues are in the core of the protein and which are on the surface (“solvent accessibility”)
• Secondary structure is then predicted by comparing against the accessibility patterns associated with helices, strands, etc.
• This approach (Benner & Gerloff) is mostly manual
• Today’s paper suggests an automated method

The PSIPRED algorithm
• Given an amino-acid sequence, predict secondary structure elements in the protein
• Three stages:
1. Generation of a sequence profile (the “multiple alignment” step)
2. Prediction of an initial secondary structure (the neural network step)
3. Filtering of the predicted structure (another neural network step)

Generation of sequence profile
• A BLAST-like program called “PSI-BLAST” is used for this step
• We saw BLAST earlier -- it is a fast way to find high scoring local alignments
• PSI-BLAST is an iterative approach
– An initial scan of a protein database using the target sequence T
– Align all matching sequences to construct a “sequence profile”
– Scan the database using this new profile
• Can also pick out and align distantly related protein sequences for our target sequence T

The sequence profile looks like this
• Has 20 x M numbers
• The numbers are the log-likelihoods of each residue at each position

Preparing for the second step
• Feed the sequence profile to an artificial neural network
• But before feeding, do a simple “scaling” to bring the numbers to a 0-1 scale:
$$x \mapsto \frac{1}{1 + e^{-x}}$$

Intro to Neural nets
(the second and third steps of PSIPRED)

Artificial Neural Network
• Supervised learning algorithm
• Training examples. Each example has a label
– The “class” of the example, e.g., “positive” or “negative”
– Here: “helix”, “strand”, or “coil”
• Learns how to predict the class of an example

Artificial Neural Network
• Directed graph
• Nodes or “units” or “neurons”
• Edges between units
• Each edge has a weight (not known a priori)

Layered Architecture
Input here is a four-dimensional vector. Each dimension goes into one input unit
Figure: http://www.akri.org/cognition/images/annet2.gif

Layered Architecture
Figure: http://www.geocomputation.org/2000/GC016/GC016_01.GIF (label: units)

What a unit (neuron) does
• Unit i receives a total input $x_i$ from the units connected to it, and produces an output $y_i = f_i(x_i)$, where $f_i()$ is the “transfer function” of unit i
$$x_i = \sum_{j \in N - \{i\}} w_{ij} y_j + w_i$$
$$y_i = f_i(x_i) = f_i\Big(\sum_{j \in N - \{i\}} w_{ij} y_j + w_i\Big)$$
• $w_i$ is called the “bias” of the unit

Weights, bias and transfer function
• Unit takes n inputs
• Each input edge has weight $w_i$
• Bias b
• Output a
• Transfer function f(): linear, sigmoidal, or other

Weights, bias and transfer function
• Weights $w_{ij}$ and bias $w_i$ of each unit are “parameters” of the ANN
– Parameter values are learned from input data
• Transfer function is usually the same for every unit in the same layer
• Graphical architecture (connectivity) is decided by you
– Could use fully connected architecture: all units in one layer connect to all units in the “next” layer

Where’s the algorithm?
• It’s in the training of parameters!
• Given several examples and their labels: the training data
• Search for parameter values such that output units make correct predictions on the training examples
• “Back-propagation” algorithm
– Read up more on neural nets if you are interested

Back to PSIPRED …
Step 2
• Feed the sequence profile to the input
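
The logistic scaling of the profile and the per-unit computation described in the slides above can be sketched in a few lines of Python. This is only an illustrative sketch, not PSIPRED's actual code: the function names, the toy 3 x 20 profile, the layer sizes (4 hidden units, 3 output units), and the random untrained weights are all assumptions made for the example.

```python
import math
import random


def logistic(x):
    """Logistic function 1 / (1 + e^-x); maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def scale_profile(profile):
    """Apply the logistic scaling to every number in a profile (rows = positions)."""
    return [[logistic(v) for v in row] for row in profile]


def unit_output(inputs, weights, bias, transfer=logistic):
    """One unit: y_i = f(sum_j w_ij * y_j + w_i), with w_i passed as `bias`."""
    total = sum(w * y for w, y in zip(weights, inputs)) + bias
    return transfer(total)


def layer_output(inputs, weight_rows, biases, transfer=logistic):
    """Fully connected layer: every unit receives every input."""
    return [unit_output(inputs, w, b, transfer)
            for w, b in zip(weight_rows, biases)]


# Hypothetical toy profile: 3 positions x 20 residue scores (made-up numbers).
rng = random.Random(0)
profile = [[rng.uniform(-5.0, 5.0) for _ in range(20)] for _ in range(3)]
scaled = scale_profile(profile)

# Flatten the scaled profile into one input vector, one number per input unit.
inputs = [v for row in scaled for v in row]

# Random, untrained weights; in a real network these would be learned
# from labeled training examples (e.g. by back-propagation).
hidden_w = [[rng.uniform(-0.1, 0.1) for _ in inputs] for _ in range(4)]
hidden_b = [0.0] * 4
hidden = layer_output(inputs, hidden_w, hidden_b)

out_w = [[rng.uniform(-0.1, 0.1) for _ in hidden] for _ in range(3)]
out_b = [0.0] * 3
scores = layer_output(hidden, out_w, out_b)

# One output unit per class; the predicted class is the highest-scoring unit.
labels = ["helix", "strand", "coil"]
prediction = labels[scores.index(max(scores))]
print(prediction, scores)
```

Because the weights here are random, the printed class is meaningless; the point is only to show how scaled profile numbers flow through weighted sums, biases, and transfer functions to produce one score per class.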

