UW-Madison ECE 539 - Analyzing Promoter Sequences with Multilayer Perceptrons


Analyzing Promoter Sequences with Multilayer Perceptrons
Glenn Walker
ECE 539

Background (DNA)
Deoxyribonucleic acid (DNA) is a long molecule made up of combinations of four smaller molecules (bases): adenine (A), cytosine (C), guanine (G), and thymine (T). These four molecules are combined in an order unique to each living organism. The order of the molecules contains the information needed to make all the parts necessary for an organism to survive.

AGTCAATTGAGACCGATTAGAGATT
TCAGTTAACTCTGGCTAATCTCTAA

DNA is double-stranded and complementary.

Background (DNA)
Genes are sections of DNA that can range from a few hundred base pairs to tens of thousands. Genes contain instructions for making proteins -- molecules necessary for building and maintaining organisms.
[Figure: three different genes on a piece of DNA, separated by "junk" DNA]

Background
Promoters are sequences of DNA to which RNA polymerase can bind and begin transcription of a gene.
Transcription is the process of making a complementary copy of the DNA, which is then translated into a protein.
[Figure: a promoter sequence precedes the actual gene information; RNA polymerase binds at the promoter and begins transcription]

Problem
• Knowing gene locations is desirable for medical reasons
• One way to find genes is to look for promoter regions
• How do we find promoter regions?

One Solution
• Promoter regions are highly conserved -- different regions often contain similar patterns
• We can train neural networks to recognize promoter regions
• We chose a multilayer perceptron

Neural Network Configuration
• The multilayer perceptron (MLP) is a very common neural network configuration
• We used an MLP with 3 layers -- an input, a hidden, and an output layer

Number of inputs:       115 or 58
Number of hidden nodes: 4, 8, 16, 20, 24, 28, 32
Number of outputs:      1

Neural Network Configuration
• Two ways of presenting the input were tried -- one used 58 inputs and the other 115
• Different numbers of hidden nodes were tried to find the optimally structured neural network
• A single output indicated whether the input was a promoter sequence or not (1 or 0, respectively)

Neural Network Inputs
The inputs consisted of 106 sequences of 57 DNA bases each; 53 were promoters and 53 were not. One of the input promoter sequences:

TACTAGCAATACGCTTGCGTTCGGTGGTTAAGTATGTATAATGCGCGGGCTTGTCGT

The input was presented to the neural network in two ways:
• Binary encoding (114 input neurons): A = 00, C = 01, G = 10, T = 11
• Scaled encoding (57 input neurons): A = 0.2, C = 0.4, G = 0.6, T = 0.8

Neural Network Training
Each configuration was run 10 times. Within each of the 10 runs, 106 training runs were performed: 105 of the sequences were used for training, with the 106th used for testing. The test sequence was changed for each of the 106 runs so that each sequence served as the test sequence exactly once.

Ten runs were necessary because the MLP weights were initialized to random values, which could lead to different classifications for the same input sequence.

Hidden Nodes vs. Classification Rate
[Figure: classification rate (64-82%) plotted against number of hidden nodes (4-32) for the binary-encoded input]

Scaled Input vs. Classification Rate
[Figure: classification rate (64-82%) plotted against number of hidden nodes (4-32) for the scaled input]

Compared to Others
Walker (NN):           78%
O'Neill (NN):          83%
Towell (KBANN):        > 90%
O'Neill (rule-based):  70%
ID3 (decision tree):   76%

Conclusion
• Not the best, but not the worst
• Using a hybrid technique would improve results
• The MLP is a very useful tool for the field of bioinformatics

References
Harley, C. B. and Reynolds, R. P. 1987. Analysis of E. coli promoter sequences. Nucleic Acids Research, 15(5):2343-2361.
O'Neill, M. C. 1991. Training back-propagation neural networks to define and detect DNA-binding sites. Nucleic Acids Research, 19(2):313-318.
Quinlan, J. 1986. Induction of decision trees. Machine Learning, 1:81-106.
Towell, G. G., Shavlik, J. W., and Noordewier, M. O. 1990. Refinement of approximate domain theories by knowledge-based neural networks. AAAI-90.

