UW-Madison ECE 539 - Predicting E. Coli Promoters Using SVM

DIEP MAI ([email protected])
Course: CS/ECE 539 – Fall 2008
Instructor: Prof. Yu Hen Hu

Outline: Purpose, Dataset, Data preprocessing, Approach, Results, Observation/Conclusion

Purpose
Build and train an SVM system to predict E. coli promoters from the given gene sequences.
Example: given the gene sequence
aagcaaagaaatgcttgactctgtagcgggaaggcgtattatgcacaccgccgcgcc
is it an E. coli promoter?
For more theoretical information about E. coli promoters: http://homepages.cae.wisc.edu/~ece539/data/gene/theory.txt

Dataset
The data file is obtained from http://homepages.cae.wisc.edu/~ece539/data/gene/data.txt
Dataset information:
◦ Number of instances: 106
◦ Attributes: 57, each a non-numeric nominal value (A, C, G, or T)
◦ Classes: 2, Positive (+1) or Negative (-1)

Data preprocessing
Randomly partition the dataset into TRAINSET and TESTSET, where
Ratio = TESTSET / (TRAINSET + TESTSET)
Encode the non-numeric attributes:
◦ A → 0001 (binary) = 1 (decimal)
◦ C → 0010 (binary) = 2 (decimal)
◦ G → 0100 (binary) = 4 (decimal)
◦ T → 1000 (binary) = 8 (decimal)
Scale each feature to [-1, 1] so that features with large values do not dominate those with small values.

Approach
An RBF kernel is used, so "good" values of C (cost) and G (gamma) must be found.
Parameter scanning:
◦ Set the range of C to [2^-15, 2^5] and G to [2^-15, 2^2].
◦ For each pair (C, G), use the leave-one-out method to determine the pairs that yield high accuracy rates.
◦ Repeat this process a few times; the pair that most often produces high accuracy rates is more likely to be selected.
Training/Testing:
◦ Train the system on the whole TRAINSET with the selected parameters.
◦ Use the trained system to predict the TRAINSET (preferred accuracy rate = 100%).
◦ Use the trained system to predict the TESTSET.

Results
Configuration:
◦ Ratio of partitioning the dataset = 1/5: split the dataset into 5 roughly equal sets and reserve one as the TESTSET.
◦ K-fold = 15 (15 folds in total)
◦ Number of repetitions to select parameters = 10

After running the system several times:

Accuracy rate of the testing process (rows ordered by occurrence frequency: Often, Sometimes, Rare)
TRAINSET          TESTSET
85/85 = 100%      19/21 = 90.48%
85/85 = 100%      18/21 = 85.71%
85/85 = 100%      20/21 = 95.24%
85/85 = 100%      21/21 = 100%
85/85 = 100%      15/21 = 71.43%

Training result
Accuracy rate (Avg.)   Accuracy rate (Best)   "Best" C   "Best" G
84.35%                 85.88%                 0.7071     0.0371
84.23%                 85.88%                 1.1892     0.0313
88.82%                 90.59%                 1.0000     0.0743
83.52%                 85.88%                 1.1892     0.0625

Observation/Conclusion
SVM:
◦ For this dataset the number of attributes is not large, so using an RBF kernel to map the features into a higher-dimensional space seems appropriate.
Scanning (C, G) takes a large amount of time. One approach to speed up this process:
◦ Split the range into "large" equal intervals.
◦ Pick the interval that yields high accuracy rates.
◦ Divide this interval into smaller equal intervals.
◦ Repeat.
K-fold method:
◦ The larger the number of folds, the more time the process requires.
◦ For this dataset the number of instances is not large, so large numbers of folds seem to work.

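The nucleotide encoding from Data preprocessing maps each base to a number whose 4-bit binary form has exactly one bit set. The sketch below assumes each of the 57 positions becomes a single numeric feature holding the decimal value (1, 2, 4 or 8) and that bases may appear in either case; `ENCODING` and `encode_sequence` are hypothetical names, and reading data.txt is not shown.

```python
# Hypothetical encoding table following the slide: one bit set per base.
ENCODING = {"a": 0b0001, "c": 0b0010, "g": 0b0100, "t": 0b1000}


def encode_sequence(sequence):
    """Map a nucleotide string to numeric features, one per position (assumption)."""
    return [ENCODING[base] for base in sequence.lower()]


example = "aagcaaagaaatgcttgactctgtagcgggaaggcgtattatgcacaccgccgcgcc"
features = encode_sequence(example)
print(len(features))  # 57
print(features[:4])   # [1, 1, 4, 2]
```

An unknown symbol raises a `KeyError`, which is a simple way to catch malformed sequences early.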

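Scaling each feature to [-1, 1] is a per-feature linear map x' = -1 + 2(x - min)/(max - min). In this sketch the minimum and maximum are assumed to come from the TRAINSET and are reused for the TESTSET, and a constant feature is mapped to 0; both choices, and the names `fit_scaler` and `scale`, are assumptions rather than details given in the slides.

```python
def fit_scaler(rows, lower=-1.0, upper=1.0):
    """Return a function that linearly maps each feature column to [lower, upper].

    Per-feature min and max are taken from `rows` (e.g. TRAINSET) and reused
    for later data (e.g. TESTSET).
    """
    columns = list(zip(*rows))
    mins = [min(col) for col in columns]
    maxs = [max(col) for col in columns]

    def scale(row):
        scaled = []
        for x, lo, hi in zip(row, mins, maxs):
            if hi == lo:
                # Constant feature carries no information: map to the midpoint (assumption).
                scaled.append((lower + upper) / 2)
            else:
                scaled.append(lower + (upper - lower) * (x - lo) / (hi - lo))
        return scaled

    return scale


scale = fit_scaler([[1, 8], [8, 8], [4, 2]])
print(scale([1, 8]))  # [-1.0, 1.0]
print(scale([8, 2]))  # [1.0, -1.0]
```

TESTSET values outside the TRAINSET range land slightly outside [-1, 1]; this sketch does not clip them.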

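The RBF kernel named in the Approach slide is commonly written K(x, y) = exp(-G·||x - y||^2), with G playing the role of gamma. The sketch below pairs that kernel with a log2-spaced candidate grid over the slide's ranges for C and G. The exponent step of 1 is an assumption (the slides do not state it, and some "Best" values in the Training result, such as C = 0.7071 ≈ 2^-0.5, suggest non-integer exponents were also tried); `rbf_kernel` and `log2_grid` are hypothetical helper names.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def log2_grid(low_exp, high_exp, step=1.0):
    """Values 2**e for e = low_exp, low_exp + step, ..., high_exp."""
    count = int(round((high_exp - low_exp) / step)) + 1
    return [2.0 ** (low_exp + i * step) for i in range(count)]


# Ranges from the slide: C in [2^-15, 2^5], G in [2^-15, 2^2].
c_values = log2_grid(-15, 5)
g_values = log2_grid(-15, 2)
pairs = [(c, g) for c in c_values for g in g_values]
print(len(c_values), len(g_values), len(pairs))  # 21 18 378
print(rbf_kernel([1.0, -1.0], [1.0, -1.0], gamma=0.5))  # 1.0
```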

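The parameter-selection loop from the Approach slide, with the Results configuration (K = 15 folds, 10 repetitions), might look like the sketch below. It is a generic K-fold cross-validation, so setting k to the number of training samples gives leave-one-out. `make_train_fn` is a hypothetical hook that should wrap whatever SVM trainer is actually used with the given (C, G); no specific library API is assumed. Choosing the pair that wins the most repetitions is one reading of "the pair that most often produces high accuracy rates".

```python
import random
from collections import Counter


def k_fold_indices(n, k, rng):
    """Shuffle 0..n-1 and deal the indices into k roughly equal folds."""
    indices = list(range(n))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cv_accuracy(train_fn, X, y, k, rng):
    """Accuracy rate of K-fold cross-validation.

    Each fold is held out once; train_fn(X_train, y_train) must return a
    predict(x) -> label function trained on the remaining samples.
    """
    correct = 0
    for fold in k_fold_indices(len(X), k, rng):
        held_out = set(fold)
        X_train = [x for i, x in enumerate(X) if i not in held_out]
        y_train = [t for i, t in enumerate(y) if i not in held_out]
        predict = train_fn(X_train, y_train)
        correct += sum(predict(X[i]) == y[i] for i in fold)
    return correct / len(X)


def select_parameters(make_train_fn, pairs, X, y, k=15, repetitions=10, seed=0):
    """Repeat the (C, G) scan and return the pair that wins most often.

    make_train_fn(C, G) (hypothetical hook) returns a train_fn for cv_accuracy.
    Each repetition reshuffles the folds, so the winners can differ.
    """
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(repetitions):
        scores = {pair: cv_accuracy(make_train_fn(*pair), X, y, k, rng)
                  for pair in pairs}
        votes[max(scores, key=scores.get)] += 1
    return votes.most_common(1)[0][0]
```

A full scan trains len(pairs) × k × repetitions models; for a step-1 exponent grid over the slide's ranges (21 × 18 = 378 pairs) with k = 15 and 10 repetitions that is 56,700 training runs, which is why more folds cost noticeably more time.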

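The coarse-to-fine speed-up listed under Observation/Conclusion can be sketched as a zooming grid search over log2(C) and log2(G). This is one interpretation of the four steps, not the author's procedure: each round evaluates an evenly spaced points × points grid over the current exponent ranges, then narrows each range to one coarse step on either side of the best point found so far, clipped to the original range. `coarse_to_fine`, its parameters, and `toy_score` are hypothetical; in practice `score` would be a cross-validated accuracy rate.

```python
def coarse_to_fine(score, c_range, g_range, points=5, rounds=3):
    """Zooming grid search; ranges are (low, high) exponents of 2.

    score(c, g) returns a value where higher is better (e.g. an accuracy rate).
    Returns (best C, best G, best score).
    """
    c_lo, c_hi = c_range
    g_lo, g_hi = g_range
    best = None  # (score, log2 C, log2 G)
    for _ in range(rounds):
        c_step = (c_hi - c_lo) / (points - 1)
        g_step = (g_hi - g_lo) / (points - 1)
        for i in range(points):
            for j in range(points):
                ce = c_lo + i * c_step
                ge = g_lo + j * g_step
                value = score(2.0 ** ce, 2.0 ** ge)
                if best is None or value > best[0]:
                    best = (value, ce, ge)
        _, ce, ge = best
        # Zoom in: one coarse step on each side of the best point, clipped
        # to the original scanning range.
        c_lo, c_hi = max(c_range[0], ce - c_step), min(c_range[1], ce + c_step)
        g_lo, g_hi = max(g_range[0], ge - g_step), min(g_range[1], ge + g_step)
    value, ce, ge = best
    return 2.0 ** ce, 2.0 ** ge, value


def toy_score(c, g):
    """Hypothetical stand-in for a cross-validated accuracy, peaked at C = 1, G = 2^-4."""
    return -(abs(c - 1.0) + abs(g - 0.0625))


c_best, g_best, best_score = coarse_to_fine(toy_score, (-15, 5), (-15, 2))
print(c_best)  # 1.0
```

With the defaults it evaluates 3 × 5 × 5 = 75 pairs, compared with 21 × 18 = 378 for a full grid with an exponent step of 1 over the same ranges; the price is that a narrow peak between coarse grid points can be missed.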