UW-Madison ECE 539 - Predicting body weight in chicken using SNP markers

Outline
- Title page: Final Project
- Background
- Data and objective
- Methods: Tuning parameters; Baseline model and model comparison
- Results: Optimal numbers of SNPs and centers for RBF; Optimal number of SNPs for LM; Prediction error on test data

ECE 539, Fall 2008, Instructor: Prof. Yu Hen Hu
Final Project
Predicting body weight in chicken using SNP markers: application of generalized radial basis functions
Nanye Long
Dept. of Animal [email protected] 11, 2008

Background
- Genetic variants (e.g., single nucleotide polymorphisms, or SNPs) can determine variation of phenotypes.
- The standard linear model assumes a linear and additive relationship between genetic variants and phenotype, which may not be appropriate.
- We need a tool to do non-linear mapping.

Data, objective
How do SNPs determine chicken body weight?

Observation   SNP1   SNP2   ...   SNP10000   Body weight
1             0      2      ...   1          d1
2             1      1      ...   0          d2
...           ...    ...    ...   ...        ...
170           0      1      ...   2          d170

- SNP: input features, taking three values: 0 (aa), 1 (Aa), 2 (AA). About 10,000 SNPs.
- Body weight: a continuous outcome, from 170 chickens.
- Problem formulation: $\mathbb{R}^p \to \mathbb{R}^1$, where $p$ is the number of SNPs.
- Why RBF?
  - Nonlinear approximation
  - Can handle a large number of input features

Methods: Tuning parameters
Tuning parameters: m and k
Generalized Gaussian RBF:
$F(x) = \sum_{i=1}^{k} w_i \varphi(\|x - t_i\|), \quad \varphi(\|x - t_i\|) = \exp\left(-\|x - t_i\|^2 / 2\sigma^2\right)$
The number of basis functions < the number of data points.
- $m$: number of SNPs used to compute the distance in $\varphi$. Rank all SNPs by ANOVA p-values, select the top 25, 30, ..., 100.
- $k$: number of centers $t_i$, $i = 1, 2, \ldots, k$. Centers come from k-medoids clustering; try $k = 2, 3, \ldots, 100$.
- $\sigma = d_{\max} / \sqrt{2k}$, where $d_{\max}$ is the maximum distance between the chosen centers.
- Choose optimal values for $m$ and $k$ by 10-fold cross validation.

Methods: Baseline, model comparison
Baseline model: linear regression on SNPs
$F(x_j) = x_j^T \beta$
- $x_j^T$: input SNP vector of the $j$th chicken, whose length is also tuned
- $\beta$: a vector of weights associated with each SNP
Comparison between RBF and linear model: prediction on test set.
[Diagram: five replications (Rep 1 to Rep 5), each split into a test set and a training set; optimal m and k are found on the training set via 10-fold CV.]

Results: Optimal numbers of SNPs and centers for RBF
RBF: optimal values for m and k
[Figure: five panels (Replication 1 to Replication 5) of CV error versus number of centers, with a legend for the number of SNPs (25, 30, ..., 100).]

Results: Optimal number of SNPs for LM
Linear model: optimal numbers of SNPs
[Figure: five panels (Replication 1 to Replication 5) of CV error versus number of SNPs.]

Results: Prediction error on test data
Table: Mean squared errors of test data predicted by RBF and linear model.

Repetition      1        2       3       4        5
RBF             73.67    64.17   47.47   71.45    40.37
Linear model    111.35   83.67   90.10   106.58   56.43

- The RBF model substantially exceeds the linear model in generalization performance: its test MSE is lower in all five repetitions.
- "Optimal" values for m and k are optimal only w.r.t. a range of reasonable candidates; they may not be truly optimal.
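
As a quick check on the comparison in the table of test-set mean squared errors, the short sketch below summarizes the five reported values per model. Averaging over repetitions and the relative-reduction figure are summaries added here for illustration, not statistics reported on the slides; the variable names are hypothetical.

```python
from statistics import mean

# Test-set MSE per repetition, copied from the table on the slides.
rbf_mse = [73.67, 64.17, 47.47, 71.45, 40.37]
lm_mse = [111.35, 83.67, 90.10, 106.58, 56.43]

# Count the repetitions in which the RBF model has the lower test MSE.
rbf_wins = sum(r < l for r, l in zip(rbf_mse, lm_mse))

# Averages over repetitions (an added summary, not a slide result).
mean_rbf = mean(rbf_mse)
mean_lm = mean(lm_mse)

# Fraction by which the average RBF MSE is below the average linear-model MSE.
relative_reduction = 1 - mean_rbf / mean_lm

print(f"RBF lower in {rbf_wins} of {len(rbf_mse)} repetitions")
print(f"mean MSE: RBF {mean_rbf:.2f}, linear model {mean_lm:.2f}")
print(f"relative reduction: {relative_reduction:.1%}")
```

With only five repetitions this is a descriptive summary; it does not replace a formal test of the difference.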

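
The generalized Gaussian RBF described under "Tuning parameters" can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the author's implementation: the data are synthetic, the centers are randomly chosen training rows standing in for the k-medoids centers used on the slides, the weights are fitted by ordinary least squares (the slides do not say how the weights were estimated), and all function names are hypothetical.

```python
import numpy as np

def rbf_width(centers):
    """sigma = d_max / sqrt(2k), with d_max the largest distance between centers."""
    k = len(centers)
    diffs = centers[:, None, :] - centers[None, :, :]
    d_max = np.sqrt((diffs ** 2).sum(axis=-1)).max()
    return d_max / np.sqrt(2 * k)

def design_matrix(X, centers, sigma):
    """Phi[j, i] = exp(-||x_j - t_i||^2 / (2 sigma^2))."""
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2 * sigma ** 2))

def fit_rbf(X, y, centers):
    """Fit the weights w_i by least squares (an assumption of this sketch)."""
    sigma = rbf_width(centers)
    Phi = design_matrix(X, centers, sigma)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w, sigma

def predict_rbf(X, centers, sigma, w):
    """F(x) = sum_i w_i * phi(||x - t_i||)."""
    return design_matrix(X, centers, sigma) @ w

# Synthetic stand-in data: 60 "chickens", m = 25 SNPs coded 0/1/2.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(60, 25)).astype(float)
y = rng.normal(size=60)

# Assumption: random training rows replace the k-medoids centers (k = 5).
centers = X[rng.choice(len(X), size=5, replace=False)]

w, sigma = fit_rbf(X, y, centers)
pred = predict_rbf(X, centers, sigma, w)
```

In the procedure on the slides, m and k would be chosen by wrapping a fit like this in 10-fold cross validation over the candidate grids; that loop is omitted here to keep the sketch short.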
