CMU CS 10810 - Differentially Expressed Genes



10-810: Advanced Algorithms and Models for Computational Biology

Differentially Expressed Genes

Data analysis
• Normalization
• Combining results from replicates
• Identifying differentially expressed genes
• Dealing with missing values
• Static vs. time series

Motivation
• In many cases, this is the goal of the experiment.
• Such genes can be key to understanding what goes wrong, or gets fixed, under a certain condition (cancer, stress, etc.).
• In other cases, these genes can be used as 'features' for a classifier.
• These genes can also serve as a starting point for a model of the system being studied (e.g. cell cycle, pheromone response, etc.).

Problems
• As mentioned in the previous lecture, differences in expression values can result from many different noise sources.
• Our goal is to identify the 'real' differences, that is, differences that cannot be explained by the various errors introduced during the experimental phase.
• We need to understand the experimental protocol and take into account the underlying biology / chemistry.

Hypothesis testing
• A general way of identifying differentially expressed genes is by testing two hypotheses.
• Let $g_A$ denote the mean expression of gene g under condition A (say, healthy) and $g_B$ the mean expression under condition B (cancer).
• In this case we can test the following hypotheses:
  H0 (the null hypothesis): $g_A = g_B$
  H1 (the alternative hypothesis): $g_A \neq g_B$
• If we reject H0 then gene g has a different mean under the two conditions, and so is differentially expressed.

P-value
• When using hypothesis testing we need to determine our confidence in the resulting decision.
• This is done using a test statistic, which indicates how strongly the data we observe support our decision.
• A p-value (or probability value) measures how likely it is, under the null hypothesis, to see data at least as extreme as the data we observed.
• Small p-values indicate that it is very unlikely that the data were generated according to the null hypothesis.

Example: Measurements for one gene in 40 (20+20) experiments of two conditions

Hypothesis testing: Log likelihood ratio test
• If we have a probabilistic model for gene expression we can compute the likelihood of the data given the model.
• In our case, let's assume that gene expression is normally distributed with a different mean under each condition and the same variance.
• Thus for the alternative hypothesis we have:
  $y_A \sim N(\mu_A, \sigma^2), \quad y_B \sim N(\mu_B, \sigma^2)$
  and for the null hypothesis we have:
  $y_A \sim N(\mu, \sigma^2), \quad y_B \sim N(\mu, \sigma^2)$
• We can compute the estimated means and variance from the data (and thus we will be using the sample mean and sample variance).

Example mean
Blue mean: -0.81
Red mean: 0.84
Combined mean: 0.02

Data likelihood
• Given our model, the likelihood of the data under the two hypotheses is:
  $L(0) = \prod_{i \in A} \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(y_i - \mu)^2}{2\sigma^2}} \prod_{i \in B} \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(y_i - \mu)^2}{2\sigma^2}}$
  $L(1) = \prod_{i \in A} \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(y_i - \mu_A)^2}{2\sigma^2}} \prod_{i \in B} \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(y_i - \mu_B)^2}{2\sigma^2}}$
• We can also compute the ratio of the likelihoods, L(1)/L(0).
• Intuitively, the higher this ratio the more likely it is that the data were indeed generated according to the alternative hypothesis (and thus the gene is differentially expressed).

Log likelihood ratio test
• We use the log of the likelihood ratio and, after simplifying, arrive at a test statistic T.
• T is our test statistic, and in this case can be shown to be distributed as $\chi^2$.
The test statistic obtained from the simplification is:
  $T = 2 \, \frac{\sum_{i \in A} (y_i - \mu)^2 + \sum_{i \in B} (y_i - \mu)^2}{\sum_{i \in A} (y_i - \mu_A)^2 + \sum_{i \in B} (y_i - \mu_B)^2}$

Degrees of freedom
• We are almost done …
• We still need to determine one more value in order to use the test.
• The degrees of freedom for a likelihood ratio test depend on the difference in the number of free parameters between the two hypotheses.
• In this case, our free parameters are the means and the variance.
• Thus the difference is …
• In this case, the difference is 1 (two means vs. one).

Example: Log likelihood ratio
T = 2*(64.3/37.1) = 3.46
D.O.F = 1
P-value = 0.06

Limitations
• We assumed a specific probabilistic model (Gaussian noise), which may not actually capture the true noise factors.
• We may need many replicates to derive significant results.
• Multiple hypothesis testing

Multiple hypothesis testing
• A p-value is meaningful when one test is carried out.
• However, when thousands of tests are being carried out, it is hard to determine the real significance of the results based on the p-value alone.
• Consider the following two cases:
  we test 100 genes and find 10 to be differentially expressed with a p-value < .01
  we test 1000 genes and find 10 to be differentially expressed with a p-value < .01
• We need to correct for the multiple tests we are carrying out!

Bonferroni Correction
• The Bonferroni Correction is a simple and widely used method to correct for multiple hypothesis testing.
• Using this approach, the desired significance level is divided by the number of tests carried out.
• For example, if we are testing 1000 genes and are interested in a significance level of 0.05, we will only select genes with a (gene-specific) p-value below 0.05/1000 = 0.00005 = 5*10^-5.
• Motivation: If
  $p(\text{specific } T_i \text{ passes} \mid H_0) < \frac{\alpha}{n}$
  Then
  $p(\text{some } T_i \text{ passes} \mid H_0) < \alpha$

Bonferroni Correction
• The Bonferroni Correction is very conservative.
• Using it may lead to missing important genes.
• Other methods rely on the false discovery rate (FDR), as we discuss for SAM.

SAM – Significance Analysis of Microarrays
• Relies on repeats.
• Avoids using fold change alone.
• Uses permutations to determine the false discovery rate.

Data
• Many genes were assigned negative values.
• Many were expressed at low levels.
• Noise is larger for genes expressed at low levels.

Relative difference
  $d(i) = \frac{\hat{x}_1(i) - \hat{x}_2(i)}{s(i) + s_0}$
• Where $\hat{x}_1(i)$ and $\hat{x}_2(i)$ are the observed means and s(i) is the observed standard deviation.
• $s_0$ is chosen so that d(i) is consistent across the different expression levels.

Different comparisons of repeated experiments.

Identifying differentially expressed genes
• Using the normalized d(i) we can detect differentially expressed genes by selecting a cutoff: a gene whose d(i) lies above the cutoff (or below it, for negative values) is declared differentially expressed.
• However, selecting the cutoff is still a hard problem.
• Solution: use the False Discovery Rate (FDR) to choose the best cutoff.

False Discovery Rate
• Number of genes wrongly identified / total number of genes identified, expressed as a percentage.
• What is the difference between this and a p-value?
  P-value: the probability, under the null hypothesis, of observing this value.

Determining the FDR
• A permutation-based method.
• Use all 36 permutations (why 36
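
The numbers in the log likelihood ratio example earlier in these notes can be rechecked with a few lines of Python (an illustration only; the slides do not use Python). The sketch assumes that 64.3 is the sum of squares around the combined mean and 37.1 the sum of squares around the per-condition means, which is how the values line up with the formula for T. The function name `chi2_sf_1dof` is hypothetical. It relies on the identity that, for one degree of freedom, P(χ² > t) = erfc(√(t/2)).

```python
import math


def chi2_sf_1dof(t):
    """Upper-tail probability P(X > t) for a chi-square variable with 1 degree of freedom."""
    if t < 0:
        raise ValueError("a chi-square statistic must be non-negative")
    return math.erfc(math.sqrt(t / 2.0))


# Values quoted in the example (interpretation of the two sums is an assumption).
ss_null = 64.3  # assumed: sum of squares around the combined mean
ss_alt = 37.1   # assumed: sum of squares around the per-condition means

T = 2 * (ss_null / ss_alt)
p_value = chi2_sf_1dof(T)

# Consistency check with the reported means, 20 measurements per condition:
# the null sum of squares equals the within-group sum of squares plus
# n * (group mean - combined mean)^2 for each group.
n_a = n_b = 20
mean_a, mean_b, mean_all = -0.81, 0.84, 0.02
ss_null_rebuilt = (
    ss_alt
    + n_a * (mean_a - mean_all) ** 2
    + n_b * (mean_b - mean_all) ** 2
)

print(f"p-value for T with 1 degree of freedom: {p_value:.2f}")
print(f"null sum of squares rebuilt from the means: {ss_null_rebuilt:.1f}")
```

Rebuilding the null sum of squares from the three reported means is a quick way to confirm that the example's figures are mutually consistent.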
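
The Bonferroni arithmetic and the 100-gene versus 1000-gene comparison from the multiple hypothesis testing discussion can be sketched as follows. This is a minimal Python illustration with hypothetical helper names and made-up p-values, not code from the course. The expected count of false positives assumes that every tested gene follows the null hypothesis.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the overall error rate below alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def expected_false_positives(p_cutoff, n_tests):
    """Expected number of tests passing p < p_cutoff if every null hypothesis is true."""
    return p_cutoff * n_tests


def bonferroni_select(p_values, alpha):
    """Indices of the tests whose p-value is below the Bonferroni cutoff."""
    cutoff = bonferroni_threshold(alpha, len(p_values))
    return [i for i, p in enumerate(p_values) if p < cutoff]


# The worked example: 1000 genes at an overall level of 0.05.
print(bonferroni_threshold(0.05, 1000))

# The two cases: how many genes pass p < .01 by chance alone?
for n_genes in (100, 1000):
    print(n_genes, expected_false_positives(0.01, n_genes))

# Hypothetical p-values for four genes (illustrative only).
example_p_values = [0.00001, 0.03, 0.0004, 0.2]
selected = bonferroni_select(example_p_values, 0.05)
print(selected)
```

With 100 genes, about 1 gene is expected to pass p < .01 by chance, so 10 hits stand out. With 1000 genes, about 10 are expected by chance, so 10 hits are no more than noise would produce.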
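
The effect of the constant s0 in the relative difference d(i) can be seen with a small Python sketch. The gene values below are invented for illustration, the function name is hypothetical, and s(i) is simply passed in as a number, since the notes do not spell out how it is computed. The sketch does not show how SAM chooses s0.

```python
def relative_difference(mean1, mean2, s, s0):
    """d(i) = (mean1 - mean2) / (s + s0) for a single gene."""
    if s + s0 <= 0:
        raise ValueError("s + s0 must be positive")
    return (mean1 - mean2) / (s + s0)


# Hypothetical genes: (mean in condition 1, mean in condition 2, observed s).
genes = {
    "low_expression": (0.12, 0.10, 0.005),
    "high_expression": (5.0, 3.0, 0.5),
}

# Compare d(i) without the constant (s0 = 0) and with a small positive s0.
for s0 in (0.0, 0.1):
    for name, (m1, m2, s) in genes.items():
        d = relative_difference(m1, m2, s, s0)
        print(f"s0={s0}: {name} d = {d:.2f}")
```

Without s0, the low-expression gene scores as highly as the high-expression gene only because its s(i) is tiny. Adding s0 to the denominator damps that effect.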

