CMU CS 10810 - normLec - D2728304

School name Carnegie Mellon University

Course CS 10810 - Computational Genomics

Pages 40

This preview shows pages 1, 2, 3, 19, 20, 38, 39, and 40 out of 40 pages.


Normalization
10-810 / 02-710 Computational Genomics

Gene Expression Analysis
Biological and computational issues at each stage:
• Experimental design: experiment selection; array design, number of repeats
• Data analysis: differentially expressed genes; normalization, missing value estimation
• Pattern recognition: functional assignment, response programs; clustering, classification
• Model: regulatory networks; information fusion

Experiment design
A number of computational issues should be addressed:
• Selecting short subsequences for oligo arrays to minimize cross-hybridization
• Determining the number of replicates for each sample
• Sampling rates for time series experiments

Typical experiment: replicates
[Figure: healthy and cancer samples]
• Technical replicates: same sample using multiple arrays
• Dye swap: reverse the color code between arrays
• Clinical replicates: samples from different individuals
Many experiments have all three kinds of replicates.

Data analysis
• Normalization
• Combining results from replicates
• Identifying differentially expressed genes
• Dealing with missing values
• Static vs. time series

Normalizing across arrays
• Consider the following two sets of values:
[Figure: two sets of values, shown separately and then together]
Let's put them together …

Normalizing between arrays
The first step in the analysis of microarray data in a given experiment is to normalize between the different arrays.
• Simple assumption: mRNA quantity is the same for all arrays

$$M_j = \frac{1}{n}\sum_{i=1}^{n} y_i^j, \qquad M = \frac{1}{T}\sum_{j=1}^{T} M_j$$

• Here n and T are the total number of genes and arrays, respectively, and y_i^j is the value of gene i on array j. M_j is known as the sample mean.
• Next we transform each value to make all arrays have the same mean:

$$\hat{y}_i^j = y_i^j - M_j + M$$

[Figure: normalizing the mean]

Variance normalization
• In many cases normalizing the mean is not enough.
• We may further assume that the variance should be the same for each array.
• Implicitly we assume that the expression distribution is the same for all arrays (though different genes may change in each of the arrays).

$$V_j = \frac{1}{n}\sum_{i=1}^{n} \left(y_i^j - M_j\right)^2, \qquad V = \frac{1}{T}\sum_{j=1}^{T} V_j$$

• Here V_j is the sample variance.
• Next, we transform each value as follows:

$$\hat{y}_i^j = \left(y_i^j - M_j\right)\sqrt{\frac{V}{V_j}} + M$$

[Figure: normalizing mean and variance]

Typical experiment: ratios
[Figure: healthy and cancer samples]
• In many experiments we are interested in the ratio between two samples.
• For example:
  - Cancer vs. healthy
  - Progression of disease (ratio to time point 0)

Transformation
• While ratios are useful, they are not symmetric.
• If R = 2G then R/G = 2 while G/R = 1/2.
• This makes it hard to visualize the different changes.
• Instead, we use a log transform, and focus on the log ratio:

$$y_i = \log\frac{R_i}{G_i} = \log R_i - \log G_i$$

• Empirical studies have also shown that in microarray experiments the log ratio of (most) genes tends to be normally distributed.

Normalizing between arrays: Locally weighted linear regression
• Normalizing the mean and the variance works well if the variance is independent of the measured value.
• However, this is not the case in gene expression.
• For microarrays it turns out that the variance is value dependent.

Locally weighted linear regression
• Instead of computing a single mean and variance for each array, we can compute different means and variances for different expression values.
• Given two arrays, R and G, we plot on the x axis the (log) of their intensity and on the y axis their (log) ratio.
• We are interested in normalizing the average (log) expression ratio for the different intensity values.

Computing local mean and variance
• Setting

$$m(x) = \frac{1}{k}\sum_{x_i = x} y_i, \qquad v(x) = \frac{1}{k}\sum_{x_i = x} \left(y_i - m(x)\right)^2$$

where k is the number of genes with x_i = x, may work. However, it requires that many genes have the same x value, which is usually not the case.
• Instead, we can use a weighted sum where the weight depends on the distance of the point from x (closer points receive larger weights):

$$m(x) = \frac{1}{\sum_i w(x_i)}\sum_i w(x_i)\, y_i, \qquad v(x) = \frac{1}{\sum_i w(x_i)}\sum_i w(x_i)\left(y_i - m(x)\right)^2$$

$$\hat{y}(x) = \left(y(x) - m(x)\right)\sqrt{\frac{V}{v(x)}} + M$$

Determining the weights
• There are a number of ways to determine the weights.
• Here we will use a Gaussian centered at x, such that

$$w(x_i) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - x_i)^2}{2\sigma^2}}$$

• σ² is a parameter that should be selected by the user.

Locally weighted regression: Results
[Figure: original values vs. normalized values]

Other normalization methods
• If you are not comfortable with the equal mRNA assumption, there are other possible normalization methods:
• We can use genes known as 'housekeeping genes'. These genes are assumed to be expressed at similar levels regardless of the condition the cell is in.
• Alternatively, we can use 'controls'. These are sequences that are manually inserted into the sample with known quantities (this is mainly useful for oligo arrays).

Using spike controls
• Suppose we have m raw measurements of spiked controls per chip and T chip experiments altogether:

$$\begin{pmatrix} x_{11} & \cdots & x_{1T} \\ \vdots & & \vdots \\ x_{m1} & \cdots & x_{mT} \end{pmatrix}$$

• We need to construct a model over these observations that disentangles the experiment-dependent scaling and the underlying (supposedly fixed) control levels.

Determining the underlying expression
We can try to learn the parameters of a model that attempts to disentangle the experiment-dependent scaling and the underlying (fixed) control levels:

$$x_{i1} = m_i\, r_1\, e_{i1}, \quad \ldots, \quad x_{iT} = m_i\, r_T\, e_{iT}$$

Here:
• x_ij is the j'th measurement for control i
• m_i is the fixed control amount
• r_j is the unknown experiment-dependent scaling
• e_ij is random multiplicative noise

Log transform
Log-transform all the variables:

$$\log x_{i1} = \log m_i + \log r_1 + \log e_{i1}$$

$$y_{i1} = \log x_{i1}, \quad \mu_i = \log m_i, \quad \rho_1 = \log r_1, \quad \varepsilon_{i1} = \log e_{i1}$$

After the transformation we can express the model in the simple form Observation = Model + noise:

$$y_{i1} = \mu_i + \rho_1 + \varepsilon_{i1}$$

Noise model
• We make some additional assumptions about the model.
• Noise (ε) is independent across controls / experiments.
• The noise is Gaussian (the original multiplicative noise is log-normal).
• The noise variance does not depend on the experiment but may depend on the specific spiked control.

$$y_{i1} = \mu_i + \rho_1 + \varepsilon_{i1}, \qquad \varepsilon_{i1} \sim N(0, \sigma_i^2)$$

Maximum likelihood estimate
• Maximum likelihood estimation (MLE) is a general and powerful technique for fitting the parameters of a probabilistic model.
• Given a parametric model (for example, Gaussian noise) and observed data, we look for the set of parameters (in our case, mean and variance) that maximizes the likelihood of the model.
• If we observe data D, then we look for parameters that
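
The global mean and variance normalization described in these notes can be sketched in a few lines of plain Python. This is a minimal illustration, not the course's implementation: the function name `normalize_arrays` and the toy data are assumptions, the variances use the 1/n form, and the spread is rescaled by sqrt(V / V_j) so that every array ends up with mean M and variance V. It also assumes every array has non-zero variance.

```python
from math import sqrt

def normalize_arrays(arrays):
    """Give every array the same mean M and variance V.

    `arrays` is a list of T lists, one per array, each holding the
    expression values of the same n genes. Names are illustrative.
    """
    T = len(arrays)
    # Per-array sample means M_j and sample variances V_j (1/n form).
    means = [sum(a) / len(a) for a in arrays]
    variances = [sum((y - m) ** 2 for y in a) / len(a)
                 for a, m in zip(arrays, means)]
    # Targets: averages of the per-array statistics.
    M = sum(means) / T
    V = sum(variances) / T
    # Center each value, rescale the spread by sqrt(V / V_j), shift to M.
    # Assumes V_j > 0 for every array.
    return [[(y - m) * sqrt(V / v) + M for y in a]
            for a, m, v in zip(arrays, means, variances)]

# Toy data: the second array is the first one scaled by a factor of 10.
arrays = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
normalized = normalize_arrays(arrays)
```

In this toy case the two arrays differ only by a scale factor, so after normalization they hold the same values.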
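
The log ratio and the Gaussian-weighted local mean and variance can be sketched the same way. Several choices here are assumptions rather than something the notes specify: base-2 logarithms, taking x as the average log intensity of the two channels, the helper names, and the default of only centering the log ratios (target mean 0) unless a target variance `V` is passed in.

```python
from math import exp, log2, pi, sqrt

def log_ratio(r, g):
    """Symmetric measure of change: log2(R/G) = -log2(G/R)."""
    return log2(r / g)

def gaussian_weight(x, xi, sigma):
    """Weight of a point at xi for a Gaussian centered at x."""
    return exp(-(x - xi) ** 2 / (2 * sigma ** 2)) / (sqrt(2 * pi) * sigma)

def local_mean_variance(x, xs, ys, sigma):
    """Weighted mean m(x) and variance v(x) of the ratios around x."""
    w = [gaussian_weight(x, xi, sigma) for xi in xs]
    total = sum(w)
    m = sum(wi * yi for wi, yi in zip(w, ys)) / total
    v = sum(wi * (yi - m) ** 2 for wi, yi in zip(w, ys)) / total
    return m, v

def normalize_local(xs, ys, sigma, M=0.0, V=None):
    """Apply (y - m(x)) * sqrt(V / v(x)) + M at every point.

    With V=None (an assumption of this sketch) only the local mean is
    removed and the spread is left untouched.
    """
    out = []
    for x, y in zip(xs, ys):
        m, v = local_mean_variance(x, xs, ys, sigma)
        scale = sqrt(V / v) if V is not None and v > 0 else 1.0
        out.append((y - m) * scale + M)
    return out

# Toy data: the red channel is uniformly twice as bright as the green one.
R = [200.0, 400.0, 800.0, 1600.0]
G = [100.0, 200.0, 400.0, 800.0]
xs = [0.5 * (log2(r) + log2(g)) for r, g in zip(R, G)]  # assumed x axis
ys = [log_ratio(r, g) for r, g in zip(R, G)]
corrected = normalize_local(xs, ys, sigma=1.0)
```

Every gene in the toy data has a log ratio of 1, so the local mean is 1 everywhere and the corrected log ratios are all 0. On real data the choice of sigma controls how local the estimate is: a small sigma follows the data closely, a large one approaches the global mean.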
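
For the spike-control model, the log-transformed form y_ij = μ_i + ρ_j + ε_ij can be fitted with a simple least-squares sketch. This is not the estimator derived in the lecture, which is cut off in this preview. It rests on two labeled assumptions: the noise variance is the same for every control, and the log scalings sum to zero (some such constraint is needed, because adding a constant to every μ_i and subtracting it from every ρ_j leaves the model unchanged). The function name and the toy data are hypothetical.

```python
from math import exp, log

def fit_log_linear(x):
    """Least-squares fit of log x[i][j] = mu[i] + rho[j] + noise.

    `x` is an m-by-T matrix: row i holds the T measurements of control i.
    Assumes equal noise variance for all controls and sum(rho) == 0;
    both are simplifications made for this sketch.
    """
    y = [[log(v) for v in row] for row in x]
    m, T = len(y), len(y[0])
    grand = sum(sum(row) for row in y) / (m * T)
    # Row means estimate the log control amounts mu_i.
    mu = [sum(row) / T for row in y]
    # Column means minus the grand mean estimate the log scalings rho_j.
    rho = [sum(y[i][j] for i in range(m)) / m - grand for j in range(T)]
    return mu, rho

# Noise-free toy data: control amounts 10 and 100, chip scalings 0.5 and 2.
x = [[5.0, 20.0],
     [50.0, 200.0]]
mu, rho = fit_log_linear(x)
# Dividing out the estimated scaling puts both chips on a common scale.
rescaled = [[x[i][j] / exp(rho[j]) for j in range(2)] for i in range(2)]
```

On this noise-free example the fit recovers the control amounts and the scalings, and the rescaled measurements agree across the two chips. If the noise variance differs between controls, as the noise model above allows, the scaling estimates would need to weight each control by its precision.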
