
From Promoter Sequence to Expression: A Probabilistic Framework
By Eran Segal, Yoseph Barash, Itamar Simon, Nir Friedman, and Daphne Koller
Presented at RECOMB 2002

Eugene Ke
Bioinformatics Program
May 29, 2002

Key Points
- Biology is in the post-genomic era.
- We can sequence the whole genome or DNA library of organisms.
- The challenge now is to understand how DNA works at a detailed level.
- This paper attempts to model the mechanics of gene expression. The model is ambitious, as it incorporates data from multiple experimental sources.

Central Dogma of Molecular Biology
- DNA is a long text composed from a 4-letter alphabet (A, C, G, T).
- Genes are the meaningful portions of DNA.
- DNA is converted into messenger RNA (mRNA) via transcription.
- mRNA is used to build proteins via translation.
- Proteins perform all the work in the cell.
- Different cell types perform different functions. Therefore, cell types must have different proteins.

Transcription Factors (TFs)
- Proteins that bind to DNA are called TFs.
- TF binding must be specific.
- Where do TFs bind?
  - Before the coding portion of genes.
  - Close enough to affect expression.
- Sites of binding are called promoters.
- A promoter region is the sequence before a gene, where the promoter(s) are located.

Measuring Gene Expression
- Expression level represents the amount of mRNA present in a cell.
- One DNA array can measure thousands of genes simultaneously.
- Each array is lined with DNA "probes," one for each specific gene.
- mRNA is extracted from cells and placed on the array.
- If a DNA probe responds, the corresponding gene is being expressed.

Clustering Expression Levels
- Using expression data, cluster similarly expressed genes.
- Genes in a cluster probably have related functions.
- After clustering, we can search the promoter regions of each cluster.
- Genes in a cluster are affected by the same TFs, and therefore will have the same promoters.
- Search promoter regions for similar strings, which are called motifs.
- A motif is a putative promoter.
- Identify probable TFs using motifs.

Finding Promoters Via Sequence
- We now know the entire genome of some organisms.
- We can search directly for motifs.
- Step 1: Search promoter regions of known genes and find motifs.
- Step 2: Group genes by similar motifs. The logic is that if genes are controlled by the same TFs, the genes will have similar promoters.
- Step 3: Using databases of known transcription factors, search for probable matches.
- Step 4: Experimentally verify using expression levels of multiple TF combinations.

Experimentally Finding Binding Sites
- Localization arrays measure DNA-protein binding.
- They are similar to DNA arrays.
- Run two experiments.
- The ratio of intensities shows true binding.
- However…
  - Only indicates whether a TF can bind to a promoter, not whether it actually does.
  - Very noisy.

Authors' Goals
- Analyze two different types of information simultaneously:
  - Expression data
  - Sequence data
- Both methods are trying to answer the same question: what genes are co-regulated by the same transcription factors?
- Logically, it is advantageous to combine the data.
  - Expression data provides gene expression with respect to time.
  - Sequence data provides hints about whether a TF binds to a gene.
- By combining the data, it should be possible to determine whether a transcription factor regulates a gene AND in what context.

Probabilistic Relational Model (PRM)
- PRM is an organizational tool.
- Separates expression and sequence data.
- Provides a method of relating expression and sequence data.
- [Diagram labels] Gene: sequence data, localization data. Experiment & Expression: expression data. R(t) = hidden variable: whether a transcription factor regulates a gene.

Understanding the gene objects
- Transcription Factor t
  - Implicit
  - Enumerated and known at the beginning: t1…tm
  - Described by a Position Specific Scoring Matrix (PSSM)
- Gene object gi
  - Contains a promoter region, divided into individual bases S1..Sn.
  - Contains a Regulates variable R(tj): whether TF tj regulates the gene. There is one R(tj) value for every TF.
  - May contain a Localization variable L(tj) for a TF tj.

Organizing expression data
- DNA array a
  - Each array has multiple clusters, called ACluster.
  - Each array comes from a specific phase of the cell cycle, denoted by Phase.
  - Specific to the data set.
- Expression e
  - Contains expression levels of a gene cluster.
  - Level is the expression level of a gene under a specific context.
  - Array describes the parent experiment.
  - Gene links the expression level to its gene.

Expression Model
- Expression level depends on three factors:
  - Gene cluster
  - Cell-cycle phase
  - TF regulation, R(t)
- The dependency is modeled as tree-structured conditional distributions, which capture:
  - Context-specific effects, i.e. phases
  - Combinatorial interactions, such as not R(Swi6) and not R(Fkh2)
- Expression levels are shown at the leaves, as univariate Gaussian distributions.

Understanding the Expression Model
- Genes in cluster 3 that are not in the S phase and are bound by neither TF Swi6 nor TF Fkh1 have an expression level centered at 0.2.

Position Specific Scoring Matrix (PSSM)
- Binding sites are "degenerate": specific, but not absolutely so.
- Some mutations in the motif are acceptable while others are not.
- Some positions in the motif are highly conserved.
- In the diagram at left, the height of a letter represents its degree of conservation.
- A PSSM models acceptable TF binding sites.
- Each position is represented by a probability of being A, C, G, or T.
- A PSSM is a 4xN matrix, where N is the length of the motif.

L(t) is noisy evidence concerning R(t)
- Localization data is labeled as g.L(t).
- Experimental data gives a p-value for each L(t).
- If R(t) is true, we want L(t) to be small. Because L(t) is a p-value, a small value means we have high confidence.
- If R(t) is false, we want L(t) to be large. The data is due to background noise, so we have low confidence.
- We assume the probability density function is:
  $\mathrm{pdf}(L(t) = p \mid R(t) = \mathrm{true}) = c\, e^{-wp}$
  - p is the experimental p-value
  - w is an arbitrary weighting factor
  - c is a normalizing constant equal to $c = \dfrac{w}{1 - e^{-w}}$
- In other words, L(t) is a noisy sensor, used only as "guidance" for R(t).

Expression Model Learning
- Two main goals of the expression model:
  - Learn distributions of expression levels
  - Learn qualitative aspects of the tree structure
- Tree Structure
  - Scoring function (inputs: data set, tree structure, Gaussian distribution parameters)
  - Greedy local search
  - Trim operation removes nodes
  - Split operation adds nodes

Sequence Model
In essence, a


UCSD CSE 254 - From Promoter Sequence to Expression
