Stanford CS 374 - Regulatory Motif Finding


CS374 Fall 2004, Lecture 17, 11/23/04
Lecturer: Samuel Perlman    Scribe: Daniel Woods

Regulatory Motif Finding

Based on the following papers:
1. Liu J, Gupta, Liu X, Mayerhofere, Lawrence. “Statistical Models for Biological Sequence Motif Discovery”, Fifth Carnegie Mellon Workshop on Case Studies in Bayesian Statistics, held September 2001.
2. Blanchette and Tompa. “Discovery of Regulatory Elements by a Computational Method for Phylogenetic Footprinting”, Genome Research, Vol. 12, Issue 5:739-748, 2002.

Additional References:
3. CS262 Lecture Slides

1 Background

1.1 Transcription Regulation

Proteins are generated according to DNA sequences, which are identical in every cell of an organism. However, the same proteins are not found uniformly throughout all cells of an organism. Transcription is therefore not occurring at the same rate for all genes in all cells. Somehow, transcription is being “regulated.”

Transcription requires the presence of a set of transcription factors that are specific to a particular transcription region. Transcription factors (if they are found in the cell) bind to regulatory regions. If all are present and bind appropriately, transcription can begin. Some types of regulatory regions are:

TATA-box – This type is generally found about 30 bp upstream (i.e.
neighboring the end where transcription begins) from the point where transcription begins.
Promoter (proximal) – These are generally found within about 300 bp of the transcription region, upstream.
Enhancer (distal) – These regions are found on the order of 1000 bp away from the transcription region and (because of the great distance) rely on folding of the DNA strand to bring the enhancer near enough to the transcription region that a transcription factor attached to it can cause transcription to begin.

Together, these elements form a logic sequence that allows transcription to occur only under the desired circumstances.

1.2 Motifs

A motif is not a sequence, but rather a classification of sequences. A regulatory motif is a motif describing sequences that perform regulation in a particular way. Motifs can be expressed probabilistically (graphically) or with extended single-letter nucleotide codes.

Figure 1 - Left: a graphical probabilistic representation of a motif (taken from ftp://ftp.ncifcrf.gov/pub/delila/hawaii.fig1.ps). Right: A set of extended single-letter nucleotide codes which could be used to describe a motif.

Symbol   Meaning
A        Adenine
G        Guanine
C        Cytosine
T        Thymine
U        Uracil
Y        pYrimidine (C or T)
R        puRine (A or G)
W        “Weak” (A or T)
S        “Strong” (C or G)
K        “Keto” (T or G)
M        “aMino” (C or A)
B        Not A (C or G or T)
D        Not C (A or G or T)
H        Not G (A or C or T)
V        Not T (A or C or G)
X,N,?    Unknown (A or C or G or T)

The left side of Figure 1 shows a graphical probabilistic depiction of a motif, as it would be generated from the individual sequences shown above it. Each column of this graph shows a stack of the characters that appear at this location, each with a relative height proportional to the frequency with which it appears in the sequences that the motif describes.

The overall height of each column is given by D(i) in the following equation:

    D(i) = \log_2 |A| + \sum_{k=1}^{|A|} p_k(i) \log_2 p_k(i)

Here |A| is the size of the alphabet (4 in the case of DNA, but 20 if this model is applied to protein sequences) and p_k(i) is the probability of letter k occurring at location i according to the data. This value is equal to the amount of information (in bits) represented at that position. For DNA, a position that has the same letter in all sequences represents 2 bits, the maximum possible.

The scheme represented on the right side of Figure 1 conveys much less information about probabilities, because it treats all positions that have the same set of possible letters identically. For example, positions -8 and -4 from the data on the left side of Figure 1 each contain only the letters A, C, and T, but in very different distributions. The scheme on the right side would represent both distributions with the letter H. This is much more compact, but loses a lot of information.

2 Statistical Method of Finding Regulatory Motifs (First Paper)

2.1 Data Used

Two types of data are required for the methodology described in this paper.

2.1.1 Identifying Transcription Factor Binding Sites

The sites at which transcription factor binding occurs can be located by a method similar to that used to sequence DNA itself, using a gel shift assay. This is done in two phases: one indicating which transcription factors bind to a particular sequence, and one indicating where the binding occurs.

To determine which transcription factors attach to a particular DNA fragment, the fragment and the factors are incubated together and then placed in a gel.
The speed at which the fragment moves under a voltage differential will be affected by whether its mass has been increased by the binding of transcription factors. For a particular fragment, it is then recorded which transcription factors attached to it.

For a particular DNA fragment and a particular transcription factor known to bind to it, the next step is to determine where the binding occurs. Once the two have incubated long enough for binding to occur, the DNA is chemically degraded in a way that causes breakage wherever transcription factors are not attached. When the resulting strands are subjected to the voltage differential across the gel, locations corresponding to binding sites will be absent from the range of velocities.

2.1.2 Evidence from Cross-Species Comparisons and Microarray Analyses

This evidence suggests that genes that behave similarly in response to various treatments are likely to be expressed due to the same transcription factors. It is highly likely, then, that the regulatory regions associated with the DNA sequence of each such gene are of the same motif.

2.2 Statistical Analysis

This paper suggests a statistical analysis
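
As an illustration of the two motif representations discussed in the Motifs section, the following Python sketch computes the column height D(i) and the extended single-letter code for each column of a small alignment. The example sites, the function names (column_information, degenerate_code), and the use of raw counts as the probabilities p_k(i) are assumptions made for this illustration and do not come from either paper. The code table covers DNA letters only and uses N for the "unknown" case.

```python
import math
from collections import Counter

# Extended single-letter codes from the table in the notes, keyed by the
# set of bases observed in a column (DNA only; N stands for "unknown").
CODE_FOR_BASES = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("CT"): "Y", frozenset("AG"): "R", frozenset("AT"): "W",
    frozenset("CG"): "S", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def column_information(column, alphabet_size=4):
    """Return D(i) = log2|A| + sum_k p_k(i) * log2 p_k(i), in bits.

    Letters that never occur in the column contribute nothing to the sum
    (the usual convention that 0 * log2(0) is taken as 0).
    """
    counts = Counter(column)
    total = len(column)
    weighted_logs = sum(
        (n / total) * math.log2(n / total) for n in counts.values()
    )
    return math.log2(alphabet_size) + weighted_logs


def degenerate_code(column):
    """Return the single-letter code for the set of bases seen in a column."""
    return CODE_FOR_BASES[frozenset(column)]


# Hypothetical aligned binding sites (made up for this example).
sites = ["AA", "AA", "AA", "AC", "AC", "AC", "CT", "TT"]

# Transpose the alignment so that each string holds one motif position.
columns = ["".join(col) for col in zip(*sites)]

for i, col in enumerate(columns):
    print(i, col, degenerate_code(col), round(column_information(col), 3))
```

Both columns in this made-up alignment contain only A, C, and T, so both receive the code H, yet the first column (dominated by A) has a larger D(i) than the second. This mirrors the point made about positions -8 and -4 in Figure 1: the single-letter code is compact but hides the difference in distributions.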

