UCSD CSE 182 - Protein Sequence Analysis

This preview shows page 1-2-3-4-5-6-45-46-47-48-49-50-51-92-93-94-95-96-97 out of 97 pages.


Slide titles:
CSE182-L7
Domain analysis via profiles
Psi-BLAST idea
Psi-BLAST speed
Representation 3: HMMs
QUIZ!
The generative model
A simple Profile HMM
Profile HMMs can handle gaps
Example
Profile HMMs
Formally
Two solutions
Viterbi Algorithm for HMM
Profile HMM membership
Summary
Protein Domain databases
Gene Finding
Gene
Eukaryotic gene structure
Translation
Slide 22
Gene Features
Gene identification
Expressed Sequence Tags
EST sequencing
EST Sequencing
Project
Project Extra credit
Computational Gene Finding
Gene Finding: The 1st generation
Coding versus Non-coding
Generalizing
Coding versus non-coding
Coding vs. non-coding regions
Coding differential for 380 genes
Other Signals
Coding region can be detected
Slide 39
The second generation of Gene finding
PowerPoint Presentation
HMMs and gene finding
The Viterbi Algorithm
Slide 44
An HMM for Gene structure
Generalized HMMs, and other refinements
Length distributions of Introns & Exons
Generalized HMM for gene finding
Forward algorithm for gene finding
HMMs and Gene finding
DNA Signals
Splice signals
PWMs
MDD
Slide 55
MDD for Donor sites
De novo Gene prediction: Summary
How many genes do we have?
Alternative splicing
Comparative methods
Slides 61-64
A geometric approach
Choosing between Introns and Exons
Slides 67-69
Combining Signals
Slide 71
Combining signals using D.P.
Gene finding reformulated
Slide 74
Optimum labeling using D.P. (Viterbi)
Optimum parse of the gene
Slides 77-91
Gene prediction: Summary
Slides 93-95
Comparative gene finding tools
Databases

CSE182-L7
Protein Sequence Analysis using HMMs, Gene Finding

Domain analysis via profiles
• Given a database of profiles of known domains/families, we can query our sequence against each of them, and choose the high-scoring ones to functionally characterize our sequence.
• What if the sequence matches some other sequences weakly (using BLAST), but does not match any known profile?

Psi-BLAST idea
• Iterate:
– Find homologs using BLAST on the query
– Discard very similar homologs
– Align, make a profile, search with the profile.
– Why is this more sensitive?
[Figure labels: Seq, Db]
– In the next iteration, the red sequence will be thrown out.
– It matches the query in non-essential residues.

Psi-BLAST speed
• Two time-consuming steps.
1. Multiple alignment of homologs
2. Searching with profiles.
   1. Does the keyword search idea work?
• Multiple alignment:
– Use ungapped multiple alignments only
• Pigeonhole principle again:
– If a profile of length m must score >= T
– Then, a sub-profile of length l must score >= lT/m
– Generate all l-mers that score at least lT/m
– Search using an automaton

Representation 3: HMMs
• Building good profiles relies upon good alignments.
– Difficult if there are gaps in the alignment.
– Psi-BLAST/BLOCKS etc. work with gapless alignments.
• An HMM representation of profiles helps put the alignment construction/membership query in a uniform framework.
• Also allows for position-specific gap scoring.

QUIZ!
• Question:
• Your ‘friend’ likes to gamble.
• He tosses a coin: HEADS, he gives you a dollar. TAILS, you give him a dollar.
• Usually, he uses a fair coin, but ‘once in a while’, he uses a loaded coin.
• Can you say what fraction of the time he loads the coin?

The generative model
• Think of each column in the alignment as generating a distribution.
• For each column, build a node that outputs a residue with the appropriate distribution.
[Figure: example node with Pr[F] = 0.71, Pr[Y] = 0.14]

A simple Profile HMM
• Connect the nodes for each column into a chain. This chain generates random sequences.
• What is the probability of generating FKVVGQVILD?
• In this representation
– Prob[New sequence S belongs to a family] = Prob[HMM generates sequence S]
• What is the difference with Profiles?

Profile HMMs can handle gaps
• The match states are the same as on the previous page.
• Insertion and deletion states help introduce gaps.
• A sequence may be generated using different paths.

Example
• What is the probability that ALIL is part of the family?
• Note that multiple paths can generate this sequence.
– M1 I1 M2 M3
– M1 M2 I2 M3
• In order to compute the probabilities, we must assign probabilities of transition between states.
Alignment:
A L - L
A I V L
A I - L

Profile HMMs
• Directed automaton M with nodes and edges.
– Nodes emit symbols according to ‘emission probabilities’
– Transition from node to node is guided by ‘transition probabilities’
• Joint probability of seeing a sequence S, and path P
– Pr[S,P|M] = Pr[S|P,M] Pr[P|M]
– Pr[ALIL AND M1 I1 M2 M3 | M] = Pr[ALIL | M1 I1 M2 M3, M] Pr[M1 I1 M2 M3 | M]
• Pr[ALIL | M] = ?

Formally
• The emitted sequence is $S = S_1 S_2 \ldots S_m$
• The path traversed is $P_1 P_2 P_3 \ldots$
• $e_j(s)$ = emission probability of symbol s in state j
• Transition probability $T[j,k]$: probability of transitioning from state j to state k.
• $\Pr(P,S \mid M) = e_{P_1}(S_1)\, T[P_1,P_2]\, e_{P_2}(S_2) \cdots$
• What is $\Pr(S \mid M)$?

Two solutions
• An unknown (hidden) path is traversed to produce (emit) the sequence S.
• The probability that M emits S can be either
– The sum of the joint probabilities over all paths.
  • $\Pr(S \mid M) = \sum_P \Pr(S,P \mid M)$
– OR, it is the probability of the most likely path
  • $\Pr(S \mid M) = \max_P \Pr(S,P \mid M)$
• Both are appropriate ways to model, and have similar algorithms to solve them.

Viterbi Algorithm for HMM
• Let $P_{\max}(i,j \mid M)$ be the probability of the most likely solution that emits $S_1 \ldots S_i$ and ends in state j (is it sufficient to compute this?)
• $P_{\max}(i,j \mid M) = \max_k P_{\max}(i-1,k \mid M)\, T[k,j]\, e_j(S_i)$ (Viterbi)
• $P_{\mathrm{sum}}(i,j \mid M) = \sum_k \big(P_{\mathrm{sum}}(i-1,k \mid M)\, T[k,j]\big)\, e_j(S_i)$
Alignment:
A L - L
A I V L
A I - L

Profile HMM membership
• We can use the Viterbi/Sum algorithm to compute the probability that the sequence belongs to the family.
• Backtracking can be used to get the path, which allows us to give an alignment.
Alignment:
A L - L
A I V L
A I - L
Path: M1 M2 I2 M3
      A  L  I  L

Summary
• HMMs allow us to model position-specific gap penalties, and allow for automated training to get a good alignment.
• Patterns/Profiles/HMMs allow us to represent families and focus on key residues.
• Each has its advantages and disadvantages, and needs special algorithms to query efficiently.

Protein Domain databases
• A number of databases capture proteins (domains) using various representations.
• Each domain is also associated with structure/function information, parsed from the literature.
• Each database has specific query mechanisms that allow us to compare our sequences against them, and assign
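
Worked sketch (not from the slides): the two recurrences on the "Viterbi Algorithm for HMM" slide can be tried on the ALIL example. The code below is a minimal Python sketch under several assumptions: the model has only match and insert states (no silent delete states, so every state emits a symbol, as the recurrences require), there are no insert self-loops, and a path must end in the last match state. The names `viterbi`, `forward`, `STATES`, `START`, `TRANS`, `EMIT`, and every probability value are hypothetical and chosen only for illustration; they are not taken from the lecture.

```python
def viterbi(seq, states, start, trans, emit, final):
    """Most likely path emitting seq and ending in `final`: (probability, path)."""
    # pmax[j]: probability of the best path that emits seq[:i] and ends in state j
    pmax = {j: start.get(j, 0.0) * emit[j].get(seq[0], 0.0) for j in states}
    back = []  # one back-pointer table per position after the first
    for sym in seq[1:]:
        prev, pmax, ptr = pmax, {}, {}
        for j in states:
            # Pmax(i,j) = max_k Pmax(i-1,k) T[k,j] e_j(S_i)
            k = max(states, key=lambda s: prev[s] * trans.get((s, j), 0.0))
            pmax[j] = prev[k] * trans.get((k, j), 0.0) * emit[j].get(sym, 0.0)
            ptr[j] = k
        back.append(ptr)
    # Backtrack from the required final state to recover the path
    path = [final]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return pmax[final], path


def forward(seq, states, start, trans, emit, final):
    """Sum over all paths that emit seq and end in `final`."""
    psum = {j: start.get(j, 0.0) * emit[j].get(seq[0], 0.0) for j in states}
    for sym in seq[1:]:
        prev = psum
        # Psum(i,j) = (sum_k Psum(i-1,k) T[k,j]) e_j(S_i)
        psum = {
            j: sum(prev[k] * trans.get((k, j), 0.0) for k in states)
            * emit[j].get(sym, 0.0)
            for j in states
        }
    return psum[final]


# Hypothetical toy model for the three-row alignment (columns 1, 2, 4 are match
# columns; column 3 is an insertion). All numbers below are made up.
STATES = ["M1", "I1", "M2", "I2", "M3"]
START = {"M1": 1.0}
TRANS = {
    ("M1", "M2"): 0.8, ("M1", "I1"): 0.2,
    ("I1", "M2"): 1.0,
    ("M2", "M3"): 0.7, ("M2", "I2"): 0.3,
    ("I2", "M3"): 1.0,
}
UNIFORM = {"A": 0.25, "L": 0.25, "I": 0.25, "V": 0.25}
EMIT = {
    "M1": {"A": 1.0},
    "I1": UNIFORM,
    "M2": {"L": 0.4, "I": 0.6},
    "I2": UNIFORM,
    "M3": {"L": 1.0},
}

if __name__ == "__main__":
    best_p, best_path = viterbi("ALIL", STATES, START, TRANS, EMIT, final="M3")
    total_p = forward("ALIL", STATES, START, TRANS, EMIT, final="M3")
    print(best_path, best_p, total_p)
```

With these made-up parameters only two paths can emit ALIL: M1 M2 I2 M3 (probability about 0.024) and M1 I1 M2 M3 (about 0.021). The Viterbi value is therefore the first of these and the sum is about 0.045. A real profile HMM would also include delete states and would normally work with log probabilities to avoid underflow on long sequences.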

