Protein Sequencing and Identification

Motivation
• Want to know which proteins are present in the cell
• Protein identification: given a protein sample, does it match some protein in a database?
• Protein sequencing: no database; directly find the sequence of the protein sample.

History
• First protein sequencing done by Nobel laureate Fred Sanger
– broke the insulin protein into pieces ("peptides")
– sequenced each resulting fragment separately
– reconstructed the entire insulin sequence by fragment assembly

Peptide Fragmentation
• Peptides tend to fragment along the backbone.
• Fragments can also lose neutral chemical groups like NH3 and H2O.
[Figure: peptide backbone H...-HN-CH-CO ... NH-CH-CO-NH-CH-CO-...OH with side chains R_{i-1}, R_i, R_{i+1}; collision-induced dissociation (H+) splits it into a prefix fragment and a suffix fragment]

Breaking Protein into Peptides and Peptides into Fragment Ions
• Proteases, e.g. trypsin, break protein into peptides.
• A tandem mass spectrometer further breaks the peptides down into fragment ions and measures the mass of each piece.
• The mass spectrometer accelerates the fragment ions; heavier ions accelerate more slowly than lighter ones.
• The mass spectrometer measures the mass/charge ratio of an ion.

N- and C-terminal Peptides
[Figure: N-terminal peptides and C-terminal peptides]

Terminal peptides and ion types
• Peptide: mass (D) 57 + 97 + 147 + 114 = 415
• Peptide without an 18 D group (the mass of H2O): mass (D) 57 + 97 + 147 + 114 – 18 = 397

N- and C-terminal Peptides
[Figure: mass ladder with N-terminal peptide masses 57, 154, 301, 415 and C-terminal peptide masses 71, 185, 332, 429; full peptide mass 486]
• Reconstruct the peptide from the set of masses of fragment ions (mass spectrum)

Peptide sequencing problem
• A = {a_1, a_2, …, a_20}: set of amino acids, each with mass m(a_i)
• Peptide P = p_1…p_n is a sequence of amino acids, with parent mass m(P) = ∑_i m(p_i)
• Partial N-terminal peptide P_i = p_1…p_i with mass m_i
• Mass spectrum has the masses of all partial N-terminal peptides, determined experimentally
– Ignoring C-terminal peptides for simplicity

Peptide sequencing problem
• A peptide may lose one or more smaller parts of itself (such as a water or an ammonia)
• The mass spectrometer measures masses of fragments that may not be the entire partial peptide P_i.
• Assume k different ion losses are possible.
• Possible losses of mass: Δ = {δ_1, …, δ_k}

Theoretical spectrum
• The theoretical spectrum T(P) of a peptide P can be calculated by subtracting all possible mass losses δ_1…δ_k from the masses of all partial peptides of P
• Each partial peptide generates k masses in the theoretical spectrum

Match between Spectra and the Shared Peak Count
• The match between two spectra is the number of masses (peaks) they share (Shared Peak Count or SPC)
• In practice mass-spectrometrists use the weighted SPC that reflects intensities of the peaks
• Match between experimental and theoretical spectra is defined similarly

Peptide Sequencing Problem
Goal: Find a peptide whose theoretical spectrum has the maximal match with an experimental spectrum.
Input:
– S: experimental spectrum
– Δ: set of possible ion types
– m: parent mass
Output:
– P: peptide with mass m whose theoretical spectrum best matches the experimental spectrum S

Vertices of Spectrum Graph
• Masses of potential N-terminal peptides
• Vertices are generated by reverse shifts corresponding to ion types Δ = {δ_1, δ_2, …, δ_k}
• Every mass s in an MS/MS spectrum generates k vertices V(s) = {s+δ_1, s+δ_2, …, s+δ_k} corresponding to potential N-terminal peptides
• Vertices of the spectrum graph: {initial vertex} ∪ V(s_1) ∪ V(s_2) ∪ ... ∪ V(s_m) ∪ {terminal vertex}

Edges of Spectrum Graph
• Two vertices with mass difference corresponding to an amino acid A:
– Connect with an edge labeled by A
• Gap edges for di- and tri-peptides

Paths
• Paths in the labeled graph spell out amino acid sequences
• There are many paths; how to find the correct one?
• We need scoring to evaluate paths

Path Score
• p(P,S) = probability that peptide P produces spectrum S = {s_1, s_2, …, s_q}
• p(P,s) = the probability that peptide P generates a peak s
• Scoring = computing probabilities
• $p(P,S) = \prod_{s \in S} p(P,s)$

p(P, s)
• What is the probability that peptide P will produce a fragment mass s?
• Each ion type δ_i has some probability of occurring, written as q_i
• A peptide has all k peaks with probability $\prod_{i=1}^{k} q_i$
• and no peaks with probability $\prod_{i=1}^{k} (1 - q_i)$
• Suppose that a partial peptide P_i produces ions δ_1…δ_l and does not produce ions δ_{l+1}…δ_k

p(P, s)
• Then $p(P,s) = \left( \prod_{i=1}^{l} q_i \right) \left( \prod_{i=l+1}^{k} (1 - q_i) \right)$
• A peptide also produces "random noise" with uniform probability q_R in any position.
• Dividing each factor by the corresponding noise probability gives $\left( \prod_{i=1}^{l} \frac{q_i}{q_R} \right) \left( \prod_{i=l+1}^{k} \frac{1 - q_i}{1 - q_R} \right)$
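
The theoretical spectrum and the Shared Peak Count described in the slides can be made concrete with a small sketch. Everything named here is an assumption for illustration: the function names, the matching tolerance `tol`, the loss set `[0, 18]` (no loss and an 18 D loss), and the made-up "experimental" peaks. The residue masses 57, 97, 147, 114 are the ones used in the slide example.

```python
# Illustrative sketch only: function names, the tolerance, and the loss set
# are assumptions, not part of the original slides.

def prefix_masses(residue_masses):
    """Masses m_1..m_n of the partial N-terminal peptides P_1..P_n."""
    masses, total = [], 0
    for m in residue_masses:
        total += m
        masses.append(total)
    return masses


def theoretical_spectrum(residue_masses, losses):
    """Subtract every loss in `losses` from every prefix mass (sorted, unique)."""
    return sorted({m - d for m in prefix_masses(residue_masses) for d in losses})


def shared_peak_count(spectrum_a, spectrum_b, tol=0.5):
    """Count peaks of spectrum_a that lie within `tol` of some peak of spectrum_b."""
    return sum(any(abs(a - b) <= tol for b in spectrum_b) for a in spectrum_a)


residues = [57, 97, 147, 114]      # integer residue masses from the slide example
losses = [0, 18]                   # assumed ion types: no loss, loss of 18 D
theory = theoretical_spectrum(residues, losses)
print(theory)                      # [39, 57, 136, 154, 283, 301, 397, 415]

experimental = [57, 136, 301, 397, 500]   # made-up peaks for illustration
print(shared_peak_count(experimental, theory))   # 4
```

With k = 2 losses, each of the four partial peptides contributes two masses, which is why the theoretical spectrum has eight peaks. This unweighted count ignores peak intensities; the weighted SPC mentioned in the slides is not modeled here.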
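
The spectrum graph construction (vertices from reverse shifts, edges labeled by amino acids, paths spelling sequences) can be sketched the same way. This is a simplified illustration under stated assumptions: integer masses with exact matching, a reduced residue table of five amino acids, no gap edges for di- and tri-peptides, and no scoring. The names `RESIDUE_MASS`, `spectrum_graph`, and `spell_paths` are hypothetical, and the input peaks are made up.

```python
# Illustrative sketch only: names and the reduced residue table are assumptions.
# Integer residue masses for a handful of amino acids.
RESIDUE_MASS = {"G": 57, "A": 71, "P": 97, "N": 114, "F": 147}


def spectrum_graph(spectrum, shifts, parent_mass):
    """Return (vertices, edges) of a simplified spectrum graph.

    Each peak s yields the vertices s + d for every d in `shifts`.
    Vertex 0 is the initial vertex and `parent_mass` the terminal vertex.
    """
    vertex_set = {0, parent_mass}
    for s in spectrum:
        for d in shifts:
            v = s + d
            if 0 < v < parent_mass:
                vertex_set.add(v)
    vertices = sorted(vertex_set)
    # Connect u -> u + m(A) with label A whenever both ends are vertices.
    edges = {v: [] for v in vertices}
    for u in vertices:
        for aa, m in RESIDUE_MASS.items():
            if u + m in vertex_set:
                edges[u].append((u + m, aa))
    return vertices, edges


def spell_paths(edges, start, end):
    """List the amino acid strings spelled by all paths from start to end."""
    if start == end:
        return [""]
    return [aa + rest
            for nxt, aa in edges[start]
            for rest in spell_paths(edges, nxt, end)]


spectrum = [39, 154, 283, 415]     # made-up peaks, some shifted by an 18 D loss
shifts = [0, 18]                   # assumed reverse shifts: no loss, 18 D loss
vertices, edges = spectrum_graph(spectrum, shifts, parent_mass=415)
print(vertices)                    # [0, 39, 57, 154, 172, 283, 301, 415]
print(spell_paths(edges, 0, 415))  # ['GPFN']
```

Each peak contributes a correct vertex and a spurious one (for example 39 and 57 from the peak at 39). Only the correct vertices line up into a path from the initial to the terminal vertex here, but real spectra produce many competing paths, which is why the slides turn to scoring next.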
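
The two product formulas for p(P, s) can be evaluated numerically as a check on the reconstruction. The sketch below assumes the ion types are given as a list of probabilities q_i together with a list of booleans saying which ions a partial peptide produced; the function names and the numeric values are made up for illustration.

```python
import math

# Illustrative sketch only: function names and the probabilities are assumptions.

def ion_pattern_probability(q, produced):
    """Product of q_i over produced ion types and (1 - q_i) over the rest."""
    p = 1.0
    for q_i, seen in zip(q, produced):
        p *= q_i if seen else (1.0 - q_i)
    return p


def noise_normalized_score(q, produced, q_r):
    """Same product with each factor divided by the matching noise term.

    Produced ions contribute q_i / q_R and missing ions (1 - q_i) / (1 - q_R).
    """
    score = 1.0
    for q_i, seen in zip(q, produced):
        score *= (q_i / q_r) if seen else ((1.0 - q_i) / (1.0 - q_r))
    return score


q = [0.5, 0.25]                # made-up probabilities for k = 2 ion types
produced = [True, False]       # first ion type observed, second one missing
print(ion_pattern_probability(q, produced))        # 0.5 * 0.75 = 0.375
print(noise_normalized_score(q, produced, 0.1))    # (0.5/0.1) * (0.75/0.9)
print(math.log(noise_normalized_score(q, produced, 0.1)))
```

Taking the logarithm, as in the last line, turns the product into a sum, which is a common way to make such scores additive along a path. That step is a general numerical convention and is not stated in the slides.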