Liquid Chromatography Mass Spectrometry-Based Proteomics: Biological and Technological Aspects

Yuliya V. Karpievitch, Ashoka D. Polpitiya, Gordon A. Anderson, Richard D. Smith, Alan R. Dabney

Pacific Northwest National Laboratory, Richland, WA 99352
Texas A&M University, Department of Statistics, College Station, TX 77843
Corresponding author

Abstract

Mass spectrometry-based proteomics has become the tool of choice for identifying and quantifying the proteome of an organism. Though recent years have seen a tremendous improvement in instrument performance and the computational tools used, significant challenges remain, and there are many opportunities for statisticians to make important contributions. In the most widely used "bottom-up" approach to proteomics, complex mixtures of proteins are first subjected to enzymatic cleavage, then the resulting peptide products are separated based on chemical or physical properties and analyzed using a mass spectrometer. The two fundamental challenges in the analysis of bottom-up MS-based proteomics are (1) identifying the proteins that are present in a sample, and (2) quantifying the abundance levels of the identified proteins. Both of these challenges require knowledge of the biological and technological context that gives rise to observed data, as well as the application of sound statistical principles for estimation and inference. We present an overview of bottom-up proteomics and outline the key statistical issues that arise in protein identification and quantification.

1 Introduction

The 1990s marked the emergence of genome sequencing and deoxyribonucleic acid (DNA) microarray technologies, giving rise to the "omics" era of research. Proteomics is the logical continuation of the widely used transcriptional profiling methodology (Wilkins et al., 1996). Proteomics involves the study of multiprotein systems in an organism, the complete protein complement of its genome, with the aim of understanding distinct proteins and their roles as a part of a larger networked system. This
is a vital component of modern systems biology approaches, where the goal is to characterize the system behavior rather than the behavior of a single component. Measuring messenger ribonucleic acid (mRNA) levels, as in DNA microarrays, does not by itself necessarily tell us much about the levels of corresponding proteins in a cell and their regulatory behavior, since proteins are subjected to many post-translational modifications and other modifications by environmental agents. Proteins are responsible for the structure, energy production, communications, movements, and division of all cells, and are thus extremely important to a comprehensive understanding of systems biology.

While genome-wide microarrays are ubiquitous, proteins do not share the same hybridization properties of nucleic acids. In particular, interrogating many proteins at the same time is difficult due to the need for having an antibody developed for each protein, as well as the different binding conditions optimal for the proteins to bind to their corresponding antibodies. Protein microarrays are thus not widely used for whole-proteome screening. Two-dimensional gel electrophoresis (2-DE) can be used in differential expression studies by comparing staining patterns of different gels. Quantitation of proteins using 2-DE has been limited due to the lack of robust and reproducible methods for detecting, matching, and quantifying spots, as well as some physical properties of the gels (Ong and Mann, 2005). Although efforts have been made to provide methods for spot detection and quantification (Morris et al., 2008), 2-DE is not currently the most widely used technology for protein quantitation in complex mixtures.

Meanwhile, mass spectrometry (MS) has proven effective for the characterization of proteins and for the analysis of complex protein samples (Nesvizhskii et al., 2007). Several MS methods for interrogating the proteome have been developed: Surface-Enhanced Laser Desorption Ionization (SELDI) (Tang et al., 2004), Matrix-Assisted Laser Desorption
Ionization (MALDI) (Karas et al., 1987) coupled with time-of-flight (TOF) or other instruments, and gas chromatography MS (GC-MS) or liquid chromatography MS (LC-MS). SELDI and MALDI do not incorporate on-line separation during MS analysis; thus, separation of complex mixtures needs to be performed beforehand. MALDI is widely used in tissue imaging (Caprioli et al., 1997; Cornett et al., 2007; Stoeckli et al., 2001). GC-MS and LC-MS allow for on-line separation of complex samples and thus are much more widely used in high-throughput quantitative proteomics. Here we focus on the most widely used "bottom-up" approach to MS-based proteomics, LC-MS.

In LC-MS-based proteomics, complex mixtures of proteins are first subjected to enzymatic cleavage, then the resulting peptide products are analyzed using a mass spectrometer; this is in contrast to "top-down" proteomics, which deals with intact proteins and is limited to simple protein mixtures (Han et al., 2008). A standard bottom-up experiment has the following key steps (Figures 1-3): (a) extraction of proteins from a sample, (b) fractionation to remove contaminants and proteins that are not of interest, especially high-abundance housekeeping proteins that are not usually indicative of the disease being studied, (c) digestion of proteins into peptides, (d) post-digestion separations to obtain a more homogeneous mixture of peptides, and (e) analysis by MS. The two fundamental challenges in the analysis of MS-based proteomics data are then the identification of the proteins present in a sample and the quantification of the abundance levels of those proteins. There are a host of informatics tasks associated with each of these challenges (Figures 4-6).

The first step in protein identification is the identification of the constituent peptides. This is carried out by comparing observed features to entries in a database of theoretical or previously identified peptides (Figure 5). In tandem mass spectrometry, denoted by MS/MS, a parent ion, possibly corresponding to a peptide, is selected in MS1 for
further fragmentation in MS2. Resulting fragmentation spectra are compared to fragmentation spectra in a database using software like SEQUEST (Eng et al., 1994), Mascot (Perkins et al., 1999), or X!Tandem. Alternatively, high-resolution MS instruments can be used to obtain extremely accurate mass measurements, and these can be compared to mass measurements in a database of peptides previously identified with high confidence via MS/MS (Pasa-Tolic et al., 2004), using the same software tools above. In either case, a statistical
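
The accurate-mass comparison described above can be illustrated with a minimal sketch. It is not the method used by SEQUEST, Mascot, or X!Tandem. The function names, the 10 ppm default tolerance, and the example peptide sequences are all assumptions introduced here for illustration; only the monoisotopic residue masses are standard values.

```python
# Monoisotopic amino acid residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # one water is added for the peptide's free termini


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_mass(observed, database, tol_ppm=10.0):
    """Return (sequence, ppm error) pairs within tolerance, best match first.

    `database` maps peptide sequence -> theoretical mass. The tolerance is
    a hypothetical default, not a value taken from the paper.
    """
    hits = []
    for seq, mass in database.items():
        err = ppm_error(observed, mass)
        if abs(err) <= tol_ppm:
            hits.append((seq, err))
    return sorted(hits, key=lambda hit: abs(hit[1]))


if __name__ == "__main__":
    # Hypothetical database of three peptide sequences.
    database = {seq: peptide_mass(seq) for seq in ("PEPTIDE", "SAMPLER", "ELVISK")}
    for seq, err in match_mass(799.3640, database):
        print(f"{seq}: {err:+.2f} ppm")
```

A fixed tolerance window of this kind only produces candidate matches. It does not say how likely a given match is to be correct, which is where statistical assessment enters.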
