Identification of N-Glycan Serum Markers Associated with Hepatocellular Carcinoma from Mass Spectrometry Data

Zhiqun Tang,† Rency S. Varghese,† Slavka Bekesova,† Christopher A. Loffredo,† Mohamed Abdul Hamid,‡ Zuzana Kyselova,§ Yehia Mechref,§ Milos V. Novotny,§ Radoslav Goldman,† and Habtom W. Ressom*,†

Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University, Washington, D.C. 20057, Minia University and Viral Hepatitis Research Laboratory, NHTMRI, Cairo, Egypt, and National Center for Glycomics and Glycoproteomics, Department of Chemistry, Bloomington, Indiana

Received May 5, 2009

Journal of Proteome Research 2010, 9, 104–112. 10.1021/pr900397n. © 2010 American Chemical Society. Published on Web 09/18/2009.

* To whom correspondence should be addressed. Habtom W. Ressom, Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University, Suite 173, Building D, 4000 Reservoir Road NW, Washington, D.C. 20057. Phone: 202-687-2283. Fax: 202-687-0227. E-mail: [email protected].
† Georgetown University.
‡ Minia University and Viral Hepatitis Research Laboratory.
§ National Center for Glycomics and Glycoproteomics.

Glycosylation represents the most complex and widespread post-translational modification of human proteins. The variation of glycosylation is closely related to oncogenic transformation. Therefore, profiling of glycans detached from proteins is a promising strategy to identify biomarkers for cancer detection. This study identified candidate glycan biomarkers associated with hepatocellular carcinoma by mass spectrometry. Specifically, mass spectrometry data were analyzed with a peak selection procedure that incorporates multiple random sampling strategies with recursive feature selection based on support vector machines. Ten peak sets were obtained from different combinations of samples. Seven peaks were shared by all 10 peak sets, each of which contained 7-12 selected peaks, indicating that 58-100% of the peaks in each set were shared. Support vector machines and hierarchical clustering were used to evaluate the performance of the peak sets. The predictive performance of the seven peaks was further evaluated by using 19 newly generated MALDI-TOF spectra. Glycan structures were determined for four of the seven peaks. A literature search indicated that the structures of the four glycans could be found in some cancer-related glycoproteins. The method of this study is significant in deriving consistent, accurate, and biologically significant glycan marker candidates for hepatocellular carcinoma diagnosis.

Keywords: hepatocellular carcinoma • glycan biomarker • biomarker discovery • mass spectrometry • support vector machine • recursive feature selection

1. Introduction

As the most complex and widespread post-translational modification (PTM),1 glycosylation plays crucial roles during different oncogenetic processes.2-4 Many important tumor markers, such as CEA,5 CA125,6 and PSA,7-9 are glycoproteins with altered glycan profiles in cancer. As one of the most common types of malignant tumor, hepatocellular carcinoma (HCC) is difficult to diagnose due to the highly heterogenic nature of the disease and has a low survival rate once diagnosed.10 The popular method to diagnose HCC is to measure a serum glycoprotein marker, alpha-fetoprotein (AFP). However, this marker has limited sensitivity (41-65%).11 This sensitivity could be improved by measuring several highly specific glycoprotein markers.11 Therefore, there is an urgent need to discover additional markers associated with HCC for early diagnosis.

Currently, glycan marker discovery by analyzing mass spectrometry (MS) data presents great potential to identify a panel of biomarkers relevant for early diagnosis of heterogenic diseases with improved accuracies.12-15 However, this approach is characterized by high dimensionality and complex patterns with a substantial amount of noise arising from measurement deviation, disease heterogeneity, and biological variability. A robust computational method is required to identify markers relevant to a particular problem from the MS data sets. Machine learning methods, especially neural networks and support vector machines (SVM), provide potential application in marker selection from MS data. Using a shallow feature selection method and a Bayesian neural network (BNN) classifier, 99% sensitivity and 98% specificity were reached in identifying ovarian cancer from SELDI-TOF MS data using 2-fold cross validation (CV).12 Information gain and an SVM classifier were used for prion disease diagnosis from MALDI-FTMS data and yielded 72% sensitivity and 73% specificity by leave-one-out cross validation (LOOCV).13 A t-test and several classification methods such as discriminant analyses, k-nearest neighbor analysis, and SVM were used for identifying ovarian cancer cases from normal patients, and SVM resulted in the lowest error rates.15 These methods utilized a filter strategy, which identifies relevant peaks independent of the classifiers. In feature selection with machine learning classifiers, another popular strategy is the wrapper method, in which classifiers built from different peak subsets evaluate the quality of those subsets by criteria such as CV error rate or accuracy on the validation data set; the wrapper method presented good performance and a stable feature subset when applied in microarray gene discovery16 and can be extended to MS peak selection. Mahadevan et al. built a feature selection method known as recursive feature elimination-support vector machine (RFE-SVM) to separate pneumonia patients from healthy people by mass spectrometry. They obtained an overall accuracy of 84-96% by 4-fold CV and 87-97% by LOOCV, providing much better predictive performance when compared with multivariate analysis methods.17

In our previous studies, we utilized ant colony optimization combined with support vector machines (ACO-SVM) peak selection to identify biomarkers for HCC diagnosis14,18-21 using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS). Six peptide markers were identified that yielded 100% sensitivity and 91% specificity in an independent test set,14,21 and two of them were found to be fragments of Complement C3 and C4.14,21 Six to 10 glycan markers were found to be associated with HCC with 87-93% sensitivity and 89-100% specificity.14,18-20 Two glycans with permethylated molecular weights of 2040 and 4502 were identified in most of these studies.14,19,20 In this study, we apply
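
The overlap figures quoted in the abstract (seven peaks shared by all 10 peak sets, each containing 7-12 peaks, hence 58-100% shared) follow from simple set arithmetic. The sketch below illustrates that calculation only. It is not the authors' code: the function names `shared_peaks` and `shared_fraction` are hypothetical, and the integer peak identifiers are placeholders rather than m/z values from the study. Only three illustrative peak sets are shown instead of ten.

```python
from functools import reduce


def shared_peaks(peak_sets):
    """Return the peaks that appear in every peak set."""
    return reduce(set.intersection, (set(p) for p in peak_sets))


def shared_fraction(peak_sets):
    """For each peak set, the fraction of its peaks shared by all sets."""
    common = shared_peaks(peak_sets)
    return [len(common) / len(set(p)) for p in peak_sets]


# Hypothetical peak identifiers (placeholders, not data from the paper).
core = {1, 2, 3, 4, 5, 6, 7}           # seven peaks present in every set
peak_sets = [
    core,                               # smallest set: 7 peaks
    core | {8, 9, 10},                  # intermediate set: 10 peaks
    core | {11, 12, 13, 14, 15},        # largest set: 12 peaks
]

common = shared_peaks(peak_sets)
fractions = shared_fraction(peak_sets)

# Report the shared share of each set as a whole-number percentage.
percentages = [round(100 * f) for f in fractions]
print(sorted(common))
print(min(percentages), max(percentages))
```

With a shared core of seven peaks, the smallest set (7 peaks) is fully shared and the largest (12 peaks) is 7/12 shared, which gives the 58-100% range stated in the abstract.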