Chromosome Identification Using Hidden Markov Models

Chromosome Identification Using Hidden Markov Models: Comparison with Neural Networks, Singular Value Decomposition, Principal Components Analysis, and Fisher Discriminant Analysis

John M. Conroy, Tamara G. Kolda, Dianne P. O’Leary, and Timothy J. O’Leary

Center for Computing Sciences (JMC), Institute for Defense Analyses, Bowie, Maryland; and Computational Sciences and Mathematics Research Department (TGK), Sandia National Laboratories, Livermore, California; and Computer Science Department and Institute for Advanced Computer Studies (DPO), University of Maryland, College Park, Maryland; and Department of Cellular Pathology (TJO), Armed Forces Institute of Pathology, Washington, DC

SUMMARY: The analysis of G-banded chromosomes remains the most important tool available to the clinical cytogeneticist. The analysis is laborious when performed manually, and the utility of automated chromosome identification algorithms has been limited by the fact that classification accuracy of these methods seldom exceeds about 80% in routine practice. In this study, we use four new approaches to automated chromosome identification, namely singular value decomposition (SVD), principal components analysis (PCA), Fisher discriminant analysis (FDA), and hidden Markov models (HMM), to classify three well-known chromosome data sets (Philadelphia, Edinburgh, and Copenhagen), comparing these approaches with the use of neural networks (NN). We show that the HMM is a particularly robust approach to identification that attains classification accuracies of up to 97% for normal chromosomes and retains classification accuracies of up to 95% when chromosome telomeres are truncated or small portions of the chromosome are inverted. This represents a substantial improvement of the classification accuracy for normal chromosomes, and a doubling in classification accuracy for truncated chromosomes and those with inversions, as compared with NN-based methods.
HMMs thus appear to be a promising approach for the automated identification of both normal and abnormal G-banded chromosomes. (Lab Invest 2000, 80:1629–1641).

Although the use of spectral karyotyping (Macville et al, 1997; Schrock et al, 1997; Veldman et al, 1997) is redefining the role of G-banding in chromosome analysis, analysis of chromosome banding patterns remains a cornerstone of karyotypic analysis both for routine diagnosis and for application in such techniques as comparative genomic hybridization (Piper et al, 1995). Chromosome classification and analysis is aided by the use of automated karyotyping systems that yield a preliminary classification for each chromosome, which may be corrected by hand as necessary. Automated karyotyping relies upon acquisition of a digital image, followed by extraction of chromosome features. Two general approaches to feature extraction are employed: gray level encoding of each chromosome and more complex extraction of distinctive features. These features may then be used in an algorithm that assigns the chromosome to one of 24 classes (autosomes 1–22, X, and Y). A variety of such algorithms has been proposed, based upon approaches such as Bayesian analysis (Lundsteen et al, 1986), Markov networks (Granum and Thomason, 1990; Guthrie et al, 1993), neural networks (NN) (Beksac et al, 1996; Errington and Graham, 1993; Graham et al, 1992; Jennings and Graham, 1993; Korning, 1995; Leon et al, 1996; Malet et al, 1992; Sweeney et al, 1994; Sweeney et al, 1997), and simple feature matching (Piper and Granum, 1989). The reported classification accuracy varies surprisingly little by approach. Most methods achieve approximately 90% correct classification of the Copenhagen chromosome data set; commercial implementations typically achieve approximately 80% correct classification in routine use.

Automated chromosome classification entails several steps. First, an image segmentation step is used to create distinct images of each chromosome in a metaphase.
Then, salient features of the chromosome image are extracted. Typically, gray level encoding is employed to represent the chromosome by a vector of gray level values, which are obtained by sampling at evenly spaced intervals along the chromosome’s medial axis. (See, for example, Errington and Graham, 1993.) Different vectors may contain a different number of samples, so vectors are typically stretched or compressed to a fixed number of entries via constant interpolation or downsampling. Because variations in lighting can cause the gray scale measurements to vary, all stretched vectors are normalized to Euclidean magnitude 1. Figure 1 illustrates this stretching, and Figure 2 illustrates the variations in measured values for chromosomes having the same identity. Chromosome 1 is usually the easiest to identify; it is physically the longest chromosome, and the banding pattern is particularly distinctive. The Y chromosome is among the hardest; it is physically relatively short, and the banding pattern is often rather indistinct.

Received May 16, 2000.
The work of Dianne O’Leary was supported by NSF Grant CCR 97-32022. The work of Tamara Kolda was supported by the Applied Mathematical Sciences Research Program, Office of Energy Research, U.S. Department of Energy, under contracts DE-AC05-96OR22464 with Lockheed Martin Energy Research Corporation, and DE-AC04-94AL85000 with Sandia Corporation.
Address reprint requests to: Timothy J. O’Leary, Department of Cellular Pathology, Armed Forces Institute of Pathology, 14th Street and Alaska Avenue, NW, Washington, DC 20306-6000. Fax: 202 782 7623; E-mail: [email protected]
LABORATORY INVESTIGATION Vol. 80, No. 11, p. 1629, 2000. Copyright © 2000 by The United States and Canadian Academy of Pathology, Inc. Printed in U.S.A.

Feature extraction provides an alternative to gray level encoding.
Piper and Granum (1989), for example, have proposed the use of 30 classification parameters derived from automated measurements. These features include the following:
● physical length of the chromosome
● location of the centromere (a narrowed region of the chromosome)
● the area of the chromosome
● the perimeter of the convex hull of the chromosome
● the number of bands
● inner products of the gray level values with various basis vectors resembling a set of wavelet “hat” functions.

In summary, the problem is to assign an identity (1–22, X, or Y) to a chromosome, given a vector containing its gray level
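
The gray level encoding described earlier (resampling each profile to a fixed number of entries, then scaling it to Euclidean magnitude 1) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the floor-based index rule used for piecewise-constant interpolation, and the target lengths in the usage lines are all hypothetical choices made for this example.

```python
import math


def resample_constant(profile, n_out):
    """Stretch or compress a gray level profile to n_out entries.

    Assumed rule (not taken from the paper): output entry j copies the
    input sample at index floor(j * len(profile) / n_out). This repeats
    samples when stretching and skips samples when compressing.
    """
    if not profile or n_out <= 0:
        raise ValueError("profile must be non-empty and n_out positive")
    m = len(profile)
    return [profile[(j * m) // n_out] for j in range(n_out)]


def normalize_unit(vec):
    """Scale a vector to Euclidean length 1.

    An all-zero vector has no direction, so it is returned unchanged.
    """
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        return list(vec)
    return [v / norm for v in vec]


def encode_profile(profile, n_out):
    """Resample to a fixed length, then normalize to unit magnitude."""
    return normalize_unit(resample_constant(profile, n_out))


if __name__ == "__main__":
    # Hypothetical profiles of different lengths, both mapped to 4 entries.
    short_profile = [3, 4]
    long_profile = [1, 2, 3, 4, 5, 6, 7, 8]
    print(encode_profile(short_profile, 4))
    print(encode_profile(long_profile, 4))
```

Here `resample_constant([3, 4], 4)` stretches the two samples to `[3, 3, 4, 4]`, and `encode_profile` then divides by the Euclidean norm so that profiles acquired under different lighting become comparable in scale. A smoother interpolation scheme could be substituted in `resample_constant` without changing the normalization step.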

